Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
Comments: To appear at ACL2017, code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We introduce an attention-based Bi-LSTM for Chinese implicit discourse
relations and demonstrate that modeling argument pairs as a joint sequence can
outperform word order-agnostic approaches. Our model benefits from a partial
sampling scheme and is conceptually simple, yet achieves state-of-the-art
performance on the Chinese Discourse Treebank. We also visualize its attention
activity to illustrate the model’s ability to selectively focus on the relevant
parts of an input sequence.
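As a rough illustration of the architecture described above, a minimal sketch of an attention-based Bi-LSTM over the joint argument-pair sequence (assumed PyTorch, not the authors' released code; dimensions, the single-layer softmax attention, and the classifier head are illustrative assumptions):

    import torch
    import torch.nn as nn

    class AttentiveBiLSTM(nn.Module):
        """Bi-LSTM over the joint Arg1/Arg2 token sequence with soft attention."""
        def __init__(self, vocab_size, emb_dim=100, hidden=128, n_relations=10):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)           # scalar score per time step
            self.out = nn.Linear(2 * hidden, n_relations)  # relation classifier

        def forward(self, token_ids):
            h, _ = self.bilstm(self.embed(token_ids))                 # (B, T, 2H)
            alpha = torch.softmax(self.attn(h).squeeze(-1), dim=1)    # (B, T) attention weights
            context = (alpha.unsqueeze(-1) * h).sum(dim=1)            # weighted summary of the pair
            return self.out(context), alpha

The returned attention weights are the kind of quantity that a visualization such as the one mentioned in the abstract would plot over the input tokens.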
Quynh Nguyen, Matthias Hein
Comments: 14 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
While the optimization problem behind deep neural networks is highly
non-convex, it is frequently observed in practice that training deep networks
seems possible without getting stuck in suboptimal points. It has been argued
that this is the case as all local minima are close to being globally optimal.
We show that this is (almost) true: in fact, almost all local minima are
globally optimal for a fully connected network with squared loss and analytic
activation function, given that the number of hidden units of one layer of the
network is larger than the number of training points and the network structure
from this layer on is pyramidal.
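As a hedged paraphrase of the stated conditions (notation assumed, not taken from the paper): for a training set of size $N$ and a fully connected network with layer widths $n_1, \dots, n_L$, the claim applies when some layer $k$ satisfies

  \[
  n_k \geq N \quad\text{and}\quad n_k \geq n_{k+1} \geq \dots \geq n_L \quad\text{(pyramidal structure from layer } k \text{ on)},
  \]

with a squared loss and an analytic activation function; under these conditions almost every local minimum is a global minimum.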
Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
As part of a complete software stack for autonomous driving, NVIDIA has
created a neural-network-based system, known as PilotNet, which outputs
steering angles given images of the road ahead. PilotNet is trained using road
images paired with the steering angles generated by a human driving a
data-collection car. It derives the necessary domain knowledge by observing
human drivers. This eliminates the need for human engineers to anticipate what
is important in an image and foresee all the necessary rules for safe driving.
Road tests demonstrated that PilotNet can successfully perform lane keeping in
a wide variety of driving conditions, regardless of whether lane markings are
present or not.
The goal of the work described here is to explain what PilotNet learns and
how it makes its decisions. To this end we developed a method for determining
which elements in the road image most influence PilotNet’s steering decision.
Results show that PilotNet indeed learns to recognize relevant objects on the
road.
In addition to learning the obvious features such as lane markings, edges of
roads, and other cars, PilotNet learns more subtle features that would be hard
to anticipate and program by engineers, for example, bushes lining the edge of
the road and atypical vehicle classes.
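The abstract does not spell out the visualization procedure; as a generic stand-in (not PilotNet's actual method), a simple gradient-based saliency map over the input image highlights pixels that most influence the predicted steering angle. A minimal sketch, assuming PyTorch and a hypothetical trained model `pilotnet` that maps an image batch to a scalar steering angle:

    import torch

    def steering_saliency(pilotnet, image):
        """Return |d(steering)/d(pixel)| as a rough importance map for one image.

        image: float tensor of shape (3, H, W); pilotnet: any differentiable model
        mapping a (1, 3, H, W) batch to a scalar steering angle. Both are
        assumptions, not the released PilotNet interface.
        """
        x = image.unsqueeze(0).clone().requires_grad_(True)
        angle = pilotnet(x).sum()          # scalar steering output
        angle.backward()                   # gradients w.r.t. input pixels
        return x.grad.abs().squeeze(0).max(dim=0).values  # (H, W) saliency map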
Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Visual Question Answering (VQA) has received a lot of attention over the past
couple of years. A number of deep learning models have been proposed for this
task. However, it has been shown that these models are heavily driven by
superficial correlations in the training data and lack compositionality — the
ability to answer questions about unseen compositions of seen concepts. This
compositionality is desirable and central to intelligence. In this paper, we
propose a new setting for Visual Question Answering where the test
question-answer pairs are compositionally novel compared to training
question-answer pairs. To facilitate developing models under this setting, we
present a new compositional split of the VQA v1.0 dataset, which we call
Compositional VQA (C-VQA). We analyze the distribution of questions and answers
in the C-VQA splits. Finally, we evaluate several existing VQA models under
this new setting and show that the performances of these models degrade by a
significant amount compared to the original VQA setting.
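The split construction is not detailed in the abstract; the following is only an illustrative sketch of the general idea of holding out question-answer concept combinations (the field names and pairing rule are assumptions, not the C-VQA procedure):

    def compositional_split(qa_pairs, held_out_pairs):
        """Split QA examples so that test items use (concept, answer) combinations
        never seen in training.

        qa_pairs: iterable of dicts with 'concept' and 'answer' keys (assumed schema).
        held_out_pairs: set of (concept, answer) tuples reserved for the test split.
        """
        train, test = [], []
        for ex in qa_pairs:
            key = (ex["concept"], ex["answer"])
            (test if key in held_out_pairs else train).append(ex)
        return train, test

    # Example: an answer "green" to a question about plate color goes to the test
    # split if ("plate color", "green") is held out, while "green" with other
    # concepts and "plate color" with other answers can still appear in training.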
Ke Wei, Ke Yin, Xue-Cheng Tai, Tony F. Chan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an effective framework for multi-phase image segmentation and
semi-supervised data clustering by introducing a novel region force term into
the Potts model. Assume the probability that a pixel or a data point belongs to
each class is known a priori. We show that the corresponding indicator function
obeys the Bernoulli distribution and the new region force function can be
computed as the negative log-likelihood function under the Bernoulli
distribution. We solve the Potts model by the primal-dual hybrid gradient
method and the augmented Lagrangian method, which are based on two different
dual problems of the same primal problem. Empirical evaluations of the Potts
model with the new region force function on benchmark problems show that it is
competitive with existing variational methods in both image segmentation and
semi-supervised data clustering.
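Under the Bernoulli view described above, the region force can be written explicitly. A hedged sketch with assumed notation: let $p_k(x)$ be the prior probability that pixel $x$ belongs to class $k$ and let $u_k(x)\in\{0,1\}$, $\sum_k u_k(x)=1$, be the indicator functions; the negative log-likelihood region term is then

  \[
  R(u) \;=\; -\sum_{x}\sum_{k}\Big( u_k(x)\,\log p_k(x) \;+\; \big(1-u_k(x)\big)\,\log\big(1-p_k(x)\big) \Big),
  \]

which is added to the usual Potts regularization on the partition boundaries.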
Ling-Yu Duan, Vijay Chandrasekhar, Shiqi Wang, Yihang Lou, Jie Lin, Yan Bai, Tiejun Huang, Alex Chichung Kot, Wen Gao
Comments: 4 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper provides an overview of the on-going compact descriptors for video
analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG).
MPEG-CDVA aims to define a standardized bitstream syntax to enable
interoperability in the context of video analysis applications. During the
development of MPEG-CDVA, a series of techniques aiming to reduce the
descriptor size and improve the video representation ability have been
proposed. This article describes the new standard being developed and
reports the performance of these key technical contributions.
Mohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou, Nigel Allinson, Xujiong Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we propose a novel learning-based method for automated
segmentation of brain tumors in multimodal MRI images. Machine-learned
features from a fully convolutional neural network (FCN) and hand-designed
texton features are used to classify the MRI image voxels. The score map with
pixel-wise predictions is used as a feature map which is learned from the
multimodal MRI training dataset using the FCN. The learned features are then
applied to random forests to classify each MRI image voxel into normal brain
tissues and different parts of tumor. The method was evaluated on the BRATS 2013
challenge dataset. The results show that the application of the random forest
classifier to multimodal MRI images using machine-learned features based on the
FCN and hand-designed features based on textons provides promising segmentations.
The Dice overlap measure for automatic brain tumor segmentation against ground
truth is 0.88, 0.80 and 0.73 for complete tumor, core and enhancing tumor,
respectively.
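A minimal sketch of the final classification stage as described (scikit-learn; the stand-in arrays, shapes and hyperparameters are assumptions, not the paper's configuration):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in arrays; in the described pipeline these would be per-voxel FCN score
    # maps, texton features and ground-truth labels extracted from the MRI volumes.
    rng = np.random.default_rng(0)
    n_voxels, n_classes, n_textons = 1000, 5, 16
    fcn_scores = rng.random((n_voxels, n_classes))     # FCN per-class scores per voxel
    texton_feats = rng.random((n_voxels, n_textons))   # hand-designed texton features per voxel
    labels = rng.integers(0, n_classes, n_voxels)      # tissue / tumor-part labels

    X = np.hstack([fcn_scores, texton_feats])          # combine learned and hand-designed features
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
    pred = rf.predict(X)                               # per-voxel class predictions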
Jie Luo, Karteek Popuri, Dana Cobzas, Hongyi Ding, William M. Wells, Masashi Sugiyama
Comments: raw version
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Being a task of establishing spatial correspondences, medical image
registration is often formalized as finding the optimal transformation that
best aligns two images. Since the transformation is such an essential component
of registration, most existing studies conventionally quantify the
registration uncertainty, which is the confidence in the estimated spatial
correspondences, by the transformation uncertainty. In this paper, we give
concrete examples and reveal that using the transformation uncertainty to
quantify the registration uncertainty is inappropriate and sometimes
misleading. Based on this finding, we also draw attention to an important yet
subtle aspect of probabilistic image registration, namely whether it is
reasonable to determine the correspondence of a registered voxel solely by the
mode of its transformation distribution.
Badre Munir
Comments: 4 pages, 1 figure, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Among patch-based image denoising methods, smooth ordering of
local patches (patch ordering) has been shown to give state-of-the-art results. For
image denoising, the patch ordering method forms two large TSPs (Traveling
Salesman Problems) whose nodes lie in N-dimensional space. Ten approximate
solutions of the two large TSPs are then used in a filtering process to form
the reconstructed image. The use of large TSPs makes patch ordering a
computationally intensive method. We propose a modified patch ordering method for
image denoising in which several smaller TSPs are formed and the filtering process
is varied to work with the solutions of these smaller TSPs. In terms of PSNR, the
denoising results of the proposed method differ from the original by
0.032 dB to 0.016 dB on average. In the original method, solving the TSPs was observed
to consume 85% of the execution time. In the proposed method, the time for solving
the TSPs can be reduced to half of that required by the original method, allowing
the proposed method to denoise images in 40% less time.
Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, Samuel Rota Bulò
Comments: arXiv admin note: substantial text overlap with arXiv:1702.06332
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Classifiers trained on given databases perform poorly when tested on data
acquired in different settings. This is explained in domain adaptation through
a shift among distributions of the source and target domains. Attempts to align
them have traditionally resulted in works reducing the domain shift by
introducing appropriate loss terms, measuring the discrepancies between source
and target distributions, in the objective function. Here we take a different
route, proposing to align the learned representations by embedding in any given
network specific Domain Alignment Layers, designed to match the source and
target feature distributions to a reference one. In contrast to previous works,
which define a priori in which layers adaptation should be performed, our
method is able to automatically learn the degree of feature alignment required
at different levels of the deep network. Thorough experiments on different
public benchmarks, in the unsupervised setting, confirm the power of our
approach.
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the deep face recognition (FR) problem under the open-set
protocol, where ideal face features are expected to have smaller maximal
intra-class distance than minimal inter-class distance under a suitably chosen
metric space. However, few existing algorithms can effectively achieve this
criterion. To this end, we propose the angular softmax (A-Softmax) loss that
enables convolutional neural networks (CNNs) to learn angularly discriminative
features. Geometrically, A-Softmax loss can be viewed as imposing
discriminative constraints on a hypersphere manifold, which intrinsically
matches the prior that faces also lie on a manifold. Moreover, the size of
angular margin can be quantitatively adjusted by a parameter m. We further
derive specific values of m to approximate the ideal feature criterion. Extensive
analysis and experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF)
and MegaFace Challenge 1 show the superiority of the A-Softmax loss in FR tasks.
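For reference, the commonly cited form of the A-Softmax loss (as usually stated in follow-up work; it should be checked against the paper itself): with normalized class weights and zero biases, for sample $x_i$ with label $y_i$,

  \[
  L = \frac{1}{N}\sum_i -\log
  \frac{e^{\|x_i\|\,\psi(\theta_{y_i,i})}}
  {e^{\|x_i\|\,\psi(\theta_{y_i,i})} + \sum_{j\neq y_i} e^{\|x_i\|\cos(\theta_{j,i})}},
  \qquad
  \psi(\theta) = (-1)^k\cos(m\theta) - 2k,\;\; \theta\in\Big[\tfrac{k\pi}{m},\tfrac{(k+1)\pi}{m}\Big],
  \]

where $\theta_{j,i}$ is the angle between feature $x_i$ and class weight $W_j$, and the integer $m\geq 1$ controls the angular margin.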
Adriana Fernandez-Lopez, Federico M. Sukno
Comments: International Conference on Computer Vision Theory and Applications (VISAPP 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Speech is the most common communication method between humans and involves
the perception of both auditory and visual channels. Automatic speech
recognition focuses on interpreting the audio signals, but it has been
demonstrated that video can provide information that is complementary to the
audio. Thus, the study of automatic lip-reading is important and is still an
open problem. One of the key challenges is the definition of the visual
elementary units (the visemes) and their vocabulary. Many researchers have
analyzed the importance of the phoneme to viseme mapping and have proposed
viseme vocabularies with lengths between 11 and 15 visemes. These viseme
vocabularies have usually been manually defined by their linguistic properties
and in some cases using decision trees or clustering techniques. In this work,
we focus on the automatic construction of an optimal viseme vocabulary based on
the association of phonemes with similar appearance. To this end, we construct
an automatic system that uses local appearance descriptors to extract the main
characteristics of the mouth region and HMMs to model the statistical relations
of both viseme and phoneme sequences. To compare the performance of the system,
different descriptors (PCA, DCT and SIFT) are analyzed. We test our system on a
Spanish corpus of continuous speech. Our results indicate that we are able to
recognize approximately 58% of the visemes, 47% of the phonemes and 23% of the
words in a continuous speech scenario, and that the optimal viseme vocabulary
for Spanish is composed of 20 visemes.
Qier Meng, Takayuki Kitasaka, Masahiro Oda, Junji Ueno, Kensaku Mori
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Some lung diseases are related to bronchial airway structures and morphology.
Although airway segmentation from chest CT volumes is an important task in the
computer-aided diagnosis and surgery assistance systems for the chest, complete
3-D airway structure segmentation is a quite challenging task due to its
complex tree-like structure. In this paper, we propose a new airway
segmentation method for 3D chest CT volumes based on volumes of interest (VOIs)
using gradient vector flow (GVF). This method segments the bronchial regions by
applying the cavity enhancement filter (CEF) to trace the bronchial tree
structure from the trachea. It uses the CEF in each VOI to segment each branch,
and a tube-likeness function based on GVF and the GVF magnitude map in each VOI
is utilized to assist in predicting the positions and directions of child
branches. By calculating the tube-likeness function based on GVF and the GVF
magnitude map, the airway-like candidate structures are identified and their
centrelines are extracted. Based on the extracted centrelines, we can detect
the branch points of the bifurcations and the directions of the airway branches at
the next level. At the same time, leakage detection is performed to avoid
leakage by analysing the pixel information and the shape information of airway
candidate regions extracted in the VOI. Finally, we unify all of the extracted
bronchial regions to form an integrated airway tree. Preliminary experiments
using four cases of chest CT volumes demonstrated that the proposed method can
extract more bronchial branches in comparison with other methods.
Adriana Fernandez-Lopez, Oriol Martinez, Federico M. Sukno
Comments: IEEE International Conference on Automatic Face and Gesture Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Speech is the most widely used communication method between humans and involves
the perception of auditory and visual channels. Automatic speech recognition
focuses on interpreting the audio signals, although the video can provide
information that is complementary to the audio. Exploiting the visual
information, however, has proven challenging. On one hand, researchers have
reported that the mapping between phonemes and visemes (visual units) is
one-to-many because there are phonemes which are visually similar and
indistinguishable from each other. On the other hand, it is known that some people
are very good lip-readers (e.g., deaf people). We study the limit of visual-only
speech recognition in controlled conditions. With this goal, we designed a new
database in which the speakers are aware of being lip-read and aim to facilitate
lip-reading. In the literature, there are discrepancies as to whether
hearing-impaired people are better lip-readers than normal-hearing people.
We then analyze whether there are differences between the lip-reading abilities of
9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are
compared with the performance of a visual automatic speech recognition system.
In our tests, hearing-impaired participants outperformed the normal-hearing
participants but without reaching statistical significance. Human observers
were able to decode 44% of the spoken message. In contrast, the visual-only
automatic system achieved a word recognition rate of 20%. However, if we repeat
the comparison in terms of phonemes, both obtained very similar recognition
rates, just above 50%. This suggests that the gap between human lip-reading and
automatic speech-reading might be more related to the use of context than to
the ability to interpret mouth appearance.
Tejal Bhamre, Teng Zhang, Amit Singer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Biomolecules (q-bio.BM); Applications (stat.AP); Methodology (stat.ME)
The missing phase problem in X-ray crystallography is commonly solved using
the technique of molecular replacement, which borrows phases from a previously
solved homologous structure, and appends them to the measured Fourier
magnitudes of the diffraction patterns of the unknown structure. More recently,
molecular replacement has been proposed for solving the missing orthogonal
matrices problem arising in Kam’s autocorrelation analysis for single particle
reconstruction using X-ray free electron lasers and cryo-EM. In classical
molecular replacement, it is common to estimate the magnitudes of the unknown
structure as twice the measured magnitudes minus the magnitudes of the
homologous structure, a procedure known as 'twicing'. Mathematically, this is
equivalent to finding an unbiased estimator for a complex-valued scalar. We
generalize this scheme for the case of estimating real or complex valued
matrices arising in single particle autocorrelation analysis. We name this
approach “Anisotropic Twicing” because unlike the scalar case, the unbiased
estimator is not obtained by a simple magnitude isotropic correction. We
compare the performance of the least squares, twicing and anisotropic twicing
estimators on synthetic and experimental datasets. We demonstrate 3D homology
modeling in cryo-EM directly from experimental data without iterative
refinement or class averaging, for the first time.
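The scalar twicing rule mentioned above can be written compactly (a sketch of the classical case, with notation assumed): if $|F_{\text{obs}}|$ are the measured magnitudes and $|F_{\text{hom}}|$ the magnitudes of the homologous structure, the estimate of the unknown structure's magnitudes is

  \[
  |\hat F| \;=\; 2\,|F_{\text{obs}}| - |F_{\text{hom}}|,
  \]

and the anisotropic generalization described in the abstract replaces this single scalar correction with a matrix-valued unbiased estimator that is no longer a simple isotropic magnitude scaling.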
James M. Murphy, Mauro Maggioni
Comments: 42 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The problem of unsupervised learning and segmentation of hyperspectral images
is a significant challenge in remote sensing. The high dimensionality of
hyperspectral data, the presence of substantial noise, and the overlap of classes all
contribute to the difficulty of automatically segmenting and clustering hyperspectral
images. In this article, we propose an unsupervised learning technique that
combines a density-based estimation of class modes with partial least squares
regression (PLSR) on the learned modes. The density-based learning incorporates
the geometry of the hyperspectral data by using diffusion distance to promote
learning a unique mode from each class. These class modes are then used to
generate class cores which approximate training labels. Partial least squares
regression using these learned class modes as labeled training points
consequently determines a labeling of the entire dataset. The proposed method
is shown to perform competitively against state-of-the-art clustering and
dimension reduction methods, and often achieves performance comparable to fully
supervised PLSR.
Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we address the problem of spatio-temporal person retrieval
from multiple videos using a natural language query, in which we output a tube
(i.e., a sequence of bounding boxes) which encloses the person described by the
query. For this problem, we introduce a novel dataset consisting of videos
containing people annotated with bounding boxes for each second and with five
natural language descriptions. To retrieve the tube of the person described by
a given natural language query, we design a model that combines methods for
spatio-temporal human detection and multimodal retrieval. We conduct
comprehensive experiments to compare a variety of tube and text representations
and multimodal retrieval methods, and present a strong baseline in this task as
well as demonstrate the efficacy of our tube representation and multimodal
feature embedding technique. Finally, we demonstrate the versatility of our
model by applying it to two other important tasks.
Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
As part of a complete software stack for autonomous driving, NVIDIA has
created a neural-network-based system, known as PilotNet, which outputs
steering angles given images of the road ahead. PilotNet is trained using road
images paired with the steering angles generated by a human driving a
data-collection car. It derives the necessary domain knowledge by observing
human drivers. This eliminates the need for human engineers to anticipate what
is important in an image and foresee all the necessary rules for safe driving.
Road tests demonstrated that PilotNet can successfully perform lane keeping in
a wide variety of driving conditions, regardless of whether lane markings are
present or not.
The goal of the work described here is to explain what PilotNet learns and
how it makes its decisions. To this end we developed a method for determining
which elements in the road image most influence PilotNet’s steering decision.
Results show that PilotNet indeed learns to recognize relevant objects on the
road.
In addition to learning the obvious features such as lane markings, edges of
roads, and other cars, PilotNet learns more subtle features that would be hard
to anticipate and program by engineers, for example, bushes lining the edge of
the road and atypical vehicle classes.
Andrés Romero, Juan León, Pablo Arbeláez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel convolutional neural network architecture to address the
fine-grained recognition problem of multi-view dynamic facial action unit
detection. We leverage recent gains in large-scale object recognition by
formulating the task of predicting the presence or absence of a specific action
unit in a still image of a human face as holistic classification. We then
explore the design space of our approach by considering both shared and
independent representations for separate action units, and also different CNN
architectures for combining color and motion information. We then move to the
novel setup of the FERA 2017 Challenge, in which we propose a multi-view
extension of our approach that operates by first predicting the viewpoint from
which the video was taken, and then evaluating an ensemble of action unit
detectors that were trained for that specific viewpoint. Our approach is
holistic, efficient, and modular, since new action units can be easily included
in the overall system. Our approach significantly outperforms the baseline of
the FERA 2017 Challenge, which was the previous state-of-the-art in multi-view
dynamic action unit detection, with an absolute improvement of 14%.
Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Wit is a quintessential form of rich inter-human interaction, and is often
grounded in a specific situation (e.g., a comment in response to an event). In
this work, we attempt to build computational models that can produce witty
descriptions for a given image. Inspired by a cognitive account of humor
appreciation, we employ linguistic wordplay, specifically puns. We compare our
approach against meaningful baseline approaches via human studies. In a Turing
test style evaluation, people find our model’s description for an image to be
wittier than a human’s witty description 55% of the time!
Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper introduces a generalization of Convolutional Neural Networks
(CNNs) from low-dimensional grid data, such as images, to graph-structured
data. We propose a novel spatial convolution utilizing a random walk to uncover
the relations within the input, analogous to the way the standard convolution
uses the spatial neighborhood of a pixel on the grid. The convolution has an
intuitive interpretation, is efficient and scalable and can also be used on
data with varying graph structure. Furthermore, this generalization can be
applied to many standard regression or classification problems by learning the
underlying graph. We empirically demonstrate the performance of the
proposed CNN on MNIST, and challenge the state of the art on the Merck molecular
activity data set.
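One way to read the random-walk convolution described above (an interpretation only, not the authors' code; the neighbor-selection rule and weight-sharing scheme are assumptions):

    import numpy as np

    def random_walk_conv(X, A, weights, k=5):
        """Aggregate node features from the k most probable nodes reached at each
        random-walk length, mixed by learnable per-step weights.

        X: node features, shape (n_nodes, n_feats)
        A: adjacency matrix, shape (n_nodes, n_nodes)
        weights: array of shape (n_steps, k) mixing the selected neighbors
        """
        X = np.asarray(X, dtype=float)
        P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic transitions
        out = np.zeros_like(X)
        Pt = np.eye(A.shape[0])
        for t in range(weights.shape[0]):
            Pt = Pt @ P                                   # (t+1)-step transition probabilities
            top = np.argsort(-Pt, axis=1)[:, :k]          # k most probable nodes per row
            for j in range(k):
                out += weights[t, j] * X[top[:, j]]       # weighted neighbor aggregation
        return out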
Quynh Nguyen, Matthias Hein
Comments: 14 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
While the optimization problem behind deep neural networks is highly
non-convex, it is frequently observed in practice that training deep networks
seems possible without getting stuck in suboptimal points. It has been argued
that this is the case as all local minima are close to being globally optimal.
We show that this is (almost) true: in fact, almost all local minima are
globally optimal for a fully connected network with squared loss and analytic
activation function, given that the number of hidden units of one layer of the
network is larger than the number of training points and the network structure
from this layer on is pyramidal.
Johan Ekekrantz, John Folkesson, Patric Jensfelt
Comments: 10 pages, 7 figures, 1 table
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
In this paper we introduce an adaptive cost function for pointcloud
registration. The algorithm automatically estimates the sensor noise, which is
important for generalization across different sensors and environments. Through
experiments on real and synthetic data, we show significant improvements in
accuracy and robustness over state-of-the-art solutions.
Francesca Abastante, Salvatore Corrente, Salvatore Greco, Alessio Ishizaka, Isabella Lami
Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
We propose a development of the Analytic Hierarchy Process (AHP) that permits
the methodology to be used also for decision problems with a very large
number of alternatives evaluated with respect to several criteria. While the
application of the original AHP method involves many pairwise comparisons
between alternatives and criteria, our proposal is composed of four steps: (i)
direct evaluation of the alternatives at hand on the considered criteria; (ii)
selection of some reference evaluations; (iii) application of the original AHP
method to the reference evaluations; (iv) revision of the direct evaluations on the
basis of the prioritization supplied by AHP on the reference evaluations. The new
proposal has been tested and validated in an experiment conducted on a sample
of university students. The new methodology has therefore been applied to a
real-world problem involving the evaluation of 21 Social Housing initiatives
sited in the Piedmont region (Italy). To take into account interaction between
criteria, the Choquet integral preference model has been considered within a
Non Additive Robust Ordinal Regression approach.
Steven Meyer
Comments: 8 Pages
Subjects: Artificial Intelligence (cs.AI)
The area of computation called artificial intelligence (AI) is falsified by
describing a previous 1972 falsification of AI by British applied mathematician
James Lighthill. It is explained how Lighthill’s arguments continue to apply to
current AI. It is argued that AI should use the Popperian scientific method in
which it is the duty of every scientist to attempt to falsify theories and if
theories are falsified to replace or modify them. The paper describes the
Popperian method in detail and discusses Paul Nurse’s application of the method
to cell biology that also involves questions of mechanism and behavior.
Arguments used by Lighthill in his original 1972 report that falsified AI are
discussed. The Lighthill arguments are then shown to apply to current AI. The
argument uses recent scholarship to explain Lighthill’s assumptions and to show
how the arguments based on those assumptions continue to falsify modern AI. An
important focus of the argument involves Hilbert’s philosophical programme
that defined knowledge and truth as provable formal sentences. Current AI takes
the Hilbert programme as dogma beyond criticism while Lighthill as a mid 20th
century applied mathematician had abandoned it. The paper uses recent
scholarship to explain John von Neumann’s criticism of AI that I claim was
assumed by Lighthill. The paper discusses computer chess programs to show
Lighthill’s combinatorial explosion still applies to AI but not humans. An
argument showing that Turing Machines (TM) are not the correct description of
computation is given. The paper concludes by advocating studying computation as
Peter Naur’s Dataology.
Yi Zhou
Subjects: Artificial Intelligence (cs.AI)
In this extended abstract, we propose Structured Production Systems (SPS),
which extend traditional production systems with well-formed syntactic
structures. Due to the richness of structures, structured production systems
significantly enhance the expressive power as well as the flexibility of
production systems, for instance, to handle uncertainty. We show that different
rule application strategies can be reduced into the basic one by utilizing
structures. Also, many fundamental approaches in computer science, including
automata, grammar and logic, can be captured by structured production systems.
Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang
Comments: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Our goal is to learn a semantic parser that maps natural language utterances
into executable programs when only indirect supervision is available: examples
are labeled with the correct execution result, but not the program itself.
Consequently, we must search the space of programs for those that output the
correct result, while not being misled by spurious programs: incorrect programs
that coincidentally output the correct result. We connect two common learning
paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML),
and then present a new learning algorithm that combines the strengths of both.
The new algorithm guards against spurious programs by combining the systematic
search traditionally employed in MML with the randomized exploration of RL, and
by updating parameters such that probability is spread more evenly across
consistent programs. We apply our learning algorithm to a new neural semantic
parser and show significant gains over existing state-of-the-art results on a
recent context-dependent semantic parsing task.
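The two objectives being connected can be sketched side by side (standard textbook forms, with notation assumed rather than copied from the paper): for an utterance $x$, candidate programs $z$, execution reward $R(z)\in\{0,1\}$ and model $p_\theta(z\mid x)$,

  \[
  J_{\mathrm{RL}}(\theta) = \sum_{z} p_\theta(z\mid x)\,R(z),
  \qquad
  J_{\mathrm{MML}}(\theta) = \log \sum_{z:\,R(z)=1} p_\theta(z\mid x),
  \]

so RL maximizes expected reward under the model's own distribution (randomized exploration), while MML maximizes the marginal likelihood of reaching the correct result over consistent programs (systematic search); the proposed algorithm combines the two and spreads probability more evenly across consistent programs.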
James Brusey, Diana Hintea, Elena Gaura, Neil Beloe
Subjects: Artificial Intelligence (cs.AI)
Vehicle climate control systems aim to keep passengers thermally comfortable.
However, current systems control temperature rather than thermal comfort and
tend to be energy hungry, which is of particular concern when considering
electric vehicles. This paper poses energy-efficient vehicle comfort control as
a Markov Decision Process, which is then solved numerically using
Sarsa($\lambda$) and an empirically validated, single-zone, 1D thermal model of
the cabin. The resulting controller was tested in simulation using 200 randomly
selected scenarios and found to exceed the performance of bang-bang,
proportional, simple fuzzy logic, and commercial controllers by 23%, 43%,
40%, and 56%, respectively. Compared to the next best performing
controller, energy consumption is reduced by 13% while the proportion of time
spent thermally comfortable is increased by 23%. These results indicate that
this is a viable approach that promises to translate into substantial comfort
and energy improvements in the car.
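For context, the tabular Sarsa($\lambda$) update used as the learning core (the standard form with accumulating traces; the thermal-comfort state, action and reward definitions are the paper's and are not reproduced here): after taking action $a$ in state $s$ and observing reward $r$ and next pair $(s',a')$,

  \[
  \delta = r + \gamma\,Q(s',a') - Q(s,a), \qquad
  e(s,a) \leftarrow e(s,a) + 1,
  \]
  \[
  Q \leftarrow Q + \alpha\,\delta\,e, \qquad
  e \leftarrow \gamma\lambda\, e,
  \]

where $e$ is the eligibility trace, $\alpha$ the learning rate, $\gamma$ the discount factor and $\lambda$ the trace-decay parameter.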
Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Visual Question Answering (VQA) has received a lot of attention over the past
couple of years. A number of deep learning models have been proposed for this
task. However, it has been shown that these models are heavily driven by
superficial correlations in the training data and lack compositionality — the
ability to answer questions about unseen compositions of seen concepts. This
compositionality is desirable and central to intelligence. In this paper, we
propose a new setting for Visual Question Answering where the test
question-answer pairs are compositionally novel compared to training
question-answer pairs. To facilitate developing models under this setting, we
present a new compositional split of the VQA v1.0 dataset, which we call
Compositional VQA (C-VQA). We analyze the distribution of questions and answers
in the C-VQA splits. Finally, we evaluate several existing VQA models under
this new setting and show that the performances of these models degrade by a
significant amount compared to the original VQA setting.
Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Wit is a quintessential form of rich inter-human interaction, and is often
grounded in a specific situation (e.g., a comment in response to an event). In
this work, we attempt to build computational models that can produce witty
descriptions for a given image. Inspired by a cognitive account of humor
appreciation, we employ linguistic wordplay, specifically puns. We compare our
approach against meaningful baseline approaches via human studies. In a Turing
test style evaluation, people find our model’s description for an image to be
wittier than a human’s witty description 55% of the time!
Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper introduces a generalization of Convolutional Neural Networks
(CNNs) from low-dimensional grid data, such as images, to graph-structured
data. We propose a novel spatial convolution utilizing a random walk to uncover
the relations within the input, analogous to the way the standard convolution
uses the spatial neighborhood of a pixel on the grid. The convolution has an
intuitive interpretation, is efficient and scalable and can also be used on
data with varying graph structure. Furthermore, this generalization can be
applied to many standard regression or classification problems by learning the
underlying graph. We empirically demonstrate the performance of the
proposed CNN on MNIST, and challenge the state of the art on the Merck molecular
activity data set.
Sebastiaan J. van Zelst, Boudewijn F. van Dongen, Wil M.P. van der Aalst
Comments: Accepted for publication in “Knowledge and Information Systems; ” (Springer: this http URL)
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
The aim of process discovery, originating from the area of process mining, is
to discover a process model based on business process execution data. A
majority of process discovery techniques relies on an event log as an input. An
event log is a static source of historical data capturing the execution of a
business process. In this paper we focus on process discovery relying on online
streams of business process execution events. Learning process models from
event streams poses both challenges and opportunities, i.e. we need to handle
unlimited amounts of data using finite memory and, preferably, constant time.
We propose a generic architecture that allows for adopting several classes of
existing process discovery techniques in the context of event streams. Moreover, we
provide several instantiations of the architecture, accompanied by
implementations in the process mining tool-kit ProM (this http URL).
Using these instantiations, we evaluate several dimensions of stream-based
process discovery. The evaluation shows that the proposed architecture allows
us to lift process discovery to the streaming domain.
Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
Comments: To appear at ACL2017, code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We introduce an attention-based Bi-LSTM for Chinese implicit discourse
relations and demonstrate that modeling argument pairs as a joint sequence can
outperform word order-agnostic approaches. Our model benefits from a partial
sampling scheme and is conceptually simple, yet achieves state-of-the-art
performance on the Chinese Discourse Treebank. We also visualize its attention
activity to illustrate the model’s ability to selectively focus on the relevant
parts of an input sequence.
Quynh Nguyen, Matthias Hein
Comments: 14 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
While the optimization problem behind deep neural networks is highly
non-convex, it is frequently observed in practice that training deep networks
seems possible without getting stuck in suboptimal points. It has been argued
that this is the case as all local minima are close to being globally optimal.
We show that this is (almost) true: in fact, almost all local minima are
globally optimal for a fully connected network with squared loss and analytic
activation function, given that the number of hidden units of one layer of the
network is larger than the number of training points and the network structure
from this layer on is pyramidal.
Hao Liao, Manuel Sebastian Mariani, Matus Medo, Yi-Cheng Zhang, Ming-Yang Zhou
Comments: 54 pages, 16 figures
Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Complex networks have emerged as a simple yet powerful framework to represent
and analyze a wide range of complex systems. The problem of ranking the nodes
and the edges in complex networks is critical for a broad range of real-world
problems because it affects how we access online information and products, how
success and talent are evaluated in human activities, and how scarce resources
are allocated by companies and policymakers, among others. This calls for a
deep understanding of how existing ranking algorithms perform, and of the
possible biases that may impair their effectiveness. Well-established
ranking algorithms (such as the popular Google’s PageRank) are static in nature
and, as a consequence, they exhibit important shortcomings when applied to real
networks that rapidly evolve in time. The recent advances in the understanding
and modeling of evolving networks have enabled the development of a wide and
diverse range of ranking algorithms that take the temporal dimension into
account. The aim of this review is to survey the existing ranking algorithms,
both static and time-aware, and their applications to evolving networks. We
emphasize both the impact of network evolution on well-established static
algorithms and the benefits from including the temporal dimension for tasks
such as prediction of real network traffic, prediction of future links, and
identification of highly-significant nodes.
Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Wit is a quintessential form of rich inter-human interaction, and is often
grounded in a specific situation (e.g., a comment in response to an event). In
this work, we attempt to build computational models that can produce witty
descriptions for a given image. Inspired by a cognitive account of humor
appreciation, we employ linguistic wordplay, specifically puns. We compare our
approach against meaningful baseline approaches via human studies. In a Turing
test style evaluation, people find our model’s description for an image to be
wittier than a human’s witty description 55% of the time!
Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
Comments: To appear at ACL2017, code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We introduce an attention-based Bi-LSTM for Chinese implicit discourse
relations and demonstrate that modeling argument pairs as a joint sequence can
outperform word order-agnostic approaches. Our model benefits from a partial
sampling scheme and is conceptually simple, yet achieves state-of-the-art
performance on the Chinese Discourse Treebank. We also visualize its attention
activity to illustrate the model’s ability to selectively focus on the relevant
parts of an input sequence.
Leandro B. dos Santos, Edilson A. Corrêa Jr, Osvaldo N. Oliveira Jr, Diego R. Amancio, Letícia L. Mansur, Sandra M. Aluísio
Comments: Published in Annual Meeting of the Association for Computational Linguist 2017
Subjects: Computation and Language (cs.CL)
Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose.
Linguistic features, mainly from parsers, have been used to detect MCI, but
this is not suitable for large-scale assessments. MCI disfluencies produce
non-grammatical speech that requires manual or high precision automatic
correction of transcripts. In this paper, we modeled transcripts into complex
networks and enriched them with word embedding (CNE) to better represent short
texts produced in neuropsychological assessments. The network measurements were
applied with well-known classifiers to automatically identify MCI in
transcripts, in a binary classification task. A comparison was made with the
performance of traditional approaches using Bag of Words (BoW) and linguistic
features for three datasets: DementiaBank in English, and Cinderella and
Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using
only complex networks, while Support Vector Machine was superior to other
classifiers. CNE provided the highest accuracies for DementiaBank and
Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably
owing to its short narratives. The approach using linguistic features yielded
higher accuracy if the transcriptions of the Cinderella dataset were manually
revised. Taken together, the results indicate that complex networks enriched
with word embeddings are promising for detecting MCI in large-scale assessments.
Alexander Fonarev, Oleksii Hrinchuk, Gleb Gusev, Pavel Serdyukov, Ivan Oseledets
Comments: 9 pages, 4 figures, ACL 2017
Subjects: Computation and Language (cs.CL)
The Skip-Gram Negative Sampling (SGNS) word embedding model, well known from its
implementation in the “word2vec” software, is usually optimized by stochastic
gradient descent. However, the optimization of the SGNS objective can be viewed as
a problem of searching for a good matrix under a low-rank constraint. The most
standard way to solve this type of problem is to apply the Riemannian optimization
framework to optimize the SGNS objective over the manifold of low-rank
matrices. In this paper, we propose an algorithm that optimizes the SGNS objective
using Riemannian optimization and demonstrate its superiority over popular
competitors, such as the original method to train SGNS and SVD over the SPPMI
matrix.
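A hedged sketch of the matrix view referenced above (following the widely used Levy and Goldberg formulation; the exact objective in the paper may differ): writing $X = WC^\top$ for word and context embedding matrices of rank at most $d$, SGNS maximizes

  \[
  \sum_{w,c} \Big( \#(w,c)\,\log\sigma(x_{wc}) \;+\; k\,\frac{\#(w)\,\#(c)}{|D|}\,\log\sigma(-x_{wc}) \Big),
  \]

where $\#(\cdot)$ are corpus counts, $k$ is the number of negative samples and $\sigma$ the logistic function; the Riemannian approach optimizes this directly over the manifold of rank-$d$ matrices instead of over $W$ and $C$ separately.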
Jey Han Lau, Timothy Baldwin, Trevor Cohn
Comments: 11 pages, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) (to appear)
Subjects: Computation and Language (cs.CL)
Language models are typically applied at the sentence level, without access
to the broader document context. We present a neural language model that
incorporates document context in the form of a topic model-like architecture,
thus providing a succinct representation of the broader document context
outside of the current sentence. Experiments over a range of datasets
demonstrate that our model outperforms a pure sentence-based model in terms of
language model perplexity, and leads to topics that are potentially more
coherent than those produced by a standard LDA topic model. Our model also has
the ability to generate related sentences for a topic, providing another way to
interpret topics.
Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui
Comments: To appear in ACL2017
Subjects: Computation and Language (cs.CL)
We present in this paper our approach for modeling inter-topic preferences of
Twitter users: for example, those who agree with the Trans-Pacific Partnership
(TPP) also agree with free trade. This kind of knowledge is useful not only for
stance detection across multiple topics but also for various real-world
applications including public opinion surveys, electoral predictions, electoral
campaigns, and online debates. In order to extract users’ preferences on
Twitter, we design linguistic patterns in which people agree and disagree about
specific topics (e.g., “A is completely wrong”). By applying these linguistic
patterns to a collection of tweets, we extract statements agreeing and
disagreeing with various topics. Inspired by previous work on item
recommendation, we formalize the task of modeling inter-topic preferences as
matrix factorization: representing users’ preferences as a user-topic matrix
and mapping both users and topics onto a latent feature space that abstracts
the preferences. Our experimental results demonstrate both that our proposed
approach is useful in predicting missing preferences of users and that the
latent vector representations of topics successfully encode inter-topic
preferences.
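A minimal sketch of the matrix-factorization step described above (numpy SGD on observed entries; the dimensions, learning rate and regularization are illustrative assumptions, not the paper's settings):

    import numpy as np

    def factorize(R, mask, dim=10, lr=0.01, reg=0.1, epochs=50, seed=0):
        """Factorize a user-topic preference matrix R (+1 agree / -1 disagree / 0 unknown).

        mask: boolean matrix marking observed (user, topic) preferences.
        Returns U and V with R ~ U @ V.T on the observed entries.
        """
        rng = np.random.default_rng(seed)
        n_users, n_topics = R.shape
        U = 0.1 * rng.standard_normal((n_users, dim))
        V = 0.1 * rng.standard_normal((n_topics, dim))
        users, topics = np.where(mask)
        for _ in range(epochs):
            for i, j in zip(users, topics):
                err = R[i, j] - U[i] @ V[j]
                U[i] += lr * (err * V[j] - reg * U[i])   # gradient step on user factors
                V[j] += lr * (err * U[i] - reg * V[j])   # gradient step on topic factors
        return U, V

    # Missing preferences are then predicted as the sign of U @ V.T.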
Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick
Comments: Short paper (6 pages) accepted at ACL 2017
Subjects: Computation and Language (cs.CL)
Compositor attribution, the clustering of pages in a historical printed
document by the individual who set the type, is a bibliographic task that
relies on analysis of orthographic variation and inspection of visual details
of the printed page. In this paper, we introduce a novel unsupervised model
that jointly describes the textual and visual features needed to distinguish
compositors. Applied to images of Shakespeare’s First Folio, our model predicts
attributions that agree with the manual judgements of bibliographers with an
accuracy of 87%, even on text that is the output of OCR.
Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Visual Question Answering (VQA) has received a lot of attention over the past
couple of years. A number of deep learning models have been proposed for this
task. However, it has been shown that these models are heavily driven by
superficial correlations in the training data and lack compositionality — the
ability to answer questions about unseen compositions of seen concepts. This
compositionality is desirable and central to intelligence. In this paper, we
propose a new setting for Visual Question Answering where the test
question-answer pairs are compositionally novel compared to training
question-answer pairs. To facilitate developing models under this setting, we
present a new compositional split of the VQA v1.0 dataset, which we call
Compositional VQA (C-VQA). We analyze the distribution of questions and answers
in the C-VQA splits. Finally, we evaluate several existing VQA models under
this new setting and show that the performances of these models degrade by a
significant amount compared to the original VQA setting.
Chenhao Tan, Dallas Card, Noah A. Smith
Comments: 11 pages, 9 figures, to appear in Proceedings of ACL 2017, code and data available at this https URL
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
Understanding how ideas relate to each other is a fundamental question in
many domains, ranging from intellectual history to public communication.
Because ideas are naturally embedded in texts, we propose the first framework
to systematically characterize the relations between ideas based on their
occurrence in a corpus of documents, independent of how these ideas are
represented. Combining two statistics — cooccurrence within documents and
prevalence correlation over time — our approach reveals a number of different
ways in which ideas can cooperate and compete. For instance, two ideas can
closely track each other’s prevalence over time, and yet rarely cooccur, almost
like a “cold war” scenario. We observe that pairwise cooccurrence and
prevalence correlation exhibit different distributions. We further demonstrate
that our approach is able to uncover intriguing relations between ideas through
in-depth case studies on news articles and research papers.
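A rough sketch of the two statistics (an illustration of the general idea only; the paper's exact estimators, smoothing and idea representation are not reproduced): given per-document indicators of whether each idea occurs and a timestamp per document,

    import numpy as np

    def idea_relation_stats(presence, years, i, j):
        """presence: binary matrix (n_docs, n_ideas); years: array (n_docs,) of years.
        Returns (cooccurrence, prevalence_correlation) for ideas i and j."""
        xi, xj = presence[:, i].astype(float), presence[:, j].astype(float)
        # Cooccurrence: pointwise mutual information of appearing in the same document
        p_i, p_j, p_ij = xi.mean(), xj.mean(), (xi * xj).mean()
        pmi = np.log((p_ij + 1e-12) / (p_i * p_j + 1e-12))
        # Prevalence correlation: Pearson correlation of yearly frequencies
        ys = np.unique(years)
        freq_i = np.array([xi[years == y].mean() for y in ys])
        freq_j = np.array([xj[years == y].mean() for y in ys])
        corr = np.corrcoef(freq_i, freq_j)[0, 1]
        return pmi, corr

    # A "cold war" pair would show high prevalence correlation but low cooccurrence.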
Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Idle periods on different processes of Message Passing applications are
unavoidable. While the origin of idle periods on a single process is well
understood as the effect of system and architectural random delays, it is still
unclear how these idle periods propagate from one process to another. It is
important to understand idle period propagation in Message Passing applications
as it allows application developers to design communication patterns avoiding
idle period propagation and the consequent performance degradation in their
applications. To understand idle period propagation, we introduce a methodology
to trace idle periods when a process is waiting for data from a remote delayed
process in MPI applications. We apply this technique in an MPI application that
solves the heat equation to study idle period propagation on three different
systems. We confirm that idle periods move between processes in the form of
waves and that there are different stages in idle period propagation. Our
methodology enables us to identify a self-synchronization phenomenon that
occurs on two systems where some processes run slower than the other processes.
Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa
Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Next-generation supercomputers will feature more hierarchical and
heterogeneous memory systems with different memory technologies working
side-by-side. A critical question is whether, at large scale, existing HPC
applications and emerging data-analytics workloads will see performance
improvements or degradation on these systems. We propose a systematic and fair
methodology to identify the trend of application performance on emerging
hybrid-memory systems. We model the memory system of next-generation
supercomputers as a combination of “fast” and “slow” memories. We then analyze
performance and dynamic execution characteristics of a variety of workloads,
from traditional scientific applications to emerging data analytics to compare
traditional and hybrid-memory systems. Our results show that data analytics
applications can clearly benefit from the new system design, especially at
large scale. Moreover, hybrid-memory systems do not penalize traditional
scientific applications, which may also show performance improvement.
Eric Goubault, Sergio Rajsbaum
Comments: arXiv admin note: text overlap with arXiv:1703.11005
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA); Algebraic Topology (math.AT)
The computability power of a distributed computing model is determined by the
communication media available to the processes, the timing assumptions about
processes and communication, and the nature of failures that processes can
suffer. In a companion paper we showed how dynamic epistemic logic can be used
to give a formal semantics to a given distributed computing model, to capture
precisely the knowledge needed to solve a distributed task, such as consensus.
Furthermore, by moving to a dual model of epistemic logic defined by simplicial
complexes, topological invariants are exposed, which determine task
solvability. In this paper we show how to extend the setting above to include
in the knowledge of the processes, knowledge about the model of computation
itself. The extension describes the knowledge processes gain about the current
execution, in problems where processes have no input values at all.
Carmela Troncoso, George Danezis, Marios Isaakidis, Harry Halpin
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Decentralized systems are a subset of distributed systems where multiple
authorities control different components and no authority is fully trusted by
all. This implies that any component in a decentralized system is potentially
adversarial. We review fifteen years of research on decentralization and
privacy, and provide an overview of key systems. Decentralized designs may lead
to gains in privacy, integrity and availability, but there are also inherent
trade-offs that need to be better understood before they can be addressed. We argue
that a combination of insights from cryptography, distributed systems, and mechanism
design will be necessary to build scalable and successful decentralized
systems.
Abubakr O. Al-Abbasi, Vaneet Aggarwal
Comments: 15 pages, 13 figures
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Multimedia (cs.MM)
The demand for global video has been burgeoning across industries. With the
expansion and improvement of video streaming services, cloud-based video is
evolving into a necessary feature of any successful business for reaching
internal and external audiences. This paper considers video streaming over
distributed systems where the video segments are encoded using an erasure code
for better reliability; to the best of our knowledge, it is the first work that
considers video streaming over erasure-coded distributed cloud systems. The
download time of each coded chunk of each video segment is characterized, and
ordered statistics over the choice of the erasure-coded chunks are used to
obtain the playback time of different video segments. Using the playback times,
bounds on the moment generating function of the stall duration are used to bound
the mean stall duration. Moment generating function based bounds on the ordered
statistics are also used to bound the stall duration tail probability, which
determines the probability that the stall time is greater than a pre-defined
number. These two metrics, the mean stall duration and the stall duration tail
probability, are important quality of experience (QoE) measures for the end
users. Based on these metrics, we formulate an optimization problem to jointly
minimize the convex combination of both the QoE metrics averaged over all
requests over the placement and access of the video content. The non-convex
problem is solved using an efficient iterative algorithm. Numerical results
show significant improvement in QoE metrics for cloud-based video as compared
to the considered baselines.
Dawei Dai, Weimin Tan, Hong Zhan
Comments: arXiv admin note: text overlap with arXiv:1702.04595 by other authors
Subjects: Learning (cs.LG)
In recent years, deep learning based on artificial neural networks (ANNs) has achieved great success in pattern recognition. However, there is no clear understanding of such neural computational models. In this paper, we try to unravel the “black-box” structure of an ANN model from a network-flow perspective. Specifically, we view the feed-forward ANN as a network flow model that consists of many directional class-pathways, where each class-pathway encodes one class. The class-pathway of a class is obtained by connecting the activated neural nodes in each layer from input to output, where the activation value of a neural node (node-value) is defined by the weights of each layer in a trained ANN classifier. From the perspective of class-pathways, training an ANN classifier can be regarded as the process of forming the class-pathways of the different classes. By analyzing the distances between each pair of class-pathways in a trained ANN classifier, we try to answer the question of why the classifier performs as it does. Finally, from the neural-encoding view, we define the importance of each neural node through the class-pathways, which is helpful for optimizing the structure of a classifier. Experiments on two types of ANN models, a multi-layer perceptron (MLP) and a CNN, verify that the network-flow view based on class-pathways provides a reasonable explanation for ANN models.
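As a rough illustration of the class-pathway idea (not the authors' exact procedure, which defines node-values via the layer weights), the following Python sketch traces, for one input, the most strongly activated node in each layer of a small feed-forward network; the layer sizes, random weights, and the use of ReLU activations as node-values are assumptions made only for the example.

```python
import numpy as np

def class_pathway(weights, biases, x):
    """Trace a 'class-pathway': the index of the most strongly
    activated node in each layer for one input x (illustrative only)."""
    pathway, a = [], x
    for W, b in zip(weights, biases):
        a = np.maximum(W @ a + b, 0.0)      # ReLU activations of this layer
        pathway.append(int(np.argmax(a)))   # most activated node = pathway node
    return pathway

# toy 2-layer network with random weights (stand-in for a trained classifier)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
biases = [np.zeros(16), np.zeros(4)]
print(class_pathway(weights, biases, rng.normal(size=8)))
```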
Quynh Nguyen, Matthias Hein
Comments: 14 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
While the optimization problem behind deep neural networks is highly
non-convex, it is frequently observed in practice that training deep networks
seems possible without getting stuck in suboptimal points. It has been argued
that this is the case as all local minima are close to being globally optimal.
We show that this is (almost) true, in fact almost all local minima are
globally optimal, for a fully connected network with squared loss and analytic
activation function given that the number of hidden units of one layer of the
network is larger than the number of training points and the network structure
from this layer on is pyramidal.
Jianqiao Wangni
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The \(\ell_1\)-regularized sparse model has been widely favored in the machine learning community. Due to its non-smoothness, fast optimizers such as quasi-Newton methods cannot be directly applied. In this paper, we propose the first stochastic limited-memory quasi-Newton optimizer specialized for strongly convex loss functions with \(\ell_1\) regularization. The optimizer consists of three alignment steps, generalized from the batch version of the OWL-QN optimizer, which encourage the parameter updates to be orthant-wise. We adopt several practical features from recent stochastic variants of L-BFGS and from variance reduction of subsampled gradients, and we also employ various sketching techniques for the Hessian inverse approximation, squeezing out more curvature information and accelerating convergence. We prove a linear convergence rate for our optimizer and experimentally demonstrate that it outperforms other linearly convergent optimizers on a large-scale sparse logistic regression task.
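For readers unfamiliar with the orthant-wise machinery that the alignment steps generalize, here is a minimal sketch of one OWL-QN-style update on a parameter vector; the pseudo-gradient, direction alignment, and orthant projection follow the standard batch recipe, while the generic H_inv matrix stands in for the limited-memory quasi-Newton and sketching components of the proposed optimizer.

```python
import numpy as np

def pseudo_gradient(x, grad, lam):
    """Subgradient of f(x) + lam*||x||_1 that picks the steepest-descent sign."""
    pg = np.where(x > 0, grad + lam, np.where(x < 0, grad - lam, 0.0))
    at_zero = (x == 0)
    right = grad[at_zero] + lam   # one-sided derivative approaching from x_i > 0
    left = grad[at_zero] - lam    # one-sided derivative approaching from x_i < 0
    pg[at_zero] = np.where(left > 0, left, np.where(right < 0, right, 0.0))
    return pg

def orthant_wise_step(x, grad, H_inv, lam, lr=1.0):
    """One OWL-QN-style update: quasi-Newton direction, alignment, projection."""
    pg = pseudo_gradient(x, grad, lam)
    d = -H_inv @ pg                          # stand-in quasi-Newton direction
    d[d * (-pg) < 0] = 0.0                   # alignment: drop sign-disagreeing components
    orthant = np.where(x != 0, np.sign(x), np.sign(-pg))
    x_new = x + lr * d
    x_new[x_new * orthant < 0] = 0.0         # project back onto the chosen orthant
    return x_new
```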
Pengfei Zhu, Xin Li, Pascal Poupart
Comments: 7 pages, 6 figures, 3 tables
Subjects: Learning (cs.LG)
Deep Reinforcement Learning (RL) recently emerged as one of the most
competitive approaches for learning in sequential decision making problems with
fully observable environments, e.g., computer Go. However, very little work has
been done in deep RL to handle partially observable environments. We propose a
new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to
enhance learning performance in partially observable domains. Actions are
encoded by a fully connected layer and coupled with a convolutional observation
to form an action-observation pair. The time series of action-observation pairs
are then integrated by an LSTM layer that learns latent states based on which a
fully connected layer computes Q-values as in conventional Deep Q-Networks
(DQNs). We demonstrate the effectiveness of our new architecture in several
partially observable domains, including flickering Atari games.
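A minimal PyTorch sketch of the described architecture (the layer sizes, Atari-style 84x84 input, and action-embedding width are assumptions, not the authors' exact configuration): each action is encoded by a fully connected layer, concatenated with convolutional observation features, and the resulting action-observation sequence is integrated by an LSTM whose output feeds a Q-value head.

```python
import torch
import torch.nn as nn

class ADRQN(nn.Module):
    def __init__(self, n_actions, action_embed=128, lstm_hidden=512):
        super().__init__()
        # convolutional encoder for 84x84 grayscale observations (assumed shape)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(), nn.Flatten())
        self.action_fc = nn.Linear(n_actions, action_embed)  # encode one-hot actions
        self.lstm = nn.LSTM(64 * 7 * 7 + action_embed, lstm_hidden, batch_first=True)
        self.q_head = nn.Linear(lstm_hidden, n_actions)

    def forward(self, obs, prev_actions, hidden=None):
        # obs: (batch, time, 1, 84, 84); prev_actions: (batch, time, n_actions) one-hot
        b, t = obs.shape[:2]
        o = self.conv(obs.reshape(b * t, *obs.shape[2:])).reshape(b, t, -1)
        a = torch.relu(self.action_fc(prev_actions))
        h, hidden = self.lstm(torch.cat([o, a], dim=-1), hidden)
        return self.q_head(h), hidden   # Q-values per time step and the LSTM state
```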
Swapna Buccapatnam, Fang Liu, Atilla Eryilmaz, Ness B. Shroff
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study the stochastic multi-armed bandit (MAB) problem in the presence of
side-observations across actions that occur as a result of an underlying
network structure. In our model, a bipartite graph captures the relationship
between actions and a common set of unknowns such that choosing an action
reveals observations for the unknowns that it is connected to. This models a
common scenario in online social networks where users respond to their friends’
activity, thus providing side information about each other’s preferences. Our
contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) on the regret of any uniformly good policy that achieves the maximum long-term average reward, as a function of the bipartite network structure. 2) We propose two policies, a randomized policy and a policy based on the well-known upper confidence bound (UCB) policies, both of which explore each action at a rate that is a function of its network position. We show,
under mild assumptions, that these policies achieve the asymptotic lower bound
on the regret up to a multiplicative factor, independent of the network
structure. Finally, we use numerical examples on a real-world social network
and a routing example network to demonstrate the benefits obtained by our
policies over other existing policies.
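To make the side-observation model concrete, here is a small illustrative simulation rather than the paper's policies: a bipartite incidence matrix connects actions to unknowns, pulling an action reveals a noisy observation of every unknown it touches, and a simple UCB-style index exploits these side observations; the incidence matrix, the noise level, and the assumption that an action's reward is the sum of its connected unknowns are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])   # actions x unknowns incidence (assumed)
theta = np.array([0.2, 0.5, 0.8])                  # true means of the unknowns
counts, sums = np.zeros(3), np.zeros(3)            # per-unknown observation statistics

def ucb_index(t):
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    return A @ (means + bonus)                     # action index = sum over its unknowns

for t in range(2000):
    a = int(np.argmax(ucb_index(t)))
    for u in np.flatnonzero(A[a]):                 # side observations for connected unknowns
        counts[u] += 1
        sums[u] += theta[u] + rng.normal(scale=0.1)

print("estimated unknowns:", np.round(sums / np.maximum(counts, 1), 2))
```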
Tien Thanh Nguyen, Thi Thu Thuy Nguyen, Xuan Cuong Pham, Alan Wee-Chung Liew, James C. Bezdek
Comments: 19 pages, 3 figures
Subjects: Learning (cs.LG)
In this study, we introduce an ensemble-based approach for online machine
learning. The ensemble of base classifiers in our approach is obtained by
learning Naive Bayes classifiers on different training sets which are generated
by projecting the original training set onto lower-dimensional spaces. We propose a mechanism to learn from data sequences using the data-chunk paradigm. The
experiments conducted on a number of UCI datasets and one synthetic dataset
demonstrate that the proposed approach performs significantly better than some
well-known online learning algorithms.
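A rough sketch of how such an ensemble could be assembled with scikit-learn (the projection dimension, number of base learners, and synthetic chunk stream are assumptions, not the paper's settings): each Gaussian Naive Bayes learner is updated chunk by chunk on its own random projection of the data, and predictions are combined by majority vote.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_features, n_proj, n_learners, classes = 20, 5, 7, np.array([0, 1])
projections = [rng.normal(size=(n_features, n_proj)) for _ in range(n_learners)]
learners = [GaussianNB() for _ in range(n_learners)]

def partial_fit_chunk(X, y):
    for P, nb in zip(projections, learners):
        nb.partial_fit(X @ P, y, classes=classes)   # online update on the projected chunk

def predict(X):
    votes = np.stack([nb.predict(X @ P) for P, nb in zip(projections, learners)])
    return (votes.mean(axis=0) > 0.5).astype(int)    # majority vote over binary labels

# toy stream of data chunks
for _ in range(10):
    X = rng.normal(size=(50, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    partial_fit_chunk(X, y)
print(predict(rng.normal(size=(5, n_features))))
```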
Zhao Song, David P. Woodruff, Peilin Zhong
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Learning (cs.LG)
We consider relative error low rank approximation of tensors with respect to the Frobenius norm: given an order-\(q\) tensor \(A \in \mathbb{R}^{\prod_{i=1}^q n_i}\), output a rank-\(k\) tensor \(B\) for which \(\|A-B\|_F^2 \leq (1+\epsilon)\,\mathrm{OPT}\), where \(\mathrm{OPT} = \inf_{\mathrm{rank}\text{-}k~A'} \|A-A'\|_F^2\). Despite the success in obtaining relative error low rank approximations for matrices, no such results were known for tensors. One structural issue is that there may be no rank-\(k\) tensor \(A_k\) achieving the above infimum. Another, computational, issue is that an efficient relative error low rank approximation algorithm for tensors would allow one to compute the rank of a tensor, which is NP-hard. We bypass these issues via (1) bicriteria and (2) parameterized complexity solutions:
(1) We give an algorithm which outputs a rank \(k' = O((k/\epsilon)^{q-1})\) tensor \(B\) for which \(\|A-B\|_F^2 \leq (1+\epsilon)\,\mathrm{OPT}\) in \(\mathrm{nnz}(A) + n \cdot \mathrm{poly}(k/\epsilon)\) time in the real RAM model. Here \(\mathrm{nnz}(A)\) is the number of non-zero entries in \(A\).
(2) We give an algorithm for any \(\delta > 0\) which outputs a rank-\(k\) tensor \(B\) for which \(\|A-B\|_F^2 \leq (1+\epsilon)\,\mathrm{OPT}\) and runs in \(\left(\mathrm{nnz}(A) + n \cdot \mathrm{poly}(k/\epsilon) + \exp(k^2/\epsilon)\right) \cdot n^{\delta}\) time in the unit cost RAM model.
For outputting a rank-\(k\) tensor, or even a bicriteria solution with rank-\(Ck\) for a certain constant \(C > 1\), we show a \(2^{\Omega(k^{1-o(1)})}\) time lower bound under the Exponential Time Hypothesis.
Our results give the first relative error low rank approximations for tensors for a large number of robust error measures for which nothing was known, as well as column, row, and tube subset selection. We also obtain new results for matrices, such as \(\mathrm{nnz}(A)\)-time CUR decompositions, improving previous \(\mathrm{nnz}(A)\log n\)-time algorithms, which may be of independent interest.
Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
Visual Question Answering (VQA) has received a lot of attention over the past
couple of years. A number of deep learning models have been proposed for this
task. However, it has been shown that these models are heavily driven by
superficial correlations in the training data and lack compositionality — the
ability to answer questions about unseen compositions of seen concepts. This
compositionality is desirable and central to intelligence. In this paper, we
propose a new setting for Visual Question Answering where the test
question-answer pairs are compositionally novel compared to training
question-answer pairs. To facilitate developing models under this setting, we
present a new compositional split of the VQA v1.0 dataset, which we call
Compositional VQA (C-VQA). We analyze the distribution of questions and answers
in the C-VQA splits. Finally, we evaluate several existing VQA models under
this new setting and show that the performances of these models degrade by a
significant amount compared to the original VQA setting.
Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
Comments: 55 pages, 3 figures, 1 table
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
There is widespread sentiment that it is not possible to effectively utilize
fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy
ball) for the purposes of stochastic optimization due to their instability and
error accumulation, a notion made precise in d’Aspremont 2008 and Devolder,
Glineur, and Nesterov 2014. This work considers these issues for the special
case of stochastic approximation for the least squares regression problem, and
our main result refutes the conventional wisdom by showing that acceleration
can be made robust to statistical errors. In particular, this work introduces
an accelerated stochastic gradient method that provably achieves the minimax
optimal statistical risk faster than stochastic gradient descent. Critical to
the analysis is a sharp characterization of accelerated stochastic gradient
descent as a stochastic process. We hope this characterization gives insights
towards the broader question of designing simple and effective accelerated
stochastic methods for more general convex and non-convex optimization
problems.
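As a point of reference for the setting studied here, and not the paper's specific algorithm or step-size schedule, the following sketch runs stochastic gradient descent with Nesterov-style momentum on a synthetic least squares regression problem, one sample per update; the learning rate and momentum constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
v = np.zeros(d)                    # momentum buffer
lr, beta = 0.01, 0.9               # assumed constants, not the paper's tuning

for t in range(50000):
    i = rng.integers(n)            # single-sample stochastic gradient
    grad = (X[i] @ (w + beta * v) - y[i]) * X[i]   # gradient at the look-ahead point
    v = beta * v - lr * grad
    w = w + v

print("parameter error:", np.linalg.norm(w - w_true))
```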
Yotam Hechtlinger, Purvasha Chakravarti, Jining Qin
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper introduces a generalization of Convolutional Neural Networks
(CNNs) from low-dimensional grid data, such as images, to graph-structured
data. We propose a novel spatial convolution utilizing a random walk to uncover
the relations within the input, analogous to the way the standard convolution
uses the spatial neighborhood of a pixel on the grid. The convolution has an
intuitive interpretation, is efficient and scalable and can also be used on
data with varying graph structure. Furthermore, this generalization can be
applied to many standard regression or classification problems, by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on the Merck molecular
activity data set.
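A minimal sketch of the random-walk neighborhood idea (the graph size, walk length, and the use of expected visit counts to rank neighbors are assumptions, not the authors' exact operator): for each node, rank the other nodes by how often a short random walk from it visits them, gather the features of the top-p nodes in that order, and apply a shared weight vector, mimicking how a standard convolution gathers a pixel's spatial neighbors.

```python
import numpy as np

def random_walk_conv(adj, features, weights, p=5, steps=3):
    """Convolve node features over random-walk neighborhoods (illustrative)."""
    P = adj / adj.sum(axis=1, keepdims=True)       # transition matrix of the walk
    visits = sum(np.linalg.matrix_power(P, k) for k in range(1, steps + 1))
    out = np.empty(len(adj))
    for i in range(len(adj)):
        neigh = np.argsort(-visits[i])[:p]          # the p most-visited nodes from node i
        out[i] = features[neigh] @ weights          # shared weights over the ordered neighborhood
    return out

rng = np.random.default_rng(0)
adj = (rng.random((10, 10)) < 0.4).astype(float)
np.fill_diagonal(adj, 1.0)                          # self-loops keep every row non-empty
print(random_walk_conv(adj, rng.normal(size=10), rng.normal(size=5)))
```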
Mohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou, Nigel Allinson, Xujiong Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we propose a novel learning based method for automated segmentation of brain tumors in multimodal MRI images. The machine-learned features from a fully convolutional neural network (FCN) and hand-designed texton features are used to classify the MRI image voxels. The score map with pixel-wise predictions is used as a feature map which is learned from the multimodal MRI training dataset using the FCN. The learned features are then applied to random forests to classify each MRI image voxel into normal brain tissues and different parts of tumor. The method was evaluated on the BRATS 2013 challenge dataset. The results show that the application of the random forest classifier to multimodal MRI images using machine-learned features based on the FCN and hand-designed features based on textons provides promising segmentations. The Dice overlap measure for automatic brain tumor segmentation against ground truth is 0.88, 0.80 and 0.73 for complete tumor, core and enhancing tumor, respectively.
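At a high level, the voxel classification stage could look like the sketch below; it is a simplification with synthetic stand-in arrays (the actual FCN, texton filter bank, and BRATS data are not included): per-voxel FCN class scores and hand-designed texture features are concatenated and fed to a random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# stand-ins for per-voxel features: fcn_scores from the FCN's score map and
# texton_hist from a texton filter bank (both are hypothetical arrays here)
rng = np.random.default_rng(0)
n_voxels = 2000
fcn_scores = rng.random((n_voxels, 5))       # e.g. 5 class scores per voxel
texton_hist = rng.random((n_voxels, 16))     # e.g. 16-bin texton histogram per voxel
labels = rng.integers(0, 5, size=n_voxels)   # 0 = normal tissue, 1-4 = tumor parts

X = np.hstack([fcn_scores, texton_hist])     # combine learned and hand-designed features
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:10]))
```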
Samuel Rönnqvist, Niko Schenk, Christian Chiarcos
Comments: To appear at ACL2017, code available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We introduce an attention-based Bi-LSTM for Chinese implicit discourse
relations and demonstrate that modeling argument pairs as a joint sequence can
outperform word order-agnostic approaches. Our model benefits from a partial
sampling scheme and is conceptually simple, yet achieves state-of-the-art
performance on the Chinese Discourse Treebank. We also visualize its attention
activity to illustrate the model’s ability to selectively focus on the relevant
parts of an input sequence.
Arnaud Joly
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Within machine learning, the supervised learning field aims at modeling the
input-output relationship of a system, from past observations of its behavior.
Decision trees characterize the input-output relationship through a series of
nested (if-then-else) questions, the testing nodes, leading to a set of
predictions, the leaf nodes. Several such trees are often combined together
for state-of-the-art performance: random forest ensembles average the
predictions of randomized decision trees trained independently in parallel,
while tree boosting ensembles train decision trees sequentially to refine the
predictions made by the previous ones.
The emergence of new applications requires scalable supervised learning
algorithms in terms of computational power and memory space with respect to the
number of inputs, outputs, and observations without sacrificing accuracy. In
this thesis, we identify three main areas where decision tree methods could be
improved for which we provide and evaluate original algorithmic solutions: (i)
learning over high dimensional output spaces, (ii) learning with large sample
datasets and stringent memory constraints at prediction time and (iii) learning
over high dimensional sparse input spaces.
Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
Comments: 7 pages
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Deep neural networks (DNNs) play a key role in many applications. Current
studies focus on crafting adversarial samples against DNN-based image
classifiers by introducing some imperceptible perturbations to the input.
However, DNNs for natural language processing have not received the attention they
deserve. In fact, the existing perturbation algorithms for images cannot be
directly applied to text. This paper presents a simple but effective method to
attack DNN-based text classifiers. Three perturbation strategies, namely
insertion, modification, and removal, are designed to generate an adversarial
sample for a given text. By computing the cost gradients, what should be
inserted, modified or removed, where to insert and how to modify are determined
effectively. The experimental results show that the adversarial samples
generated by our method can successfully fool a state-of-the-art model to
misclassify them as any desirable classes without compromising their utilities.
At the same time, the introduced perturbations are difficult to perceive.
Our study demonstrates that DNN-based text classifiers are also prone to the
adversarial sample attack.
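A rough sketch of the gradient signal such attacks typically rely on (illustrative only; the classifier, tokenization, and the way the paper turns cost gradients into insertion, modification, and removal decisions are not reproduced): the magnitude of the loss gradient with respect to each word embedding ranks which positions most influence the prediction.

```python
import torch
import torch.nn as nn

vocab, embed_dim, n_classes = 1000, 32, 2
embedding = nn.Embedding(vocab, embed_dim)
classifier = nn.Linear(embed_dim, n_classes)       # stand-in text classifier

tokens = torch.tensor([[5, 42, 7, 901, 13]])       # one tokenized sentence (assumed ids)
target = torch.tensor([1])

emb = embedding(tokens).detach().requires_grad_(True)
logits = classifier(emb.mean(dim=1))               # bag-of-embeddings "model"
loss = nn.functional.cross_entropy(logits, target)
loss.backward()

importance = emb.grad.norm(dim=-1).squeeze(0)      # per-word gradient magnitude
print("most influential position:", int(importance.argmax()))
```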
Adel Javanmard, Jason D. Lee
Comments: 27 pages
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
Hypothesis testing in the linear regression model is a fundamental
statistical problem. We consider linear regression in the high-dimensional
regime where the number of parameters exceeds the number of samples (\(p > n\)) and assume that the high-dimensional parameter vector is \(s_0\)-sparse. We develop a general and flexible \(\ell_\infty\) projection statistic for hypothesis testing in this model. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal strength, testing arbitrary functionals of the parameter, and testing adaptive hypotheses. We show that the proposed procedure controls the type I error under the standard assumption \(s_0 (\log p)/\sqrt{n} \to 0\), and we also analyze the power of the procedure. Our numerical experiments confirm our theoretical findings and demonstrate that we control the false positive rate (type I error) near the nominal level and have high power.
Feihu Huang, Songcan Chen
Comments: 21 pages, 3 figures and 1 table
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we study the stochastic gradient descent (SGD) method for the
nonconvex nonsmooth optimization, and propose an accelerated SGD method by
combining the variance reduction technique with Nesterov’s extrapolation
technique. Moreover, based on the local error bound condition, we establish the
linear convergence of our method to obtain a stationary point of the nonconvex
optimization. In particular, we prove that not only does the generated sequence converge linearly to a stationary point of the problem, but the corresponding sequence of objective values is also linearly convergent. Finally, some numerical experiments demonstrate the effectiveness of our method. To the best of our knowledge, this is the first proof that an accelerated SGD method converges linearly to a local minimum of a nonconvex optimization problem.
Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, Percy Liang
Comments: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Our goal is to learn a semantic parser that maps natural language utterances
into executable programs when only indirect supervision is available: examples
are labeled with the correct execution result, but not the program itself.
Consequently, we must search the space of programs for those that output the
correct result, while not being misled by spurious programs: incorrect programs
that coincidentally output the correct result. We connect two common learning
paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML),
and then present a new learning algorithm that combines the strengths of both.
The new algorithm guards against spurious programs by combining the systematic
search traditionally employed in MML with the randomized exploration of RL, and
by updating parameters such that probability is spread more evenly across
consistent programs. We apply our learning algorithm to a new neural semantic
parser and show significant gains over existing state-of-the-art results on a
recent context-dependent semantic parsing task.
Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, Urs Muller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
As part of a complete software stack for autonomous driving, NVIDIA has
created a neural-network-based system, known as PilotNet, which outputs
steering angles given images of the road ahead. PilotNet is trained using road
images paired with the steering angles generated by a human driving a
data-collection car. It derives the necessary domain knowledge by observing
human drivers. This eliminates the need for human engineers to anticipate what
is important in an image and foresee all the necessary rules for safe driving.
Road tests demonstrated that PilotNet can successfully perform lane keeping in
a wide variety of driving conditions, regardless of whether lane markings are
present or not.
The goal of the work described here is to explain what PilotNet learns and
how it makes its decisions. To this end we developed a method for determining
which elements in the road image most influence PilotNet’s steering decision.
Results show that PilotNet indeed learns to recognize relevant objects on the
road.
In addition to learning the obvious features such as lane markings, edges of
roads, and other cars, PilotNet learns more subtle features that would be hard
to anticipate and program by engineers, for example, bushes lining the edge of
the road and atypical vehicle classes.
Matthew Nokleby, Waheed U. Bajwa
Comments: 13 pages, 1 figure; submitted to IEEE Transactions on Signal and Information Processing over Networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Motivated by machine learning applications in networks of sensors,
internet-of-things (IoT) devices, and autonomous agents, we propose techniques
for distributed stochastic convex learning from high-rate data streams. The
setup involves a network of nodes—each one of which has a stream of data
arriving at a constant rate—that solve a stochastic convex optimization
problem by collaborating with each other over rate-limited communication links.
To this end, we present and analyze two algorithms—termed distributed
stochastic approximation mirror descent (D-SAMD) and accelerated
distributed stochastic approximation mirror descent (AD-SAMD)—that are based
on two stochastic variants of mirror descent. The main collaborative step in
the proposed algorithms is approximate averaging of the local, noisy
subgradients using distributed consensus. While distributed consensus is well
suited for collaborative learning, its use in our setup results in perturbed
subgradient averages due to rate-limited links, which may slow down or prevent
convergence. Our main contributions in this regard are: (i) bounds on the
convergence rates of D-SAMD and AD-SAMD in terms of the number of nodes,
network topology, and ratio of the data streaming and communication rates, and
(ii) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD.
In particular, we show that there exist regimes under which AD-SAMD, when
compared to D-SAMD, achieves order-optimum convergence with slower
communications rates. This is in contrast to the centralized setting in which
use of accelerated mirror descent results in a modest improvement over regular
mirror descent for stochastic composite optimization. Finally, we demonstrate
the effectiveness of the proposed algorithms using numerical experiments.
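The main collaborative step can be pictured with the toy sketch below; it is illustrative only, using the Euclidean mirror map, a fixed ring-topology consensus matrix, and an arbitrary quadratic objective, none of which are the paper's exact choices: each node takes a noisy local subgradient, the network averages these via a doubly stochastic consensus matrix, and every node then takes a mirror-descent step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 4, 3
w_opt = np.array([1.0, -2.0, 0.5])                 # minimizer of the common objective
x = np.zeros((n_nodes, d))                         # each node's local iterate

# doubly stochastic consensus matrix for a ring of 4 nodes (assumed topology)
W = np.array([[.5, .25, 0, .25],
              [.25, .5, .25, 0],
              [0, .25, .5, .25],
              [.25, 0, .25, .5]])

for t in range(1, 500):
    grads = (x - w_opt) + 0.5 * rng.normal(size=x.shape)  # noisy local subgradients
    avg_grads = W @ grads                                  # one round of distributed consensus
    x = x - (1.0 / np.sqrt(t)) * avg_grads                 # Euclidean mirror-descent step

print("node 0 iterate:", np.round(x[0], 2))
```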
Boris Bonev, Lukas Prantl, Nils Thuerey
Comments: Supplemental Android app: this https URL
Subjects: Graphics (cs.GR); Learning (cs.LG)
Liquids exhibit highly complex, non-linear behavior under changing simulation
conditions such as user interactions. We propose a method to map this complex
behavior over a parameter range onto a reduced representation based on
space-time deformations. In order to represent the complexity of the full space
of inputs, we use aligned deformations from optical flow solves, and we
leverage the power of generative neural networks to synthesize additional
deformations for refinement. We introduce a novel deformation-aware loss
function, which enables optimization in the highly non-linear space of multiple
deformations. To demonstrate the effectiveness of our approach, we showcase the
method with several complex examples in two and four dimensions. Our
representation makes it possible to generate implicit surfaces of liquids very efficiently, which allows us to display the scene from any angle and to add secondary effects such as particle systems. We have
implemented a mobile application with our full pipeline to demonstrate that
real-time interaction is possible with our approach.
Brendt Wohlberg
Subjects: Optimization and Control (math.OC); Learning (cs.LG)
Appropriate selection of the penalty parameter is crucial to obtaining good
performance from the Alternating Direction Method of Multipliers (ADMM). While
analytic results for optimal selection of this parameter are very limited,
there is a heuristic method that appears to be relatively successful in a
number of different problems. The contribution of this paper is to demonstrate
that there is a potentially serious flaw in this heuristic approach, and to
propose a modification that at least partially addresses it.
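One widely used heuristic of this kind adapts the penalty parameter by balancing the primal and dual residual norms; a generic sketch of that rule follows (not necessarily the exact variant analyzed here, and with arbitrary mu and tau constants).

```python
def update_penalty(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing heuristic for the ADMM penalty parameter.

    Increase rho when the primal residual dominates, decrease it when the
    dual residual dominates; otherwise leave it unchanged."""
    if primal_res > mu * dual_res:
        return rho * tau
    if dual_res > mu * primal_res:
        return rho / tau
    return rho

# example: the penalty reacts to an imbalance between the residual norms
print(update_penalty(1.0, primal_res=5.0, dual_res=0.1))   # -> 2.0
```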
Neelakantan Nurani Krishnan, Gokul Sridharan, Ivan Seskar, Narayan Mandayam
Comments: Published in IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2017 held at Baltimore, MD
Subjects: Information Theory (cs.IT)
Recent regulatory changes proposed by the Federal Communications Commission
(FCC) permitting unlicensed use of television white space (TVWS) channels
present new opportunities for designing wireless networks that make efficient
use of this spectrum. The favorable propagation characteristics of these
channels and their widespread availability, especially in rural areas, make
them well-suited for providing broadband services in sparsely populated regions
where economic factors hinder deployment of such services on licensed spectrum.
In this context, this paper explores the deployment of an outdoor Wi-Fi-like
network operating in TVWS channels, referred to commonly as a Super Wi-Fi
network. Since regulations governing unlicensed use of these channels allow (a)
mounting fixed devices up to a height of 30 m and operation at transmit powers
of up to 4 W EIRP, and (b) operation at transmit powers of up to 100 mW EIRP
for portable devices, such networks can provide extended coverage and higher
rates than traditional Wi-Fi networks. However, these gains are subject to the
viability of the uplink from the portable devices (clients) to the fixed
devices (access points (AP)) because of tighter restrictions on transmit power
of clients compared to APs. This paper leverages concepts from stochastic
geometry to study the performance of such networks with specific focus on the
effect of (a) transmit power asymmetry between APs and clients and its impact
on uplink viability and coverage, and (b) the interplay between height and
transmit power of APs in determining the network throughput. Such an analysis
reveals that (a) maximum coverage of no more than 700 m is obtained even when
APs are deployed at 30 m height, and (b) operating APs at transmit power of
more than 1 W is beneficial only at sparse deployment densities when rate is
prioritized over coverage.
Xiaowen Tian, Ming Li, Zihuan Wang, Qian Liu
Subjects: Information Theory (cs.IT)
Millimeter wave (mmWave) communications have been considered as a key
technology for future 5G wireless networks. In order to overcome the severe
propagation loss of mmWave channel, mmWave multiple-input multiple-output
(MIMO) systems with analog/digital hybrid precoding and combining transceiver
architecture have been widely considered. However, physical layer security
(PLS) in mmWave MIMO systems and the secure hybrid beamformer design have not
been well investigated. In this paper, we consider the problem of hybrid
precoder and combiner design for secure transmission in mmWave MIMO systems in
order to protect the legitimate transmission from eavesdropping. When
eavesdropper’s channel state information (CSI) is known, we first propose a
joint analog precoder and combiner design algorithm which can prevent the
information leakage to the eavesdropper. Then, the digital precoder and
combiner are computed based on the obtained effective baseband channel to
further maximize the secrecy rate. Next, if prior knowledge of the
eavesdropper’s CSI is unavailable, we develop an artificial noise (AN)-based
hybrid beamforming approach, which can jam eavesdropper’s reception while
maintaining the quality-of-service (QoS) of the intended receiver at a pre-specified level. Simulation results demonstrate that our proposed
algorithms offer significant secrecy performance improvement compared with
other hybrid beamforming algorithms.
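For intuition about the AN-based approach, here is a generic fully digital null-space construction rather than the paper's hybrid analog/digital design (the antenna counts and Rayleigh channel are assumptions): the artificial noise is projected onto the null space of the legitimate user's channel, so it degrades only the eavesdropper's reception.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr = 8, 2                                   # transmit / legitimate receive antennas
H = (rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))) / np.sqrt(2)

# orthonormal basis of the null space of H (right singular vectors beyond its rank)
_, _, Vh = np.linalg.svd(H)
null_basis = Vh.conj().T[:, nr:]                # nt x (nt - nr)

an = null_basis @ (rng.normal(size=(nt - nr,)) + 1j * rng.normal(size=(nt - nr,)))
print("AN power seen by legitimate user:", np.linalg.norm(H @ an))   # ~0
```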
Nir Shlezinger, Ron Dabora, Yonina C. Eldar
Comments: This paper will be presented in part at the 2017 International Symposium on Information Theory
Subjects: Information Theory (cs.IT)
In phase retrieval problems, a signal of interest (SOI) is reconstructed
based on the magnitude of a linear transformation of the SOI observed with
additive noise. The linear transform is typically referred to as a measurement
matrix. Many works on phase retrieval assume that the measurement matrix is a
random Gaussian matrix, which, in the noiseless scenario with sufficiently many
measurements, guarantees invertibility of the transformation between the SOI
and the observations, up to an inherent phase ambiguity. However, in many
practical applications, the measurement matrix corresponds to an underlying
physical setup, and is therefore deterministic, possibly with structural
constraints. In this work we study the design of deterministic measurement
matrices, based on maximizing the mutual information between the SOI and the
observations. We characterize necessary conditions for the optimality of a
measurement matrix, and analytically obtain the optimal matrix in the low SNR
regime. Practical methods for designing general measurement matrices and masked
Fourier measurements are proposed. Simulation tests demonstrate the performance
gain achieved by the proposed techniques compared to random Gaussian
measurements for various phase recovery algorithms.
Ali Bereyhi, Ralf Müller, Hermann Schulz-Baldes
Comments: 7 pages, 3 figures, presented at ITA 2017
Subjects: Information Theory (cs.IT)
For noisy compressive sensing systems, the asymptotic distortion with respect
to an arbitrary distortion function is determined when a general class of
least-square based reconstruction schemes is employed. The sampling matrix is
considered to belong to a large ensemble of random matrices including i.i.d.
and projector matrices, and the source vector is assumed to be i.i.d. with a
desired distribution. We take a statistical mechanical approach by representing
the asymptotic distortion as a macroscopic parameter of a spin glass and
employing the replica method for the large-system analysis. In contrast to
earlier studies, we evaluate the general replica ansatz which includes the RS
ansatz as well as RSB. The generality of the solution enables us to study the
impact of symmetry breaking. Our numerical investigations show that for the
reconstruction scheme with the “zero-norm” penalty function, the RS fails to
predict the asymptotic distortion for relatively large compression rates;
however, the one-step RSB ansatz gives a valid prediction of the performance
within a larger regime of compression rates.
Wenfei Liu, Ming Li, Xiaowen Tian, Zihuan Wang, Qian Liu
Subjects: Information Theory (cs.IT)
Physical layer security has been considered as an important security approach
in wireless communications to protect legitimate transmission from passive
eavesdroppers. This paper investigates the physical layer security of a
wireless multiple-input multiple-output (MIMO) orthogonal frequency division
multiplexing (OFDM) communication system in the presence of a multiple-antenna
eavesdropper. We first propose a transmit-filter-assisted secure MIMO-OFDM
system which can destroy the orthogonality of the eavesdropper's signals. Our proposed transmit filter can disturb the reception of the eavesdropper while maintaining the quality of the legitimate transmission. Then, we propose an artificial noise (AN)-assisted secure MIMO-OFDM system to further improve the security of the legitimate transmission. The time-domain AN signal is designed to disturb the reception of the eavesdropper while leaving the legitimate transmission unaffected. Simulation results are presented to demonstrate the security
performance of the proposed transmit filter design and AN-assisted scheme in
the MIMO-OFDM system.
Wenfei Liu, Ming Li, Xiaowen Tian, Zihuan Wang, Qian Liu
Subjects: Information Theory (cs.IT)
This paper investigates the problem of hybrid precoder and combiner design
for multiple-input multiple-output (MIMO) orthogonal frequency division
multiplexing (OFDM) systems operating in millimeter-wave (mmWave) bands. We
propose a novel iterative scheme to design the codebook-based analog precoder
and combiner in forward and reverse channels. During each iteration, we apply
compressive sensing (CS) technology to efficiently estimate the equivalent
MIMO-OFDM mmWave channel. Then, the analog precoder or combiner is obtained
based on the orthogonal matching pursuit (OMP) algorithm to alleviate the
interference between different data streams as well as maximize the spectral
efficiency. The digital precoder and combiner are finally obtained based on the
effective baseband channel to further enhance the spectral efficiency.
Simulation results demonstrate that the proposed iterative hybrid precoder and combiner algorithm has significant performance advantages.
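A minimal sketch of the OMP idea behind codebook-based hybrid precoding, following the widely used spatially sparse precoding formulation rather than this paper's full iterative scheme (the array size, steering-vector codebook, and target precoder are illustrative): analog beams are greedily picked from the codebook to best explain a target fully digital precoder, with a least-squares digital precoder recomputed at each step.

```python
import numpy as np

def omp_hybrid_precoder(F_opt, codebook, n_rf):
    """Greedy OMP selection of analog beams from a codebook (illustrative)."""
    F_rf = np.zeros((F_opt.shape[0], 0), dtype=complex)
    residual = F_opt
    for _ in range(n_rf):
        corr = np.abs(codebook.conj().T @ residual).sum(axis=1)
        F_rf = np.hstack([F_rf, codebook[:, [int(np.argmax(corr))]]])  # add best beam
        F_bb = np.linalg.lstsq(F_rf, F_opt, rcond=None)[0]             # digital precoder
        residual = F_opt - F_rf @ F_bb
    return F_rf, F_bb

rng = np.random.default_rng(0)
nt, ns, n_beams = 16, 2, 32
angles = np.linspace(0, np.pi, n_beams, endpoint=False)
codebook = np.exp(1j * np.outer(np.arange(nt), np.pi * np.cos(angles))) / np.sqrt(nt)
F_opt = np.linalg.qr(rng.normal(size=(nt, ns)) + 1j * rng.normal(size=(nt, ns)))[0]
F_rf, F_bb = omp_hybrid_precoder(F_opt, codebook, n_rf=4)
print("approximation error:", np.linalg.norm(F_opt - F_rf @ F_bb))
```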
Feng Shu, Xiaomin Wu, Jinsong Hu, Riqing Chen, Jiangzhou Wang
Comments: 19 pages, 5 figures
Subjects: Information Theory (cs.IT)
In this paper, a practical wireless transmission scheme is proposed to
transmit confidential messages to the desired user securely and precisely by
the joint use of multiple tools including artificial noise (AN) projection,
phase alignment (PA)/beamforming, and random subcarrier selection (RSCS) based
on OFDM, and directional modulation (DM), short for RSCS-OFDM-DM. This
RSCS-OFDM-DM scheme provides an extremely low-complexity structure for the desired receiver and makes secure and precise wireless transmission realizable in practice. For illegitimate eavesdroppers, the receive power of the confidential messages is so weak, and the messages are so corrupted by AN, that their receivers cannot intercept them successfully. In such a scheme, the design of the phase alignment/beamforming vector and the AN projection matrix depends intimately on the desired direction angle and distance. It is particularly noted that the use of RSCS leads to a significant outcome: the receive power of the confidential messages concentrates mainly on the small region around the desired receiver, and only a small fraction of the power leaks out to the remaining broad regions. This concept is called secure
precise transmission. The probability density function of real-time receive
signal-to-interference-and-noise ratio (SINR) is derived, and the average SINR
is attained. From simulation and analysis, it follows that the proposed scheme can indeed achieve secure and precise wireless transmission of confidential messages in line-of-propagation channels and can be applied to various future wireless scenarios.
Zihuan Wang, Ming Li, Xiaowen Tian, Qian Liu
Subjects: Information Theory (cs.IT)
Analog/digital hybrid precoders and combiners have been widely used in millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems due to their superior energy efficiency and cost. Infinite-resolution phase shifters (PSs) for the analog beamformer can achieve performance very close to that of the fully digital scheme, but result in high complexity and intensive power consumption. Thus, more cost-effective and energy-efficient low-resolution PSs are typically used in practical mmWave MIMO systems. In this paper, we consider the joint hybrid precoder and combiner design with one-bit quantized PSs in mmWave MIMO systems. We propose to first design the analog precoder and combiner pair for each data stream successively, aiming at
conditionally maximizing the spectral efficiency. We present a novel binary
analog precoder and combiner optimization algorithm under a Rank-1
approximation of the interference-included equivalent channel with lower than
quadratic complexity. Then the digital precoder and combiner are computed based
on the obtained baseband effective channel to further enhance the spectral
efficiency. Simulation results demonstrate that the proposed algorithm
outperforms the existing one-bit PSs based hybrid beamforming scheme.
Zihuan Wang, Ming Li, Xiaowen Tian, Qian Liu
Subjects: Information Theory (cs.IT)
Millimeter-wave (mmWave) communications have been considered as a key
technology for future 5G wireless networks because of the orders-of-magnitude
wider bandwidth than current cellular bands. In this paper, we consider the
problem of codebook-based joint analog-digital hybrid precoder and combiner
design for spatial multiplexing transmission in a mmWave multiple-input
multiple-output (MIMO) system. We propose to jointly select analog precoder and
combiner pair for each data stream successively aiming at maximizing the
channel gain while suppressing the interference between different data streams.
After all analog precoder/combiner pairs have been determined, we can obtain
the effective baseband channel. Then, the digital precoder and combiner are
computed based on the obtained effective baseband channel to further mitigate
the interference and maximize the sum-rate. Simulation results demonstrate that
our proposed algorithm exhibits prominent advantages in combating interference
between different data streams and offers a satisfactory performance improvement compared to the existing codebook-based hybrid beamforming schemes.
Siddhartan Govindasamy, Itsik Bergel
Subjects: Information Theory (cs.IT)
We consider a user-centric co-operative cellular network, where base stations
close to a mobile co-operate to detect its signal using a (joint) linear
minimum-mean-square-error receiver. The base stations, which have multiple
antennas, and mobiles are modeled as independent Poisson Point Processes to
avoid dependence on specific node locations. Combining stochastic geometry and
infinite random matrix theory, we derive a simple expression for the spectral
efficiency of this complex system as the number of antennas grows large. This
expression is verified by Monte Carlo simulations which support its utility for
even a moderate number of antennas. This result reveals the influence of
tangible system parameters such as mobile and base-station densities, number of
antennas per base station, and number of co-operating base stations on
achievable spectral efficiencies. For instance, we find that for a given base
station density and a constraint on the total number of co-operating antennas,
all co-operating antennas should be located at a single base station. On the
other hand, in our asymptotic regime, for the same number of co-operating
antennas, if the network is limited by the area density of antennas, then the
number of co-operating base stations should be increased with fewer antennas
per base station.
Chao Tian, Kai Zhang
Comments: 16 pages, 4 figures
Subjects: Information Theory (cs.IT)
In the coded caching framework proposed by Maddah Ali and Niesen, there are
two classes of coding schemes known in the literature, namely uncoded
prefetching schemes and coded prefetching schemes. In this work, we provide a
connection between the uncoded prefetching scheme proposed by Maddah Ali and
Niesen (and its improved version by Yu et al.) and the coded prefetching scheme
proposed by Tian and Chen, when the number of files is no larger than that of
users. We make a critical observation that a coding component in the Tian-Chen
scheme can be replaced by a binary code, which enables us to view the two
schemes as the extremes of a more general scheme. The intermediate operating
points of this general scheme can in fact provide new tradeoff points
previously not known in the literature; however, explicitly characterizing the performance of this general scheme appears rather difficult.
Nagaraj T. Janakiraman, Avinash Vem, Krishna R. Narayanan, Jean-Francois Chamberland
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)
We consider the problem of querying a string (or, a database) of length \(N\) bits to determine all the locations where a substring (query) of length \(M\) appears either exactly or within a Hamming distance of \(K\) from the query. We assume that sketches of the original signal can be computed offline and stored. Using the sparse Fourier transform computation based approach introduced by Pawar and Ramchandran, we show that all such matches can be determined with high probability in sub-linear time. Specifically, if the query length \(M = O(N^\mu)\) and the number of matches \(L = O(N^\lambda)\), we show that for \(\lambda < 1-\mu\) all the matching positions can be determined with a probability that approaches 1 as \(N \rightarrow \infty\) for \(K \leq \frac{1}{6}M\). More importantly, our scheme has a worst-case computational complexity that is only \(O\left(\max\{N^{1-\mu}\log^2 N,\ N^{\mu+\lambda}\log N\}\right)\), which means we can recover all the matching positions in sub-linear time for \(\lambda < 1-\mu\). This is a substantial improvement over the best known computational complexity of \(O\left(N^{1-0.359\mu}\right)\) for recovering one matching position by Andoni et al. [andoni2013shift]. Further, the number of Fourier transform coefficients that need to be computed, stored and accessed, i.e., the sketching complexity of this algorithm, is only \(O\left(N^{1-\mu}\log N\right)\). Several extensions of the main theme are also discussed.
Thierry P. Berger, Cheikh Thiécoumba Gueye, Jean Belo Klamti
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Most of the codes that have an algebraic decoding algorithm are derived from
the Reed Solomon codes. They are obtained by taking equivalent codes, for
example the generalized Reed Solomon codes, or by using the so-called subfield
subcode method, which leads to Alternant codes and Goppa codes over the
underlying prime field, or over some intermediate subfield. The main advantage of these constructions is that they preserve both the minimum distance and the decoding algorithm of the underlying Reed Solomon code. In this paper, we
propose a generalization of the subfield subcode construction by introducing
the notion of subspace subcodes and a generalization of the equivalence of
codes which leads to the notion of generalized subspace subcodes. When the
dimension of the selected subspaces is equal to one, we show that our approach
gives exactly the family of the codes obtained by equivalence and subfield
subcode technique. However, our approach highlights the links between the
subfield subcode of a code defined over an extension field and the operation of
puncturing the \(q\)-ary image of this code. When the dimension of the subspaces is greater than one, we obtain codes whose alphabet is no longer a finite field, but a set of \(r\)-tuples. We explain why these codes are practically as efficient for applications as codes defined over an extension of degree \(r\). In
addition, they make it possible to obtain decodable codes over a large alphabet
having parameters previously inaccessible. As an application, we give some
examples that can be used in public key cryptosystems such as McEliece.
Abubakr O. Al-Abbasi, Vaneet Aggarwal
Comments: 15 pages, 13 figures
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Multimedia (cs.MM)
The demand for global video has been burgeoning across industries. With the
expansion and improvement of video streaming services, cloud-based video is
evolving into a necessary feature of any successful business for reaching
internal and external audiences. This paper considers video streaming over distributed systems where the video segments are encoded using an erasure code for better reliability; to the best of our knowledge, this is the first work that considers video streaming over erasure-coded distributed cloud systems. The download time of each coded chunk of each video segment is characterized, and ordered statistics over the choice of the erasure-coded chunks are used to obtain the playback times of the video segments. Using the playback times, bounds on the moment generating function of the stall duration are used to bound the mean stall duration. Moment generating function based bounds on the ordered statistics are also used to bound the stall duration tail probability, which is the probability that the stall time exceeds a pre-defined threshold. These two metrics, mean stall duration and the stall duration tail
probability, are important quality of experience (QoE) measures for the end
users. Based on these metrics, we formulate an optimization problem to jointly
minimize the convex combination of both the QoE metrics averaged over all
requests over the placement and access of the video content. The non-convex
problem is solved using an efficient iterative algorithm. Numerical results
show significant improvement in QoE metrics for cloud-based video as compared
to the considered baselines.