Sean C. Smithson, Guang Yang, Warren J. Gross, Brett H. Meyer
Comments: To appear in ICCAD’16. The authoritative version will appear in the ACM Digital Library
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Artificial neural networks have gone through a recent rise in popularity,
achieving state-of-the-art results in various fields, including image
classification, speech recognition, and automated control. Both the performance
and computational complexity of such models are heavily dependent on the design
of characteristic hyper-parameters (e.g., number of hidden layers, nodes per
layer, or choice of activation functions), which have traditionally been
optimized manually. With machine learning penetrating low-power mobile and
embedded areas, the need to optimize not only for performance (accuracy), but
also for implementation complexity, becomes paramount. In this work, we present
a multi-objective design space exploration method that reduces the number of
solution networks trained and evaluated through response surface modelling.
Given design spaces that can easily exceed 10^20 solutions, manually designing a
near-optimal architecture is unlikely, as opportunities to reduce network
complexity while maintaining performance may be overlooked. This problem is
exacerbated by the fact that hyper-parameters which perform well on specific
datasets may yield sub-par results on others, and must therefore be designed on
a per-application basis. In our work, machine learning is leveraged by training
an artificial neural network to predict the performance of future candidate
networks. The method is evaluated on the MNIST and CIFAR-10 image datasets,
optimizing for both recognition accuracy and computational complexity.
Experimental results demonstrate that the proposed method can closely
approximate the Pareto-optimal front, while only exploring a small fraction of
the design space.
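As a rough illustration of the response-surface idea (a sketch under assumptions, not the authors' implementation: the hyper-parameter encoding, cost model, and regressor below are hypothetical), a surrogate model is fit on the designs evaluated so far and used to rank a large pool of untrained candidates, so only those near the predicted Pareto front are trained in full:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical encoding: [num_layers, log2(nodes_per_layer), activation_id]
def random_candidate():
    return np.array([rng.integers(1, 6), rng.integers(4, 10), rng.integers(0, 3)])

def train_and_evaluate(h):
    # Placeholder for actually training a candidate network; returns (error, cost).
    error = 0.05 + 0.1 * rng.random() / h[0]
    cost = float(h[0] * h[1])          # stand-in complexity score
    return error, cost

# Seed the response surface with a few fully evaluated designs.
evaluated = [random_candidate() for _ in range(20)]
targets = np.array([train_and_evaluate(h) for h in evaluated])
surrogate = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(
    np.array(evaluated), targets)

# Rank a large pool of untrained candidates by predicted (error, cost)
# and only train the ones near the predicted Pareto front.
pool = np.array([random_candidate() for _ in range(1000)])
pred = surrogate.predict(pool)
pareto = [i for i, p in enumerate(pred)
          if not np.any(np.all(pred < p, axis=1))]
print(f"{len(pareto)} of {len(pool)} candidates selected for full training")
```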
Peter O'Connor, Max Welling
Comments: 9 Pages + 1 Reference + 3 Appendix, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
Deep neural networks can be obscenely wasteful. When processing video, a
convolutional network expends a fixed amount of computation for each frame with
no regard to the similarity between neighbouring frames. As a result, it ends
up repeatedly doing very similar computations. To put an end to such waste, we
introduce Sigma-Delta networks. With each new input, each layer in this network
sends a discretized form of its change in activation to the next layer. Thus
the amount of computation that the network does scales with the amount of
change in the input and layer activations, rather than the size of the network.
We introduce an optimization method for converting any pre-trained deep network
into an optimally efficient Sigma-Delta network, and show that our algorithm,
if run on the appropriate hardware, could cut at least an order of magnitude
from the computational cost of processing video data.
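A toy sketch of the Sigma-Delta idea, assuming a single fully connected layer and a plain rounding quantizer (illustrative only, not the authors' optimized conversion): each step transmits only the discretized change in activation, so the downstream work tracks how much the input changes between frames.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))          # layer weights
scale = 0.05                               # quantization step size

x_prev = np.zeros(64)                      # last input transmitted by this layer
y_acc = np.zeros(32)                       # output accumulated at the next layer

def sigma_delta_step(x):
    """Send only the quantized change in activation downstream."""
    global x_prev, y_acc
    delta = np.round((x - x_prev) / scale)  # discretized change (mostly zeros)
    x_prev = x_prev + delta * scale         # track what has been transmitted
    y_acc = y_acc + (delta * scale) @ W     # downstream accumulates sparse updates
    nonzero = int(np.count_nonzero(delta))  # proxy for computation performed
    return y_acc, nonzero

frame = rng.standard_normal(64)
for t in range(5):
    frame = frame + 0.01 * rng.standard_normal(64)  # slowly changing "video" input
    y, ops = sigma_delta_step(frame)
    print(f"step {t}: nonzero deltas = {ops}/64")
```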
Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels
Comments: International Conference on Learning Representations 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Robotics (cs.RO)
One of the most important fields in robotics is the optimization of
controllers. Currently, robots are treated as a black box in this optimization
process, which is the reason why derivative-free optimization methods such as
evolutionary algorithms or reinforcement learning are omnipresent. We propose
an implementation of a modern physics engine that can compute derivatives with
respect to its control parameters. This has been implemented on both CPU and
GPU. We show how this speeds up the optimization process, even for small
problems, and why it will scale to bigger problems. We explain why this is an
alternative approach to deep Q-learning for applying deep learning in robotics.
Lastly, we argue that this is a big step for deep learning in robotics, as it
opens up new possibilities to optimize robots, both in hardware and software.
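To illustrate why a differentiable engine helps, here is a toy sketch (not the authors' engine) of a one-dimensional point mass with drag; the sensitivity of the state with respect to a control force is propagated forward alongside the simulation, so the controller can be tuned by plain gradient descent instead of derivative-free search.

```python
import numpy as np

def simulate(u, steps=200, dt=0.01, drag=0.5, target=1.0):
    """Unrolled point-mass simulation that also propagates d(state)/du."""
    x, v = 0.0, 0.0          # position, velocity
    dx, dv = 0.0, 0.0        # sensitivities w.r.t. the control force u
    for _ in range(steps):
        a = u - drag * v     # acceleration from constant force and drag
        da = 1.0 - drag * dv # chain rule through the dynamics
        x, dx = x + dt * v, dx + dt * dv
        v, dv = v + dt * a, dv + dt * da
    loss = (x - target) ** 2
    grad = 2.0 * (x - target) * dx
    return loss, grad

u = 0.0
for it in range(100):
    loss, grad = simulate(u)
    u -= 0.2 * grad          # gradient descent on the control parameter
print(f"optimized force u = {u:.3f}, final loss = {simulate(u)[0]:.6f}")
```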
Lu Hou, Quanming Yao, James T. Kwok
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Deep neural network models, though very powerful and highly successful, are
computationally expensive in terms of space and time. Recently, there have been
a number of attempts on binarizing the network weights and activations. This
greatly reduces the network size, and replaces the underlying multiplications
with additions or even XNOR bit operations. However, existing binarization
schemes are based on simple matrix approximation and ignore the effect of
binarization on the loss. In this paper, we propose a proximal Newton algorithm
with diagonal Hessian approximation that directly minimizes the loss w.r.t. the
binarized weights. The underlying proximal step has an efficient closed-form
solution, and the second-order information can be efficiently obtained from the
second moments already computed by the Adam optimizer. Experiments on both
feedforward and recurrent networks show that the proposed loss-aware
binarization algorithm outperforms existing binarization schemes, and is also
more robust for wide and deep networks.
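A rough numpy sketch of a curvature-weighted proximal step of this flavor (the paper's exact closed form may differ; the diagonal Hessian proxy here is simply an Adam-style second-moment vector): each layer is binarized to a scaled sign vector whose scale is a curvature-weighted average of the weight magnitudes.

```python
import numpy as np

def loss_aware_binarize(w, v, eps=1e-8):
    """Binarize w to alpha * sign(w), weighting by a diagonal curvature proxy.

    w : real-valued weights of one layer (any shape)
    v : Adam-style second-moment estimates of the gradients, same shape as w
    """
    d = np.sqrt(v) + eps                      # diagonal curvature proxy
    alpha = np.sum(d * np.abs(w)) / np.sum(d) # curvature-weighted scale
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 5))
v = (0.01 * rng.standard_normal((4, 5))) ** 2  # stand-in for Adam's second moments
wb = loss_aware_binarize(w, v)
print(np.unique(np.round(np.abs(wb), 6)))      # a single shared magnitude per layer
```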
Farkhondeh Kiaee, Christian Gagné, Mahdieh Abbasi
Comments: Under review as a conference paper at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
The storage and computation requirements of Convolutional Neural Networks
(CNNs) can be prohibitive for exploiting these models over low-power or
embedded devices. This paper reduces the computational complexity of the CNNs
by minimizing an objective function, including the recognition loss that is
augmented with a sparsity-promoting penalty term. The sparsity structure of the
network is identified using the Alternating Direction Method of Multipliers
(ADMM), which is widely used in large optimization problems. This method
alternates between promoting the sparsity of the network and optimizing the
recognition performance, which allows us to exploit the two-part structure of
the corresponding objective functions. In particular, we take advantage of the
separability of the sparsity-inducing penalty functions to decompose the
minimization problem into sub-problems that can be solved sequentially.
Applying our method to a variety of state-of-the-art CNN models, our proposed
method is able to simplify the original model, generating models with less
computation and fewer parameters, while maintaining and often improving
generalization performance. Results on a variety of models strongly indicate
that our proposed ADMM-based method can be a very useful tool for simplifying
and improving deep CNNs.
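A minimal sketch of the alternating structure described above, applied to a plain sparse least-squares problem rather than a CNN (illustrative only): ADMM alternates a data-fitting update, a soft-thresholding (sparsity-promoting proximal) update, and a dual update.

```python
import numpy as np

def admm_lasso(A, y, lam=0.5, rho=1.0, iters=200):
    """ADMM for  min_w 0.5*||A w - y||^2 + lam*||w||_1  (stand-in for the CNN objective)."""
    n = A.shape[1]
    w, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached for the w-update
    Aty = A.T @ y
    for _ in range(iters):
        w = M @ (Aty + rho * (z - u))              # fit the data (recognition loss)
        v = w + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # promote sparsity
        u = u + w - z                              # dual (consensus) update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
w_true = np.zeros(30)
w_true[:5] = rng.standard_normal(5)
y = A @ w_true + 0.01 * rng.standard_normal(100)
w_hat = admm_lasso(A, y)
print("nonzero coefficients:", int(np.count_nonzero(np.abs(w_hat) > 1e-6)))
```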
James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
Comments: Submitted to conference track at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks are a powerful tool for modeling sequential data,
but the dependence of each timestep’s computation on the previous timestep’s
output limits parallelism and makes RNNs unwieldy for very long sequences. We
introduce quasi-recurrent neural networks (QRNNs), an approach to neural
sequence modeling that alternates convolutional layers, which apply in parallel
across timesteps, and a minimalist recurrent pooling function that applies in
parallel across channels. Despite lacking trainable recurrent layers, stacked
QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
size. Due to their increased parallelism, they are up to 16 times faster at
train and test time. Experiments on language modeling, sentiment
classification, and character-level neural machine translation demonstrate
these advantages and underline the viability of QRNNs as a basic building block
for a variety of sequence tasks.
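A small numpy sketch of the recurrent pooling step (assuming an fo-pooling variant and collapsing the convolutional gate computation to a width-1 convolution, i.e. a per-timestep linear map): the gates are computed in parallel over all timesteps, and the only sequential work is an elementwise running combination.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_layer(x, Wz, Wf, Wo):
    """x: (T, d_in). Gates are computed in parallel over all T timesteps;
    only the elementwise pooling below is sequential."""
    z = np.tanh(x @ Wz)        # candidate activations, (T, d_hid)
    f = sigmoid(x @ Wf)        # forget gates
    o = sigmoid(x @ Wo)        # output gates
    c = np.zeros_like(z[0])
    h = np.empty_like(z)
    for t in range(x.shape[0]):                 # cheap elementwise recurrence
        c = f[t] * c + (1.0 - f[t]) * z[t]      # fo-pooling
        h[t] = o[t] * c
    return h

rng = np.random.default_rng(0)
T, d_in, d_hid = 10, 8, 16
x = rng.standard_normal((T, d_in))
h = qrnn_layer(x, *(rng.standard_normal((d_in, d_hid)) for _ in range(3)))
print(h.shape)  # (10, 16)
```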
Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
Comments: ICLR 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent works on neural architectures have demonstrated the utility of
attention mechanisms for a wide variety of tasks. Attention models used for
problems such as image captioning typically depend on the image under
consideration, as well as the previous sequence of words that come before the
word currently being generated. While these types of models have produced
impressive results, they are not able to model the higher-order interactions
involved in problems such as video description/captioning, where the
relationship between parts of the video and the concepts being depicted is
complex. Motivated by these observations, we propose a novel memory-based
attention model for video description. Our model utilizes memories of past
attention when reasoning about where to attend to in the current time step,
similar to the central executive system proposed in human cognition. This
allows the model not only to reason about local attention more effectively, but
also to consider the entire sequence of video frames while generating each word.
Evaluation on the challenging and popular MSVD and Charades datasets shows that
the proposed architecture outperforms all previously proposed methods and leads
to new state-of-the-art results in video description.
Michał Nowicki, Jan Wietrzykowski
Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)
Using WiFi signals for indoor localization is the main localization modality
of the existing personal indoor localization systems operating on mobile
devices. WiFi fingerprinting is also used for mobile robots, as WiFi signals
are usually available indoors and can provide rough initial position estimate
or can be used together with other positioning systems. Currently, the best
solutions rely on filtering, manual data analysis, and time-consuming parameter
tuning to achieve reliable and accurate localization. In this work, we propose
to use deep neural networks to significantly lower the effort required to design
the localization system, while still achieving satisfactory results.
Assuming the state-of-the-art hierarchical approach, we employ the DNN system
for building/floor classification. We show that stacked autoencoders allow us to
efficiently reduce the feature space in order to achieve robust and precise
classification. The proposed architecture is verified on the publicly available
UJIIndoorLoc dataset and the results are compared with other solutions.
Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
Comments: Submitted to ICLR2017 conference
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Regularization is key for deep learning since it allows training more complex
models while keeping lower levels of overfitting. However, the most prevalent
regularizations do not leverage all the capacity of the models since they rely
on reducing the effective number of parameters. Feature decorrelation is an
alternative for using the full capacity of the models but the overfitting
reduction margins are too narrow given the overhead it introduces. In this
paper, we show that regularizing negatively correlated features is an obstacle
for effective decorrelation and present OrthoReg, a novel regularization
technique that locally enforces feature orthogonality. As a result, imposing
locality constraints in feature decorrelation removes interferences between
negatively correlated features, allowing the regularizer to reach higher
decorrelation bounds, and reducing the overfitting more effectively. In
particular, we show that the models regularized with OrthoReg have higher
accuracy bounds even when batch normalization and dropout are present. Moreover,
since our regularization is directly performed on the weights, it is especially
suitable for fully convolutional neural networks, where the weight space is
constant compared to the feature map space. As a result, we are able to reduce
the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.
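A small numpy sketch in the spirit of such a locally applied decorrelation penalty (the exact OrthoReg weighting is not reproduced; this version simply penalizes only positive cosine similarities between weight vectors, which is the locality idea described above):

```python
import numpy as np

def ortho_penalty(W):
    """W: (n_features, dim) weight vectors of one layer.
    Penalize only positively correlated pairs of feature weights."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-normalize rows
    C = Wn @ Wn.T                                      # pairwise cosine similarities
    np.fill_diagonal(C, 0.0)
    pos = np.maximum(C, 0.0)                           # ignore negative correlations
    return 0.5 * np.sum(pos ** 2)

def ortho_penalty_grad(W):
    """Numerical gradient for the sketch (a real implementation would be analytic)."""
    g = np.zeros_like(W)
    base = ortho_penalty(W)
    for idx in np.ndindex(W.shape):
        Wp = W.copy()
        Wp[idx] += 1e-5
        g[idx] = (ortho_penalty(Wp) - base) / 1e-5
    return g

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
for _ in range(200):
    W -= 0.1 * ortho_penalty_grad(W)   # regularizer-only updates, for illustration
print(f"penalty after updates: {ortho_penalty(W):.4f}")
```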
Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, Tarek Abdelzaher
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Mobile sensing applications usually require time-series inputs from sensors.
Some applications, such as tracking, can use sensed acceleration and rate of
rotation to calculate displacement based on physical system models. Other
applications, such as activity recognition, extract manually designed features
from sensor inputs for classification. Such applications face two challenges.
On one hand, on-device sensor measurements are noisy. For many mobile
applications, it is hard to find a distribution that exactly describes the
noise in practice. Unfortunately, calculating target quantities based on
physical system and noise models is only as accurate as the noise assumptions.
Similarly, in classification applications, although manually designed features
have proven to be effective, it is not always straightforward to find the most
robust features to accommodate diverse sensor noise patterns and user
behaviors. To this end, we propose DeepSense, a deep learning framework that
directly addresses the aforementioned noise and feature customization
challenges in a unified manner. DeepSense integrates convolutional and
recurrent neural networks to exploit local interactions among similar mobile
sensors, merge local interactions of different sensory modalities into global
interactions, and extract temporal relationships to model signal dynamics.
DeepSense thus provides a general signal estimation and classification
framework that accommodates a wide range of applications. We demonstrate the
effectiveness of DeepSense using three representative and challenging tasks:
car tracking with motion sensors, heterogeneous human activity recognition, and
user identification with biometric motion analysis. DeepSense significantly
outperforms the state-of-the-art methods for all three tasks. In addition,
DeepSense is feasible to implement on smartphones due to its moderate energy
consumption and low latency.
Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)
When encountering novel objects, humans are able to infer a wide range of
physical properties such as mass, friction and deformability by interacting
with them in a goal-driven way. This process of active interaction is in the
same spirit as a scientist performing an experiment to discover hidden facts.
Recent advances in artificial intelligence have yielded machines that can
achieve superhuman performance in Go, Atari, natural language processing, and
complex control problems, but it is not clear that these systems can rival the
scientific intuition of even a young child. In this work we introduce a basic
set of tasks that require agents to estimate hidden properties such as mass and
cohesion of objects in an interactive simulated environment where they can
manipulate the objects and observe the consequences. We found that state-of-the-art
deep reinforcement learning methods can learn to perform the experiments
necessary to discover such hidden properties. By systematically manipulating
the problem difficulty and the cost incurred by the agent for performing
experiments, we found that agents learn different strategies that balance the
cost of gathering information against the cost of making mistakes in different
situations.
Jacob Andreas, Dan Klein, Sergey Levine
Comments: Submitted to ICLR
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We describe a framework for multitask deep reinforcement learning guided by
policy sketches. Sketches annotate each task with a sequence of named subtasks,
providing high-level structural relationships among tasks, but not providing
the detailed guidance required by previous work on learning policy abstractions
for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic
motivations). Our approach associates every subtask with its own modular
subpolicy, and jointly optimizes over full task-specific policies by tying
parameters across shared subpolicies. This optimization is accomplished via a
simple decoupled actor-critic training objective that facilitates learning
common behaviors from dissimilar reward functions. We evaluate the
effectiveness of our approach on a maze navigation game and a 2-D
Minecraft-inspired crafting game. Both games feature extremely sparse rewards
that can be obtained only after completing a number of high-level subgoals
(e.g. escaping from a sequence of locked rooms or collecting and combining
various ingredients in the proper order). Experiments illustrate two main
advantages of our approach. First, we outperform standard baselines that learn
task-specific or shared monolithic policies. Second, our method naturally
induces a library of primitive behaviors that can be recombined to rapidly
acquire policies for new tasks.
Timothy Dozat, Christopher D. Manning
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
While deep learning parsing approaches have proven very successful at finding
the structure of sentences, most neural dependency parsers use neural networks
only for feature extraction, and then use those features in traditional parsing
algorithms. In contrast, this paper builds on recent work using
general-purpose neural network components, training an attention mechanism over
an LSTM to attend to the head of the phrase. We get state-of-the-art results
for standard dependency parsing benchmarks, achieving 95.44% UAS and 93.76% LAS
on the PTB dataset, 0.8% and 1.0% improvement, respectively, over Andor et al.
(2016). In addition to proposing a new parsing architecture using
dimensionality reduction and biaffine interactions, we examine simple
hyperparameter choices that had a profound influence on the model’s
performance, such as reducing the value of beta2 in the Adam optimization
algorithm.
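A small numpy sketch of a biaffine arc scorer of the kind referred to above, with hypothetical dimensions; the two small projections stand in for the dimensionality reduction, and the score of token i choosing token j as its head combines a bilinear term with a head-only bias term.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_lstm, d_arc = 6, 20, 8                      # sentence length, LSTM size, reduced size

H = rng.standard_normal((T, d_lstm))             # stand-in for BiLSTM states
W_dep = rng.standard_normal((d_lstm, d_arc))     # dimensionality-reducing projections
W_head = rng.standard_normal((d_lstm, d_arc))
U = rng.standard_normal((d_arc, d_arc))          # bilinear interaction
b = rng.standard_normal(d_arc)                   # head-only bias term

D = np.maximum(H @ W_dep, 0.0)                   # dependent representations (ReLU MLP)
E = np.maximum(H @ W_head, 0.0)                  # head representations

# scores[i, j] = D[i] @ U @ E[j] + E[j] @ b : how plausible token j is as i's head
scores = D @ U @ E.T + (E @ b)[None, :]
pred_heads = scores.argmax(axis=1)               # greedy head choice per token
print(pred_heads)
```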
Mirmorsal Madani
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Today, data mining techniques are exploited in medical science for diagnosing,
overcoming and treating diseases. Neural networks are among the techniques
widely used for diagnosis in the medical field. In this article, the efficiency
of nine algorithms that form the basis of neural network learning is assessed
for diagnosing cardiovascular diseases. The algorithms are assessed in terms of
accuracy, sensitivity, transparency, AROC and convergence rate by means of
10-fold cross-validation. The results suggest that in the training phase, the
Lonberg-M algorithm has the best efficiency in terms of all metrics, the OSS
algorithm has the maximum accuracy in the testing phase, the SCG algorithm has
the maximum transparency, and the CGB algorithm has the maximum sensitivity.
Ishan Durugkar, Ian Gemp, Sridhar Mahadevan
Comments: Submitted as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)
Generative adversarial networks (GANs) are a framework for producing a
generative model by way of a two-player minimax game. In this paper, we propose
the Generative Multi-Adversarial Network (GMAN), a framework that
extends GANs to multiple discriminators. In previous work, the successful
training of GANs requires modifying the minimax objective to accelerate
training early on. In contrast, GMAN can be reliably trained with the original,
untampered objective. We explore a number of design perspectives with the
discriminator role ranging from formidable adversary to forgiving teacher.
Image generation tasks comparing the proposed framework to standard GANs
demonstrate GMAN produces higher quality samples in a fraction of the
iterations when measured by a pairwise GAM-type metric.
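A minimal sketch of one way to realize the "formidable adversary to forgiving teacher" spectrum (an assumption for illustration, not necessarily the paper's exact formulation): the generator's losses from the individual discriminators are combined with a softmax weighting whose temperature moves between the mean and the max; the full adversarial training loop is omitted.

```python
import numpy as np

def aggregate_discriminators(d_losses, lam):
    """Combine per-discriminator generator losses.

    lam = 0   -> plain mean over discriminators (forgiving teacher)
    lam large -> approaches the max, i.e. the harshest critic (formidable adversary)
    """
    d_losses = np.asarray(d_losses, dtype=float)
    w = np.exp(lam * d_losses)
    w = w / w.sum()
    return float(np.sum(w * d_losses))

losses = [0.4, 1.1, 2.3]          # stand-ins for each discriminator's judgement of G
for lam in (0.0, 1.0, 5.0):
    print(lam, round(aggregate_discriminators(losses, lam), 3))
```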
Patrick McClure, Nikolaus Kriegeskorte
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
As deep neural networks (DNNs) are applied to increasingly challenging
problems, they will need to be able to represent their own uncertainty.
Modeling uncertainty is one of the key features of Bayesian methods. Scalable
Bayesian DNNs that use dropout-based variational distributions have recently
been proposed. Here we evaluate the ability of Bayesian DNNs trained with
Bernoulli or Gaussian distributions over units (dropout) or weights
(dropconnect) to represent their own uncertainty at the time of inference
through sampling. We tested how well Bayesian fully connected and convolutional
DNNs represented their own uncertainty in classifying the MNIST handwritten
digits. By adding different levels of Gaussian noise to the test images, we
assessed how DNNs represented their uncertainty about regions of input space
not covered by the training set. Bayesian DNNs estimated their own uncertainty
more accurately than traditional DNNs with a softmax output. These results are
important for building better deep learning systems and for investigating the
hypothesis that biological neural networks use sampling to represent
uncertainty.
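A small numpy sketch of sampling-based uncertainty at inference, for a hypothetical two-layer network with Bernoulli dropout kept active at test time; predictive uncertainty is read off from the spread of the sampled softmax outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mc_dropout_predict(x, W1, W2, p_keep=0.5, samples=100):
    """Keep dropout on at inference and sample the predictive distribution."""
    probs = []
    for _ in range(samples):
        h = np.maximum(x @ W1, 0.0)
        h = h * (rng.random(h.shape) < p_keep) / p_keep   # Bernoulli dropout
        probs.append(softmax(h @ W2))
    probs = np.stack(probs)
    return probs.mean(axis=0), probs.std(axis=0)          # prediction and uncertainty

W1 = rng.standard_normal((20, 64)) * 0.3
W2 = rng.standard_normal((64, 10)) * 0.3
x = rng.standard_normal(20)
mean, std = mc_dropout_predict(x, W1, W2)
print("predicted class:", int(mean.argmax()), " max per-class std:", float(std.max()))
```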
Barret Zoph, Quoc V. Le
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Neural networks are powerful and flexible models that work well for many
difficult learning tasks in image, speech and natural language understanding.
Despite their success, neural networks are still hard to design. In this paper,
we use a recurrent network to generate the model descriptions of neural
networks and train this RNN with reinforcement learning to maximize the
expected accuracy of the generated architectures on a validation set. On the
CIFAR-10 dataset, our method, starting from scratch, can design a novel network
architecture that rivals the best human-invented architecture in terms of test
set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is
only 0.1 percent worse and 1.2x faster than the current state-of-the-art model.
On the Penn Treebank dataset, our model can compose a novel recurrent cell that
outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
3.6 perplexity better than the previous state-of-the-art.
Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
Comments: ICLR 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent works on neural architectures have demonstrated the utility of
attention mechanisms for a wide variety of tasks. Attention models used for
problems such as image captioning typically depend on the image under
consideration, as well as the previous sequence of words that come before the
word currently being generated. While these types of models have produced
impressive results, they are not able to model the higher-order interactions
involved in problems such as video description/captioning, where the
relationship between parts of the video and the concepts being depicted is
complex. Motivated by these observations, we propose a novel memory-based
attention model for video description. Our model utilizes memories of past
attention when reasoning about where to attend to in the current time step,
similar to the central executive system proposed in human cognition. This
allows the model not only to reason about local attention more effectively, but
also to consider the entire sequence of video frames while generating each word.
Evaluation on the challenging and popular MSVD and Charades datasets shows that
the proposed architecture outperforms all previously proposed methods and leads
to new state-of-the-art results in video description.
João J. de Macedo Neto, Jefersson A. dos Santos, William Robson Schwartz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Food fraud has been an area of great concern due to its risk to public
health, reduction of food quality or nutritional value and for its economic
consequences. For this reason, it has been the object of regulation in many
countries (e.g. [1], [2]). One type of food that has frequently been the object
of fraud through the addition of water or an aqueous solution is bovine meat.
The traditional methods used to detect this kind of fraud are expensive,
time-consuming and depend on physicochemical analyses that require complex
laboratory techniques, specific for each added substance. In this paper, based
on digital images of histological cuts of adulterated and non-adulterated
(normal) bovine meat, we evaluate the use of digital image analysis methods to
identify the aforementioned kind of fraud, with a focus on the Local Binary
Pattern (LBP) algorithm.
Yaniv Taigman, Adam Polyak, Lior Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We study the problem of transferring a sample in one domain to an analog
sample in another domain. Given two related domains, S and T, we would like to
learn a generative function G that maps an input sample from S to the domain T,
such that the output of a given function f, which accepts inputs in either
domain, would remain unchanged. Other than the function f, the training data
is unsupervised and consists of a set of samples from each domain. The Domain
Transfer Network (DTN) we present employs a compound loss function that
includes a multiclass GAN loss, an f-constancy component, and a regularizing
component that encourages G to map samples from T to themselves. We apply our
method to visual domains including digits and face images and demonstrate its
ability to generate convincing novel images of previously unseen entities,
while preserving their identity.
Yiyi Liao, Lichao Huang, Yue Wang, Sarath Kodagoda, Yinan Yu, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Many standard robotic platforms are equipped with at least a fixed 2D laser
range finder and a monocular camera. Although those platforms do not have
sensors for 3D depth sensing capability, knowledge of depth is an essential
part in many robotics activities. Therefore, there has recently been increasing
interest in depth estimation using monocular images. As this task is inherently
ambiguous, the data-driven estimated depth might be unreliable in robotics
applications. In this paper, we have attempted to improve the precision of
monocular depth estimation by introducing 2D planar observation from the
remaining laser range finder without extra cost. Specifically, we construct a
dense reference map from the sparse laser range data, redefining the depth
estimation task as estimating the distance between the real and the reference
depth. To solve the problem, we construct a novel residual of residual neural
network, and tightly combine the classification and regression losses for
continuous depth estimation. Experimental results suggest that our method
achieves considerable improvement over the state-of-the-art methods on
both NYUD2 and KITTI, validating the effectiveness of our method on leveraging
the additional sensory information. We further demonstrate the potential usage
of our method in obstacle avoidance where our methodology provides
comprehensive depth information compared to the solution using monocular camera
or 2D laser range finder alone.
Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes
Comments: NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Two-stream Convolutional Networks (ConvNets) have shown strong performance
for human action recognition in videos. Recently, Residual Networks (ResNets)
have arisen as a new technique to train extremely deep architectures. In this
paper, we introduce spatiotemporal ResNets as a combination of these two
approaches. Our novel architecture generalizes ResNets for the spatiotemporal
domain by introducing residual connections in two ways. First, we inject
residual connections between the appearance and motion pathways of a two-stream
architecture to allow spatiotemporal interaction between the two streams.
Second, we transform pretrained image ConvNets into spatiotemporal networks by
equipping these with learnable convolutional filters that are initialized as
temporal residual connections and operate on adjacent feature maps in time.
This approach slowly increases the spatiotemporal receptive field as the depth
of the model increases and naturally integrates image ConvNet design
principles. The whole model is trained end-to-end to allow hierarchical
learning of complex spatiotemporal features. We evaluate our novel
spatiotemporal ResNet using two widely used action recognition benchmarks where
it exceeds the previous state-of-the-art.
Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman
Comments: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Minh-Tan Pham, Grégoire Mercier, Lionel Bombrun, Julien Michel
Comments: 11 pages, in IEEE-TIP peer review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A novel efficient method for content-based image retrieval (CBIR) is
developed in this paper using both texture and color features. Our motivation
is to represent and characterize an input image by a set of local descriptors
extracted at characteristic points (i.e. keypoints) within the image. Then,
dissimilarity measure between images is calculated based on the geometric
distance between the topological feature spaces (i.e. manifolds) formed by the
sets of local descriptors generated from these images. In this work, we propose
to extract and use the local extrema pixels as our feature points. Then, the
so-called local extrema-based descriptor (LED) is generated for each keypoint
by integrating all color, spatial as well as gradient information captured by a
set of its nearest local extrema. Hence, each image is encoded by a LED feature
point cloud, and Riemannian distances between these point clouds enable us to
tackle CBIR. Experiments performed on Vistex, Stex and colored Brodatz texture
databases using the proposed approach provide very efficient and competitive
results compared to the state-of-the-art methods.
Avijit Dasgupta, Sonam Singh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation of retinal blood vessels from fundus images plays an
important role in the computer aided diagnosis of retinal diseases. The task of
blood vessel segmentation is challenging due to the extreme variations in
morphology of the vessels against noisy background. In this paper, we formulate
the segmentation task as a multi-label inference task and utilize the implicit
advantages of the combination of convolutional neural networks and structured
prediction. Our proposed convolutional neural network based model achieves
strong performance and significantly outperforms the state-of-the-art for
automatic retinal blood vessel segmentation on the DRIVE dataset with 95.33%
accuracy and 0.974 AUC score.
Huabin Zheng, Jingyu Wang, Zhengjie Huang, Rong Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
OCR character segmentation for multilingual printed documents is difficult
due to the diversity of different linguistic characters. Previous approaches
mainly focus on monolingual text and are not suitable for multi-lingual cases.
In this work, we particularly tackle the Chinese/English mixed case by
reframing it as a semantic segmentation problem. We take advantage of the
successful architecture called fully convolutional networks (FCN) in the field
of semantic segmentation. As a deep architecture, FCN can automatically learn
useful features without traditional feature engineering. Given wide enough
receptive field, it can utilize the necessary context around a position to
better determine whether this is a splitting point or not. Trained on
synthesized samples with simulated random disturbances, FCN can effectively
split characters without any hand-crafted features. The experimental results
show that our model significantly outperforms the previous methods. It is able
to generalize from simulated disturbances to real-world disturbances,
generalize from one text content style to another, generalize from seen font
styles to unseen ones, and correctly handle disconnected structures and
touching characters.
Peisong Wang, Jian Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In recent years, Deep Neural Networks (DNNs) based methods have achieved
remarkable performance in a wide range of tasks and have been among the most
powerful and widely used techniques in computer vision, speech recognition and
Natural Language Processing. However, DNN-based methods are both
computationally intensive and resource-consuming, which hinders the application
of these methods on embedded systems like smartphones. To alleviate this
problem, we introduce a novel Fixed-point Factorized Networks (FFN) on
pre-trained models to reduce the computational complexity as well as the
storage requirement of networks. Extensive experiments on large-scale ImageNet
classification task show the effectiveness of our proposed method.
Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) have received increasing attention over
the last few years. They were initially conceived for image categorization,
i.e., the problem of assigning a semantic label to an entire input image.
In this paper we address the problem of dense semantic labeling, which
consists in assigning a semantic label to every pixel in an image. Since this
requires a high spatial accuracy to determine where labels are assigned,
categorization CNNs, intended to be highly robust to local deformations, are
not directly applicable.
By adapting categorization networks, many semantic labeling CNNs have been
recently proposed. Our first contribution is an in-depth analysis of these
architectures. We establish the desired properties of an ideal semantic
labeling CNN, and assess how those methods stand with regard to these
properties. We observe that even though they provide competitive results, these
CNNs often underexploit properties of semantic labeling that could lead to more
effective and efficient architectures.
Out of these observations, we then derive a CNN framework specifically
adapted to the semantic labeling problem. In addition to learning features at
different resolutions, it learns how to combine these features. By integrating
local and global information in an efficient and flexible manner, it
outperforms previous techniques. We evaluate the proposed framework and compare
it with state-of-the-art architectures on public benchmarks of high-resolution
aerial image labeling.
Ye Liu, Liqiang Nie, Lei Han, Luming Zhang, David S Rosenblum
Comments: IJCAI 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As compared to simple actions, activities are much more complex, but
semantically consistent with a human’s real life. Techniques for action
recognition from sensor generated data are mature. However, there has been
relatively little work on bridging the gap between actions and activities. To
this end, this paper presents a novel approach for complex activity recognition
comprising two components. The first component is temporal pattern mining,
which provides a mid-level feature representation for activities, encodes
temporal relatedness among actions, and captures the intrinsic properties of
activities. The second component is adaptive Multi-Task Learning, which
captures relatedness among activities and selects discriminant features.
Extensive experiments on a real-world dataset demonstrate the effectiveness of
our work.
Yong Guo, Mingkui Tan, Qingyao Wu, Jian Chen, Anton Van Den Hengel, Qinfeng Shi
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) with very deep architectures, such as
the residual network (ResNet) [6], have shown encouraging results in various
tasks in computer vision and machine learning. Their depth has been one of the
key factors behind the great success of CNNs, with the associated gradient
vanishing issue having been largely addressed by ResNet. There are other issues
associated with increased depth, however. First, when networks get very deep,
the supervision information may vanish due to the associated long
backpropagation path. This means that intermediate layers receive less training
information, which results in redundancy in models. Second, when the model
becomes more complex and redundant, inference becomes more expensive. Third,
very deep models require larger volumes of training data. We propose here
instead an AuxNet and a new training method to propagate not only gradients but
also supervision information from multiple auxiliary outputs at intermediate
layers. The proposed AuxNet gives rise to a more compact network which
outperforms its very deep equivalent (i.e. ResNet). For example, AuxNet with 44
layers performs better than the original ResNet with 110 layers on several
benchmark data sets, i.e. CIFAR-10, CIFAR-100 and SVHN.
Connor J. Parde, Carlos Castillo, Matthew Q. Hill, Y. Ivette Colon, Swami Sankaranarayanan, Jun-Cheng Chen, Alice J. O'Toole
Comments: Submitted to Face and Gesture Conference, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face recognition algorithms based on deep convolutional neural networks
(DCNNs) have made progress on the task of recognizing faces in unconstrained
viewing conditions. These networks operate with compact feature-based face
representations derived from learning a very large number of face images. While
the learned features produced by DCNNs can be highly robust to changes in
viewpoint, illumination, and appearance, little is known about the nature of
the face code that emerges at the top level of such networks. We analyzed the
DCNN features produced by two face recognition algorithms. In the first set of
experiments we used the top-level features from the DCNNs as input into linear
classifiers aimed at predicting metadata about the images. The results show
that the DCNN features contain surprisingly accurate information about the yaw
and pitch of a face, and about whether the face came from a still image or a
video frame. In the second set of experiments, we measured the extent to which
individual DCNN features operated in a view-dependent or view-invariant manner.
We found that view-dependent coding was a characteristic of the identities
rather than the DCNN features – with some identities coded consistently in a
view-dependent way and others in a view-independent way. In our third analysis,
we visualized the DCNN feature space for over 24,000 images of 500 identities.
Images in the center of the space were uniformly of low quality (e.g., extreme
views, face occlusion, low resolution). Image quality increased monotonically
as a function of distance from the origin. This result suggests that image
quality information is available in the DCNN features, such that consistently
average feature values reflect coding failures that reliably indicate poor or
unusable images. Combined, the results offer insight into the coding mechanisms
that support robust representation of faces in DCNNs.
Bin-Bin Gao, Chao Xing, Chen-Wei Xie, Jianxin Wu, Xin Geng
Comments: Submitted to IEEE Trans. Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (ConvNets) have achieved excellent recognition
performance in various visual recognition tasks. A large labeled training set
is one of the most important factors for its success. However, it is difficult
to collect sufficient training images with precise labels in some domains such
as apparent age estimation, head pose estimation, multi-label classification
and semantic segmentation. Fortunately, there is ambiguous information among
labels, which makes these tasks different from traditional classification.
Based on this observation, we convert the label of each image into a discrete
label distribution, and learn the label distribution by minimizing a
Kullback-Leibler divergence between the predicted and ground-truth label
distributions using deep ConvNets. The proposed DLDL (Deep Label Distribution
Learning) method effectively utilizes the label ambiguity in both feature
learning and classifier learning, which prevents the network from over-fitting
even when the training set is small. Experimental results show that the
proposed approach produces significantly better results than state-of-the-art
methods for age estimation and head pose estimation. At the same time, it also
improves recognition performance for multi-label classification and semantic
segmentation tasks.
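A short numpy sketch of the label conversion and loss described above, assuming a discrete Gaussian centred on the ground-truth age as the label distribution (the distribution shape and the network are placeholders): the model is trained to minimize the KL divergence between its predicted distribution and this soft label.

```python
import numpy as np

ages = np.arange(0, 101)                       # discrete label space, e.g. 0..100 years

def label_distribution(true_age, sigma=2.0):
    """Convert a single label into a discrete (Gaussian-shaped) label distribution."""
    p = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between ground-truth and predicted label distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.standard_normal(len(ages))        # stand-in for a ConvNet's output layer
target = label_distribution(true_age=30)
pred = softmax(logits)
print("KL loss:", round(kl_divergence(target, pred), 4))
print("predicted age (expectation):", round(float(np.sum(ages * pred)), 2))
```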
Henrique Tomaz Amaral-Silva, Luiz Otavio Murta-Jr, Paulo Mazzoncini de Azevedo-Marques, Lauro Wichert-Ana, V. B. Surya Prasath, Colin Studholme
Comments: 15 pages, 11 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Medical image registration plays an important role in determining topographic
and morphological changes for functional diagnostic and therapeutic purposes.
Manual alignment and semi-automated software are still used; however, they
are subjective and cost specialists precious time. Fully automated
methods are faster and user-independent, but the critical point is registration
reliability. Similarity measurement using Mutual Information (MI) with Shannon
entropy (MIS) is the most common automated method that is being currently
applied in medical images, although more reliable algorithms have been proposed
over the last decade, suggesting improvements and different entropies, such as
Studholme et al. (1999), who demonstrated that the normalization of Mutual
Information (NMI) provides an invariant entropy measure for 3D medical image
registration. In this paper, we described a set of experiments to evaluate the
applicability of Tsallis entropy in the Mutual Information (MIT) and in the
Normalized Mutual Information (NMIT) as cost functions for Magnetic Resonance
Imaging (MRI), Positron Emission Tomography (PET) and Computed Tomography (CT)
exams registration. The effect of changing overlap in a simple image model and
clinical experiments on current entropies (Entropy Correlation Coefficient –
ECC, MIS and NMI) and the proposed ones (MIT and NMIT) showed NMI and NMIT with
Tsallis parameter close to 1 as the best options (confidence and accuracy) for
CT to MRI and PET to MRI automatic neuroimaging registration.
Johannes Ballé, Valero Laparra, Eero P. Simoncelli
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
We describe an image compression system, consisting of a nonlinear encoding
transformation, a uniform quantizer, and a nonlinear decoding transformation.
Like many deep neural network architectures, the transforms consist of layers
of convolutional linear filters and nonlinear activation functions, but we use
a joint nonlinearity that implements a form of local gain control, inspired by
those used to model biological neurons. Using a variant of stochastic gradient
descent, we jointly optimize the system for rate-distortion performance over a
database of training images, introducing a continuous proxy for the
discontinuous loss function arising from the quantizer. The relaxed
optimization problem resembles that of variational autoencoders, except that it
must operate at any point along the rate-distortion curve, whereas the
optimization of generative models aims only to minimize entropy of the data
under the model. Across an independent database of test images, we find that
the optimized coder exhibits significantly better rate-distortion performance
than the standard JPEG and JPEG 2000 compression systems, as well as a dramatic
improvement in visual quality of compressed images.
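A minimal sketch of a local gain-control nonlinearity in the divisive-normalization family (the exact parameterization used in the paper is not reproduced here): each channel is normalized by a learned combination of the squared responses of all channels at the same location.

```python
import numpy as np

def divisive_normalization(x, beta, gamma):
    """A divisive-normalization style nonlinearity: each channel's response is
    divided by a learned combination of the squared responses of all channels.

    x     : (channels,) linear filter responses at one spatial location
    beta  : (channels,) positive offsets
    gamma : (channels, channels) positive mixing weights
    """
    return x / np.sqrt(beta + gamma @ (x ** 2))

rng = np.random.default_rng(0)
C = 8
x = rng.standard_normal(C) * 3.0
beta = np.full(C, 1.0)
gamma = np.full((C, C), 0.1)
y = divisive_normalization(x, beta, gamma)
print(np.round(x, 2))
print(np.round(y, 2))   # responses are squashed jointly, i.e. local gain control
```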
Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatically describing an image with a natural language has been an
emerging challenge in both fields of computer vision and natural language
processing. In this paper, we present Long Short-Term Memory with Attributes
(LSTM-A) – a novel architecture that integrates attributes into the successful
Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs)
image captioning framework, by training them in an end-to-end manner. To
incorporate attributes, we construct variants of architectures by feeding image
representations and attributes into RNNs in different ways to explore the
mutual but also fuzzy relationship between them. Extensive experiments are
conducted on COCO image captioning dataset and our framework achieves superior
results when compared to state-of-the-art deep models. Most remarkably, we
obtain METEOR/CIDEr-D of 25.2%/98.6% on testing data of widely used and
publicly available splits in (Karpathy & Fei-Fei, 2015) when extracting image
representations with GoogleNet, and achieve the top-1 performance to date on
the COCO captioning leaderboard.
Victor Campmany, Sergio Silva, Antonio Espinosa, Juan Carlos Moure, David Vázquez, Antonio M. López
Comments: 10 pages
Journal-ref: International Conference on Computational Science 2016, Volume 80, Pages 2377-2381
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a real-time pedestrian detection system for the embedded Nvidia
Tegra X1 GPU-CPU hybrid platform. The pipeline is composed of the following
state-of-the-art algorithms: Histogram of Local Binary Patterns (LBP) and
Histograms of Oriented Gradients (HOG) features extracted from the input image;
Pyramidal Sliding Window technique for candidate generation; and Support Vector
Machine (SVM) for classification. Results show an 8x speedup on the target Tegra
X1 platform and a better performance/watt ratio than the desktop CUDA platforms
under study.
Jiedong Hao, Jing Dong, Wei Wang, Tieniu Tan
Comments: The version submitted to ICLR
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Previous work has shown that feature maps of deep convolutional neural
networks (CNNs) can be interpreted as feature representation of a particular
image region. Features aggregated from these feature maps have been exploited
for image retrieval tasks and achieved state-of-the-art performances in recent
years. The key to the success of such methods is the feature representation.
However, the different factors that impact the effectiveness of features have
still not been explored thoroughly, and there has been little discussion about
the best combination of them.
The main contribution of our paper is the thorough evaluations of the various
factors that affect the discriminative ability of the features extracted from
CNNs. Based on the evaluation results, we also identify the best choices for
different factors and propose a new multi-scale image feature representation
method to encode the image effectively. Finally, we show that the proposed
method generalises well and outperforms the state-of-the-art methods on four
typical datasets used for visual instance retrieval.
Brandon M. Smith, Charles R. Dyer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite much interest in face alignment in recent years, the large majority
of work has focused on near-frontal faces. Algorithms typically break down on
profile faces, or are too slow for real-time applications. In this work we
propose an efficient approach to face alignment that can handle 180 degrees of
head rotation in a unified way (e.g., without resorting to view-based models)
using 2D training data. The foundation of our approach is cascaded shape
regression (CSR), which has emerged recently as the leading strategy. We
propose a generalization of conventional CSRs that we call branching cascaded
regression (BCR). Conventional CSRs are single-track; that is, they progress
from one cascade level to the next in a straight line, with each regressor
attempting to fit the entire dataset. We instead split the regression problem
into two or more simpler ones after each cascade level. Intuitively, each
regressor can then operate on a simpler objective function (i.e., with fewer
conflicting gradient directions). Within the BCR framework, we model and infer
pose-related landmark visibility and face shape simultaneously using Structured
Point Distribution Models (SPDMs). We propose to learn task-specific feature
mapping functions that are adaptive to landmark visibility, and that use SPDM
parameters as regression targets instead of 2D landmark coordinates.
Additionally, we introduce a new in-the-wild dataset of profile faces to
validate our approach.
Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
The paper presents an approach to indoor personal localization using a
sequence of captured images. The solution is tailored specifically for
processing vision data on mobile devices with limited computing power. The
presented FastABLE system is based on the ABLE-M place recognition algorithm,
but introduces major modifications to the concept of image matching, in order
to radically cut down the processing time, and to improve scalability. FastABLE
is compared against the original OpenABLE and the popular OpenFABMAP place
recognition systems.
Yoni Choukroun, Alon Shtern, Ron Kimmel
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Many shape analysis methods treat the geometry of an object as a metric space
that can be captured by the Laplace-Beltrami operator. In this paper, we
propose to adapt a classical operator from quantum mechanics to the field of
shape analysis where we suggest to integrate a scalar function through a
unified elliptical Hamiltonian operator. We study the addition of a potential
function to the Laplacian as a generator for dual spaces in which shape
processing is performed. Then, we evaluate the resulting spectral basis for
different applications such as mesh compression and shape matching. The
suggested operator is shown to produce better functional spaces to operate
with, as demonstrated by the proposed framework that outperforms existing
spectral methods, for example, when applied to shape matching benchmarks.
Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)
When encountering novel objects, humans are able to infer a wide range of
physical properties such as mass, friction and deformability by interacting
with them in a goal-driven way. This process of active interaction is in the
same spirit as a scientist performing an experiment to discover hidden facts.
Recent advances in artificial intelligence have yielded machines that can
achieve superhuman performance in Go, Atari, natural language processing, and
complex control problems, but it is not clear that these systems can rival the
scientific intuition of even a young child. In this work we introduce a basic
set of tasks that require agents to estimate hidden properties such as mass and
cohesion of objects in an interactive simulated environment where they can
manipulate the objects and observe the consequences. We found that state-of-the-art
deep reinforcement learning methods can learn to perform the experiments
necessary to discover such hidden properties. By systematically manipulating
the problem difficulty and the cost incurred by the agent for performing
experiments, we found that agents learn different strategies that balance the
cost of gathering information against the cost of making mistakes in different
situations.
Alexey Dosovitskiy, Vladlen Koltun
Comments: ICLR-2017 submission
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We present an approach to sensorimotor control in immersive environments. Our
approach utilizes a high-dimensional sensory stream and a lower-dimensional
measurement stream. The cotemporal structure of these streams provides a rich
supervisory signal, which enables training a sensorimotor control model by
interacting with the environment. The model is trained using supervised
learning techniques, but without extraneous supervision. It learns to act based
on raw sensory input from a complex three-dimensional environment. The
presented formulation allows changing the agent’s goal at test time. We conduct
extensive experiments in three-dimensional simulations based on the classical
first-person game Doom. The results demonstrate that the presented approach
outperforms sophisticated prior formulations, particularly on challenging
tasks. The results also show that trained models successfully generalize across
environments and goals. A model trained using the presented approach won the
Full Deathmatch track of the Visual Doom AI Competition, which was held in
previously unseen environments.
Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Lipreading is the task of decoding text from the movement of a speaker’s
mouth. Traditional approaches separated the problem into two stages: designing
or learning visual features, and prediction. More recent deep lipreading
approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman,
2016a). All existing works, however, perform only word classification, not
sentence-level sequence prediction. Studies have shown that human lipreading
performance increases for longer words (Easton & Basala, 1982), indicating the
importance of features capturing temporal context in an ambiguous communication
channel. Motivated by this observation, we present LipNet, a model that maps a
variable-length sequence of video frames to text, making use of spatiotemporal
convolutions, an LSTM recurrent network, and the connectionist temporal
classification loss, trained entirely end-to-end. To the best of our knowledge,
LipNet is the first lipreading model to operate at sentence-level, using a
single end-to-end speaker-independent deep model to simultaneously learn
spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet
achieves 93.4% accuracy, outperforming experienced human lipreaders and the
previous 79.6% state-of-the-art accuracy.
Meisam Hejazi Nia, Brian Ratchford
Subjects: Artificial Intelligence (cs.AI)
I suggest an approach that helps online marketers target their gamification
elements to users by modifying the order of the list of tasks that they send to
users. It is more realistic and flexible, as it allows the model to
learn more parameters when the online marketers collect more data. The
targeting approach is scalable and quick, and it can be used over streaming
data.
Dan Geiger, David Heckerman
Subjects: Artificial Intelligence (cs.AI); Combinatorics (math.CO); Probability (math.PR)
We examine three probabilistic concepts related to the sentence “two
variables have no bearing on each other”. We explore the relationships between
these three concepts and establish their relevance to the process of
constructing similarity networks—a tool for acquiring probabilistic knowledge
from human experts. We also establish a precise relationship between
connectedness in Bayesian networks and relevance in probability.
Oron Anschel, Nir Baram, Nahum Shimkin
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
The commonly used Q-learning algorithm combined with function approximation
induces systematic overestimations of state-action values. These systematic
errors might cause instability, poor performance and sometimes divergence of
learning. In this work, we present the Averaged Target DQN (ADQN)
algorithm, an adaptation of the DQN class of algorithms which uses a weighted
average over past learned networks to reduce generalization noise variance. As
a consequence, this leads to reduced overestimation, a more stable learning
process, and improved performance. Additionally, we analyze ADQN variance
reduction along trajectories and demonstrate the performance of ADQN on a toy
Gridworld problem, as well as on several of the Atari 2600 games from the
Arcade Learning Environment.
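A minimal sketch of the core idea is given below: the bootstrap target is formed from the average of the last K learned Q-networks rather than a single target network. The network architecture, buffer-free dummy batch, and hyper-parameter names are illustrative assumptions, not the authors' code.

```python
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q-network (illustrative architecture)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, x):
        return self.net(x)

def averaged_target(snapshots, next_obs, rewards, dones, gamma=0.99):
    """Bootstrap target using the average of the last K Q-network snapshots."""
    with torch.no_grad():
        q_avg = torch.stack([q(next_obs) for q in snapshots]).mean(dim=0)
        max_q = q_avg.max(dim=1).values
        return rewards + gamma * (1.0 - dones) * max_q

# Illustrative training step: keep the K most recent snapshots and average them.
obs_dim, n_actions, K = 4, 2, 5
online = QNet(obs_dim, n_actions)
snapshots = [copy.deepcopy(online) for _ in range(K)]   # past learned networks
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

obs      = torch.randn(32, obs_dim)          # dummy batch for the sketch
actions  = torch.randint(0, n_actions, (32,))
rewards  = torch.randn(32)
next_obs = torch.randn(32, obs_dim)
dones    = torch.zeros(32)

target = averaged_target(snapshots, next_obs, rewards, dones)
q_sa = online(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)
opt.zero_grad(); loss.backward(); opt.step()

# Periodically: snapshots.pop(0); snapshots.append(copy.deepcopy(online))
```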
Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Recent years have seen the proposal of a number of neural architectures for
the problem of Program Induction. Given a set of input-output examples, these
architectures are able to learn mappings that generalize to new test inputs.
While achieving impressive results, these approaches have a number of important
limitations: (a) they are computationally expensive and hard to train, (b) a
model has to be trained for each task (program) separately, and (c) it is hard
to interpret or verify the correctness of the learnt mapping (as it is defined
by a neural network). In this paper, we propose a novel technique,
Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems.
Once trained, our approach can automatically construct computer programs in a
domain-specific language that are consistent with a set of input-output
examples provided at test time. Our method is based on two novel neural
modules. The first module, called the cross correlation I/O network, given a
set of input-output examples, produces a continuous representation of the set
of I/O examples. The second module, the Recursive-Reverse-Recursive Neural
Network (R3NN), given the continuous representation of the examples,
synthesizes a program by incrementally expanding partial programs. We
demonstrate the effectiveness of our approach by applying it to the rich and
complex domain of regular expression based string transformations. Experiments
show that the R3NN model is not only able to construct programs from new
input-output examples, but it is also able to construct new programs for tasks
that it had never observed before during training.
Ricardo Usbeck, Jonathan Huthmann, Nico Duldhardt, Axel-Cyrille Ngonga Ngomo
Comments: 6 pages, 1 figure, pre-print in lncs
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Question answering (QA) has seen a resurgence over the past few years. This
resurgence has led to a multitude of QA systems being developed by both
companies and research facilities. While a few
components of QA systems get reused across implementations, most systems do not
leverage the full potential of component reuse. Hence, the development of QA
systems is currently still a tedious and time-consuming process. We address the
challenge of accelerating the creation of novel or tailored QA systems by
presenting a concept for a self-wiring approach to composing QA systems. Our
approach will allow the reuse of existing, web-based QA systems or modules
while developing new QA platforms. To this end, it will rely on QA modules
being described using the Web Ontology Language. Based on these descriptions,
our approach will be able to automatically compose QA systems using a
data-driven approach.
Akshay Balsubramani
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We formulate learning of a binary autoencoder as a biconvex optimization
problem which learns from the pairwise correlations between encoded and decoded
bits. Among all possible algorithms that use this information, ours finds the
autoencoder that reconstructs its inputs with worst-case optimal loss. The
optimal decoder is a single layer of artificial neurons, emerging entirely from
the minimax loss minimization, and with weights learned by convex optimization.
All this is reflected in competitive experimental results, demonstrating that
binary autoencoding can be done efficiently by conveying information in
pairwise correlations in an optimal fashion.
Liwen Zhang, John Winn, Ryota Tomioka
Comments: 12 pages, 2 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose the Gaussian attention model for content-based neural memory
access. With the proposed attention model, a neural network has the additional
degree of freedom to control the focus of its attention, from laser-sharp
attention to blurred attention. It is applicable whenever we can assume that
the distance in the latent space reflects some notion of semantics. We use the
proposed attention model as a scoring function for the embedding of a
knowledgebase into a continuous vector space and then train a model that
performs question answering about the entities in the knowledgebase. The
proposed attention model can handle both the propagation of uncertainty when
following a series of relations and also the conjunction of thoughts in a
natural way. On a dataset of soccer players who participated in the FIFA World
Cup 2014, we demonstrate that our model can handle both path queries and
conjunctive queries well.
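A minimal sketch of a Gaussian attention score over a latent space follows: the attention weight decays with the squared distance from each key, and a precision parameter controls how sharp or blurred the focus is. The function name, the shared scalar precision, and the random vectors are illustrative assumptions.

```python
import numpy as np

def gaussian_attention(query, keys, precision):
    """Gaussian attention weights: proportional to exp(-0.5 * precision * ||query - key_i||^2).

    A large precision gives laser-sharp attention around one key;
    a small precision blurs the focus over many keys.
    """
    d2 = np.sum((keys - query) ** 2, axis=1)   # squared distances in latent space
    scores = np.exp(-0.5 * precision * d2)
    return scores / scores.sum()               # normalize to attention weights

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))                 # 5 memory slots, 8-dim latent space
query = keys[2] + 0.1 * rng.normal(size=8)     # query near slot 2

print(gaussian_attention(query, keys, precision=10.0))  # sharp: mass on slot 2
print(gaussian_attention(query, keys, precision=0.1))   # blurred: near-uniform
```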
Miguel Lázaro-Gredilla, Yi Liu, D. Scott Phoenix, Dileep George
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We introduce the hierarchical compositional network (HCN), a directed
generative model able to discover and disentangle, without supervision, the
building blocks of a set of binary images. The building blocks are binary
features defined hierarchically as a composition of some of the features in the
layer immediately below, arranged in a particular manner. At a high level, HCN
is similar to a sigmoid belief network with pooling. Inference and learning in
HCN are very challenging and existing variational approximations do not work
satisfactorily. A main contribution of this work is to show that both can be
addressed using max-product message passing (MPMP) with a particular schedule
(no EM required). Also, using MPMP as an inference engine for HCN makes new
tasks simple: adding supervision information, classifying images, or performing
inpainting all correspond to clamping some variables of the model to their
known values and running MPMP on the rest. When used for classification, fast
inference with HCN has exactly the same functional form as a convolutional
neural network (CNN) with linear activations and binary weights. However, HCN’s
features are qualitatively very different.
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Model-free deep reinforcement learning (RL) methods have been successful in a
wide variety of simulated domains. However, a major obstacle facing deep RL in
the real world is the high sample complexity of such methods. Unbiased batch
policy-gradient methods offer stable learning, but at the cost of high
variance, which often requires large batches, while TD-style methods, such as
off-policy actor-critic and Q-learning, are more sample-efficient but biased,
and often require costly hyperparameter sweeps to stabilize. In this work, we
aim to develop methods that combine the stability of unbiased policy gradients
with the efficiency of off-policy RL. We present Q-Prop, a policy gradient
method that uses a Taylor expansion of the off-policy critic as a control
variate. Q-Prop is both sample efficient and stable, and effectively combines
the benefits of on-policy and off-policy methods. We analyze the connection
between Q-Prop and existing model-free algorithms, and use control variate
theory to derive two variants of Q-Prop with conservative and aggressive
adaptation. We show that conservative Q-Prop provides substantial gains in
sample efficiency over trust region policy optimization (TRPO) with generalized
advantage estimation (GAE), and improves stability over deep deterministic
policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods,
on OpenAI Gym’s MuJoCo continuous control environments.
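Below is a minimal numerical sketch of the control-variate idea: the critic's first-order Taylor expansion around the policy mean is subtracted from the Monte Carlo advantage inside the likelihood-ratio term, and its expectation is added back analytically through the critic's gradient. The quadratic toy critic, the Gaussian policy, and the single-state setting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
act_dim = 3
mu = np.zeros(act_dim)            # mean of a Gaussian policy pi(a|s) for one state
sigma = 0.5

# Toy off-policy critic Q_w(s, a) for that state (quadratic in a).
A = -np.eye(act_dim)
b = np.array([0.3, -0.2, 0.1])
def Q(a):       return a @ A @ a + b @ a
def grad_Q(a):  return (A + A.T) @ a + b          # dQ/da

def taylor_advantage(a):
    """First-order expansion of the critic around mu: grad_Q(mu) . (a - mu)."""
    return grad_Q(mu) @ (a - mu)

# Monte Carlo "advantages" (simulated here as noisy critic values) for sampled actions.
actions = mu + sigma * rng.normal(size=(10000, act_dim))
adv_mc = np.array([Q(a) for a in actions]) + rng.normal(scale=1.0, size=len(actions))

# Control-variate-adjusted advantage used inside the likelihood-ratio gradient term;
# the subtracted expectation is restored analytically via grad_Q(mu) and d mu / d theta,
# which has no sampling noise.
adv_adjusted = adv_mc - np.array([taylor_advantage(a) for a in actions])

print("variance of raw advantages     :", adv_mc.var())
print("variance of adjusted advantages:", adv_adjusted.var())
```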
Nadav Bhonker, Shai Rozenberg, Itay Hubara
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Mastering a video game requires skill, tactics and strategy. While these
attributes may be acquired naturally by human players, teaching them to a
computer program is a far more challenging task. In recent years, extensive
research has been carried out in the field of reinforcement learning and numerous
algorithms have been introduced, aiming to learn how to perform human tasks such
as playing video games. As a result, the Arcade Learning Environment (ALE) has
become a commonly used benchmark environment allowing algorithms to train on
various Atari 2600 games. Most Atari games no longer pose a challenge to
state-of-the-art algorithms. In this paper we introduce a new learning
environment, the Retro Learning Environment — RLE, based on the Super
Nintendo Entertainment System (SNES). The environment is expandable, allowing
for more video games and consoles to be easily added to the environment, while
maintaining the same interface as ALE. Moreover, RLE is compatible with Python
and Torch. SNES games pose a significant challenge to current algorithms due to
their higher level of complexity and versatility. To overcome these challenges,
we introduce a novel training method based on training two agents against each
other.
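A minimal sketch of the agent-vs-agent training idea on a two-player game follows: the two agents alternate roles, with one serving as the learner while the other provides the opponent policy. The environment interface (TwoPlayerEnv and its step signature) and the random policies are hypothetical placeholders for illustration, not the RLE API.

```python
import random

class TwoPlayerEnv:
    """Hypothetical stand-in for a two-player SNES-style environment."""
    def reset(self):
        self.t = 0
        return (0, 0)                                   # observations for both players
    def step(self, a1, a2):
        self.t += 1
        r1 = 1.0 if a1 > a2 else -1.0                   # toy zero-sum reward
        done = self.t >= 100
        return (self.t, self.t), (r1, -r1), done

def random_policy(obs, n_actions=4):
    return random.randrange(n_actions)

def train_against_each_other(env, episodes=10):
    """Alternate which agent learns while the other acts as the opponent."""
    policies = [random_policy, random_policy]           # would be learned agents in practice
    for ep in range(episodes):
        learner = ep % 2                                 # swap roles every episode
        obs = env.reset()
        done, ret = False, 0.0
        while not done:
            a1 = policies[0](obs[0])
            a2 = policies[1](obs[1])
            obs, rewards, done = env.step(a1, a2)
            ret += rewards[learner]
            # A real agent would store (obs, action, reward) here and update its policy.
        print(f"episode {ep}: learner={learner}, return={ret:.1f}")

train_against_each_other(TwoPlayerEnv())
```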
Valeria Efimova, Andrey Filchenkov, Anatoly Shalyto
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many algorithms for data analysis exist, especially for classification
problems. To solve a data analysis problem, a proper algorithm should be
chosen, and also its hyperparameters should be selected. In this paper, we
present a new method for the simultaneous selection of an algorithm and its
hyperparameters. In order to do so, we reduced this problem to the multi-armed
bandit problem. We consider an algorithm as an arm and algorithm
hyperparameters search during a fixed time as the corresponding arm play. We
also suggest a problem-specific reward function. We performed experiments
on 10 real datasets and compared the suggested method with the existing one
implemented in Auto-WEKA. The results show that our method is significantly
better in most of the cases and never worse than Auto-WEKA.
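A minimal sketch of the reduction is given below: each learning algorithm is an arm, and pulling an arm means spending a fixed budget on hyperparameter search for that algorithm, with a UCB-style rule deciding where to spend the next budget. The candidate algorithms, the random-search inner loop, and the reward (best cross-validated accuracy found) are illustrative assumptions; the paper defines its own problem-specific reward.

```python
import math, random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def play_arm(arm):
    """One 'arm play': a small random hyperparameter search for that algorithm."""
    best = 0.0
    for _ in range(3):                                  # fixed budget per play
        if arm == "tree":
            model = DecisionTreeClassifier(max_depth=random.randint(1, 10))
        else:
            model = KNeighborsClassifier(n_neighbors=random.randint(1, 15))
        best = max(best, cross_val_score(model, X, y, cv=3).mean())
    return best                                         # reward: best accuracy found

arms = ["tree", "knn"]
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}

for t in range(1, 21):
    # UCB1 selection over algorithms (unplayed arms are tried first).
    def ucb(a):
        if counts[a] == 0:
            return float("inf")
        return values[a] + math.sqrt(2 * math.log(t) / counts[a])
    arm = max(arms, key=ucb)
    reward = play_arm(arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean reward

print(counts, values)
```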
Ivan Smetannikov, Ilya Isaev, Andrey Filchenkov
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
One of the classical problems in machine learning and data mining is feature
selection. A feature selection algorithm is expected to be quick, and at the
same time it should show high performance. The MeLiF algorithm effectively solves
this problem using ensembles of ranking filters. This article describes two
different ways to improve MeLiF performance through parallelization.
Experiments show that the proposed schemes significantly improve algorithm
performance and increase feature selection quality.
Wentao Huang, Kechen Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
A framework is presented for unsupervised learning of representations based
on the infomax principle for large-scale neural populations. We use an asymptotic
approximation to Shannon's mutual information for a large neural population
to demonstrate that a good initial approximation to the global
information-theoretic optimum can be obtained by a hierarchical infomax method.
From the initial solution, an efficient algorithm based on gradient descent of
the final objective function is proposed to learn representations from the
input datasets, allowing complete, overcomplete, or undercomplete bases. As
confirmed by numerical experiments, our method is robust and highly efficient
for extracting salient features from image datasets. Compared with the main
existing methods, our algorithm has a distinct advantage in both the training
speed and the robustness of unsupervised representation learning.
Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)
When encountering novel objects, humans are able to infer a wide range of
physical properties such as mass, friction, and deformability by interacting
with them in a goal-driven way. This process of active interaction is in the
same spirit as a scientist performing an experiment to discover hidden facts.
Recent advances in artificial intelligence have yielded machines that can
achieve superhuman performance in Go, Atari, natural language processing, and
complex control problems, but it is not clear that these systems can rival the
scientific intuition of even a young child. In this work we introduce a basic
set of tasks that require agents to estimate hidden properties such as mass and
cohesion of objects in an interactive simulated environment where they can
manipulate the objects and observe the consequences. We found that
state-of-the-art deep reinforcement learning methods can learn to perform the
experiments
necessary to discover such hidden properties. By systematically manipulating
the problem difficulty and the cost incurred by the agent for performing
experiments, we found that agents learn different strategies that balance the
cost of gathering information against the cost of making mistakes in different
situations.
Shuohang Wang, Jing Jiang
Comments: 11 pages, 2 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Many NLP tasks including machine comprehension, answer selection and text
entailment require comparison between sequences. Matching the important
units between sequences is key to solving these problems. In this paper, we
present a general “compare-aggregate” framework that performs word-level
matching followed by aggregation using Convolutional Neural Networks. We
particularly focus on the different comparison functions we can use to match
two vectors. We use four different datasets to evaluate the model. We find that
some simple comparison functions based on element-wise operations can work
better than standard neural network and neural tensor network models.
Leopoldo Bertossi, Babak Salimi
Comments: Journal submission. Extended version of Flairs’16 and UAI’15 WS on Causality papers
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
Causality has been recently introduced in databases, to model, characterize,
and possibly compute causes for query answers. Connections between QA-causality
and consistency-based diagnosis and database repairs (wrt. integrity constraint
violations) have already been established. In this work we establish precise
connections between QA-causality and both abductive diagnosis and the
view-update problem in databases, allowing us to obtain new algorithmic and
complexity results for QA-causality. We also obtain new results on the
complexity of view-conditioned causality, and investigate the notion of
QA-causality in the presence of integrity constraints, obtaining complexity
results from a connection with view-conditioned causality. The abduction
connection under integrity constraints allows us to obtain algorithmic tools
for QA-causality.
Feras Saad, Vikash Mansinghka
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Sparse databases with hundreds of variables are commonplace. In this setting,
it is both statistically and computationally challenging to detect true
predictive relationships between variables and also to suppress false
positives. This paper proposes a new approach to dependency detection that
combines probabilistic programming, information theory, and non-parametric
Bayesian modeling. The key ideas are to (i) build an ensemble of joint
probability models for the whole database via approximate posterior inference
in CrossCat, a non-parametric factorial mixture; (ii) identify independencies
by analyzing model structures; and (iii) report the distribution on conditional
mutual information induced by posterior uncertainty over the ensemble of
models. This paper presents experiments showing that the approach finds
relationships that pairwise correlation misses, including context-specific
independencies, on databases of mathematics exam scores and global indicators
of macroeconomic development.
Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
language model designed to directly capture the global semantic meaning
relating words in a document via latent topics. Because of their sequential
nature, RNNs are good at capturing the local structure of a word sequence –
both semantic and syntactic – but might face difficulty remembering long-range
dependencies. Intuitively, these long-range dependencies are of semantic
nature. In contrast, latent topic models are able to capture the global
underlying semantic structure of a document but do not account for word
ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
topic models: it captures local (syntactic) dependencies using an RNN and
global (semantic) dependencies using latent topics. Unlike previous work on
contextual RNN language modeling, our model is learned end-to-end. Empirical
results on word prediction show that TopicRNN outperforms existing contextual
RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
extractor for documents. We do this for sentiment analysis and report a new
state-of-the-art error rate on the IMDB movie review dataset that amounts to a
13.3% improvement over the previous best result. Finally, TopicRNN also
yields sensible topics, making it a useful alternative to document models such
as latent Dirichlet allocation.
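A minimal sketch of how the next-word distribution can combine local RNN state with a global topic vector follows: the topic contribution is switched off for stop words via an indicator, so topics influence only content words. The composition p(y_t) proportional to exp(W h_t + (1 - l_t) B theta) follows the paper's high-level description, but the dimensions, the fixed hidden state, and the hand-set stop-word indicator are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden, topics = 1000, 64, 20

W = rng.normal(scale=0.1, size=(vocab, hidden))   # projection of the RNN hidden state
B = rng.normal(scale=0.1, size=(vocab, topics))   # projection of the latent topic vector

def next_word_probs(h_t, theta, is_stop_word):
    """p(y_t) ~ exp(W h_t + (1 - l_t) * B theta): topics are masked out for stop words."""
    logits = W @ h_t + (0.0 if is_stop_word else 1.0) * (B @ theta)
    logits -= logits.max()                         # numerical stability
    p = np.exp(logits)
    return p / p.sum()

h_t = rng.normal(size=hidden)                      # stand-in for the RNN hidden state
theta = rng.dirichlet(np.ones(topics))             # document topic proportions

p_content = next_word_probs(h_t, theta, is_stop_word=False)
p_stop    = next_word_probs(h_t, theta, is_stop_word=True)
print(p_content[:5], p_stop[:5])
```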
Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Policy gradient is an efficient technique for improving a policy in a
reinforcement learning setting. However, vanilla online variants are on-policy
only and not able to take advantage of off-policy data. In this paper we
describe a new technique that combines policy gradient with off-policy
Q-learning, drawing experience from a replay buffer. This is motivated by
making a connection between the fixed points of the regularized policy gradient
algorithm and the Q-values. This connection allows us to estimate the Q-values
from the action preferences of the policy, to which we apply Q-learning
updates. We refer to the new technique as ‘PGQ’, for policy gradient and
Q-learning. We also establish an equivalency between action-value fitting
techniques and actor-critic algorithms, showing that regularized policy
gradient techniques can be interpreted as advantage function learning
algorithms. We conclude with some numerical examples that demonstrate improved
data efficiency and stability of PGQ. In particular, we tested PGQ on the full
suite of Atari games and achieved performance exceeding that of both
asynchronous advantage actor-critic (A3C) and Q-learning.
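A minimal sketch of the connection PGQ exploits is given below: for an entropy-regularized softmax policy, action values can be estimated from the action preferences (log-probabilities plus entropy, scaled by the regularization weight) and a state value, and a Q-learning error can then be computed on those estimates. The toy numbers, the single-state setting, and the simplified bootstrap are illustrative assumptions.

```python
import numpy as np

alpha = 0.1                                     # entropy-regularization weight
n_actions = 4
rng = np.random.default_rng(0)

logits = rng.normal(size=n_actions)             # action preferences of the policy
pi = np.exp(logits - logits.max())
pi /= pi.sum()
V = 1.7                                         # current state-value estimate
entropy = -(pi * np.log(pi)).sum()

# Q-values implied by the regularized policy-gradient fixed point:
# Q_est(s, a) = alpha * (log pi(a|s) + H(pi)) + V(s)
q_est = alpha * (np.log(pi) + entropy) + V
print("estimated Q:", q_est)

# A Q-learning style update can now be applied to these estimates
# (here with V(s') standing in for the bootstrap, as a simplification):
a, reward, gamma, next_V = 2, 0.5, 0.99, 1.6    # toy transition
td_error = reward + gamma * next_V - q_est[a]
print("TD error to apply to the policy/value parameters:", td_error)
```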
Barret Zoph, Quoc V. Le
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Neural networks are powerful and flexible models that work well for many
difficult learning tasks in image, speech and natural language understanding.
Despite their success, neural networks are still hard to design. In this paper,
we use a recurrent network to generate the model descriptions of neural
networks and train this RNN with reinforcement learning to maximize the
expected accuracy of the generated architectures on a validation set. On the
CIFAR-10 dataset, our method, starting from scratch, can design a novel network
architecture that rivals the best human-invented architecture in terms of test
set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is
only 0.1 percent worse and 1.2x faster than the current state-of-the-art model.
On the Penn Treebank dataset, our model can compose a novel recurrent cell that
outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
3.6 perplexity better than the previous state-of-the-art.
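A minimal sketch of the outer loop follows: a controller samples architecture descriptions, each sampled architecture is trained and returns a validation accuracy, and the controller is updated with REINFORCE to increase the expected accuracy. For brevity the controller here is a set of independent categorical distributions rather than the paper's recurrent network, and the tiny search space, fake evaluate function, and moving-average baseline are illustrative assumptions.

```python
import torch

# Tiny search space: (number of layers, width, activation) -- an illustrative assumption.
choices = {"layers": [1, 2, 3], "width": [16, 32, 64], "act": ["relu", "tanh"]}
logits = {k: torch.zeros(len(v), requires_grad=True) for k, v in choices.items()}
opt = torch.optim.Adam(logits.values(), lr=0.1)

def evaluate(arch):
    """Stand-in for 'train the child network and return validation accuracy'."""
    score = 0.5 + 0.1 * arch["layers"] + 0.001 * arch["width"]
    return score + (0.05 if arch["act"] == "relu" else 0.0)

baseline = None
for step in range(200):
    dists = {k: torch.distributions.Categorical(logits=l) for k, l in logits.items()}
    idx = {k: d.sample() for k, d in dists.items()}
    arch = {k: choices[k][i.item()] for k, i in idx.items()}
    reward = evaluate(arch)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    log_prob = sum(d.log_prob(idx[k]) for k, d in dists.items())
    loss = -(reward - baseline) * log_prob        # REINFORCE: raise p(arch) if above baseline
    opt.zero_grad(); loss.backward(); opt.step()

print({k: choices[k][l.argmax().item()] for k, l in logits.items()})
```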
Roderick Bloem, Nicolas Braud-Santoni, Vedad Hadzic
Comments: pre-print, submitted at TACAS 2017
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
We introduce a novel generalization of Counterexample-Guided Inductive
Synthesis (CEGIS) and instantiate it to yield a novel, competitive algorithm
for solving Quantified Boolean Formulas (QBF). Current QBF solvers based on
counterexample-guided expansion use a recursive approach which scales poorly
with the number of quantifier alternations. Our generalization of CEGIS removes
the need for this recursive approach, and we instantiate it to yield a simple
and efficient algorithm for QBF solving. Lastly, this research is supported by
a competitive, though straightforward, implementation of the algorithm, making
it possible to study the practical impact of our algorithm design decisions,
along with various optimizations.
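A minimal sketch of the classical CEGIS loop that the paper generalizes is shown below: a synthesizer proposes a candidate consistent with the counterexamples gathered so far, and a verifier either accepts it or returns a new counterexample. The toy specification (find a threshold function on the integers 0..15) and the brute-force synthesizer and verifier are illustrative assumptions; a QBF instantiation would replace both with solver calls.

```python
# Toy spec: synthesize a threshold t such that f(x) = (x >= t) matches a hidden target.
HIDDEN_T = 11
DOMAIN = range(16)

def spec_holds(t, x):
    return (x >= t) == (x >= HIDDEN_T)

def synthesize(counterexamples):
    """Propose any candidate consistent with all counterexamples seen so far."""
    for t in DOMAIN:
        if all(spec_holds(t, x) for x in counterexamples):
            return t
    return None                                   # no candidate left: spec unrealizable

def verify(t):
    """Return an input where the candidate violates the spec, or None if it is correct."""
    for x in DOMAIN:
        if not spec_holds(t, x):
            return x
    return None

def cegis():
    counterexamples = []
    while True:
        cand = synthesize(counterexamples)
        if cand is None:
            return None
        cex = verify(cand)
        if cex is None:
            return cand                           # verified: synthesis succeeded
        counterexamples.append(cex)               # refine with the new counterexample

print("synthesized threshold:", cegis())
```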
C. Daniel Freeman, Joan Bruna
Comments: 24 Pages (12 main + 12 Appendix), 4 Figures, 1 Table, Submitted to ICLR for review
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
The loss surface of deep neural networks has recently attracted interest in
the optimization and machine learning communities as a prime example of a
high-dimensional non-convex problem. Some insights were recently gained using
spin glass models and mean-field approximations, but at the expense of strongly
simplifying the nonlinear nature of the model.
In this work, we do not make any such assumption and study conditions on the
data distribution and model architecture that prevent the existence of bad
local minima. Our theoretical work quantifies and formalizes two important
folklore facts: (i) the landscape of deep linear networks has a
radically different topology from that of deep half-rectified ones, and (ii)
that the energy landscape in the non-linear case is fundamentally controlled by
the interplay between the smoothness of the data distribution and model
over-parametrization. These results are in accordance with empirical practice
and recent literature.
The conditioning of gradient descent is the next challenge we address. We
study this question through the geometry of the level sets, and we introduce an
algorithm to efficiently estimate the regularity of such sets on large-scale
networks. Our empirical results show that these level sets remain connected
throughout the learning phase, suggesting near-convex behavior, but they
become exponentially more curved as the energy level decays, in accordance with
what is observed in practice with very low curvature attractors.
Ivania Donoso, Denis Parra
Comments: 5 pages, 4 figures, pre-print submitted to ACM IUI 2017
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)
Evidence-based health care (EBHC) is an important practice of medicine which
attempts to provide systematic scientific evidence to answer clinical
questions. In this context, Epistemonikos (www.epistemonikos.org) is one of the
first and most important online systems in the field, providing an interface
that supports users on searching and filtering scientific articles for
practicing EBHC. The system nowadays requires a large amount of expert human
effort, where close to 500 physicians manually curate articles to be utilized
in the platform. In order to scale to the large and continuous influx of data
and keep the system updated, we introduce EpistAid, an interactive intelligent
interface which supports clinicians in the process of curating documents for
Epistemonikos within lists of papers called evidence matrices. We introduce the
characteristics, design and algorithms of our solution, as well as a prototype
implementation and a case study to show how our solution addresses the
information overload problem in this area.
Bálint Daróczy, Frederick Ayala-Gómez, András Benczúr
Comments: 9 pages, 8 figures, 4 tables
Subjects: Information Retrieval (cs.IR)
Web recommendation services bear great importance in e-commerce, as they aid
the user in navigating through the items that are most relevant to her needs.
In a typical Web site, a long history of previous activities or purchases by
the user is rarely available. Hence, in most cases, recommenders propose items
that are similar to the most recent ones viewed in the current user session. The
corresponding task is called session based item-to-item recommendation. For
frequent items, it is easy to present item-to-item recommendations by “people
who viewed this, also viewed” lists. However, most of the items belong to the
long tail, where previous actions are sparsely available. Another difficulty is
the so-called cold-start problem, when the item has recently appeared and has
had no time yet to accumulate a sufficient number of transactions. In order to
recommend a next item in a session in sparse or cold start situations, we also
have to incorporate item similarity models. In this paper we describe a
probabilistic similarity model based on Random Fields to approximate
item-to-item transition probabilities. We give a generative model for the item
interactions based on arbitrary distance measures over the items including
explicit and implicit ratings and external metadata. The model may change over
time to better fit recent events and recommend the next item based on the updated
Fisher Information. Our new model outperforms both simple similarity baseline
methods and recent item-to-item recommenders, under several different
performance metrics and publicly available data sets. We reach significant
gains in particular for recommending a new item following a rare item.
Hua Sun, Syed A. Jafar
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
The capacity has recently been characterized for the private information
retrieval (PIR) problem as well as several of its variants. In every case it is
assumed that all the queries are generated by the user simultaneously. Here we
consider multiround PIR, where the queries in each round are allowed to depend
on the answers received in previous rounds. We show that the capacity of
multiround PIR is the same as the capacity of single-round PIR (the result is
generalized to also include T-privacy constraints). Combined with previous
results, this shows that there is no capacity advantage from multiround over
single-round schemes, non-linear over linear schemes, or from epsilon-error
over zero-error schemes. However, we show through an example that there is an
advantage in terms of storage overhead. We provide an example of a multiround,
non-linear, epsilon-error PIR scheme that requires a strictly smaller
storage overhead than the best possible with single-round, linear, zero-error
PIR schemes.
Abhishek Santra, Sanjukta Bhowmick, Sharma Chakravarthy
Subjects: Databases (cs.DB); Discrete Mathematics (cs.DM); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Holistic analysis of many real-world problems is based on data collected
from multiple sources contributing to some aspect of that problem. The word
fusion has also been used in the literature for such problems involving
disparate data types. Holistically understanding traffic patterns, causes of
accidents, bombings, terrorist planning, and many natural phenomena such as
storms and earthquakes fall into this category. Some may have real-time
requirements and some may need to be analyzed after the fact (post-mortem or
forensic analysis). What is common to all these problems is the amount and
variety of data associated with the event. Data may also be incomplete, and the
trustworthiness of sources may vary. Currently, manual and ad-hoc
approaches are used in aggregating data in different ways for analyzing and
understanding these problems.
In this paper, we approach this problem in a novel way using multilayered
networks. We identify features of a central event and propose a network layer
for each feature. This approach allows us to study the effect of each feature
independently and its impact on the event. We also establish that the proposed
approach allows us to compose these features in arbitrary ways (without loss of
information) to analyze their combined effect. Additionally, formulation of
relationships (e.g., distance measure for a single feature instead of several
at the same time) is simpler. Further, computations can be done once on each
layer in this approach and reused for mixing and matching the features for
aggregate impacts and “what if” scenarios to understand the problem
holistically. This has been demonstrated by recreating the communities for the
AND-Composed network by using the communities of the individual layers.
We believe that techniques proposed here make an important contribution to
the nascent yet fast growing area of data fusion.
Bin He, Bin Dong, Yi Guan, Jinfeng Yang, Zhipeng Jiang, Qiubin Yu, Jianyi Cheng, Chunyan Qu
Comments: 27 pages
Subjects: Computation and Language (cs.CL)
Objective: To build a comprehensive corpus covering syntactic and semantic
annotations of Chinese clinical texts with corresponding annotation guidelines
and methods as well as to develop tools trained on the annotated corpus, which
supplies baselines for research on Chinese texts in the clinical domain.
Materials and methods: An iterative annotation method was proposed to train
annotators and to develop annotation guidelines. Then, by using annotation
quality assurance measures, a comprehensive corpus was built, containing
annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions,
and relations. Inter-annotator agreement (IAA) was calculated to evaluate the
annotation quality and a Chinese clinical text processing and information
extraction system (CCTPIES) was developed based on our annotated corpus.
Results: The syntactic corpus consists of 138 Chinese clinical documents with
47,424 tokens and 2553 full parse trees, while the semantic corpus includes
992 documents annotated with 39,511 entities, their assertions, and 7695
relations. IAA evaluation shows that this comprehensive corpus is of good
quality, and the system modules are effective. Discussion: The annotated corpus
makes a considerable contribution to natural language processing (NLP) research
into Chinese texts in the clinical domain. However, this corpus has a number of
limitations. Some additional types of clinical text should be introduced to
improve corpus coverage and active learning methods should be utilized to
promote annotation efficiency. Conclusions: In this study, several annotation
guidelines and an annotation method for Chinese clinical texts were proposed,
and a comprehensive corpus with its NLP modules were constructed, providing a
foundation for further study of applying NLP techniques to Chinese texts in the
clinical domain.
Will Radford, Andrew Chisholm, Ben Hachey, Bo Han
Subjects: Computation and Language (cs.CL)
We report on an exploratory analysis of Emoji Dick, a project that leverages
crowdsourcing to translate Melville’s Moby Dick into emoji. This distinctive
use of emoji removes textual context and leads to varying translation
quality. In this paper, we use statistical word alignment and part-of-speech
tagging to explore how people use emoji. Despite these simple methods, we
observed differences in token and part-of-speech distributions. Experiments
also suggest that semantics are preserved in the translation, and repetition is
more common in emoji.
Xavier Holt, Will Radford, Ben Hachey
Subjects: Computation and Language (cs.CL)
The timeline generation task summarises an entity’s biography by selecting
stories representing key events from a large pool of relevant documents. This
paper addresses the lack of a standard dataset and evaluative methodology for
the problem. We present and make publicly available a new dataset of 18,793
news articles covering 39 entities. For each entity, we provide a gold standard
timeline and a set of entity-related articles. We propose ROUGE as an
evaluation metric and validate our dataset by showing that top Google results
outperform straw-man baselines.
Adrien Bougouin, Florian Boudin, Béatrice Daille
Comments: Accepted at the COLING 2016 conference
Subjects: Computation and Language (cs.CL)
Keyphrase annotation is the task of identifying textual units that represent
the main content of a document. Keyphrase annotation is either carried out by
extracting the most important phrases from a document, keyphrase extraction, or
by assigning entries from a controlled domain-specific vocabulary, keyphrase
assignment. Assignment methods are generally more reliable. They provide
better-formed keyphrases, as well as keyphrases that do not occur in the
document. However, they often remain silent, in contrast to extraction methods,
which do not depend on manually built resources. This paper proposes a new method
to perform both keyphrase extraction and keyphrase assignment in an integrated
and mutually reinforcing manner. Experiments have been carried out on datasets
covering different domains of humanities and social sciences. They show
statistically significant improvements over both state-of-the-art keyphrase
extraction and keyphrase assignment methods.
Depeng Liang, Yongdong Zhang
Comments: 7 pages
Subjects: Computation and Language (cs.CL)
Recently, deep learning models have been shown to achieve remarkable
performance in sentence and document classification tasks. In this work, we
propose a novel framework called AC-BLSTM for modeling sentences and documents,
which combines the asymmetric convolutional neural network (ACNN) with
the Bidirectional Long Short-Term Memory network (BLSTM). Experiment results
demonstrate that our model achieves state-of-the-art results on all six tasks,
including sentiment analysis, question type classification, and subjectivity
classification.
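A minimal sketch of the architecture's shape follows: an asymmetric convolution factorizes a k x d filter over the embedding matrix into a 1 x d convolution followed by a k x 1 convolution, and the resulting feature sequence feeds a bidirectional LSTM whose final states are classified. The layer sizes and the single-branch design (the paper combines several kernel sizes) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ACBLSTMSketch(nn.Module):
    def __init__(self, vocab=5000, emb=100, channels=32, k=3, hidden=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        # Asymmetric convolution: a (k x emb) filter factored into (1 x emb) then (k x 1).
        self.conv1 = nn.Conv2d(1, channels, kernel_size=(1, emb))
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
        self.blstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.emb(tokens).unsqueeze(1)            # (batch, 1, seq_len, emb)
        x = torch.relu(self.conv1(x))                # (batch, ch, seq_len, 1)
        x = torch.relu(self.conv2(x))                # (batch, ch, seq_len, 1)
        x = x.squeeze(3).transpose(1, 2)             # (batch, seq_len, ch)
        _, (h, _) = self.blstm(x)                    # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)           # concat forward/backward final states
        return self.out(h)

model = ACBLSTMSketch()
logits = model(torch.randint(0, 5000, (8, 40)))      # 8 sentences of 40 tokens
print(logits.shape)                                  # torch.Size([8, 2])
```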
Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
Subjects: Computation and Language (cs.CL)
Although end-to-end Neural Machine Translation (NMT) has achieved remarkable
progress in the past two years, it suffers from a major drawback: translations
generated by NMT systems often lack adequacy. It has been widely observed
that NMT tends to repeatedly translate some source words while mistakenly
ignoring other words. To alleviate this problem, we propose a novel
encoder-decoder-reconstructor framework for NMT. The reconstructor,
incorporated into the NMT model, manages to reconstruct the input source
sentence from the hidden layer of the output target sentence, to ensure that
the information in the source side is transformed to the target side as much as
possible. Experiments show that the proposed framework significantly improves
the adequacy of NMT output and achieves superior translation result over
state-of-the-art NMT and statistical MT systems.
Luyang Li, Bing Qin, Wenjing Ren, Ting Liu
Comments: 10 pages, 2 figures
Subjects: Computation and Language (cs.CL); Databases (cs.DB)
Truth discovery is to resolve conflicts and find the truth from
multiple-source statements. Conventional methods mostly build on the mutual
effect between the reliability of sources and the credibility of statements,
but pay no attention to the mutual effect among the credibilities of statements
about the same object. We propose memory network
based models to incorporate these two ideas to do the truth discovery. We use
feedforward memory network and feedback memory network to learn the
representation of the credibility of statements which are about the same
object. Specifically, we adopt a memory mechanism to learn source reliability
and use it for truth prediction. When training the models, we use multiple types
of data (categorical data and continuous data) by assigning different weights
automatically in the loss function based on their own effect on truth discovery
prediction. The experimental results show that the memory network based models
substantially outperform the state-of-the-art method and other baseline methods.
Xinyun Chen, Chang Liu, Richard Shin, Dawn Song, Mingcheng Chen
Comments: Accepted by NIPS 2016
Subjects: Computation and Language (cs.CL)
Automatic translation from natural language descriptions into programs is a
longstanding challenging problem. In this work, we consider a simple yet
important sub-problem: translation from textual descriptions to If-Then
programs. We devise a novel neural network architecture for this task which we
train end-to-end. Specifically, we introduce Latent Attention, which computes
multiplicative weights for the words in the description in a two-stage process
with the goal of better leveraging the natural language structures that
indicate the relevant parts for predicting program elements. Our architecture
reduces the error rate by 28.57% compared to prior art. We also propose a
one-shot learning scenario of If-Then program synthesis and simulate it with
our existing dataset. We demonstrate a variation on the training procedure for
this scenario that outperforms the original procedure, significantly closing
the gap to the model trained with all data.
Eunsol Choi, Daniel Hewlett, Alexandre Lacoste, Illia Polosukhin, Jakob Uszkoreit, Jonathan Berant
Subjects: Computation and Language (cs.CL)
Reading an article and answering questions about its content is a fundamental
task for natural language understanding. While most successful neural
approaches to this problem rely on recurrent neural networks (RNNs), training
RNNs over long documents can be prohibitively slow. We present a novel
framework for question answering that can efficiently scale to longer documents
while maintaining or even improving performance. Our approach combines a
coarse, inexpensive model for selecting one or more relevant sentences and a
more expensive RNN that produces the answer from those sentences. A central
challenge is the lack of intermediate supervision for the coarse model, which
we address using reinforcement learning. Experiments demonstrate
state-of-the-art performance on a challenging subset of the WIKIREADING
dataset (Hewlett et al., 2016) and on a newly gathered dataset, while reducing
the number of sequential RNN steps by 88% relative to a standard
sequence-to-sequence model.
Yehoshua Dissen, Joseph Keshet, Jacob Goldberger, Cynthia Clopper
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
In this paper we present a domain adaptation technique for formant estimation
using a deep network. We first train a deep learning network on a small read
speech dataset. We then freeze the parameters of the trained network and use
several different datasets to train an adaptation layer that makes the obtained
network universal in the sense that it works well for a variety of speakers and
speech domains with very different characteristics. We evaluated our adapted
network on three datasets, each of which has different speaker characteristics
and speech styles. The performance of our method compares favorably with
alternative methods for formant estimation.
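A minimal sketch of the adaptation step is given below: the pre-trained network's parameters are frozen, and only a small adaptation layer appended on top is trained on data from the new domains. The base architecture, the placement of the adaptation layer, and the dummy data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Pre-trained base network (stands in for the network trained on read speech).
base = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 4))

# Freeze the trained parameters ...
for p in base.parameters():
    p.requires_grad = False

# ... and train only a small adaptation layer on data from the new domains.
adapt = nn.Linear(4, 4)
model = nn.Sequential(base, adapt)
opt = torch.optim.Adam(adapt.parameters(), lr=1e-3)

features = torch.randn(256, 40)     # dummy acoustic features
targets = torch.randn(256, 4)       # dummy formant targets (e.g., scaled F1-F4)

for _ in range(100):
    pred = model(features)
    loss = nn.functional.mse_loss(pred, targets)
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())
```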
Timothy Dozat, Christopher D. Manning
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
While deep learning parsing approaches have proven very successful at finding
the structure of sentences, most neural dependency parsers use neural networks
only for feature extraction, and then use those features in traditional parsing
algorithms. In contrast, this paper builds off recent work using
general-purpose neural network components, training an attention mechanism over
an LSTM to attend to the head of the phrase. We get state-of-the-art results
for standard dependency parsing benchmarks, achieving 95.44% UAS and 93.76% LAS
on the PTB dataset, 0.8% and 1.0% improvement, respectively, over Andor et al.
(2016). In addition to proposing a new parsing architecture using
dimensionality reduction and biaffine interactions, we examine simple
hyperparameter choices that had a profound influence on the model’s
performance, such as reducing the value of beta2 in the Adam optimization
algorithm.
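A minimal sketch of biaffine arc scoring for head selection follows: dependent and head representations, produced from the LSTM states by separate dimension-reducing projections, are combined through a bilinear term plus a head-only bias term, and a softmax over candidate heads gives the arc distribution. The dimensions and random vectors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, lstm_dim, arc_dim = 6, 100, 32

H = rng.normal(size=(n_words, lstm_dim))            # BiLSTM outputs for one sentence

# Separate dimensionality-reducing projections for "dependent" and "head" roles.
W_dep, W_head = rng.normal(size=(2, lstm_dim, arc_dim)) * 0.05
D = H @ W_dep                                        # (n_words, arc_dim) dependent vectors
E = H @ W_head                                       # (n_words, arc_dim) head vectors

U = rng.normal(size=(arc_dim, arc_dim)) * 0.05       # bilinear weights
u = rng.normal(size=arc_dim) * 0.05                  # head-only bias weights

# Biaffine arc scores: score[i, j] = D[i] @ U @ E[j] + u @ E[j]
scores = D @ U @ E.T + E @ u                         # (n_words, n_words)

probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)            # softmax over candidate heads per word
print("predicted head of each word:", probs.argmax(axis=1))
```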
Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Previous work combines word-level and character-level representations using
concatenation or scalar weighting, which is suboptimal for high-level tasks
like reading comprehension. We present a fine-grained gating mechanism to
dynamically combine word-level and character-level representations based on
properties of the words. We also extend the idea of fine-grained gating to
modeling the interaction between questions and paragraphs for reading
comprehension. Experiments show that our approach can improve the performance
on reading comprehension tasks, achieving new state-of-the-art results on the
Children’s Book Test dataset. To demonstrate the generality of our gating
mechanism, we also show improved results on a social media tag prediction task.
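Below is a minimal sketch of the gate: a vector of gates is computed from token-level features (the paper uses properties such as part-of-speech and frequency), and it interpolates elementwise between the word-level and character-level representations. The dimensions and the random feature vector are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim, feat_dim = 8, 5

word_emb = rng.normal(size=dim)             # lookup word embedding
char_emb = rng.normal(size=dim)             # embedding composed from characters
token_features = rng.normal(size=feat_dim)  # e.g., POS / NER / frequency features

W_g = rng.normal(size=(dim, feat_dim)) * 0.5
b_g = np.zeros(dim)

gate = sigmoid(W_g @ token_features + b_g)             # one gate value per dimension
combined = gate * char_emb + (1.0 - gate) * word_emb   # fine-grained interpolation
print(gate.round(2), combined.round(2))
```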
Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling
Comments: iclr submission
Subjects: Computation and Language (cs.CL)
We propose a general class of language models that treat reference as an
explicit stochastic latent variable. This architecture allows models to create
mentions of entities and their attributes by accessing external databases
(required by, e.g., dialogue generation and recipe generation) and internal
state (required by, e.g. language models which are aware of coreference). This
facilitates the incorporation of information that can be accessed in
predictable locations in databases or discourse context, even when the targets
of the reference may be rare words. Experiments on three tasks show that our
model variants outperform models based on deterministic attention.
Caiming Xiong, Victor Zhong, Richard Socher
Comments: 13 pages, 7 figures
Subjects: Computation and Language (cs.CL)
Several deep learning models have been proposed for question answering.
However, due to their single-pass nature, they have no way to recover from
local maxima corresponding to incorrect answers. To address this problem, we
introduce the Dynamic Coattention Network (DCN) for question answering. The DCN
first fuses co-dependent representations of the question and the document in
order to focus on relevant parts of both. Then a dynamic pointing decoder
iterates over potential answer spans. This iterative procedure enables the
model to recover from initial local maxima corresponding to incorrect answers.
On the Stanford question answering dataset, a single DCN model improves the
previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains
80.4% F1.
Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
Subjects: Computation and Language (cs.CL)
Machine Comprehension (MC), answering questions about a given context,
requires modeling complex interactions between the context and the query.
Recently, attention mechanisms have been successfully extended to MC. Typically
these mechanisms use attention to summarize the query and context into a single
vector, couple attentions temporally, and often form a unidirectional
attention. In this paper we introduce the Bidirectional Attention Flow (BIDAF)
Model, a multi-stage hierarchical process that represents the context at
different levels of granularity and uses a bi-directional attention flow
mechanism to achieve a query-aware context representation without early
summarization. Our experimental evaluations show that our model achieves
state-of-the-art results on the Stanford QA (SQuAD) and CNN/DailyMail Cloze Test
datasets.
Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computation and Language (cs.CL)
Transfer and multi-task learning have traditionally focused on either a
single source-target pair or very few, similar tasks. Ideally, the linguistic
levels of morphology, syntax and semantics would benefit each other by being
trained in a single model. We introduce such a joint many-task model together
with a strategy for successively growing its depth to solve increasingly
complex tasks. All layers include shortcut connections to both word
representations and lower-level task predictions. We use a simple
regularization term to allow for optimizing all model weights to improve one
task’s loss without exhibiting catastrophic interference of the other tasks.
Our single end-to-end trainable model obtains state-of-the-art results on
chunking, dependency parsing, semantic relatedness and textual entailment. It
also performs competitively on POS tagging. Our dependency parsing layer relies
only on a single feed-forward pass and does not require a beam search.
Philip Blair, Yuval Merhav, Joel Barry
Comments: 12 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose a language-agnostic way of automatically generating sets of
semantically similar clusters of entities along with sets of “outlier”
elements, which may then be used to perform an intrinsic evaluation of word
embeddings in the outlier detection task. We used our methodology to create a
gold-standard dataset, which we call WikiSem500, and evaluated multiple
state-of-the-art embeddings. The results show a correlation between performance
on this dataset and performance on sentiment analysis.
Ark Anderson, Kyle Shaffer, Artem Yankov, Court D. Corley, Nathan O. Hodas
Subjects: Learning (cs.LG); Computation and Language (cs.CL)
In this paper we present a technique to train neural network models on small
amounts of data. Current methods for training neural networks on small amounts
of rich data typically rely on strategies such as fine-tuning a pre-trained
neural network or the use of domain-specific hand-engineered features. Here we
take the approach of treating network layers, or entire networks, as modules
and combine pre-trained modules with untrained modules, to learn the shift in
distributions between data sets. The central impact of using a modular approach
comes from adding new representations to a network, as opposed to replacing
representations via fine-tuning. Using this technique, we are able to surpass
results from standard fine-tuning transfer learning approaches, and we are
also able to significantly increase performance over such approaches when using
smaller amounts of data.
Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Lipreading is the task of decoding text from the movement of a speaker’s
mouth. Traditional approaches separated the problem into two stages: designing
or learning visual features, and prediction. More recent deep lipreading
approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman,
2016a). All existing works, however, perform only word classification, not
sentence-level sequence prediction. Studies have shown that human lipreading
performance increases for longer words (Easton & Basala, 1982), indicating the
importance of features capturing temporal context in an ambiguous communication
channel. Motivated by this observation, we present LipNet, a model that maps a
variable-length sequence of video frames to text, making use of spatiotemporal
convolutions, an LSTM recurrent network, and the connectionist temporal
classification loss, trained entirely end-to-end. To the best of our knowledge,
LipNet is the first lipreading model to operate at sentence-level, using a
single end-to-end speaker-independent deep model to simultaneously learn
spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet
achieves 93.4% accuracy, outperforming experienced human lipreaders and the
previous 79.6% state-of-the-art accuracy.
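The following PyTorch fragment is only a toy sketch of the ingredients named in the abstract (spatiotemporal convolutions, a recurrent layer, and the CTC loss); the layer sizes, frame resolution, and character set are placeholders and not the published LipNet architecture.

    import torch
    import torch.nn as nn

    class TinyLipReader(nn.Module):
        def __init__(self, n_chars=28):                      # characters + CTC blank
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),                     # pool space, keep time
            )
            self.rnn = nn.LSTM(input_size=16 * 32 * 32, hidden_size=128,
                               batch_first=True, bidirectional=True)
            self.fc = nn.Linear(256, n_chars)

        def forward(self, video):                            # (B, 3, T, 64, 64)
            feats = self.conv(video)                         # (B, 16, T, 32, 32)
            b, c, t, h, w = feats.shape
            feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
            out, _ = self.rnn(feats)
            return self.fc(out).log_softmax(-1)              # (B, T, n_chars)

    model = TinyLipReader()
    video = torch.randn(2, 3, 20, 64, 64)                    # 2 clips, 20 frames each
    log_probs = model(video).permute(1, 0, 2)                # CTC expects (T, B, C)
    targets = torch.randint(1, 28, (2, 10))                  # dummy character labels
    loss = nn.CTCLoss(blank=0)(log_probs, targets,
                               input_lengths=torch.full((2,), 20),
                               target_lengths=torch.full((2,), 10))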
James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
Comments: Submitted to conference track at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks are a powerful tool for modeling sequential data,
but the dependence of each timestep’s computation on the previous timestep’s
output limits parallelism and makes RNNs unwieldy for very long sequences. We
introduce quasi-recurrent neural networks (QRNNs), an approach to neural
sequence modeling that alternates convolutional layers, which apply in parallel
across timesteps, and a minimalist recurrent pooling function that applies in
parallel across channels. Despite lacking trainable recurrent layers, stacked
QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
size. Due to their increased parallelism, they are up to 16 times faster at
train and test time. Experiments on language modeling, sentiment
classification, and character-level neural machine translation demonstrate
these advantages and underline the viability of QRNNs as a basic building block
for a variety of sequence tasks.
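A small sketch of the recurrent pooling step (assuming the fo-pooling variant): the candidate activations and gates would come from convolutions applied in parallel across timesteps, and the only sequential work is the elementwise recurrence below, with no trained recurrent weights.

    import numpy as np

    def fo_pool(z, f, o):
        """z, f, o: arrays of shape (T, hidden). Returns hidden states h."""
        T, d = z.shape
        c = np.zeros(d)
        h = np.empty((T, d))
        for t in range(T):                       # elementwise; parallel across channels
            c = f[t] * c + (1.0 - f[t]) * z[t]   # recurrent pooling, no trained recurrence
            h[t] = o[t] * c
        return h

    # Toy usage: in a QRNN these would be produced by masked 1-D convolutions.
    rng = np.random.default_rng(0)
    T, d = 6, 4
    z = np.tanh(rng.normal(size=(T, d)))
    f = 1 / (1 + np.exp(-rng.normal(size=(T, d))))   # sigmoid forget gate
    o = 1 / (1 + np.exp(-rng.normal(size=(T, d))))   # sigmoid output gate
    print(fo_pool(z, f, o).shape)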
Bernhard Etzlinger, Florian Meyer, Franz Hlawatsch, Andreas Springer, Henk Wymeersch
Comments: 13 pages, 6 figures, 3 tables; manuscript submitted to IEEE Transaction on Signal Processing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)
Cooperative localization in agent networks based on interagent time-of-flight
measurements is closely related to synchronization. To leverage this relation,
we propose a Bayesian factor graph framework for cooperative simultaneous
localization and synchronization (CoSLAS). This framework is suited to mobile
agents and time-varying local clock parameters. Building on the CoSLAS factor
graph, we develop a distributed (decentralized) belief propagation algorithm
for CoSLAS in the practically important case of an affine clock model and
asymmetric time stamping. Our algorithm allows for real-time operation and is
suitable for a time-varying network connectivity. To achieve high accuracy at
reduced complexity and communication cost, the algorithm combines particle
implementations with parametric message representations and takes advantage of
a conditional independence property. Simulation results demonstrate the good
performance of the proposed algorithm in a challenging scenario with
time-varying network connectivity.
Vincenzo De Florio
Comments: Doctoral thesis, successfully defended on October 13, 2000
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The structures for the expression of fault-tolerance provisions into the
application software are the central topic of this dissertation. Structuring
techniques provide means to control complexity, the latter being a relevant
factor for the introduction of design faults. This fact and the ever-increasing
complexity of today’s distributed software justify the need for simple,
coherent, and effective structures for the expression of fault-tolerance in the
application software. A first contribution of this dissertation is the
definition of a base of structural attributes with which application-level
fault-tolerance structures can be qualitatively assessed and compared with each
other and with respect to the above mentioned need. This result is then used to
provide an elaborated survey of the state-of-the-art of software
fault-tolerance structures. The key contribution of this work is a novel
structuring technique for the expression of the fault-tolerance design concerns
in the application layer of those distributed software systems that are
characterized by soft real-time requirements and with a number of processing
nodes known at compile-time. The main thesis of this dissertation is that this
new structuring technique is capable of exhibiting satisfactory values of the
structural attributes in the domain of soft real-time, distributed and parallel
applications. Following this novel approach, besides the conventional
programming language addressing the functional design concerns, a
special-purpose linguistic structure (the so-called “recovery language”) is
available to address error recovery and reconfiguration. This recovery language
comes into play as soon as an error is detected by an underlying error
detection layer, or when some erroneous condition is signaled by the
application processes.
Natalie Perlin, Joel P. Zysman, Ben P. Kirtman
Comments: 9 pages, 8 figures, 1 table
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Engineering, Finance, and Science (cs.CE); Atmospheric and Oceanic Physics (physics.ao-ph)
The concept of scalability analysis of numerical parallel applications has
been revisited, with the specific goals defined for the performance estimation
of research applications. A series of Community Climate Model System (CCSM)
numerical simulations were used to test several MPI implementations, determine
the optimal use of system resources, and assess their scalability. The
scaling capacity and model throughput performance metrics for (N) cores showed
a log-linear behavior approximated by a power fit in the form of (C(N)=bN^a),
where (a) and (b) are two empirical constants. Different metrics yielded
identical power coefficients ((a)), but different dimensionality coefficients
((b)). This model was consistent except for large values of (N). The power
fit approach appears to be very useful for scalability estimates, especially
when no serial testing is possible. A scalability analysis of an additional
scientific application was conducted in the same way to validate the
robustness of the power fit approach.
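The power fit can be recovered by ordinary least squares in log-log space; the sketch below uses synthetic throughput numbers in place of the CCSM measurements.

    import numpy as np

    cores = np.array([16, 32, 64, 128, 256])
    throughput = np.array([2.1, 3.9, 7.2, 12.8, 21.0])   # placeholder model throughput

    # C(N) = b * N**a  becomes  log C = a * log N + log b
    a, log_b = np.polyfit(np.log(cores), np.log(throughput), deg=1)
    b = np.exp(log_b)
    print(f"C(N) ~= {b:.3f} * N^{a:.3f}")

    # Extrapolate where serial runs are impossible, keeping in mind that the
    # fit reportedly degrades for very large N.
    print("predicted throughput at 512 cores:", b * 512 ** a)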
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Haiyong Xie
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In large-scale distributed file systems, efficient metadata operations are
critical since most file operations have to interact with metadata servers
first. In existing distributed hash table (DHT) based metadata management
systems, the lookup service could be a performance bottleneck due to its
significant CPU overhead. Our investigations showed that the lookup service
could reduce system throughput by up to 70%, and increase system latency by a
factor of up to 8 compared to ideal scenarios. In this paper, we present
MetaFlow, a scalable metadata lookup service utilizing software-defined
networking (SDN) techniques to distribute lookup workload over network
components. MetaFlow tackles the lookup bottleneck problem by leveraging
B-tree, which is constructed over the physical topology, to manage flow tables
for SDN-enabled switches. Therefore, metadata requests can be forwarded to
appropriate servers using only switches. Extensive performance evaluations in
both simulations and testbed showed that MetaFlow increases system throughput
by a factor of up to 3.2, and reduces system latency by a factor of up to 5
compared to DHT-based systems. We also deployed MetaFlow in a distributed file
system, and demonstrated significant performance improvement.
Ilya Trofimov, Alexander Genkin
Comments: arXiv admin note: text overlap with arXiv:1411.6520
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC)
Generalized linear model with (L_1) and (L_2) regularization is a widely used
technique for solving classification, class probability estimation and
regression problems. With the numbers of both features and examples growing
rapidly in fields like text mining and clickstream data analysis,
parallelization and the use of cluster architectures become important. We
present a novel algorithm for fitting regularized generalized linear models in
the distributed environment. The algorithm splits data between nodes by
features, uses coordinate descent on each node and line search to merge results
globally. A convergence proof is provided. A modification of the algorithm
addresses the slow-node problem. For the important particular case of logistic
regression, we empirically compare our program with several state-of-the-art
approaches that rely on different algorithmic and data splitting methods.
Experiments demonstrate that our approach is scalable and superior when
training on large and sparse datasets.
Akshay Balsubramani
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We formulate learning of a binary autoencoder as a biconvex optimization
problem which learns from the pairwise correlations between encoded and decoded
bits. Among all possible algorithms that use this information, ours finds the
autoencoder that reconstructs its inputs with worst-case optimal loss. The
optimal decoder is a single layer of artificial neurons, emerging entirely from
the minimax loss minimization, and with weights learned by convex optimization.
All this is reflected in competitive experimental results, demonstrating that
binary autoencoding can be done efficiently by conveying information in
pairwise correlations in an optimal fashion.
Miguel Lázaro-Gredilla, Yi Liu, D. Scott Phoenix, Dileep George
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We introduce the hierarchical compositional network (HCN), a directed
generative model able to discover and disentangle, without supervision, the
building blocks of a set of binary images. The building blocks are binary
features defined hierarchically as a composition of some of the features in the
layer immediately below, arranged in a particular manner. At a high level, HCN
is similar to a sigmoid belief network with pooling. Inference and learning in
HCN are very challenging and existing variational approximations do not work
satisfactorily. A main contribution of this work is to show that both can be
addressed using max-product message passing (MPMP) with a particular schedule
(no EM required). Also, using MPMP as an inference engine for HCN makes new
tasks simple: adding supervision information, classifying images, or performing
inpainting all correspond to clamping some variables of the model to their
known values and running MPMP on the rest. When used for classification, fast
inference with HCN has exactly the same functional form as a convolutional
neural network (CNN) with linear activations and binary weights. However, HCN’s
features are qualitatively very different.
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Model-free deep reinforcement learning (RL) methods have been successful in a
wide variety of simulated domains. However, a major obstacle facing deep RL in
the real world is the high sample complexity of such methods. Unbiased batch
policy-gradient methods offer stable learning, but at the cost of high
variance, which often requires large batches, while TD-style methods, such as
off-policy actor-critic and Q-learning, are more sample-efficient but biased,
and often require costly hyperparameter sweeps to stabilize. In this work, we
aim to develop methods that combine the stability of unbiased policy gradients
with the efficiency of off-policy RL. We present Q-Prop, a policy gradient
method that uses a Taylor expansion of the off-policy critic as a control
variate. Q-Prop is both sample efficient and stable, and effectively combines
the benefits of on-policy and off-policy methods. We analyze the connection
between Q-Prop and existing model-free algorithms, and use control variate
theory to derive two variants of Q-Prop with conservative and aggressive
adaptation. We show that conservative Q-Prop provides substantial gains in
sample efficiency over trust region policy optimization (TRPO) with generalized
advantage estimation (GAE), and improves stability over deep deterministic
policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods,
on OpenAI Gym’s MuJoCo continuous control environments.
Nadav Bhonker, Shai Rozenberg, Itay Hubara
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Mastering a video game requires skill, tactics and strategy. While these
attributes may be acquired naturally by human players, teaching them to a
computer program is a far more challenging task. In recent years, extensive
research was carried out in the field of reinforcement learning and numerous
algorithms were introduced, aiming to learn how to perform human tasks such as
playing video games. As a result, the Arcade Learning Environment (ALE) has
become a commonly used benchmark environment allowing algorithms to train on
various Atari 2600 games. Most Atari games no longer pose a challenge to
state-of-the-art algorithms. In this paper we introduce a new learning
environment, the Retro Learning Environment — RLE, based on the Super
Nintendo Entertainment System (SNES). The environment is expandable, allowing
for more video games and consoles to be easily added to the environment, while
maintaining the same interface as ALE. Moreover, RLE is compatible with Python
and Torch. SNES games pose a significant challenge to current algorithms due to
their higher level of complexity and versatility. To overcome these challenges,
we introduce a novel training method based on training two agents against each
other.
Virginia Smith, Simone Forte, Chenxin Ma, Martin Takac, Michael I. Jordan, Martin Jaggi
Subjects: Learning (cs.LG)
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for the distributed environment, CoCoA, that has an
efficient communication scheme and is applicable to a wide variety of problems
in machine learning and signal processing. We extend the framework to cover
general non-strongly convex regularizers, including L1-regularized problems
like lasso, sparse logistic regression, and elastic net regularization, and
show how earlier work can be derived as a special case. We provide convergence
guarantees for the class of convex regularized loss minimization objectives,
leveraging a novel approach in handling non-strongly convex regularizers and
non-smooth loss functions. The resulting framework has markedly improved
performance over state-of-the-art methods, as we illustrate with an extensive
set of experiments on real distributed datasets.
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
Subjects: Learning (cs.LG)
We present a novel layerwise optimization algorithm for the learning
objective of a large class of convolutional neural networks (CNNs).
Specifically, we consider CNNs that employ piecewise linear non-linearities
such as the commonly used ReLU and max-pool, and an SVM classifier as the final
layer. The key observation of our approach is that the problem corresponding to
the parameter estimation of a layer can be formulated as a difference-of-convex
(DC) program, which happens to be a latent structured SVM. We optimize the DC
program using the concave-convex procedure, which requires us to iteratively
solve a structured SVM problem. To this end, we extend the block-coordinate
Frank-Wolfe (BCFW) algorithm in three important ways: (i) we include a
trust-region for the parameters, which allows us to use the previous parameters
as an initialization; (ii) we reduce the memory requirement of BCFW by
potentially several orders of magnitude for the dense layers, which enables us
to learn a large set of parameters; and (iii) we observe that, empirically, the
optimal solution of the structured SVM problem can be obtained efficiently by
solving a related, but significantly easier, multi-class SVM problem. Using
publicly available data sets, we show that our approach outperforms the state
of the art variants of backpropagation, and is also more robust to the
hyperparameters of the learning objective.
Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar
Subjects: Learning (cs.LG)
At present, designing convolutional neural network (CNN) architectures
requires both human expertise and labor. New architectures are handcrafted by
careful experimentation or modified from a handful of existing networks. We
propose a meta-modelling approach based on reinforcement learning to
automatically generate high-performing CNN architectures for a given learning
task. The learning agent is trained to sequentially choose CNN layers using
Q-learning with an (\epsilon)-greedy exploration strategy and experience
replay. The agent explores a large but finite space of possible architectures
and iteratively discovers designs with improved performance on the learning
task. On image classification benchmarks, the agent-designed networks
(consisting of only standard convolution, pooling, and fully-connected layers)
beat existing networks designed with the same layer types and are competitive
against the state-of-the-art methods that use more complex layer types. We also
outperform existing network design meta-modelling approaches on image
classification.
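A hedged sketch of the meta-modelling loop: a tabular agent chooses layer types with epsilon-greedy exploration and is rewarded with validation accuracy. The update below credits every choice with the final reward, a simplification of the full Q-learning backup, and train_and_evaluate is a placeholder for actually training the sampled CNN.

    import random
    from collections import defaultdict

    LAYERS = ["conv3x3", "conv5x5", "pool", "fc", "terminate"]
    MAX_DEPTH = 4
    Q = defaultdict(float)                      # Q[(depth, layer)] value estimates

    def sample_architecture(eps):
        arch = []
        for depth in range(MAX_DEPTH):
            if random.random() < eps:
                layer = random.choice(LAYERS)
            else:
                layer = max(LAYERS, key=lambda a: Q[(depth, a)])
            arch.append(layer)
            if layer == "terminate":
                break
        return arch

    def train_and_evaluate(arch):               # stand-in for real CNN training
        return random.random()                  # would return validation accuracy

    alpha = 0.1
    for episode in range(200):
        eps = max(0.1, 1.0 - episode / 150)     # anneal exploration over time
        arch = sample_architecture(eps)
        reward = train_and_evaluate(arch)
        for depth, layer in enumerate(arch):    # all choices credited with final reward
            Q[(depth, layer)] += alpha * (reward - Q[(depth, layer)])

    print("greedy architecture:", sample_architecture(eps=0.0))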
Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We introduce a method to stabilize Generative Adversarial Networks (GANs) by
defining the generator objective with respect to an unrolled optimization of
the discriminator. This allows training to be adjusted between using the
optimal discriminator in the generator’s objective, which is ideal but
infeasible in practice, and using the current value of the discriminator, which
is often unstable and leads to poor solutions. We show how this technique
solves the common problem of mode collapse, stabilizes training of GANs with
complex recurrent generators, and increases diversity and coverage of the data
distribution by the generator.
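A simplified PyTorch sketch of the unrolling idea: before each generator update, a throwaway copy of the discriminator is trained for a few steps, and the generator is scored against that copy. This version does not backpropagate through the inner updates (it drops the second-order term), so it is only an approximation of the full technique; all networks and data are toy placeholders.

    import copy
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
    D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    K = 5                                             # number of unrolling steps

    def d_loss(disc, real, fake):
        return (bce(disc(real), torch.ones(len(real), 1)) +
                bce(disc(fake), torch.zeros(len(fake), 1)))

    for step in range(100):
        real = torch.randn(32, 2) + 3.0               # toy "real" data
        # 1) unroll: train a throwaway copy of D for K steps
        D_unrolled = copy.deepcopy(D)
        opt_du = torch.optim.SGD(D_unrolled.parameters(), lr=1e-2)
        for _ in range(K):
            opt_du.zero_grad()
            d_loss(D_unrolled, real, G(torch.randn(32, 8)).detach()).backward()
            opt_du.step()
        # 2) generator step measured against the unrolled discriminator
        opt_g.zero_grad()
        fake = G(torch.randn(32, 8))
        bce(D_unrolled(fake), torch.ones(32, 1)).backward()
        opt_g.step()
        # (the real D would also be updated here with an ordinary GAN step)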
Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow
Comments: Submitted to ICLR 2017
Subjects: Learning (cs.LG)
We introduce and develop solutions for the problem of Lifelong Perceptual
Programming By Example (LPPBE). The problem is to induce a series of programs
that require understanding perceptual data like images or text. LPPBE systems
learn from weak supervision (input-output examples) and incrementally construct
a shared library of components that grows and improves as more tasks are
solved. Methodologically, we extend differentiable interpreters to operate on
perceptual data and to share components across tasks. Empirically we show that
this leads to a lifelong learning system that transfers knowledge to new tasks
more effectively than baselines, and the performance on earlier tasks continues
to improve even as the system learns on new, different tasks.
Valeria Efimova, Andrey Filchenkov, Anatoly Shalyto
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many algorithms for data analysis exist, especially for classification
problems. To solve a data analysis problem, a proper algorithm should be
chosen, and also its hyperparameters should be selected. In this paper, we
present a new method for the simultaneous selection of an algorithm and its
hyperparameters. In order to do so, we reduced this problem to the multi-armed
bandit problem. We consider an algorithm as an arm and a hyperparameter search
over a fixed time budget as the corresponding arm play. We also suggest a
problem-specific reward function. We performed experiments on 10 real datasets
and compared the suggested method with the existing one implemented in
Auto-WEKA. The results show that our method is significantly better in most
cases and never worse than Auto-WEKA.
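A rough sketch of the bandit view, with UCB1 standing in for the paper's problem-specific reward handling: each arm is a learning algorithm, one play spends a fixed budget on hyperparameter search for that arm, and the reward is the best validation score found in that slot. The play function is a placeholder.

    import math
    import random

    ARMS = ["random_forest", "svm", "knn", "logreg"]

    def play(arm):
        """Placeholder: run hyperparameter search for `arm` for a fixed time
        budget and return the best validation score found."""
        return random.betavariate(2, 2)

    counts = {a: 0 for a in ARMS}
    values = {a: 0.0 for a in ARMS}            # running mean reward per arm

    for t in range(1, 101):
        if t <= len(ARMS):                     # play every arm once first
            arm = ARMS[t - 1]
        else:                                  # UCB1: mean + exploration bonus
            arm = max(ARMS, key=lambda a: values[a] +
                      math.sqrt(2 * math.log(t) / counts[a]))
        r = play(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]

    print("most promising algorithm:", max(ARMS, key=values.get))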
Ivan Smetannikov, Ilya Isaev, Andrey Filchenkov
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
One of the classical problems in machine learning and data mining is feature
selection. A feature selection algorithm is expected to be quick, and at the
same time it should show high performance. MeLiF algorithm effectively solves
this problem using ensembles of ranking filters. This article describes two
different ways to improve MeLiF algorithm performance with parallelization.
Experiments show that the proposed schemes significantly improve algorithm
performance and increase feature selection quality.
Mickaël Chen, Ludovic Denoyer
Comments: Submitted at ICLR 2017
Subjects: Learning (cs.LG)
Learning over multi-view data is a challenging problem with strong practical
applications. Most related studies focus on the classification point of view
and assume that all the views are available at any time. We consider an
extension of this framework in two directions. First, based on the BiGAN model,
the Multi-view BiGAN (MV-BiGAN) is able to perform density estimation from
multi-view inputs. Second, it can deal with missing views and is able to update
its prediction when additional views are provided. We illustrate these
properties on a set of experiments over different datasets.
Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow
Comments: Submitted to ICLR 2017
Subjects: Learning (cs.LG)
We develop a first line of attack for solving programming competition-style
problems from input-output examples using deep learning. The approach is to
train a neural network to predict properties of the program that generated the
outputs from the inputs. We use the neural network’s predictions to augment
search techniques from the programming languages community, including
enumerative search and an SMT-based solver. Empirically, we show that our
approach leads to an order of magnitude speedup over the strong non-augmented
baselines and a Recurrent Neural Network approach, and that we are able to
solve problems of difficulty comparable to the simplest problems on programming
competition websites.
Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca
Comments: Submitted to ICLR2017 conference
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Regularization is key for deep learning since it allows training more complex
models while keeping lower levels of overfitting. However, the most prevalent
regularizations do not leverage all the capacity of the models since they rely
on reducing the effective number of parameters. Feature decorrelation is an
alternative for using the full capacity of the models but the overfitting
reduction margins are too narrow given the overhead it introduces. In this
paper, we show that regularizing negatively correlated features is an obstacle
for effective decorrelation and present OrthoReg, a novel regularization
technique that locally enforces feature orthogonality. As a result, imposing
locality constraints in feature decorrelation removes interferences between
negatively correlated features, allowing the regularizer to reach higher
decorrelation bounds, and reducing the overfitting more effectively. In
particular, we show that the models regularized with OrthoReg have higher
accuracy bounds even when batch normalization and dropout are present. Moreover,
since our regularization is directly performed on the weights, it is especially
suitable for fully convolutional neural networks, where the weight space is
constant compared to the feature map space. As a result, we are able to reduce
the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.
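A hedged sketch of a weight-level decorrelation penalty in the spirit described above: only positive cosine similarities between a layer's weight vectors are penalized, so negatively correlated features are left alone. The exact functional form used in the paper may differ.

    import torch

    def ortho_reg(weight, eps=1e-8):
        """weight: (out_features, in_features) matrix of one layer."""
        w = weight / (weight.norm(dim=1, keepdim=True) + eps)
        cos = w @ w.t()                                # pairwise cosine similarities
        off_diag = cos - torch.eye(cos.size(0))        # remove self-similarity
        positive = torch.clamp(off_diag, min=0.0)      # ignore negative correlations
        return (positive ** 2).sum()

    # Usage: add `lam * ortho_reg(layer.weight)` to the task loss during training.
    layer = torch.nn.Linear(64, 32)
    penalty = ortho_reg(layer.weight)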
Kalina Jasinska, Nikos Karampatziakis
Subjects: Learning (cs.LG)
We present LTLS, a technique for multiclass and multilabel prediction that
can perform training and inference in logarithmic time and space. LTLS embeds
large classification problems into simple structured prediction problems and
relies on efficient dynamic programming algorithms for inference. We train LTLS
with stochastic gradient descent on a number of multiclass and multilabel
datasets and show that despite its small memory footprint it is often
competitive with existing approaches.
Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, Tarek Abdelzaher
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Mobile sensing applications usually require time-series inputs from sensors.
Some applications, such as tracking, can use sensed acceleration and rate of
rotation to calculate displacement based on physical system models. Other
applications, such as activity recognition, extract manually designed features
from sensor inputs for classification. Such applications face two challenges.
On one hand, on-device sensor measurements are noisy. For many mobile
applications, it is hard to find a distribution that exactly describes the
noise in practice. Unfortunately, calculating target quantities based on
physical system and noise models is only as accurate as the noise assumptions.
Similarly, in classification applications, although manually designed features
have proven to be effective, it is not always straightforward to find the most
robust features to accommodate diverse sensor noise patterns and user
behaviors. To this end, we propose DeepSense, a deep learning framework that
directly addresses the aforementioned noise and feature customization
challenges in a unified manner. DeepSense integrates convolutional and
recurrent neural networks to exploit local interactions among similar mobile
sensors, merge local interactions of different sensory modalities into global
interactions, and extract temporal relationships to model signal dynamics.
DeepSense thus provides a general signal estimation and classification
framework that accommodates a wide range of applications. We demonstrate the
effectiveness of DeepSense using three representative and challenging tasks:
car tracking with motion sensors, heterogeneous human activity recognition, and
user identification with biometric motion analysis. DeepSense significantly
outperforms the state-of-the-art methods for all three tasks. In addition,
DeepSense is feasible to implement on smartphones due to its moderate energy
consumption and low latency.
Wentao Huang, Kechen Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
A framework is presented for unsupervised learning of representations based
on infomax principle for large-scale neural populations. We use an asymptotic
approximation to Shannon's mutual information for a large neural population
to demonstrate that a good initial approximation to the global
information-theoretic optimum can be obtained by a hierarchical infomax method.
From the initial solution, an efficient algorithm based on gradient descent of
the final objective function is proposed to learn representations from the
input datasets, allowing complete, overcomplete, or undercomplete bases. As
confirmed by numerical experiments, our method is robust and highly efficient
for extracting salient features from image datasets. Compared with the main
existing methods, our algorithm has a distinct advantage in both the training
speed and the robustness of unsupervised representation learning.
Jundong Li, Huan Liu
Comments: Special Issue on Big Data, IEEE Intelligent Systems, 2016. arXiv admin note: text overlap with arXiv:1601.07996
Subjects: Learning (cs.LG)
We are surrounded by huge amounts of large-scale high dimensional data. It is
desirable to reduce the dimensionality of data for many learning tasks due to
the curse of dimensionality. Feature selection has shown its effectiveness in
many applications by building simpler and more comprehensive models, improving
learning performance, and preparing clean, understandable data. Recently, some
unique characteristics of big data such as data velocity and data variety
present challenges to the feature selection problem. In this paper, we envision
these challenges of feature selection for big data analytics. In particular, we
first give a brief introduction about feature selection and then detail the
challenges of feature selection for structured, heterogeneous and streaming
data as well as its scalability and stability issues. At last, to facilitate
and promote the feature selection research, we present an open-source feature
selection repository (scikit-feature), which consists of most of current
popular feature selection algorithms.
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun
Subjects: Learning (cs.LG)
This paper proposes a new optimization algorithm called Entropy-SGD for
training deep neural networks that is motivated by the local geometry of the
energy landscape at solutions found by gradient descent. Local extrema with low
generalization error have a large proportion of almost-zero eigenvalues in the
Hessian with very few positive or negative eigenvalues. We leverage upon this
observation to construct a local entropy based objective that favors
well-generalizable solutions lying in the flat regions of the energy landscape,
while avoiding poorly-generalizable solutions located in the sharp valleys. Our
algorithm resembles two nested loops of SGD, where we use Langevin dynamics to
compute the gradient of local entropy at each update of the weights. We prove
that incorporating local entropy into the objective function results in a
smoother energy landscape and use uniform stability to show improved
generalization bounds over SGD. Our experiments on competitive baselines
demonstrate that Entropy-SGD leads to improved generalization and has the
potential to accelerate training.
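A numpy sketch of the two nested loops on a toy quadratic loss: the inner loop runs Langevin dynamics around the current weights to build a running mean of nearby iterates, and the outer step moves the weights toward that mean, which approximates a local-entropy gradient step. Step sizes and the noise scale are illustrative placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_loss(w):                       # toy loss: f(w) = 0.5 * ||w||^2
        return w

    def entropy_sgd(w, outer_steps=50, inner_steps=20,
                    eta=0.1, eta_prime=0.05, gamma=1.0, eps=1e-3):
        for _ in range(outer_steps):
            x_prime, mu = w.copy(), w.copy()
            for _ in range(inner_steps):    # Langevin dynamics around current w
                g = grad_loss(x_prime) - gamma * (w - x_prime)
                x_prime = (x_prime - eta_prime * g
                           + np.sqrt(eta_prime) * eps * rng.normal(size=w.shape))
                mu = 0.75 * mu + 0.25 * x_prime      # running mean of inner iterates
            w = w - eta * gamma * (w - mu)           # local-entropy gradient step
        return w

    print(entropy_sgd(rng.normal(size=5)))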
Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang
Comments: Under review at ICLR 2017
Subjects: Learning (cs.LG)
In this paper, we study deep generative models for effective unsupervised
learning. We propose VGAN, which works by minimizing a variational lower bound
of the negative log likelihood (NLL) of an energy-based model (EBM), where the
model density (p(\mathbf{x})) is approximated by a variational distribution
(q(\mathbf{x})) that is easy to sample from. The training of VGAN is a two-step
procedure: given (p(\mathbf{x})), (q(\mathbf{x})) is updated to maximize the
lower bound; (p(\mathbf{x})) is then updated one step with samples drawn from
(q(\mathbf{x})) to decrease the lower bound. VGAN is inspired by generative
adversarial networks (GANs), where (p(\mathbf{x})) corresponds to the
discriminator and (q(\mathbf{x})) corresponds to the generator, but with
several notable differences. We hence name our model variational GANs (VGANs).
VGAN provides a practical solution to training deep EBMs in high-dimensional
space by eliminating the need for MCMC sampling. From this view, we are also
able to identify the causes of the difficulty of training GANs and propose
viable solutions. Experimental code is available at this https URL.
Jacob Andreas, Dan Klein, Sergey Levine
Comments: Submitted to ICLR
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We describe a framework for multitask deep reinforcement learning guided by
policy sketches. Sketches annotate each task with a sequence of named subtasks,
providing high-level structural relationships among tasks, but not providing
the detailed guidance required by previous work on learning policy abstractions
for RL (e.g. intermediate rewards, subtask completion signals, or intrinsic
motivations). Our approach associates every subtask with its own modular
subpolicy, and jointly optimizes over full task-specific policies by tying
parameters across shared subpolicies. This optimization is accomplished via a
simple decoupled actor-critic training objective that facilitates learning
common behaviors from dissimilar reward functions. We evaluate the
effectiveness of our approach on a maze navigation game and a 2-D
Minecraft-inspired crafting game. Both games feature extremely sparse rewards
that can be obtained only after completing a number of high-level subgoals
(e.g. escaping from a sequence of locked rooms or collecting and combining
various ingredients in the proper order). Experiments illustrate two main
advantages of our approach. First, we outperform standard baselines that learn
task-specific or shared monolithic policies. Second, our method naturally
induces a library of primitive behaviors that can be recombined to rapidly
acquire policies for new tasks.
Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, Pushmeet Kohli
Comments: Submitted to ICLR 2017
Subjects: Learning (cs.LG)
Code super-optimization is the task of transforming any given program to a
more efficient version while preserving its input-output behaviour. In some
sense, it is similar to the paraphrase problem from natural language processing
where the intention is to change the syntax of an utterance without changing
its semantics. Code-optimization has been the subject of years of research that
has resulted in the development of rule-based transformation strategies that
are used by compilers. More recently, however, a class of stochastic search
based methods have been shown to outperform these strategies. This approach
involves repeated sampling of modifications to the program from a proposal
distribution, which are accepted or rejected based on whether they preserve
correctness, and the improvement they achieve. These methods, however, neither
learn from past behaviour nor do they try to leverage the semantics of the
program under consideration. Motivated by this observation, we present a novel
learning based approach for code super-optimization. Intuitively, our method
works by learning the proposal distribution using unbiased estimators of the
gradient of the expected improvement. Experiments on benchmarks comprising of
automatically generated as well as existing (“Hacker’s Delight”) programs show
that the proposed method is able to significantly outperform state of the art
approaches for code super-optimization.
Alexey Dosovitskiy, Vladlen Koltun
Comments: ICLR-2017 submission
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We present an approach to sensorimotor control in immersive environments. Our
approach utilizes a high-dimensional sensory stream and a lower-dimensional
measurement stream. The cotemporal structure of these streams provides a rich
supervisory signal, which enables training a sensorimotor control model by
interacting with the environment. The model is trained using supervised
learning techniques, but without extraneous supervision. It learns to act based
on raw sensory input from a complex three-dimensional environment. The
presented formulation allows changing the agent’s goal at test time. We conduct
extensive experiments in three-dimensional simulations based on the classical
first-person game Doom. The results demonstrate that the presented approach
outperforms sophisticated prior formulations, particularly on challenging
tasks. The results also show that trained models successfully generalize across
environments and goals. A model trained using the presented approach won the
Full Deathmatch track of the Visual Doom AI Competition, which was held in
previously unseen environments.
Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)
We consider the design of online no-regret algorithms that are
computationally efficient, given access to an offline optimization oracle. We
present an algorithm we call Generalized Follow-the-Perturbed-Leader and
provide conditions under which it achieves vanishing regret and is
oracle-efficient. Our second main contribution is introducing a new adversarial
auction-design framework for revenue maximization and applying our
oracle-efficient learning results to the adaptive optimization of auctions.
Our learning algorithm is a generalization of the FTPL algorithm of Kalai and
Vempala that at every step plays the best-performing action subject to some
perturbations. Our design uses a shared source of randomness across all actions
that can be efficiently implemented using an oracle. Our work extends to
oracle-efficient algorithms for contextual learning, learning with
Maximal-in-Range approximation algorithms, and no-regret bidding in
simultaneous auctions, answering an open problem of Daskalakis and Syrgkanis in
the latter case.
Our auction-design framework considers an auctioneer learning an optimal
auction for a sequence of adversarially selected valuations with the goal of
achieving revenue that is almost as good as the optimal auction in hindsight,
among a class of auctions. We give oracle-efficient learning results for: (1)
VCG auctions with bidder-specific reserves in single-parameter settings, (2)
envy-free item pricing in multi-item auctions, and (3) s-level auctions of
Morgenstern and Roughgarden for single-item settings. The last result leads to
an approximation of the optimal Myerson auction for the stationary distribution
of a Markov process, extending prior work that only gave such guarantees for
the i.i.d. setting. We also extend our framework to allow the auctioneer to use
side information about the bidders in the design of the optimal auction
(contextual learning).
Mirmorsal Madani
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Today data mining techniques are exploited in medical science for diagnosing,
overcoming and treating diseases. Neural networks are one of the techniques
widely used for diagnosis in the medical field. In this article, the efficiency
of nine algorithms that form the basis of neural network learning is assessed
for diagnosing cardiovascular diseases. The algorithms are assessed in terms of
accuracy, sensitivity, transparency, AROC, and convergence rate by means of
10-fold cross validation. The results suggest that in the training phase the
Levenberg-Marquardt algorithm is the most efficient in terms of all metrics,
the OSS algorithm has the highest accuracy in the testing phase, the SCG
algorithm has the highest transparency, and the CGB algorithm has the highest
sensitivity.
Ishan Durugkar, Ian Gemp, Sridhar Mahadevan
Comments: Submitted as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Multiagent Systems (cs.MA); Neural and Evolutionary Computing (cs.NE)
Generative adversarial networks (GANs) are a framework for producing a
generative model by way of a two-player minimax game. In this paper, we propose
the Generative Multi-Adversarial Network (GMAN), a framework that
extends GANs to multiple discriminators. In previous work, the successful
training of GANs requires modifying the minimax objective to accelerate
training early on. In contrast, GMAN can be reliably trained with the original,
untampered objective. We explore a number of design perspectives with the
discriminator role ranging from formidable adversary to forgiving teacher.
Image generation tasks comparing the proposed framework to standard GANs
demonstrate GMAN produces higher quality samples in a fraction of the
iterations when measured by a pairwise GAM-type metric.
Patrick McClure, Nikolaus Kriegeskorte
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
As deep neural networks (DNNs) are applied to increasingly challenging
problems, they will need to be able to represent their own uncertainty.
Modeling uncertainty is one of the key features of Bayesian methods. Scalable
Bayesian DNNs that use dropout-based variational distributions have recently
been proposed. Here we evaluate the ability of Bayesian DNNs trained with
Bernoulli or Gaussian distributions over units (dropout) or weights
(dropconnect) to represent their own uncertainty at the time of inference
through sampling. We tested how well Bayesian fully connected and convolutional
DNNs represented their own uncertainty in classifying the MNIST handwritten
digits. By adding different levels of Gaussian noise to the test images, we
assessed how DNNs represented their uncertainty about regions of input space
not covered by the training set. Bayesian DNNs estimated their own uncertainty
more accurately than traditional DNNs with a softmax output. These results are
important for building better deep learning systems and for investigating the
hypothesis that biological neural networks use sampling to represent
uncertainty.
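A minimal PyTorch sketch of sampling-based uncertainty with dropout, assuming a simple fully connected classifier: dropout is kept active at test time, several stochastic forward passes are averaged, and the predictive entropy serves as the uncertainty estimate.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                          nn.Dropout(p=0.5), nn.Linear(256, 10))

    def predict_with_uncertainty(model, x, T=50):
        model.train()                          # keeps dropout sampling switched on
        with torch.no_grad():
            probs = torch.stack([model(x).softmax(-1) for _ in range(T)]).mean(0)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
        return probs, entropy

    x = torch.randn(8, 1, 28, 28)              # e.g. a batch of (noisy) MNIST digits
    probs, uncertainty = predict_with_uncertainty(model, x)
    print(uncertainty)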
Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Policy gradient is an efficient technique for improving a policy in a
reinforcement learning setting. However, vanilla online variants are on-policy
only and not able to take advantage of off-policy data. In this paper we
describe a new technique that combines policy gradient with off-policy
Q-learning, drawing experience from a replay buffer. This is motivated by
making a connection between the fixed points of the regularized policy gradient
algorithm and the Q-values. This connection allows us to estimate the Q-values
from the action preferences of the policy, to which we apply Q-learning
updates. We refer to the new technique as ‘PGQ’, for policy gradient and
Q-learning. We also establish an equivalency between action-value fitting
techniques and actor-critic algorithms, showing that regularized policy
gradient techniques can be interpreted as advantage function learning
algorithms. We conclude with some numerical examples that demonstrate improved
data efficiency and stability of PGQ. In particular, we tested PGQ on the full
suite of Atari games and achieved performance exceeding that of both
asynchronous advantage actor-critic (A3C) and Q-learning.
Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel training algorithm for reinforcement learning which
combines the strength of deep Q-learning with a constrained optimization
approach to tighten optimality and encourage faster reward propagation. Our
novel technique makes deep reinforcement learning more practical by drastically
reducing the training time. We evaluate the performance of our approach on the
49 games of the challenging Arcade Learning Environment, and report significant
improvements in both training time and accuracy.
Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama
Comments: To appear in Machine Learning
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of estimating the class prior in an unlabeled
dataset. Under the assumption that an additional labeled dataset is available,
the class prior can be estimated by fitting a mixture of class-wise data
distributions to the unlabeled data distribution. However, in practice, such an
additional labeled dataset is often not available. In this paper, we show that,
with additional samples coming only from the positive class, the class prior of
the unlabeled dataset can be estimated correctly. Our key idea is to use
properly penalized divergences for model fitting to cancel the error caused by
the absence of negative samples. We further show that the use of the penalized
(L_1)-distance gives a computationally efficient algorithm with an analytic
solution. The consistency, stability, and estimation error are theoretically
analyzed. Finally, we experimentally demonstrate the usefulness of the proposed
method.
Barret Zoph, Quoc V. Le
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Neural networks are powerful and flexible models that work well for many
difficult learning tasks in image, speech and natural language understanding.
Despite their success, neural networks are still hard to design. In this paper,
we use a recurrent network to generate the model descriptions of neural
networks and train this RNN with reinforcement learning to maximize the
expected accuracy of the generated architectures on a validation set. On the
CIFAR-10 dataset, our method, starting from scratch, can design a novel network
architecture that rivals the best human-invented architecture in terms of test
set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is
only 0.1 percent worse and 1.2x faster than the current state-of-the-art model.
On the Penn Treebank dataset, our model can compose a novel recurrent cell that
outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
3.6 perplexity better than the previous state-of-the-art.
Liwen Zhang, John Winn, Ryota Tomioka
Comments: 12 pages, 2 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose the Gaussian attention model for content-based neural memory
access. With the proposed attention model, a neural network has the additional
degree of freedom to control the focus of its attention, from laser-sharp
attention to blurred attention. It is applicable whenever we can assume that
the distance in the latent space reflects some notion of semantics. We use the
proposed attention model as a scoring function for the embedding of a
knowledgebase into a continuous vector space and then train a model that
performs question answering about the entities in the knowledgebase. The
proposed attention model can handle both the propagation of uncertainty when
following a series of relations and also the conjunction of thoughts in a
natural way. On a dataset of soccer players who participated in the FIFA World
Cup 2014, we demonstrate that our model can handle both path queries and
conjunctive queries well.
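A small sketch of a Gaussian scoring function over latent-space distances, loosely following the description above: attention over memory slots decays with squared distance to the query, and a precision parameter moves the focus between sharp and blurred. The memory contents are placeholders.

    import numpy as np

    def gaussian_attention(query, memory, precision=1.0):
        """query: (d,), memory: (n, d). Returns attention weights over memory."""
        sq_dist = ((memory - query) ** 2).sum(axis=1)
        scores = -0.5 * precision * sq_dist          # log of an unnormalized Gaussian
        scores -= scores.max()                       # numerical stability
        weights = np.exp(scores)
        return weights / weights.sum()

    rng = np.random.default_rng(0)
    memory = rng.normal(size=(6, 4))                 # e.g. entity embeddings
    query = memory[2] + 0.1 * rng.normal(size=4)
    print(gaussian_attention(query, memory, precision=5.0).round(3))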
Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
Comments: ICLR 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent works on neural architectures have demonstrated the utility of
attention mechanisms for a wide variety of tasks. Attention models used for
problems such as image captioning typically depend on the image under
consideration, as well as the previous sequence of words that come before the
word currently being generated. While these types of models have produced
impressive results, they are not able to model the higher-order interactions
involved in problems such as video description/captioning, where the
relationship between parts of the video and the concepts being depicted is
complex. Motivated by these observations, we propose a novel memory-based
attention model for video description. Our model utilizes memories of past
attention when reasoning about where to attend to in the current time step,
similar to the central executive system proposed in human cognition. This
allows the model to not only reason about local attention more effectively, it
allows it to consider the entire sequence of video frames while generating each
word. Evaluation on the challenging and popular MSVD and Charades datasets show
that the proposed architecture outperforms all previously proposed methods and
leads to new state-of-the-art results in video description.
Zhen Xu, Wen Dong, Sargur Srihari
Comments: In proceedings of 29th Conference on Neural Information Processing Systems (NIPS 2016)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Social dynamics is concerned primarily with interactions among individuals
and the resulting group behaviors, modeling the temporal evolution of social
systems via the interactions of individuals within these systems. In
particular, the availability of large-scale data from social networks and
sensor networks offers an unprecedented opportunity to predict state-changing
events at the individual level. Examples of such events include disease
transmission, opinion transition in elections, and rumor propagation. Unlike
previous research focusing on the collective effects of social systems, this
study makes efficient inferences at the individual level. In order to cope with
dynamic interactions among a large number of individuals, we introduce the
stochastic kinetic model to capture adaptive transition probabilities and
propose an efficient variational inference algorithm the complexity of which
grows linearly — rather than exponentially — with the number of
individuals. To validate this method, we have performed epidemic-dynamics
experiments on wireless sensor network data collected from more than ten
thousand people over three years. The proposed algorithm was used to track
disease transmission and predict the probability of infection for each
individual. Our results demonstrate that this method is more efficient than
sampling while nonetheless achieving high accuracy.
John K. Feser, Marc Brockschmidt, Alexander L. Gaunt, Daniel Tarlow
Comments: Submitted to ICLR 2017
Subjects: Programming Languages (cs.PL); Learning (cs.LG)
We discuss a range of modeling choices that arise when constructing an
end-to-end differentiable programming language suitable for learning programs
from input-output examples. Taking cues from programming languages research, we
study the effect of memory allocation schemes, immutable data, type systems,
and built-in control-flow structures on the success rate of learning
algorithms. We build a range of models leading up to a simple differentiable
functional programming language. Our empirical evaluation shows that this
language allows learning far more programs than existing baselines.
Peisong Wang, Jian Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In recent years, Deep Neural Networks (DNNs) based methods have achieved
remarkable performance in a wide range of tasks and have been among the most
powerful and widely used techniques in computer vision, speech recognition and
Natural Language Processing. However, DNN-based methods are both
computationally intensive and resource-consuming, which hinders the application
of these methods on embedded systems like smart phones. To alleviate this
problem, we introduce a novel Fixed-point Factorized Networks (FFN) on
pre-trained models to reduce the computational complexity as well as the
storage requirement of networks. Extensive experiments on large-scale ImageNet
classification task show the effectiveness of our proposed method.
Oron Anschel, Nir Baram, Nahum Shimkin
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
The commonly used Q-learning algorithm combined with function approximation
induces systematic overestimations of state-action values. These systematic
errors might cause instability, poor performance and sometimes divergence of
learning. In this work, we present the Averaged Target DQN (ADQN) algorithm,
an adaptation of the DQN class of algorithms that uses a weighted average over
past learned networks to reduce generalization noise variance. As a
consequence, this leads to reduced overestimations, a more stable learning
process, and improved performance. Additionally, we analyze ADQN variance
reduction along trajectories and demonstrate the performance of ADQN on a toy
Gridworld problem, as well as on several of the Atari 2600 games from the
Arcade Learning Environment.
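A hedged PyTorch sketch of the averaging idea: the K most recently learned Q-networks are retained and the bootstrap target uses the average of their predictions. The replay buffer, snapshot schedule, and environment are all simplified placeholders.

    from collections import deque
    import copy
    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    snapshots = deque(maxlen=5)                      # last K learned networks
    gamma = 0.99

    def averaged_target(next_states, rewards, dones):
        with torch.no_grad():
            qs = torch.stack([net(next_states) for net in snapshots]).mean(0)
            return rewards + gamma * (1 - dones) * qs.max(dim=1).values

    for step in range(200):                          # toy transitions as stand-ins
        s = torch.randn(32, 4); a = torch.randint(0, 2, (32,))
        r = torch.randn(32); s2 = torch.randn(32, 4); d = torch.zeros(32)
        snapshots.append(copy.deepcopy(q_net))       # snapshot of the current network
        target = averaged_target(s2, r, d)
        pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()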
Sam Fletcher, Md Zahidul Islam
Comments: Pre-print of paper submitted to ACM Computing Surveys, 33 pages
Subjects: Databases (cs.DB); Learning (cs.LG)
Data mining information about people is becoming increasingly important in
the data-driven society of the 21st century. Unfortunately, sometimes there are
real-world considerations that conflict with the goals of data mining;
sometimes the privacy of the people being data mined needs to be considered.
This necessitates that the output of data mining algorithms be modified to
preserve privacy while simultaneously not ruining the predictive power of the
outputted model. Differential privacy is a strong, enforceable definition of
privacy that can be used in data mining algorithms, guaranteeing that nothing
will be learned about the people in the data that could not already be
discovered without their participation. In this survey, we focus on one
particular data mining algorithm — decision trees — and how differential
privacy interacts with each of the components that constitute decision tree
algorithms. We analyze both greedy and random decision trees, and the conflicts
that arise when trying to balance privacy requirements with the accuracy of the
model.
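One building block that such surveys cover for greedy, differentially private trees is the exponential mechanism for choosing a split attribute. A minimal sketch, assuming a generic scoring function and unit sensitivity (both are our illustrative choices):

```python
# Hedged sketch of the exponential mechanism, a component commonly used by
# differentially private greedy decision trees to pick a split attribute.
# The scores and the sensitivity value below are illustrative assumptions.
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity, rng=np.random.default_rng(0)):
    """Sample an index with probability proportional to exp(eps*score/(2*sens))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(scores), p=probs)

# e.g. information-gain scores for three candidate split attributes
chosen = exponential_mechanism(scores=[0.30, 0.25, 0.05], epsilon=1.0, sensitivity=1.0)
print("split on attribute", chosen)
```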
Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We investigate deep generative models that can exchange multiple modalities
bi-directionally, e.g., generating images from corresponding texts and vice
versa. Recently, some studies have handled multiple modalities with deep
generative models, such as variational autoencoders (VAEs). However, these
models typically assume a conditional relation between modalities, i.e.,
modalities can only be generated in one direction. To achieve our
objective, we should extract a joint representation that captures high-level
concepts among all modalities and through which we can exchange them
bi-directionally. As described herein, we propose a joint multimodal
variational autoencoder (JMVAE), in which all modalities are independently
conditioned on joint representation. In other words, it models a joint
distribution of modalities. Furthermore, to be able to generate missing
modalities from the remaining modalities properly, we develop an additional
method, JMVAE-kl, that is trained by reducing the divergence between JMVAE’s
encoder and prepared networks of respective modalities. Our experiments show
that our proposed method can obtain appropriate joint representation from
multiple modalities and that it can generate and reconstruct them more properly
than conventional VAEs. We further demonstrate that JMVAE can generate multiple
modalities bi-directionally.
Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Physics and Society (physics.soc-ph)
When encountering novel objects, humans are able to infer a wide range of
physical properties, such as mass, friction, and deformability, by interacting
with them in a goal-driven way. This process of active interaction is in the
same spirit as a scientist performing an experiment to discover hidden facts.
Recent advances in artificial intelligence have yielded machines that can
achieve superhuman performance in Go, Atari, natural language processing, and
complex control problems, but it is not clear that these systems can rival the
scientific intuition of even a young child. In this work we introduce a basic
set of tasks that require agents to estimate hidden properties such as mass and
cohesion of objects in an interactive simulated environment where they can
manipulate the objects and observe the consequences. We found that state-of-the-art
deep reinforcement learning methods can learn to perform the experiments
necessary to discover such hidden properties. By systematically manipulating
the problem difficulty and the cost incurred by the agent for performing
experiments, we found that agents learn different strategies that balance the
cost of gathering information against the cost of making mistakes in different
situations.
Pavol Bielik, Veselin Raychev, Martin Vechev
Subjects: Programming Languages (cs.PL); Learning (cs.LG)
To be practically useful, modern static analyzers must precisely model the
effects of both the statements of the programming language and the frameworks
used by the program under analysis. While important, manually addressing these
challenges is difficult for at least two reasons: (i) the effects on the
overall analysis can be non-trivial, and (ii) as the size and complexity of
modern libraries increase, so does the number of cases the analysis must handle.
In this paper we present a new, automated approach for creating static
analyzers: instead of manually providing the various inference rules of the
analyzer, the key idea is to learn these rules from a dataset of programs. Our
method consists of two ingredients: (i) a synthesis algorithm capable of
learning a candidate analyzer from a given dataset, and (ii) a counter-example
guided learning procedure which generates new programs beyond those in the
initial dataset, critical for discovering corner cases and ensuring the learned
analysis generalizes to unseen programs.
We implemented and instantiated our approach to the task of learning a
points-to analysis for JavaScript, a challenging yet important problem that has
received significant research attention. We show that our approach is
effective: our system automatically discovered practical and useful inference
rules for many corner cases that are tricky to manually identify and are missed
by state-of-the-art, manually tuned solutions.
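At a high level, the counter-example guided procedure can be pictured as the following loop; this is our own abstraction of the description above, with synthesize and find_counterexample left as placeholder callables:

```python
# High-level sketch (our abstraction, not the authors' system) of a
# counter-example guided learning loop: synthesize a candidate analyzer from
# the dataset, search for a program on which it misbehaves, and add that
# program to the dataset before retrying.
def learn_analyzer(dataset, synthesize, find_counterexample, max_rounds=10):
    analyzer = None
    for _ in range(max_rounds):
        analyzer = synthesize(dataset)            # fit candidate inference rules
        cex = find_counterexample(analyzer)       # generate a program that breaks them
        if cex is None:
            return analyzer                       # no disagreement found
        dataset.append(cex)                       # refine the dataset and retry
    return analyzer
```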
Gyuwan Kim, Hayoon Yi, Jangho Lee, Yunheung Paek, Sungroh Yoon
Comments: 12 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
In computer security, designing a robust intrusion detection system is one of
the most fundamental and important problems. In this paper, we propose a
system-call language-modeling approach for designing anomaly-based host
intrusion detection systems. To remedy the issue of high false-alarm rates
commonly arising in conventional methods, we employ a novel ensemble method
that blends multiple thresholding classifiers into a single one, making it
possible to accumulate ‘highly normal’ sequences. The proposed system-call
language model has various advantages leveraged by the fact that it can learn
the semantic meaning and interactions of each system call that existing methods
cannot effectively consider. Through diverse experiments on public benchmark
datasets, we demonstrate the validity and effectiveness of the proposed method.
Moreover, we show that our model possesses high portability, which is one of
the key aspects of realizing successful intrusion detection systems.
Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Previous work combines word-level and character-level representations using
concatenation or scalar weighting, which is suboptimal for high-level tasks
like reading comprehension. We present a fine-grained gating mechanism to
dynamically combine word-level and character-level representations based on
properties of the words. We also extend the idea of fine-grained gating to
modeling the interaction between questions and paragraphs for reading
comprehension. Experiments show that our approach can improve the performance
on reading comprehension tasks, achieving new state-of-the-art results on the
Children’s Book Test dataset. To demonstrate the generality of our gating
mechanism, we also show improved results on a social media tag prediction task.
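A minimal sketch of the general gating form (our assumption about the shape of the mechanism, not the authors' exact parameterization): a per-dimension sigmoid gate, computed from word features, interpolates between the word-level and character-level vectors:

```python
# Minimal numpy sketch of fine-grained gating: a per-dimension gate computed
# from word features mixes word-level and character-level representations.
# Sizes, initialization, and the choice of gate input are illustrative assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8                                   # embedding size (illustrative)
rng = np.random.default_rng(0)
W, b = rng.standard_normal((d, d)) * 0.1, np.zeros(d)

word_emb = rng.standard_normal(d)       # word-level representation
char_emb = rng.standard_normal(d)       # character-level (e.g. char-RNN) representation
features = word_emb                     # gate input: properties of the word (assumption)

g = sigmoid(W @ features + b)           # fine-grained (per-dimension) gate
combined = g * char_emb + (1.0 - g) * word_emb
print(combined.shape)
```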
Dilin Wang, Qiang Liu
Comments: Under review as a conference paper at ICLR 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose a simple algorithm to train stochastic neural networks to draw
samples from given target distributions for probabilistic inference. Our method
is based on iteratively adjusting the neural network parameters so that the
output changes along a Stein variational gradient that maximally decreases the
KL divergence with the target distribution. Our method works for any target
distribution specified by its unnormalized density function, and can train
any black-box architecture that is differentiable with respect to the parameters
we want to adapt. As an application of our method, we propose an amortized MLE
algorithm for training deep energy models, where a neural sampler is adaptively
trained to approximate the likelihood function. Our method mimics an
adversarial game between the deep energy model and the neural sampler, and
obtains realistic-looking images competitive with the state-of-the-art results.
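For reference, the Stein variational gradient direction with an RBF kernel can be computed as below; this is the standard SVGD update shown on plain particles, not the amortized neural sampler described in the abstract:

```python
# Sketch of one Stein variational gradient step with an RBF kernel.
# The amortized scheme in the paper would feed this direction back into the
# parameters of a neural sampler; here we just move particles directly.
import numpy as np

def svgd_direction(x, grad_logp, bandwidth=1.0):
    """x: (n, d) particles; grad_logp: (n, d) gradients of log target at x."""
    n = x.shape[0]
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))           # RBF kernel matrix
    # gradient of k(x_j, x_i) w.r.t. x_j, to be summed over j for each i
    grad_k = (x[:, None, :] - x[None, :, :]) * (-k[..., None] / bandwidth ** 2)
    return (k @ grad_logp + grad_k.sum(axis=0)) / n

# toy target N(0, 1): grad log p(x) = -x
x = np.random.default_rng(0).standard_normal((50, 1))
for _ in range(100):
    x = x + 0.1 * svgd_direction(x, grad_logp=-x)
print(x.mean(), x.var())    # particles should roughly match the target moments
```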
Feras Saad, Vikash Mansinghka
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Sparse databases with hundreds of variables are commonplace. In this setting,
it is both statistically and computationally challenging to detect true
predictive relationships between variables and also to suppress false
positives. This paper proposes a new approach to dependency detection that
combines probabilistic programming, information theory, and non-parametric
Bayesian modeling. The key ideas are to (i) build an ensemble of joint
probability models for the whole database via approximate posterior inference
in CrossCat, a non-parametric factorial mixture; (ii) identify independencies
by analyzing model structures; and (iii) report the distribution on conditional
mutual information induced by posterior uncertainty over the ensemble of
models. This paper presents experiments showing that the approach finds
relationships that pairwise correlation misses, including context-specific
independencies, on databases of mathematics exam scores and global indicators
of macroeconomic development.
Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
language model designed to directly capture the global semantic meaning
relating words in a document via latent topics. Because of their sequential
nature, RNNs are good at capturing the local structure of a word sequence –
both semantic and syntactic – but might face difficulty remembering long-range
dependencies. Intuitively, these long-range dependencies are of semantic
nature. In contrast, latent topic models are able to capture the global
underlying semantic structure of a document but do not account for word
ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
topic models: it captures local (syntactic) dependencies using an RNN and
global (semantic) dependencies using latent topics. Unlike previous work on
contextual RNN language modeling, our model is learned end-to-end. Empirical
results on word prediction show that TopicRNN outperforms existing contextual
RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
extractor for documents. We do this for sentiment analysis and report a new
state-of-the-art error rate on the IMDB movie review dataset that amounts to a
13.3% improvement over the previous best result. Finally, TopicRNN also
yields sensible topics, making it a useful alternative to document models such
as latent Dirichlet allocation.
Yuval Dagan, Yuval Filmus, Ariel Gabizon, Shay Moran
Comments: 77 pages
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Combinatorics (math.CO)
A basic combinatorial interpretation of Shannon’s entropy function is via the
“20 questions” game. This cooperative game is played by two players, Alice and
Bob: Alice picks a distribution \(\pi\) over the numbers \(\{1,\ldots,n\}\), and
announces it to Bob. She then chooses a number \(x\) according to \(\pi\), and Bob
attempts to identify \(x\) using as few Yes/No queries as possible, on average.
An optimal strategy for the “20 questions” game is given by a Huffman code
for \(\pi\): Bob’s questions reveal the codeword for \(x\) bit by bit. This
strategy finds \(x\) using fewer than \(H(\pi)+1\) questions on average. However,
the questions asked by Bob could be arbitrary. In this paper, we investigate
the following question: Are there restricted sets of questions that match the
performance of Huffman codes, either exactly or approximately?
Our first main result shows that for every distribution \(\pi\), Bob has a
strategy that uses only questions of the form “\(x < c\)?” and “\(x = c\)?”, and
uncovers \(x\) using at most \(H(\pi)+1\) questions on average, matching the
performance of Huffman codes in this sense. We also give a natural set of
\(O(rn^{1/r})\) questions that achieve a performance of at most \(H(\pi)+r\), and
show that \(\Omega(rn^{1/r})\) questions are required to achieve such a
guarantee.
Our second main result gives a set \(\mathcal{Q}\) of \(1.25^{n+o(n)}\) questions
such that for every distribution \(\pi\), Bob can implement an optimal strategy
for \(\pi\) using only questions from \(\mathcal{Q}\). We also show that
\(1.25^{n-o(n)}\) questions are needed, for infinitely many \(n\). If we allow a
small slack of \(r\) over the optimal strategy, then roughly \((rn)^{\Theta(1/r)}\)
questions are necessary and sufficient.
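A quick numerical check of the baseline fact the paper starts from, namely that a Huffman code asks between \(H(\pi)\) and \(H(\pi)+1\) questions on average (distribution values below are illustrative only):

```python
# Illustrative check: Huffman's expected codeword length for pi lies between
# H(pi) and H(pi) + 1, which is the benchmark the restricted question sets
# in the paper are compared against.
import heapq
import math

pi = [0.4, 0.3, 0.2, 0.1]

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def huffman_expected_length(p):
    heap = sorted(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b          # each merge adds its probability mass to E[length]
        heapq.heappush(heap, a + b)
    return total

print(entropy(pi), huffman_expected_length(pi))   # H(pi) <= E[length] < H(pi) + 1
```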
Lu Hou, Quanming Yao, James T. Kwok
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
Deep neural network models, though very powerful and highly successful, are
computationally expensive in terms of space and time. Recently, there have been
a number of attempts at binarizing the network weights and activations. This
greatly reduces the network size and replaces the underlying multiplications
with additions or even XNOR bit operations. However, existing binarization
schemes are based on simple matrix approximation and ignore the effect of
binarization on the loss. In this paper, we propose a proximal Newton algorithm
with diagonal Hessian approximation that directly minimizes the loss w.r.t. the
binarized weights. The underlying proximal step has an efficient closed-form
solution, and the second-order information can be efficiently obtained from the
second moments already computed by the Adam optimizer. Experiments on both
feedforward and recurrent networks show that the proposed loss-aware
binarization algorithm outperforms existing binarization schemes, and is also
more robust for wide and deep networks.
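A hedged sketch of the per-layer closed-form proximal step suggested by the abstract, with the diagonal Hessian estimate d standing in for Adam's second moments; the variable names and exact form are our reading of the description, not the authors' code:

```python
# Hedged sketch of a loss-aware binarization step: the real-valued weights are
# mapped to alpha * sign(w), where the per-layer scale alpha is weighted by a
# diagonal Hessian estimate d (e.g. Adam's second-moment statistics).
import numpy as np

def loss_aware_binarize(w, d):
    """w: real-valued layer weights; d: diagonal Hessian estimate (same shape)."""
    alpha = np.sum(d * np.abs(w)) / np.sum(d)   # Hessian-weighted per-layer scale
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal(1000) * 0.05
d = rng.uniform(0.1, 1.0, size=1000)            # stand-in for Adam's second moments
wb = loss_aware_binarize(w, d)
print(np.unique(np.round(wb, 4)))               # two values: {-alpha, +alpha}
```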
James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
Comments: Submitted to conference track at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks are a powerful tool for modeling sequential data,
but the dependence of each timestep’s computation on the previous timestep’s
output limits parallelism and makes RNNs unwieldy for very long sequences. We
introduce quasi-recurrent neural networks (QRNNs), an approach to neural
sequence modeling that alternates convolutional layers, which apply in parallel
across timesteps, and a minimalist recurrent pooling function that applies in
parallel across channels. Despite lacking trainable recurrent layers, stacked
QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
size. Due to their increased parallelism, they are up to 16 times faster at
train and test time. Experiments on language modeling, sentiment
classification, and character-level neural machine translation demonstrate
these advantages and underline the viability of QRNNs as a basic building block
for a variety of sequence tasks.
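A minimal numpy sketch of the QRNN idea, with shapes and initialization chosen purely for illustration: masked 1-D convolutions compute candidate, forget, and output gates for all timesteps in parallel, and a lightweight per-channel "fo-pooling" recurrence is the only sequential part:

```python
# Minimal sketch of a QRNN-style layer: causal convolutions over time produce
# candidate/forget/output activations in parallel; fo-pooling then mixes them
# with an element-wise recurrence. Sizes and weights are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

T, d_in, d_hid, k = 20, 16, 32, 2      # timesteps, input/hidden size, filter width
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d_in))
Wz, Wf, Wo = (rng.standard_normal((k * d_in, d_hid)) * 0.1 for _ in range(3))

# causal ("masked") convolution: each timestep sees the current and previous inputs
x_pad = np.vstack([np.zeros((k - 1, d_in)), x])
windows = np.stack([x_pad[t:t + k].ravel() for t in range(T)])   # (T, k*d_in)
z, f, o = np.tanh(windows @ Wz), sigmoid(windows @ Wf), sigmoid(windows @ Wo)

c = np.zeros(d_hid)
h = np.zeros((T, d_hid))
for t in range(T):                      # fo-pooling: the only sequential step
    c = f[t] * c + (1.0 - f[t]) * z[t]
    h[t] = o[t] * c
print(h.shape)
```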
Philip Blair, Yuval Merhav, Joel Barry
Comments: 12 pages
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We propose a language-agnostic way of automatically generating sets of
semantically similar clusters of entities along with sets of “outlier”
elements, which may then be used to perform an intrinsic evaluation of word
embeddings in the outlier detection task. We used our methodology to create a
gold-standard dataset, which we call WikiSem500, and evaluated multiple
state-of-the-art embeddings. The results show a correlation between performance
on this dataset and performance on sentiment analysis.
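The outlier detection task itself is easy to sketch; assuming the common formulation where the outlier is the element with the lowest average cosine similarity to the rest of the cluster (the official scoring details may differ):

```python
# Hedged sketch of the intrinsic outlier-detection evaluation: given vectors for
# a semantic cluster plus one outlier, flag the element least similar to the rest.
import numpy as np

def find_outlier(vectors):
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ v.T                       # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)
    mean_sim = sims.sum(axis=1) / (len(v) - 1)
    return int(np.argmin(mean_sim))      # index of the predicted outlier

rng = np.random.default_rng(0)
cluster = rng.standard_normal((5, 50)) + 3.0    # tight toy cluster
outlier = rng.standard_normal((1, 50)) - 3.0    # toy outlier
print(find_outlier(np.vstack([cluster, outlier])))   # expect index 5
```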
Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a highly accurate single-image super-resolution (SR) method. Our
method uses a very deep convolutional network inspired by VGG-net used for
ImageNet classification \cite{simonyan2015very}. We find that increasing the
network depth yields a significant improvement in accuracy. Our final model uses 20
weight layers. By cascading small filters many times in a deep network
structure, contextual information over large image regions is exploited in an
efficient way. With very deep networks, however, convergence speed becomes a
critical issue during training. We propose a simple yet effective training
procedure. We learn residuals only and use extremely high learning rates
(\(10^4\) times higher than SRCNN \cite{dong2015image}) enabled by adjustable
gradient clipping. Our proposed method outperforms existing methods in
accuracy, and visual improvements in our results are easily noticeable.
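A toy sketch of the two training ideas highlighted above, in our own formulation rather than the authors' code: residual learning adds the predicted residual back to the interpolated input, and the adjustable clipping bound theta/lr shrinks as the learning rate grows:

```python
# Hedged sketch: (i) residual learning, where the network predicts only the
# residual that is added back to the interpolated low-resolution input, and
# (ii) adjustable gradient clipping to [-theta/lr, theta/lr], which keeps very
# high learning rates stable. Values are illustrative.
import numpy as np

def reconstruct(bicubic_upscaled, predicted_residual):
    """Residual learning: SR output = interpolated LR input + learned residual."""
    return bicubic_upscaled + predicted_residual

def adjustable_clip(grads, theta, lr):
    """Clip each gradient to [-theta/lr, theta/lr]; the bound adapts to the learning rate."""
    bound = theta / lr
    return [np.clip(g, -bound, bound) for g in grads]

print(adjustable_clip([np.array([5.0, -20.0, 0.3])], theta=0.01, lr=0.1))
```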
Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We propose an image super-resolution (SR) method using a deeply-recursive
convolutional network (DRCN). Our network has a very deep recursive layer (up
to 16 recursions). Increasing recursion depth can improve performance without
introducing new parameters for additional convolutions. Despite these advantages,
learning a DRCN is very hard with a standard gradient descent method due to
exploding/vanishing gradients. To ease the difficulty of training, we propose
two extensions: recursive-supervision and skip-connection. Our method
outperforms previous methods by a large margin.
Hua Sun, Syed A. Jafar
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
The capacity has recently been characterized for the private information
retrieval (PIR) problem as well as several of its variants. In every case it is
assumed that all the queries are generated by the user simultaneously. Here we
consider multiround PIR, where the queries in each round are allowed to depend
on the answers received in previous rounds. We show that the capacity of
multiround PIR is the same as the capacity of single-round PIR (the result is
generalized to also include \(T\)-privacy constraints). Combined with previous
results, this shows that there is no capacity advantage from multiround over
single-round schemes, non-linear over linear schemes, or from \(\epsilon\)-error
over zero-error schemes. However, we show through an example that there is an
advantage in terms of storage overhead. We provide an example of a multiround,
non-linear, \(\epsilon\)-error PIR scheme that requires a strictly smaller
storage overhead than the best possible with single-round, linear, zero-error
PIR schemes.
Ragnar Freij-Hollanti, Oliver Gnilke, Camilla Hollanti, David Karpuk
Subjects: Information Theory (cs.IT)
We present a general framework for Private Information Retrieval (PIR) from
arbitrary coded databases, that allows one to adjust the rate of the scheme
according to the suspected number of colluding servers. If the storage code is
a generalized Reed-Solomon code of length n and dimension k, we design PIR
schemes which simultaneously protect against t colluding servers and provide
PIR rate 1-(k+t-1)/n, for all t between 1 and n-k. This interpolates between
the previously studied cases of t=1 and k=1 and asymptotically achieves the
known capacity bounds in both of these cases, as the size of the database
grows.
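A quick numeric check of the stated rate, R = 1 - (k + t - 1)/n, at a few illustrative parameter values:

```python
# Illustrative evaluation of the PIR rate stated above for an [n, k] generalized
# Reed-Solomon storage code with t colluding servers.
def pir_rate(n, k, t):
    assert 1 <= t <= n - k
    return 1 - (k + t - 1) / n

print(pir_rate(n=10, k=3, t=1))   # 0.7  (the previously studied t = 1 case)
print(pir_rate(n=10, k=3, t=4))   # 0.4
print(pir_rate(n=10, k=1, t=4))   # 0.6  (the previously studied k = 1 case)
```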
Yirui Cong, Xiangyun Zhou, Rodney A. Kennedy
Comments: The paper is accepted for publication in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
This paper studies a wireless network consisting of multiple
transmitter-receiver pairs where interference is treated as noise. Previously,
the throughput region of such networks was characterized for either one time
slot or an infinite time horizon. We aim to fill the gap by investigating the
throughput region for transmissions over a finite time horizon. Unlike the
infinite-horizon throughput region, which is simply the convex hull of the
throughput region of one time slot, the finite-horizon throughput region is
generally non-convex. Instead of directly characterizing all achievable
rate-tuples in the finite-horizon throughput region, we propose a metric termed
the rate margin, which not only determines whether any given rate-tuple is
within the throughput region (i.e., achievable or unachievable), but also tells
the amount of scaling that can be done to the given achievable (unachievable)
rate-tuple such that the resulting rate-tuple is still within (brought back
into) the throughput region. Furthermore, we derive an efficient algorithm to
find the rate-achieving policy for any given rate-tuple in the finite-horizon
throughput region.
Nima N. Moghadam, Hossein Shokri-Ghadikolaei, Gabor Fodor, Mats Bengtsson, Carlo Fischione
Comments: Submitted to IEEE Journal on Selected Areas in Communications
Subjects: Information Theory (cs.IT)
Although the benefits of precoding and combining of data signals are widely
recognized, the potential of these techniques for pilot transmission is not
fully understood. This is particularly relevant for multiuser multiple-input
multiple-output (MU-MIMO) cellular systems using millimeter-wave (mmWave)
communications, where multiple antennas will have to be used both at the
transmitter and the receiver to overcome the severe path loss. In this paper,
we characterize the gains of pilot precoding and combining in terms of channel
estimation quality and achievable data rate. Specifically, we consider three
uplink pilot transmission scenarios in a mmWave MU-MIMO cellular system: 1)
non-precoded and uncombined, 2) precoded but uncombined, and 3) precoded and
combined. We show that a simple precoder that utilizes only the second-order
statistics of the channel reduces the variance of the channel estimation error
by a factor that is proportional to the number of user equipment (UE) antennas.
We also show that using a linear combiner designed based on the second-order
statistics of the channel significantly reduces multiuser interference and
provides the possibility of reusing some pilots. Specifically, in the large
antenna regime, pilot precoding and combining help to accommodate a large
number of UEs in one cell, significantly improve channel estimation quality,
boost the signal-to-noise ratio of the UEs located close to the cell edges,
alleviate pilot contamination, and address the imbalanced coverage of pilot and
data signals.
Shihao Yan, Xiangyun Zhou, Nan Yang, Biao He, Thushara D. Abhayapala
Comments: Accepted by IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
This work for the first time examines the impact of transmitter-side
correlation on the artificial-noise-aided secure transmission, based on which a
new power allocation strategy for artificial noise (AN) is devised for physical
layer security enhancement. Specifically, we design a correlation-based power
allocation (CPA) for AN, of which the optimality in terms of achieving the
minimum secrecy outage probability is analytically proved in the large system
regime with the number of transmit antennas approaching infinity. In order to
fully reveal the benefits of the CPA, we derive easy-to-evaluate expressions
for the secrecy outage probability achieved by the CPA. Our study demonstrates
that the CPA is nearly optimal and significantly outperforms the widely-used
uniform power allocation (UPA) even for a moderately small number of correlated
transmit antennas. Furthermore, our numerical results reveal a fundamental
difference between the CPA and the UPA: as the number of correlated
transmit antennas increases, the secrecy outage probability of the CPA always
decreases, while the secrecy outage probability of the UPA suffers from a
saturation point.
Ardhendu Tripathy, Aditya Ramamoorthy
Comments: 27 pages, 7 figures, preprint submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
A sum-network is an instance of a network coding problem over a directed
acyclic network in which each terminal node wants to compute the sum over a
finite field of the information observed at all the source nodes. Many
characteristics of the well-studied multiple unicast network communication
problem also hold for sum-networks due to a known reduction between instances
of these two problems. In this work, we describe an algorithm to construct
families of sum-network instances using incidence structures. The computation
capacity of several of these sum-network families is characterized. We
demonstrate that unlike the multiple unicast problem, the computation capacity
of sum-networks depends on the characteristic of the finite field over which
the sum is computed. This dependence is very strong; we show examples of
sum-networks that have a rate-1 solution over one characteristic but a rate
close to zero over a different characteristic. Additionally, a sum-network can
have an arbitrary number of different computation capacities for different
alphabets. This is in contrast to the multiple unicast problem, where it is known
that the capacity is independent of the network coding alphabet.
Ming Ding, David Lopez Perez
Comments: 6 pages, 5 figures, to appear in IEEE Globecom 2016. arXiv admin note: substantial text overlap with arXiv:1608.06694
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
In this paper, we present a new and significant theoretical discovery. If the
absolute height difference between base station (BS) antenna and user equipment
(UE) antenna is larger than zero, then the network capacity performance in
terms of the area spectral efficiency (ASE) will continuously decrease as the
BS density increases for ultra-dense (UD) small cell networks (SCNs). This
performance behavior has a tremendous impact on the deployment of UD SCNs in
the 5th-generation (5G) era. Network operators may invest large amounts of
money in deploying more network infrastructure only to obtain even worse
network performance. Our results reveal that the SCN BS antenna height must be
lowered to the UE antenna height in order to fully achieve the capacity
gains of UD SCNs in 5G. However, this requires a revolutionary approach to BS
architecture and deployment, which is explored in this paper too.
Yahia Alghorani, Georges Kaddoum, Sami Muhaidat, Samuel Pierre
Journal-ref: IEEE Wireless Communications Letters, Vol. 4, No. 4, August 2015
Subjects: Information Theory (cs.IT)
In this letter, we consider the problem of energy detection of unknown
signals in an intervehicular communication (IVC) system over n*Rayleigh fading
channels (also known as cascaded Rayleigh). Novel tight approximations for the
probability of detection are derived for the no-diversity and the maximum ratio
combining (MRC) diversity schemes. Moreover, we investigate the system
performance when cooperative spectrum sensing (CSS) is considered with and
without imperfect reporting channels. The analytical results show that the
detection reliability is decreased as the fading severity parameter n increases
but reliability is substantially improved when CSS employs MRC schemes.
Yehuda Dar, Alfred M. Bruckstein
Subjects: Information Theory (cs.IT)
In this work we study the topic of high-resolution adaptive sampling of a
given deterministic signal and establish a clear connection to classic analyses
of high-rate quantization. More specifically, we formulate solutions to the
task of optimal high-resolution sampling of one-dimensional signals, which are
shown as the counterparts of well-known results for high-rate scalar
quantization. Our results reveal that the optimal high-resolution sampling
structure is determined by the density of the signal-derivative energy, just as
the probability density function defines the optimal high-rate quantization
form. This paper has three main contributions: the first is establishing a
fundamental paradigm bridging the topics of sampling and quantization. The
second is a theoretical analysis of sampling that is relevant to the emerging
field of high-resolution signal processing. The third is a new approach for
nonuniform sampling of one-dimensional signals that is experimentally shown to
outperform an optimized tree-structured sampling technique.
Junfeng Guo, Zhaozhe Song, Ying Cui
Comments: submitted to ICC 2017
Subjects: Information Theory (cs.IT)
To increase mobile batteries’ lifetime and improve quality of experience for
computation-intensive and latency-sensitive applications, mobile edge computing
has received significant interest. Designing energy-efficient mobile edge
computing systems requires joint optimization of communication and computation
resources. In this paper, we consider energy-efficient resource allocation for
a multi-user mobile edge computing system. First, we establish two
computation-efficient models with negligible and non-negligible base station
(BS) executing durations, respectively. Then, under each model, we formulate
the overall weighted sum energy consumption minimization problem by optimally
allocating communication and computation resources. The optimization problem
for negligible BS executing duration is convex, and we obtain its optimal
solution in closed form. The optimization problem for non-negligible BS
executing duration is NP-hard in general, and we obtain a low-complexity
sub-optimal solution to it by connecting it to a
three-stage flow-shop scheduling problem and wisely utilizing Johnson’s
algorithm. Finally, numerical results show that the proposed solutions
outperform some baseline schemes.
Kaushik Majumdar, Srinath Jayachandran
Comments: 11 figures, 3 tables
Subjects: Information Theory (cs.IT)
A one-dimensional time-domain analog signal s(t) can be visualized as the
trajectory of a moving particle in a force field with one degree of freedom.
Then the power of the particle at point t is P(s(t)) = s''(t)s'(t), which is the
rate at which kinetic energy is dissipated by the particle (assuming the
particle has unit mass) in order to create the trajectory or give shape to the
signal. Assuming that the meaning of the signal, or its semantic information, lies in its
shape, we can say that P(s(t)) is the rate at which kinetic energy of the
particle is dissipated to encode semantic information in s(t) at t. After s(t)
is digitized (to make it s[n]) the discrete form P(s[n]) is valid. Considering
the sign changes of P(s[n]) it has been shown that in the smallest neighborhood
of n, in which n is the middle point, semantic information in s[n] can be
encoded in 13 distinct ways. This list is exhaustive. A deterministic finite
automaton (DFA) has been designed which can accept any finite length digital
signal and therefore collection of all finite length digital signals forms a
regular language. The DFA has been generalized to a weighted finite state
transducer (WFST), which has been used to identify action potentials in a spike
train and also to distinguish two speakers when uttering the same phoneme. It
has been shown that in any analog signal semantic information can be encoded at
a point in the form of the shape of its infinitesimal neighborhood in 17
distinct ways. The list is exhaustive. A new entropy measure called semantic
entropy has been introduced. It has been shown that a signal s(t) is traceable
on a piece of paper or on an oscilloscope only if s''(t) exists on all but at
most a finite number of points within any finite interval. This is an essential
condition for a signal to be the trajectory of a moving particle.
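An illustrative discrete version of the quantity defined above, computing P(s[n]) = s''[n] s'[n] with finite differences on a toy signal (our own toy code, not the authors' DFA/WFST construction):

```python
# Illustrative computation of the discrete semantic power P(s[n]) = s''[n]*s'[n]
# using finite differences; the test signal is a toy sinusoid.
import numpy as np

def semantic_power(s):
    s = np.asarray(s, dtype=float)
    ds = np.gradient(s)            # discrete first derivative s'
    d2s = np.gradient(ds)          # discrete second derivative s''
    return d2s * ds

t = np.linspace(0, 2 * np.pi, 200)
P = semantic_power(np.sin(t))
print(np.sign(P[:5]), P.max(), P.min())   # sign changes of P encode the local shape
```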
Wonjae Shin, Mojtaba Vaezi, Byungju Lee, David J. Love, Jungwoo Lee, H. Vincent Poor
Comments: 13 pages, 4 figures, submitted to IEEE Communications Magazine
Subjects: Information Theory (cs.IT)
Non-orthogonal multiple access (NOMA) is a potential enabler for the
development of 5G and beyond wireless networks. By allowing multiple users to
share the same time and frequency, NOMA can scale up the number of served
users, increase the spectral efficiency, and improve user-fairness compared to
existing orthogonal multiple access (OMA) techniques. While single-cell NOMA
has drawn significant attention recently, much less attention has been given to
multi-cell NOMA. This article discusses the opportunities and challenges of
NOMA in a multi-cell environment. As the density of base stations and devices
increases, inter-cell interference becomes a major obstacle in multi-cell
networks. As such, identifying techniques that combine interference management
approaches with NOMA is of great significance. After discussing the theory
behind NOMA, this paper provides an overview of the current literature and
discusses key implementation and research challenges, with an emphasis on
multi-cell NOMA.
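For concreteness, a hedged sketch of the basic two-user power-domain NOMA rates the article builds on, with superposition coding at the transmitter and successive interference cancellation (SIC) at the strong user; all parameter values are illustrative:

```python
# Hedged sketch of two-user power-domain NOMA: both users share one resource,
# the weak user treats the strong user's signal as noise, and the strong user
# removes the weak user's signal via SIC before decoding its own.
import numpy as np

P, N0 = 1.0, 1.0                       # total transmit power, noise power (illustrative)
g_weak, g_strong = 0.2, 2.0            # channel gains |h|^2
a_weak, a_strong = 0.8, 0.2            # power-allocation coefficients (sum to 1)

R_weak = np.log2(1 + a_weak * P * g_weak / (a_strong * P * g_weak + N0))
R_strong = np.log2(1 + a_strong * P * g_strong / N0)   # after SIC
print(R_weak, R_strong)
```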
Md Abdul Alim, Tianyi Pan, My Tra Thai, Walid Saad
Comments: 34 pages
Subjects: Information Theory (cs.IT)
Device-to-device (D2D) communication over licensed wireless spectrum has
recently been proposed as a promising technology to meet the capacity crunch of
next generation cellular networks. However, due to the high mobility of
cellular devices, establishing and ensuring the success of D2D transmission
becomes a major challenge. To this end, in this paper, a novel framework is
proposed to enable devices to form multi-hop D2D connections in an effort to
maintain sustainable communication in the presence of device mobility. To solve
the problem posed by device mobility, in contrast to existing works, which
mostly focus on physical-domain information, a durable community-based approach
is introduced that takes social encounters into account. It is shown that the
proposed scheme can derive an optimal solution for time sensitive content
transmission while also minimizing the cost that the base station pays in order
to incentivize users to participate in D2D. Simulation results show that the
proposed social community aware approach yields significant performance gain,
in terms of the amount of traffic offloaded from the cellular network to the
D2D tier, compared to the classical social-unaware methods.
Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gunduz
Comments: Submitted for publication
Subjects: Information Theory (cs.IT)
Decentralized coded caching is studied in a content delivery system with a
server holding a database of \(N\) files, each of size \(F\) bits, serving \(K\)
users, each equipped with a cache memory, not necessarily of equal size. It is
assumed that the users’ caches are filled in advance during the off-peak
traffic period in a decentralized manner, i.e., without the knowledge of the
number of active users, their identities, or their particular demands. User
demands are revealed during the peak traffic period, and are served
simultaneously through an error-free shared link. A group-based decentralized
coded caching scheme is proposed, and it is shown to improve upon the
state-of-the-art in terms of the minimum required delivery rate over the shared
link, when there are more users in the system than files. Numerical results
indicate that the improvement is more significant as the cache capacities of
the users become more skewed. A new lower bound on the delivery rate is also
presented, which provides a tighter bound than the classical cut-set bound.
Amir H. Jafari, Vijay Venkateswaran, David Lopez-Perez, Jie Zhang
Comments: 13 pages, 7 figures. Submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
In ultra-dense small cell networks, spatial multiplexing gain is a challenge
because of the different propagation conditions. The channels associated with
different transmit-receive pairs can be highly correlated due to i) the high
probability of line-of-sight (LOS) communication between user equipment (UE)
and base station (BS), and ii) insufficient spacing between antenna elements at
both UE and BS. In this paper, we propose a novel transmission technique titled
Diversity Pulse Shaped Transmission (DPST) to enhance the throughput over the
correlated MIMO channels in an ultra-dense small cell network. The fundamental
idea of DPST is to shape the transmit signals at adjacent antennas with distinct
interpolating filters, introducing pulse shaping diversity. In DPST, each
antenna transmits its own data stream with a relative deterministic time
offset, which must be a fraction of the symbol period, with respect to the
adjacent antenna. The delay is interpolated with the pulse shaped signal
generating a virtual MIMO channel that benefits from increased diversity from
the receiver perspective. To extract the diversity, the receiver must operate
in an over-sampled domain and hence a fractionally spaced equaliser (FSE) is
proposed. The joint impact of DPST and FSE helps the receiver to sense a less
correlated channel, eventually enhancing the UE’s throughput. Moreover, in
order to minimise the spatial correlation, we aim to optimise the deterministic
fractional delay. Simulation results show that applying DPST to a correlated
channel can approximately enhance the UE throughput by 1.93x and 3.76x in 2×2
and 4×4 MIMO systems, respectively.
Jian Du, Shaodan Ma, Yik-Chung Wu, Soummya Kar, José M. F. Moura
Comments: 30 pages
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)
In networks such as the smart grid, communication networks, and social
networks, local measurements/observations are scattered over a wide
geographical area. Centralized inference algorithms are based on gathering all
the observations at a central processing unit. However, with data explosion and
ever-increasing network sizes, centralized inference suffers from large
communication overhead, heavy computation burden at the center, and
susceptibility to central node failure. This paper considers inference over
networks using factor graphs and a distributed inference algorithm based on
Gaussian belief propagation. The distributed inference involves only local
computation of the information matrix and of the mean vector and message
passing between neighbors. We discover and show analytically that the message
information matrix converges exponentially fast to a unique positive definite
limit matrix for arbitrary positive semidefinite initialization. We provide the
necessary and sufficient convergence condition for the belief mean vector to
converge to the optimal centralized estimator. An easily verifiable sufficient
convergence condition on the topology of a factor graph is further provided.
Neri Merhav
Comments: 18 pages, 2 figures, submitted for publication
Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT)
We consider a system model of a general finite-state machine (ratchet) that
simultaneously interacts with three kinds of reservoirs: a heat reservoir, a
work reservoir, and an information reservoir, the latter being taken to be a
running digital tape whose symbols interact sequentially with the machine. As
has been shown in earlier work, this finite-state machine can act as a demon
(with memory), which creates a net flow of energy from the heat reservoir into
the work reservoir (thus extracting useful work) at the price of increasing the
entropy of the information reservoir. Under very few assumptions, we propose a
simple derivation of a family of inequalities that relate the work extraction
with the entropy production. These inequalities can be seen as either upper
bounds on the extractable work or as lower bounds on the entropy production,
depending on the point of view. Many of these bounds are relatively easy to
calculate and they are tight in the sense that equality can be approached
arbitrarily closely. In their basic forms, these inequalities are applicable to
any finite number of cycles (and not only asymptotically), and for a general
input information sequence (possibly correlated), which is not necessarily
assumed even stationary. Several known results are obtained as special cases.
Sven Müelich, Martin Bossert
Comments: 6 pages
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Error correction is an indispensable component when Physical Unclonable
Functions (PUFs) are used in cryptographic applications. So far, there exist
schemes that obtain helper data, which they need within the error correction
process. We introduce a new scheme, which only uses an error correcting code
without any further helper data. The main idea is to construct for each PUF
instance an individual code which contains the initial PUF response as
a codeword. In this work we use LDPC codes; however, other code classes are also
possible. Our scheme allows a trade-off between code rate and cryptographic
security. In addition, decoding with linear complexity is possible.
Wentao Huang, Kechen Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
A framework is presented for unsupervised learning of representations based
on infomax principle for large-scale neural populations. We use an asymptotic
approximation to Shannon’s mutual information for a large neural population
to demonstrate that a good initial approximation to the global
information-theoretic optimum can be obtained by a hierarchical infomax method.
From the initial solution, an efficient algorithm based on gradient descent of
the final objective function is proposed to learn representations from the
input datasets, allowing complete, overcomplete, or undercomplete bases. As
confirmed by numerical experiments, our method is robust and highly efficient
for extracting salient features from image datasets. Compared with the main
existing methods, our algorithm has a distinct advantage in both the training
speed and the robustness of unsupervised representation learning.
Johannes Ballé, Valero Laparra, Eero P. Simoncelli
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
We describe an image compression system, consisting of a nonlinear encoding
transformation, a uniform quantizer, and a nonlinear decoding transformation.
Like many deep neural network architectures, the transforms consist of layers
of convolutional linear filters and nonlinear activation functions, but we use
a joint nonlinearity that implements a form of local gain control, inspired by
those used to model biological neurons. Using a variant of stochastic gradient
descent, we jointly optimize the system for rate-distortion performance over a
database of training images, introducing a continuous proxy for the
discontinuous loss function arising from the quantizer. The relaxed
optimization problem resembles that of variational autoencoders, except that it
must operate at any point along the rate-distortion curve, whereas the
optimization of generative models aims only to minimize entropy of the data
under the model. Across an independent database of test images, we find that
the optimized coder exhibits significantly better rate-distortion performance
than the standard JPEG and JPEG 2000 compression systems, as well as a dramatic
improvement in visual quality of compressed images.
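A hedged sketch of a generalized-divisive-normalization style joint nonlinearity of the kind described above (our simplification; in the paper the trained transform applies such normalization per spatial location across channels, with learned parameters):

```python
# Hedged sketch of a GDN-style joint nonlinearity: each channel is divided by a
# learned, signal-dependent norm of all channels, implementing local gain control.
# beta and gamma are learned in the paper; here they are random placeholders.
import numpy as np

def gdn(x, beta, gamma):
    """x: (channels,) activations; beta: (channels,); gamma: (channels, channels)."""
    denom = np.sqrt(beta + gamma @ (x ** 2))
    return x / denom

rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal(C)
beta = np.full(C, 1e-3)
gamma = np.abs(rng.standard_normal((C, C))) * 0.1
print(gdn(x, beta, gamma))
```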