Tim Rocktäschel, Sebastian Riedel
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
We introduce neural networks for end-to-end differentiable theorem proving
that operate on dense vector representations of symbols. These neural networks
are constructed recursively by taking inspiration from the backward chaining
algorithm as used in Prolog. Specifically, we replace symbolic unification with
a differentiable computation on vector representations of symbols using a
radial basis function kernel, thereby combining symbolic reasoning with
learning subsymbolic vector representations. By using gradient descent, the
resulting neural network can be trained to infer facts from a given incomplete
knowledge base. It learns to (i) place representations of similar symbols in
close proximity in a vector space, (ii) make use of such similarities to prove
facts, (iii) induce logical rules, and (iv) use provided and induced logical
rules for complex multi-hop reasoning. We demonstrate that this architecture
outperforms ComplEx, a state-of-the-art neural link prediction model, on four
benchmark knowledge bases while at the same time inducing interpretable
function-free first-order logic rules.
Friedemann Zenke, Surya Ganguli
Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A vast majority of computation in the brain is performed by spiking neural
networks. Despite the ubiquity of such spiking, we currently lack an
understanding of how biological spiking neural circuits learn and compute
in-vivo, as well as how we can instantiate such capabilities in artificial
spiking circuits in-silico. Here we revisit the problem of supervised learning
in temporally coding multi-layer spiking neural networks. First, by using a
surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based
three factor learning rule capable of training multi-layer networks of
deterministic integrate-and-fire neurons to perform nonlinear computations on
spatiotemporal spike patterns. Second, inspired by recent results on feedback
alignment, we compare the performance of our learning rule under different
credit assignment strategies for propagating output errors to hidden units.
Specifically, we test uniform, symmetric and random feedback, finding that
simpler tasks can be solved with any type of feedback, while more complex tasks
require symmetric feedback. In summary, our results open the door to obtaining
a better scientific understanding of learning and computation in spiking neural
networks by advancing our ability to train them to solve nonlinear problems
involving transformations between different spatiotemporal spike-time patterns.
Julien Perez, Tomi Silander
Comments: 11 pages, 1 figure, 1 table
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Partially observable environments present an important open challenge in the
domain of sequential control learning with delayed rewards. Despite numerous
attempts during the two last decades, the majority of reinforcement learning
algorithms and associated approximate models, applied to this context, still
assume Markovian state transitions. In this paper, we explore the use of a
recently proposed attention-based model, the Gated End-to-End Memory Network,
for sequential control. We call the resulting model the Gated End-to-End Memory
Policy Network. More precisely, we use a model-free value-based algorithm to
learn policies for partially observed domains using this memory-enhanced neural
network. This model is end-to-end learnable and it features unbounded memory.
Indeed, because of its attention mechanism and associated non-parametric
memory, the proposed model allows us to define an attention mechanism over the
observation stream unlike recurrent models. We show encouraging results that
illustrate the capability of our attention-based model in the context of the
continuous-state non-stationary control problem of stock trading. We also
present an OpenAI Gym environment for simulated stock exchange and explain its
relevance as a benchmark for the field of non-Markovian decision process
learning.
Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
Comments: 11 pages, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Generative Adversarial Networks (GANs) have gathered a lot of attention from
the computer vision community, yielding impressive results for image
generation. Advances in the adversarial generation of natural language from
noise however are not commensurate with the progress made in generating images,
and still lag far behind likelihood based methods. In this paper, we take a
step towards generating natural language with a GAN objective alone. We
introduce a simple baseline that addresses the discrete output space problem
without relying on gradient estimators and show that it is able to achieve
state-of-the-art results on a Chinese poem generation dataset. We present
quantitative results on generating sentences from context-free and
probabilistic context-free grammars, and qualitative language modeling results.
A conditional version is also described that can generate sequences conditioned
on sentence characteristics.
Larissa Albantakis
Comments: This article is a contribution to the FQXi 2016-2017 essay contest “Wandering Towards a Goal”
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
What does it take for a system, biological or not, to have goals? Here, this
question is approached in the context of in silico artificial evolution. By
examining the informational and causal properties of artificial organisms
(‘animats’) controlled by small, adaptive neural networks (Markov Brains), this
essay discusses necessary requirements for intrinsic information, autonomy, and
meaning. The focus lies on comparing two types of Markov Brains that evolved in
the same simple environment: one with purely feedforward connections between
its elements, the other with an integrated set of elements that causally
constrain each other. While both types of brains ‘process’ information about
their environment and are equally fit, only the integrated one forms a causally
autonomous entity above a background of external influences. This suggests that
to assess whether goals are meaningful for a system itself, it is important to
understand what the system is, rather than what it does.
Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the neural network domain, methods for hyperparameter optimization and
meta-modeling are computationally expensive due to the need to train a large
number of neural network configurations. In this paper, we show that a simple
regression model, based on support vector machines, can predict the final
performance of partially trained neural network configurations using features
based on network architectures, hyperparameters, and time-series validation
performance data. We use this regression model to develop an early stopping
strategy for neural network configurations. With this early stopping strategy,
we obtain significant speedups in both hyperparameter optimization and
meta-modeling. Particularly in the context of meta-modeling, our method can
learn to predict the performance of drastically different architectures and is
seamlessly incorporated into reinforcement learning-based architecture
selection algorithms. Finally, we show that our method is simpler, faster, and
more accurate than Bayesian methods for learning curve prediction.
Aparna Bharati, Daniel Moreira, Allan Pinto, Joel Brogan, Kevin Bowyer, Patrick Flynn, Walter Scheirer, Anderson Rocha
Comments: 5 pages, Accepted in International Conference on Image Processing, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deriving relationships between images and tracing back their history of
modifications are at the core of Multimedia Phylogeny solutions, which aim to
combat misinformation through doctored visual media. Nonetheless, most recent
image phylogeny solutions cannot properly address cases of forged composite
images with multiple donors, an area known as multiple parenting phylogeny
(MPP). This paper presents a preliminary undirected graph construction solution
for MPP, without any strict assumptions. The algorithm is underpinned by robust
image representative keypoints and different geometric consistency checks among
matching regions in both images to provide regions of interest for direct
comparison. The paper introduces a novel technique to geometrically filter the
most promising matches as well as to aid in the shared region localization
task. The strength of the approach is corroborated by experiments with
real-world cases, with and without image distractors (unrelated cases).
Nathanael L. Baisa, Deepayan Bhowmik, Andrew Wallace
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Tracking a target of interest in both sparse and crowded environments is a
challenging problem, not yet successfully addressed in the literature. In this
paper, we propose a new long-term visual tracking algorithm, learning
discriminative correlation filters and using an online classifier, to track a
target of interest in both sparse and crowded video sequences. First, we learn
a translation correlation filter using a multi-layer hybrid of convolutional
neural networks (CNN) and traditional hand-crafted features. We combine
advantages of both the lower convolutional layer which retains more spatial
details for precise localization and the higher convolutional layer which
encodes semantic information for handling appearance variations, and then
integrate these with histogram of oriented gradients (HOG) and color-naming
traditional features. Second, we include a re-detection module for overcoming
tracking failures due to long-term occlusions by training an incremental
(online) SVM on the most confident frames using hand-engineered features. This
re-detection module is activated only when the correlation response of the
object is below some pre-defined threshold. This generates high score detection
proposals which are temporally filtered using a Gaussian mixture probability
hypothesis density (GM-PHD) filter to find the detection proposal with the
maximum weight as the target state estimate by removing the other detection
proposals as clutter. Finally, we learn a scale correlation filter for
estimating the scale of a target by constructing a target pyramid around the
estimated or re-detected position using the HOG features. We carry out
extensive experiments on both sparse and dense data sets which show that our
method significantly outperforms state-of-the-art methods.
Hsiao-Yu Fish Tung, Adam Harley, William Seto, Katerina Fragkiadaki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose adversarial inversion, a weakly supervised neural network model
that combines inverse rendering with adversarial networks. Given a set of
images, our inverse rendering encoder predicts a set of latent factors (e.g.,
depth, camera pose), which a renderer then projects to reconstruct (part of)
the visual input. Inversion is often ambiguous, e.g., many compositions of 3D
shape and camera pose give rise to the same 2D projection. To address this
ambiguity, we impose priors on the predicted latent factors, through an
adversarial discriminator network trained to discriminate between predicted
factors and ground-truth ones. Training adversarial inversion does not require
input-output paired annotations, but merely a collection of ground-truth
factors, unrelated (unpaired) to the current input. Our model can thus be
self-supervised by unlabelled image data, by minimizing a joint reconstruction
and adversarial loss, complementing any direct supervision provided by paired
annotations. We apply adversarial inversion to 3D human pose estimation and 3D
structure and egomotion estimation, and outperform models supervised by only
paired annotations, and/or reconstruction losses, that do not use adversarial
priors. Applying adversarial inversion to super-resolution and inpainting
results in automated “visual plastic surgery”. In adversarial super-resolution,
when the discriminator is provided with young, old, female, male, or Tom Cruise
faces as ground-truth, our model renders the input face image towards its
young, old, feminine, masculine or Tom Cruise-like equivalent. In adversarial
inpainting, when the discriminator is provided with faces with big lips or big
noses as ground-truth, it creates visual lip or nose augmentations.
Luan Tran, Xi Yin, Xiaoming Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The large pose discrepancy between two face images is one of the fundamental
challenges in automatic face recognition. Conventional approaches to
pose-invariant face recognition either perform face frontalization on, or learn
a pose-invariant representation from, a non-frontal face image. We argue that
it is more desirable to perform both tasks jointly to allow them to leverage
each other. To this end, this paper proposes a Disentangled Representation
learning-Generative Adversarial Network (DR-GAN) with three distinct novelties.
First, the encoder-decoder structure of the generator enables DR-GAN to learn a
representation that is both generative and discriminative, which can be used
for face image synthesis and pose-invariant face recognition. Second, this
representation is explicitly disentangled from other face variations such as
pose, through the pose code provided to the decoder and pose estimation in the
discriminator. Third, DR-GAN can take one or multiple images as the input, and
generate one unified identity representation along with an arbitrary number of
synthetic face images. Extensive quantitative and qualitative evaluation on a
number of controlled and in-the-wild databases demonstrate the superiority of
DR-GAN over the state of the art in both learning representations and rotating
large-pose face images.
Seong Tae Kim, Yong Man Ro
Comments: 8 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
With the recent substantial growth of media such as YouTube, a considerable
number of instructional videos covering a wide variety of tasks are available
online. Therefore, online instructional videos have become a rich resource for
humans to learn everyday skills. In order to improve the effectiveness of the
learning with instructional video, observation and evaluation of the activity
are required. However, it is difficult to observe and evaluate every activity
steps by expert. In this study, a novel deep learning framework which targets
human activity evaluation for learning from instructional video has been
proposed. In order to deal with the inherent variability of activities, we
propose to model activity as a structured process. First, action units are
encoded from dense trajectories with LSTM network. The variable-length action
unit features are then evaluated by a Siamese LSTM network. By the comparative
experiments on public dataset, the effectiveness of the proposed method has
been demonstrated.
Jianxu Chen, Sreya Banerjee, Abhinav Grama, Walter J. Scheirer, Danny Z. Chen
Comments: miccai 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we consider the problem of automatically segmenting neuronal
cells in dual-color confocal microscopy images. This problem is a key task in
various quantitative analysis applications in neuroscience, such as tracing
cell genesis in Danio rerio (zebrafish) brains. Deep learning, especially using
fully convolutional networks (FCN), has profoundly changed segmentation
research in biomedical imaging. We face two major challenges in this problem.
First, neuronal cells may form dense clusters, making it difficult to correctly
identify all individual cells (even to human experts). Consequently,
segmentation results of the known FCN-type models are not accurate enough.
Second, pixel-wise ground truth is difficult to obtain. Only a limited amount
of approximate instance-wise annotation can be collected, which makes the
training of FCN models quite cumbersome. We propose a new FCN-type deep
learning model, called deep complete bipartite networks (CB-Net), and a new
scheme for leveraging approximate instance-wise annotation to train our
pixel-wise prediction model. Evaluated using seven real datasets, our proposed
new CB-Net model outperforms the state-of-the-art FCN models and produces
neuron segmentation results of remarkable quality
Qi Li, Zhenan Sun, Ran He, Tieniu Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
With the rapid growth of image and video data on the web, hashing has been
extensively studied for image or video search in recent years. Benefit from
recent advances in deep learning, deep hashing methods have achieved promising
results for image retrieval. However, there are some limitations of previous
deep hashing methods (e.g., the semantic information is not fully exploited).
In this paper, we develop a deep supervised discrete hashing algorithm based on
the assumption that the learned binary codes should be ideal for
classification. Both the pairwise label information and the classification
information are used to learn the hash codes within one stream framework. We
constrain the outputs of the last layer to be binary codes directly, which is
rarely investigated in deep hashing algorithm. Because of the discrete nature
of hash codes, an alternating minimization method is used to optimize the
objective function. Experimental results have shown that our method outperforms
current state-of-the-art methods on benchmark datasets.
D. S. Guru, N. Vinay Kumar
Comments: 12 Pages, 3 figures, 7 tables
Journal-ref: RTIP2R 2016, CCIS 709, pp. 228 TO 239, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a novel feature selection approach for supervised interval
valued features is proposed. The proposed approach takes care of selecting the
class specific features through interval K-Means clustering. The kernel of
K-Means clustering algorithm is modified to adapt interval valued data. During
training, a set of samples corresponding to a class is fed into the interval
K-Means clustering algorithm, which clusters features into K distinct clusters.
Hence, there are K number of features corresponding to each class.
Subsequently, corresponding to each class, the cluster representatives are
chosen. This procedure is repeated for all the samples of remaining classes.
During testing the feature indices correspond to each class are used for
validating the given dataset through classification using suitable symbolic
classifiers. For experimentation, four standard supervised interval datasets
are used. The results show the superiority of the proposed model when compared
with the other existing state-of-the-art feature selection methods.
Stefan Sommer, Alexis Arnaudon, Line Kuhnel, Sarang Joshi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an inference algorithm and connected Monte Carlo based estimation
procedures for metric estimation from landmark configurations distributed
according to the transition distribution of a Riemannian Brownian motion
arising from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) metric.
The distribution possesses properties similar to the regular Euclidean normal
distribution but its transition density is governed by a high-dimensional PDE
with no closed-form solution in the nonlinear case. We show how the density can
be numerically approximated by Monte Carlo sampling of conditioned Brownian
bridges, and we use this to estimate parameters of the LDDMM kernel and thus
the metric structure by maximum likelihood.
Ming Gong, You Hao, Hanlin Mo, Hua Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We proposed a kind of naturally combined shape-color affine moment invariants
(SCAMI), which consider both shape and color affine transformations
simultaneously in one single system. In the real scene, color and shape
deformations always exist in images simultaneously. Simple shape invariants or
color invariants can not be qualified for this situation. The conventional
method is just to make a simple linear combination of the two factors.
Meanwhile, the manual selection of weights is a complex issue. Our construction
method is based on the multiple integration framework. The integral kernel is
assigned as the continued product of the shape and color invariant cores. It is
the first time to directly derive an invariant to dual affine transformations
of shape and color. The manual selection of weights is no longer necessary, and
both the shape and color transformations are extended to affine transformation
group. With the various of invariant cores, a set of lower-order invariants are
constructed and the completeness and independence are discussed detailedly. A
set of SCAMIs, which called SCAMI24, are recommended, and the effectiveness and
robustness have been evaluated on both synthetic and real datasets.
JunYoung Gwak, Christopher B. Choy, Animesh Garg, Manmohan Chandraker, Silvio Savarese
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Volumetric 3D reconstruction has witnessed a significant progress in
performance through the use of deep neural network based methods that address
some of the limitations of traditional reconstruction algorithms. However, this
increase in performance requires large scale annotations of 2D/3D data. This
paper introduces a novel generative model for volumetric 3D reconstruction,
Weakly supervised Generative Adversarial Network (WS-GAN) which reduces
reliance on expensive 3D supervision. WS-GAN takes an input image, a sparse set
of 2D object masks with respective camera parameters, and an unmatched 3D model
as inputs during training. WS-GAN uses a learned encoding as input to a
conditional 3D-model generator trained alongside a discriminator, which is
constrained to the manifold of realistic 3D shapes. We bridge the
representation gap between 2D masks and 3D volumes through a perspective
raytrace pooling layer, that enables perspective projection and allows
backpropagation. We evaluate WS-GAN on ShapeNet, ObjectNet and Stanford Online
Product dataset for reconstruction with single-view and multi-view cases in
both synthetic and real images. We compare our method to voxel carving and
prior work with full 3D supervision. Additionally, we also demonstrate that the
learned feature representation is semantically meaningful through interpolation
and manipulation in input space.
David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, Nir Shavit
Comments: 13 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Deep learning algorithms for connectomics rely upon localized classification,
rather than overall morphology. This leads to a high incidence of erroneously
merged objects. Humans, by contrast, can easily detect such errors by acquiring
intuition for the correct morphology of objects. Biological neurons have
complicated and variable shapes, which are challenging to learn, and merge
errors take a multitude of different forms. We present an algorithm, MergeNet,
that shows 3D ConvNets can, in fact, detect merge errors from high-level
neuronal morphology. MergeNet follows unsupervised training and operates across
datasets. We demonstrate the performance of MergeNet both on a variety of
connectomics data and on a dataset created from merged MNIST images.
Anastasiya Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a novel loss for learning local feature descriptors that is
inspired by the SIFT matching scheme. We show that the proposed loss that
relies on the maximization of the distance between the closest positive and
closest negative patches can replace more complex regularization methods which
have been used in local descriptor learning; it works well for both shallow and
deep convolution network architectures. The resulting descriptor is compact —
it has the same dimensionality as SIFT (128), it shows state-of-art performance
on matching, patch verification and retrieval benchmarks and it is fast to
compute on a GPU.
Jiawei He, Mostafa S. Ibrahim, Zhiwei Deng, Greg Mori
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We develop a novel framework for action localization in videos. We propose
the Tube Proposal Network (TPN), which can generate generic, class-independent,
video-level tubelet proposals in videos. The generated tubelet proposals can be
utilized in various video analysis tasks, including recognizing and localizing
actions in videos. In particular, we integrate these generic tubelet proposals
into a unified temporal deep network for action classification. Compared with
other methods, our generic tubelet proposal method is accurate, general, and is
fully differentiable under a smoothL1 loss function. We demonstrate the
performance of our algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101
datasets. Our class-independent TPN outperforms other tubelet generation
methods, and our unified temporal deep network achieves state-of-the-art
localization results on all three datasets.
Weihong Guo, Guohui Song, Yue Zhang
Comments: 22 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose in this paper a novel two-stage Projection Correction Modeling
(PCM) framework for image reconstruction from (non-uniform) Fourier
measurements. PCM consists of a projection stage (P-stage) motivated by the
multi-scale Galerkin method and a correction stage (C-stage) with an edge
guided regularity fusing together the advantages of total variation (TV) and
total fractional variation (TFV). The P-stage allows for continuous modeling of
the underlying image of interest. The given measurements are projected onto a
space in which the image is well represented. We then enhance the
reconstruction result at the C-stage that minimizes an energy functional
consisting of a fidelity in the transformed domain and a novel edge guided
regularity. We further develop efficient proximal algorithms to solve the
corresponding optimization problem. Various numerical results in both 1D
signals and 2D images have also been presented to demonstrate the superior
performance of the proposed two-stage method to other classical one-stage
methods.
Serhii Havrylov, Ivan Titov
Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
Learning to communicate through interaction, rather than relying on explicit
supervision, is often considered a prerequisite for developing a general AI. We
study a setting where two agents engage in playing a referential game and, from
scratch, develop a communication protocol necessary to succeed in this game.
Unlike previous work, we require that messages they exchange, both at train and
test time, are in the form of a language (i.e. sequences of discrete symbols).
We compare a reinforcement learning approach and one using a differentiable
relaxation (straight-through Gumbel-softmax estimator) and observe that the
latter is much faster to converge and it results in more effective protocols.
Interestingly, we also observe that the protocol we induce by optimizing the
communication success exhibits a degree of compositionality and variability
(i.e. the same information can be phrased in different ways), both properties
characteristic of natural languages. As the ultimate goal is to ensure that
communication is accomplished in natural language, we also perform experiments
where we inject prior information about natural language into our model and
study properties of the resulting protocol.
Chao Zuo, Tianyang Tao, Shijie Feng, Lei Huang, Anand Asundi, Qian Chen
Comments: This manuscript was originally submitted on 30th January 17
Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
Recent advances in imaging sensors and digital light projection technology
have facilitated a rapid progress in 3D optical sensing, enabling 3D surfaces
of complex-shaped objects to be captured with improved resolution and accuracy.
However, due to the large number of projection patterns required for phase
recovery and disambiguation, the maximum fame rates of current 3D shape
measurement techniques are still limited to the range of hundreds of frames per
second (fps). Here, we demonstrate a new 3D dynamic imaging technique, Micro
Fourier Transform Profilometry ((mu)FTP), which can capture 3D surfaces of
transient events at up to 10,000 fps based on our newly developed high-speed
fringe projection system. Compared with existing techniques, (mu)FTP has the
prominent advantage of recovering an accurate, unambiguous, and dense 3D point
cloud with only two projected patterns. Furthermore, the phase information is
encoded within a single high-frequency fringe image, thereby allowing
motion-artifact-free reconstruction of transient events with temporal
resolution of 50 microseconds. To show (mu)FTP’s broad utility, we use it to
reconstruct 3D videos of 4 transient scenes: vibrating cantilevers, rotating
fan blades, bullet fired from a toy gun, and balloon’s explosion triggered by a
flying dart, which were previously difficult or even unable to be captured with
conventional approaches.
Emily Denton, Vighnesh Birodkar
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We present a new model DrNET that learns disentangled image representations
from video. Our approach leverages the temporal coherence of video and a novel
adversarial loss to learn a representation that factorizes each frame into a
stationary part and a temporally varying component. The disentangled
representation can be used for a range of tasks. For example, applying a
standard LSTM to the time-vary components enables prediction of future frames.
We evaluate our approach on a range of synthetic and real videos, demonstrating
the ability to coherently generate hundreds of steps into the future.
Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the neural network domain, methods for hyperparameter optimization and
meta-modeling are computationally expensive due to the need to train a large
number of neural network configurations. In this paper, we show that a simple
regression model, based on support vector machines, can predict the final
performance of partially trained neural network configurations using features
based on network architectures, hyperparameters, and time-series validation
performance data. We use this regression model to develop an early stopping
strategy for neural network configurations. With this early stopping strategy,
we obtain significant speedups in both hyperparameter optimization and
meta-modeling. Particularly in the context of meta-modeling, our method can
learn to predict the performance of drastically different architectures and is
seamlessly incorporated into reinforcement learning-based architecture
selection algorithms. Finally, we show that our method is simpler, faster, and
more accurate than Bayesian methods for learning curve prediction.
Vitaly Kurin, Sebastian Nowozin, Katja Hofmann, Lucas Beyer, Bastian Leibe
Subjects: Artificial Intelligence (cs.AI)
Recent progress in Reinforcement Learning (RL), fueled by its combination,
with Deep Learning has enabled impressive results in learning to interact with
complex virtual environments, yet real-world applications of RL are still
scarce. A key limitation is data efficiency, with current state-of-the-art
approaches requiring millions of training samples. A promising way to tackle
this problem is to augment RL with learning from human demonstrations. However,
human demonstration data is not yet readily available. This hinders progress in
this direction. The present work addresses this problem as follows. We (i)
collect and describe a large dataset of human Atari 2600 replays — the largest
and most diverse such data set publicly released to date, (ii) illustrate an
example use of this dataset by analyzing the relation between demonstration
quality and imitation learning performance, and (iii) outline possible research
directions that are opened up by our work.
Son N. Tran
Subjects: Artificial Intelligence (cs.AI)
Representing symbolic knowledge into a connectionist network is the key
element for the integration of scalable learning and sound reasoning. Most of
the previous studies focus on discriminative neural networks which
unnecessarily require a separation of input/output variables. Recent
development of generative neural networks such as restricted Boltzmann machines
(RBMs) has shown a capability of learning semantic abstractions directly from
data, posing a promise for general symbolic learning and reasoning. Previous
work on Penalty logic show a link between propositional logic and symmetric
connectionist networks, however it is not applicable to RBMs. This paper
proposes a novel method to represent propositional formulas into RBMs/stack of
RBMs where Gibbs sampling can be seen as MaxSAT. It also shows a promising use
of RBMs to learn symbolic knowledge through maximum likelihood estimation.
Jerry Lonlac, Engelbert Mephu Nguifo
Subjects: Artificial Intelligence (cs.AI)
Clause Learning is one of the most important components of a conflict driven
clause learning (CDCL) SAT solver that is effective on industrial instances.
Since the number of learned clauses is proved to be exponential in the worse
case, it is necessary to identify the most relevant clauses to maintain and
delete the irrelevant ones. As reported in the literature, several learned
clauses deletion strategies have been proposed. However the diversity in both
the number of clauses to be removed at each step of reduction and the results
obtained with each strategy creates confusion to determine which criterion is
better. Thus, the problem to select which learned clauses are to be removed
during the search step remains very challenging. In this paper, we propose a
novel approach to identify the most relevant learned clauses without favoring
or excluding any of the proposed measures, but by adopting the notion of
dominance relationship among those measures. Our approach bypasses the problem
of the diversity of results and reaches a compromise between the assessments of
these measures. Furthermore, the proposed approach also avoids another
non-trivial problem which is the amount of clauses to be deleted at each
reduction of the learned clause database.
Hang Ma, Jiaoyang Li, T. K. Satish Kumar, Sven Koenig
Comments: In AAMAS 2017
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
The multi-agent path-finding (MAPF) problem has recently received a lot of
attention. However, it does not capture important characteristics of many
real-world domains, such as automated warehouses, where agents are constantly
engaged with new tasks. In this paper, we therefore study a lifelong version of
the MAPF problem, called the multi-agent pickup and delivery (MAPD) problem. In
the MAPD problem, agents have to attend to a stream of delivery tasks in an
online setting. One agent has to be assigned to each delivery task. This agent
has to first move to a given pickup location and then to a given delivery
location while avoiding collisions with other agents. We present two decoupled
MAPD algorithms, Token Passing (TP) and Token Passing with Task Swaps (TPTS).
Theoretically, we show that they solve all well-formed MAPD instances, a
realistic subclass of MAPD instances. Experimentally, we compare them against a
centralized strawman MAPD algorithm without this guarantee in a simulated
warehouse system. TP can easily be extended to a fully distributed MAPD
algorithm and is the best choice when real-time computation is of primary
concern since it remains efficient for MAPD instances with hundreds of agents
and tasks. TPTS requires limited communication among agents and balances well
between TP and the centralized MAPD algorithm.
Thommen George Karimpanal, Roland Bouffanais
Comments: 23 pages, 6 figures, Submitted to the journal Artificial Intelligence
Subjects: Artificial Intelligence (cs.AI)
Experience replay is one of the most commonly used approaches to improve the
sample efficiency of reinforcement learning algorithms. In this work, we
propose an approach to select and replay sequences of transitions in order to
accelerate the learning of a reinforcement learning agent in an off-policy
setting. In addition to selecting appropriate sequences, we also artificially
construct transition sequences using information gathered from previous
agent-environment interactions. These sequences, when replayed, allow value
function information to trickle down to larger sections of the
state/state-action space, thereby making the most of the agent’s experience. We
demonstrate our approach on modified versions of standard reinforcement
learning tasks such as the mountain car and puddle world problems and
empirically show that it enables better learning of value functions as compared
to other forms of experience replay. Further, we briefly discuss some of the
possible extensions to this work, as well as applications and situations where
this approach could be particularly useful.
Xerxes D. Arsiwalla, Clement Moulin-Frier, Ivan Herreros, Marti Sanchez-Fibla, Paul Verschure
Comments: 19 pages, 3 figures
Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Biological Physics (physics.bio-ph)
Given recent proposals to synthesize consciousness, how many forms of
conscious machines can one distinguish and on what grounds? Based on current
clinical scales of consciousness, that measure cognitive awareness and
wakefulness, we take a perspective on how contemporary artificially intelligent
machines and synthetically engineered life forms would measure on these scales.
To do so, we argue that awareness and wakefulness can be associated to
computational and autonomous complexity respectively. Then, building on
insights from cognitive robotics, we ask what function consciousness serves,
and interpret it as an evolutionary game-theoretic strategy. We make the case
for a third type of complexity necessary for describing consciousness, namely,
social complexity. Having identified these complexity types, allows us to
represent both, biological and synthetic systems in a common morphospace. This
suggests an embodiment-based taxonomy of consciousness. In particular, we
distinguish four forms of consciousness, based on embodiment: biological,
synthetic, group (resulting from group interactions) and simulated
consciousness (embodied by virtual agents within a simulated reality). Such a
taxonomy is useful for studying comparative signatures of consciousness across
domains, in order to highlight design principles necessary to engineer
conscious machines. This is particularly relevant in the light of recent
developments at the crossroads of neuroscience, biomedical engineering,
artificial intelligence and biomimetics.
Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Learning meaningful representations that maintain the content necessary for a
particular task while filtering away detrimental variations is a problem of
great interest in machine learning. In this paper, we tackle the problem of
learning representations invariant to a specific factor or trait of data,
leading to better generalization. The representation learning process is
formulated as an adversarial minimax game. We analyze the optimal equilibrium
of such a game and find that it amounts to maximizing the uncertainty of
inferring the detrimental factor given the representation while maximizing the
certainty of making task-specific predictions. On three benchmark tasks, namely
fair and bias-free classification, language-independent generation, and
lighting-independent image classification, we show that the proposed framework
induces an invariant representation, and leads to better generalization
evidenced by the improved test performance.
Tim Rocktäschel, Sebastian Riedel
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
We introduce neural networks for end-to-end differentiable theorem proving
that operate on dense vector representations of symbols. These neural networks
are constructed recursively by taking inspiration from the backward chaining
algorithm as used in Prolog. Specifically, we replace symbolic unification with
a differentiable computation on vector representations of symbols using a
radial basis function kernel, thereby combining symbolic reasoning with
learning subsymbolic vector representations. By using gradient descent, the
resulting neural network can be trained to infer facts from a given incomplete
knowledge base. It learns to (i) place representations of similar symbols in
close proximity in a vector space, (ii) make use of such similarities to prove
facts, (iii) induce logical rules, and (iv) use provided and induced logical
rules for complex multi-hop reasoning. We demonstrate that this architecture
outperforms ComplEx, a state-of-the-art neural link prediction model, on four
benchmark knowledge bases while at the same time inducing interpretable
function-free first-order logic rules.
Julien Perez, Tomi Silander
Comments: 11 pages, 1 figure, 1 table
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Partially observable environments present an important open challenge in the
domain of sequential control learning with delayed rewards. Despite numerous
attempts during the two last decades, the majority of reinforcement learning
algorithms and associated approximate models, applied to this context, still
assume Markovian state transitions. In this paper, we explore the use of a
recently proposed attention-based model, the Gated End-to-End Memory Network,
for sequential control. We call the resulting model the Gated End-to-End Memory
Policy Network. More precisely, we use a model-free value-based algorithm to
learn policies for partially observed domains using this memory-enhanced neural
network. This model is end-to-end learnable and it features unbounded memory.
Indeed, because of its attention mechanism and associated non-parametric
memory, the proposed model allows us to define an attention mechanism over the
observation stream unlike recurrent models. We show encouraging results that
illustrate the capability of our attention-based model in the context of the
continuous-state non-stationary control problem of stock trading. We also
present an OpenAI Gym environment for simulated stock exchange and explain its
relevance as a benchmark for the field of non-Markovian decision process
learning.
Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
Comments: 11 pages, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Generative Adversarial Networks (GANs) have gathered a lot of attention from
the computer vision community, yielding impressive results for image
generation. Advances in the adversarial generation of natural language from
noise however are not commensurate with the progress made in generating images,
and still lag far behind likelihood based methods. In this paper, we take a
step towards generating natural language with a GAN objective alone. We
introduce a simple baseline that addresses the discrete output space problem
without relying on gradient estimators and show that it is able to achieve
state-of-the-art results on a Chinese poem generation dataset. We present
quantitative results on generating sentences from context-free and
probabilistic context-free grammars, and qualitative language modeling results.
A conditional version is also described that can generate sequences conditioned
on sentence characteristics.
Emily Denton, Vighnesh Birodkar
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We present a new model DrNET that learns disentangled image representations
from video. Our approach leverages the temporal coherence of video and a novel
adversarial loss to learn a representation that factorizes each frame into a
stationary part and a temporally varying component. The disentangled
representation can be used for a range of tasks. For example, applying a
standard LSTM to the time-vary components enables prediction of future frames.
We evaluate our approach on a range of synthetic and real videos, demonstrating
the ability to coherently generate hundreds of steps into the future.
David Rolnick, Yaron Meirovitch, Toufiq Parag, Hanspeter Pfister, Viren Jain, Jeff W. Lichtman, Edward S. Boyden, Nir Shavit
Comments: 13 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Deep learning algorithms for connectomics rely upon localized classification,
rather than overall morphology. This leads to a high incidence of erroneously
merged objects. Humans, by contrast, can easily detect such errors by acquiring
intuition for the correct morphology of objects. Biological neurons have
complicated and variable shapes, which are challenging to learn, and merge
errors take a multitude of different forms. We present an algorithm, MergeNet,
that shows 3D ConvNets can, in fact, detect merge errors from high-level
neuronal morphology. MergeNet follows unsupervised training and operates across
datasets. We demonstrate the performance of MergeNet both on a variety of
connectomics data and on a dataset created from merged MNIST images.
Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the neural network domain, methods for hyperparameter optimization and
meta-modeling are computationally expensive due to the need to train a large
number of neural network configurations. In this paper, we show that a simple
regression model, based on support vector machines, can predict the final
performance of partially trained neural network configurations using features
based on network architectures, hyperparameters, and time-series validation
performance data. We use this regression model to develop an early stopping
strategy for neural network configurations. With this early stopping strategy,
we obtain significant speedups in both hyperparameter optimization and
meta-modeling. Particularly in the context of meta-modeling, our method can
learn to predict the performance of drastically different architectures and is
seamlessly incorporated into reinforcement learning-based architecture
selection algorithms. Finally, we show that our method is simpler, faster, and
more accurate than Bayesian methods for learning curve prediction.
Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Human trafficking is one of the most atrocious crimes and among the
challenging problems facing law enforcement which demands attention of global
magnitude. In this study, we leverage textual data from the website “Backpage”-
used for classified advertisement- to discern potential patterns of human
trafficking activities which manifest online and identify advertisements of
high interest to law enforcement. Due to the lack of ground truth, we rely on a
human analyst from law enforcement, for hand-labeling a small portion of the
crawled data. We extend the existing Laplacian SVM and present S3VM-R, by
adding a regularization term to exploit exogenous information embedded in our
feature space in favor of the task at hand. We train the proposed method using
labeled and unlabeled data and evaluate it on a fraction of the unlabeled data,
herein referred to as unseen data, with our expert’s further verification.
Results from comparisons between our method and other semi-supervised and
supervised approaches on the labeled data demonstrate that our learner is
effective in identifying advertisements of high interest to law enforcement
Ruihui Zhao, Yuanliang Sun, Mizuho Iwaihara
Comments: 9 pages, submitted to CIKM 2017
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)
Cloud computing is emerging as a revolutionary computing paradigm, while
security and privacy become major concerns in the cloud scenario. For which
Searchable Encryption (SE) technology is proposed to support efficient
retrieval of encrypted data. However, the absence of lightweight ranked search
with higher search quality in a harsh adversary model is still a typical
shortage in existing SE schemes. In this paper, we propose a novel SE scheme
called LRSE which firstly integrates machine learning methods into the
framework of SE and combines local and global representations of encrypted
cloud data to achieve the above design goals. In LRSE, we employ an improved
secure kNN scheme to guarantee sufficient privacy protection. Our detailed
security analysis shows that LRSE satisfies our formulated privacy
requirements. Extensive experiments performed on benchmark datasets demonstrate
that LRSE indeed achieves state-of-the-art search quality with lowest system
cost.
Li Lucy, Jon Gauthier
Comments: Accepted at RoboNLP 2017
Subjects: Computation and Language (cs.CL)
Distributional word representation methods exploit word co-occurrences to
build compact vector encodings of words. While these representations enjoy
widespread use in modern natural language processing, it is unclear whether
they accurately encode all necessary facets of conceptual meaning. In this
paper, we evaluate how well these representations can predict perceptual and
conceptual features of concrete concepts, drawing on two semantic norm datasets
sourced from human participants. We find that several standard word
representations fail to encode many salient perceptual features of concepts,
and show that these deficits correlate with word-word similarity prediction
errors. Our analyses provide motivation for grounded and embodied language
learning approaches, which may help to remedy these deficits.
Junhui Li, Muhua Zhu
Comments: 5 pages, 2 figures
Subjects: Computation and Language (cs.CL)
In the past few years, attention mechanisms have become an indispensable
component of end-to-end neural machine translation models. However, previous
attention models always refer to some source words when predicting a target
word, which contradicts with the fact that some target words have no
corresponding source words. Motivated by this observation, we propose a novel
attention model that has the capability of determining when a decoder should
attend to source words and when it should not. Experimental results on NIST
Chinese-English translation tasks show that the new model achieves an
improvement of 0.8 BLEU score over a state-of-the-art baseline.
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
Subjects: Computation and Language (cs.CL)
Generative adversarial networks (GANs) have great successes on synthesizing
data. However, the existing GANs restrict the discriminator to be a binary
classifier, and thus limit their learning capacity for tasks that need to
synthesize output with rich structures such as natural language descriptions.
In this paper, we propose a novel generative adversarial network, RankGAN, for
generating high-quality language descriptions. Rather than train the
discriminator to learn and assign absolute binary predicate for individual data
sample, the proposed RankGAN is able to analyze and rank a collection of
human-written and machine-written sentences by giving a reference group. By
viewing a set of data samples collectively and evaluating their quality through
relative ranking scores, the discriminator is able to make better assessment
which in turn helps to learn a better generator. The proposed RankGAN is
optimized through the policy gradient technique. Experimental results on
multiple public datasets clearly demonstrate the effectiveness of the proposed
approach.
Koichiro Yoshino, Shinsuke Mori, Satoshi Nakamura
Subjects: Computation and Language (cs.CL)
This paper investigates and analyzes the effect of dependency information on
predicate-argument structure analysis (PASA) and zero anaphora resolution (ZAR)
for Japanese, and shows that a straightforward approach of PASA and ZAR works
effectively even if dependency information was not available. We constructed an
analyzer that directly predicts relationships of predicates and arguments with
their semantic roles from a POS-tagged corpus. The features of the system are
designed to compensate for the absence of syntactic information by using
features used in dependency parsing as a reference. We also constructed
analyzers that use the oracle dependency and the real dependency parsing
results, and compared with the system that does not use any syntactic
information to verify that the improvement provided by dependencies is not
crucial.
Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
Comments: 11 pages, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Generative Adversarial Networks (GANs) have gathered a lot of attention from
the computer vision community, yielding impressive results for image
generation. Advances in the adversarial generation of natural language from
noise however are not commensurate with the progress made in generating images,
and still lag far behind likelihood based methods. In this paper, we take a
step towards generating natural language with a GAN objective alone. We
introduce a simple baseline that addresses the discrete output space problem
without relying on gradient estimators and show that it is able to achieve
state-of-the-art results on a Chinese poem generation dataset. We present
quantitative results on generating sentences from context-free and
probabilistic context-free grammars, and qualitative language modeling results.
A conditional version is also described that can generate sequences conditioned
on sentence characteristics.
Paul Michel, Abhilasha Ravichander, Shruti Rijhwani
Comments: 5 pages, 3 figures. Rep4NLP workshop at ACL 2017
Subjects: Computation and Language (cs.CL)
We investigate the pertinence of methods from algebraic topology for text
data analysis. These methods enable the development of
mathematically-principled isometric-invariant mappings from a set of vectors to
a document embedding, which is stable with respect to the geometry of the
document in the selected metric space. In this work, we evaluate the utility of
these topology-based document representations in traditional NLP tasks,
specifically document clustering and sentiment classification. We find that the
embeddings do not benefit text analysis. In fact, performance is worse than
simple techniques like ( extit{tf-idf}), indicating that the geometry of the
document does not provide enough variability for classification on the basis of
topic or sentiment in the chosen datasets.
Xiang Yu, Ngoc Thang Vu
Comments: Accepted in ACL 2017 (Short)
Subjects: Computation and Language (cs.CL)
We present a transition-based dependency parser that uses a convolutional
neural network to compose word representations from characters. The character
composition model shows great improvement over the word-lookup model,
especially for parsing agglutinative languages. These improvements are even
better than using pre-trained word embeddings from extra data. On the SPMRL
data sets, our system outperforms the previous best greedy parser (Ballesteros
et al., 2015) by a margin of 3% on average.
Serhii Havrylov, Ivan Titov
Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
Learning to communicate through interaction, rather than relying on explicit
supervision, is often considered a prerequisite for developing a general AI. We
study a setting where two agents engage in playing a referential game and, from
scratch, develop a communication protocol necessary to succeed in this game.
Unlike previous work, we require that messages they exchange, both at train and
test time, are in the form of a language (i.e. sequences of discrete symbols).
We compare a reinforcement learning approach and one using a differentiable
relaxation (straight-through Gumbel-softmax estimator) and observe that the
latter is much faster to converge and it results in more effective protocols.
Interestingly, we also observe that the protocol we induce by optimizing the
communication success exhibits a degree of compositionality and variability
(i.e. the same information can be phrased in different ways), both properties
characteristic of natural languages. As the ultimate goal is to ensure that
communication is accomplished in natural language, we also perform experiments
where we inject prior information about natural language into our model and
study properties of the resulting protocol.
Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Learning meaningful representations that maintain the content necessary for a
particular task while filtering away detrimental variations is a problem of
great interest in machine learning. In this paper, we tackle the problem of
learning representations invariant to a specific factor or trait of data,
leading to better generalization. The representation learning process is
formulated as an adversarial minimax game. We analyze the optimal equilibrium
of such a game and find that it amounts to maximizing the uncertainty of
inferring the detrimental factor given the representation while maximizing the
certainty of making task-specific predictions. On three benchmark tasks, namely
fair and bias-free classification, language-independent generation, and
lighting-independent image classification, we show that the proposed framework
induces an invariant representation, and leads to better generalization
evidenced by the improved test performance.
Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Björn Schuller
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
Eliminating the negative effect of highly non-stationary environmental noise
is a long-standing research topic for speech recognition but remains an
important challenge nowadays. To address this issue, traditional unsupervised
signal processing methods seem to have touched the ceiling. However,
data-driven based supervised approaches, particularly the ones designed with
deep learning, have recently emerged as potential alternatives. In this light,
we are going to comprehensively summarise the recently developed and most
representative deep learning approaches to deal with the raised problem in this
article, with the aim of providing guidelines for those who are going deeply
into the field of environmentally robust speech recognition. To better
introduce these approaches, we categorise them into single- and multi-channel
techniques, each of which is specifically described at the front-end, the
back-end, and the joint framework of speech recognition systems. In the
meanwhile, we describe the pros and cons of these approaches as well as the
relationships among them, which can probably benefit future research.
Evangelos Georganas, Steven Hofmeyr, Rob Egan, Aydin Buluc, Leonid Oliker, Daniel Rokhsar, Katherine Yelick
Comments: To appear as a chapter in Exascale Scientific Applications: Programming Approaches for Scalability, Performance, and Portability, Straatsma, Antypas, Williams (editors), CRC Press, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
De novo whole genome assembly reconstructs genomic sequence from short,
overlapping, and potentially erroneous DNA segments and is one of the most
important computations in modern genomics. This work presents HipMER, a
high-quality end-to-end de novo assembler designed for extreme scale analysis,
via efficient parallelization of the Meraculous code. Genome assembly software
has many components, each of which stresses different components of a computer
system. This chapter explains the computational challenges involved in each
step of the HipMer pipeline, the key distributed data structures, and
communication costs in detail. We present performance results of assembling the
human genome and the large hexaploid wheat genome on large supercomputers up to
tens of thousands of cores.
Zhijie Ren, Kelong Cong, Johan Pouwelse, Zekeriya Erkin
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Recently, the blockchain technique was put in the spotlight as it introduced
a systematic approach for multiple parties to reach consensus without needing
trust. However, the application of this technique in practice is severely
restricted due to its limitations in throughput, reliability, storage
requirement, and privacy. In this paper, we propose a novel consensus model,
namely the implicit consensus, with a distinctive blockchain-based distributed
ledger in which each node holds its individual blockchain. In our system, the
agreement is not on the transactions, but on a special type of blocks called
Check Points that are used to validate individual transactions. Our system
achieves superior performance over all existing blockchain techniques in
multiple aspects with equivalent reliability. Most remarkably, our system
achieves unbounded throughput which is by far the best throughput achieved by
any blockchain technique.
Jie Tang, Shaoshan Liu, Chao Wang, Quan Wang
Comments: 12 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)
Autonomous vehicle safety and reliability are the paramount requirements when
developing autonomous vehicles. These requirements are guaranteed by massive
functional and performance tests. Conducting these tests on real vehicles is
extremely expensive and time consuming, and thus it is imperative to develop a
simulation platform to perform these tasks. For simulation, we can utilize the
Robot Operating System (ROS) for data playback to test newly developed
algorithms. However, due to the massive amount of simulation data, performing
simulation on single machines is not practical. Hence, a high-performance
distributed simulation platform is a critical piece in autonomous driving
development. In this paper we present our experiences of building a production
distributed autonomous driving simulation platform. This platform is built upon
Spark distributed framework, for distributed computing management, and ROS, for
data playback simulations.
Dorota Urbańska-Osula
Comments: 10 pages, 2 figures, 3 pseudo-codes
Subjects: Discrete Mathematics (cs.DM); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)
A group of mobile entities is given a task to explore an edge-weighted tree
(T), i.e. every vertex of (T) has to be visited by at least one agent. There is
no centralized unit to coordinate their actions, but they can freely
communicate with each other and they know the structure of (T) in advance. The
goal is to construct a deterministic strategy which allows robots to complete
their task optimally.
In this paper we are interested in a cost-optimal strategy, where the cost is
understood as the sum of the total distance traversed by agents and the cost of
invoking them. We present an algorithm that finds an optimal solution for a
given (n)-node tree in (O(n)) time.
Serhii Havrylov, Ivan Titov
Comments: Submitted to NIPS 2017. The extended abstract was presented at ICLR 2017 workshop track
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
Learning to communicate through interaction, rather than relying on explicit
supervision, is often considered a prerequisite for developing a general AI. We
study a setting where two agents engage in playing a referential game and, from
scratch, develop a communication protocol necessary to succeed in this game.
Unlike previous work, we require that messages they exchange, both at train and
test time, are in the form of a language (i.e. sequences of discrete symbols).
We compare a reinforcement learning approach and one using a differentiable
relaxation (straight-through Gumbel-softmax estimator) and observe that the
latter is much faster to converge and it results in more effective protocols.
Interestingly, we also observe that the protocol we induce by optimizing the
communication success exhibits a degree of compositionality and variability
(i.e. the same information can be phrased in different ways), both properties
characteristic of natural languages. As the ultimate goal is to ensure that
communication is accomplished in natural language, we also perform experiments
where we inject prior information about natural language into our model and
study properties of the resulting protocol.
Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
Comments: 7 pages, 9 figures
Subjects: Learning (cs.LG)
Stochastic gradient descent (SGD), which updates the model parameters by
adding a local gradient times a learning rate at each step, is widely used in
model training of machine learning algorithms such as neural networks. It is
observed that the models trained by SGD are sensitive to learning rates and
good learning rates are problem specific. We propose an algorithm to
automatically learn learning rates using neural network based actor-critic
methods from deep reinforcement learning (RL).In particular, we train a policy
network called actor to decide the learning rate at each step during training,
and a value network called critic to give feedback about quality of the
decision (e.g., the goodness of the learning rate outputted by the actor) that
the actor made. The introduction of auxiliary actor and critic networks helps
the main network achieve better performance. Experiments on different datasets
and network architectures show that our approach leads to better convergence of
SGD than human-designed competitors.
Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Learning meaningful representations that maintain the content necessary for a
particular task while filtering away detrimental variations is a problem of
great interest in machine learning. In this paper, we tackle the problem of
learning representations invariant to a specific factor or trait of data,
leading to better generalization. The representation learning process is
formulated as an adversarial minimax game. We analyze the optimal equilibrium
of such a game and find that it amounts to maximizing the uncertainty of
inferring the detrimental factor given the representation while maximizing the
certainty of making task-specific predictions. On three benchmark tasks, namely
fair and bias-free classification, language-independent generation, and
lighting-independent image classification, we show that the proposed framework
induces an invariant representation, and leads to better generalization
evidenced by the improved test performance.
Linus Hamilton, Frederic Koehler, Ankur Moitra
Comments: 25 pages
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
Markov random fields area popular model for high-dimensional probability
distributions. Over the years, many mathematical, statistical and algorithmic
problems on them have been studied. Until recently, the only known algorithms
for provably learning them relied on exhaustive search, correlation decay or
various incoherence assumptions. Bresler gave an algorithm for learning general
Ising models on bounded degree graphs. His approach was based on a structural
result about mutual information in Ising models.
Here we take a more conceptual approach to proving lower bounds on the mutual
information through setting up an appropriate zero-sum game. Our proof
generalizes well beyond Ising models, to arbitrary Markov random fields with
higher order interactions. As an application, we obtain algorithms for learning
Markov random fields on bounded degree graphs on (n) nodes with (r)-order
interactions in (n^r) time and (log n) sample complexity. The sample
complexity is information theoretically optimal up to the dependence on the
maximum degree. The running time is nearly optimal under standard conjectures
about the hardness of learning parity with noise.
Zhenzhou Wu, Sean Saito
Subjects: Learning (cs.LG)
Traditionally, classifying large hierarchical labels with more than 10000
distinct traces can only be achieved with flatten labels. Although flatten
labels is feasible, it misses the hierarchical information in the labels.
Hierarchical models like HSVM by cite{vural2004hierarchical} becomes
impossible to train because of the sheer number of SVMs in the whole
architecture. We developed a hierarchical architecture based on neural networks
that is simple to train. Also, we derived an inference algorithm that can
efficiently infer the MAP (maximum a posteriori) trace guaranteed by our
theorems. Furthermore, the complexity of the model is only (O(n^2)) compared to
(O(n^h)) in a flatten model, where (h) is the height of the hierarchy.
Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe
(FW) algorithms regained popularity in recent years due to their simplicity,
effectiveness and theoretical guarantees. MP and FW address optimization over
the linear span and the convex hull of a set of atoms, respectively. In this
paper, we consider the intermediate case of optimization over the convex cone,
parametrized as the conic hull of a generic atom set, leading to the first
principled definitions of non-negative MP algorithms for which we give explicit
convergence rates and demonstrate excellent empirical performance. In
particular, we derive sublinear ((mathcal{O}(1/t))) convergence on general
smooth and convex objectives, and linear convergence ((mathcal{O}(e^{-t}))) on
strongly convex objectives, in both cases for general sets of atoms.
Furthermore, we establish a clear correspondence of our algorithms to known
algorithms from the MP and FW literature. Our novel algorithms and analyses
target general atom sets and general objective functions, and hence are
directly applicable to a large variety of learning settings.
Zachary T. Wilson, Nikolaos V. Sahinidis
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
ALAMO is a computational methodology for leaning algebraic functions from
data. Given a data set, the approach begins by building a low-complexity,
linear model composed of explicit non-linear transformations of the independent
variables. Linear combinations of these non-linear transformations allow a
linear model to better approximate complex behavior observed in real processes.
The model is refined, as additional data are obtained in an adaptive fashion
through error maximization sampling using derivative-free optimization. Models
built using ALAMO can enforce constraints on the response variables to
incorporate first-principles knowledge. The ability of ALAMO to generate simple
and accurate models for a number of reaction problems is demonstrated. The
error maximization sampling is compared with Latin hypercube designs to
demonstrate its sampling efficiency. ALAMO’s constrained regression methodology
is used to further refine concentration models, resulting in models that
perform better on validation data and satisfy upper and lower bounds placed on
model outputs.
Emily Denton, Vighnesh Birodkar
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We present a new model DrNET that learns disentangled image representations
from video. Our approach leverages the temporal coherence of video and a novel
adversarial loss to learn a representation that factorizes each frame into a
stationary part and a temporally varying component. The disentangled
representation can be used for a range of tasks. For example, applying a
standard LSTM to the time-vary components enables prediction of future frames.
We evaluate our approach on a range of synthetic and real videos, demonstrating
the ability to coherently generate hundreds of steps into the future.
Qilong Gu, Arindam Banerjee
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
High dimensional superposition models characterize observations using
parameters which can be written as a sum of multiple component parameters, each
with its own structure, e.g., sum of low rank and sparse matrices, sum of
sparse and rotated sparse vectors, etc. In this paper, we consider general
superposition models which allow sum of any number of component parameters, and
each component structure can be characterized by any norm. We present a simple
estimator for such models, give a geometric condition under which the
components can be accurately estimated, characterize sample complexity of the
estimator, and give high probability non-asymptotic bounds on the componentwise
estimation error. We use tools from empirical processes and generic chaining
for the statistical analysis, and our results, which substantially generalize
prior work on superposition models, are in terms of Gaussian widths of suitable
sets.
Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, Z. Steven Wu
Comments: 24 pages single-column
Subjects: Learning (cs.LG)
Traditional approaches to differential privacy assume a fixed privacy
requirement (epsilon) for a computation, and attempt to maximize the accuracy
of the computation subject to the privacy constraint. As differential privacy
is increasingly deployed in practical settings, it may often be that there is
instead a fixed accuracy requirement for a given computation and the data
analyst would like to maximize the privacy of the computation subject to the
accuracy constraint. This raises the question of how to find and run a
maximally private empirical risk minimizer subject to a given accuracy
requirement. We propose a general “noise reduction” framework that can apply to
a variety of private empirical risk minimization (ERM) algorithms, using them
to “search” the space of privacy levels to find the empirically strongest one
that meets the accuracy constraint, incurring only logarithmic overhead in the
number of privacy levels searched. The privacy analysis of our algorithm leads
naturally to a version of differential privacy where the privacy parameters are
dependent on the data, which we term ex-post privacy, and which is related to
the recently introduced notion of privacy odometers. We also give an ex-post
privacy analysis of the classical AboveThreshold privacy tool, modifying it to
allow for queries chosen depending on the database. Finally, we apply our
approach to two common objectives, regularized linear and logistic regression,
and empirically compare our noise reduction methods to (i) inverting the
theoretical utility guarantees of standard private ERM algorithms and (ii) a
stronger, empirical baseline based on binary search.
Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik
Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the neural network domain, methods for hyperparameter optimization and
meta-modeling are computationally expensive due to the need to train a large
number of neural network configurations. In this paper, we show that a simple
regression model, based on support vector machines, can predict the final
performance of partially trained neural network configurations using features
based on network architectures, hyperparameters, and time-series validation
performance data. We use this regression model to develop an early stopping
strategy for neural network configurations. With this early stopping strategy,
we obtain significant speedups in both hyperparameter optimization and
meta-modeling. Particularly in the context of meta-modeling, our method can
learn to predict the performance of drastically different architectures and is
seamlessly incorporated into reinforcement learning-based architecture
selection algorithms. Finally, we show that our method is simpler, faster, and
more accurate than Bayesian methods for learning curve prediction.
Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Human trafficking is one of the most atrocious crimes and among the
challenging problems facing law enforcement which demands attention of global
magnitude. In this study, we leverage textual data from the website “Backpage”-
used for classified advertisement- to discern potential patterns of human
trafficking activities which manifest online and identify advertisements of
high interest to law enforcement. Due to the lack of ground truth, we rely on a
human analyst from law enforcement, for hand-labeling a small portion of the
crawled data. We extend the existing Laplacian SVM and present S3VM-R, by
adding a regularization term to exploit exogenous information embedded in our
feature space in favor of the task at hand. We train the proposed method using
labeled and unlabeled data and evaluate it on a fraction of the unlabeled data,
herein referred to as unseen data, with our expert’s further verification.
Results from comparisons between our method and other semi-supervised and
supervised approaches on the labeled data demonstrate that our learner is
effective in identifying advertisements of high interest to law enforcement
Friedemann Zenke, Surya Ganguli
Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A vast majority of computation in the brain is performed by spiking neural
networks. Despite the ubiquity of such spiking, we currently lack an
understanding of how biological spiking neural circuits learn and compute
in-vivo, as well as how we can instantiate such capabilities in artificial
spiking circuits in-silico. Here we revisit the problem of supervised learning
in temporally coding multi-layer spiking neural networks. First, by using a
surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based
three factor learning rule capable of training multi-layer networks of
deterministic integrate-and-fire neurons to perform nonlinear computations on
spatiotemporal spike patterns. Second, inspired by recent results on feedback
alignment, we compare the performance of our learning rule under different
credit assignment strategies for propagating output errors to hidden units.
Specifically, we test uniform, symmetric and random feedback, finding that
simpler tasks can be solved with any type of feedback, while more complex tasks
require symmetric feedback. In summary, our results open the door to obtaining
a better scientific understanding of learning and computation in spiking neural
networks by advancing our ability to train them to solve nonlinear problems
involving transformations between different spatiotemporal spike-time patterns.
Tim Rocktäschel, Sebastian Riedel
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
We introduce neural networks for end-to-end differentiable theorem proving
that operate on dense vector representations of symbols. These neural networks
are constructed recursively by taking inspiration from the backward chaining
algorithm as used in Prolog. Specifically, we replace symbolic unification with
a differentiable computation on vector representations of symbols using a
radial basis function kernel, thereby combining symbolic reasoning with
learning subsymbolic vector representations. By using gradient descent, the
resulting neural network can be trained to infer facts from a given incomplete
knowledge base. It learns to (i) place representations of similar symbols in
close proximity in a vector space, (ii) make use of such similarities to prove
facts, (iii) induce logical rules, and (iv) use provided and induced logical
rules for complex multi-hop reasoning. We demonstrate that this architecture
outperforms ComplEx, a state-of-the-art neural link prediction model, on four
benchmark knowledge bases while at the same time inducing interpretable
function-free first-order logic rules.
Dan Oprisa, Peter Toth
Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG)
Guided by critical systems found in nature we develop a novel mechanism
consisting of inhomogeneous polynomial regularisation via which we can induce
scale invariance in deep learning systems. Technically, we map our deep
learning (DL) setup to a genuine field theory, on which we act with the
Renormalisation Group (RG) in momentum space and produce the flow equations of
the couplings; those are translated to constraints and consequently interpreted
as “critical regularisation” conditions in the optimiser; the resulting
equations hence prove to be sufficient conditions for – and serve as an elegant
and simple mechanism to induce scale invariance in any deep learning setup.
Julien Perez, Tomi Silander
Comments: 11 pages, 1 figure, 1 table
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Partially observable environments present an important open challenge in the
domain of sequential control learning with delayed rewards. Despite numerous
attempts during the two last decades, the majority of reinforcement learning
algorithms and associated approximate models, applied to this context, still
assume Markovian state transitions. In this paper, we explore the use of a
recently proposed attention-based model, the Gated End-to-End Memory Network,
for sequential control. We call the resulting model the Gated End-to-End Memory
Policy Network. More precisely, we use a model-free value-based algorithm to
learn policies for partially observed domains using this memory-enhanced neural
network. This model is end-to-end learnable and it features unbounded memory.
Indeed, because of its attention mechanism and associated non-parametric
memory, the proposed model allows us to define an attention mechanism over the
observation stream unlike recurrent models. We show encouraging results that
illustrate the capability of our attention-based model in the context of the
continuous-state non-stationary control problem of stock trading. We also
present an OpenAI Gym environment for simulated stock exchange and explain its
relevance as a benchmark for the field of non-Markovian decision process
learning.
Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Kernel methods provide a principled way to perform non linear, nonparametric
learning. They rely on solid functional analytic foundations and enjoy optimal
statistical properties. However, at least in their basic form, they have
limited applicability in large scale scenarios because of stringent
computational requirements in terms of time and especially memory. In this
paper, we take a substantial step in scaling up kernel methods, proposing
FALKON, a novel algorithm that allows to efficiently process millions of
points. FALKON is derived combining several algorithmic principles, namely
stochastic projections, iterative solvers and preconditioning. Our theoretical
analysis shows that optimal statistical accuracy is achieved requiring
essentially (O(n)) memory and (O(nsqrt{n})) time. Extensive experiments show
that state of the art results on available large scale datasets can be achieved
even on a single machine.
Yuichi Yoshida, Takeru Miyato
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We investigate the generalizability of deep learning based on the sensitivity
to input perturbation. We hypothesize that the high sensitivity to the
perturbation of data degrades the performance on it. To reduce the sensitivity
to perturbation, we propose a simple and effective regularization method,
referred to as spectral norm regularization, which penalizes the high spectral
norm of weight matrices in neural networks. We provide supportive evidence for
the abovementioned hypothesis by experimentally confirming that the models
trained using spectral norm regularization exhibit better generalizability than
other baseline methods.
Henghui Zhu, Feng Nan, Ioannis Paschalidis, Venkatesh Saligrama
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Deep neural network (DNN) based approaches hold significant potential for
reinforcement learning (RL) and have already shown remarkable gains over
state-of-art methods in a number of applications. The effectiveness of DNN
methods can be attributed to leveraging the abundance of supervised data to
learn value functions, Q-functions, and policy function approximations without
the need for feature engineering. Nevertheless, the deployment of DNN-based
predictors with very deep architectures can pose an issue due to computational
and other resource constraints at test-time in a number of applications. We
propose a novel approach for reducing the average latency by learning a
computationally efficient gating function that is capable of recognizing states
in a sequential decision process for which policy prescriptions of a shallow
network suffices and deeper layers of the DNN have little marginal utility. The
overall system is adaptive in that it dynamically switches control actions
based on state-estimates in order to reduce average latency without sacrificing
terminal performance. We experiment with a number of alternative loss-functions
to train gating functions and shallow policies and show that in a number of
applications a speed-up of up to almost 5X can be obtained with little loss in
performance.
Javier S. Turek, Alexander Huth
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA)
Symmetric matrices are widely used in machine learning problems such as
kernel machines and manifold learning. Using large datasets often requires
computing low-rank approximations of these symmetric matrices so that they fit
in memory. In this paper, we present a novel method based on biharmonic
interpolation for low-rank matrix approximation. The method exploits knowledge
of the data manifold to learn an interpolation operator that approximates
values using a subset of randomly selected landmark points. This operator is
readily sparsified, reducing memory requirements by at least two orders of
magnitude without significant loss in accuracy. We show that our method can
approximate very large datasets using twenty times more landmarks than other
methods. Further, numerical results suggest that our method is stable even when
numerical difficulties arise for other methods.
Velibor V. Mišić
Comments: 48 pages, 10 figures
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
Tree ensemble models such as random forests and boosted trees are among the
most widely used and practically successful predictive models in applied
machine learning and business analytics. Although such models have been used to
make predictions based on exogenous, uncontrollable independent variables, they
are increasingly being used to make predictions where the independent variables
are controllable and are also decision variables. In this paper, we study the
problem of tree ensemble optimization: given a tree ensemble that predicts some
dependent variable using controllable independent variables, how should we set
these variables so as to maximize the predicted value? We formulate the problem
as a mixed-integer optimization problem. We theoretically examine the strength
of our formulation, provide a hierarchy of approximate formulations with bounds
on approximation quality and exploit the structure of the problem to develop
two large-scale solution methods, one based on Benders decomposition and one
based on iteratively generating tree split constraints. We test our methodology
on real data sets, including two case studies in drug design and customized
pricing, and show that our methodology can efficiently solve large-scale
instances to near or full optimality, and outperforms solutions obtained by
heuristic approaches. In our drug design case, we show how our approach can
identify compounds that efficiently trade-off predicted performance and novelty
with respect to existing, known compounds. In our customized pricing case, we
show how our approach can efficiently determine optimal store-level prices
under a random forest model that delivers excellent predictive accuracy.
Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Björn Schuller
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
Eliminating the negative effect of highly non-stationary environmental noise
is a long-standing research topic for speech recognition but remains an
important challenge nowadays. To address this issue, traditional unsupervised
signal processing methods seem to have touched the ceiling. However,
data-driven based supervised approaches, particularly the ones designed with
deep learning, have recently emerged as potential alternatives. In this light,
we are going to comprehensively summarise the recently developed and most
representative deep learning approaches to deal with the raised problem in this
article, with the aim of providing guidelines for those who are going deeply
into the field of environmentally robust speech recognition. To better
introduce these approaches, we categorise them into single- and multi-channel
techniques, each of which is specifically described at the front-end, the
back-end, and the joint framework of speech recognition systems. In the
meanwhile, we describe the pros and cons of these approaches as well as the
relationships among them, which can probably benefit future research.
Gabriel Lima Guimaraes, Benjamin Sanchez-Lengeling, Pedro Luis Cunha Farias, Alán Aspuru-Guzik
Comments: 10 pages, 7 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In unsupervised data generation tasks, besides the generation of a sample
based on previous observations, one would often like to give hints to the model
in order to bias the generation towards desirable metrics. We propose a method
that combines Generative Adversarial Networks (GANs) and reinforcement learning
(RL) in order to accomplish exactly that. While RL biases the data generation
process towards arbitrary metrics, the GAN component of the reward function
ensures that the model still remembers information learned from data. We build
upon previous results that incorporated GANs and RL in order to generate
sequence data and test this model in several settings for the generation of
molecules encoded as text sequences (SMILES) and in the context of music
generation, showing for each case that we can effectively bias the generation
process towards desired metrics.
Wei Mao, Suhas Diggavi, Sreeram Kannan
Subjects: Information Theory (cs.IT)
Nanopore sequencing is an emerging new technology for sequencing DNA, which
can read long fragments of DNA (~50,000 bases) in contrast to most current
(short-read) sequencing technologies which can only read hundreds of bases.
While nanopore sequencers can acquire long reads, the high error rates
(20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA is
migrated through a nanopore and current variations are measured. The DNA
sequence is inferred from this observed current pattern using an algorithm
called a base-caller. In this paper, we propose a mathematical model for the
“channel” from the input DNA sequence to the observed current, and calculate
bounds on the information extraction capacity of the nanopore sequencer. This
model incorporates impairments like (non-linear) inter-symbol interference,
deletions, as well as random response. The potential practical application of
such information bounds is two-fold: (1) benchmarking present base-calling
algorithms against those theoretical bounds, and (2) offering an optimization
objective for designing better nanopore sequencers.
Stanislav Kruglik, Marina Dudina, Valeriya Potapova, Alexey Frolov
Comments: submitted to ITW 2017. arXiv admin note: text overlap with arXiv:1702.01314
Subjects: Information Theory (cs.IT)
We investigate one possible generalization of locally recoverable codes (LRC)
with all-symbol locality and availability when recovering sets can intersect in
a small number of coordinates. This feature allows us to increase the
achievable code rate and still meet load balancing requirements. In this paper
we derive an upper bound for the rate of such codes and give explicit
constructions of codes with such a property. These constructions utilize LRC
codes developed by Wang et al.
Rong Zhang, Lie-Liang Yang, Lajos Hanzo
Subjects: Information Theory (cs.IT)
Conventional wireless information transfer by modulating the amplitude, phase
or frequency leads to an inevitable Rate-Energy (RE) trade-off in the presence
of simultaneous Wireless Power Transfer (WPT). In echoing Varshney’s seminal
concept of jointly transmitting both information and energy, we propose the
so-called Generalised Precoded Spatial Modulation (GPSM) aided Integrated
Wireless Information and Power Transfer (IWIPT) concept employing a power-split
receiver. The principle of GPSM is that a particular subset of Receive Antennas
(RA) is activated and the activation pattern itself conveys useful information.
Hence, the novelty of our GPSM aided IWIPT concept is that RA pattern-based
information transfer is used in addition to the conventional waveform-based
information carried by the classic M-ary modulation. Following the Radio
Frequency (RF) to Direct Current (DC) power conversion invoked for WPT at the
power-split receiver, the non-coherent detector simply compares the remaining
received power accumulated by each legitimate RA pattern for the sake of
identifying the most likely RA. This operation is then followed by
down-conversion and conventional Base Band (BB) M-ary detection. Both our
analysis and simulations show that the RA pattern based information transfer
represented in the Spatial Domain (SD) exhibits a beneficial immunity to any
potential power conversion induced performance degradation and hence improves
the overall RE trade-off when additionally the waveform-based information
transfer is also taken into account. Moreover, we investigate the impact of
realistic imperfect Channel State Information at the Transmitter (CSIT) as well
as that of the antenna correlations encountered. Finally, the system’s
asymptotic performance is characterised in the context of large-scale Multiple
Input Multiple Output (MIMO) systems.
Meysam Sadeghi, Emil Björnson, Erik G. Larsson, Chau Yuen, Thomas L. Marzetta
Subjects: Information Theory (cs.IT)
This paper considers the downlink precoding for physical layer multicasting
in massive multiple-input-multiple-output (MIMO) systems. We study the max-min
fairness (MMF) problem, where channel state information (CSI) at the
transmitter is used to design precoding vectors that maximize the minimum
spectral efficiency (SE) of the system, given fixed power budgets for uplink
training and downlink transmission. Our system model accounts for channel
estimation, pilot contamination, arbitrary pathlosses, and multi-group
multicasting. We consider six scenarios with different transmission
technologies (unicast and multicast), different pilot assignment strategies
(dedicated or shared pilot assignments), and different precoding schemes
(maximum ratio transmission and zero forcing), and derive achievable spectral
efficiencies for all possible combinations. Then we solve the MMF problem for
each of these scenarios and for any given pilot length we find the SE
maximizing uplink pilot and downlink data transmission policies, all in
closed-forms. We use these results to draw a general guideline for massive MIMO
multicasting design, where for a given number of base station antennas, number
of users, and coherence interval length, we determine the multicasting scheme
that shall be used.
Manar Mohaisen, Saetbyeol Lee
Comments: 11 pages, 3 tables, 11 figures. ETRI Journal, 2017
Subjects: Information Theory (cs.IT)
In this paper, we propose a spatial modulation (SM) scheme referred to as
complex quadrature spatial modulation (CQSM). In contrast to quadrature spatial
modulation (QSM), CQSM transmits two complex signal constellation symbols on
the real and quadrature spatial dimensions at each channel use, increasing the
spectral efficiency. To this end, signal symbols transmitted at any given time
instant are drawn from two different modulation sets. The first modulation set
is any of the conventional QAM/PSK alphabets, while the second is a rotated
version of it. The optimal rotation angle is obtained through simulations for
several modulation schemes and analytically proven for the case of QPSK, where
both results coincide. Simulation results showed that CQSM outperformed QSM and
generalized SM (GSM) by approximately 5 and 4.5 dB, respectively, for the same
transmission rate. Its performance was similar to that of QSM; however, it
achieved higher transmission rates. It was additionally shown numerically and
analytically that CQSM outperformed QSM for a relatively large number of
transmit antennas.
Linus Hamilton, Frederic Koehler, Ankur Moitra
Comments: 25 pages
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
Markov random fields area popular model for high-dimensional probability
distributions. Over the years, many mathematical, statistical and algorithmic
problems on them have been studied. Until recently, the only known algorithms
for provably learning them relied on exhaustive search, correlation decay or
various incoherence assumptions. Bresler gave an algorithm for learning general
Ising models on bounded degree graphs. His approach was based on a structural
result about mutual information in Ising models.
Here we take a more conceptual approach to proving lower bounds on the mutual
information through setting up an appropriate zero-sum game. Our proof
generalizes well beyond Ising models, to arbitrary Markov random fields with
higher order interactions. As an application, we obtain algorithms for learning
Markov random fields on bounded degree graphs on (n) nodes with (r)-order
interactions in (n^r) time and (log n) sample complexity. The sample
complexity is information theoretically optimal up to the dependence on the
maximum degree. The running time is nearly optimal under standard conjectures
about the hardness of learning parity with noise.
Harm Derksen
Comments: 59 pages, 27 figures
Subjects: Numerical Analysis (math.NA); Information Theory (cs.IT); Numerical Analysis (cs.NA); Optimization and Control (math.OC); Statistics Theory (math.ST)
We study the Pareto frontier for two competing norms (|cdot|_X) and
(|cdot|_Y) on a vector space. For a given vector (c), the pareto frontier
describes the possible values of ((|a|_X,|b|_Y)) for a decomposition
(c=a+b). The singular value decomposition of a matrix is closely related to the
Pareto frontier for the spectral and nuclear norm. We will develop a general
theory that extends the notion of singular values of a matrix to arbitrary
finite dimensional euclidean vector spaces equipped with dual norms. This also
generalizes the diagonal singular value decompositions for tensors introduced
by the author in previous work. We can apply the results to denoising, where
(c) is a noisy signal, (a) is a sparse signal and (b) is noise. Applications
include 1D total variation denoising, 2D total variation Rudin-Osher-Fatemi
image denoising, LASSO, basis pursuit denoising and tensor decompositions.