Priyadarshini Panda, Jason M. Allred, Shriram Ramanathan, Kaushik Roy
Comments: 14 pages, 14 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
A fundamental feature of learning in animals is the “ability to forget” that
allows an organism to perceive, model and make decisions from disparate streams
of information and adapt to changing environments. Against this backdrop, we
present a novel unsupervised learning mechanism ASP (Adaptive Synaptic
Plasticity) for improved recognition with Spiking Neural Networks (SNNs) for
real time on-line learning in a dynamic environment. We incorporate an adaptive
weight decay mechanism with the traditional Spike Timing Dependent Plasticity
(STDP) learning to model adaptivity in SNNs. The leak rate of the synaptic
weights is modulated based on the temporal correlation between the spiking
patterns of the pre- and post-synaptic neurons. This mechanism helps in gradual
forgetting of insignificant data while retaining significant, yet old,
information. ASP, thus, maintains a balance between forgetting and immediate
learning to construct a stable-plastic self-adaptive SNN for continuously
changing inputs. We demonstrate that the proposed learning methodology
addresses catastrophic forgetting while yielding significantly improved
accuracy over the conventional STDP learning method for digit recognition
applications. Additionally, we observe that the proposed learning model
automatically encodes selective attention towards relevant features in the
input data while eliminating the influence of background noise (or denoising)
further improving the robustness of the ASP learning.
Shumeet Baluja
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
In all but the most trivial optimization problems, the structure of the
solutions exhibit complex interdependencies between the input parameters.
Decades of research with stochastic search techniques has shown the benefit of
explicitly modeling the interactions between sets of parameters and the overall
quality of the solutions discovered. We demonstrate a novel method, based on
learning deep networks, to model the global landscapes of optimization
problems. To represent the search space concisely and accurately, the deep
networks must encode information about the underlying parameter interactions
and their contributions to the quality of the solution. Once the networks are
trained, the networks are probed to reveal parameter combinations with high
expected performance with respect to the optimization task. These estimates are
used to initialize fast, randomized, local search algorithms, which in turn
expose more information about the search space that is subsequently used to
refine the models. We demonstrate the technique on multiple optimization
problems that have arisen in a variety of real-world domains, including:
packing, graphics, job scheduling, layout and compression. The problems include
combinatoric search spaces, discontinuous and highly non-linear spaces, and
span binary, higher-cardinality discrete, as well as continuous parameters.
Strengths, limitations, and extensions of the approach are extensively
discussed and demonstrated.
Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
Comments: Submitted to Interspeech-2017
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent work on end-to-end automatic speech recognition (ASR) has shown that
the connectionist temporal classification (CTC) loss can be used to convert
acoustics to phone or character sequences. Such systems are used with a
dictionary and separately-trained Language Model (LM) to produce word
sequences. However, they are not truly end-to-end in the sense of mapping
acoustics directly to words without an intermediate phone representation. In
this paper, we present the first results employing direct acoustics-to-word CTC
models on two well-known public benchmark tasks: Switchboard and CallHome.
These models do not require an LM or even a decoder at run-time and hence
recognize speech with minimal complexity. However, due to the large number of
word output units, CTC word models require orders of magnitude more data to
train reliably compared to traditional systems. We present some techniques to
mitigate this issue. Our CTC word model achieves a word error rate of
13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or
decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also
present rescoring results on CTC word model lattices to quantify the
performance benefits of a LM, and contrast the performance of word and phone
CTC models.
Alexander Hermans, Lucas Beyer, Bastian Leibe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the past few years, the field of computer vision has gone through a
revolution fueled mainly by the advent of large datasets and the adoption of
deep convolutional neural networks for end-to-end learning. The person
re-identification subfield is no exception to this, thanks to the notable
publication of the Market-1501 and MARS datasets and several strong deep
learning approaches. Unfortunately, a prevailing belief in the community seems
to be that the triplet loss is inferior to using surrogate losses
(classification, verification) followed by a separate metric learning step. We
show that, for models trained from scratch as well as pretrained ones, using a
variant of the triplet loss to perform end-to-end deep metric learning
outperforms any other published method by a large margin.
Alexander Hermans, Lucas Beyer, Bastian Leibe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In the past few years, the field of computer vision has gone through a
revolution fueled mainly by the advent of large datasets and the adoption of
deep convolutional neural networks for end-to-end learning. The person
re-identification subfield is no exception to this, thanks to the notable
publication of the Market-1501 and MARS datasets and several strong deep
learning approaches. Unfortunately, a prevailing belief in the community seems
to be that the triplet loss is inferior to using surrogate losses
(classification, verification) followed by a separate metric learning step. We
show that, for models trained from scratch as well as pretrained ones, using a
variant of the triplet loss to perform end-to-end deep metric learning
outperforms any other published method by a large margin.
Thijs Kooi, Nico Karssemeijer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We investigate the addition of symmetry and temporal context information to a
deep Convolutional Neural Network (CNN) with the purpose of detecting malignant
soft tissue lesions in mammography. We employ a simple linear mapping that
takes the location of a mass candidate and maps it to either the contra-lateral
or prior mammogram and Regions Of Interest (ROI) are extracted around each
location. We subsequently explore two different architectures (1) a fusion
model employing two datastreams were both ROIs are fed to the network during
training and testing and (2) a stage-wise approach where a single ROI CNN is
trained on the primary image and subsequently used as feature extractor for
both primary and symmetrical or prior ROIs. A ‘shallow’ Gradient Boosted Tree
(GBT) classifier is then trained on the concatenation of these features and
used to classify the joint representation. Results shown a significant increase
in performance using the first architecture and symmetry information, but only
marginal gains in performance using temporal data and the other setting. We
feel results are promising and can greatly be improved when more temporal data
becomes available.
Natalia Neverova, Pauline Luc, Camille Couprie, Jakob Verbeek, Yann LeCun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The ability to predict and therefore to anticipate the future is an important
attribute of intelligence. It is also of utmost importance in real-time
systems, e.g. in robotics or autonomous driving, which depend on visual scene
understanding for decision making. While prediction of the raw RGB pixel values
in future video frames has been studied in previous work, here we focus on
predicting semantic segmentations of future frames. More precisely, given a
sequence of semantically segmented video frames, our goal is to predict
segmentation maps of not yet observed video frames that lie up to a second or
further in the future. We develop an autoregressive convolutional neural
network that learns to iteratively generate multiple frames. Our results on the
Cityscapes dataset show that directly predicting future segmentations is
substantially better than predicting and then segmenting future RGB frames. Our
models predict trajectories of cars and pedestrians much more accurately (25%)
than baselines that copy the most recent semantic segmentation or warp it using
optical flow. Prediction results up to half a second in the future are visually
convincing, the mean IoU of predicted segmentations reaching two thirds of the
real future segmentations.
Tomas Wilkinson, Jonas Lindström, Anders Brun
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we approach the problem of segmentation-free query-by-string
word spotting for handwritten documents. In other words, we use methods
inspired from computer vision and machine learning to search for words in large
collections of digitized manuscripts. In particular, we are interested in
historical handwritten texts, which are often far more challenging than modern
printed documents. This task is important, as it provides people with a way to
quickly find what they are looking for in large collections that are tedious
and difficult to read manually. To this end, we introduce an end-to-end
trainable model based on deep neural networks that we call Ctrl-F-Net. Given a
full manuscript page, the model simultaneously generates region proposals, and
embeds these into a distributed word embedding space, where searches are
performed. We evaluate the model on common benchmarks for handwritten word
spotting, outperforming the previous state-of-the-art segmentation-free
approaches by a large margin, and in some cases even segmentation-based
approaches. One interesting real-life application of our approach is to help
historians to find and count specific words in court records that are related
to women’s sustenance activities and division of labor. We provide promising
preliminary experiments that validate our method on this task.
Harish Katti, S. P. Arun
Comments: 9 pages, 5 figure, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Faces form the basis for a rich variety of judgments in humans, yet the
underlying features remain poorly understood. Although fine-grained
distinctions within a race might more strongly constrain possible facial
features used by humans than in case of coarse categories such as race or
gender, such fine grained distinctions are relatively less studied.
Fine-grained race classification is also interesting because even humans may
not be perfectly accurate on these tasks. This allows us to compare errors made
by humans and machines, in contrast to standard object detection tasks where
human performance is nearly perfect. We have developed a novel face database of
close to 1650 diverse Indian faces labeled for fine-grained race (South vs
North India) as well as for age, weight, height and gender. We then asked close
to 130 human subjects who were instructed to categorize each face as belonging
toa Northern or Southern state in India. We then compared human performance on
this task with that of computational models trained on the ground-truth labels.
Our main results are as follows: (1) Humans are highly consistent (average
accuracy: 63.6%), with some faces being consistently classified with > 90%
accuracy and others consistently misclassified with < 30% accuracy; (2) Models
trained on ground-truth labels showed slightly worse performance (average
accuracy: 62%) but showed higher accuracy (72.2%) on faces classified with >
80% accuracy by humans. This was true for models trained on simple spatial and
intensity measurements extracted from faces as well as deep neural networks
trained on race or gender classification; (3) Using overcomplete banks of
features derived from each face part, we found that mouth shape was the single
largest contributor towards fine-grained race classification, whereas distances
between face parts was the strongest predictor of gender.
Fan Wu, Zhongwen Xu, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an end-to-end approach to the natural language object retrieval
task, which localizes an object within an image according to a natural language
description, i.e., referring expression. Previous works divide this problem
into two independent stages: first, compute region proposals from the image
without the exploration of the language description; second, score the object
proposals with regard to the referring expression and choose the top-ranked
proposals. The object proposals are generated independently from the referring
expression, which makes the proposal generation redundant and even irrelevant
to the referred object. In this work, we train an agent with deep reinforcement
learning, which learns to move and reshape a bounding box to localize the
object according to the referring expression. We incorporate both the spatial
and temporal context information into the training procedure. By simultaneously
exploiting local visual information, the spatial and temporal context and the
referring language a priori, the agent selects an appropriate action to take at
each time. A special action is defined to indicate when the agent finds the
referred object, and terminate the procedure. We evaluate our model on various
datasets, and our algorithm significantly outperforms the compared algorithms.
Notably, the accuracy improvement of our method over the recent method GroundeR
and SCRC on the ReferItGame dataset are 7.67% and 18.25%, respectively.
Florian Chabot, Mohamed Chaouch, Jaonary Rabarisoa, Céline Teulière, Thierry Chateau
Comments: CVPR 2017 (to appear)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a novel approach, called Deep MANTA (Deep
Many-Tasks), for many-task vehicle analysis from a given image. A robust
convolutional network is introduced for simultaneous vehicle detection, part
localization, visibility characterization and 3D dimension estimation. Its
architecture is based on a new coarse-to-fine object proposal that boosts the
vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle
parts even if these parts are not visible. In the inference, the network’s
outputs are used by a real time robust pose estimation algorithm for fine
orientation estimation and 3D vehicle localization. We show in experiments that
our method outperforms monocular state-of-the-art approaches on vehicle
detection, orientation and 3D location tasks on the very challenging KITTI
benchmark.
Qikui Zhu, Bo Du, Baris Turkbey, Peter L . Choyke, Pingkun Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Prostate segmentation from Magnetic Resonance (MR) images plays an important
role in image guided interven- tion. However, the lack of clear boundary
specifically at the apex and base, and huge variation of shape and texture
between the images from different patients make the task very challenging. To
overcome these problems, in this paper, we propose a deeply supervised
convolutional neural network (CNN) utilizing the convolutional information to
accurately segment the prostate from MR images. The proposed model can
effectively detect the prostate region with additional deeply supervised layers
compared with other approaches. Since some information will be abandoned after
convolution, it is necessary to pass the features extracted from early stages
to later stages. The experimental results show that significant segmentation
accuracy improvement has been achieved by our proposed method compared to other
reported approaches.
Guo-Jun Qi, Wei Liu, Charu Aggarwal, Thomas Huang
Comments: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will apear in a future issue
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a label transfer model from texts to images for
image classification tasks. The problem of image classification is often much
more challenging than text classification. On one hand, labeled text data is
more widely available than the labeled images for classification tasks. On the
other hand, text data tends to have natural semantic interpretability, and they
are often more directly related to class labels. On the contrary, the image
features are not directly related to concepts inherent in class labels. One of
our goals in this paper is to develop a model for revealing the functional
relationships between text and image features as to directly transfer
intermodal and intramodal labels to annotate the images. This is implemented by
learning a transfer function as a bridge to propagate the labels between two
multimodal spaces. However, the intermodal label transfers could be undermined
by blindly transferring the labels of noisy texts to annotate images. To
mitigate this problem, we present an intramodal label transfer process, which
complements the intermodal label transfer by transferring the image labels
instead when relevant text is absent from the source corpus. In addition, we
generalize the inter-modal label transfer to zero-shot learning scenario where
there are only text examples available to label unseen classes of images
without any positive image examples. We evaluate our algorithm on an image
classification task and show the effectiveness with respect to the other
compared algorithms.
Simon Niklaus, Long Mai, Feng Liu
Comments: CVPR 2017, this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Video frame interpolation typically involves two steps: motion estimation and
pixel synthesis. Such a two-step approach heavily depends on the quality of
motion estimation. This paper presents a robust video frame interpolation
method that combines these two steps into a single process. Specifically, our
method considers pixel synthesis for the interpolated frame as local
convolution over two input frames. The convolution kernel captures both the
local motion between the input frames and the coefficients for pixel synthesis.
Our method employs a deep fully convolutional neural network to estimate a
spatially-adaptive convolution kernel for each pixel. This deep neural network
can be directly trained end to end using widely available video data without
any difficult-to-obtain ground-truth data like optical flow. Our experiments
show that the formulation of video interpolation as a single convolution
process allows our method to gracefully handle challenges like occlusion, blur,
and abrupt brightness change and enables high-quality video frame
interpolation.
Fujun Luan, Sylvain Paris, Eli Shechtman, Kavita Bala
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a deep-learning approach to photographic style transfer
that handles a large variety of image content while faithfully transferring the
reference style. Our approach builds upon recent work on painterly transfer
that separates style from the content of an image by considering different
layers of a neural network. However, as is, this approach is not suitable for
photorealistic style transfer. Even when both the input and reference images
are photographs, the output still exhibits distortions reminiscent of a
painting. Our contribution is to constrain the transformation from the input to
the output to be locally affine in colorspace, and to express this constraint
as a custom CNN layer through which we can backpropagate. We show that this
approach successfully suppresses distortion and yields satisfying
photorealistic style transfers in a broad variety of scenarios, including
transfer of the time of day, weather, season, and artistic edits.
Afonso Menegola, Michel Fornaciali, Ramon Pires, Flávia Vasques Bittencourt, Sandra Avila, Eduardo Valle
Comments: 4 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Knowledge transfer impacts the performance of deep learning — the state of
the art for image classification tasks, including automated melanoma screening.
Deep learning’s greed for large amounts of training data poses a challenge for
medical tasks, which we can alleviate by recycling knowledge from models
trained on different tasks, in a scheme called transfer learning. Although much
of the best art on automated melanoma screening employs some form of transfer
learning, a systematic evaluation was missing. Here we investigate the presence
of transfer, from which task the transfer is sourced, and the application of
fine tuning (i.e., retraining of the deep learning model after transfer). We
also test the impact of picking deeper (and more expensive) models. Our results
favor deeper models, pre-trained over ImageNet, with fine-tuning, reaching an
AUC of 80.7% and 84.5% for the two skin-lesion datasets evaluated.
S. Alireza Golestaneh, Lina J. Karam
Comments: Paper got accepted in CVPR 2017
Journal-ref: 2017 IEEE Conference on Computer Vision and Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The detection of spatially-varying blur without having any information about
the blur type is a challenging task. In this paper, we propose a novel
effective approach to address the blur detection problem from a single image
without requiring any knowledge about the blur type, level, or camera settings.
Our approach computes blur detection maps based on a novel High-frequency
multiscale Fusion and Sort Transform (HiFST) of gradient magnitudes. The
evaluations of the proposed approach on a diverse set of blurry images with
different blur types, levels, and contents demonstrate that the proposed
algorithm performs favorably against the state-of-the-art methods qualitatively
and quantitatively.
Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, Jiaying Liu
Comments: 10 pages, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the fact that many 3D human activity benchmarks being proposed, most
existing action datasets focus on the action recognition tasks for the
segmented videos. There is a lack of standard large-scale benchmarks,
especially for current popular data-hungry deep learning based methods. In this
paper, we introduce a new large scale benchmark (PKU-MMD) for continuous
multi-modality 3D human action understanding and cover a wide range of complex
human activities with well annotated information. PKU-MMD contains 1076 long
video sequences in 51 action categories, performed by 66 subjects in three
camera views. It contains almost 20,000 action instances and 5.4 million frames
in total. Our dataset also provides multi-modality data sources, including RGB,
depth, Infrared Radiation and Skeleton. With different modalities, we conduct
extensive experiments on our dataset in terms of two scenarios and evaluate
different methods by various metrics, including a new proposed evaluation
protocol 2D-AP. We believe this large-scale dataset will benefit future
researches on action detection for the community.
Feras Dayoub, Niko Sünderhauf, Peter Corke
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
We investigate different strategies for active learning with Bayesian deep
neural networks. We focus our analysis on scenarios where new, unlabeled data
is obtained episodically, such as commonly encountered in mobile robotics
applications. An evaluation of different strategies for acquisition, updating,
and final training on the CIFAR-10 dataset shows that incremental network
updates with final training on the accumulated acquisition set are essential
for best performance, while limiting the amount of required human labeling
labor.
Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We address the problem of distance metric learning (DML), defined as learning
a distance consistent with a notion of semantic similarity. Traditionally, for
this problem supervision is expressed in the form of sets of points that follow
an ordinal relationship — an anchor point (x) is similar to a set of positive
points (Y), and dissimilar to a set of negative points (Z), and a loss defined
over these distances is minimized.
While the specifics of the optimization differ, in this work we collectively
call this type of supervision Triplets and all methods that follow this pattern
Triplet-Based methods. These methods are challenging to optimize. A main issue
is the need for finding informative triplets, which is usually achieved by a
variety of tricks such as increasing the batch size, hard or semi-hard triplet
mining, etc, but even with these tricks, the convergence rate of such methods
is slow. In this paper we propose to optimize the triplet loss on a different
space of triplets, consisting of an anchor data point and similar and
dissimilar proxy points. These proxies approximate the original data points, so
that a triplet loss over the proxies is a tight upper bound of the original
loss. This proxy-based loss is empirically better behaved. As a result, the
proxy-loss improves on state-of-art results for three standard zero-shot
learning datasets, by up to 15% points, while converging three times as fast as
other triplet-based losses.
Sungmin Eum, Hyungtae Lee, Heesung Kwon, David Doermann
Comments: submitted to IEEE International Conference on Image Processing 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many previous methods have showed the importance of considering semantically
relevant objects for performing event recognition, yet none of the methods have
exploited the power of deep convolutional neural networks to directly integrate
relevant object information into a unified network. We present a novel unified
deep CNN architecture which integrates architecturally different, yet
semantically-related object detection networks to enhance the performance of
the event recognition task. Our architecture allows the sharing of the
convolutional layers and a fully connected layer which effectively integrates
event recognition, rigid object detection and non-rigid object detection.
Nicolai Wojke, Alex Bewley, Dietrich Paulus
Comments: 5 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to
multiple object tracking with a focus on simple, effective algorithms. In this
paper, we integrate appearance information to improve the performance of SORT.
Due to this extension we are able to track objects through longer periods of
occlusions, effectively reducing the number of identity switches. In spirit of
the original framework we place much of the computational complexity into an
offline pre-training stage where we learn a deep association metric on a
large-scale person re-identification dataset. During online application, we
establish measurement-to-track associations using nearest neighbor queries in
visual appearance space. Experimental evaluation shows that our extensions
reduce the number of identity switches by 45%, achieving overall competitive
performance at high frame rates.
Priyadarshini Panda, Jason M. Allred, Shriram Ramanathan, Kaushik Roy
Comments: 14 pages, 14 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
A fundamental feature of learning in animals is the “ability to forget” that
allows an organism to perceive, model and make decisions from disparate streams
of information and adapt to changing environments. Against this backdrop, we
present a novel unsupervised learning mechanism ASP (Adaptive Synaptic
Plasticity) for improved recognition with Spiking Neural Networks (SNNs) for
real time on-line learning in a dynamic environment. We incorporate an adaptive
weight decay mechanism with the traditional Spike Timing Dependent Plasticity
(STDP) learning to model adaptivity in SNNs. The leak rate of the synaptic
weights is modulated based on the temporal correlation between the spiking
patterns of the pre- and post-synaptic neurons. This mechanism helps in gradual
forgetting of insignificant data while retaining significant, yet old,
information. ASP, thus, maintains a balance between forgetting and immediate
learning to construct a stable-plastic self-adaptive SNN for continuously
changing inputs. We demonstrate that the proposed learning methodology
addresses catastrophic forgetting while yielding significantly improved
accuracy over the conventional STDP learning method for digit recognition
applications. Additionally, we observe that the proposed learning model
automatically encodes selective attention towards relevant features in the
input data while eliminating the influence of background noise (or denoising)
further improving the robustness of the ASP learning.
Tao Ding, Shimei Pan, Warren K. Bickel
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
In economics and psychology, delay discounting is often used to characterize
how individuals choose between a smaller immediate reward and a larger delayed
reward. People with higher delay discounting rate (DDR) often choose smaller
but more immediate rewards (a “today person”). In contrast, people with a lower
discounting rate often choose a larger future rewards (a “tomorrow person”).
Since the ability to modulate the desire of immediate gratification for long
term rewards plays an important role in our decision-making, the lower
discounting rate often predicts better social, academic and health outcomes. In
contrast, the higher discounting rate is often associated with problematic
behaviors such as alcohol/drug abuse, pathological gambling and credit card
default. Thus, research on understanding and moderating delay discounting has
the potential to produce substantial societal benefits.
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
Comments: 8 pages + 9 pages of supplementary material
Subjects: Artificial Intelligence (cs.AI)
The problem of automatically generating a computer program from some
specification has been studied since the early days of AI. Recently, two
competing approaches for automatic program learning have received significant
attention: (1) neural program synthesis, where a neural network is conditioned
on input/output (I/O) examples and learns to generate a program, and (2) neural
program induction, where a neural network generates new outputs directly using
a latent program representation.
Here, for the first time, we directly compare both approaches on a
large-scale, real-world learning task. We additionally contrast to rule-based
program synthesis, which uses hand-crafted semantics to guide the program
generation. Our neural models use a modified attention RNN to allow encoding of
variable-sized sets of I/O pairs. Our best synthesis model achieves 92%
accuracy on a real-world test set, compared to the 34% accuracy of the previous
best neural synthesis approach. The synthesis model also outperforms a
comparable induction model on this task, but we more importantly demonstrate
that the strength of each approach is highly dependent on the evaluation metric
and end-user application. Finally, we show that we can train our neural models
to remain very robust to the type of noise expected in real-world data (e.g.,
typos), while a highly-engineered rule-based system fails entirely.
Maria-Florina Balcan, Hongyang Zhang
Comments: 39 pages
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
We provide new results concerning noise-tolerant and sample-efficient
learning algorithms under (s)-concave distributions over (mathbb{R}^n) for
(-frac{1}{2n+3}le sle 0). The new class of (s)-concave distributions is a
broad and natural generalization of log-concavity, and includes many important
additional distributions, e.g., the Pareto distribution and (t)-distribution.
This class has been studied in the context of efficient sampling, integration,
and optimization, but much remains unknown concerning the geometry of this
class of distributions and their applications in the context of learning.
The challenge is that unlike the commonly used distributions in learning
(uniform or more generally log-concave distributions), this broader class is
not closed under the marginalization operator and many such distributions are
fat-tailed. In this work, we introduce new convex geometry tools to study the
properties of s-concave distributions and use these properties to provide
bounds on quantities of interest to learning including the probability of
disagreement between two halfspaces, disagreement outside a band, and
disagreement coefficient. We use these results to significantly generalize
prior results for margin-based active learning, disagreement-based active
learning, and passively learning of intersections of halfspaces.
Our analysis of geometric properties of s-concave distributions might be of
independent interest to optimization more broadly.
Christoph Dann, Tor Lattimore, Emma Brunskill
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We present UBEV, a simple and efficient reinforcement learning algorithm for
fixed-horizon episodic Markov decision processes. The main contribution is a
proof that UBEV enjoys a sample-complexity bound that holds for all accuracy
levels simultaneously with high probability, and matches the lower bound except
for logarithmic terms and one factor of the horizon. A consequence of the fact
that our sample-complexity bound holds for all accuracy levels is that the new
algorithm achieves a sub-linear regret of O(sqrt(SAT)), which is the first time
the dependence on the size of the state space has provably appeared inside the
square root. A brief empirical evaluation shows that UBEV is practically
superior to existing algorithms with known sample-complexity guarantees.
Ian Osband, Daniel Russo, Zheng Wen, Benjamin Van Roy
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
We study the use of randomized value functions to guide deep exploration in
reinforcement learning. This offers an elegant means for synthesizing
statistically and computationally efficient exploration with common practical
approaches to value function learning. We present several reinforcement
learning algorithms that leverage randomized value functions and demonstrate
their efficacy through computational studies. We also prove a regret bound that
establishes statistical efficiency with a tabular representation.
Shumeet Baluja
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
In all but the most trivial optimization problems, the structure of the
solutions exhibit complex interdependencies between the input parameters.
Decades of research with stochastic search techniques has shown the benefit of
explicitly modeling the interactions between sets of parameters and the overall
quality of the solutions discovered. We demonstrate a novel method, based on
learning deep networks, to model the global landscapes of optimization
problems. To represent the search space concisely and accurately, the deep
networks must encode information about the underlying parameter interactions
and their contributions to the quality of the solution. Once the networks are
trained, the networks are probed to reveal parameter combinations with high
expected performance with respect to the optimization task. These estimates are
used to initialize fast, randomized, local search algorithms, which in turn
expose more information about the search space that is subsequently used to
refine the models. We demonstrate the technique on multiple optimization
problems that have arisen in a variety of real-world domains, including:
packing, graphics, job scheduling, layout and compression. The problems include
combinatoric search spaces, discontinuous and highly non-linear spaces, and
span binary, higher-cardinality discrete, as well as continuous parameters.
Strengths, limitations, and extensions of the approach are extensively
discussed and demonstrated.
Vishal Jain, Dr. Mayank Singh
Journal-ref: 7th International Conference on Advanced Computing and
Communication Technologies, 16th November, 2013
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
The proposed methodology is procedural i.e. it follows finite number of steps
that extracts relevant documents according to users query. It is based on
principles of Data Mining for analyzing web data. Data Mining first adapts
integration of data to generate warehouse. Then, it extracts useful information
with the help of algorithm. The task of representing extracted documents is
done by using Vector Based Statistical Approach that represents each document
in set of Terms.
Gagandeep Singh Narula, Vishal Jain
Journal-ref: International Journal of Computer Applications ISSN No 0975 8887
Volume 94 No 2, May 2014
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
A typical IR system that delivers and stores information is affected by
problem of matching between user query and available content on web. Use of
Ontology represents the extracted terms in form of network graph consisting of
nodes, edges, index terms etc. The above mentioned IR approaches provide
relevance thus satisfying users query. The paper also emphasis on analyzing
multimedia documents and performs calculation for extracted terms using
different statistical formulas. The proposed model developed reduces semantic
gap and satisfies user needs efficiently.
Vishal Jain, Dr. Mayank Singh
Journal-ref: 7th International Conference on Advanced Computing and
Communication Technologies, 16th November, 2013
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
The proposed methodology is procedural i.e. it follows finite number of steps
that extracts relevant documents according to users query. It is based on
principles of Data Mining for analyzing web data. Data Mining first adapts
integration of data to generate warehouse. Then, it extracts useful information
with the help of algorithm. The task of representing extracted documents is
done by using Vector Based Statistical Approach that represents each document
in set of Terms.
Gagandeep Singh Narula, Vishal Jain
Journal-ref: International Journal of Computer Applications ISSN No 0975 8887
Volume 94 No 2, May 2014
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
A typical IR system that delivers and stores information is affected by
problem of matching between user query and available content on web. Use of
Ontology represents the extracted terms in form of network graph consisting of
nodes, edges, index terms etc. The above mentioned IR approaches provide
relevance thus satisfying users query. The paper also emphasis on analyzing
multimedia documents and performs calculation for extracted terms using
different statistical formulas. The proposed model developed reduces semantic
gap and satisfies user needs efficiently.
Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
Comments: Submitted to Interspeech-2017
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent work on end-to-end automatic speech recognition (ASR) has shown that
the connectionist temporal classification (CTC) loss can be used to convert
acoustics to phone or character sequences. Such systems are used with a
dictionary and separately-trained Language Model (LM) to produce word
sequences. However, they are not truly end-to-end in the sense of mapping
acoustics directly to words without an intermediate phone representation. In
this paper, we present the first results employing direct acoustics-to-word CTC
models on two well-known public benchmark tasks: Switchboard and CallHome.
These models do not require an LM or even a decoder at run-time and hence
recognize speech with minimal complexity. However, due to the large number of
word output units, CTC word models require orders of magnitude more data to
train reliably compared to traditional systems. We present some techniques to
mitigate this issue. Our CTC word model achieves a word error rate of
13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or
decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also
present rescoring results on CTC word model lattices to quantify the
performance benefits of a LM, and contrast the performance of word and phone
CTC models.
Zhao Meng, Lili Mou, Zhi Jin
Subjects: Computation and Language (cs.CL)
Traditional speaker change detection in dialogues is typically based on audio
input. In some scenarios, however, researchers can only obtain text, and do not
have access to raw audio signals. Moreover, with the increasing need of deep
semantic processing, text-based dialogue understanding is attracting more
attention in the community. These raise the problem of text-based speaker
change detection. In this paper, we formulate the task as a matching problem of
utterances before and after a certain decision point; we propose a hierarchical
recurrent neural network (RNN) with static sentence-level attention. Our model
comprises three main components: a sentence encoder with a long short term
memory (LSTM)-based RNN, a context encoder with another LSTM-RNN, and a static
sentence-level attention mechanism, which allows rich information interaction.
Experimental results show that neural networks consistently achieve better
performance than feature-based approaches, and that our attention-based model
significantly outperforms non-attention neural networks.
Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur
Comments: 5 pages, 2 figures, submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL)
Modern topic identification (topic ID) systems for speech use automatic
speech recognition (ASR) to produce speech transcripts, and perform supervised
classification on such ASR outputs. However, under resource-limited conditions,
the manually transcribed speech required to develop standard ASR systems can be
severely limited or unavailable. In this paper, we investigate alternative
unsupervised solutions to obtaining tokenizations of speech in terms of a
vocabulary of automatically discovered word-like or phoneme-like units, without
depending on the supervised training of ASR systems. Moreover, using automatic
phoneme-like tokenizations, we demonstrate that a convolutional neural network
based framework for learning spoken document representations provides
competitive performance compared to a standard bag-of-words representation, as
evidenced by comprehensive topic ID evaluations on both single-label and
multi-label classification tasks.
Nathan Schneider, Chuck Wooters
Subjects: Computation and Language (cs.CL)
A new Python API, integrated within the NLTK suite, offers access to the
FrameNet 1.7 lexical database. The lexicon (structured in terms of frames) as
well as annotated sentences can be processed programatically, or browsed with
human-readable displays via the interactive Python prompt.
Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee
Comments: 5 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
In this paper we analyze the gate activation signals inside the gated
recurrent neural networks, and find the temporal structure of such signals is
highly correlated with the phoneme boundaries. This correlation is further
verified by a set of experiments for phoneme segmentation, in which better
results compared to standard approaches were obtained.
Josef Spillner
Comments: 15 pages, 9 figures, 4 tables, repeatable, unreviewed
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Snafu, or Snake Functions, is a modular system to host, execute and manage
language-level functions offered as stateless (micro-)services to diverse
external triggers. The system interfaces resemble those of commercial FaaS
providers but its implementation provides distinct features which make it
overall useful to research on FaaS and prototyping of FaaS-based applications.
This paper argues about the system motivation in the presence of already
existing alternatives, its design and architecture, the open source
implementation and collected metrics which characterise the system.
Michael Dinitz, Yasamin Nazari
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)
Graph spanners have been studied extensively, and have many applications in
algorithms, distributed systems, and computer networks. For many of these
application, we want distributed constructions of spanners, i.e., algorithms
which use only local information. Dinitz and Krauthgamer (PODC 2011) provided a
distributed approximation algorithm for 2-spanners in the LOCAL model with
polylogarithmic running time, but the question of whether a similar algorithm
exists for k-spanners with k > 2 remained open. In this paper, we show that a
similar algorithm also works for cases where k > 2.
Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio
Comments: RLDM submission
Subjects: Learning (cs.LG)
Finding features that disentangle the different causes of variation in real
data is a difficult task, that has nonetheless received considerable attention
in static domains like natural images. Interactive environments, in which an
agent can deliberately take actions, offer an opportunity to tackle this task
better, because the agent can experiment with different actions and observe
their effects. We introduce the idea that in interactive environments, latent
factors that control the variation in observed data can be identified by
figuring out what the agent can control. We propose a naive method to find
factors that explain or measure the effect of the actions of a learner, and
test it in illustrative experiments.
Christoph Dann, Tor Lattimore, Emma Brunskill
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We present UBEV, a simple and efficient reinforcement learning algorithm for
fixed-horizon episodic Markov decision processes. The main contribution is a
proof that UBEV enjoys a sample-complexity bound that holds for all accuracy
levels simultaneously with high probability, and matches the lower bound except
for logarithmic terms and one factor of the horizon. A consequence of the fact
that our sample-complexity bound holds for all accuracy levels is that the new
algorithm achieves a sub-linear regret of O(sqrt(SAT)), which is the first time
the dependence on the size of the state space has provably appeared inside the
square root. A brief empirical evaluation shows that UBEV is practically
superior to existing algorithms with known sample-complexity guarantees.
Morteza Ashraphijuo, Xiaodong Wang
Comments: arXiv admin note: text overlap with arXiv:1612.01597
Subjects: Learning (cs.LG); Information Theory (cs.IT); Algebraic Geometry (math.AG); Machine Learning (stat.ML)
In this paper, we analyze the fundamental conditions for low-rank tensor
completion given the separation or tensor-train (TT) rank, i.e., ranks of
unfoldings. We exploit the algebraic structure of the TT decomposition to
obtain the deterministic necessary and sufficient conditions on the locations
of the samples to ensure finite completability. Specifically, we propose an
algebraic geometric analysis on the TT manifold that can incorporate the whole
rank vector simultaneously in contrast to the existing approach based on the
Grassmannian manifold that can only incorporate one rank component. Our
proposed technique characterizes the algebraic independence of a set of
polynomials defined based on the sampling pattern and the TT decomposition,
which is instrumental to obtaining the deterministic condition on the sampling
pattern for finite completability. In addition, based on the proposed analysis,
assuming that the entries of the tensor are sampled independently with
probability (p), we derive a lower bound on the sampling probability (p), or
equivalently, the number of sampled entries that ensures finite completability
with high probability. Moreover, we also provide the deterministic and
probabilistic conditions for unique completability.
Shaul Zevin, Catherine Holzem
Comments: 13 pages, 4 tables, 4 examples
Subjects: Learning (cs.LG); Programming Languages (cs.PL)
As of today the programming language of the vast majority of the published
source code is manually specified or programmatically assigned based on the
sole file extension. In this paper we show that the source code programming
language identification task can be fully automated using machine learning
techniques. We first define the criteria that a production-level automatic
programming language identification solution should meet. Our criteria include
accuracy, programming language coverage, extensibility and performance. We then
describe our approach: How training files are preprocessed for extracting
features that mimic grammar productions, and then how these extracted grammar
productions are used for the training and testing of our classifier. We achieve
a 99 percent accuracy rate while classifying 29 of the most popular programming
languages with a Maximum Entropy classifier.
Joris Guérin, Olivier Gibaru, Stéphane Thiery, Eric Nyiri
Comments: 13 pages, 6 figures, 2 tables. This paper is under the review process for AIAP 2017
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
This paper describes a method for clustering data that are spread out over
large regions and which dimensions are on different scales of measurement. Such
an algorithm was developed to implement a robotics application consisting in
sorting and storing objects in an unsupervised way. The toy dataset used to
validate such application consists of Lego bricks of different shapes and
colors. The uncontrolled lighting conditions together with the use of RGB color
features, respectively involve data with a large spread and different levels of
measurement between data dimensions. To overcome the combination of these two
characteristics in the data, we have developed a new weighted K-means
algorithm, called gap-ratio K-means, which consists in weighting each dimension
of the feature space before running the K-means algorithm. The weight
associated with a feature is proportional to the ratio of the biggest gap
between two consecutive data points, and the average of all the other gaps.
This method is compared with two other variants of K-means on the Lego bricks
clustering problem as well as two other common classification datasets.
Pranjal Awasthi, Avrim Blum, Nika Haghtalab, Yishay Mansour
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
In recent years crowdsourcing has become the method of choice for gathering
labeled training data for learning algorithms. Standard approaches to
crowdsourcing view the process of acquiring labeled data separately from the
process of learning a classifier from the gathered data. This can give rise to
computational and statistical challenges. For example, in most cases there are
no known computationally efficient learning algorithms that are robust to the
high level of noise that exists in crowdsourced data, and efforts to eliminate
noise through voting often require a large number of queries per example.
In this paper, we show how by interleaving the process of labeling and
learning, we can attain computational efficiency with much less overhead in the
labeling cost. In particular, we consider the realizable setting where there
exists a true target function in (mathcal{F}) and consider a pool of labelers.
When a noticeable fraction of the labelers are perfect, and the rest behave
arbitrarily, we show that any (mathcal{F}) that can be efficiently learned in
the traditional realizable PAC model can be learned in a computationally
efficient manner by querying the crowd, despite high amounts of noise in the
responses. Moreover, we show that this can be done while each labeler only
labels a constant number of examples and the number of labels requested per
example, on average, is a constant. When no perfect labelers exist, a related
task is to find a set of the labelers which are good but not perfect. We show
that we can identify all good labelers, when at least the majority of labelers
are good.
George Tucker, Andriy Mnih, Chris J. Maddison, Jascha Sohl-Dickstein
Comments: Accepted as a workshop submission at ICLR 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Learning in models with discrete latent variables is challenging due to high
variance gradient estimators. Generally, approaches have relied on control
variates to reduce the variance of the REINFORCE estimator. Recent work (Jang
et al. 2016, Maddison et al. 2016) has taken a different approach, introducing
a continuous relaxation of discrete variables to produce low-variance, but
biased, gradient estimates. In this work, we combine the two approaches through
a novel control variate that produces low-variance, unbiased gradient
estimates. We present encouraging preliminary results on a toy problem and on
learning sigmoid belief networks.
Xushen Han, Dajiang Zhou, Shihao Wang, Shinji Kimura
Journal-ref: ICCD 2016
Subjects: Learning (cs.LG); Hardware Architecture (cs.AR)
Large-scale deep convolutional neural networks (CNNs) are widely used in
machine learning applications. While CNNs involve huge complexity, VLSI (ASIC
and FPGA) chips that deliver high-density integration of computational
resources are regarded as a promising platform for CNN’s implementation. At
massive parallelism of computational units, however, the external memory
bandwidth, which is constrained by the pin count of the VLSI chip, becomes the
system bottleneck. Moreover, VLSI solutions are usually regarded as a lack of
the flexibility to be reconfigured for the various parameters of CNNs. This
paper presents CNN-MERP to address these issues. CNN-MERP incorporates an
efficient memory hierarchy that significantly reduces the bandwidth
requirements from multiple optimizations including on/off-chip data allocation,
data flow optimization and data reuse. The proposed 2-level reconfigurability
is utilized to enable fast and efficient reconfiguration, which is based on the
control logic and the multiboot feature of FPGA. As a result, an external
memory bandwidth requirement of 1.94MB/GFlop is achieved, which is 55% lower
than prior arts. Under limited DRAM bandwidth, a system throughput of
1244GFlop/s is achieved at the Vertex UltraScale platform, which is 5.48 times
higher than the state-of-the-art FPGA implementations.
Hrayr Harutyunyan, Hrant Khachatrian, David C. Kale, Aram Galstyan
Comments: Proposes clinical prediction benchmark tasks; details can be found at this https URL Manuscript revisions will be uploaded regularly to reflect updates to the benchmark data set, new results, and new findings. An earlier version of this manuscript is currently under review for SIGKDD 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Health care is one of the most exciting frontiers in data mining and machine
learning. Successful adoption of electronic health records (EHRs) created an
explosion in digital clinical data available for analysis, but progress in
machine learning for healthcare research has been difficult to measure because
of the absence of publicly available benchmark data sets. To address this
problem, we propose four clinical prediction benchmarks using data derived from
the publicly available Medical Information Mart for Intensive Care (MIMIC-III)
database. These tasks cover a range of clinical problems including modeling
risk of mortality, forecasting length of stay, detecting physiologic decline,
and phenotype classification. We formulate a heterogeneous multitask problem
where the goal is to jointly learn multiple clinically relevant prediction
tasks based on the same time series data. To address this problem, we propose a
novel recurrent neural network (RNN) architecture that leverages the
correlations between the various tasks to learn a better predictive model. We
validate the proposed neural architecture on this benchmark, and demonstrate
that it outperforms strong baselines, including single task RNNs.
Maria-Florina Balcan, Hongyang Zhang
Comments: 39 pages
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
We provide new results concerning noise-tolerant and sample-efficient
learning algorithms under (s)-concave distributions over (mathbb{R}^n) for
(-frac{1}{2n+3}le sle 0). The new class of (s)-concave distributions is a
broad and natural generalization of log-concavity, and includes many important
additional distributions, e.g., the Pareto distribution and (t)-distribution.
This class has been studied in the context of efficient sampling, integration,
and optimization, but much remains unknown concerning the geometry of this
class of distributions and their applications in the context of learning.
The challenge is that unlike the commonly used distributions in learning
(uniform or more generally log-concave distributions), this broader class is
not closed under the marginalization operator and many such distributions are
fat-tailed. In this work, we introduce new convex geometry tools to study the
properties of s-concave distributions and use these properties to provide
bounds on quantities of interest to learning including the probability of
disagreement between two halfspaces, disagreement outside a band, and
disagreement coefficient. We use these results to significantly generalize
prior results for margin-based active learning, disagreement-based active
learning, and passively learning of intersections of halfspaces.
Our analysis of geometric properties of s-concave distributions might be of
independent interest to optimization more broadly.
Natalia Neverova, Pauline Luc, Camille Couprie, Jakob Verbeek, Yann LeCun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The ability to predict and therefore to anticipate the future is an important
attribute of intelligence. It is also of utmost importance in real-time
systems, e.g. in robotics or autonomous driving, which depend on visual scene
understanding for decision making. While prediction of the raw RGB pixel values
in future video frames has been studied in previous work, here we focus on
predicting semantic segmentations of future frames. More precisely, given a
sequence of semantically segmented video frames, our goal is to predict
segmentation maps of not yet observed video frames that lie up to a second or
further in the future. We develop an autoregressive convolutional neural
network that learns to iteratively generate multiple frames. Our results on the
Cityscapes dataset show that directly predicting future segmentations is
substantially better than predicting and then segmenting future RGB frames. Our
models predict trajectories of cars and pedestrians much more accurately (25%)
than baselines that copy the most recent semantic segmentation or warp it using
optical flow. Prediction results up to half a second in the future are visually
convincing, the mean IoU of predicted segmentations reaching two thirds of the
real future segmentations.
Ian Osband, Daniel Russo, Zheng Wen, Benjamin Van Roy
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
We study the use of randomized value functions to guide deep exploration in
reinforcement learning. This offers an elegant means for synthesizing
statistically and computationally efficient exploration with common practical
approaches to value function learning. We present several reinforcement
learning algorithms that leverage randomized value functions and demonstrate
their efficacy through computational studies. We also prove a regret bound that
establishes statistical efficiency with a tabular representation.
Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee
Comments: 5 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
In this paper we analyze the gate activation signals inside the gated
recurrent neural networks, and find the temporal structure of such signals is
highly correlated with the phoneme boundaries. This correlation is further
verified by a set of experiments for phoneme segmentation, in which better
results compared to standard approaches were obtained.
Marc Goessling
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Multivariate binary distributions can be decomposed into products of
univariate conditional distributions. Recently popular approaches have modeled
these conditionals through neural networks with sophisticated weight-sharing
structures. It is shown that state-of-the-art performance on several standard
benchmark datasets can actually be achieved by training separate probability
estimators for each dimension. In that case, model training can be trivially
parallelized over data dimensions. On the other hand, complexity control has to
be performed for each learned conditional distribution. Three possible methods
are considered and experimentally compared. The estimator that is employed for
each conditional is LogitBoost. Similarities and differences between the
proposed approach and autoregressive models based on neural networks are
discussed in detail.
Feras Dayoub, Niko Sünderhauf, Peter Corke
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
We investigate different strategies for active learning with Bayesian deep
neural networks. We focus our analysis on scenarios where new, unlabeled data
is obtained episodically, such as commonly encountered in mobile robotics
applications. An evaluation of different strategies for acquisition, updating,
and final training on the CIFAR-10 dataset shows that incremental network
updates with final training on the accumulated acquisition set are essential
for best performance, while limiting the amount of required human labeling
labor.
Shumeet Baluja
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
In all but the most trivial optimization problems, the structure of the
solutions exhibit complex interdependencies between the input parameters.
Decades of research with stochastic search techniques has shown the benefit of
explicitly modeling the interactions between sets of parameters and the overall
quality of the solutions discovered. We demonstrate a novel method, based on
learning deep networks, to model the global landscapes of optimization
problems. To represent the search space concisely and accurately, the deep
networks must encode information about the underlying parameter interactions
and their contributions to the quality of the solution. Once the networks are
trained, the networks are probed to reveal parameter combinations with high
expected performance with respect to the optimization task. These estimates are
used to initialize fast, randomized, local search algorithms, which in turn
expose more information about the search space that is subsequently used to
refine the models. We demonstrate the technique on multiple optimization
problems that have arisen in a variety of real-world domains, including:
packing, graphics, job scheduling, layout and compression. The problems include
combinatoric search spaces, discontinuous and highly non-linear spaces, and
span binary, higher-cardinality discrete, as well as continuous parameters.
Strengths, limitations, and extensions of the approach are extensively
discussed and demonstrated.
Gökhan Gül
Subjects: Information Theory (cs.IT)
Minimax robust decentralized detection is studied for parallel sensor
networks. Random variables corresponding to sensor observations are assumed to
follow a distribution function, which belongs to an uncertainty class. It has
been proven that, for some uncertainty classes, if all probability
distributions are absolutely continuous with respect to a common measure, the
joint stochastic boundedness property, which is the fundamental rule for the
derivations in Veerevalli’s work, does not hold. This raises a natural question
whether minimax robust decentralized detection is possible if the uncertainty
classes do not own this property. The answer to this question has been shown to
be positive, which leads to a generalization of the work of Veerevalli.
Moreover, due to a direct consequence of Tsitsiklis’s work, quantization
functions at the sensors are not required to be monotone. For the proposed
model, some specific examples have been provided and possible generalizations
have been discussed.
Salah Eddine Hajri, Mohamad Assaad
Subjects: Information Theory (cs.IT)
The exponential increase in mobile data traffic forces network operators to
deal with a capacity shortage. One of the most promising technologies for 5G
networks is proactive caching. Using a network of cache enabled small cells,
traffic during peak hours can be reduced through proactively caching the
content that is most probable to be requested. Studying the users behavior and
caching files at base stations accordingly, can offload the backhaul traffic
and improve the network throughput. We explore a new caching framework, in
which, the users are clustered according to their content popularity. The
caching is then done based on the mean content popularity in each cluster. In
order to achieve an efficient clustering of the users, we use a statistical
model selection criterion, namely the Akaike information criterion. We derive a
closed form expression of the hit probability, which will be optimized with
respect to the fractions of small base stations associated to each cluster. We
then investigate the Energy Efficiency of the proposed caching framework. We
derive a closed form expression of the energy efficiency, which will be
optimized by defining the optimal density of active small base stations. We
then provide a small base station allocation algorithm in order to associate
each individual base station with a given cluster. This algorithm aims at
caching in each small base station the files that are most likely to be
requested within their direct neighborhood. Coupled with channel inversion
power control, this optimization will improve the energy efficiency and cache
hit probability of the network. Numerical results show that the clustering
scheme enable to considerably improve the cache hit probability. We also show
that optimizing the allocation of the small base stations results in improving
of the energy efficiency and hit probability in the network.
Ali Dalir, Hassan Aghaeinia
Comments: arXiv admin note: text overlap with arXiv:1703.05944
Subjects: Information Theory (cs.IT)
This paper proposes a new algorithm to improve the throughput of the MIMO
interference channel, under imperfect channel state information (CSI). Each
transmitter and receiver has respectively M and N antennas and network operates
in a time division duplex mode. With the knowledge of channel estimation error
variance, mean of signal-to-interference-plus-noise ratio (SINR) is
approximated. Each transceiver adjusts its filter to maximize the expected
value of SINR. Formulation of maximization problem is convex. The proposed New
Approach for Throughput Enhancement under imperfect CSI utilizes the
reciprocity of wireless networks to maximize the estimated mean. The sum rate
performance of the proposed algorithm is verified using Monte Carlo
simulations.
Walid Saad, Anibal Sanjab, Yunpeng Wang, Charles Kamhoua, Kevin Kwiat
Comments: IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)
Outsourcing integrated circuit (IC) manufacturing to offshore foundries has
grown exponentially in recent years. Given the critical role of ICs in the
control and operation of vehicular systems and other modern engineering
designs, such offshore outsourcing has led to serious security threats due to
the potential of insertion of hardware trojans – malicious designs that, when
activated, can lead to highly detrimental consequences. In this paper, a novel
game-theoretic framework is proposed to analyze the interactions between a
hardware manufacturer, acting as attacker, and an IC testing facility, acting
as defender. The problem is formulated as a noncooperative game in which the
attacker must decide on the type of trojan that it inserts while taking into
account the detection penalty as well as the damage caused by the trojan.
Meanwhile, the resource-constrained defender must decide on the best testing
strategy that allows optimizing its overall utility which accounts for both
damages and the fines. The proposed game is based on the robust behavioral
framework of prospect theory (PT) which allows capturing the potential
uncertainty, risk, and irrational behavior in the decision making of both the
attacker and defender. For both, the standard rational expected utility (EUT)
case and the PT case, a novel algorithm based on fictitious play is proposed
and shown to converge to a mixed-strategy Nash equilibrium. For an illustrative
case study, thorough analytical results are derived for both EUT and PT to
study the properties of the reached equilibrium as well as the impact of key
system parameters such as the defender-set fine. Simulation results assess the
performance of the proposed framework under both EUT and PT and show that the
use of PT will provide invaluable insights on the outcomes of the proposed
hardware trojan game, in particular, and system security, in general.
Yanxiang Jiang, Hlaing Minn, Xiqi Gao, Xiaohu You
Comments: 12 pages, 5 figures, IEEE TWC, 2008
Journal-ref: IEEE TWC, Apr. 2008
Subjects: Information Theory (cs.IT)
This paper addresses carrier frequency offset (CFO) estimation and training
sequence design for multiple-input multiple-output (MIMO) orthogonal frequency
division multiplexing (OFDM) systems over frequency selective fading channels.
By exploiting the orthogonality of the training sequences in the frequency
domain, integer CFO (ICFO) is estimated. {With the uniformly spaced non-zero
pilots in the training sequences} and the corresponding geometric mapping,
fractional CFO (FCFO) is estimated through the roots of a real polynomial.
Furthermore, the condition for the training sequences to guarantee estimation
identifiability is developed. Through the analysis of the correlation property
of the training sequences, two types of sub-optimal training sequences
generated from the Chu sequence are constructed. Simulation results verify the
good performance of the CFO estimator assisted by the proposed training
sequences.
Javane Rostampoor, S. Mohammad Razavizadeh, Inkyu Lee
Comments: 16 pages, 6 figures, to appear in IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
In this paper, we study the energy efficiency (EE) maximization problem in
multiple-input multiple-output (MIMO) two-way relay networks with simultaneous
wireless information and power transfer (SWIPT). The network consists of a
multiple-antenna amplify-and-forward relay node which provides bidirectional
communications between two multiple-antenna transceiver nodes
Alex Dytso, Ronit Bustin, H. Vincent Poor, Shlomo Shamai (Shitz)
Subjects: Information Theory (cs.IT)
The paper establishes the equality condition in the I-MMSE proof of the
entropy power inequality (EPI). This is done by establishing an exact
expression for the deficit between the two sides of the EPI. Interestingly, a
necessary condition for the equality is established by making a connection to
the famous Cauchy functional equation.
Nof Abuzainab, Walid Saad, Choong-Seon Hong, H. Vincent Poor
Subjects: Information Theory (cs.IT)
In this paper, the problem of distributed resource allocation is studied for
an Internet of Things (IoT) system, composed of heterogeneous group of nodes
compromising both machine-type devices (MTDs) and human-type devices (HTDs).
The problem is formulated as a noncooperative game between the heterogeneous
IoT devices that seek to find the optimal time allocation so as to meet their
qualityof-service (QoS) requirements in terms of energy, rate and latency.
Since the strategy space of each device is dependent on the actions of the
other devices, the generalized Nash equilibrium (GNE) solution is first
characterized, and the conditions for uniqueness of the GNE are derived. Then,
to explicitly capture the heterogeneity of the devices, in terms of resource
constraints and QoS needs, a novel and more realistic game-theoretic approach,
based on the behavioral framework of cognitive hierarchy (CH) theory, is
proposed. This approach is then shown to enable the IoT devices to reach a CH
equilibrium (CHE) concept that takes into account the various levels of
rationality corresponding to the heterogeneous computational capabilities and
the information accessible for each one of the MTDs and HTDs. Simulation
results show that the CHE solution maintains a stable performance. In
particular, the proposed CHE solution keeps the percentage of devices with
satisfied QoS constraints above 96% for IoT networks containing up to 4000
devices while not considerably degrading the overall system performance in
terms of the total utility. Simulation results also show that the proposed CHE
solution brings a two fold increase in the total rate of HTDs and deceases the
total energy by MTDs by 78% compared to the equal time policy.
Morteza Ashraphijuo, Xiaodong Wang
Comments: arXiv admin note: text overlap with arXiv:1612.01597
Subjects: Learning (cs.LG); Information Theory (cs.IT); Algebraic Geometry (math.AG); Machine Learning (stat.ML)
In this paper, we analyze the fundamental conditions for low-rank tensor
completion given the separation or tensor-train (TT) rank, i.e., ranks of
unfoldings. We exploit the algebraic structure of the TT decomposition to
obtain the deterministic necessary and sufficient conditions on the locations
of the samples to ensure finite completability. Specifically, we propose an
algebraic geometric analysis on the TT manifold that can incorporate the whole
rank vector simultaneously in contrast to the existing approach based on the
Grassmannian manifold that can only incorporate one rank component. Our
proposed technique characterizes the algebraic independence of a set of
polynomials defined based on the sampling pattern and the TT decomposition,
which is instrumental to obtaining the deterministic condition on the sampling
pattern for finite completability. In addition, based on the proposed analysis,
assuming that the entries of the tensor are sampled independently with
probability (p), we derive a lower bound on the sampling probability (p), or
equivalently, the number of sampled entries that ensures finite completability
with high probability. Moreover, we also provide the deterministic and
probabilistic conditions for unique completability.
Ayan Mahalanobis, Vivek Mallick
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Algebraic Geometry (math.AG)
In this short paper, we develop a probabilistic algorithm for the elliptic
curve discrete logarithm problem. This algorithm is not generic in nature, it
uses some properties of the elliptic curve.
Ningning Lu, Yanxiang Jiang, Fuchun Zheng, Xiaohu You
Comments: 6 pages, 4 figures, IEEE Wireless Communications and Networking Conference Workshops (WCNCW’16)
Journal-ref: IEEE Wireless Communications and Networking Conference Workshops
(WCNCW’16), April 2016
Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
In this paper, energy efficient power control for the uplink two-tier
networks where a macrocell tier with a massive multiple-input multiple-output
(MIMO) base station is overlaid with a small cell tier is investigated. We
propose a distributed energy efficient power control algorithm which allows
each user in the two-tier network taking individual decisions to optimize its
own energy efficiency (EE) for the multi-user and multi-cell scenario. The
distributed power control algorithm is implemented by decoupling the EE
optimization problem into two steps. In the first step, we propose to assign
the users on the same resource into the same group and each group can optimize
its own EE, respectively. In the second step, multiple power control games
based on evolutionary game theory (EGT) are formulated for each group, which
allows each user optimizing its own EE. In the EGT-based power control games,
each player selects a strategy giving a higher payoff than the average payoff,
which can improve the fairness among the users. The proposed algorithm has a
linear complexity with respect to the number of subcarriers and the number of
cells in comparison with the brute force approach which has an exponential
complexity. Simulation results show the remarkable improvements in terms of
fairness by using the proposed algorithm.
Yanxiang Jiang, Ningning Lu, Yan Chen, Mehdi Bennis, Fuchun Zheng, Xiqi Gao, Xiaohu You
Comments: 8 pages, 10 figures. This paper has been accepted by IEEE Transactions on Vehicular Technology
Journal-ref: IEEE Transactions on Vehicular Technology, 2017
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
In this paper, energy efficient power control for small cells underlaying a
macro cellular network is investigated. We formulate the power control problem
in self-organizing small cell networks as a non-cooperative game, and propose a
distributed energy efficient power control scheme, which allows the small base
stations (SBSs) to take individual decisions for attaining the Nash equilibrium
(NE) with minimum information exchange. Specially, in the non-cooperative power
control game, a non-convex optimization problem is formulated for each SBS to
maximize their energy efficiency (EE). By exploiting the properties of
parameter-free fractional programming and the concept of perspective function,
the non-convex optimization problem for each SBS is transformed into an
equivalent constrained convex optimization problem. Then, the constrained
convex optimization problem is converted into an unconstrained convex
optimization problem by exploiting the mixed penalty function method. The
inequality constraints are eliminated by introducing the logarithmic barrier
functions and the equality constraint is eliminated by introducing the
quadratic penalty function. We also theoretically show the existence and the
uniqueness of the NE in the non-cooperative power control game. Simulation
results show remarkable improvements in terms of EE by using the proposed
scheme.
Carlos Fernandez-Granda, Gongguo Tang, Xiaodong Wang, Le Zheng
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)
We consider the problem of super-resolving the line spectrum of a
multisinusoidal signal from a finite number of samples, some of which may be
completely corrupted. Measurements of this form can be modeled as an additive
mixture of a sinusoidal and a sparse component. We propose to demix the two
components and super-resolve the spectrum of the multisinusoidal signal by
solving a convex program. Our main theoretical result is that– up to
logarithmic factors– this approach is guaranteed to be successful with high
probability for a number of spectral lines that is linear in the number of
measurements, even if a constant fraction of the data are outliers. The result
holds under the assumption that the phases of the sinusoidal and sparse
components are random and the line spectrum satisfies a minimum-separation
condition. We show that the method can be implemented via semidefinite
programming and explain how to adapt it in the presence of dense perturbations,
as well as exploring its connection to atomic-norm denoising. In addition, we
propose a fast greedy demixing method which provides good empirical results
when coupled with a local nonconvex-optimization step.