Asier Mujika, Florian Meier, Angelika Steger
Subjects: Neural and Evolutionary Computing (cs.NE)
Processing sequential data of variable length is a major challenge in a wide
range of applications, such as speech recognition, language modeling,
generative image modeling and machine translation. Here, we address this
challenge by proposing a novel recurrent neural network (RNN) architecture, the
Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both
multiscale RNNs and deep transition RNNs as it processes sequential data on
different timescales and learns complex transition functions from one time step
to the next. We evaluate the FS-RNN on two character level language modeling
data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve
state-of-the-art results to 1.19 and 1.25 bits-per-character (BPC),
respectively. In addition, an ensemble of two FS-RNNs achieves 1.20 BPC on
Hutter Prize Wikipedia, outperforming the best known compression algorithm with
respect to
the BPC measure. We also present an empirical investigation of the learning and
network dynamics of the FS-RNN, which explains the improved performance
compared to other RNN architectures. Our approach is general as any kind of RNN
cell is a possible building block for the FS-RNN architecture, and thus can be
flexibly applied to different tasks.
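For illustration, a minimal PyTorch-style sketch of one FS-RNN step, with k
fast cells forming a deep transition within each time step and one slow cell
updating once per step. The exact wiring between fast and slow cells is
simplified from the abstract, and all names are ours, not the authors' code.

    import torch
    import torch.nn as nn

    class FSRNNCell(nn.Module):
        """Sketch of a Fast-Slow step: k fast LSTM cells run sequentially
        within each time step; one slow LSTM cell ticks once per step."""
        def __init__(self, input_size, fast_size, slow_size, k):
            super().__init__()
            self.fast = nn.ModuleList(
                [nn.LSTMCell(input_size + slow_size, fast_size)] +
                [nn.LSTMCell(fast_size, fast_size) for _ in range(k - 1)])
            self.slow = nn.LSTMCell(fast_size, slow_size)

        def forward(self, x, fast_state, slow_state):
            # the first fast cell sees the input and the slow cell's output
            fast_state = self.fast[0](
                torch.cat([x, slow_state[0]], dim=1), fast_state)
            # the slow cell updates once per time step (coarser timescale)
            slow_state = self.slow(fast_state[0], slow_state)
            # remaining fast cells deepen the transition within the step
            for cell in self.fast[1:]:
                fast_state = cell(fast_state[0], fast_state)
            return fast_state, slow_state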
Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The key idea of current deep learning methods for dense prediction is to
apply a model on a regular patch centered on each pixel to make pixel-wise
predictions. These methods are limited in the sense that the patches are
determined by network architecture instead of learned from data. In this work,
we propose the dense transformer networks, which can learn the shapes and sizes
of patches from data. The dense transformer networks employ an encoder-decoder
architecture, and a pair of dense transformer modules are inserted into each of
the encoder and decoder paths. The novelty of this work is that we provide
technical solutions for learning the shapes and sizes of patches from data and
efficiently restoring the spatial correspondence required for dense prediction.
The proposed dense transformer modules are differentiable, so the entire
network can be trained end-to-end. We apply the proposed networks on natural and
biological image segmentation tasks and show superior performance is achieved
in comparison to baseline methods.
Aditya Grover, Manik Dhar, Stefano Ermon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Evaluating the performance of generative models for unsupervised learning is
inherently challenging due to the lack of well-defined and tractable
objectives. This is particularly difficult for implicit models such as
generative adversarial networks (GANs) which perform extremely well in practice
for tasks such as sample generation, but sidestep the explicit characterization
of a density.
We propose Flow-GANs, a generative adversarial network with the generator
specified as a normalizing flow model which can perform exact likelihood
evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that
integrates adversarial training with maximum likelihood estimation. We show
empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in
learning generative models that can attain low generalization error based on
the log-likelihoods and generate high quality samples. Finally, we show a
simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.
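To make the hybrid objective concrete, a hedged PyTorch sketch of the
generator loss; flow.forward/flow.inverse are assumed interfaces of an
invertible generator, and lam (the likelihood weight) is a hypothetical
hyperparameter, not from the paper.

    import torch

    def flow_gan_generator_loss(flow, disc, x_real, z, lam=1.0):
        # adversarial term: non-saturating GAN loss on flow samples
        x_fake = flow.forward(z)
        adv = -torch.log(disc(x_fake) + 1e-8).mean()
        # exact likelihood via the change-of-variables formula:
        # log p(x) = log p_z(f^{-1}(x)) + log |det J_{f^{-1}}(x)|
        z_real, logdet = flow.inverse(x_real)
        prior = torch.distributions.Normal(0.0, 1.0)
        loglik = (prior.log_prob(z_real).flatten(1).sum(-1) + logdet).mean()
        return adv - lam * loglik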
Ankit Vani, Yacine Jernite, David Sontag
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this work, we present the Grounded Recurrent Neural Network (GRNN), a
recurrent neural network architecture for multi-label prediction which
explicitly ties labels to specific dimensions of the recurrent hidden state (we
call this process “grounding”). The approach is particularly well-suited for
extracting large numbers of concepts from text. We apply the new model to
address an important problem in healthcare of understanding what medical
concepts are discussed in clinical text. Using a publicly available dataset
derived from Intensive Care Units, we learn to label a patient’s diagnoses and
procedures from their discharge summary. Our evaluation shows a clear advantage
to using our proposed architecture over a variety of strong baselines.
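A minimal sketch of the grounding idea, assuming a GRU encoder; the GRNN's
actual parameterization differs, and the interface below is illustrative.

    import torch
    import torch.nn as nn

    class GroundedRNN(nn.Module):
        def __init__(self, vocab_size, embed_dim, hidden_dim, num_labels):
            super().__init__()
            assert hidden_dim >= num_labels  # one hidden unit per label
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.num_labels = num_labels

        def forward(self, tokens):
            _, h = self.rnn(self.embed(tokens))
            # grounding: each label probability is read directly off a
            # dedicated dimension of the final hidden state
            return torch.sigmoid(h[-1][:, :self.num_labels])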
Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
Comments: MICCAI 2017 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Mammogram classification is directly related to computer-aided diagnosis of
breast cancer. Traditional methods rely on regions of interest (ROIs), which
require great effort to annotate. Inspired by the success of using deep
convolutional features for natural image analysis and multi-instance learning
(MIL) for labeling a set of instances/patches, we propose end-to-end trained
deep multi-instance networks for mass classification based on the whole
mammogram, without the aforementioned ROIs. We explore three different schemes to
construct deep multi-instance networks for whole mammogram classification.
Experimental results on the INbreast dataset demonstrate the robustness of
proposed networks compared to previous work using segmentation and detection
annotations.
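Max pooling over patch scores is the simplest aggregation consistent with this
description (the paper compares three schemes); a hedged sketch:

    import torch
    import torch.nn as nn

    class WholeMammogramMIL(nn.Module):
        """A fully convolutional backbone scores every patch (instance)
        of the whole mammogram; max pooling aggregates the instance
        scores into a single image-level malignancy probability."""
        def __init__(self, backbone):
            super().__init__()
            self.backbone = backbone  # any CNN producing a (B,1,H',W') map

        def forward(self, x):
            patch_scores = torch.sigmoid(self.backbone(x))
            # MIL max pooling: the most suspicious patch decides the label
            return patch_scores.flatten(1).max(dim=1).values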
Gonzalo Diaz, Achille Fokoue, Giacomo Nannicini, Horst Samulowitz
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A major challenge in designing neural network (NN) systems is to determine
the best structure and parameters for the network given the data for the
machine learning problem at hand. Examples of parameters are the number of
layers and nodes, the learning rates, and the dropout rates. Typically, these
parameters are chosen based on heuristic rules and manually fine-tuned, which
may be very time-consuming, because evaluating the performance of a single
parametrization of the NN may require several hours. This paper addresses the
problem of choosing appropriate parameters for the NN by formulating it as a
box-constrained mathematical optimization problem, and applying a
derivative-free optimization tool that automatically and effectively searches
the parameter space. The optimization tool employs a radial basis function
model of the objective function (the prediction accuracy of the NN) to
accelerate the discovery of configurations yielding high accuracy. Candidate
configurations explored by the algorithm are trained for a small number of
epochs, and only the most promising candidates receive full training. The
performance of the proposed methodology is assessed on benchmark sets and in
the context of predicting drug-drug interactions, showing promising results.
The optimization tool used in this paper is open-source.
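A hedged sketch of the surrogate-model loop, using a SciPy RBF interpolant;
the open-source tool referenced in the abstract is considerably more
sophisticated, e.g., in how it balances exploration and exploitation, and all
names below are ours.

    import numpy as np
    from scipy.interpolate import Rbf

    def rbf_search(evaluate, bounds, n_init=10, n_iter=30, n_cand=1000):
        """evaluate(x) -> validation accuracy of one NN parametrization;
        bounds: list of (low, high) pairs, one per hyperparameter."""
        rng = np.random.default_rng(0)
        lo, hi = np.array(bounds).T
        X = rng.uniform(lo, hi, size=(n_init, len(bounds)))
        y = np.array([evaluate(x) for x in X])      # cheap short trainings
        for _ in range(n_iter):
            surrogate = Rbf(*X.T, y)                # RBF model of accuracy
            cand = rng.uniform(lo, hi, size=(n_cand, len(bounds)))
            best = cand[np.argmax(surrogate(*cand.T))]  # most promising
            X = np.vstack([X, best])
            y = np.append(y, evaluate(best))
        return X[np.argmax(y)], y.max()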
Paolo Russo, Fabio Maria Carlucci, Tatiana Tommasi, Barbara Caputo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The effectiveness of generative adversarial approaches in producing images
according to a specific style or visual domain has recently opened new
directions to solve the unsupervised domain adaptation problem. It has been
shown that labeled source images can be modified to mimic target samples,
making it possible to directly train a classifier in the target domain, despite the
original lack of annotated data. Inverse mappings from the target to the source
domain have also been evaluated but only passing through adapted feature
spaces, thus without new image generation.
In this paper we propose to better exploit the potential of generative
adversarial networks for adaptation by introducing a novel symmetric mapping
among domains. We jointly optimize bi-directional image transformations
combining them with target self-labeling. Moreover, we define a new class
consistency loss that aligns the generators in the two directions by imposing
that the class identity of an image be preserved through both domain mappings.
A detailed qualitative and quantitative analysis of the reconstructed images
confirms the power of our approach. By integrating the two domain-specific
classifiers obtained with our bi-directional network we exceed previous
state-of-the-art unsupervised adaptation results on four different benchmark
datasets.
Maxim Berman, Matthew B. Blaschko
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Jaccard loss, also referred to as the intersection-over-union loss, is
commonly employed in the evaluation of segmentation quality due to its better
perceptual quality and scale invariance, which lends appropriate relevance to
small objects compared with per-pixel losses. We present a method
for direct optimization of the per-image intersection-over-union loss in neural
networks, in the context of semantic image segmentation, based on a convex
surrogate: the Lovász hinge. The loss is shown to perform better with respect
to the Jaccard index measure than other losses traditionally used in the
context of semantic segmentation, such as cross-entropy. We develop a
specialized optimization method, based on an efficient computation of the
proximal operator of the Lovász hinge, yielding reliably faster and more
stable optimization than alternatives. We demonstrate the effectiveness of the
method by showing substantially improved intersection-over-union segmentation
scores on the Pascal VOC dataset using a state-of-the-art deep learning
segmentation architecture.
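For reference, the quantity being optimized: for a class $c$, ground truth
$y^*$ and prediction $\tilde{y}$, the Jaccard index and the corresponding loss
are

$$J_c(y^*, \tilde{y}) = \frac{|\{y^* = c\} \cap \{\tilde{y} = c\}|}{|\{y^* = c\} \cup \{\tilde{y} = c\}|}, \qquad \Delta_{J_c}(y^*, \tilde{y}) = 1 - J_c(y^*, \tilde{y}),$$

and the Lovász hinge is the Lovász extension of this set loss evaluated on a
vector of per-pixel hinge errors, which yields a convex, piecewise-linear
surrogate amenable to gradient-based training.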
Minju Jung, Haanvid Lee, Jun Tani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Building on the progress of image recognition, video recognition has recently
been studied extensively. However, most existing methods focus on short-term
rather than long-term video recognition, which we call contextual video
recognition. To address contextual video recognition, we use convolutional
recurrent neural networks (ConvRNNs), which have a rich spatio-temporal
information processing capability but require extensive computation that slows
down training. In this paper, inspired by the normalization and detrending
methods, we propose adaptive detrending (AD) for temporal normalization in
order to accelerate the training of ConvRNNs, especially for convolutional
gated recurrent unit (ConvGRU). AD removes internal covariate shift within a
sequence of each neuron in recurrent neural networks (RNNs) by subtracting a
trend. In the experiments for contextual recognition on ConvGRU, the results
show that (1) ConvGRU clearly outperforms feed-forward neural networks, (2)
AD consistently offers a significant training acceleration and generalization
improvement, and (3) AD is further improved by collaborating with the existing
normalization methods.
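A hedged sketch of the detrending step; the exponentially smoothed trend below
is our assumption for illustration, while the paper's AD adapts its trend
estimate for the purpose of temporal normalization.

    import torch

    def adaptive_detrend(seq, alpha=0.9):
        """seq: (T, B, C) tensor of per-neuron activations over time.
        Subtract a running (exponentially smoothed) per-neuron trend,
        removing within-sequence covariate shift."""
        trend = torch.zeros_like(seq[0])
        out = []
        for x_t in seq:
            trend = alpha * trend + (1 - alpha) * x_t
            out.append(x_t - trend)   # detrended activation
        return torch.stack(out)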
Qing Sun, Stefan Lee, Dhruv Batra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We develop the first approximate inference algorithm for 1-Best (and M-Best)
decoding in bidirectional neural sequence models by extending Beam Search (BS)
to reason about both forward and backward time dependencies. Beam Search (BS)
is a widely used approximate inference algorithm for decoding sequences from
unidirectional neural sequence models. Interestingly, approximate inference in
bidirectional models remains an open problem, despite their significant
advantage in modeling information from both the past and future. To enable the
use of bidirectional models, we present Bidirectional Beam Search (BiBS), an
efficient algorithm for approximate bidirectional inference. To evaluate our
method and as an interesting problem in its own right, we introduce a novel
Fill-in-the-Blank Image Captioning task which requires reasoning about both
past and future sentence structure to reconstruct sensible image descriptions.
We use this task as well as the Visual Madlibs dataset to demonstrate the
effectiveness of our approach, consistently outperforming all baseline methods.
Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C.V. Jawahar
Comments: Accepted CVPR 2017 paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
End-to-end training from scratch of current deep architectures for new
computer vision problems would require Imagenet-scale datasets, and this is not
always possible. In this paper we present a method that is able to take
advantage of freely available multi-modal content to train computer vision
algorithms without human supervision. We put forward the idea of performing
self-supervised learning of visual features by mining a large scale corpus of
multi-modal (text and image) documents. We show that discriminative visual
features can be learnt efficiently by training a CNN to predict the semantic
context in which a particular image is more likely to appear as an
illustration. For this we leverage the hidden semantic structures discovered in
the text corpus with a well-known topic modeling technique. Our experiments
demonstrate state-of-the-art performance in image classification, object
detection, and multi-modal retrieval compared to recent self-supervised or
naturally-supervised approaches.
Yassine Maalej, Sameh Sorour, Ahmed Abdel-Rahim, Mohsen Guizani
Comments: 7 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we design a multimodal framework for object detection,
recognition and mapping based on the fusion of stereo camera frames, point
cloud Velodyne Lidar scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages
(BSMs) exchanged using Dedicated Short Range Communication (DSRC). We merge the
key features of rich texture descriptions of objects from 2D images, depth and
distance between objects provided by 3D point cloud and awareness of hidden
vehicles from BSMs’ 3D information. We present joint pixel-to-point-cloud and
pixel-to-V2V correspondences of objects in frames from the KITTI Vision
Benchmark Suite by using a semi-supervised manifold alignment approach to
achieve camera-Lidar and camera-V2V mapping of their recognized objects that
have the same underlying manifold.
Junying Li, Zichen Yang, Haifeng Liu, Deng Cai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, learning equivariant representations has attracted considerable
research attention. Dieleman et al. introduce four operations which can be
inserted into a CNN to learn deep representations equivariant to rotation.
However, in their approach the feature maps must be copied and rotated four
times in each layer, which incurs substantial running-time and memory overhead.
To address this problem, we propose the Deep Rotation Equivariant Network
(DREN), consisting of cycle layers, isotonic layers and decycle layers. Our
proposed layers apply the rotation transformation to filters rather than
feature maps, achieving a speed-up of more than 2 times with even less memory
overhead. We evaluate DRENs on the Rotated MNIST and CIFAR-10 datasets and
demonstrate that they can improve the performance of state-of-the-art
architectures. Our code is released on GitHub.
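A minimal sketch of the key trick, rotating filters rather than feature maps;
this illustrates a cycle-style layer under our own simplifications (3x3
filters, four right-angle rotations), not the authors' released code.

    import torch
    import torch.nn.functional as F

    def cycle_conv(x, weight):
        """Convolve with the same filter bank rotated by 0/90/180/270
        degrees (weight: (out, in, 3, 3)); the feature maps themselves
        are never copied or rotated."""
        outs = [F.conv2d(x, torch.rot90(weight, k, dims=(2, 3)), padding=1)
                for k in range(4)]
        return torch.cat(outs, dim=1)  # four rotation channels per filter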
Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Liming Chen
Comments: 12 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Domain adaptation (DA) is a form of transfer learning which aims to leverage labeled
data in a related source domain to achieve informed knowledge transfer and help
the classification of unlabeled data in a target domain. In this paper, we
propose a novel DA method, namely Robust Data Geometric Structure Aligned,
Close yet Discriminative Domain Adaptation (RSA-CDDA), which brings closer, in
a latent joint subspace, both source and target data distributions, and aligns
inherent hidden source and target data geometric structures while performing
discriminative DA in repulsing both interclass source and target data. The
proposed method performs domain adaptation between source and target in solving
a unified model, which incorporates data distribution constraints, in
particular via a nonparametric distance, i.e., Maximum Mean Discrepancy (MMD),
as well as constraints on inherent hidden data geometric structure segmentation
and alignment between source and target, through low rank and sparse
representation. RSA-CDDA achieves the search of a joint subspace in solving the
proposed unified model through iterative optimization, alternating Rayleigh
quotient algorithm and inexact augmented Lagrange multiplier algorithm.
Extensive experiments carried out on standard DA benchmarks, i.e., 16
cross-domain image classification tasks, verify the effectiveness of the
proposed method, which consistently outperforms the state-of-the-art methods.
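For reference, a standard (biased) estimator of the MMD with an RBF kernel,
the nonparametric distance used here to match the source and target
distributions; the bandwidth sigma is illustrative.

    import torch

    def mmd_rbf(x, y, sigma=1.0):
        """Biased MMD^2 estimate between samples x and y (rows are
        observations) under a Gaussian RBF kernel."""
        def k(a, b):
            d2 = torch.cdist(a, b).pow(2)
            return torch.exp(-d2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()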
Davit Buniatyan, Thomas Macrina, Dodam Ih, Jonathan Zung, H. Sebastian Seung
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Template matching by normalized cross correlation (NCC) is widely used for
finding image correspondences. We improve the robustness of this algorithm by
preprocessing images with “siamese” convolutional networks trained to maximize
the contrast between NCC values of true and false matches. The improvement is
quantified using patches of brain images from serial section electron
microscopy. Relative to a parameter-tuned bandpass filter, siamese
convolutional networks significantly reduce false matches. Furthermore, all
false matches can be eliminated by removing a tiny fraction of all matches
based on NCC values. The improved accuracy of our method could be essential for
connectomics, because emerging petascale datasets may require billions of
template matches to assemble 2D images of serial sections into a 3D image
stack. Our method is also expected to generalize to many other computer vision
applications that use NCC template matching to find image correspondences.
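For reference, plain NCC between two equally sized patches; in the proposed
pipeline both inputs would first be passed through the trained siamese
convolutional network.

    import numpy as np

    def ncc(template, patch, eps=1e-8):
        """Normalized cross-correlation of two same-shaped arrays."""
        t = (template - template.mean()) / (template.std() + eps)
        p = (patch - patch.mean()) / (patch.std() + eps)
        return float((t * p).mean())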
Yida Wang, Weihong Deng
Comments: 15 pages, 12 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Given a large amount of real photos for training, convolutional neural networks
show excellent performance on object recognition tasks. However, collecting
such data is tedious and the available backgrounds are limited, which makes it
hard to establish a comprehensive database. In this paper, our generative model
trained with synthetic images rendered from 3D models reduces the workload of
data collection and the limitation of capture conditions. Our structure is
composed of two sub-networks: a semantic foreground object reconstruction
network based on Bayesian inference, and a classification network based on a
multi-triplet cost function, which avoids over-fitting on monotone surfaces and
fully exploits pose information by establishing a sphere-like distribution of
descriptors in each category; this is helpful for recognition on regular photos
with respect to the poses, lighting conditions, backgrounds and category
information of the rendered images. First, our conjugate structure, a
generative model with metric learning, uses additional foreground object
channels generated from Bayesian rendering as the joint between the two
sub-networks. The multi-triplet cost function based on poses is used for metric
learning, which makes it possible to train a category classifier purely on
synthetic data. Second, we design a coordinated training strategy in which
adaptive noise acts as corruption on the input images, helping the two
sub-networks benefit from each other and avoiding inharmonious parameter tuning
due to their different convergence speeds. Our structure achieves
state-of-the-art accuracy of over 50% on the ShapeNet database despite the
data-migration obstacle from synthetic images to real photos. This pipeline
makes it feasible to perform recognition on real images based only on 3D
models.
Anoop Cherian, Suvrit Sra, Richard Hartley
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Representations that can compactly and effectively capture temporal evolution
of semantic content are important to machine learning algorithms that operate
on multi-variate time-series data. We investigate such representations
motivated by the task of human action recognition. Here each data instance is
encoded by a multivariate feature (such as via a deep CNN) where action
dynamics are characterized by their variations in time. As these features are
often non-linear, we propose a novel pooling method, kernelized rank pooling,
that represents a given sequence compactly as the pre-image of the parameters
of a hyperplane in an RKHS, projections of the data onto which capture their
temporal order. We develop this idea further and show that such a pooling
scheme can be cast as an order-constrained kernelized PCA objective; we then
propose to use the parameters of a kernelized low-rank feature subspace as the
representation of the sequences. We cast our formulation as an optimization
problem on generalized Grassmann manifolds and then solve it efficiently using
Riemannian optimization techniques. We present experiments on several action
recognition datasets using diverse feature modalities and demonstrate
state-of-the-art results.
Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
Comments: Accepted in the 14th International Conference on Image Analysis and Recognition (ICIAR) 2017, Montreal, Canada
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a new architectural framework, known as input
fast-forwarding, that can enhance the performance of deep networks. The main
idea is to incorporate a parallel path that sends representations of input
values forward to deeper network layers. This scheme is substantially different
from “deep supervision” in which the loss layer is re-introduced to earlier
layers. The parallel path provided by fast-forwarding enhances the training
process in two ways. First, it enables the individual layers to combine
higher-level information (from the standard processing path) with lower-level
information (from the fast-forward path). Second, this new architecture reduces
the problem of vanishing gradients substantially because the fast-forwarding
path provides a shorter route for gradient backpropagation. In order to
evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet),
with 20 convolutional layers along with parallel fast-forward paths, has been
created and tested. The paper presents empirical results that demonstrate
improved learning capacity of FFNet due to fast-forwarding, as compared to
GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in
size, respectively. All of the source code and deep learning models described
in this paper will be made available to the entire research community.
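A hedged sketch of the fast-forwarding idea: a deep layer fuses its
standard-path features with a cheap representation of the raw input carried
along a parallel path. The fusion details below are our assumptions, not
FFNet's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FastForwardBlock(nn.Module):
        def __init__(self, in_ch, ff_ch, out_ch):
            super().__init__()
            self.main = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.ff = nn.Conv2d(3, ff_ch, 1)   # shallow path from RGB input
            self.fuse = nn.Conv2d(out_ch + ff_ch, out_ch, 1)

        def forward(self, feats, image):
            # resize the raw input to the feature resolution and fuse;
            # the fast-forward path also shortens gradient routes
            ff = self.ff(F.interpolate(image, size=feats.shape[2:]))
            return F.relu(self.fuse(torch.cat([self.main(feats), ff], dim=1)))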
Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
Comments: Accepted in IJCAI-17
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
The knowledge representation community has built general-purpose ontologies
which contain large amounts of commonsense knowledge over relevant aspects of
the world, including useful visual information, e.g.: “a ball is used by a
football player”, “a tennis player is located at a tennis court”. Current
state-of-the-art approaches for visual recognition do not exploit these
rule-based knowledge sources. Instead, they learn recognition models directly
from training examples. In this paper, we study how general-purpose
ontologies—specifically, MIT’s ConceptNet ontology—can improve the
performance of state-of-the-art vision systems. As a testbed, we tackle the
problem of sentence-based image retrieval. Our retrieval approach incorporates
knowledge from ConceptNet on top of a large pool of object detectors derived
from a deep learning technique. In our experiments, we show that ConceptNet can
improve performance on a common benchmark dataset. Key to our performance is
the use of the ESPGAME dataset to select visually relevant relations from
ConceptNet. Consequently, a main conclusion of this work is that
general-purpose commonsense ontologies improve performance on visual reasoning
tasks when properly filtered to select meaningful visual relations.
Hao Liu, Haoli Bai, Lirong He, Zenglin Xu
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Unsupervised structure learning in high-dimensional time series data has
attracted a lot of research interest. For example, segmenting and labelling
high dimensional time series can be helpful in behavior understanding and
medical diagnosis. Recent advances in generative sequential modeling have
suggested combining recurrent neural networks with state space models (e.g.,
Hidden Markov Models). This combination can model not only the long term
dependency in sequential data, but also the uncertainty included in the hidden
states. Inheriting these advantages of stochastic neural sequential models, we
propose a structured and stochastic sequential neural network, which models
both the long-term dependencies via recurrent neural networks and the
uncertainty in the segmentation and labels via discrete random variables. For
accurate and efficient inference, we present a bi-directional inference network
obtained by reparameterizing the categorical segmentation and labels with the
recently proposed Gumbel-Softmax approximation, and resort to Stochastic
Gradient Variational Bayes. We evaluate the proposed model in a number of tasks,
including speech modeling, automatic segmentation and labeling in behavior
understanding, and sequential multi-objects recognition. Experimental results
have demonstrated that our proposed model can achieve significant improvement
over the state-of-the-art methods.
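For reference, the Gumbel-Softmax reparameterization used to backpropagate
through the discrete segmentation and label variables; the temperature tau is
a tunable hyperparameter.

    import torch
    import torch.nn.functional as F

    def gumbel_softmax_sample(logits, tau=1.0):
        """Perturb logits with Gumbel noise, then take a
        temperature-controlled softmax (differentiable relaxation of
        sampling from a categorical distribution)."""
        g = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
        return F.softmax((logits + g) / tau, dim=-1)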
Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim
Comments: Submitted to NIPS 2017
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Attempts to train a comprehensive artificial intelligence capable of solving
multiple tasks have been impeded by a chronic problem called catastrophic
forgetting. Although simply replaying all previous data alleviates the problem,
it requires large memory and, even worse, is often infeasible in real-world
applications where access to past data is limited. Inspired by the
generative nature of the hippocampus as a short-term memory system in the
primate brain, we propose Deep Generative Replay, a novel framework with a
cooperative dual model architecture consisting of a deep generative model
(“generator”) and a task solving model (“solver”). With only these two models,
training data for previous tasks can easily be sampled and interleaved with
those for a new task. We test our methods in several sequential learning
settings involving image classification tasks.
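A hedged sketch of one replay step; the .sample/.predict/.train_step
interfaces are illustrative stand-ins, not the paper's code.

    import torch

    def replay_step(solver, generator, old_generator, old_solver,
                    x_new, y_new, replay_ratio=0.5):
        """Interleave real data for the current task with samples from
        the previous generator, labeled by the previous solver."""
        n = int(replay_ratio * x_new.size(0))
        x_old = old_generator.sample(n)        # replayed pseudo-data
        y_old = old_solver.predict(x_old)      # pseudo-labels
        x = torch.cat([x_new, x_old])
        y = torch.cat([y_new, y_old])
        solver.train_step(x, y)                # solver sees old and new
        generator.train_step(x)                # generator also rehearses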
Kun He, Fatih Cakir, Sarah A. Bargal, Stan Sclaroff
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We formulate the problem of supervised hashing, or learning binary embeddings
of data, as a learning to rank problem. Specifically, we optimize two common
ranking-based evaluation metrics, Average Precision (AP) and Normalized
Discounted Cumulative Gain (NDCG). Observing that ranking with the discrete
Hamming distance naturally results in ties, we propose to use tie-aware
versions of ranking metrics in both the evaluation and the learning of
supervised hashing. For AP and NDCG, we derive continuous relaxations of their
tie-aware versions, and optimize them using stochastic gradient ascent with
deep neural networks. Our results establish the new state-of-the-art for
tie-aware AP and NDCG on common hashing benchmarks.
Matthias Hein, Maksym Andriushchenko
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recent work has shown that state-of-the-art classifiers are quite brittle, in
the sense that a small adversarial change to an input that was originally
classified correctly with high confidence leads to a wrong classification,
again with high confidence. This raises concerns that such classifiers are
vulnerable to attacks and calls into question their usage in safety-critical
systems. We show
in this paper for the first time formal guarantees on the robustness of a
classifier by giving instance-specific lower bounds on the norm of the input
manipulation required to change the classifier decision. Based on this analysis
we propose the Cross-Lipschitz regularization functional. We show that using
this form of regularization in kernel methods and neural networks improves
the robustness of the classifier without any loss in prediction performance.
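The guarantee has roughly the following flavor (a simplified statement; see
the paper for the precise result). If $c$ is the class predicted at $x$, the
decision provably cannot change under any perturbation $\delta$ with

$$\|\delta\|_p \le \min_{j \neq c} \frac{f_c(x) - f_j(x)}{\max_{y \in B(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q}, \qquad \frac{1}{p} + \frac{1}{q} = 1,$$

and the Cross-Lipschitz regularizer penalizes the cross-gradient term in the
denominator, enlarging the certified radius.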
Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Advances in artificial intelligence (AI) will transform modern life by
reshaping transportation, health, science, finance, and the military. To adapt
public policy, we need to better anticipate these advances. Here we report the
results from a large survey of machine learning researchers on their beliefs
about progress in AI. Researchers predict AI will outperform humans in many
activities in the next ten years, such as translating languages (by 2024),
writing high-school essays (by 2026), driving a truck (by 2027), working in
retail (by 2031), writing a bestselling book (by 2049), and working as a
surgeon (by 2053). Researchers believe there is a 50% chance of AI
outperforming humans in all tasks in 45 years and of automating all human jobs
in 120 years, with Asian respondents expecting these dates much sooner than
North Americans. These results will inform discussion amongst researchers and
policymakers about anticipating and managing trends in AI.
Pouria Amirian, Anahid Basiri, Jeremy Morley
Subjects: Artificial Intelligence (cs.AI)
The explosive growth of location-enabled devices coupled with the
increasing use of Internet services has led to an increasing awareness of the
importance and usage of geospatial information in many applications.
Navigation apps (often called Maps) use a variety of available data sources to
calculate and predict the travel time as well as several options for routing in
public transportation, car or pedestrian modes. This paper evaluates the
pedestrian mode of Maps apps on three major smartphone operating systems
(Android, iOS and Windows Phone). In the paper, we will show that the Maps apps
on iOS, Android and Windows Phone in pedestrian mode predict travel time
without learning from the individual’s movement profile. In addition, we will
demonstrate that those apps suffer from a specific data quality issue which
relates to the absence of information about location and type of pedestrian
crossings. Finally, we will illustrate learning from movement profile of
individuals using various predictive analytics models to improve the accuracy
of travel time estimation.
Yan Zhao, Xiao Fang, David Simchi-Levi
Subjects: Artificial Intelligence (cs.AI)
Randomized experiments have been used to assist decision-making in many
areas. They help people select the optimal treatment for the test population
with certain statistical guarantee. However, subjects can show significant
heterogeneity in response to treatments. The problem of customizing treatment
assignment based on subject characteristics is known as uplift modeling,
differential response analysis, or personalized treatment learning in the
literature. A key feature of uplift modeling is that the data is unlabeled. It
is impossible to know whether the chosen treatment is optimal for an individual
subject because response under alternative treatments is unobserved. This
presents a challenge to both the training and the evaluation of uplift models.
In this paper we describe how to obtain an unbiased estimate of the key
performance metric of an uplift model, the expected response. We present a new
uplift algorithm which creates a forest of randomized trees. The trees are
built with a splitting criterion designed to directly optimize their uplift
performance based on the proposed evaluation method. Both the evaluation method
and the algorithm apply to an arbitrary number of treatments and general response
types. Experimental results on synthetic data and industry-provided data show
that our algorithm leads to significant performance improvement over other
applicable methods.
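One standard construction for such an unbiased estimate on randomized data is
inverse-propensity weighting; a hedged NumPy sketch in that spirit (not
necessarily the paper's exact estimator):

    import numpy as np

    def expected_response(policy, x, t, y, propensity):
        """Estimate the expected response of an uplift policy: keep only
        subjects whose randomized treatment t matches the policy's
        choice and reweight by the known assignment probability
        propensity[i] = P(t[i] | x[i])."""
        chosen = policy(x)                       # treatment the model assigns
        match = (t == chosen).astype(float)
        return np.mean(match * y / propensity)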
Abhishek Kumar, Prasanna Sattigeri, P. Thomas Fletcher
Comments: 16 pages, 7 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Semi-supervised learning methods using Generative Adversarial Networks (GANs)
have shown promising empirical success recently. Most of these methods use a
shared discriminator/classifier which discriminates real examples from fake
while also predicting the class label. Motivated by the ability of the GANs
generator to capture the data manifold well, we propose to estimate the tangent
space to the data manifold using GANs and employ it to inject invariances into
the classifier. In the process, we propose enhancements over existing methods
for learning the inverse mapping (i.e., the encoder) which greatly improve the
semantic similarity of the reconstructed sample to the input sample.
We observe considerable empirical gains in semi-supervised learning over
baselines, particularly in the cases when the number of labeled examples is
low. We also provide insights into how fake examples influence the
semi-supervised learning procedure.
Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
Reinforcement learning is a powerful paradigm for learning optimal policies
from experimental data. However, to find optimal policies, most reinforcement
learning algorithms explore all possible actions, which may be harmful for
real-world systems. As a consequence, learning algorithms are rarely applied on
safety-critical systems in the real world. In this paper, we present a learning
algorithm that explicitly considers safety in terms of stability guarantees.
Specifically, we extend control theoretic results on Lyapunov stability
verification and show how to use statistical models of the dynamics to obtain
high-performance control policies with provable stability certificates.
Moreover, under additional regularity assumptions in terms of a Gaussian
process prior, we prove that one can effectively and safely collect data in
order to learn about the dynamics and thus both improve control performance and
expand the safe region of the state space. In our experiments, we show how the
resulting algorithm can safely optimize a neural network policy on a simulated
inverted pendulum, without the pendulum ever falling down.
Xiaobo Ma, Yihui He, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, Xiaohong Guan
Comments: 10 pages, 2 figures, under review for IEEE Intelligent Systems
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Security surveillance is one of the most important issues in smart cities,
especially in an era of terrorism. Deploying a number of (video) cameras is a
common surveillance approach. Given the never-ending power offered by vehicles
to metropolises, exploiting vehicle traffic to design camera placement
strategies could potentially facilitate security surveillance. This article
constitutes the first effort toward building the linkage between vehicle
traffic and security surveillance, which is a critical problem for smart
cities. We expect our study to influence the decision making on surveillance
camera placement, and to foster more research on principled ways of security
surveillance beneficial to our physical-world life.
Yonatan Geifman, Ran El-Yaniv
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Selective classification techniques (also known as the reject option) have not
yet been considered in the context of deep neural networks (DNNs). These
techniques can potentially improve DNN prediction performance significantly by
trading off coverage. In this paper we propose a method to construct a
selective classifier given a trained neural network. Our method allows a user
to set a desired risk level. At test time, the classifier rejects instances as
needed, to grant the desired risk (with high probability). Empirical results
over CIFAR and ImageNet convincingly demonstrate the viability of our method,
which opens up possibilities to operate DNNs in mission-critical applications.
For example, using our method, an unprecedented 2% error in top-5 ImageNet
classification can be guaranteed with probability 99.9% and almost 60% test
coverage.
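A minimal NumPy sketch of the thresholding idea on a validation set; the
actual method additionally certifies the chosen risk with a high-probability
bound, which this sketch omits, and the names are ours.

    import numpy as np

    def select_threshold(confidences, correct, target_risk):
        """Find the confidence threshold whose accepted validation
        subset has empirical risk <= target_risk while maximizing
        coverage (accept the most confident predictions first)."""
        order = np.argsort(-confidences)           # most confident first
        errors = 1 - correct[order].astype(float)
        risk = np.cumsum(errors) / np.arange(1, len(errors) + 1)
        ok = np.where(risk <= target_risk)[0]
        if len(ok) == 0:
            return np.inf                          # reject everything
        return confidences[order][ok.max()]        # lowest accepted confidence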
Denis Newman-Griffis, Eric Fosler-Lussier
Comments: Submitted to NIPS 2017. (8 pages + 4 references)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We introduce second-order vector representations of words, induced from
nearest neighborhood topological features in pre-trained contextual word
embeddings. We then analyze the effects of using second-order embeddings as
input features in two deep natural language processing models, for named entity
recognition and recognizing textual entailment, as well as a linear model for
paraphrase recognition. Surprisingly, we find that nearest neighbor information
alone is sufficient to capture most of the performance benefits derived from
using pre-trained word embeddings. Furthermore, second-order embeddings are
able to handle highly heterogeneous data better than first-order
representations, though at the cost of some specificity. Additionally,
augmenting contextual embeddings with second-order information further improves
model performance in some cases. Due to variance in the random initializations
of word embeddings, utilizing nearest neighbor features from multiple
first-order embedding samples can also contribute to downstream performance
gains. Finally, we identify intriguing characteristics of second-order
embedding spaces for further research, including much higher density and
different semantic interpretations of cosine similarity.
Jakub M. Tomczak, Max Welling
Comments: 15 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many different methods to train deep generative models have been proposed in
the past. In this paper, we propose to extend the variational auto-encoder
(VAE) framework with a new type of prior which we call “Variational Mixture of
Posteriors” prior, or VampPrior for short. The VampPrior consists of a mixture
distribution (e.g., a mixture of Gaussians) with components given by
variational posteriors conditioned on learnable pseudo-inputs. We further
extend this prior to a two-layer hierarchical model and show that this
architecture, in which the prior and posterior are coupled, learns
significantly better models. The model also avoids the usual local optima
issues that plague VAEs
related to useless latent dimensions. We provide empirical studies on three
benchmark datasets, namely, MNIST, OMNIGLOT and Caltech 101 Silhouettes, and
show that applying the hierarchical VampPrior delivers state-of-the-art results
on all three datasets in the unsupervised permutation invariant setting.
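Concretely, the prior described above takes the form

$$p_{\lambda}(\mathbf{z}) = \frac{1}{K} \sum_{k=1}^{K} q_{\phi}(\mathbf{z} \mid \mathbf{u}_k),$$

a mixture of variational posteriors evaluated at $K$ learnable pseudo-inputs
$\mathbf{u}_k$, so the prior is trained jointly with the encoder $q_{\phi}$.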
Sirui Yao, Bert Huang
Subjects: Information Retrieval (cs.IR)
We study fairness in collaborative-filtering recommender systems, which are
sensitive to discrimination that exists in historical data. Biased data can
lead collaborative-filtering methods to make unfair predictions for users from
minority groups. We identify the insufficiency of existing fairness metrics and
propose four new metrics that address different forms of unfairness. These
fairness metrics can be optimized by adding fairness terms to the learning
objective. Experiments on synthetic and real data show that our new metrics can
better measure fairness than the baseline, and that the fairness objectives
effectively help reduce unfairness.
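As a flavor of such a metric, a hedged NumPy sketch in the spirit of the paper
(not one of its four metrics verbatim): compare the two groups' average
per-item signed prediction errors.

    import numpy as np

    def value_unfairness(pred, true, group):
        """pred, true: (n_users, n_items) rating matrices;
        group: boolean mask over users. Mean absolute difference,
        across items, of the two groups' signed errors."""
        err_a = (pred[group] - true[group]).mean(axis=0)
        err_b = (pred[~group] - true[~group]).mean(axis=0)
        return np.abs(err_a - err_b).mean()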
Omid Aghili, Mark Sanderson
Subjects: Information Retrieval (cs.IR)
We describe the results of a qualitative study on journalists’ information
seeking behavior on social media. Based on interviews with eleven journalists
along with a study of a set of university level journalism modules, we
determined the categories of information need types that lead journalists to
social media. We also determined the ways that social media is exploited as a
tool to satisfy information needs, and identified the influential factors that
impacted journalists’ information seeking behavior. We find that not only is
social media used as an information source, but it can also be a supplier of
stories found serendipitously. We find seven information need types that expand
the types found in previous work. We also find five categories of influential
factors that affect the way journalists seek information.
Fabio Massimo Zanzotto, Giordano Cristini
Subjects: Computation and Language (cs.CL)
Syntactic parsing is a key task in natural language processing which has been
dominated by symbolic, grammar-based syntactic parsers. Neural networks, with
their distributed representations, are challenging these methods.
In this paper, we want to show that existing parsing algorithms can cross the
border and be defined over distributed representations. We then define D-CYK: a
version of the traditional CYK algorithm defined over distributed
representations. Our D-CYK operates as the original CYK but uses matrix
multiplications. These operations are compatible with traditional neural
networks. Experiments show that D-CYK approximates the original CYK. By showing
that CYK can be performed on distributed representations, our D-CYK opens the
possibility of defining recurrent layers of CYK-informed neural networks.
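For reference, the discrete algorithm being re-expressed: a standard CYK
recognizer over a grammar in Chomsky normal form (our rule encoding is
illustrative). D-CYK's contribution is to replace the chart's boolean
bookkeeping with matrix multiplications over distributed representations, so
the whole recognizer becomes compatible with neural layers.

    import numpy as np

    def cyk(tokens, term_rules, binary_rules, n_nt, start=0):
        """term_rules: iterable of (A, word); binary_rules: iterable of
        (A, B, C) meaning A -> B C; nonterminals are ints 0..n_nt-1.
        Returns True iff the start symbol derives the token sequence."""
        n = len(tokens)
        chart = np.zeros((n, n + 1, n_nt), dtype=bool)  # chart[i][j]: span i..j
        for i, w in enumerate(tokens):
            for A, word in term_rules:
                if word == w:
                    chart[i][i + 1][A] = True
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                for k in range(i + 1, i + span):        # split point
                    for A, B, C in binary_rules:
                        if chart[i][k][B] and chart[k][i + span][C]:
                            chart[i][i + span][A] = True
        return bool(chart[0][n][start])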
Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes
Comments: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) colocated with ACL 2017
Subjects: Computation and Language (cs.CL)
This paper is a deep investigation of cross-language plagiarism detection
methods on a new recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs on 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.
Asa Dan, Rajit Manohar, Yoram Moses
Comments: This is an extended version of a paper to appear in PODC 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Even in the absence of clocks, time bounds on the duration of actions enable
the use of time for distributed coordination. This paper initiates an
investigation of coordination in such a setting. A new communication structure
called a zigzag pattern is introduced, and shown to guarantee bounds on the
relative timing of events in this clockless model. Indeed, zigzag patterns are
shown to be necessary and sufficient for establishing that events occur in a
manner that satisfies prescribed bounds. We capture when a process can know
that an appropriate zigzag pattern exists, and use this to provide necessary
and sufficient conditions for timed coordination of events using a
full-information protocol in the clockless model.
Archita Agarwal, Zhiyu Liu, Eli Rosenthal, Vikram Saraph
Comments: 17 pages, 9 figures
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
In this work, we provide a general framework for adding a linearizable
iterator to data structures with set operations. We propose a condition on
these set operations, called locality, so that any data structure implemented
from local atomic operations can be augmented with a linearizable iterator as
described by our framework. We then apply the iterator framework to various
data structures, prove locality of their operations, and demonstrate that the
iterator framework does not significantly affect the performance of concurrent
operations.
Hung Cao, Monica Wachowicz, Sangwhan Cha
Comments: Edge-based analytics, real-time transit data streams, fog computing, descriptive analytics, Internet of Mobile Things, edge computing, mobile cloud computing, mobile edge computing
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
The Internet of Mobile Things encompasses stream data being generated by
sensors, network communications that pull and push these data streams, as well
as running processing and analytics that can effectively leverage actionable
information for planning, management, and business advantage. Edge computing
emerges as a new paradigm that decentralizes the communication, computation,
control and storage resources from the cloud to the edge of the Internet. This
paper proposes an edge computing platform in which descriptive analytics is
deployed on mobile fog nodes, physical devices that analyze real-time transit
data streams. An application experiment is used to evaluate the advantages and
disadvantages of our proposed platform to run descriptive analytics at the
mobile fog node and support transit managers with actionable information.
Aditya Grover, Manik Dhar, Stefano Ermon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Evaluating the performance of generative models for unsupervised learning is
inherently challenging due to the lack of well-defined and tractable
objectives. This is particularly difficult for implicit models such as
generative adversarial networks (GANs) which perform extremely well in practice
for tasks such as sample generation, but sidestep the explicit characterization
of a density.
We propose Flow-GANs, a generative adversarial network with the generator
specified as a normalizing flow model which can perform exact likelihood
evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that
integrates adversarial training with maximum likelihood estimation. We show
empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in
learning generative models that can attain low generalization error based on
the log-likelihoods and generate high quality samples. Finally, we show a
simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.
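The hybrid objective is straightforward to sketch. The minimal example below, assuming a one-dimensional affine flow and a precomputed adversarial loss value, shows how an exact flow likelihood can be combined with an adversarial term; the weighting parameter lam and all function names are illustrative, not the paper's model.

```python
import numpy as np

def flow_nll(x, mu, log_s):
    # Exact negative log-likelihood under a 1-D affine flow
    # x = mu + exp(log_s) * z with z ~ N(0, 1); log_s is the log|det| term.
    z = (x - mu) * np.exp(-log_s)
    return 0.5 * (z ** 2 + np.log(2.0 * np.pi)) + log_s

def hybrid_objective(adv_loss, x, mu, log_s, lam=1.0):
    # Adversarial term plus a lambda-weighted maximum-likelihood term.
    return adv_loss + lam * flow_nll(x, mu, log_s).mean()

x = np.random.default_rng(0).normal(loc=2.0, size=256)
print(hybrid_objective(adv_loss=0.7, x=x, mu=2.0, log_s=0.0, lam=0.1))
```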
Abhishek Kumar, Prasanna Sattigeri, P. Thomas Fletcher
Comments: 16 pages, 7 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Semi-supervised learning methods using Generative Adversarial Networks (GANs)
have shown promising empirical success recently. Most of these methods use a
shared discriminator/classifier which discriminates real examples from fake
while also predicting the class label. Motivated by the ability of the GAN
generator to capture the data manifold well, we propose to estimate the tangent
space to the data manifold using GANs and employ it to inject invariances into
the classifier. In the process, we propose enhancements over existing methods
for learning the inverse mapping (i.e., the encoder) that greatly improve the
semantic similarity of the reconstructed sample to the input sample.
We observe considerable empirical gains in semi-supervised learning over
baselines, particularly in the cases when the number of labeled examples is
low. We also provide insights into how fake examples influence the
semi-supervised learning procedure.
Diane Bouchacourt, Ryota Tomioka, Sebastian Nowozin
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We would like to learn a representation of the data which decomposes an
observation into factors of variation which we can independently control.
Specifically, we want to use minimal supervision to learn a latent
representation that reflects the semantics behind a specific grouping of the
data, where within a group the samples share a common factor of variation. For
example, consider a collection of face images grouped by identity. We wish to
anchor the semantics of the grouping into a relevant and disentangled
representation that we can easily exploit. However, existing deep probabilistic
models often assume that the observations are independent and identically
distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new
deep probabilistic model for learning a disentangled representation of a set of
grouped observations. The ML-VAE separates the latent representation into
semantically meaningful parts by working both at the group level and the
observation level, while retaining efficient test-time inference. Quantitative
and qualitative evaluations show that the ML-VAE model (i) learns a
semantically meaningful disentanglement of grouped data, (ii) enables
manipulation of the latent representation, and (iii) generalises to unseen
groups.
Yang Yu, Wei-Yang Qu, Nan Li, Zimin Guo
Comments: Published in IJCAI 2017
Subjects: Learning (cs.LG)
In real-world classification tasks, it is difficult to collect samples of all
possible categories of the environment in the training stage. Therefore, the
classifier should be prepared for unseen classes. When an instance of an unseen
class appears in the prediction stage, a robust classifier should have the
ability to tell it is unseen, instead of classifying it to be any known
category. In this paper, adopting the idea of adversarial learning, we propose
the ASG framework for open-category classification. ASG generates positive and
negative samples of seen categories in an unsupervised manner via an
adversarial learning strategy. With the generated samples, ASG then learns to
tell seen from unseen in a supervised manner. Experiments performed on
several datasets show the effectiveness of ASG.
Hao Liu, Haoli Bai, Lirong He, Zenglin Xu
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Unsupervised structure learning in high-dimensional time series data has
attracted a lot of research interest. For example, segmenting and labelling
high-dimensional time series can be helpful in behavior understanding and
medical diagnosis. Recent advances in generative sequential modeling have
suggested combining recurrent neural networks with state space models (e.g.,
Hidden Markov Models). This combination can model not only the long-term
dependency in sequential data, but also the uncertainty included in the hidden
states. Inheriting these advantages of stochastic neural sequential models, we
propose a structured and stochastic sequential neural network, which models
both the long-term dependencies via recurrent neural networks and the
uncertainty in the segmentation and labels via discrete random variables. For
accurate and efficient inference, we present a bi-directional inference network
by reparameterizing the categorical segmentation and labels with the recently
proposed Gumbel-Softmax approximation, and resort to Stochastic Gradient
Variational Bayes. We evaluate the proposed model on a number of tasks,
including speech modeling, automatic segmentation and labeling in behavior
understanding, and sequential multi-object recognition. Experimental results
demonstrate that our proposed model achieves significant improvements over
state-of-the-art methods.
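For reference, the Gumbel-Softmax relaxation the authors rely on can be sketched in a few lines; this shows only the sampling step, not the full bi-directional inference network.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation of a categorical sample: add
    Gumbel(0, 1) noise to the logits, then apply a tempered softmax."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

# Low temperature -> nearly one-hot; high temperature -> nearly uniform.
print(gumbel_softmax(np.array([2.0, 0.5, 0.1]), tau=0.1,
                     rng=np.random.default_rng(0)))
```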
Bollepalli S. Chandra, Challa S. Sastry, Laxminarayana Anumandla, Soumya Jana
Comments: 19 pages, 9 figures and 5 tables
Subjects: Learning (cs.LG)
While cardiovascular diseases (CVDs) are prevalent across economic strata,
the economically disadvantaged population is disproportionately affected due to
the high cost of traditional CVD management. Accordingly, developing an
ultra-low-cost alternative, affordable even to groups at the bottom of the
economic pyramid, has emerged as a societal imperative. Against this backdrop,
we propose an inexpensive yet accurate home-based electrocardiogram (ECG)
monitoring service. Specifically, we seek to provide point-of-care monitoring
of premature ventricular contractions (PVCs), a high frequency of which could
indicate the onset of potentially fatal arrhythmia. Note that a traditional
telecardiology system acquires the ECG, transmits it to a professional
diagnostic centre without processing, and nearly achieves the diagnostic
accuracy of a bedside setup, albeit at high bandwidth cost. In this context, we
aim at reducing cost without significantly sacrificing reliability. To this
end, we develop a dictionary-based algorithm that detects, with high
sensitivity, only the anomalous beats, which are then transmitted. We further
compress those transmitted beats using class-specific dictionaries subject to
suitable reconstruction/diagnostic fidelity. Such a scheme would not only
reduce the overall bandwidth requirement, but also localise anomalous beats,
thereby reducing the physicians’ burden. Finally, using Monte Carlo cross validation on
MIT/BIH arrhythmia database, we evaluate the performance of the proposed
system. In particular, with a sensitivity target of at most one undetected PVC
in one hundred beats, and a percentage root mean squared difference less than
9% (a clinically acceptable level of fidelity), we achieved about 99.15%
reduction in bandwidth cost, equivalent to 118-fold savings over traditional
telecardiology.
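A rough sketch of the detection step, assuming a dictionary learned from normal beats: each beat is scored by its relative reconstruction residual, and high-residual beats would be flagged for transmission. The paper uses sparse, class-specific dictionaries; dense least squares stands in here for simplicity.

```python
import numpy as np

def anomaly_scores(beats, dictionary):
    """Score each beat by its relative residual after least-squares
    reconstruction from a dictionary of normal beats; a large residual
    flags the beat as anomalous and worth transmitting."""
    coef, *_ = np.linalg.lstsq(dictionary, beats.T, rcond=None)
    resid = beats.T - dictionary @ coef
    return np.linalg.norm(resid, axis=0) / np.linalg.norm(beats.T, axis=0)

rng = np.random.default_rng(0)
D = rng.normal(size=(128, 16))          # 16 atoms from "normal" beats
normal = (D @ rng.normal(size=(16, 5))).T  # beats well explained by D
odd = rng.normal(size=(1, 128))            # beat outside the span of D
print(anomaly_scores(np.vstack([normal, odd]), D))  # last score is large
```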
Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, Barnabás Póczos
Comments: submitted to NIPS 2017
Subjects: Learning (cs.LG)
Generative moment matching network (GMMN) is a deep generative model that
differs from Generative Adversarial Network (GAN) by replacing the
discriminator in GAN with a two-sample test based on kernel maximum mean
discrepancy (MMD). Although some theoretical guarantees of MMD have been
studied, the empirical performance of GMMN is still not as competitive as that
of GAN on challenging and large benchmark datasets. The computational
efficiency of GMMN is also less desirable in comparison with GAN, partially due
to its requirement for a rather large batch size during the training. In this
paper, we propose to improve both the model expressiveness of GMMN and its
computational efficiency by introducing adversarial kernel learning techniques,
as the replacement of a fixed Gaussian kernel in the original GMMN. The new
approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN.
The new distance measure in MMD-GAN is a meaningful loss that enjoys the
advantage of weak topology and can be optimized via gradient descent with
relatively small batch sizes. In our evaluation on multiple benchmark datasets,
including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly outperforms
GMMN and is competitive with other representative GAN works.
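For context, the fixed-kernel MMD statistic that GMMN optimizes, and that MMD-GAN replaces with an adversarially learned kernel, can be estimated as below; this is the biased estimator with a single Gaussian bandwidth, chosen here for brevity.

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD with a fixed Gaussian kernel."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 2))
fake = rng.normal(loc=1.0, size=(100, 2))
print(mmd2(real, fake))                        # large: samples differ
print(mmd2(real, rng.normal(size=(100, 2))))   # small: samples match
```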
Wenbo Guo, Kaixuan Zhang, Lin Lin, Sui Huang, Xinyu Xing
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
It is oftentimes impossible to understand how machine learning models reach a
decision. While recent research has proposed various technical approaches that
provide some clues as to how a learning model makes individual decisions, they
cannot provide users with the ability to inspect a learning model as a complete
entity. In this work, we propose a new technical approach that augments a
Bayesian regression mixture model with multiple elastic nets. Using the
enhanced mixture model, we extract explanations for a target model through
global approximation. To demonstrate the utility of our approach, we evaluate
it on different learning models covering the tasks of text mining and image
recognition. Our results indicate that the proposed approach not only
outperforms the state-of-the-art technique in explaining individual decisions
but also provides users with the ability to discover the vulnerabilities of a
learning model.
Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabas Poczos
Comments: To appear in International Joint Conference on Artificial Intelligence (IJCAI), 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Large-scale kernel approximation is an important problem in machine learning
research. Approaches using random Fourier features have become increasingly
popular [Rahimi and Recht, 2007], where kernel approximation is treated as
empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC)
integration [Yang et al., 2014]. A limitation of the current approaches is that
all the features receive an equal weight summing to 1. In this paper, we
propose a novel shrinkage estimator based on the “Stein effect”, which provides
a data-driven weighting strategy for random features and enjoys theoretical
justifications in terms of lowering the empirical risk. We further present an
efficient randomized algorithm for large-scale applications of the proposed
method. Our empirical results on six benchmark data sets demonstrate the
advantageous performance of this approach over representative baselines in both
kernel approximation and supervised learning tasks.
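As background, a standard random Fourier feature map for the Gaussian kernel looks like the sketch below; every feature carries the same weight, which is exactly the uniform weighting the proposed Stein-effect shrinkage estimator would replace with data-driven weights. The parameters D and gamma are illustrative.

```python
import numpy as np

def rff_features(X, D=256, gamma=1.0, seed=0):
    """Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
    frequencies are drawn from the kernel's spectral density, and every
    feature gets the same sqrt(2/D) weight."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
approx = Z @ Z.T   # approximates the exact Gaussian kernel matrix
print(approx[0, :3])
```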
Osbert Bastani, Carolyn Kim, Hamsa Bastani
Subjects: Learning (cs.LG)
Interpretability has become an important issue as machine learning is
increasingly used to inform consequential decisions. We propose an approach for
interpreting a blackbox model by extracting a decision tree that approximates
the model. Our model extraction algorithm avoids overfitting by leveraging
blackbox model access to actively sample new training points. We prove that as
the number of samples goes to infinity, the decision tree learned using our
algorithm converges to the exact greedy decision tree. In our evaluation, we
use our algorithm to interpret random forests and neural nets trained on
several datasets from the UCI Machine Learning Repository, as well as control
policies learned for three classical reinforcement learning problems. We show
that our algorithm improves over a baseline based on CART on every problem
instance. Furthermore, we show how an interpretation generated by our approach
can be used to understand and debug these models.
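A simplified distillation baseline in the spirit of this approach, assuming scikit-learn: label fresh samples with the blackbox and fit a shallow tree to them. The paper's algorithm additionally samples new points actively to avoid overfitting; that step is omitted here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Fit a blackbox model, then distill it into a small decision tree by
# labeling fresh samples with the blackbox's own predictions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
blackbox = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

X_new = rng.normal(size=(20000, 5))
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_new, blackbox.predict(X_new))
print(tree.score(X, blackbox.predict(X)))  # fidelity to the blackbox
```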
Yonatan Geifman, Ran El-Yaniv
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Selective classification techniques (also known as the reject option) have not
yet been considered in the context of deep neural networks (DNNs). These
techniques can potentially significantly improve the prediction performance of
DNNs by trading off coverage. In this paper we propose a method to construct a
selective classifier given a trained neural network. Our method allows a user
to set a desired risk level. At test time, the classifier rejects instances as
needed to guarantee the desired risk (with high probability). Empirical results
over CIFAR and ImageNet convincingly demonstrate the viability of our method,
which opens up possibilities to operate DNNs in mission-critical applications.
For example, using our method, an unprecedented 2% error in top-5 ImageNet
classification can be guaranteed with probability 99.9% at almost 60% test
coverage.
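The rejection mechanism itself is simple to sketch; what the paper adds is a calibration procedure that picks the threshold so a user-specified risk level holds with high probability. The threshold value below is illustrative only.

```python
import numpy as np

def selective_predict(probs, threshold):
    """Reject inputs whose top softmax probability falls below a
    threshold; raising the threshold lowers risk at the cost of
    coverage."""
    conf = probs.max(axis=1)
    return probs.argmax(axis=1), conf >= threshold

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
preds, accept = selective_predict(probs, threshold=0.7)
print(preds, accept, accept.mean())  # coverage = fraction accepted
```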
Ran El-Yaniv, Yonatan Geifman, Yair Weiner
Subjects: Learning (cs.LG)
We introduce the Prediction Advantage (PA), a novel performance measure for
prediction functions under any loss function (e.g., classification or
regression). The PA is defined as the performance advantage relative to the
Bayesian risk restricted to knowing only the distribution of the labels. We
derive the PA for well-known loss functions, including 0/1 loss, cross-entropy
loss, absolute loss, and squared loss. In the latter case, the PA is identical
to the well-known R-squared measure, widely used in statistics. The use of the
PA ensures meaningful quantification of prediction performance, which is not
guaranteed, for example, when dealing with noisy imbalanced classification
problems. We argue that among several known alternative performance measures,
PA is the best (and only) quantity ensuring meaningfulness for all noise and
imbalance levels.
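For the squared-loss case, where the PA coincides with R-squared, a direct computation looks like this; the function name is ours.

```python
import numpy as np

def prediction_advantage_sq(y_true, y_pred):
    """PA under squared loss: one minus the ratio of the model's risk to
    the risk of the best label-only predictor (the label mean), which
    coincides with the classical R-squared measure."""
    risk_model = np.mean((y_true - y_pred) ** 2)
    risk_marginal = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - risk_model / risk_marginal

y = np.array([1.0, 2.0, 3.0, 4.0])
print(prediction_advantage_sq(y, np.array([1.1, 1.9, 3.2, 3.8])))
```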
Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
Subjects: Learning (cs.LG)
Real-time prediction of clinical interventions remains a challenge within
intensive care units (ICUs). This task is complicated by noisy, sparse, and
heterogeneous data sources and by imbalanced outcomes. In this paper,
we integrate data from all available ICU sources (vitals, labs, notes,
demographics) and focus on learning rich representations of this data to
predict onset and weaning of multiple invasive interventions. In particular, we
compare both long short-term memory networks (LSTM) and convolutional neural
networks (CNN) for prediction of five intervention tasks: invasive ventilation,
non-invasive ventilation, vasopressors, colloid boluses, and crystalloid
boluses. Our predictions are done in a forward-facing manner to enable
“real-time” performance, and predictions are made with a six hour gap time to
support clinically actionable planning. We achieve state-of-the-art results on
our predictive tasks using deep architectures. We explore the use of feature
occlusion to interpret LSTM models, and compare this to the interpretability
gained from examining inputs that maximally activate CNN outputs. We show that
our models are able to significantly outperform baselines in intervention
prediction, and provide insight into model learning, which is crucial for the
adoption of such models in practice.
Brendan Maginnis, Pierre H. Richemond
Subjects: Learning (cs.LG)
Recurrent Neural Network architectures excel at processing sequences by
modelling dependencies over different timescales. The recently introduced
Recurrent Weighted Average (RWA) unit captures long term dependencies far
better than an LSTM on several challenging tasks. The RWA achieves this by
applying attention to each input and computing a weighted average over the full
history of its computations. Unfortunately, the RWA cannot change the attention
it has assigned to previous timesteps, and so struggles with carrying out
consecutive tasks or tasks with changing requirements. We present the Recurrent
Discounted Attention (RDA) unit that builds on the RWA by additionally allowing
the discounting of the past.
We empirically compare our model to RWA, LSTM and GRU units on several
challenging tasks. On tasks with a single output, the RWA, RDA and GRU units
learn much more quickly than the LSTM and achieve better performance. On the
multiple sequence copy task, our RDA unit learns the task three times as
quickly as the LSTM or GRU units, while the RWA fails to learn at all. On the
Wikipedia character prediction task the LSTM performs best, but it is followed
closely by our RDA unit. Overall our RDA unit performs well and is sample efficient on a large
variety of sequence tasks.
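A heavily simplified scalar sketch of the discounted-attention recurrence, with stand-in candidate and attention functions in place of the learned ones; only the role of the discount factor gamma is faithful to the description above.

```python
import numpy as np

def rda_forward(xs, gamma=0.9, eps=1e-8):
    """Hypothetical sketch: z is a candidate value, a a positive
    attention weight, and gamma < 1 discounts the running weighted
    average so old attention can be forgotten (gamma = 1 gives
    RWA-style undiscounted behavior)."""
    n, d, hs = 0.0, eps, []
    for x in xs:
        z = np.tanh(x)                        # stand-in candidate function
        a = np.exp(np.clip(x, -10.0, 10.0))   # stand-in attention weight
        n = gamma * n + z * a
        d = gamma * d + a
        hs.append(float(np.tanh(n / d)))
    return hs

print(rda_forward([0.5, -1.0, 2.0, 0.0]))
```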
Matthias Hein, Maksym Andriushchenko
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recent work has shown that state-of-the-art classifiers are quite brittle, in
the sense that a small adversarial change to an input that was originally
classified correctly with high confidence leads to a wrong classification,
again with high confidence. This raises concerns that such classifiers are
vulnerable to attacks and calls into question their usage in safety-critical
systems. In this paper we provide, for the first time, formal guarantees on the
robustness of a classifier by giving instance-specific lower bounds on the norm
of the input manipulation required to change the classifier's decision. Based
on this analysis we propose the Cross-Lipschitz regularization functional. We
show that using this form of regularization in kernel methods and neural
networks improves the robustness of the classifier without any loss in
prediction performance.
Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The key idea of current deep learning methods for dense prediction is to
apply a model on a regular patch centered on each pixel to make pixel-wise
predictions. These methods are limited in the sense that the patches are
determined by network architecture instead of learned from data. In this work,
we propose the dense transformer networks, which can learn the shapes and sizes
of patches from data. The dense transformer networks employ an encoder-decoder
architecture, and a pair of dense transformer modules are inserted into each of
the encoder and decoder paths. The novelty of this work is that we provide
technical solutions for learning the shapes and sizes of patches from data and
efficiently restoring the spatial correspondence required for dense prediction.
The proposed dense transformer modules are differentiable, thus the entire
network can be trained. We apply the proposed networks on natural and
biological image segmentation tasks and show superior performance is achieved
in comparison to baseline methods.
Galina Lavrentyeva, Sergey Novoselov, Konstantin Simonchik
Comments: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 661
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
Growing interest in automatic speaker verification (ASV) systems has led to
significant quality improvement of spoofing attacks on them. Many research
works confirm that despite the low equal error rate (EER), ASV systems are
still vulnerable to spoofing attacks. In this work we overview different
acoustic feature spaces and classifiers to determine reliable and robust
countermeasures against spoofing attacks. We compared several spoofing
detection systems presented so far on the development and evaluation datasets
of the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof)
Challenge 2015. Experimental results presented in this paper demonstrate that
the combined use of magnitude and phase information makes a substantial
contribution to the efficiency of spoofing detection systems. Wavelet-based
features also show impressive results in terms of equal error rate. In our
overview we compare spoofing detection performance for systems based on
different classifiers. Comparison results demonstrate that the linear SVM
classifier outperforms the conventional GMM approach. However, many
researchers, inspired by the great success of deep neural network (DNN)
approaches in automatic speech recognition, have applied DNNs to the spoofing
detection task and obtained quite low EER for known and unknown types of
spoofing attacks.
Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin
Comments: 11 pages, 3 figures, accepted for Specom 2017
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents the Speech Technology Center (STC) replay attack
detection systems proposed for Automatic Speaker Verification Spoofing and
Countermeasures Challenge 2017. In this study we focused on a comparison of
different spoofing detection approaches: GMM-based methods, high-level feature
extraction with a simple classifier, and deep learning frameworks. Experiments
performed on the development and evaluation parts of the challenge dataset
demonstrated the stable efficiency of deep learning approaches under changing
acoustic conditions. At the same time, an SVM classifier with high-level
features made a substantial contribution to the efficiency of the resulting
STC systems, according to the fusion system results.
Nicolas Courty, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper deals with the unsupervised domain adaptation problem, where one
wants to estimate a prediction function (f) in a given target domain without
any labeled sample by exploiting the knowledge available from a source domain
where labels are known. Our work makes the following assumption: there exists a
non-linear transformation between the joint feature/label space distributions
of the two domains, (mathcal{P}_s) and (mathcal{P}_t). We propose a solution to
this problem based on optimal transport, which allows us to recover an
estimated target distribution (mathcal{P}^f_t=(X,f(X))) by simultaneously
optimizing the coupling and (f). We show that our method corresponds to the
minimization of a bound on the target error, and provide an efficient
algorithmic solution for which convergence is proved. The versatility of our
approach, both in terms of hypothesis class and loss function, is demonstrated
on real-world classification and regression problems, for which we reach or
surpass state-of-the-art results.
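The coupling step can be illustrated with a plain Sinkhorn iteration for entropically regularized optimal transport; the paper alternates such a step with updates of (f), which is not shown here, and the regularization strength is illustrative.

```python
import numpy as np

def sinkhorn_plan(C, reg=0.05, iters=200):
    """Entropically regularized OT between two uniform marginals via
    Sinkhorn iterations; returns the transport plan."""
    m, n = C.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-C / reg)
    u, v = np.ones(m), np.ones(n)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
Xs, Xt = rng.normal(size=(5, 2)), rng.normal(loc=1.0, size=(6, 2))
C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # squared distances
C = C / C.max()                  # scale costs for numerical stability
P = sinkhorn_plan(C)
print(P.sum(axis=1))  # rows sum to the source marginal (1/5 each)
```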
Yanbo Fan, Siwei Lyu, Yiming Ying, Bao-Gang Hu
Comments: 18 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this work, we introduce the average top-(k) (AT(_k)) loss as a new
ensemble loss for supervised learning, which is the average over the (k)
largest individual losses over a training dataset. We show that the AT(_k) loss
is a natural generalization of the two widely used ensemble losses, namely the
average loss and the maximum loss, and that it combines their advantages and
mitigates their drawbacks to better adapt to different data distributions.
Furthermore, it remains a convex function over all individual losses, which can
lead to convex optimization problems that can be solved effectively with
conventional gradient-based methods. We provide an intuitive interpretation of
the AT(_k) loss based on its equivalent effect on the continuous individual
loss functions, suggesting that it can reduce the penalty on correctly
classified data. We further give a learning theory analysis of minimum average
top-(k) (MAT(_k)) learning
on the classification calibration of the AT(_k) loss and the error bounds of
AT(_k)-SVM. We demonstrate the applicability of minimum average top-(k)
learning for binary classification and regression using synthetic and real
datasets.
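The loss itself is a one-liner, which makes the interpolation between the maximum loss ((k=1)) and the average loss ((k=n)) concrete:

```python
import numpy as np

def average_top_k_loss(losses, k):
    """Mean of the k largest individual losses: k = 1 recovers the
    maximum loss, k = len(losses) the average loss."""
    return np.sort(losses)[-k:].mean()

losses = np.array([0.1, 0.2, 0.5, 3.0])
print(average_top_k_loss(losses, 1))   # 3.0  (maximum loss)
print(average_top_k_loss(losses, 2))   # 1.75
print(average_top_k_loss(losses, 4))   # 0.95 (average loss)
```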
Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Learning individual-level causal effects from observational data, such as
inferring the most effective medication for a specific patient, is a problem of
growing importance for policy makers. The most important aspect of inferring
causal effects from observational data is the handling of confounders, factors
that affect both an intervention and its outcome. A carefully designed
observational study attempts to measure all important confounders. However,
even if one does not have direct access to all confounders, there may exist
noisy and uncertain measurement of proxies for confounders. We build on recent
advances in latent variable modelling to simultaneously estimate the unknown
latent space summarizing the confounders and the causal effect. Our method is
based on Variational Autoencoders (VAE) which follow the causal structure of
inference with proxies. We show our method is significantly more robust than
existing methods, and matches the state-of-the-art on previous benchmarks
focused on individual treatment effects.
Elad Hoffer, Itay Hubara, Daniel Soudry
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Background: Deep learning models are typically trained using stochastic
gradient descent or one of its variants. These methods update the weights using
their gradient, estimated from a small fraction of the training data. It has
been observed that when using large batch sizes there is a persistent
degradation in generalization performance, known as the “generalization gap”
phenomenon. Identifying the origin of this gap and closing it has remained an
open problem.
Contributions: We examine the initial high-learning-rate training phase. We
find that the distance of the weights from their initialization grows
logarithmically with the number of weight updates. We therefore propose a “random walk on
random landscape” statistical model which is known to exhibit similar
“ultra-slow” diffusion behavior. Following this hypothesis we conducted
experiments to show empirically that the “generalization gap” stems from the
relatively small number of updates rather than the batch size, and can be
completely eliminated by adapting the training regime used. We further
investigate different techniques to train models in the large-batch regime and
present a novel algorithm named “Ghost Batch Normalization” which enables
significant decrease in the generalization gap without increasing the number of
updates. To validate our findings we conduct several additional experiments on
MNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we reassess common practices
and beliefs concerning training of deep models and suggest they may not be
optimal to achieve good generalization.
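A minimal sketch of the Ghost Batch Normalization idea, normalizing each virtual sub-batch of a large batch with its own statistics; it omits the learned scale and shift parameters and the running statistics of a full batch-norm layer.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size=32, eps=1e-5):
    """Normalize each "ghost" (virtual) sub-batch with its own mean and
    variance instead of the statistics of the whole large batch."""
    out = np.empty_like(x)
    for i in range(0, len(x), ghost_size):
        chunk = x[i:i + ghost_size]
        out[i:i + ghost_size] = (chunk - chunk.mean(0)) / np.sqrt(chunk.var(0) + eps)
    return out

big_batch = np.random.default_rng(0).normal(loc=5.0, size=(128, 10))
gbn = ghost_batch_norm(big_batch)
print(gbn.mean(), gbn.std())  # roughly 0 and 1 after normalization
```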
Sami Remes, Markus Heinonen, Samuel Kaski
Comments: 16 pages, 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose non-stationary spectral kernels for Gaussian process regression.
We propose to model the spectral density of a non-stationary kernel function as
a mixture of input-dependent Gaussian process frequency density surfaces. We
solve the generalised Fourier transform with such a model, and present a family
of non-stationary and non-monotonic kernels that can learn input-dependent and
potentially long-range, non-monotonic covariances between inputs. We derive
efficient inference using model whitening and marginalized posterior, and show
with case studies that these kernels are necessary when modelling even rather
simple time series, image or geospatial data with non-stationary
characteristics.
Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim
Comments: Submitted to NIPS 2017
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Attempts to train a comprehensive artificial intelligence capable of solving
multiple tasks have been impeded by a chronic problem called catastrophic
forgetting. Although simply replaying all previous data alleviates the problem,
it requires a large memory and, even worse, is often infeasible in real-world
applications where access to past data is limited. Inspired by the generative
nature of the hippocampus as a short-term memory system in the primate brain,
we propose Deep Generative Replay, a novel framework with a
cooperative dual model architecture consisting of a deep generative model
(“generator”) and a task solving model (“solver”). With only these two models,
training data for previous tasks can easily be sampled and interleaved with
those for a new task. We test our methods in several sequential learning
settings involving image classification tasks.
Christos Louizos, Karen Ullrich, Max Welling
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Compression and computational efficiency in deep learning have become a
problem of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by taking a Bayesian point of view,
where through sparsity inducing priors we prune large parts of the network. We
introduce two novelties in this paper: 1) we use hierarchical priors to prune
nodes instead of individual weights, and 2) we use the posterior uncertainties
to determine the optimal fixed point precision to encode the weights. Both
factors significantly contribute to achieving the state of the art in terms of
compression rates, while still staying competitive with methods designed to
optimize for speed or energy efficiency.
Anna C. Gilbert, Yi Zhang, Kibok Lee, Yuting Zhang, Honglak Lee
Journal-ref: IJCAI 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Several recent works have empirically observed that Convolutional Neural Nets
(CNNs) are (approximately) invertible. To understand this approximate
invertibility phenomenon and how to leverage it more effectively, we focus on a
theoretical explanation and develop a mathematical model of sparse signal
recovery that is consistent with CNNs with random weights. We give an exact
connection between a particular model of model-based compressive sensing (and
its recovery algorithms) and random-weight CNNs. We show empirically that
several learned networks are consistent with our mathematical analysis and then
demonstrate that with such a simple theoretical framework, we can obtain
reasonable reconstruction results on real images. We also discuss gaps
between our model assumptions and the CNN trained for classification in
practical scenarios.
Julian Katz-Samuels, Clayton Scott
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We consider the task of collaborative preference completion: given a pool of
items, a pool of users and a partially observed item-user rating matrix, the
goal is to recover the personalized ranking of each user over all of the items.
Our approach is nonparametric: we assume that each item (i) and each user (u)
have unobserved features (x_i) and (y_u), and that the associated rating is
given by (g_u(f(x_i,y_u))) where (f) is Lipschitz and (g_u) is a monotonic
transformation that depends on the user. We propose a (k)-nearest
neighbors-like algorithm and prove that it is consistent. To the best of our
knowledge, this is the first consistency result for the collaborative
preference completion problem in a nonparametric setting. Finally, we conduct
experiments on the Netflix and Movielens datasets that suggest that our
algorithm has some advantages over existing neighborhood-based methods and that
its performance is comparable to some state-of-the-art matrix factorization
methods.
Aniket Anand Deshmukh, Urun Dogan, Clayton Scott
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Contextual bandits are a form of multi-armed bandit in which the agent has
access to predictive side information (known as the context) for each arm at
each time step, and have been used to model personalized news recommendation,
ad placement, and other applications. In this work, we propose a multi-task
learning framework for contextual bandit problems. Like multi-task learning in
the batch setting, the goal is to leverage similarities in contexts for
different arms so as to improve the agent’s ability to predict rewards from
contexts. We propose an upper confidence bound-based multi-task learning
algorithm for contextual bandits, establish a corresponding regret bound, and
interpret this bound to quantify the advantages of learning in the presence of
high task (arm) similarity. We also describe an effective scheme for estimating
task similarity from data, and demonstrate our algorithm’s performance on
several data sets.
Kun He, Fatih Cakir, Sarah A. Bargal, Stan Sclaroff
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We formulate the problem of supervised hashing, or learning binary embeddings
of data, as a learning to rank problem. Specifically, we optimize two common
ranking-based evaluation metrics, Average Precision (AP) and Normalized
Discounted Cumulative Gain (NDCG). Observing that ranking with the discrete
Hamming distance naturally results in ties, we propose to use tie-aware
versions of ranking metrics in both the evaluation and the learning of
supervised hashing. For AP and NDCG, we derive continuous relaxations of their
tie-aware versions, and optimize them using stochastic gradient ascent with
deep neural networks. Our results establish the new state-of-the-art for
tie-aware AP and NDCG on common hashing benchmarks.
Ankit Vani, Yacine Jernite, David Sontag
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this work, we present the Grounded Recurrent Neural Network (GRNN), a
recurrent neural network architecture for multi-label prediction which
explicitly ties labels to specific dimensions of the recurrent hidden state (we
call this process “grounding”). The approach is particularly well-suited for
extracting large numbers of concepts from text. We apply the new model to
address an important problem in healthcare of understanding what medical
concepts are discussed in clinical text. Using a publicly available dataset
derived from Intensive Care Units, we learn to label a patient’s diagnoses and
procedures from their discharge summary. Our evaluation shows a clear advantage
to using our proposed architecture over a variety of strong baselines.
Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
Reinforcement learning is a powerful paradigm for learning optimal policies
from experimental data. However, to find optimal policies, most reinforcement
learning algorithms explore all possible actions, which may be harmful for
real-world systems. As a consequence, learning algorithms are rarely applied to
safety-critical systems in the real world. In this paper, we present a learning
algorithm that explicitly considers safety in terms of stability guarantees.
Specifically, we extend control theoretic results on Lyapunov stability
verification and show how to use statistical models of the dynamics to obtain
high-performance control policies with provable stability certificates.
Moreover, under additional regularity assumptions in terms of a Gaussian
process prior, we prove that one can effectively and safely collect data in
order to learn about the dynamics and thus both improve control performance and
expand the safe region of the state space. In our experiments, we show how the
resulting algorithm can safely optimize a neural network policy on a simulated
inverted pendulum, without the pendulum ever falling down.
Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
Comments: MICCAI 2017 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Mammogram classification is directly related to computer-aided diagnosis of
breast cancer. Traditional methods rely on regions of interest (ROIs) which
require great efforts to annotate. Inspired by the success of using deep
convolutional features for natural image analysis and multi-instance learning
(MIL) for labeling a set of instances/patches, we propose end-to-end trained
deep multi-instance networks for mass classification based on whole mammograms,
without the aforementioned ROIs. We explore three different schemes to
construct deep multi-instance networks for whole mammogram classification.
Experimental results on the INbreast dataset demonstrate the robustness of the
proposed networks compared to previous work using segmentation and detection
annotations.
Bowei Yan, Mingzhang Yin, Purnamrita Sarkar
Comments: 31 pages
Subjects: Statistics Theory (math.ST); Learning (cs.LG)
In this paper, we study convergence properties of the gradient
Expectation-Maximization algorithm (Lange, 1995) for Gaussian Mixture Models
with a general number of clusters and general mixing coefficients. We derive a
convergence rate that depends on the mixing coefficients, the minimum and
maximum pairwise distances between the true centers, the dimensionality, and
the number of components, and obtain a near-optimal local contraction radius. While
there have been some recent notable works that derive local convergence rates
for EM in the two equal mixture symmetric GMM, in the more general case, the
derivations need structurally different and non-trivial arguments. We use
recent tools from learning theory and empirical processes to achieve our
theoretical results.
Gonzalo Diaz, Achille Fokoue, Giacomo Nannicini, Horst Samulowitz
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A major challenge in designing neural network (NN) systems is to determine
the best structure and parameters for the network given the data for the
machine learning problem at hand. Examples of parameters are the number of
layers and nodes, the learning rates, and the dropout rates. Typically, these
parameters are chosen based on heuristic rules and manually fine-tuned, which
may be very time-consuming, because evaluating the performance of a single
parametrization of the NN may require several hours. This paper addresses the
problem of choosing appropriate parameters for the NN by formulating it as a
box-constrained mathematical optimization problem, and applying a
derivative-free optimization tool that automatically and effectively searches
the parameter space. The optimization tool employs a radial basis function
model of the objective function (the prediction accuracy of the NN) to
accelerate the discovery of configurations yielding high accuracy. Candidate
configurations explored by the algorithm are trained to a small number of
epochs, and only the most promising candidates receive full training. The
performance of the proposed methodology is assessed on benchmark sets and in
the context of predicting drug-drug interactions, showing promising results.
The optimization tool used in this paper is open-source.
Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We study pool-based active learning with abstention feedbacks, where a
labeler can abstain from labeling a queried example. We take a Bayesian
approach to the problem and propose a general framework that learns both the
target classification problem and the unknown abstention pattern at the same
time. As specific instances of the framework, we develop two useful greedy
algorithms with theoretical guarantees: they respectively achieve a
((1-frac{1}{e}))-factor approximation of the optimal expected or worst-case
value of a useful utility function. Our experiments show the algorithms perform
well in various practical scenarios.
Yuan Cao, Yonglin Cao
Subjects: Information Theory (cs.IT)
For any prime number (p), positive integers (m, k, n) satisfying
(gcd(p,n)=1), and (lambda_0 in mathbb{F}_{p^m}^{times}), we prove that any
(lambda_0^{p^k})-constacyclic code of length (p^k n) over the finite field
(mathbb{F}_{p^m}) is monomially equivalent to a matrix-product code of a
nested sequence of (p^k) (lambda_0)-constacyclic codes of length (n) over
(mathbb{F}_{p^m}).
Soo-Chang Pei, Shih-Gu Huang
Comments: Accepted by IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
An adaptive time-frequency representation (TFR) with higher energy
concentration usually requires higher complexity. Recently, a low-complexity
adaptive short-time Fourier transform (ASTFT) based on the chirp rate has been
proposed. To enhance the performance, this method is substantially modified in
this paper: i) because the wavelet transform used for instantaneous frequency
(IF) estimation is not signal-dependent, a low-complexity ASTFT based on a
novel concentration measure is addressed; ii) in order to increase robustness
to IF estimation error, the principal component analysis (PCA) replaces the
difference operator for calculating the chirp rate; and iii) a more robust
Gaussian kernel with time-frequency-varying window width is proposed.
Simulation results show that our method has higher energy concentration than
the other ASTFTs, especially for multicomponent signals and nonlinear FM
signals. Also, for IF estimation, our method is superior to many other adaptive
TFRs in low signal-to-noise ratio (SNR) environments.
Boya Di, Lingyang Song, Yonghui Li, Zhu Han
Comments: Accepted by IEEE Wireless Communications Magazine
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Benefiting from widely deployed infrastructure, the LTE network has recently
been considered a promising candidate to support vehicle-to-everything (V2X)
services. However, with a massive number of devices accessing the V2X network
in the future, the conventional OFDM-based LTE network faces congestion issues
due to the low efficiency of orthogonal access, resulting in significant
access delay and posing a great challenge
especially to safety-critical applications. The non-orthogonal multiple access
(NOMA) technique has been well recognized as an effective solution for the
future 5G cellular networks to provide broadband communications and massive
connectivity. In this article, we investigate the applicability of NOMA in
supporting cellular V2X services to achieve low latency and high reliability.
Starting with a basic V2X unicast system, a novel NOMA-based scheme is proposed
to tackle the technical hurdles in designing spectrally efficient scheduling
and resource allocation schemes in the ultra-dense topology. We then extend it
to a more general V2X broadcasting system. Other NOMA-based extended V2X
applications and some open issues are also discussed.
Sven Puchinger, Sven Müelich, Martin Bossert
Comments: 9 pages, extended version of a paper submitted to the International Workshop on Optimal Codes and Related Topics, 2017
Subjects: Information Theory (cs.IT)
In this paper, we derive analytic expressions for the success probability of
decoding (Partial) Unit Memory codes in memoryless channels. An application of
this result is to show that these codes outperform individual block codes in
certain channels.
Italo Atzeni, Marco Maso, Imène Ghamnia, Ejder Baştuğ, Mérouane Debbah
Comments: 5 pages, 5 figures, to be presented at 18th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC’2017), Sapporo, Japan, 2017
Subjects: Information Theory (cs.IT)
Caching at the edge is a promising technique to cope with the increasing data
demand in wireless networks. This paper analyzes the performance of cellular
networks consisting of a tier of macro-cell wireless backhaul nodes overlaid
with a tier of cache-aided small cells. We consider both static and dynamic
association policies for content delivery to the user terminals and analyze
their performance. In particular, we derive closed-form expressions for the
area spectral efficiency and the energy efficiency, which are used to optimize
relevant design parameters such as the density of cache-aided small cells and
the storage size. By means of this approach, we are able to draw useful design
insights for the deployment of high-performing cache-aided tiered networks.
Wei Bao, He Chen, Yonghui Li, Branka Vucetic
Comments: Accepted to appear in IEEE Journal on Selected Areas in Communications (JSAC)
Subjects: Information Theory (cs.IT)
This paper investigates the optimal resource allocation of a downlink
non-orthogonal multiple access (NOMA) system consisting of one base station and
multiple users. Unlike existing short-term NOMA designs that focused on the
resource allocation for only the current transmission timeslot, we aim to
maximize a long-term network utility by jointly optimizing the data rate
control at the network layer and the power allocation among multiple users at
the physical layer, subject to practical constraints on both the short-term and
long-term power consumptions. To solve this problem, we leverage the
recently-developed Lyapunov optimization framework to convert the original
long-term optimization problem into a series of online rate control and power
allocation problems in each timeslot. The power allocation problem, however, is
shown to be non-convex in nature and thus cannot be solved with standard
methods. Nevertheless, we explore two structures of the optimal solution and develop
a dynamic programming based power allocation algorithm, which can derive a
globally optimal solution, with a polynomial computational complexity.
Extensive simulation results are provided to evaluate the performance of the
proposed joint rate control and power allocation framework for NOMA systems,
which demonstrate that the proposed NOMA design can significantly outperform
multiple benchmark schemes, including orthogonal multiple access (OMA) schemes
with optimal power allocation and NOMA schemes with non-optimal power
allocation, in terms of average throughput and data delay.
Linqi Song, Sundara Rajan Srinivasavaradhan, Christina Fragouli
Subjects: Information Theory (cs.IT)
In wireless distributed computing, networked nodes perform intermediate
computations over data placed in their memory and exchange these intermediate
values to calculate function values. In this paper we consider an asymmetric
setting where each node has access to a random subset of the data, i.e., we
cannot control the data placement. The paper makes a simple point: we can
realize significant benefits if we are allowed to be “flexible” and decide
which node in our system computes which function. We make this argument in the
case where each function depends on only two of the data messages, as is the
case in similarity searches. We establish a percolation phenomenon in the
behavior of the system: depending on the amount of observed data, being
flexible may allow us to need no communication at all.
Dawei Ding, Dmitri S. Pavlichin, Mark M. Wilde
Comments: 29 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Communication over a noisy channel is often conducted in a setting in which
different input symbols to the channel incur a certain cost. For example, for
the additive white Gaussian noise channel, the cost associated with a real
number input symbol is the square of its magnitude. In such a setting, it is
often useful to know the maximum amount of information that can be reliably
transmitted per cost incurred. This is known as the capacity per unit cost. In
this paper, we generalize the capacity per unit cost to various communication
tasks involving a quantum channel; in particular, we consider classical
communication, entanglement-assisted classical communication, private
communication, and quantum communication. For each task, we define the
corresponding capacity per unit cost and derive a formula for it via the
expression for the capacity per channel use. Furthermore, for the special case
in which there is a zero-cost quantum state, we obtain expressions for the
various capacities per unit cost in terms of an optimized relative entropy
involving the zero-cost state. For each communication task, we construct an
explicit pulse-position-modulation coding scheme that achieves the capacity per
unit cost. Finally, we compute capacities per unit cost for various quantum
Gaussian channels.
Shuaiwen Wang, Haolei Weng, Arian Maleki
Comments: 63 pages, 10 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We study the problem of variable selection for linear models under the high
dimensional asymptotic setting, where the number of observations n grows at the
same rate as the number of predictors p. We consider two-stage variable
selection techniques (TVS) in which the first stage uses bridge estimators to
obtain an estimate of the regression coefficients, and the second stage simply
thresholds this estimate to select the “important”
predictors. The asymptotic false discovery proportion (AFDP) and true positive
proportion (ATPP) of these TVS are evaluated. We prove that, for a fixed ATPP,
in order to obtain the smallest AFDP one should pick an estimator that
minimizes the asymptotic mean square error in the first stage of TVS. This
simple observation enables us to evaluate and compare the performances of
different TVS with each other and with some standard variable selection
techniques, such as LASSO and Sure Independence Screening. For instance, we
prove that a TVS with LASSO in its first stage can outperform LASSO (only one
stage) over a large range of ATPP. Furthermore, we show that for large values
of noise, a TVS with ridge in its first stage outperforms TVS with other
bridge estimators, including the one that has LASSO in its first stage.
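A minimal TVS sketch, assuming scikit-learn and LASSO as the first-stage bridge estimator; the regularization strength and threshold are illustrative, not tuned as in the paper's analysis.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Two-stage variable selection: stage one fits a bridge-type estimator
# (LASSO here), stage two thresholds the estimated coefficients.
rng = np.random.default_rng(0)
n, p = 200, 400
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 1.0                      # ten truly important predictors
y = X @ beta + 0.5 * rng.normal(size=n)

coef = Lasso(alpha=0.1).fit(X, y).coef_          # stage 1: estimate
selected = np.flatnonzero(np.abs(coef) > 0.05)   # stage 2: threshold
print(selected)  # ideally close to the true support {0, ..., 9}
```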