IT博客汇 | arXiv Paper Daily: Wed, 23 Nov 2016

arXiv Paper Daily: Wed, 23 Nov 2016

我爱机器学习(52ml.net)发表于 2016-11-23 00:00:00

Neural and Evolutionary Computing

Recurrent Neural Networks With Limited Numerical Precision

Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
Comments: NIPS 2016 EMDNN Workshop paper, condensed version of arXiv:1608.06902
Subjects: Neural and Evolutionary Computing (cs.NE)

Recurrent Neural Networks (RNNs) produce state-of-art performance on many
machine learning tasks but their demand on resources in terms of memory and
computational power are often high. Therefore, there is a great interest in
optimizing the computations performed with these models especially when
considering development of specialized low-power hardware for deep networks.
One way of reducing the computational needs is to limit the numerical precision
of the network weights and biases, and this will be addressed for the case of
RNNs. We present results from the use of different stochastic and deterministic
reduced precision training methods applied to two major RNN types, which are
then tested on three datasets. The results show that the stochastic and
deterministic ternarization, pow2- ternarization, and exponential quantization
methods gave rise to low-precision RNNs that produce similar and even higher
accuracy on certain datasets, therefore providing a path towards training more
efficient implementations of RNNs in specialized hardware.

Deep Learning Approximation for Stochastic Control Problems

Jiequn Han, E Weinan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)

Many real world stochastic control problems suffer from the “curse of
dimensionality”. To overcome this difficulty, we develop a deep learning
approach that directly solves high-dimensional stochastic control problems
based on Monte-Carlo sampling. We approximate the time-dependent controls as
feedforward neural networks and stack these networks together through model
dynamics. The objective function for the control problem plays the role of the
loss function for the deep neural network. We test this approach using examples
from the areas of optimal trading and energy storage. Our results suggest that
the algorithm presented here achieves satisfactory accuracy and at the same
time, can handle rather high dimensional problems.

Computer Vision and Pattern Recognition

Scene Labeling using Recurrent Neural Networks with Explicit Long Range Contextual Dependency

Qiangui Huang, Weiyue Wang, Kevin Zhou, Suya You, Ulrich Neumann
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Spatial contextual dependencies are crucial for scene labeling problems.
Recurrent neural network (RNN) is one of state-of-the-art methods for modeling
contextual dependencies. However, RNNs are fundamentally designed for
sequential data, not spatial data. This work shows that directly applying
traditional RNN architectures, which unfold a 2D lattice grid into a sequence,
is not sufficient to model structure dependencies in images due to the “impact
vanishing” problem. A new RNN unit with Explicit Long-range Conditioning
(RNN-ELC) is designed to overcome this problem. Based on this new RNN-ELC unit,
a novel neural network architecture is built for scene labeling tasks. This
architecture achieves state-of-the-art performances on several standard scene
labeling datasets. Comprehensive experiments demonstrate that scene labeling
tasks benefit a lot from the explicit long range contextual dependencies
encoded in our algorithm.

Smart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text Reading

Xiao Yang, Dafang He, Wenyi Huang, Zihan Zhou, Alex Ororbia, Dan Kifer, C. Lee Giles
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Physical library collections are valuable and long standing resources for
knowledge and learning. However, managing books in a large bookshelf and
finding books on it often leads to tedious manual work, especially for large
book collections where books might be missing or misplaced. Recently, deep
neural models, such as Convolutional Neural Networks (CNN) and Recurrent Neural
Networks (RNN) have achieved great success for scene text detection and
recognition. Motivated by these recent successes, we aim to investigate their
viability in facilitating book management, a task that introduces further
challenges including large amounts of cluttered scene text, distortion, and
varied lighting conditions. In this paper, we present a library inventory
building and retrieval system based on scene text reading methods. We
specifically design our scene text recognition model using rich supervision to
accelerate training and achieve state-of-the-art performance on several
benchmark datasets. Our proposed system has the potential to greatly reduce the
amount of human labor required in managing book inventories as well as the
space needed to store book information.

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

Amir Alansary, Bernhard Kainz, Martin Rajchl, Maria Murgasova, Mellisa Damodaram, David F.A. Lloyd, Alice Davidson, Steven G. McDonagh, Mary Rutherford, Joseph V. Hajnal, Daniel Rueckert
Comments: 10 pages, 13 figures, submitted to IEEE Transactions on Medical Imaging
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we present a novel method for the correction of motion
artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of
the whole uterus. Contrary to current slice-to-volume registration (SVR)
methods, requiring an inflexible anatomical enclosure of a single investigated
organ, the proposed patch-to-volume reconstruction (PVR) approach is able to
reconstruct a large field of view of non-rigidly deforming structures. It
relaxes rigid motion assumptions by introducing a specific amount of redundant
information that is exploited with parallelized patch-wise optimization,
super-resolution, and automatic outlier rejection. We further describe and
provide an efficient parallel implementation of PVR allowing its execution
within reasonable time on commercially available graphics processing units
(GPU), enabling its use in the clinical practice. We evaluate PVR’s
computational overhead compared to standard methods and observe improved
reconstruction accuracy in presence of affine motion artifacts of approximately
30% compared to conventional SVR in synthetic experiments. Furthermore, we have
evaluated our method qualitatively and quantitatively on real fetal MRI data
subject to maternal breathing and sudden fetal movements. We evaluate
peak-signal-to-noise ratio (PSNR), structural similarity index (SSIM), and
cross correlation (CC) with respect to the originally acquired data and provide
a method for visual inspection of reconstruction uncertainty. With these
experiments we demonstrate successful application of PVR motion compensation to
the whole uterus, the human fetus, and the human placenta.

Active learning with version spaces for object detection

Soumya Roy, Vinay P. Namboodiri, Arijit Biswas
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Given an image, we would like to learn to detect objects belonging to
particular object categories. Common object detection methods train on large
annotated datasets which are annotated in terms of bounding boxes that contain
the object of interest. Previous works on object detection model the problem as
a structured regression problem which ranks the correct bounding boxes more
than the background ones. In this paper we develop algorithms which actively
obtain annotations from human annotators for a small set of images, instead of
all images, thereby reducing the annotation effort. Towards this goal, we make
the following contributions: 1. We develop a principled version space based
active learning method that solves for object detection as a structured
prediction problem in a weakly supervised setting 2. We derive the relation
between our proposed method with other active learning techniques such as
maximum model change 3. Additionally, we propose two variants of the
margin-based querying strategy 4. We analyse the results on standard object
detection benchmarks that show that with only 20% of the data we can obtain
more than 95% of the localization accuracy of full supervision. Our methods
outperform random sampling and the classical uncertainty-based active learning
algorithms like entropy

Deep Single and Direct Multi-View Depth Fusion

José M. Fácil, Alejo Concha, Luis Montesano, Javier Civera
Comments: Submitted to the International Conference on Robotics and Automation (ICRA) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Dense 3D mapping from a monocular sequence is a key technology for several
applications and still a research problem. This paper leverages recent results
on single-view CNN-based depth estimation and fuses them with direct multi-view
depth estimation. Both approaches present complementary strengths. Multi-view
depth estimation is highly accurate but only in high-texture and high-parallax
cases. Single-view depth captures the local structure of mid-level regions,
including textureless areas, but the estimated depth lacks global coherence.

The single and multi-view fusion we propose has several challenges. First,
both depths are related by a non-rigid deformation that depends on the image
content. And second, the selection of multi-view points of high accuracy might
be difficult for low-parallax configurations. We present contributions for both
problems. Our results in the public datasets of NYU and TUM shows that our
algorithm outperforms the individual single and multi-view approaches.

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Lukas Cavigelli, Pascal Hager, Luca Benini
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Information Retrieval (cs.IR); Multimedia (cs.MM)

Lossy image compression algorithms are pervasively used to reduce the size of
images transmitted over the web and recorded on data storage media. However, we
pay for their high compression rate with visual artifacts degrading the user
experience. Deep convolutional neural networks have become a widespread tool to
address high-level computer vision tasks very successfully. Recently, they have
found their way into the areas of low-level computer vision and image
processing to solve regression problems mostly with relatively shallow
networks.

We present a novel 12-layer deep convolutional network for image compression
artifact suppression with hierarchical skip connections and a multi-scale loss
function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an
improvement of up to 0.36 dB over the best previous ConvNet result. We show
that a network trained for a specific quality factor (QF) is resilient to the
QF used to compress the input image – a single network trained for QF 60
provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.

A Spatial and Temporal Non-Local Filter Based Data Fusion

Qing Cheng, Huiqing Liu, Huanfeng Shen, Penghai Wu, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The trade-off in remote sensing instruments that balances the spatial
resolution and temporal frequency limits our capacity to monitor spatial and
temporal dynamics effectively. The spatiotemporal data fusion technique is
considered as a cost-effective way to obtain remote sensing data with both high
spatial resolution and high temporal frequency, by blending observations from
multiple sensors with different advantages or characteristics. In this paper,
we develop the spatial and temporal non-local filter based fusion model
(STNLFFM) to enhance the prediction capacity and accuracy, especially for
complex changed landscapes. The STNLFFM method provides a new transformation
relationship between the fine-resolution reflectance images acquired from the
same sensor at different dates with the help of coarse-resolution reflectance
data, and makes full use of the high degree of spatiotemporal redundancy in the
remote sensing image sequence to produce the final prediction. The proposed
method was tested over both the Coleambally Irrigation Area study site and the
Lower Gwydir Catchment study site. The results show that the proposed method
can provide a more accurate and robust prediction, especially for heterogeneous
landscapes and temporally dynamic areas.

Object detection can be improved using human-derived contextual expectations

Harish Katti, Marius V. Peelen, S. P. Arun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

Each object in the world occurs in a specific context: cars are seen on
highways but not in forests. Contextual information is generally thought to
facilitate computation by constraining locations to search. But can knowing
context yield tangible benefits in object detection? For it to do so, scene
context needs to be learned independently from target features. However this is
impossible in traditional object detection where classifiers are trained on
images containing both target features and surrounding coarse scene features.
In contrast, we humans have the opportunity to learn context and target
features separately, such as when we see highways without cars. Here we show
for the first time that human-derived scene expectations can be used to improve
object detection performance in machines. To measure these expectations, we
asked human subjects to indicate the scale, location and likelihood at which
targets may occur on scenes containing no targets. Humans showed highly
systematic expectations that we could accurately predict using scene features.
We then augmented state-of-the-art object detectors (based on deep neural
networks) with these human-derived expectations on novel scenes to produce a
significant (1-3%) improvement in detecting cars and people in scenes. This
improvement was due to low-confidence detector matches being correctly
relabeled as targets when they occurred in likely scenes.

Recurrent Attention Models for Depth-Based Person Identification

Albert Haque, Alexandre Alahi, Li Fei-Fei
Comments: Computer Vision and Pattern Recognition (CVPR) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an attention-based model that reasons on human body shape and
motion dynamics to identify individuals in the absence of RGB information,
hence in the dark. Our approach leverages unique 4D spatio-temporal signatures
to address the identification problem across days. Formulated as a
reinforcement learning task, our model is based on a combination of
convolutional and recurrent neural networks with the goal of identifying small,
discriminative regions indicative of human identity. We demonstrate that our
model produces state-of-the-art results on several published datasets given
only depth images. We further study the robustness of our model towards
viewpoint, appearance, and volumetric changes. Finally, we share insights
gleaned from interpretable 2D, 3D, and 4D visualizations of our model’s
spatio-temporal attention.

Exploiting Web Images for Dataset Construction: A Domain Robust Approach

Yazhou Yao, Jian Zhang, Fumin Shen, Xiansheng Hua, Jingsong Xu, Zhenmin Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Labelled image datasets have played a critical role in high-level image
understanding; however the process of manual labelling is both time-consuming
and labor intensive. To reduce the cost of manual labelling, there has been
increased research interest in automatically constructing image datasets by
exploiting web images. Datasets constructed by existing methods tend to have a
weak domain adaptation ability, which is known as the “dataset bias problem”.
To address this issue, we present a novel image dataset construction framework
which can be generalized well to unseen target domains. In specific, the given
queries are first expanded by searching in the Google Books Ngrams Corpus to
obtain a richer semantic description, from which the visually non-salient and
less relevant expansions are filtered out. By treating each unfiltered
expansion as a “bag” and the retrieved images as “instances”, image selection
can be formulated as a multi-instance learning problem with constrained
positive bags. We propose to solve the employed problems by the cutting-plane
and concave-convex procedure (CCCP) algorithm. Using this approach, images from
different distributions will be retained while noisy images will be filtered
out. To verify the effectiveness of our proposed approach, we build a
domain-robust image dataset with 20 categories, which we refer to as DRID-20.
We compare DRID-20 with three publicly available datasets STL-10, CIFAR-10 and
ImageNet. The experimental results confirm the effectiveness of our dataset in
terms of image classification ability, cross-dataset generalization ability and
dataset diversity. We further run object detection on PASCAL VOC 2007 using our
data, and the results demonstrate the superiority of our method to the weakly
supervised and web-supervised state-of-the-art detection methods.

Learning Multi-level Deep Representations for Image Emotion Classification

Tianrong Rao, Min Xu, Dong Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a new deep network that learns multi-level deep
representations for image emotion classification (MldrNet). Image emotion can
be recognized through image semantics, image aesthetics and low-level visual
features from both global and local views. Existing image emotion
classification works using hand-crafted features or deep features mainly focus
on either low-level visual features or semantic-level image representations
without taking all factors into consideration. Our proposed MldrNet unifies
deep representations of three levels, i.e. image semantics, image aesthetics
and low-level visual features through multiple instance learning (MIL) in order
to effectively cope with noisy labeled data, such as images collected from the
Internet. Extensive experiments on both Internet images and abstract paintings
demonstrate the proposed method outperforms the state-of-the-art methods using
deep features or hand-crafted features. The proposed approach also outperforms
the state-of-the-art methods with at least 6% performance improvement in terms
of overall classification accuracy.

Learning Multi-level Features For Sensor-based Human Action Recognition

Yan Xu, Zhengyang Shen, Xin Zhang, Yifan Gao, Shujian Deng, Yipei Wang, Yubo Fan, Eric I-Chao Chang
Comments: 28 pages, 23 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a multi-level feature learning framework for human action
recognition using body-worn inertial sensors. The framework consists of three
phases, respectively designed to analyze signal-based (low-level), components
(mid-level) and semantic (high-level) information. Low-level features,
extracted from raw signals, capture the time and frequency domain property
while mid-level representations, obtained through the dictionary learning
method, learn the composition of the action. The Max-margin Latent Pattern
Learning (MLPL) method is proposed and implemented on the concatenation of low-
and mid-level features to learn high-level semantic descriptions of latent
action patterns as the output of our framework. Various experiments on Opp,
Skoda and WISDM datasets show that the semantic feature learned by this
framework possesses higher representation ability than low- and mid-level
features. Compared with existing methods, the proposed method achieves
state-of-the-art performances.

Cascaded Neural Networks with Selective Classifiers and its evaluation using Lung X-ray CT Images

Masaharu Sakamoto, Hiroki Nakano
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Lung nodule detection is a class imbalanced problem because nodules are found
with much lower frequency than non-nodules. In the class imbalanced problem,
conventional classifiers tend to be overwhelmed by the majority class and
ignore the minority class. We therefore propose cascaded convolutional neural
networks to cope with the class imbalanced problem. In the proposed approach,
cascaded convolutional neural networks that perform as selective classifiers
filter out obvious non-nodules. Successively, a convolutional neural network
trained with a balanced data set calculates nodule probabilities. The proposed
method achieved the detection sensitivity of 85.3% and 90.7% at 1 and 4 false
positives per scan in FROC curve, respectively.

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Chongxuan Li, Jun Zhu, Bo Zhang
Comments: arXiv admin note: substantial text overlap with arXiv:1504.06787
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Deep generative models (DGMs) are effective on learning multilayered
representations of complex data and performing inference of input data by
exploring the generative ability. However, it is relatively insufficient to
empower the discriminative ability of DGMs on making accurate predictions. This
paper presents max-margin deep generative models (mmDGMs) and a
class-conditional variant (mmDCGMs), which explore the strongly discriminative
principle of max-margin learning to improve the predictive performance of DGMs
in both supervised and semi-supervised learning, while retaining the generative
capability. In semi-supervised learning, we use the predictions of a max-margin
classifier as the missing labels instead of performing full posterior inference
for efficiency; we also introduce additional max-margin and label-balance
regularization terms of unlabeled data for effectiveness. We develop an
efficient doubly stochastic subgradient algorithm for the piecewise linear
objectives in different settings. Empirical results on various datasets
demonstrate that: (1) max-margin learning can significantly improve the
prediction performance of DGMs and meanwhile retain the generative ability; (2)
in supervised learning, mmDGMs are competitive to the best fully discriminative
networks when employing convolutional neural networks as the generative and
recognition models; and (3) in semi-supervised learning, mmDCGMs can perform
efficient inference and achieve state-of-the-art classification results on
several benchmarks.

Inducing Interpretable Representations with Variational Autoencoders

N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We develop a framework for incorporating structured graphical models in the
emph{encoders} of variational autoencoders (VAEs) that allows us to induce
interpretable representations through approximate variational inference. This
allows us to both perform reasoning (e.g. classification) under the structural
constraints of a given graphical model, and use deep generative models to deal
with messy, high-dimensional domains where it is often difficult to model all
the variation. Learning in this framework is carried out end-to-end with a
variational objective, applying to both unsupervised and semi-supervised
schemes.

Grad-CAM: Why did you say that?

Ramprasaath R Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems. This is an extended abstract version of arXiv:1610.02391
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We propose a technique for making Convolutional Neural Network (CNN)-based
models more transparent by visualizing input regions that are ‘important’ for
predictions — or visual explanations. Our approach, called Gradient-weighted
Class Activation Mapping (Grad-CAM), uses class-specific gradient information
to localize important regions. These localizations are combined with existing
pixel-space visualizations to create a novel high-resolution and
class-discriminative visualization called Guided Grad-CAM. These methods help
better understand CNN-based models, including image captioning and visual
question answering (VQA) models. We evaluate our visual explanations by
measuring their ability to discriminate between classes, to inspire trust in
humans, and their correlation with occlusion maps. Grad-CAM provides a new way
to understand CNN-based models.

We have released code, an online demo hosted on CloudCV, and a full version
of this extended abstract.

3D Image Reconstruction from X-Ray Measurements with Overlap

Maria Klodt, Raphael Hauser
Comments: Published in Computer Vision – ECCV 2016. The final publication is available at link.springer.com/chapter/10.1007/978-3-319-46466-4_2
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

3D image reconstruction from a set of X-ray projections is an important image
reconstruction problem, with applications in medical imaging, industrial
inspection and airport security. The innovation of X-ray emitter arrays allows
for a novel type of X-ray scanners with multiple simultaneously emitting
sources. However, two or more sources emitting at the same time can yield
measurements from overlapping rays, imposing a new type of image reconstruction
problem based on nonlinear constraints. Using traditional linear reconstruction
methods, respective scanner geometries have to be implemented such that no rays
overlap, which severely restricts the scanner design. We derive a new type of
3D image reconstruction model with nonlinear constraints, based on measurements
with overlapping X-rays. Further, we show that the arising optimization problem
is partially convex, and present an algorithm to solve it. Experiments show
highly improved image reconstruction results from both simulated and real-world
measurements.

Geometry of 3D Environments and Sum of Squares Polynomials

Amir Ali Ahmadi, Georgina Hall, Ameesh Makadia, Vikas Sindhwani
Subjects: Optimization and Control (math.OC); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Motivated by applications in robotics and computer vision, we study problems
related to spatial reasoning of a 3D environment using sublevel sets of
polynomials. These include: tightly containing a cloud of points (e.g.,
representing an obstacle) with convex or nearly-convex basic semialgebraic
sets, computation of Euclidean distances between two such sets, separation of
two convex basic semalgebraic sets that overlap, and tight containment of the
union of several basic semialgebraic sets with a single convex one. We use
algebraic techniques from sum of squares optimization that reduce all these
tasks to semidefinite programs of small size and present numerical experiments
in realistic scenarios.

Distributable Consistent Multi-Graph Matching

Nan Hu, Boris Thibert, Leonidas Guibas
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose an optimization-based framework to multiple graph
matching. The framework takes as input maps computed between pairs of graphs,
and outputs maps that 1) are consistent among all pairs of graphs, and 2)
preserve edge connectivity between pairs of graphs. We show how to formulate
this as solving a piece-wise low-rank matrix recovery problem using a
generalized message passing scheme. We also present necessary and sufficient
conditions under which such global consistency is guaranteed. The key feature
of our approach is that it is scalable to large datasets, while still produce
maps whose quality is competent against state-of-the-art global
optimization-based techniques.

Artificial Intelligence

An unexpected unity among methods for interpreting model predictions

Scott Lundberg, Su-In Lee
Comments: Short extended abstract
Subjects: Artificial Intelligence (cs.AI)

Understanding why a model made a certain prediction is crucial in many data
science fields. Interpretable predictions engender appropriate trust and
provide insight into how the model may be improved. However, with large modern
datasets the best accuracy is often achieved by complex models even experts
struggle to interpret, which creates a tension between accuracy and
interpretability. Recently, several methods have been proposed for interpreting
predictions from complex models by estimating the importance of input features.
Here, we present how a model-agnostic additive representation of the importance
of input features unifies current methods. This representation is optimal, in
the sense that it is the only set of additive values that satisfies important
properties. We show how we can leverage these properties to create novel visual
explanations of model predictions. The thread of unity that this representation
weaves through the literature indicates that there are common principles to be
learned about the interpretation of model predictions that apply in many
scenarios.

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

Felix Leibfried, Nate Kushman, Katja Hofmann
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Reinforcement learning is concerned with learning to interact with
environments that are initially unknown. State-of-the-art reinforcement
learning approaches, such as DQN, are model-free and learn to act effectively
across a wide range of environments such as Atari games, but require huge
amounts of data. Model-based techniques are more data-efficient, but need to
acquire explicit knowledge about the environment dynamics or the reward
structure.

In this paper we take a step towards using model-based techniques in
environments with high-dimensional visual state space when system dynamics and
the reward structure are both unknown and need to be learned, by demonstrating
that it is possible to learn both jointly. Empirical evaluation on five Atari
games demonstrate accurate cumulative reward prediction of up to 200 frames. We
consider these positive results as opening up important directions for
model-based RL in complex, initially unknown environments.

Variational Intrinsic Control

Karol Gregor, Danilo Jimenez Rezende, Daan Wierstra
Comments: 15 pages, 6 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we present a fast cascaded regression for face alignment, via
a novel local feature. Our proposed local lightweight feature, namely intimacy
definition feature (IDF), is more discriminative than landmark shape-indexed
feature, more efficient than the handcrafted scale-invariant feature transform
(SIFT) feature, and more compact than the local binary feature (LBF).
Experimental results show that our approach achieves state-of-the-art
performance, when tested on the most challenging benchmarks. Compared with an
LBF-based algorithm, our method is able to obtain about two times the speed-up
and more than 20% improvement, in terms of alignment error measurement, and
able to save an order of magnitude of memory requirement.

Deep Learning Approximation for Stochastic Control Problems

Jiequn Han, E Weinan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)

Many real world stochastic control problems suffer from the “curse of
dimensionality”. To overcome this difficulty, we develop a deep learning
approach that directly solves high-dimensional stochastic control problems
based on Monte-Carlo sampling. We approximate the time-dependent controls as
feedforward neural networks and stack these networks together through model
dynamics. The objective function for the control problem plays the role of the
loss function for the deep neural network. We test this approach using examples
from the areas of optimal trading and energy storage. Our results suggest that
the algorithm presented here achieves satisfactory accuracy and at the same
time, can handle rather high dimensional problems.

Randomized Mechanisms for Selling Reserved Instances in Cloud

Jia Zhang, Weidong Ma, Tao Qin, Xiaoming Sun, Tie-Yan Liu
Comments: Already accepted by AAAI’17
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

Selling reserved instances (or virtual machines) is a basic service in cloud
computing. In this paper, we consider a more flexible pricing model for
instance reservation, in which a customer can propose the time length and
number of resources of her request, while in today’s industry, customers can
only choose from several predefined reservation packages. Under this model, we
design randomized mechanisms for customers coming online to optimize social
welfare and providers’ revenue.

We first consider a simple case, where the requests from the customers do not
vary too much in terms of both length and value density. We design a randomized
mechanism that achieves a competitive ratio (frac{1}{42}) for both
emph{social welfare} and emph{revenue}, which is a improvement as there is
usually no revenue guarantee in previous works such as
cite{azar2015ec,wang2015selling}. This ratio can be improved up to
(frac{1}{11}) when we impose a realistic constraint on the maximum number of
resources used by each request. On the hardness side, we show an upper bound
(frac{1}{3}) on competitive ratio for any randomized mechanism. We then extend
our mechanism to the general case and achieve a competitive ratio
(frac{1}{42log klog T}) for both social welfare and revenue, where (T) is
the ratio of the maximum request length to the minimum request length and (k)
is the ratio of the maximum request value density to the minimum request value
density. This result outperforms the previous upper bound (frac{1}{CkT}) for
deterministic mechanisms cite{wang2015selling}. We also prove an upper bound
(frac{2}{log 8kT}) for any randomized mechanism. All the mechanisms we
provide are in a greedy style. They are truthful and easy to be integrated into
practical cloud systems.

Limbo: A Fast and Flexible Library for Bayesian Optimization

Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

Limbo is an open-source C++11 library for Bayesian optimization which is
designed to be both highly flexible and very fast. It can be used to optimize
functions for which the gradient is unknown, evaluations are expensive, and
runtime cost matters (e.g., on embedded systems or robots). Benchmarks on
standard functions show that Limbo is about 2 times faster than BayesOpt
(another C++ library) for a similar accuracy.

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Lossy image compression algorithms are pervasively used to reduce the size of
images transmitted over the web and recorded on data storage media. However, we
pay for their high compression rate with visual artifacts degrading the user
experience. Deep convolutional neural networks have become a widespread tool to
address high-level computer vision tasks very successfully. Recently, they have
found their way into the areas of low-level computer vision and image
processing to solve regression problems mostly with relatively shallow
networks.

We present a novel 12-layer deep convolutional network for image compression
artifact suppression with hierarchical skip connections and a multi-scale loss
function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an
improvement of up to 0.36 dB over the best previous ConvNet result. We show
that a network trained for a specific quality factor (QF) is resilient to the
QF used to compress the input image – a single network trained for QF 60
provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.

Interpreting Finite Automata for Sequential Data

Christian Albert Hammerschmidt, Qin Lin, Sicco Verwer, Radu State
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Automaton models are often seen as interpretable models. Interpretability
itself is not well defined: it remains unclear what interpretability means
without first explicitly specifying objectives or desired attributes. In this
paper, we identify the key properties used to interpret automata and propose a
modification of a state-merging approach to learn variants of finite state
automata. We apply the approach to problems beyond typical grammar inference
tasks. Additionally, we cover several use-cases for prediction, classification,
and clustering on sequential data in both supervised and unsupervised scenarios
to show how the identified key properties are applicable in a wide range of
contexts.

An Efficient Training Algorithm for Kernel Survival Support Vector Machines

Sebastian Pölsterl, Nassir Navab, Amin Katouzian
Comments: ECML PKDD MLLS 2016: 3rd Workshop on Machine Learning in Life Sciences
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Survival analysis is a fundamental tool in medical research to identify
predictors of adverse events and develop systems for clinical decision support.
In order to leverage large amounts of patient data, efficient optimisation
routines are paramount. We propose an efficient training algorithm for the
kernel survival support vector machine (SSVM). We directly optimise the primal
objective function and employ truncated Newton optimisation and order statistic
trees to significantly lower computational costs compared to previous training
algorithms, which require (O(n^4)) space and (O(p n^6)) time for datasets with
(n) samples and (p) features. Our results demonstrate that our proposed
optimisation scheme allows analysing data of a much larger scale with no loss
in prediction performance. Experiments on synthetic and 5 real-world datasets
show that our technique outperforms existing kernel SSVM formulations if the
amount of right censoring is high ((geq85\%)), and performs comparably
otherwise.

Associative Adversarial Networks

Tarik Arici, Asli Celikyilmaz
Comments: NIPS 2016 Workshop on Adversarial Training
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose a higher-level associative memory for learning adversarial
networks. Generative adversarial network (GAN) framework has a discriminator
and a generator network. The generator (G) maps white noise (z) to data samples
while the discriminator (D) maps data samples to a single scalar. To do so, G
learns how to map from high-level representation space to data space, and D
learns to do the opposite. We argue that higher-level representation spaces
need not necessarily follow a uniform probability distribution. In this work,
we use Restricted Boltzmann Machines (RBMs) as a higher-level associative
memory and learn the probability distribution for the high-level features
generated by D. The associative memory samples its underlying probability
distribution and G learns how to map these samples to data space. The proposed
associative adversarial networks (AANs) are generative models in the
higher-levels of the learning, and use adversarial non-stochastic models D and
G for learning the mapping between data and higher-level representation spaces.
Experiments show the potential of the proposed networks.

Information Retrieval

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Lossy image compression algorithms are pervasively used to reduce the size of
images transmitted over the web and recorded on data storage media. However, we
pay for their high compression rate with visual artifacts degrading the user
experience. Deep convolutional neural networks have become a widespread tool to
address high-level computer vision tasks very successfully. Recently, they have
found their way into the areas of low-level computer vision and image
processing to solve regression problems mostly with relatively shallow
networks.

We present a novel 12-layer deep convolutional network for image compression
artifact suppression with hierarchical skip connections and a multi-scale loss
function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an
improvement of up to 0.36 dB over the best previous ConvNet result. We show
that a network trained for a specific quality factor (QF) is resilient to the
QF used to compress the input image – a single network trained for QF 60
provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.

A Natural Language Query Interface for Searching Personal Information on Smartwatches

Reza Rawassizadeh, Chelsea Dobbins, Manouchehr Nourizadeh, Zahra Ghamchili, Michael Pazzani
Comments: 6 pages, 4 figures, 2 tables
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Information Retrieval (cs.IR)

Currently, personal assistant systems, run on smartphones and use natural
language interfaces. However, these systems rely mostly on the web for finding
information. Mobile and wearable devices can collect an enormous amount of
contextual personal data such as sleep and physical activities. These
information objects and their applications are known as quantified-self, mobile
health or personal informatics, and they can be used to provide a deeper
insight into our behavior. To our knowledge, existing personal assistant
systems do not support all types of quantified-self queries. In response to
this, we have undertaken a user study to analyze a set of “textual
questions/queries” that users have used to search their quantified-self or
mobile health data. Through analyzing these questions, we have constructed a
light-weight natural language based query interface, including a text parser
algorithm and a user interface, to process the users’ queries that have been
used for searching quantified-self information. This query interface has been
designed to operate on small devices, i.e. smartwatches, as well as augmenting
the personal assistant systems by allowing them to process end users’ natural
language queries about their quantified-self data.

Leveraging Citation Networks to Visualize Scholarly Influence Over Time

Jason Portenoy, Jessica Hullman, Jevin D. West
Subjects: Human-Computer Interaction (cs.HC); Digital Libraries (cs.DL); Information Retrieval (cs.IR)

Assessing the influence of a scholar’s work is an important task for funding
organizations, academic departments, and researchers. Common methods, such as
measures of citation counts, can ignore much of the nuance and
multidimensionality of scholarly influence. We present an approach for
generating dynamic narrative visualizations of scholars’ careers. This approach
uses an animated node-link diagram showing the citation network accumulated
around the researcher over the course of the career in concert with key
indicators, highlighting influence both within and across fields. We developed
our design in collaboration with one funding organization—the Pew Biomedical
Scholars program—but the methods are generalizable to visualizations of
scholarly influence in general. We applied the design method to the Microsoft
Academic Graph, which includes more than 120 million publications. We validate
our abstractions throughout the process through collaboration with the Pew
Biomedical Scholars program officers and summative evaluations with their
scholars.

Computation and Language

Compositional Learning of Relation Paths Embedding for Knowledge Base Completion

Xixun Lin, Yanchun Liang, Renchu Guan
Comments: 9 Pages,1 figure
Subjects: Computation and Language (cs.CL)

Nowadays, large-scale knowledge bases containing billions of facts have
reached impressive sizes; however, they are still far from completion. In
addition, most existing methods only consider the direct links between
entities, ignoring the vital impact about the semantic of relation paths. In
this paper, we study the problem of how to better embed entities and relations
into different low dimensional spaces. A compositional learning model of
relation paths embedding (RPE) is proposed to take full advantage of additional
semantic information expressed by relation paths. More specifically, using
corresponding projection matrices, RPE can simultaneously embed entities into
corresponding relation and path spaces. It is also suggested that type
constraints could be extended from traditional relation-specific to the new
proposed path-specific ones. Both of the two type constraints can be seamlessly
incorporated into RPE and decrease the errors in prediction. Experiments are
conducted on the benchmark datasets and the proposed model achieves significant
and consistent improvements compared with the state-of-the-art algorithms for
knowledge base completion.

Learning to Distill: The Essence Vector Modeling Framework

Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang
Subjects: Computation and Language (cs.CL)

In the context of natural language processing, representation learning has
emerged as a newly active research subject because of its excellent performance
in many applications. Learning representations of words is a pioneering study
in this school of research. However, paragraph (or sentence and document)
embedding learning is more suitable/reasonable for some tasks, such as
sentiment classification and document summarization. Nevertheless, as far as we
are aware, there is relatively less work focusing on the development of
unsupervised paragraph embedding methods. Classic paragraph embedding methods
infer the representation of a given paragraph by considering all of the words
occurring in the paragraph. Consequently, those stop or function words that
occur frequently may mislead the embedding learning process to produce a misty
paragraph representation. Motivated by these observations, our major
contributions in this paper are twofold. First, we propose a novel unsupervised
paragraph embedding method, named the essence vector (EV) model, which aims at
not only distilling the most representative information from a paragraph but
also excluding the general background information to produce a more informative
low-dimensional vector representation for the paragraph. Second, in view of the
increasing importance of spoken content processing, an extension of the EV
model, named the denoising essence vector (D-EV) model, is proposed. The D-EV
model not only inherits the advantages of the EV model but also can infer a
more robust representation for a given spoken paragraph against imperfect
speech recognition.

An Experimental Comparison of Deep Neural Networks for End-to-end Speech Recognition

Zewang Zhang, Zheng Sun, Jiaqi Liu, Jingwen Chen, Zhao Huo, Xiao Zhang
Comments: 10pages, 13figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Performance of end-to-end automatic speech recognition (ASR) systems can
significantly be improved by the increasing large speech corpus and deeper
neural network. Given the arising problem of training speed and recent success
of deep convolutional neural network in ASR, we build a novel deep recurrent
convolutional network for acoustic modeling and apply deep residual learning
framework to it, our experiments show that it has not only faster convergence
speed but better recognition accuracy over traditional deep convolutional
recurrent network. We mainly compare convergence speed of two acoustic models,
which are novel deep recurrent convolutional networks and traditional deep
convolutional recurrent networks. With faster convergence speed, our novel deep
recurrent convolutional networks can reach the comparable performance. We
further show that applying deep residual learning can boost both convergence
speed and recognition accuracy of our novel recurret convolutional networks.
Finally, we evaluate all our experimental networks by phoneme error rate (PER)
with newly proposed bidirectional statistical language model. Our evaluation
results show that our model applied with deep residual learning can reach the
best PER of 17.33% with fastest convergence speed in TIMIT database.

A Natural Language Query Interface for Searching Personal Information on Smartwatches

Currently, personal assistant systems, run on smartphones and use natural
language interfaces. However, these systems rely mostly on the web for finding
information. Mobile and wearable devices can collect an enormous amount of
contextual personal data such as sleep and physical activities. These
information objects and their applications are known as quantified-self, mobile
health or personal informatics, and they can be used to provide a deeper
insight into our behavior. To our knowledge, existing personal assistant
systems do not support all types of quantified-self queries. In response to
this, we have undertaken a user study to analyze a set of “textual
questions/queries” that users have used to search their quantified-self or
mobile health data. Through analyzing these questions, we have constructed a
light-weight natural language based query interface, including a text parser
algorithm and a user interface, to process the users’ queries that have been
used for searching quantified-self information. This query interface has been
designed to operate on small devices, i.e. smartwatches, as well as augmenting
the personal assistant systems by allowing them to process end users’ natural
language queries about their quantified-self data.

Distributed, Parallel, and Cluster Computing

Can Broken Multicore Hardware be Mended?

János Végh
Comments: 3 figures; a Viewpoint
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)

A suggestion is made for mending multicore hardware, which has been diagnosed
as broken.

Time and Space Optimal Counting in Population Protocols

James Aspnes (YALE), Joffroy Beauquier (LRI), Janna Burman (LRI), Devan Sohier
Journal-ref: 20th International Conference on Principles of Distributed
Systems, OPODIS 2015, Dec 2016, Madrid, Spain. 2016, 20th International
Conference on Principles of Distributed Systems, OPODIS 2015
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

This work concerns the general issue of combined optimality in terms of time
and space complexity. In this context, we study the problem of (exact) counting
resource-limited and passively mobile nodes in the model of population
protocols, in which the space complexity is crucial. The counted nodes are
memory-limited anonymous devices (called agents) communicating asynchronously
in pairs (according to a fairness condition). Moreover, we assume that these
agents are prone to failures so that they cannot be correctly initialized. This
study considers two classical fairness conditions, and for each we investigate
the issue of time optimality of counting given the optimal space per agent. In
the case of randomly interacting agents (probabilistic fairness), as usual, the
convergence time is measured in terms of parallel time (or parallel
interactions), which is defined as the number of pairwise interactions until
convergence, divided by n (the number of agents). In case of weak fairness,
where it is only required that every pair of agents interacts infinitely often,
the convergence time is defined in terms of non-null transitions, i.e, the
transitions that affect the states of the interacting agents.First, assuming
probabilistic fairness, we present a “non-guessing” time optimal protocol of
O(n log n) expected time given an optimal space of only one bit, and we prove
the time optimality of this protocol. Then, for weak fairness, we show that a
space optimal (semi-uniform) solution cannot converge faster than in
(Omega)(2^n) time (non-null transitions). This result, together with the time
complexity analysis of an already known space optimal protocol, shows that it
is also optimal in time (given the optimal space constrains).

Fast and Energy-Efficient CNN Inference on IoT Devices

Mohammad Motamedi, Daniel Fong, Soheil Ghiasi
Comments: 7 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Convolutional Neural Networks (CNNs) exhibit remarkable performance in
various machine learning tasks. As sensor-equipped internet of things (IoT)
devices permeate into every aspect of modern life, it is increasingly important
to run CNN inference, a computationally intensive application, on resource
constrained devices. We present a technique for fast and energy-efficient CNN
inference on mobile SoC platforms, which are projected to be a major player in
the IoT space. We propose techniques for efficient parallelization of CNN
inference targeting mobile GPUs, and explore the underlying tradeoffs.
Experiments with running Squeezenet on three different mobile devices confirm
the effectiveness of our approach. For further study, please refer to the
project repository available on our GitHub page:
this https URL

pocl: A Performance-Portable OpenCL Implementation

Pekka Jääskeläinen, Carlos Sánchez de La Lama, Erik Schnetter, Kalle Raiskila, Jarmo Takala, Heikki Berg
Comments: This article was published in 2015; it is now openly accessible via arxiv
Journal-ref: Int J Parallel Prog (2015) 43: 752
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

OpenCL is a standard for parallel programming of heterogeneous systems. The
benefits of a common programming standard are clear; multiple vendors can
provide support for application descriptions written according to the standard,
thus reducing the program porting effort. While the standard brings the obvious
benefits of platform portability, the performance portability aspects are
largely left to the programmer. The situation is made worse due to multiple
proprietary vendor implementations with different characteristics, and, thus,
required optimization strategies.

In this paper, we propose an OpenCL implementation that is both portable and
performance portable. At its core is a kernel compiler that can be used to
exploit the data parallelism of OpenCL programs on multiple platforms with
different parallel hardware styles. The kernel compiler is modularized to
perform target-independent parallel region formation separately from the
target-specific parallel mapping of the regions to enable support for various
styles of fine-grained parallel resources such as subword SIMD extensions, SIMD
datapaths and static multi-issue. Unlike previous similar techniques that work
on the source level, the parallel region formation retains the information of
the data parallelism using the LLVM IR and its metadata infrastructure. This
data can be exploited by the later generic compiler passes for efficient
parallelization.

The proposed open source implementation of OpenCL is also platform portable,
enabling OpenCL on a wide range of architectures, both already commercialized
and on those that are still under research. The paper describes how the
portability of the implementation is achieved. Our results show that most of
the benchmarked applications when compiled using pocl were faster or close to
as fast as the best proprietary OpenCL implementation for the platform at hand.

Call Trace and Memory Access Pattern based Runtime Insider Threat Detection for Big Data Platforms

Santosh Aditham, Nagarajan Ranganathan, Srinivas Katkoori
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Big data platforms such as Hadoop and Spark are being widely adopted both by
academia and industry. In this paper, we propose a runtime intrusion detection
technique that understands and works according to the properties of such
distributed compute platforms. The proposed method is based on runtime analysis
of system and library calls and memory access patterns of tasks running on the
datanodes (slaves). First, the primary datanode of a big data system creates a
behavior profile for every task it executes. A behavior profile includes (a)
trace of the system & library calls made, and (b) sequence representing the
sizes of private and shared memory accesses made during task execution. Then,
the process behavior profile is shared with other replica datanodes that are
scheduled to execute the same task on their copy of the same data. Next, these
replica datanodes verify their local tasks with the help of the information
embedded in the received behavior profiles. This is realized in two steps: (i)
comparing the system & library calls metadata, and (ii) statistical matching of
the memory access patterns. Finally, datanodes share their observations for
consensus and report an intrusion to the namenode (master) if they find any
discrepancy. The proposed solution was tested on a small hadoop cluster using
the default MapReduce examples and the results show that our approach can
detect insider attacks that cannot be detected with the traditional analysis
metrics.

Learning

A causal framework for discovering and removing direct and indirect discrimination

Lu Zhang (1), Yongkai Wu (1), Xintao Wu (1) ((1) University of Arkansas)
Subjects: Learning (cs.LG)

Anti-discrimination is an increasingly important task in data science. In
this paper, we investigate the problem of discovering both direct and indirect
discrimination from the historical data, and removing the discriminatory
effects before the data is used for predictive analysis (e.g., building
classifiers). We make use of the causal network to capture the causal structure
of the data. Then we model direct and indirect discrimination as the
path-specific effects, which explicitly distinguish the two types of
discrimination as the causal effects transmitted along different paths in the
network. Based on that, we propose an effective algorithm for discovering
direct and indirect discrimination, as well as an algorithm for precisely
removing both types of discrimination while retaining good data utility.
Different from previous works, our approaches can ensure that the predictive
models built from the modified data will not incur discrimination in decision
making. Experiments using real datasets show the effectiveness of our
approaches.

Variational Intrinsic Control

Karol Gregor, Danilo Jimenez Rezende, Daan Wierstra
Comments: 15 pages, 6 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper we introduce a new unsupervised reinforcement learning method
for discovering the set of intrinsic options available to an agent. This set is
learned by maximizing the number of different states an agent can reliably
reach, as measured by the mutual information between the set of options and
option termination states. To this end, we instantiate two policy gradient
based algorithms, one that creates an explicit embedding space of options and
one that represents options implicitly. The algorithms also provide an explicit
measure of empowerment in a given state that can be used by an empowerment
maximizing agent. The algorithm scales well with function approximation and we
demonstrate the applicability of the algorithm on a range of tasks.

Singularity of the Hessian in Deep Learning

Levent Sagun, Leon Bottou, Yann LeCun
Comments: ICLR 2017 Submission on Nov 4, 2016
Subjects: Learning (cs.LG)

We look at the eigenvalues of the Hessian of a loss function before and after
training. The eigenvalue distribution is seen to be composed of two parts, the
bulk which is concentrated around zero, and the edges which are scattered away
from zero. We present empirical evidence for the bulk indicating how
over-parametrized the system is, and for the edges indicating the complexity of
the input data.

Achieving non-discrimination in data release

Lu Zhang (1), Yongkai Wu (1), Xintao Wu (1) ((1) University of Arkansas)
Subjects: Learning (cs.LG)

Discrimination discovery and prevention/removal are increasingly important
tasks in data mining. Discrimination discovery aims to unveil discriminatory
practices on the protected attribute (e.g., gender) by analyzing the dataset of
historical decision records, and discrimination prevention aims to remove
discrimination by modifying the biased data before conducting predictive
analysis. In this paper, we show that the key to discrimination discovery and
prevention is to find the meaningful partitions that can be used to provide
quantitative evidences for the judgment of discrimination. With the support of
the causal graph, we present a graphical condition for identifying a meaningful
partition. Based on that, we develop a simple criterion for the claim of
non-discrimination, and propose discrimination removal algorithms which
accurately remove discrimination while retaining good data utility. Experiments
using real datasets show the effectiveness of our approaches.

Deep Learning Approximation for Stochastic Control Problems

Jiequn Han, E Weinan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)

Many real world stochastic control problems suffer from the “curse of
dimensionality”. To overcome this difficulty, we develop a deep learning
approach that directly solves high-dimensional stochastic control problems
based on Monte-Carlo sampling. We approximate the time-dependent controls as
feedforward neural networks and stack these networks together through model
dynamics. The objective function for the control problem plays the role of the
loss function for the deep neural network. We test this approach using examples
from the areas of optimal trading and energy storage. Our results suggest that
the algorithm presented here achieves satisfactory accuracy and at the same
time, can handle rather high dimensional problems.

Limbo: A Fast and Flexible Library for Bayesian Optimization

Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

Limbo is an open-source C++11 library for Bayesian optimization which is
designed to be both highly flexible and very fast. It can be used to optimize
functions for which the gradient is unknown, evaluations are expensive, and
runtime cost matters (e.g., on embedded systems or robots). Benchmarks on
standard functions show that Limbo is about 2 times faster than BayesOpt
(another C++ library) for a similar accuracy.

Correlation Clustering with Low-Rank Matrices

Nate Veldt, Anthony Wirth, David F. Gleich
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Numerical Analysis (cs.NA)

Correlation clustering is a technique for aggregating data based on
qualitative information about which pairs of objects are labeled ‘similar’ or
‘dissimilar.’ Because the optimization problem is NP-hard, much of the previous
literature focuses on finding approximation algorithms. In this paper we
explore a new approach to correlation clustering by considering how to solve
the problem when the data to be clustered can be represented by a low-rank
matrix. Many real-world datasets are known to be inherently low-dimensional,
and our goal is to establish a tractable approach to correlation clustering in
this important setting. We prove in particular that correlation clustering can
be solved in polynomial time when the underlying matrix is positive
semidefinite with small constant rank, but that the task remains NP-hard in the
presence of even one negative eigenvalue. Based on our theoretical results, we
develop an algorithm for efficiently solving low-rank positive semidefinite
correlation clustering by employing a procedure for zonotope vertex
enumeration. We demonstrate the effectiveness and speed of our algorithm by
using it to solve several clustering problems on both synthetic and real-world
data.

Risk-Sensitive Learning and Pricing for Demand Response

Kia Khezeli, Eilyan Bitar
Subjects: Learning (cs.LG); Optimization and Control (math.OC)

We consider the setting in which an electric power utility seeks to curtail
its peak electricity demand by offering a fixed group of customers a uniform
price for reductions in consumption relative to their predetermined baselines.
The underlying demand curve, which describes the aggregate reduction in
consumption in response to the offered price, is assumed to be affine and
subject to unobservable random shocks. Assuming that both the parameters of the
demand curve and the distribution of the random shocks are initially unknown to
the utility, we investigate the extent to which the utility might dynamically
adjust its offered prices to maximize its cumulative risk-sensitive payoff over
a finite number of (T) days. In order to do so effectively, the utility must
design its pricing policy to balance the tradeoff between the need to learn the
unknown demand model (exploration) and maximize its payoff (exploitation) over
time. In this paper, we propose such a pricing policy, which is shown to
exhibit an expected payoff loss over (T) days that is at most (O(sqrt{T})),
relative to an oracle pricing policy that knows the underlying demand model.
Moreover, the proposed pricing policy is shown to yield a sequence of prices
that converge to the oracle optimal prices in the mean square sense.

An Efficient Training Algorithm for Kernel Survival Support Vector Machines

Survival analysis is a fundamental tool in medical research to identify
predictors of adverse events and develop systems for clinical decision support.
In order to leverage large amounts of patient data, efficient optimisation
routines are paramount. We propose an efficient training algorithm for the
kernel survival support vector machine (SSVM). We directly optimise the primal
objective function and employ truncated Newton optimisation and order statistic
trees to significantly lower computational costs compared to previous training
algorithms, which require (O(n^4)) space and (O(p n^6)) time for datasets with
(n) samples and (p) features. Our results demonstrate that our proposed
optimisation scheme allows analysing data of a much larger scale with no loss
in prediction performance. Experiments on synthetic and 5 real-world datasets
show that our technique outperforms existing kernel SSVM formulations if the
amount of right censoring is high ((geq85\%)), and performs comparably
otherwise.

Inducing Interpretable Representations with Variational Autoencoders

We develop a framework for incorporating structured graphical models in the
emph{encoders} of variational autoencoders (VAEs) that allows us to induce
interpretable representations through approximate variational inference. This
allows us to both perform reasoning (e.g. classification) under the structural
constraints of a given graphical model, and use deep generative models to deal
with messy, high-dimensional domains where it is often difficult to model all
the variation. Learning in this framework is carried out end-to-end with a
variational objective, applying to both unsupervised and semi-supervised
schemes.

Can Co-robots Learn to Teach?

Harshal Maske, Emily Kieson, Girish Chowdhary, Charles Abramson
Comments: 9 pages, conference
Subjects: Robotics (cs.RO); Learning (cs.LG)

We explore beyond existing work on learning from demonstration by asking the
question: Can robots learn to teach?, that is, can a robot autonomously learn
an instructional policy from expert demonstration and use it to instruct or
collaborate with humans in executing complex tasks in uncertain environments?
In this paper we pursue a solution to this problem by leveraging the idea that
humans often implicitly decompose a higher level task into several subgoals
whose execution brings the task closer to completion. We propose Dirichlet
process based non-parametric Inverse Reinforcement Learning (DPMIRL) approach
for reward based unsupervised clustering of task space into subgoals. This
approach is shown to capture the latent subgoals that a human teacher would
have utilized to train a novice. The notion of action primitive is introduced
as the means to communicate instruction policy to humans in the least
complicated manner, and as a computationally efficient tool to segment
demonstration data. We evaluate our approach through experiments on hydraulic
actuated scaled model of an excavator and evaluate and compare different
teaching strategies utilized by the robot.

Grad-CAM: Why did you say that?

We propose a technique for making Convolutional Neural Network (CNN)-based
models more transparent by visualizing input regions that are ‘important’ for
predictions — or visual explanations. Our approach, called Gradient-weighted
Class Activation Mapping (Grad-CAM), uses class-specific gradient information
to localize important regions. These localizations are combined with existing
pixel-space visualizations to create a novel high-resolution and
class-discriminative visualization called Guided Grad-CAM. These methods help
better understand CNN-based models, including image captioning and visual
question answering (VQA) models. We evaluate our visual explanations by
measuring their ability to discriminate between classes, to inspire trust in
humans, and their correlation with occlusion maps. Grad-CAM provides a new way
to understand CNN-based models.

We have released code, an online demo hosted on CloudCV, and a full version
of this extended abstract.

TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning

Jayaraman J. Thiagarajan, Bhavya Kailkhura, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

With the advent of highly predictive but opaque deep learning models, it has
become more important than ever to understand and explain the predictions of
such models. Existing approaches define interpretability as the inverse of
complexity and achieve interpretability at the cost of accuracy. This
introduces a risk of producing interpretable but misleading explanations. As
humans, we are prone to engage in this kind of behavior cite{mythos}. In this
paper, we take a step in the direction of tackling the problem of
interpretability without compromising the model accuracy. We propose to build a
Treeview representation of the complex model via hierarchical partitioning of
the feature space, which reveals the iterative rejection of unlikely class
labels until the correct association is predicted.

Variational Graph Auto-Encoders

Thomas N. Kipf, Max Welling
Comments: Bayesian Deep Learning Workshop (NIPS 2016)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We introduce the variational graph auto-encoder (VGAE), a framework for
unsupervised learning on graph-structured data based on the variational
auto-encoder (VAE). This model makes use of latent variables and is capable of
learning interpretable latent representations for undirected graphs. We
demonstrate this model using a graph convolutional network (GCN) encoder and a
simple inner product decoder. Our model achieves competitive results on a link
prediction task in citation networks. In contrast to most existing models for
unsupervised learning on graph-structured data and link prediction, our model
can naturally incorporate node features, which significantly improves
predictive performance on a number of benchmark datasets.

Investigating the influence of noise and distractors on the interpretation of neural networks

Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, Sven Dähne
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Understanding neural networks is becoming increasingly important. Over the
last few years different types of visualisation and explanation methods have
been proposed. However, none of them explicitly considered the behaviour in the
presence of noise and distracting elements. In this work, we will show how
noise and distracting dimensions can influence the result of an explanation
model. This gives a new theoretical insights to aid selection of the most
appropriate explanation model within the deep-Taylor decomposition framework.

An Experimental Comparison of Deep Neural Networks for End-to-end Speech Recognition

Zewang Zhang, Zheng Sun, Jiaqi Liu, Jingwen Chen, Zhao Huo, Xiao Zhang
Comments: 10pages, 13figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Performance of end-to-end automatic speech recognition (ASR) systems can
significantly be improved by the increasing large speech corpus and deeper
neural network. Given the arising problem of training speed and recent success
of deep convolutional neural network in ASR, we build a novel deep recurrent
convolutional network for acoustic modeling and apply deep residual learning
framework to it, our experiments show that it has not only faster convergence
speed but better recognition accuracy over traditional deep convolutional
recurrent network. We mainly compare convergence speed of two acoustic models,
which are novel deep recurrent convolutional networks and traditional deep
convolutional recurrent networks. With faster convergence speed, our novel deep
recurrent convolutional networks can reach the comparable performance. We
further show that applying deep residual learning can boost both convergence
speed and recognition accuracy of our novel recurret convolutional networks.
Finally, we evaluate all our experimental networks by phoneme error rate (PER)
with newly proposed bidirectional statistical language model. Our evaluation
results show that our model applied with deep residual learning can reach the
best PER of 17.33% with fastest convergence speed in TIMIT database.

Fast and Energy-Efficient CNN Inference on IoT Devices

Mohammad Motamedi, Daniel Fong, Soheil Ghiasi
Comments: 7 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Convolutional Neural Networks (CNNs) exhibit remarkable performance in
various machine learning tasks. As sensor-equipped internet of things (IoT)
devices permeate into every aspect of modern life, it is increasingly important
to run CNN inference, a computationally intensive application, on resource
constrained devices. We present a technique for fast and energy-efficient CNN
inference on mobile SoC platforms, which are projected to be a major player in
the IoT space. We propose techniques for efficient parallelization of CNN
inference targeting mobile GPUs, and explore the underlying tradeoffs.
Experiments with running Squeezenet on three different mobile devices confirm
the effectiveness of our approach. For further study, please refer to the
project repository available on our GitHub page:
this https URL

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Deep generative models (DGMs) are effective on learning multilayered
representations of complex data and performing inference of input data by
exploring the generative ability. However, it is relatively insufficient to
empower the discriminative ability of DGMs on making accurate predictions. This
paper presents max-margin deep generative models (mmDGMs) and a
class-conditional variant (mmDCGMs), which explore the strongly discriminative
principle of max-margin learning to improve the predictive performance of DGMs
in both supervised and semi-supervised learning, while retaining the generative
capability. In semi-supervised learning, we use the predictions of a max-margin
classifier as the missing labels instead of performing full posterior inference
for efficiency; we also introduce additional max-margin and label-balance
regularization terms of unlabeled data for effectiveness. We develop an
efficient doubly stochastic subgradient algorithm for the piecewise linear
objectives in different settings. Empirical results on various datasets
demonstrate that: (1) max-margin learning can significantly improve the
prediction performance of DGMs and meanwhile retain the generative ability; (2)
in supervised learning, mmDGMs are competitive to the best fully discriminative
networks when employing convolutional neural networks as the generative and
recognition models; and (3) in semi-supervised learning, mmDCGMs can perform
efficient inference and achieve state-of-the-art classification results on
several benchmarks.

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

Felix Leibfried, Nate Kushman, Katja Hofmann
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Reinforcement learning is concerned with learning to interact with
environments that are initially unknown. State-of-the-art reinforcement
learning approaches, such as DQN, are model-free and learn to act effectively
across a wide range of environments such as Atari games, but require huge
amounts of data. Model-based techniques are more data-efficient, but need to
acquire explicit knowledge about the environment dynamics or the reward
structure.

In this paper we take a step towards using model-based techniques in
environments with high-dimensional visual state space when system dynamics and
the reward structure are both unknown and need to be learned, by demonstrating
that it is possible to learn both jointly. Empirical evaluation on five Atari
games demonstrate accurate cumulative reward prediction of up to 200 frames. We
consider these positive results as opening up important directions for
model-based RL in complex, initially unknown environments.

The Recycling Gibbs Sampler for Efficient Learning

Luca Martino, Victor Elvira, Gustau Camps-Valls
Comments: The MATLAB code of the numerical examples is provided at this http URL
Subjects: Computation (stat.CO); Learning (cs.LG); Machine Learning (stat.ML)

Monte Carlo methods are essential tools for Bayesian inference. Gibbs
sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively
used in signal processing, machine learning, and statistics, employed to draw
samples from complicated high-dimensional posterior distributions. The key
point for the successful application of the Gibbs sampler is the ability to
draw efficiently samples from the full-conditional probability density
functions. Since in the general case this is not possible, in order to speed up
the convergence of the chain, it is required to generate auxiliary samples
whose information is eventually disregarded. In this work, we show that these
auxiliary samples can be recycled within the Gibbs estimators, improving their
efficiency with no extra cost. This novel scheme arises naturally after
pointing out the relationship between the standard Gibbs sampler and the chain
rule used for sampling purposes. Numerical simulations involving simple and
real inference problems confirm the excellent performance of the proposed
scheme in terms of accuracy and computational efficiency. In particular we give
empirical evidence of performance in a toy example, inference of Gaussian
processes hyperparameters, and learning dependence graphs through regression.

Measuring Sample Quality with Diffusions

Jack Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Probability (math.PR)

Standard Markov chain Monte Carlo diagnostics, like effective sample size,
are ineffective for biased sampling procedures that sacrifice asymptotic
correctness for computational speed. Recent work addresses this issue for a
class of strongly log-concave target distributions by constructing a computable
discrepancy measure based on Stein’s method that provably determines
convergence to the target. We generalize this approach to cover any target with
a fast-coupling Ito diffusion by bounding the derivatives of Stein equation
solutions in terms of Markov process coupling rates. As example applications,
we develop computable and convergence-determining diffusion Stein discrepancies
for log-concave, heavy-tailed, and multimodal targets and use these quality
measures to select the hyperparameters of biased samplers, compare random and
deterministic quadrature rules, and quantify bias-variance tradeoffs in
approximate Markov chain Monte Carlo. Our explicit multivariate Stein factor
bounds may be of independent interest.

Information Theory

Non-Orthogonal Multiple Access (NOMA) for Downlink Multiuser MIMO Systems: User Clustering, Beamforming, and Power Allocation

Md Shipon Ali, Ekram Hossain, Dong In Kim
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We investigate the application of non-orthogonal multiple access (NOMA) with
successive interference cancellation (SIC) in downlink multiuser multiple-input
multiple-output (MIMO) cellular systems, where the total number of receive
antennas at user equipment (UE) ends in a cell is more than the number of
transmit antennas at the base station (BS). We first dynamically group the UE
receive antennas into a number of clusters equal to or more than the number of
BS transmit antennas. A single beamforming vector is then shared by all the
receive antennas in a cluster. We propose a linear beamforming technique in
which all the receive antennas can significantly cancel the inter-cluster
interference. On the other hand, the receive antennas in each cluster are
scheduled on power domain NOMA basis with SIC at the receiver ends. For
inter-cluster and intra-cluster power allocation, we provide dynamic power
allocation solutions with an objective to maximizing the overall cell capacity.
An extensive performance evaluation is carried out for the proposed MIMO-NOMA
system and the results are compared with those for conventional orthogonal
multiple access (OMA)-based MIMO systems and other existing MIMO-NOMA
solutions. The numerical results quantify the capacity gain of the proposed
MIMO-NOMA model over MIMO-OMA and other existing MIMO-NOMA solutions.

Efficient Feedback Mechanisms for FDD Massive MIMO under User-level Cooperation

Junting Chen, Haifan Yin, Laura Cottatellucci, David Gesbert
Comments: 13 pages
Subjects: Information Theory (cs.IT)

Channel state information (CSI) feedback is a challenging issue in frequency
division multiplexing (FDD) massive MIMO systems. This paper studies a
cooperative feedback scheme, where the users first exchange their CSI with each
other by exploiting device-to-device (D2D) communications, then compute the
precoder by themselves, and feed back the precoder to the base station (BS).
Analytical results are derived to show that the cooperative precoder feedback
is more efficient than the CSI feedback in terms of interference mitigation.
Under the constraint of limited D2D communication capacity, we develop an
adaptive CSI exchange strategy based on signal subspace projection and optimal
bit partition. Numerical results demonstrate that the proposed cooperative
precoder feedback scheme with adaptive CSI exchange significantly outperforms
the CSI feedback scheme, even when the CSI is exchanged via rate-limited D2D
communications.

What to Expect When You Are Expecting on the Grassmannian

Armin Eftekhari, Laura Balzano, Michael B. Wakin
Subjects: Information Theory (cs.IT)

Consider an incoming sequence of vectors, all belonging to an unknown
subspace (operatorname{S}), and each with many missing entries. In order to
estimate (operatorname{S}), it is common to partition the data into blocks and
iteratively update the estimate of (operatorname{S}) with each new incoming
measurement block.

In this paper, we investigate a rather basic question: Is it possible to
identify (operatorname{S}) by averaging the column span of the partially
observed incoming measurement blocks on the Grassmannian?

We find that in general the span of the incoming blocks is in fact a biased
estimator of (operatorname{S}) when data suffers from erasures, and we find an
upper bound for this bias. We reach this conclusion by examining the defining
optimization program for the Fr'{e}chet expectation on the Grassmannian, and
with the aid of a sharp perturbation bound and standard large deviation
results.

Comparing entropy rates on finite and infinite rooted trees with length functions

Thomas Hirschler, Wolfgang Woess
Subjects: Information Theory (cs.IT); Probability (math.PR)

We consider stochastic processes with (or without) memory whose evolution is
encoded by a finite or infinite rooted tree. The main goal is to compare the
entropy rates of a given base process and a second one, to be considered as a
perturbation of the former. The processes are described by probability measures
on the boundary of the given tree, and by corresponding forward transition
probabilities at the inner nodes. The comparison is in terms of
Kullback-Leibler divergence. We elaborate and extend ideas and results of
B”ocherer and Amjad. Our extensions involve length functions on the edges of
the tree as well as nodes with countably many successors. In the last part, we
consider trees with infinite geodesic rays and random perturbations of a given
process.

New bounds of permutation codes under Hamming metric and Kendall's (τ)-metric

Xin Wang, Yiwei Zhang, Yiting Yang, Gennian Ge
Subjects: Information Theory (cs.IT)

Permutation codes are widely studied objects due to their numerous
applications in various areas, such as power line communications, block
ciphers, and the rank modulation scheme for flash memories. Several kinds of
metrics are considered for permutation codes according to their specific
applications. This paper concerns some improvements on the bounds of
permutation codes under Hamming metric and Kendall’s ( au)-metric
respectively, using mainly a graph coloring approach. Specifically, under
Hamming metric, we improve the Gilbert-Varshamov bound asymptotically by a
factor (n), when the minimum Hamming distance (d) is fixed and the code length
(n) goes to infinity. Under Kendall’s ( au)-metric, we narrow the gap between
the known lower bounds and upper bounds. Besides, we also obtain some sporadic
results under Kendall’s ( au)-metric for small parameters.

Distance verification for classical and quantum LDPC codes

Ilya Dumer, Alexey A. Kovalev, Leonid P. Pryadko
Comments: 16 pages, 2 figures
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

The techniques of distance verification known for general linear codes are
re-applied to quantum stabilizer codes. Then distance verification is addressed
for classical and quantum LDPC codes. New complexity bounds for distance
verification with provable performance are derived using the average weight
spectra of the ensembles of LDPC codes. These bounds are expressed in terms of
the erasure-correcting capacity of the corresponding ensemble. We also present
a new irreducible-cluster technique that can be applied to any LDPC code and
takes advantage of parity-checks’ sparsity for both classical and quantum LDPC
codes. This technique reduces complexity exponents of all existing
deterministic techniques designed for generic stabilizer codes with small
relative distances, which also include all known families of quantum LDPC
codes.

Outage Effective Capacity of Buffer-Aided Diamond Relay Systems Using HARQ with Incremental Redundancy

Deli Qiao
Subjects: Information Theory (cs.IT)

In this paper, transmission over buffer-aided diamond relay systems under
statistical quality of service (QoS) constraints is studied. The statistical
QoS constraints are imposed as limitations on delay violation probabilities. In
the absence of channel state information (CSI) at the transmitter, truncated
hybrid automatic repeat request-incremental redundancy (HARQ-IR) is
incorporated to make better use of the wireless channel and the resources for
each communication link. The packets that cannot be successfully received upon
the maximum number of transmissions will be removed from buffer, i.e., outage
occurs. The emph{outage effective capacity} of a communication link is defined
as the maximum constant arrival rate to the source that can be supported by the
emph{goodput} departure processes, i.e., the departure that can be
successfully received by the receiver. Then, the outage effective capacity for
the buffer-aided diamond relay system is obtained for HARQ-IR incorporated
transmission strategy under the emph{end-to-end} delay constraints. In
comparison with the DF protocol with perfect CSI at the transmitters, it is
shown that HARQ-IR can achieve superior performance when the SNR levels at the
relay are not so large or when the delay constraints are stringent.