IT博客汇 | arXiv Paper Daily: Mon, 5 Dec 2016

arXiv Paper Daily: Mon, 5 Dec 2016

我爱机器学习(52ml.net)发表于 2016-12-05 00:00:00

Neural and Evolutionary Computing

Probabilistic Neural Programs

Kenton W. Murray, Jayant Krishnamurthy
Comments: Appears in NAMPI workshop at NIPS 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present probabilistic neural programs, a framework for program induction
that permits flexible specification of both a computational model and inference
algorithm while simultaneously enabling the use of deep neural networks.
Probabilistic neural programs combine a computation graph for specifying a
neural network with an operator for weighted nondeterministic choice. Thus, a
program describes both a collection of decisions as well as the neural network
architecture used to make each one. We evaluate our approach on a challenging
diagram question answering task where probabilistic neural programs correctly
execute nearly twice as many programs as a baseline model.

Reliable Evaluation of Neural Network for Multiclass Classification of Real-world Data

Siddharth Dinesh, Tirtharaj Dash
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

This paper presents a systematic evaluation of Neural Network (NN) for
classification of real-world data. In the field of machine learning, it is
often seen that a single parameter that is ‘predictive accuracy’ is being used
for evaluating the performance of a classifier model. However, this parameter
might not be considered reliable given a dataset with very high level of
skewness. To demonstrate such behavior, seven different types of datasets have
been used to evaluate a Multilayer Perceptron (MLP) using twelve(12) different
parameters which include micro- and macro-level estimation. In the present
study, the most common problem of prediction called ‘multiclass’ classification
has been considered. The results that are obtained for different parameters for
each of the dataset could demonstrate interesting findings to support the
usability of these set of performance evaluation parameters.

Summary – TerpreT: A Probabilistic Programming Language for Program Induction

Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, Daniel Tarlow
Comments: 7 pages, 2 figures, 4 tables in 1st Workshop on Neural Abstract Machines & Program Induction (NAMPI), @NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We study machine learning formulations of inductive program synthesis; that
is, given input-output examples, synthesize source code that maps inputs to
corresponding outputs. Our key contribution is TerpreT, a domain-specific
language for expressing program synthesis problems. A TerpreT model is composed
of a specification of a program representation and an interpreter that
describes how programs map inputs to outputs. The inference task is to observe
a set of input-output examples and infer the underlying program. From a TerpreT
model we automatically perform inference using four different back-ends:
gradient descent (thus each TerpreT model can be seen as defining a
differentiable interpreter), linear program (LP) relaxations for graphical
models, discrete satisfiability solving, and the Sketch program synthesis
system. TerpreT has two main benefits. First, it enables rapid exploration of a
range of domains, program representations, and interpreter models. Second, it
separates the model specification from the inference algorithm, allowing proper
comparisons between different approaches to inference.

We illustrate the value of TerpreT by developing several interpreter models
and performing an extensive empirical comparison between alternative inference
algorithms on a variety of program models. To our knowledge, this is the first
work to compare gradient-based search over program space to traditional
search-based alternatives. Our key empirical finding is that constraint solvers
dominate the gradient descent and LP-based formulations.

This is a workshop summary of a longer report at arXiv:1608.04428

Cognitive Deep Machine Can Train Itself

András Lőrincz, Máté Csákvári, Áron Fóthi, Zoltán Ádám Milacski, András Sárkány, Zoltán Tősér
Comments: 14 pages, 8 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Machine learning is making substantial progress in diverse applications. The
success is mostly due to advances in deep learning. However, deep learning can
make mistakes and its generalization abilities to new tasks are questionable.
We ask when and how one can combine network outputs, when (i) details of the
observations are evaluated by learned deep components and (ii) facts and
confirmation rules are available in knowledge based systems. We show that in
limited contexts the required number of training samples can be low and
self-improvement of pre-trained networks in more general context is possible.
We argue that the combination of sparse outlier detection with deep components
that can support each other diminish the fragility of deep methods, an
important requirement for engineering applications. We argue that supervised
learning of labels may be fully eliminated under certain conditions: a
component based architecture together with a knowledge based system can train
itself and provide high quality answers. We demonstrate these concepts on the
State Farm Distracted Driver Detection benchmark. We argue that the view of the
Study Panel (2016) may overestimate the requirements on `years of focused
research’ and `careful, unique construction’ for `AI systems’.

Unsupervised learning of image motion by recomposing sequences

Andrew Jaegle, Stephen Phillips, Daphne Ippolito, Kostas Daniilidis
Comments: 14 pages, including references and supplement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

We propose a new method for learning a representation of image motion in an
unsupervised fashion. We do so by learning an image sequence embedding that
respects associativity and invertibility properties of composed sequences with
known temporal order. This procedure makes minimal assumptions about scene
content, and the resulting networks learn to exploit rigid and non-rigid motion
cues. We show that a deep neural network trained to respect these constraints
implicitly identifies the characteristic motion patterns of many different
sequence types.

Our network architecture consists of a CNN followed by an LSTM and is
structured to learn motion representations over sequences of arbitrary length.
We demonstrate that a network trained using our unsupervised procedure on
real-world sequences of human actions and vehicle motion can capture semantic
regions corresponding to the motion in the scene, and not merely image-level
differences, without requiring any motion labels. Furthermore, we present
results that suggest our method can be used to extract information useful for
independent motion tracking, localization, and nearest neighbor identification.
Our results suggest that this representation may be useful for motion-related
tasks where explicit labels are often very difficult to obtain.

Computer Vision and Pattern Recognition

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Problems at the intersection of vision and language are of significant
importance both as challenging research questions and for the rich set of
applications they enable. However, inherent structure in our world and bias in
our language tend to be a simpler signal for learning than visual modalities,
resulting in models that ignore visual information, leading to an inflated
sense of their capability.

We propose to counter these language priors for the task of Visual Question
Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance
the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary
images such that every question in our balanced dataset is associated with not
just a single image, but rather a pair of similar images that result in two
different answers to the question. Our dataset is by construction more balanced
than the original VQA dataset and has approximately twice the number of
image-question pairs. Our complete balanced dataset will be publicly released
as part of the 2nd iteration of the Visual Question Answering Challenge (VQA
v2.0).

We further benchmark a number of state-of-art VQA models on our balanced
dataset. All models perform significantly worse on our balanced dataset,
suggesting that these models have indeed learned to exploit language priors.
This finding provides the first concrete empirical evidence for what seems to
be a qualitative sense among practitioners.

Finally, our data collection protocol for identifying complementary images
enables us to develop a novel interpretable model, which in addition to
providing an answer to the given (image, question) pair also provides a
counter-example based explanation – specifically, it identifies an image that
is similar to the original image, but it believes has a different answer to the
same question. This can help in building trust for machines among their users.

Scribbler: Controlling Deep Image Synthesis with Sketch and Color

Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
Comments: 13 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Recently, there have been several promising methods to generate realistic
imagery from deep convolutional networks. These methods sidestep the
traditional computer graphics rendering pipeline and instead generate imagery
at the pixel level by learning from large collections of photos (e.g. faces or
bedrooms). However, these methods are of limited utility because it is
difficult for a user to control what the network produces. In this paper, we
propose a deep adversarial image synthesis architecture that is conditioned on
coarse sketches and sparse color strokes to generate realistic cars, bedrooms,
or faces. We demonstrate a sketch based image synthesis system which allows
users to ‘scribble’ over the sketch to indicate preferred color for objects.
Our network can then generate convincing images that satisfy both the color and
the sketch constraints of user. The network is feed-forward which allows users
to see the effect of their edits in real time. We compare to recent work on
sketch to image synthesis and show that our approach can generate more
realistic, more diverse, and more controllable outputs. The architecture is
also effective at user-guided colorization of grayscale images.

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee
Comments: published at NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

Understanding the 3D world is a fundamental problem in computer vision.
However, learning a good representation of 3D objects is still an open problem
due to the high dimensionality of the data and many factors of variation
involved. In this work, we investigate the task of single-view 3D object
reconstruction from a learning agent’s perspective. We formulate the learning
process as an interaction between 3D and 2D representations and propose an
encoder-decoder network with a novel projection loss defined by the perspective
transformation. More importantly, the projection loss enables the unsupervised
learning using 2D observation without explicit 3D supervision. We demonstrate
the ability of the model in generating 3D volume from a single 2D image with
three sets of experiments: (1) learning from single-class objects; (2) learning
from multi-class objects and (3) testing on novel object classes. Results show
superior performance and better generalization ability for 3D object
reconstruction when the projection loss is involved.

A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images

David Vázquez, Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Antonio M. López, Adriana Romero, Michal Drozdzal, Aaron Courville
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colorectal cancer (CRC) is the third cause of cancer death worldwide.
Currently, the standard approach to reduce CRC-related mortality is to perform
regular screening in search for polyps and colonoscopy is the screening tool of
choice. The main limitations of this screening procedure are polyp miss-rate
and inability to perform visual assessment of polyp malignancy. These drawbacks
can be reduced by designing Decision Support Systems (DSS) aiming to help
clinicians in the different stages of the procedure by providing endoluminal
scene segmentation. Thus, in this paper, we introduce an extended benchmark of
colonoscopy image, with the hope of establishing a new strong benchmark for
colonoscopy image analysis research. We provide new baselines on this dataset
by training standard fully convolutional networks (FCN) for semantic
segmentation and significantly outperforming, without any further
post-processing, prior results in endoluminal scene segmentation.

Action Recognition with Dynamic Image Networks

Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi
Comments: 14 pages, 9 figures, 11 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce the concept of “dynamic image”, a novel compact representation
of videos useful for video analysis especially when convolutional neural
networks (CNNs) are used. The dynamic image is based on the rank pooling
concept and is obtained through the parameters of a ranking machine that
encodes the temporal evolution of the frames of the video. Dynamic images are
obtained by directly applying rank pooling on the raw image pixels of a video
producing a single RGB image per video. This idea is simple but powerful as it
enables the use of existing CNN models directly on video data with fine-tuning.
We present an efficient and effective approximate rank pooling operator,
speeding it up orders of magnitude compared to rank pooling. Our new
approximate rank pooling CNN layer allows us to generalize dynamic images to
dynamic feature maps and we demonstrate the power of our new representations on
standard benchmarks in action recognition achieving state-of-the-art
performance.

Centrog Feature technique for vehicle type recognition at day and night times

Martins E. Irhebhude, Philip O. Odion, Darius T. Chinyio
Comments: 14 pages, 8 figures, Journal article
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work proposes a feature-based technique to recognize vehicle types
within day and night times. Support vector machine (SVM) classifier is applied
on image histogram and CENsus Transformed histogRam Oriented Gradient (CENTROG)
features in order to classify vehicle types during the day and night. Thermal
images were used for the night time experiments. Although thermal images suffer
from low image resolution, lack of colour and poor texture information, they
offer the advantage of being unaffected by high intensity light sources such as
vehicle headlights which tend to render normal images unsuitable for night time
image capturing and subsequent analysis. Since contour is useful in shape based
categorisation and the most distinctive feature within thermal images, CENTROG
is used to capture this feature information and is used within the experiments.
The experimental results so obtained were compared with those obtained by
employing the CENsus TRansformed hISTogram (CENTRIST). Experimental results
revealed that CENTROG offers better recognition accuracies for both day and
night times vehicle types recognition.

Recognition of Text Image Using Multilayer Perceptron

Singh Vijendra, Nisha Vasudeva, Hem Jyotsana Parashar
Comments: 2011 IEEE 3rd International Conference on Machine Learning and Computing (ICMLC 2011, Singapore, PP 547-550
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The biggest challenge in the field of image processing is to recognize
documents both in printed and handwritten format. Optical Character Recognition
OCR is a type of document image analysis where scanned digital image that
contains either machine printed or handwritten script input into an OCR
software engine and translating it into an editable machine readable digital
text format. A Neural network is designed to model the way in which the brain
performs a particular task or function of interest: The neural network is
simulated in software on a digital computer. Character Recognition refers to
the process of converting printed Text documents into translated Unicode Text.
The printed documents available in the form of books, papers, magazines, etc.
are scanned using standard scanners which produce an image of the scanned
document. Lines are identifying by an algorithm where we identify top and
bottom of line. Then in each line character boundaries are calculated by an
algorithm then using these calculation, characters is isolated from the image
and then we classify each character by basic back propagation. Each image
character is comprised of 30*20 pixels. We have used the Back propagation
Neural Network for efficient recognition where the errors were corrected
through back propagation and rectified neuron values were transmitted by
feed-forward method in the neural network of multiple layers.

SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation

Li Yi, Hao Su, Xingwen Guo, Leonidas Guibas
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we study the problem of semantic annotation on 3D models that
are represented as shape graphs. A functional view is taken to represent
localized information on graphs, so that annotations such as part segment or
keypoint are nothing but 0-1 indicator vertex functions. Compared with images
that are 2D grids, shape graphs are irregular and non-isomorphic data
structures. To enable the prediction of vertex functions on them by
convolutional neural networks, we resort to spectral CNN method that enables
weight sharing by parameterizing kernels in the spectral domain spanned by
graph laplacian eigenbases. Under this setting, our network, named SyncSpecCNN,
strive to overcome two key challenges: how to share coefficients and conduct
multi-scale analysis in different parts of the graph for a single shape, and
how to share information across related but different shapes that may be
represented by very different graphs. Towards these goals, we introduce a
spectral parameterization of dilated convolutional kernels and a spectral
transformer network. Experimentally we tested our SyncSpecCNN on various tasks,
including 3D shape part segmentation and 3D keypoint prediction.
State-of-the-art performance has been achieved on all benchmark datasets.

Globally Consistent Multi-People Tracking using Motion Patterns

Andrii Maksai, Xinchao Wang, Francois Fleuret, Pascal Fua
Comments: 8 pages, 7 figures. 11 pages supplementary
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many state-of-the-art approaches to people tracking rely on detecting them in
each frame independently, grouping detections into short but reliable
trajectory segments, and then further grouping them into full trajectories.
This grouping typically relies on imposing local smoothness constraints but
almost never on enforcing more global constraints on the trajectories. In this
paper, we propose an approach to imposing global consistency by first inferring
behavioral patterns from the ground truth and then using them to guide the
tracking algorithm. When used in conjunction with several state-of-the-art
algorithms, this further increases their already good performance. Furthermore,
we propose an unsupervised scheme that yields almost similar improvements
without the need for ground truth.

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

Haoqiang Fan, Hao Su, Leonidas Guibas
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generation of 3D data by deep neural network has been attracting increasing
attention in the research community. The majority of extant works resort to
regular representations such as volumetric grids or collection of images;
however, these representations obscure the natural invariance of 3D shapes
under geometric transformations and also suffer from a number of other issues.
In this paper we address the problem of 3D reconstruction from a single image,
generating a straight-forward form of output — point cloud coordinates. Along
with this problem arises a unique and interesting issue, that the groundtruth
shape for an input image may be ambiguous. Driven by this unorthodox output
form and the inherent ambiguity in groundtruth, we design architecture, loss
function and learning paradigm that are novel and effective. Our final solution
is a conditional shape sampler, capable of predicting multiple plausible 3D
point clouds from an input image. In experiments not only can our system
outperform state-of-the-art methods on single image based 3d reconstruction
benchmarks; but it also shows a strong performance for 3d shape completion and
promising ability in making multiple plausible predictions.

Learning to Search on Manifolds for 3D Pose Estimation of Articulated Objects

Yu Zhang, Chi Xu, Li Cheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper focuses on the challenging problem of 3D pose estimation of a
diverse spectrum of articulated objects from single depth images. A novel
structured prediction approach is considered, where 3D poses are represented as
skeletal models that naturally operate on manifolds. Given an input depth
image, the problem of predicting the most proper articulation of underlying
skeletal model is thus formulated as sequentially searching for the optimal
skeletal configuration. This is subsequently addressed by convolutional neural
nets trained end-to-end to render sequential prediction of the joint locations
as regressing a set of tangent vectors of the underlying manifolds. Our
approach is examined on various articulated objects including human hand,
mouse, and fish benchmark datasets. Empirically it is shown to deliver highly
competitive performance with respect to the state-of-the-arts, while operating
in real-time (over 30 FPS).

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point cloud is an important type of geometric data structure. Due to its
irregular format, most researchers transform such data to regular 3D voxel
grids or collections of images. This, however, renders data unnecessarily
voluminous and causes issues. In this paper, we design a novel type of neural
network that directly consumes point clouds and well respects the permutation
invariance of points in the input. Our network, named PointNet, provides a
unified architecture for applications ranging from object classification, part
segmentation, to scene semantic parsing. Though simple, PointNet is highly
efficient and effective. Empirically, it shows strong performance on par or
even better than state of the art. Theoretically, we provide analysis towards
understanding of what the network has learnt and why the network is robust with
respect to input perturbation and corruption.

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Comments: Under submission to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing image captioning models do not generalize well to out-of-domain
images containing novel scenes or objects. This limitation severely hinders the
use of these models in real world applications dealing with images in the wild.
We address this problem using a flexible approach that enables existing deep
captioning architectures to take advantage of image taggers at test time,
without re-training. Our method uses constrained beam search to force the
inclusion of selected tag words in the output, and fixed, pretrained word
embeddings to facilitate vocabulary expansion to previously unseen tag words.
Using this approach we achieve state of the art results for out-of-domain
captioning on MS COCO (and improved results for in-domain captioning). In order
to demonstrate the scalability of our approach, we generate and publicly
release captions for the complete ImageNet classification dataset containing
1.2M images. Each ImageNet caption includes the ground-truth image label. Human
evaluations indicate that 27% of the resulting captions are likely to meet or
exceed human quality (increasing to 38% for certain categories such as birds).

Zero-Shot Learning via Revealing Data Distribution

Bo Zhao, Botong Wu, Tianfu Wu, Yizhou Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a method of zero-shot learning (ZSL) which poses ZSL as
the missing data problem, rather than the missing label problem. While most
popular methods in ZSL focus on learning the mapping function from the image
feature space to the label embedding space, the proposed method explores a
simple yet effective transductive framework in the reverse mapping. Our method
estimates data distribution of unseen classes in the image feature space by
transferring knowledge from the label embedding space. It assumes that data of
each seen and unseen class follow Gaussian distribution in the image feature
space and utilizes Gaussian mixture model to model data. The signature is
introduced to describe the data distribution of each class. In experiments, our
method obtains 87.38% and 61.08% mean accuracies on the Animals with Attributes
(AwA) and the Caltech-UCSD Birds-200-2011 (CUB) datasets respectively, which
outperforms the runner-up methods significantly by 4.95% and 6.38%. In
addition, we also investigate the extension of our method to open-set
classification.

Unsupervised Human Action Detection by Action Matching

Basura Fernando, Sareh Shirazi, Stephen Gould
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a new task of unsupervised action detection by action matching.
Given two long videos, the objective is to temporally detect all pairs of
matching video segments. A pair of video segments are matched if they share the
same human action. The task is category independent—it does not matter what
action is being performed—and no supervision is used to discover such video
segments. Unsupervised action detection by action matching allows us to align
videos in a meaningful manner. As such, it can be used to discover new action
categories or as an action proposal technique within, say, an action detection
pipeline. Moreover, it is a useful pre-processing step for generating video
highlights, e.g., from sports videos.

We present an effective and efficient method for unsupervised action
detection. We use an unsupervised temporal encoding method and exploit the
temporal consistency in human actions to obtain candidate action segments. We
evaluate our method on this challenging task using three activity recognition
benchmarks, namely, the MPII Cooking activities dataset, the THUMOS15 action
detection benchmark and a new dataset called the IKEA dataset. On the MPII
Cooking dataset we detect action segments with a precision of 21.6% and recall
of 11.7% over 946 long video pairs and over 5000 ground truth action segments.
Similarly, on THUMOS dataset we obtain 18.4% precision and 25.1% recall over
5094 ground truth action segment pairs.

Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks

Daniel Lévy, Arzav Jain
Comments: NIPS 2016 ML4HC Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Mammography is the most widely used method to screen breast cancer. Because
of its mostly manual nature, variability in mass appearance, and low
signal-to-noise ratio, a significant number of breast masses are missed or
misdiagnosed. In this work, we present how Convolutional Neural Networks can be
used to directly classify pre-segmented breast masses in mammograms as benign
or malignant, using a combination of transfer learning, careful pre-processing
and data augmentation to overcome limited training data. We achieve
state-of-the-art results on the DDSM dataset, surpassing human performance, and
show interpretability of our model.

Object Detection via End-to-End Integration of Aspect Ratio and Context Aware Part-based Models and Fully Convolutional Networks

Bo Li, Tianfu Wu, Shuai Shao, Lun Zhang, Rufeng Chu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a framework of integrating a mixture of part-based models
and region-based convolutional networks for accurate and efficient object
detection. Each mixture component consists of a small number of parts
accounting for both object aspect ratio and contextual information explicitly.
The mixture is category-agnostic for the simplicity of scaling up in
applications. Both object aspect ratio and context have been extensively
studied in traditional object detection systems such as the mixture of
deformable part-based models [13]. They are, however, largely ignored in deep
neural network based detection systems [17, 16, 39, 8]. The proposed method
addresses this issue in two-fold: (i) It remedies the wrapping artifact due to
the generic RoI (region-of-interest) pooling (e.g., a 3 x 3 grid) by taking
into account object aspect ratios. (ii) It models both global (from the whole
image) and local (from the surrounding of a bounding box) context for improving
performance. The integrated framework is fully convolutional and enjoys
end-to-end training, which we call the aspect ratio and context aware fully
convolutional network (ARC-FCN). In experiments, ARC-FCN shows very competitive
results on the PASCAL VOC datasets, especially, it outperforms both Faster
R-CNN [39] and R-FCN [8] with significantly better mean average precision (mAP)
using larger value for the intersection-over-union (IoU) threshold (i.e., 0.7
in the experiments). ARC-FCN is still sufficiently efficient with a test-time
speed of 380ms per image, faster than the Faster R-CNN but slower than the
R-FCN.

Photorealistic Facial Texture Inference Using Deep Neural Networks

Shunsuke Saito, Lingyu Wei, Liwen Hu, Koki Nagano, Hao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We present a data-driven inference method that can synthesize a
photorealistic texture map of a complete 3D face model given a partial 2D view
of a person in the wild. After an initial estimation of shape and low-frequency
albedo, we compute a high-frequency partial texture map, without the shading
component, of the visible face area. To extract the fine appearance details
from this incomplete input, we introduce a multi-scale detail analysis
technique based on mid-layer feature correlations extracted from a deep
convolutional neural network. We demonstrate that fitting a convex combination
of feature correlations from a high-resolution face database can yield a
semantically plausible facial detail description of the entire face. A complete
and photorealistic texture map can then be synthesized by iteratively
optimizing for the reconstructed feature correlations. Using these
high-resolution textures and a commercial rendering framework, we can produce
high-fidelity 3D renderings that are visually comparable to those obtained with
state-of-the-art multi-view face capture systems. We demonstrate successful
face reconstructions from a wide range of low resolution input images,
including those of historical figures. In addition to extensive evaluations, we
validate the realism of our results using a crowdsourced user study.

A Visual Representation for Editing Face Images

Jiajun Lu, Kalyan Sunkavalli, Nathan Carr, Sunil Hadap, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We propose a new approach for editing face images, which enables numerous
exciting applications including face relighting, makeup transfer and face
detail editing. Our face edits are based on a visual representation, which
includes geometry, face segmentation, albedo, illumination and detail map. To
recover our visual representation, we start by estimating geometry using a
morphable face model, then decompose the face image to recover the albedo, and
then shade the geometry with the albedo and illumination. The residual between
our shaded geometry and the input image produces our detail map, which carries
high frequency information that is either insufficiently or incorrectly
captured by our shading process. By manipulating the detail map, we can edit
face images with reality and identity preserved. Our representation allows
various applications. First, it allows a user to directly manipulate various
illumination. Second, it allows non-parametric makeup transfer with input
face’s distinctive identity features preserved. Third, it allows non-parametric
modifications to the face appearance by transferring details. For face
relighting and detail editing, we evaluate via a user study and our method
outperforms other methods. For makeup transfer, we evaluate via an online
attractiveness evaluation system, and can reliably make people look younger and
more attractive. We also show extensive qualitative comparisons to existing
methods, and have significant improvements over previous techniques.

Object-Centric Representation Learning from Unlabeled Videos

Ruohan Gao, Dinesh Jayaraman, Kristen Grauman
Comments: In Proceedings of the Asian Conference on Computer Vision (ACCV), 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Supervised (pre-)training currently yields state-of-the-art performance for
representation learning for visual recognition, yet it comes at the cost of (1)
intensive manual annotations and (2) an inherent restriction in the scope of
data relevant for learning. In this work, we explore unsupervised feature
learning from unlabeled video. We introduce a novel object-centric approach to
temporal coherence that encourages similar representations to be learned for
object-like regions segmented from nearby frames. Our framework relies on a
Siamese-triplet network to train a deep convolutional neural network (CNN)
representation. Compared to existing temporal coherence methods, our idea has
the advantage of lightweight preprocessing of the unlabeled video (no tracking
required) while still being able to extract object-level regions from which to
learn invariances. Furthermore, as we show in results on several standard
datasets, our method typically achieves substantial accuracy gains over
competing unsupervised methods for image classification and retrieval tasks.

3D Bounding Box Estimation Using Deep Learning and Geometry

Arsalan Mousavian, Dragomir Anguelov, John Flynn, Jana Kosecka
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a method for 3D object detection and pose estimation from a single
image. In contrast to current techniques that only regress the 3D orientation
of an object, our method first regresses relatively stable 3D object properties
using a deep convolutional neural network and then combines these estimates
with geometric constraints provided by a 2D object bounding box to produce a
complete 3D bounding box. The first network output estimates the 3D object
orientation using a novel hybrid discrete-continuous loss, which significantly
outperforms the L2 loss. The second output regresses the 3D object dimensions,
which have relatively little variance compared to alternatives and can often be
predicted for many object types. These estimates, combined with the geometric
constraints on translation imposed by the 2D bounding box, enable us to recover
a stable and accurate 3D object pose. We evaluate our method on the challenging
KITTI object detection benchmark both on the official metric of 3D orientation
estimation and also on the accuracy of the obtained 3D bounding boxes. Although
conceptually simple, our method outperforms more complex and computationally
expensive approaches that leverage semantic segmentation, instance level
segmentation and flat ground priors and sub-category detection.

In Teacher We Trust: Learning Compressed Models for Pedestrian Detection

Jonathan Shen, Noranart Vesdapunt, Vishnu N. Boddeti, Kris M. Kitani
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks continue to advance the state-of-the-art
in many domains as they grow bigger and more complex. It has been observed that
many of the parameters of a large network are redundant, allowing for the
possibility of learning a smaller network that mimics the outputs of the large
network through a process called Knowledge Distillation. We show, however, that
standard Knowledge Distillation is not effective for learning small models for
the task of pedestrian detection. To improve this process, we introduce a
higher-dimensional hint layer to increase information flow. We also estimate
the variance in the outputs of the large network and propose a loss function to
incorporate this uncertainty. Finally, we attempt to boost the complexity of
the small network without increasing its size by using as input hand-designed
features that have been demonstrated to be effective for pedestrian detection.
We succeed in training a model that contains (400 imes) fewer parameters than
the large network while outperforming AlexNet on the Caltech Pedestrian
Dataset.

Unsupervised learning of image motion by recomposing sequences

We propose a new method for learning a representation of image motion in an
unsupervised fashion. We do so by learning an image sequence embedding that
respects associativity and invertibility properties of composed sequences with
known temporal order. This procedure makes minimal assumptions about scene
content, and the resulting networks learn to exploit rigid and non-rigid motion
cues. We show that a deep neural network trained to respect these constraints
implicitly identifies the characteristic motion patterns of many different
sequence types.

Our network architecture consists of a CNN followed by an LSTM and is
structured to learn motion representations over sequences of arbitrary length.
We demonstrate that a network trained using our unsupervised procedure on
real-world sequences of human actions and vehicle motion can capture semantic
regions corresponding to the motion in the scene, and not merely image-level
differences, without requiring any motion labels. Furthermore, we present
results that suggest our method can be used to extract information useful for
independent motion tracking, localization, and nearest neighbor identification.
Our results suggest that this representation may be useful for motion-related
tasks where explicit labels are often very difficult to obtain.

Identifying and Categorizing Anomalies in Retinal Imaging Data

Philipp Seeböck, Sebastian Waldstein, Sophie Klimscha, Bianca S. Gerendas, René Donner, Thomas Schlegl, Ursula Schmidt-Erfurth, Georg Langs
Comments: Extended Abstract, Accepted for NIPS 2016 Workshop “Machine Learning for Health”
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The identification and quantification of markers in medical images is
critical for diagnosis, prognosis and management of patients in clinical
practice. Supervised- or weakly supervised training enables the detection of
findings that are known a priori. It does not scale well, and a priori
definition limits the vocabulary of markers to known entities reducing the
accuracy of diagnosis and prognosis. Here, we propose the identification of
anomalies in large-scale medical imaging data using healthy examples as a
reference. We detect and categorize candidates for anomaly findings untypical
for the observed data. A deep convolutional autoencoder is trained on healthy
retinal images. The learned model generates a new feature representation, and
the distribution of healthy retinal patches is estimated by a one-class support
vector machine. Results demonstrate that we can identify pathologic regions in
images without using expert annotations. A subsequent clustering categorizes
findings into clinically meaningful classes. In addition the learned features
outperform standard embedding approaches in a classification task.

Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling

Santi Puch, Asier Aduriz, Adrià Casamitjana, Veronica Vilaplana, Paula Petrone, Grégory Operto, Raffaele Cacciaglia, Stavros Skouras, Carles Falcon, José Luis Molinuevo, Juan Domingo Gispert
Comments: 4 pages + 1 page for references. NIPS 2016 Workshop on Machine Learning for Health (NIPS ML4HC)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neurons and Cognition (q-bio.NC); Applications (stat.AP)

This paper describes a new neuroimaging analysis toolbox that allows for the
modeling of nonlinear effects at the voxel level, overcoming limitations of
methods based on linear models like the GLM. We illustrate its features using a
relevant example in which distinct nonlinear trajectories of Alzheimer’s
disease related brain atrophy patterns were found across the full biological
spectrum of the disease.

Self-critical Sequence Training for Image Captioning

Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel
Comments: 16 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Recently it has been shown that policy-gradient methods for reinforcement
learning can be utilized to train deep end-to-end systems directly on
non-differentiable metrics for the task at hand. In this paper we consider the
problem of optimizing image captioning systems using reinforcement learning,
and show that by carefully optimizing our systems using the test metrics of the
MSCOCO task, significant gains in performance can be realized. Our systems are
built using a new optimization approach that we call self-critical sequence
training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather
than estimating a “baseline” to normalize the rewards and reduce variance,
utilizes the output of its own test-time inference algorithm to normalize the
rewards it experiences. Using this approach, estimating the reward signal (as
actor-critic methods must do) and estimating normalization (as REINFORCE
algorithms typically do) is avoided, while at the same time harmonizing the
model with respect to its test-time inference procedure. Empirically we find
that directly optimizing the CIDEr metric with SCST and greedy decoding at
test-time is highly effective. Our results on the MSCOCO evaluation sever
establish a new state-of-the-art on the task, improving the best result in
terms of CIDEr from 104.9 to 112.3.

Artificial Intelligence

Comparison of the COG Defuzzification Technique and Its Variations to the GPA Index

Michael Gr. Voskoglou
Comments: 11 pages, 5 figures, 2 tables
Journal-ref: American Journal of Computational and Applied Mathematics, 6(5),
187-193, 2016
Subjects: Artificial Intelligence (cs.AI)

The Center of Gravity (COG) method is one of the most popular defuzzification
techniques of fuzzy mathematics. In earlier works the COG technique was
properly adapted to be used as an assessment model (RFAM)and several variations
of it (GRFAM, TFAM and TpFAM)were also constructed for the same purpose. In
this paper the outcomes of all these models are compared to the corresponding
outcomes of a traditional assessment method of the bi-valued logic, the Grade
Point Average (GPA) Index. Examples are also presented illustrating our
results.

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Problems at the intersection of vision and language are of significant
importance both as challenging research questions and for the rich set of
applications they enable. However, inherent structure in our world and bias in
our language tend to be a simpler signal for learning than visual modalities,
resulting in models that ignore visual information, leading to an inflated
sense of their capability.

We propose to counter these language priors for the task of Visual Question
Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance
the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary
images such that every question in our balanced dataset is associated with not
just a single image, but rather a pair of similar images that result in two
different answers to the question. Our dataset is by construction more balanced
than the original VQA dataset and has approximately twice the number of
image-question pairs. Our complete balanced dataset will be publicly released
as part of the 2nd iteration of the Visual Question Answering Challenge (VQA
v2.0).

We further benchmark a number of state-of-art VQA models on our balanced
dataset. All models perform significantly worse on our balanced dataset,
suggesting that these models have indeed learned to exploit language priors.
This finding provides the first concrete empirical evidence for what seems to
be a qualitative sense among practitioners.

Finally, our data collection protocol for identifying complementary images
enables us to develop a novel interpretable model, which in addition to
providing an answer to the given (image, question) pair also provides a
counter-example based explanation – specifically, it identifies an image that
is similar to the original image, but it believes has a different answer to the
same question. This can help in building trust for machines among their users.

Summary – TerpreT: A Probabilistic Programming Language for Program Induction

We study machine learning formulations of inductive program synthesis; that
is, given input-output examples, synthesize source code that maps inputs to
corresponding outputs. Our key contribution is TerpreT, a domain-specific
language for expressing program synthesis problems. A TerpreT model is composed
of a specification of a program representation and an interpreter that
describes how programs map inputs to outputs. The inference task is to observe
a set of input-output examples and infer the underlying program. From a TerpreT
model we automatically perform inference using four different back-ends:
gradient descent (thus each TerpreT model can be seen as defining a
differentiable interpreter), linear program (LP) relaxations for graphical
models, discrete satisfiability solving, and the Sketch program synthesis
system. TerpreT has two main benefits. First, it enables rapid exploration of a
range of domains, program representations, and interpreter models. Second, it
separates the model specification from the inference algorithm, allowing proper
comparisons between different approaches to inference.

We illustrate the value of TerpreT by developing several interpreter models
and performing an extensive empirical comparison between alternative inference
algorithms on a variety of program models. To our knowledge, this is the first
work to compare gradient-based search over program space to traditional
search-based alternatives. Our key empirical finding is that constraint solvers
dominate the gradient descent and LP-based formulations.

This is a workshop summary of a longer report at arXiv:1608.04428

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The ability to learn tasks in a sequential fashion is crucial to the
development of artificial intelligence. Neural networks are not, in general,
capable of this and it has been widely thought that catastrophic forgetting is
an inevitable feature of connectionist models. We show that it is possible to
overcome this limitation and train networks that can maintain expertise on
tasks which they have not experienced for a long time. Our approach remembers
old tasks by selectively slowing down learning on the weights important for
those tasks. We demonstrate our approach is scalable and effective by solving a
set of classification tasks based on the MNIST hand written digit dataset and
by learning several Atari 2600 games sequentially.

Asynchronous Stochastic Gradient MCMC with Elastic Coupling

Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling
for problems where we can leverage (stochastic) gradients to define continuous
dynamics which explore the target distribution. We outline a solution strategy
for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling
(SGHMC) which we alter to include an elastic coupling term that ties together
multiple MCMC instances. The proposed strategy turns inherently sequential HMC
algorithms into asynchronous parallel versions. First experiments empirically
show that the resulting parallel sampler significantly speeds up exploration of
the target distribution, when compared to standard SGHMC, and is less prone to
the harmful effects of stale gradients than a naive parallelization approach.

Cognitive Deep Machine Can Train Itself

Machine learning is making substantial progress in diverse applications. The
success is mostly due to advances in deep learning. However, deep learning can
make mistakes and its generalization abilities to new tasks are questionable.
We ask when and how one can combine network outputs, when (i) details of the
observations are evaluated by learned deep components and (ii) facts and
confirmation rules are available in knowledge based systems. We show that in
limited contexts the required number of training samples can be low and
self-improvement of pre-trained networks in more general context is possible.
We argue that the combination of sparse outlier detection with deep components
that can support each other diminish the fragility of deep methods, an
important requirement for engineering applications. We argue that supervised
learning of labels may be fully eliminated under certain conditions: a
component based architecture together with a knowledge based system can train
itself and provide high quality answers. We demonstrate these concepts on the
State Farm Distracted Driver Detection benchmark. We argue that the view of the
Study Panel (2016) may overestimate the requirements on `years of focused
research’ and `careful, unique construction’ for `AI systems’.

Probabilistic Neural Programs

Kenton W. Murray, Jayant Krishnamurthy
Comments: Appears in NAMPI workshop at NIPS 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present probabilistic neural programs, a framework for program induction
that permits flexible specification of both a computational model and inference
algorithm while simultaneously enabling the use of deep neural networks.
Probabilistic neural programs combine a computation graph for specifying a
neural network with an operator for weighted nondeterministic choice. Thus, a
program describes both a collection of decisions as well as the neural network
architecture used to make each one. We evaluate our approach on a challenging
diagram question answering task where probabilistic neural programs correctly
execute nearly twice as many programs as a baseline model.

Inverse Modeling of Complex Interactive Behavior with ABC

Antti Kangasrääsiö, Kumaripaba Athukorala, Andrew Howes, Jukka Corander, Samuel Kaski, Antti Oulasvirta
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Can one make deep inferences about a user based only on observations of how
she interacts? This paper contributes a methodology for inverse modeling in
HCI, where the goal is to estimate a cognitive model from limited behavioral
data. Given substantial diversity in users’ intentions, strategies and
abilities, this is a difficult problem and previously unaddressed in HCI. We
show advances following an approach that combines (1) computational
rationality, to predict how a user adapts to a task when her capabilities are
known, and (2) approximate Bayesian computation (ABC) to estimate those
capabilities. The benefit is that model parameters are conditioned on both
prior knowledge and observations, which improves model validity and helps
identify causes for observations. We demonstrate these benefits in a case of
menu interaction where the method obtained accurate estimates of users’
behavioral and cognitive features from selection time data only. Inverse
modeling methods can advance theoretical HCI by bringing complex behavior
within reach of modeling.

Active Search for Sparse Signals with Region Sensing

Yifei Ma, Roman Garnett, Jeff Schneider
Comments: aaai 2017 preprint; nips exhibition of rejections
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Autonomous systems can be used to search for sparse signals in a large space;
e.g., aerial robots can be deployed to localize threats, detect gas leaks, or
respond to distress calls. Intuitively, search algorithms may increase
efficiency by collecting aggregate measurements summarizing large contiguous
regions. However, most existing search methods either ignore the possibility of
such region observations (e.g., Bayesian optimization and multi-armed bandits)
or make strong assumptions about the sensing mechanism that allow each
measurement to arbitrarily encode all signals in the entire environment (e.g.,
compressive sensing). We propose an algorithm that actively collects data to
search for sparse signals using only noisy measurements of the average values
on rectangular regions (including single points), based on the greedy
maximization of information gain. We analyze our algorithm in 1d and show that
it requires ( ilde{O}(frac{n}{mu^2}+k^2)) measurements to recover all of (k)
signal locations with small Bayes error, where (mu) and (n) are the signal
strength and the size of the search space, respectively. We also show that
active designs can be fundamentally more efficient than passive designs with
region sensing, contrasting with the results of Arias-Castro, Candes, and
Davenport (2013). We demonstrate the empirical performance of our algorithm on
a search problem using satellite image data and in high dimensions.

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcement
learning can be utilized to train deep end-to-end systems directly on
non-differentiable metrics for the task at hand. In this paper we consider the
problem of optimizing image captioning systems using reinforcement learning,
and show that by carefully optimizing our systems using the test metrics of the
MSCOCO task, significant gains in performance can be realized. Our systems are
built using a new optimization approach that we call self-critical sequence
training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather
than estimating a “baseline” to normalize the rewards and reduce variance,
utilizes the output of its own test-time inference algorithm to normalize the
rewards it experiences. Using this approach, estimating the reward signal (as
actor-critic methods must do) and estimating normalization (as REINFORCE
algorithms typically do) is avoided, while at the same time harmonizing the
model with respect to its test-time inference procedure. Empirically we find
that directly optimizing the CIDEr metric with SCST and greedy decoding at
test-time is highly effective. Our results on the MSCOCO evaluation sever
establish a new state-of-the-art on the task, improving the best result in
terms of CIDEr from 104.9 to 112.3.

Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks

Daniel Lévy, Arzav Jain
Comments: NIPS 2016 ML4HC Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Mammography is the most widely used method to screen breast cancer. Because
of its mostly manual nature, variability in mass appearance, and low
signal-to-noise ratio, a significant number of breast masses are missed or
misdiagnosed. In this work, we present how Convolutional Neural Networks can be
used to directly classify pre-segmented breast masses in mammograms as benign
or malignant, using a combination of transfer learning, careful pre-processing
and data augmentation to overcome limited training data. We achieve
state-of-the-art results on the DDSM dataset, surpassing human performance, and
show interpretability of our model.

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes

Taylor Killian, George Konidaris, Finale Doshi-Velez
Comments: Brief abstract for poster submission to Machine Learning for Healthcare workshop at NIPS 2016
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Due to physiological variation, patients diagnosed with the same condition
may exhibit divergent, but related, responses to the same treatments. Hidden
Parameter Markov Decision Processes (HiP-MDPs) tackle this transfer-learning
problem by embedding these tasks into a low-dimensional space. However, the
original formulation of HiP-MDP had a critical flaw: the embedding uncertainty
was modeled independently of the agent’s state uncertainty, requiring an
unnatural training procedure in which all tasks visited every part of the state
space—possible for robots that can be moved to a particular location,
impossible for human patients. We update the HiP-MDP framework and extend it to
more robustly develop personalized medicine strategies for HIV treatment.

Computation and Language

Automated assessment of non-native learner essays: Investigating the role of linguistic features

Sowmya Vajjala
Comments: Article accepted for publication at: International Journal of Artificial Intelligence in Education (IJAIED). To appear in early 2017 (journal url: this http URL)
Subjects: Computation and Language (cs.CL)

Automatic essay scoring (AES) refers to the process of scoring free text
responses to given prompts, considering human grader scores as the gold
standard. Writing such essays is an essential component of many language and
aptitude exams. Hence, AES became an active and established area of research,
and there are many proprietary systems used in real life applications today.
However, not much is known about which specific linguistic features are useful
for prediction and how much of this is consistent across datasets. This article
addresses that by exploring the role of various linguistic features in
automatic essay scoring using two publicly available datasets of non-native
English essays written in test taking scenarios. The linguistic properties are
modeled by encoding lexical, syntactic, discourse and error types of learner
language in the feature set. Predictive models are then developed using these
features on both datasets and the most predictive features are compared. While
the results show that the feature set used results in good predictive models
with both datasets, the question “what are the most predictive features?” has a
different answer for each dataset.

ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA

Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally
Comments: 1st International Workshop on Efficient Methods for Deep Neural Networks at NIPS 2016, Barcelona, Spain. Full paper to appear at FPGA 2017
Subjects: Computation and Language (cs.CL)

Long Short-Term Memory (LSTM) is widely used in speech recognition. In order
to achieve higher prediction accuracy, machine learning scientists have built
larger and larger models. Such large model is both computation intensive and
memory intensive. Deploying such bulky model results in high power consumption
and leads to high total cost of ownership (TCO) of a data center. In order to
speedup the prediction and make it energy efficient, we first propose a
load-balance-aware pruning method that can compress the LSTM model size by 20x
(10x from pruning and 2x from quantization) with negligible loss of the
prediction accuracy. The pruned model is friendly for parallel processing.
Next, we propose scheduler that encodes and partitions the compressed model to
each PE for parallelism, and schedule the complicated LSTM data flow. Finally,
we design the hardware architecture, named Efficient Speech Recognition Engine
(ESE) that works directly on the compressed model. Implemented on Xilinx
XCKU060 FPGA running at 200MHz, ESE has a performance of 282 GOPS working
directly on the compressed LSTM network, corresponding to 2.52 TOPS on the
uncompressed one, and processes a full LSTM for speech recognition with a power
dissipation of 41 Watts. Evaluated on the LSTM for speech recognition
benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X
GPU implementations. It achieves 40x and 11.5x higher energy efficiency
compared with the CPU and GPU respectively.

Alleviating Overfitting for Polysemous Words for Word Representation Estimation Using Lexicons

Yuanzhi Ke, Masafumi Hagiwara
Comments: 7 pages, under review as a conference paper at IEEE IJCNN 2017
Subjects: Computation and Language (cs.CL)

Though there are some works on improving distributed word representations
using lexicons, the improper overfitting of the words that have multiple
meanings is a remaining issue deteriorating the learning when lexicons are
used, which needs to be solved. An alternative method is to allocate a vector
per sense instead a vector per word. However, the word representations
estimated in the former way are not as easy to use as the latter one. Our
previous work uses a probabilistic method to alleviate the overfitting, but it
is not robust with small corpus. In this paper, we propose a new neural network
to estimate distributed word representations using a lexicon and a corpus. We
add a lexicon layer in continuous bag-of-words model, and a threshold node
after the output of the lexicon layer. The threshold rejects the “bad” outputs
of the lexicon layer that are less likely to be the same with their inputs. In
this way, it alleviates the overfitting of the polysemous words. The proposed
neural network can be trained using negative sampling, which maximizing the log
probabilities of target words given the context words, by distinguishing the
target words from random noises. We compare the proposed neural network with
continuous bag-of-words model, the other works improving it, and the previous
works estimating distributed word representations using both a lexicon and a
corpus. The experimental results show that the proposed neural network is more
efficient and balanced for both semantic tasks and syntactic tasks than the
previous works, and robust to the size of the corpus.

Shift-Reduce Constituent Parsing with Neural Lookahead Features

Jiangming Liu, Yue Zhang
Subjects: Computation and Language (cs.CL)

Transition-based models can be fast and accurate for constituent parsing.
Compared with chart-based models, they leverage richer features by extracting
history information from a parser stack, which spans over non-local
constituents. On the other hand, during incremental parsing, constituent
information on the right hand side of the current word is not utilized, which
is a relative weakness of shift-reduce parsing. To address this limitation, we
leverage a fast neural model to extract lookahead features. In particular, we
build a bidirectional LSTM model, which leverages the full sentence information
to predict the hierarchy of constituents that each word starts and ends. The
results are then passed to a strong transition-based constituent parser as
lookahead features. The resulting parser gives 1.3% absolute improvement in WSJ
and 2.3% in CTB compared to the baseline, given the highest reported accuracies
for fully-supervised parsing.

Neural Document Embeddings for Intensive Care Patient Mortality Prediction

Paulina Grnarova, Florian Schmidt, Stephanie L. Hyland, Carsten Eickhoff
Subjects: Computation and Language (cs.CL)

We present an automatic mortality prediction scheme based on the unstructured
textual content of clinical notes. Proposing a convolutional document embedding
approach, our empirical investigation using the MIMIC-III intensive care
database shows significant performance gains compared to previously employed
methods such as latent topic distributions or generic doc2vec embeddings. These
improvements are especially pronounced for the difficult problem of
post-discharge mortality prediction.

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Problems at the intersection of vision and language are of significant
importance both as challenging research questions and for the rich set of
applications they enable. However, inherent structure in our world and bias in
our language tend to be a simpler signal for learning than visual modalities,
resulting in models that ignore visual information, leading to an inflated
sense of their capability.

We propose to counter these language priors for the task of Visual Question
Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance
the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary
images such that every question in our balanced dataset is associated with not
just a single image, but rather a pair of similar images that result in two
different answers to the question. Our dataset is by construction more balanced
than the original VQA dataset and has approximately twice the number of
image-question pairs. Our complete balanced dataset will be publicly released
as part of the 2nd iteration of the Visual Question Answering Challenge (VQA
v2.0).

We further benchmark a number of state-of-art VQA models on our balanced
dataset. All models perform significantly worse on our balanced dataset,
suggesting that these models have indeed learned to exploit language priors.
This finding provides the first concrete empirical evidence for what seems to
be a qualitative sense among practitioners.

Finally, our data collection protocol for identifying complementary images
enables us to develop a novel interpretable model, which in addition to
providing an answer to the given (image, question) pair also provides a
counter-example based explanation – specifically, it identifies an image that
is similar to the original image, but it believes has a different answer to the
same question. This can help in building trust for machines among their users.

Distributed, Parallel, and Cluster Computing

Performance Modeling of Distributed Deep Neural Networks

Sayed Hadi Hashemi, Shadi A. Noghabi, William Gropp
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

During the past decade, machine learning has become extremely popular and can
be found in many aspects of our every day life. Nowayadays with explosion of
data while rapid growth of computation capacity, Distributed Deep Neural
Networks (DDNNs) which can improve their performance linearly with more
computation resources, have become hot and trending. However, there has not
been an in depth study of the performance of these systems, and how well they
scale.

In this paper we analyze CNTK, one of the most commonly used DDNNs, by first
building a performance model and then evaluating the system two settings: a
small cluster with all nodes in a single rack connected to a top of rack
switch, and in large scale using Blue Waters with arbitary placement of nodes.
Our main focus was the scalability of the system with respect to adding more
nodes. Based on our results, this system has an excessive initialization
overhead because of poor I/O utilization which dominates the whole execution
time. Because of this, the system does not scale beyond a few nodes (4 in Blue
Waters). Additionally, due to a single server-multiple worker design the server
becomes a bottleneck after 16 nodes limiting the scalability of the CNTK.

Learning

Learning Operations on a Stack with Neural Turing Machines

Tristan Deleu, Joseph Dureau
Comments: 1st Workshop on Neural Abstract Machines & Program Induction (NAMPI), NIPS 2016, Barcelona, Spain
Subjects: Learning (cs.LG)

Multiple extensions of Recurrent Neural Networks (RNNs) have been proposed
recently to address the difficulty of storing information over long time
periods. In this paper, we experiment with the capacity of Neural Turing
Machines (NTMs) to deal with these long-term dependencies on well-balanced
strings of parentheses. We show that not only does the NTM emulate a stack with
its heads and learn an algorithm to recognize such words, but it is also
capable of strongly generalizing to much longer sequences.

Summary – TerpreT: A Probabilistic Programming Language for Program Induction

We study machine learning formulations of inductive program synthesis; that
is, given input-output examples, synthesize source code that maps inputs to
corresponding outputs. Our key contribution is TerpreT, a domain-specific
language for expressing program synthesis problems. A TerpreT model is composed
of a specification of a program representation and an interpreter that
describes how programs map inputs to outputs. The inference task is to observe
a set of input-output examples and infer the underlying program. From a TerpreT
model we automatically perform inference using four different back-ends:
gradient descent (thus each TerpreT model can be seen as defining a
differentiable interpreter), linear program (LP) relaxations for graphical
models, discrete satisfiability solving, and the Sketch program synthesis
system. TerpreT has two main benefits. First, it enables rapid exploration of a
range of domains, program representations, and interpreter models. Second, it
separates the model specification from the inference algorithm, allowing proper
comparisons between different approaches to inference.

We illustrate the value of TerpreT by developing several interpreter models
and performing an extensive empirical comparison between alternative inference
algorithms on a variety of program models. To our knowledge, this is the first
work to compare gradient-based search over program space to traditional
search-based alternatives. Our key empirical finding is that constraint solvers
dominate the gradient descent and LP-based formulations.

This is a workshop summary of a longer report at arXiv:1608.04428

Overcoming catastrophic forgetting in neural networks

The ability to learn tasks in a sequential fashion is crucial to the
development of artificial intelligence. Neural networks are not, in general,
capable of this and it has been widely thought that catastrophic forgetting is
an inevitable feature of connectionist models. We show that it is possible to
overcome this limitation and train networks that can maintain expertise on
tasks which they have not experienced for a long time. Our approach remembers
old tasks by selectively slowing down learning on the weights important for
those tasks. We demonstrate our approach is scalable and effective by solving a
set of classification tasks based on the MNIST hand written digit dataset and
by learning several Atari 2600 games sequentially.

Cognitive Deep Machine Can Train Itself

Machine learning is making substantial progress in diverse applications. The
success is mostly due to advances in deep learning. However, deep learning can
make mistakes and its generalization abilities to new tasks are questionable.
We ask when and how one can combine network outputs, when (i) details of the
observations are evaluated by learned deep components and (ii) facts and
confirmation rules are available in knowledge based systems. We show that in
limited contexts the required number of training samples can be low and
self-improvement of pre-trained networks in more general context is possible.
We argue that the combination of sparse outlier detection with deep components
that can support each other diminish the fragility of deep methods, an
important requirement for engineering applications. We argue that supervised
learning of labels may be fully eliminated under certain conditions: a
component based architecture together with a knowledge based system can train
itself and provide high quality answers. We demonstrate these concepts on the
State Farm Distracted Driver Detection benchmark. We argue that the view of the
Study Panel (2016) may overestimate the requirements on `years of focused
research’ and `careful, unique construction’ for `AI systems’.

Identifying and Categorizing Anomalies in Retinal Imaging Data

The identification and quantification of markers in medical images is
critical for diagnosis, prognosis and management of patients in clinical
practice. Supervised- or weakly supervised training enables the detection of
findings that are known a priori. It does not scale well, and a priori
definition limits the vocabulary of markers to known entities reducing the
accuracy of diagnosis and prognosis. Here, we propose the identification of
anomalies in large-scale medical imaging data using healthy examples as a
reference. We detect and categorize candidates for anomaly findings untypical
for the observed data. A deep convolutional autoencoder is trained on healthy
retinal images. The learned model generates a new feature representation, and
the distribution of healthy retinal patches is estimated by a one-class support
vector machine. Results demonstrate that we can identify pathologic regions in
images without using expert annotations. A subsequent clustering categorizes
findings into clinically meaningful classes. In addition the learned features
outperform standard embedding approaches in a classification task.

A General Framework for Density Based Time Series Clustering Exploiting a Novel Admissible Pruning Strategy

Nurjahan Begum, Liudmila Ulanova, Hoang Anh Dau, Jun Wang, Eamonn Keogh
Subjects: Learning (cs.LG)

Time Series Clustering is an important subroutine in many higher-level data
mining analyses, including data editing for classifiers, summarization, and
outlier detection. It is well known that for similarity search the superiority
of Dynamic Time Warping (DTW) over Euclidean distance gradually diminishes as
we consider ever larger datasets. However, as we shall show, the same is not
true for clustering. Clustering time series under DTW remains a computationally
expensive operation. In this work, we address this issue in two ways. We
propose a novel pruning strategy that exploits both the upper and lower bounds
to prune off a very large fraction of the expensive distance calculations. This
pruning strategy is admissible and gives us provably identical results to the
brute force algorithm, but is at least an order of magnitude faster. For
datasets where even this level of speedup is inadequate, we show that we can
use a simple heuristic to order the unavoidable calculations in a
most-useful-first ordering, thus casting the clustering into an anytime
framework. We demonstrate the utility of our ideas with both single and
multidimensional case studies in the domains of astronomy, speech physiology,
medicine and entomology. In addition, we show the generality of our clustering
framework to other domains by efficiently obtaining semantically significant
clusters in protein sequences using the Edit Distance, the discrete data
analogue of DTW.

Predictive Clinical Decision Support System with RNN Encoding and Tensor Decoding

Yinchong Yang, Peter A. Fasching, Markus Wallwiener, Tanja N. Fehm, Sara Y. Brucker, Volker Tresp
Subjects: Learning (cs.LG)

With the introduction of the Electric Health Records, large amounts of
digital data become available for analysis and decision support. When
physicians are prescribing treatments to a patient, they need to consider a
large range of data variety and volume, making decisions increasingly complex.
Machine learning based Clinical Decision Support systems can be a solution to
the data challenges. In this work we focus on a class of decision support in
which the physicians’ decision is directly predicted. Concretely, the model
would assign higher probabilities to decisions that it presumes the physician
are more likely to make. Thus the CDS system can provide physicians with
rational recommendations. We also address the problem of correlation in target
features: Often a physician is required to make multiple (sub-)decisions in a
block, and that these decisions are mutually dependent. We propose a solution
to the target correlation problem using a tensor factorization model. In order
to handle the patients’ historical information as sequential data, we apply the
so-called Encoder-Decoder-Framework which is based on Recurrent Neural Networks
(RNN) as encoders and a tensor factorization model as a decoder, a combination
which is novel in machine learning. With experiments with real-world datasets
we show that the proposed model does achieve better prediction performances.

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

Zihao Chen, Luo Luo, Zhihua Zhang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Recently, there has been an increasing interest in designing distributed
convex optimization algorithms under the setting where the data matrix is
partitioned on features. Algorithms under this setting sometimes have many
advantages over those under the setting where data is partitioned on samples,
especially when the number of features is huge. Therefore, it is important to
understand the inherent limitations of these optimization problems. In this
paper, with certain restrictions on the communication allowed in the
procedures, we develop tight lower bounds on communication rounds for a broad
class of non-incremental algorithms under this setting. We also provide a lower
bound on communication rounds for a class of (randomized) incremental
algorithms.

Development of a hybrid learning system based on SVM, ANFIS and domain knowledge: DKFIS

Soumi Chaki, Aurobinda Routray, William K. Mohanty, Mamata Jenamani
Comments: 6 pages, 5 figures, 3tables Presented at Indicon 2015
Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP); Machine Learning (stat.ML)

This paper presents the development of a hybrid learning system based on
Support Vector Machines (SVM), Adaptive Neuro-Fuzzy Inference System (ANFIS)
and domain knowledge to solve prediction problem. The proposed two-stage Domain
Knowledge based Fuzzy Information System (DKFIS) improves the prediction
accuracy attained by ANFIS alone. The proposed framework has been implemented
on a noisy and incomplete dataset acquired from a hydrocarbon field located at
western part of India. Here, oil saturation has been predicted from four
different well logs i.e. gamma ray, resistivity, density, and clay volume. In
the first stage, depending on zero or near zero and non-zero oil saturation
levels the input vector is classified into two classes (Class 0 and Class 1)
using SVM. The classification results have been further fine-tuned applying
expert knowledge based on the relationship among predictor variables i.e. well
logs and target variable – oil saturation. Second, an ANFIS is designed to
predict non-zero (Class 1) oil saturation values from predictor logs. The
predicted output has been further refined based on expert knowledge. It is
apparent from the experimental results that the expert intervention with
qualitative judgment at each stage has rendered the prediction into the
feasible and realistic ranges. The performance analysis of the prediction in
terms of four performance metrics such as correlation coefficient (CC), root
mean square error (RMSE), and absolute error mean (AEM), scatter index (SI) has
established DKFIS as a useful tool for reservoir characterization.

Self-critical Sequence Training for Image Captioning

Recently it has been shown that policy-gradient methods for reinforcement
learning can be utilized to train deep end-to-end systems directly on
non-differentiable metrics for the task at hand. In this paper we consider the
problem of optimizing image captioning systems using reinforcement learning,
and show that by carefully optimizing our systems using the test metrics of the
MSCOCO task, significant gains in performance can be realized. Our systems are
built using a new optimization approach that we call self-critical sequence
training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather
than estimating a “baseline” to normalize the rewards and reduce variance,
utilizes the output of its own test-time inference algorithm to normalize the
rewards it experiences. Using this approach, estimating the reward signal (as
actor-critic methods must do) and estimating normalization (as REINFORCE
algorithms typically do) is avoided, while at the same time harmonizing the
model with respect to its test-time inference procedure. Empirically we find
that directly optimizing the CIDEr metric with SCST and greedy decoding at
test-time is highly effective. Our results on the MSCOCO evaluation sever
establish a new state-of-the-art on the task, improving the best result in
terms of CIDEr from 104.9 to 112.3.

Higher Order Mutual Information Approximation for Feature Selection

Jilin Wu, Soumyajit Gupta, Chandrajit Bajaj
Comments: 14 page, 5 figures
Subjects: Learning (cs.LG)

Feature selection is a process of choosing a subset of relevant features so
that the quality of prediction models can be improved. An extensive body of
work exists on information-theoretic feature selection, based on maximizing
Mutual Information (MI) between subsets of features and class labels. The prior
methods use a lower order approximation, by treating the joint entropy as a
summation of several single variable entropies. This leads to locally optimal
selections and misses multi-way feature combinations. We present a higher order
MI based approximation technique called Higher Order Feature Selection (HOFS).
Instead of producing a single list of features, our method produces a ranked
collection of feature subsets that maximizes MI, giving better comprehension
(feature ranking) as to which features work best together when selected, due to
their underlying interdependent structure. Our experiments demonstrate that the
proposed method performs better than existing feature selection approaches
while keeping similar running times and computational complexity.

A Noise-Filtering Approach for Cancer Drug Sensitivity Prediction

Turki Turki, Zhi Wei
Comments: Accepted at NIPS 2016 Workshop on Machine Learning for Health
Subjects: Learning (cs.LG); Genomics (q-bio.GN); Machine Learning (stat.ML)

Accurately predicting drug responses to cancer is an important problem
hindering oncologists’ efforts to find the most effective drugs to treat
cancer, which is a core goal in precision medicine. The scientific community
has focused on improving this prediction based on genomic, epigenomic, and
proteomic datasets measured in human cancer cell lines. Real-world cancer cell
lines contain noise, which degrades the performance of machine learning
algorithms. This problem is rarely addressed in the existing approaches. In
this paper, we present a noise-filtering approach that integrates techniques
from numerical linear algebra and information retrieval targeted at filtering
out noisy cancer cell lines. By filtering out noisy cancer cell lines, we can
train machine learning algorithms on better quality cancer cell lines. We
evaluate the performance of our approach and compare it with an existing
approach using the Area Under the ROC Curve (AUC) on clinical trial data. The
experimental results show that our proposed approach is stable and also yields
the highest AUC at a statistically significant level.

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Problems at the intersection of vision and language are of significant
importance both as challenging research questions and for the rich set of
applications they enable. However, inherent structure in our world and bias in
our language tend to be a simpler signal for learning than visual modalities,
resulting in models that ignore visual information, leading to an inflated
sense of their capability.

We propose to counter these language priors for the task of Visual Question
Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance
the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary
images such that every question in our balanced dataset is associated with not
just a single image, but rather a pair of similar images that result in two
different answers to the question. Our dataset is by construction more balanced
than the original VQA dataset and has approximately twice the number of
image-question pairs. Our complete balanced dataset will be publicly released
as part of the 2nd iteration of the Visual Question Answering Challenge (VQA
v2.0).

We further benchmark a number of state-of-art VQA models on our balanced
dataset. All models perform significantly worse on our balanced dataset,
suggesting that these models have indeed learned to exploit language priors.
This finding provides the first concrete empirical evidence for what seems to
be a qualitative sense among practitioners.

Finally, our data collection protocol for identifying complementary images
enables us to develop a novel interpretable model, which in addition to
providing an answer to the given (image, question) pair also provides a
counter-example based explanation – specifically, it identifies an image that
is similar to the original image, but it believes has a different answer to the
same question. This can help in building trust for machines among their users.

Scribbler: Controlling Deep Image Synthesis with Sketch and Color

Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
Comments: 13 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Recently, there have been several promising methods to generate realistic
imagery from deep convolutional networks. These methods sidestep the
traditional computer graphics rendering pipeline and instead generate imagery
at the pixel level by learning from large collections of photos (e.g. faces or
bedrooms). However, these methods are of limited utility because it is
difficult for a user to control what the network produces. In this paper, we
propose a deep adversarial image synthesis architecture that is conditioned on
coarse sketches and sparse color strokes to generate realistic cars, bedrooms,
or faces. We demonstrate a sketch based image synthesis system which allows
users to ‘scribble’ over the sketch to indicate preferred color for objects.
Our network can then generate convincing images that satisfy both the color and
the sketch constraints of user. The network is feed-forward which allows users
to see the effect of their edits in real time. We compare to recent work on
sketch to image synthesis and show that our approach can generate more
realistic, more diverse, and more controllable outputs. The architecture is
also effective at user-guided colorization of grayscale images.

Learning with Hierarchical Gaussian Kernels

Ingo Steinwart, Philipp Thomann, Nico Schmid
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We investigate iterated compositions of weighted sums of Gaussian kernels and
provide an interpretation of the construction that shows some similarities with
the architectures of deep neural networks. On the theoretical side, we show
that these kernels are universal and that SVMs using these kernels are
universally consistent. We further describe a parameter optimization method for
the kernel parameters and empirically compare this method to SVMs, random
forests, a multiple kernel learning approach, and to some deep neural networks.

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee
Comments: published at NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

Understanding the 3D world is a fundamental problem in computer vision.
However, learning a good representation of 3D objects is still an open problem
due to the high dimensionality of the data and many factors of variation
involved. In this work, we investigate the task of single-view 3D object
reconstruction from a learning agent’s perspective. We formulate the learning
process as an interaction between 3D and 2D representations and propose an
encoder-decoder network with a novel projection loss defined by the perspective
transformation. More importantly, the projection loss enables the unsupervised
learning using 2D observation without explicit 3D supervision. We demonstrate
the ability of the model in generating 3D volume from a single 2D image with
three sets of experiments: (1) learning from single-class objects; (2) learning
from multi-class objects and (3) testing on novel object classes. Results show
superior performance and better generalization ability for 3D object
reconstruction when the projection loss is involved.

Restricted Strong Convexity Implies Weak Submodularity

Ethan R. Elenberg, Rajiv Khanna, Alexandros G. Dimakis, Sahand Negahban
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

We connect high-dimensional subset selection and submodular maximization. Our
results extend the work of Das and Kempe (2011) from the setting of linear
regression to arbitrary objective functions. For greedy feature selection, this
connection allows us to obtain strong multiplicative performance bounds on
several methods without statistical modeling assumptions. We also derive
recovery guarantees of this form under standard assumptions. Our work shows
that greedy algorithms perform within a constant factor from the best possible
subset-selection solution for a broad class of general objective functions. Our
methods allow a direct control over the number of obtained features as opposed
to regularization parameters that only implicitly control sparsity. Our proof
technique uses the concept of weak submodularity initially defined by Das and
Kempe. We draw a connection between convex analysis and submodular set function
theory which may be of independent interest for other statistical learning
applications that have combinatorial structure.

A simple squared-error reformulation for ordinal classification

Christopher Beckham, Christopher Pal
Comments: Camera-ready abstract for NIPS for Health Workshop (2016)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In this paper, we explore ordinal classification (in the context of deep
neural networks) through a simple modification of the squared error loss which
not only allows it to not only be sensitive to class ordering, but also allows
the possibility of having a discrete probability distribution over the classes.
Our formulation is based on the use of a softmax hidden layer, which has
received relatively little attention in the literature. We empirically evaluate
its performance on the Kaggle diabetic retinopathy dataset, an ordinal and
high-resolution dataset and show that it outperforms all of the baselines
employed.

Asynchronous Stochastic Gradient MCMC with Elastic Coupling

Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling
for problems where we can leverage (stochastic) gradients to define continuous
dynamics which explore the target distribution. We outline a solution strategy
for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling
(SGHMC) which we alter to include an elastic coupling term that ties together
multiple MCMC instances. The proposed strategy turns inherently sequential HMC
algorithms into asynchronous parallel versions. First experiments empirically
show that the resulting parallel sampler significantly speeds up exploration of
the target distribution, when compared to standard SGHMC, and is less prone to
the harmful effects of stale gradients than a naive parallelization approach.

Probabilistic Neural Programs

Kenton W. Murray, Jayant Krishnamurthy
Comments: Appears in NAMPI workshop at NIPS 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present probabilistic neural programs, a framework for program induction
that permits flexible specification of both a computational model and inference
algorithm while simultaneously enabling the use of deep neural networks.
Probabilistic neural programs combine a computation graph for specifying a
neural network with an operator for weighted nondeterministic choice. Thus, a
program describes both a collection of decisions as well as the neural network
architecture used to make each one. We evaluate our approach on a challenging
diagram question answering task where probabilistic neural programs correctly
execute nearly twice as many programs as a baseline model.

Reliable Evaluation of Neural Network for Multiclass Classification of Real-world Data

Siddharth Dinesh, Tirtharaj Dash
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

This paper presents a systematic evaluation of Neural Network (NN) for
classification of real-world data. In the field of machine learning, it is
often seen that a single parameter that is ‘predictive accuracy’ is being used
for evaluating the performance of a classifier model. However, this parameter
might not be considered reliable given a dataset with very high level of
skewness. To demonstrate such behavior, seven different types of datasets have
been used to evaluate a Multilayer Perceptron (MLP) using twelve(12) different
parameters which include micro- and macro-level estimation. In the present
study, the most common problem of prediction called ‘multiclass’ classification
has been considered. The results that are obtained for different parameters for
each of the dataset could demonstrate interesting findings to support the
usability of these set of performance evaluation parameters.

Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling

This paper describes a new neuroimaging analysis toolbox that allows for the
modeling of nonlinear effects at the voxel level, overcoming limitations of
methods based on linear models like the GLM. We illustrate its features using a
relevant example in which distinct nonlinear trajectories of Alzheimer’s
disease related brain atrophy patterns were found across the full biological
spectrum of the disease.

Predicting Patient State-of-Health using Sliding Window and Recurrent Classifiers

Adam McCarthy, Christopher K.I. Williams
Comments: NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Bedside monitors in Intensive Care Units (ICUs) frequently sound incorrectly,
slowing response times and desensitising nurses to alarms (Chambrin, 2001),
causing true alarms to be missed (Hug et al., 2011). We compare sliding window
predictors with recurrent predictors to classify patient state-of-health from
ICU multivariate time series; we report slightly improved performance for the
RNN for three out of four targets.

Inverse Modeling of Complex Interactive Behavior with ABC

Can one make deep inferences about a user based only on observations of how
she interacts? This paper contributes a methodology for inverse modeling in
HCI, where the goal is to estimate a cognitive model from limited behavioral
data. Given substantial diversity in users’ intentions, strategies and
abilities, this is a difficult problem and previously unaddressed in HCI. We
show advances following an approach that combines (1) computational
rationality, to predict how a user adapts to a task when her capabilities are
known, and (2) approximate Bayesian computation (ABC) to estimate those
capabilities. The benefit is that model parameters are conditioned on both
prior knowledge and observations, which improves model validity and helps
identify causes for observations. We demonstrate these benefits in a case of
menu interaction where the method obtained accurate estimates of users’
behavioral and cognitive features from selection time data only. Inverse
modeling methods can advance theoretical HCI by bringing complex behavior
within reach of modeling.

A temporal model for multiple sclerosis course evolution

Samuele Fiorini, Andrea Tacchino, Giampaolo Brichetto, Alessandro Verri, Annalisa Barla
Comments: NIPS Machine Learning for health Workshop 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Multiple Sclerosis is a degenerative condition of the central nervous system
that affects nearly 2.5 million of individuals in terms of their physical,
cognitive, psychological and social capabilities. Researchers are currently
investigating on the use of patient reported outcome measures for the
assessment of impact and evolution of the disease on the life of the patients.
To date, a clear understanding on the use of such measures to predict the
evolution of the disease is still lacking. In this work we resort to
regularized machine learning methods for binary classification and multiple
output regression. We propose a pipeline that can be used to predict the
disease progression from patient reported measures. The obtained model is
tested on a data set collected from an ongoing clinical research project.

Active Search for Sparse Signals with Region Sensing

Yifei Ma, Roman Garnett, Jeff Schneider
Comments: aaai 2017 preprint; nips exhibition of rejections
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Autonomous systems can be used to search for sparse signals in a large space;
e.g., aerial robots can be deployed to localize threats, detect gas leaks, or
respond to distress calls. Intuitively, search algorithms may increase
efficiency by collecting aggregate measurements summarizing large contiguous
regions. However, most existing search methods either ignore the possibility of
such region observations (e.g., Bayesian optimization and multi-armed bandits)
or make strong assumptions about the sensing mechanism that allow each
measurement to arbitrarily encode all signals in the entire environment (e.g.,
compressive sensing). We propose an algorithm that actively collects data to
search for sparse signals using only noisy measurements of the average values
on rectangular regions (including single points), based on the greedy
maximization of information gain. We analyze our algorithm in 1d and show that
it requires ( ilde{O}(frac{n}{mu^2}+k^2)) measurements to recover all of (k)
signal locations with small Bayes error, where (mu) and (n) are the signal
strength and the size of the search space, respectively. We also show that
active designs can be fundamentally more efficient than passive designs with
region sensing, contrasting with the results of Arias-Castro, Candes, and
Davenport (2013). We demonstrate the empirical performance of our algorithm on
a search problem using satellite image data and in high dimensions.

Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes

Corinne L. Jones, Sham M. Kakade, Lucas W. Thornblade, David R. Flum, Abraham D. Flaxman
Comments: Accepted at NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose using canonical correlation analysis (CCA) to generate features
from sequences of medical billing codes. Applying this novel use of CCA to a
database of medical billing codes for patients with diverticulitis, we first
demonstrate that the CCA embeddings capture meaningful relationships among the
codes. We then generate features from these embeddings and establish their
usefulness in predicting future elective surgery for diverticulitis, an
important marker in efforts for reducing costs in healthcare.

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes

Due to physiological variation, patients diagnosed with the same condition
may exhibit divergent, but related, responses to the same treatments. Hidden
Parameter Markov Decision Processes (HiP-MDPs) tackle this transfer-learning
problem by embedding these tasks into a low-dimensional space. However, the
original formulation of HiP-MDP had a critical flaw: the embedding uncertainty
was modeled independently of the agent’s state uncertainty, requiring an
unnatural training procedure in which all tasks visited every part of the state
space—possible for robots that can be moved to a particular location,
impossible for human patients. We update the HiP-MDP framework and extend it to
more robustly develop personalized medicine strategies for HIV treatment.

Information Theory

Overloaded Multiuser MISO Transmission with Imperfect CSIT

Enrico Piovano, Hamdi Joudeh, Bruno Clerckx
Comments: Presented at the 50th Annual Asilomar Conference on Signals, Systems and Computers (ASILOMAR 2016)
Subjects: Information Theory (cs.IT)

A required feature for the next generation of wireless communication networks
will be the capability to serve simultaneously a large number of devices with
heterogeneous CSIT qualities and demands. In this paper, we consider the
overloaded MISO BC with two groups of CSIT qualities. We propose a transmission
scheme where degraded symbols are superimposed on top of spatially-multiplexed
symbols. The developed strategy allows to serve all users in a non-orthogonal
manner and the analysis shows an enhanced perfomance compared to existing
schemes. Moreover, optimality in a DoF sense is shown.

Massive Non-Orthogonal Multiple Access for Cellular IoT: Potentials and Limitations

Mahyar Shirvanimoghaddam, Mischa Dohler, Sarah Johnson
Comments: To appear in IEEE Communications Magazine
Subjects: Information Theory (cs.IT)

The Internet of Things (IoT) promises ubiquitous connectivity of everything
everywhere, which represents the biggest technology trend in the years to come.
It is expected that by 2020 over 25 billion devices will be connected to
cellular networks; far beyond the number of devices in current wireless
networks. Machine-to-Machine (M2M) communications aims at providing the
communication infrastructure for enabling IoT by facilitating the billions of
multi-role devices to communicate with each other and with the underlying data
transport infrastructure without, or with little, human intervention. Providing
this infrastructure will require a dramatic shift from the current protocols
mostly designed for human-to-human (H2H) applications. This article reviews
recent 3GPP solutions for enabling massive cellular IoT and investigates the
random access strategies for M2M communications, which shows that cellular
networks must evolve to handle the new ways in which devices will connect and
communicate with the system. A massive non-orthogonal multiple access (NOMA)
technique is then presented as a promising solution to support a massive number
of IoT devices in cellular networks, where we also identify its practical
challenges and future research directions.

Restricted Strong Convexity Implies Weak Submodularity

Ethan R. Elenberg, Rajiv Khanna, Alexandros G. Dimakis, Sahand Negahban
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

We connect high-dimensional subset selection and submodular maximization. Our
results extend the work of Das and Kempe (2011) from the setting of linear
regression to arbitrary objective functions. For greedy feature selection, this
connection allows us to obtain strong multiplicative performance bounds on
several methods without statistical modeling assumptions. We also derive
recovery guarantees of this form under standard assumptions. Our work shows
that greedy algorithms perform within a constant factor from the best possible
subset-selection solution for a broad class of general objective functions. Our
methods allow a direct control over the number of obtained features as opposed
to regularization parameters that only implicitly control sparsity. Our proof
technique uses the concept of weak submodularity initially defined by Das and
Kempe. We draw a connection between convex analysis and submodular set function
theory which may be of independent interest for other statistical learning
applications that have combinatorial structure.

Sampling Theorems for Shift-invariant Spaces, Gabor Frames, and Totally Positive Functions

Karlheinz Gröchenig, José Luis Romero, Joachim Stöckler
Comments: 25 pages
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT)

We study nonuniform sampling in shift-invariant spaces and the construction
of Gabor frames with respect to the class of totally positive functions whose
Fourier transform factors as ( hat g(xi)= prod_{j=1}^n (1+2pi
idelta_jxi)^{-1} , e^{-c xi^2}) for (delta_1,ldots,delta_nin
mathbb{R}, c >0) (in which case (g) is called totally positive of Gaussian
type).

In analogy to Beurling’s sampling theorem for the Paley-Wiener space of
entire functions, we prove that every separated set with lower Beurling density
(>1) is a sampling set for the shift-invariant space generated by such a (g).
In view of the known necessary density conditions, this result is optimal and
validates the heuristic reasonings in the engineering literature.

Using a subtle connection between sampling in shift-invariant spaces and the
theory of Gabor frames, we show that the set of phase-space shifts of (g) with
respect to a rectangular lattice (alpha mathbb{Z} imes eta mathbb{Z})
forms a frame, if and only if (alpha eta <1). This solves an open problem
going back to Daubechies in 1990 for the class of totally positive functions of
Gaussian type.

The proof strategy involves the connection between sampling in
shift-invariant spaces and Gabor frames, a new characterization of sampling
sets “without inequalities” in the style of Beurling, new properties of totally
positive functions, and the interplay between zero sets of functions in a
shift-invariant space and functions in the Bargmann-Fock space.

Not Call Me Cellular Any More: The Emergence of Scaling Law, Fractal Patterns and Small-World in Wireless Networks

Chao Yuan, Zhifeng Zhao, Rongpeng Li, Meng Li, Honggang Zhang
Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT)

In conventional cellular networks, for base stations (BSs) that are deployed
far away from each other, it is general to assume them to be mutually
independent. Nevertheless, after long-term evolution of cellular networks in
various generations, the assumption no longer holds. Instead, the BSs, which
seem to be gradually deployed by operators in a casual manner, have embed many
fundamental features in their locations, coverage and traffic loading. Their
features can be leveraged to analyze the intrinstic pattern in BSs and even
human community. In this paper, according to large-scale measurement datasets,
we build correlation model of BSs by utilizing one of the most important
features, that is, traffic. Coupling with the theory of complex networks, we
make further analysis on the structure and characteristics of this traffic load
correlation model. Simulation results show that its degree distribution follows
scale-free property. Moreover, the datasets also unveil the characteristics of
fractality and small-world. Furthermore, we apply Collective Influence(CI)
algorithm to localize the influential base stations and demonstrate that some
low-degree BSs may outrank BSs with larger degree.

Entropy bounds on state estimation for stochastic non-linear systems under information constraints

Christoph Kawan, Serdar Yüksel
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

This paper studies state estimation over noisy channels for stochastic
non-linear systems. We consider three estimation objectives, a strong and a
weak form of almost sure stability of the estimation error as well as quadratic
stability in expectation. For all three objectives, we derive lower bounds on
the smallest channel capacity (C_0) above which the objective can be achieved
with an arbitrarily small error. Lower bounds are obtained via a dynamical
systems, an information-theoretic and a random dynamical systems approach. The
random dynamical systems approach is shown to be operationally non-adequate for
the problem, since it typically yields strictly lower lower bounds when
compared with the previous two approaches even. The first two approaches show
that for a large class of systems, such as additive noise systems, (C_0 =
infty), i.e., the estimation objectives cannot be achieved via channels of
finite capacity. Finally, we prove that a memoryless noisy channel in general
constitutes no obstruction to almost sure state estimation with arbitrarily
small errors, when there is no noise in the system. Our results are in
agreement with the existing results for deterministic systems with noise-free
channels.