IT博客汇 | arXiv Paper Daily: Tue, 6 Dec 2016

arXiv Paper Daily: Tue, 6 Dec 2016

我爱机器学习(52ml.net)发表于 2016-12-06 00:00:00

Neural and Evolutionary Computing

BrainFrame: A heterogeneous accelerator platform for neuron simulations

Georgios Smaragdos, Georgios Chatzikonstantis, Rahul Kukreja, Harrys Sidiropoulos, Dimitrios Rodopoulos, Ioannis Sourdis, Zaid Al-Ars, Christoforos Kachris, Dimitrios Soudris, Chris I. De Zeeuw, Christos Strydis
Comments: 17 pages, 19 figures, 4 tables
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC)

Objective: The advent of High-Performance Computing (HPC) in recent years has
led to its increasing use in brain study through computational models. The
scale and complexity of such models are constantly increasing, leading to
challenging computational requirements. Even though modern HPC platforms can
often deal with such challenges, the vast diversity of the modeling field does
not permit for a single acceleration (or homogeneous) platform to effectively
address the complete array of modeling requirements. Approach: In this paper we
propose and build BrainFrame, a heterogeneous acceleration platform,
incorporating three distinct acceleration technologies, a Dataflow Engine, a
Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform.
As a challenging proof of concept, we analyze the performance of BrainFrame on
different instances of a state-of-the-art neuron model, modeling the Inferior-
Olivary Nucleus using a biophysically-meaningful, extended Hodgkin-Huxley
representation. The model instances take into account not only the neuronal-
network dimensions but also different network-connectivity circumstances that
can drastically change application workload characteristics. Main results: The
synthetic approach of three HPC technologies demonstrated that BrainFrame is
better able to cope with the modeling diversity encountered. Our performance
analysis shows clearly that the model directly affect performance and all three
technologies are required to cope with all the model use cases.

Enabling Bio-Plausible Multi-level STDP using CMOS Neurons with Dendrites and Bistable RRAMs

Xinyu Wu, Vishal Saxena
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

Large-scale integration of emerging nanoscale non-volatile memory devices,
e.g. resistive random-access memory (RRAM), can enable a new generation of
neuromorphic computers that can solve a wide range of machine learning
problems. Such hybrid CMOS-RRAM neuromorphic architectures will result in
several orders of magnitude reduction in energy consumption at a very small
form factor, and herald autonomous learning machines capable of self-adapting
to their environment. However, the progress in this area has been impeded from
the realization that the actual memory devices fall well short of their
expected behavior. In this work, we discuss the challenges associated with
these memory devices and their use in neuromorphic computing circuits, and
propose pathways to overcome these limitations by introducing ‘dendritic
learning’.

Message Passing Multi-Agent GANs

Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri
Comments: The first 2 authors contributed equally for this work
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Communicating and sharing intelligence among agents is an important facet of
achieving Artificial General Intelligence. As a first step towards this
challenge, we introduce a novel framework for image generation: Message Passing
Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have
recently been shown to be very effective for image generation and other tasks,
these networks have been limited to mostly single generator-discriminator
networks. We show that we can obtain multi-agent GANs that communicate through
message passing to achieve better image generation. The objectives of the
individual agents in this framework are two fold: a co-operation objective and
a competing objective. The co-operation objective ensures that the message
sharing mechanism guides the other generator to generate better than itself
while the competing objective encourages each generator to generate better than
its counterpart. We analyze and visualize the messages that these GANs share
among themselves in various scenarios. We quantitatively show that the message
sharing formulation serves as a regularizer for the adversarial training.
Qualitatively, we show that the different generators capture different traits
of the underlying data distribution.

Known Unknowns: Uncertainty Quality in Bayesian Neural Networks

Ramon Oliveira, Pedro Tabacof, Eduardo Valle
Comments: Workshop on Bayesian Deep Learning, NIPS 2016, Barcelona, Spain
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We evaluate the uncertainty quality in neural networks using anomaly
detection. We extract uncertainty measures (e.g. entropy) from the predictions
of candidate models, use those measures as features for an anomaly detector,
and gauge how well the detector differentiates known from unknown classes. We
assign higher uncertainty quality to candidate models that lead to better
detectors. We also propose a novel method for sampling a variational
approximation of a Bayesian neural network, called One-Sample Bayesian
Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We
compare the following candidate neural network models: Maximum Likelihood,
Bayesian Dropout, OSBA, and — for MNIST — the standard variational
approximation. We show that Bayesian Dropout and OSBA provide better
uncertainty information than Maximum Likelihood, and are essentially equivalent
to the standard variational approximation, but much faster.

Semi-supervised learning of deep metrics for stereo reconstruction

Stepan Tulyakov, Anton Ivanov, Francois Fleuret
Comments: 11 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep-learning metrics have recently demonstrated extremely good performance
to match image patches for stereo reconstruction. However, training such
metrics requires large amount of labeled stereo images, which can be difficult
or costly to collect for certain applications. The main contribution of our
work is a new semi-supervised method for learning deep metrics from unlabeled
stereo images, given coarse information about the scenes and the optical
system. Our method alternatively optimizes the metric with a standard
stochastic gradient descent, and applies stereo constraints to regularize its
prediction. Experiments on reference data-sets show that, for a given network
architecture, training with this new method without ground-truth produces a
metric with performance as good as state-of-the-art baselines trained with the
said ground-truth. This work has three practical implications. Firstly, it
helps to overcome limitations of training sets, in particular noisy ground
truth. Secondly it allows to use much more training data during learning.
Thirdly, it allows to tune deep metric for a particular stereo system, even if
ground truth is not available.

Positive blood culture detection in time series data using a BiLSTM network

Leen De Baets, Joeri Ruyssinck, Thomas Peiffer, Johan Decruyenaere, Filip De Turck, Femke Ongenae, Tom Dhaene
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

The presence of bacteria or fungi in the bloodstream of patients is abnormal
and can lead to life-threatening conditions. A computational model based on a
bidirectional long short-term memory artificial neural network, is explored to
assist doctors in the intensive care unit to predict whether examination of
blood cultures of patients will return positive. As input it uses nine
monitored clinical parameters, presented as time series data, collected from
2177 ICU admissions at the Ghent University Hospital. Our main goal is to
determine if general machine learning methods and more specific, temporal
models, can be used to create an early detection system. This preliminary
research obtains an area of 71.95% under the precision recall curve, proving
the potential of temporal neural networks in this context.

Parameter Compression of Recurrent Neural Networks and Degredation of Short-term Memory

Jonathan A. Cox
Comments: Submitted to IJCNN 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The significant computational costs of deploying neural networks in
large-scale or resource constrained environments, such as data centers and
mobile devices, has spurred interest in model compression, which can achieve a
reduction in both arithmetic operations and storage memory. Several techniques
have been proposed for reducing or compressing the parameters for feed-forward
and convolutional neural networks, but less is understood about the effect of
parameter compression on recurrent neural networks (RNN). In particular, the
extent to which the recurrent parameters can be compressed and the impact on
short-term memory performance, is not well understood. In this paper, we study
the effect of complexity reduction, through singular value decomposition rank
reduction, on RNN and minimal gated recurrent unit (MGRU) networks for several
tasks. We show that considerable rank reduction is possible when compressing
recurrent weights, even without fine tuning. Furthermore, we propose a
perturbation model for the effect of general perturbations, such as a
compression, on the recurrent parameters of RNNs. The model is tested against a
noiseless memorization experiment that elucidates the short-term memory
performance. In this way, we demonstrate that the effect of compression of
recurrent parameters is dependent on the degree of temporal coherence present
in the data and task. This work can guide on-the-fly RNN compression for novel
environments or tasks, and provides insight for applying RNN compression in
low-power devices, such as hearing aids.

Computer Vision and Pattern Recognition

ROAM: a Rich Object Appearance Model with Application to Rotoscoping

Ondrej Miksik, Juan-Manuel Pérez-Rúa, Philip H. S. Torr, Patrick Pérez
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Rotoscoping, the detailed delineation of scene elements through a video shot,
is a painstaking task of tremendous importance in professional post-production
pipelines. While pixel-wise segmentation techniques can help for this task,
professional rotoscoping tools rely on parametric curves that offer the artists
a much better interactive control on the definition, editing and manipulation
of the segments of interest. Sticking to this prevalent rotoscoping paradigm,
we propose a novel framework to capture and track the visual aspect of an
arbitrary object in a scene, given a first closed outline of this object. This
model combines a collection of local foreground/background appearance models
spread along the outline, a global appearance model of the enclosed object and
a set of distinctive foreground landmarks. The structure of this rich
appearance model allows simple initialization, efficient iterative optimization
with exact minimization at each step, and on-line adaptation in videos. We
demonstrate qualitatively and quantitatively the merit of this framework
through comparisons with tools based on either dynamic segmentation with a
closed curve or pixel-wise binary labelling.

Authoring image decompositions with generative models

Jason Rock, Theerasit Issaranon, Aditya Deshpande, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We show how to extend traditional intrinsic image decompositions to
incorporate further layers above albedo and shading. It is hard to obtain data
to learn a multi-layer decomposition. Instead, we can learn to decompose an
image into layers that are “like this” by authoring generative models for each
layer using proxy examples that capture the Platonic ideal (Mondrian images for
albedo; rendered 3D primitives for shading; material swatches for shading
detail). Our method then generates image layers, one from each model, that
explain the image. Our approach rests on innovation in generative models for
images. We introduce a Convolutional Variational Auto Encoder (conv-VAE), a
novel VAE architecture that can reconstruct high fidelity images. The approach
is general, and does not require that layers admit a physical interpretation.

Articulated Multi-person Tracking in the Wild

Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Bjoern Andres, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose an approach for articulated tracking of multiple
people in unconstrained videos. Our starting point is a model that resembles
existing architectures for single-frame pose estimation but is several orders
of magnitude faster. We achieve this in two ways: (1) by simplifying and
sparsifying the body-part relationship graph and leveraging recent methods for
faster inference, and (2) by offloading a substantial share of computation onto
a feed-forward convolutional architecture that is able to detect and associate
body joints of the same person even in clutter. We use this model to generate
proposals for body joint locations and formulate articulated tracking as
spatio-temporal grouping of such proposals. This allows to jointly solve the
association problem for all people in the scene by propagating evidence from
strong detections through time and enforcing constraints that each proposal can
be assigned to one person only. We report results on a public MPII Human Pose
benchmark and on a new dataset of videos with multiple people. We demonstrate
that our model achieves state-of-the-art results while using only a fraction of
time and is able to leverage temporal information to improve state-of-the-art
for crowded scenes.

ImageNet pre-trained models with batch normalization

Marcel Simon, Erik Rodner, Joachim Denzler
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone
of most state-of-the-art approaches. In this paper, we present a new set of
pre-trained models with popular state-of-the-art architectures for the Caffe
framework. The first release includes Residual Networks (ResNets) with
generation script as well as the batch-normalization-variants of AlexNet and
VGG19. All models outperform previous models with the same architecture. The
models and training code are available at
this http URL and
this https URL

A Distance Function for Comparing Straight-Edge Geometric Figures

Apoorva Honnegowda Roopa, Shrisha Rao
Comments: 29 pages, 12 figures including appendices
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)

This paper defines a distance function that measures the dissimilarity
between planar geometric figures formed with straight lines. This function can
in turn be used in partial matching of different geometric figures. For a given
pair of geometric figures that are graphically isomorphic, one function
measures the angular dissimilarity and another function measures the edge
length disproportionality. The distance function is then defined as the convex
sum of these two functions. The novelty of the presented function is that it
satisfies all properties of a distance function and the computation of the same
is done by projecting appropriate features to a cartesian plane. To compute the
deviation from the angular similarity property, the Euclidean distance between
the given angular pairs and the corresponding points on the (y=x) line is
measured. Further while computing the deviation from the edge length
proportionality property, the best fit line, for the set of edge lengths, which
passes through the origin is found, and the Euclidean distance between the
given edge length pairs and the corresponding point on a (y=mx) line is
calculated. Iterative Proportional Fitting Procedure (IPFP) is used to find
this best fit line. We demonstrate the behavior of the defined function for
some sample pairs of figures.

From One-Trick Ponies to All-Rounders: On-Demand Learning for Image Restoration

Ruohan Gao, Kristen Grauman
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While machine learning approaches to image restoration offer great promise,
current methods risk training “one-trick ponies” that perform well only for
image corruption of a particular level of difficulty—such as a certain level
of noise or blur. First, we expose the weakness of today’s one-trick pony and
demonstrate that training general models equipped to handle arbitrary levels of
corruption is indeed non-trivial. Then, we propose an on-demand learning
algorithm for training image restoration models with deep convolutional neural
networks. The main idea is to exploit a feedback mechanism to self-generate
training instances where they are needed most, thereby learning models that can
generalize across difficulty levels. On four restoration tasks—image
inpainting, pixel interpolation, image deblurring, and image denoising—and
three diverse datasets, our approach consistently outperforms both the status
quo training procedure and curriculum learning alternatives.

Human-In-The-Loop Person Re-Identification

Hanxiao Wang, Shaogang Gong, Xiatian Zhu, Tao Xiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current person re-identification (re-id) methods assume that (1) pre-labelled
training data are available for every camera pair, (2) the gallery size for
re-identification is moderate. Both assumptions scale poorly to real-world
applications when camera network size increases and gallery size becomes large.
Human verification of automatic model ranked re-id results becomes inevitable.
In this work, a novel human-in-the-loop re-id model based on Human Verification
Incremental Learning (HVIL) is formulated which does not require any
pre-labelled training data to learn a model, therefore readily scalable to new
camera pairs. This HVIL model learns cumulatively from human feedback to
provide instant improvement to re-id ranking of each probe on-the-fly enabling
the model scalable to large gallery sizes. We further formulate a Regularised
Metric Ensemble Learning (RMEL) model to combine a series of incrementally
learned HVIL models into a single ensemble model to be used when human feedback
becomes unavailable.

Highly Efficient Regression for Scalable Person Re-Identification

Hanxiao Wang, Shaogang Gong, Tao Xiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing person re-identification models are poor for scaling up to large
data required in real-world applications due to: (1) Complexity: They employ
complex models for optimal performance resulting in high computational cost for
training at a large scale; (2) Inadaptability: Once trained, they are
unsuitable for incremental update to incorporate any new data available. This
work proposes a truly scalable solution to re-id by addressing both problems.
Specifically, a Highly Efficient Regression (HER) model is formulated by
embedding the Fisher’s criterion to a ridge regression model for very fast
re-id model learning with scalable memory/storage usage. Importantly, this new
HER model supports faster than real-time incremental model updates therefore
making real-time active learning feasible in re-id with human-in-the-loop.
Extensive experiments show that such a simple and fast model not only
outperforms notably the state-of-the-art re-id methods, but also is more
scalable to large data with additional benefits to active learning for reducing
human labelling effort in re-id deployment.

Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection

Dimitrios Marmanis, Konrad Schindler, Jan Dirk Wegner, Silvano Galliani, Mihai Datcu, Uwe Stilla
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an end-to-end trainable deep convolutional neural network (DCNN)
for semantic segmentation with built-in awareness of semantically meaningful
boundaries. Semantic segmentation is a fundamental remote sensing task, and
most state-of-the-art methods rely on DCNNs as their workhorse. A major reason
for their success is that deep networks learn to accumulate contextual
information over very large windows (receptive fields). However, this success
comes at a cost, since the associated loss of effecive spatial resolution
washes out high-frequency details and leads to blurry object boundaries. Here,
we propose to counter this effect by combining semantic segmentation with
semantically informed edge detection, thus making class-boundaries explicit in
the model, First, we construct a comparatively simple, memory-efficient model
by adding boundary detection to the Segnet encoder-decoder architecture.
Second, we also include boundary detection in FCN-type models and set up a
high-end classifier ensemble. We show that boundary detection significantly
improves semantic segmentation with CNNs. Our high-end ensemble achieves > 90%
overall accuracy on the ISPRS Vaihingen benchmark.

Stereo image de-fencing using smartphones

Sankaraganesh Jonna, Sukla Satapathy, Rajiv R. Sahay
Comments: Under review as a conference paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Conventional approaches to image de-fencing have limited themselves to using
only image data in adjacent frames of the captured video of an approximately
static scene. In this work, we present a method to harness disparity using a
stereo pair of fenced images in order to detect fence pixels. Tourists and
amateur photographers commonly carry smartphones/phablets which can be used to
capture a short video sequence of the fenced scene. We model the formation of
the occluded frames in the captured video. Furthermore, we propose an
optimization framework to estimate the de-fenced image using the total
variation prior to regularize the ill-posed problem.

Message Passing Multi-Agent GANs

Communicating and sharing intelligence among agents is an important facet of
achieving Artificial General Intelligence. As a first step towards this
challenge, we introduce a novel framework for image generation: Message Passing
Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have
recently been shown to be very effective for image generation and other tasks,
these networks have been limited to mostly single generator-discriminator
networks. We show that we can obtain multi-agent GANs that communicate through
message passing to achieve better image generation. The objectives of the
individual agents in this framework are two fold: a co-operation objective and
a competing objective. The co-operation objective ensures that the message
sharing mechanism guides the other generator to generate better than itself
while the competing objective encourages each generator to generate better than
its counterpart. We analyze and visualize the messages that these GANs share
among themselves in various scenarios. We quantitatively show that the message
sharing formulation serves as a regularizer for the adversarial training.
Qualitatively, we show that the different generators capture different traits
of the underlying data distribution.

Point Pair Feature based Object Detection for Random Bin Picking

Wim Abbeloos, Toon Goedemé
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point pair features are a popular representation for free form 3D object
detection and pose estimation. In this paper, their performance in an
industrial random bin picking context is investigated. A new method to generate
representative synthetic datasets is proposed. This allows to investigate the
influence of a high degree of clutter and the presence of self similar
features, which are typical to our application. We provide an overview of
solutions proposed in literature and discuss their strengths and weaknesses. A
simple heuristic method to drastically reduce the computational complexity is
introduced, which results in improved robustness, speed and accuracy compared
to the naive approach.

Panoramic Structure from Motion via Geometric Relationship Detection

Satoshi Ikehata, Ivaylo Boyadzhiev, Qi Shan, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the problem of Structure from Motion (SfM) for indoor
panoramic image streams, extremely challenging even for the state-of-the-art
due to the lack of textures and minimal parallax. The key idea is the fusion of
single-view and multi-view reconstruction techniques via geometric relationship
detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices
to perform such detection, and our approach utilizes rough surface normal
estimates from an image-to-normal deep network to discover geometric
relationships among lines. The detected relationships provide exact geometric
constraints in our line-based linear SfM formulation. A constrained linear
least squares is used to reconstruct a 3D model and camera motions, followed by
the bundle adjustment. We have validated our algorithm on challenging datasets,
outperforming various state-of-the-art reconstruction techniques.

Deep Image Category Discovery using a Transferred Similarity Function

Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira
Comments: 13 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Automatically discovering image categories in unlabeled natural images is one
of the important goals of unsupervised learning. However, the task is
challenging and even human beings define visual categories based on a large
amount of prior knowledge. In this paper, we similarly utilize prior knowledge
to facilitate the discovery of image categories. We present a novel end-to-end
network to map unlabeled images to categories as a clustering network. We
propose that this network can be learned with contrastive loss which is only
based on weak binary pair-wise constraints. Such binary constraints can be
learned from datasets in other domains as transferred similarity functions,
which mimic a simple knowledge transfer. We first evaluate our experiments on
the MNIST dataset as a proof of concept, based on predicted similarities
trained on Omniglot, showing a 99\% accuracy which significantly outperforms
clustering based approaches. Then we evaluate the discovery performance on
Cifar-10, STL-10, and ImageNet, which achieves both state-of-the-art accuracy
and shows it can be scalable to various large natural images.

Cancerous Nuclei Detection and Scoring in Breast Cancer Histopathological Images

Pegah Faridi, Habibollah Danyali, Mohammad Sadegh Helfroush, Mojgan Akbarzadeh Jahromi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Early detection and prognosis of breast cancer are feasible by utilizing
histopathological grading of biopsy specimens. This research is focused on
detection and grading of nuclear pleomorphism in histopathological images of
breast cancer. The proposed method consists of three internal steps. First,
unmixing colors of H&E is used in the preprocessing step. Second, nuclei
boundaries are extracted incorporating the center of cancerous nuclei which are
detected by applying morphological operations and Difference of Gaussian filter
on the preprocessed image. Finally, segmented nuclei are scored to accomplish
one parameter of the Nottingham grading system for breast cancer. In this
approach, the nuclei area, chromatin density, contour regularity, and nucleoli
presence, are features for nuclear pleomorphism scoring. Experimental results
showed that the proposed algorithm, with an accuracy of 86.6%, made significant
advancement in detecting cancerous nuclei compared to existing methods in the
related literature.

Turning an Urban Scene Video into a Cinemagraph

Hang Yan, Yebin Liu, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes an algorithm that turns a regular video capturing urban
scenes into a high-quality endless animation, known as a Cinemagraph. The
creation of a Cinemagraph usually requires a static camera in a carefully
configured scene. The task becomes challenging for a regular video with a
moving camera and objects. Our approach first warps an input video into the
viewpoint of a reference camera. Based on the warped video, we propose
effective temporal analysis algorithms to detect regions with static geometry
and dynamic appearance, where geometric modeling is reliable and visually
attractive animations can be created. Lastly, the algorithm applies a sequence
of video processing techniques to produce a Cinemagraph movie. We have tested
the proposed approach on numerous challenging real scenes. To our knowledge,
this work is the first to automatically generate Cinemagraph animations from
regular movies in the wild.

Multi-way Particle Swarm Fusion

Chen Liu, Hang Yan, Pushmeet Kohli, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel MAP inference framework for Markov Random Field
(MRF) in parallel computing environments. The inference framework, dubbed Swarm
Fusion, is a natural generalization of the Fusion Move method. Every thread (in
a case of multi-threading environments) maintains and updates a solution. At
each iteration, a thread can generate arbitrary number of solution proposals
and take arbitrary number of concurrent solutions from the other threads to
perform multi-way fusion in updating its solution. The framework is general,
making popular existing inference techniques such as alpha-expansion, fusion
move, parallel alpha-expansion, and hierarchical fusion, its special cases. We
have evaluated the effectiveness of our approach against competing methods on
three problems of varying difficulties, in particular, the stereo, the optical
flow, and the layered depthmap estimation problems.

Deep Pyramidal Residual Networks with Separated Stochastic Depth

Yoshihiro Yamada, Masakazu Iwamura, Koichi Kise
Subjects: Computer Vision and Pattern Recognition (cs.CV)

On general object recognition, Deep Convolutional Neural Networks (DCNNs)
achieve high accuracy. In particular, ResNet and its improvements have broken
the lowest error rate records. In this paper, we propose a method to
successfully combine two ResNet improvements, ResDrop and PyramidNet. We
confirmed that the proposed network outperformed the conventional methods; on
CIFAR-100, the proposed network achieved an error rate of 16.18% in contrast to
PiramidNet achieving that of 18.29% and ResNeXt 17.31%.

Local Blur Mapping: Exploiting High-Level Semantics by Deep Neural Networks

Kede Ma, Huan Fu, Tongliang Liu, Zhou Wang, Dacheng Tao
Comments: Tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The human visual system excels at detecting local blur of visual images, but
the underlying mechanism is mysterious. Traditional views of blur such as
reduction in local or global high-frequency energy and loss of local phase
coherence have fundamental limitations. For example, they cannot well
discriminate flat regions from blurred ones. Here we argue that high-level
semantic information is critical in successfully detecting local blur.
Therefore, we resort to deep neural networks that are proficient in learning
high-level features and propose the first end-to-end local blur mapping
algorithm based on a fully convolutional network (FCN). We empirically show
that high-level features of deeper layers indeed play a more important role
than low-level features of shallower layers in resolving challenging
ambiguities for this task. We test the proposed method on a standard blur
detection benchmark and demonstrate that it significantly advances the
state-of-the-art (ODS F-score of 0.853). In addition, we explore the use of the
generated blur map in three applications, including blur region segmentation,
blur degree estimation, and blur magnification.

Deep Multi-Modal Image Correspondence Learning

Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Inference of correspondences between images from different modalities is an
extremely important perceptual ability that enables humans to understand and
recognize cross-modal concepts. In this paper, we consider an instance of this
problem that involves matching photographs of building interiors with their
corresponding floorplan. This is a particularly challenging problem because a
floorplan, as a stylized architectural drawing, is very different in appearance
from a color photograph. Furthermore, individual photographs by themselves
depict only a part of a floorplan (e.g., kitchen, bathroom, and living room).
We propose the use of a number of different neural network architectures for
this task, which are trained and evaluated on a novel large-scale dataset of 5
million floorplan images and 80 million associated photographs. Experimental
evaluation reveals that our neural network architectures are able to identify
visual cues that result in reliable matches across these two quite different
modalities. In fact, the trained networks are able to even outperform human
subjects in several challenging image matching problems. Our result implies
that neural networks are effective at perceptual tasks that require long
periods of reasoning even for humans to solve.

Learnable Structured Clustering Framework for Deep Metric Learning

Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Learning the representation and the similarity metric in an end-to-end
fashion with deep networks have demonstrated outstanding results for clustering
and retrieval. However, these recent approaches still suffer from the
performance degradation stemming from the local metric training procedure which
is unaware of the global structure of the embedding space.

We propose a global metric learning scheme for optimizing the deep metric
embedding with the learnable clustering function and the clustering metric
(NMI) in a novel structured prediction framework.

Our experiments on CUB200-2011, Cars196, and Stanford online products
datasets show state of the art performance both on the clustering and retrieval
tasks measured in the NMI and Recall@K evaluation metrics.

DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild

Rıza Alp Güler, George Trigeorgis, Epameinondas Antonakos, Patrick Snape, Stefanos Zafeiriou, Iasonas Kokkinos
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose to learn a mapping from image pixels into a dense
template grid through a fully convolutional network. We formulate this task as
a regression problem and train our network by leveraging upon manually
annotated facial landmarks “in-the-wild”. We use such landmarks to establish a
dense correspondence field between a three-dimensional object template and the
input image, which then serves as the ground-truth for training our regression
system. We show that we can combine ideas from semantic segmentation with
regression networks, yielding a highly-accurate “quantized regression”
architecture.

Our system, called DenseReg, allows us to estimate dense image-to-template
correspondences in a fully convolutional manner. As such our network can
provide useful correspondence information as a stand-alone system, while when
used as an initialization for Statistical Deformable Models we obtain landmark
localization results that largely outperform the current state-of-the-art on
the challenging 300W benchmark. We thoroughly evaluate our method on a host of
facial analysis tasks, and also demonstrate its use for other correspondence
estimation tasks, such as modelling of the human ear. DenseReg code is made
available at this http URL along with supplementary
materials.

Online Localization and Prediction of Actions and Interactions

Khurram Soomro, Haroon Idrees, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a person-centric and online approach to the challenging
problem of localization and prediction of actions and interactions in videos.
Typically, localization or recognition is performed in an offline manner where
all the frames in the video are processed together. This prevents timely
localization and prediction of actions and interactions – an important
consideration for many tasks including surveillance and human-machine
interaction.

In our approach, we estimate human poses at each frame and train
discriminative appearance models using the superpixels inside the pose bounding
boxes. Since the pose estimation per frame is inherently noisy, the conditional
probability of pose hypotheses at current time-step (frame) is computed using
pose estimations in the current frame and their consistency with poses in the
previous frames. Next, both the superpixel and pose-based foreground
likelihoods are used to infer the location of actors at each time through a
Conditional Random. The issue of visual drift is handled by updating the
appearance models, and refining poses using motion smoothness on joint
locations, in an online manner. For online prediction of action (interaction)
confidences, we propose an approach based on Structural SVM that operates on
short video segments, and is trained with the objective that confidence of an
action or interaction increases as time progresses. Lastly, we quantify the
performance of both detection and prediction together, and analyze how the
prediction accuracy varies as a time function of observed action (interaction)
at different levels of detection performance. Our experiments on several
datasets suggest that despite using only a few frames to localize actions
(interactions) at each time instant, we are able to obtain competitive results
to state-of-the-art offline methods.

Who is Mistaken?

Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
Comments: See project website at: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recognizing when people have false beliefs is crucial for understanding their
actions. We introduce the novel problem of identifying when people in abstract
scenes have incorrect beliefs. We present a dataset of scenes, each visually
depicting an 8-frame story in which a character has a mistaken belief. We then
create a representation of characters’ beliefs for two tasks in human action
understanding: predicting who is mistaken, and when they are mistaken.
Experiments suggest that our method for identifying mistaken characters
performs better on these tasks than simple baselines. Diagnostics on our model
suggest it learns important cues for recognizing mistaken beliefs, such as
gaze. We believe models of people’s beliefs will have many applications in
action understanding, robotics, and healthcare.

General models for rational cameras and the case of two-slit projections

Matthew Trager, Bernd Sturmfels, John Canny, Martial Hebert, Jean Ponce
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The rational camera model recently introduced in [18] provides a general
methodology for studying abstract nonlinear imaging systems and their
multi-view geometry. This paper provides a concrete embedding of rational
cameras explicitly accounting for the mapping between physical visual rays and
image points, missing in the original model. This allows us to derive simple
but general analytical expressions for direct and inverse projections, and
define primitive rational cameras equivalent under the action of various
projective transformations, leading to a generalized notion of intrinsic
coordinates in this setting. The methodology is general, but it is illustrated
concretely by an in-depth study of two-slit cameras, which we describe using a
pair of linear projections. This simple analytical form allows us to describe
models for the corresponding primitive cameras, to introduce intrinsic
parameters with a clear geometric meaning, and to define an epipolar tensor
characterizing two-view correspondences. In turn, this leads to new algorithms
for structure from motion and self-calibration.

A method for the segmentation of images based on thresholding and applied to vesicular textures

Amelia Carolina Sparavigna
Comments: Keywords: Segmentation, Edge Detection, Image Analysis, 2D Textures, Texture Functions
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In image processing, a segmentation is a process of partitioning an image
into multiple sets of pixels, that are defined as super-pixels. Each
super-pixel is characterized by a label or parameter. Here, we are proposing a
method for determining the super-pixels based on the thresholding of the image.
This approach is quite useful for studying the images showing vesicular
textures.

Pyramid Scene Parsing Network

Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene parsing is challenging for unrestricted open vocabulary and diverse
scenes. In this paper, we exploit the capability of global context information
by different-region-based context aggregation through our pyramid pooling
module together with the proposed pyramid scene parsing network (PSPNet). Our
global prior representation is effective to produce good quality results on the
scene parsing task, while PSPNet provides a superior framework for pixel-level
prediction tasks. The proposed approach achieves state-of-the-art performance
on various datasets. It came first in ImageNet scene parsing challenge 2016,
PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields new
record of mIoU score as 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes.

Multi-Label Image Classification with Regional Latent Semantic Dependencies

Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent state-of-the-art approaches of multi-label image classification
exploit the label dependencies at the global level, largely improving the
labeling ability. However, accurately predicting small objects and visual
concepts is still challenging due to the limited discrimination of the global
visual features. In this paper, we propose a Regional Latent Semantic
Dependencies model (RLSD) to address this problem. The utilized model includes
a fully convolutional localization architecture to localize the regions that
may contain multiple highly-dependent labels. The localized regions are further
sent to the recurrent neural networks (RNN) to characterize the latent semantic
dependencies at the regional level. Experimental results on several benchmark
datasets show that our proposed model achieves the best performance compared to
the state-of-the-art models, especially for predicting small objects occurred
in the images. In addition, we set up an upper bound model (RLSD+ft-RPN) using
bounding box coordinates during training, the experimental results also show
that our RLSD can approach the upper bound without using the bounding-box
annotations, which is more realistic in the real world.

End-to-end Learning of Driving Models from Large-scale Video Datasets

Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Robust perception-action models should be learned from training data with
diverse visual appearances and realistic behaviors, yet current approaches to
deep visuomotor policy learning have been generally limited to in-situ models
learned from a single vehicle or a simulation environment. We advocate learning
a generic vehicle motion model from large scale crowd-sourced video data, and
develop an end-to-end trainable architecture for learning to predict a
distribution over future vehicle egomotion from instantaneous monocular camera
observations and previous vehicle state. Our model incorporates a novel
FCN-LSTM architecture, which can be learned from large-scale crowd-sourced
vehicle action data, and leverages available scene segmentation side tasks to
improve performance under a privileged learning paradigm.

Joint Visual Denoising and Classification using Deep Learning

Gang Chen, Yawei Li, Sargur N. Srihari
Comments: 5 pages, 7 figures, ICIP 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual restoration and recognition are traditionally addressed in pipeline
fashion, i.e. denoising followed by classification. Instead, observing
correlations between the two tasks, for example clearer image will lead to
better categorization and vice visa, we propose a joint framework for visual
restoration and recognition for handwritten images, inspired by advances in
deep autoencoder and multi-modality learning. Our model is a 3-pathway deep
architecture with a hidden-layer representation which is shared by multi-inputs
and outputs, and each branch can be composed of a multi-layer deep model. Thus,
visual restoration and classification can be unified using shared
representation via non-linear mapping, and model parameters can be learnt via
backpropagation. Using MNIST and USPS data corrupted with structured noise, the
proposed framework performs at least 20\% better in classification than
separate pipelines, as well as clearer recovered images. The noise model and
the reproducible source code is available at
{url{this https URL}}.

Skin Cancer Detection and Tracking using Data Synthesis and Deep Learning

Yunzhu Li, Andre Esteva, Brett Kuprel, Rob Novoa, Justin Ko, Sebastian Thrun
Comments: 4 pages, 5 figures, Yunzhu Li and Andre Esteva contributed equally to this work
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense object detection and temporal tracking are needed across applications
domains ranging from people-tracking to analysis of satellite imagery over
time. The detection and tracking of malignant skin cancers and benign moles
poses a particularly challenging problem due to the general uniformity of large
skin patches, the fact that skin lesions vary little in their appearance, and
the relatively small amount of data available. Here we introduce a novel data
synthesis technique that merges images of individual skin lesions with
full-body images and heavily augments them to generate significant amounts of
data. We build a convolutional neural network (CNN) based system, trained on
this synthetic data, and demonstrate superior performance to traditional
detection and tracking techniques. Additionally, we compare our system to
humans trained with simple criteria. Our system is intended for potential
clinical use to augment the capabilities of healthcare providers. While
domain-specific, we believe the methods invoked in this work will be useful in
applying CNNs across domains that suffer from limited data availability.

Word Recognition with Deep Conditional Random Fields

Gang Chen, Yawei Li, Sargur N. Srihari
Comments: 5 pages, published in ICIP 2016. arXiv admin note: substantial text overlap with arXiv:1412.3397
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recognition of handwritten words continues to be an important problem in
document analysis and recognition. Existing approaches extract hand-engineered
features from word images–which can perform poorly with new data sets.
Recently, deep learning has attracted great attention because of the ability to
learn features from raw data. Moreover they have yielded state-of-the-art
results in classification tasks including character recognition and scene
recognition. On the other hand, word recognition is a sequential problem where
we need to model the correlation between characters. In this paper, we propose
using deep Conditional Random Fields (deep CRFs) for word recognition.
Basically, we combine CRFs with deep learning, in which deep features are
learned and sequences are labeled in a unified framework. We pre-train the deep
structure with stacked restricted Boltzmann machines (RBMs) for feature
learning and optimize the entire network with an online learning algorithm. The
proposed model was evaluated on two datasets, and seen to perform significantly
better than competitive baseline models. The source code is available at
this https URL

Learning to Segment Object Proposals via Recursive Neural Networks

Tianshui Chen, Liang Lin, Xian Wu, Xiaonan Luo
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To avoid the exhaustive search over locations and scales, current
state-of-the-art object detection systems usually involve a crucial component
generating a batch of candidate object proposals from images. In this paper, we
present a simple yet effective approach for segmenting object proposals via a
deep architecture of recursive neural networks (RNNs), which hierarchically
groups regions for detecting object candidates over scales. Unlike traditional
methods that mainly adopt fixed similarity measures for merging regions or
finding object proposals, our approach adaptively learns the region merging
similarity and the objectness measure during the process of hierarchical region
grouping. Specifically, guided by a structured loss, the RNN model jointly
optimizes the cross-region similarity metric with the region merging process as
well as the objectness prediction. During inference of the object proposal
generation, we introduce randomness into the greedy search to cope with the
ambiguity of grouping regions. Extensive experiments on standard benchmarks,
e.g., PASCAL VOC and ImageNet, suggest that our approach is capable of
producing object proposals with high recall while well preserving the object
boundaries and outperforms other existing methods in both accuracy and
efficiency.

SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving

Bichen Wu, Forrest Iandola, Peter H. Jin, Kurt Keutzer
Comments: The supplementary material of this paper, which discusses the energy efficiency of SqueezeDet, is attached after the main paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection is a crucial task for autonomous driving. In addition to
requiring high accuracy to ensure safety, object detection for autonomous
driving also requires real-time inference speed to guarantee prompt vehicle
control, as well as small model size and energy efficiency to enable embedded
system deployment. In this work, we propose SqueezeDet, a fully convolutional
neural network for object detection that aims to simultaneously satisfy all of
the above constraints. In our network we use convolutional layers not only to
extract feature maps, but also as the output layer to compute bounding boxes
and class probabilities. The detection pipeline of our model only contains a
single forward pass of a neural network, thus it is extremely fast. Our model
is fully-convolutional, which leads to small model size and better energy
efficiency. Finally, our experiments show that our model is very accurate,
achieving state-of-the-art accuracy on the KITTI benchmark.

Semi-Automated Annotation of Discrete States in Large Video Datasets

Lex Fridman, Bryan Reimer
Comments: To be presented at AAAI 2017. arXiv admin note: text overlap with arXiv:1508.04028
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a framework for semi-automated annotation of video frames where
the video is of an object that at any point in time can be labeled as being in
one of a finite number of discrete states. A Hidden Markov Model (HMM) is used
to model (1) the behavior of the underlying object and (2) the noisy
observation of its state through an image processing algorithm. The key insight
of this approach is that the annotation of frame-by-frame video can be reduced
from a problem of labeling every single image to a problem of detecting a
transition between states of the underlying objected being recording on video.
The performance of the framework is evaluated on a driver gaze classification
dataset composed of 16,000,000 images that were fully annotated over 6,000
hours of direct manual annotation labor. On this dataset, we achieve a 13x
reduction in manual annotation for an average accuracy of 99.1% and a 84x
reduction for an average accuracy of 91.2%.

Areas of Attention for Image Captioning

Marco Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose “Areas of Attention”, a novel attention-based model for automatic
image caption generation. Our approach models the interplay between the state
of the RNN, image region descriptors and word embedding vectors by three
pairwise interactions. It allows association of caption words with local visual
appearances rather than with descriptors of the entire scene. This enables
better generalization to complex scenes not seen during training. Our model is
agnostic to the type of attention areas, and we instantiate it using regions
based on CNN activation grids, object proposals, and spatial transformer
networks. Our results show that all components of our model contribute to
obtain state-of-the-art performance on the MSCOCO dataset. In addition, our
results indicate that attention areas are correctly associated to meaningful
latent semantic structure in the generated captions.

Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework

Yuankai Wu, Huachun Tan
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning approaches have reached a celebrity status in artificial
intelligence field, its success have mostly relied on Convolutional Networks
(CNN) and Recurrent Networks. By exploiting fundamental spatial properties of
images and videos, the CNN always achieves dominant performance on visual
tasks. And the Recurrent Networks (RNN) especially long short-term memory
methods (LSTM) can successfully characterize the temporal correlation, thus
exhibits superior capability for time series tasks. Traffic flow data have
plentiful characteristics on both time and space domain. However, applications
of CNN and LSTM approaches on traffic flow are limited. In this paper, we
propose a novel deep architecture combined CNN and LSTM to forecast future
traffic flow (CLTFP). An 1-dimension CNN is exploited to capture spatial
features of traffic flow, and two LSTMs are utilized to mine the short-term
variability and periodicities of traffic flow. Given those meaningful features,
the feature-level fusion is performed to achieve short-term forecasting. The
proposed CLTFP is compared with other popular forecasting methods on an open
datasets. Experimental results indicate that the CLTFP has considerable
advantages in traffic flow forecasting. in additional, the proposed CLTFP is
analyzed from the view of Granger Causality, and several interesting properties
of CLTFP are discovered and discussed .

A Non-Local Means Approach for Gaussian Noise Removal from Images using a Modified Weighting Kernel

Mojtaba Kazemi, Ehsan Mohammadi.P, Parichehr shahidi sadeghi, Mohamad B. Menhaj
Comments: 2017 25th Iranian Conference on Electrical Engineering (ICEE)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Gaussian noise removal is an interesting area in digital image processing not
only to improve the visual quality, but for its impact on other post-processing
algorithms like image registration or segmentation. Many presented
state-of-the-art denoising methods are based on the self-similarity or
patch-based image processing. Specifically, Non-Local Means (NLM) as a
patch-based filter has gained increasing attention in recent years.
Essentially, this filter tends to obtain the noise-less signal value by
computing the Gaussian-weighted Euclidean distance between the patch
under-processing and other patches inside the image. However, the NLM filter is
sensitive to the outliers (pixels that their intensity values are far away from
other pixels) inside the patch, meaning that the pixels with the symmetric
locations in the patch are assigned the same weight. This can lead to
sub-optimal denoising performance when the destructive nature of noise
generates some outliers inside patches. In this paper, we propose a new
weighting approach to modify the Gaussian kernel of the NLM filter. Our
approach employs the geometric distance between image intensities to come up
with new weights for each pixel of a patch, lowering the impact of outliers on
the denoising performance. Experiments on a set of standard images and
different noise levels show that our proposed method outperforms the other
compared denoising filters.

Mining Spatio-temporal Data on Industrialization from Historical Registries

David Berenbaum, Dwyer Deighan, Thomas Marlow, Ashley Lee, Scott Frickel, Mark Howison
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Despite the growing availability of big data in many fields, historical data
on socioevironmental phenomena are often not available due to a lack of
automated and scalable approaches for collecting, digitizing, and assembling
them. We have developed a data-mining method for extracting tabulated, geocoded
data from printed directories. While scanning and optical character recognition
(OCR) can digitize printed text, these methods alone do not capture the
structure of the underlying data. Our pipeline integrates both page layout
analysis and OCR to extract tabular, geocoded data from structured text. We
demonstrate the utility of this method by applying it to scanned manufacturing
registries from Rhode Island that record 41 years of industrial land use. The
resulting spatio-temporal data can be used for socioenvironmental analyses of
industrialization at a resolution that was not previously possible. In
particular, we find strong evidence for the dispersion of manufacturing from
the urban core of Providence, the state’s capital, along the Interstate 95
corridor to the north and south.

Ensembles of Generative Adversarial Networks

Yaxing Wang, Lichao Zhang, Joost van de Weijer
Comments: accepted NIPS 2016 Workshop on Adversarial Training
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ensembles are a popular way to improve results of discriminative CNNs. The
combination of several networks trained starting from different initializations
improves results significantly. In this paper we investigate the usage of
ensembles of GANs. The specific nature of GANs opens up several new ways to
construct ensembles. The first one is based on the fact that in the minimax
game which is played to optimize the GAN objective the generator network keeps
on changing even after the network can be considered optimal. As such ensembles
of GANs can be constructed based on the same network initialization but just
taking models which have different amount of iterations. These so-called self
ensembles are much faster to train than traditional ensembles. The second
method, called cascade GANs, redirects part of the training data which is badly
modeled by the first GAN to another GAN. In experiments on the CIFAR10 dataset
we show that ensembles of GANs obtain model probability distributions which
better model the data distribution. In addition, we show that these improved
results can be obtained at little additional computational cost.

Deep Learning with Energy-efficient Binary Gradient Cameras

Suren Jayasuriya, Orazio Gallo, Jinwei Gu, Jan Kautz
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Power consumption is a critical factor for the deployment of embedded
computer vision systems. We explore the use of computational cameras that
directly output binary gradient images to reduce the portion of the power
consumption allocated to image sensing. We survey the accuracy of binary
gradient cameras on a number of computer vision tasks using deep learning.
These include object recognition, head pose regression, face detection, and
gesture recognition. We show that, for certain applications, accuracy can be on
par or even better than what can be achieved on traditional images. We are also
the first to recover intensity information from binary spatial gradient
images–useful for applications with a human observer in the loop, such as
surveillance. Our results, which we validate with a prototype binary gradient
camera, point to the potential of gradient-based computer vision systems.

Food Image Recognition by Using Convolutional Neural Networks (CNNs)

Yuzhen Lu
Comments: 6 pages, 5 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Food image recognition is one of the promising applications of visual object
recognition in computer vision. In this study, a small-scale dataset consisting
of 5822 images of ten categories and a five-layer CNN was constructed to
recognize these images. The bag-of-features (BoF) model coupled with support
vector machine was first tested as comparison, resulting in an overall accuracy
of 56%, while the CNN performed much better with an overall accuracy of 74%.
Data expansion techniques were applied to increase the size of training images,
which achieved a significantly improved accuracy of more than 90% and prevent
the overfitting issue that occurred to the CNN without using data expansion.
Further improvement is within reach by collecting more images and optimizing
the network architecture and relevant hyper-parameters.

Semi-supervised learning of deep metrics for stereo reconstruction

Stepan Tulyakov, Anton Ivanov, Francois Fleuret
Comments: 11 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep-learning metrics have recently demonstrated extremely good performance
to match image patches for stereo reconstruction. However, training such
metrics requires large amount of labeled stereo images, which can be difficult
or costly to collect for certain applications. The main contribution of our
work is a new semi-supervised method for learning deep metrics from unlabeled
stereo images, given coarse information about the scenes and the optical
system. Our method alternatively optimizes the metric with a standard
stochastic gradient descent, and applies stereo constraints to regularize its
prediction. Experiments on reference data-sets show that, for a given network
architecture, training with this new method without ground-truth produces a
metric with performance as good as state-of-the-art baselines trained with the
said ground-truth. This work has three practical implications. Firstly, it
helps to overcome limitations of training sets, in particular noisy ground
truth. Secondly it allows to use much more training data during learning.
Thirdly, it allows to tune deep metric for a particular stereo system, even if
ground truth is not available.

End-to-end learning of brain tissue segmentation from imperfect labeling

Alex Fedorov, Jeremy Johnson, Eswar Damaraju, Alexei Ozerin, Vince Calhoun, Sergey Plis
Comments: 7 pages, 8 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Segmenting a structural magnetic resonance imaging (MRI) scan is an important
pre-processing step for analytic procedures and subsequent inferences about
longitudinal tissue changes. Manual segmentation defines the current gold
standard in quality but is prohibitively expensive. Automatic approaches are
computationally intensive, incredibly slow at scale, and error prone due to
usually involving many potentially faulty intermediate steps. In order to
streamline the segmentation, we introduce a deep learning model that is based
on volumetric dilated convolutions, subsequently reducing both processing time
and errors. Compared to its competitors, the model has a reduced set of
parameters and thus is easier to train and much faster to execute. The contrast
in performance between the dilated network and its competitors becomes obvious
when both are tested on a large dataset of unprocessed human brain volumes. The
dilated network consistently outperforms not only another state-of-the-art deep
learning approach, the up convolutional network, but also the ground truth on
which it was trained. Not only can the incredible speed of our model make large
scale analyses much easier but we also believe it has great potential in a
clinical setting where, with little to no substantial delay, a patient and
provider can go over test results.

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Semantic sparsity is a common challenge in structured visual classification
problems; when the output space is complex, the vast majority of the possible
predictions are rarely, if ever, seen in the training set. This paper studies
semantic sparsity in situation recognition, the task of producing structured
summaries of what is happening in images, including activities, objects and the
roles objects play within the activity. For this problem, we find empirically
that most object-role combinations are rare, and current state-of-the-art
models significantly underperform in this sparse data regime. We avoid many
such errors by (1) introducing a novel tensor composition function that learns
to share examples across role-noun combinations and (2) semantically augmenting
our training data with automatically gathered examples of rarely observed
outputs using web data. When integrated within a complete CRF-based structured
prediction model, the tensor-based approach outperforms existing state of the
art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role
accuracy, respectively. Adding 5 million images with our semantic augmentation
techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb
and noun-role accuracy.

Parameter Compression of Recurrent Neural Networks and Degredation of Short-term Memory

Jonathan A. Cox
Comments: Submitted to IJCNN 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The significant computational costs of deploying neural networks in
large-scale or resource constrained environments, such as data centers and
mobile devices, has spurred interest in model compression, which can achieve a
reduction in both arithmetic operations and storage memory. Several techniques
have been proposed for reducing or compressing the parameters for feed-forward
and convolutional neural networks, but less is understood about the effect of
parameter compression on recurrent neural networks (RNN). In particular, the
extent to which the recurrent parameters can be compressed and the impact on
short-term memory performance, is not well understood. In this paper, we study
the effect of complexity reduction, through singular value decomposition rank
reduction, on RNN and minimal gated recurrent unit (MGRU) networks for several
tasks. We show that considerable rank reduction is possible when compressing
recurrent weights, even without fine tuning. Furthermore, we propose a
perturbation model for the effect of general perturbations, such as a
compression, on the recurrent parameters of RNNs. The model is tested against a
noiseless memorization experiment that elucidates the short-term memory
performance. In this way, we demonstrate that the effect of compression of
recurrent parameters is dependent on the degree of temporal coherence present
in the data and task. This work can guide on-the-fly RNN compression for novel
environments or tasks, and provides insight for applying RNN compression in
low-power devices, such as hearing aids.

Procedural Generation of Videos to Train Deep Action Recognition Networks

César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña
Comments: Under review at CVPR 2017. this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning for human action recognition in videos is making significant
progress, but is slowed down by its dependency on expensive manual labeling of
large video collections. In this work, we investigate the generation of
synthetic training data for action recognition, as it has recently shown
promising results for a variety of other computer vision tasks. We propose an
interpretable parametric generative model of human action videos that relies on
procedural generation and other computer graphics techniques of modern game
engines.We generate a diverse, realistic, and physically plausible dataset of
human action videos, called PHAV for “Procedural Human Action Videos”. It
contains a total of 37,536 videos, more than 1,000 examples for each action of
35 categories. Our approach is not limited to existing motion capture
sequences, and we procedurally define 14 synthetic actions. We introduce a deep
multi-task representation learning architecture to mix synthetic and real
videos, even if the action categories differ. Our experiments on the UCF101 and
HMDB51 benchmarks suggest that combining our large set of synthetic videos with
small real-world datasets can boost recognition performance, significantly
outperforming fine-tuning state-of-the-art unsupervised generative models of
videos.

Multi-resolution Data Fusion for Super-Resolution Electron Microscopy

Suhas Sreehari, S. V. Venkatakrishnan, Katherine L. Bouman, Jeffrey P. Simmons, Lawrence F. Drummy, Charles A. Bouman
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Perhaps surprisingly, the total electron microscopy (EM) data collected to
date is less than a cubic millimeter. Consequently, there is an enormous demand
in the materials and biological sciences to image at greater speed and lower
dosage, while maintaining resolution. Traditional EM imaging based on
homogeneous raster-order scanning severely limits the volume of high-resolution
data that can be collected, and presents a fundamental limitation to
understanding physical processes such as material deformation, crack
propagation, and pyrolysis.

We introduce a novel multi-resolution data fusion (MDF) method for
super-resolution computational EM. Our method combines innovative data
acquisition with novel algorithmic techniques to dramatically improve the
resolution/volume/speed trade-off. The key to our approach is to collect the
entire sample at low resolution, while simultaneously collecting a small
fraction of data at high resolution. The high-resolution measurements are then
used to create a material-specific patch-library that is used within the
“plug-and-play” framework to dramatically improve super-resolution of the
low-resolution data. We present results using FEI electron microscope data that
demonstrate super-resolution factors of 4x, 8x, and 16x, while substantially
maintaining high image quality and reducing dosage.

Artificial Intelligence

The Complexity of Bayesian Networks Specified by Propositional and Relational Languages

Fabio Gagliardi Cozman, Denis Deratani Mauá
Subjects: Artificial Intelligence (cs.AI)

We examine the complexity of inference in Bayesian networks specified by
logical languages. We consider representations that range from fragments of
propositional logic to function-free first-order logic with equality; in doing
so we cover a variety of plate models and of probabilistic relational models.
We study the complexity of inferences when network, query and domain are the
input (the inferential and the combined complexity), when the network is fixed
and query and domain are the input (the query/data complexity), and when the
network and query are fixed and the domain is the input (the domain
complexity). We draw connections with probabilistic databases and liftability
results, and obtain complexity classes that range from polynomial to
exponential levels.

Deep Learning of Robotic Tasks using Strong and Weak Human Supervision

Bar Hilleli, Ran El-Yaniv
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

We propose a scheme for training a computerized agent to perform complex
human tasks such as highway steering. The scheme resembles natural
teaching-learning processes used by humans to teach themselves and each other
complex tasks, and consists of the following four stages. In the first stage
the agent learns by itself an informative low-dimensional representations of
raw input signals in an unsupervised learning manner. In the second stage the
agent learns to mimic the human instructor using supervised learning so as to
reach a basic performance level; the third stage is devoted to learning an
instantaneous reward model. Here, the (human) instructor observes (possibly in
real time) the agent performing the task and provides reward feedback. During
this stage the agent monitors both itself and the instructor feedback and
learns a reward model using supervised learning. This stage terminates when the
reward model is sufficiently accurate. In the last stage a reinforcement
learning algorithm is deployed to optimize the agent policy. The guidance
reward signal in the reinforcement learning algorithm relies on the previously
learned reward model. As a proof of concept for the proposed scheme, we
designed a system consisting of deep convolutional neural networks, and applied
it to successfully learn a computerized agent capable of autonomous highway
steering over the well-known racing game Assetto Corsa.

Algorithmic Songwriting with ALYSIA

Margareta Ackerman, David Loker
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

This paper introduces ALYSIA: Automated LYrical SongwrIting Application.
ALYSIA is based on a machine learning model using Random Forests, and we
discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA
was used to create original pop songs that were subsequently recorded and
produced. Finally, we discuss our vision for the future of Automated
Songwriting for both co-creative and autonomous systems.

DeepBach: a Steerable Model for Bach chorales generation

Gaëtan Hadjeres, François Pachet
Comments: 20 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)

The composition of polyphonic chorale music in the style of J.S Bach has
represented a major challenge in automatic music composition over the last
decades. The art of Bach chorales composition involves combining four-part
harmony with characteristic rhythmic patterns and typical melodic movements to
produce musical phrases which begin, evolve and end (cadences) in a harmonious
way. To our knowledge, no model so far was able to solve all these problems
simultaneously using an agnostic machine-learning approach. This paper
introduces DeepBach, a statistical model aimed at modeling polyphonic music and
specifically four parts, hymn-like pieces. We claim that, after being trained
on the chorale harmonizations by Johann Sebastian Bach, our model is capable of
generating highly convincing chorales in the style of Bach. We evaluate how
indistinguishable our generated chorales are from existing Bach chorales with a
listening test. The results corroborate our claim. A key strength of DeepBach
is that it is agnostic and flexible. Users can constrain the generation by
imposing some notes, rhythms or cadences in the generated score. This allows
users to reharmonize user-defined melodies. DeepBach’s generation is fast,
making it usable for interactive music composition applications. Several
generation examples are provided and discussed from a musical point of view.

RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting

Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Adam Witkowski, Piotr Wygocki
Comments: 6 pages, 1 figure, 2 tables, Description of 2nd place winning solution of RecSys 2016 Challange. To be published in RecSys’16 Challange Proceedings
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Software Engineering (cs.SE)

We present the Mim-Solution’s approach to the RecSys Challenge 2016, which
ranked 2nd. The goal of the competition was to prepare job recommendations for
the users of the website Xing.com.

Our two phase algorithm consists of candidate selection followed by the
candidate ranking. We ranked the candidates by the predicted probability that
the user will positively interact with the job offer. We have used Gradient
Boosting Decision Trees as the regression tool.

Using Discourse Signals for Robust Instructor Intervention Prediction

Muthu Kumar Chandrasekaran, Carrie Demmans Epp, Min-Yen Kan, Diane Litman
Comments: To appear in proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, USA
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

We tackle the prediction of instructor intervention in student posts from
discussion forums in Massive Open Online Courses (MOOCs). Our key finding is
that using automatically obtained discourse relations improves the prediction
of when instructors intervene in student discussions, when compared with a
state-of-the-art, feature-rich baseline. Our supervised classifier makes use of
an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags
that represent in-post discourse features. We show PDTB relation-based features
increase the robustness of the classifier and complement baseline features in
recalling more diverse instructor intervention patterns. In comprehensive
experiments over 14 MOOC offerings from several disciplines, the PDTB discourse
features improve performance on average. The resultant models are less
dependent on domain-specific vocabulary, allowing them to better generalize to
new courses.

A Matrix Splitting Perspective on Planning with Options

Pierre-Luc Bacon, Doina Precup
Comments: Accepted at the Continual Learning and Deep Networks Workshop, NIPS 2016
Subjects: Artificial Intelligence (cs.AI)

We show that the Bellman operator underlying the options framework leads to a
matrix splitting, an approach traditionally used to speed up convergence of
iterative solvers for large linear systems of equations. Based on standard
comparison theo- rems for matrix splittings, we then show how the asymptotic
rate of convergence varies as a function of the inherent timescales of the
options. This new perspective highlights a trade-off between asymptotic
performance and the cost of computation associated with building a good set of
options.

N-gram Opcode Analysis for Android Malware Detection

BooJoong Kang, Suleiman Y. Yerima, Sakir Sezer, Kieran McLaughlin
Journal-ref: International Journal on Cyber Situational Awareness, Vol. 1, No.
1, pp231-255 (2016)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Android malware has been on the rise in recent years due to the increasing
popularity of Android and the proliferation of third party application markets.
Emerging Android malware families are increasingly adopting sophisticated
detection avoidance techniques and this calls for more effective approaches for
Android malware detection. Hence, in this paper we present and evaluate an
n-gram opcode features based approach that utilizes machine learning to
identify and categorize Android malware. This approach enables automated
feature discovery without relying on prior expert or domain knowledge for
pre-determined features. Furthermore, by using a data segmentation technique
for feature selection, our analysis is able to scale up to 10-gram opcodes. Our
experiments on a dataset of 2520 samples showed an f-measure of 98% using the
n-gram opcode based approach. We also provide empirical findings that
illustrate factors that have probable impact on the overall n-gram opcodes
performance trends.

Proportional Rankings

Piotr Skowron, Martin Lackner, Markus Brill, Dominik Peters, Edith Elkind
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

In this paper we extend the principle of proportional representation to
rankings. We consider the setting where alternatives need to be ranked based on
approval preferences. In this setting, proportional representation requires
that cohesive groups of voters are represented proportionally in each initial
segment of the ranking. Proportional rankings are desirable in situations where
initial segments of different lengths may be relevant, e.g., hiring decisions
(if it is unclear how many positions are to be filled), the presentation of
competing proposals on a liquid democracy platform (if it is unclear how many
proposals participants are taking into consideration), or recommender systems
(if a ranking has to accommodate different user types). We study the
proportional representation provided by several ranking methods and prove
theoretical guarantees. Furthermore, we experimentally evaluate these methods
and present preliminary evidence as to which methods are most suitable for
producing proportional rankings.

A New Type-II Fuzzy Logic Based Controller for Non-linear Dynamical Systems with Application to a 3-PSP Parallel Robot

Hamid Reza Hassanzadeh
Comments: Master’s thesis
Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI)

The concept of uncertainty is posed in almost any complex system including
parallel robots as an outstanding instance of dynamical robotics systems. As
suggested by the name, uncertainty, is some missing information that is beyond
the knowledge of human thus we may tend to handle it properly to minimize the
side-effects through the control process.

Type-II fuzzy logic has shown its superiority over traditional fuzzy logic
when dealing with uncertainty. Type-II fuzzy logic controllers are however
newer and more promising approaches that have been recently applied to various
fields due to their significant contribution especially when noise (as an
important instance of uncertainty) emerges. During the design of Type-I fuzzy
logic systems, we presume that we are almost certain about the fuzzy membership
functions which is not true in many cases. Thus T2FLS as a more realistic
approach dealing with practical applications might have a lot to offer. Type-II
fuzzy logic takes into account a higher level of uncertainty, in other words,
the membership grade for a type-II fuzzy variable is no longer a crisp number
but rather is itself a type-I linguistic term. In this thesis the effects of
uncertainty in dynamic control of a parallel robot is considered. More
specifically, it is intended to incorporate the Type-II Fuzzy Logic paradigm
into a model based controller, the so-called computed torque control method,
and apply the result to a 3 degrees of freedom parallel manipulator.

…

Message Passing Multi-Agent GANs

Communicating and sharing intelligence among agents is an important facet of
achieving Artificial General Intelligence. As a first step towards this
challenge, we introduce a novel framework for image generation: Message Passing
Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have
recently been shown to be very effective for image generation and other tasks,
these networks have been limited to mostly single generator-discriminator
networks. We show that we can obtain multi-agent GANs that communicate through
message passing to achieve better image generation. The objectives of the
individual agents in this framework are two fold: a co-operation objective and
a competing objective. The co-operation objective ensures that the message
sharing mechanism guides the other generator to generate better than itself
while the competing objective encourages each generator to generate better than
its counterpart. We analyze and visualize the messages that these GANs share
among themselves in various scenarios. We quantitatively show that the message
sharing formulation serves as a regularizer for the adversarial training.
Qualitatively, we show that the different generators capture different traits
of the underlying data distribution.

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version)

Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
Comments: Published in NAMPI workshop at NIPS 2016. Short version of arXiv:1611.00020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Extending the success of deep neural networks to natural language
understanding and symbolic reasoning requires complex operations and external
memory. Recent neural program induction approaches have attempted to address
this problem, but are typically limited to differentiable memory, and
consequently cannot scale beyond small synthetic tasks. In this work, we
propose the Manager-Programmer-Computer framework, which integrates neural
networks with non-differentiable memory to support abstract, scalable and
precise operations through a friendly neural computer interface. Specifically,
we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
neural “programmer”, and a non-differentiable “computer” that is a Lisp
interpreter with code assist. To successfully apply REINFORCE for training, we
augment it with approximate gold programs found by an iterative maximum
likelihood training process. NSM is able to learn a semantic parser from weak
supervision over a large knowledge base. It achieves new state-of-the-art
performance on WebQuestionsSP, a challenging semantic parsing dataset, with
weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
does not rely on feature engineering or domain specific knowledge.

Representing Independence Models with Elementary Triplets

Jose M. Peña
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

In an independence model, the triplets that represent conditional
independences between singletons are called elementary. It is known that the
elementary triplets represent the independence model unambiguously under some
conditions. In this paper, we show how this representation helps performing
some operations with independence models, such as finding the dominant triplets
or a minimal independence map of an independence model, or computing the union
or intersection of a pair of independence models, or performing causal
reasoning. For the latter, we rephrase in terms of conditional independences
some of Pearl’s results for computing causal effects.

Enhancing Use Case Points Estimation Method Using Soft Computing Techniques

Ali Bou Nassif, Luiz Fernando Capretz, Danny Ho
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Learning (cs.LG)

Software estimation is a crucial task in software engineering. Software
estimation encompasses cost, effort, schedule, and size. The importance of
software estimation becomes critical in the early stages of the software life
cycle when the details of software have not been revealed yet. Several
commercial and non-commercial tools exist to estimate software in the early
stages. Most software effort estimation methods require software size as one of
the important metric inputs and consequently, software size estimation in the
early stages becomes essential. One of the approaches that has been used for
about two decades in the early size and effort estimation is called use case
points. Use case points method relies on the use case diagram to estimate the
size and effort of software projects. Although the use case points method has
been widely used, it has some limitations that might adversely affect the
accuracy of estimation. This paper presents some techniques using fuzzy logic
and neural networks to improve the accuracy of the use case points method.
Results showed that an improvement up to 22% can be obtained using the proposed
approach.

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Semantic sparsity is a common challenge in structured visual classification
problems; when the output space is complex, the vast majority of the possible
predictions are rarely, if ever, seen in the training set. This paper studies
semantic sparsity in situation recognition, the task of producing structured
summaries of what is happening in images, including activities, objects and the
roles objects play within the activity. For this problem, we find empirically
that most object-role combinations are rare, and current state-of-the-art
models significantly underperform in this sparse data regime. We avoid many
such errors by (1) introducing a novel tensor composition function that learns
to share examples across role-noun combinations and (2) semantically augmenting
our training data with automatically gathered examples of rarely observed
outputs using web data. When integrated within a complete CRF-based structured
prediction model, the tensor-based approach outperforms existing state of the
art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role
accuracy, respectively. Adding 5 million images with our semantic augmentation
techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb
and noun-role accuracy.

Toward Efficient Task Assignment and Motion Planning for Large Scale Underwater Mission

Somaiyeh Mahmoud Zadeh, David MW Powers, Karl Sammut, Amirmehdi Yazdani
Comments: arXiv admin note: substantial text overlap with arXiv:1604.03308
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

An Autonomous Underwater Vehicle (AUV) needs to acquire a certain degree of
autonomy for any particular underwater mission to fulfill the mission
objectives successfully and ensure its safety in all stages of the mission in a
large scale operating filed. In this paper, a novel combinatorial
conflict-free-task assignment strategy consisting an interactive engagement of
a local path planner and an adaptive global route planner, is introduced. The
method is established upon the heuristic search potency of the Particle Swarm
Optimisation (PSO) algorithm to address the discrete nature of routing-task
assignment approach and the complexity of NP-hard path planning problem. The
proposed hybrid method is highly efficient for having a reactive guidance
framework that guarantees successful completion of missions specifically in
cluttered environments. To examine the performance of the method in a context
of mission productivity, mission time management and vehicle safety, a series
of simulation studies are undertaken. The results of simulations declare that
the proposed method is reliable and robust, particularly in dealing with
uncertainties, and it can significantly enhance the level of vehicle’s autonomy
by relying on its reactive nature and capability of providing fast feasible
solutions.

Information Retrieval

We used Neural Networks to Detect Clickbaits: You won't believe what happened Next!

Ankesh Anand, Tanmoy Chakraborty, Noseong Park
Comments: Accepted to the European Conference on Information Retrieval (ECIR), 2017
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Online content publishers often use catchy headlines for their articles in
order to attract users to their websites. These headlines, popularly known as
clickbaits, exploit a user’s curiosity gap and lure them to click on links that
often disappoint them. Existing methods for automatically detecting clickbaits
rely on heavy feature engineering and domain knowledge. Here, we introduce a
neural network architecture based on Recurrent Neural Networks for detecting
clickbaits. Our model relies on distributed word representations learned from a
large unannotated corpora, and character embeddings learned via Convolutional
Neural Networks. Experimental results on a dataset of news headlines show that
our model outperforms existing techniques for clickbait detection with an
accuracy of 0.98 with F1-score of 0.98 and ROC-AUC of 0.99.

Mining Spatio-temporal Data on Industrialization from Historical Registries

David Berenbaum, Dwyer Deighan, Thomas Marlow, Ashley Lee, Scott Frickel, Mark Howison
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Despite the growing availability of big data in many fields, historical data
on socioevironmental phenomena are often not available due to a lack of
automated and scalable approaches for collecting, digitizing, and assembling
them. We have developed a data-mining method for extracting tabulated, geocoded
data from printed directories. While scanning and optical character recognition
(OCR) can digitize printed text, these methods alone do not capture the
structure of the underlying data. Our pipeline integrates both page layout
analysis and OCR to extract tabular, geocoded data from structured text. We
demonstrate the utility of this method by applying it to scanned manufacturing
registries from Rhode Island that record 41 years of industrial land use. The
resulting spatio-temporal data can be used for socioenvironmental analyses of
industrialization at a resolution that was not previously possible. In
particular, we find strong evidence for the dispersion of manufacturing from
the urban core of Providence, the state’s capital, along the Interstate 95
corridor to the north and south.

RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting

We present the Mim-Solution’s approach to the RecSys Challenge 2016, which
ranked 2nd. The goal of the competition was to prepare job recommendations for
the users of the website Xing.com.

Our two phase algorithm consists of candidate selection followed by the
candidate ranking. We ranked the candidates by the predicted probability that
the user will positively interact with the job offer. We have used Gradient
Boosting Decision Trees as the regression tool.

Computation and Language

Mapping the Dialog Act Annotations of the LEGO Corpus into the Communicative Functions of ISO 24617-2

Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos
Comments: 20 pages, 2 figures
Subjects: Computation and Language (cs.CL)

In this paper we present strategies for mapping the dialog act annotations of
the LEGO corpus into the communicative functions of the ISO 24617-2 standard.
Using these strategies, we obtained an additional 347 dialogs annotated
according to the standard. This is particularly important given the reduced
amount of existing data in those conditions due to the recency of the standard.
Furthermore, these are dialogs from a widely explored corpus for dialog related
tasks. However, its dialog annotations have been neglected due to their high
domain-dependency, which renders them unuseful outside the context of the
corpus. Thus, through our mapping process, we both obtain more data annotated
according to a recent standard and provide useful dialog act annotations for a
widely explored corpus in the context of dialog research.

We used Neural Networks to Detect Clickbaits: You won't believe what happened Next!

Online content publishers often use catchy headlines for their articles in
order to attract users to their websites. These headlines, popularly known as
clickbaits, exploit a user’s curiosity gap and lure them to click on links that
often disappoint them. Existing methods for automatically detecting clickbaits
rely on heavy feature engineering and domain knowledge. Here, we introduce a
neural network architecture based on Recurrent Neural Networks for detecting
clickbaits. Our model relies on distributed word representations learned from a
large unannotated corpora, and character embeddings learned via Convolutional
Neural Networks. Experimental results on a dataset of news headlines show that
our model outperforms existing techniques for clickbait detection with an
accuracy of 0.98 with F1-score of 0.98 and ROC-AUC of 0.99.

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version)

Extending the success of deep neural networks to natural language
understanding and symbolic reasoning requires complex operations and external
memory. Recent neural program induction approaches have attempted to address
this problem, but are typically limited to differentiable memory, and
consequently cannot scale beyond small synthetic tasks. In this work, we
propose the Manager-Programmer-Computer framework, which integrates neural
networks with non-differentiable memory to support abstract, scalable and
precise operations through a friendly neural computer interface. Specifically,
we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
neural “programmer”, and a non-differentiable “computer” that is a Lisp
interpreter with code assist. To successfully apply REINFORCE for training, we
augment it with approximate gold programs found by an iterative maximum
likelihood training process. NSM is able to learn a semantic parser from weak
supervision over a large knowledge base. It achieves new state-of-the-art
performance on WebQuestionsSP, a challenging semantic parsing dataset, with
weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
does not rely on feature engineering or domain specific knowledge.

CER: Complementary Entity Recognition via Knowledge Expansion on Large Unlabeled Product Reviews

Hu Xu, Sihong Xie, Lei Shu, Philip S. Yu
Comments: 10 pages, 2 figures, IEEE BigData 2016
Subjects: Computation and Language (cs.CL)

Product reviews contain a lot of useful information about product features
and customer opinions. One important product feature is the complementary
entity (products) that may potentially work together with the reviewed product.
Knowing complementary entities of the reviewed product is very important
because customers want to buy compatible products and avoid incompatible ones.
In this paper, we address the problem of Complementary Entity Recognition
(CER). Since no existing method can solve this problem, we first propose a
novel unsupervised method to utilize syntactic dependency paths to recognize
complementary entities. Then we expand category-level domain knowledge about
complementary entities using only a few general seed verbs on a large amount of
unlabeled reviews. The domain knowledge helps the unsupervised method to adapt
to different products and greatly improves the precision of the CER task. The
advantage of the proposed method is that it does not require any labeled data
for training. We conducted experiments on 7 popular products with about 1200
reviews in total to demonstrate that the proposed approach is effective.

Unit Dependency Graph and its Application to Arithmetic Word Problem Solving

Subhro Roy, Dan Roth
Comments: AAAI 2017
Subjects: Computation and Language (cs.CL)

Math word problems provide a natural abstraction to a range of natural
language understanding problems that involve reasoning about quantities, such
as interpreting election results, news about casualties, and the financial
section of a newspaper. Units associated with the quantities often provide
information that is essential to support this reasoning. This paper proposes a
principled way to capture and reason about units and shows how it can benefit
an arithmetic word problem solver. This paper presents the concept of Unit
Dependency Graphs (UDGs), which provides a compact representation of the
dependencies between units of numbers mentioned in a given problem. Inducing
the UDG alleviates the brittleness of the unit extraction system and allows for
a natural way to leverage domain knowledge about unit compatibility, for word
problem solving. We introduce a decomposed model for inducing UDGs with minimal
additional annotations, and use it to augment the expressions used in the
arithmetic word problem solver of (Roy and Roth 2015) via a constrained
inference framework. We show that introduction of UDGs reduces the error of the
solver by over 10 %, surpassing all existing systems for solving arithmetic
word problems. In addition, it also makes the system more robust to adaptation
to new vocabulary and equation forms .

End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager

Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tur, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Natural language understanding and dialogue policy learning are both
essential in conversational systems that predict the next system actions in
response to a current user utterance. Conventional approaches aggregate
separate models of natural language understanding (NLU) and system action
prediction (SAP) as a pipeline that is sensitive to noisy outputs of
error-prone NLU. To address the issues, we propose an end-to-end deep recurrent
neural network with limited contextual dialogue memory by jointly training NLU
and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our
proposed model significantly outperforms the state-of-the-art pipeline models
for both NLU and SAP, which indicates that our joint model is capable of
mitigating the affects of noisy NLU outputs, and NLU model can be refined by
error flows backpropagating from the extra supervised signals of system
actions.

Creating a Real-Time, Reproducible Event Dataset

John Beieler
Subjects: Computation and Language (cs.CL)

The generation of political event data has remained much the same since the
mid-1990s, both in terms of data acquisition and the process of coding text
into data. Since the 1990s, however, there have been significant improvements
in open-source natural language processing software and in the availability of
digitized news content. This paper presents a new, next-generation event
dataset, named Phoenix, that builds from these and other advances. This dataset
includes improvements in the underlying news collection process and event
coding software, along with the creation of a general processing pipeline
necessary to produce daily-updated data. This paper provides a face validity
checks by briefly examining the data for the conflict in Syria, and a
comparison between Phoenix and the Integrated Crisis Early Warning System data.

Using Discourse Signals for Robust Instructor Intervention Prediction

We tackle the prediction of instructor intervention in student posts from
discussion forums in Massive Open Online Courses (MOOCs). Our key finding is
that using automatically obtained discourse relations improves the prediction
of when instructors intervene in student discussions, when compared with a
state-of-the-art, feature-rich baseline. Our supervised classifier makes use of
an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags
that represent in-post discourse features. We show PDTB relation-based features
increase the robustness of the classifier and complement baseline features in
recalling more diverse instructor intervention patterns. In comprehensive
experiments over 14 MOOC offerings from several disciplines, the PDTB discourse
features improve performance on average. The resultant models are less
dependent on domain-specific vocabulary, allowing them to better generalize to
new courses.

Distributed, Parallel, and Cluster Computing

A Randomized Concurrent Algorithm for Disjoint Set Union

Siddhartha V. Jayanti, Robert E. Tarjan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

The disjoint set union problem is a basic problem in data structures with a
wide variety of applications. We extend a known efficient sequential algorithm
for this problem to obtain a simple and efficient concurrent wait-free
algorithm running on an asynchronous parallel random access machine (APRAM).
Crucial to our result is the use of randomization. Under a certain independence
assumption, for a problem instance in which there are n elements, m operations,
and p processes, our algorithm does Theta(m (alpha(n, m/(np)) + log(np/m + 1)))
expected work, where the expectation is over the random choices made by the
algorithm and alpha is a functional inverse of Ackermann’s function. In
addition, each operation takes O(log n) steps with high probability. Our
algorithm is significantly simpler and more efficient than previous algorithms
proposed by Anderson and Woll. Under our independence assumption, our algorithm
achieves almost-linear speed-up for applications in which all or most of the
processes can be kept busy.

Support vector regression model for BigData systems

Alessandro Maria Rizzi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Performance (cs.PF)

Nowadays Big Data are becoming more and more important. Many sectors of our
economy are now guided by data-driven decision processes. Big Data and business
intelligence applications are facilitated by the MapReduce programming model
while, at infrastructural layer, cloud computing provides flexible and cost
effective solutions for allocating on demand large clusters. In such systems,
capacity allocation, which is the ability to optimally size minimal resources
for achieve a certain level of performance, is a key challenge to enhance
performance for MapReduce jobs and minimize cloud resource costs. In order to
do so, one of the biggest challenge is to build an accurate performance model
to estimate job execution time of MapReduce systems. Previous works applied
simulation based models for modeling such systems. Although this approach can
accurately describe the behavior of Big Data clusters, it is too
computationally expensive and does not scale to large system. We try to
overcome these issues by applying machine learning techniques. More precisely
we focus on Support Vector Regression (SVR) which is intrinsically more robust
w.r.t other techniques, like, e.g., neural networks, and less sensitive to
outliers in the training set. To better investigate these benefits, we compare
SVR to linear regression.

High-Performance Distributed Machine Learning using Apache SPARK

Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

In this paper we compare the performance of distributed learning using Apache
SPARK and MPI by implementing a distributed linear learning algorithm from
scratch on the two programming frameworks. We then explore the performance gap
and show how SPARK learning can be accelerated, by reducing computational cost
as well as communication-related overheads, to reduce the relative loss in
performance versus MPI from 20x to 2x. With these different implementations at
hand, we will illustrate how the optimal parameters of the algorithm depend
strongly on the characteristics of the framework on which it is executed. We
will show that carefully tuning a distributed algorithm to trade-off
communication and computation can improve performance by orders of magnitude.
Hence, understanding system aspects of the framework and their implications,
and then correctly adapting the algorithm proves to be the key to performance.

The communication-hiding pipelined BiCGStab method for the efficient parallel solution of large unsymmetric linear systems

Siegfried Cools, Wim Vanroose
Comments: 21 pages, 5 figures, 4 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

A High Performance Computing alternative to traditional Krylov methods,
pipelined Krylov solvers offer better scalability in the strong scaling limit
compared to standard Krylov methods for large and sparse linear systems. The
traditional synchronization bottleneck is mitigated by overlapping
time-consuming global communication phases with local computations in the
algorithm. This paper describes a general framework for deriving the pipelined
variant of any Krylov algorithm. The proposed framework was implicitly used to
derive the pipelined Conjugate Gradient (p-CG) method in Hiding global
synchronization latency in the preconditioned Conjugate Gradient algorithm by
P. Ghysels and W. Vanroose, Parallel Computing, 40(7):224–238, 2014. The
pipelining framework is subsequently illustrated by formulating a pipelined
version of the BiCGStab method for the solution of large unsymmetric linear
systems on parallel hardware. A residual replacement strategy is proposed to
account for the possible loss of attainable accuracy and robustness by the
pipelined BiCGStab method. It is shown that the pipelined algorithm improves
scalability on distributed memory machines, leading to significant speedups
compared to standard preconditioned BiCGStab.

Adaptive Work-Efficient Connected Components on the GPU

Michael Sutton, Tal Ben-Nun, Amnon Barak, Sreepathi Pai, Keshav Pingali
Comments: 4 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

This report presents an adaptive work-efficient approach for implementing the
Connected Components algorithm on GPUs. The results show a considerable
increase in performance (up to 6.8( imes)) over current state-of-the-art
solutions.

QoS-based Computing Resources Partitioning between Virtual Machines in the Cloud Architecture

Evgeny Nikulchev, Evgeniy Pluzhnik, Oleg Lukyanchikov, Dmitry Biryukov, Elena Andrianova
Comments: 6 pages, International Journal of Advanced Computer Science and Applications (2016) 7
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud services have been used very widely, but configuration of the
parameters, including the efficient allocation of resources, is an important
objective for the system architect. The article is devoted to solving the
problem of choosing the architecture of computers based on simulation and
developed program for monitoring computing resources. Techniques were developed
aimed at providing the required quality of service and efficient use of
resources. The article describes the monitoring program of computing resources
and time efficiency of the target application functions. On the basis of this
application the technique is shown and described in the experiment, designed to
ensure the requirements for quality of service, by isolating one process from
the others on different virtual machines inside the hypervisor.

On spreading rumor in heterogeneous systems

Jacek Cichoń, Zbigniew Gołeobbiewski, Marcin Kardas, Marek Klonowski, Filip Zagórski
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we consider a model of spreading information in heterogeneous
systems wherein we have two kinds of objects. Some of them are active and
others are passive. Active objects can, if they possess information, share it
with an encountered passive object. We focus on a particular case such that
active objects communicate independently with randomly chosen passive objects.
Such model is motivated by two real-life scenarios. The first one is a very
dynamic system of mobile devices distributing information among stationary
devices. The second is an architecture wherein clients communicate with several
servers and can leave some information learnt from other servers. The main
question we investigate is how many rounds is needed to deliver the information
to all objects under the assumption that at the beginning exactly one object
has the information?

In this paper we provide mathematical models of such process and show rigid
and very precise mathematical analysis for some special cases important from
practical point of view. Some mathematical results are quite surprising — we
find relation of investigated process to both coupon collector’s problem as
well as the birthday paradox. Additionally, we present simulations for showing
behaviour for general parameters

BrainFrame: A heterogeneous accelerator platform for neuron simulations

Objective: The advent of High-Performance Computing (HPC) in recent years has
led to its increasing use in brain study through computational models. The
scale and complexity of such models are constantly increasing, leading to
challenging computational requirements. Even though modern HPC platforms can
often deal with such challenges, the vast diversity of the modeling field does
not permit for a single acceleration (or homogeneous) platform to effectively
address the complete array of modeling requirements. Approach: In this paper we
propose and build BrainFrame, a heterogeneous acceleration platform,
incorporating three distinct acceleration technologies, a Dataflow Engine, a
Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform.
As a challenging proof of concept, we analyze the performance of BrainFrame on
different instances of a state-of-the-art neuron model, modeling the Inferior-
Olivary Nucleus using a biophysically-meaningful, extended Hodgkin-Huxley
representation. The model instances take into account not only the neuronal-
network dimensions but also different network-connectivity circumstances that
can drastically change application workload characteristics. Main results: The
synthetic approach of three HPC technologies demonstrated that BrainFrame is
better able to cope with the modeling diversity encountered. Our performance
analysis shows clearly that the model directly affect performance and all three
technologies are required to cope with all the model use cases.

StreamNF: Performance and Correctness for Stateful Chained NFs

Junaid Khalid, Aditya Akella
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

Network functions virtualization (NFV) — deploying network functions in
software on commodity machines — allows operators to employ rich chains of NFs
to realize custom performance, security, and compliance policies, and ensure
high performance by dynamically adding instances and/or failing over. Because
NFs are stateful, it is important to carefully manage their state, especially
during such dynamic actions. Crucially, state management must: (1) offer good
performance to match the needs of modern networks; (2) ensure NF chain-wide
properties; and (3) not require the operator to manage low-level state
management details. We present StreamNF, an NFV framework that satisfies the
above requirements. To do so, StreamNF leverages an external state store with
novel caching strategies and offloading of state operations, and chain-level
logical packet clocks and packet logging/replay. Extensive evaluation of a
StreamNF prototype built atop Apache Storm shows that the significant benefits
of StreamNF in terms of state management performance and chain-wide properties
come at a modest per-packet latency cost.

Online Page Migration on Ring Networks in Uniform Model

Amanj Khorramian, Akira Matsubayashi
Comments: 8 pages, 5 figures
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

This paper explores the problem of page migration in ring networks. A ring
network is a connected graph, in which each node is connected with exactly two
other nodes. In this problem, one of the nodes in a given network holds a page
of size D. This node is called the server and the page is a non-duplicable data
in the network. Requests are issued by nodes to access the page one after
another. Every time a new request is issued, the server must serve the request
and may migrate to another node before the next request arrives. A service
costs the distance between the server and the requesting node, and the
migration costs the distance of the migration multiplied by D. The problem is
to minimize the total costs of services and migrations. We study this problem
in uniform model, for which the page has a unit size, i.e. D=1. A
3.326-competitive algorithm improving the current best upper bound is designed.
We show that this ratio is tight for our algorithm.

Expander Graph and Communication-Efficient Decentralized Optimization

Yat-Tin Chow, Wei Shi, Tianyu Wu, Wotao Yin
Comments: 2016 IEEE Asilomar Conference on Signals, Systems, and Computers
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)

In this paper, we discuss how to design the graph topology to reduce the
communication complexity of certain algorithms for decentralized optimization.
Our goal is to minimize the total communication needed to achieve a prescribed
accuracy. We discover that the so-called expander graphs are near-optimal
choices. We propose three approaches to construct expander graphs for different
numbers of nodes and node degrees. Our numerical results show that the
performance of decentralized optimization is significantly better on expander
graphs than other regular graphs.

Learning

Incomplete data representation for SVM classification

Łukasz Struski, Marek Śmieja, Jacek Tabor
Comments: 15 pages, 3 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In this paper we propose two ways of incomplete data representation. The
first one is a generalization of a flag representation, where a vector with
missing attributes is filled with some values and joined with flag vectors
indicating missing components. Our generalization uses pointed affine
subspaces, which in addition to flag representation allows to perform various
affine transformations of data, as whitening or dimensionality reduction. We
show how to embed such affine subspaces into a vector space and how to define a
proper scalar product. In the second approach, we represent missing data points
by degenerated Gaussian densities, which additionally model the uncertainty
connected with missing features. This representation allows to construct an
analogue of RBF kernel on incomplete data space.

Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction

Bin Gu, Zhouyuan Huo, Heng Huang
Subjects: Learning (cs.LG)

Zeroth-order (derivative-free) optimization attracts a lot of attention in
machine learning, because explicit gradient calculations may be computationally
expensive or infeasible. To handle large scale problems both in volume and
dimension, recently asynchronous doubly stochastic zeroth-order algorithms were
proposed. The convergence rate of existing asynchronous doubly stochastic
zeroth order algorithms is (O(frac{1}{sqrt{T}})) (also for the sequential
stochastic zeroth-order optimization algorithms). In this paper, we focus on
the finite sums of smooth but not necessarily convex functions, and propose an
asynchronous doubly stochastic zeroth-order optimization algorithm using the
accelerated technology of variance reduction (AsyDSZOVR). Rigorous theoretical
analysis show that the convergence rate can be improved from
(O(frac{1}{sqrt{T}})) the best result of existing algorithms to
(O(frac{1}{T})). Also our theoretical results is an improvement to the ones of
the sequential stochastic zeroth-order optimization algorithms.

Sparse Label Propagation

Alexander Jung
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We consider massive heterogeneous datasets with intrinsic network structure,
i.e., big data over networks. These datasets can be modelled by graph signals,
which are defined over large-scale irregular graphs representing complex
networks. We show that (semi-supervised) learning of the entire underlying
graph signal based on incomplete information provided by few initial labels can
be reduced to a compressed sensing recovery problem within the cosparse
analysis model. This reduction provides two things: First, it allows to apply
highly developed compressed sensing methods to the learning problem. In
particular, by implementing a recent primal-dual method for convex
optimization, we obtain a sparse label propagation algorithm. Moreover, by
casting the learning problem within compressed sensing, we are able to derive
sufficient conditions on the graph structure and available label information,
such that sparse label propagation is accurate.

Learning Adversary-Resistant Deep Neural Networks

Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, C. Lee Giles, Xue Liu
Subjects: Learning (cs.LG)

Deep neural networks (DNNs) have proven to be quite effective in a vast array
of machine learning tasks, with recent examples in cyber security and
autonomous vehicles. Despite the superior performance of DNNs in these
applications, it has been recently shown that these models are susceptible to a
particular type of attack that exploits a fundamental flaw in their design.
This attack consists of generating particular synthetic examples referred to as
adversarial samples. These samples are constructed by slightly manipulating
real data-points in order to “fool” the original DNN model, forcing it to
mis-classify previously correctly classified samples with high confidence.
Addressing this flaw in the model is essential if DNNs are to be used in
critical applications such as those in cyber security. Previous work has
provided various defense mechanisms by either augmenting the training set or
enhancing model complexity. However, after a thorough analysis, we discover
that DNNs protected by these defense mechanisms are still susceptible to
adversarial samples, indicating that there are no theoretical guarantees of
resistance provided by these mechanisms. To the best of our knowledge, we are
the first to investigate this issue shared across previous research work and to
propose a unifying framework for protecting DNN models by integrating a data
transformation module with the DNN. More importantly, we provide a theoretical
guarantee for protection under our proposed framework. We evaluate our method
and several other existing solutions on both MNIST, CIFAR-10, and a malware
dataset, to demonstrate the generality of our proposed method and its potential
for handling cyber security applications. The results show that our framework
provides better resistance compared to state-of-art solutions while
experiencing negligible degradation in accuracy.

Implicit Modeling — A Generalization of Discriminative and Generative Approaches

Dmitrij Schlesinger, Carsten Rother
Subjects: Learning (cs.LG)

We propose a new modeling approach that is a generalization of generative and
discriminative models. The core idea is to use an implicit parameterization of
a joint probability distribution by specifying only the conditional
distributions. The proposed scheme combines the advantages of both worlds — it
can use powerful complex discriminative models as its parts, having at the same
time better generalization capabilities. We thoroughly evaluate the proposed
method for a simple classification task with artificial data and illustrate its
advantages for real-word scenarios on a semantic image segmentation problem.

A Nearly Optimal Contextual Bandit Algorithm

Mohammadreza Mohaghegh Neyshabouri, Kaan Gokcesu, Selami Ciftci, Suleyman S. Kozat
Subjects: Learning (cs.LG)

We investigate the contextual multi-armed bandit problem in an adversarial
setting and introduce an online algorithm that asymptotically achieves the
performance of the best contextual bandit arm selection strategy under certain
conditions. We show that our algorithm is highly efficient and provides
significantly improved performance with a guaranteed performance upper bound in
a strong mathematical sense. We have no statistical assumptions on the context
vectors and the loss of the bandit arms, hence our results are guaranteed to
hold even in adversarial environments. We use a tree notion in order to
partition the space of context vectors in a nested structure. Using this tree,
we construct a large class of context dependent bandit arm selection strategies
and adaptively combine them to achieve the performance of the best strategy. We
use the hierarchical nature of introduced tree to implement this combination
with a significantly low computational complexity, thus our algorithm can be
efficiently used in applications involving big data. Through extensive set of
experiments involving synthetic and real data, we demonstrate significant
performance gains achieved by the proposed algorithm with respect to the
state-of-the-art adversarial bandit algorithms.

Diagnostic Prediction Using Discomfort Drawings

Cheng Zhang, Hedvig Kjellstrom, Bo C. Bertilson
Comments: NIPS 2016 Workshop on Machine Learning for Health
Subjects: Learning (cs.LG)

In this paper, we explore the possibility to apply machine learning to make
diagnostic predictions using discomfort drawings. A discomfort drawing is an
intuitive way for patients to express discomfort and pain related symptoms.
These drawings have proven to be an effective method to collect patient data
and make diagnostic decisions in real-life practice. A dataset from real-world
patient cases is collected for which medical experts provide diagnostic labels.
Next, we extend a factorized multimodal topic model, Inter-Battery Topic Model
(IBTM), to train a system that can make diagnostic predictions given an unseen
discomfort drawing. Experimental results show reasonable predictions of
diagnostic labels given an unseen discomfort drawing. The positive result
indicates a significant potential of machine learning to be used for parts of
the pain diagnostic process and to be a decision support system for physicians
and other health care personnel.

A One class Classifier based Framework using SVDD : Application to an Imbalanced Geological Dataset

Soumi Chaki, Akhilesh Kumar Verma, Aurobinda Routray, William K. Mohanty, Mamata Jenamani
Comments: presented at IEEE Students Technology Symposium (TechSym), 28 February to 2 March 2014, IIT Kharagpur, India. 6 pages, 7 figures, 2tables
Subjects: Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

Evaluation of hydrocarbon reservoir requires classification of petrophysical
properties from available dataset. However, characterization of reservoir
attributes is difficult due to the nonlinear and heterogeneous nature of the
subsurface physical properties. In this context, present study proposes a
generalized one class classification framework based on Support Vector Data
Description (SVDD) to classify a reservoir characteristic water saturation into
two classes (Class high and Class low) from four logs namely gamma ray, neutron
porosity, bulk density, and P sonic using an imbalanced dataset. A comparison
is carried out among proposed framework and different supervised classification
algorithms in terms of g metric means and execution time. Experimental results
show that proposed framework has outperformed other classifiers in terms of
these performance evaluators. It is envisaged that the classification analysis
performed in this study will be useful in further reservoir modeling.

Cryptocurrency Portfolio Management with Deep Reinforcement Learning

Zhengyao Jiang, Jinjun Liang
Subjects: Learning (cs.LG)

We present a convolutional neural network with historic prices of a set of
financial assets as its input, outputting portfolio weights of the set. The
network is trained to achieve maximum accumulative return. A back-test trading
experiment is conducted in a cryptocurrency market, achieving 10-fold returns
in 1.8 month’s periods, outperforming many traditional portfolio management
algorithms and benchmarks.

Deep Symbolic Representation Learning for Heterogeneous Time-series Classification

Shengdong Zhang, Soheil Bahrampour, Naveen Ramakrishnan, Mohak Shah
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we consider the problem of event classification with
multi-variate time series data consisting of heterogeneous (continuous and
categorical) variables. The complex temporal dependencies between the variables
combined with sparsity of the data makes the event classification problem
particularly challenging. Most state-of-art approaches address this either by
designing hand-engineered features or breaking up the problem over homogeneous
variates. In this work, we propose and compare three representation learning
algorithms over symbolized sequences which enables classification of
heterogeneous time-series data using a deep architecture. The proposed
representations are trained jointly along with the rest of the network
architecture in an end-to-end fashion that makes the learned features
discriminative for the given task. Experiments on three real-world datasets
demonstrate the effectiveness of the proposed approaches.

Robust nonparametric nearest neighbor random process clustering

Michael Tschannen, Helmut Bölcskei
Comments: 13 pages, 4 figures
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

We consider the problem of clustering noisy finite-length observations of
stationary ergodic random processes according to their generative models
without prior knowledge of the model statistics and the number of generative
models. Two algorithms, both using the L1-distance between estimated power
spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The
first one, termed nearest neighbor process clustering (NNPC) is new and relies
on partitioning the nearest neighbor graph of the observations via spectral
clustering. The second algorithm, simply referred to as k-means (KM), consists
of a single k-means iteration with farthest point initialization and was
considered before in the literature, albeit with a different dissimilarity
measure and with asymptotic performance results only. We prove that both
algorithms succeed with high probability in the presence of noise and missing
entries, and even when the generative process PSDs overlap significantly, all
provided that the observation length is sufficiently large. Our results
quantify the tradeoff between the overlap of the generative process PSDs, the
observation length, the fraction of missing entries, and the noise variance.
Furthermore, we prove that treating the finite-length observations of
stationary ergodic random processes as vectors in Euclidean space and
clustering them using the thresholding-based subspace clustering (TSC)
algorithm, the subspace clustering cousin of NNPC, results in performance
strictly inferior to that of NNPC. We argue that the underlying cause is to be
found in TSC employing spherical distance as dissimilarity measure, thereby
ignoring the stationary process structure of the observations. Finally, we
provide extensive numerical results for synthetic and real data and find that
NNPC outperforms state-of-the-art algorithms in human motion sequence
clustering.

Learning to superoptimize programs – Workshop Version

Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S.Torr, Pushmeet Kohli
Comments: Workshop version for the NIPS NAMPI Workshop. Extended version at arXiv:1611.01787
Subjects: Learning (cs.LG)

Superoptimization requires the estimation of the best program for a given
computational task. In order to deal with large programs, superoptimization
techniques perform a stochastic search. This involves proposing a modification
of the current program, which is accepted or rejected based on the improvement
achieved. The state of the art method uses uniform proposal distributions,
which fails to exploit the problem structure to the fullest. To alleviate this
deficiency, we learn a proposal distribution over possible modifications using
Reinforcement Learning. We provide convincing results on the superoptimization
of “Hacker’s Delight” programs.

Trained Ternary Quantization

Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
Comments: Submitted to ICLR 17
Subjects: Learning (cs.LG)

Product reviews contain a lot of useful information about product features
and customer opinions. One important product feature is the complementary
entity (products) that may potentially work together with the reviewed product.
Knowing complementary entities of the reviewed product is very important
because customers want to buy compatible products and avoid incompatible ones.
In this paper, we address the problem of Complementary Entity Recognition
(CER). Since no existing method can solve this problem, we first propose a
novel unsupervised method to utilize syntactic dependency paths to recognize
complementary entities. Then we expand category-level domain knowledge about
complementary entities using only a few general seed verbs on a large amount of
unlabeled reviews. The domain knowledge helps the unsupervised method to adapt
to different products and greatly improves the precision of the CER task. The
advantage of the proposed method is that it does not require any labeled data
for training. We conducted experiments on 7 popular products with about 1200
reviews in total to demonstrate that the proposed approach is effective.

Positive blood culture detection in time series data using a BiLSTM network

The presence of bacteria or fungi in the bloodstream of patients is abnormal
and can lead to life-threatening conditions. A computational model based on a
bidirectional long short-term memory artificial neural network, is explored to
assist doctors in the intensive care unit to predict whether examination of
blood cultures of patients will return positive. As input it uses nine
monitored clinical parameters, presented as time series data, collected from
2177 ICU admissions at the Ghent University Hospital. Our main goal is to
determine if general machine learning methods and more specific, temporal
models, can be used to create an early detection system. This preliminary
research obtains an area of 71.95% under the precision recall curve, proving
the potential of temporal neural networks in this context.

Success Probability of Exploration: a Concrete Analysis of Learning Efficiency

Liangpeng Zhang, Ke Tang, Xin Yao
Subjects: Learning (cs.LG)

Exploration has been a crucial part of reinforcement learning, yet several
important questions concerning exploration efficiency are still not answered
satisfactorily by existing analytical frameworks. These questions include
exploration parameter setting, situation analysis, and hardness of MDPs, all of
which are unavoidable for practitioners. To bridge the gap between the theory
and practice, we propose a new analytical framework called the success
probability of exploration. We show that those important questions of
exploration above can all be answered under our framework, and the answers
provided by our framework meet the needs of practitioners better than the
existing ones. More importantly, we introduce a concrete and practical approach
to evaluating the success probabilities in certain MDPs without the need of
actually running the learning algorithm. We then provide empirical results to
verify our approach, and demonstrate how the success probability of exploration
can be used to analyse and predict the behaviours and possible outcomes of
exploration, which are the keys to the answer of the important questions of
exploration.

A Novel Framework based on SVDD to Classify Water Saturation from Seismic Attributes

Soumi Chaki, Akhilesh Kumar Verma, Aurobinda Routray, William K. Mohanty, Mamata Jenamani
Comments: 6 pages, 8 figures, 2table Presented at Fourth International Conference on Emerging Applications of Information Technology (EAIT 2014), ISI Kolkata, India
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Water saturation is an important property in reservoir engineering domain.
Thus, satisfactory classification of water saturation from seismic attributes
is beneficial for reservoir characterization. However, diverse and non-linear
nature of subsurface attributes makes the classification task difficult. In
this context, this paper proposes a generalized Support Vector Data Description
(SVDD) based novel classification framework to classify water saturation into
two classes (Class high and Class low) from three seismic attributes seismic
impedance, amplitude envelop, and seismic sweetness. G-metric means and program
execution time are used to quantify the performance of the proposed framework
along with established supervised classifiers. The documented results imply
that the proposed framework is superior to existing classifiers. The present
study is envisioned to contribute in further reservoir modeling.

A novel multiclassSVM based framework to classify lithology from well logs: a real-world application

Soumi Chaki, Aurobinda Routray, William K. Mohanty, Mamata Jenamani
Comments: 5 pages, 5 figures, 4 tables Presented at INDICON 2015 at New Delhi, India
Subjects: Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

Support vector machines (SVMs) have been recognized as a potential tool for
supervised classification analyses in different domains of research. In
essence, SVM is a binary classifier. Therefore, in case of a multiclass
problem, the problem is divided into a series of binary problems which are
solved by binary classifiers, and finally the classification results are
combined following either the one-against-one or one-against-all strategies. In
this paper, an attempt has been made to classify lithology using a multiclass
SVM based framework using well logs as predictor variables. Here, the lithology
is classified into four classes such as sand, shaly sand, sandy shale and shale
based on the relative values of sand and shale fractions as suggested by an
expert geologist. The available dataset consisting well logs (gamma ray,
neutron porosity, density, and P-sonic) and class information from four closely
spaced wells from an onshore hydrocarbon field is divided into training and
testing sets. We have used one-against-all strategy to combine the results of
multiple binary classifiers. The reported results established the superiority
of multiclass SVM compared to other classifiers in terms of classification
accuracy. The selection of kernel function and associated parameters has also
been investigated here. It can be envisaged from the results achieved in this
study that the proposed framework based on multiclass SVM can further be used
to solve classification problems. In future research endeavor, seismic
attributes can be introduced in the framework to classify the lithology
throughout a study area from seismic inputs.

A Nonparametric Latent Factor Model For Location-Aware Video Recommendations

Ehtsham Elahi
Comments: NIPS 2016 Workshop on Practical Bayesian Nonparametrics
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We are interested in learning customers’ video preferences from their
historic viewing patterns and geographical location. We consider a Bayesian
latent factor modeling approach for this task. In order to tune the complexity
of the model to best represent the data, we make use of Bayesian nonparameteric
techniques. We describe an inference technique that can scale to large
real-world data sets. Finally we show results obtained by applying the model to
a large internal Netflix data set, that illustrates that the model was able to
capture interesting relationships between viewing patterns and geographical
location.

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Deep neural networks are powerful black box predictors that have recently
achieved impressive performance on a wide spectrum of tasks. Quantifying
predictive uncertainty in neural networks is a challenging and yet unsolved
problem. Bayesian neural networks, which learn a distribution over weights, are
currently the state-of-the-art for estimating predictive uncertainty; however
these require significant modifications to the training procedure and are
computationally expensive compared to standard (non-Bayesian) neural neural
networks. We propose an alternative to Bayesian neural networks, that is simple
to implement, readily parallelisable and yields high quality predictive
uncertainty estimates. Through a series of experiments on classification and
regression benchmarks, we demonstrate that our method produces well-calibrated
uncertainty estimates which are as good or better than approximate Bayesian
neural networks. Finally, we evaluate the predictive uncertainty on test
examples from known and unknown classes, and show that our method is able to
express higher degree of uncertainty on unknown classes, unlike existing
methods which make overconfident predictions even on unknown classes.

Support vector regression model for BigData systems

Alessandro Maria Rizzi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Performance (cs.PF)

Nowadays Big Data are becoming more and more important. Many sectors of our
economy are now guided by data-driven decision processes. Big Data and business
intelligence applications are facilitated by the MapReduce programming model
while, at infrastructural layer, cloud computing provides flexible and cost
effective solutions for allocating on demand large clusters. In such systems,
capacity allocation, which is the ability to optimally size minimal resources
for achieve a certain level of performance, is a key challenge to enhance
performance for MapReduce jobs and minimize cloud resource costs. In order to
do so, one of the biggest challenge is to build an accurate performance model
to estimate job execution time of MapReduce systems. Previous works applied
simulation based models for modeling such systems. Although this approach can
accurately describe the behavior of Big Data clusters, it is too
computationally expensive and does not scale to large system. We try to
overcome these issues by applying machine learning techniques. More precisely
we focus on Support Vector Regression (SVR) which is intrinsically more robust
w.r.t other techniques, like, e.g., neural networks, and less sensitive to
outliers in the training set. To better investigate these benefits, we compare
SVR to linear regression.

High-Performance Distributed Machine Learning using Apache SPARK

Celestine Dünner, Thomas Parnell, Kubilay Atasu, Manolis Sifalakis, Haralampos Pozidis
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

In this paper we compare the performance of distributed learning using Apache
SPARK and MPI by implementing a distributed linear learning algorithm from
scratch on the two programming frameworks. We then explore the performance gap
and show how SPARK learning can be accelerated, by reducing computational cost
as well as communication-related overheads, to reduce the relative loss in
performance versus MPI from 20x to 2x. With these different implementations at
hand, we will illustrate how the optimal parameters of the algorithm depend
strongly on the characteristics of the framework on which it is executed. We
will show that carefully tuning a distributed algorithm to trade-off
communication and computation can improve performance by orders of magnitude.
Hence, understanding system aspects of the framework and their implications,
and then correctly adapting the algorithm proves to be the key to performance.

Extracting Implicit Social Relation for Social Recommendation Techniques in User Rating Prediction

Seyed Mohammad Taheri, Hamidreza Mahyar, Mohammad Firouzi, Ali Movaghar
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

Recommendation plays an increasingly important role in our daily lives.
Recommender systems automatically suggest items to users that might be
interesting for them. Recent studies illustrate that incorporating social trust
in Matrix Factorization methods demonstrably improves accuracy of rating
prediction. Such approaches mainly use the trust scores explicitly expressed by
users. However, it is often challenging to have users provide explicit trust
scores of each other. There exist quite a few works, which propose Trust
Metrics to compute and predict trust scores between users based on their
interactions. In this paper, first we present how social relation can be
extracted from users’ ratings to items by describing Hellinger distance between
users in recommender systems. Then, we propose to incorporate the predicted
trust scores into social matrix factorization models. By analyzing social
relation extraction from three well-known real-world datasets, which both:
trust and recommendation data available, we conclude that using the implicit
social relation in social recommendation techniques has almost the same
performance compared to the actual trust scores explicitly expressed by users.
Hence, we build our method, called Hell-TrustSVD, on top of the
state-of-the-art social recommendation technique to incorporate both the
extracted implicit social relations and ratings given by users on the
prediction of items for an active user. To the best of our knowledge, this is
the first work to extend TrustSVD with extracted social trust information. The
experimental results support the idea of employing implicit trust into matrix
factorization whenever explicit trust is not available, can perform much better
than the state-of-the-art approaches in user rating prediction.

Ranking Biomarkers Through Mutual Information

Konstantinos Sechidis, Emily Turner, Paul D. Metcalfe, James Weatherall, Gavin Brown
Comments: Accepted at NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

We study information theoretic methods for ranking biomarkers. In clinical
trials there are two, closely related, types of biomarkers: predictive and
prognostic, and disentangling them is a key challenge. Our first step is to
phrase biomarker ranking in terms of optimizing an information theoretic
quantity. This formalization of the problem will enable us to derive rankings
of predictive/prognostic biomarkers, by estimating different, high dimensional,
conditional mutual information terms. To estimate these terms, we suggest
efficient low dimensional approximations, and we derive an empirical Bayes
estimator, which is suitable for small or sparse datasets. Finally, we
introduce a new visualisation tool that captures the prognostic and the
predictive strength of a set of biomarkers. We believe this representation will
prove to be a powerful tool in biomarker discovery.

Message Passing Multi-Agent GANs

Communicating and sharing intelligence among agents is an important facet of
achieving Artificial General Intelligence. As a first step towards this
challenge, we introduce a novel framework for image generation: Message Passing
Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have
recently been shown to be very effective for image generation and other tasks,
these networks have been limited to mostly single generator-discriminator
networks. We show that we can obtain multi-agent GANs that communicate through
message passing to achieve better image generation. The objectives of the
individual agents in this framework are two fold: a co-operation objective and
a competing objective. The co-operation objective ensures that the message
sharing mechanism guides the other generator to generate better than itself
while the competing objective encourages each generator to generate better than
its counterpart. We analyze and visualize the messages that these GANs share
among themselves in various scenarios. We quantitatively show that the message
sharing formulation serves as a regularizer for the adversarial training.
Qualitatively, we show that the different generators capture different traits
of the underlying data distribution.

Deep Image Category Discovery using a Transferred Similarity Function

Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira
Comments: 13 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Automatically discovering image categories in unlabeled natural images is one
of the important goals of unsupervised learning. However, the task is
challenging and even human beings define visual categories based on a large
amount of prior knowledge. In this paper, we similarly utilize prior knowledge
to facilitate the discovery of image categories. We present a novel end-to-end
network to map unlabeled images to categories as a clustering network. We
propose that this network can be learned with contrastive loss which is only
based on weak binary pair-wise constraints. Such binary constraints can be
learned from datasets in other domains as transferred similarity functions,
which mimic a simple knowledge transfer. We first evaluate our experiments on
the MNIST dataset as a proof of concept, based on predicted similarities
trained on Omniglot, showing a 99\% accuracy which significantly outperforms
clustering based approaches. Then we evaluate the discovery performance on
Cifar-10, STL-10, and ImageNet, which achieves both state-of-the-art accuracy
and shows it can be scalable to various large natural images.

Known Unknowns: Uncertainty Quality in Bayesian Neural Networks

We evaluate the uncertainty quality in neural networks using anomaly
detection. We extract uncertainty measures (e.g. entropy) from the predictions
of candidate models, use those measures as features for an anomaly detector,
and gauge how well the detector differentiates known from unknown classes. We
assign higher uncertainty quality to candidate models that lead to better
detectors. We also propose a novel method for sampling a variational
approximation of a Bayesian neural network, called One-Sample Bayesian
Approximation (OSBA). We experiment on two datasets, MNIST and CIFAR10. We
compare the following candidate neural network models: Maximum Likelihood,
Bayesian Dropout, OSBA, and — for MNIST — the standard variational
approximation. We show that Bayesian Dropout and OSBA provide better
uncertainty information than Maximum Likelihood, and are essentially equivalent
to the standard variational approximation, but much faster.

Learnable Structured Clustering Framework for Deep Metric Learning

Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Learning the representation and the similarity metric in an end-to-end
fashion with deep networks have demonstrated outstanding results for clustering
and retrieval. However, these recent approaches still suffer from the
performance degradation stemming from the local metric training procedure which
is unaware of the global structure of the embedding space.

We propose a global metric learning scheme for optimizing the deep metric
embedding with the learnable clustering function and the clustering metric
(NMI) in a novel structured prediction framework.

Our experiments on CUB200-2011, Cars196, and Stanford online products
datasets show state of the art performance both on the clustering and retrieval
tasks measured in the NMI and Recall@K evaluation metrics.

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We consider the problem of off-policy evaluation—estimating the value of a
target policy using data collected by another policy—under the contextual
bandit model. We establish a minimax lower bound on the mean squared error
(MSE), and show that it is matched up to constant factors by the inverse
propensity scoring (IPS) estimator. Since in the multi-armed bandit problem the
IPS is suboptimal (Li et. al, 2015), our result highlights the difficulty of
the contextual setting with non-degenerate context distributions. We further
consider improvements on this minimax MSE bound, given access to a reward
model. We show that the existing doubly robust approach, which utilizes such a
reward model, may continue to suffer from high variance even when the reward
model is perfect. We propose a new estimator called SWITCH which more
effectively uses the reward model and achieves a superior bias-variance
tradeoff compared with prior work. We prove an upper bound on its MSE and
demonstrate its benefits empirically on a diverse collection of datasets, often
seeing orders of magnitude improvements over a number of baselines.

Intra-day Activity Better Predicts Chronic Conditions

Tom Quisel, David C. Kale, Luca Foschini
Comments: Presented at the NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In this work we investigate intra-day patterns of activity on a population of
7,261 users of mobile health wearable devices and apps. We show that: (1) using
intra-day step and sleep data recorded from passive trackers significantly
improves classification performance on self-reported chronic conditions related
to mental health and nervous system disorders, (2) Convolutional Neural
Networks achieve top classification performance vs. baseline models when
trained directly on multivariate time series of activity data, and (3) jointly
predicting all condition classes via multi-task learning can be leveraged to
extract features that generalize across data sets and achieve the highest
classification performance.

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version)

Extending the success of deep neural networks to natural language
understanding and symbolic reasoning requires complex operations and external
memory. Recent neural program induction approaches have attempted to address
this problem, but are typically limited to differentiable memory, and
consequently cannot scale beyond small synthetic tasks. In this work, we
propose the Manager-Programmer-Computer framework, which integrates neural
networks with non-differentiable memory to support abstract, scalable and
precise operations through a friendly neural computer interface. Specifically,
we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence
neural “programmer”, and a non-differentiable “computer” that is a Lisp
interpreter with code assist. To successfully apply REINFORCE for training, we
augment it with approximate gold programs found by an iterative maximum
likelihood training process. NSM is able to learn a semantic parser from weak
supervision over a large knowledge base. It achieves new state-of-the-art
performance on WebQuestionsSP, a challenging semantic parsing dataset, with
weak supervision. Compared to previous approaches, NSM is end-to-end, therefore
does not rely on feature engineering or domain specific knowledge.

On the propriety of restricted Boltzmann machines

Andee Kaplan, Daniel Nordman, Stephen Vardeman
Comments: 35 pages, 13 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A restricted Boltzmann machine (RBM) is an undirected graphical model
constructed for discrete or continuous random variables, with two layers, one
hidden and one visible, and no conditional dependency within a layer. In recent
years, RBMs have risen to prominence due to their connection to deep learning.
By treating a hidden layer of one RBM as the visible layer in a second RBM, a
deep architecture can be created. RBMs are thought to thereby have the ability
to encode very complex and rich structures in data, making them attractive for
supervised learning. However, the generative behavior of RBMs is largely
unexplored. In this paper, we discuss the relationship between RBM parameter
specification in the binary case and the tendency to undesirable model
properties such as degeneracy, instability and uninterpretability. We also
describe the difficulties that arise in likelihood-based and Bayes fitting of
such (highly flexible) models, especially as Gibbs sampling (quasi-Bayes)
methods are often advocated for the RBM model structure.

Deep Learning of Robotic Tasks using Strong and Weak Human Supervision

Bar Hilleli, Ran El-Yaniv
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

We propose a scheme for training a computerized agent to perform complex
human tasks such as highway steering. The scheme resembles natural
teaching-learning processes used by humans to teach themselves and each other
complex tasks, and consists of the following four stages. In the first stage
the agent learns by itself an informative low-dimensional representations of
raw input signals in an unsupervised learning manner. In the second stage the
agent learns to mimic the human instructor using supervised learning so as to
reach a basic performance level; the third stage is devoted to learning an
instantaneous reward model. Here, the (human) instructor observes (possibly in
real time) the agent performing the task and provides reward feedback. During
this stage the agent monitors both itself and the instructor feedback and
learns a reward model using supervised learning. This stage terminates when the
reward model is sufficiently accurate. In the last stage a reinforcement
learning algorithm is deployed to optimize the agent policy. The guidance
reward signal in the reinforcement learning algorithm relies on the previously
learned reward model. As a proof of concept for the proposed scheme, we
designed a system consisting of deep convolutional neural networks, and applied
it to successfully learn a computerized agent capable of autonomous highway
steering over the well-known racing game Assetto Corsa.

Enhancing Use Case Points Estimation Method Using Soft Computing Techniques

Ali Bou Nassif, Luiz Fernando Capretz, Danny Ho
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Learning (cs.LG)

Software estimation is a crucial task in software engineering. Software
estimation encompasses cost, effort, schedule, and size. The importance of
software estimation becomes critical in the early stages of the software life
cycle when the details of software have not been revealed yet. Several
commercial and non-commercial tools exist to estimate software in the early
stages. Most software effort estimation methods require software size as one of
the important metric inputs and consequently, software size estimation in the
early stages becomes essential. One of the approaches that has been used for
about two decades in the early size and effort estimation is called use case
points. Use case points method relies on the use case diagram to estimate the
size and effort of software projects. Although the use case points method has
been widely used, it has some limitations that might adversely affect the
accuracy of estimation. This paper presents some techniques using fuzzy logic
and neural networks to improve the accuracy of the use case points method.
Results showed that an improvement up to 22% can be obtained using the proposed
approach.

Algorithmic Songwriting with ALYSIA

Margareta Ackerman, David Loker
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

Support vector machines (SVMs) have been recognized as a potential tool for
supervised classification analyses in different domains of research. In
essence, SVM is a binary classifier. Therefore, in case of a multiclass
problem, the problem is divided into a series of binary problems which are
solved by binary classifiers, and finally the classification results are
combined following either the one-against-one or one-against-all strategies. In
this paper, an attempt has been made to classify lithology using a multiclass
SVM based framework using well logs as predictor variables. Here, the lithology
is classified into four classes such as sand, shaly sand, sandy shale and shale
based on the relative values of sand and shale fractions as suggested by an
expert geologist. The available dataset consisting well logs (gamma ray,
neutron porosity, density, and P-sonic) and class information from four closely
spaced wells from an onshore hydrocarbon field is divided into training and
testing sets. We have used one-against-all strategy to combine the results of
multiple binary classifiers. The reported results established the superiority
of multiclass SVM compared to other classifiers in terms of classification
accuracy. The selection of kernel function and associated parameters has also
been investigated here. It can be envisaged from the results achieved in this
study that the proposed framework based on multiclass SVM can further be used
to solve classification problems. In future research endeavor, seismic
attributes can be introduced in the framework to classify the lithology
throughout a study area from seismic inputs.

Modeling trajectories of mental health: challenges and opportunities

Lauren Erdman, Ekansh Sharma, Eva Unternahrer, Shantala Hari Dass, Kieran ODonnell, Sara Mostafavi, Rachel Edgar, Michael Kobor, Helene Gaudreau, Michael Meaney, Anna Goldenberg
Comments: extended abstract for ML4HC at NIPS 2016, 4 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

More than two thirds of mental health problems have their onset during
childhood or adolescence. Identifying children at risk for mental illness later
in life and predicting the type of illness is not easy. We set out to develop a
platform to define subtypes of childhood social-emotional development using
longitudinal, multifactorial trait-based measures. Subtypes discovered through
this study could ultimately advance psychiatric knowledge of the early
behavioural signs of mental illness. To this extent we have examined two types
of models: latent class mixture models and GP-based models. Our findings
indicate that while GP models come close in accuracy of predicting future
trajectories, LCMMs predict the trajectories as well in a fraction of the time.
Unfortunately, neither of the models are currently accurate enough to lead to
immediate clinical impact. The available data related to the development of
childhood mental health is often sparse with only a few time points measured
and require novel methods with improved efficiency and accuracy.

Large scale modeling of antimicrobial resistance with interpretable classifiers

Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette
Comments: Peer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain
Subjects: Genomics (q-bio.GN); Learning (cs.LG); Machine Learning (stat.ML)

Antimicrobial resistance is an important public health concern that has
implications in the practice of medicine worldwide. Accurately predicting
resistance phenotypes from genome sequences shows great promise in promoting
better use of antimicrobial agents, by determining which antibiotics are likely
to be effective in specific clinical cases. In healthcare, this would allow for
the design of treatment plans tailored for specific individuals, likely
resulting in better clinical outcomes for patients with bacterial infections.
In this work, we present the recent work of Drouin et al. (2016) on using Set
Covering Machines to learn highly interpretable models of antibiotic resistance
and complement it by providing a large scale application of their method to the
entire PATRIC database. We report prediction results for 36 new datasets and
present the Kover AMR platform, a new web-based tool allowing the visualization
and interpretation of the generated models.

Transformation Function Based Methods for Model Shift

Simon Shaolei Du, Jayanth Koushik, Aarti Singh, Barnabas Poczos
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Transfer learning techniques are often used when one tries to adapt a model
learned from a source domain with abundant labeled samples to the target domain
with limited labeled samples. In this paper, we consider the regression problem
under model shift condition, i.e., regression functions are different but
related in the source and target domains. We approach this problem through the
use of transformation functions which characterize the relation between the
source and the target domain. These transformation functions are able to
transform the original problem of learning the complicated regression function
of target domain into a problem of learning a simple auxiliary function. This
transformation function based technique includes some previous works as special
cases, but the class we propose is significantly more general. In this work we
consider two widely used non-parametric estimators, Kernel Smoothing (KS) and
Kernel Ridge Regression (KRR) for this setting and show improved statistical
rates for excess risk than non-transfer learning. Through an (epsilon)-cover
technique, we show that we can find the best transformation function a function
class. Lastly, experiments on synthesized, robotics and neural imaging data
demonstrate the effectiveness of our framework.

Estimating latent feature-feature interactions in large feature-rich graphs

Corrado Monti, Paolo Boldi
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

Real-world complex networks describe connections between objects; in reality,
those objects are often endowed with some kind of features. How does the
presence or absence of such features interplay with the network link structure?
Although the situation here described is truly ubiquitous, there is a limited
body of research dealing with large graphs of this kind. Many previous works
considered homophily as the only possible transmission mechanism translating
node features into links. Other authors, instead, developed more sophisticated
models, that are able to handle complex feature interactions, but are unfit to
scale to very large networks. We expand on the MGJ model, where interactions
between pairs of features can foster or discourage link formation. In this
work, we will investigate how to estimate the latent feature-feature
interactions in this model. We shall propose two solutions: the first one
assumes feature independence and it is essentially based on Naive Bayes; the
second one, which relaxes the independence assumption assumption, is based on
perceptrons. In fact, we show it is possible to cast the model equation in
order to see it as the prediction rule of a perceptron. We analyze how
classical results for the perceptrons can be interpreted in this context; then,
we define a fast and simple perceptron-like algorithm for this task, which can
process (10^8) links in minutes. We then compare these two techniques, first
with synthetic datasets that follows our model, gaining evidence that the Naive
independence assumptions are detrimental in practice. Secondly, we consider a
real, large-scale citation network where each node (i.e., paper) can be
described by different types of characteristics; there, our algorithm can
assess how well each set of features can explain the links, and thus finding
meaningful latent feature-feature interactions.

End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager

Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tur, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Natural language understanding and dialogue policy learning are both
essential in conversational systems that predict the next system actions in
response to a current user utterance. Conventional approaches aggregate
separate models of natural language understanding (NLU) and system action
prediction (SAP) as a pipeline that is sensitive to noisy outputs of
error-prone NLU. To address the issues, we propose an end-to-end deep recurrent
neural network with limited contextual dialogue memory by jointly training NLU
and SAP on DSTC4 multi-domain human-human dialogues. Experiments show that our
proposed model significantly outperforms the state-of-the-art pipeline models
for both NLU and SAP, which indicates that our joint model is capable of
mitigating the affects of noisy NLU outputs, and NLU model can be refined by
error flows backpropagating from the extra supervised signals of system
actions.

Parameter Compression of Recurrent Neural Networks and Degredation of Short-term Memory

Jonathan A. Cox
Comments: Submitted to IJCNN 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The significant computational costs of deploying neural networks in
large-scale or resource constrained environments, such as data centers and
mobile devices, has spurred interest in model compression, which can achieve a
reduction in both arithmetic operations and storage memory. Several techniques
have been proposed for reducing or compressing the parameters for feed-forward
and convolutional neural networks, but less is understood about the effect of
parameter compression on recurrent neural networks (RNN). In particular, the
extent to which the recurrent parameters can be compressed and the impact on
short-term memory performance, is not well understood. In this paper, we study
the effect of complexity reduction, through singular value decomposition rank
reduction, on RNN and minimal gated recurrent unit (MGRU) networks for several
tasks. We show that considerable rank reduction is possible when compressing
recurrent weights, even without fine tuning. Furthermore, we propose a
perturbation model for the effect of general perturbations, such as a
compression, on the recurrent parameters of RNNs. The model is tested against a
noiseless memorization experiment that elucidates the short-term memory
performance. In this way, we demonstrate that the effect of compression of
recurrent parameters is dependent on the degree of temporal coherence present
in the data and task. This work can guide on-the-fly RNN compression for novel
environments or tasks, and provides insight for applying RNN compression in
low-power devices, such as hearing aids.

Information Theory

Capacity Regions of Two-Receiver Broadcast Erasure Channels with Feedback and Memory

Michael Heindlmaier, Shirin Saeedi Bidokhti
Subjects: Information Theory (cs.IT)

The two-receiver broadcast packet erasure channel with feedback and memory is
studied. Memory is modeled using a finite- state Markov chain representing a
channel state. Two scenarios are considered: (i) when the transmitter has
causal knowledge of the channel state (i.e., the state is visible), and (ii)
when the channel state is unknown at the transmitter, but observations of it
are available at the transmitter through feedback (i.e., the state is hidden).
In both scenarios, matching outer and inner bounds are derived on the rates of
communication and the capacity region is determined. It is shown that similar
results carry over to channels with memory and delayed feedback, and memoryless
compound channels with feedback. When the state is visible, the capacity region
has a single-letter characterization and is in terms of a linear program. Two
optimal coding schemes are devised that use feedback to keep track of the
sent/received packets via a network of queues: a probabilistic scheme and a
deterministic backpressure-like algorithm. The former bases its decisions
solely on the past channel state information and the latter follows a
max-weight queue-based policy. The performance of the algorithms are analyzed
using the frameworks of rate-stability in networks of queues, max-flow min-cut
duality in networks, and finite-horizon Lyapunov drift analysis. When the state
is hidden, the capacity region does not have a single-letter characterization
and is, in this sense, uncomputable. Approximations of the capacity region are
provided and two optimal coding algorithms are outlined. The first algorithm is
a probabilistic coding scheme that bases its decisions on the past L feedback
sequences and its achievable rate-region approaches the capacity region
exponentially fast in L. The second algorithm is a backpressure-like algorithm
that performs optimally in the long run.

Approximate Support Recovery of Atomic Line Spectral Estimation: A Tale of Resolution and Precision

Qiuwei Li, Gongguo Tang
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

This work investigates the parameter estimation performance of
super-resolution line spectral estimation using atomic norm minimization. The
focus is on analyzing the algorithm’s accuracy of inferring the frequencies and
complex magnitudes from noisy observations. When the Signal-to-Noise Ratio is
reasonably high and the true frequencies are separated by (O(frac{1}{n})), the
atomic norm estimator is shown to localize the correct number of frequencies,
each within a neighborhood of size (O(sqrt{frac{log n}{n^3}} sigma)) of one
of the true frequencies. Here (n) is half the number of temporal samples and
(sigma^2) is the Gaussian noise variance. The analysis is based on a
primal-dual witness construction procedure. The obtained error bound matches
the Cram’er-Rao lower bound up to a logarithmic factor. The relationship
between resolution (separation of frequencies) and precision or accuracy of the
estimator is highlighted. Our analysis also reveals that the atomic norm
minimization can be viewed as a convex way to solve a (ell_1)-norm
regularized, nonlinear and nonconvex least-squares problem to global
optimality.

Repairing Reed-Solomon Codes With Multiple Erasures

Hoang Dau, Iwan Duursma, Han Mao Kiah, Olgica Milenkovic
Comments: 16 pages
Subjects: Information Theory (cs.IT)

Despite their exceptional error-correcting properties, Reed-Solomon (RS)
codes have been overlooked in distributed storage applications due to the
common belief that they have poor repair bandwidth: A naive repair approach
would require the whole file to be reconstructed in order to recover a single
erased codeword symbol. In a recent work, Guruswami and Wootters (STOC’16)
proposed a single-erasure repair method for RS codes that achieves the optimal
repair bandwidth amongst all linear encoding schemes. Their key idea is to
recover the erased symbol by collecting a sufficiently large number of its
traces, each of which can be constructed from a number of traces of other
symbols. As all traces belong to a subfield of the defining field of the RS
code and many of them are linearly dependent, the total repair bandwidth is
significantly reduced compared to that of the naive repair scheme. We extend
the trace collection technique to cope with multiple erasures.

Rate-Compatible Punctured Polar Codes: Optimal Construction Based on Polar Spectra

Kai Niu, Jincheng Dai, Kai Chen, Jiaru Lin, Q. T. Zhang, Athanasios V. Vasilakos
Comments: Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)

Polar codes are the first class of constructive channel codes achieving the
symmetric capacity of the binary-input discrete memoryless channels. But the
corresponding code length is limited to the power of two. In this paper, we
establish a systematic framework to design the rate-compatible punctured polar
(RCPP) codes with arbitrary code length. A new theoretic tool, called polar
spectra, is proposed to count the number of paths on the code tree with the
same number of zeros or ones respectively. Furthermore, a spectrum distance SD0
(SD1) and a joint spectrum distance (JSD) are presented as performance criteria
to optimize the puncturing tables. For the capacity-zero puncturing mode
(punctured bits are unknown to the decoder), we propose a quasi-uniform
puncturing algorithm, analyze the number of equivalent puncturings and prove
that this scheme can maximize SD1 and JSD. Similarly, for the capacity-one mode
(punctured bits are known to the decoder), we also devise a reversal
quasi-uniform puncturing scheme and prove that it has the maximum SD0 and JSD.
Both schemes have a universal puncturing table without any exhausted search.
These optimal RCPP codes outperform the performance of turbo codes in LTE
wireless communication systems.

On the Capacity of Discrete-Time Laguerre Optical Channels

Hossein Esmaeili, Jawad Salehi
Subjects: Information Theory (cs.IT)

In this paper, new upper and lower bounds are proposed for the capacity of
discrete-time Laguerre channel. Laguerre behavior is used to model various
types of optical systems and networks such as optical amplifiers, short
distance visible light communication systems with direct detection and coherent
code division multiple access (CDMA) networks. Bounds are derived for short
distance visible light communication systems and coherent CDMA networks. These
bounds are separated in three main cases: when both average and peak power
constraints are imposed, when peak power constraint is inactive and when only
peak power constraint is active.

In-network Compression for Multiterminal Cascade MIMO Systems

Inaki Estella Aguerri, Abdellatif Zaidi
Comments: Submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)

We study the problem of receive beamforming in uplink cascade multiple-input
multiple-output (MIMO) systems as an instance of that of cascade multiterminal
source coding for lossy function computation. Using this connection, we develop
two coding schemes for the second and show that their application leads to
efficient beamforming schemes for the first. In the first coding scheme, each
terminal in the cascade sends a description of the source that it observes; the
decoder reconstructs all sources, lossily, and then computes an estimate of the
desired function. This scheme improves upon standard routing in that every
terminal only compresses the innovation of its source w.r.t. the descriptions
that are sent by the previous terminals in the cascade. In the second scheme,
the desired function is computed gradually in the cascade network, and each
terminal sends a finer description of it. In the context of uplink cascade MIMO
systems, the application of these two schemes leads to efficient centralized
receive-beamforming and distributed receive-beamforming, respectively.

Towards a Tractable Delay Analysis in Large Wireless Networks

Yi Zhong, Martin Haenggi, Fu-Chun Zheng, Wenyi Zhang, Tony Q.S. Quek, Weili Nie
Comments: 17 pages, 4 figures, 1 table, submitted to IEEE Communications Magazine
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Meeting the diverse delay requirements of next-generation wireless
communication networks is one of the most critical goals of wireless system
design. Though the delay of point-to-point communications has been well
investigated using classical queueing theory, the delay of multi-point to
multi-point communications has not been explored in depth. The main technical
difficulty lies in the interacting queues problem, in which the service rate is
coupled with the statuses of other queues. In this article, we elaborate on the
main challenges of delay analysis in large wireless networks. Several promising
approaches to bypass these difficulties are proposed and summarized to provide
useful guidance.

Cache-Enabled Physical-Layer Security for Video Streaming in Wireless Networks with Limited Backhaul

Lin Xiang, Derrick Wing Kwan Ng, Robert Schober, Vincent W.S. Wong
Comments: Accepted for presentation at IEEE Globecom 2016, Washington, DC, Dec. 2016
Subjects: Information Theory (cs.IT)

In this paper, we investigate for the first time the benefits of wireless
caching for the physical layer security (PLS) of wireless networks. In
particular, a caching scheme enabling power-efficient PLS is proposed for
cellular video streaming with constrained backhaul capacity. By sharing video
data across a subset of base stations (BSs) through both caching and backhaul
loading, secure cooperative transmission of several BSs is dynamically enabled
in accordance with the cache status, the channel conditions, and the backhaul
capacity. Thereby, caching reduces the data sharing overhead over the
capacity-constrained backhaul links. More importantly, caching introduces
additional secure degrees of freedom and enables a power-efficient design. We
investigate the optimal caching and transmission policies for minimizing the
total transmit power while providing quality of service (QoS) and guaranteeing
secrecy during video delivery. A two-stage non-convex mixed-integer
optimization problem is formulated, which optimizes the caching policy in an
offline video caching stage and the cooperative transmission policy in an
online video delivery stage. As the problem is NP-hard, suboptimal
polynomial-time algorithms are proposed for low-complexity cache training and
delivery control, respectively. Sufficient optimality conditions, under which
the proposed schemes attain global optimal solutions, are also provided.
Simulation results show that the proposed schemes achieve low secrecy outage
probability and high power efficiency simultaneously.

Vector Approximate Message Passing for the Generalized Linear Model

Philip Schniter, Sundeep Rangan, Alyson K. Fletcher
Subjects: Information Theory (cs.IT)

The generalized linear model (GLM), where a random vector (oldsymbol{x}) is
observed through a noisy, possibly nonlinear, function of a linear transform
output (oldsymbol{z}=oldsymbol{Ax}), arises in a range of applications such
as robust regression, binary classification, quantized compressed sensing,
phase retrieval, photon-limited imaging, and inference from neural spike
trains. When (oldsymbol{A}) is large and i.i.d. Gaussian, the generalized
approximate message passing (GAMP) algorithm is an efficient means of MAP or
marginal inference, and its performance can be rigorously characterized by a
scalar state evolution. For general (oldsymbol{A}), though, GAMP can
misbehave. Damping and sequential-updating help to robustify GAMP, but their
effects are limited. Recently, a “vector AMP” (VAMP) algorithm was proposed for
additive white Gaussian noise channels. VAMP extends AMP’s guarantees from
i.i.d. Gaussian (oldsymbol{A}) to the larger class of rotationally invariant
(oldsymbol{A}). In this paper, we show how VAMP can be extended to the GLM.
Numerical experiments show that the proposed GLM-VAMP is much more robust to
ill-conditioning in (oldsymbol{A}) than damped GAMP.

Onsager-Corrected Deep Networks for Sparse Linear Inverse Problems

Mark Borgerding, Philip Schniter
Comments: arXiv admin note: text overlap with arXiv:1607.05966
Subjects: Information Theory (cs.IT)

Deep learning has gained great popularity due to its widespread success on
many inference problems. We consider the application of deep learning to the
sparse linear inverse problem encountered in compressive sensing, where one
seeks to recover a sparse signal from a few noisy linear measurements. In this
paper, we propose two novel neural-network architectures that decouple
prediction errors across layers in the same way that the approximate message
passing (AMP) algorithms decouple them across iterations: through Onsager
correction. We show numerically that our “learned AMP” network significantly
improves upon Gregor and LeCun’s “learned ISTA” when both use soft-thresholding
shrinkage. We then show that additional improvements result from jointly
learning the shrinkage functions together with the linear transforms. Finally,
we propose a network design inspired by an unfolding of the recently proposed
“vector AMP” (VAMP) algorithm, and show that it outperforms all previously
considered networks. Interestingly, the linear transforms and shrinkage
functions prescribed by VAMP coincide with the values learned through
backpropagation, yielding an intuitive explanation for the design of this deep
network.

Novel Delivery Schemes for Decentralized Coded Caching in the Finite File Size Regime

Kai Wan, Daniela Tuninetti, Pablo Piantanida
Comments: 6 pages, 1 figure, submitted to ICC 2017-WT10
Subjects: Information Theory (cs.IT)

This paper analyzes the achievable tradeoff between cache~size and
download~rate in decentralized caching systems with the uncoded cache placement
originally proposed by Maddah-Ali and Niesen. It proposes two novel delivery
schemes that take advantage of the multicasting opportunities that arise when a
file is demanded by multiple users. These delivery schemes are extensions of
known ones to the regime where the file size is finite. Numerical evaluations
for the case of file uniform popularity show that the proposed schemes
outperform previous ones for all value of the cache size.

On the Performance of Visible Light Communications Systems with Non-Orthogonal Multiple Access

Hanaa Marshoud, Paschalis C. Sofotasios, Sami Muhaidat, George K. Karagiannidis
Subjects: Information Theory (cs.IT)

Visible light communications (VLC) have been recently proposed as a promising
and efficient solution to indoor ubiquitous broadband connectivity. In this
paper, non-orthogonal multiple access, which has been recently proposed as an
effective scheme for fifth generation (5G) wireless networks, is considered in
the context of VLC systems, under different channel uncertainty models. To this
end, we first derive a novel closed-form expression for the bit-error-rate
(BER) under perfect channel state information (CSI). Capitalizing on this, we
quantify the effect of noisy and outdated CSI by deriving a simple approximated
expression for the former and a tight upper bound for the latter. The offered
results are corroborated by respective results from extensive Monte Carlo
simulations and are used to provide useful insights on the effect of imperfect
CSI knowledge on the system performance. It was shown that, while noisy CSI
leads to slight degradation in the BER performance, outdated CSI can cause
detrimental performance degradation if the order of the users’ channel gains
change as a result of mobility

Linear Codes over (mathbb{F}_{q}[x]/(x^2)) and (GR(p^2,m)) Reaching the Griesmer Bound

Jin Li, Aixian Zhang, Keqin Feng
Subjects: Information Theory (cs.IT)

We construct two series of linear codes over finite ring
(mathbb{F}_{q}[x]/(x^2)) and Galois ring (GR(p^2,m)) respectively reaching the
Griesmer bound. They derive two series of codes over finite field
(mathbb{F}_{q}) by Gray map. The first series of codes over (mathbb{F}_{q})
derived from (mathbb{F}_{q}[x]/(x^2)) are linear and also reach the Griesmer
bound in some cases. Many of linear codes over finite field we constructed have
two Hamming (non-zero) weights.

A Mathematical Proof of the Superiority of NOMA Compared to Conventional OMA

Zhiyong Chen, Zhiguo Ding, Xuchu Dai, Rui Zhang
Comments: 28 pages, 8 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)

While existing works about non-orthogonal multiple access (NOMA) have
indicated that NOMA can yield a significant performance gain over orthogonal
multiple access (OMA) with fixed resource allocation, it is not clear whether
such a performance gain will diminish when optimal resource
(Time/Frequency/Power) allocation is carried out. In this paper, the
performance comparison between NOMA and conventional OMA systems is
investigated, from an optimization point of view. Firstly, by using the idea of
power splitting, a closed-form expression for the optimum sum rate of NOMA
systems is derived. Then, with rigorous mathematical proofs, we reveal the fact
that NOMA can always outperform conventional OMA systems, even if both are
equipped with the optimal resource allocation policies. Finally, computer
simulations are conducted to validate the accuracy of the analytical results.

Placement Optimization of UAV-Mounted Mobile Base Stations

Jiangbin Lyu, Yong Zeng, Rui Zhang, Teng Joon Lim
Comments: 4 pages, 3 figures, 1 table, accepted for publications in IEEE Communications Letters
Subjects: Information Theory (cs.IT)

In terrestrial communication networks without fixed infrastructure, unmanned
aerial vehicle (UAV)-mounted mobile base stations (MBSs) provide an efficient
solution to achieve wireless connectivity. This letter aims to minimize the
number of MBSs needed to provide wireless coverage for a group of distributed
ground terminals (GTs), ensuring that each GT is within the communication range
of at least one MBS. We propose a polynomial-time algorithm with successive MBS
placement, where the MBSs are placed sequentially starting on the area
perimeter of the uncovered GTs along a spiral path towards the center, until
all GTs are covered. Each MBS is placed to cover as many uncovered GTs as
possible, with higher priority given to the GTs on the boundary to reduce the
occurrence of outlier GTs that each may require one dedicated MBS for its
coverage. Numerical results show that the proposed algorithm performs favorably
compared to other schemes in terms of the total number of required MBSs and/or
time complexity.

A Protocol for a Secure Remote Keyless Entry System Applicable in Vehicles using Symmetric-Key Cryptography

Tobias Glocker, Timo Mantere, Mohammed Elmusrati
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

In our modern society comfort became a standard. This comfort, especially in
cars can only be achieved by equipping the car with more electronic devices.
Some of the electronic devices must cooperate with each other and thus they
require a communication channel, which can be wired or wireless. In these days,
it would be hard to sell a new car operating with traditional keys. Almost all
modern cars can be locked or unlocked with a Remote Keyless System. A Remote
Keyless System consists of a key fob that communicates wirelessly with the car
transceiver that is responsible for locking and unlocking the car. However
there are several threats for wireless communication channels.

This paper describes the possible attacks against a Remote Keyless System and
introduces a secure protocol as well as a lightweight Symmetric Encryption
Algorithm for a Remote Keyless Entry System applicable in vehicles.

Two new families of two-weight codes

Minjia Shi, Yue Guan, Patrick Sole
Comments: 13 pages, submitted on 10 November
Subjects: Information Theory (cs.IT)

We construct two new infinite families of trace codes of dimension (2m), over
the ring (mathbb{F}_p+umathbb{F}_p,) when (p) is an odd prime. They have the
algebraic structure of abelian codes. Their Lee weight distribution is computed
by using Gauss sums. By Gray mapping, we obtain two infinite families of linear
(p)-ary codes of respective lengths ((p^m-1)^2) and (2(p^m-1)^2.) When (m) is
singly-even, the first family gives five-weight codes. When (m) is odd, and
(pequiv 3 pmod{4},) the first family yields (p)-ary two-weight codes, which
are shown to be optimal by application of the Griesmer bound. The second family
consists of two-weight codes that are shown to be optimal, by the Griesmer
bound, whenever (p=3) and (m ge 3,) or (pge 5) and (mge 4.) Applications to
secret sharing schemes are given.

Optimal two weight codes from trace codes over a chain ring

Minjia Shi, Rongsheng Wu, Patrick Sole
Comments: 12 pages, submitted on 4 October. arXiv admin note: text overlap with arXiv:1612.00123, arXiv:1612.00128
Subjects: Information Theory (cs.IT)

In this paper, we construct an infinite family of two-weight codes for the
homogeneous metric over the ring (R=mathbb{F}_2+umathbb{F}_2+cdots
+u^{k-1}mathbb{F}_{2}), where (u^k=0.) These codes are defined as trace codes.
They have the algebraic structure of abelian codes. Their homogeneous weight
distribution is computed by using character sums. In particular, we give a
necessary and sufficient condition of optimality for the Gray image codes by
using the Griesmer bound. We have shown that if (k>3), these images are
projective. Their support structure is determined. An application to secret
sharing schemes is given.

Several Classes of (p)-ary Linear Codes with Few Weights

Rongsheng Wu, Minjia Shi, Patrick Sole
Comments: 11 pages
Subjects: Information Theory (cs.IT)

Few weights codes have been an important topic of coding theory for decades.
In this paper, a class of two-weight and three-weight codes for the homogeneous
metric over the chain ring (R=mathbb{F}_p+umathbb{F}_p+cdots
+u^{k-1}mathbb{F}_{p}) are constructed. These codes are defined as trace
codes. They are shown to be abelian. Their homogeneous weight distributions are
computed by using exponential sums. In particular, we give a necessary and
sufficient condition of the optimality for their Gray images by using the
Griesmer bound in the two-weight case, and the information about the dual
homogeneous distance is also given. In addition, the codewords of these codes
have been turn out to be minimal, it is significant to obtain secret sharing
schemes with interesting access structures.

Some ternary cubic two-weight codes

Minjia Shi, Daitao Huang, Patrick Sole
Comments: 11 pages, submitted on 2 December. arXiv admin note: text overlap with arXiv:1612.00118
Subjects: Information Theory (cs.IT)

We study trace codes with defining set (L,) a subgroup of the multiplicative
group of an extension of degree (m) of the alphabet ring
(mathbb{F}_3+umathbb{F}_3+u^{2}mathbb{F}_{3},) with (u^{3}=1.) These codes
are abelian, and their ternary images are quasi-cyclic of co-index three
(a.k.a. cubic codes). Their Lee weight distributions are computed by using
Gauss sums. These codes have three nonzero weights when (m) is singly-even and
(|L|=frac{3^{3m}-3^{2m}}{2}.) When (m) is odd, and
(|L|=frac{3^{3m}-3^{2m}}{2}), or (|L|={3^{3m}-3^{2m}}) and (m) is a positive
integer, we obtain two new infinite families of two-weight codes which are
optimal. Applications of the image codes to secret sharing schemes are also
given.

SNIPE for Memory-Limited PCA From Incomplete Data

Armin Eftekhari, Laura Balzano, Dehui Yang, Michael B. Wakin
Subjects: Information Theory (cs.IT)

The linear subspace model is pervasive in science and engineering and
particularly in large datasets which are often incomplete due to missing
measurements and privacy issues. Therefore, a critical problem in modeling is
to develop algorithms for estimating a low-dimensional subspace model from
incomplete data efficiently in terms of both computational complexity and
memory storage. In this paper we study an algorithm that processes blocks of
incomplete data to estimate the underlying subspace model. Our algorithm has a
simple interpretation as optimizing the subspace to fit the observed data block
but remain close to the previous estimate. We prove a linear rate of
convergence for the algorithm and our rate holds with high probability.

Analysis of finite sample size quantum hypothesis testing via martingale concentration inequalities

Cambyse Rouze, Nilanjana Datta
Comments: 24 pages, 2 figures. arXiv admin note: text overlap with arXiv:1510.04682
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Statistics Theory (math.ST)

Martingale concentration inequalities constitute a powerful mathematical tool
in the analysis of problems in a wide variety of fields ranging from
probability and statistics to information theory and machine learning. Here we
apply such inequalities to finite sample size quantum hypothesis testing, which
is the problem of discriminating quantum states belonging to two different
sequences ({
ho_n}_n) and ({sigma_n}_n), for a finite (n). We obtain
upper bounds on the type II Stein- and Hoeffding errors, which, for
i.i.d.~states, are in general tighter than the corresponding bounds obtained by
Audenaert, Mosonyi and Verstrate. Moreover, our bounds are also valid for
sequences of non-i.i.d.~states which satisfy certain conditions. We also use a
martingale concentration inequality to obtain an upper bound on the second
order asymptotics of the type II error exponent in the setting of quantum
Stein’s lemma, for certain classes of states.

Robust nonparametric nearest neighbor random process clustering

Michael Tschannen, Helmut Bölcskei
Comments: 13 pages, 4 figures
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

We consider the problem of clustering noisy finite-length observations of
stationary ergodic random processes according to their generative models
without prior knowledge of the model statistics and the number of generative
models. Two algorithms, both using the L1-distance between estimated power
spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The
first one, termed nearest neighbor process clustering (NNPC) is new and relies
on partitioning the nearest neighbor graph of the observations via spectral
clustering. The second algorithm, simply referred to as k-means (KM), consists
of a single k-means iteration with farthest point initialization and was
considered before in the literature, albeit with a different dissimilarity
measure and with asymptotic performance results only. We prove that both
algorithms succeed with high probability in the presence of noise and missing
entries, and even when the generative process PSDs overlap significantly, all
provided that the observation length is sufficiently large. Our results
quantify the tradeoff between the overlap of the generative process PSDs, the
observation length, the fraction of missing entries, and the noise variance.
Furthermore, we prove that treating the finite-length observations of
stationary ergodic random processes as vectors in Euclidean space and
clustering them using the thresholding-based subspace clustering (TSC)
algorithm, the subspace clustering cousin of NNPC, results in performance
strictly inferior to that of NNPC. We argue that the underlying cause is to be
found in TSC employing spherical distance as dissimilarity measure, thereby
ignoring the stationary process structure of the observations. Finally, we
provide extensive numerical results for synthetic and real data and find that
NNPC outperforms state-of-the-art algorithms in human motion sequence
clustering.

The Optimality of Correlated Sampling

Mohammad Bavarian, Badih Ghazi, Elad Haramaty, Pritish Kamath, Ronald L. Rivest, Madhu Sudan
Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT)

In the “correlated sampling” problem, two players, say Alice and Bob, are
given two distributions, say (P) and (Q) respectively, over the same universe
and access to shared randomness. The two players are required to output two
elements, without any interaction, sampled according to their respective
distributions, while trying to minimize the probability that their outputs
disagree. A well-known protocol due to Holenstein, with close variants (for
similar problems) due to Broder, and to Kleinberg and Tardos, solves this task
with disagreement probability at most (2 delta/(1+delta)), where (delta) is
the total variation distance between (P) and (Q). This protocol has been used
in several different contexts including sketching algorithms, approximation
algorithms based on rounding linear programming relaxations, the study of
parallel repetition and cryptography.

In this note, we give a surprisingly simple proof that this protocol is in
fact tight. Specifically, for every (delta in (0,1)), we show that any
correlated sampling scheme should have disagreement probability at least
(2delta/(1+delta)). This partially answers a recent question of Rivest.

Our proof is based on studying a new problem we call “constrained agreement”.
Here, Alice is given a subset (A subseteq [n]) and is required to output an
element (i in A), Bob is given a subset (B subseteq [n]) and is required to
output an element (j in B), and the goal is to minimize the probability that
(i
eq j). We prove tight bounds on this question, which turn out to imply
tight bounds for correlated sampling. Though we settle basic questions about
the two problems, our formulation also leads to several questions that remain
open.

Online Page Migration on Ring Networks in Uniform Model

This paper explores the problem of page migration in ring networks. A ring
network is a connected graph, in which each node is connected with exactly two
other nodes. In this problem, one of the nodes in a given network holds a page
of size D. This node is called the server and the page is a non-duplicable data
in the network. Requests are issued by nodes to access the page one after
another. Every time a new request is issued, the server must serve the request
and may migrate to another node before the next request arrives. A service
costs the distance between the server and the requesting node, and the
migration costs the distance of the migration multiplied by D. The problem is
to minimize the total costs of services and migrations. We study this problem
in uniform model, for which the page has a unit size, i.e. D=1. A
3.326-competitive algorithm improving the current best upper bound is designed.
We show that this ratio is tight for our algorithm.

On The Fundamental Energy Tradeoffs of Geographical Load Balancing

Abbas Kiani, Nirwan Ansari
Comments: to appear IEEE Communications Magazine
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

Geographical load balancing can optimize the utilization of green energy and
the cost of electricity by taking the advantages of green and price diversities
at geographical dispersed data centers. However, higher green energy
utilization or lower electricity cost may actually increase the total energy
consumption, and is not necessarily the best option. The achievable energy
tradeoffs can be captured by taking into consideration of a defined service
efficiency parameter for geo-dispersed data centers.

Asymptotically Good Convolutional Codes

Giuliano Gadioli La Guardia
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

In this paper, we construct new sequences of asymptotically good
convolutional codes. These sequences are obtained from sequences of transitive,
self-orthogonal and self-dual block codes that attain the Tsfasman-Vladut-Zink
bound. Furthermore, by applying the techniques of expanding, extending,
puncturing, direct sum, the |u|u+v| construction and the product code
construction to these block codes, we construct more new sequences of
asymptotically good convolutional codes. Additionally, we show that the
proposed construction method presented here also works when applied for all
sequences of good block codes where lim kj/nj and lim dj/nj exist.