IT博客汇 | arXiv Paper Daily: Tue, 22 Nov 2016

arXiv Paper Daily: Tue, 22 Nov 2016

我爱机器学习(52ml.net)发表于 2016-11-22 00:00:00

Neural and Evolutionary Computing

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

In recent years, deep neural networks (DNNs), have yielded strong results on
a wide range of applications. Graphics Processing Units (GPUs) have been one
key enabling factor leading to the current popularity of DNNs. However, despite
increasing hardware flexibility and software programming toolchain maturity,
high efficiency GPU programming remains difficult: it suffers from high
complexity, low productivity, and low portability. GPU vendors such as NVIDIA
have spent enormous effort to write special-purpose DNN libraries. However, on
other hardware targets, especially mobile GPUs, such vendor libraries are not
generally available. Thus, the development of portable, open, high-performance,
energy-efficient GPU code for DNN operations would enable broader deployment of
DNN-based algorithms. Toward this end, this work presents a framework to enable
productive, high-efficiency GPU programming for DNN computations across
hardware platforms and programming models. In particular, the framework
provides specific support for metaprogramming, autotuning, and DNN-tailored
data types. Using our framework, we explore implementing DNN operations on
three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA
GPUs, we show both portability between OpenCL and CUDA as well competitive
performance compared to the vendor library. On Qualcomm GPUs, we show that our
framework enables productive development of target-specific optimizations, and
achieves reasonable absolute performance. Finally, On AMD GPUs, we show initial
results that indicate our framework can yield reasonable performance on a new
platform with minimal effort.

Using inspiration from synaptic plasticity rules to optimize traffic flow in distributed engineered networks

Jonathan Y. Suen, Saket Navlakha
Comments: 43 pages, 5 Figures. Submitted to Neural Computation
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Controlling the flow and routing of data is a fundamental problem in many
distributed networks, including transportation systems, integrated circuits,
and the Internet. In the brain, synaptic plasticity rules have been discovered
that regulate network activity in response to environmental inputs, which
enable circuits to be stable yet flexible. Here, we develop a new
neuro-inspired model for network flow control that only depends on modifying
edge weights in an activity-dependent manner. We show how two fundamental
plasticity rules (long-term potentiation and long-term depression) can be cast
as a distributed gradient descent algorithm for regulating traffic flow in
engineered networks. We then characterize, both via simulation and
analytically, how different forms of edge-weight update rules affect network
routing efficiency and robustness. We find a close correspondence between
certain classes of synaptic weight update rules derived experimentally in the
brain and rules commonly used in engineering, suggesting common principles to
both.

Efficient Stochastic Inference of Bitwise Deep Neural Networks

Sebastian Vogel, Christoph Schorn, Andre Guntoro, Gerd Ascheid
Comments: 6 pages, 3 figures, Workshop on Efficient Methods for Deep Neural Networks at Neural Information Processing Systems Conference 2016, NIPS 2016, EMDNN 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Recently published methods enable training of bitwise neural networks which
allow reduced representation of down to a single bit per weight. We present a
method that exploits ensemble decisions based on multiple stochastically
sampled network models to increase performance figures of bitwise neural
networks in terms of classification accuracy at inference. Our experiments with
the CIFAR-10 and GTSRB datasets show that the performance of such network
ensembles surpasses the performance of the high-precision base model. With this
technique we achieve 5.81% best classification error on CIFAR-10 test set using
bitwise networks. Concerning inference on embedded systems we evaluate these
bitwise networks using a hardware efficient stochastic rounding procedure. Our
work contributes to efficient embedded bitwise neural networks.

Spikes as regularizers

Anders Søgaard
Comments: Computing with Spikes at NIPS 2016
Subjects: Neural and Evolutionary Computing (cs.NE)

We present a confidence-based single-layer feed-forward learning algorithm
SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of
activation spikes. We adaptively update a weight vector relying on confidence
estimates and activation offsets relative to previous activity. We regularize
updates proportionally to item-level confidence and weight-specific support,
loosely inspired by the observation from neurophysiology that high spike rates
are sometimes accompanied by low temporal precision. Our experiments suggest
that the new learning algorithm SPIRAL is more robust and less prone to
overfitting than both the averaged perceptron and AROW.

Multiple-View Spectral Clustering for Group-wise Functional Community Detection

Nathan D. Cahill, Harmeet Singh, Chao Zhang, Daryl A. Corcoran, Alison M. Prengaman, Paul S. Wenger, John F. Hamilton, Peter Bajorski, Andrew M. Michael
Comments: Presented at The MICCAI-BACON 16 Workshop (this https URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Functional connectivity analysis yields powerful insights into our
understanding of the human brain. Group-wise functional community detection
aims to partition the brain into clusters, or communities, in which functional
activity is inter-regionally correlated in a common manner across a group of
subjects. In this article, we show how to use multiple-view spectral clustering
to perform group-wise functional community detection. In a series of
experiments on 291 subjects from the Human Connectome Project, we compare three
versions of multiple-view spectral clustering: MVSC (uniform weights), MVSCW
(weights based on subject-specific embedding quality), and AASC (weights
optimized along with the embedding) with the competing technique of Joint
Diagonalization of Laplacians (JDL). Results show that multiple-view spectral
clustering not only yields group-wise functional communities that are more
consistent than JDL when using randomly selected subsets of individual brains,
but it is several orders of magnitude faster than JDL.

Generalized Dropout

Suraj Srinivas, R. Venkatesh Babu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep Neural Networks often require good regularizers to generalize well.
Dropout is one such regularizer that is widely used among Deep Learning
practitioners. Recent work has shown that Dropout can also be viewed as
performing Approximate Bayesian Inference over the network parameters. In this
work, we generalize this notion and introduce a rich family of regularizers
which we call Generalized Dropout. One set of methods in this family, called
Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
emerges as a special case of this method. Another member of this family selects
the width of neural network layers. Experiments show that these methods help in
improving generalization performance over Dropout.

Deep Tensor Convolution on Multicores

David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

Deep convolutional neural networks (ConvNets) have become a de facto standard
for image classification and segmentation problems. These networks have also
had early success in the video domain, despite failing to capture motion
continuity and other rich temporal correlations. Evidence has since emerged
that extending ConvNets to 3-dimensions leads to state-of-the-art performance
across a broad set of video processing tasks by learning these joint
spatiotemporal features. However, these early 3D networks have been restricted
to shallower architectures of fewer channels than successful 2D networks due to
memory constraints inherent to GPU implementations.

In this study we present the first practical CPU implementation of tensor
convolution optimized for deep networks of small kernels. Our implementation
supports arbitrarily deep ConvNets of (N)-dimensional tensors due to the
relaxed memory constraints of CPU systems, which can be further leveraged for
an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
kernels). Because most of the optimized ConvNets in previous literature are 2
rather than 3-dimensional, we benchmark our performance against the most
popular 2D implementations. Even in this special case, which is theoretically
the least beneficial for our fast algorithm, we observe a 5 to 25-fold
improvement in throughput compared to previous state-of-the-art. We believe
this work is an important step toward practical ConvNets for real-time
applications, such as mobile video processing and biomedical image analysis,
where high performance 3D networks are a must.

Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

Zhiguang Wang, Weizhong Yan, Tim Oates
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We propose a simple but strong baseline for time series classification from
scratch with deep neural networks. Our proposed baseline models are pure
end-to-end without any heavy preprocessing on the raw data or feature crafting.
The FCN achieves premium performance to other state-of-the-art approaches. Our
exploration of the very deep neural networks with the ResNet structure achieves
competitive performance under the same simple experiment settings. The simple
MLP baseline is also comparable to the 1NN-DTW as a previous golden baseline.
Our models provides a simple choice for the real world application and a good
starting point for the future research. An overall analysis is provided to
discuss the generalization of our models, learned features, network structures
and the classification semantics.

Fast Video Classification via Adaptive Cascading of Deep Models

Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent advances have enabled “oracle” classifiers that can classify across
many classes and input distributions with high accuracy without retraining.
However, these classifiers are relatively heavyweight, so that applying them to
classify video is costly. We show that day-to-day video exhibits highly skewed
class distributions over the short term, and that these distributions can be
classified by much simpler models. We formulate the problem of detecting the
short-term skews online and exploiting models based on it as a new sequential
decision making problem dubbed the Online Bandit Problem, and present a new
algorithm to solve it. When applied to recognizing faces in TV shows and
movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
GPU/CPU) relative to a state-of-the-art convolutional neural network, at
competitive accuracy.

Quantized neural network design under weight capacity constraint

Sungho Shin, Kyuyeon Hwang, Wonyong Sung
Comments: This paper is accepted at NIPS 2016 workshop on Efficient Methods for Deep Neural Networks (EMDNN). arXiv admin note: text overlap with arXiv:1511.06488
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The complexity of deep neural network algorithms for hardware implementation
can be lowered either by scaling the number of units or reducing the
word-length of weights. Both approaches, however, can accompany the performance
degradation although many types of research are conducted to relieve this
problem. Thus, it is an important question which one, between the network size
scaling and the weight quantization, is more effective for hardware
optimization. For this study, the performances of fully-connected deep neural
networks (FCDNNs) and convolutional neural networks (CNNs) are evaluated while
changing the network complexity and the word-length of weights. Based on these
experiments, we present the effective compression ratio (ECR) to guide the
trade-off between the network size and the precision of weights when the
hardware resource is limited.

Learning the Number of Neurons in Deep Networks

Jose M Alvarez, Mathieu Salzmann
Comments: NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Nowadays, the number of layers and of neurons in each layer of a deep network
are typically set manually. While very deep and wide networks have proven
effective in general, they come at a high memory and computation cost, thus
making them impractical for constrained platforms. These networks, however, are
known to have many redundant parameters, and could thus, in principle, be
replaced by more compact architectures. In this paper, we introduce an approach
to automatically determining the number of neurons in each layer of a deep
network during learning. To this end, we propose to make use of a group
sparsity regularizer on the parameters of the network, where each group is
defined to act on a single neuron. Starting from an overcomplete network, we
show that our approach can reduce the number of parameters by up to 80\% while
retaining or even improving the network accuracy.

Local minima in training of deep networks

Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
Comments: submitted to ICLR 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

There has been a lot of recent interest in trying to characterize the error
surface of deep models. This stems from a long standing question. Given that
deep networks are highly nonlinear systems optimized by local gradient methods,
why do they not seem to be affected by bad local minima? It is widely believed
that training of deep models using gradient methods works so well because the
error surface either has no local minima, or if they exist they need to be
close in value to the global minimum. It is known that such results hold under
very strong assumptions which are not satisfied by real models. In this paper
we present examples showing that for such theorem to be true additional
assumptions on the data, initialization schemes and/or the model classes have
to be made. We look at the particular case of finite size datasets. We
demonstrate that in this scenario one can construct counter-examples (datasets
or initialization schemes) when the network does become susceptible to bad
local minima over the weight space.

Computer Vision and Pattern Recognition

Image-to-Image Translation with Conditional Adversarial Networks

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We investigate conditional adversarial networks as a general-purpose solution
to image-to-image translation problems. These networks not only learn the
mapping from input image to output image, but also learn a loss function to
train this mapping. This makes it possible to apply the same generic approach
to problems that traditionally would require very different loss formulations.
We demonstrate that this approach is effective at synthesizing photos from
label maps, reconstructing objects from edge maps, and colorizing images, among
other tasks. As a community, we no longer hand-engineer our mapping functions,
and this work suggests we can achieve reasonable results without
hand-engineering our loss functions either.

Precise Relaxation of the Mumford-Shah Functional

Thomas Möllenhoff, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Jumps, edges and cutoffs are prevalent in our world across many modalities.
The Mumford-Shah functional is a classical and elegant approach for modeling
such discontinuities but global optimization of this non-convex functional
remains challenging. The state of the art are convex representations based on
the theory of calibrations. The major drawback of these approaches is the
ultimate discretization of the co-domain into labels. For the case of total
variation regularization, this issue has been partially resolved by recent
sublabel-accurate relaxations, a generalization of which to other regularizers
is not straightforward. In this work, we show that sublabel-accurate lifting
approaches can be derived by discretizing a continuous relaxation of the
Mumford-Shah functional by means of finite elements. We thereby unify and
generalize existing functional lifting approaches. We show the efficiency of
the proposed discretizations on discontinuity-preserving denoising tasks.

Multiple-View Spectral Clustering for Group-wise Functional Community Detection

Functional connectivity analysis yields powerful insights into our
understanding of the human brain. Group-wise functional community detection
aims to partition the brain into clusters, or communities, in which functional
activity is inter-regionally correlated in a common manner across a group of
subjects. In this article, we show how to use multiple-view spectral clustering
to perform group-wise functional community detection. In a series of
experiments on 291 subjects from the Human Connectome Project, we compare three
versions of multiple-view spectral clustering: MVSC (uniform weights), MVSCW
(weights based on subject-specific embedding quality), and AASC (weights
optimized along with the embedding) with the competing technique of Joint
Diagonalization of Laplacians (JDL). Results show that multiple-view spectral
clustering not only yields group-wise functional communities that are more
consistent than JDL when using randomly selected subsets of individual brains,
but it is several orders of magnitude faster than JDL.

Kernel Cross-View Collaborative Representation based Classification for Person Re-Identification

Raphael Prates, William Robson Schwartz
Comments: Paper submitted to CVPR 2017 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person re-identification aims at the maintenance of a global identity as a
person moves among non-overlapping surveillance cameras. It is a hard task due
to different illumination conditions, viewpoints and the small number of
annotated individuals from each pair of cameras (small-sample-size problem).
Collaborative Representation based Classification (CRC) has been employed
successfully to address the small-sample-size problem in computer vision.
However, the original CRC formulation is not well-suited for person
re-identification since it does not consider that probe and gallery samples are
from different cameras. Furthermore, it is a linear model, while appearance
changes caused by different camera conditions indicate a strong nonlinear
transition between cameras. To overcome such limitations, we propose the Kernel
Cross-View Collaborative Representation based Classification (Kernel X-CRC)
that represents probe and gallery images by balancing representativeness and
similarity nonlinearly. It assumes that a probe and its corresponding gallery
image are represented with similar coding vectors using individuals from the
training set. Experimental results demonstrate that our assumption is true when
using a high-dimensional feature vector and becomes more compelling when
dealing with a low-dimensional and discriminative representation computed using
a common subspace learning method. We achieve state-of-the-art for rank-1
matching rates in two person re-identification datasets (PRID450S and GRID) and
the second best results on VIPeR and CUHK01 datasets.

Sampled Image Tagging and Retrieval Methods on User Generated Content

Karl Ni, Kyle Zaragoza, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditional image tagging and retrieval algorithms have limited value as a
result of being trained with heavily curated datasets. These limitations are
most evident when arbitrary search words are used that do not intersect with
training set labels. Weak labels from user generated content (UGC) found in the
wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of
unique words in the metadata tags. Prior work on word embeddings successfully
leveraged unstructured text with large vocabularies, and our proposed method
seeks to apply similar cost functions to open source imagery. Specifically, we
train a deep learning image tagging and retrieval system on large scale, user
generated content (UGC) using sampling methods and joint optimization of word
embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset,
such an approach builds robustness to common unstructured data issues that
include but are not limited to irrelevant tags, misspellings, multiple
languages, polysemy, and tag imbalance. As a result, the final proposed
algorithm will not only yield comparable results to state of the art in
conventional image tagging, but will enable new capability to train algorithms
on large, scale unstructured text in the YFCC100M dataset and outperform cited
work in zero-shot capability.

Statistical Learning for OCR Text Correction

Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

The accuracy of Optical Character Recognition (OCR) is crucial to the success
of subsequent applications used in text analyzing pipeline. Recent models of
OCR post-processing significantly improve the quality of OCR-generated text,
but are still prone to suggest correction candidates from limited observations
while insufficiently accounting for the characteristics of OCR errors. In this
paper, we show how to enlarge candidate suggestion space by using external
corpus and integrating OCR-specific features in a regression approach to
correct OCR-generated errors. The evaluation results show that our model can
correct 61.5% of the OCR-errors (considering the top 1 suggestion) and 71.5% of
the OCR-errors (considering the top 3 suggestions), for cases where the
theoretical correction upper-bound is 78%.

Dense Captioning with Joint Inference and Visual Context

Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dense captioning is a newly emerging computer vision topic for understanding
images with dense language descriptions. The goal is to densely detect visual
concepts (e.g., objects, object parts, and interactions between them) from
images, labeling each with a short descriptive phrase. We identify two key
challenges of dense captioning that need to be properly addressed when tackling
the problem. First, dense visual concept annotations in each image are
associated with highly overlapping target regions, making accurate localization
of each visual concept challenging. Second, the large amount of visual concepts
makes it hard to recognize each of them by appearance alone. We propose a new
model pipeline based on two novel ideas, joint inference and context fusion, to
alleviate these two challenges. We design our model architecture in a
methodical manner and thoroughly evaluate the variations in architecture. Our
final model, compact and efficient, achieves state-of-the-art accuracy on
Visual Genome for dense captioning with a relative gain of 73\% compared to the
previous best algorithm. Qualitative experiments also reveal the semantic
capabilities of our model in dense captioning.

Predicting 1p19q Chromosomal Deletion of Low-Grade Gliomas from MR Images using Deep Learning

Zeynettin Akkus, Issa Ali, Jiri Sedlar, Timothy L. Kline, Jay P. Agrawal, Ian F. Parney, Caterina Giannini, Bradley J. Erickson
Comments: This work has been presented in Conference on Machine Intelligence in Medical Imaging 2016 and RSNA 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Objective: Several studies have associated codeletion of chromosome arms
1p/19q in low-grade gliomas (LGG) with positive response to treatment and
longer progression free survival. Therefore, predicting 1p/19q status is
crucial for effective treatment planning of LGG. In this study, we predict the
1p/19q status from MR images using convolutional neural networks (CNN), which
could be a noninvasive alternative to surgical biopsy and histopathological
analysis. Method: Our method consists of three main steps: image registration,
tumor segmentation, and classification of 1p/19q status using CNN. We included
a total of 159 LGG with 3 image slices each who had biopsy-proven 1p/19q status
(57 nondeleted and 102 codeleted) and preoperative postcontrast-T1 (T1C) and T2
images. We divided our data into training, validation, and test sets. The
training data was balanced for equal class probability and then augmented with
iterations of random translational shift, rotation, and horizontal and vertical
flips to increase the size of the training set. We shuffled and augmented the
training data to counter overfitting in each epoch. Finally, we evaluated
several configurations of a multi-scale CNN architecture until training and
validation accuracies became consistent. Results: The results of the best
performing configuration on the unseen test set were 93.3% (sensitivity),
82.22% (specificity), and 87.7% (accuracy). Conclusion: Multi-scale CNN with
their self-learning capability provides promising results for predicting 1p/19q
status noninvasively based on T1C and T2 images. Significance: Predicting
1p/19q status noninvasively from MR images would allow selecting effective
treatment strategies for LGG patients without the need for surgical biopsy.

Multi-Scale Anisotropic Fourth-Order Diffusion Improves Ridge and Valley Localization

Shekoufeh Gorgi Zadeh, Stephan Didas, Maximilian W. M. Wintergerst, Thomas Schultz
Comments: 16 pages, 6 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ridge and valley enhancing filters are widely used in applications such as
vessel detection in medical image computing. When images are degraded by noise
or include vessels at different scales, such filters are an essential step for
meaningful and stable vessel localization. In this work, we propose a novel
multi-scale anisotropic fourth-order diffusion equation that allows us to
smooth along vessels, while sharpening them in the orthogonal direction. The
proposed filter uses a fourth order diffusion tensor whose eigentensors and
eigenvalues are determined from the local Hessian matrix, at a scale that is
automatically selected for each pixel. We discuss efficient implementation
using a Fast Explicit Diffusion scheme and demonstrate results on synthetic
images and vessels in fundus images. Compared to previous isotropic and
anisotropic fourth-order filters, as well as established second-order vessel
enhancing filters, our newly proposed one better restores the centerlines in
all cases.

The subset-matched Jaccard index for evaluation of Segmentation for Plant Images

Jonathan Bell, Hannah M. Dee
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We describe a new measure for the evaluation of region level segmentation of
objects, as applied to evaluating the accuracy of leaf-level segmentation of
plant images. The proposed approach enforces the rule that a region (e.g. a
leaf) in either the image being evaluated or the ground truth image evaluated
against can be mapped to no more than one region in the other image. We call
this measure the subset-matched Jaccard index.

SANet: Structure-Aware Network for Visual Tracking

Heng Fan, Haibin Ling
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural network (CNN) has drawn increasing interest in visual
tracking owing to its powerfulness in feature extraction. Most existing
CNN-based trackers treat tracking as a classification problem. However, these
trackers are sensitive to similar distractors because their CNN models mainly
focus on inter-class classification. To deal with this problem, we use
self-structure information of object to distinguish it from distractors.
Specifically, we utilize recurrent neural network (RNN) to model object
structure, and incorporate it into CNN to improve its robustness in presence of
similar distractors. Considering that convolutional layers in different levels
characterize the object from different perspectives, we use multiple RNNs to
model object structure in different levels respectively. In addition, we
present a skip concatenation strategy to fuse CNN and RNN feature maps, and
thus are able to provide the next layer with richer information, which further
improves the performance of the proposed model. Extensive experimental results
on three large-scale benchmarks, OTB100, TC-128 and VOT2015, show that the
proposed algorithm outperforms other state-of-the-art methods.

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
Comments: Accepted by AAAI2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents an end-to-end trainable fast scene text detector, named
TextBoxes, which detects scene text with both high accuracy and efficiency in a
single network forward pass, involving no post-process except for a standard
non-maximum suppression. TextBoxes outperforms competing methods in terms of
text localization accuracy and is much faster, taking only 0.09s per image in a
fast implementation. Furthermore, combined with a text recognizer, TextBoxes
significantly outperforms state-of-the-art approaches on word spotting and
end-to-end text recognition tasks.

Efficient Convolutional Neural Network with Binary Quantization Layer

Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Lucio Marcenaro, Carlo Regazzoni
Comments: Workshop on Efficient Methods for Deep Neural Networks (EMDNN), NIPS 2016, Barcelona, Spain. arXiv admin note: substantial text overlap with arXiv:1609.09220
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we introduce a novel method for segmentation that can benefit
from general semantics of Convolutional Neural Network (CNN). Our segmentation
proposes visually and semantically coherent image segments. We use binary
encoding of CNN features to overcome the difficulty of the clustering on the
high-dimensional CNN feature space. These binary encoding can be embedded into
the CNN as an extra layer at the end of the network. This results in real-time
segmentation. To the best of our knowledge our method is the first attempt on
general semantic image segmentation using CNN. All the previous papers were
limited to few number of category of the images (e.g. PASCAL VOC). Experiments
show that our segmentation algorithm outperform the state-of-the-art
non-semantic segmentation methods by a large margin.

Non-Local Color Image Denoising with Convolutional Neural Networks

Stamatios Lefkimmiatis
Comments: 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We propose a novel deep network architecture for grayscale and color image
denoising that is based on a non-local image model. Our motivation for the
overall design of the proposed network stems from variational methods that
exploit the inherent non-local self-similarity property of natural images. We
build on this concept and introduce deep networks that perform non-local
processing and at the same time they significantly benefit from discriminative
learning. Experiments on the Berkeley segmentation dataset, comparing several
state-of-the-art methods, show that the proposed non-local models achieve the
best reported denoising performance both for grayscale and color images for all
the tested noise levels. It is also worth noting that this increase in
performance comes at no extra cost on the capacity of the network compared to
existing alternative deep network architectures. In addition, we highlight a
direct link of the proposed non-local models to convolutional neural networks.
This connection is of significant importance since it allows our models to take
full advantage of the latest advances on GPU computing in deep learning and
makes them amenable to efficient implementations through their inherent
parallelism.

Crowd Counting by Adapting Convolutional Neural Networks with Side Information

Di Kang, Debarun Dhar, Antoni B. Chan
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computer vision tasks often have side information available that is helpful
to solve the task. For example, for crowd counting, the camera perspective
(e.g., camera angle and height) gives a clue about the appearance and scale of
people in the scene. While side information has been shown to be useful for
counting systems using traditional hand-crafted features, it has not been fully
utilized in counting systems based on deep learning. In order to incorporate
the available side information, we propose an adaptive convolutional neural
network (ACNN), where the convolutional filter weights adapt to the current
scene context via the side information. In particular, we model the filter
weights as a low-dimensional manifold, parametrized by the side information,
within the high-dimensional space of filter weights. With the help of side
information and adaptive weights, the ACNN can disentangle the variations
related to the side information, and extract discriminative features related to
the current context. Since existing crowd counting datasets do not contain
ground-truth side information, we collect a new dataset with the ground-truth
camera angle and height as the side information. On experiments in crowd
counting, the ACNN improves counting accuracy compared to a plain CNN with a
similar number of parameters. We also apply ACNN to image deconvolution to show
its potential effectiveness on other computer vision applications.

Training Sparse Neural Networks

Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep neural networks with lots of parameters are typically used for
large-scale computer vision tasks such as image classification. This is a
result of using dense matrix multiplications and convolutions. However, sparse
computations are known to be much more efficient. In this work, we train and
build neural networks which implicitly use sparse computations. We introduce
additional gate variables to perform parameter selection and show that this is
equivalent to using a spike-and-slab prior. We experimentally validate our
method on both small and large networks and achieve state-of-the-art
compression results for sparse neural network models.

Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition

Jiali Duan, Shuai Zhou, Jun Wan, Xiaoyuan Guo, Stan Z. Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, the popularity of depth-sensors such as Kinect has made depth
videos easily available while its advantages have not been fully exploited.
This paper investigates, for gesture recognition, to explore the spatial and
temporal information complementarily embedded in RGB and depth sequences. We
propose a convolutional twostream consensus voting network (2SCVN) which
explicitly models both the short-term and long-term structure of the RGB
sequences. To alleviate distractions from background, a 3d depth-saliency
ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion
characteristics. These two components in an unified framework significantly
improve the recognition accuracy. On the challenging Chalearn IsoGD benchmark,
our proposed method outperforms the first place on the leader-board by a large
margin (10.29%) while also achieving the best result on RGBD-HuDaAct dataset
(96.74%). Both quantitative experiments and qualitative analysis shows the
effectiveness of our proposed framework and codes will be released to
facilitate future research.

Covariate conscious approach for Gait recognition based upon Zernike moment invariants

Himanshu Aggarwal, Dinesh K. Vishwakarma
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Gait recognition i.e. identification of an individual from his/her walking
pattern is an emerging field. While existing gait recognition techniques
perform satisfactorily in normal walking conditions, there performance tend to
suffer drastically with variations in clothing and carrying conditions. In this
work, we propose a novel covariate cognizant framework to deal with the
presence of such covariates. We describe gait motion by forming a single 2D
spatio-temporal template from video sequence, called Average Energy Silhouette
image (AESI). Zernike moment invariants (ZMIs) are then computed to screen the
parts of AESI infected with covariates. Following this, features are extracted
from Spatial Distribution of Oriented Gradients (SDOGs) and novel Mean of
Directional Pixels (MDPs) methods. The obtained features are fused together to
form the final well-endowed feature set. Experimental evaluation of the
proposed framework on three publicly available datasets i.e. CASIA dataset B,
OU-ISIR Treadmill dataset B and USF Human-ID challenge dataset with recently
published gait recognition approaches, prove its superior performance.

Deep Temporal Linear Encoding Networks

Ali Diba, Vivek Sharma, Luc Van Gool
Comments: Ali Diba and Vivek Sharma contributed equally to this work and listed in alphabetical order
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The CNN-encoding of features from entire videos for the representation of
human actions has rarely been addressed. Instead, CNN work has focused on
approaches to fuse spatial and temporal networks, but these were typically
limited to processing shorter sequences. We present a new video representation,
called temporal linear encoding (TLE) and embedded inside of CNNs as a new
layer, which captures the appearance and motion throughout entire videos. It
encodes this aggregated information into a robust video feature representation,
via end-to-end learning. Advantages of TLEs are: (a) they encode the entire
video into a compact feature representation, learning the semantics and a
discriminative feature space; (b) they are applicable to all kinds of networks
like 2D and 3D CNNs for video classification; and (c) they model feature
interactions in a more expressive way and without loss of information. We
conduct experiments on two challenging human action datasets: HMDB51 and
UCF101. The experiments show that TLE outperforms current state-of-the-art
methods on both datasets.

Estimation of respiratory pattern from video using selective ensemble aggregation

A. P. Prathosh, Pragathi Praveena, Lalit K. Mestha, Sanjay Bharadwaj
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Non-contact estimation of respiratory pattern (RP) and respiration rate (RR)
has multiple applications. Existing methods for RP and RR measurement fall into
one of the three categories – (i) estimation through nasal air flow
measurement, (ii) estimation from video-based remote photoplethysmography, and
(iii) estimation by measurement of motion induced by respiration using motion
detectors. These methods, however, require specialized sensors, are
computationally expensive and/or critically depend on selection of a region of
interest (ROI) for processing. In this paper a general framework is described
for estimating a periodic signal driving noisy LTI channels connected in
parallel with unknown dynamics. The method is then applied to derive a
computationally inexpensive method for estimating RP using 2D cameras that does
not critically depend on ROI. Specifically, RP is estimated by imaging the
changes in the reflected light caused by respiration-induced motion. Each
spatial location in the field of view of the camera is modeled as a
noise-corrupted linear time-invariant (LTI) measurement channel with unknown
system dynamics, driven by a single generating respiratory signal. Estimation
of RP is cast as a blind deconvolution problem and is solved through a method
comprising subspace projection and statistical aggregation. Experiments are
carried out on 31 healthy human subjects by generating multiple RPs and
comparing the proposed estimates with simultaneously acquired ground truth from
an impedance pneumograph device. The proposed estimator agrees well with the
ground truth device in terms of correlation measures, despite variability in
clothing pattern, angle of view and ROI.

Gland Instance Segmentation Using Deep Multichannel Neural Networks

Yan Xu, Yang Li, Yipei Wang, Mingyuan Liu, Yubo Fan, Maode Lai, Eric I-Chao Chang
Comments: arXiv admin note: substantial text overlap with arXiv:1607.04889
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a new image instance segmentation method that segments individ-
ual glands (instances) in colon histology images. This process is challenging
since the glands not only need to be segmented from a complex background, they
must also be individually identified. We leverage the idea of image-to-image
prediction in recent deep learning by designing an algorithm that automatically
exploits and fuses complex multichannel information – regional, location and
boundary cues – in gland histology images. Our proposed algorithm, a deep
multichannel framework, alleviates heavy feature design due to the use of con-
volutional neural networks and is able to meet multifarious requirements by
altering channels. Compared to methods reported in the 2015 MICCAI Gland
Segmentation Challenge and other currently prevalent instance segmentation
methods, we observe state-of-the-art results based on the evaluation metrics.
Keywords: Instance segmentation, convolutional neural networks, segmentation,
multichannel, histology image.

ResFeats: Residual Network Based Features for Image Classification

Ammar Mahmood, Mohammed Bennamoun, Senjian An, Ferdous Sohel
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep residual networks have recently emerged as the state-of-the-art
architecture in image segmentation and object detection. In this paper, we
propose new image features (called ResFeats) extracted from the last
convolutional layer of deep residual networks pre-trained on ImageNet. We
propose to use ResFeats for diverse image classification tasks namely, object
classification, scene classification and coral classification and show that
ResFeats consistently perform better than their CNN counterparts on these
classification tasks. Since the ResFeats are large feature vectors, we propose
to use PCA for dimensionality reduction. Experimental results are provided to
show the effectiveness of ResFeats with state-of-the-art classification
accuracies on Caltech-101, Caltech-256 and MLC datasets and a significant
performance improvement on MIT-67 dataset compared to the widely used CNN
features.

Self-Supervised Video Representation Learning With Odd-One-Out Networks

Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a new self-supervised CNN pre-training technique based on a novel
auxiliary task called “odd-one-out learning”. In this task, the machine is
asked to identify the unrelated or odd element from a set of otherwise related
elements. We apply this technique to self-supervised video representation
learning where we sample subsequences from videos and ask the network to learn
to predict the odd video subsequence. The odd video subsequence is sampled such
that it has wrong temporal order of frames while the even ones have the correct
temporal order. Therefore, to generate a odd-one-out question no manual
annotation is required. Our learning machine is implemented as multi-stream
convolutional neural network, which is learned end-to-end. Using odd-one-out
networks, we learn temporal representations for videos that generalizes to
other related tasks such as action recognition.

On action classification, our method obtains 60.3\% on the UCF101 dataset
using only UCF101 data for training which is approximately 10% better than
current state-of-the-art self-supervised learning methods. Similarly, on HMDB51
dataset we outperform self-supervised state-of-the art methods by 12.7% on
action classification task.

Cascaded Face Alignment via Intimacy Definition Feature

Hailiang Li, Kin-Man Lam, Edmond M. Y. Chiu, Kangheng Wu, Zhibin Lei
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a fast cascaded regression for face alignment, via
a novel local feature. Our proposed local lightweight feature, namely intimacy
definition feature (IDF), is more discriminative than landmark shape-indexed
feature, more efficient than the handcrafted scale-invariant feature transform
(SIFT) feature, and more compact than the local binary feature (LBF).
Experimental results show that our approach achieves state-of-the-art
performance, when tested on the most challenging benchmarks. Compared with an
LBF-based algorithm, our method is able to obtain about two times the speed-up
and more than 20% improvement, in terms of alignment error measurement, and
able to save an order of magnitude of memory requirement.

Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues

Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a framework for localization or grounding of phrases in
images using a large collection of linguistic and visual cues. We model the
appearance, size, and position of entity bounding boxes, adjectives that
contain attribute information, and spatial relationships between pairs of
entities connected by verbs or prepositions. We pay special attention to
relationships between people and clothing or body part mentions, as they are
useful for distinguishing individuals. We automatically learn weights for
combining these cues and at test time, perform joint inference over all phrases
in a caption. The resulting system produces a 4% improvement in accuracy over
the state of the art on phrase localization on the Flickr30k Entities dataset
and a 4-10% improvement for visual relationship detection on the Stanford VRD
dataset.

Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding

Jose Lezama, Qiang Qiu, Guillermo Sapiro
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Surveillance cameras today often capture NIR (near infrared) images in
low-light environments. However, most face datasets accessible for training and
verification are only collected in the VIS (visible light) spectrum. It remains
a challenging problem to match NIR to VIS face images due to the different
light spectrum. Recently, breakthroughs have been made for VIS face recognition
by applying deep learning on a huge amount of labeled VIS face samples. The
same deep learning approach cannot be simply applied to NIR face recognition
for two main reasons: First, much limited NIR face images are available for
training compared to the VIS spectrum. Second, face galleries to be matched are
mostly available only in the VIS spectrum. In this paper, we propose an
approach to extend the deep learning breakthrough for VIS face recognition to
the NIR spectrum, without retraining the underlying deep models that see only
VIS faces. Our approach consists of two core components, cross-spectral
hallucination and low-rank embedding, to optimize respectively input and output
of a VIS deep model for cross-spectral face recognition. Cross-spectral
hallucination produces VIS faces from NIR images through a deep learning
approach. Low-rank embedding restores a low-rank structure for faces deep
features across both NIR and VIS spectrum. We observe that it is often equally
effective to perform hallucination to input NIR images or low-rank embedding to
output deep features for a VIS deep model for cross-spectral recognition. When
hallucination and low-rank embedding are deployed together, we observe
significant further improvement; we obtain state-of-the-art accuracy on the
CASIA NIR-VIS v2.0 benchmark, without the need at all to re-train the
recognition system.

RefineNet: Multi-Path Refinement Networks with Identity Mappings for High-Resolution Semantic Segmentation

Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, very deep convolutional neural networks (CNNs) have shown
outstanding performance in object recognition and have also been the first
choice for dense classification problems such as semantic segmentation.
However, repeated subsampling operations like pooling or convolution striding
in deep CNNs lead to a significant decrease in the initial image resolution.
Here, we present RefineNet, a generic multi-path refinement network that
explicitly exploits all the information available along the down-sampling
process to enable high-resolution prediction using long-range residual
connections. In this way, the deeper layers that capture high-level semantic
features can be directly refined using fine-grained features from earlier
convolutions. The individual components of RefineNet employ residual
connections following the identity mapping mindset, which allows for effective
end-to-end training. Further, we introduce chained residual pooling, which
captures rich background context in an efficient manner. We carry out
comprehensive experiments and set new state-of-the-art results on seven public
datasets. In particular, we achieve an intersection-over-union score of 83.4 on
the challenging PASCAL VOC 2012 dataset, which is the best reported result to
date.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Recent progress on image captioning has made it possible to generate novel
sentences describing images in natural language, but compressing an image into
a single sentence can describe visual content in only coarse detail. While one
new captioning approach, dense captioning, can potentially describe images in
finer levels of detail by captioning many regions within an image, it in turn
is unable to produce a coherent story for an image. In this paper we overcome
these limitations by generating entire paragraphs for describing images, which
can tell detailed, unified stories. We develop a model that decomposes both
images and paragraphs into their constituent parts, detecting semantic regions
in images and using a hierarchical recurrent neural network to reason about
language. Linguistic analysis confirms the complexity of the paragraph
generation task, and thorough experiments on a new dataset of image and
paragraph pairs demonstrate the effectiveness of our approach.

Object Recognition with and without Objects

Zhuotun Zhu, Lingxi Xie, Alan L. Yuille
Comments: 5 figures, 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While recent deep neural network models have given promising performance on
object recognition, they rely implicitly on the visual contents of the whole
image. In this paper, we train deep neural networks on the foreground (object)
and background (context) regions of images respectively. Considering human
recognition in the same situations, networks trained on pure background without
objects achieves highly reasonable recognition performance that beats humans to
a large margin if only given context. However, humans still outperform networks
with pure object available, which indicates networks and human beings have
different mechanisms in understanding an image. Furthermore, we
straightforwardly combine multiple trained networks to explore the different
visual clues learned by different networks. Experiments show that useful visual
hints can be learned separately and then combined to achieve higher
performance, which confirms the advantages of the proposed framework.

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) have become a de facto standard
for image classification and segmentation problems. These networks have also
had early success in the video domain, despite failing to capture motion
continuity and other rich temporal correlations. Evidence has since emerged
that extending ConvNets to 3-dimensions leads to state-of-the-art performance
across a broad set of video processing tasks by learning these joint
spatiotemporal features. However, these early 3D networks have been restricted
to shallower architectures of fewer channels than successful 2D networks due to
memory constraints inherent to GPU implementations.

In this study we present the first practical CPU implementation of tensor
convolution optimized for deep networks of small kernels. Our implementation
supports arbitrarily deep ConvNets of (N)-dimensional tensors due to the
relaxed memory constraints of CPU systems, which can be further leveraged for
an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
kernels). Because most of the optimized ConvNets in previous literature are 2
rather than 3-dimensional, we benchmark our performance against the most
popular 2D implementations. Even in this special case, which is theoretically
the least beneficial for our fast algorithm, we observe a 5 to 25-fold
improvement in throughput compared to previous state-of-the-art. We believe
this work is an important step toward practical ConvNets for real-time
applications, such as mobile video processing and biomedical image analysis,
where high performance 3D networks are a must.

Learning Fully Convolutional Networks for Iterative Non-blind Deconvolution

Jiawei Zhang, Jinshan Pan, Wei-Sheng Lai, Rynson Lau, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a fully convolutional networks for iterative
non-blind deconvolution We decompose the non-blind deconvolution problem into
image denoising and image deconvolution. We train a FCNN to remove noises in
the gradient domain and use the learned gradients to guide the image
deconvolution step. In contrast to the existing deep neural network based
methods, we iteratively deconvolve the blurred images in a multi-stage
framework. The proposed method is able to learn an adaptive image prior, which
keeps both local (details) and global (structures) information. Both
quantitative and qualitative evaluations on benchmark datasets demonstrate that
the proposed method performs favorably against state-of-the-art algorithms in
terms of quality and speed.

Recurrent Memory Addressing for describing videos

Kumar Krishna Agrawal, Arnav Kumar Jain, Abhinav Agarwalla, Pabitra Mitra
Comments: Under review at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Deep Neural Network architectures with external memory components allow the
model to perform inference and capture long term dependencies, by storing
information explicitly. In this paper, we generalize Key-Value Memory Networks
to a multimodal setting, introducing a novel key-addressing mechanism to deal
with sequence-to-sequence models. The advantages of the framework are
demonstrated on the task of video captioning, i.e generating natural language
descriptions for videos. Conditioning on the previous time-step attention
distributions for the key-value memory slots, we introduce a temporal structure
in the memory addressing schema. The proposed model naturally decomposes the
problem of video captioning into vision and language segments, dealing with
them as key-value pairs. More specifically, we learn a semantic embedding (v)
corresponding to each frame (k) in the video, thereby creating (k, v) memory
slots. This allows us to exploit the temporal dependencies at multiple
hierarchies (in the recurrent key-addressing; and in the language decoder).
Exploiting this flexibility of the framework, we additionally capture spatial
dependencies while mapping from the visual to semantic embedding. Extensive
experiments on the Youtube2Text dataset demonstrate usefulness of recurrent
key-addressing, while achieving competitive scores on BLEU@4, METEOR metrics
against state-of-the-art models.

Nazr-CNN: Object Detection and Fine-Grained Classification in Crowdsourced UAV Images

N. Attari, F. Ofli, M. Awad, J. Lucas, S. Chawla
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose Nazr-CNN, a deep learning pipeline for object detection and
fine-grained classification in images acquired from Unmanned Aerial Vehicles
(UAVs). The UAVs were deployed in the Island of Vanuatu to assess damage in the
aftermath of cyclone PAM in 2015. The images were labeled by a crowdsourcing
effort and the labeling categories consisted of fine-grained levels of damage
to built structures.

Nazr-CNN consists of two components. The function of the first component is
to localize objects (e.g. houses) in an image by carrying out a pixel-level
classification. In the second component, a hidden layer of a Convolutional
Neural Network (CNN) is used to encode Fisher Vectors (FV) of the segments
generated from the first component in order to help discriminate between
between different levels of damage. Since our data set is relatively small, a
pre-trained network for pixel-level classification and FV encoding was used.
Nazr-CNN attains promising results both for object detection and damage
assessment suggesting that the integrated pipeline is robust in the face of
small data sets and labeling errors by annotators. While the focus of Nazr-CNN
is on assessment of UAV images in a post-disaster scenario, our solution is
general and can be applied in many diverse settings.

LCNN: Lookup-based Convolutional Neural Network

Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Porting state of the art deep learning algorithms to resource constrained
compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose
a fast, compact, and accurate model for convolutional neural networks that
enables efficient learning and inference. We introduce LCNN, a lookup-based
convolutional neural network that encodes convolutions by few lookups to a
dictionary that is trained to cover the space of weights in CNNs. Training LCNN
involves jointly learning a dictionary and a small set of linear combinations.
The size of the dictionary naturally traces a spectrum of trade-offs between
efficiency and accuracy. Our experimental results on ImageNet challenge show
that LCNN can offer 3.2x speedup while achieving 55.1% top-1 accuracy using
AlexNet architecture. Our fastest LCNN offers 37.6x speed up over AlexNet while
maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speed ups at
inference, but it also enables efficient training. In this paper, we show the
benefits of LCNN in few-shot learning and few-iteration learning, two crucial
aspects of on-device training of deep learning models.

On The Stability of Video Detection and Tracking

Hong Zhang, Naiyan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we study an important yet less explored aspect in video
detection and multi-object tracking — stability. Surprisingly, there is no
prior work that tried to quantify it. As a consequence, we start our work by
proposing a novel evaluation metric for video detection which considers both
stability and accuracy. For accuracy, we extend the existing accuracy metric
mean Average Precision (mAP). For stability, we decompose it into three terms:
fragment error, center position error, scale and ratio error. Each error
represents one type of stability. Furthermore, we demonstrate that the
stability metric has low correlation with accuracy metric. Thus, it indeed
captures a different perspective of quality in object detection. Lastly, based
on this metric, we evaluate several existing methods for video detection, and
show how they affect accuracy and stability. We believe our work can provide
guidance and solid baselines for future researches in related areas.

Fast Video Classification via Adaptive Cascading of Deep Models

Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent advances have enabled “oracle” classifiers that can classify across
many classes and input distributions with high accuracy without retraining.
However, these classifiers are relatively heavyweight, so that applying them to
classify video is costly. We show that day-to-day video exhibits highly skewed
class distributions over the short term, and that these distributions can be
classified by much simpler models. We formulate the problem of detecting the
short-term skews online and exploiting models based on it as a new sequential
decision making problem dubbed the Online Bandit Problem, and present a new
algorithm to solve it. When applied to recognizing faces in TV shows and
movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
GPU/CPU) relative to a state-of-the-art convolutional neural network, at
competitive accuracy.

PsyPhy: A Psychophysics Driven Evaluation Framework for Visual Recognition

Brandon RichardWebster, Samuel E. Anthony, Walter J. Scheirer
Comments: 11 pages, 4 figures. Submitted for publication. For supplemental material see this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

By providing substantial amounts of data and standardized evaluation
protocols, datasets in computer vision have helped fuel advances across all
areas of visual recognition. But even in light of breakthrough results on
recent benchmarks, it is still fair to ask if our recognition algorithms are
doing as well as we think they are. The vision sciences at large make use of a
very different evaluation regime known as Visual Psychophysics to study visual
perception. Psychophysics is the quantitative examination of the relationships
between controlled stimuli and the behavioral responses they elicit in
experimental test subjects. Instead of using summary statistics to gauge
performance, psychophysics directs us to construct item-response curves made up
of individual stimulus responses to find perceptual thresholds, thus allowing
one to identify the exact point at which a subject can no longer reliably
recognize the stimulus class. In this paper, we introduce a comprehensive
evaluation framework for visual recognition models that is underpinned by this
methodology. Over millions of procedurally rendered 3D scenes and 2D images, we
compare the performance of well-known convolutional neural networks. Our
results bring into question recent claims of human-like performance, and
provide a path forward for correcting newly surfaced algorithmic deficiencies.

Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

Emily Denton, Sam Gross, Rob Fergus
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a simple semi-supervised learning approach for images based on
in-painting using an adversarial loss. Images with random patches removed are
presented to a generator whose task is to fill in the hole, based on the
surrounding pixels. The in-painted images are then presented to a discriminator
network that judges if they are real (unaltered training images) or not. This
task acts as a regularizer for standard supervised training of the
discriminator. Using our approach we are able to directly train large VGG-style
networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL
datasets, where our approach obtains performance comparable or superior to
existing methods.

Deep Outdoor Illumination Estimation

Yannick Hold-Geoffroy, Kalyan Sunkavalli, Sunil Hadap, Emiliano Gambaretto, Jean-François Lalonde
Comments: 8 pages + 2 pages of citations, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a CNN-based technique to estimate high-dynamic range outdoor
illumination from a single low dynamic range image. To train the CNN, we
leverage a large dataset of outdoor panoramas. We fit a low-dimensional
physically-based outdoor illumination model to the skies in these panoramas
giving us a compact set of parameters (including sun position, atmospheric
conditions, and camera parameters). We extract limited field-of-view images
from the panoramas, and train a CNN with this large set of input image–output
lighting parameter pairs. Given a test image, this network can be used to infer
illumination parameters that can, in turn, be used to reconstruct an outdoor
illumination environment map. We demonstrate that our approach allows the
recovery of plausible illumination conditions and enables automatic
photorealistic virtual object insertion from a single image. An extensive
evaluation on both the panorama dataset and captured HDR environment maps shows
that our technique significantly outperforms previous solutions to this
problem.

Semantic tracking: Single-target tracking with inter-supervised convolutional networks

Jingjing Xiao, Qiang Lan, Linbo Qiao, Ales Leonardis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This article presents a semantic tracker which simultaneously tracks a single
target and recognises its category. In general, it is hard to design a tracking
model suitable for all object categories, e.g., a rigid tracker for a car is
not suitable for a deformable gymnast. Category-based trackers usually achieve
superior tracking performance for the objects of that specific category, but
have difficulties being generalised. Therefore, we propose a novel unified
robust tracking framework which explicitly encodes both generic features and
category-based features. The tracker consists of a shared convolutional network
(NetS), which feeds into two parallel networks, NetC for classification and
NetT for tracking. NetS is pre-trained on ImageNet to serve as a generic
feature extractor across the different object categories for NetC and NetT.
NetC utilises those features within fully connected layers to classify the
object category. NetT has multiple branches, corresponding to multiple
categories, to distinguish the tracked object from the background. Since each
branch in NetT is trained by the videos of a specific category or groups of
similar categories, NetT encodes category-based features for tracking. During
online tracking, NetC and NetT jointly determine the target regions with the
right category and foreground labels for target estimation. To improve the
robustness and precision, NetC and NetT inter-supervise each other and trigger
network adaptation when their outputs are ambiguous for the same image regions
(i.e., when the category label contradicts the foreground/background
classification). We have compared the performance of our tracker to other
state-of-the-art trackers on a large-scale tracking benchmark (100
sequences)—the obtained results demonstrate the effectiveness of our proposed
tracker as it outperformed other 38 state-of-the-art tracking algorithms.

Deep Residual Learning for Compressed Sensing CT Reconstruction via Persistent Homology Analysis

Yoseop Han, Jaejoon Yoo, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, compressed sensing (CS) computed tomography (CT) using sparse
projection views has been extensively investigated to reduce the potential risk
of radiation to patient. However, due to the insufficient number of projection
views, an analytic reconstruction approach results in severe streaking
artifacts and CS-based iterative approach is computationally very expensive. To
address this issue, here we propose a novel deep residual learning approach for
sparse view CT reconstruction. Specifically, based on a novel persistent
homology analysis showing that the manifold of streaking artifacts is
topologically simpler than original ones, a deep residual learning architecture
that estimates the streaking artifacts is developed. Once a streaking artifact
image is estimated, an artifact-free image can be obtained by subtracting the
streaking artifacts from the input image. Using extensive experiments with real
patient data set, we confirm that the proposed residual learning provides
significantly better image reconstruction performance with several orders of
magnitude faster computational speed.

Ordinal Constrained Binary Code Learning for Nearest Neighbor Search

Hong Liu, Rongrong Ji, Yongjian Wu, Feiyue Huang
Comments: Accepted to AAAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent years have witnessed extensive attention in binary code learning,
a.k.a. hashing, for nearest neighbor search problems. It has been seen that
high-dimensional data points can be quantized into binary codes to give an
efficient similarity approximation via Hamming distance. Among existing
schemes, ranking-based hashing is recent promising that targets at preserving
ordinal relations of ranking in the Hamming space to minimize retrieval loss.
However, the size of the ranking tuples, which shows the ordinal relations, is
quadratic or cubic to the size of training samples. By given a large-scale
training data set, it is very expensive to embed such ranking tuples in binary
code learning. Besides, it remains a dificulty to build ranking tuples
efficiently for most ranking-preserving hashing, which are deployed over an
ordinal graph-based setting. To handle these problems, we propose a novel
ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH),
which efficiently learns the optimal hashing functions with a graph-based
approximation to embed the ordinal relations. The core idea is to reduce the
size of ordinal graph with ordinal constraint projection, which preserves the
ordinal relations through a small data set (such as clusters or random
samples). In particular, to learn such hash functions effectively, we further
relax the discrete constraints and design a specific stochastic gradient decent
algorithm for optimization. Experimental results on three large-scale visual
search benchmark datasets, i.e. LabelMe, Tiny100K and GIST1M, show that the
proposed OCH method can achieve superior performance over the state-of-the-arts
approaches.

Invertible Conditional GANs for image editing

Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez
Comments: Accepted paper at NIPS 2016 Workshop on Adversarial Training
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Generative Adversarial Networks (GANs) have recently demonstrated to
successfully approximate complex data distributions. A relevant extension of
this model is conditional GANs (cGANs), where the introduction of external
information allows to determine specific representations of the generated
images. In this work, we evaluate encoders to inverse the mapping of a cGAN,
i.e., mapping a real image into a latent space and a conditional
representation. This allows, for example, to reconstruct and modify real images
of faces conditioning on arbitrary attributes. Additionally, we evaluate the
design of cGANs. The combination of an encoder with a cGAN, which we call
Invertible cGAN (IcGAN), enables to re-generate real images with deterministic
complex modifications.

Beyond Deep Residual Learning for Image Restoration: Persistent Homology-Guided Manifold Simplification

Woong Bae, Jaejoon Yoo, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The latest deep learning approaches perform better than the state-of-the-art
signal processing approaches in various image restoration tasks. However, if an
image contains many patterns and structures, the performance of these CNNs is
still inferior. To address this issue, here we propose a novel wavelet-domain
deep residual learning algorithm that outperforms the existing residual
learning. The main idea is originated from observation that the performance of
a learning algorithm can be improved if the input and/or label manifold can be
made topologically simpler. Using persistent homology analysis, we show that
the recent residual learning was benefited from such manifold simplification,
and wavelet transform provides another way to simplify the data manifold while
preserving the edge information. Our extensive experiments demonstrate that the
proposed wavelet-domain residual learning outperforms the existing state-of-the
art approaches.

Learning the Number of Neurons in Deep Networks

Jose M Alvarez, Mathieu Salzmann
Comments: NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Nowadays, the number of layers and of neurons in each layer of a deep network
are typically set manually. While very deep and wide networks have proven
effective in general, they come at a high memory and computation cost, thus
making them impractical for constrained platforms. These networks, however, are
known to have many redundant parameters, and could thus, in principle, be
replaced by more compact architectures. In this paper, we introduce an approach
to automatically determining the number of neurons in each layer of a deep
network during learning. To this end, we propose to make use of a group
sparsity regularizer on the parameters of the network, where each group is
defined to act on a single neuron. Starting from an overcomplete network, we
show that our approach can reduce the number of parameters by up to 80\% while
retaining or even improving the network accuracy.

Multi-Scale Saliency Detection using Dictionary Learning

Shubham Pachori, Shanmugananthan Raman
Comments: arXiv admin note: text overlap with arXiv:1502.01094 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Saliency detection has drawn a lot of attention of researchers in various
fields over the past several years. Saliency is the perceptual quality that
makes an object, person to draw the attention of humans at the very sight.
Salient object detection in an image has been used centrally in many
computational photography and computer vision applications like video
compression, object recognition and classification, object segmentation,
adaptive content delivery, motion detection, content aware resizing, camouflage
images and change blindness images to name a few. We propose a method to detect
saliency in the objects using multimodal dictionary learning which has been
recently used in classification and image fusion. The multimodal dictionary
that we are learning is task driven which gives improved performance over its
counterpart (the one which is not task specific).

Inferring Restaurant Styles by Mining Crowd Sourced Photos from User-Review Websites

Haofu Liao, Yucheng Li, Tianran Hu, Jiebo Luo
Comments: 10 pages, Accepted by IEEE BigData 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

When looking for a restaurant online, user uploaded photos often give people
an immediate and tangible impression about a restaurant. Due to their
informativeness, such user contributed photos are leveraged by restaurant
review websites to provide their users an intuitive and effective search
experience. In this paper, we present a novel approach to inferring restaurant
types or styles (ambiance, dish styles, suitability for different occasions)
from user uploaded photos on user-review websites. To that end, we first
collect a novel restaurant photo dataset associating the user contributed
photos with the restaurant styles from TripAdvior. We then propose a deep
multi-instance multi-label learning (MIML) framework to deal with the unique
problem setting of the restaurant style classification task. We employ a
two-step bootstrap strategy to train a multi-label convolutional neural network
(CNN). The multi-label CNN is then used to compute the confidence scores of
restaurant styles for all the images associated with a restaurant. The computed
confidence scores are further used to train a final binary classifier for each
restaurant style tag. Upon training, the styles of a restaurant can be profiled
by analyzing restaurant photos with the trained multi-label CNN and SVM models.
Experimental evaluation has demonstrated that our crowd sourcing-based approach
can effectively infer the restaurant style when there are a sufficient number
of user uploaded photos for a given restaurant.

A Bayesian approach to type-specific conic fitting

Matthew Collett
Comments: 27 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A perturbative approach is used to quantify the effect of noise in data
points on fitted parameters in a general homogeneous linear model, and the
results applied to the case of conic sections. There is an optimal choice of
normalisation that minimises bias, and iteration with the correct reweighting
significantly improves statistical reliability. By conditioning on an
appropriate prior, an unbiased type-specific fit can be obtained. Error
estimates for the conic coefficients may also be used to obtain both bias
corrections and confidence intervals for other curve parameters.

Understanding Anatomy Classification Through Visualization

Devinder Kumar, Vlado Menkovski
Comments: Accepted at 30th NIPS Machine learning for Health Workshop, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the main challenges for broad adoption of deep convolutional neural
network (DCNN) models is the lack of understanding of their decision process.
In many applications a simpler less capable model that can be easily understood
is favorable to a black-box model that has superior performance. In this paper,
we present an approach for designing DCNN models based on visualization of the
internal activations of the model. We visualize the model’s response using
fractional stride convolution technique and compare the results with known
imaging landmarks from the medical literature. We show that sufficiently deep
and capable models can be successfully trained to use the same medical
landmarks a human expert would use. The presented approach allows for
communicating the model decision process well, but also offers insight towards
detecting biases.

RhoanaNet Pipeline: Dense Automatic Neural Annotation

Seymour Knowles-Barley, Verena Kaynig, Thouis Ray Jones, Alyssa Wilson, Joshua Morgan, Dongil Lee, Daniel Berger, Narayanan Kasthuri, Jeff W. Lichtman, Hanspeter Pfister
Comments: 13 pages, 4 figures
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

Reconstructing a synaptic wiring diagram, or connectome, from electron
microscopy (EM) images of brain tissue currently requires many hours of manual
annotation or proofreading (Kasthuri and Lichtman, 2010; Lichtman and Sanes,
2008; Seung, 2009). The desire to reconstruct ever larger and more complex
networks has pushed the collection of ever larger EM datasets. A cubic
millimeter of raw imaging data would take up 1 PB of storage and present an
annotation project that would be impractical without relying heavily on
automatic segmentation methods. The RhoanaNet image processing pipeline was
developed to automatically segment large volumes of EM data and ease the burden
of manual proofreading and annotation. Based on (Kaynig et al., 2015), we
updated every stage of the software pipeline to provide better throughput
performance and higher quality segmentation results. We used state of the art
deep learning techniques to generate improved membrane probability maps, and
Gala (Nunez-Iglesias et al., 2014) was used to agglomerate 2D segments into 3D
objects.

We applied the RhoanaNet pipeline to four densely annotated EM datasets, two
from mouse cortex, one from cerebellum and one from mouse lateral geniculate
nucleus (LGN). All training and test data is made available for benchmark
comparisons. The best segmentation results obtained gave
(V^ ext{Info}_ ext{F-score}) scores of 0.9054 and 09182 for the cortex
datasets, 0.9438 for LGN, and 0.9150 for Cerebellum.

The RhoanaNet pipeline is open source software. All source code, training
data, test data, and annotations for all four benchmark datasets are available
at www.rhoana.org.

Generalized Dropout

Suraj Srinivas, R. Venkatesh Babu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep Neural Networks often require good regularizers to generalize well.
Dropout is one such regularizer that is widely used among Deep Learning
practitioners. Recent work has shown that Dropout can also be viewed as
performing Approximate Bayesian Inference over the network parameters. In this
work, we generalize this notion and introduce a rich family of regularizers
which we call Generalized Dropout. One set of methods in this family, called
Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
emerges as a special case of this method. Another member of this family selects
the width of neural network layers. Experiments show that these methods help in
improving generalization performance over Dropout.

Effective Deterministic Initialization for (k)-Means-Like Methods via Local Density Peaks Searching

Fengfu Li, Hong Qiao, Bo Zhang
Comments: 16 pages, 9 figures, journal paper
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The (k)-means clustering algorithm is popular but has the following main
drawbacks: 1) the number of clusters, (k), needs to be provided by the user in
advance, 2) it can easily reach local minima with randomly selected initial
centers, 3) it is sensitive to outliers, and 4) it can only deal with well
separated hyperspherical clusters. In this paper, we propose a Local Density
Peaks Searching (LDPS) initialization framework to address these issues. The
LDPS framework includes two basic components: one of them is the local density
that characterizes the density distribution of a data set, and the other is the
local distinctiveness index (LDI) which we introduce to characterize how
distinctive a data point is compared with its neighbors. Based on these two
components, we search for the local density peaks which are characterized with
high local densities and high LDIs to deal with 1) and 2). Moreover, we detect
outliers characterized with low local densities but high LDIs, and exclude them
out before clustering begins. Finally, we apply the LDPS initialization
framework to (k)-medoids, which is a variant of (k)-means and chooses data
samples as centers, with diverse similarity measures other than the Euclidean
distance to fix the last drawback of (k)-means. Combining the LDPS
initialization framework with (k)-means and (k)-medoids, we obtain two novel
clustering methods called LDPS-means and LDPS-medoids, respectively.
Experiments on synthetic data sets verify the effectiveness of the proposed
methods, especially when the ground truth of the cluster number (k) is large.
Further, experiments on several real world data sets, Handwritten Pendigits,
Coil-20, Coil-100 and Olivetti Face Database, illustrate that our methods give
a superior performance than the analogous approaches on both estimating (k) and
unsupervised object categorization.

Deep Learning for the Classification of Lung Nodules

He Yang, Hengyong Yu, Ge Wang
Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep learning, as a promising new area of machine learning, has attracted a
rapidly increasing attention in the field of medical imaging. Compared to the
conventional machine learning methods, deep learning requires no hand-tuned
feature extractor, and has shown a superior performance in many visual object
recognition applications. In this study, we develop a deep convolutional neural
network (CNN) and apply it to thoracic CT images for the classification of lung
nodules. We present the CNN architecture and classification accuracy for the
original images of lung nodules. In order to understand the features of lung
nodules, we further construct new datasets, based on the combination of
artificial geometric nodules and some transformations of the original images,
as well as a stochastic nodule shape model. It is found that simplistic
geometric nodules cannot capture the important features of lung nodules.

Temporal Generative Adversarial Nets

Masaki Saito, Eiichi Matsumoto
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose a generative model, the Temporal Generative
Adversarial Network (TGAN), which can learn a semantic representation of
unlabelled videos, and is capable of generating consistent videos. Unlike an
existing GAN that generates videos with a generator consisting of 3D
deconvolutional layers, our model exploits two types of generators: a temporal
generator and an image generator. The temporal generator consists of 1D
deconvolutional layers and outputs a set of latent variables, each of which
corresponds to a frame in the generated video, and the image generator
transforms them into a video with 2D deconvolutional layers. This
representation allows efficient training of the network parameters. Moreover,
it can handle a wider range of applications including the generation of a long
sequence, frame interpolation, and the use of pre-trained models. Experimental
results demonstrate the effectiveness of our method.

Artificial Intelligence

Memory Lens: How Much Memory Does an Agent Use?

Christoph Dann, Katja Hofmann, Sebastian Nowozin
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose a new method to study the internal memory used by reinforcement
learning policies. We estimate the amount of relevant past information by
estimating mutual information between behavior histories and the current action
of an agent. We perform this estimation in the passive setting, that is, we do
not intervene but merely observe the natural behavior of the agent. Moreover,
we provide a theoretical justification for our approach by showing that it
yields an implementation-independent lower bound on the minimal memory capacity
of any agent that implement the observed policy. We demonstrate our approach by
estimating the use of memory of DQN policies on concatenated Atari frames,
demonstrating sharply different use of memory across 49 games. The study of
memory as information that flows from the past to the current action opens
avenues to understand and improve successful reinforcement learning algorithms.

Generating machine-executable plans from end-user's natural-language instructions

Rui Liu, Xiaoli Zhang
Comments: 16 pages, 10 figures, article submitted to Robotics and Computer-Integrated Manufacturing, 2016 Aug
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

It is critical for advanced manufacturing machines to autonomously execute a
task by following an end-user’s natural language (NL) instructions. However, NL
instructions are usually ambiguous and abstract so that the machines may
misunderstand and incorrectly execute the task. To address this NL-based
human-machine communication problem and enable the machines to appropriately
execute tasks by following the end-user’s NL instructions, we developed a
Machine-Executable-Plan-Generation (exePlan) method. The exePlan method
conducts task-centered semantic analysis to extract task-related information
from ambiguous NL instructions. In addition, the method specifies machine
execution parameters to generate a machine-executable plan by interpreting
abstract NL instructions. To evaluate the exePlan method, an industrial robot
Baxter was instructed by NL to perform three types of industrial tasks {‘drill
a hole’, ‘clean a spot’, ‘install a screw’}. The experiment results proved that
the exePlan method was effective in generating machine-executable plans from
the end-user’s NL instructions. Such a method has the promise to endow a
machine with the ability of NL-instructed task execution.

Coherent Dialogue with Attention-based Language Models

Hongyuan Mei, Mohit Bansal, Matthew R. Walter
Comments: To appear at AAAI 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We model coherent conversation continuation via RNN-based dialogue models
equipped with a dynamic attention mechanism. Our attention-RNN language model
dynamically increases the scope of attention on the history as the conversation
continues, as opposed to standard attention (or alignment) models with a fixed
input scope in a sequence-to-sequence model. This allows each generated word to
be associated with the most relevant words in its corresponding conversation
history. We evaluate the model on two popular dialogue datasets, the
open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot
dataset, and achieve significant improvements over the state-of-the-art and
baselines on several metrics, including complementary diversity-based metrics,
human evaluation, and qualitative visualizations. We also show that a vanilla
RNN with dynamic attention outperforms more complex memory models (e.g., LSTM
and GRU) by allowing for flexible, long-distance memory. We promote further
coherence via topic modeling-based reranking.

Enforcing Relational Matching Dependencies with Datalog for Entity Resolution

Zeinab Bahmani, Leopoldo Bertossi
Comments: Conference submission
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Entity resolution (ER) is about identifying and merging records in a database
that represent the same real-world entity. Matching dependencies (MDs) have
been introduced and investigated as declarative rules that specify ER policies.
An ER process induced by MDs over a dirty instance leads to multiple clean
instances, in general. General “answer sets programs” have been proposed to
specify the MD-based cleaning task and its results. In this work, we extend MDs
to “relational MDs”, which capture more application semantics, and identify
classes of relational MDs for which the general ASP can be automatically
rewritten into a stratified Datalog program, with the single clean instance as
its standard model.

Learning From Graph Neighborhoods Using LSTMs

Rakshit Agrawal, Luca de Alfaro, Vassilis Polychronopoulos
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Many prediction problems can be phrased as inferences over local
neighborhoods of graphs. The graph represents the interaction between entities,
and the neighborhood of each entity contains information that allows the
inferences or predictions. We present an approach for applying machine learning
directly to such graph neighborhoods, yielding predicitons for graph nodes on
the basis of the structure of their local neighborhood and the features of the
nodes in it. Our approach allows predictions to be learned directly from
examples, bypassing the step of creating and tuning an inference model or
summarizing the neighborhoods via a fixed set of hand-crafted features. The
approach is based on a multi-level architecture built from Long Short-Term
Memory neural nets (LSTMs); the LSTMs learn how to summarize the neighborhood
from data. We demonstrate the effectiveness of the proposed technique on a
synthetic example and on real-world data related to crowdsourced grading,
Bitcoin transactions, and Wikipedia edit reversions.

Options Discovery with Budgeted Reinforcement Learning

Aurélia Léon, Ludovic Denoyer
Comments: Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We consider the problem of learning hierarchical policies for Reinforcement
Learning able to discover options, an option corresponding to a sub-policy over
a set of primitive actions. Different models have been proposed during the last
decade that usually rely on a predefined set of options. We specifically
address the problem of automatically discovering options in decision processes.
We describe a new RL learning framework called Bi-POMDP, and a new learning
model called Budgeted Option Neural Network (BONN) able to discover options
based on a budgeted learning objective. Since Bi-POMDP are more general than
POMDP, our model can also be used to discover options for classical RL tasks.
The BONN model is evaluated on different classical RL problems, demonstrating
both quantitative and qualitative interesting results.

Generalized Dropout

Suraj Srinivas, R. Venkatesh Babu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep Neural Networks often require good regularizers to generalize well.
Dropout is one such regularizer that is widely used among Deep Learning
practitioners. Recent work has shown that Dropout can also be viewed as
performing Approximate Bayesian Inference over the network parameters. In this
work, we generalize this notion and introduce a rich family of regularizers
which we call Generalized Dropout. One set of methods in this family, called
Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
emerges as a special case of this method. Another member of this family selects
the width of neural network layers. Experiments show that these methods help in
improving generalization performance over Dropout.

Non-Local Color Image Denoising with Convolutional Neural Networks

Stamatios Lefkimmiatis
Comments: 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

We propose a novel deep network architecture for grayscale and color image
denoising that is based on a non-local image model. Our motivation for the
overall design of the proposed network stems from variational methods that
exploit the inherent non-local self-similarity property of natural images. We
build on this concept and introduce deep networks that perform non-local
processing and at the same time they significantly benefit from discriminative
learning. Experiments on the Berkeley segmentation dataset, comparing several
state-of-the-art methods, show that the proposed non-local models achieve the
best reported denoising performance both for grayscale and color images for all
the tested noise levels. It is also worth noting that this increase in
performance comes at no extra cost on the capacity of the network compared to
existing alternative deep network architectures. In addition, we highlight a
direct link of the proposed non-local models to convolutional neural networks.
This connection is of significant importance since it allows our models to take
full advantage of the latest advances on GPU computing in deep learning and
makes them amenable to efficient implementations through their inherent
parallelism.

Fair Division via Social Comparison

Rediet Abebe, Jon Kleinberg, David Parkes
Comments: 18 pages, 3 figures
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Combinatorics (math.CO)

In the classical cake cutting problem, a resource must be divided among
agents with different utilities so that each agent believes they have received
a fair share of the resource relative to the other agents. We introduce a
variant of the problem in which we model an underlying social network on the
agents with a graph, and agents only evaluate their shares relative to their
neighbors’ in the network. This formulation captures many situations in which
it is unrealistic to assume a global view, and also exposes interesting
phenomena in the original problem.

Specifically, we say an allocation is locally envy-free if no agent envies a
neighbor’s allocation and locally proportional if each agent values her own
allocation as much as the average value of her neighbor’s allocations, with the
former implying the latter. While global envy-freeness implies local
envy-freeness, global proportionality does not imply local proportionality, or
vice versa. A general result is that for any two distinct graphs on the same
set of nodes and an allocation, there exists a set of valuation functions such
that the allocation is locally proportional on one but not the other.

We fully characterize the set of graphs for which an oblivious single-cutter
protocol– a protocol that uses a single agent to cut the cake into pieces
–admits a bounded protocol with (O(n^2)) query complexity for locally
envy-free allocations in the Robertson-Webb model. We also consider the price
of envy-freeness, which compares the total utility of an optimal allocation to
the best utility of an allocation that is envy-free. We show that a lower bound
of (Omega(sqrt{n})) on the price of envy-freeness for global allocations in
fact holds for local envy-freeness in any connected undirected graph. Thus,
sparse graphs surprisingly do not provide more flexibility with respect to the
quality of envy-free allocations.

A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

SamanehSorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Learning (cs.LG)

Credit card plays a very important rule in today’s economy. It becomes an
unavoidable part of household, business and global activities. Although using
credit cards provides enormous benefits when used carefully and
responsibly,significant credit and financial damages may be caused by
fraudulent activities. Many techniques have been proposed to confront the
growth in credit card fraud. However, all of these techniques have the same
goal of avoiding the credit card fraud; each one has its own drawbacks,
advantages and characteristics. In this paper, after investigating difficulties
of credit card fraud detection, we seek to review the state of the art in
credit card fraud detection techniques, data sets and evaluation criteria.The
advantages and disadvantages of fraud detection methods are enumerated and
compared.Furthermore, a classification of mentioned techniques into two main
fraud detection approaches, namely, misuses (supervised) and anomaly detection
(unsupervised) is presented. Again, a classification of techniques is proposed
based on capability to process the numerical and categorical data sets.
Different data sets used in literature are then described and grouped into real
and synthesized data and the effective and common attributes are extracted for
further usage.Moreover, evaluation employed criterions in literature are
collected and discussed.Consequently, open issues for credit card fraud
detection are explained as guidelines for new researchers.

Invertible Conditional GANs for image editing

Generative Adversarial Networks (GANs) have recently demonstrated to
successfully approximate complex data distributions. A relevant extension of
this model is conditional GANs (cGANs), where the introduction of external
information allows to determine specific representations of the generated
images. In this work, we evaluate encoders to inverse the mapping of a cGAN,
i.e., mapping a real image into a latent space and a conditional
representation. This allows, for example, to reconstruct and modify real images
of faces conditioning on arbitrary attributes. Additionally, we evaluate the
design of cGANs. The combination of an encoder with a cGAN, which we call
Invertible cGAN (IcGAN), enables to re-generate real images with deterministic
complex modifications.

Information Retrieval

Neural Information Retrieval: A Literature Review

Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Matthew Lease
Comments: 54 pages
Subjects: Information Retrieval (cs.IR)

A recent “third wave” of Neural Network (NN) approaches now delivers
state-of-the-art performance in many machine learning tasks, spanning speech
recognition, computer vision, and natural language processing. Because these
modern NNs often comprise multiple interconnected layers, this new NN research
is often referred to as deep learning. Stemming from this tide of NN work, a
number of researchers have recently begun to investigate NN approaches to
Information Retrieval (IR). While deep NNs have yet to achieve the same level
of success in IR as seen in other areas, the recent surge of interest and work
in NNs for IR suggest that this state of affairs may be quickly changing. In
this work, we survey the current landscape of Neural IR research, paying
special attention to the use of learned representations of queries and
documents (i.e., neural embeddings). We highlight the successes of neural IR
thus far, catalog obstacles to its wider adoption, and suggest potentially
promising directions for future research.

A Visual and Textual Recurrent Neural Network for Sequential Prediction

Qiang Cui, Shu Wu, Qiang Liu, Liang Wang
Subjects: Information Retrieval (cs.IR)

Sequential prediction is a fundamental task for Web applications. Due to the
insufficiency of user feedbacks, sequential prediction usually suffers from the
cold start problem. There are two kinds of popular approaches based on matrix
factorization (MF) and Markov chains (MC) for item prediction. MF methods
factorize the user-item matrix to learn general tastes of users. MC methods
predict the next behavior based on recent behaviors. However, they have
limitations. MF methods can merge additional information to address cold start
but could not capture dynamic properties of user’s interest, and MC based
sequential methods have difficulty in addressing cold start and has a strong
Markov assumption that the next state only depends on the last state. In this
work, to deal with the cold start problem of sequential prediction, we propose
a RNN model adopting visual and textual content of items, which is named as
(mathbf{V})isual and (mathbf{T})extual (mathbf{R})ecurrent (mathbf{N})eural
(mathbf{N})etwork ((mathbf{VT})-(mathbf{RNN})). We can simultaneously learn
the sequential latent vectors that dynamically capture the user’s interest, as
well as content-based representations that contribute to address the cold
start. Experiments on two real-world datasets show that our proposed VT-RNN
model can effectively generate the personalized ranking list and significantly
alleviate the cold start problem.

Rising Novelties on Evolving Networks: Recent Behavior Dominant and Non-Dominant Model

Khushnood Abbas
Comments: 19 pages, 5 figures
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

Novelty attracts attention like popularity. Hence predicting novelty is as
important as popularity. Novelty is the side effect of competition and aging in
evolving systems. Recent behavior or recent link gain in networks plays an
important role in emergence or trend. We exploited this wisdom and came up with
two models considering different scenarios and systems. Where recent behavior
dominates over total behavior (total link gain) in the first one, and recent
behavior is as important as total behavior for future link gain in second one.
It suppose that random walker walks on a network and can jump to any node, the
probablity of jumping or making connection to other node is based on which node
is recently more active or receiving more links. In our assumption random
walker can also jump to node which is already popular but recently not popular.
We are able to predict rising novelties or popular nodes which is generally
suppressed under preferential attachment effect. To show performance of our
model we have conducted experiments on four real data sets namely, MovieLens,
Netflix, Facebook and Arxiv High Energy Physics paper citation. For testing our
model we used four information retrieval indices namely Precision, Novelty,
Area Under Receiving Operating Characteristic(AUC) and Kendal’s rank
correlation coefficient. We have used four benchmark models for validating our
proposed models. Although our model doesn’t perform better in all the cases
but, it has theoretical significance in working better for recent behavior
dominant systems.

Ontology Driven Disease Incidence Detection on Twitter

Mark Abraham Magumba, Peter Nabende
Comments: 19 pages, 7 figures, 1 table
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this work we address the issue of generic automated disease incidence
monitoring on twitter. We employ an ontology of disease related concepts and
use it to obtain a conceptual representation of tweets. Unlike previous key
word based systems and topic modeling approaches, our ontological approach
allows us to apply more stringent criteria for determining which messages are
relevant such as spatial and temporal characteristics whilst giving a stronger
guarantee that the resulting models will perform well on new data that may be
lexically divergent. We achieve this by training learners on concepts rather
than individual words. For training we use a dataset containing mentions of
influenza and Listeria and use the learned models to classify datasets
containing mentions of an arbitrary selection of other diseases. We show that
our ontological approach achieves good performance on this task using a variety
of Natural Language Processing Techniques. We also show that word vectors can
be learned directly from our concepts to achieve even better results.

A Business Zone Recommender System Based on Facebook and Urban Planning Data

Jovian Lin, Richard J. Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus T. Kwee, Philips K. Prasetyo
Journal-ref: Proceedings of the European Conference on Information Retrieval,
2016, pp. 641-647
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

We present ZoneRec—a zone recommendation system for physical businesses in
an urban city, which uses both public business data from Facebook and urban
planning data. The system consists of machine learning algorithms that take in
a business’ metadata and outputs a list of recommended zones to establish the
business in. We evaluate our system using data of food businesses in Singapore
and assess the contribution of different feature groups to the recommendation
quality.

Spotting Rumors via Novelty Detection

Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

Rumour detection is hard because the most accurate systems operate
retrospectively, only recognizing rumours once they have collected repeated
signals. By then the rumours might have already spread and caused harm. We
introduce a new category of features based on novelty, tailored to detect
rumours early on. To compensate for the absence of repeated signals, we make
use of news wire as an additional data source. Unconfirmed (novel) information
with respect to the news articles is considered as an indication of rumours.
Additionally we introduce pseudo feedback, which assumes that documents that
are similar to previous rumours, are more likely to also be a rumour.
Comparison with other real-time approaches shows that novelty based features in
conjunction with pseudo feedback perform significantly better, when detecting
rumours instantly after their publication.

Computation and Language

Coherent Dialogue with Attention-based Language Models

Hongyuan Mei, Mohit Bansal, Matthew R. Walter
Comments: To appear at AAAI 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We model coherent conversation continuation via RNN-based dialogue models
equipped with a dynamic attention mechanism. Our attention-RNN language model
dynamically increases the scope of attention on the history as the conversation
continues, as opposed to standard attention (or alignment) models with a fixed
input scope in a sequence-to-sequence model. This allows each generated word to
be associated with the most relevant words in its corresponding conversation
history. We evaluate the model on two popular dialogue datasets, the
open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot
dataset, and achieve significant improvements over the state-of-the-art and
baselines on several metrics, including complementary diversity-based metrics,
human evaluation, and qualitative visualizations. We also show that a vanilla
RNN with dynamic attention outperforms more complex memory models (e.g., LSTM
and GRU) by allowing for flexible, long-distance memory. We promote further
coherence via topic modeling-based reranking.

Robust end-to-end deep audiovisual speech recognition

Ramon Sanabria, Florian Metze, Fernando De La Torre
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

Speech is one of the most effective ways of communication among humans. Even
though audio is the most common way of transmitting speech, very important
information can be found in other modalities, such as vision. Vision is
particularly useful when the acoustic signal is corrupted. Multi-modal speech
recognition however has not yet found wide-spread use, mostly because the
temporal alignment and fusion of the different information sources is
challenging.

This paper presents an end-to-end audiovisual speech recognizer (AVSR), based
on recurrent neural networks (RNN) with a connectionist temporal classification
(CTC) loss function. CTC creates sparse “peaky” output activations, and we
analyze the differences in the alignments of output targets (phonemes or
visemes) between audio-only, video-only, and audio-visual feature
representations. We present the first such experiments on the large vocabulary
IBM ViaVoice database, which outperform previously published approaches on
phone accuracy in clean and noisy conditions.

Bidirectional Tree-Structured LSTM with Head Lexicalization

Zhiyang Teng, Yue Zhang
Comments: 12 pages, 6 figures
Subjects: Computation and Language (cs.CL)

Sequential LSTM has been extended to model tree structures, giving
competitive results for a number of tasks. Existing methods model constituent
trees by bottom-up combinations of constituent nodes, making direct use of
input word information only for leaf nodes. This is different from sequential
LSTMs, which contain reference to input words for each node. In this paper, we
propose a method for automatic head-lexicalization for tree-structure LSTMs,
propagating head words from leaf nodes to every constituent node. In addition,
enabled by head lexicalization, we build a tree LSTM in the top-down direction,
which corresponds to bidirectional sequential LSTM structurally. Experiments
show that both extensions give better representations of tree structures. Our
final model gives the best results on the Standford Sentiment Treebank and
highly competitive results on the TREC question type classification task.

False-Friend Detection and Entity Matching via Unsupervised Transliteration

Yanqing Chen, Steven Skiena
Comments: 11 Pages, ACL style
Subjects: Computation and Language (cs.CL)

Transliterations play an important role in multilingual entity reference
resolution, because proper names increasingly travel between languages in news
and social media. Previous work associated with machine translation targets
transliteration only single between language pairs, focuses on specific classes
of entities (such as cities and celebrities) and relies on manual curation,
which limits the expression power of transliteration in multilingual
environment.

By contrast, we present an unsupervised transliteration model covering 69
major languages that can generate good transliterations for arbitrary strings
between any language pair. Our model yields top-(1, 20, 100) averages of
(32.85%, 60.44%, 83.20%) in matching gold standard transliteration compared to
results from a recently-published system of (26.71%, 50.27%, 72.79%). We also
show the quality of our model in detecting true and false friends from
Wikipedia high frequency lexicons. Our method indicates a strong signal of
pronunciation similarity and boosts the probability of finding true friends in
68 out of 69 languages.

Ontology Driven Disease Incidence Detection on Twitter

Mark Abraham Magumba, Peter Nabende
Comments: 19 pages, 7 figures, 1 table
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this work we address the issue of generic automated disease incidence
monitoring on twitter. We employ an ontology of disease related concepts and
use it to obtain a conceptual representation of tweets. Unlike previous key
word based systems and topic modeling approaches, our ontological approach
allows us to apply more stringent criteria for determining which messages are
relevant such as spatial and temporal characteristics whilst giving a stronger
guarantee that the resulting models will perform well on new data that may be
lexically divergent. We achieve this by training learners on concepts rather
than individual words. For training we use a dataset containing mentions of
influenza and Listeria and use the learned models to classify datasets
containing mentions of an arbitrary selection of other diseases. We show that
our ontological approach achieves good performance on this task using a variety
of Natural Language Processing Techniques. We also show that word vectors can
be learned directly from our concepts to achieve even better results.

Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu
Comments: 11 pages
Subjects: Computation and Language (cs.CL)

Recurrent Neural Network (RNN) is one of the most popular architectures used
in Natural Language Processsing (NLP) tasks because its recurrent structure is
very suitable to process variable-length text. RNN can utilize distributed
representations of words by first converting the tokens comprising each text
into vectors, which form a matrix. And this matrix includes two dimensions: the
time-step dimension and the feature vector dimension. Then most existing models
usually utilize one-dimensional (1D) max pooling operation or attention-based
operation only on the time-step dimension to obtain a fixed-length vector.
However, the features on the feature vector dimension are not mutually
independent, and simply applying 1D pooling operation over the time-step
dimension independently may destroy the structure of the feature
representation. On the other hand, applying two-dimensional (2D) pooling
operation over the two dimensions may sample more meaningful features for
sequence modeling tasks. To integrate the features on both dimensions of the
matrix, this paper explores applying 2D max pooling operation to obtain a
fixed-length representation of the text. This paper also utilizes 2D
convolution to sample more meaningful information of the matrix. Experiments
are conducted on six text classification tasks, including sentiment analysis,
question classification, subjectivity classification and newsgroup
classification. Compared with the state-of-the-art models, the proposed models
achieve excellent performance on 4 out of 6 tasks. Specifically, one of the
proposed models achieves highest accuracy on Stanford Sentiment Treebank binary
classification and fine-grained classification tasks.

Visualizing Linguistic Shift

Salman Mahmood, Rami Al-Rfou, Klaus Mueller
Subjects: Computation and Language (cs.CL)

Neural network based models are a very powerful tool for creating word
embeddings, the objective of these models is to group similar words together.
These embeddings have been used as features to improve results in various
applications such as document classification, named entity recognition, etc.
Neural language models are able to learn word representations which have been
used to capture semantic shifts across time and geography. The objective of
this paper is to first identify and then visualize how words change meaning in
different text corpus. We will train a neural language model on texts from a
diverse set of disciplines philosophy, religion, fiction etc. Each text will
alter the embeddings of the words to represent the meaning of the word inside
that text. We will present a computational technique to detect words that
exhibit significant linguistic shift in meaning and usage. We then use enhanced
scatterplots and storyline visualization to visualize the linguistic shift.

Incorporating Pass-Phrase Dependent Background Models for Text Dependent Speaker Verification

A. K. Sarkar, Zheng-Hua Tan
Subjects: Computation and Language (cs.CL)

In this paper, we propose a pass-phrase dependent background model (PBM) for
text dependent (TD) speaker verification (SV) to integrate pass-phrase
identification process (without an additional separate identification system)
in the conventional TD-SV system, where a PBM is derived from a
text-independent background model through adaptation using the utterances of a
particular pass-phrase. During training, pass-phrase specific target speaker
models are derived from the particular PBM using the training data for the
respective target model. While testing, the best PBM is first selected for the
test utterance in the maximum likelihood (ML) sense and following the selected
PBM is used for the log likelihood ratio (LLR) calculation with respect to the
claimant model. The proposed method incorporates the pass-phrase identification
step in the LLR calculation, which is not considered in conventional standalone
TD-SV based systems. The performance of the proposed method is compared to
conventional text-independent background model based TD-SV systems using a
Gaussian mixture model (GMM)-universal background model (UBM), Hidden Markov
model (HMM)-UBM and i-vector paradigms. In addition, we consider two approaches
to build PBMs: one is speaker independent and the other is speaker dependent.
We show that the proposed method significantly reduces the error rate of text
dependent speaker verification for the non-target types: target-wrong and
imposter-wrong while it maintains comparable TD-SV performance when imposters
speak a correct utterance with respect to the conventional system. Experiments
are conducted on the RedDots challenge and the RSR2015 databases which consist
of short utterances.

Tracking Words in Chinese Poetry of Tang and Song Dynasties with the China Biographical Database

Chao-Lin Liu, Kuo-Feng Luo
Comments: 9 pages, 3 figures, Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), 26th International Conference on Computational Linguistics (COLING)
Subjects: Computation and Language (cs.CL)

Large-scale comparisons between the poetry of Tang and Song dynasties shed
light on how words, collocations, and expressions were used and shared among
the poets. That some words were used only in the Tang poetry and some only in
the Song poetry could lead to interesting research in linguistics. That the
most frequent colors are different in the Tang and Song poetry provides a trace
of the changing social circumstances in the dynasties. Results of the current
work link to research topics of lexicography, semantics, and social
transitions. We discuss our findings and present our algorithms for efficient
comparisons among the poems, which are crucial for completing billion times of
comparisons within acceptable time.

Unsupervised Learning for Lexicon-Based Classification

Jacob Eisenstein
Comments: to appear in AAAI 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

In lexicon-based classification, documents are assigned labels by comparing
the number of words that appear from two opposed lexicons, such as positive and
negative sentiment. Creating such words lists is often easier than labeling
instances, and they can be debugged by non-experts if classification
performance is unsatisfactory. However, there is little analysis or
justification of this classification heuristic. This paper describes a set of
assumptions that can be used to derive a probabilistic justification for
lexicon-based classification, as well as an analysis of its expected accuracy.
One key assumption behind lexicon-based classification is that all words in
each lexicon are equally predictive. This is rarely true in practice, which is
why lexicon-based approaches are usually outperformed by supervised classifiers
that learn distinct weights on each word from labeled instances. This paper
shows that it is possible to learn such weights without labeled data, by
leveraging co-occurrence statistics across the lexicons. This offers the best
of both worlds: light supervision in the form of lexicons, and data-driven
classification with higher accuracy than traditional word-counting heuristics.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Recent progress on image captioning has made it possible to generate novel
sentences describing images in natural language, but compressing an image into
a single sentence can describe visual content in only coarse detail. While one
new captioning approach, dense captioning, can potentially describe images in
finer levels of detail by captioning many regions within an image, it in turn
is unable to produce a coherent story for an image. In this paper we overcome
these limitations by generating entire paragraphs for describing images, which
can tell detailed, unified stories. We develop a model that decomposes both
images and paragraphs into their constituent parts, detecting semantic regions
in images and using a hierarchical recurrent neural network to reason about
language. Linguistic analysis confirms the complexity of the paragraph
generation task, and thorough experiments on a new dataset of image and
paragraph pairs demonstrate the effectiveness of our approach.

Recurrent Memory Addressing for describing videos

Deep Neural Network architectures with external memory components allow the
model to perform inference and capture long term dependencies, by storing
information explicitly. In this paper, we generalize Key-Value Memory Networks
to a multimodal setting, introducing a novel key-addressing mechanism to deal
with sequence-to-sequence models. The advantages of the framework are
demonstrated on the task of video captioning, i.e generating natural language
descriptions for videos. Conditioning on the previous time-step attention
distributions for the key-value memory slots, we introduce a temporal structure
in the memory addressing schema. The proposed model naturally decomposes the
problem of video captioning into vision and language segments, dealing with
them as key-value pairs. More specifically, we learn a semantic embedding (v)
corresponding to each frame (k) in the video, thereby creating (k, v) memory
slots. This allows us to exploit the temporal dependencies at multiple
hierarchies (in the recurrent key-addressing; and in the language decoder).
Exploiting this flexibility of the framework, we additionally capture spatial
dependencies while mapping from the visual to semantic embedding. Extensive
experiments on the Youtube2Text dataset demonstrate usefulness of recurrent
key-addressing, while achieving competitive scores on BLEU@4, METEOR metrics
against state-of-the-art models.

Generating machine-executable plans from end-user's natural-language instructions

It is critical for advanced manufacturing machines to autonomously execute a
task by following an end-user’s natural language (NL) instructions. However, NL
instructions are usually ambiguous and abstract so that the machines may
misunderstand and incorrectly execute the task. To address this NL-based
human-machine communication problem and enable the machines to appropriately
execute tasks by following the end-user’s NL instructions, we developed a
Machine-Executable-Plan-Generation (exePlan) method. The exePlan method
conducts task-centered semantic analysis to extract task-related information
from ambiguous NL instructions. In addition, the method specifies machine
execution parameters to generate a machine-executable plan by interpreting
abstract NL instructions. To evaluate the exePlan method, an industrial robot
Baxter was instructed by NL to perform three types of industrial tasks {‘drill
a hole’, ‘clean a spot’, ‘install a screw’}. The experiment results proved that
the exePlan method was effective in generating machine-executable plans from
the end-user’s NL instructions. Such a method has the promise to endow a
machine with the ability of NL-instructed task execution.

Gendered Conversation in a Social Game-Streaming Platform

Supun Nakandala, Giovanni Luca Ciampaglia, Norman Makoto Su, Yong-Yeol Ahn
Comments: 10 pages, 7 figures, 5 tables
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY)

Online social media and games are increasingly replacing offline social
activities. Social media is now an indispensable mode of communication; online
gaming is not only a genuine social activity but also a popular spectator
sport. With support for anonymity and larger audiences, online interaction
shrinks social and geographical barriers. Despite such benefits, social
disparities such as gender inequality persist in online social media. In
particular, online gaming communities have been criticized for persistent
gender disparities and objectification. As gaming evolves into a social
platform, persistence of gender disparity is a pressing question. Yet, there
are few large-scale, systematic studies of gender inequality and
objectification in social gaming platforms. Here we analyze more than one
billion chat messages from Twitch, a social game-streaming platform, to study
how the gender of streamers is associated with the nature of conversation.
Using a combination of computational text analysis methods, we show that
gendered conversation and objectification is prevalent in chats. Female
streamers receive significantly more objectifying comments while male streamers
receive more game-related comments. This difference is more pronounced for
popular streamers. There also exists a large number of users who post only on
female or male streams. Employing a neural vector-space embedding (paragraph
vector) method, we analyze gendered chat messages and create prediction models
that (i) identify the gender of streamers based on messages posted in the
channel and (ii) identify the gender a viewer prefers to watch based on their
chat messages. Our findings suggest that disparities in social game-streaming
platforms is a nuanced phenomenon that involves the gender of streamers as well
as those who produce gendered and game-related conversation.

Spotting Rumors via Novelty Detection

Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

Rumour detection is hard because the most accurate systems operate
retrospectively, only recognizing rumours once they have collected repeated
signals. By then the rumours might have already spread and caused harm. We
introduce a new category of features based on novelty, tailored to detect
rumours early on. To compensate for the absence of repeated signals, we make
use of news wire as an additional data source. Unconfirmed (novel) information
with respect to the news articles is considered as an indication of rumours.
Additionally we introduce pseudo feedback, which assumes that documents that
are similar to previous rumours, are more likely to also be a rumour.
Comparison with other real-time approaches shows that novelty based features in
conjunction with pseudo feedback perform significantly better, when detecting
rumours instantly after their publication.

Distributed, Parallel, and Cluster Computing

Population Protocols with Faulty Interactions: the Impact of a Leader

Giuseppe Antonio Di Luna, Paola Flocchini, Taisuke Izumi, Tomoko Izumi, Nicola Santoro, Giovanni Viglietta
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We consider the problem of simulating traditional population protocols under
weaker models of communication, which include one-way interactions (as opposed
to two-way interactions) and omission faults (i.e., failure by an agent to read
its partner’s state during an interaction), which in turn may be detectable or
undetectable. We focus on the impact of a leader, and we give a complete
characterization of the models in which the presence of a unique leader in the
system allows the construction of simulators: when simulations are possible, we
give explicit protocols; when they are not, we give proofs of impossibility.
Specifically, if each agent has only a finite amount of memory, the simulation
is possible only if there are no omission faults. If agents have an unbounded
amount of memory, the simulation is possible as long as omissions are
detectable. If an upper bound on the number of omissions involving the leader
is known, the simulation is always possible, except in the one-way model in
which one side is unable to detect the interaction.

Demonstration of a context-switch method for heterogeneous reconfigurable systems

Arief Wicaksana (TIMA), Alban Bourge (TIMA), Olivier Muller (TIMA), Frédéric Rousseau (TIMA)
Journal-ref: 2016 26th International Conference on Field Programmable Logic and
Applications (FPL), Aug 2016, Lausanne, Switzerland. pp.1 – 1, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Nowadays, FPGAs are integrated in high-performance computing systems,
servers, or even used as accelerators in System-on-Chip (SoC) platforms. Since
the execution is performed in hardware, FPGA gives much higher performance and
lower energy consumption compared to most microprocessor-based systems.
However, the room to improve FPGA performance still exists, e.g. when it is
used by multiple users. In multi-user approaches, FPGA resources are shared
between several users. Therefore, one must be able to interrupt a running
circuit at any given time and continue the task at will. An image of the state
of the running circuit (context) is saved during interruption and restored when
the execution is continued. The ability to extract and restore the context is
known as context-switch.In the previous work [1], an automatic checkpoint
selection method is proposed for circuit generation targeting reconfigurable
systems. The method relies on static analysis of the finite state machine of a
circuit to select the checkpoint states. States with minimum overhead will be
selected as checkpoints, which allow optimal context save and restore. The
maximum time to reach a checkpoint will be defined by the user and consideredas
the context-switch latency. The method is implemented in C code and integrated
as plugin in a free and open-source High-Level Synthesis tool AUGH [2].

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

Sandra Catalán, José R. Herrero, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez, Robert van de Geijn
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Performance (cs.PF)

We propose two novel techniques for overcoming load-imbalance encountered
when implementing so-called look-ahead mechanisms in relevant dense matrix
factorizations for the solution of linear systems. Both techniques target the
scenario where two thread teams are created/activated during the factorization,
with each team in charge of performing an independent task/branch of execution.
The first technique promotes worker sharing (WS) between the two tasks,
allowing the threads of the task that completes first to be reallocated for use
by the costlier task. The second technique allows a fast task to alert the
slower task of completion, enforcing the early termination (ET) of the second
task, and a smooth transition of the factorization procedure into the next
iteration.

The two mechanisms are instantiated via a new malleable thread-level
implementation of the Basic Linear Algebra Subprograms (BLAS), and their
benefits are illustrated via an implementation of the LU factorization with
partial pivoting enhanced with look-ahead. Concretely, our experimental results
on a six core Intel-Xeon processor show the benefits of combining WS+ET,
reporting competitive performance in comparison with a task-parallel
runtime-based solution.

Gossiping with Latencies

Seth Gilbert, Peter Robinson, Suman Sourav
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Consider the classical problem of information dissemination: one (or more)
nodes in a network have some information that they want to distribute to the
remainder of the network. In this paper, we study the cost of information
dissemination in networks where edges have latencies, i.e., sending a message
from one node to another takes some amount of time. We first generalize the
idea of conductance to weighted graphs, defining (phi_*) to be the “weighted
conductance” and (ell_*) to be the “critical latency.” One goal of this paper
is to argue that (phi_*) characterizes the connectivity of a weighted graph
with latencies in much the same way that conductance characterizes the
connectivity of unweighted graphs.

We give near tight upper and lower bounds on the problem of information
dissemination, up to polylogarithmic factors. Specifically, we show that in a
graph with (weighted) diameter (D) (with latencies as weights), maximum degree
(Delta), weighted conductance (phi_*) and critical latency (ell_*), any
information dissemination algorithm requires at least (Omega(min(D+Delta,
ell_*/phi_*))) time. We show several variants of the lower bound (e.g., for
graphs with small diameter, graphs with small max-degree, etc.) by reduction to
a simple combinatorial game.

We then give nearly matching algorithms, showing that information
dissemination can be solved in (O(min((D + Delta)log^3{n}),
(ell_*/phi_*)log(n))) time. This is achieved by combining two cases. When
nodes do not know the latency of the adjacent edges, we show that the classical
push-pull algorithm is (near) optimal when the diameter or maximum degree is
large. For the case where the diameter and maximum degree are small, we give an
alternative strategy in which we first discover the latencies and then use an
algorithm for known latencies based on a weighted spanner construction.

A Survey of Methods for Collective Communication Optimization and Tuning

Udayanga Wickramasinghe, Andrew Lumsdaine
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

New developments in HPC technology in terms of increasing computing power on
multi/many core processors, high-bandwidth memory/IO subsystems and
communication interconnects, pose a direct impact on software and runtime
system development. These advancements have become useful in producing
high-performance collective communication interfaces that integrate efficiently
on a wide variety of platforms and environments. However, number of
optimization options that shows up with each new technology or software
framework has resulted in a emph{combinatorial explosion} in feature space for
tuning collective parameters such that finding the optimal set has become a
nearly impossible task. Applicability of algorithmic choices available for
optimizing collective communication depends largely on the scalability
requirement for a particular usecase. This problem can be further exasperated
by any requirement to run collective problems at very large scales such as in
the case of exascale computing, at which impractical tuning by brute force may
require many months of resources. Therefore application of statistical, data
mining and artificial Intelligence or more general hybrid learning models seems
essential in many collectives parameter optimization problems. We hope to
explore current and the cutting edge of collective communication optimization
and tuning methods and culminate with possible future directions towards this
problem.

Towards a Complete Framework for Virtual Data Center Embedding

M P Gilesh
Comments: Technical Report, NIT Calicut
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud computing is widely adopted by corporate as well as retail customers to
reduce the upfront cost of establishing computing infrastructure. However,
switching to the cloud based services poses a multitude of questions, both for
customers and for data center owners. In this work, we propose an algorithm for
optimal placement of multiple virtual data centers on a physical data center.
Our algorithm has two modes of operation – an online mode and a batch mode.
Coordinated batch and online embedding algorithms are used to maximize resource
usage while fulfilling the QoS demands. Experimental evaluation of our
algorithms show that acceptance rate is high – implying higher profit to
infrastructure provider. Additionaly, we try to keep a check on the number of
VM migrations, which can increase operational cost and thus lead to service
level agreement violations.

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

In recent years, deep neural networks (DNNs), have yielded strong results on
a wide range of applications. Graphics Processing Units (GPUs) have been one
key enabling factor leading to the current popularity of DNNs. However, despite
increasing hardware flexibility and software programming toolchain maturity,
high efficiency GPU programming remains difficult: it suffers from high
complexity, low productivity, and low portability. GPU vendors such as NVIDIA
have spent enormous effort to write special-purpose DNN libraries. However, on
other hardware targets, especially mobile GPUs, such vendor libraries are not
generally available. Thus, the development of portable, open, high-performance,
energy-efficient GPU code for DNN operations would enable broader deployment of
DNN-based algorithms. Toward this end, this work presents a framework to enable
productive, high-efficiency GPU programming for DNN computations across
hardware platforms and programming models. In particular, the framework
provides specific support for metaprogramming, autotuning, and DNN-tailored
data types. Using our framework, we explore implementing DNN operations on
three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA
GPUs, we show both portability between OpenCL and CUDA as well competitive
performance compared to the vendor library. On Qualcomm GPUs, we show that our
framework enables productive development of target-specific optimizations, and
achieves reasonable absolute performance. Finally, On AMD GPUs, we show initial
results that indicate our framework can yield reasonable performance on a new
platform with minimal effort.

Service-Oriented Sharding with Aspen

Adem Efe Gencer, Robbert van Renesse, Emin Gün Sirer
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The rise of blockchain-based cryptocurrencies has led to an explosion of
services using distributed ledgers as their underlying infrastructure. However,
due to inherently single-service oriented blockchain protocols, such services
can bloat the existing ledgers, fail to provide sufficient security, or
completely forego the property of trustless auditability. Security concerns,
trust restrictions, and scalability limits regarding the resource requirements
of users hamper the sustainable development of loosely-coupled services on
blockchains.

This paper introduces Aspen, a sharded blockchain protocol designed to
securely scale with increasing number of services. Aspen shares the same trust
model as Bitcoin in a peer-to-peer network that is prone to extreme churn
containing Byzantine participants. It enables introduction of new services
without compromising the security, leveraging the trust assumptions, or
flooding users with irrelevant messages.

Distributed Nonconvex Optimization for Sparse Representation

Ying Sun, Gesualdo Scutari
Comments: Submitted to ICASSP 2017
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)

We consider a non-convex constrained Lagrangian formulation of a fundamental
bi-criteria optimization problem for variable selection in statistical
learning; the two criteria are a smooth (possibly) nonconvex loss function,
measuring the fitness of the model to data, and the latter function is a
difference-of-convex (DC) regularization, employed to promote some extra
structure on the solution, like sparsity. This general class of nonconvex
problems arises in many big-data applications, from statistical machine
learning to physical sciences and engineering. We develop the first unified
distributed algorithmic framework for these problems and establish its
asymptotic convergence to d-stationary solutions. Two key features of the
method are: i) it can be implemented on arbitrary networks (digraphs) with
(possibly) time-varying connectivity; and ii) it does not require the
restrictive assumption that the (sub)gradient of the objective function is
bounded, which enlarges significantly the class of statistical learning
problems that can be solved with convergence guarantees.

Deep Tensor Convolution on Multicores

Deep convolutional neural networks (ConvNets) have become a de facto standard
for image classification and segmentation problems. These networks have also
had early success in the video domain, despite failing to capture motion
continuity and other rich temporal correlations. Evidence has since emerged
that extending ConvNets to 3-dimensions leads to state-of-the-art performance
across a broad set of video processing tasks by learning these joint
spatiotemporal features. However, these early 3D networks have been restricted
to shallower architectures of fewer channels than successful 2D networks due to
memory constraints inherent to GPU implementations.

In this study we present the first practical CPU implementation of tensor
convolution optimized for deep networks of small kernels. Our implementation
supports arbitrarily deep ConvNets of (N)-dimensional tensors due to the
relaxed memory constraints of CPU systems, which can be further leveraged for
an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
kernels). Because most of the optimized ConvNets in previous literature are 2
rather than 3-dimensional, we benchmark our performance against the most
popular 2D implementations. Even in this special case, which is theoretically
the least beneficial for our fast algorithm, we observe a 5 to 25-fold
improvement in throughput compared to previous state-of-the-art. We believe
this work is an important step toward practical ConvNets for real-time
applications, such as mobile video processing and biomedical image analysis,
where high performance 3D networks are a must.

Learning

GRAM: Graph-based Attention Model for Healthcare Representation Learning

Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, Jimeng Sun
Comments: Under review for ICLR 2017
Subjects: Learning (cs.LG)

Deep learning methods exhibit promising performance for predictive modeling
in healthcare, but two important challenges remain: -Data insufficiency: Often
in healthcare predictive modeling, the sample size is insufficient for deep
learning methods to achieve satisfactory results. -Interpretation: The
representations learned by deep learning models should align with medical
knowledge. To address these challenges, we propose a GRaph-based Attention
Model, GRAM that supplements electronic health records (EHR) with hierarchical
information inherent to medical ontologies. Based on the data volume and the
ontology structure, GRAM represents a medical concept as a combination of its
ancestors in the ontology via an attention mechanism. We compared predictive
performance (i.e. accuracy, data needs, interpretability) of GRAM to various
methods including the recurrent neural network (RNN) in two sequential
diagnoses prediction tasks and one heart failure prediction task. Compared to
the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely
observed in the training data and 3% improved area under the ROC curve for
predicting heart failure using an order of magnitude less training data.
Additionally, unlike other methods, the medical concept representations learned
by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits
intuitive attention behaviors by adaptively generalizing to higher level
concepts when facing data insufficiency at the lower level concepts.

Associative Adversarial Networks

Tarik Arici, Asli Celikyilmaz
Comments: NIPS 2016 Workshop on Adversarial Training
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose a higher-level associative memory for learning adversarial
networks. Generative adversarial network (GAN) framework has a discriminator
and a generator network. The generator (G) maps white noise (z) to data samples
while the discriminator (D) maps data samples to a single scalar. To do so, G
learns how to map from high-level representation space to data space, and D
learns to do the opposite. We argue that higher-level representation spaces
need not necessarily follow a uniform probability distribution. In this work,
we use Restricted Boltzmann Machines (RBMs) as a higher-level associative
memory and learn the probability distribution for the high-level features
generated by D. The associative memory samples its underlying probability
distribution and G learns how to map these samples to data space. The proposed
associative adversarial networks (AANs) are generative models in the
higher-levels of the learning, and use adversarial non-stochastic models D and
G for learning the mapping between data and higher-level representation spaces.
Experiments show the potential of the proposed networks.

Unsupervised Learning for Lexicon-Based Classification

Jacob Eisenstein
Comments: to appear in AAAI 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

In lexicon-based classification, documents are assigned labels by comparing
the number of words that appear from two opposed lexicons, such as positive and
negative sentiment. Creating such words lists is often easier than labeling
instances, and they can be debugged by non-experts if classification
performance is unsatisfactory. However, there is little analysis or
justification of this classification heuristic. This paper describes a set of
assumptions that can be used to derive a probabilistic justification for
lexicon-based classification, as well as an analysis of its expected accuracy.
One key assumption behind lexicon-based classification is that all words in
each lexicon are equally predictive. This is rarely true in practice, which is
why lexicon-based approaches are usually outperformed by supervised classifiers
that learn distinct weights on each word from labeled instances. This paper
shows that it is possible to learn such weights without labeled data, by
leveraging co-occurrence statistics across the lexicons. This offers the best
of both worlds: light supervision in the form of lexicons, and data-driven
classification with higher accuracy than traditional word-counting heuristics.

Learning From Graph Neighborhoods Using LSTMs

Rakshit Agrawal, Luca de Alfaro, Vassilis Polychronopoulos
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Many prediction problems can be phrased as inferences over local
neighborhoods of graphs. The graph represents the interaction between entities,
and the neighborhood of each entity contains information that allows the
inferences or predictions. We present an approach for applying machine learning
directly to such graph neighborhoods, yielding predicitons for graph nodes on
the basis of the structure of their local neighborhood and the features of the
nodes in it. Our approach allows predictions to be learned directly from
examples, bypassing the step of creating and tuning an inference model or
summarizing the neighborhoods via a fixed set of hand-crafted features. The
approach is based on a multi-level architecture built from Long Short-Term
Memory neural nets (LSTMs); the LSTMs learn how to summarize the neighborhood
from data. We demonstrate the effectiveness of the proposed technique on a
synthetic example and on real-world data related to crowdsourced grading,
Bitcoin transactions, and Wikipedia edit reversions.

Options Discovery with Budgeted Reinforcement Learning

Aurélia Léon, Ludovic Denoyer
Comments: Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We consider the problem of learning hierarchical policies for Reinforcement
Learning able to discover options, an option corresponding to a sub-policy over
a set of primitive actions. Different models have been proposed during the last
decade that usually rely on a predefined set of options. We specifically
address the problem of automatically discovering options in decision processes.
We describe a new RL learning framework called Bi-POMDP, and a new learning
model called Budgeted Option Neural Network (BONN) able to discover options
based on a budgeted learning objective. Since Bi-POMDP are more general than
POMDP, our model can also be used to discover options for classical RL tasks.
The BONN model is evaluated on different classical RL problems, demonstrating
both quantitative and qualitative interesting results.

Generalized Dropout

Suraj Srinivas, R. Venkatesh Babu
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep Neural Networks often require good regularizers to generalize well.
Dropout is one such regularizer that is widely used among Deep Learning
practitioners. Recent work has shown that Dropout can also be viewed as
performing Approximate Bayesian Inference over the network parameters. In this
work, we generalize this notion and introduce a rich family of regularizers
which we call Generalized Dropout. One set of methods in this family, called
Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
emerges as a special case of this method. Another member of this family selects
the width of neural network layers. Experiments show that these methods help in
improving generalization performance over Dropout.

Effective Deterministic Initialization for (k)-Means-Like Methods via Local Density Peaks Searching

Fengfu Li, Hong Qiao, Bo Zhang
Comments: 16 pages, 9 figures, journal paper
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The (k)-means clustering algorithm is popular but has the following main
drawbacks: 1) the number of clusters, (k), needs to be provided by the user in
advance, 2) it can easily reach local minima with randomly selected initial
centers, 3) it is sensitive to outliers, and 4) it can only deal with well
separated hyperspherical clusters. In this paper, we propose a Local Density
Peaks Searching (LDPS) initialization framework to address these issues. The
LDPS framework includes two basic components: one of them is the local density
that characterizes the density distribution of a data set, and the other is the
local distinctiveness index (LDI) which we introduce to characterize how
distinctive a data point is compared with its neighbors. Based on these two
components, we search for the local density peaks which are characterized with
high local densities and high LDIs to deal with 1) and 2). Moreover, we detect
outliers characterized with low local densities but high LDIs, and exclude them
out before clustering begins. Finally, we apply the LDPS initialization
framework to (k)-medoids, which is a variant of (k)-means and chooses data
samples as centers, with diverse similarity measures other than the Euclidean
distance to fix the last drawback of (k)-means. Combining the LDPS
initialization framework with (k)-means and (k)-medoids, we obtain two novel
clustering methods called LDPS-means and LDPS-medoids, respectively.
Experiments on synthetic data sets verify the effectiveness of the proposed
methods, especially when the ground truth of the cluster number (k) is large.
Further, experiments on several real world data sets, Handwritten Pendigits,
Coil-20, Coil-100 and Olivetti Face Database, illustrate that our methods give
a superior performance than the analogous approaches on both estimating (k) and
unsupervised object categorization.

Probabilistic Duality for Parallel Gibbs Sampling without Graph Coloring

Lars Mescheder, Sebastian Nowozin, Andreas Geiger
Subjects: Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

We present a new notion of probabilistic duality for random variables
involving mixture distributions. Using this notion, we show how to implement a
highly-parallelizable Gibbs sampler for weakly coupled discrete pairwise
graphical models with strictly positive factors that requires almost no
preprocessing and is easy to implement. Moreover, we show how our method can be
combined with blocking to improve mixing. Even though our method leads to
inferior mixing times compared to a sequential Gibbs sampler, we argue that our
method is still very useful for large dynamic networks, where factors are added
and removed on a continuous basis, as it is hard to maintain a graph coloring
in this setup. Similarly, our method is useful for parallelizing Gibbs sampling
in graphical models that do not allow for graph colorings with a small number
of colors such as densely connected graphs.

Temporal Generative Adversarial Nets

Masaki Saito, Eiichi Matsumoto
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose a generative model, the Temporal Generative
Adversarial Network (TGAN), which can learn a semantic representation of
unlabelled videos, and is capable of generating consistent videos. Unlike an
existing GAN that generates videos with a generator consisting of 3D
deconvolutional layers, our model exploits two types of generators: a temporal
generator and an image generator. The temporal generator consists of 1D
deconvolutional layers and outputs a set of latent variables, each of which
corresponds to a frame in the generated video, and the image generator
transforms them into a video with 2D deconvolutional layers. This
representation allows efficient training of the network parameters. Moreover,
it can handle a wider range of applications including the generation of a long
sequence, frame interpolation, and the use of pre-trained models. Experimental
results demonstrate the effectiveness of our method.

Prototypical Recurrent Unit

Dingkun Long, Richong Zhang, Yongyi Mao
Subjects: Learning (cs.LG)

The difficulty in analyzing LSTM-like recurrent neural networks lies in the
complex structure of the recurrent unit, which induces highly complex nonlinear
dynamics. In this paper, we design a new simple recurrent unit, which we call
Prototypical Recurrent Unit (PRU). We verify experimentally that PRU performs
comparably to LSTM and GRU. This potentially enables PRU to be a prototypical
example for analytic study of LSTM-like recurrent networks. Along these
experiments, the memorization capability of LSTM-like networks is also studied
and some insights are obtained.

Dealing with Range Anxiety in Mean Estimation via Statistical Queries

Vitaly Feldman
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In the statistical query (SQ) model an algorithm has access to an SQ oracle
for the input distribution (D) over (X) instead of i.i.d.~ samples from (D).
Given a query function (phi:X
ightarrow [-1,1]), the oracle returns an
estimate of ({f E}_{{f x}sim D}[phi({f x})]) within some tolerance
( au). In a variety of natural problems it is necessary to estimate
expectations of functions whose standard deviation is much smaller than the
range. In this note we describe a nearly optimal algorithm for estimation of
such expectations via statistical queries. As applications, we give algorithms
for high dimensional mean estimation in the SQ model and in the distributed
setting where only a single bit is communicated from each sample.

Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

Zhiguang Wang, Weizhong Yan, Tim Oates
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

We propose a simple but strong baseline for time series classification from
scratch with deep neural networks. Our proposed baseline models are pure
end-to-end without any heavy preprocessing on the raw data or feature crafting.
The FCN achieves premium performance to other state-of-the-art approaches. Our
exploration of the very deep neural networks with the ResNet structure achieves
competitive performance under the same simple experiment settings. The simple
MLP baseline is also comparable to the 1NN-DTW as a previous golden baseline.
Our models provides a simple choice for the real world application and a good
starting point for the future research. An overall analysis is provided to
discuss the generalization of our models, learned features, network structures
and the classification semantics.

Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning

Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz
Comments: 5 pages, 8 figures, NIPS workshop: The 1st International Workshop on Efficient Methods for Deep Neural Networks
Subjects: Learning (cs.LG)

We propose a new framework for pruning convolutional kernels in neural
networks to enable efficient inference, focusing on transfer learning where
large and potentially unwieldy pretrained networks are adapted to specialized
tasks. We interleave greedy criteria-based pruning with fine-tuning by
backpropagation – a computationally efficient procedure that maintains good
generalization in the pruned network. We propose a new criterion based on an
efficient first-order Taylor expansion to approximate the absolute change in
training cost induced by pruning a network component. After normalization, the
proposed criterion scales appropriately across all layers of a deep CNN,
eliminating the need for per-layer sensitivity analysis. The proposed criterion
demonstrates superior performance compared to other criteria, such as the norm
of kernel weights or average feature map activation.

Quantized neural network design under weight capacity constraint

The complexity of deep neural network algorithms for hardware implementation
can be lowered either by scaling the number of units or reducing the
word-length of weights. Both approaches, however, can accompany the performance
degradation although many types of research are conducted to relieve this
problem. Thus, it is an important question which one, between the network size
scaling and the weight quantization, is more effective for hardware
optimization. For this study, the performances of fully-connected deep neural
networks (FCDNNs) and convolutional neural networks (CNNs) are evaluated while
changing the network complexity and the word-length of weights. Based on these
experiments, we present the effective compression ratio (ECR) to guide the
trade-off between the network size and the precision of weights when the
hardware resource is limited.

Cross-model convolutional neural network for multiple modality data representation

Yanbin Wu, Li Wang, Fan Cui, Hongbin Zhai, Baoming Dong, Jim Jing-Yan Wang
Subjects: Learning (cs.LG)

A novel data representation method of convolutional neural net- work (CNN) is
proposed in this paper to represent data of different modalities. We learn a
CNN model for the data of each modality to map the data of differ- ent
modalities to a common space, and regularize the new representations in the
common space by a cross-model relevance matrix. We further impose that the
class label of data points can also be predicted from the CNN representa- tions
in the common space. The learning problem is modeled as a minimiza- tion
problem, which is solved by an augmented Lagrange method (ALM) with updating
rules of Alternating direction method of multipliers (ADMM). The experiments
over benchmark of sequence data of multiple modalities show its advantage.

GA3C: GPU-based A3C for Deep Reinforcement Learning

Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz
Subjects: Learning (cs.LG)

We introduce and analyze the computational aspects of a hybrid CPU/GPU
implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm,
currently the state-of-the-art method in reinforcement learning for various
gaming tasks. Our analysis concentrates on the critical aspects to leverage the
GPU’s computational power, including the introduction of a system of queues and
a dynamic scheduling strategy, potentially helpful for other asynchronous
algorithms as well. We also show the potential for the use of larger DNN models
on a GPU. Our TensorFlow implementation achieves a significant speed up
compared to our CPU-only implementation, and it will be made publicly available
to other researchers.

Robust end-to-end deep audiovisual speech recognition

Ramon Sanabria, Florian Metze, Fernando De La Torre
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

Speech is one of the most effective ways of communication among humans. Even
though audio is the most common way of transmitting speech, very important
information can be found in other modalities, such as vision. Vision is
particularly useful when the acoustic signal is corrupted. Multi-modal speech
recognition however has not yet found wide-spread use, mostly because the
temporal alignment and fusion of the different information sources is
challenging.

This paper presents an end-to-end audiovisual speech recognizer (AVSR), based
on recurrent neural networks (RNN) with a connectionist temporal classification
(CTC) loss function. CTC creates sparse “peaky” output activations, and we
analyze the differences in the alignments of output targets (phonemes or
visemes) between audio-only, video-only, and audio-visual feature
representations. We present the first such experiments on the large vocabulary
IBM ViaVoice database, which outperform previously published approaches on
phone accuracy in clean and noisy conditions.

Statistical Learning for OCR Text Correction

Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

The accuracy of Optical Character Recognition (OCR) is crucial to the success
of subsequent applications used in text analyzing pipeline. Recent models of
OCR post-processing significantly improve the quality of OCR-generated text,
but are still prone to suggest correction candidates from limited observations
while insufficiently accounting for the characteristics of OCR errors. In this
paper, we show how to enlarge candidate suggestion space by using external
corpus and integrating OCR-specific features in a regression approach to
correct OCR-generated errors. The evaluation results show that our model can
correct 61.5% of the OCR-errors (considering the top 1 suggestion) and 71.5% of
the OCR-errors (considering the top 3 suggestions), for cases where the
theoretical correction upper-bound is 78%.

Probabilistic structure discovery in time series data

David Janz, Brooks Paige, Tom Rainforth, Jan-Willem van de Meent, Frank Wood
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Existing methods for structure discovery in time series data construct
interpretable, compositional kernels for Gaussian process regression models.
While the learned Gaussian process model provides posterior mean and variance
estimates, typically the structure is learned via a greedy optimization
procedure. This restricts the space of possible solutions and leads to
over-confident uncertainty estimates. We introduce a fully Bayesian approach,
inferring a full posterior over structures, which more reliably captures the
uncertainty of the model.

Emergence of Compositional Representations in Restricted Boltzmann Machines

Jérôme Tubiana (LPTENS), Rémi Monasson (LPTENS)
Comments: Supplementary material available at the authors’ webpage
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Machine Learning (stat.ML)

Extracting automatically the complex set of features composing real
high-dimensional data is crucial for achieving high performance in
machine–learning tasks. Restricted Boltzmann Machines (RBM) are empirically
known to be efficient for this purpose, and to be able to generate distributed
and graded representations of the data. We characterize the structural
conditions (sparsity of the weights, low effective temperature, nonlinearities
in the activation functions of hidden units, and adaptation of fields
maintaining the activity in the visible layer) allowing RBM to operate in such
a compositional phase. Evidence is provided by the replica analysis of an
adequate statistical ensemble of random RBMs and by RBM trained on the
handwritten digits dataset MNIST.

On the convergence of gradient-like flows with noisy gradient input

Panayotis Mertikopoulos, Mathias Staudigl
Comments: 33 pages, 3 figures
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Dynamical Systems (math.DS)

In view of solving convex optimization problems with noisy gradient input, we
analyze the asymptotic behavior of gradient-like flows that are subject to
stochastic disturbances. Specifically, we focus on the widely studied class of
mirror descent methods for constrained convex programming and we examine the
dynamics’ convergence and concentration properties in the presence of noise. In
the small noise limit, we show that the dynamics converge to the solution set
of the underlying problem (a.s.). Otherwise, if the noise is persistent, we
estimate the measure of the dynamics’ long-run concentration around interior
solutions and their convergence to boundary solutions that are sufficiently
“robust”. Finally, we show that a rectified variant of the method with a
decreasing sensitivity parameter converges irrespective of the magnitude of the
noise or the structure of the underlying convex program, and we derive an
explicit estimate for its rate of convergence.

Training Sparse Neural Networks

Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep neural networks with lots of parameters are typically used for
large-scale computer vision tasks such as image classification. This is a
result of using dense matrix multiplications and convolutions. However, sparse
computations are known to be much more efficient. In this work, we train and
build neural networks which implicitly use sparse computations. We introduce
additional gate variables to perform parameter selection and show that this is
equivalent to using a spike-and-slab prior. We experimentally validate our
method on both small and large networks and achieve state-of-the-art
compression results for sparse neural network models.

Error analysis of regularized least-square regression with Fredholm kernel

Yanfang Tao, Peipei Yuan, Biqin Song
Subjects: Statistics Theory (math.ST); Learning (cs.LG)

Learning with Fredholm kernel has attracted increasing attention recently
since it can effectively utilize the data information to improve the prediction
performance. Despite rapid progress on theoretical and experimental
evaluations, its generalization analysis has not been explored in learning
theory literature. In this paper, we establish the generalization bound of
least square regularized regression with Fredholm kernel, which implies that
the fast learning rate O(l^{-1}) can be reached under mild capacity conditions.
Simulated examples show that this Fredholm regression algorithm can achieve the
satisfactory prediction performance.

Scalable Adaptive Stochastic Optimization Using Random Projections

Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen
Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Adaptive stochastic gradient methods such as AdaGrad have gained popularity
in particular for training deep neural networks. The most commonly used and
studied variant maintains a diagonal matrix approximation to second order
information by accumulating past gradients which are used to tune the step size
adaptively. In certain situations the full-matrix variant of AdaGrad is
expected to attain better performance, however in high dimensions it is
computationally impractical. We present Ada-LR and RadaGrad two computationally
efficient approximations to full-matrix AdaGrad based on randomized
dimensionality reduction. They are able to capture dependencies between
features and achieve similar performance to full-matrix AdaGrad but at a much
smaller computational cost. We show that the regret of Ada-LR is close to the
regret of full-matrix AdaGrad which can have an up-to exponentially smaller
dependence on the dimension than the diagonal variant. Empirically, we show
that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task
of training convolutional neural networks as well as recurrent neural networks,
RadaGrad achieves faster convergence than diagonal AdaGrad.

Deep Learning for the Classification of Lung Nodules

He Yang, Hengyong Yu, Ge Wang
Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep learning, as a promising new area of machine learning, has attracted a
rapidly increasing attention in the field of medical imaging. Compared to the
conventional machine learning methods, deep learning requires no hand-tuned
feature extractor, and has shown a superior performance in many visual object
recognition applications. In this study, we develop a deep convolutional neural
network (CNN) and apply it to thoracic CT images for the classification of lung
nodules. We present the CNN architecture and classification accuracy for the
original images of lung nodules. In order to understand the features of lung
nodules, we further construct new datasets, based on the combination of
artificial geometric nodules and some transformations of the original images,
as well as a stochastic nodule shape model. It is found that simplistic
geometric nodules cannot capture the important features of lung nodules.

Variational Boosting: Iteratively Refining Posterior Approximations

Andrew C. Miller, Nicholas Foti, Ryan P. Adams
Comments: 21 pages, 9 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

We propose a black-box variational inference method to approximate
intractable distributions with an increasingly rich approximating class. Our
method, termed variational boosting, iteratively refines an existing
variational approximation by solving a sequence of optimization problems,
allowing the practitioner to trade computation time for accuracy. We show how
to expand the variational approximating class by incorporating additional
covariance structure and by introducing new components to form a mixture. We
apply variational boosting to synthetic and real statistical models, and show
that resulting posterior inferences compare favorably to existing posterior
approximation algorithms in both accuracy and efficiency.

Efficient Stochastic Inference of Bitwise Deep Neural Networks

Recently published methods enable training of bitwise neural networks which
allow reduced representation of down to a single bit per weight. We present a
method that exploits ensemble decisions based on multiple stochastically
sampled network models to increase performance figures of bitwise neural
networks in terms of classification accuracy at inference. Our experiments with
the CIFAR-10 and GTSRB datasets show that the performance of such network
ensembles surpasses the performance of the high-precision base model. With this
technique we achieve 5.81% best classification error on CIFAR-10 test set using
bitwise networks. Concerning inference on embedded systems we evaluate these
bitwise networks using a hardware efficient stochastic rounding procedure. Our
work contributes to efficient embedded bitwise neural networks.

Fast Video Classification via Adaptive Cascading of Deep Models

Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recent advances have enabled “oracle” classifiers that can classify across
many classes and input distributions with high accuracy without retraining.
However, these classifiers are relatively heavyweight, so that applying them to
classify video is costly. We show that day-to-day video exhibits highly skewed
class distributions over the short term, and that these distributions can be
classified by much simpler models. We formulate the problem of detecting the
short-term skews online and exploiting models based on it as a new sequential
decision making problem dubbed the Online Bandit Problem, and present a new
algorithm to solve it. When applied to recognizing faces in TV shows and
movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
GPU/CPU) relative to a state-of-the-art convolutional neural network, at
competitive accuracy.

A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

SamanehSorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Learning (cs.LG)

Credit card plays a very important rule in today’s economy. It becomes an
unavoidable part of household, business and global activities. Although using
credit cards provides enormous benefits when used carefully and
responsibly,significant credit and financial damages may be caused by
fraudulent activities. Many techniques have been proposed to confront the
growth in credit card fraud. However, all of these techniques have the same
goal of avoiding the credit card fraud; each one has its own drawbacks,
advantages and characteristics. In this paper, after investigating difficulties
of credit card fraud detection, we seek to review the state of the art in
credit card fraud detection techniques, data sets and evaluation criteria.The
advantages and disadvantages of fraud detection methods are enumerated and
compared.Furthermore, a classification of mentioned techniques into two main
fraud detection approaches, namely, misuses (supervised) and anomaly detection
(unsupervised) is presented. Again, a classification of techniques is proposed
based on capability to process the numerical and categorical data sets.
Different data sets used in literature are then described and grouped into real
and synthesized data and the effective and common attributes are extracted for
further usage.Moreover, evaluation employed criterions in literature are
collected and discussed.Consequently, open issues for credit card fraud
detection are explained as guidelines for new researchers.

Learning the Number of Neurons in Deep Networks

Jose M Alvarez, Mathieu Salzmann
Comments: NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Nowadays, the number of layers and of neurons in each layer of a deep network
are typically set manually. While very deep and wide networks have proven
effective in general, they come at a high memory and computation cost, thus
making them impractical for constrained platforms. These networks, however, are
known to have many redundant parameters, and could thus, in principle, be
replaced by more compact architectures. In this paper, we introduce an approach
to automatically determining the number of neurons in each layer of a deep
network during learning. To this end, we propose to make use of a group
sparsity regularizer on the parameters of the network, where each group is
defined to act on a single neuron. Starting from an overcomplete network, we
show that our approach can reduce the number of parameters by up to 80\% while
retaining or even improving the network accuracy.

Local minima in training of deep networks

Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
Comments: submitted to ICLR 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

There has been a lot of recent interest in trying to characterize the error
surface of deep models. This stems from a long standing question. Given that
deep networks are highly nonlinear systems optimized by local gradient methods,
why do they not seem to be affected by bad local minima? It is widely believed
that training of deep models using gradient methods works so well because the
error surface either has no local minima, or if they exist they need to be
close in value to the global minimum. It is known that such results hold under
very strong assumptions which are not satisfied by real models. In this paper
we present examples showing that for such theorem to be true additional
assumptions on the data, initialization schemes and/or the model classes have
to be made. We look at the particular case of finite size datasets. We
demonstrate that in this scenario one can construct counter-examples (datasets
or initialization schemes) when the network does become susceptible to bad
local minima over the weight space.

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani
Comments: Submitted to ICASSP 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

Deep clustering is the first method to handle general audio separation
scenarios with multiple sources of the same type and an arbitrary number of
sources, performing impressively in speaker-independent speech separation
tasks. However, little is known about its effectiveness in other challenging
situations such as music source separation. Contrary to conventional networks
that directly estimate the source signals, deep clustering generates an
embedding for each time-frequency bin, and separates sources by clustering the
bins in the embedding space. We show that deep clustering outperforms
conventional networks on a singing voice separation task, in both matched and
mismatched conditions, even though conventional networks have the advantage of
end-to-end training for best signal approximation, presumably because its more
flexible objective engenders better regularization. Since the strengths of deep
clustering and conventional network architectures appear complementary, we
explore combining them in a single hybrid network trained via an approach akin
to multi-task learning. Remarkably, the combination significantly outperforms
either of its components.

Using LSTM recurrent neural networks for detecting anomalous behavior of LHC superconducting magnets

Maciej Wielgosz, Andrzej Skoczeń, Matej Mertik
Subjects: Instrumentation and Detectors (physics.ins-det); Learning (cs.LG); Accelerator Physics (physics.acc-ph)

The superconducting LHC magnets are coupled with an electronic monitoring
system which records and analyses voltage time series reflecting their
performance. A currently used system is based on a range of preprogrammed
triggers which launches protection procedures when a misbehavior of the magnets
is detected. All the procedures used in the protection equipment were designed
and implemented according to known working scenarios of the system and are
updated and monitored by human operators.

This paper proposes a novel approach to monitoring and fault protection of
the Large Hadron Collider (LHC) superconducting magnets which employs
state-of-the-art Deep Learning algorithms. Consequently, the authors of the
paper decided to examine the performance of LSTM recurrent neural networks for
anomaly detection in voltage time series of the magnets. In order to address
this challenging task different network architectures and hyper-parameters were
used to achieve the best possible performance of the solution. The regression
results were measured in terms of RMSE for different number of future steps and
history length taken into account for the prediction. The best result of
RMSE=0.00104 was obtained for a network of 128 LSTM cells within the internal
layer and 16 steps history buffer.

Optical Flow Requires Multiple Strategies (but only one network)

Tal Schuster, Lior Wolf, David Gadot
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We show that the matching problem that underlies optical flow requires
multiple strategies, depending on the amount of image motion and other factors.
We then study the implications of this observation on training a deep neural
network for representing image patches in the context of descriptor based
optical flow. We propose a metric learning method, which selects suitable
negative samples based on the nature of the true match. This type of training
produces a network that displays multiple strategies depending on the input and
leads to state of the art results on the KITTI 2012 and KITTI 2015 optical flow
benchmarks.

Information Theory

Augustin's Method – Part II: The Sphere Packing Bound

Barış Nakiboğlu
Comments: 34 pages. The original submission (arXiv:1608.02424v1) is split into two upon the suggestion of the executive editor of IT transactions
Subjects: Information Theory (cs.IT)

The channel coding problem is reviewed for an abstract framework. If the
scaled Renyi capacities of a sequence of channels converge to a finite
continuous function (varphi) on an interval of the form ((1-varepsilon,1])
for an (varepsilon>0), then the capacity of the sequence of channels is
(varphi(1)). If the convergence holds on an interval of the form
((1-varepsilon,1+varepsilon)) then the strong converse holds. Both hypotheses
hold for large classes of product channels and for certain memoryless Poisson
channels. A sphere packing bound with a polynomial prefactor is established for
the decay rate of the error probability with the block length on any sequence
of product channels ({mathcal{W}_{[1,n]}}_{ninmathbb{Z}^{+}}) satisfying
(max_{tleq n} C_{0.5,mathcal{W}_{t}}=mathit{O}(ln n)). For discrete
stationary product channels with feedback sphere packing exponent is proved to
bound the exponential decay rate of the error probability with block length
from above. The latter result continues to hold for product channels with
feedback satisfying a milder stationarity hypothesis. A sphere packing bound
with a polynomial prefactor is established for certain memoryless Poisson
channels.

Communications and Signals Design for Wireless Power Transmission

Yong Zeng, Bruno Clerckx, Rui Zhang
Comments: Invited tutorial paper, submitted for publication, 26 pages, 11 figures
Subjects: Information Theory (cs.IT)

Radiative wireless power transfer (WPT) is a promising technology to provide
cost-effective and real-time power supplies to wireless devices. Although
radiative WPT shares many similar characteristics with the extensively studied
wireless information transfer or communication, they also differ significantly
in terms of design objectives, transmitter/receiver architectures and hardware
constraints, etc. In this article, we first give an overview on the various WPT
technologies, the historical development of the radiative WPT technology and
the main challenges in designing contemporary radiative WPT systems. Then, we
focus on discussing the new communication and signal processing techniques that
can be applied to tackle these challenges. Topics discussed include energy
harvester modeling, energy beamforming for WPT, channel acquisition, power
region characterization in multi-user WPT, waveform design with linear and
non-linear energy receiver model, safety and health issues of WPT, massive MIMO
(multiple-input multiple-output) and millimeter wave (mmWave) enabled WPT,
wireless charging control, and wireless power and communication systems
co-design. We also point out directions that are promising for future research.

Compute-and-Forward in Cell-Free Massive MIMO: Great Performance with Low Backhaul Load

Qinhui Huang, Alister Burr
Subjects: Information Theory (cs.IT)

In this paper, we consider the uplink of cell-free massive MIMO systems,
where a large number of distributed single antenna access points (APs) serve a
much smaller number of users simultaneously via limited backhaul. For the first
time, we investigate the performance of compute-and-forward (C&F) in such an
ultra dense network with a realistic channel model (including fading, pathloss
and shadowing). By utilising the characteristic of pathloss, a low complexity
coefficient selection algorithm for C&F is proposed. We also give a greedy AP
selection method for message recovery. Additionally, we compare the performance
of C&F to some other promising linear strategies for distributed massive MIMO,
such as small cells (SC) and maximum ratio combining (MRC). Numerical results
reveal that C&F not only reduces the backhaul load, but also significantly
increases the system throughput for the symmetric scenario.

On the List-Decodability of Random Self-Orthogonal Codes

Lingfei Jin, Chaoping Xing, Xiande Zhang
Subjects: Information Theory (cs.IT)

In 2011, Guruswami-H{aa}stad-Kopparty cite{Gru} showed that the
list-decodability of random linear codes is as good as that of general random
codes. In the present paper, we further strengthen the result by showing that
the list-decodability of random {it Euclidean self-orthogonal} codes is as
good as that of general random codes as well, i.e., achieves the classical
Gilbert-Varshamov bound. Specifically, we show that, for any fixed finite field
(F_q), error fraction (deltain (0,1-1/q)) satisfying (1-H_q(delta)le
frac12) and small (epsilon>0), with high probability a random Euclidean
self-orthogonal code over (F_q) of rate (1-H_q(delta)-epsilon) is ((delta,
O(1/epsilon)))-list-decodable. This generalizes the result of linear codes to
Euclidean self-orthogonal codes. In addition, we extend the result to list
decoding {it symplectic dual-containing} codes by showing that the
list-decodability of random symplectic dual-containing codes achieves the
quantum Gilbert-Varshamov bound as well. This implies that list-decodability of
quantum stabilizer codes can achieve the quantum Gilbert-Varshamov bound.

The counting argument on self-orthogonal codes is an important ingredient to
prove our result.

Throughput Efficient Large M2M Networks through Incremental Redundancy Combining

Amogh Rajanna, Mos Kaveh
Comments: 6 pages, 6 figures, submitted to IEEE Wireless Communications Networking Conference (WCNC) 2017 Workshop (M2M and Internet of Things). arXiv admin note: substantial text overlap with arXiv:1508.02117
Subjects: Information Theory (cs.IT)

In this paper, we investigate the performance of incremental redundancy
combining as a new cooperative relaying protocol for large M2M networks with
opportunistic relaying. The nodes in the large M2M network are modeled by a
Poisson Point Process, experience Rayleigh fading and utilize slotted ALOHA as
the MAC protocol. The progress rate density (PRD) of the M2M network is used to
quantify the performance of proposed relaying protocol and compare it to
conventional multihop relaying with no cooperation. It is shown that
incremental redundancy combining in a large M2M network provides substantial
throughput improvements over conventional relaying with no cooperation at all
practical values of the network parameters.

Coded Caching with Distributed Storage

Tianqiong Luo, Vaneet Aggarwal, Borja Peleato
Comments: submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)

Content delivery networks store information distributed across multiple
servers, so as to balance the load and avoid unrecoverable losses in case of
node or disk failures. Coded caching has been shown to be a useful technique
which can reduce peak traffic rates by pre-fetching popular content at the end
users and encoding transmissions so that different users can extract different
information from the same packet. On one hand, distributed storage limits the
capability of combining content from different servers into a single message,
causing performance losses in coded caching schemes. But, on the other hand,
the inherent redundancy existing in distributed storage systems can be used to
improve the performance of those schemes through parallelism.

This paper designs a scheme combining distributed storage of the content in
multiple servers and an efficient coded caching algorithm for delivery to the
users. This scheme is shown to reduce the peak transmission rate below that of
state-of-the-art algorithms.

On Storage Allocation in Cache-Enabled Interference Channels with Mixed CSIT

Mohammad Ali Tahmasbi Nejad, Seyed Pooya Shariatpanahi, Babak Hossein Khalaj
Subjects: Information Theory (cs.IT)

Recently, it has been shown that in a cache-enabled interference channel, the
storage at the transmit and receive sides are of equal value in terms of
Degrees of Freedom (DoF). This is derived by assuming full Channel State
Information at the Transmitter (CSIT). In this paper, we consider a more
practical scenario, where a training/feedback phase should exist for obtaining
CSIT, during which instantaneous channel state is not known to the
transmitters. This results in a combination of delayed and current CSIT
availability, called mixed CSIT. In this setup, we derive DoF of a
cache-enabled interference channel with mixed CSIT, which depends on the memory
available at transmit and receive sides as well as the training/feedback phase
duration. In contrast to the case of having full CSIT, we prove that, in our
setup, the storage at the receive side is more valuable than the one at the
transmit side. This is due to the fact that cooperation opportunities granted
by transmitters’ caches are strongly based on instantaneous CSIT availability.
However, multi-casting opportunities provided by receivers’ caches are robust
to such imperfection.

Robust Regularized Least-Squares Beamforming Approach to Signal Estimation

Mohamed Suliman, Tarig Ballal, Tareq Y. Al-Naffouri
Comments: 5 pages, 2 figures, conference
Subjects: Information Theory (cs.IT)

In this paper, we address the problem of robust adaptive beamforming of
signals received by a linear array. The challenge associated with the
beamforming problem is twofold. Firstly, the process requires the inversion of
the usually ill-conditioned covariance matrix of the received signals.
Secondly, the steering vector pertaining to the direction of arrival of the
signal of interest is not known precisely. To tackle these two challenges, the
standard capon beamformer is manipulated to a form where the beamformer output
is obtained as a scaled version of the inner product of two vectors. The two
vectors are linearly related to the steering vector and the received signal
snapshot, respectively. The linear operator, in both cases, is the square root
of the covariance matrix. A regularized least-squares (RLS) approach is
proposed to estimate these two vectors and to provide robustness without
exploiting prior information. Simulation results show that the RLS beamformer
using the proposed regularization algorithm outperforms state-of-the-art
beamforming algorithms, as well as another RLS beamformers using a standard
regularization approaches.

A Sequence Construction of Cyclic Codes over Finite Fields

Cunsheng Ding
Comments: arXiv admin note: substantial text overlap with arXiv:1206.4370
Subjects: Information Theory (cs.IT)

Due to their efficient encoding and decoding algorithms, cyclic codes, a
subclass of linear codes, have applications in communication systems, consumer
electronics, and data storage systems. There are several approaches to
constructing all cyclic codes over finite fields, including the generator
matrix approach, the generator polynomial approach, and the generating
idempotent approach. Another one is a sequence approach, which has been
intensively investigated in the past decade. The objective of this paper is to
survey the progress in this direction in the past decade. Many open problems
are also presented in this paper.

A Class of Two-Weight and Three-Weight Linear Codes and Their Duals

Li Liu, Xianhong Xie, Lanqiang Li
Subjects: Information Theory (cs.IT)

The objective of this paper is to construct a class of linear codes with two
nonzero weights and three nonzero weights by using the general trace functions,
which weight distributions has been determined. These linear codes contain some
optimal codes, which meets certain bound on linear codes. The dual codes are
also studied and proved to be optimal or almost optimal. These codes may have
applications in authentication codes, secret sharing schemes and strongly
regular graphs.

Spectrum Sharing Radar: Coexistence via Xampling

Deborah Cohen, Kumar Vijay Mishra, Yonina C. Eldar
Subjects: Information Theory (cs.IT)

This paper presents a spectrum sharing technology enabling interference-free
operation of a surveillance radar and communication transmissions over a common
spectrum. A cognitive radio receiver senses the spectrum using low sampling and
processing rates. The radar is a cognitive system that employs a Xampling-based
receiver and transmits in several narrow bands. Our main contribution is the
alliance of two previous ideas, CRo and cognitive radar (CRr), and their
adaptation to solve the spectrum sharing problem.

The second generalized Hamming weight of some evaluation codes arising from a projective torus

Manuel Gonzalez-Sarabia, Eduardo Camps, Eliseo Sarmiento, Rafael H. Villarreal
Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)

In this paper we find the second generalized Hamming weight of some
evaluation codes arising from a projective torus, and it allows to compute the
second generalized Hamming weight of the codes parameterized by the edges of
any complete bipartite graph. Also, at the beginning, we obtain some results
about the generalized Hamming weights of some evaluation codes arising from a
complete intersection when the minimum distance is known and they are
non–degenerate codes. Finally we give an example where we use these results to
determine the complete weight hierarchy of some codes.

On MDS Negacyclic LCD Codes

Mustafa Sarı, Mehmet Emin Koroglu
Subjects: Information Theory (cs.IT)

This paper is devoted to the study of linear codes with complementary-duals
(LCD) arising from negacyclic codes over finite fields (mathbb{F}_{q},) where
(q) is an odd prime power. We obtain two classes of MDS negacyclic LCD codes of
length (n|frac{{q – 1}}{2}), (n|frac{{q + 1}}{2}) and a class of negacyclic
LCD codes of length (n=q + 1). Also, we drive some parameters of Hermitian
negacyclic LCD codes over (mathbb{F}_{q^{2}}) of length (n=q^{2}-1) and (n=q –
1). For both Euclidean and Hermitian cases the dimension of these codes are
determined and for some classes the minimum distance is settled. For the other
cases, by studying (q) and (q^{2}-)cyclotomic classes we give lower bounds on
the minimum distance.

Decomposition of bent generalized Boolean functions

Lin Sok, MinJia Shi, Patrick Solé
Comments: 3 pages, submitted to IEEE Communication Letters
Subjects: Information Theory (cs.IT)

A one to one correspondence between regular generalized bent functions from
(F_2^n) to (_{2^m},) and (m-)tuples of Boolean bent functions is
established. This correspondence maps self-dual (resp. anti-self-dual)
generalized bent functions to (m-)tuples of self-dual (resp. anti self-dual)
Boolean bent functions. An application to the classification of regular
generalized bent functions under the extended affine group is given.

A Reinforcement Learning Approach to Power Control and Rate Adaptation in Cellular Networks

Euhanna Ghadimi, Francesco Davide Calabrese, Gunnar Peters, Pablo Soldati
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

Optimizing radio transmission power and user data rates in wireless systems
via power control requires an accurate and instantaneous knowledge of the
system model. While this problem has been extensively studied in the
literature, an efficient solution approaching optimality with the limited
information available in practical systems is still lacking. This paper
presents a reinforcement learning framework for power control and rate
adaptation in the downlink of a radio access network that closes this gap. We
present a comprehensive design of the learning framework that includes the
characterization of the system state, the design of a general reward function,
and the method to learn the control policy. System level simulations show that
our design can quickly learn a power control policy that brings significant
energy savings and fairness across users in the system.

Threshold phenomena for interference with randomly placed sensors

Rafał Kapelko
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

Assume (n) sensors are initially placed on the half-infinite interval
([0,infty)) according to Poisson process with arrival rate (n.) Let (s ge 0)
be a given real number. We are allowed to move the sensors on the line, so as
that no two sensors are placed at distance less than (s.) When a sensor is
displaced a distance equal to (|m(i)|) the cost of movement is proportional to
some (fixed) power (a>0) of the (|m(i)|) distance traveled. As cost measure for
the displacement of the team of sensors we consider the (a)-total movement
defined as the sum (M_a:=sum_{i=1}^n |m(i)|^a,) for some constans (a>0.) In
this paper we study tradeoffs between interference value (s) and the expected
minum (a)-total movement.