IT博客汇 | arXiv Paper Daily: Tue, 28 Feb 2017

arXiv Paper Daily: Tue, 28 Feb 2017

我爱机器学习(52ml.net)发表于 2017-02-28 00:00:00

Neural and Evolutionary Computing

Low-Precision Batch-Normalized Activations

Benjamin Graham
Comments: 16 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks can be trained with relatively low-precision
floating-point and fixed-point arithmetic, using between one and 16 bits.
Previous works have focused on relatively wide-but-shallow, feed-forward
networks. We introduce a quantization scheme that is compatible with training
very deep neural networks. Quantizing the network activations in the middle of
each batch-normalization module can greatly reduce the amount of memory and
computational power needed, with little loss in accuracy.

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
Comments: 12 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

Recent work on generative modeling of text has found that variational
auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
language models (Bowman et al., 2015). This negative result is so far poorly
understood, but has been attributed to the propensity of LSTM decoders to
ignore conditioning information from the encoder. In this paper, we experiment
with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
dilation architecture, we control the effective context from previously
generated words. In experiments, we find that there is a trade off between the
contextual capacity of the decoder and the amount of encoding information used.
We show that with the right decoder, VAE can outperform LSTM language models.
We demonstrate perplexity gains on two datasets, representing the first
positive experimental result on the use VAE for generative modeling of text.
Further, we conduct an in-depth investigation of the use of VAE (with our new
decoding architecture) for semi-supervised and unsupervised labeling tasks,
demonstrating gains over several strong baselines.

Revisiting NARX Recurrent Neural Networks for Long-Term Dependencies

Robert DiPietro, Nassir Navab, Gregory D. Hager
Comments: Code will be released soon
Subjects: Neural and Evolutionary Computing (cs.NE)

Recurrent neural networks (RNNs) have shown success for many
sequence-modeling tasks, but learning long-term dependencies from data remains
difficult. This is often attributed to the vanishing gradient problem, which
shows that gradient components relating a loss at time (t) to time (t – au)
tend to decay exponentially with ( au).

Long short-term memory (LSTM) and gated recurrent units (GRUs), the most
widely-used RNN architectures, attempt to remedy this problem by making the
decay’s base closer to 1. NARX RNNs take an orthogonal approach: by including
direct connections, or delays, from the past, NARX RNNs make the decay’s
exponent closer to 0. However, as introduced, NARX RNNs reduce the decay’s
exponent only by a factor of (n_d), the number of delays, and simultaneously
increase computation by this same factor.

We introduce a new variant of NARX RNNs, called MIxed hiSTory RNNs, which
addresses these drawbacks. We show that for ( au leq 2^{n_d-1}), MIST RNNs
reduce the decay’s worst-case exponent from ( au / n_d) to (log au), while
maintaining computational complexity that is similar to LSTM and GRUs. We
compare MIST RNNs to simple RNNs, LSTM, and GRUs across 4 diverse tasks. MIST
RNNs outperform all other methods in 2 cases, and in all cases are competitive.

Equivariance Through Parameter-Sharing

Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos
Subjects: Machine Learning (stat.ML); Neural and Evolutionary Computing (cs.NE)

We propose to study equivariance in deep neural networks through parameter
symmetries. In particular, given a group G that acts discretely on the input
and output of a standard neural network layer (phi_W), we show that
equivariance of (phi_W) is linked to the symmetry group of network parameters
W. We then propose a sparse parameter-sharing scheme to induce the desirable
symmetry on W. Under some conditions on the action of G, our procedure for
tying the parameters achieves G-equivariance and guarantee sensitivity to all
other permutation groups outside G. We demonstrate the relation of our approach
to recently-proposed “structured” neural layers such as group-convolution and
graph-convolution which leads to new insights and improvement of these
operations.

Deep Voice: Real-time Neural Text-to-Speech

Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
Comments: Submitted to ICML 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

We present Deep Voice, a production-quality text-to-speech system constructed
entirely from deep neural networks. Deep Voice lays the groundwork for truly
end-to-end neural speech synthesis. The system comprises five major building
blocks: a segmentation model for locating phoneme boundaries, a
grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
fundamental frequency prediction model, and an audio synthesis model. For the
segmentation model, we propose a novel way of performing phoneme boundary
detection with deep neural networks using connectionist temporal classification
(CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
that requires fewer parameters and trains faster than the original. By using a
neural network for each component, our system is simpler and more flexible than
traditional text-to-speech systems, where each component requires laborious
feature engineering and extensive domain expertise. Finally, we show that
inference with our system can be performed faster than real time and describe
optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
speedups over existing implementations.

Adaptive Neural Networks for Fast Test-Time Prediction

Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We present an approach to adaptively utilize deep neural networks in order to
reduce the evaluation time on new examples without loss of classification
performance. Rather than attempting to redesign or approximate existing
networks, we propose two schemes that adaptively utilize networks. First, we
pose an adaptive network evaluation scheme, where we learn a system to
adaptively choose the components of a deep network to be evaluated for each
example. By allowing examples correctly classified using early layers of the
system to exit, we avoid the computational time associated with full evaluation
of the network. Building upon this approach, we then learn a network selection
system that adaptively selects the network to be evaluated for each example. We
exploit the fact that many examples can be correctly classified using
relatively efficient networks and that complex, computationally costly networks
are only necessary for a small fraction of examples. By avoiding evaluation of
these complex networks for a large fraction of examples, computational time can
be dramatically reduced. Empirically, these approaches yield dramatic
reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
networks from the ImageNet image recognition challenge with minimal (less than
1%) loss of accuracy.

On the Origin of Deep Learning

Haohan Wang, Bhiksha Raj, Eric P. Xing
Comments: 81 pages, 200 references
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

This paper is a review of the evolutionary history of deep learning models.
It covers from the genesis of neural networks when associationism modeling of
the brain is studied, to the models that dominate the last decade of research
in deep learning like convolutional neural networks, deep belief networks, and
recurrent neural networks, and extends to popular recent models like
variational autoencoder and generative adversarial nets. In addition to a
review of these models, this paper primarily focuses on the precedents of the
models above, examining how the initial ideas are assembled to construct the
early models and how these preliminary models are developed into their current
forms. Many of these evolutionary paths last more than half a century and have
a diversity of directions. For example, CNN is built on prior knowledge of
biological vision system; DBN is evolved from a trade-off of modeling power and
computation complexity of graphical models and many nowadays models are neural
counterparts of ancient linear models. This paper reviews these evolutionary
paths and offers a concise thought flow of how these models are developed, and
aims to provide a thorough background for deep learning. More importantly,
along with the path, this paper summarizes the gist behind these milestones and
proposes many directions to guide the future research of deep learning.

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
Comments: Accepted to IJCNN2017, Anchorage, Alaska, USA
Subjects: Sound (cs.SD); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Environmental audio tagging is a newly proposed task to predict the presence
or absence of a specific audio event in a chunk. Deep neural network (DNN)
based methods have been successfully adopted for predicting the audio tags in
the domestic audio scene. In this paper, we propose to use a convolutional
neural network (CNN) to extract robust features from mel-filter banks (MFBs),
spectrograms or even raw waveforms for audio tagging. Gated recurrent unit
(GRU) based recurrent neural networks (RNNs) are then cascaded to model the
long-term temporal structure of the audio signal. To complement the input
information, an auxiliary CNN is designed to learn on the spatial features of
stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging)
of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE
2016) challenge. Compared with our recent DNN-based method, the proposed
structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the
development set. The spatial features can further reduce the EER to 0.10. The
performance of the end-to-end learning on raw waveforms is also comparable.
Finally, on the evaluation set, we get the state-of-the-art performance with
0.12 EER while the performance of the best existing system is 0.15 EER.

Computer Vision and Pattern Recognition

Skin Lesion Classification Using Hybrid Deep Neural Networks

Amirreza Mahbod, Rupert Ecker, Isabella Ellinger
Comments: 5 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Skin cancer is one of the major types of cancers and its incidence has been
increasing over the past decades. Skin lesions can arise from various
dermatologic disorders and can be classified to various types according to
their texture, structure, color and other morphological features. The accuracy
of diagnosis of skin lesions, specifically the discrimination of benign and
malignant lesions, is paramount to ensure appropriate patient treatment.
Machine learning-based classification approaches are among popular automatic
methods for skin lesion classification. While there are many existing methods,
convolutional neural networks (CNN) have shown to be superior over other
classical machine learning methods for object detection and classification
tasks. In this work, a fully automatic computerized method is proposed, which
employs well established pre-trained convolutional neural networks and
ensembles learning to classify skin lesions. We trained the networks using 2000
skin lesion images available from the ISIC 2017 challenge, which has three main
categories and includes 374 melanoma, 254 seborrheic keratosis and 1372 benign
nevi images. The trained classifier was then tested on 150 unlabeled images.
The results, evaluated by the challenge organizer and based on the area under
the receiver operating characteristic curve (AUC), were 84.8% and 93.6% for
Melanoma and seborrheic keratosis binary classification problem, respectively.

The proposed method achieved competitive results to experienced
dermatologist. Further improvement and optimization of the proposed method with
a larger training dataset could lead to a more precise, reliable and robust
method for skin lesion classification.

Age Progression/Regression by Conditional Adversarial Autoencoder

Zhifei Zhang, Yang Song, Hairong Qi
Comments: Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

“If I provide you a face image of mine (without telling you the actual age
when I took the picture) and a large amount of face images that I crawled
(containing labeled faces of different ages but not necessarily paired), can
you show me what I would look like when I am 80 or what I was like when I was
5?” The answer is probably a “No.” Most existing face aging works attempt to
learn the transformation between age groups and thus would require the paired
samples as well as the labeled query image. In this paper, we look at the
problem from a generative modeling perspective such that no paired samples is
required. In addition, given an unlabeled image, the generative model can
directly produce the image with desired age attribute. We propose a conditional
adversarial autoencoder (CAAE) that learns a face manifold, traversing on which
smooth age progression and regression can be realized simultaneously. In CAAE,
the face is first mapped to a latent vector through a convolutional encoder,
and then the vector is projected to the face manifold conditional on age
through a deconvolutional generator. The latent vector preserves personalized
face features (i.e., personality) and the age condition controls progression
vs. regression. Two adversarial networks are imposed on the encoder and
generator, respectively, forcing to generate more photo-realistic faces.
Experimental results demonstrate the appealing performance and flexibility of
the proposed framework by comparing with the state-of-the-art and ground truth.

Asymmetric Tri-training for Unsupervised Domain Adaptation

Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep-layered models trained on a large number of labeled samples boost the
accuracy of many tasks. It is important to apply such models to different
domains because collecting many labeled samples in various domains is
expensive. In unsupervised domain adaptation, one needs to train a classifier
that works well on a target domain when provided with labeled source samples
and unlabeled target samples. Although many methods aim to match the
distributions of source and target samples, simply matching the distribution
cannot ensure accuracy on the target domain. To learn discriminative
representations for the target domain, we assume that artificially labeling
target samples can result in a good representation. Tri-training leverages
three classifiers equally to give pseudo-labels to unlabeled samples, but the
method does not assume labeling samples generated from a different domain.In
this paper, we propose an asymmetric tri-training method for unsupervised
domain adaptation, where we assign pseudo-labels to unlabeled samples and train
neural networks as if they are true labels. In our work, we use three networks
asymmetrically. By asymmetric, we mean that two networks are used to label
unlabeled target samples and one network is trained by the samples to obtain
target-discriminative representations. We evaluate our method on digit
recognition and sentiment analysis datasets. Our proposed method achieves
state-of-the-art performance on the benchmark digit recognition datasets of
domain adaptation.

Revealing Hidden Potentials of q-Space Imaging in Breast Cancer

Paul Jaeger, Sebastian Bickelhaupt, Frederik Bernd Laun, Wolfgang Lederer, Daniel Heidi, Tristan Anselm Kuder, Daniel Paech, David Bonekamp, Alexander Radbruch, Stefan Delorme, Heinz-Peter Schlemmer, Franziska Steudle, Klaus H. Maier-Hein
Comments: currently under review for MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mammography screening for early detection of breast lesions currently suffers
from high amounts of false positive findings, which result in unnecessary
invasive biopsies. Diffusion-weighted MR images (DWI) can help to reduce many
of these false-positive findings prior to biopsy. Current approaches estimate
tissue properties by means of quantitative parameters taken from generative,
biophysical models fit to the q-space encoded signal under certain assumptions
regarding noise and spatial homogeneity. This process is prone to fitting
instability and partial information loss due to model simplicity. We reveal
previously unexplored potentials of the signal by integrating all data
processing components into a convolutional neural network (CNN) architecture
that is designed to propagate clinical target information down to the raw input
images. This approach enables simultaneous and target-specific optimization of
image normalization, signal exploitation, global representation learning and
classification. Using a multicentric data set of 222 patients, we demonstrate
that our approach significantly improves clinical decision making with respect
to the current state of the art.

Multi-Label Segmentation via Residual-Driven Adaptive Regularization

Byung-Woo Hong, Ja-Keoung Koo, Stefano Soatto
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a variational multi-label segmentation algorithm based on a robust
Huber loss for both the data and the regularizer, minimized within a convex
optimization framework. We introduce a novel constraint on the common areas, to
bias the solution towards mutually exclusive regions. We also propose a
regularization scheme that is adapted to the spatial statistics of the residual
at each iteration, resulting in a varying degree of regularization being
applied as the algorithm proceeds: the effect of the regularizer is strongest
at initialization, and wanes as the solution increasingly fits the data. This
minimizes the bias induced by the regularizer at convergence. We design an
efficient convex optimization algorithm based on the alternating direction
method of multipliers using the equivalent relation between the Huber function
and the proximal operator of the one-norm. We empirically validate our proposed
algorithm on synthetic and real images and offer an information-theoretic
derivation of the cost-function that highlights the modeling choices made.

Visual Translation Embedding Network for Visual Relation Detection

Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual relations, such as “person ride bike” and “bike next to car”, offer a
comprehensive scene understanding of an image, and have already shown their
great utility in connecting computer vision and natural language. However, due
to the challenging combinatorial complexity of modeling
subject-predicate-object relation triplets, very little work has been done to
localize and predict visual relations. Inspired by the recent advances in
relational representation learning of knowledge bases and convolutional object
detection networks, we propose a Visual Translation Embedding network (VTransE)
for visual relation detection. VTransE places objects in a low-dimensional
relation space where a relation can be modeled as a simple vector translation,
i.e., subject + predicate (approx) object. We propose a novel feature
extraction layer that enables object-relation knowledge transfer in a
fully-convolutional fashion that supports training and inference in a single
forward/backward pass. To the best of our knowledge, VTransE is the first
end-to-end relation detection network. We demonstrate the effectiveness of
VTransE over other state-of-the-art methods on two large-scale datasets: Visual
Relationship and Visual Genome. Note that even though VTransE is a purely
visual model, it is still competitive to the Lu’s multi-modal model with
language priors.

Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation

Xin Jin, Peng Yuan, Xiaodong Li, Chenggen Song, Shiming Ge, Geng Zhao, Yingya Chen
Comments: 6 pages, 3 figures, To appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 – Jul 14, 2017, Hong Kong, Hong Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A cloud server spent a lot of time, energy and money to train a Viola-Jones
type object detector with high accuracy. Clients can upload their photos to the
cloud server to find objects. However, the client does not want the leakage of
the content of his/her photos. In the meanwhile, the cloud server is also
reluctant to leak any parameters of the trained object detectors. 10 years ago,
Avidan & Butman introduced Blind Vision, which is a method for securely
evaluating a Viola-Jones type object detector. Blind Vision uses standard
cryptographic tools and is painfully slow to compute, taking a couple of hours
to scan a single image. The purpose of this work is to explore an efficient
method that can speed up the process. We propose the Random Base Image (RBI)
Representation. The original image is divided into random base images. Only the
base images are submitted randomly to the cloud server. Thus, the content of
the image can not be leaked. In the meanwhile, a random vector and the secure
Millionaire protocol are leveraged to protect the parameters of the trained
object detector. The RBI makes the integral-image enable again for the great
acceleration. The experimental results reveal that our method can retain the
detection accuracy of that of the plain vision algorithm and is significantly
faster than the traditional blind vision, with only a very low probability of
the information leakage theoretically.

A Dataset for Developing and Benchmarking Active Vision

Phil Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. Berg
Comments: To appear at ICRA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a new public dataset with a focus on simulating robotic vision
tasks in everyday indoor environments using real imagery. The dataset includes
20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely
captured in 9 unique scenes. We train a fast object category detector for
instance detection on our data. Using the dataset we show that, although
increasingly accurate and fast, the state of the art for object detection is
still severely impacted by object scale, occlusion, and viewing direction all
of which matter for robotics applications. We next validate the dataset for
simulating active vision, and use the dataset to develop and evaluate a
deep-network-based system for next best move prediction for object
classification using reinforcement learning. Our dataset is available for
download at cs.unc.edu/~ammirato/active_vision_dataset_website/.

DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

Christian Wachinger, Martin Reuter, Tassilo Klein
Comments: Accepted for publication in NeuroImage, special issue “Brain Segmentation and Parcellation”, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

We introduce DeepNAT, a 3D Deep convolutional neural network for the
automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
images. DeepNAT is an end-to-end learning-based approach to brain segmentation
that jointly learns an abstract feature representation and a multi-class
classification. We propose a 3D patch-based approach, where we do not only
predict the center voxel of the patch but also neighbors, which is formulated
as multi-task learning. To address a class imbalance problem, we arrange two
networks hierarchically, where the first one separates foreground from
background, and the second one identifies 25 brain structures on the
foreground. Since patches lack spatial context, we augment them with
coordinates. To this end, we introduce a novel intrinsic parameterization of
the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
network architecture, we use three convolutional layers with pooling, batch
normalization, and non-linearities, followed by fully connected layers with
dropout. The final segmentation is inferred from the probabilistic output of
the network with a 3D fully connected conditional random field, which ensures
label agreement between close voxels. The roughly 2.7 million parameters in the
network are learned with stochastic gradient descent. Our results show that
DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
learning-based method may have a high potential for the adaptation to young,
old, or diseased brains by fine-tuning the pre-trained network with a small
training sample on the target application, where the availability of larger
datasets with manual annotations may boost the overall segmentation accuracy in
the future.

HashBox: Hash Hierarchical Segmentation exploiting Bounding Box Object Detection

Joachim Curto, Irene Zarza, Alexander J. Smola, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a novel approach to address the Simultaneous Detection and
Segmentation problem. Using hierarchical structures we use an efficient and
accurate procedure that exploits the hierarchy feature information using
Locality Sensitive Hashing. We build on recent work that utilizes convolutional
neural networks to detect bounding boxes in an image (Faster R-CNN) and then
use the top similar hierarchical region that best fits each bounding box after
hashing, we call this approach HashBox. We then refine our final segmentation
results by automatic hierarchy pruning. HashBox introduces a train-free
alternative to Hypercolumns. We conduct extensive experiments on Pascal VOC
2012 segmentation dataset, showing that HashBox gives competitive
state-of-the-art object segmentations.

Multi-scale Image Fusion Between Pre-operative Clinical CT and X-ray Microtomography of Lung Pathology

Holger R. Roth, Kai Nagara, Hirohisa Oda, Masahiro Oda, Tomoshi Sugiyama, Shota Nakamura, Kensaku Mori
Comments: In proceedings of International Forum on Medical Imaging, IFMIA 2017, Okinawa, Japan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computational anatomy allows the quantitative analysis of organs in medical
images. However, most analysis is constrained to the millimeter scale because
of the limited resolution of clinical computed tomography (CT). X-ray
microtomography ((mu)CT) on the other hand allows imaging of ex-vivo tissues
at a resolution of tens of microns. In this work, we use clinical CT to image
lung cancer patients before partial pneumonectomy (resection of pathological
lung tissue). The resected specimen is prepared for (mu)CT imaging at a voxel
resolution of 50 (mu)m (0.05 mm). This high-resolution image of the lung
cancer tissue allows further insides into understanding of tumor growth and
categorization. For making full use of this additional information, image
fusion (registration) needs to be performed in order to re-align the (mu)CT
image with clinical CT. We developed a multi-scale non-rigid registration
approach. After manual initialization using a few landmark points and rigid
alignment, several levels of non-rigid registration between down-sampled (in
the case of (mu)CT) and up-sampled (in the case of clinical CT)
representations of the image are performed. Any non-lung tissue is ignored
during the computation of the similarity measure used to guide the registration
during optimization. We are able to recover the volume differences introduced
by the resection and preparation of the lung specimen. The average ((pm) std.
dev.) minimum surface distance between (mu)CT and clinical CT at the resected
lung surface is reduced from 3.3 (pm) 2.9 (range: [0.1, 15.9]) to 2.3 mm (pm)
2.8 (range: [0.0, 15.3]) mm. The alignment of clinical CT with (mu)CT will
allow further registration with even finer resolutions of (mu)CT (up to 10
(mu)m resolution) and ultimately with histopathological microscopy images for
further macro to micro image fusion that can aid medical image analysis.

Bioplausible multiscale filtering in retino-cortical processing as a mechanism in perceptual grouping

Nasim Nematzadeh, David M. W. Powers, Trent W. Lewis
Comments: 19 pages, 7 figures, submitted manuscript to Brain Informatics journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Why does our visual system fail to reconstruct reality, when we look at
certain patterns? Where do Geometrical illusions start to emerge in the visual
pathway? Should computational models of vision have the same visual ability to
detect illusions as we do? This study addresses these questions, by focusing on
a specific underlying neural mechanism involved in our visual experiences that
affects our final perception. Among many types of visual illusion,
‘Geometrical’ and, in particular, ‘Tilt’ illusions are rather important, being
characterized by misperception of geometric patterns involving lines and tiles
in combination with contrasting orientation, size or position. Over the last
decade, many new neurophysiological experiments have led to new insights as to
how, when and where retinal processing takes place, and the encoding nature of
the retinal representation that is sent to the cortex for further processing.
Based on these neurobiological discoveries, we provide computer simulation
evidence to suggest that visual Geometrical illusions are explained in part, by
the interaction of multiscale visual processing performed in the retina. The
output of our retinal stage model is presented for several types of Tilt
illusion, in which the final tilt percept arises from multiple scale processing
of Differences of Gaussian and the perceptual interaction of foreground and
background elements. Our results suggest that this multilevel filtering
explanation, which is a simplified simulation for Retinal Ganglion Cell’s
responses to these patterns is indeed the underlying mechanism connecting
low-level filtering to mid- and high-level explanations such as ‘anchoring
theory’ and ‘perceptual grouping’.

3D Scanning System for Automatic High-Resolution Plant Phenotyping

Chuong V Nguyen, Jurgen Fripp, David R Lovell, Robert Furbank, Peter Kuffner, Helen Daily, Xavier Sirault
Comments: 8 papes, DICTA 2016
Journal-ref: In Digital Image Computing: Techniques and Applications (DICTA),
2016 International Conference on, pp. 1-8. IEEE, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Thin leaves, fine stems, self-occlusion, non-rigid and slowly changing
structures make plants difficult for three-dimensional (3D) scanning and
reconstruction — two critical steps in automated visual phenotyping. Many
current solutions such as laser scanning, structured light, and multiview
stereo can struggle to acquire usable 3D models because of limitations in
scanning resolution and calibration accuracy. In response, we have developed a
fast, low-cost, 3D scanning platform to image plants on a rotating stage with
two tilting DSLR cameras centred on the plant. This uses new methods of camera
calibration and background removal to achieve high-accuracy 3D reconstruction.
We assessed the system’s accuracy using a 3D visual hull reconstruction
algorithm applied on 2 plastic models of dicotyledonous plants, 2 sorghum
plants and 2 wheat plants across different sets of tilt angles. Scan times
ranged from 3 minutes (to capture 72 images using 2 tilt angles), to 30 minutes
(to capture 360 images using 10 tilt angles). The leaf lengths, widths, areas
and perimeters of the plastic models were measured manually and compared to
measurements from the scanning system: results were within 3-4% of each other.
The 3D reconstructions obtained with the scanning system show excellent
geometric agreement with all six plant specimens, even plants with thin leaves
and fine stems.

Adversarial Networks for the Detection of Aggressive Prostate Cancer

Simon Kohl, David Bonekamp, Heinz-Peter Schlemmer, Kaneschka Yaqubi, Markus Hohenfellner, Boris Hadaschik, Jan-Philipp Radtke, Klaus Maier-Hein
Comments: 8 pages, 3 figures; under review as a conference paper at MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semantic segmentation constitutes an integral part of medical image analyses
for which breakthroughs in the field of deep learning were of high relevance.
The large number of trainable parameters of deep neural networks however
renders them inherently data hungry, a characteristic that heavily challenges
the medical imaging community. Though interestingly, with the de facto standard
training of fully convolutional networks (FCNs) for semantic segmentation being
agnostic towards the `structure’ of the predicted label maps, valuable
complementary information about the global quality of the segmentation lies
idle. In order to tap into this potential, we propose utilizing an adversarial
network which discriminates between expert and generated annotations in order
to train FCNs for semantic segmentation. Because the adversary constitutes a
learned parametrization of what makes a good segmentation at a global level, we
hypothesize that the method holds particular advantages for segmentation tasks
on complex structured, small datasets. This holds true in our experiments: We
learn to segment aggressive prostate cancer utilizing MRI images of 152
patients and show that the proposed scheme is superior over the de facto
standard in terms of the detection sensitivity and the dice-score for
aggressive prostate cancer. The achieved relative gains are shown to be
particularly pronounced in the small dataset limit.

Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation

Omid Hosseini Jafari, Oliver Groth, Alexander Kirillov, Michael Ying Yang, Carsten Rother
Comments: Accepted to ICRA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This paper addresses the task of designing a modular neural network
architecture that jointly solves different tasks. As an example we use the
tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between
depth and semantic prediction maps on their joint refinement. While most
previous works solely focus on measuring improvements in accuracy, we propose a
way to quantify the cross-modality influence. We show that there is a
relationship between final accuracy and cross-modality influence, although not
a simple linear one. Hence a larger cross-modality influence does not
necessarily translate into an improved accuracy. We find that a beneficial
balance between the cross-modality influences can be achieved by network
architecture and conjecture that this relationship can be utilized to
understand different network design choices. Towards this end we propose a
Convolutional Neural Network (CNN) architecture that fuses the state of the
state-of-the-art results for depth estimation and semantic labeling. By
balancing the cross-modality influences between depth and semantic prediction,
we achieve improved results for both tasks using the NYU-Depth v2 benchmark.

Bayesian Nonparametric Unmixing of Hyperspectral Images

Jürgen Hahn, Abdelhak M.Zoubir
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hyperspectral imaging is an important tool in remote sensing, allowing for
accurate analysis of vast areas. Due to a low spatial resolution, a pixel of a
hyperspectral image rarely represents a single material, but rather a mixture
of different spectra. HSU aims at estimating the pure spectra present in the
scene of interest, referred to as endmembers, and their fractions in each
pixel, referred to as abundances. Today, many HSU algorithms have been
proposed, based either on a geometrical or statistical model. While most
methods assume that the number of endmembers present in the scene is known,
there is only little work about estimating this number from the observed data.
In this work, we propose a Bayesian nonparametric framework that jointly
estimates the number of endmembers, the endmembers itself, and their
abundances, by making use of the Indian Buffet Process as a prior for the
endmembers. Simulation results and experiments on real data demonstrate the
effectiveness of the proposed algorithm, yielding results comparable with
state-of-the-art methods while being able to reliably infer the number of
endmembers. In scenarios with strong noise, where other algorithms provide only
poor results, the proposed approach tends to overestimate the number of
endmembers slightly. The additional endmembers, however, often simply represent
noisy replicas of present endmembers and could easily be merged in a
post-processing step.

A multi-task convolutional neural network for mega-city analysis using very high resolution satellite imagery and geospatial data

Fan Zhang, Bo Du, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Mega-city analysis with very high resolution (VHR) satellite images has been
drawing increasing interest in the fields of city planning and social
investigation. It is known that accurate land-use, urban density, and
population distribution information is the key to mega-city monitoring and
environmental studies. Therefore, how to generate land-use, urban density, and
population distribution maps at a fine scale using VHR satellite images has
become a hot topic. Previous studies have focused solely on individual tasks
with elaborate hand-crafted features and have ignored the relationship between
different tasks. In this study, we aim to propose a universal framework which
can: 1) automatically learn the internal feature representation from the raw
image data; and 2) simultaneously produce fine-scale land-use, urban density,
and population distribution maps. For the first target, a deep convolutional
neural network (CNN) is applied to learn the hierarchical feature
representation from the raw image data. For the second target, a novel
CNN-based universal framework is proposed to process the VHR satellite images
and generate the land-use, urban density, and population distribution maps. To
the best of our knowledge, this is the first CNN-based mega-city analysis
method which can process a VHR remote sensing image with such a large data
volume. A VHR satellite image (1.2 m spatial resolution) of the center of Wuhan
covering an area of 2606 km2 was used to evaluate the proposed method. The
experimental results confirm that the proposed method can achieve a promising
accuracy for land-use, urban density, and population distribution maps.

Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition

Xuefeng Xiao, Lianwen Jin, Yafeng Yang, Weixin Yang, Jun Sun, Tianhai Chang
Comments: 15 pages, 7 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Like other problems in computer vision, offline handwritten Chinese character
recognition (HCCR) has achieved impressive results using convolutional neural
network (CNN)-based methods. However, larger and deeper networks are needed to
deliver state-of-the-art results in this domain. Such networks intuitively
appear to incur high computational cost, and require the storage of a large
number of parameters, which renders them unfeasible for deployment in portable
devices. To solve this problem, we propose a Global Supervised Low-rank
Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to solve
the problems of speed and storage capacity. We design a nine-layer CNN for HCCR
consisting of 3,755 classes, and devise an algorithm that can reduce the
networks computational cost by nine times and compress the network to 1/18 of
the original size of the baseline model, with only a 0.21% drop in accuracy. In
tests, the proposed algorithm surpassed the best single-network performance
reported thus far in the literature while requiring only 2.3 MB for storage.
Furthermore, when integrated with our effective forward implementation, the
recognition of an offline character image took only 9.7 ms on a CPU. Compared
with the state-of-the-art CNN model for HCCR, our approach is approximately 30
times faster, yet 10 times more cost efficient.

Seeing What Is Not There: Learning Context to Determine Where Objects Are Missing

Jin Sun, David W. Jacobs
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most of computer vision focuses on what is in an image. We propose to train a
standalone object-centric context representation to perform the opposite task:
seeing what is not there. Given an image, our context model can predict where
objects should exist, even when no object instances are present. Combined with
object detection results, we can perform a novel vision task: finding where
objects are missing in an image. Our model is based on a convolutional neural
network structure. With a specially designed training strategy, the model
learns to ignore objects and focus on context only. It is fully convolutional
thus highly efficient. Experiments show the effectiveness of the proposed
approach in one important accessibility task: finding city street regions where
curb ramps are missing, which could help millions of people with mobility
disabilities.

Spatially Aware Melanoma Segmentation Using Hybrid Deep Learning Techniques

M. Attia, M. Hossny, S. Nahavandi, A. Yazdabadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we proposed using a hybrid method that utilises deep
convolutional and recurrent neural networks for accurate delineation of skin
lesion of images supplied with ISBI 2017 lesion segmentation challenge. The
proposed method was trained using 1800 images and tested on 150 images from
ISBI 2017 challenge.

BARCHAN: Blob Alignment for Robust CHromatographic ANalysis

Camille Couprie, Laurent Duval, Maxime Moreaud, Sophie Hénon, Mélinda Tebib, Vincent Souchon
Comments: 15 pages, published in the Special issue for RIVA 2016, 40th International Symposium on Capillary Chromatography and 13th GCxGC Symposium
Journal-ref: Journal of Chromatography A, Volume 1484, February 2017, Pages
65-72
Subjects: Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

Comprehensive Two dimensional gas chromatography (GCxGC) plays a central role
into the elucidation of complex samples. The automation of the identification
of peak areas is of prime interest to obtain a fast and repeatable analysis of
chromatograms. To determine the concentration of compounds or pseudo-compounds,
templates of blobs are defined and superimposed on a reference chromatogram.
The templates then need to be modified when different chromatograms are
recorded. In this study, we present a chromatogram and template alignment
method based on peak registration called BARCHAN. Peaks are identified using a
robust mathematical morphology tool. The alignment is performed by a
probabilistic estimation of a rigid transformation along the first dimension,
and a non-rigid transformation in the second dimension, taking into account
noise, outliers and missing peaks in a fully automated way. Resulting aligned
chromatograms and masks are presented on two datasets. The proposed algorithm
proves to be fast and reliable. It significantly reduces the time to results
for GCxGC analysis.

Image Stitching by Line-guided Local Warping with Global Similarity Constraint

Tianzhu Xiang, Gui-Song Xia, Xiang Bai, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Low-textured image stitching remains a challenging problem. It is difficult
to achieve good alignment and is easy to break image structures, due to the
insufficient and unreliable point correspondences. Besides, for the viewpoint
variations between multiple images, the stitched images suffer from projective
distortions. To this end, this paper presents a line-guided local warping with
global similarity constraint for image stitching. A two-stage alignment scheme
is adopted for good alignment. More precisely, the line correspondence is
employed as alignment constraint to guide the accurate estimation of projective
warp, then line feature constraints are integrated into mesh-based warping
framework to refine the alignment while preserving image structures. To
mitigate projectve distortions in non-overlapping regions, we combine global
similarity constraint with the projective warps via a weight strategy, so that
the final warp slowly changes from projective to similarity across the image.
This is also integrated into local multiple homographies model for better
parallax handling. Our method is evaluated on a series of images and compared
with several other methods. Experiments demonstrate that the proposed method
provides convincing stitching performance and outperforms other
state-of-the-art methods.

An EM Based Probabilistic Two-Dimensional CCA with Application to Face Recognition

Mehran Safayani, Seyed Hashem Ahmadi, Homayun Afrabandpey, Abdolreza Mirzaei
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Recently, two-dimensional canonical correlation analysis (2DCCA) has been
successfully applied for image feature extraction. The method instead of
concatenating the columns of the images to the one-dimensional vectors,
directly works with two-dimensional image matrices. Although 2DCCA works well
in different recognition tasks, it lacks a probabilistic interpretation. In
this paper, we present a probabilistic framework for 2DCCA called probabilistic
2DCCA (P2DCCA) and an iterative EM based algorithm for optimizing the
parameters. Experimental results on synthetic and real data demonstrate
superior performance in loading factor estimation for P2DCCA compared to 2DCCA.
For real data, three subsets of AR face database and also the UMIST face
database confirm the robustness of the proposed algorithm in face recognition
tasks with different illumination conditions, facial expressions, poses and
occlusions.

Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation

Mohsen Ghafoorian, Alireza Mehrtash, Tina Kapur, Nico Karssemeijer, Elena Marchiori, Mehran Pesteie, Charles R. G. Guttmann, Frank-Erik de Leeuw, Clare M. Tempany, Bram van Ginneken, Andriy Fedorov, Purang Abolmaesumi, Bram Platel, William M. Wells III
Comments: 8 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Magnetic Resonance Imaging (MRI) is widely used in routine clinical diagnosis
and treatment. However, variations in MRI acquisition protocols result in
different appearances of normal and diseased tissue in the images.
Convolutional neural networks (CNNs), which have shown to be successful in many
medical image analysis tasks, are typically sensitive to the variations in
imaging protocols. Therefore, in many cases, networks trained on data acquired
with one MRI protocol, do not perform satisfactorily on data acquired with
different protocols. This limits the use of models trained with large annotated
legacy datasets on a new dataset with a different domain which is often a
recurring situation in clinical settings. In this study, we aim to answer the
following central questions regarding domain adaptation in medical image
analysis: Given a fitted legacy model, 1) How much data from the new domain is
required for a decent adaptation of the original network?; and, 2) What portion
of the pre-trained model parameters should be retrained given a certain number
of the new domain training samples? To address these questions, we conducted
extensive experiments in white matter hyperintensity segmentation task. We
trained a CNN on legacy MR images of brain and evaluated the performance of the
domain-adapted network on the same task with images from a different domain. We
then compared the performance of the model to the surrogate scenarios where
either the same trained network is used or a new network is trained from
scratch on the new dataset.The domain-adapted network tuned only by two
training examples achieved a Dice score of 0.63 substantially outperforming a
similar network trained on the same set of examples from scratch.

Synthesizing Training Data for Object Detection in Indoor Scenes

Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Detection of objects in cluttered indoor environments is one of the key
enabling functionalities for service robots. The best performing object
detection approaches in computer vision exploit deep Convolutional Neural
Networks (CNN) to simultaneously detect and categorize the objects of interest
in cluttered scenes. Training of such models typically requires large amounts
of annotated training data which is time consuming and costly to obtain. In
this work we explore the ability of using synthetically generated composite
images for training state of the art object detectors. We superimpose 2D images
of textured object models into images of real environments at variety of
locations and scales. Our experiments evaluate different superimposition
strategies ranging from purely image-based blending all the way to depth and
semantics informed positioning of the object models to real scenes. We
demonstrate the effectiveness of these object detector training strategies on
publicly available datasets of GMU-Kitchens and Washington RGB-D Scenes v2, and
show how object detectors can be trained with limited amounts of annotated real
scenes with objects present. This charts new opportunities for training
detectors for new objects by exploiting existing object model repositories in
either a purely automatic fashion or with only a very small number of
human-annotated examples.

Video and Accelerometer-Based Motion Analysis for Automated Surgical Skills Assessment

Aneeq Zia, Yachna Sharma, Vinay Bettadapura, Eric L. Sarin, Irfan Essa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Purpose: Basic surgical skills of suturing and knot tying are an essential
part of medical training. Having an automated system for surgical skills
assessment could help save experts time and improve training efficiency. There
have been some recent attempts at automated surgical skills assessment using
either video analysis or acceleration data. In this paper, we present a novel
approach for automated assessment of OSATS based surgical skills and provide an
analysis of different features on multi-modal data (video and accelerometer
data). Methods: We conduct the largest study, to the best of our knowledge, for
basic surgical skills assessment on a dataset that contained video and
accelerometer data for suturing and knot-tying tasks. We introduce “entropy
based” features – Approximate Entropy (ApEn) and Cross-Approximate Entropy
(XApEn), which quantify the amount of predictability and regularity of
fluctuations in time-series data. The proposed features are compared to
existing methods of Sequential Motion Texture (SMT), Discrete Cosine Transform
(DCT) and Discrete Fourier Transform (DFT), for surgical skills assessment.
Results: We report average performance of different features across all
applicable OSATS criteria for suturing and knot tying tasks. Our analysis shows
that the proposed entropy based features out-perform previous state-of-the-art
methods using video data. For accelerometer data, our method performs better
for suturing only. We also show that fusion of video and acceleration features
can improve overall performance with the proposed entropy features achieving
highest accuracy. Conclusions: Automated surgical skills assessment can be
achieved with high accuracy using the proposed entropy features. Such a system
can significantly improve the efficiency of surgical training in medical
schools and teaching hospitals.

Unifying local and non-local signal processing with graph CNNs

Gilles Puy, Srdan Kitic, Patrick Pérez
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper deals with the unification of local and non-local signal
processing on graphs within a single convolutional neural network (CNN)
framework. Building upon recent works on graph CNNs, we propose to use
convolutional layers that take as inputs two variables, a signal and a graph,
allowing the network to adapt to changes in the graph structure. This also
allows us to learn through training the optimal mixing of locality and
non-locality, in cases where the graph is built on the input signal itself. We
demonstrate the versatility and the effectiveness of our framework on several
types of signals (greyscale and color images, color palettes and speech
signals) and on several applications (style transfer, color transfer, and
denoising).

Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks

Hiroshi Inoue
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Ensembling multiple predictions is a widely used technique to improve the
accuracy of various machine learning tasks. In image classification tasks, for
example, averaging the predictions for multiple patches extracted from the
input image significantly improves accuracy. Using multiple networks trained
independently to make predictions improves accuracy further. One obvious
drawback of the ensembling technique is its higher execution cost during
inference. If we average 100 predictions, the execution cost will be 100 times
as high as the cost without the ensemble. This higher cost limits the
real-world use of ensembling, even though using it is almost the norm to win
image classification competitions. In this paper, we describe a new technique
called adaptive ensemble prediction, which achieves the benefits of ensembling
with much smaller additional execution costs. Our observation behind this
technique is that many easy-to-predict inputs do not require ensembling. Hence
we calculate the confidence level of the prediction for each input on the basis
of the probability of the predicted label, i.e. the outputs from the softmax,
during the ensembling computation. If the prediction for an input reaches a
high enough probability on the basis of the confidence level, we stop
ensembling for this input to avoid wasting computation power. We evaluated the
adaptive ensembling by using various datasets and showed that it reduces the
computation time significantly while achieving similar accuracy to the naive
ensembling.

Low-Precision Batch-Normalized Activations

Benjamin Graham
Comments: 16 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Artificial neural networks can be trained with relatively low-precision
floating-point and fixed-point arithmetic, using between one and 16 bits.
Previous works have focused on relatively wide-but-shallow, feed-forward
networks. We introduce a quantization scheme that is compatible with training
very deep neural networks. Quantizing the network activations in the middle of
each batch-normalization module can greatly reduce the amount of memory and
computational power needed, with little loss in accuracy.

Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration

Judith Bütepage, Hedvig Kjellström, Danica Kragic
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Fluent and safe interactions of humans and robots require both partners to
anticipate the others’ actions. A common approach to human intention inference
is to model specific trajectories towards known goals with supervised
classifiers. However, these approaches do not take possible future movements
into account nor do they make use of kinematic cues, such as legible and
predictable motion. The bottleneck of these methods is the lack of an accurate
model of general human motion. In this work, we present a conditional
variational autoencoder that is trained to predict a window of future human
motion given a window of past frames. Using skeletal data obtained from RGB
depth images, we show how this unsupervised approach can be used for online
motion prediction for up to 1660 ms. Additionally, we demonstrate online target
prediction within the first 300-500 ms after motion onset without the use of
target specific training data. The advantage of our probabilistic approach is
the possibility to draw samples of possible future motions. Finally, we
investigate how movements and kinematic cues are represented on the learned low
dimensional manifold.

Bayesian Nonparametric Feature and Policy Learning for Decision-Making

Jürgen Hahn, Abdelhak M. Zoubir
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Learning from demonstrations has gained increasing interest in the recent
past, enabling an agent to learn how to make decisions by observing an
experienced teacher. While many approaches have been proposed to solve this
problem, there is only little work that focuses on reasoning about the observed
behavior. We assume that, in many practical problems, an agent makes its
decision based on latent features, indicating a certain action. Therefore, we
propose a generative model for the states and actions. Inference reveals the
number of features, the features, and the policies, allowing us to learn and to
analyze the underlying structure of the observed behavior. Further, our
approach enables prediction of actions for new states. Simulations are used to
assess the performance of the algorithm based upon this model. Moreover, the
problem of learning a driver’s behavior is investigated, demonstrating the
performance of the proposed model in a real-world scenario.

Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction

Abraham Smith, Paul Bendich, John Harer, Jay Hineman
Comments: Distribution Statement A – Approved for public release, distribution is unlimited
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We introduce a new algorithm, called CDER, for supervised machine learning
that merges the multi-scale geometric properties of Cover Trees with the
information-theoretic properties of entropy. CDER applies to a training set of
labeled pointclouds embedded in a common Euclidean space. If typical
pointclouds corresponding to distinct labels tend to differ at any scale in any
sub-region, CDER can identify these differences in (typically) linear time,
creating a set of distributional coordinates which act as a feature extraction
mechanism for supervised learning. We describe theoretical properties and
implementation details of CDER, and illustrate its benefits on several
synthetic examples.

CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

Andre Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham
Comments: The Journal of Supercomputing, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Deep learning is an important component of big-data analytic tools and
intelligent applications, such as, self-driving cars, computer vision, speech
recognition, or precision medicine. However, the training process is
computationally intensive, and often requires a large amount of time if
performed sequentially. Modern parallel computing systems provide the
capability to reduce the required training time of deep neural networks. In
this paper, we present our parallelization scheme for training convolutional
neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
Synchronization (CHAOS). Major features of CHAOS include the support for thread
and vector parallelism, non-instant updates of weight parameters during
back-propagation without a significant delay, and implicit synchronization in
arbitrary order. CHAOS is tailored for parallel computing systems that are
accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
empirically using measurement techniques and performance modeling for various
numbers of threads and CNN architectures. Experimental results for the MNIST
dataset of handwritten digits using the total number of threads on the Xeon Phi
show speedups of up to 103x compared to the execution on one thread of the Xeon
Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
compared to the sequential execution on Intel Core i5.

Learning Deep NBNN Representations for Robust Place Categorization

Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

This paper presents an approach for semantic place categorization using data
obtained from RGB cameras. Previous studies on visual place recognition and
classification have shown that, by considering features derived from
pre-trained Convolutional Neural Networks (CNNs) in combination with part-based
classification models, high recognition accuracy can be achieved, even in
presence of occlusions and severe viewpoint changes. Inspired by these works,
we propose to exploit local deep representations, representing images as set of
regions applying a Na”{i}ve Bayes Nearest Neighbor (NBNN) model for image
classification. As opposed to previous methods where CNNs are merely used as
feature extractors, our approach seamlessly integrates the NBNN model into a
fully-convolutional neural network. Experimental results show that the proposed
algorithm outperforms previous methods based on pre-trained CNN models and
that, when employed in challenging robot place recognition tasks, it is robust
to occlusions, environmental and sensor changes.

Adaptive Neural Networks for Fast Test-Time Prediction

We present an approach to adaptively utilize deep neural networks in order to
reduce the evaluation time on new examples without loss of classification
performance. Rather than attempting to redesign or approximate existing
networks, we propose two schemes that adaptively utilize networks. First, we
pose an adaptive network evaluation scheme, where we learn a system to
adaptively choose the components of a deep network to be evaluated for each
example. By allowing examples correctly classified using early layers of the
system to exit, we avoid the computational time associated with full evaluation
of the network. Building upon this approach, we then learn a network selection
system that adaptively selects the network to be evaluated for each example. We
exploit the fact that many examples can be correctly classified using
relatively efficient networks and that complex, computationally costly networks
are only necessary for a small fraction of examples. By avoiding evaluation of
these complex networks for a large fraction of examples, computational time can
be dramatically reduced. Empirically, these approaches yield dramatic
reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
networks from the ImageNet image recognition challenge with minimal (less than
1%) loss of accuracy.

Mode Regularized Generative Adversarial Networks

Tong Che, Yanran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li
Comments: Published as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Although Generative Adversarial Networks achieve state-of-the-art results on
a variety of generative tasks, they are regarded as highly unstable and prone
to miss modes. We argue that these bad behaviors of GANs are due to the very
particular functional shape of the trained discriminators in high dimensional
spaces, which can easily make training stuck or push probability mass in the
wrong direction, towards that of higher concentration than that of the data
generating distribution. We introduce several ways of regularizing the
objective, which can dramatically stabilize the training of GAN models. We also
show that our regularizers can help the fair distribution of probability mass
across the modes of the data generating distribution, during the early phases
of training and thus providing a unified solution to the missing modes problem.

Artificial Intelligence

Differentiable Learning of Logical Rules for Knowledge Base Completion

Fan Yang, Zhilin Yang, William W. Cohen
Subjects: Artificial Intelligence (cs.AI)

Learned models composed of probabilistic logical rules are useful for many
tasks, such as knowledge base completion. Unfortunately this learning problem
is difficult, since determining the structure of the theory normally requires
solving a discrete optimization problem. In this paper, we propose an
alternative approach: a completely differentiable model for learning sets of
first-order rules. The approach is inspired by a recently-developed
differentiable logic, i.e. a subset of first-order logic for which inference
tasks can be compiled into sequences of differentiable operations. Here we
describe a neural controller system which learns how to sequentially compose
the these primitive differentiable operations to solve reasoning tasks, and in
particular, to perform knowledge base completion. The long-term goal of this
work is to develop integrated, end-to-end systems that can learn to perform
high-level logical reasoning as well as lower-level perceptual tasks.

Synergistic Team Composition

Ewa Andrejczuk, Juan A. Rodriguez-Aguilar, Carme Roig, Carles Sierra
Subjects: Artificial Intelligence (cs.AI)

Effective teams are crucial for organisations, especially in environments
that require teams to be constantly created and dismantled, such as software
development, scientific experiments, crowd-sourcing, or the classroom. Key
factors influencing team performance are competences and personality of team
members. Hence, we present a computational model to compose proficient and
congenial teams based on individuals’ personalities and their competences to
perform tasks of different nature. With this purpose, we extend Wilde’s
post-Jungian method for team composition, which solely employs individuals’
personalities. The aim of this study is to create a model to partition agents
into teams that are balanced in competences, personality and gender. Finally,
we present some preliminary empirical results that we obtained when analysing
student performance. Results show the benefits of a more informed team
composition that exploits individuals’ competences besides information about
their personalities.

Criticality and Deep Learning, Part I: Theory vs. Empirics

Dan Oprisa, Peter Toth
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Motivated by the idea that criticality and universality of phase transitions
might play a crucial role in achieving and sustaining learning and intelligent
behaviour in biological and artificial networks, we analyse a theoretical and a
pragmatic experimental set up for critical phenomena in deep learning. On the
theoretical side, we use results from statistical physics to carry out critical
point calculations in feed-forward/fully connected networks, while on the
experimental side we set out to find traces of criticality in deep neural
networks. This is our first step in a series of upcoming investigations to map
out the relationship between criticality and learning in deep networks.

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio
Comments: 11 pages, 3 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Despite the successes in capturing continuous distributions, the application
of generative adversarial networks (GANs) to discrete settings, like natural
language tasks, is rather restricted. The fundamental reason is the difficulty
of back-propagation through discrete random variables combined with the
inherent instability of the GAN training objective. To address these problems,
we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
Networks. Instead of directly optimizing the GAN objective, we derive a novel
and low-variance objective using the discriminator’s output that follows
corresponds to the log-likelihood. Compared with the original, the new
objective is proved to be consistent in theory and beneficial in practice. The
experimental results on various discrete datasets demonstrate the effectiveness
of the proposed approach.

Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

Brent Harrison, Upol Ehsan, Mark O. Riedl
Comments: 9 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG)

We introduce AI rationalization, an approach for generating explanations of
autonomous system behavior as if a human had done the behavior. We describe a
rationalization technique that uses neural machine translation to translate
internal state-action representations of the autonomous agent into natural
language. We evaluate our technique in the Frogger game environment. The
natural language is collected from human players thinking out loud as they play
the game. We motivate the use of rationalization as an approach to explanation
generation, show the results of experiments on the accuracy of our
rationalization technique, and describe future research agenda.

Asymmetric Tri-training for Unsupervised Domain Adaptation

Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep-layered models trained on a large number of labeled samples boost the
accuracy of many tasks. It is important to apply such models to different
domains because collecting many labeled samples in various domains is
expensive. In unsupervised domain adaptation, one needs to train a classifier
that works well on a target domain when provided with labeled source samples
and unlabeled target samples. Although many methods aim to match the
distributions of source and target samples, simply matching the distribution
cannot ensure accuracy on the target domain. To learn discriminative
representations for the target domain, we assume that artificially labeling
target samples can result in a good representation. Tri-training leverages
three classifiers equally to give pseudo-labels to unlabeled samples, but the
method does not assume labeling samples generated from a different domain.In
this paper, we propose an asymmetric tri-training method for unsupervised
domain adaptation, where we assign pseudo-labels to unlabeled samples and train
neural networks as if they are true labels. In our work, we use three networks
asymmetrically. By asymmetric, we mean that two networks are used to label
unlabeled target samples and one network is trained by the samples to obtain
target-discriminative representations. We evaluate our method on digit
recognition and sentiment analysis datasets. Our proposed method achieves
state-of-the-art performance on the benchmark digit recognition datasets of
domain adaptation.

Balancing Lexicographic Fairness and a Utilitarian Objective with Application to Kidney Exchange

Duncan C. McElfresh, John P. Dickerson
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

Balancing fairness and efficiency in resource allocation is a classical
economic and computational problem. The price of fairness measures the
worst-case loss of economic efficiency when using an inefficient but fair
allocation rule; for indivisible goods in many settings, this price is
unacceptably high. In this work, we propose a hybrid fairness rule that
balances a strict lexicographic preference ordering over classes of agents and
a utilitarian objective that maximizes economic efficiency. We develop a
utility function that favors disadvantaged groups lexicographically; but if
cost to overall efficiency becomes too high, it smoothly switches to a
utilitarian objective. This rule has only one parameter which is proportional
to a bound on the price of fairness, and can be adjusted by policymakers. We
apply this rule to kidney exchange, where needy patients swap willing but
incompatible donors, and demonstrate on real data from a large exchange that
our hybrid rule produces more reliable outcomes than other fairness rules.

DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

We introduce DeepNAT, a 3D Deep convolutional neural network for the
automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
images. DeepNAT is an end-to-end learning-based approach to brain segmentation
that jointly learns an abstract feature representation and a multi-class
classification. We propose a 3D patch-based approach, where we do not only
predict the center voxel of the patch but also neighbors, which is formulated
as multi-task learning. To address a class imbalance problem, we arrange two
networks hierarchically, where the first one separates foreground from
background, and the second one identifies 25 brain structures on the
foreground. Since patches lack spatial context, we augment them with
coordinates. To this end, we introduce a novel intrinsic parameterization of
the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
network architecture, we use three convolutional layers with pooling, batch
normalization, and non-linearities, followed by fully connected layers with
dropout. The final segmentation is inferred from the probabilistic output of
the network with a 3D fully connected conditional random field, which ensures
label agreement between close voxels. The roughly 2.7 million parameters in the
network are learned with stochastic gradient descent. Our results show that
DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
learning-based method may have a high potential for the adaptation to young,
old, or diseased brains by fine-tuning the pre-trained network with a small
training sample on the target application, where the availability of larger
datasets with manual annotations may boost the overall segmentation accuracy in
the future.

Reinforcement Learning with Deep Energy-Based Policies

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose a method for learning expressive energy-based policies for
continuous states and actions, which has been feasible only in tabular domains
before. We apply our method to learning maximum entropy policies, resulting
into a new algorithm, called soft Q-learning, that expresses the optimal policy
via a Boltzmann distribution. We use the recently proposed amortized Stein
variational gradient descent to learn a stochastic sampling network that
approximates samples from this distribution. The benefits of the proposed
algorithm include improved exploration and compositionality that allows
transferring skills between tasks, which we confirm in simulated experiments
with swimming and walking robots. We also draw a connection to actor-critic
methods, which can be viewed performing approximate inference on the
corresponding energy-based model.

Stochastic Variance Reduction Methods for Policy Evaluation

Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (cs.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

Policy evaluation is a crucial step in many reinforcement-learning
procedures, which estimates a value function that predicts states’ long-term
value under a given policy. In this paper, we focus on policy evaluation with
linear function approximation over a fixed dataset. We first transform the
empirical policy evaluation problem into a (quadratic) convex-concave saddle
point problem, and then present a primal-dual batch gradient method, as well as
two stochastic variance reduction methods for solving the problem. These
algorithms scale linearly in both sample size and feature dimension. Moreover,
they achieve linear convergence even when the saddle-point problem has only
strong concavity in the dual variables but no strong convexity in the primal
variables. Numerical experiments on benchmark problems demonstrate the
effectiveness of our methods.

Contractibility for Open Global Constraints

Michael J. Maher
Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

Open forms of global constraints allow the addition of new variables to an
argument during the execution of a constraint program. Such forms are needed
for difficult constraint programming problems where problem construction and
problem solving are interleaved, and fit naturally within constraint logic
programming. However, in general, filtering that is sound for a global
constraint can be unsound when the constraint is open. This paper provides a
simple characterization, called contractibility, of the constraints where
filtering remains sound when the constraint is open. With this characterization
we can easily determine whether a constraint has this property or not. In the
latter case, we can use it to derive a contractible approximation to the
constraint. We demonstrate this work on both hard and soft constraints. In the
process, we formulate two general classes of soft constraints.

Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying

Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Athena Vakali
Comments: WWW Cybersafety Workshop 2017
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Over the past few years, online aggression and abusive behaviors have
occurred in many different forms and on a variety of platforms. In extreme
cases, these incidents have evolved into hate, discrimination, and bullying,
and even materialized into real-world threats and attacks against individuals
or groups. In this paper, we study the Gamergate controversy. Started in August
2014 in the online gaming world, it quickly spread across various social
networking platforms, ultimately leading to many incidents of cyberbullying and
cyberaggression. We focus on Twitter, presenting a measurement study of a
dataset of 340k unique users and 1.6M tweets to study the properties of these
users, the content they post, and how they differ from random Twitter users. We
find that users involved in this “Twitter war” tend to have more friends and
followers, are generally more engaged and post tweets with negative sentiment,
less joy, and more hate than random users. We also perform preliminary
measurements on how the Twitter suspension mechanism deals with such abusive
behaviors. While we focus on Gamergate, our methodology to collect and analyze
tweets related to aggressive and bullying activities is of independent
interest.

Mode Regularized Generative Adversarial Networks

Although Generative Adversarial Networks achieve state-of-the-art results on
a variety of generative tasks, they are regarded as highly unstable and prone
to miss modes. We argue that these bad behaviors of GANs are due to the very
particular functional shape of the trained discriminators in high dimensional
spaces, which can easily make training stuck or push probability mass in the
wrong direction, towards that of higher concentration than that of the data
generating distribution. We introduce several ways of regularizing the
objective, which can dramatically stabilize the training of GAN models. We also
show that our regularizers can help the fair distribution of probability mass
across the modes of the data generating distribution, during the early phases
of training and thus providing a unified solution to the missing modes problem.

Information Retrieval

Mutual Information based labelling and comparing clusters

Rob Koopman, Shenghui Wang
Comments: Special Issue of Scientometrics: Same data – different results? Towards a comparative approach to the identification of thematic structures in science
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

After a clustering solution is generated automatically, labelling these
clusters becomes important to help understanding the results. In this paper, we
propose to use a Mutual Information based method to label clusters of journal
articles. Topical terms which have the highest Normalised Mutual Information
(NMI) with a certain cluster are selected to be the labels of the cluster.
Discussion of the labelling technique with a domain expert was used as a check
that the labels are discriminating not only lexical-wise but also semantically.
Based on a common set of topical terms, we also propose to generate lexical
fingerprints as a representation of individual clusters. Eventually, we
visualise and compare these fingerprints of different clusters from either one
clustering solution or different ones.

PubTree: A Hierarchical Search Tool for the MEDLINE Database

William Rowe, Paul D. Dobson, Bede Constantinides, Mark Platt
Comments: 7 pages, 2 figures
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

Keeping track of the ever-increasing body of scientific literature is an
escalating challenge. We present PubTree a hierarchical search tool that
efficiently searches the PubMed/MEDLINE dataset based upon a decision tree
constructed using >26 million abstracts. The tool is implemented as a webpage,
where users are asked a series of eighteen questions to locate pertinent
articles. The implementation of this hierarchical search tool highlights issues
endemic with document retrieval. However, the construction of this tree
indicates that with future developments hierarchical search could become an
effective tool (or adjunct) in the mining of biological literature.

Related Pins at Pinterest: The Evolution of a Real-World Recommender System

David C. Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C. Ma, Zhigang Zhong, Jenny Liu, Yushi Jing
Subjects: Information Retrieval (cs.IR)

Related Pins is the Web-scale recommender system that powers over 40% of user
engagement on Pinterest. This paper is a longitudinal study of three years of
its development, exploring the evolution of the system and its components from
prototypes to present state. Each component was originally built with many
constraints on engineering effort and computational resources, so we
prioritized the simplest and highest-leverage solutions. We show how organic
growth led to a complex system and how we managed this complexity. Many
challenges arose while building this system, such as avoiding feedback loops,
evaluating performance, activating content, and eliminating legacy heuristics.
Finally, we offer suggestions for tackling these challenges when engineering
Web-scale recommender systems.

Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Rupinder Paul Khandpur, Taoran Ji, Steve Jan, Gang Wang, Chang-Tien Lu, Naren Ramakrishnan
Comments: 13 single column pages, 5 figures, submitted to KDD 2017
Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDOS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods.

Computation and Language

Stance Classification of Social Media Users in Independence Movements

Arkaitz Zubiaga, Bo Wang, Maria Liakata, Rob Procter
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Social media and data mining are increasingly being used to analyse political
and societal issues. Characterisation of users into socio-demographic groups is
crucial to improve these analyses. Here we undertake the classification of
social media users as supporting or opposing ongoing independence movements in
their territories. Independence movements occur in territories whose citizens
have conflicting national identities; users with opposing national identities
will then support or oppose the sense of being part of an independent nation
that differs from the officially recognised country. We describe a methodology
that relies on users’ self-reported location to build datasets for three
territories — Catalonia, the Basque Country and Scotland — and we test
language-independent classifiers using four types of features. We show the
effectiveness of the approach to build large annotated datasets, and the
ability to achieve accurate, language-independent classification performances
ranging from 85% to 97% for the three territories under study.

Identifying beneficial task relations for multi-task learning in deep neural networks

Joachim Bingel, Anders Søgaard
Comments: Accepted for publication at EACL 2017
Subjects: Computation and Language (cs.CL)

Multi-task learning (MTL) in deep neural networks for NLP has recently
received increasing interest due to some compelling benefits, including its
potential to efficiently regularize models and to reduce the need for labeled
data. While it has brought significant improvements in a number of NLP tasks,
mixed results have been reported, and little is known about the conditions
under which MTL leads to gains in NLP. This paper sheds light on the specific
task relations that can lead to gains from MTL models over single-task setups.

A case study on English-Malayalam Machine Translation

Sreelekha S, Pushpak Bhattacharyya
Comments: This paper contains 10 pages with 4 figures and 10 tables
Subjects: Computation and Language (cs.CL)

In this paper we present our work on a case study on Statistical Machine
Translation (SMT) and Rule based machine translation (RBMT) for translation
from English to Malayalam and Malayalam to English. One of the motivations of
our study is to make a three way performance comparison, such as, a) SMT and
RBMT b) English to Malayalam SMT and Malayalam to English SMT c) English to
Malayalam RBMT and Malayalam to English RBMT. We describe the development of
English to Malayalam and Malayalam to English baseline phrase based SMT system
and the evaluation of its performance compared against the RBMT system. Based
on our study the observations are: a) SMT systems outperform RBMT systems, b)
In the case of SMT, English – Malayalam systems perform better than that of
Malayalam – English systems, c) In the case RBMT, Malayalam to English systems
are performing better than English to Malayalam systems. Based on our
evaluations and detailed error analysis, we describe the requirements of
incorporating morphological processing into the SMT to improve the accuracy of
translation.

Friends and Enemies of Clinton and Trump: Using Context for Detecting Stance in Political Tweets

Mirko Lai, Delia Irazú Hernández Farías, Viviana Patti, Paolo Rosso
Comments: To appear in MICAI 2016 LNAI Proceedings
Subjects: Computation and Language (cs.CL)

Stance detection, the task of identifying the speaker’s opinion towards a
particular target, has attracted the attention of researchers. This paper
describes a novel approach for detecting stance in Twitter. We define a set of
features in order to consider the context surrounding a target of interest with
the final aim of training a model for predicting the stance towards the
mentioned targets. In particular, we are interested in investigating political
debates in social media. For this reason we evaluated our approach focusing on
two targets of the SemEval-2016 Task6 on Detecting stance in tweets, which are
related to the political campaign for the 2016 U.S. presidential elections:
Hillary Clinton vs. Donald Trump. For the sake of comparison with the state of
the art, we evaluated our model against the dataset released in the
SemEval-2016 Task 6 shared task competition. Our results outperform the best
ones obtained by participating teams, and show that information about enemies
and friends of politicians help in detecting stance towards them.

Detecting (Un)Important Content for Single-Document News Summarization

Yinfei Yang, Forrest Sheng Bao, Ani Nenkova
Comments: Accepted By EACL 2017
Subjects: Computation and Language (cs.CL)

We present a robust approach for detecting intrinsic sentence importance in
news, by training on two corpora of document-summary pairs. When used for
single-document summarization, our approach, combined with the “beginning of
document” heuristic, outperforms a state-of-the-art summarizer and the
beginning-of-article baseline in both automatic and manual evaluations. These
results represent an important advance because in the absence of cross-document
repetition, single document summarizers for news have not been able to
consistently outperform the strong beginning-of-article baseline.

Critical Survey of the Freely Available Arabic Corpora

Wajdi Zaghouani
Comments: Published in the Proceedings of the International Conference on Language Resources and Evaluation (LREC’2014), OSACT Workshop. Reykjavik, Iceland, 26-31 May 2014
Subjects: Computation and Language (cs.CL)

The availability of corpora is a major factor in building natural language
processing applications. However, the costs of acquiring corpora can prevent
some researchers from going further in their endeavours. The ease of access to
freely available corpora is urgent needed in the NLP research community
especially for language such as Arabic. Currently, there is not easy was to
access to a comprehensive and updated list of freely available Arabic corpora.
We present in this paper, the results of a recent survey conducted to identify
the list of the freely available Arabic corpora and language resources. Our
preliminary results showed an initial list of 66 sources. We presents our
findings in the various categories studied and we provided the direct links to
get the data when possible.

Deep Voice: Real-time Neural Text-to-Speech

We present Deep Voice, a production-quality text-to-speech system constructed
entirely from deep neural networks. Deep Voice lays the groundwork for truly
end-to-end neural speech synthesis. The system comprises five major building
blocks: a segmentation model for locating phoneme boundaries, a
grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
fundamental frequency prediction model, and an audio synthesis model. For the
segmentation model, we propose a novel way of performing phoneme boundary
detection with deep neural networks using connectionist temporal classification
(CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
that requires fewer parameters and trains faster than the original. By using a
neural network for each component, our system is simpler and more flexible than
traditional text-to-speech systems, where each component requires laborious
feature engineering and extensive domain expertise. Finally, we show that
inference with our system can be performed faster than real time and describe
optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
speedups over existing implementations.

Residual Convolutional CTC Networks for Automatic Speech Recognition

Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang
Subjects: Computation and Language (cs.CL)

Deep learning approaches have been widely used in Automatic Speech
Recognition (ASR) and they have achieved a significant accuracy improvement.
Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR
recently. However, most CNNs used in existing work have less than 10 layers
which may not be deep enough to capture all human speech signal information. In
this paper, we propose a novel deep and wide CNN architecture denoted as
RCNN-CTC, which has residual connections and Connectionist Temporal
Classification (CTC) loss function. RCNN-CTC is an end-to-end system which can
exploit temporal and spectral structures of speech signals simultaneously.
Furthermore, we introduce a CTC-based system combination, which is different
from the conventional frame-wise senone-based one. The basic subsystems adopted
in the combination are different types and thus mutually complementary to each
other. Experimental results show that our proposed single system RCNN-CTC can
achieve the lowest word error rate (WER) on WSJ and Tencent Chat data sets,
compared to several widely used neural network systems in ASR. In addition, the
proposed system combination can offer a further error reduction on these two
data sets, resulting in relative WER reductions of (14.91\%) and (6.52\%) on
WSJ dev93 and Tencent Chat data sets respectively.

When confidence and competence collide: Effects on online decision-making discussions

Liye Fu, Lillian Lee, Cristian Danescu-Niculescu-Mizil
Comments: To appear in Proceedings of WWW 2017. Online multiplayer game available at this http URL
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

Group discussions are a way for individuals to exchange ideas and arguments
in order to reach better decisions than they could on their own. One of the
premises of productive discussions is that better solutions will prevail, and
that the idea selection process is mediated by the (relative) competence of the
individuals involved. However, since people may not know their actual
competence on a new task, their behavior is influenced by their self-estimated
competence — that is, their confidence — which can be misaligned with their
actual competence.

Our goal in this work is to understand the effects of confidence-competence
misalignment on the dynamics and outcomes of discussions. To this end, we
design a large-scale natural setting, in the form of an online team-based
geography game, that allows us to disentangle confidence from competence and
thus separate their effects.

We find that in task-oriented discussions, the more-confident individuals
have a larger impact on the group’s decisions even when these individuals are
at the same level of competence as their teammates. Furthermore, this
unjustified role of confidence in the decision-making process often leads teams
to under-perform. We explore this phenomenon by investigating the effects of
confidence on conversational dynamics.

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Recent work on generative modeling of text has found that variational
auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
language models (Bowman et al., 2015). This negative result is so far poorly
understood, but has been attributed to the propensity of LSTM decoders to
ignore conditioning information from the encoder. In this paper, we experiment
with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
dilation architecture, we control the effective context from previously
generated words. In experiments, we find that there is a trade off between the
contextual capacity of the decoder and the amount of encoding information used.
We show that with the right decoder, VAE can outperform LSTM language models.
We demonstrate perplexity gains on two datasets, representing the first
positive experimental result on the use VAE for generative modeling of text.
Further, we conduct an in-depth investigation of the use of VAE (with our new
decoding architecture) for semi-supervised and unsupervised labeling tasks,
demonstrating gains over several strong baselines.

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

Despite the successes in capturing continuous distributions, the application
of generative adversarial networks (GANs) to discrete settings, like natural
language tasks, is rather restricted. The fundamental reason is the difficulty
of back-propagation through discrete random variables combined with the
inherent instability of the GAN training objective. To address these problems,
we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
Networks. Instead of directly optimizing the GAN objective, we derive a novel
and low-variance objective using the discriminator’s output that follows
corresponds to the log-likelihood. Compared with the original, the new
objective is proved to be consistent in theory and beneficial in practice. The
experimental results on various discrete datasets demonstrate the effectiveness
of the proposed approach.

Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

We introduce AI rationalization, an approach for generating explanations of
autonomous system behavior as if a human had done the behavior. We describe a
rationalization technique that uses neural machine translation to translate
internal state-action representations of the autonomous agent into natural
language. We evaluate our technique in the Frogger game environment. The
natural language is collected from human players thinking out loud as they play
the game. We motivate the use of rationalization as an approach to explanation
generation, show the results of experiments on the accuracy of our
rationalization technique, and describe future research agenda.

Distributed, Parallel, and Cluster Computing

Optimized Secure Position Sharing with Non-trusted Servers

Pavel Skvortsov, Björn Schembera, Frank Dürr, Kurt Rothermel
Comments: 26 pages, 11 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Today, location-based applications and services such as friend finders and
geo-social networks are very popular. However, storing private position
information on third-party location servers leads to privacy problems. In our
previous work, we proposed a position sharing approach for secure management of
positions on non-trusted servers, which distributes position shares of limited
precision among servers of several providers. In this paper, we propose two
novel contributions to improve the original approach. First, we optimize the
placement of shares among servers by taking their trustworthiness into account.
Second, we optimize the location update protocols to minimize the number of
messages between mobile device and location servers.

Another Look at the Implementation of Read/write Registers in Crash-prone Asynchronous Message-Passing Systems (Extended Version)

Damien Imbs (LIF), Achour Mostefaoui (GDD), Matthieu Perrin, Michel Raynal (ASAP)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

” Yet another paper on ” the implementation of read/write registers in
crash-prone asynchronous message-passing systems! Yes…, but, differently from
its predecessors, this paper looks for a communication abstraction which
captures the essence of such an implementation in the same sense that total
order broadcast can be associated with consensus, or message causal delivery
can be associated with causal read/write registers. To this end, the paper
introduces a new communication abstraction, named SCD-broadcast (SCD standing
for ” Set Constrained Delivery “), which, instead of a single message, delivers
to processes sets of messages (whose size can be arbitrary), such that the
sequences of message sets delivered to any two processes satisfies some
constraints. The paper then shows that: (a) SCD-broadcast allows for a very
simple implementation of a snapshot object (and consequently also of atomic
read/write registers) in crash-prone asynchronous message-passing systems, (b)
SCD-broadcast can be built from snapshot objects (hence SCD-broadcast and
snapshot objects –or read/write registers– are ” computationally equivalent
“), (c) SCD-broadcast can be built in message-passing systems where any
minority of processes may crash (which is the weakest assumption on the number
of possible process crashes needed to implement a read/write register).

Tars: Timeliness-aware Adaptive Replica Selection for Key-Value Stores

Wanchun Jiang, Liyuan Fang, Haiming Xie, Xiangqian Zhou, Jianxin Wang
Comments: 10pages,submitted to ICDCS 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In current large-scale distributed key-value stores, a single end-user
request may lead to key-value access across tens or hundreds of servers. The
tail latency of these key-value accesses is crucial to the user experience and
greatly impacts the revenue. To cut the tail latency, it is crucial for clients
to choose the fastest replica server as much as possible for the service of
each key-value access. Aware of the challenges on the time varying performance
across servers and the herd behaviors, an adaptive replica selection scheme C3
is proposed recently. In C3, feedback from individual servers is brought into
replica ranking to reflect the time-varying performance of servers, and the
distributed rate control and backpressure mechanism is invented. Despite of
C3’s good performance, we reveal the timeliness issue of C3, which has large
impacts on both the replica ranking and the rate control, and propose the Tars
(timeliness-aware adaptive replica selection) scheme. Following the same
framework as C3, Tars improves the replica ranking by taking the timeliness of
the feedback information into consideration, as well as revises the rate
control of C3. Simulation results confirm that Tars outperforms C3.

HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Huijun Wu, Chen Wang, Yinjin Fu, Sherif Sakr, Liming Zhu, Kai Lu
Comments: 14 pages, 11 figures, submitted to MSST2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Eliminating duplicate data in primary storage of clouds increases the
cost-efficiency of cloud service providers as well as reduces the cost of users
for using cloud services. Existing primary deduplication techniques either use
inline caching to exploit locality in primary workloads or use post-processing
deduplication running in system idle time to avoid the negative impact on I/O
performance. However, neither of them works well in the cloud servers running
multiple services or applications for the following two reasons: Firstly, the
temporal locality of duplicate data writes may not exist in some primary
storage workloads thus inline caching often fails to achieve good deduplication
ratio. Secondly, the post-processing deduplication allows duplicate data to be
written into disks, therefore does not provide the benefit of I/O deduplication
and requires high peak storage capacity. This paper presents HPDedup, a Hybrid
Prioritized data Deduplication mechanism to deal with the storage system shared
by applications running in co-located virtual machines or containers by fusing
an inline and a post-processing process for exact deduplication. In the inline
deduplication phase, HPDedup gives a fingerprint caching mechanism that
estimates the temporal locality of duplicates in data streams from different
VMs or applications and prioritizes the cache allocation for these streams
based on the estimation. HPDedup also allows different deduplication threshold
for streams based on their spatial locality to reduce the disk fragmentation.
The post-processing phase removes duplicates whose fingerprints are not able to
be cached due to the weak temporal locality from disks. Our experimental
results show that HPDedup clearly outperforms the state-of-the-art primary
storage deduplication techniques in terms of inline cache efficiency and
primary deduplication efficiency.

CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

Deep learning is an important component of big-data analytic tools and
intelligent applications, such as, self-driving cars, computer vision, speech
recognition, or precision medicine. However, the training process is
computationally intensive, and often requires a large amount of time if
performed sequentially. Modern parallel computing systems provide the
capability to reduce the required training time of deep neural networks. In
this paper, we present our parallelization scheme for training convolutional
neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
Synchronization (CHAOS). Major features of CHAOS include the support for thread
and vector parallelism, non-instant updates of weight parameters during
back-propagation without a significant delay, and implicit synchronization in
arbitrary order. CHAOS is tailored for parallel computing systems that are
accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
empirically using measurement techniques and performance modeling for various
numbers of threads and CNN architectures. Experimental results for the MNIST
dataset of handwritten digits using the total number of threads on the Xeon Phi
show speedups of up to 103x compared to the execution on one thread of the Xeon
Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
compared to the sequential execution on Intel Core i5.

Near Data Scheduling for Data Centers with Multi Levels of Data Locality

Ali Yekkehkhany
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Data locality is a fundamental issue for data-parallel applications.
Considering MapReduce in Hadoop, the map task scheduling part requires an
efficient algorithm which takes data locality into consideration; otherwise,
system may get unstable under loads inside the system’s capacity region or jobs
may experience longer completion times which are not of interest. The data
chunk needed for any map task can be in memory, on a local disk, in a local
rack, in the same cluster or even in another data center. Hence, unless there
has been so much work on improving the speed of data center networks, still
there exists different levels of service rates for a task depending on where
its data chunk is saved and from which server it receives service. Most of the
theoretical work on load balancing is for systems with two levels of data
locality among which I can name Pandas algorithm by Xie et al. and JSQ-MW by
Wang et al., where the former is both throughput and heavy-traffic optimal,
while the latter is only throughput optimal, but heavy-traffic optimal in only
a special traffic load. We show that an extension of JSQ-MW for a system with
thee levels of data locality is throughput optimal, but not heavy-traffic
optimal for all load, but again for a special traffic scenario. Furthermore, we
show that Pandas is not even throughput optimal for a system with three levels
of data locality. We then propose a novel algorithm, Balanced-Pandas, which is
both throughput and heavy-traffic optimal. To the best of our knowledge, this
is the first theoretical work on load balancing for a system with more than two
levels of data locality which as we will see is more challenging than two
levels of data locality as a dilemma between performance and throughput
emerges.

A Methodology for Oracle Selection of Monitors and Knobs for Configuring an HPC System running a Flood Management Application

Panagiota Nikolaou, Yiannakis Sazeides, Antoni Portero, Radim Vavrik, Vit Vondrak
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This paper defines a methodology for the oracle selection of the monitors and
knobs to use to configure an HPC system running a scientific application while
satisfying the application’s requirements and not violating any system
constraints. This methodology relies on a heuristic correlation analysis
between requirements, monitors and knobs to determine the minimum subset of
monitors to observe and knobs to explore, to determine the optimal system
configuration for the HPC application. At the end of this analysis, we reduce
an 11-dimensional space to a 3-dimensional space for monitors and a
6-dimensional space to a 3-dimensional space for knobs. This reduction shows
the potential and highlights the need for a realistic methodology to help
identify such minimum set of monitors and knobs.

Scalable and Distributed Clustering via Lightweight Coresets

Olivier Bachem, Mario Lucic, Andreas Krause
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Computation (stat.CO)

Coresets are compact representations of data sets such that models trained on
a coreset are provably competitive with models trained on the full data set. As
such, they have been successfully used to scale up clustering models to massive
data sets. While existing approaches generally only allow for multiplicative
approximation errors, we propose a novel notion of coresets called lightweight
coresets that allows for both multiplicative and additive errors. We provide a
single algorithm to construct light-weight coresets for k-Means clustering,
Bregman clustering and maximum likelihood estimation of Gaussian mixture
models. The algorithm is substantially faster than existing constructions,
embarrassingly parallel and resulting coresets are smaller. In an extensive
experimental evaluation, we demonstrate that the proposed method outperforms
existing coreset constructions.

Using Battery Storage for Peak Shaving and Frequency Regulation: Joint Optimization for Superlinear Gains

Yuanyuan Shi, Bolun Xu, Di Wang, Baosen Zhang
Comments: Submitted to IEEE Transaction on Power Systems
Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

We consider using a battery storage system simultaneously for peak shaving
and frequency regulation through a joint optimization framework which captures
battery degradation, operational constraints and uncertainties in customer load
and regulation signals. Under this framework, using real data from a Microsoft
data center, an University of Washington building and PJM, we show the
electricity bill of users can be reduced by up to 15\%. Furthermore, we
demonstrate that the saving from joint optimization is often larger than the
sum of the optimal savings when the battery is used for the two individual
applications. A simple threshold real-time algorithm is proposed and achieves
this super-linear gain. Compared to prior works that focused on using battery
storage systems for single applications, our results suggest that batteries can
produce much larger economic benefits than previously thought if they jointly
provide multiple services.

Learning

McGan: Mean and Covariance Feature Matching GAN

Youssef Mroueh, Tom Sercu, Vaibhava Goel
Comments: 15 pages; under review
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We introduce new families of Integral Probability Metrics (IPM) for training
Generative Adversarial Networks (GAN). Our IPMs are based on matching
statistics of distributions embedded in a finite dimensional feature space.
Mean and covariance feature matching IPMs allow for stable training of GANs,
which we will call McGan. McGan minimizes a meaningful loss between
distributions.

Learning Hierarchical Features from Generative Models

Shengjia Zhao, Jiaming Song, Stefano Ermon
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Deep neural networks have been shown to be very successful at learning
feature hierarchies in supervised learning tasks. Generative models, on the
other hand, have benefited less from hierarchical models with multiple layers
of latent variables. In this paper, we prove that certain classes of
hierarchical latent variable models do not take advantage of the hierarchical
structure when trained with existing variational methods, and provide some
limitations on the kind of features existing models can learn. Finally we
propose an alternative flat architecture that learns meaningful and
disentangled features on natural images.

Neural Map: Structured Memory for Deep Reinforcement Learning

Emilio Parisotto, Ruslan Salakhutdinov
Subjects: Learning (cs.LG)

A critical component to enabling intelligent reasoning in partially
observable environments is memory. Despite this importance, Deep Reinforcement
Learning (DRL) agents have so far used relatively simple memory architectures,
with the main methods to overcome partial observability being either a temporal
convolution over the past k frames or an LSTM layer. More recent work (Oh et
al., 2016) has went beyond these architectures by using memory networks which
can allow more sophisticated addressing schemes over the past k frames. But
even these architectures are unsatisfactory due to the reason that they are
limited to only remembering information from the last k frames. In this paper,
we develop a memory system with an adaptable write operator that is customized
to the sorts of 3D environments that DRL agents typically interact with. This
architecture, called the Neural Map, uses a spatially structured 2D memory
image to learn to store arbitrary information about the environment over long
time lags. We demonstrate empirically that the Neural Map surpasses previous
DRL memories on a set of challenging 2D and 3D maze environments and show that
it is capable of generalizing to environments that were not seen during
training.

Adaptive Learning to Speed-Up Control of Prosthetic Hands: a Few Things Everybody Should Know

Valentina Gregori, Arjan Gijsberts, Barbara Caputo
Subjects: Learning (cs.LG)

A number of studies have proposed to use domain adaptation to reduce the
training efforts needed to control an upper-limb prosthesis exploiting
pre-trained models from prior subjects. These studies generally reported
impressive reductions in the required number of training samples to achieve a
certain level of accuracy for intact subjects. We further investigate two
popular methods in this field to verify whether this result equally applies to
amputees. Our findings show instead that this improvement can largely be
attributed to a suboptimal hyperparameter configuration. When hyperparameters
are appropriately tuned, the standard approach that does not exploit prior
information performs on par with the more complicated transfer learning
algorithms. Additionally, earlier studies erroneously assumed that the number
of training samples relates proportionally to the efforts required from the
subject. However, a repetition of a movement is the atomic unit for subjects
and the total number of repetitions should therefore be used as reliable
measure for training efforts. Also when correcting for this mistake, we do not
find any performance increase due to the use of prior models.

Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks

Hiroshi Inoue
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Ensembling multiple predictions is a widely used technique to improve the
accuracy of various machine learning tasks. In image classification tasks, for
example, averaging the predictions for multiple patches extracted from the
input image significantly improves accuracy. Using multiple networks trained
independently to make predictions improves accuracy further. One obvious
drawback of the ensembling technique is its higher execution cost during
inference. If we average 100 predictions, the execution cost will be 100 times
as high as the cost without the ensemble. This higher cost limits the
real-world use of ensembling, even though using it is almost the norm to win
image classification competitions. In this paper, we describe a new technique
called adaptive ensemble prediction, which achieves the benefits of ensembling
with much smaller additional execution costs. Our observation behind this
technique is that many easy-to-predict inputs do not require ensembling. Hence
we calculate the confidence level of the prediction for each input on the basis
of the probability of the predicted label, i.e. the outputs from the softmax,
during the ensembling computation. If the prediction for an input reaches a
high enough probability on the basis of the confidence level, we stop
ensembling for this input to avoid wasting computation power. We evaluated the
adaptive ensembling by using various datasets and showed that it reduces the
computation time significantly while achieving similar accuracy to the naive
ensembling.

Fixed-point optimization of deep neural networks with adaptive step size retraining

Sungho Shin, Yoonho Boo, Wonyong Sung
Comments: This paper is accepted in ICASSP 2017
Subjects: Learning (cs.LG)

Fixed-point optimization of deep neural networks plays an important role in
hardware based design and low-power implementations. Many deep neural networks
show fairly good performance even with 2- or 3-bit precision when quantized
weights are fine-tuned by retraining. We propose an improved fixedpoint
optimization algorithm that estimates the quantization step size dynamically
during the retraining. In addition, a gradual quantization scheme is also
tested, which sequentially applies fixed-point optimizations from high- to
low-precision. The experiments are conducted for feed-forward deep neural
networks (FFDNNs), convolutional neural networks (CNNs), and recurrent neural
networks (RNNs).

Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

Dan Garber, Ohad Shamir, Nathan Srebro
Subjects: Learning (cs.LG)

We study the fundamental problem of Principal Component Analysis in a
statistical distributed setting in which each machine out of (m) stores a
sample of (n) points sampled i.i.d. from a single unknown distribution. We
study algorithms for estimating the leading principal component of the
population covariance matrix that are both communication-efficient and achieve
estimation error of the order of the centralized ERM solution that uses all
(mn) samples. On the negative side, we show that in contrast to results
obtained for distributed estimation under convexity assumptions, for the PCA
objective, simply averaging the local ERM solutions cannot guarantee error that
is consistent with the centralized ERM. We show that this unfortunate phenomena
can be remedied by performing a simple correction step which correlates between
the individual solutions, and provides an estimator that is consistent with the
centralized ERM for sufficiently-large (n). We also introduce an iterative
distributed algorithm that is applicable in any regime of (n), which is based
on distributed matrix-vector products. The algorithm gives significant
acceleration in terms of communication rounds over previous distributed
algorithms, in a wide regime of parameters.

Reinforcement Learning with Deep Energy-Based Policies

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose a method for learning expressive energy-based policies for
continuous states and actions, which has been feasible only in tabular domains
before. We apply our method to learning maximum entropy policies, resulting
into a new algorithm, called soft Q-learning, that expresses the optimal policy
via a Boltzmann distribution. We use the recently proposed amortized Stein
variational gradient descent to learn a stochastic sampling network that
approximates samples from this distribution. The benefits of the proposed
algorithm include improved exploration and compositionality that allows
transferring skills between tasks, which we confirm in simulated experiments
with swimming and walking robots. We also draw a connection to actor-critic
methods, which can be viewed performing approximate inference on the
corresponding energy-based model.

F2F: A Library For Fast Kernel Expansions

Joachim Curto, Irene Zarza, Feng Yang, Alexander J. Smola, Luc Van Gool
Subjects: Learning (cs.LG)

F2F is a C++ library for large-scale machine learning. It contains a CPU
optimized implementation of the Fastfood algorithm, that allows the computation
of approximated kernel expansions in loglinear time. The algorithm requires to
compute the product of Walsh-Hadamard Transform (WHT) matrices. A cache
friendly SIMD Fast Walsh-Hadamard Transform (FWHT) that achieves compelling
speed and outperforms current state-of-the-art methods has been developed. F2F
allows to obtain non-linear classification combining Fastfood and a linear
classifier.

Deceiving Google's Perspective API Built for Detecting Toxic Comments

Hossein Hosseini, Sreeram Kannan, Baosen Zhang, Radha Poovendran
Comments: 4 pages
Subjects: Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

Social media platforms provide an environment where people can freely engage
in discussions. Unfortunately, they also enable several problems, such as
online harassment. Recently, Google and Jigsaw started a project called
Perspective, which uses machine learning to automatically detect toxic
language. A demonstration website has been also launched, which allows anyone
to type a phrase in the interface and instantaneously see the toxicity score
[1]. In this paper, we propose an attack on the Perspective toxic detection
system based on the adversarial examples. We show that an adversary can subtly
modify a highly toxic phrase in a way that the system assigns significantly
lower toxicity score to it. We apply the attack on the sample phrases provided
in the Perspective website and show that we can consistently reduce the
toxicity scores to the level of the non-toxic phrases. The existence of such
adversarial examples is very harmful for toxic detection systems and seriously
undermines their usability.

Online Multiview Representation Learning: Dropping Convexity for Better Efficiency

Zhehui Chen, Forest L. Yang, Chris J. Li, Tuo Zhao
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Multiview representation learning is very popular for latent factor analysis.
It naturally arises in many data analysis, machine learning, and information
retrieval applications to model dependent structures between a pair of data
matrices. For computational convenience, existing approaches usually formulate
the multiview representation learning as convex optimization problems, where
global optima can be obtained by certain algorithms in polynomial time.
However, many evidences have corroborated that heuristic nonconvex approaches
also have good empirical computational performance and convergence to the
global optima, although there is a lack of theoretical justification. Such a
gap between theory and practice motivates us to study a nonconvex formulation
for multiview representation learning, which can be efficiently solved by two
stochastic gradient descent (SGD) methods. Theoretically, by analyzing the
dynamics of the algorithms based on diffusion processes, we establish global
rates of convergence to the global optima with high probability. Numerical
experiments are provided to support our theory.

Learning Control for Air Hockey Striking using Deep Reinforcement Learning

Ayal Taitler, Nahum Shimkin
Subjects: Learning (cs.LG); Robotics (cs.RO)

We consider the task of learning control policies for a robotic mechanism
striking a puck in an air hockey game. The control signal is a direct command
to the robot’s motors. We employ a model free deep reinforcement learning
framework to learn the motoric skills of striking the puck accurately in order
to score. We propose certain improvements to the standard learning scheme which
make the deep Q-learning algorithm feasible when it might otherwise fail. Our
improvements include integrating prior knowledge into the learning scheme, and
accounting for the changing distribution of samples in the experience replay
buffer. Finally we present our simulation results for aimed striking which
demonstrate the successful learning of this task, and the improvement in
algorithm stability due to the proposed modifications.

Bayesian Nonparametric Feature and Policy Learning for Decision-Making

Jürgen Hahn, Abdelhak M. Zoubir
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Learning from demonstrations has gained increasing interest in the recent
past, enabling an agent to learn how to make decisions by observing an
experienced teacher. While many approaches have been proposed to solve this
problem, there is only little work that focuses on reasoning about the observed
behavior. We assume that, in many practical problems, an agent makes its
decision based on latent features, indicating a certain action. Therefore, we
propose a generative model for the states and actions. Inference reveals the
number of features, the features, and the policies, allowing us to learn and to
analyze the underlying structure of the observed behavior. Further, our
approach enables prediction of actions for new states. Simulations are used to
assess the performance of the algorithm based upon this model. Moreover, the
problem of learning a driver’s behavior is investigated, demonstrating the
performance of the proposed model in a real-world scenario.

Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

Alon Brutzkus, Amir Globerson
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Deep learning models are often successfully trained using gradient descent,
despite the worst case hardness of the underlying non-convex optimization
problem. The key question is then under what conditions can one prove that
optimization will succeed. Here we provide a strong result of this kind. We
consider a neural net with one hidden layer and a convolutional structure with
no overlap and a ReLU activation function. For this architecture we show that
learning is NP-complete in the general case, but that when the input
distribution is Gaussian, gradient descent converges to the global optimum in
polynomial time. To the best of our knowledge, this is the first global
optimality guarantee of gradient descent on a convolutional neural network with
ReLU activations.

Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction

We introduce a new algorithm, called CDER, for supervised machine learning
that merges the multi-scale geometric properties of Cover Trees with the
information-theoretic properties of entropy. CDER applies to a training set of
labeled pointclouds embedded in a common Euclidean space. If typical
pointclouds corresponding to distinct labels tend to differ at any scale in any
sub-region, CDER can identify these differences in (typically) linear time,
creating a set of distributional coordinates which act as a feature extraction
mechanism for supervised learning. We describe theoretical properties and
implementation details of CDER, and illustrate its benefits on several
synthetic examples.

Efficient Online Bandit Multiclass Learning with ( ilde{O}(sqrt{T})) Regret

Alina Beygelzimer, Francesco Orabona, Chicheng Zhang
Comments: 17 pages, 2 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We present an efficient second-order algorithm with
( ilde{O}(frac{1}{eta}sqrt{T})) regret for the bandit online multiclass
problem. The regret bound holds simultaneously with respect to a family of loss
functions parameterized by (eta), for a range of (eta) restricted by the norm
of the competitor. The family of loss functions ranges from hinge loss
((eta=0)) to squared hinge loss ((eta=1)). This provides a solution to the
open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for
(sqrt{T})-regret in online multiclass prediction? In COLT, 2009). We test our
algorithm experimentally, showing that it also performs favorably against
earlier algorithms.

Generative Adversarial Active Learning

Jia-Jie Zhu, Jose Bento
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose a new active learning approach using Generative Adversarial
Networks (GAN). Different from regular active learning, we adaptively
synthesize training instances for querying to increase learning speed. Our
approach outperforms random generation using GAN alone in active learning
experiments. We demonstrate the effectiveness of the proposed algorithm in
various datasets when compared to other algorithms. To the best our knowledge,
this is the first active learning work using GAN.

Stochastic Variance Reduction Methods for Policy Evaluation

Policy evaluation is a crucial step in many reinforcement-learning
procedures, which estimates a value function that predicts states’ long-term
value under a given policy. In this paper, we focus on policy evaluation with
linear function approximation over a fixed dataset. We first transform the
empirical policy evaluation problem into a (quadratic) convex-concave saddle
point problem, and then present a primal-dual batch gradient method, as well as
two stochastic variance reduction methods for solving the problem. These
algorithms scale linearly in both sample size and feature dimension. Moreover,
they achieve linear convergence even when the saddle-point problem has only
strong concavity in the dual variables but no strong convexity in the primal
variables. Numerical experiments on benchmark problems demonstrate the
effectiveness of our methods.

Efficient Learning of Graded Membership Models

Zilong Tan, Sayan Mukherjee
Comments: 23 pages, 1 figure, 1 table
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We present an efficient algorithm for learning graded membership models when
the number of variables (p) is much larger than the number of hidden components
(k). This algorithm reduces the computational complexity of state-of-the-art
tensor methods, which require decomposing an (Oleft(p^3
ight)) tensor, to
factorizing (Oleft(p/k
ight)) sub-tensors each of size (Oleft(k^3
ight)).
In addition, we address the issue of negative entries in the empirical method
of moments based estimators. We provide sufficient conditions under which our
approach has provable guarantees. Our approach obtains competitive empirical
results on both simulated and real data.

Coarse Grained Exponential Variational Autoencoders

Ke Sun, Xiangliang Zhang
Subjects: Learning (cs.LG)

Variational autoencoders (VAE) often use Gaussian or category distribution to
model the inference process. This puts a limit on variational learning because
this simplified assumption does not match the true posterior distribution,
which is usually much more sophisticated. To break this limitation and apply
arbitrary parametric distribution during inference, this paper derives a
emph{semi-continuous} latent representation, which approximates a continuous
density up to a prescribed precision, and is much easier to analyze than its
continuous counterpart because it is fundamentally discrete. We showcase the
proposition by applying polynomial exponential family distributions as the
posterior, which are universal probability density function generators. Our
experimental results show consistent improvements over commonly used VAE
models.

Online Learning with Many Experts

Alon Cohen, Shie Mannor
Subjects: Learning (cs.LG)

We study the problem of prediction with expert advice when the number of
experts in question may be extremely large or even infinite. We devise an
algorithm that obtains a tight regret bound of (widetilde{O}(epsilon T + N +
sqrt{NT})), where (N) is the empirical (epsilon)-covering number of the
sequence of loss functions generated by the environment. In addition, we
present a hedging procedure that allows us to find the optimal (epsilon) in
hindsight.

Finally, we discuss a few interesting applications of our algorithm. We show
how our algorithm is applicable in the approximately low rank experts model of
Hazan et al. (2016), and discuss the case of experts with bounded variation, in
which there is a surprisingly large gap between the regret bounds obtained in
the statistical and online settings.

An Unsupervised Learning Method Exploiting Sequential Output Statistics

Yu Liu, Jianshu Chen, Li Deng
Subjects: Learning (cs.LG)

We address a class of unsupervised learning problems where the same goal of
supervised learning is aimed except with no output labels provided for training
classifiers. This type of unsupervised learning is highly valuable in machine
learning practice since obtaining labels in training data is often costly.
Instead of pairing input-output samples, we exploit sequential statistics of
output labels, in the form of N-gram language models, which can be obtained
independently of input data and thus with low or no cost. We introduce a novel
cost function in this unsupervised learning setting, whose profiles are
analyzed and shown to be highly non-convex with large barriers near the global
optimum. A new stochastic primal-dual gradient method is developed to optimize
this very difficult type of cost function via the use of dual variables to
reduce the barriers. We demonstrate in experimental evaluation, with both
synthetic and real-world data sets, that the new method for unsupervised
learning gives drastically lower errors and higher learning efficiency than the
standard stochastic gradient descent, reaching classification errors about
twice of those obtained by fully supervised learning. We also show the crucial
role of labels’ sequential statistics exploited for label-free training with
the new method, reflected by the significantly lower classification errors when
higher-order language models are used in unsupervised learning than low-order
ones.

Adaptive Neural Networks for Fast Test-Time Prediction

We present an approach to adaptively utilize deep neural networks in order to
reduce the evaluation time on new examples without loss of classification
performance. Rather than attempting to redesign or approximate existing
networks, we propose two schemes that adaptively utilize networks. First, we
pose an adaptive network evaluation scheme, where we learn a system to
adaptively choose the components of a deep network to be evaluated for each
example. By allowing examples correctly classified using early layers of the
system to exit, we avoid the computational time associated with full evaluation
of the network. Building upon this approach, we then learn a network selection
system that adaptively selects the network to be evaluated for each example. We
exploit the fact that many examples can be correctly classified using
relatively efficient networks and that complex, computationally costly networks
are only necessary for a small fraction of examples. By avoiding evaluation of
these complex networks for a large fraction of examples, computational time can
be dramatically reduced. Empirically, these approaches yield dramatic
reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
networks from the ImageNet image recognition challenge with minimal (less than
1%) loss of accuracy.

On the Origin of Deep Learning

Haohan Wang, Bhiksha Raj, Eric P. Xing
Comments: 81 pages, 200 references
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

This paper is a review of the evolutionary history of deep learning models.
It covers from the genesis of neural networks when associationism modeling of
the brain is studied, to the models that dominate the last decade of research
in deep learning like convolutional neural networks, deep belief networks, and
recurrent neural networks, and extends to popular recent models like
variational autoencoder and generative adversarial nets. In addition to a
review of these models, this paper primarily focuses on the precedents of the
models above, examining how the initial ideas are assembled to construct the
early models and how these preliminary models are developed into their current
forms. Many of these evolutionary paths last more than half a century and have
a diversity of directions. For example, CNN is built on prior knowledge of
biological vision system; DBN is evolved from a trade-off of modeling power and
computation complexity of graphical models and many nowadays models are neural
counterparts of ancient linear models. This paper reviews these evolutionary
paths and offers a concise thought flow of how these models are developed, and
aims to provide a thorough background for deep learning. More importantly,
along with the path, this paper summarizes the gist behind these milestones and
proposes many directions to guide the future research of deep learning.

Boundary-Seeking Generative Adversarial Networks

R Devon Hjelm, Athul Paul Jacob, Tong Che, Kyunghyun Cho, Yoshua Bengio
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We introduce a novel approach to training generative adversarial networks,
where we train a generator to match a target distribution that converges to the
data distribution at the limit of a perfect discriminator. This objective can
be interpreted as training a generator to produce samples that lie on the
decision boundary of a current discriminator in training at each update, and we
call a GAN trained using this algorithm a boundary-seeking GAN (BS-GAN). This
approach can be used to train a generator with discrete output when the
generator outputs a parametric conditional distribution. We demonstrate the
effectiveness of the proposed algorithm with discrete image data. In contrary
to the proposed algorithm, we observe that the recently proposed Gumbel-Softmax
technique for re-parametrizing the discrete variables does not work for
training a GAN with discrete data. Finally, we notice that the proposed
boundary-seeking algorithm works even with continuous variables, and
demonstrate its effectiveness with two widely used image data sets, SVHN and
CelebA.

An SDP-Based Algorithm for Linear-Sized Spectral Sparsification

Yin Tat Lee, He Sun
Comments: To appear at STOC’17
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

For any undirected and weighted graph (G=(V,E,w)) with (n) vertices and (m)
edges, we call a sparse subgraph (H) of (G), with proper reweighting of the
edges, a ((1+varepsilon))-spectral sparsifier if [
(1-varepsilon)x^{intercal}L_Gxleq x^{intercal} L_{H} xleq (1+varepsilon)
x^{intercal} L_Gx ] holds for any (xinmathbb{R}^n), where (L_G) and (L_{H})
are the respective Laplacian matrices of (G) and (H). Noticing that (Omega(m))
time is needed for any algorithm to construct a spectral sparsifier and a
spectral sparsifier of (G) requires (Omega(n)) edges, a natural question is to
investigate, for any constant (varepsilon), if a ((1+varepsilon))-spectral
sparsifier of (G) with (O(n)) edges can be constructed in ( ilde{O}(m)) time,
where the ( ilde{O}) notation suppresses polylogarithmic factors. All previous
constructions on spectral sparsification require either super-linear number of
edges or (m^{1+Omega(1)}) time.

In this work we answer this question affirmatively by presenting an algorithm
that, for any undirected graph (G) and (varepsilon>0), outputs a
((1+varepsilon))-spectral sparsifier of (G) with (O(n/varepsilon^2)) edges in
( ilde{O}(m/varepsilon^{O(1)})) time. Our algorithm is based on three novel
techniques: (1) a new potential function which is much easier to compute yet
has similar guarantees as the potential functions used in previous references;
(2) an efficient reduction from a two-sided spectral sparsifier to a one-sided
spectral sparsifier; (3) constructing a one-sided spectral sparsifier by a
semi-definite program.

Dynamic Word Embeddings via Skip-Gram Filtering

Robert Bamler, Stephan Mandt
Comments: 10 pages + supplement
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We present a probabilistic language model for time-stamped text data which
tracks the semantic evolution of individual words over time. The model
represents words and contexts by latent trajectories in an embedding space. At
each moment in time, the embedding vectors are inferred from a probabilistic
version of word2vec [Mikolov, 2013]. These embedding vectors are connected in
time through a latent diffusion process. We describe two scalable variational
inference algorithms—skip-gram smoothing and skip-gram filtering—that allow
us to train the model jointly over all times; thus learning on all data while
simultaneously allowing word and context vectors to drift. Experimental results
on three different corpora demonstrate that our dynamic model infers word
embedding trajectories that are more interpretable and lead to higher
predictive likelihoods than competing methods that are based on static models
trained separately on time slices.

Approximate Inference with Amortised MCMC

Yingzhen Li, Richard E. Turner, Qiang Liu
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose a novel approximate inference algorithm that approximates a target
distribution by amortising the dynamics of a user-selected MCMC sampler. The
idea is to initialise MCMC using samples from an approximation network, apply
the MCMC operator to improve these samples, and finally use the samples to
update the approximation network thereby improving its quality. This provides a
new generic framework for approximate inference, allowing us to deploy highly
complex, or implicitly defined approximation families with intractable
densities, including approximations produced by warping a source of randomness
through a deep neural network. Experiments consider image modelling with deep
generative models as a challenging test for the method. Deep models trained
using amortised MCMC are shown to generate realistic looking samples as well as
producing diverse imputations for images with regions of missing pixels.

Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Uniform deviation bounds limit the difference between a model’s expected loss
and its loss on an empirical sample uniformly for all models in a learning
problem. As such, they are a critical component to empirical risk minimization.
In this paper, we provide a novel framework to obtain uniform deviation bounds
for loss functions which are *unbounded*. In our main application, this allows
us to obtain bounds for (k)-Means clustering under weak assumptions on the
underlying distribution. If the fourth moment is bounded, we prove a rate of
(mathcal{O}left(m^{-frac12}
ight)) compared to the previously known
(mathcal{O}left(m^{-frac14}
ight)) rate. Furthermore, we show that the rate
also depends on the kurtosis – the normalized fourth moment which measures the
“tailedness” of a distribution. We further provide improved rates under
progressively stronger assumptions, namely, bounded higher moments,
subgaussianity and bounded support.

Scalable and Distributed Clustering via Lightweight Coresets

Coresets are compact representations of data sets such that models trained on
a coreset are provably competitive with models trained on the full data set. As
such, they have been successfully used to scale up clustering models to massive
data sets. While existing approaches generally only allow for multiplicative
approximation errors, we propose a novel notion of coresets called lightweight
coresets that allows for both multiplicative and additive errors. We provide a
single algorithm to construct light-weight coresets for k-Means clustering,
Bregman clustering and maximum likelihood estimation of Gaussian mixture
models. The algorithm is substantially faster than existing constructions,
embarrassingly parallel and resulting coresets are smaller. In an extensive
experimental evaluation, we demonstrate that the proposed method outperforms
existing coreset constructions.

Variational Inference using Implicit Distributions

Ferenc Huszár
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Generative adversarial networks (GANs) have given us a great tool to fit
implicit generative models to data. Implicit distributions are ones we can
sample from easily, and take derivatives of samples with respect to model
parameters. These models are highly expressive and we argue they can prove just
as useful for variational inference (VI) as they are for generative modelling.
Several papers have proposed GAN-like algorithms for inference, however,
connections to the theory of VI are not always well understood. This paper
provides a unifying review of existing algorithms establishing connections
between variational autoencoders, adversarially learned inference, operator VI,
GAN-based image reconstruction, and more. Secondly, the paper provides a
framework for building new algorithms: depending on the way the variational
bound is expressed we introduce prior-contrastive and joint-contrastive
methods, and show practical inference algorithms based on either density ratio
estimation or denoising.

Online Nonparametric Learning, Chaining, and the Role of Partial Feedback

Nicolò Cesa-Bianchi, Pierre Gaillard (SIERRA, Inria), Claudio Gentile, Sébastien Gerchinovitz (AOC, IMT)
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

We investigate contextual online learning with nonparametric (Lipschitz)
comparison classes under different assumptions on losses and feedback
information. For full information feedback and Lipschitz losses, we
characterize the minimax regret up to log factors by proving an upper bound
matching a previously known lower bound. In a partial feedback model motivated
by second-price auctions, we prove upper bounds for Lipschitz and
semi-Lipschitz losses that improve on the known bounds for standard bandit
feedback. Our analysis combines novel results for contextual second-price
auctions with a novel algorithmic approach based on chaining. When the context
space is Euclidean, our chaining approach is efficient and delivers an even
better regret bound.

DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

We introduce DeepNAT, a 3D Deep convolutional neural network for the
automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
images. DeepNAT is an end-to-end learning-based approach to brain segmentation
that jointly learns an abstract feature representation and a multi-class
classification. We propose a 3D patch-based approach, where we do not only
predict the center voxel of the patch but also neighbors, which is formulated
as multi-task learning. To address a class imbalance problem, we arrange two
networks hierarchically, where the first one separates foreground from
background, and the second one identifies 25 brain structures on the
foreground. Since patches lack spatial context, we augment them with
coordinates. To this end, we introduce a novel intrinsic parameterization of
the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
network architecture, we use three convolutional layers with pooling, batch
normalization, and non-linearities, followed by fully connected layers with
dropout. The final segmentation is inferred from the probabilistic output of
the network with a 3D fully connected conditional random field, which ensures
label agreement between close voxels. The roughly 2.7 million parameters in the
network are learned with stochastic gradient descent. Our results show that
DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
learning-based method may have a high potential for the adaptation to young,
old, or diseased brains by fine-tuning the pre-trained network with a small
training sample on the target application, where the availability of larger
datasets with manual annotations may boost the overall segmentation accuracy in
the future.

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Recent work on generative modeling of text has found that variational
auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
language models (Bowman et al., 2015). This negative result is so far poorly
understood, but has been attributed to the propensity of LSTM decoders to
ignore conditioning information from the encoder. In this paper, we experiment
with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
dilation architecture, we control the effective context from previously
generated words. In experiments, we find that there is a trade off between the
contextual capacity of the decoder and the amount of encoding information used.
We show that with the right decoder, VAE can outperform LSTM language models.
We demonstrate perplexity gains on two datasets, representing the first
positive experimental result on the use VAE for generative modeling of text.
Further, we conduct an in-depth investigation of the use of VAE (with our new
decoding architecture) for semi-supervised and unsupervised labeling tasks,
demonstrating gains over several strong baselines.

Selection of training populations (and other subset selection problems) with an accelerated genetic algorithm (STPGA: An R-package for selection of training populations with a genetic algorithm)

Deniz Akdemir
Subjects: Methodology (stat.ME); Learning (cs.LG); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Applications (stat.AP)

Optimal subset selection is an important task that has numerous algorithms
designed for it and has many application areas. STPGA contains a special
genetic algorithm supplemented with a tabu memory property (that keeps track of
previously tried solutions and their fitness for a number of iterations), and
with a regression of the fitness of the solutions on their coding that is used
to form the ideal estimated solution (look ahead property) to search for
solutions of generic optimal subset selection problems. I have initially
developed the programs for the specific problem of selecting training
populations for genomic prediction or association problems, therefore I give
discussion of the theory behind optimal design of experiments to explain the
default optimization criteria in STPGA, and illustrate the use of the programs
in this endeavor. Nevertheless, I have picked a few other areas of application:
supervised and unsupervised variable selection based on kernel alignment,
supervised variable selection with design criteria, influential observation
identification for regression, solving mixed integer quadratic optimization
problems, balancing gains and inbreeding in a breeding population. Some of
these illustrations pertain new statistical approaches.

Criticality and Deep Learning, Part I: Theory vs. Empirics

Dan Oprisa, Peter Toth
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Motivated by the idea that criticality and universality of phase transitions
might play a crucial role in achieving and sustaining learning and intelligent
behaviour in biological and artificial networks, we analyse a theoretical and a
pragmatic experimental set up for critical phenomena in deep learning. On the
theoretical side, we use results from statistical physics to carry out critical
point calculations in feed-forward/fully connected networks, while on the
experimental side we set out to find traces of criticality in deep neural
networks. This is our first step in a series of upcoming investigations to map
out the relationship between criticality and learning in deep networks.

Support vector machine and its bias correction in high-dimension, low-sample-size settings

Yugo Nakayama, Kazuyoshi Yata, Makoto Aoshima
Comments: 23 pages, 3 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

In this paper, we consider asymptotic properties of the support vector
machine (SVM) in high-dimension, low-sample-size (HDLSS) settings. We show that
the hard-margin linear SVM holds a consistency property in which
misclassification rates tend to zero as the dimension goes to infinity under
certain severe conditions. We show that the SVM is very biased in HDLSS
settings and its performance is affected by the bias directly. In order to
overcome such difficulties, we propose a bias-corrected SVM (BC-SVM). We show
that the BC-SVM gives preferable performances in HDLSS settings. We also
discuss the SVMs in multiclass HDLSS settings. Finally, we check the
performance of the classifiers in actual data analyses.

Kiefer Wolfowitz Algorithm is Asymptotically Optimal for a Class of Non-Stationary Bandit Problems

Rahul Singh, Taposh Banerjee
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We consider the problem of designing an allocation rule or an “online
learning algorithm” for a class of bandit problems in which the set of control
actions available at each time (t) is a convex, compact subset of
(mathbb{R}^d). Upon choosing an action (x) at time (t), the algorithm obtains
a noisy value of the unknown and time-varying function (f_t) evaluated at (x).
The “regret” of an algorithm is the gap between its expected reward, and the
reward earned by a strategy which has the knowledge of the function (f_t) at
each time (t) and hence chooses the action (x_t) that maximizes (f_t).

For this non-stationary bandit problem set-up, we propose two variants of the
Kiefer Wolfowitz (KW) algorithm i) KW with fixed step-size (eta), and ii) KW
with sliding window of length (L). We show that if the number of times that the
function (f_t) varies during time (t) is (o(T)), and if the learning rates of
the proposed algorithms are chosen “optimally”, then the regret of the proposed
algorithms is (o(T)).

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

Despite the successes in capturing continuous distributions, the application
of generative adversarial networks (GANs) to discrete settings, like natural
language tasks, is rather restricted. The fundamental reason is the difficulty
of back-propagation through discrete random variables combined with the
inherent instability of the GAN training objective. To address these problems,
we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
Networks. Instead of directly optimizing the GAN objective, we derive a novel
and low-variance objective using the discriminator’s output that follows
corresponds to the log-likelihood. Compared with the original, the new
objective is proved to be consistent in theory and beneficial in practice. The
experimental results on various discrete datasets demonstrate the effectiveness
of the proposed approach.

Ratio Utility and Cost Analysis for Privacy Preserving Subspace Projection

Mert Al, Shibiao Wan, Sun-Yuan Kung
Comments: Submitted to ICASSP 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

With a rapidly increasing number of devices connected to the internet, big
data has been applied to various domains of human life. Nevertheless, it has
also opened new venues for breaching users’ privacy. Hence it is highly
required to develop techniques that enable data owners to privatize their data
while keeping it useful for intended applications. Existing methods, however,
do not offer enough flexibility for controlling the utility-privacy trade-off
and may incur unfavorable results when privacy requirements are high. To tackle
these drawbacks, we propose a compressive-privacy based method, namely RUCA
(Ratio Utility and Cost Analysis), which can not only maximize performance for
a privacy-insensitive classification task but also minimize the ability of any
classifier to infer private information from the data. Experimental results on
Census and Human Activity Recognition data sets demonstrate that RUCA
significantly outperforms existing privacy preserving data projection
techniques for a wide range of privacy pricings.

CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

Deep learning is an important component of big-data analytic tools and
intelligent applications, such as, self-driving cars, computer vision, speech
recognition, or precision medicine. However, the training process is
computationally intensive, and often requires a large amount of time if
performed sequentially. Modern parallel computing systems provide the
capability to reduce the required training time of deep neural networks. In
this paper, we present our parallelization scheme for training convolutional
neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
Synchronization (CHAOS). Major features of CHAOS include the support for thread
and vector parallelism, non-instant updates of weight parameters during
back-propagation without a significant delay, and implicit synchronization in
arbitrary order. CHAOS is tailored for parallel computing systems that are
accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
empirically using measurement techniques and performance modeling for various
numbers of threads and CNN architectures. Experimental results for the MNIST
dataset of handwritten digits using the total number of threads on the Xeon Phi
show speedups of up to 103x compared to the execution on one thread of the Xeon
Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
compared to the sequential execution on Intel Core i5.

An EM Based Probabilistic Two-Dimensional CCA with Application to Face Recognition

Mehran Safayani, Seyed Hashem Ahmadi, Homayun Afrabandpey, Abdolreza Mirzaei
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Recently, two-dimensional canonical correlation analysis (2DCCA) has been
successfully applied for image feature extraction. The method instead of
concatenating the columns of the images to the one-dimensional vectors,
directly works with two-dimensional image matrices. Although 2DCCA works well
in different recognition tasks, it lacks a probabilistic interpretation. In
this paper, we present a probabilistic framework for 2DCCA called probabilistic
2DCCA (P2DCCA) and an iterative EM based algorithm for optimizing the
parameters. Experimental results on synthetic and real data demonstrate
superior performance in loading factor estimation for P2DCCA compared to 2DCCA.
For real data, three subsets of AR face database and also the UMIST face
database confirm the robustness of the proposed algorithm in face recognition
tasks with different illumination conditions, facial expressions, poses and
occlusions.

Efficient coordinate-wise leading eigenvector computation

Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro
Subjects: Numerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)

We develop and analyze efficient “coordinate-wise” methods for finding the
leading eigenvector, where each step involves only a vector-vector product. We
establish global convergence with overall runtime guarantees that are at least
as good as Lanczos’s method and dominate it for slowly decaying spectrum. Our
methods are based on combining a shift-and-invert approach with coordinate-wise
algorithms for linear regression.

Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

We introduce AI rationalization, an approach for generating explanations of
autonomous system behavior as if a human had done the behavior. We describe a
rationalization technique that uses neural machine translation to translate
internal state-action representations of the autonomous agent into natural
language. We evaluate our technique in the Frogger game environment. The
natural language is collected from human players thinking out loud as they play
the game. We motivate the use of rationalization as an approach to explanation
generation, show the results of experiments on the accuracy of our
rationalization technique, and describe future research agenda.

Deep Voice: Real-time Neural Text-to-Speech

We present Deep Voice, a production-quality text-to-speech system constructed
entirely from deep neural networks. Deep Voice lays the groundwork for truly
end-to-end neural speech synthesis. The system comprises five major building
blocks: a segmentation model for locating phoneme boundaries, a
grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
fundamental frequency prediction model, and an audio synthesis model. For the
segmentation model, we propose a novel way of performing phoneme boundary
detection with deep neural networks using connectionist temporal classification
(CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
that requires fewer parameters and trains faster than the original. By using a
neural network for each component, our system is simpler and more flexible than
traditional text-to-speech systems, where each component requires laborious
feature engineering and extensive domain expertise. Finally, we show that
inference with our system can be performed faster than real time and describe
optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
speedups over existing implementations.

Rank-to-engage: New Listwise Approaches to Maximize Engagement

Swayambhoo Jain, Akshay Soni, Nikolay Laptev, Yashar Mehdad
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

For many internet businesses, presenting a given list of items in an order
that maximizes a certain metric of interest (e.g., click-through-rate, average
engagement time etc.) is crucial. We approach the aforementioned task from a
learning-to-rank perspective which reveals a new problem setup. In traditional
learning-to-rank literature, it is implicitly assumed that during the training
data generation one has access to the emph{best or desired} order for the
given list of items. In this work, we consider a problem setup where we do not
observe the desired ranking. We present two novel solutions: the first solution
is an extension of already existing listwise learning-to-rank
technique–Listwise maximum likelihood estimation (ListMLE)–while the second
one is a generic machine learning based framework that tackles the problem in
its entire generality. We discuss several challenges associated with this
generic framework, and propose a simple emph{item-payoff} and
emph{positional-gain} model that addresses these challenges. We provide
training algorithms, inference procedures, and demonstrate the effectiveness of
the two approaches over traditional ListMLE on synthetic as well as on
real-life setting of ranking news articles for increased dwell time.

Activation Ensembles for Deep Neural Networks

Mark Harmon, Diego Klabjan
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Many activation functions have been proposed in the past, but selecting an
adequate one requires trial and error. We propose a new methodology of
designing activation functions within a neural network at each layer. We call
this technique an “activation ensemble” because it allows the use of multiple
activation functions at each layer. This is done by introducing additional
variables, (alpha), at each activation layer of a network to allow for
multiple activation functions to be active at each neuron. By design,
activations with larger (alpha) values at a neuron is equivalent to having the
largest magnitude. Hence, those higher magnitude activations are “chosen” by
the network. We implement the activation ensembles on a variety of datasets
using an array of Feed Forward and Convolutional Neural Networks. By using the
activation ensemble, we achieve superior results compared to traditional
techniques. In addition, because of the flexibility of this methodology, we
more deeply explore activation functions and the features that they capture.

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Environmental audio tagging is a newly proposed task to predict the presence
or absence of a specific audio event in a chunk. Deep neural network (DNN)
based methods have been successfully adopted for predicting the audio tags in
the domestic audio scene. In this paper, we propose to use a convolutional
neural network (CNN) to extract robust features from mel-filter banks (MFBs),
spectrograms or even raw waveforms for audio tagging. Gated recurrent unit
(GRU) based recurrent neural networks (RNNs) are then cascaded to model the
long-term temporal structure of the audio signal. To complement the input
information, an auxiliary CNN is designed to learn on the spatial features of
stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging)
of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE
2016) challenge. Compared with our recent DNN-based method, the proposed
structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the
development set. The spatial features can further reduce the EER to 0.10. The
performance of the end-to-end learning on raw waveforms is also comparable.
Finally, on the evaluation set, we get the state-of-the-art performance with
0.12 EER while the performance of the best existing system is 0.15 EER.

Changing Model Behavior at Test-Time Using Reinforcement Learning

Augustus Odena, Dieterich Lawson, Christopher Olah
Comments: Submitted to ICLR 2017 Workshop Track
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Machine learning models are often used at test-time subject to constraints
and trade-offs not present at training-time. For example, a computer vision
model operating on an embedded device may need to perform real-time inference,
or a translation model operating on a cell phone may wish to bound its average
compute time in order to be power-efficient. In this work we describe a
mixture-of-experts model and show how to change its test-time resource-usage on
a per-input basis using reinforcement learning. We test our method on a small
MNIST-based example.

A supervised approach to time scale detection in dynamic networks

Benjamin Fish, Rajmonda S. Caceres
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

For any stream of time-stamped edges that form a dynamic network, an
important choice is the aggregation granularity that an analyst uses to bin the
data. Picking such a windowing of the data is often done by hand, or left up to
the technology that is collecting the data. However, the choice can make a big
difference in the properties of the dynamic network. This is the time scale
detection problem. In previous work, this problem is often solved with a
heuristic as an unsupervised task. As an unsupervised problem, it is difficult
to measure how well a given algorithm performs. In addition, we show that the
quality of the windowing is dependent on which task an analyst wants to perform
on the network after windowing. Therefore the time scale detection problem
should not be handled independently from the rest of the analysis of the
network.

We introduce a framework that tackles both of these issues: By measuring the
performance of the time scale detection algorithm based on how well a given
task is accomplished on the resulting network, we are for the first time able
to directly compare different time scale detection algorithms to each other.
Using this framework, we introduce time scale detection algorithms that take a
supervised approach: they leverage ground truth on training data to find a good
windowing of the test data. We compare the supervised approach to previous
approaches and several baselines on real data.

Information Theory

On Fienup Methods for Regularized Phase Retrieval

Edouard Pauwels, Amir Beck, Yonina C. Eldar, Shoham Sabach
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

Alternating minimization, or Fienup methods, have a long history in phase
retrieval. We provide new insights related to the empirical and theoretical
analysis of these algorithms when used with Fourier measurements and combined
with convex priors. In particular, we show that Fienup methods can be viewed as
performing alternating minimization on a regularized nonconvex least-squares
problem with respect to amplitude measurements. We then prove that under mild
additional structural assumptions on the prior (semi-algebraicity), the
sequence of signal estimates has a smooth convergent behaviour towards a
critical point of the nonconvex regularized least-squares objective. Finally,
we propose an extension to Fienup techniques, based on a projected gradient
descent interpretation and acceleration using inertial terms. We demonstrate
experimentally that this modification combined with an (ell_1) prior
constitutes a competitive approach for sparse phase retrieval.

Upper and Lower Bounds for the Ergodic Capacity of MIMO Jacobi Fading Channels

Amor Nafkha, Remi Bonnefoi
Comments: 4 pages, 6 figures, submitted to the IEEE Photonics Technology Letters
Subjects: Information Theory (cs.IT)

In multi-(core/mode) optical fiber communication, the transmission channel
can be modeled as a complex sub-matrix of the Haar-distributed unitary matrix
(complex Jacobi unitary ensemble). In this letter, we present new analytical
expressions of the upper and lower bounds for the ergodic capacity of
multiple-input multiple-output Jacobi-fading channels. Recent results on the
determinant of the Jacobi unitary ensemble are employed to derive a tight lower
bound on the ergodic capacity. We use Jensen’s inequality to provide an
analytical closed-form upper bound to the ergodic capacity at any
signal-to-noise ratio (SNR). Closed-form expressions of the ergodic capacity,
at low and high SNR regimes, are also derived. Simulation results are presented
to validate the accuracy of the derived expressions.

On the second Feng-Rao distance of Algebraic Geometry codes related to Arf semigroups

J. I. Farrán, P. A. García-Sánchez, B. A. Heredia
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

We describe the second (generalized) Feng-Rao distance for elements in an Arf
numerical semigroup that are greater than or equal to the conductor of the
semigroup. This provides a lower bound for the second Hamming weight for one
point AG codes. In particular, we can obtain the second Feng-Rao distance for
the codes defined by asymptotically good towers of function fields whose
Weierstrass semigroups are inductive. In addition, we compute the second
Feng-Rao number, and provide some examples and comparisons with previous
results on this topic. These calculations rely on Ap'{e}ry sets, and thus
several results concerning Ap’ery sets of Arf semigroups are presented.

A KZ Reduction Algorithm

Jinming Wen, Xiao-Wen Chang
Comments: This work was presented in part at the IEEE International Symposium on Information Theory (ISIT 2015), Hongkong: arXiv:1504.05086
Subjects: Information Theory (cs.IT)

The Korkine-Zolotareff (KZ) reduction is one of the often used reduction
strategies for decoding lattices. A KZ reduction algorithm involves solving
shortest vector problems (SVPs) and basis expansion. In this paper, first we
improve the commonly used Schnorr-Euchner search strategy for solving SVPs.
Then, we derive upper bounds on the magnitudes of the entries of any solution
of a SVP when its basis matrix is LLL reduced. These upper bounds only involve
the parameter of the LLL reduction and the dimension of the solution and they
are useful for analyzing the complexity of the basis expansion in a KZ
reduction algorithm. Finally, we modify the basis expansion method proposed by
Zhang et al. and combine it with the improved Schnorr-Euchner search strategy
to give a new KZ reduction algorithm. Simulation results show that the new KZ
reduction algorithm is much faster and numerically more stable than the KZ
reduction algorithm proposed by Zhang et al., especially when the basis matrix
is ill conditioned.

A General Framework for Low-Resolution Receivers for MIMO Channels

Stefano Rini, Luca Barletta, Yonina C. Eldar, Elza Erkip
Subjects: Information Theory (cs.IT)

The capacity of a discrete-time multi-input multioutput (MIMO) Gaussian
channel with output quantization is investigated for different analog receiver
architectures. A general formulation of this problem is proposed in which the
antenna outputs are processed by analog combiners while sign quantizers are
used for analog/digital conversion. The configuration of the analog combiners
is chosen as a function of the channel realization, so that capacity can be
maximized over the set of available configurations for a fixed number of sign
quantizers. To exemplify this approach, three analog receiver architectures of
increasing generality are considered: (a) sign quantization of the antenna
outputs, (b) antenna selection and multilevel quantization and (c) linear
combining of the antenna outputs and multilevel quantization. By comparing the
largest attainable rates for a given number of sign quantizers, it is possible
to quantify the limiting rate advantages provided by each of the receiver
analog architectures. In particular, it is shown that sign quantization of the
antenna outputs is sufficient, among all possible receiver architectures
studied, to attain the optimal high signal-to-noise ratio (SNR) performance for
MIMO receiver in which the number of antennas is larger than the number of sign
quantizers.

Multiuser Precoding and Channel Estimation for Hybrid Millimeter Wave MIMO Systems

Lou Zhao, Derrick Wing Kwan Ng, Jinhong Yuan
Comments: 6 pages, accepted for presentation, ICC 2017
Subjects: Information Theory (cs.IT)

In this paper, we develop a low-complexity channel estimation for hybrid
millimeter wave (mmWave) systems, where the number of radio frequency (RF)
chains is much less than the number of antennas equipped at each transceiver.
The proposed channel estimation algorithm aims to estimate the strongest
angle-of-arrivals (AoAs) at both the base station (BS) and the users. Then all
the users transmit orthogonal pilot symbols to the BS via these estimated
strongest AoAs to facilitate the channel estimation. The algorithm does not
require any explicit channel state information (CSI) feedback from the users
and the associated signalling overhead of the algorithm is only proportional to
the number of users, which is significantly less compared to various existing
schemes. Besides, the proposed algorithm is applicable to both non-sparse and
sparse mmWave channel environments. Based on the estimated CSI, zero-forcing
(ZF) precoding is adopted for multiuser downlink transmission. In addition, we
derive a tight achievable rate upper bound of the system. Our analytical and
simulation results show that the proposed scheme offer a considerable
achievable rate gain compared to fully digital systems, where the number of RF
chains equipped at each transceiver is equal to the number of antennas.
Furthermore, the achievable rate performance gap between the considered hybrid
mmWave systems and the fully digital system is characterized, which provides
useful system design insights.

Lattice Coding and Decoding for Multiple-Antenna Ergodic Fading Channels

Ahmed Hindy, Aria Nosratinia
Comments: Accepted at IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)

For ergodic fading, a lattice coding and decoding strategy is proposed and
its performance is analyzed for the single-input single-output (SISO) and
multiple-input multiple-output (MIMO) point-to-point channel as well as the
multiple-access channel (MAC), with channel state information available only at
the receiver (CSIR). At the decoder a novel strategy is proposed consisting of
a time-varying equalization matrix followed by decision regions that depend
only on channel statistics, not individual realizations. Our encoder has a
similar structure to that of Erez and Zamir. For the SISO channel, the gap to
capacity is bounded by a constant under a wide range of fading distributions.
For the MIMO channel under Rayleigh fading, the rate achieved is within a gap
to capacity that does not depend on the signal-to-noise ratio (SNR), and
diminishes with the number of receive antennas. The analysis is extended to the
K-user MAC where similar results hold. Achieving a small gap to capacity while
limiting the use of CSIR to the equalizer highlights the scope for efficient
decoder implementations, since decision regions are fixed, i.e., independent of
channel realizations.

On Algorithmic Statistics for space-bounded algorithms

Alexey Milovanov
Comments: accepted to CSR 2017 conference
Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC)

Algorithmic statistics studies explanations of observed data that are good in
the algorithmic sense: an explanation should be simple i.e. should have small
Kolmogorov complexity and capture all the algorithmically discoverable
regularities in the data. However this idea can not be used in practice because
Kolmogorov complexity is not computable.

In this paper we develop algorithmic statistics using space-bounded
Kolmogorov complexity. We prove an analogue of one of the main result of
`classic’ algorithmic statistics (about the connection between optimality and
randomness deficiences). The main tool of our proof is the Nisan-Wigderson
generator.

Topological Interference Management with Decoded Message Passing

Xinping Yi, Giuseppe Caire
Comments: A short version has been presented at IEEE International Symposium on Information Theory (ISIT 2016), Barcelona
Subjects: Information Theory (cs.IT)

The topological interference management (TIM) problem studies
partially-connected interference networks with no channel state information
except for the network topology (i.e., connectivity graph) at the transmitters.
In this paper, we consider a similar problem in the uplink cellular networks,
while message passing is enabled at the receivers (e.g., base stations), so
that the decoded messages can be routed to other receivers via backhaul links
to help further improve network performance. For this TIM problem with decoded
message passing (TIM-MP), we model the interference pattern by conflict
digraphs, connect orthogonal access to the acyclic set coloring on conflict
digraphs, and show that one-to-one interference alignment boils down to
orthogonal access because of message passing. With the aid of polyhedral
combinatorics, we identify the structural properties of certain classes of
network topologies where orthogonal access achieves the optimal
degrees-of-freedom (DoF) region in the information-theoretic sense. The
relation to the conventional index coding with simultaneous decoding is also
investigated by formulating a generalized index coding problem with successive
decoding as a result of decoded message passing. The properties of reducibility
and criticality are also studied, by which we are able to prove the linear
optimality of orthogonal access in terms of symmetric DoF for the networks up
to four users with all possible network topologies (218 instances). Practical
issues of the tradeoff between the overhead of message passing and the
achievable symmetric DoF are also discussed, in the hope of facilitating
efficient backhaul utilization.

Row-Centric Lossless Compression of Markov Images

Matthew G. Reyes, David L. Neuhoff
Comments: submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

Motivated by the question of whether the recently introduced Reduced Cutset
Coding (RCC) offers rate-complexity performance benefits over conventional
context-based conditional coding for sources with two-dimensional Markov
structure, this paper compares several row-centric coding strategies that vary
in the amount of conditioning as well as whether a model or an empirical table
is used in the encoding of blocks of rows. The conclusion is that, at least for
sources exhibiting low-order correlations, 1-sided model-based conditional
coding is superior to the method of RCC for a given constraint on complexity,
and conventional context-based conditional coding is nearly as good as the
1-sided model-based coding.

Delay-Optimal Probabilistic Scheduling in Green Communications with Arbitrary Arrival and Adaptive Transmission

Xiang Chen, Wei Chen, Joohyun Lee, Ness B. Shroff
Comments: 12 pages, 7 figures, extended version for our ICC’17 paper. arXiv admin note: substantial text overlap with arXiv:1609.03260
Subjects: Information Theory (cs.IT)

In this paper, we aim to obtain the optimal delay-power tradeoff and the
corresponding optimal scheduling policy for arbitrary i.i.d. arrival process
and adaptive transmissions. The number of backlogged packets at the transmitter
is known to a scheduler, who has to determine how many backlogged packets to
transmit during each time slot. The power consumption is assumed to be convex
in transmission rates. Hence, if the scheduler transmits faster, the delay will
be reduced but with higher power consumption. To obtain the optimal delay-power
tradeoff and the corresponding optimal policy, we model the problem as a
Constrained Markov Decision Process (CMDP), where we minimize the average delay
given an average power constraint. By steady-state analysis and Lagrangian
relaxation, we can show that the optimal tradeoff curve is decreasing, convex,
and piecewise linear, and the optimal policy is threshold-based. Based on the
revealed properties of the optimal policy, we develop an algorithm to
efficiently obtain the optimal tradeoff curve and the optimal policy. The
complexity of our proposed algorithm is much lower than a general algorithm
based on Linear Programming. We validate the derived results and the proposed
algorithm through Linear Programming and simulations.

Benefits of Cache Assignment on Degraded Broadcast Channels

Shirin Saeedi Bidokhti, Michele Wigger, Aylin Yener
Comments: Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)

Degraded K-user broadcast channels (BC) are studied when receivers are
facilitated with cache memories. Lower and upper bounds are derived on the
capacity-memory tradeoff, i.e., on the largest rate of reliable communication
over the BC as a function of the receivers’ cache sizes, and the bounds are
shown to match for some special cases. The lower bounds are achieved by two new
coding schemes that benefit from non-uniform cache assignment. Lower and upper
bounds are also established on the global capacity-memory tradeoff, i.e., on
the largest capacity-memory tradeoff that can be attained by optimizing the
receivers’ cache sizes subject to a total cache memory budget. The bounds
coincide when the total cache memory budget is sufficiently small or
sufficiently large, characterized in terms of the BC statistics. For small
cache memories, it is optimal to assign all the cache memory to the weakest
receiver. In this regime, the global capacity-memory tradeoff grows as the
total cache memory budget divided by the number of files in the system. In
other words, a perfect global caching gain is achievable in this regime and the
performance corresponds to a system where all cache contents in the network are
available to all receivers. For large cache memories, it is optimal to assign a
positive cache memory to every receiver such that the weaker receivers are
assigned larger cache memories compared to the stronger receivers. In this
regime, the growth rate of the global capacity-memory tradeoff is further
divided by the number of users, which corresponds to a local caching gain.
Numerical indicate suggest that a uniform cache-assignment of the total cache
memory is suboptimal in all regimes unless the BC is completely symmetric. For
erasure BCs, this claim is proved analytically in the regime of small
cache-sizes.

Euclidean and Hermitian LCD MDS codes

Claude Carlet, Sihem Mesnager, Chunming Tang, Yanfeng Qi
Subjects: Information Theory (cs.IT)

Linear codes with complementary duals (abbreviated LCD) are linear codes
whose intersection with their dual is trivial. When they are binary, they play
an important role in armoring implementations against side-channel attacks and
fault injection attacks. Non-binary LCD codes in characteristic 2 can be
transformed into binary LCD codes by expansion. On the other hand, being
optimal codes, maximum distance separable codes (abbreviated MDS) have been of
much interest from many researchers due to their theoretical significant and
practical implications. However, little work has been done on LCD MDS codes. In
particular, determining the existence of (q)-ary ([n,k]) LCD MDS codes for
various lengths (n) and dimensions (k) is a basic and interesting problem. In
this paper, we firstly study the problem of the existence of (q)-ary ([n,k])
LCD MDS codes and completely solve it for the Euclidean case. More
specifically, we show that for (q>3) there exists a (q)-ary ([n,k]) Euclidean
LCD MDS code, where (0le k le nle q+1), or, (q=2^{m}), (n=q+2) and (k= 3
ext{or} q-1). Secondly, we investigate several constructions of new Euclidean
and Hermitian LCD MDS codes. Our main techniques in constructing Euclidean and
Hermitian LCD MDS codes use some linear codes with small dimension or
codimension, self-orthogonal codes and generalized Reed-Solomon codes.

Exact Random Coding Exponents and Universal Decoders for the Asymmetric Broadcast Channel

Ran Averbuch, Neri Merhav
Subjects: Information Theory (cs.IT)

This work contains two main contributions concerning the asymmetric broadcast
channel. The first is an analysis of the exact random coding error exponents
for both users, and the second is the derivation of universal decoders for both
users. These universal decoders are certain variants of the maximum mutual
information (MMI) universal decoder, which achieve the corresponding random
coding exponents of optimal decoding. In addition, we introduce some lower
bounds, which involve optimization over very few parameters, unlike the
original, exact exponents, which involve minimizations over auxiliary
probability distributions. Numerical results for the binary symmetric broadcast
channel show improvements over previously derived error exponents for the same
model.

Multi-scale Spectrum Sensing in Small-Cell mm-Wave Cognitive Wireless Networks

Nicolo Michelusi, Matthew Nokleby, Urbashi Mitra, Robert Calderbank
Comments: To appear on ICC 2017
Subjects: Information Theory (cs.IT)

In this paper, a multi-scale approach to spectrum sensing in cognitive
cellular networks is proposed. In order to overcome the huge cost incurred in
the acquisition of full network state information, a hierarchical scheme is
proposed, based on which local state estimates are aggregated up the hierarchy
to obtain aggregate state information at multiple scales, which are then sent
back to each cell for local decision making. Thus, each cell obtains
fine-grained estimates of the channel occupancies of nearby cells, but
coarse-grained estimates of those of distant cells. The performance of the
aggregation scheme is studied in terms of the trade-off between the throughput
achievable by secondary users and the interference generated by the activity of
these secondary users to primary users. In order to account for the irregular
structure of interference patterns arising from path loss, shadowing, and
blockages, which are especially relevant in millimeter wave networks, a greedy
algorithm is proposed to find a multi-scale aggregation tree to optimize the
performance. It is shown numerically that this tailored hierarchy outperforms a
regular tree construction by 60%.

Global Optimality in Low-rank Matrix Optimization

Zhihui Zhu, Qiuwei Li, Gongguo Tang, Michael B. Wakin
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

This paper considers the minimization of a general objective function (f(X))
over the set of non-square (n imes m) matrices where the optimal solution
(X^star) is low-rank. To reduce the computational burden, we factorize the
variable (X) into a product of two smaller matrices and optimize over these two
matrices instead of (X). Despite the resulting nonconvexity, recent studies in
matrix completion and sensing have shown that the factored problem has no
spurious local minima and obeys the so-called strict saddle property (the
function has a directional negative curvature at all critical points but local
minima). We analyze the global geometry for a general and yet well-conditioned
objective function (f(X)) whose restricted strong convexity and restricted
strong smoothness constants are comparable. In particular, we show that the
reformulated objective function has no spurious local minima and obeys the
strict saddle property. These geometric properties implies that a number of
iterative optimization algorithms (such as gradient descent) can provably solve
the factored problem with global convergence.

Rician MIMO Channel- and Jamming-Aware Decision Fusion

D. Ciuonzo, A. Aubry, V. Carotenuto
Comments: Accepted in IEEE Transactions on Signal Processing 2017
Subjects: Information Theory (cs.IT)

In this manuscript we study channel-aware decision fusion (DF) in a wireless
sensor network (WSN) where: (i) the sensors transmit their decisions
simultaneously for spectral efficiency purposes and the DF center (DFC) is
equipped with multiple antennas; (ii) each sensor-DFC channel is described via
a Rician model. As opposed to the existing literature, in order to account for
stringent energy constraints in the WSN, only statistical channel information
is assumed for the non-line-of sight (scattered) fading terms. For such a
scenario, sub-optimal fusion rules are developed in order to deal with the
exponential complexity of the likelihood ratio test (LRT) and impractical
(complete) system knowledge. Furthermore, the considered model is extended to
the case of (partially unknown) jamming-originated interference. Then the
obtained fusion rules are modified with the use of composite hypothesis testing
framework and generalized LRT. Coincidence and statistical equivalence among
them are also investigated under some relevant simplified scenarios. Numerical
results compare the proposed rules and highlight their jammingsuppression
capability.

On the Performance of Wireless Powered Communication With Non-linear Energy Harvesting

Rania Morsi, Elena Boshkovska, Esmat Ramadan, Derrick Wing Kwan Ng, Robert Schober
Comments: 5 pages, 1 Table, submitted for conference publication
Subjects: Information Theory (cs.IT)

In this paper, we analyze the performance of a time-slotted multi-antenna
wireless powered communication (WPC) system, where a wireless device first
harvests radio frequency (RF) energy from a power station (PS) in the downlink
to facilitate information transfer to an information receiving station (IRS) in
the uplink. The main goal of this paper is to provide insights and guidelines
for the design of practical WPC systems. To this end, we adopt a recently
proposed parametric non-linear RF energy harvesting (EH) model, which has been
shown to accurately model the end-to-end non-linearity of practical RF EH
circuits. In order to enhance the RF power transfer efficiency, maximum ratio
transmission is adopted at the PS to focus the energy signals on the wireless
device. Furthermore, at the IRS, maximum ratio combining is used. We analyze
the outage probability and the average throughput of information transfer,
assuming Nakagami-(m) fading uplink and downlink channels. Moreover, we study
the system performance as a function of the number of PS transmit antennas, the
number of IRS receive antennas, the transmit power of the PS, the fading
severity, the transmission rate of the wireless device, and the EH time
duration. In addition, we obtain a fixed point equation for the optimal
transmission rate and the optimal EH time duration that maximize the asymptotic
throughput for high PS transmit powers. All analytical results are corroborated
by simulations.

New constructions of MDS codes with complementary duals

Bocong Chen, Hongwei Liu
Subjects: Information Theory (cs.IT)

Linear complementary-dual (LCD for short) codes are linear codes that
intersect with their duals trivially. LCD codes have been used in certain
communication systems. It is recently found that LCD codes can be applied in
cryptography. This application of LCD codes renewed the interest in the
construction of LCD codes having a large minimum distance. MDS codes are
optimal in the sense that the minimum distance cannot be improved for given
length and code size. Constructing LCD MDS codes is thus of significance in
theory and practice. Recently, Jin (cite{Jin}, IEEE Trans. Inf. Theory, 2016)
constructed several classes of LCD MDS codes through generalized Reed-Solomon
codes. In this paper, a different approach is proposed to obtain new LCD MDS
codes from generalized Reed-Solomon codes. Consequently, new code constructions
are provided and certain previously known results in cite{Jin} are extended.

Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Schemes

Yongge Wang
Subjects: Information Theory (cs.IT)

This paper presents a survey on generalized Reed-Solomon codes and various
decoding algorithms: Berlekamp-Massey decoding algorithms; Berlekamp-Welch
decoding algorithms; Euclidean decoding algorithms; discrete Fourier decoding
algorithms, Chien’s search algorithm, and Forney’s algorithm.

Tensor Balancing on Statistical Manifold

Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda
Comments: 19 pages, 5 figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

We solve tensor balancing, rescaling an Nth order nonnegative tensor by
multiplying (N – 1)th order N tensors so that every fiber sums to one. This
generalizes a fundamental process of matrix balancing used to compare matrices
in a wide range of applications from biology to economics. We present an
efficient balancing algorithm with quadratic convergence using Newton’s method
and show in numerical experiments that the proposed algorithm is several orders
of magnitude faster than existing ones. To theoretically prove the correctness
of the algorithm, we model tensors as probability distributions in a
statistical manifold and realize tensor balancing as projection onto a
submanifold. The key to our algorithm is that the gradient of the manifold,
used as a Jacobian matrix in Newton’s method, can be analytically obtained
using the M”obius inversion formula, the essential of combinatorial
mathematics. Our model is not limited to tensor balancing but has a wide
applicability as it includes various statistical and machine learning models
such as weighted DAGs and Boltzmann machines.

On the calculation of Fisher information for quantum parameter estimation based on the stochastic master equation

Beili Gong, Wei Cui
Comments: 8 pages, 6 figur
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

The Fisher information can be used to indicate the precision of parameter
estimation by the quantum Cram'{e}r-Rao inequality. This paper presents an
efficient numerical algorithm for the calculation of Fisher information based
on quantum weak measurement. According to the quantum stochastic master
equation, the Fisher information is expressed in the form of log-likelihood
functions. Three main methods are employed in this algorithm: (i) we use the
numerical differentiation approach to calculate the derivative of the
log-likelihood function; (ii) we randomly generate a series of parameters of
interest by the Metropolis Hastings (MH) algorithm; and (iii) the values of
expectation can be approximated by the Markov chain Monte Carlo (MCMC)
integration. Finally, as an example to testify the feasibility of the proposed
algorithm, we consider the dissipation rates of the open quantum system as
unknown parameters that need to be estimated. We show that the Fisher
information can reach a precision close to the Heisenberg limit in the weak
coupling condition. This again demonstrates the effectiveness of the new
algorithm.

Upper bounds on the smallest size of a saturating set in projective planes and spaces of even dimension

Daniele Bartoli, Alexander Davydov, Massimo Giulietti, Stefano Marcugini, Fernanda Pambianco
Comments: 14 pages, 34 references, 1 figure
Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

In a projective plane (Pi_{q}) (not necessarily Desarguesian) of order (q),
a point subset (mathcal{S}) is saturating (or dense) if any point of
(Pi_{q}setminus mathcal{S}) is collinear with two points in (mathcal{S}).
Modifying an approach of [31], we proved the following upper bound on the
smallest size (s(2,q)) of a saturating set in (Pi_{q}): egin{equation*}
s(2,q)leq sqrt{(q+1)left(3ln q+lnln q
+lnfrac{3}{4}
ight)}+sqrt{frac{q}{3ln q}}+3. end{equation*} The bound
holds for all q, not necessarily large.

By using inductive constructions, upper bounds on the smallest size of a
saturating set in the projective space (mathrm{PG}(N,q)) with even dimension
(N) are obtained.

All the results are also stated in terms of linear covering codes.

Nonparanormal Information Estimation

Shashank Singh, Barnabás Pøczos
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)

We study the problem of using i.i.d. samples from an unknown multivariate
probability distribution (p) to estimate the mutual information of (p). This
problem has recently received attention in two settings: (1) where (p) is
assumed to be Gaussian and (2) where (p) is assumed only to lie in a large
nonparametric smoothness class. Estimators proposed for the Gaussian case
converge in high dimensions when the Gaussian assumption holds, but are
brittle, failing dramatically when (p) is not Gaussian. Estimators proposed for
the nonparametric case fail to converge with realistic sample sizes except in
very low dimensions. As a result, there is a lack of robust mutual information
estimators for many realistic data. To address this, we propose estimators for
mutual information when (p) is assumed to be a nonparanormal (a.k.a., Gaussian
copula) model, a semiparametric compromise between Gaussian and nonparametric
extremes. Using theoretical bounds and experiments, we show these estimators
strike a practical balance between robustness and scaling with dimensionality.

Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography

Mario Milicevic, Chen Feng, Lei M. Zhang, P. Glenn Gulak
Comments: 20 pages, 16 figures, 4 tables, 74 references
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

The speed at which two remote parties can exchange secret keys over a
fixed-length fiber-optic cable in continuous-variable quantum key distribution
(CV-QKD) is currently limited by the computational complexity of
post-processing algorithms for key reconciliation. Multi-edge low-density
parity-check (LDPC) codes with low code rates and long block lengths were
proposed for CV-QKD, in order to extend the maximum reconciliation distance
between the two remote parties. Key reconciliation over multiple dimensions has
been shown to further improve the error-correction performance of multi-edge
LDPC codes in CV-QKD, thereby increasing both the secret key rate and distance.
However, the computational complexity of LDPC decoding for long block lengths
on the order of 10^6 bits remains a challenge. This work introduces a
quasi-cyclic code construction for multi-edge LDPC codes that is highly
suitable for hardware-accelerated decoding on a modern graphics processing unit
(GPU). When combined with an 8-dimensional reconciliation scheme, the LDPC
decoder achieves a raw decoding throughput of 1.72Mbit/s and an information
throughput of 7.16Kbit/s using an NVIDIA GeForce GTX 1080 GPU, at a maximum
distance of 164km with an effective secret key rate of 8.24×10^{-7} bits/pulse
for a rate 0.02 multi-edge code with block length of 10^6 bits. For distances
beyond 130km, the decoder delivers an information throughput between 2745x and
9138x higher than the maximum secret key rate achievable with a 1MHz light
source, thereby showing that LDPC decoding is no longer the computational
bottleneck in long-distance CV-QKD.

Holistic Small Cell Traffic Balancing across Licensed and Unlicensed Bands

Ursula Challita, Mahesh K. Marina
Comments: Accepted for publication at MSWiM 2016
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

Due to the dramatic growth in mobile data traffic on one hand and the
scarcity of the licensed spectrum on the other hand, mobile operators are
considering the use of unlicensed bands (especially those in 5 GHz) as
complementary spectrum for providing higher system capacity and better user
experience. This approach is currently being standardized by 3GPP under the
name of LTE Licensed-Assisted Access (LTE-LAA). In this paper, we take a
holistic approach for LTE-LAA small cell traffic balancing by jointly
optimizing the use of the licensed and unlicensed bands. We pose this traffic
balancing as an optimization problem that seeks proportional fair coexistence
of WiFi, small cell and macro cell users by adapting the transmission
probability of the LTE-LAA small cell in the licensed and unlicensed bands. The
motivation for this formulation is for the LTE-LAA small cell to switch between
or aggregate licensed and unlicensed bands depending on the
interference/traffic level and the number of active users in each band. We
derive a closed form solution for this optimization problem and additionally
propose a transmission mechanism for the operation of the LTE-LAA small cell on
both bands. Through numerical and simulation results, we show that our proposed
traffic balancing scheme, besides enabling better LTE-WiFi coexistence and
efficient utilization of the radio resources relative to the existing traffic
balancing scheme, also provides a better tradeoff between maximizing the total
network throughput and achieving fairness among all network flows compared to
alternative approaches.