IT博客汇 | arXiv Paper Daily: Fri, 24 Mar 2017

arXiv Paper Daily: Fri, 24 Mar 2017

我爱机器学习(52ml.net)发表于 2017-03-24 00:00:00

Neural and Evolutionary Computing

Classification-based RNN machine translation using GRUs

Ri Wang, Maysum Panju, Mahmood Gohari
Comments: 7 pages, 1 figure; graduate course research project
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

We report the results of our classification-based machine translation model,
built upon the framework of a recurrent neural network using gated recurrent
units. Unlike other RNN models that attempt to maximize the overall conditional
log probability of sentences against sentences, our model focuses a
classification approach of estimating the conditional probability of the next
word given the input sequence. This simpler approach using GRUs was hoped to be
comparable with more complicated RNN models, but achievements in this
implementation were modest and there remains a lot of room for improving this
classification approach.

Generative Adversarial Residual Pairwise Networks for One Shot Learning

Akshay Mehrotra, Ambedkar Dukkipati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep neural networks achieve unprecedented performance levels over many tasks
and scale well with large quantities of data, but performance in the low-data
regime and tasks like one shot learning still lags behind. While recent work
suggests many hypotheses from better optimization to more complicated network
structures, in this work we hypothesize that having a learnable and more
expressive similarity objective is an essential missing component. Towards
overcoming that, we propose a network design inspired by deep residual networks
that allows the efficient computation of this more expressive pairwise
similarity objective. Further, we argue that regularization is key in learning
with small amounts of data, and propose an additional generator network based
on the Generative Adversarial Networks where the discriminator is our residual
pairwise network. This provides a strong regularizer by leveraging the
generated data samples. The proposed model can generate plausible variations of
exemplars over unseen classes and outperforms strong discriminative baselines
for few shot classification tasks. Notably, our residual pairwise network
design outperforms previous state-of-theart on the challenging mini-Imagenet
dataset for one shot learning by getting over 55% accuracy for the 5-way
classification task over unseen classes.

The role of zero synapses in unsupervised feature learning

Haiping Huang
Comments: 5 pages, 4 figures
Subjects: Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Synapses in real neural circuits can take discrete values, including zero
(silent or potential) synapses. The computational role of zero synapses in
unsupervised feature learning of unlabeled noisy data is still unclear, yet
important to understand how the sparseness of synaptic activity is shaped
during learning and its relationship with receptive field formation. Here, we
formulate this kind of sparse feature learning by statistical mechanics
approach. We find that learning decreases the fraction of zero synapses, and
when the fraction decreases rapidly around a critical data size, the
intrinsically structured receptive field starts to develop. Further increasing
the data size refines the receptive field, while a very small fraction of zero
synapses remain to act as contour detectors. This feature is discovered not
only in a handwritten digits dataset, but also in retinal neural activity
measured in a natural movie stimuli experiment.

Adversarial synapses: Hebbian/anti-Hebbian learning optimizes min-max objectives

Cengiz Pehlevan, Anirvan Sengupta, Dmitri B. Chklovskii
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

A promising approach towards understanding neural networks is to view them as
implementations of online algorithms optimizing principled objectives. Existing
neural algorithms capturing both neural activity dynamics and synaptic weight
updates implement the same operation, either minimization or maximization of
the objective, with respect to each variable. Here, we derive neural networks
from principled min-max objectives: by minimizing with respect to neural
activity and feedforward synaptic weights, and maximizing with respect to
lateral synaptic weights. In turn, the min-max objectives are obtained via the
Hubbard-Stratonovich (HS) transform of similarity matching objectives. The
resulting networks perform dimensionality reduction of the input data resorting
only to biologically plausible local learning rules. The min-max nature of the
objective is reflected in the antagonism between Hebbian feedforward and
anti-Hebbian lateral learning in derived networks. We prove that the only
stable fixed points of the network dynamics correspond to the principal
subspace projection (PSP) or the principal subspace whitening (PSW). Finally,
from the min-max objectives we derive novel formulations of dimensionality
reduction using fractional matrix exponents.

Computer Vision and Pattern Recognition

Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

Alexander Richard, Hilde Kuehne, Juergen Gall
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an approach for weakly supervised learning of human actions. Given
a set of videos and an ordered list of the occurring actions, the goal is to
infer start and end frames of the related action classes within the video and
to train the respective action classifiers without any need for hand labeled
frame boundaries. To address this task, we propose a combination of a
discriminative representation of subactions, modeled by a recurrent neural
network, and a coarse probabilistic model to allow for a temporal alignment and
inference over long sequences. While this system alone already generates good
results, we show that the performance can be further improved by approximating
the number of subactions to the characteristics of the different action
classes. To this end, we adapt the number of subaction classes by iterating
realignment and reestimation during training. The proposed system is evaluated
on two benchmark datasets, the Breakfast and the Hollywood extended dataset,
showing a competitive performance on various weak learning tasks such as
temporal action segmentation and action alignment.

Quality Resilient Deep Neural Networks

Samuel Dodge, Lina Karam
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We study deep neural networks for classification of images with quality
distortions. We first show that networks fine-tuned on distorted data greatly
outperform the original networks when tested on distorted data. However,
fine-tuned networks perform poorly on quality distortions that they have not
been trained for. We propose a mixture of experts ensemble method that is
robust to different types of distortions. The “experts” in our model are
trained on a particular type of distortion. The output of the model is a
weighted sum of the expert models, where the weights are determined by a
separate gating network. The gating network is trained to predict optimal
weights for a particular distortion type and level. During testing, the network
is blind to the distortion level and type, yet can still assign appropriate
weights to the expert models. We additionally investigate weight sharing
methods for the mixture model and show that improved performance can be
achieved with a large reduction in the number of unique network parameters.

A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

Alexander Richard (1), Juergen Gall (1) ((1) University of Bonn)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The traditional bag-of-words approach has found a wide range of applications
in computer vision. The standard pipeline consists of a generation of a visual
vocabulary, a quantization of the features into histograms of visual words, and
a classification step for which usually a support vector machine in combination
with a non-linear kernel is used. Given large amounts of data, however, the
model suffers from a lack of discriminative power. This applies particularly
for action recognition, where the vast amount of video features needs to be
subsampled for unsupervised visual vocabulary generation. Moreover, the kernel
computation can be very expensive on large datasets. In this work, we propose a
recurrent neural network that is equivalent to the traditional bag-of-words
approach but enables for the application of discriminative training. The model
further allows to incorporate the kernel computation into the neural network
directly, solving the complexity issue and allowing to represent the complete
classification system within a single network. We evaluate our method on four
recent action recognition benchmarks and show that the conventional model as
well as sparse coding methods are outperformed.

Is Second-order Information Helpful for Large-scale Visual Recognition?

Peihua Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)

By stacking deeper layers of convolutions and nonlinearity, convolutional
networks (ConvNets) effectively learn from low-level to high-level features and
discriminative representations. Since the end goal of large-scale recognition
is to delineate the complex boundaries of thousands of classes in a
large-dimensional space, adequate exploration of feature distributions is
important for realizing full potentials of ConvNets. However, state-of-the-art
works concentrate only on deeper or wider architecture design, while rarely
exploring feature statistics higher than first-order. We take a step towards
addressing this problem. Our method consists in covariance pooling, instead of
the most commonly used first-order pooling, of high-level convolutional
features. The main challenges involved are robust covariance estimation given a
small sample of large-dimensional features and usage of the manifold structure
of covariance matrices. To address these challenges, we present a Matrix Power
Normalized Covariance (MPN-COV) method. We develop the forward and backward
propagation formulas regarding the nonlinear matrix functions such that MPN-COV
can be trained end-to-end. In addition, we analyze both qualitatively and
quantitatively its advantage over the widely used Log-Euclidean metric. On the
ImageNet 2012 validation set, by combining MPN-COV we achieve over 4%, 3% and
2.5% gains for AlexNet, VGG-M and VGG-16, respectively; integration of MPN-COV
into 50-layer ResNet outperforms ResNet-101 and is comparable to ResNet-152,
both of which use first-order, global average pooling.

Generative Adversarial Residual Pairwise Networks for One Shot Learning

Akshay Mehrotra, Ambedkar Dukkipati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Deep neural networks achieve unprecedented performance levels over many tasks
and scale well with large quantities of data, but performance in the low-data
regime and tasks like one shot learning still lags behind. While recent work
suggests many hypotheses from better optimization to more complicated network
structures, in this work we hypothesize that having a learnable and more
expressive similarity objective is an essential missing component. Towards
overcoming that, we propose a network design inspired by deep residual networks
that allows the efficient computation of this more expressive pairwise
similarity objective. Further, we argue that regularization is key in learning
with small amounts of data, and propose an additional generator network based
on the Generative Adversarial Networks where the discriminator is our residual
pairwise network. This provides a strong regularizer by leveraging the
generated data samples. The proposed model can generate plausible variations of
exemplars over unseen classes and outperforms strong discriminative baselines
for few shot classification tasks. Notably, our residual pairwise network
design outperforms previous state-of-theart on the challenging mini-Imagenet
dataset for one shot learning by getting over 55% accuracy for the 5-way
classification task over unseen classes.

Saliency-guided video classification via adaptively weighted learning

Yunzhen Zhao, Yuxin Peng
Comments: 6 pages, 1 figure, accepted by ICME 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Video classification is productive in many practical applications, and the
recent deep learning has greatly improved its accuracy. However, existing works
often model video frames indiscriminately, but from the view of motion, video
frames can be decomposed into salient and non-salient areas naturally. Salient
and non-salient areas should be modeled with different networks, for the former
present both appearance and motion information, and the latter present static
background information. To address this problem, in this paper, video saliency
is predicted by optical flow without supervision firstly. Then two streams of
3D CNN are trained individually for raw frames and optical flow on salient
areas, and another 2D CNN is trained for raw frames on non-salient areas. For
the reason that these three streams play different roles for each class, the
weights of each stream are adaptively learned for each class. Experimental
results show that saliency-guided modeling and adaptively weighted learning can
reinforce each other, and we achieve the state-of-the-art results.

Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

Timo von Marcard, Bodo Rosenhahn, Michael J. Black, Gerard Pons-Moll
Comments: 12 pages, Accepted at Eurographics 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

We address the problem of making human motion capture in the wild more
practical by using a small set of inertial sensors attached to the body. Since
the problem is heavily under-constrained, previous methods either use a large
number of sensors, which is intrusive, or they require additional video input.
We take a different approach and constrain the problem by: (i) making use of a
realistic statistical body model that includes anthropometric constraints and
(ii) using a joint optimization framework to fit the model to orientation and
acceleration measurements over multiple frames. The resulting tracker Sparse
Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to
the wrists, lower legs, back and head) and works for arbitrary human motions.
Experiments on the recently released TNT15 dataset show that, using the same
number of sensors, SIP achieves higher accuracy than the dataset baseline
without using any video data. We further demonstrate the effectiveness of SIP
on newly recorded challenging motions in outdoor scenarios such as climbing or
jumping over a wall.

A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

Mao Tan, Si-Ping Yuan, Yong-Xin Su
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

Text content can have different visual presentation ways with roughly similar
characters. While conventional text image retrieval depends on complex model of
OCR-based text recognition and text similarity detection, this paper proposes a
new learning-based approach to text image retrieval with the purpose of finding
out the original or similar text through a query text image. Firstly, features
of text images are extracted by the CNN network to obtain the deep visual
representations. Then, the dimension of CNN features is reduced by PCA method
to improve the efficiency of similarity detection. Based on that, an improved
similarity metrics with article theme relevance filtering is proposed to
improve the retrieval accuracy. In experimental procedure, we collect a group
of academic papers both including English and Chinese as the text database, and
cut them into pieces of text image. A text image with changed text content is
used as the query image, experimental results show that the proposed approach
has good ability to retrieve the original text content.

Nonlinear Spectral Image Fusion

Martin Benning, Michael Möller, Raz Z. Nossek, Martin Burger, Daniel Cremers, Guy Gilboa, Carola-Bibiane Schönlieb
Comments: 13 pages, 9 figures, submitted to SSVM conference proceedings 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)

In this paper we demonstrate that the framework of nonlinear spectral
decompositions based on total variation (TV) regularization is very well suited
for image fusion as well as more general image manipulation tasks. The
well-localized and edge-preserving spectral TV decomposition allows to select
frequencies of a certain image to transfer particular features, such as
wrinkles in a face, from one image to another. We illustrate the effectiveness
of the proposed approach in several numerical experiments, including a
comparison to the competing techniques of Poisson image editing, linear
osmosis, wavelet fusion and Laplacian pyramid fusion. We conclude that the
proposed spectral TV image decomposition framework is a valuable tool for semi-
and fully-automatic image editing and fusion.

Weakly Supervised Object Localization Using Things and Stuff Transfer

Miaojing Shi, Holger Caesar, Vittorio Ferrari
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose to help weakly supervised object localization for classes where
location annotations are not available, by transferring things and stuff
knowledge from a source set with available annotations. The source and target
classes might share similar appearance (e.g. bear fur is similar to cat fur) or
appear against similar background (e.g. horse and sheep appear against grass).
To exploit this, we acquire three types of knowledge from the source set: a
segmentation model trained on both thing and stuff classes; similarity
relations between target and source classes; and co-occurrence relations
between thing and stuff classes in the source. The segmentation model is used
to generate thing and stuff segmentation maps on a target image, while the
class similarity and co-occurrence knowledge help refining them. We then
incorporate these maps as new cues into a multiple instance learning framework
(MIL), propagating the transferred knowledge from the pixel level to the object
proposal level. In extensive experiments, we conduct our transfer from the
PASCAL Context dataset (source) to the ILSVRC, COCO and PASCAL VOC 2007
datasets (targets). We evaluate our transfer across widely different thing
classes, including some that are not similar in appearance, but appear against
similar background. The results demonstrate significant improvement over
standard MIL, and we outperform the state-of-the-art in the transfer setting.

Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders

Fengfu Li, Hong Qiao, Bo Zhang, Xuanyang Xi
Comments: 27 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Traditional image clustering methods take a two-step approach, feature
learning and clustering, sequentially. However, recent research results
demonstrated that combining the separated phases in a unified framework and
training them jointly can achieve a better performance. In this paper, we first
introduce fully convolutional auto-encoders for image feature learning and then
propose a unified clustering framework to learn image representations and
cluster centers jointly based on a fully convolutional auto-encoder and soft
(k)-means scores. At initial stages of the learning procedure, the
representations extracted from the auto-encoder may not be very discriminative
for latter clustering. We address this issue by adopting a boosted
discriminative distribution, where high score assignments are highlighted and
low score ones are de-emphasized. With the gradually boosted discrimination,
clustering assignment scores are discriminated and cluster purities are
enlarged. Experiments on several vision benchmark datasets show that our
methods can achieve a state-of-the-art performance.

Image-based Localization using Hourglass Networks

Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose an encoder-decoder convolutional neural network
(CNN) architecture for estimating camera pose (orientation and location) from a
single RGB-image. The architecture has a hourglass shape consisting of a chain
of convolution and up-convolution layers followed by a regression part. The
up-convolution layers are introduced to preserve the fine-grained information
of the input image. Following the common practice, we train our model in
end-to-end manner utilizing transfer learning from large scale classification
data. The experiments demonstrate the performance of the approach on data
exhibiting different lighting conditions, reflections, and motion blur. The
results indicate a clear improvement over the previous state-of-the-art even
when compared to methods that utilize sequence of test frames instead of a
single frame.

Robust SfM with Little Image Overlap

Yohann Salaun, Renaud Marlet, Pascal Monasse
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Usual Structure-from-Motion (SfM) techniques require at least trifocal
overlaps to calibrate cameras and reconstruct a scene. We consider here
scenarios of reduced image sets with little overlap, possibly as low as two
images at most seeing the same part of the scene. We propose a new method,
based on line coplanarity hypotheses, for estimating the relative scale of two
independent bifocal calibrations sharing a camera, without the need of any
trifocal information or Manhattan-world assumption. We use it to compute SfM in
a chain of up-to-scale relative motions. For accuracy, we however also make use
of trifocal information for line and/or point features, when present, relaxing
usual trifocal constraints. For robustness to wrong assumptions and mismatches,
we embed all constraints in a parameterless RANSAC-like approach. Experiments
show that we can calibrate datasets that previously could not, and that this
wider applicability does not come at the cost of inaccuracy.

Recurrent Multimodal Interaction for Referring Image Segmentation

Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we are interested in the problem of image segmentation given
natural language descriptions, i.e. referring expressions. Existing works
tackle this problem by first modeling images and sentences independently and
then segment images by combining these two types of representations. We argue
that learning word-to-image interaction is more native in the sense of jointly
modeling two modalities for the image segmentation task, and we propose
convolutional multimodal LSTM to encode the sequential interactions between
individual words, visual information, and spatial information. We show that our
proposed model outperforms the baseline model on benchmark datasets. In
addition, we analyze the intermediate output of the proposed multimodal LSTM
approach and empirically explains how this approach enforces a more effective
word-to-image interaction.

Planar Object Tracking in the Wild: A Benchmark

Pengpeng Liang, Yifan Wu, Haibin Ling
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Planar object tracking plays an important role in computer vision and related
fields. While several benchmarks have been constructed for evaluating
state-of-the-art algorithms, there is a lack of video sequences captured in the
wild rather than in constrained laboratory environment. In this paper, we
present a carefully designed planar object tracking benchmark containing 210
videos of 30 planar objects sampled in the natural environment. In particular,
for each object, we shoot seven videos involving various challenging factors,
namely scale change, rotation, perspective distortion, motion blur, occlusion,
out-of-view, and unconstrained. The ground truth is carefully annotated
semi-manually to ensure the quality. Moreover, eleven state-of-the-art
algorithms are evaluated on the benchmark using two evaluation metrics, with
detailed analysis provided for the evaluation results. We expect the proposed
benchmark to benefit future studies on planar object tracking.

Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

Swami Sankaranarayanan, Arpit Jain, Ser Nam Lim
Comments: 15 pages, 16 figures; Under review for conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Convolutional Neural Networks have been a subject of great importance over
the past decade and great strides have been made in their utility for producing
state of the art performance in many computer vision problems. However, the
behavior of deep networks is yet to be fully understood and is still an active
area of research. In this work, we present an intriguing behavior: pre-trained
CNNs can be made to improve their predictions by structurally perturbing the
input. We observe that these perturbations – referred as Guided Perturbations –
enable a trained network to improve its prediction performance without any
learning or change in network weights. We perform various ablative experiments
to understand how these perturbations affect the local context and feature
representations. Furthermore, we demonstrate that this idea can improve
performance of several existing approaches on semantic segmentation and scene
labeling tasks on the PASCAL VOC dataset and supervised classification tasks on
MNIST and CIFAR10 datasets.

Changing Fashion Cultures

Kaori Abe, Teppei Suzuki, Shunya Ueta, Akio Nakamura, Yutaka Satoh, Hirokatsu Kataoka
Comments: 9 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Multimedia (cs.MM)

The paper presents a novel concept that analyzes and visualizes worldwide
fashion trends. Our goal is to reveal cutting-edge fashion trends without
displaying an ordinary fashion style. To achieve the fashion-based analysis, we
created a new fashion culture database (FCDB), which consists of 76 million
geo-tagged images in 16 cosmopolitan cities. By grasping a fashion trend of
mixed fashion styles,the paper also proposes an unsupervised fashion trend
descriptor (FTD) using a fashion descriptor, a codeword vetor, and temporal
analysis. To unveil fashion trends in the FCDB, the temporal analysis in FTD
effectively emphasizes consecutive features between two different times. In
experiments, we clearly show the analysis of fashion trends and fashion-based
city similarity. As the result of large-scale data collection and an
unsupervised analyzer, the proposed approach achieves world-level fashion
visualization in a time series. The code, model, and FCDB will be publicly
available after the construction of the project page.

Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification

Qingshan Liu, Feng Zhou, Renlong Hang, Xiaotong Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel deep learning framework named
bidirectional-convolutional long short term memory (Bi-CLSTM) network to
automatically learn the spectral-spatial feature from hyperspectral images
(HSIs). In the network, the issue of spectral feature extraction is considered
as a sequence learning problem, and a recurrent connection operator across the
spectral domain is used to address it. Meanwhile, inspired from the widely used
convolutional neural network (CNN), a convolution operator across the spatial
domain is incorporated into the network to extract the spatial feature.
Besides, to sufficiently capture the spectral information, a bidirectional
recurrent connection is proposed. In the classification phase, the learned
features are concatenated into a vector and fed to a softmax classifier via a
fully-connected operator. To validate the effectiveness of the proposed
Bi-CLSTM framework, we compare it with several state-of-the-art methods,
including the CNN framework, on three widely used HSIs. The obtained results
show that Bi-CLSTM can improve the classification performance as compared to
other methods.

Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

3D face reconstruction is a fundamental Computer Vision problem of
extraordinary difficulty. Current systems often assume the availability of
multiple facial images (sometimes from the same subject) as input, and must
address a number of methodological challenges such as establishing dense
correspondences across large facial poses, expressions, and non-uniform
illumination. In general these methods require complex and inefficient
pipelines for model building and fitting. In this work, we propose to address
many of these limitations by training a Convolutional Neural Network (CNN) on
an appropriate dataset consisting of 2D images and 3D facial models or scans.
Our CNN works with just a single 2D facial image, does not require accurate
alignment nor establishes dense correspondence between images, works for
arbitrary facial poses and expressions, and can be used to reconstruct the
whole 3D facial geometry (including the non-visible parts of the face)
bypassing the construction (during training) and fitting (during testing) of a
3D Morphable Model. We achieve this via a simple CNN architecture that performs
direct regression of a volumetric representation of the 3D facial geometry from
a single 2D image. We also demonstrate how the related task of facial landmark
localization can be incorporated into the proposed framework and help improve
reconstruction quality, especially for the cases of large poses and facial
expressions. Testing code will be made available online, along with pre-trained
models this http URL

Cross-View Image Matching for Geo-localization in Urban Environments

Yicong Tian, Chen Chen, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we address the problem of cross-view image geo-localization.
Specifically, we aim to estimate the GPS location of a query street view image
by finding the matching images in a reference database of geo-tagged bird’s eye
view images, or vice versa. To this end, we present a new framework for
cross-view image geo-localization by taking advantage of the tremendous success
of deep convolutional neural networks (CNNs) in image classification and object
detection. First, we employ the Faster R-CNN to detect buildings in the query
and reference images. Next, for each building in the query image, we retrieve
the (k) nearest neighbors from the reference buildings using a Siamese network
trained on both positive matching image pairs and negative pairs. To find the
correct NN for each query building, we develop an efficient multiple nearest
neighbors matching method based on dominant sets. We evaluate the proposed
framework on a new dataset that consists of pairs of street view and bird’s eye
view images. Experimental results show that the proposed method achieves better
geo-localization accuracy than other approaches and is able to generalize to
images at unseen locations.

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

Huijuan Xu, Abir Das, Kate Saenko
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the problem of activity detection in continuous, untrimmed video
streams. This is a difficult task that requires extracting meaningful
spatio-temporal features to capture activities, accurately localizing the start
and end times of each activity, and also dealing with very large data volumes.
We introduce a new model, Region Convolutional 3D Network (R-C3D), which
encodes the video streams using a three-dimensional fully convolutional
network, then generates candidate temporal regions containing activities, and
finally classifies selected regions into specific activities. Computation is
saved due to the sharing of convolutional features between the proposal and the
classification pipelines. The entire model is trained end-to-end with jointly
optimized localization and classification losses. R-C3D is faster than existing
methods (569 frames per second on a single Titan X Maxwell GPU) and achieves
state-of-the-art results on THUMOS’14 (10\% absolute improvement). We further
demonstrate that our model is a general activity detection framework that does
not rely on assumptions about particular dataset properties by evaluating our
approach on ActivityNet and Charades.

Visually grounded learning of keyword prediction from untranscribed speech

Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
Comments: 5 pages, 3 figures, 5 tables
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

During language acquisition, infants have the benefit of visual cues to
ground spoken language. Robots similarly have access to audio and visual
sensors. Recent work has shown that images and spoken captions can be mapped
into a meaningful common space, allowing images to be retrieved using speech
and vice versa. In this setting of images paired with untranscribed spoken
captions, we consider whether computer vision systems can be used to obtain
textual labels for the speech. Concretely, we use an image-to-words multi-label
visual classifier to tag images with soft textual labels, and then train a
neural network to map from the speech to these soft targets. We show that the
resulting speech system is able to predict which words occur in an
utterance—acting as a spoken bag-of-words classifier—without seeing any
parallel speech and text. We find that the model often confuses semantically
related words, e.g. “man” and “person”, making it even more effective as a
semantic keyword spotter.

Recurrent and Contextual Models for Visual Question Answering

Abhijit Sharang, Eric Lau
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

We propose a series of recurrent and contextual neural network models for
multiple choice visual question answering on the Visual7W dataset. Motivated by
divergent trends in model complexities in the literature, we explore the
balance between model expressiveness and simplicity by studying incrementally
more complex architectures. We start with LSTM-encoding of input questions and
answers; build on this with context generation by LSTM-encodings of neural
image and question representations and attention over images; and evaluate the
diversity and predictive power of our models and the ensemble thereof. All
models are evaluated against a simple baseline inspired by the current
state-of-the-art, consisting of involving simple concatenation of bag-of-words
and CNN representations for the text and images, respectively. Generally, we
observe marked variation in image-reasoning performance between our models not
obvious from their overall performance, as well as evidence of dataset bias.
Our standalone models achieve accuracies up to (64.6\%), while the ensemble of
all models achieves the best accuracy of (66.67\%), within (0.5\%) of the
current state-of-the-art for Visual7W.

Robust Kronecker-Decomposable Component Analysis for Low Rank Modeling

Mehdi Bahri, Yannis Panagakis, Stefanos Zafeiriou
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)

Dictionary learning and component analysis are part of one of the most
well-studied and active research fields, at the intersection of signal and
image processing, computer vision, and statistical machine learning. In
dictionary learning, the current methods of choice are arguably K-SVD and its
variants, which learn a dictionary (i.e., a decomposition) for sparse coding
via Singular Value Decomposition. In robust component analysis, leading methods
derive from Principal Component Pursuit (PCP), which recovers a low-rank matrix
from sparse corruptions of unknown magnitude and support. While K-SVD is
sensitive to the presence of noise and outliers in the training set, PCP does
not provide a dictionary that respects the structure of the data (e.g.,
images), and requires expensive SVD computations when solved by convex
relaxation. In this paper, we introduce a new robust decomposition of images by
combining ideas from sparse dictionary learning and PCP. We propose a novel
Kronecker-decomposable component analysis which is robust to gross corruption,
can be used for low-rank modeling, and leverages separability to solve
significantly smaller problems. We design an efficient learning algorithm by
drawing links with a restricted form of tensor factorization. The effectiveness
of the proposed approach is demonstrated on real-world applications, namely
background subtraction and image denoising, by performing a thorough comparison
with the current state of the art.

Artificial Intelligence

Note Value Recognition for Rhythm Transcription Using a Markov Random Field Model for Musical Scores and Performances of Piano Music

Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon
Comments: 12 pages, 15 figures, version submitted to IEEE/ACM TASLP
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD)

This paper presents a statistical method for music transcription that can
estimate score times of note onsets and offsets from polyphonic MIDI
performance signals. Because performed note durations can deviate largely from
score-indicated values, previous methods had the problem of not being able to
accurately estimate offset score times (or note values) and thus could only
output incomplete musical scores. Based on observations that the pitch context
and onset score times are influential on the configuration of note values, we
construct a context-tree model that provides prior distributions of note values
using these features and combine it with a performance model in the framework
of Markov random fields. Evaluation results showed that our method reduces the
average error rate by around 40 percent compared to existing/simple methods. We
also confirmed that, in our model, the score model plays a more important role
than the performance model, and it automatically captures the voice structure
by unsupervised learning.

Diversification-Based Learning in Computing and Optimization

Fred Glover, Jin-Kao Hao
Comments: 17 pages
Subjects: Artificial Intelligence (cs.AI)

Diversification-Based Learning (DBL) derives from a collection of principles
and methods introduced in the field of metaheuristics that have broad
applications in computing and optimization. We show that the DBL framework goes
significantly beyond that of the more recent Opposition-based learning (OBL)
framework introduced in Tizhoosh (2005), which has become the focus of numerous
research initiatives in machine learning and metaheuristic optimization. We
unify and extend earlier proposals in metaheuristic search (Glover, 1997,
Glover and Laguna, 1997) to give a collection of approaches that are more
flexible and comprehensive than OBL for creating intensification and
diversification strategies in metaheuristic search. We also describe potential
applications of DBL to various subfields of machine learning and optimization.

An overview of embedding models of entities and relationships for knowledge base completion

Dat Quoc Nguyen
Comments: 8 pages, 4 tables. arXiv admin note: substantial text overlap with arXiv:1606.06461, arXiv:1606.08140
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Knowledge bases of real-world facts about entities and their relationships
are useful resources for a variety of natural language processing tasks.
However, because knowledge bases are typically incomplete, it is useful to be
able to perform knowledge base completion, i.e., predict whether a relationship
not in the knowledge base is likely to be true. This article presents an
overview of embedding models of entities and relationships for knowledge base
completion, with up-to-date experimental results on two standard evaluation
tasks of link prediction (i.e. entity prediction) and triple classification.

Resolving the Complexity of Some Fundamental Problems in Computational Social Choice

Palash Dey
Comments: Ph.D. Thesis
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

This thesis is in the area called computational social choice which is an
intersection area of algorithms and social choice theory.

Containment for Rule-Based Ontology-Mediated Queries

Pablo Barcelo, Gerald Berger, Andreas Pieris
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Many efforts have been dedicated to identifying restrictions on ontologies
expressed as tuple-generating dependencies (tgds), a.k.a. existential rules,
that lead to the decidability for the problem of answering ontology-mediated
queries (OMQs). This has given rise to three families of formalisms: guarded,
non-recursive, and sticky sets of tgds. In this work, we study the containment
problem for OMQs expressed in such formalisms, which is a key ingredient for
solving static analysis tasks associated with them. Our main contribution is
the development of specially tailored techniques for OMQ containment under the
classes of tgds stated above. This enables us to obtain sharp complexity bounds
for the problems at hand. We also apply our techniques to pinpoint the
complexity of problems associated with two emerging applications of OMQ
containment: distribution over components and UCQ rewritability of OMQs.

Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

Fanhua Shang, Yuanyuan Liu, James Cheng, Jiacheng Zhuo
Comments: 19 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Recently, research on accelerated stochastic gradient descent methods (e.g.,
SVRG) has made exciting progress (e.g., linear convergence for strongly convex
problems). However, the best-known methods (e.g., Katyusha) requires at least
two auxiliary variables and two momentum parameters. In this paper, we propose
a fast stochastic variance reduction gradient (FSVRG) method, in which we
design a novel update rule with the Nesterov’s momentum and incorporate the
technique of growing epoch size. FSVRG has only one auxiliary variable and one
momentum weight, and thus it is much simpler and has much lower per-iteration
complexity. We prove that FSVRG achieves linear convergence for strongly convex
problems and the optimal (mathcal{O}(1/T^2)) convergence rate for non-strongly
convex problems, where (T) is the number of outer-iterations. We also extend
FSVRG to directly solve the problems with non-smooth component functions, such
as SVM. Finally, we empirically study the performance of FSVRG for solving
various machine learning problems such as logistic regression, ridge
regression, Lasso and SVM. Our results show that FSVRG outperforms the
state-of-the-art stochastic methods, including Katyusha.

Unsupervised Basis Function Adaptation for Reinforcement Learning

Edward Barker, Charl Ras
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

When using reinforcement learning (RL) algorithms to evaluate a policy it is
common, given a large state space, to introduce some form of approximation
architecture for the value function (VF). The exact form of this architecture
can have a significant effect on the accuracy of the VF estimate, however, and
determining a suitable approximation architecture can often be a highly complex
task. Consequently there is a large amount of interest in the potential for
allowing RL algorithms to adaptively generate (i.e. to learn) approximation
architectures.

We investigate a method of adapting approximation architectures which uses
feedback regarding the frequency with which an agent has visited certain states
to guide which areas of the state space to approximate with greater detail. We
introduce an algorithm based upon this idea which adapts a state aggregation
approximation architecture on-line.

Assuming (S) states, we demonstrate theoretically that – provided the
following relatively non-restrictive assumptions are satisfied: (a) the number
of cells (X) in the state aggregation architecture is of order
(sqrt{S}ln{S}log_2{S}) or greater, (b) the policy and transition function
are close to deterministic, and (c) the prior for the transition function is
uniformly distributed – our algorithm can guarantee, assuming we use an
appropriate scoring function to measure VF error, error which is arbitrarily
close to zero as (S) becomes large. It is able to do this despite having only
(O(Xlog_2{S})) space complexity (and negligible time complexity). We conclude
by generating a set of empirical results which support the theoretical results.

Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

Convolutional Neural Networks have been a subject of great importance over
the past decade and great strides have been made in their utility for producing
state of the art performance in many computer vision problems. However, the
behavior of deep networks is yet to be fully understood and is still an active
area of research. In this work, we present an intriguing behavior: pre-trained
CNNs can be made to improve their predictions by structurally perturbing the
input. We observe that these perturbations – referred as Guided Perturbations –
enable a trained network to improve its prediction performance without any
learning or change in network weights. We perform various ablative experiments
to understand how these perturbations affect the local context and feature
representations. Furthermore, we demonstrate that this idea can improve
performance of several existing approaches on semantic segmentation and scene
labeling tasks on the PASCAL VOC dataset and supervised classification tasks on
MNIST and CIFAR10 datasets.

Information-theoretic Model Identification and Policy Search using Physics Engines with Application to Robotic Manipulation

Shaojun Zhu, Andrew Kimmel, Abdeslam Boularias
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

We consider the problem of a robot learning the mechanical properties of
objects through physical interaction with the object, and introduce a
practical, data-efficient approach for identifying the motion models of these
objects. The proposed method utilizes a physics engine, where the robot seeks
to identify the inertial and friction parameters of the object by simulating
its motion under different values of the parameters and identifying those that
result in a simulation which matches the observed real motions. The problem is
solved in a Bayesian optimization framework. The same framework is used for
both identifying the model of an object online and searching for a policy that
would minimize a given cost function according to the identified model.
Experimental results both in simulation and using a real robot indicate that
the proposed method outperforms state-of-the-art model-free reinforcement
learning approaches.

Supervised Typing of Big Graphs using Semantic Embeddings

Mayank Kejriwal, Pedro Szekely
Comments: 6 pages, to be published in Semantic Big Data Workshop at ACM, SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We propose a supervised algorithm for generating type embeddings in the same
semantic vector space as a given set of entity embeddings. The algorithm is
agnostic to the derivation of the underlying entity embeddings. It does not
require any manual feature engineering, generalizes well to hundreds of types
and achieves near-linear scaling on Big Graphs containing many millions of
triples and instances by virtue of an incremental execution. We demonstrate the
utility of the embeddings on a type recommendation task, outperforming a
non-parametric feature-agnostic baseline while achieving 15x speedup and
near-constant memory usage on a full partition of DBpedia. Using
state-of-the-art visualization, we illustrate the agreement of our
extensionally derived DBpedia type embeddings with the manually curated domain
ontology. Finally, we use the embeddings to probabilistically cluster about 4
million DBpedia instances into 415 types in the DBpedia ontology.

Information Retrieval

An overview of embedding models of entities and relationships for knowledge base completion

Knowledge bases of real-world facts about entities and their relationships
are useful resources for a variety of natural language processing tasks.
However, because knowledge bases are typically incomplete, it is useful to be
able to perform knowledge base completion, i.e., predict whether a relationship
not in the knowledge base is likely to be true. This article presents an
overview of embedding models of entities and relationships for knowledge base
completion, with up-to-date experimental results on two standard evaluation
tasks of link prediction (i.e. entity prediction) and triple classification.

Quantifying and suppressing ranking bias in a large citation network

Giacomo Vaccario, Matus Medo, Nicolas Wider, Manuel Sebastian Mariani
Comments: Main text (pp. 1-12) and Appendices (pp. 13-17)
Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP)

It is widely recognized that citation counts for papers from different fields
cannot be directly compared because different scientific fields adopt different
citation practices. Citation counts are also strongly biased by paper age since
older papers had more time to attract citations. Various procedures aim at
suppressing these biases and give rise to new normalized indicators, such as
the relative citation count. We use a large citation dataset from Microsoft
Academic Graph and a new statistical framework based on the Mahalanobis
distance to show that the rankings by well known indicators, including the
relative citation count and Google’s PageRank score, are significantly biased
by paper field and age. We propose a general normalization procedure motivated
by the (z)-score which produces much less biased rankings when applied to
citation count and PageRank score.

A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

Mao Tan, Si-Ping Yuan, Yong-Xin Su
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

Text content can have different visual presentation ways with roughly similar
characters. While conventional text image retrieval depends on complex model of
OCR-based text recognition and text similarity detection, this paper proposes a
new learning-based approach to text image retrieval with the purpose of finding
out the original or similar text through a query text image. Firstly, features
of text images are extracted by the CNN network to obtain the deep visual
representations. Then, the dimension of CNN features is reduced by PCA method
to improve the efficiency of similarity detection. Based on that, an improved
similarity metrics with article theme relevance filtering is proposed to
improve the retrieval accuracy. In experimental procedure, we collect a group
of academic papers both including English and Chinese as the text database, and
cut them into pieces of text image. A text image with changed text content is
used as the query image, experimental results show that the proposed approach
has good ability to retrieve the original text content.

Computation and Language

Visually grounded learning of keyword prediction from untranscribed speech

During language acquisition, infants have the benefit of visual cues to
ground spoken language. Robots similarly have access to audio and visual
sensors. Recent work has shown that images and spoken captions can be mapped
into a meaningful common space, allowing images to be retrieved using speech
and vice versa. In this setting of images paired with untranscribed spoken
captions, we consider whether computer vision systems can be used to obtain
textual labels for the speech. Concretely, we use an image-to-words multi-label
visual classifier to tag images with soft textual labels, and then train a
neural network to map from the speech to these soft targets. We show that the
resulting speech system is able to predict which words occur in an
utterance—acting as a spoken bag-of-words classifier—without seeing any
parallel speech and text. We find that the model often confuses semantically
related words, e.g. “man” and “person”, making it even more effective as a
semantic keyword spotter.

An embedded segmental k-means model for unsupervised segmentation and clustering of speech

Herman Kamper, Karen Livescu, Sharon Goldwater
Comments: 5 pages, 3 figures, 2 tables
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Unsupervised segmentation and clustering of unlabelled speech are core
problems in zero-resource speech processing. Most competitive approaches lie at
methodological extremes: some follow a Bayesian approach, defining
probabilistic models with convergence guarantees, while others opt for more
efficient heuristic techniques. Here we introduce an approximation to a
segmental Bayesian model that falls in between, with a clear objective function
but using hard clustering and segmentation rather than full Bayesian inference.
Like its Bayesian counterpart, this embedded segmental k-means model
(ES-KMeans) represents arbitrary-length word segments as fixed-dimensional
acoustic word embeddings. On English and Xitsonga data, ES-KMeans outperforms a
leading heuristic method in word segmentation, giving similar scores to the
Bayesian model while being five times faster with fewer hyperparameters.
However, there is a trade-off in cluster purity, with the Bayesian model’s
purer clusters yielding about 10% better unsupervised word error rates.

Recurrent and Contextual Models for Visual Question Answering

Abhijit Sharang, Eric Lau
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

We propose a series of recurrent and contextual neural network models for
multiple choice visual question answering on the Visual7W dataset. Motivated by
divergent trends in model complexities in the literature, we explore the
balance between model expressiveness and simplicity by studying incrementally
more complex architectures. We start with LSTM-encoding of input questions and
answers; build on this with context generation by LSTM-encodings of neural
image and question representations and attention over images; and evaluate the
diversity and predictive power of our models and the ensemble thereof. All
models are evaluated against a simple baseline inspired by the current
state-of-the-art, consisting of involving simple concatenation of bag-of-words
and CNN representations for the text and images, respectively. Generally, we
observe marked variation in image-reasoning performance between our models not
obvious from their overall performance, as well as evidence of dataset bias.
Our standalone models achieve accuracies up to (64.6\%), while the ensemble of
all models achieves the best accuracy of (66.67\%), within (0.5\%) of the
current state-of-the-art for Visual7W.

An overview of embedding models of entities and relationships for knowledge base completion

Knowledge bases of real-world facts about entities and their relationships
are useful resources for a variety of natural language processing tasks.
However, because knowledge bases are typically incomplete, it is useful to be
able to perform knowledge base completion, i.e., predict whether a relationship
not in the knowledge base is likely to be true. This article presents an
overview of embedding models of entities and relationships for knowledge base
completion, with up-to-date experimental results on two standard evaluation
tasks of link prediction (i.e. entity prediction) and triple classification.

Rapid-Rate: A Framework for Semi-supervised Real-time Sentiment Trend Detection in Unstructured Big Data

Vineet John
Subjects: Computation and Language (cs.CL)

Commercial establishments like restaurants, service centres and retailers
have several sources of customer feedback about products and services, most of
which need not be as structured as rated reviews provided by services like
Yelp, or Amazon, in terms of sentiment conveyed.

For instance, Amazon provides a fine-grained score on a numeric scale for
product reviews.

Some sources, however, like social media (Twitter, Facebook), mailing lists
(Google Groups) and forums (Quora) contain text data that is much more
voluminous, but unstructured and unlabelled.

It might be in the best interests of a business establishment to assess the
general sentiment towards their brand on these platforms as well.

This text could be pipelined into a system with a built-in prediction model,
with the objective of generating real-time graphs on opinion and sentiment
trends.

Although such tasks like the one described about have been explored with
respect to document classification problems in the past, the implementation
described in this paper, by virtue of learning a continuous function rather
than a discrete one, offers a lot more depth of insight as compared to document
classification approaches.

This study aims to explore the validity of such a continuous function
predicting model to quantify sentiment about an entity, without the additional
overhead of manual labelling, and computational preprocessing & feature
extraction.

This research project also aims to design and implement a re-usable document
regression pipeline as a framework, Rapid-Ratecite{rapid_rate}, that can be
used to predict document scores in real-time.

Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

Jean-Benoit Delbrouck, Stephane Dupont
Comments: Submitted to ICLR Workshop 2017
Subjects: Computation and Language (cs.CL)

In state-of-the-art Neural Machine Translation, an attention mechanism is
used during decoding to enhance the translation. At every step, the decoder
uses this mechanism to focus on different parts of the source sentence to
gather the most useful information before outputting its target word. Recently,
the effectiveness of the attention mechanism has also been explored for
multimodal tasks, where it becomes possible to focus both on sentence parts and
image regions. Approaches to pool two modalities usually include element-wise
product, sum or concatenation. In this paper, we evaluate the more advanced
Multimodal Compact Bilinear pooling method, which takes the outer product of
two vectors to combine the attention features for the two modalities. This has
been previously investigated for visual question answering. We try out this
approach for multimodal image caption translation and show improvements
compared to basic combination methods.

Sequential Recurrent Neural Networks for Language Modeling

Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow
Comments: published (INTERSPEECH 2016), 5 pages, 3 figures, 4 tables
Subjects: Computation and Language (cs.CL)

Feedforward Neural Network (FNN)-based language models estimate the
probability of the next word based on the history of the last N words, whereas
Recurrent Neural Networks (RNN) perform the same task based only on the last
word and some context information that cycles in the network. This paper
presents a novel approach, which bridges the gap between these two categories
of networks. In particular, we propose an architecture which takes advantage of
the explicit, sequential enumeration of the word history in FNN structure while
enhancing each word representation at the projection layer through recurrent
context information that evolves in the network. The context integration is
performed using an additional word-dependent weight matrix that is also learned
during the training. Extensive experiments conducted on the Penn Treebank (PTB)
and the Large Text Compression Benchmark (LTCB) corpus showed a significant
reduction of the perplexity when compared to state-of-the-art feedforward as
well as recurrent neural network architectures.

A network of deep neural networks for distant speech recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Despite the remarkable progress recently made in distant speech recognition,
state-of-the-art technology still suffers from a lack of robustness, especially
when adverse acoustic conditions characterized by non-stationary noises and
reverberation are met. A prominent limitation of current systems lies in the
lack of matching and communication between the various technologies involved in
the distant speech recognition process. The speech enhancement and speech
recognition modules are, for instance, often trained independently. Moreover,
the speech enhancement normally helps the speech recognizer, but the output of
the latter is not commonly used, in turn, to improve the speech enhancement. To
address both concerns, we propose a novel architecture based on a network of
deep neural networks, where all the components are jointly trained and better
cooperate with each other thanks to a full communication scheme between them.
Experiments, conducted using different datasets, tasks and acoustic conditions,
revealed that the proposed framework can overtake other competitive solutions,
including recent joint training approaches.

Supervised Typing of Big Graphs using Semantic Embeddings

We propose a supervised algorithm for generating type embeddings in the same
semantic vector space as a given set of entity embeddings. The algorithm is
agnostic to the derivation of the underlying entity embeddings. It does not
require any manual feature engineering, generalizes well to hundreds of types
and achieves near-linear scaling on Big Graphs containing many millions of
triples and instances by virtue of an incremental execution. We demonstrate the
utility of the embeddings on a type recommendation task, outperforming a
non-parametric feature-agnostic baseline while achieving 15x speedup and
near-constant memory usage on a full partition of DBpedia. Using
state-of-the-art visualization, we illustrate the agreement of our
extensionally derived DBpedia type embeddings with the manually curated domain
ontology. Finally, we use the embeddings to probabilistically cluster about 4
million DBpedia instances into 415 types in the DBpedia ontology.

Dynamic Bernoulli Embeddings for Language Evolution

Maja Rudolph, David Blei
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL)

Word embeddings are a powerful approach for unsupervised analysis of
language. Recently, Rudolph et al. (2016) developed exponential family
embeddings, which cast word embeddings in a probabilistic framework. Here, we
develop dynamic embeddings, building on exponential family embeddings to
capture how the meanings of words change over time. We use dynamic embeddings
to analyze three large collections of historical texts: the U.S. Senate
speeches from 1858 to 2009, the history of computer science ACM abstracts from
1951 to 2014, and machine learning papers on the Arxiv from 2007 to 2015. We
find dynamic embeddings provide better fits than classical embeddings and
capture interesting patterns about how language changes.

Distributed, Parallel, and Cluster Computing

Sparse geometries handling in lattice-Boltzmann method implementation for graphic processors

Tadeusz Tomczak, Roman G. Szafran
Comments: 14 pages, 10 figures, sent to IEEE Transactions on Parallel and Distributed Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We describe a high-performance implementation of the lattice-Boltzmann method
(LBM) for sparse geometries on graphic processors. In our implementation we
cover the whole geometry with a uniform mesh of small tiles and carry out
calculations for each tile independently with a proper data synchronization at
tile edges. For this method we provide both the theoretical analysis of
complexity and the results for real implementations for 2D and 3D geometries.
Based on the theoretical model, we show that tiles offer significantly smaller
bandwidth overhead than solutions based on indirect addressing. For
2-dimensional lattice arrangements a reduction of memory usage is also
possible, though at the cost of diminished performance. We reached the
performance of 682 MLUPS on GTX Titan (72\% of peak theoretical memory
bandwidth) for D3Q19 lattice arrangement and double precision data.

Learning

Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features

Pantelis Bouboulis, Symeon Chouvardas, Sergios Theodoridis
Comments: submitted paper
Subjects: Learning (cs.LG)

We present a novel diffusion scheme for online kernel-based learning over
networks. So far, a major drawback of any online learning algorithm, operating
in a reproducing kernel Hilbert space (RKHS), is the need for updating a
growing number of parameters as time iterations evolve. Besides complexity,
this leads to an increased need of communication resources, in a distributed
setting. In contrast, the proposed method approximates the solution as a
fixed-size vector (of larger dimension than the input space) using Random
Fourier Features. This paves the way to use standard linear combine-then-adapt
techniques. To the best of our knowledge, this is the first time that a
complete protocol for distributed online learning in RKHS is presented.
Conditions for asymptotic convergence and boundness of the networkwise regret
are also provided. The simulated tests illustrate the performance of the
proposed scheme.

Failures of Deep Learning

Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Subjects: Learning (cs.LG)

In recent years, Deep Learning has become the go-to solution for a broad
range of applications, often outperforming state-of-the-art. However, it is
important, for both theoreticians and practitioners, to gain a deeper
understanding of the difficulties and limitations associated with common
approaches and algorithms. We describe four families of problems for which some
of the commonly used existing algorithms fail or suffer significant difficulty.
We illustrate the failures through practical experiments, and provide
theoretical insights explaining their source, and how they might be remedied.

Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning

Recently, research on accelerated stochastic gradient descent methods (e.g.,
SVRG) has made exciting progress (e.g., linear convergence for strongly convex
problems). However, the best-known methods (e.g., Katyusha) requires at least
two auxiliary variables and two momentum parameters. In this paper, we propose
a fast stochastic variance reduction gradient (FSVRG) method, in which we
design a novel update rule with the Nesterov’s momentum and incorporate the
technique of growing epoch size. FSVRG has only one auxiliary variable and one
momentum weight, and thus it is much simpler and has much lower per-iteration
complexity. We prove that FSVRG achieves linear convergence for strongly convex
problems and the optimal (mathcal{O}(1/T^2)) convergence rate for non-strongly
convex problems, where (T) is the number of outer-iterations. We also extend
FSVRG to directly solve the problems with non-smooth component functions, such
as SVM. Finally, we empirically study the performance of FSVRG for solving
various machine learning problems such as logistic regression, ridge
regression, Lasso and SVM. Our results show that FSVRG outperforms the
state-of-the-art stochastic methods, including Katyusha.

Unsupervised Basis Function Adaptation for Reinforcement Learning

Edward Barker, Charl Ras
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

When using reinforcement learning (RL) algorithms to evaluate a policy it is
common, given a large state space, to introduce some form of approximation
architecture for the value function (VF). The exact form of this architecture
can have a significant effect on the accuracy of the VF estimate, however, and
determining a suitable approximation architecture can often be a highly complex
task. Consequently there is a large amount of interest in the potential for
allowing RL algorithms to adaptively generate (i.e. to learn) approximation
architectures.

We investigate a method of adapting approximation architectures which uses
feedback regarding the frequency with which an agent has visited certain states
to guide which areas of the state space to approximate with greater detail. We
introduce an algorithm based upon this idea which adapts a state aggregation
approximation architecture on-line.

Assuming (S) states, we demonstrate theoretically that – provided the
following relatively non-restrictive assumptions are satisfied: (a) the number
of cells (X) in the state aggregation architecture is of order
(sqrt{S}ln{S}log_2{S}) or greater, (b) the policy and transition function
are close to deterministic, and (c) the prior for the transition function is
uniformly distributed – our algorithm can guarantee, assuming we use an
appropriate scoring function to measure VF error, error which is arbitrarily
close to zero as (S) becomes large. It is able to do this despite having only
(O(Xlog_2{S})) space complexity (and negligible time complexity). We conclude
by generating a set of empirical results which support the theoretical results.

Random Features for Compositional Kernels

Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer
Subjects: Learning (cs.LG)

We describe and analyze a simple random feature scheme (RFS) from prescribed
compositional kernels. The compositional kernels we use are inspired by the
structure of convolutional neural networks and kernels. The resulting scheme
yields sparse and efficiently computable features. Each random feature can be
represented as an algebraic expression over a small number of (random) paths in
a composition tree. Thus, compositional random features can be stored
compactly. The discrete nature of the generation process enables de-duplication
of repeated features, further compacting the representation and increasing the
diversity of the embeddings. Our approach complements and can be combined with
previous random feature schemes.

Faster Reinforcement Learning Using Active Simulators

Vikas Jain, Theja Tulabandhula
Comments: 12 pages and 4 figures
Subjects: Learning (cs.LG)

In this work, we propose several online methods to build a emph{learning
curriculum} from a given set of target-task-specific training tasks in order to
speed up reinforcement learning (RL). These methods can decrease the total
training time needed by an RL agent compared to training on the target task
from scratch. Unlike traditional transfer learning, we consider creating a
sequence from several training tasks in order to provide the most benefit in
terms of reducing the total time to train.

Our methods utilize the learning trajectory of the agent on the curriculum
tasks seen so far to decide which tasks to train on next. An attractive feature
of our methods is that they are weakly coupled to the choice of the RL
algorithm as well as the transfer learning method. Further, when there is
domain information available, our methods can incorporate such knowledge to
further speed up the learning. We experimentally show that these methods can be
used to obtain suitable learning curricula that speed up the overall training
time on two different domains.

Randomized Kernel Methods for Least-Squares Support Vector Machines

M. Andrecut
Comments: 16 pages, 6 figures
Journal-ref: Int. J. Mod. Phys. C, 28, 1750015 (2017)
Subjects: Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

The least-squares support vector machine is a frequently used kernel method
for non-linear regression and classification tasks. Here we discuss several
approximation algorithms for the least-squares support vector machine
classifier. The proposed methods are based on randomized block kernel matrices,
and we show that they provide good accuracy and reliable scaling for
multi-class classification problems with relatively large data sets. Also, we
present several numerical experiments that illustrate the practical
applicability of the proposed methods.

Fake News Mitigation via Point Process Based Intervention

Mehrdad Farajtabar, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le Song, Hongyuan Zha
Comments: Point Process, Hawkes Process, Social Networks, Intervention and Control, Reinforcement Learning
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

We propose the first multistage intervention framework that tackles fake news
in social networks by combining reinforcement learning with a point process
network activity model. The spread of fake news and mitigation events within
the network is modeled by a multivariate Hawkes process with additional
exogenous control terms. By choosing a feature representation of states,
defining mitigation actions and constructing reward functions to measure the
effectiveness of mitigation activities, we map the problem of fake news
mitigation into the reinforcement learning framework. We develop a policy
iteration method unique to the multivariate networked point process, with the
goal of optimizing the actions for maximal total reward under budget
constraints. Our method shows promising performance in real-time intervention
experiments on a Twitter network to mitigate a surrogate fake news campaign,
and outperforms alternatives on synthetic datasets.

Learning to Partition using Score Based Compatibilities

Arun Rajkumar, Koyel Mukherjee, Theja Tulabandhula
Comments: Appears in the Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017)
Subjects: Learning (cs.LG)

We study the problem of learning to partition users into groups, where one
must learn the compatibilities between the users to achieve optimal groupings.
We define four natural objectives that optimize for average and worst case
compatibilities and propose new algorithms for adaptively learning optimal
groupings. When we do not impose any structure on the compatibilities, we show
that the group formation objectives considered are (NP) hard to solve and we
either give approximation guarantees or prove inapproximability results. We
then introduce an elegant structure, namely that of extit{intrinsic scores},
that makes many of these problems polynomial time solvable. We explicitly
characterize the optimal groupings under this structure and show that the
optimal solutions are related to emph{homophilous} and emph{heterophilous}
partitions, well-studied in the psychology literature. For one of the four
objectives, we show (NP) hardness under the score structure and give a
(frac{1}{2}) approximation algorithm for which no constant approximation was
known thus far. Finally, under the score structure, we propose an online low
sample complexity PAC algorithm for learning the optimal partition. We
demonstrate the efficacy of the proposed algorithm on synthetic and real world
datasets.

An embedded segmental k-means model for unsupervised segmentation and clustering of speech

Herman Kamper, Karen Livescu, Sharon Goldwater
Comments: 5 pages, 3 figures, 2 tables
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Unsupervised segmentation and clustering of unlabelled speech are core
problems in zero-resource speech processing. Most competitive approaches lie at
methodological extremes: some follow a Bayesian approach, defining
probabilistic models with convergence guarantees, while others opt for more
efficient heuristic techniques. Here we introduce an approximation to a
segmental Bayesian model that falls in between, with a clear objective function
but using hard clustering and segmentation rather than full Bayesian inference.
Like its Bayesian counterpart, this embedded segmental k-means model
(ES-KMeans) represents arbitrary-length word segments as fixed-dimensional
acoustic word embeddings. On English and Xitsonga data, ES-KMeans outperforms a
leading heuristic method in word segmentation, giving similar scores to the
Bayesian model while being five times faster with fewer hyperparameters.
However, there is a trade-off in cluster purity, with the Bayesian model’s
purer clusters yielding about 10% better unsupervised word error rates.

A learning-based approach to text image retrieval: using CNN features and improved similarity metrics

Mao Tan, Si-Ping Yuan, Yong-Xin Su
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG)

Text content can have different visual presentation ways with roughly similar
characters. While conventional text image retrieval depends on complex model of
OCR-based text recognition and text similarity detection, this paper proposes a
new learning-based approach to text image retrieval with the purpose of finding
out the original or similar text through a query text image. Firstly, features
of text images are extracted by the CNN network to obtain the deep visual
representations. Then, the dimension of CNN features is reduced by PCA method
to improve the efficiency of similarity detection. Based on that, an improved
similarity metrics with article theme relevance filtering is proposed to
improve the retrieval accuracy. In experimental procedure, we collect a group
of academic papers both including English and Chinese as the text database, and
cut them into pieces of text image. A text image with changed text content is
used as the query image, experimental results show that the proposed approach
has good ability to retrieve the original text content.

A network of deep neural networks for distant speech recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Despite the remarkable progress recently made in distant speech recognition,
state-of-the-art technology still suffers from a lack of robustness, especially
when adverse acoustic conditions characterized by non-stationary noises and
reverberation are met. A prominent limitation of current systems lies in the
lack of matching and communication between the various technologies involved in
the distant speech recognition process. The speech enhancement and speech
recognition modules are, for instance, often trained independently. Moreover,
the speech enhancement normally helps the speech recognizer, but the output of
the latter is not commonly used, in turn, to improve the speech enhancement. To
address both concerns, we propose a novel architecture based on a network of
deep neural networks, where all the components are jointly trained and better
cooperate with each other thanks to a full communication scheme between them.
Experiments, conducted using different datasets, tasks and acoustic conditions,
revealed that the proposed framework can overtake other competitive solutions,
including recent joint training approaches.

Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders

Fengfu Li, Hong Qiao, Bo Zhang, Xuanyang Xi
Comments: 27 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Traditional image clustering methods take a two-step approach, feature
learning and clustering, sequentially. However, recent research results
demonstrated that combining the separated phases in a unified framework and
training them jointly can achieve a better performance. In this paper, we first
introduce fully convolutional auto-encoders for image feature learning and then
propose a unified clustering framework to learn image representations and
cluster centers jointly based on a fully convolutional auto-encoder and soft
(k)-means scores. At initial stages of the learning procedure, the
representations extracted from the auto-encoder may not be very discriminative
for latter clustering. We address this issue by adopting a boosted
discriminative distribution, where high score assignments are highlighted and
low score ones are de-emphasized. With the gradually boosted discrimination,
clustering assignment scores are discriminated and cluster purities are
enlarged. Experiments on several vision benchmark datasets show that our
methods can achieve a state-of-the-art performance.

The role of zero synapses in unsupervised feature learning

Synapses in real neural circuits can take discrete values, including zero
(silent or potential) synapses. The computational role of zero synapses in
unsupervised feature learning of unlabeled noisy data is still unclear, yet
important to understand how the sparseness of synaptic activity is shaped
during learning and its relationship with receptive field formation. Here, we
formulate this kind of sparse feature learning by statistical mechanics
approach. We find that learning decreases the fraction of zero synapses, and
when the fraction decreases rapidly around a critical data size, the
intrinsically structured receptive field starts to develop. Further increasing
the data size refines the receptive field, while a very small fraction of zero
synapses remain to act as contour detectors. This feature is discovered not
only in a handwritten digits dataset, but also in retinal neural activity
measured in a natural movie stimuli experiment.

Perspective: Energy Landscapes for Machine Learning

Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales
Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

Machine learning techniques are being increasingly used as flexible
non-linear fitting and prediction tools in the physical sciences. Fitting
functions that exhibit multiple solutions as local minima can be analysed in
terms of the corresponding machine learning landscape. Methods to explore and
visualise molecular potential energy landscapes can be applied to these machine
learning landscapes to gain new insight into the solution space involved in
training and the nature of the corresponding predictions. In particular, we can
define quantities analogous to molecular structure, thermodynamics, and
kinetics, and relate these emergent properties to the structure of the
underlying landscape. This Perspective aims to describe these analogies with
examples from recent applications, and suggest avenues for new
interdisciplinary research.

Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains

Tegjyot Singh Sethi, Mehmed Kantardzic
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

While modern day web applications aim to create impact at the civilization
level, they have become vulnerable to adversarial activity, where the next
cyber-attack can take any shape and can originate from anywhere. The increasing
scale and sophistication of attacks, has prompted the need for a data driven
solution, with machine learning forming the core of many cybersecurity systems.
Machine learning was not designed with security in mind, and the essential
assumption of stationarity, requiring that the training and testing data follow
similar distributions, is violated in an adversarial domain. In this paper, an
adversary’s view point of a classification based system, is presented. Based on
a formal adversarial model, the Seed-Explore-Exploit framework is presented,
for simulating the generation of data driven and reverse engineering attacks on
classifiers. Experimental evaluation, on 10 real world datasets and using the
Google Cloud Prediction Platform, demonstrates the innate vulnerability of
classifiers and the ease with which evasion can be carried out, without any
explicit information about the classifier type, the training data or the
application domain. The proposed framework, algorithms and empirical
evaluation, serve as a white hat analysis of the vulnerabilities, and aim to
foster the development of secure machine learning frameworks.

SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning

Bo Wang, Daniele Ramazzotti, Luca De Sano, Junjie Zhu, Emma Pierson, Serafim Batzoglou
Subjects: Genomics (q-bio.GN); Learning (cs.LG); Quantitative Methods (q-bio.QM)

Motivation: We here present SIMLR (Single-cell Interpretation via
Multi-kernel LeaRning), an open-source tool that implements a novel framework
to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR
can be effectively used to perform tasks such as dimension reduction,
clustering, and visualization of heterogeneous populations of cells. SIMLR was
benchmarked against state-of-the-art methods for these three tasks on several
public datasets, showing it to be scalable and capable of greatly improving
clustering performance, as well as providing valuable insights by making the
data more interpretable via better a visualization. Availability and
Implementation: SIMLR is available on GitHub in both R and MATLAB
implementations. Furthermore, it is also available as an R package on
bioconductor.org. Contact: bowang87@stanford.edu or
daniele.ramazzotti@stanford.edu Supplementary Information: Supplementary data
are available at Bioinformatics online.

Classification-based RNN machine translation using GRUs

Ri Wang, Maysum Panju, Mahmood Gohari
Comments: 7 pages, 1 figure; graduate course research project
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

We report the results of our classification-based machine translation model,
built upon the framework of a recurrent neural network using gated recurrent
units. Unlike other RNN models that attempt to maximize the overall conditional
log probability of sentences against sentences, our model focuses a
classification approach of estimating the conditional probability of the next
word given the input sequence. This simpler approach using GRUs was hoped to be
comparable with more complicated RNN models, but achievements in this
implementation were modest and there remains a lot of room for improving this
classification approach.

Information-theoretic Model Identification and Policy Search using Physics Engines with Application to Robotic Manipulation

Shaojun Zhu, Andrew Kimmel, Abdeslam Boularias
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

We consider the problem of a robot learning the mechanical properties of
objects through physical interaction with the object, and introduce a
practical, data-efficient approach for identifying the motion models of these
objects. The proposed method utilizes a physics engine, where the robot seeks
to identify the inertial and friction parameters of the object by simulating
its motion under different values of the parameters and identifying those that
result in a simulation which matches the observed real motions. The problem is
solved in a Bayesian optimization framework. The same framework is used for
both identifying the model of an object online and searching for a policy that
would minimize a given cost function according to the identified model.
Experimental results both in simulation and using a real robot indicate that
the proposed method outperforms state-of-the-art model-free reinforcement
learning approaches.

Information Theory

Combinatorial Entropy Encoding

Abu Bakar Siddique
Subjects: Information Theory (cs.IT)

This paper proposes a novel entropy encoding technique for lossless data
compression. Representing a message string by its lexicographic index in the
permutations of its symbols results in a compressed version matching Shannon
entropy of the message. Commercial data compression standards make use of
Huffman or arithmetic coding at some stage of the compression process. In the
proposed method, like arithmetic coding entire string is mapped to an integer
but is not based on fractional numbers. Unlike both arithmetic and Huffman
coding no prior entropy model of the source is required. Simple intuitive
algorithm based on multinomial coefficients is developed for entropy encoding
that adoptively uses low number of bits for more frequent symbols. Correctness
of the algorithm is demonstrated by an example.

Cooperative Binning for Semi-deterministic Channels with Non-causal State Information

Ido B. Gattegno, Haim H. Permuter, Shlomo Shamai (Shitz), AyferÖzgür
Subjects: Information Theory (cs.IT)

The capacity of the semi-deterministic relay channel (SD-RC) with non-causal
channel state information (CSI) only at the encoder and decoder is
characterized. The capacity is achieved by a scheme based on
cooperative-bin-forward. This scheme allows cooperation between the transmitter
and the relay without the need to decode a part of the message by the relay.
The transmission is divided into blocks and each deterministic output of the
channel (observed by the relay) is mapped to a bin. The bin index is used by
the encoder and the relay to choose the cooperation codeword in the next
transmission block. In causal settings the cooperation is independent of the
state. In emph{non-causal} settings dependency between the relay’s
transmission and the state can increase the transmission rates. The encoder
implicitly conveys partial state information to the relay. In particular, it
uses the states of the next block and selects a cooperation codeword
accordingly and the relay transmission depends on the cooperation codeword and
therefore also on the states. We also consider the multiple access channel with
partial cribbing as a semi-deterministic channel. The capacity region of this
channel with non-causal CSI is achieved by the new scheme. Examining the result
in several cases, we introduce a new problem of a point-to-point (PTP) channel
where the state is provided to the transmitter by a state encoder.
Interestingly, even though the CSI is also available at the receiver, we
provide an example which shows that the capacity with non-causal CSI at the
state encoder is strictly larger than the capacity with causal CSI.

Generalised Twisted Gabidulin Codes

Sven Puchinger, Johan Rosenkilde né Nielsen, John Sheekey
Comments: 10 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
Subjects: Information Theory (cs.IT)

We present a new family of maximum rank distance (MRD) codes. The new class
contains codes that are neither equivalent to a generalised Gabidulin nor to a
twisted Gabidulin code, the only two known general constructions of linear MRD
codes.

A Flexible Wideband Millimeter-Wave Channel Sounder with Local Area and NLOS to LOS Transition Measurements

George R. MacCartney Jr., Hangsong Yan, Shu Sun, Theodore S. Rappaport
Comments: To be published in 2017 IEEE International Conference on Communications (ICC), Paris, France, May 2017
Subjects: Information Theory (cs.IT)

This paper presents a millimeter-wave (mmWave) wideband sliding correlator
channel sounder with flexibility to operate at various transmission rates. The
channel sounder can transmit and receive up to 1 GHz of RF null-to-null
bandwidth while measuring a 2 nanosecond multipath time resolution. The system
architecture takes advantage of field-programmable gate arrays (FPGAs),
high-speed digital-to-analog converters (DACs), and low phase noise Rubidium
(Rb) references for synchronization. Using steerable narrowbeam antennas, the
system can measure up to 185 dB of path loss. The channel sounder is used to
measure the directional and omnidirectional received power as a receiver
transitions from line-of-sight to non-line-of-sight conditions down an urban
canyon. A 25 dB drop in omnidirectional received power was observed as the
receiver transitioned from line-of-sight (LOS) conditions to deeply shadowed
non-LOS (NLOS) conditions. The channel sounder was also used to study signal
variation and spatial consistency for a local set of receiver locations
arranged in a cluster spanning a 5 m x 10 m local area, where the
omnidirectional received power in LOS and NLOS environments is found to be
relatively stable with standard deviations of received power of 2.2 dB and 4.3
dB, respectively. This work shows that when implementing beamforming at the
transmitter at mmWave, the omnidirectional received power over a local area has
little fluctuation among receiver locations separated by a few to several
meters.

Indoor Office Wideband Penetration Loss Measurements at 73 GHz

Jacqueline Ryan, George R. MacCartney Jr., Theodore S. Rappaport
Comments: To be published in 2017 IEEE International Conference on Communications Workshop (ICCW), Paris, France, May 2017
Subjects: Information Theory (cs.IT)

This paper presents millimeter wave (mmWave) penetration loss measurements
and analysis at 73 GHz using a wideband sliding correlator channel sounder in
an indoor office environment. Penetration loss was measured using a carefully
controlled measurement setup for many common indoor building materials such as
glass doors, glass windows, closet doors, steel doors, and whiteboard writing
walls. Measurements were conducted using narrowbeam transmitter (TX) and
receiver (RX) horn antennas that were boresight-aligned with a test material
between the antennas. Overall, 21 different locations were measured for 6
different materials such that the same type of material was tested in at least
two locations in order to characterize the effect of penetration loss for
materials with similar composition. As shown here, attenuation through common
materials ranged between 0.8 dB/cm and 9.9 dB/cm for co-polarized antennas,
while cross-polarized antennas exhibited similar attenuation for most
materials, but up to 23.4 dB/cm of attenuation for others. The penetration loss
results presented here are useful for site-specific planning tools that will
model indoor mmWave networks, without the need for expensive measurement
campaigns.

Improved Power Decoding of One-Point Hermitian Codes

Sven Puchinger, Irene Bouw, Johan Rosenkilde né Nielsen
Comments: 9 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
Subjects: Information Theory (cs.IT)

We propose a new partial decoding algorithm for one-point Hermitian codes
that can decode up to the same number of errors as the Guruswami–Sudan
decoder. Simulations suggest that it has a similar failure probability as the
latter one. The algorithm is based on a recent generalization of the power
decoding algorithm for Reed–Solomon codes and does not require an expensive
root-finding step. In addition, it promises improvements for decoding
interleaved Hermitian codes.

Security Considerations of a New Helper Data Scheme for Physical Unclonable Functions

Sven Müelich, Sven Puchinger, Martin Bossert
Comments: 10 pages, submitted to the International Workshop on Coding and Cryptography (WCC) 2017
Subjects: Information Theory (cs.IT)

When Physical Unclonable Functions (PUFs) are used for cryptographic
purposes, error correction in combination with a helper data scheme is an
essential component due to the fuzzy nature of a PUF. All known schemes require
both a code and additional helper data to recover PUF responses. Recently,
M”uelich and Bossert proposed a scheme that only requires a code. In this
paper, we answer two open questions about the new scheme. First, we show that
under certain assumptions, the code construction within the scheme is possible
with high probability. Second, it is proven that after constructing the code,
the security level of the scheme is known and is sufficiently large with high
probability.

Nonconvex Regularization Based Sparse Recovery and Demixing with Application to Color Image Inpainting

Fei Wen, Lasith Adhikari, Ling Pei, Roummel F. Marcia, Peilin Liu, Robert C. Qiu
Comments: 13 pages, 9 figures
Subjects: Information Theory (cs.IT)

This work addresses the recovery and demixing problem of signals that are
sparse in some general dictionary. Involved applications include source
separation, image inpainting, super-resolution, and restoration of signals
corrupted by clipping, saturation, impulsive noise, or narrowband interference.
We employ the (ell_q)-norm ((0 le q < 1)) for sparsity inducing and propose a
constrained (ell_q)-minimization formulation for the recovery and demixing
problem. This nonconvex formulation is approximately solved by two efficient
first-order algorithms based on proximal coordinate descent and alternative
direction method of multipliers (ADMM), respectively. The new algorithms are
convergent in the nonconvex case under some mild conditions and scale well for
high-dimensional problems. A convergence condition of the new ADMM algorithm
has been derived. Furthermore, extension of the two algorithms for
multi-channels joint recovery has been presented, which can further exploit the
joint sparsity pattern among multi-channel signals. Various numerical
experiments showed that the new algorithms can achieve considerable performance
gain over the (ell_1)-regularized algorithms.

Efficient and Robust Recovery of Sparse Signal and Image Using Generalized Nonconvex Regularization

Fei Wen, Yuan Yang, Ling Pei, Wenxian Yu, Peilin Liu
Comments: 13 pages, 5 figures
Subjects: Information Theory (cs.IT)

This work addresses the robust reconstruction problem of a sparse signal from
compressed measurements. We propose a robust formulation for sparse
reconstruction which employs the (ell_1)-norm as the loss function for the
residual error and utilizes a generalized nonconvex penalty for sparsity
inducing. The (ell_1)-loss is less sensitive to outliers in the measurements
than the popular (ell_2)-loss, while the nonconvex penalty has the capability
of ameliorating the bias problem of the popular convex LASSO penalty and thus
can yield more accurate recovery. To solve this nonconvex and nonsmooth
minimization formulation efficiently, we propose a first-order algorithm based
on alternating direction method of multipliers (ADMM). A smoothing strategy on
the (ell_1)-loss function has been used in deriving the new algorithm to make
it convergent. Further, a sufficient condition for the convergence of the new
algorithm has been provided for generalized nonconvex regularization. In
comparison with several state-of-the-art algorithms, the new algorithm showed
better performance in numerical experiments in recovering sparse signals and
compressible images. The new algorithm scales well for large-scale problems, as
often encountered in image processing.

Toward Traffic Patterns in High-speed Railway Communication Systems: Power Allocation and Antenna Selection

Jiaxun Lu, Ke Xiong, Xuhong Chen, Pingyi Fan
Comments: 30 pages, 9 figures, submitted for journal publication
Subjects: Information Theory (cs.IT)

In high-speed railway (HSR) communication systems, distributed antenna is
usually employed to support frequent handover and enhance the signal to noise
ratio to user equipments. In this case, dynamic time-domain power allocation
and antenna selection (PAWAS) could be jointly optimized to improve the system
performances. This paper consider this problem in such a simple way where
dynamic switching between multiple-input-multiple-output (MIMO) and
single-input-multiple-output (SIMO) is allowed and exclusively utilized, while
the channel states and traffic demand are taken into account. The channel
states includes sparse and rich scattering terrains, and the traffic patterns
includes delay-sensitive and delay-insensitive as well as hybrid. Some
important results are obtained in theory. In sparse scattering terrains, for
delay-sensitive traffic, the PAWAS can be viewed as the generalization of
channel-inversion associated with transmit antenna selection. On the contrary,
for delay-insensitive traffic, the power allocation with MIMO can be viewed as
channel-inversion, but with SIMO, it is traditional water-filling. For the
hybrid traffic, the PAWAS can be partitioned as delay-sensitive and
delay-insensitive parts by some specific strategies. In rich scattering
terrains, the corresponding PAWAS is derived by some amendments in sparse
scattering terrains and similar results are then presented.

A New Decoding for Polynomial Residue Codes with Non-Pairwise Coprime Moduli

Li Xiao, Xiang-Gen Xia
Comments: 4 pages
Subjects: Information Theory (cs.IT)

This paper revisits polynomial residue codes with non-pairwise coprime moduli
by presenting a new decoding, called the minimum degree-weighted distance
decoding. This decoding is based on the degree-weighted distance and different
from the traditional minimum Hamming distance decoding. It is shown that for
the two types of minimum distance decoders, i.e., the minimum degree-weighted
distance decoding and the minimum Hamming distance decoding, one is not
absolutely stronger than the other, but they can complement each other from
different points of view.

Robust Polynomial Reconstruction via Chinese Remainder Theorem in the Presence of Small Degree Residue Errors

Li Xiao, Xiang-Gen Xia
Comments: 5 pages
Subjects: Information Theory (cs.IT)

Based on unique decoding of the polynomial residue code with non-pairwise
coprime moduli, a polynomial with degree less than that of the least common
multiple (lcm) of all the moduli can be accurately reconstructed when the
number of residue errors is less than half the minimum distance of the code.
However, once the number of residue errors is beyond half the minimum distance
of the code, the unique decoding may fail and lead to a large reconstruction
error. In this paper, assuming that all the residues are allowed to have errors
with small degrees, we consider how to reconstruct the polynomial as accurately
as possible in the sense that a reconstructed polynomial is obtained with only
the last ( au) number of coefficients being possibly erroneous, when the
residues are affected by errors with degrees upper bounded by ( au). In this
regard, we first propose a multi-level robust Chinese remainder theorem (CRT)
for polynomials, namely, a trade-off between the dynamic range of the degree of
the polynomial to be reconstructed and the residue error bound ( au) is
formulated. Furthermore, a simple closed-form reconstruction algorithm is also
proposed.

Proof of Correspondence between Keys and Encoding Maps in an Authentication Code

Juan Carlos Ku-Cauich, Guillermo Morales-Luna, Horacio Tapia-Recillas
Subjects: Number Theory (math.NT); Information Theory (cs.IT)

In a former paper the authors introduced two new systematic authentication
codes based on the Gray map over a Galois ring. In this paper, it is proved the
one-to-one onto correspondence between keys and encoding maps for the second
introduced authentication code.

Extending Differential Privacy for Treating Dependent Records via Information Theory

Genqiang Wu, Xianyao Xia, Yeping He
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

Differential privacy is a strong privacy notion based on indistinguishability
of outputs of two neighboring datasets, which represent two states of one’s
information is within or without of a dataset. However, when facing dependent
records, the representation would lose its foundation. Motivated by the
observation, we introduce a variant of differential privacy notion based on the
influence of outputs to an individual’s inputs. The new notion accurately
captures the the weakening of the dependent records to the privacy guarantee of
differential privacy.

Our new privacy notion gets on well with the differential privacy. When the
individuals are independent, the differential privacy model would be one
spatial case of our model. When the individuals are dependent, the group
privacy method to achieve differential privacy in dependent case can be used to
achieve new privacy model. This fits in well with the results of differential
privacy.

Finally, our new privacy model fits in well with the information theory. We
prove that if one mechanism satisfies the new privacy notion, the mutual
information of one individual to the mechanism’s outputs would be upper bounded
by a small valued. This implies that the rationality of our new model is based
on the information theory.