IT博客汇 | arXiv Paper Daily: Thu, 29 Sep 2016

arXiv Paper Daily: Thu, 29 Sep 2016

我爱机器学习(52ml.net)发表于 2016-09-29 00:00:00

Neural and Evolutionary Computing

Training a Probabilistic Graphical Model with Resistive Switching Electronic Synapses

S. Burc Eryilmaz, Emre Neftci, Siddharth Joshi, SangBum Kim, Matthew BrightSky, Hsiang-Lan Lung, Chung Lam, Gert Cauwenberghs, H.-S. Philip Wong
Comments: submitted to IEEE Transactions on Electron Devices
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

Current large scale implementations of deep learning and data mining require
thousands of processors, massive amounts of off-chip memory, and consume
gigajoules of energy. Emerging memory technologies such as nanoscale
two-terminal resistive switching memory devices offer a compact, scalable and
low power alternative that permits on-chip co-located processing and memory in
fine-grain distributed parallel architecture. Here we report first use of
resistive switching memory devices for implementing and training a Restricted
Boltzmann Machine (RBM), a generative probabilistic graphical model as a key
component for unsupervised learning in deep networks. We experimentally
demonstrate a 45-synapse RBM realized with 90 resistive switching phase change
memory (PCM) elements trained with a bio-inspired variant of the Contrastive
Divergence (CD) algorithm, implementing Hebbian and anti-Hebbian weight
updates. The resistive PCM devices show a two-fold to ten-fold reduction in
error rate in a missing pixel pattern completion task trained over 30 epochs,
compared to untrained case. Measured programming energy consumption is 6.1 nJ
per epoch with the resistive switching PCM devices, a factor of ~150 times
lower than conventional processor-memory systems. We analyze and discuss the
dependence of learning performance on cycle-to-cycle variations as well as
number of gradual levels in the PCM analog memory devices.

Learning Genomic Representations to Predict Clinical Outcomes in Cancer

Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
Comments: ICLR 2016 Workshop Track- May 2nd 2016 International Conference on Learning Representations
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Genomics are rapidly transforming medical practice and basic biomedical
research, providing insights into disease mechanisms and improving therapeutic
strategies, particularly in cancer. The ability to predict the future course of
a patient’s disease from high-dimensional genomic profiling will be essential
in realizing the promise of genomic medicine, but presents significant
challenges for state-of-the-art survival analysis methods. In this abstract we
present an investigation in learning genomic representations with neural
networks to predict patient survival in cancer. We demonstrate the advantages
of this approach over existing survival analysis methods using brain tumor
data.

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
Comments: Submitted to ICASSP 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

Recurrent neural networks (RNNs) have shown clear superiority in sequence
modeling, particularly the ones with gated units, such as long short-term
memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
behind the remarkable performance remain unclear in many applications, e.g.,
automatic speech recognition (ASR). This paper employs visualization techniques
to study the behavior of LSTM and GRU when performing speech recognition tasks.
Our experiments show some interesting patterns in the gated memory, and some of
them have inspired simple yet effective modifications on the network structure.
We report two of such modifications: (1) lazy cell update in LSTM, and (2)
shortcut connections for residual learning. Both modifications lead to more
comprehensible and powerful networks.

Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification

Franck Dernoncourt, Ji Young Lee
Comments: Accepted as a conference paper at IEEE SLT 2016. The two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Systems based on artificial neural networks (ANNs) have achieved
state-of-the-art results in many natural language processing tasks. Although
ANNs do not require manually engineered features, ANNs have many
hyperparameters to be optimized. The choice of hyperparameters significantly
impacts models’ performances. However, the ANN hyperparameters are typically
chosen by manual, grid, or random search, which either requires expert
experiences or is computationally expensive. Recent approaches based on
Bayesian optimization using Gaussian processes (GPs) is a more systematic way
to automatically pinpoint optimal or near-optimal machine learning
hyperparameters. Using a previously published ANN model yielding
state-of-the-art results for dialog act classification, we demonstrate that
optimizing hyperparameters using GP further improves the results, and reduces
the computational time by a factor of 4 compared to a random search. Therefore
it is a useful technique for tuning ANN models to yield the best performances
for natural language processing tasks.

Computer Vision and Pattern Recognition

A Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D Landmarks on a Single Image

Ruiqi Zhao, Yan Wang, Aleix Martinez
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Three-dimensional shape reconstruction of 2D landmark points on a single
image is a hallmark of human vision, but is a task that has been proven
difficult for computer vision algorithms. We define a feed-forward deep neural
network algorithm that can reconstruct 3D shapes from 2D landmark points almost
perfectly (i.e., with extremely small reconstruction errors), even when these
2D landmarks are from a single image. Our experimental results show an
improvement of up to two-fold over state-of-the-art computer vision algorithms;
3D shape reconstruction of human faces is given at a reconstruction error <
.004, cars at .0022, human bodies at .022, and highly-deformable flags at an
error of .0004. Our algorithm was also a top performer at the 2016 3D Face
Alignment in the Wild Challenge competition (done in conjunction with the
European Conference on Computer Vision, ECCV) that required the reconstruction
of 3D face shape from a single image. The derived algorithm can be trained in a
couple hours and testing runs at more than 1, 000 frames/s on an i7 desktop. We
also present an innovative data augmentation approach that allows us to train
the system efficiently with small number of samples. And the system is robust
to noise (e.g., imprecise landmark points) and missing data (e.g., occluded or
undetected landmark points).

Deep Architectures for Face Attributes

Tobi Baumgartner, Jack Culpepper
Comments: 11 pages, 2 figures, accepted in “Workshop on Facial Informatics in conjunction with ACCV ’16”
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We train a deep convolutional neural network to perform identity
classification using a new dataset of public figures annotated with age,
gender, ethnicity and emotion labels, and then fine-tune it for attribute
classification. An optimal sharing pattern of computational resources within
this network is determined by experiment, requiring only 1 G flops to produce
all predictions. Rather than fine-tune by relearning weights in one additional
layer after the penultimate layer of the identity network, we try several
different depths for each attribute. We find that prediction of age and emotion
is improved by fine-tuning from earlier layers onward, presumably because
deeper layers are progressively invariant to non-identity related changes in
the input.

Graph Based Convolutional Neural Network

Michael Edwards, Xianghua Xie
Comments: 11 pages, accepted into BMVC 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The benefit of localized features within the regular domain has given rise to
the use of Convolutional Neural Networks (CNNs) in machine learning, with great
proficiency in the image classification. The use of CNNs becomes problematic
within the irregular spatial domain due to design and convolution of a kernel
filter being non-trivial. One solution to this problem is to utilize graph
signal processing techniques and the convolution theorem to perform
convolutions on the graph of the irregular domain to obtain feature map
responses to learnt filters. We propose graph convolution and pooling operators
analogous to those in the regular domain. We also provide gradient calculations
on the input data and spectral filters, which allow for the deep learning of an
irregular spatial domain problem. Signal filters take the form of spectral
multipliers, applying convolution in the graph spectral domain. Applying smooth
multipliers results in localized convolutions in the spatial domain, with
smoother multipliers providing sharper feature maps. Algebraic Multigrid is
presented as a graph pooling method, reducing the resolution of the graph
through agglomeration of nodes between layers of the network. Evaluation of
performance on the MNIST digit classification problem in both the regular and
irregular domain is presented, with comparison drawn to standard CNN. The
proposed graph CNN provides a deep learning method for the irregular domains
present in the machine learning community, obtaining 94.23% on the regular
grid, and 94.96% on a spatially irregular subsampled MNIST.

A Discriminative Framework for Anomaly Detection in Large Videos

Allison Del Giorno, J. Andrew Bagnell, Martial Hebert
Comments: 14 pages without references, 16 pages with. 7 figures. Accepted to ECCV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We address an anomaly detection setting in which training sequences are
unavailable and anomalies are scored independently of temporal ordering.
Current algorithms in anomaly detection are based on the classical density
estimation approach of learning high-dimensional models and finding
low-probability events. These algorithms are sensitive to the order in which
anomalies appear and require either training data or early context assumptions
that do not hold for longer, more complex videos. By defining anomalies as
examples that can be distinguished from other examples in the same video, our
definition inspires a shift in approaches from classical density estimation to
simple discriminative learning. Our contributions include a novel framework for
anomaly detection that is (1) independent of temporal ordering of anomalies,
and (2) unsupervised, requiring no separate training sequences. We show that
our algorithm can achieve state-of-the-art results even when we adjust the
setting by removing training sequences from standard datasets.

Towards the effectiveness of Deep Convolutional Neural Network based Fast Random Forest Classifier

Mrutyunjaya Panda (Reader, P.G.Department of Computer Science and Applications, Utkal University), Vani Vihar (Bhubaneswar, India)
Comments: 11 pages, 9 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep Learning is considered to be a quite young in the area of machine
learning research, found its effectiveness in dealing complex yet high
dimensional dataset that includes but limited to images, text and speech etc.
with multiple levels of representation and abstraction. As there are a plethora
of research on these datasets by various researchers , a win over them needs
lots of attention. Careful setting of Deep learning parameters is of paramount
importance in order to avoid the overfitting unlike conventional methods with
limited parameter settings. Deep Convolutional neural network (DCNN) with
multiple layers of compositions and appropriate settings might be is an
efficient machine learning method that can outperform the conventional methods
in a great way. However, due to its slow adoption in learning, there are also
always a chance of overfitting during feature selection process, which can be
addressed by employing a regularization method called dropout. Fast Random
Forest (FRF) is a powerful ensemble classifier especially when the datasets are
noisy and when the number of attributes is large in comparison to the number of
instances, as is the case of Bioinformatics datasets. Several publicly
available Bioinformatics dataset, Handwritten digits recognition and Image
segmentation dataset are considered for evaluation of the proposed approach.
The excellent performance obtained by the proposed DCNN based feature selection
with FRF classifier on high dimensional datasets makes it a fast and accurate
classifier in comparison the state-of-the-art.

Understanding data augmentation for classification: when to warp?

Sebastien C. Wong, Adam Gatt, Victor Stamatescu, Mark D. McDonnell
Comments: 6 pages, 6 figures, DICTA 2016 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we investigate the benefit of augmenting data with
synthetically created samples when training a machine learning classifier. Two
approaches for creating additional training samples are data warping, which
generates additional samples through transformations applied in the data-space,
and synthetic over-sampling, which creates additional samples in feature-space.
We experimentally evaluate the benefits of data augmentation for a
convolutional backpropagation-trained neural network, a convolutional support
vector machine and a convolutional extreme learning machine classifier, using
the standard MNIST handwritten digit dataset. We found that while it is
possible to perform generic augmentation in feature-space, if plausible
transforms for the data are known then augmentation in data-space provides a
greater benefit for improving performance and reducing overfitting.

Video Summarization using Deep Semantic Features

Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
Comments: 16 pages, the 13th Asian Conference on Computer Vision (ACCV’16)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a video summarization technique for an Internet video to
provide a quick way to overview its content. This is a challenging problem
because finding important or informative parts of the original video requires
to understand its content. Furthermore the content of Internet videos is very
diverse, ranging from home videos to documentaries, which makes video
summarization much more tough as prior knowledge is almost not available. To
tackle this problem, we propose to use deep video features that can encode
various levels of content semantics, including objects, actions, and scenes,
improving the efficiency of standard video summarization techniques. For this,
we design a deep neural network that maps videos as well as descriptions to a
common semantic space and jointly trained it with associated pairs of videos
and descriptions. To generate a video summary, we extract the deep features
from each segment of the original video and apply a clustering-based
summarization technique to them. We evaluate our video summaries using the
SumMe dataset as well as baseline approaches. The results demonstrated the
advantages of incorporating our deep semantic features in a video summarization
technique.

Scalable Discrete Supervised Hash Learning with Asymmetric Matrix Factorization

Shifeng Zhang, Jianmin Li, Jinma Guo, Bo Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hashing method maps similar data to binary hashcodes with smaller hamming
distance, and it has received a broad attention due to its low storage cost and
fast retrieval speed. However, the existing limitations make the present
algorithms difficult to deal with large-scale datasets: (1) discrete
constraints are involved in the learning of the hash function; (2) pairwise or
triplet similarity is adopted to generate efficient hashcodes, resulting both
time and space complexity are greater than O(n^2). To address these issues, we
propose a novel discrete supervised hash learning framework which can be
scalable to large-scale datasets. First, the discrete learning procedure is
decomposed into a binary classifier learning scheme and binary codes learning
scheme, which makes the learning procedure more efficient. Second, we adopt the
Asymmetric Low-rank Matrix Factorization and propose the Fast Clustering-based
Batch Coordinate Descent method, such that the time and space complexity is
reduced to O(n). The proposed framework also provides a flexible paradigm to
incorporate with arbitrary hash function, including deep neural networks and
kernel methods. Experiments on large-scale datasets demonstrate that the
proposed method is superior or comparable with state-of-the-art hashing
algorithms.

A Fast Factorization-based Approach to Robust PCA

Chong Peng, Zhao Kang, Qiang Chen
Comments: ICDM 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Robust principal component analysis (RPCA) has been widely used for
recovering low-rank matrices in many data mining and machine learning problems.
It separates a data matrix into a low-rank part and a sparse part. The convex
approach has been well studied in the literature. However, state-of-the-art
algorithms for the convex approach usually have relatively high complexity due
to the need of solving (partial) singular value decompositions of large
matrices. A non-convex approach, AltProj, has also been proposed with lighter
complexity and better scalability. Given the true rank $r$ of the underlying
low rank matrix, AltProj has a complexity of $O(r^2dn)$, where $d imes n$ is
the size of data matrix. In this paper, we propose a novel factorization-based
model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound
of the true rank. Our method does not need the precise value of the true rank.
From extensive experiments, we observe that AltProj can work only when $r$ is
precisely known in advance; however, when the needed rank parameter $r$ is
specified to a value different from the true rank, AltProj cannot fully
separate the two parts while our method succeeds. Even when both work, our
method is about 4 times faster than AltProj. Our method can be used as a
light-weight, scalable tool for RPCA in the absence of the precise value of the
true rank.

YouTube-8M: A Large-Scale Video Classification Benchmark

Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many recent advancements in Computer Vision are attributed to large datasets.
Open-source software packages for Machine Learning and inexpensive commodity
hardware have reduced the barrier of entry for exploring novel approaches at
scale. It is possible to train models over millions of examples within a few
days. Although large-scale datasets exist for image understanding, such as
ImageNet, there are no comparable size video classification datasets.

In this paper, we introduce YouTube-8M, the largest multi-label video
classification dataset, composed of ~8 million videos (500K hours of video),
annotated with a vocabulary of 4800 visual entities. To get the videos and
their labels, we used a YouTube video annotation system, which labels videos
with their main topics. While the labels are machine-generated, they have
high-precision and are derived from a variety of human-based signals including
metadata and query click signals. We filtered the video labels (Knowledge Graph
entities) using both automated and manual curation strategies, including asking
human raters if the labels are visually recognizable. Then, we decoded each
video at one-frame-per-second, and used a Deep CNN pre-trained on ImageNet to
extract the hidden representation immediately prior to the classification
layer. Finally, we compressed the frame features and make both the features and
video-level labels available for download.

We trained various (modest) classification models on the dataset, evaluated
them using popular evaluation metrics, and report them as baselines. Despite
the size of the dataset, some of our models train to convergence in less than a
day on a single machine using TensorFlow. We plan to release code for training
a TensorFlow model and for computing metrics.

A Transportation $L^p$ Distance for Signal Analysis

Matthew Thorpe, Serim Park, Soheil Kolouri, Gustavo K. Rohde, Dejan Slepčev
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Transport based distances, such as the Wasserstein distance and earth mover’s
distance, have been shown to be an effective tool in signal and image analysis.
The success of transport based distances is in part due to their Lagrangian
nature which allows it to capture the important variations in many signal
classes. However these distances require the signal to be nonnegative and
normalized. Furthermore, the signals are considered as measures and compared by
redistributing (transporting) them, which does not directly take into account
the signal intensity. Here we study a transport-based distance, called the
$TL^p$ distance, that combines Lagrangian and intensity modelling and is
directly applicable to general, non-positive and multi-channelled signals. The
framework allows the application of existing numerical methods. We give an
overview of the basic properties of this distance and applications to
classification, with multi-channelled, non-positive one and two-dimensional
signals, and color transfer.

Task Specific Adversarial Cost Function

Antonia Creswell, Anil A. Bharath
Comments: Submitted to TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The cost function used to train a generative model should fit the purpose of
the model. If the model is intended for tasks such as generating perceptually
correct samples, it is beneficial to maximise the likelihood of a sample drawn
from the model, Q, coming from the same distribution as the training data, P.
This is equivalent to minimising the Kullback-Leibler (KL) distance, KL[Q||P].
However, if the model is intended for tasks such as retrieval or classification
it is beneficial to maximise the likelihood that a sample drawn from the
training data is captured by the model, equivalent to minimising KL[P||Q]. The
cost function used in adversarial training optimises the Jensen-Shannon entropy
which can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here,
we propose an alternative adversarial cost function which allows easy tuning of
the model for either task. Our task specific cost function is evaluated on a
dataset of hand-written characters in the following tasks: Generation,
retrieval and one-shot learning.

Learning to Push by Grasping: Using multiple tasks for effective learning

Lerrel Pinto, Abhinav Gupta
Comments: Under review at the International Conference on Robotics and Automation (ICRA) 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Recently, end-to-end learning frameworks are gaining prevalence in the field
of robot control. These frameworks input states/images and directly predict the
torques or the action parameters. However, these approaches are often critiqued
due to their huge data requirements for learning a task. The argument of the
difficulty in scalability to multiple tasks is well founded, since training
these tasks often require hundreds or thousands of examples. But do end-to-end
approaches need to learn a unique model for every task? Intuitively, it seems
that sharing across tasks should help since all tasks require some common
understanding of the environment. In this paper, we attempt to take the next
step in data-driven end-to-end learning frameworks: move from the realm of
task-specific models to joint learning of multiple robot tasks. In an
astonishing result we show that models with multi-task learning tend to perform
better than task-specific models trained with same amounts of data. For
example, a deep-network learned with 2.5K grasp and 2.5K push examples performs
better on grasping than a network trained on 5K grasp examples.

Understanding and Exploiting Object Interaction Landscapes

Sören Pirk, Vojtech Krs, Kaimo Hu, Suren Deepak Rajasekaran, Hao Kang, Bedrich Benes, Yusuke Yoshiyasu, Leonidas J. Guibas
Comments: 14 pages, 19 figures
Subjects: Graphics (cs.GR); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)

Interactions play a key role in understanding objects and scenes, for both
virtual and real world agents. We introduce a new general representation for
proximal interactions among physical objects that is agnostic to the type of
objects or interaction involved. The representation is based on tracking
particles on one of the participating objects and then observing them with
sensors appropriately placed in the interaction volume or on the interaction
surfaces. We show how to factorize these interaction descriptors and project
them into a particular participating object so as to obtain a new functional
descriptor for that object, its interaction landscape, capturing its observed
use in a spatio-temporal framework. Interaction landscapes are independent of
the particular interaction and capture subtle dynamic effects in how objects
move and behave when in functional use. Our method relates objects based on
their function, establishes correspondences between shapes based on functional
key points and regions, and retrieves peer and partner objects with respect to
an interaction.

Analyzing the Behavior of Visual Question Answering Models

Aishwarya Agrawal, Dhruv Batra, Devi Parikh
Comments: 13 pages, 20 figures; To appear in EMNLP 2016
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Recently, a number of deep-learning based models have been proposed for the
task of Visual Question Answering (VQA). The performance of most models is
clustered around 60-70%. In this paper we propose systematic methods to analyze
the behavior of these models as a first step towards recognizing their
strengths and weaknesses, and identifying the most fruitful directions for
progress. We analyze two models, one each from two major classes of VQA models
— with-attention and without-attention and show the similarities and
differences in the behavior of these models. We also analyze the winning entry
of the VQA Challenge 2016.

Our behavior analysis reveals that despite recent progress, today’s VQA
models are “myopic” (tend to fail on sufficiently novel instances), often “jump
to conclusions” (converge on a predicted answer after ‘listening’ to just half
the question), and are “stubborn” (do not change their answers across images).

Artificial Intelligence

Global Constraint Catalog, Volume II, Time-Series Constraints

Ekaterina Arafailova, Nicolas Beldiceanu, Rémi Douence, Mats Carlsson, Pierre Flener, María Andreína Francisco Rodríguez, Justin Pearson, Helmut Simonis
Comments: 2709 pages
Subjects: Artificial Intelligence (cs.AI)

First this report presents a restricted set of finite transducers used to
synthesise structural time-series constraints described by means of a
multi-layered function composition scheme. Second it provides the corresponding
synthesised catalogue of structural time-series constraints where each
constraint is explicitly described in terms of automata with accumulators.

Learning from the Hindsight Plan — Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Model predictive control (MPC) is a popular control method that has proved
effective for robotics, among other fields. MPC performs re-planning at every
time step. Re-planning is done with a limited horizon per computational and
real-time constraints and often also for robustness to potential model errors.
However, the limited horizon leads to suboptimal performance. In this work, we
consider the iterative learning setting, where the same task can be repeated
several times, and propose a policy improvement scheme for MPC. The main idea
is that between executions we can, offline, run MPC with a longer horizon,
resulting in a hindsight plan. To bring the next real-world execution closer to
the hindsight plan, our approach learns to re-shape the original cost function
with the goal of satisfying the following property: short horizon planning (as
realistic during real executions) with respect to the shaped cost should result
in mimicking the hindsight plan. This effectively consolidates long-term
reasoning into the short-horizon planning. We empirically evaluate our approach
in contact-rich manipulation tasks both in simulated and real environments,
such as peg insertion by a real PR2 robot.

Hierarchical Memory Networks for Answer Selection on Unknown Words

Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu, Bo Xu
Comments: 10 pages, to appear in COLING 2016
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recently, end-to-end memory networks have shown promising results on Question
Answering task, which encode the past facts into an explicit memory and perform
reasoning ability by making multiple computational steps on the memory.
However, memory networks conduct the reasoning on sentence-level memory to
output coarse semantic vectors and do not further take any attention mechanism
to focus on words, which may lead to the model lose some detail information,
especially when the answers are rare or unknown words. In this paper, we
propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
past facts into sentence-level memory and word-level memory respectively. Then,
(k)-max pooling is exploited following reasoning module on the sentence-level
memory to sample the (k) most relevant sentences to a question and feed these
sentences into attention mechanism on the word-level memory to focus the words
in the selected sentences. Finally, the prediction is jointly learned over the
outputs of the sentence-level reasoning module and the word-level attention
mechanism. The experimental results demonstrate that our approach successfully
conducts answer selection on unknown words and achieves a better performance
than memory networks.

A Fast Factorization-based Approach to Robust PCA

Chong Peng, Zhao Kang, Qiang Chen
Comments: ICDM 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Robust principal component analysis (RPCA) has been widely used for
recovering low-rank matrices in many data mining and machine learning problems.
It separates a data matrix into a low-rank part and a sparse part. The convex
approach has been well studied in the literature. However, state-of-the-art
algorithms for the convex approach usually have relatively high complexity due
to the need of solving (partial) singular value decompositions of large
matrices. A non-convex approach, AltProj, has also been proposed with lighter
complexity and better scalability. Given the true rank $r$ of the underlying
low rank matrix, AltProj has a complexity of $O(r^2dn)$, where $d imes n$ is
the size of data matrix. In this paper, we propose a novel factorization-based
model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound
of the true rank. Our method does not need the precise value of the true rank.
From extensive experiments, we observe that AltProj can work only when $r$ is
precisely known in advance; however, when the needed rank parameter $r$ is
specified to a value different from the true rank, AltProj cannot fully
separate the two parts while our method succeeds. Even when both work, our
method is about 4 times faster than AltProj. Our method can be used as a
light-weight, scalable tool for RPCA in the absence of the precise value of the
true rank.

Information Retrieval

Hierarchical Memory Networks for Answer Selection on Unknown Words

Recently, end-to-end memory networks have shown promising results on Question
Answering task, which encode the past facts into an explicit memory and perform
reasoning ability by making multiple computational steps on the memory.
However, memory networks conduct the reasoning on sentence-level memory to
output coarse semantic vectors and do not further take any attention mechanism
to focus on words, which may lead to the model lose some detail information,
especially when the answers are rare or unknown words. In this paper, we
propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
past facts into sentence-level memory and word-level memory respectively. Then,
(k)-max pooling is exploited following reasoning module on the sentence-level
memory to sample the (k) most relevant sentences to a question and feed these
sentences into attention mechanism on the word-level memory to focus the words
in the selected sentences. Finally, the prediction is jointly learned over the
outputs of the sentence-level reasoning module and the word-level attention
mechanism. The experimental results demonstrate that our approach successfully
conducts answer selection on unknown words and achieves a better performance
than memory networks.

Computation and Language

Stance classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik
Comments: COLING 2016
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Rumour stance classification, the task that determines if each tweet in a
collection discussing a rumour is supporting, denying, questioning or simply
commenting on the rumour, has been attracting substantial interest. Here we
introduce a novel approach that makes use of the sequence of transitions
observed in tree-structured conversation threads in Twitter. The conversation
threads are formed by harvesting users’ replies to one another, which results
in a nested tree-like structure. Previous work addressing stance classification
task in Twitter has treated each tweet as a separate unit. Here we analyse
tweets by virtue of their position in a sequence and test two sequential
classifiers, Linear-Chain CRF and Tree CRF, each of which makes different
assumptions about the conversation structure. We experiment with eight Twitter
datasets, collected during breaking news, and show that exploiting the
sequential structure of Twitter conversations achieves significant improvements
over the non-sequential methods. Our work is the first to model Twitter
conversations as a tree structure in this manner, introducing a novel way of
tackling NLP tasks on Twitter conversations.

Psychologically Motivated Text Mining

Ekaterina Shutova, Patricia Lichtenstein
Subjects: Computation and Language (cs.CL)

Natural language processing techniques are increasingly applied to identify
social trends and predict behavior based on large text collections. Existing
methods typically rely on surface lexical and syntactic information. Yet,
research in psychology shows that patterns of human conceptualisation, such as
metaphorical framing, are reliable predictors of human expectations and
decisions. In this paper, we present a method to learn patterns of metaphorical
framing from large text collections, using statistical techniques. We apply the
method to data in three different languages and evaluate the identified
patterns, demonstrating their psychological validity.

Unsupervised Neural Hidden Markov Models

Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight
Comments: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP. Oral presentation
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

In this work, we present the first results for neuralizing an Unsupervised
Hidden Markov Model. We evaluate our approach on tag in- duction. Our approach
outperforms existing generative models and is competitive with the
state-of-the-art though with a simpler model easily extended to include
additional context.

Byte-based Language Identification with Deep Convolutional Networks

Johannes Bjerva
Comments: 6 pages. arXiv admin note: text overlap with arXiv:1609.07053
Subjects: Computation and Language (cs.CL)

We report on our system for the shared task on discriminating between similar
languages (DSL 2016). The system uses only byte representations in a deep
residual network (ResNet). The system, named ResIdent, is trained only on the
data released with the task (closed training). We obtain 84.88% accuracy on
subtask A, 68.80% accuracy on subtask B1, and 69.80% accuracy on subtask B2. A
large difference in accuracy on development data can be observed with
relatively minor changes in our network’s architecture and hyperparameters. We
therefore expect fine-tuning of these parameters to yield higher accuracies.

Equation Parsing: Mapping Sentences to Grounded Equations

Subhro Roy, Shyam Upadhyay, Dan Roth
Subjects: Computation and Language (cs.CL)

Identifying mathematical relations expressed in text is essential to
understanding a broad range of natural language text from election reports, to
financial news, to sport commentaries to mathematical word problems. This paper
focuses on identifying and understanding mathematical relations described
within a single sentence. We introduce the problem of Equation Parsing — given
a sentence, identify noun phrases which represent variables, and generate the
mathematical equation expressing the relation described in the sentence. We
introduce the notion of projective equation parsing and provide an efficient
algorithm to parse text to projective equations. Our system makes use of a high
precision lexicon of mathematical expressions and a pipeline of structured
predictors, and generates correct equations in $70\%$ of the cases. In $60\%$
of the time, it also identifies the correct noun phrase $
ightarrow$ variables
mapping, significantly outperforming baselines. We also release a new annotated
dataset for task evaluation.

Effective Combination of Language and Vision Through Model Composition and the R-CCA Method

Hagar Loeub, Roi Reichart
Comments: 6 pages, 1 figure
Subjects: Computation and Language (cs.CL)

We address the problem of integrating textual and visual information in
vector space models for word meaning representation. We first present the
Residual CCA (R-CCA) method, that complements the standard CCA method by
representing, for each modality, the difference between the original signal and
the signal projected to the shared, max correlation, space. We then show that
constructing visual and textual representations and then post-processing them
through composition of common modeling motifs such as PCA, CCA, R-CCA and
linear interpolation (a.k.a sequential modeling) yields high quality models. On
five standard semantic benchmarks our sequential models outperform recent
multimodal representation learning alternatives, including ones that rely on
joint representation learning. For two of these benchmarks our R-CCA method is
part of the Best configuration our algorithm yields.

Character Sequence Models for ColorfulWords

Kazuya Kawakami, Chris Dyer, Bryan R. Routledge, Noah A. Smith
Subjects: Computation and Language (cs.CL)

We present a neural network architecture to predict a point in color space
from the sequence of characters in the color’s name. Using large scale
color–name pairs obtained from an online color design forum, we evaluate our
model on a “color Turing test” and find that, given a name, the colors
predicted by our model are preferred by annotators to color names created by
humans. Our datasets and demo system are available online at
this http URL

Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification

Systems based on artificial neural networks (ANNs) have achieved
state-of-the-art results in many natural language processing tasks. Although
ANNs do not require manually engineered features, ANNs have many
hyperparameters to be optimized. The choice of hyperparameters significantly
impacts models’ performances. However, the ANN hyperparameters are typically
chosen by manual, grid, or random search, which either requires expert
experiences or is computationally expensive. Recent approaches based on
Bayesian optimization using Gaussian processes (GPs) is a more systematic way
to automatically pinpoint optimal or near-optimal machine learning
hyperparameters. Using a previously published ANN model yielding
state-of-the-art results for dialog act classification, we demonstrate that
optimizing hyperparameters using GP further improves the results, and reduces
the computational time by a factor of 4 compared to a random search. Therefore
it is a useful technique for tuning ANN models to yield the best performances
for natural language processing tasks.

Deep Reinforcement Learning for Mention-Ranking Coreference Models

Kevin Clark, Christopher D. Manning
Comments: To appear in EMNLP 2016
Subjects: Computation and Language (cs.CL)

Coreference resolution systems are typically trained with heuristic loss
functions that require careful tuning. In this paper we instead apply
reinforcement learning to directly optimize a neural mention-ranking model for
coreference evaluation metrics. We experiment with two approaches: the
REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective.
We find the latter to be more effective, resulting in significant improvements
over the current state-of-the-art on the English and Chinese portions of the
CoNLL 2012 Shared Task.

Hierarchical Memory Networks for Answer Selection on Unknown Words

Recently, end-to-end memory networks have shown promising results on Question
Answering task, which encode the past facts into an explicit memory and perform
reasoning ability by making multiple computational steps on the memory.
However, memory networks conduct the reasoning on sentence-level memory to
output coarse semantic vectors and do not further take any attention mechanism
to focus on words, which may lead to the model lose some detail information,
especially when the answers are rare or unknown words. In this paper, we
propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
past facts into sentence-level memory and word-level memory respectively. Then,
(k)-max pooling is exploited following reasoning module on the sentence-level
memory to sample the (k) most relevant sentences to a question and feed these
sentences into attention mechanism on the word-level memory to focus the words
in the selected sentences. Finally, the prediction is jointly learned over the
outputs of the sentence-level reasoning module and the word-level attention
mechanism. The experimental results demonstrate that our approach successfully
conducts answer selection on unknown words and achieves a better performance
than memory networks.

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

Recurrent neural networks (RNNs) have shown clear superiority in sequence
modeling, particularly the ones with gated units, such as long short-term
memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
behind the remarkable performance remain unclear in many applications, e.g.,
automatic speech recognition (ASR). This paper employs visualization techniques
to study the behavior of LSTM and GRU when performing speech recognition tasks.
Our experiments show some interesting patterns in the gated memory, and some of
them have inspired simple yet effective modifications on the network structure.
We report two of such modifications: (1) lazy cell update in LSTM, and (2)
shortcut connections for residual learning. Both modifications lead to more
comprehensible and powerful networks.

Using Natural Language Processing and Qualitative Analysis to Intervene in Gang Violence: A Collaboration Between Social Work Researchers and Data Scientists

Desmond Upton Patton (Columbia University), Kathleen McKeown (Columbia University), Owen Rambow (Columbia University), Jamie Macbeth (Columbia University)
Comments: Presented at the Data For Good Exchange 2016
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

The U.S. has the highest rate of firearm-related deaths when compared to
other industrialized countries. Violence particularly affects low-income, urban
neighborhoods in cities like Chicago, which saw a 40% increase in firearm
violence from 2014 to 2015 to more than 3,000 shooting victims. While recent
studies have found that urban, gang-involved individuals curate a unique and
complex communication style within and between social media platforms,
organizations focused on reducing gang violence are struggling to keep up with
the growing complexity of social media platforms and the sheer volume of data
they present. In this paper, describe the Digital Urban Violence Analysis
Approach (DUVVA), a collaborative qualitative analysis method used in a
collaboration between data scientists and social work researchers to develop a
suite of systems for decoding the high- stress language of urban, gang-involved
youth. Our approach leverages principles of grounded theory when analyzing
approximately 800 tweets posted by Chicago gang members and participation of
youth from Chicago neighborhoods to create a language resource for natural
language processing (NLP) methods. In uncovering the unique language and
communication style, we developed automated tools with the potential to detect
aggressive language on social media and aid individuals and groups in
performing violence prevention and interruption.

Distributed, Parallel, and Cluster Computing

A generic framework for the development of geospatial processing pipelines on clusters

Remi Cresson
Journal-ref: IEEE journal of geoscience and remote sensing letters, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The amount of remote sensing data available to applications is constantly
growing due to the rise of very-high-resolution sensors and short repeat cycle
satellites. Consequently, tackling computational complexity in Earth
Observation information extraction is rising as a major challenge. Resorting to
High Performance Computing (HPC) is becoming a common practice, since it
provides environments and programming facilities able to speed-up processes. In
particular, clusters are flexible, cost-effective systems able to perform
data-intensive tasks ideally fulfilling any computational requirement. However,
their use typically implies a significant coding effort to build proper
implementations of specific processing pipelines. This paper presents a generic
framework for the development of RS images processing applications targeting
cluster computing. It is based on common open sources libraries, and leverages
the parallelization of a wide variety of image processing pipelines in a
transparent way. Performances on typical RS tasks implemented using the
proposed framework demonstrate a great potential for the effective and timely
processing of large amount of data.

Training a Probabilistic Graphical Model with Resistive Switching Electronic Synapses

Current large scale implementations of deep learning and data mining require
thousands of processors, massive amounts of off-chip memory, and consume
gigajoules of energy. Emerging memory technologies such as nanoscale
two-terminal resistive switching memory devices offer a compact, scalable and
low power alternative that permits on-chip co-located processing and memory in
fine-grain distributed parallel architecture. Here we report first use of
resistive switching memory devices for implementing and training a Restricted
Boltzmann Machine (RBM), a generative probabilistic graphical model as a key
component for unsupervised learning in deep networks. We experimentally
demonstrate a 45-synapse RBM realized with 90 resistive switching phase change
memory (PCM) elements trained with a bio-inspired variant of the Contrastive
Divergence (CD) algorithm, implementing Hebbian and anti-Hebbian weight
updates. The resistive PCM devices show a two-fold to ten-fold reduction in
error rate in a missing pixel pattern completion task trained over 30 epochs,
compared to untrained case. Measured programming energy consumption is 6.1 nJ
per epoch with the resistive switching PCM devices, a factor of ~150 times
lower than conventional processor-memory systems. We analyze and discuss the
dependence of learning performance on cycle-to-cycle variations as well as
number of gradual levels in the PCM analog memory devices.

Learning

Statistical comparison of classifiers through Bayesian hierarchical modelling

Giorgio Corani, Alessio Benavoli, Janez Demšar, Francesca Mangili, Marco Zaffalon
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose a new approach for the statistical comparison of algorithms which
have been cross-validated on multiple data sets. It is a Bayesian hierarchical
method; it draws inferences on single and on multiple datasets taking into
account the mean and the variability of the cross-validation results. It is
able to detect equivalent classifiers and to claim significances which have a
practical impact. On each data sets it estimates more accurately than the
existing methods the difference of accuracy between the two classifiers thanks
to shrinkage. Such advantages are demonstrated by simulations on synthetic and
real data.

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

Recurrent neural networks (RNNs) have shown clear superiority in sequence
modeling, particularly the ones with gated units, such as long short-term
memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
behind the remarkable performance remain unclear in many applications, e.g.,
automatic speech recognition (ASR). This paper employs visualization techniques
to study the behavior of LSTM and GRU when performing speech recognition tasks.
Our experiments show some interesting patterns in the gated memory, and some of
them have inspired simple yet effective modifications on the network structure.
We report two of such modifications: (1) lazy cell update in LSTM, and (2)
shortcut connections for residual learning. Both modifications lead to more
comprehensible and powerful networks.

Deep Reinforcement Learning for Tensegrity Robot Locomotion

Xinyang Geng, Marvin Zhang, Jonathan Bruce, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine
Comments: Under review at the International Conference on Robotics and Automation (ICRA), 2017
Subjects: Robotics (cs.RO); Learning (cs.LG)

Tensegrity robots, composed of rigid rods connected by elastic cables, have a
number of unique properties that make them appealing for use as planetary
exploration rovers. However, control of tensegrity robots remains a difficult
problem due to their unusual structures and complex dynamics. In this work, we
show how locomotion gaits can be learned automatically using a novel extension
of mirror descent guided policy search (MDGPS) applied to periodic locomotion
movements, and we demonstrate the effectiveness of our approach on tensegrity
robot locomotion. We evaluate our method with real-world and simulated
experiments on the SUPERball tensegrity robot, showing that the learned
policies generalize to changes in system parameters, unreliable sensor
measurements, and variation in environmental conditions, including varied
terrains and a range of different gravities. Our experiments demonstrate that
our method not only learns fast, power-efficient feedback policies for rolling
gaits, but that these policies can succeed with only the limited onboard
sensing provided by SUPERball’s accelerometers. We compare the learned feedback
policies to learned open-loop policies and hand-engineered controllers, and
demonstrate that the learned policy enables the first continuous, reliable
locomotion gait for the real SUPERball robot.

Learning to Push by Grasping: Using multiple tasks for effective learning

Recently, end-to-end learning frameworks are gaining prevalence in the field
of robot control. These frameworks input states/images and directly predict the
torques or the action parameters. However, these approaches are often critiqued
due to their huge data requirements for learning a task. The argument of the
difficulty in scalability to multiple tasks is well founded, since training
these tasks often require hundreds or thousands of examples. But do end-to-end
approaches need to learn a unique model for every task? Intuitively, it seems
that sharing across tasks should help since all tasks require some common
understanding of the environment. In this paper, we attempt to take the next
step in data-driven end-to-end learning frameworks: move from the realm of
task-specific models to joint learning of multiple robot tasks. In an
astonishing result we show that models with multi-task learning tend to perform
better than task-specific models trained with same amounts of data. For
example, a deep-network learned with 2.5K grasp and 2.5K push examples performs
better on grasping than a network trained on 5K grasp examples.

Unsupervised Neural Hidden Markov Models

In this work, we present the first results for neuralizing an Unsupervised
Hidden Markov Model. We evaluate our approach on tag in- duction. Our approach
outperforms existing generative models and is competitive with the
state-of-the-art though with a simpler model easily extended to include
additional context.

Learning from the Hindsight Plan — Episodic MPC Improvement

Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

Model predictive control (MPC) is a popular control method that has proved
effective for robotics, among other fields. MPC performs re-planning at every
time step. Re-planning is done with a limited horizon per computational and
real-time constraints and often also for robustness to potential model errors.
However, the limited horizon leads to suboptimal performance. In this work, we
consider the iterative learning setting, where the same task can be repeated
several times, and propose a policy improvement scheme for MPC. The main idea
is that between executions we can, offline, run MPC with a longer horizon,
resulting in a hindsight plan. To bring the next real-world execution closer to
the hindsight plan, our approach learns to re-shape the original cost function
with the goal of satisfying the following property: short horizon planning (as
realistic during real executions) with respect to the shaped cost should result
in mimicking the hindsight plan. This effectively consolidates long-term
reasoning into the short-horizon planning. We empirically evaluate our approach
in contact-rich manipulation tasks both in simulated and real environments,
such as peg insertion by a real PR2 robot.

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin
Comments: NIPS 2016 (To appear)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

A novel variational autoencoder is developed to model images, as well as
associated labels or captions. The Deep Generative Deconvolutional Network
(DGDN) is used as a decoder of the latent image features, and a deep
Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used
to approximate a distribution for the latent DGDN features/code. The latent
code is also linked to generative models for labels (Bayesian support vector
machine) or captions (recurrent neural network). When predicting a
label/caption for a new image at test, averaging is performed across the
distribution of latent codes; this is computationally efficient as a
consequence of the learned CNN-based encoder. Since the framework is capable of
modeling the image in the presence/absence of associated labels/captions, a new
semi-supervised setting is manifested for CNN learning with images; the
framework even allows unsupervised CNN learning, based on images alone.

Multiplicative weights, equalizers, and P=PPAD

Ioannis Avramopoulos
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Learning (cs.LG)

We show that, by using multiplicative weights in a game-theoretic thought
experiment (and an important convexity result on the composition of
multiplicative weights with the relative entropy function), a symmetric
bimatrix game (that is, a bimatrix matrix wherein the payoff matrix of each
player is the transpose of the payoff matrix of the other) either has an
interior symmetric equilibrium or there is a pure strategy that is weakly
dominated by some mixed strategy. Weakly dominated pure strategies can be
detected and eliminated in polynomial time by solving a linear program.
Furthermore, interior symmetric equilibria are a special case of a more general
notion, namely, that of an “equalizer,” which can also be computed efficiently
in polynomial time by solving a linear program. An elegant “symmetrization
method” of bimatrix games [Jurg et al., 1992] and the well-known
PPAD-completeness results on equilibrium computation in bimatrix games
[Daskalakis et al., 2009, Chen et al., 2009] imply then the compelling P =
PPAD.

The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm

George D. Montanez
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

No Free Lunch theorems show that the average performance across any
closed-under-permutation set of problems is fixed for all algorithms, under
appropriate conditions. Extending these results, we demonstrate that the
proportion of favorable problems is itself strictly bounded, such that no
single algorithm can perform well over a large fraction of possible problems.
Our results explain why we must either continue to develop new learning methods
year after year or move towards highly parameterized models that are both
flexible and sensitive to their hyperparameters.

Learning Genomic Representations to Predict Clinical Outcomes in Cancer

Genomics are rapidly transforming medical practice and basic biomedical
research, providing insights into disease mechanisms and improving therapeutic
strategies, particularly in cancer. The ability to predict the future course of
a patient’s disease from high-dimensional genomic profiling will be essential
in realizing the promise of genomic medicine, but presents significant
challenges for state-of-the-art survival analysis methods. In this abstract we
present an investigation in learning genomic representations with neural
networks to predict patient survival in cancer. We demonstrate the advantages
of this approach over existing survival analysis methods using brain tumor
data.

Analyzing the Behavior of Visual Question Answering Models

Recently, a number of deep-learning based models have been proposed for the
task of Visual Question Answering (VQA). The performance of most models is
clustered around 60-70%. In this paper we propose systematic methods to analyze
the behavior of these models as a first step towards recognizing their
strengths and weaknesses, and identifying the most fruitful directions for
progress. We analyze two models, one each from two major classes of VQA models
— with-attention and without-attention and show the similarities and
differences in the behavior of these models. We also analyze the winning entry
of the VQA Challenge 2016.

Our behavior analysis reveals that despite recent progress, today’s VQA
models are “myopic” (tend to fail on sufficiently novel instances), often “jump
to conclusions” (converge on a predicted answer after ‘listening’ to just half
the question), and are “stubborn” (do not change their answers across images).

Information Theory

Binary Cyclic Codes that are Locally Repairable

Sreechakra Goparaju, Robert Calderbank
Comments: This 5 page paper appeared in the proceedings of the IEEE International Symposium on Information Theory (ISIT), 2014
Journal-ref: 2014 IEEE International Symposium on Information Theory, pp.
676-680. IEEE, 2014
Subjects: Information Theory (cs.IT)

Codes for storage systems aim to minimize the repair locality, which is the
number of disks (or nodes) that participate in the repair of a single failed
disk. Simultaneously, the code must sustain a high rate, operate on a small
finite field to be practically significant and be tolerant to a large number of
erasures. To this end, we construct new families of binary linear codes that
have an optimal dimension (rate) for a given minimum distance and locality.
Specifically, we construct cyclic codes that are locally repairable for
locality 2 and distances 2, 6 and 10. In doing so, we discover new upper bounds
on the code dimension, and prove the optimality of enabling local repair by
provisioning disjoint groups of disks. Finally, we extend our construction to
build codes that have multiple repair sets for each disk.

Graph-Theoretic Approaches to Two-Sender Index Coding

Chandra Thapa, Lawrence Ong, Sarah J. Johnson
Comments: To be presented at 2016 IEEE Global Communications Conference (GLOBECOM 2016) Workshop on Network Coding and Applications (NetCod), Washington, USA, 2016
Subjects: Information Theory (cs.IT)

Consider a communication scenario over a noiseless channel where a sender is
required to broadcast messages to multiple receivers, each having side
information about some messages. In this scenario, the sender can leverage the
receivers’ side information during the encoding of messages in order to reduce
the required transmissions. This type of encoding is called index coding. In
this paper, we study index coding with two cooperative senders, each with some
subset of messages, and multiple receivers, each requesting one unique message.
The index coding in this setup is called two-sender unicast index coding
(TSUIC). The main aim of TSUIC is to minimize the total number of transmissions
required by the two senders. Based on graph-theoretic approaches, we prove that
TSUIC is equivalent to single-sender unicast index coding (SSUIC) for some
special cases. Moreover, we extend the existing schemes for SSUIC, viz., the
cycle-cover scheme, the clique-cover scheme, and the local-chromatic scheme to
the corresponding schemes for TSUIC.

Reliability of universal decoding based on vector-quantized codewords

Neri Merhav
Comments: 31 pages; submitted for publication
Subjects: Information Theory (cs.IT)

Motivated by applications of biometric identification and content
identification systems, we consider the problem of random coding for channels,
where each codeword undergoes lossy compression (vector quantization), and
where the decoder bases its decision only on the compressed codewords and the
channel output, which is in turn, the channel’s response to the transmission of
an original codeword, before compression. For memoryless sources and memoryless
channels with finite alphabets, we propose a new universal decoder and analyze
its error exponent, which improves on an earlier result by Dasarathy and Draper
(2011), who used the classic maximum mutual information (MMI) universal
decoder. Further, we show that our universal decoder provides the same error
exponent as that of the optimal, maximum likelihood (ML) decoder, at least as
long as all single-letter transition probabilities of the channel are positive.
We conjecture that the same argument remains true even without this positivity
condition.

On Shared Rate Time Series for Mobile Users in Poisson Networks

Pranav Madadi, François Baccelli, Gustavo de Veciana
Subjects: Information Theory (cs.IT)

This paper focuses on modeling and analysis of the temporal performance
variations experienced by a mobile user in a wireless network and its impact on
system level design. We consider a simple stochastic geometry model: the
infrastructure nodes are Poisson distributed while the user’s motion is the
simplest possible i.e., constant velocity on a straight line. We first
characterize variations in the SNR process, and associated downlink Shannon
rate, resulting from variations in the infrastructure geometry seen by the
mobile. Specifically, by making a connection between stochastic geometry and
queueing theory the level crossings of the SNR process are shown to form an
alternating renewal process whose distribution can be completely characterized.
For large or small SNR levels, and associated rare events, we further derive
simple distributional (exponential) models. We then characterize the second
major contributor to variation, associated with changes in the number of other
users sharing infrastructure. Combining these two effects, we study what are
the dominant factors (infrastructure geometry or sharing number) given mobile
experiences a very high or low shared rate. These results are then used to
evaluate and optimize the system-level Quality of Service (QoS) and
system-level capacity to support mobile users sharing wireless infrastructure,
including mobile devices streaming video which proactively buffer content to
prevent rebuffering and mobiles which are downloading large files. Finally, we
use simulation to assess the fidelity of this model and its robustness to
factors which are presently taken into account.

Reduced-Complexity SCL Decoding of Multi-CRC-Aided Polar Codes

Mao-Ching Chiu, Wei-De Wu
Comments: 9 pages, 7 figures
Subjects: Information Theory (cs.IT)

Cyclic redundancy check (CRC) aided polar codes are capable of achieving
better performance than low-density parity-check (LDPC) codes under the
successive cancelation list (SCL) decoding scheme. However, the SCL decoding
scheme suffers from very high space and time complexities. Especially, the high
space complexity is a major concern for adopting polar codes in modern mobile
communication standards. In this paper, we propose a novel reduced-complexity
successive cancelation list (R-SCL) decoding scheme which is effective to
reduce the space complexity. Simulation results show that, with a (2048, 1024)
CRC-aided polar code, the R-SCL decoders with 25% reduction of space complexity
and 8% reduction of time complexity can still achieve almost the same
performance levels as those decoded by SCL decoders. To further reduce the
complexity, we propose a multi-CRC coding scheme for polar codes. Simulation
results show that, with a (16384, 8192) multi-CRC-aided polar code, a R-SCL
decoder with about 85% reduction of space complexity and 20% reduction of time
complexity results in a worst performance loss of only 0.04dB.

Mitigating Pilot Contamination Through Location-Aware Pilot Assignment in Massive MIMO Networks

Noman Akbar, Shihao Yan, Nan Yang, Jinhong Yuan
Comments: Accepted to appear in IEEE Globecom Workshop (ET5G)
Subjects: Information Theory (cs.IT)

We propose a novel location-aware pilot assignment scheme to mitigate pilot
contamination in massive multiple-input multiple-output (MIMO) networks, where
the channels are subjected to Rician fading. Our proposed scheme utilizes the
location information of users as the input to conduct pilot assignment in the
network. Based on the location information, we first determine the line of
sight (LOS) interference between the intended signal and the interfering
signal. Our analysis reveals that the LOS interference converges to zero as the
number of antennas at the base station (BS) goes to infinity, whereas for
finite number of antennas at the BS the LOS interference indeed depends on
specific pilot allocation strategies. Following this revelation, we assign
pilot sequences to all the users in the massive MIMO network such that the LOS
interference is minimized for finite number of antennas at the BS. Our proposed
scheme outperforms the random pilot assignment in terms of achieving a much
higher uplink sum rate for reasonable values of the Rician K-factor. Moreover,
we propose a new performance metric, which measures the strength of the LOS
interference and demonstrate that it is a good metric to analyze the
performance of pilot assignment schemes. Theoretical analysis is provided and
verified through numerical simulations to support the proposed scheme.

Efficient Offline Waveform Design Using Quincunx/Hexagonal Time-Frequency Lattices For 5G Systems

Raouia Ayadi, Inès Kammoun, Mohamed Siala
Subjects: Information Theory (cs.IT)

Conventional OFDM, adopted in LTE-A systems, cannot provide the quality of
service requirements sought in 5G systems because of extreme natural channel
impairments caused by higher Doppler spreads and unexpected artificial
impairments caused by multi-source transmission, to be brought by 5G, and by
synchronization relaxation for closed-loop signaling overhead reduction in some
5G applications. These severe impairments induce a strong loss of orthogonality
between subcarriers and OFDM symbols and, therefore, lead to a dramatic
increase in ICI and ISI. To be well armed against these dramatic impairments,
we, in the present paper, optimize the transmit/receive waveforms for FBMC
systems, with hexagonal time-frequency lattices, operating over severe doubly
dispersive channels, accounting for both natural and artificial impairments.
For this, we exploit the POPS paradigm, recently proposed for rectangular
time-frequency lattices, to design offline waveforms maximizing the SINR for
hexagonal time-frequency lattices. We show that FBMC, with hexagonal lattices,
offers a strong improvement in SIR with respect to conventional OFDM and an
improvement of 1dB with respect to POPS-FBMC, with classical rectangular
lattices. Furthermore, we show that the hexagonal POPS-FBMC brings more
robustness to frequency synchronization errors and offers a 10dB reduction in
OOB emissions with respect to rectangular POPS-FBMC.

The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm

George D. Montanez
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

No Free Lunch theorems show that the average performance across any
closed-under-permutation set of problems is fixed for all algorithms, under
appropriate conditions. Extending these results, we demonstrate that the
proportion of favorable problems is itself strictly bounded, such that no
single algorithm can perform well over a large fraction of possible problems.
Our results explain why we must either continue to develop new learning methods
year after year or move towards highly parameterized models that are both
flexible and sensitive to their hyperparameters.