
    arXiv Paper Daily: Wed, 5 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-05 00:00:00

    Neural and Evolutionary Computing

    Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems

    Charles Siegel, Jeff Daily, Abhinav Vishnu
    Comments: 11 pages, 7 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We present novel techniques to accelerate the convergence of Deep Learning
    algorithms via low-overhead removal of redundant neurons (apoptosis of
    neurons) that do not contribute to model learning, applied during the
    training phase itself. We provide in-depth theoretical underpinnings of our
    heuristics (bounding accuracy loss and handling apoptosis of several neuron
    types), and present methods for conducting adaptive neuron apoptosis.
    Specifically, we are
    able to improve the training time for several datasets by 2-3x, while reducing
    the number of parameters by up to 30x (4-5x on average) on datasets such as
    ImageNet classification. For the Higgs Boson dataset, our implementation
    improves the accuracy (measured by Area Under Curve (AUC)) for classification
    from 0.88/1 to 0.94/1, while reducing the number of parameters by 3x in
    comparison to existing literature. The proposed methods achieve a 2.44x speedup
    in comparison to the default (no apoptosis) algorithm.
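
    The mechanism lends itself to a compact sketch: prune hidden neurons whose
    outgoing weights have become negligible, since they can no longer influence
    the output. The numpy sketch below captures that general idea; the magnitude
    criterion, threshold, and pruning cadence are illustrative stand-ins for the
    paper’s adaptive heuristics.

    ```python
    import numpy as np

    def prune_dead_neurons(W_in, W_out, threshold=0.02):
        """Remove hidden neurons whose outgoing weights are all near zero.

        W_in:  (n_inputs, n_hidden) weights into the hidden layer
        W_out: (n_hidden, n_outputs) weights out of the hidden layer
        Returns the pruned (W_in, W_out) and the surviving neuron indices.
        """
        importance = np.abs(W_out).max(axis=1)   # a tiny row = a "dead" neuron
        keep = np.where(importance > threshold)[0]
        return W_in[:, keep], W_out[keep, :], keep

    # Illustrative use, e.g. every few epochs inside a training loop:
    rng = np.random.default_rng(0)
    W_in = rng.normal(size=(784, 256))
    W_out = rng.normal(scale=0.01, size=(256, 10))
    W_in, W_out, kept = prune_dead_neurons(W_in, W_out)
    print(f"{len(kept)} of 256 neurons survive")
    ```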

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    Comments: The tutorial was presented at ‘2nd Summer School on Integrating Vision and Language: Deep Learning’ in Malta, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Together with the development of more accurate methods in Computer Vision
    and Natural Language Understanding, holistic architectures that answer
    questions about the content of real-world images have emerged. In this
    tutorial, we build a neural-based approach to answer questions about images.
    We base our tutorial on two datasets: (mostly on) DAQUAR, and (a bit on)
    VQA. With small tweaks, the models that we present here can achieve
    competitive performance on both datasets; in fact, they are among the best
    methods that use a combination of an LSTM with a global, full-frame CNN
    representation of an image. We hope that after reading this tutorial, the
    reader will be able to use Deep Learning frameworks, such as Keras and the
    newly introduced Kraino, to build various architectures that lead to further
    performance improvements on this challenging task.
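
    A bare-bones version of the architecture described above, an LSTM question
    encoder fused with a precomputed global CNN image feature and a softmax over
    a fixed answer vocabulary, fits in a few lines of Keras. All sizes below are
    illustrative, and plain Keras is used here rather than Kraino.

    ```python
    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate
    from tensorflow.keras.models import Model

    VOCAB, ANSWERS, SEQ_LEN, CNN_DIM = 10000, 1000, 30, 4096  # illustrative sizes

    # Question branch: embed word indices, encode the question with an LSTM.
    q_in = Input(shape=(SEQ_LEN,), dtype="int32")
    q_enc = LSTM(512)(Embedding(VOCAB, 300)(q_in))

    # Image branch: a precomputed global, full-frame CNN feature vector.
    v_in = Input(shape=(CNN_DIM,))
    v_enc = Dense(512, activation="tanh")(v_in)

    # Fuse both modalities and classify into the answer vocabulary.
    out = Dense(ANSWERS, activation="softmax")(Concatenate()([q_enc, v_enc]))

    model = Model([q_in, v_in], out)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()
    ```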

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article makes a case
    for the community to move to larger data and, as a step in that direction,
    proposes the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT) but more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.
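
    The Attention-Sum Reader used here has a notably simple answer mechanism:
    attend over document tokens with a dot product against the question
    encoding, then sum the attention mass over all occurrences of each candidate
    answer. The numpy sketch below assumes the encodings are already computed;
    names and sizes are illustrative.

    ```python
    import numpy as np

    def attention_sum(context_states, query_state, context_tokens, candidates):
        """context_states: (T, d) encoding of each document token;
        query_state: (d,) question encoding; context_tokens: length-T ids."""
        scores = context_states @ query_state        # dot-product attention
        p = np.exp(scores - scores.max())
        p /= p.sum()                                 # distribution over positions
        # Pointer-sum: total attention mass on each candidate's occurrences.
        return {c: p[[i for i, t in enumerate(context_tokens) if t == c]].sum()
                for c in candidates}

    rng = np.random.default_rng(1)
    probs = attention_sum(rng.normal(size=(8, 4)), rng.normal(size=4),
                          [3, 5, 3, 7, 5, 3, 2, 9], candidates=[3, 5])
    print(max(probs, key=probs.get))                 # predicted answer id
    ```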


    Computer Vision and Pattern Recognition

    Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

    Limin Wang, Sheng Guo, Weilin Huang, Yuanjun Xiong, Yu Qiao
    Comments: Code and models are available at this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Thanks to the available large-scale scene datasets such as Places and
    Places2, Convolutional Neural Networks (CNNs) have made remarkable progress on
    the problem of scene recognition. However, scene categories are often
    defined by their functions, and there exist large intra-class variations
    within a single scene category. Meanwhile, as the number of scene classes
    grows, some classes tend to overlap with others, and label ambiguity is
    becoming a problem. This paper focuses on large-scale scene recognition and
    makes two major contributions to tackle these issues. First, we propose a
    multi-resolution CNN architecture to capture visual content and structure at
    different scales. Our proposed multi-resolution CNNs are composed of
    coarse-resolution CNNs and fine-resolution CNNs, whose performance is
    complementary. Second, we design two knowledge-guided disambiguation
    techniques to deal with the problem of label ambiguity. In the first
    scenario, we exploit the knowledge from the confusion matrix computed on
    validation data to merge similar classes into a super category, while in the
    second scenario, we utilize the knowledge of extra networks to produce a
    soft label for each image. Both the super-category information and the soft
    labels are exploited to train CNNs on the Places2 dataset. We conduct
    experiments on three large-scale image classification datasets (ImageNet,
    Places, Places2) to demonstrate the effectiveness of our proposed approach.
    In addition, our method took part in two major scene recognition challenges,
    where we achieved the 2$^{nd}$ place at the Places2 challenge 2015 and the
    1$^{st}$ place at the LSUN challenge 2016. Finally, we
    transfer the learned representations to the datasets of MIT Indoor67 and
    SUN397, which yields the state-of-the-art performance (86.7% and 72.0%) on both
    datasets.
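
    The second disambiguation technique amounts to training against a blend of
    the one-hot ground truth and a soft prediction from an extra network. A
    numpy sketch of such a soft-label cross-entropy follows; the blend weight is
    illustrative rather than the paper’s setting.

    ```python
    import numpy as np

    def soft_label_loss(logits, hard_label, teacher_probs, alpha=0.5):
        """Cross-entropy against a mixture of the one-hot ground-truth label
        and a soft label produced by an extra (teacher) network."""
        one_hot = np.eye(len(logits))[hard_label]
        target = alpha * one_hot + (1 - alpha) * teacher_probs
        m = logits.max()
        log_p = logits - m - np.log(np.exp(logits - m).sum())  # log-softmax
        return -(target * log_p).sum()

    rng = np.random.default_rng(2)
    teacher = rng.dirichlet(np.ones(5))       # stand-in for an extra network
    print(soft_label_loss(rng.normal(size=5), hard_label=2, teacher_probs=teacher))
    ```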

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Fast Image Classification by Boosting Fuzzy Classifiers

    Marcin Korytkowski, Leszek Rutkowski, Rafał Scherer
    Comments: 1 figure
    Journal-ref: Inf. Sci. (327) 2016 175-182
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel approach to visual object classification based
    on generating simple fuzzy classifiers that use local image features to
    distinguish between one known class and the remaining classes. Boosting
    meta-learning is used to find the most representative local features. The
    proposed approach is tested on a state-of-the-art image dataset and compared
    with the bag-of-features image representation model combined with Support
    Vector Machine classification. The novel method gives better classification
    accuracy, and the learning and testing time is more than 30% shorter.
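
    The boosting step can be pictured as a standard AdaBoost reweighting loop
    that selects one discriminative local feature per round. The sign-stump weak
    learner below is an illustrative simplification, not the paper’s fuzzy
    classifiers.

    ```python
    import numpy as np

    def boost_select_features(X, y, rounds=10):
        """X: (n, d) local-feature responses; y: (n,) labels in {-1, +1}.
        Returns (feature index, vote weight) pairs, one per boosting round."""
        n, d = X.shape
        w = np.full(n, 1.0 / n)                    # example weights
        preds = np.sign(X)                         # weak learner: response sign
        chosen = []
        for _ in range(rounds):
            errs = (preds != y[:, None]).T @ w     # weighted error per feature
            j = int(np.argmin(errs))
            err = max(errs[j], 1e-12)
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * preds[:, j])  # re-focus on hard examples
            w /= w.sum()
            chosen.append((j, alpha))
        return chosen

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 30))
    y = np.sign(X[:, 4] + 0.1 * rng.normal(size=100))  # feature 4 is informative
    print(boost_select_features(X, y, rounds=3))
    ```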

    Sparsity-based Color Image Super Resolution via Exploiting Cross Channel Constraints

    Hojjat S. Mousavi, Vishal Monga
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparsity constrained single image super-resolution (SR) has been of much
    recent interest. A typical approach involves sparsely representing patches in a
    low-resolution (LR) input image via a dictionary of example LR patches, and
    then using the coefficients of this representation to generate the
    high-resolution (HR) output via an analogous HR dictionary. However, most
    existing sparse representation methods for super-resolution focus on the
    luminance channel and do not capture interactions between color channels. In
    this work, we extend sparsity-based super-resolution to multiple color
    channels by taking color information into account. Edge similarities amongst
    RGB color bands are exploited as cross-channel correlation constraints.
    These additional constraints lead to a new optimization problem which is not
    easily solvable; however, a tractable solution is proposed to solve it
    efficiently. Moreover, to fully exploit the complementary information among
    color channels, a dictionary learning method is also proposed specifically to
    learn color dictionaries that encourage edge similarities. Merits of the
    proposed method over state of the art are demonstrated both visually and
    quantitatively using image quality metrics.
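
    For orientation, the baseline pipeline that this work extends can be
    sketched in a few lines: solve a lasso over a low-resolution patch
    dictionary, then apply the resulting sparse code to the coupled
    high-resolution dictionary. The ISTA solver and dictionary sizes below are
    illustrative; the paper’s cross-channel constraints modify this objective.

    ```python
    import numpy as np

    def ista(D, y, lam=0.1, iters=200):
        """Solve min_a 0.5||y - D a||^2 + lam ||a||_1 by proximal gradient."""
        lr = 1.0 / np.linalg.norm(D, 2) ** 2      # step size 1 / L
        a = np.zeros(D.shape[1])
        for _ in range(iters):
            a -= lr * (D.T @ (D @ a - y))         # gradient step on the fit term
            a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)  # shrink
        return a

    rng = np.random.default_rng(4)
    D_lr = rng.normal(size=(25, 100))    # LR patch dictionary (5x5 patches)
    D_hr = rng.normal(size=(100, 100))   # coupled HR dictionary (10x10 patches)
    y_lr = rng.normal(size=25)           # a vectorized LR input patch
    a = ista(D_lr, y_lr)                 # sparse code on the LR dictionary
    x_hr = D_hr @ a                      # reconstructed HR patch
    ```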

    A novel and effective scoring scheme for structure classification and pairwise similarity measurement

    Rezaul Karim, Md. Momin Al Aziz, Swakkhar Shatabda, M. Sohel Rahman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A protein’s tertiary structure determines its functions, classification and
    binding sites. Similar structural characteristics between two proteins often
    imply similar functional characteristics. Determining structural similarity
    accurately in real time is a crucial research issue. In this paper, we
    present a novel and effective scoring scheme that is based on novel features
    extracted from protein alpha-carbon distance matrices. Our scoring scheme is
    inspired by pattern recognition and computer vision. Our method is
    significantly better than the current state of the art methods in terms of
    family match of pairs of protein structures and other statistical measurements.
    The effectiveness of our method is tested on standard benchmark structures. A
    web service is available at this http URL
    where you can get the similarity measurement score between two protein
    structures based on our method.

    Compressed Sensing and Adaptive Graph Total Variation for Tomographic Reconstructions

    Faisal Mahmood, Nauman Shahid, Ulf Skoglund, Pierre Vandergheynst
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compressed Sensing (CS) and Total Variation (TV)-based iterative image
    reconstruction algorithms have received increased attention recently. This
    is due to the ability of such methods to reconstruct from limited and noisy
    data. Local TV methods fail to preserve texture details and fine structures,
    which are difficult for the method to distinguish from noise. In many cases
    local methods also create additional artifacts due to over-smoothing.
    Non-Local Total Variation (NLTV) has been increasingly used for medical
    imaging applications. However, it is not updated in every iteration of the
    algorithm, has a high computational complexity, and depends on the scale of
    pairwise parameters. In this work we propose using Adaptive Graph-based TV
    in combination with CS (ACSGT). Similar to NLTV, our proposed method goes
    beyond spatial similarity between different regions of an image being
    reconstructed by establishing a connection between similar regions in the
    image regardless of spatial distance. However, it is computationally much
    more efficient and scalable when compared to NLTV due to the use of an
    approximate nearest-neighbor search algorithm. Moreover, our method is
    adaptive, i.e., it involves updating the graph prior at every iteration,
    making the connection between similar regions stronger. Since TV is a
    special case of graph TV, the proposed method can be seen as a
    generalization of CS and TV methods. We test our proposed algorithm by
    reconstructing a variety of different phantoms from limited and corrupted
    data and observe that we achieve a better result with ACSGT in every case.

    Image Aesthetic Assessment: An Experimental Survey

    Yubin Deng, Chen Change Loy, Xiaoou Tang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This survey aims at reviewing recent techniques used in the assessment of
    image aesthetic quality. The assessment of image aesthetic quality is the
    process of computationally distinguishing high-quality photos from low-quality
    ones based on photographic rules or artistic perceptions. A variety of
    approaches have been proposed in the literature trying to solve this
    challenging problem. In this survey, we present a systematic listing of the
    reviewed approaches based on feature types (hand-crafted features and deep
    features) and evaluation criteria (dataset characteristics and evaluation
    metrics). Main contributions and novelties of the reviewed approaches are
    highlighted and discussed. In addition, following the emergence of deep
    learning techniques, we systematically evaluate recent deep learning settings
    that are useful for developing a robust deep model for aesthetic scoring.
    Experiments are conducted using simple yet solid baselines that are
    competitive with the current state of the art. Moreover, we discuss the
    relation between image aesthetic assessment and automatic image cropping. We
    hope that this survey can serve as a comprehensive reference source for
    future research on image aesthetic assessment.

    Real Time Fine-Grained Categorization with Accuracy and Interpretability

    Shaoli Huang, Dacheng Tao
    Comments: arXiv admin note: text overlap with arXiv:1512.08086
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A well-designed fine-grained categorization system usually has three
    contradictory requirements: accuracy (the ability to identify objects among
    subordinate categories); interpretability (the ability to provide
    human-understandable explanation of recognition system behavior); and
    efficiency (the speed of the system). To handle the trade-off between accuracy
    and interpretability, we propose a novel “Deeper Part-Stacked CNN” architecture
    armed with interpretability by modeling subtle differences between object
    parts. The proposed architecture consists of a part localization network, a
    two-stream classification network that simultaneously encodes object-level and
    part-level cues, and a feature vectors fusion component. Specifically, the part
    localization network is implemented by exploring a new paradigm for keypoint
    localization that first samples a small number of representable pixels and
    then determines their labels via a convolutional layer followed by a softmax
    layer. We also use a cropping layer to extract part features and propose a
    scale mean-max layer for feature fusion learning. Experimentally, our
    proposed method outperforms state-of-the-art approaches in both the part
    localization and classification tasks on Caltech-UCSD Birds-200-2011.
    Moreover, by adopting a set
    of sharing strategies between the computation of multiple object parts, our
    single model is fairly efficient running at 32 frames/sec.
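
    The scale mean-max fusion admits a one-line realization: concatenate the
    element-wise mean and max of the per-scale feature vectors. The numpy sketch
    below is one plausible reading of the layer; the paper’s exact formulation
    may differ.

    ```python
    import numpy as np

    def scale_mean_max(features):
        """Fuse per-scale features by concatenating their element-wise mean
        and max across scales. features: (n_scales, d) -> (2 * d,)"""
        return np.concatenate([features.mean(axis=0), features.max(axis=0)])

    f = np.arange(12, dtype=float).reshape(3, 4)   # 3 scales, 4-d features
    print(scale_mean_max(f).shape)                 # (8,)
    ```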

    Prediction of Manipulation Actions

    Cornelia Fermüller, Fang Wang, Yezhou Yang, Konstantinos Zampogiannis, Yi Zhang, Francisco Barranco, Michael Pfeiffer
    Comments: 15 pages, 12 figures, 6 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Looking at a person’s hands, one can often tell what the person is going to
    do next, how his/her hands are moving, and where they will be, because an
    actor’s intentions shape his/her movement kinematics during action execution.
    Similarly, active systems with real-time constraints must not simply rely on
    passive video-segment classification, but they have to continuously update
    their estimates and predict future actions. In this paper, we study the
    prediction of dexterous actions. We recorded from subjects performing different
    manipulation actions on the same object, such as “squeezing”, “flipping”,
    “washing”, “wiping” and “scratching” with a sponge. In psychophysical
    experiments, we evaluated human observers’ skills in predicting actions from
    video sequences of different length, depicting the hand movement in the
    preparation and execution of actions before and after contact with the object.
    We then developed a recurrent neural network based method for action prediction
    using as input patches around the hand. We also used the same formalism to
    predict the forces on the fingertips, training on synchronized video and
    force data streams. Evaluations on two new datasets showed that our system
    closely matches human performance in the recognition task, and demonstrated
    the ability of our algorithm to predict what dexterous action is performed
    and how it is performed.

    Real-Time RGB-D based Template Matching Pedestrian Detection

    Omid Hosseini jafari, Michael Ying Yang
    Comments: published in ICRA 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Pedestrian detection is one of the most popular topics in computer vision and
    robotics. Considering challenging issues in multiple pedestrian detection, we
    present a real-time depth-based template matching people detector. In this
    paper, we propose different approaches for training the depth-based
    template. We train multiple templates to handle the various upper-body
    orientations of pedestrians and the different levels of detail in the depth
    map of pedestrians at various distances from the camera. We also take into
    account the degree of reliability of different regions of the sliding window
    by proposing a weighted-template approach. Furthermore, we combine the
    depth-detector with an appearance based detector as a verifier to take
    advantage of the appearance cues for dealing with the limitations of depth
    data. We evaluate our method on the challenging ETH dataset sequence. We show
    that our method outperforms the state-of-the-art approaches.
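
    The core scoring step, a depth template slid across the depth map with
    per-region reliability weights, can be written down directly. The
    normalization, stride, and uniform weights below are illustrative; the full
    detector adds multiple templates and an appearance-based verifier.

    ```python
    import numpy as np

    def weighted_template_scores(depth, template, weights, stride=4):
        """Weighted SSD of a depth template over sliding windows; lower is a
        better match. weights down-weight unreliable template regions."""
        th, tw = template.shape
        H, W = depth.shape
        scores = {}
        for y in range(0, H - th + 1, stride):
            for x in range(0, W - tw + 1, stride):
                win = depth[y:y + th, x:x + tw]
                win = win - np.median(win)       # normalize for camera distance
                scores[(y, x)] = float((weights * (win - template) ** 2).sum())
        return scores

    rng = np.random.default_rng(5)
    depth = rng.random((60, 80))                 # toy depth map
    tpl = rng.random((20, 10))                   # toy upper-body depth template
    scores = weighted_template_scores(depth, tpl, np.ones_like(tpl))
    best = min(scores, key=scores.get)
    print(best, scores[best])
    ```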

    Can Ground Truth Label Propagation from Video help Semantic Segmentation?

    Siva Karthik Mustikovela, Michael Ying Yang, Carsten Rother
    Comments: To appear at ECCV 2016 Workshop on Video Segmentation
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For state-of-the-art semantic segmentation, training convolutional
    neural networks (CNNs) requires dense pixelwise ground truth (GT) labeling,
    which is expensive and involves extensive human effort. In this work, we
    study the possibility of using auxiliary ground truth, so-called pseudo
    ground truth (PGT), to improve the performance. The PGT is obtained by
    propagating the labels of a GT frame to its subsequent frames in the video
    using a simple CRF-based, cue integration framework. Our main contribution is
    to demonstrate the use of noisy PGT along with GT to improve the performance of
    a CNN. We perform a systematic analysis to find the right kind of PGT that
    needs to be added along with the GT for training a CNN. In this regard, we
    explore three aspects of PGT which influence the learning of a CNN: i) the PGT
    labeling has to be of good quality; ii) the PGT images have to be different
    compared to the GT images; iii) the PGT has to be trusted differently than GT.
    We conclude that PGT that is diverse from the GT images and has good
    labeling quality can indeed help improve the performance of a CNN. Also,
    when the PGT is multiple folds larger than the GT, weighing down the trust
    on the PGT helps in improving the accuracy. Finally, we show that using PGT
    along with GT, the performance of a Fully Convolutional Network (FCN) on the
    CamVid data is increased by $2.7\%$ in IoU accuracy. We believe such an
    approach can be used to train CNNs for semantic video segmentation, where
    sequentially labeled image frames are needed. To this end, we provide
    recommendations for using PGT strategically for semantic segmentation and
    hence bypass the need for extensive human effort in labeling.
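
    Trusting PGT less than GT reduces to a per-pixel weight in the training
    loss. A numpy sketch of such a trust-weighted negative log-likelihood
    follows; the weight value is illustrative, not the paper’s tuned setting.

    ```python
    import numpy as np

    def weighted_pixel_nll(log_probs, labels, is_pgt, pgt_weight=0.3):
        """log_probs: (N, C) per-pixel log class probabilities; labels: (N,)
        int labels; is_pgt: (N,) bool mask of pseudo-ground-truth pixels."""
        w = np.where(is_pgt, pgt_weight, 1.0)   # trust PGT pixels less than GT
        nll = -log_probs[np.arange(len(labels)), labels]
        return (w * nll).sum() / w.sum()

    rng = np.random.default_rng(6)
    lp = np.log(rng.dirichlet(np.ones(3), size=6))   # 6 pixels, 3 classes
    print(weighted_pixel_nll(lp, rng.integers(0, 3, 6), rng.random(6) < 0.5))
    ```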

    Cardea: Context-Aware Visual Privacy Protection from Pervasive Cameras

    Jiayu Shu, Rui Zheng, Pan Hui
    Comments: 10 pages
    Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

    The growing popularity of mobile and wearable devices with built-in cameras,
    the bright prospect of camera related applications such as augmented reality
    and life-logging system, the increased ease of taking and sharing photos, and
    advances in computer vision techniques have greatly facilitated people’s lives
    in many aspects, but have also inevitably raised people’s concerns about visual
    privacy at the same time. Motivated by recent user studies that people’s
    privacy concerns are dependent on the context, in this paper, we propose
    Cardea, a context-aware and interactive visual privacy protection framework
    that enforces privacy protection according to people’s privacy preferences. The
    framework provides people with fine-grained visual privacy protection
    using: i) personal privacy profiles, with which people can define their
    context-dependent privacy preferences; ii) visual indicators, i.e., face
    features, with which devices can automatically locate individuals who
    request privacy protection; and iii) hand gestures, with which people can
    flexibly interact with cameras to temporarily change their privacy
    preferences. We design and implement the framework, consisting of a client
    app on Android devices and a cloud server. Our evaluation results confirm
    that the framework is practical and effective, with 86% overall accuracy,
    showing a promising future for context-aware visual privacy protection from
    pervasive cameras.


    Artificial Intelligence

    Towards the Design of Prospect-Theory based Human Decision Rules for Hypothesis Testing

    V. Sriram Siddhardh Nadendla, Swastik Brahma, Pramod K. Varshney
    Comments: 8 pages, 5 figures, Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016
    Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

    Detection rules have traditionally been designed for rational agents that
    minimize the Bayes risk (average decision cost). With the advent of
    crowd-sensing systems, there is a need to redesign binary hypothesis testing
    rules for behavioral agents, whose cognitive behavior is not captured by
    traditional utility functions such as Bayes risk. In this paper, we adopt
    prospect-theory-based models for decision makers. We consider two special
    agent models, namely optimists and pessimists, and derive optimal detection
    rules under different scenarios. Using an illustrative example, we also show
    how the decision rule of a human agent deviates from the Bayesian decision
    rule under the various behavioral models considered in this paper.
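
    To make the behavioral distortion concrete, the sketch below plugs the
    standard Tversky-Kahneman probability-weighting function into a Bayesian
    expected-cost comparison. The gamma values and costs are illustrative and
    are not the paper’s specific optimist and pessimist models.

    ```python
    def tk_weight(p, gamma):
        """Tversky-Kahneman probability weighting; gamma < 1 overweights
        small probabilities, a common prospect-theory distortion."""
        return p**gamma / (p**gamma + (1 - p)**gamma) ** (1.0 / gamma)

    def decide_h1(post_h1, c_miss=3.0, c_fa=1.0, gamma=1.0):
        """Declare H1 when the (distorted) expected cost of missing H1
        exceeds that of a false alarm; gamma=1 recovers the Bayes rule."""
        return (tk_weight(post_h1, gamma) * c_miss >
                tk_weight(1 - post_h1, gamma) * c_fa)

    print(decide_h1(0.2))             # False: the Bayes-rational agent says H0
    print(decide_h1(0.2, gamma=0.3))  # True: the distortion flips the decision
    ```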

    DeepAlgebra – an outline of a program

    Przemyslaw Chojecki
    Comments: 6 pages, this https URL
    Subjects: Artificial Intelligence (cs.AI); Algebraic Geometry (math.AG)

    We outline a program in the area of formalization of mathematics to automate
    theorem proving in algebra and algebraic geometry. We propose a construction of
    a dictionary between automated theorem provers and (La)TeX exploiting syntactic
    parsers. We describe its application to a repository of human-written facts and
    definitions in algebraic geometry (The Stacks Project). We use deep learning
    techniques.

    A Constraint-Handling Technique for Genetic Algorithms using a Violation Factor

    Adam Chehouri, Rafic Younes, Jean Perron, Adrian Ilinca (UQAR)
    Journal-ref: Journal of Computer Science, Science Publications, 2016, 12 (7), pp.350-362
    Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    Over the years, several meta-heuristic algorithms have been proposed and are
    now emerging as common methods for constrained optimization problems. Among
    them, genetic algorithms (GAs) shine as popular evolutionary algorithms
    (EAs) in engineering optimization. Most engineering design problems are
    difficult to solve with conventional optimization algorithms because they
    are highly nonlinear and contain constraints. The most common technique for
    handling these constraints is to apply penalty functions, whose major
    drawback is that they require parameter tuning, which can be very
    challenging. In this paper, we present a constraint-handling technique for
    GAs that uses only the violation factor, called the VCH (Violation
    Constraint-Handling) method. Several benchmark problems from the literature
    are examined. The VCH technique was able to provide consistent performance
    and match results from other GA-based techniques.
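
    A violation-factor comparison can be sketched as a tournament rule: any
    feasible solution beats any infeasible one, fewer violated constraints beats
    more, and total violation breaks ties. The Python below is in the spirit of
    VCH; the exact ordering and normalization in the paper may differ.

    ```python
    def violations(constraints, x):
        """Per-constraint violation amounts for constraints g(x) <= 0."""
        return [max(0.0, g(x)) for g in constraints]

    def vch_better(x, y, objective, constraints):
        """True if candidate x beats candidate y."""
        vx, vy = violations(constraints, x), violations(constraints, y)
        nx, ny = sum(v > 0 for v in vx), sum(v > 0 for v in vy)
        if nx != ny:
            return nx < ny                    # fewer violated constraints wins
        if nx > 0:
            return sum(vx) < sum(vy)          # both infeasible: less violation
        return objective(x) < objective(y)    # both feasible: better objective

    # Illustrative: minimize x^2 subject to x >= 1, i.e. 1 - x <= 0.
    cons = [lambda x: 1.0 - x]
    print(vch_better(1.2, 0.5, objective=lambda x: x * x, constraints=cons))
    ```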

    Micro-Data Learning: The Other End of the Spectrum

    Jean-Baptiste Mouret (LORIA, LARSEN)
    Journal-ref: ERCIM News, ERCIM, 2017, pp.2
    Subjects: Artificial Intelligence (cs.AI)

    Many fields are now snowed under with an avalanche of data, which raises
    considerable challenges for computer scientists. Meanwhile, robotics (among
    other fields) can often only use a few dozen data points because acquiring them
    involves a process that is expensive or time-consuming. How can an algorithm
    learn with only a few data points?

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Network Structure Inference, A Survey: Motivations, Methods, and Applications

    Ivan Brugere, Brian Gallagher, Tanya Y. Berger-Wolf
    Comments: 43 pages, submitted to ACM Computing Surveys
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Physics and Society (physics.soc-ph)

    Networks are used to represent relationships between entities in many complex
    systems, spanning from online social networks to biological cell development
    and brain activity. These modeled relationships vary in how readily they can
    be observed. In some cases, relationships between entities are unambiguously
    known: are two users friends in a social network? Do two researchers
    collaborate on a published paper? Do two road segments in a transportation
    system intersect? Such relationships are directly observable in the system
    in question. In most cases, however, relationships between nodes are not
    directly observable and must be inferred: does one gene regulate the
    expression of another? Do two animals who physically co-locate have a social
    bond? Who infected whom in a disease outbreak?

    Existing approaches use specialized knowledge from their home domains to
    infer and measure the goodness of the inferred network for a specific task.
    However, current research lacks a rigorous validation framework which employs
    standard statistical validation. In this survey, we examine how network
    representations are learned from non-network data, the variety of questions and
    tasks on these data over several domains, and validation strategies for
    measuring the inferred network’s capability of answering questions on the
    original system of interest.

    Can Evolutionary Sampling Improve Bagged Ensembles?

    Harsh Nisar, Bhanu Pratap Singh Rawat
    Comments: 3 pages, 1 table, Data Efficient Machine Learning Workshop (DEML’16), ICML
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Perturb and Combine (P&C) methods generate multiple versions of a predictor
    by perturbing the training set or the construction procedure and then
    combine them into a single predictor (Breiman, 1996b). The motive is to
    improve accuracy for unstable classification and regression methods. One of
    the best-known methods in this group is Bagging. Arcing, or Adaptive
    Resampling and Combining, methods such as AdaBoost are smarter variants of
    P&C methods. In this extended abstract, we lay the groundwork for a new
    family of methods under the P&C umbrella, called Evolutionary Sampling (ES).
    We employ evolutionary algorithms to perform smarter sampling in both the
    feature space (sub-spaces) and the training samples. We discuss multiple
    fitness functions to assess ensembles and empirically compare our
    performance against randomized sampling of training data and feature
    sub-spaces.

    Analyzing the Behavior of Visual Question Answering Models

    Aishwarya Agrawal, Dhruv Batra, Devi Parikh
    Comments: 13 pages, 20 figures; To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, a number of deep-learning based models have been proposed for the
    task of Visual Question Answering (VQA). The performance of most models is
    clustered around 60-70%. In this paper we propose systematic methods to analyze
    the behavior of these models as a first step towards recognizing their
    strengths and weaknesses, and identifying the most fruitful directions for
    progress. We analyze two models, one each from the two major classes of VQA
    models — with attention and without attention — and show the similarities
    and differences in their behavior. We also analyze the winning entry
    of the VQA Challenge 2016.

    Our behavior analysis reveals that despite recent progress, today’s VQA
    models are “myopic” (tend to fail on sufficiently novel instances), often “jump
    to conclusions” (converge on a predicted answer after ‘listening’ to just half
    the question), and are “stubborn” (do not change their answers across images).


    Information Retrieval

    MatLM: a Matrix Formulation for Probabilistic Language Models

    Yanshan Wang, Hongfang Liu
    Subjects: Information Retrieval (cs.IR)

    Probabilistic language models are widely used in Information Retrieval (IR)
    to rank documents by the probability that they generate the query. However, the
    implementation of the probabilistic representations with programming languages
    that favor matrix calculations is challenging. In this paper, we utilize matrix
    representations to reformulate the probabilistic language models. The matrix
    representation is a superstructure for the probabilistic language models to
    organize the calculated probabilities and a potential formalism for
    standardization of language models and for further mathematical analysis. It
    facilitates implementations in matrix-friendly programming languages. In
    this paper, we consider the matrix formulation of the conventional language
    model with Dirichlet smoothing, and two language models based on Latent
    Dirichlet Allocation (LDA), i.e., LBDM and LDI. We release a Java software
    package, MatLM, implementing the proposed models. Code is available at:
    this https URL
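
    The matrix form is easiest to see for the conventional query-likelihood
    model with Dirichlet smoothing: given a document-term count matrix, every
    document is scored with a single matrix-vector product. The numpy sketch
    below illustrates the idea (in Python rather than the released Java
    package).

    ```python
    import numpy as np

    def dirichlet_scores(TF, q_tf, mu=2000.0):
        """Dirichlet-smoothed query log-likelihood for all documents at once.

        TF:   (n_docs, n_terms) term-frequency matrix
        q_tf: (n_terms,) query term frequencies
        """
        doc_len = TF.sum(axis=1, keepdims=True)           # |d|
        p_c = TF.sum(axis=0) / TF.sum()                   # collection model p(w|C)
        log_p = np.log((TF + mu * p_c) / (doc_len + mu))  # smoothed log p(w|d)
        return log_p @ q_tf                               # one matmul ranks all

    TF = np.array([[3, 0, 1], [0, 2, 2], [1, 1, 0]], dtype=float)
    print(dirichlet_scores(TF, np.array([1.0, 0.0, 1.0])))
    ```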


    Computation and Language

    Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions

    Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang
    Subjects: Computation and Language (cs.CL)

    In this paper we provide the largest published comparison of translation
    quality for phrase-based SMT and neural machine translation across 30
    translation directions. For ten directions we also include hierarchical
    phrase-based MT. Experiments are performed for the recently published United
    Nations Parallel Corpus v1.0 and its large six-way sentence-aligned subcorpus.
    In the second part of the paper we investigate aspects of translation speed,
    introducing AmuNMT, our efficient neural machine translation decoder. We
    demonstrate that current neural machine translation could already be used for
    in-production systems when comparing words-per-second ratios.

    Applications of Online Deep Learning for Crisis Response Using Social Media Information

    Dat Tien Nguyen, Shafiq Joty, Muhammad Imran, Hassan Sajjad, Prasenjit Mitra
    Comments: Accepted at SWDM co-located with CIKM 2016. 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:1608.03902
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Learning (cs.LG)

    During natural or man-made disasters, humanitarian response organizations
    look for useful information to support their decision-making processes. Social
    media platforms such as Twitter have been considered as a vital source of
    useful information for disaster response and management. Despite advances in
    natural language processing techniques, processing short and informal Twitter
    messages is a challenging task. In this paper, we propose to use Deep Neural
    Networks (DNNs) to address two types of information needs of response
    organizations: 1) identifying informative tweets and 2) classifying them into
    topical classes. DNNs use distributed representations of words and learn the
    representation as well as higher level features automatically for the
    classification task. We propose a new online algorithm based on stochastic
    gradient descent to train DNNs in an online fashion during disaster situations.
    We test our models using a crisis-related real-world Twitter dataset.
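
    The online training regime is the distinctive piece: the model is updated
    one labeled example at a time as tweets stream in, so it keeps adapting as a
    disaster unfolds. The sketch below shows such an online SGD loop for a
    logistic classifier; features, labels, and the learning rate are
    illustrative, and the paper trains DNNs rather than a linear model.

    ```python
    import numpy as np

    def online_sgd(stream, dim, lr=0.1):
        """One SGD step on the log-loss per incoming labeled tweet."""
        w = np.zeros(dim)
        for x, y in stream:                     # x: features, y in {0, 1}
            p = 1.0 / (1.0 + np.exp(-w @ x))    # predicted P(informative)
            w -= lr * (p - y) * x               # gradient step
            yield w

    rng = np.random.default_rng(7)
    stream = ((rng.normal(size=20), rng.integers(0, 2)) for _ in range(1000))
    for w in online_sgd(stream, dim=20):
        pass                                    # deploy/evaluate w as it evolves
    print(np.linalg.norm(w))
    ```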

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article is making a
    case for the community to move to larger data and as a step in that direction
    it is proposing the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT), however more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.

    Are Word Embedding-based Features Useful for Sarcasm Detection?

    Aditya Joshi, Vaibhav Tripathi, Kevin Patel, Pushpak Bhattacharyya, Mark Carman
    Comments: The paper will be presented at Conference on Empirical Methods in Natural Language Processing (EMNLP) 2016 in November 2016. this http URL
    Subjects: Computation and Language (cs.CL)

    This paper makes a simple increment to the state of the art in sarcasm
    detection research. Existing approaches are unable to capture subtle forms
    of context incongruity, which lies at the heart of sarcasm. We explore
    whether prior work can be enhanced using semantic similarity/discordance
    between word embeddings. We augment word embedding-based features to four
    feature sets reported in the past. We also experiment with four types of
    word embeddings. We observe an improvement in sarcasm detection,
    irrespective of the word embedding used or the original feature set to which
    our features are augmented. For example, this augmentation results in an
    improvement in F-score of around 4\% for three out of these four feature
    sets, and a minor degradation in the case of the fourth, when Word2Vec
    embeddings are used. Finally, a comparison of the four embeddings shows that
    Word2Vec and dependency weight-based features outperform LSA and GloVe in
    terms of their benefit to sarcasm detection.
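
    The augmented features reduce to similarity statistics over word pairs in a
    sentence, intended to expose semantic discordance. A numpy sketch of two
    such features follows; the exact feature definitions in the paper may
    differ.

    ```python
    import numpy as np
    from itertools import combinations

    def incongruity_features(word_vectors):
        """Most-similar and most-dissimilar word-pair scores in a sentence,
        appended to an existing sarcasm feature set."""
        sims = [u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
                for u, v in combinations(word_vectors, 2)]
        return {"max_pair_sim": max(sims), "min_pair_sim": min(sims)}

    rng = np.random.default_rng(8)
    sentence = [rng.normal(size=50) for _ in range(6)]  # stand-in embeddings
    print(incongruity_features(sentence))
    ```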

    A Computational Approach to Automatic Prediction of Drunk Texting

    Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya, Mark Carman
    Comments: This paper was presented at ACL-IJCNLP 2015
    Subjects: Computation and Language (cs.CL)

    Alcohol abuse may lead to unsociable behavior such as crime, drunk driving,
    or privacy leaks. We introduce automatic drunk-texting prediction as the
    task of identifying whether a text was written under the influence of
    alcohol.
    We experiment with tweets labeled using hashtags as distant supervision. Our
    classifiers use a set of N-gram and stylistic features to detect drunk tweets.
    Our observations present the first quantitative evidence that text contains
    signals that can be exploited to detect drunk-texting.
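
    The setup is compact enough to sketch end to end: hashtags act as distant
    labels and are stripped from the text, and a linear classifier runs on
    N-gram features. The sklearn snippet below is illustrative (toy data and
    word n-grams only; the paper adds stylistic features).

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Distant supervision: the hashtag provides the label, then is removed.
    tweets = ["had one too many lol #drunk", "heading to work, coffee first",
              "cant feel my faceee #drunk", "great run this morning"]
    labels = [1 if "#drunk" in t else 0 for t in tweets]
    texts = [t.replace("#drunk", "") for t in tweets]

    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),  # word 1-2 grams
                          LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["sooo many shots tonight"]))
    ```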

    Chinese Event Extraction Using Deep Neural Network with Word Embedding

    Yandi Xia, Yang Liu
    Subjects: Computation and Language (cs.CL)

    A lot of prior work on event extraction has exploited a variety of features
    to represent events. Such methods have several drawbacks: 1) the features are
    often specific to a particular domain and do not generalize well; 2) the
    features are derived from various linguistic analyses and are error-prone;
    and 3) some features may be expensive to obtain and require domain
    expertise. In this paper, we develop a Chinese event extraction system that
    uses word embedding vectors to represent language, and deep neural networks
    to learn the abstract feature representation, in order to greatly reduce the
    effort of feature engineering. In addition, in this framework, we leverage a
    large amount of unlabeled data, which can address the problem of the limited
    labeled corpus for this task. Our experiments show that our proposed method
    performs better than the system using rich language features, and that using
    unlabeled data benefits the word embeddings. This study suggests the
    potential of DNNs and word embedding for the event extraction task.

    Grounding the Lexical Sets of Causative-Inchoative Verbs with Word Embedding

    Edoardo Maria Ponti, Elisabetta Jezek, Bernardo Magnini
    Comments: 5 pages, 4 figures, accepted at: Third Italian Conference on Computational Linguistics (CLIC-it). 5-6 December 2016, Napoli (Italy)
    Subjects: Computation and Language (cs.CL)

    Lexical sets contain the words filling the argument positions of a verb in
    one of its senses. They can be grounded empirically through their automatic
    extraction from corpora. The purpose of this paper is to demonstrate that
    their vector representation based on word embeddings provides insights into
    many linguistic phenomena, and in particular into verbs undergoing the
    causative-inchoative alternation. A first experiment investigates the
    internal structure of the sets, which are known to be cognitively radial and
    continuous categories. A second experiment shows that the distance between
    the subject set and the object set is correlated with a semantic factor,
    namely the spontaneity of the verb.
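
    The distance in the second experiment can be realized very simply, for
    instance as the cosine distance between the centroids of the subject and
    object lexical sets. The sketch below uses random vectors as stand-ins for
    real embeddings; the paper’s exact metric may differ.

    ```python
    import numpy as np

    def set_distance(subject_vecs, object_vecs):
        """Cosine distance between the centroids of a verb's subject and
        object lexical sets."""
        s = np.mean(subject_vecs, axis=0)
        o = np.mean(object_vecs, axis=0)
        return 1.0 - s @ o / (np.linalg.norm(s) * np.linalg.norm(o))

    rng = np.random.default_rng(9)
    subj = rng.normal(size=(12, 100))   # embeddings of subject fillers
    obj = rng.normal(size=(15, 100))    # embeddings of object fillers
    print(set_distance(subj, obj))
    ```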

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    (Duplicate entry; see the abstract under Neural and Evolutionary Computing above.)

    Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

    Joey Hong, Chris Mattmann, Paul Ramirez
    Comments: 6 pages, 4 figures
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    The evolution of the internet has created an abundance of unstructured data
    on the web, a significant part of which is textual. The task of author
    profiling seeks to find the demographics of people solely from their linguistic
    and content-based features in text. The ability to describe traits of
    authors clearly has applications in fields such as security and forensics,
    as well as marketing. Instead of treating age as just a classification
    problem, we also frame it as a regression problem, and use an ensemble chain
    method that combines the power of both classification and regression to
    learn an author’s exact age.
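
    One plausible reading of the ensemble chain, a classifier that first picks a
    coarse age bucket and a per-bucket regressor that then refines the exact
    age, is sketched below with sklearn on synthetic features; it is not
    necessarily the authors’ exact pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(10)
    X = rng.normal(size=(300, 20))                   # text features per author
    age = rng.integers(15, 60, size=300).astype(float)
    bucket = np.digitize(age, bins=[25, 40])         # 3 coarse age groups

    clf = LogisticRegression(max_iter=1000).fit(X, bucket)
    regs = {b: LinearRegression().fit(X[bucket == b], age[bucket == b])
            for b in np.unique(bucket)}

    def predict_age(x):
        """Chain: classify into a bucket, then regress to an exact age."""
        b = clf.predict(x.reshape(1, -1))[0]
        return regs[b].predict(x.reshape(1, -1))[0]

    print(predict_age(X[0]))
    ```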


    Distributed, Parallel, and Cluster Computing

    Reasoning about identifier spaces: How to make Chord correct

    Pamela Zave
    Comments: 13 pages including references; 6 figures. arXiv admin note: text overlap with arXiv:1502.06461
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The Chord distributed hash table (DHT) is well-known and often used to
    implement peer-to-peer systems. Chord peers find other peers, and access their
    data, through a ring-shaped pointer structure in a large identifier space.
    Despite claims of proven correctness, i.e., eventual reachability, previous
    work has shown that the Chord ring-maintenance protocol is not correct under
    its original operating assumptions. Previous work has not, however, discovered
    whether Chord could be made correct under the same assumptions. The
    contribution of this paper is to provide the first specification of correct
    operations and initialization for Chord, an inductive invariant that is
    necessary and sufficient to support a proof of correctness, and two independent
    proofs of correctness. One proof is informal and intuitive, and applies to
    networks of any size. The other proof is based on a formal model in Alloy, and
    uses fully automated analysis to prove the assertions for networks of bounded
    size. The two proofs complement each other in several important ways.

    Reducing Opacity to Linearizability: A Sound and Complete Method

    Alasdair Armstrong, Brijesh Dongol, Simon Doherty
    Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Programming Languages (cs.PL)

    Transactional memory is a mechanism that manages thread synchronisation on
    behalf of a programmer so that blocks of code execute with an illusion of
    atomicity. The main safety criterion for transactional memory is opacity, which
    defines conditions for serialising concurrent transactions.

    Proving opacity is complicated because it allows concurrent transactions to
    observe distinct memory states, while TM implementations are typically based on
    one single shared store. This paper presents a sound and complete method, based
    on coarse-grained abstraction, for reducing proofs of opacity to the relatively
    simpler correctness condition: linearizability. We use our methods to verify
    TML and NORec from the literature and show our techniques extend to relaxed
    memory models by showing that both are opaque under TSO without requiring
    additional fences. Our methods also elucidate TM designs at a higher level
    of abstraction; as an application, we develop a variation of NORec with
    fast-path read transactions. All our proofs have been mechanised, either in
    the Isabelle
    theorem prover or the PAT model checker.


    Learning

    A Non-generative Framework and Convex Relaxations for Unsupervised Learning

    Elad Hazan, Tengyu Ma
    Comments: to appear in NIPS 2016
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

    We give a novel formal theoretical framework for unsupervised learning with
    two distinctive characteristics. First, it does not assume any generative
    model and is based on a worst-case performance metric. Second, it is
    comparative, namely performance is measured with respect to a given
    hypothesis class. This allows us to avoid known computational hardness
    results and to use improper algorithms based on convex relaxations. We show
    how several families of unsupervised learning models, which were previously
    only analyzed under probabilistic assumptions and are otherwise provably
    intractable, can be efficiently learned in our framework by convex
    optimization.

    Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States

    W. Montgomery, A. Ajay, C. Finn, P. Abbeel, S. Levine
    Subjects: Learning (cs.LG)

    Autonomous learning of robotic skills can allow general-purpose robots to
    learn wide behavioral repertoires without requiring extensive manual
    engineering. However, robotic skill learning methods typically make one of
    several trade-offs to enable practical real-world learning, such as requiring
    manually designed policy or value function representations, initialization from
    human-provided demonstrations, instrumentation of the training environment, or
    extremely long training times. In this paper, we propose a new reinforcement
    learning algorithm for learning manipulation skills that can train
    general-purpose neural network policies with minimal human engineering, while
    still allowing for fast, efficient learning in stochastic environments. Our
    approach builds on the guided policy search (GPS) algorithm, which transforms
    the reinforcement learning problem into supervised learning from a
    computational teacher (without human demonstrations). In contrast to prior GPS
    methods, which require a consistent set of initial states to which the system
    must be reset after each episode, our approach can handle randomized initial
    states, allowing it to be used in environments where deterministic resets are
    impossible. We compare our method to existing policy search techniques in
    simulation, showing that it can train high-dimensional neural network policies
    with the same sample efficiency as prior GPS methods, and present real-world
    results on a PR2 robotic manipulator.

    Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

    Joey Hong, Chris Mattmann, Paul Ramirez
    (Duplicate entry; see the abstract under Computation and Language above.)

    cleverhans v0.1: an adversarial machine learning library

    Ian Goodfellow, Nicolas Papernot, Patrick McDaniel
    Comments: Technical report for this https URL
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

    cleverhans is a software library that provides standardized reference
    implementations of adversarial example construction techniques and adversarial
    training. The library may be used to develop more robust machine learning
    models and to provide standardized benchmarks of models’ performance in the
    adversarial setting. Benchmarks constructed without a standardized
    implementation of adversarial example construction are not comparable to each
    other, because a good result may indicate a robust model or it may merely
    indicate a weak implementation of the adversarial example construction
    procedure.

    This technical report is structured as follows. Section 1 provides an
    overview of adversarial examples in machine learning and of the cleverhans
    software. Section 2 presents the core functionalities of the library: namely
    the attacks based on adversarial examples and defenses to improve the
    robustness of machine learning models to these attacks. Section 3 describes how
    to report benchmark results using the library. Section 4 describes the
    versioning system.
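
    As a flavor of what such a library standardizes, the fast gradient sign
    method (FGSM) fits in a few lines. The numpy sketch below is generic and is
    not the cleverhans API; a linear model is used so the input gradient has a
    closed form and the example stays self-contained.

    ```python
    import numpy as np

    def fgsm(x, grad_loss_wrt_x, eps=0.1, clip=(0.0, 1.0)):
        """Perturb the input in the direction that increases the loss,
        within an L-infinity budget of eps."""
        return np.clip(x + eps * np.sign(grad_loss_wrt_x), *clip)

    rng = np.random.default_rng(11)
    w, x, y = rng.normal(size=8), rng.random(8), 1.0
    p = 1.0 / (1.0 + np.exp(-w @ x))       # logistic model prediction
    grad = (p - y) * w                     # d(log-loss)/dx for a linear score
    x_adv = fgsm(x, grad)
    print(np.abs(x_adv - x).max())         # == eps wherever clipping is inactive
    ```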

    A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

    Aleksandr Aravkin, Damek Davis
    Comments: 40 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    Machine learning theory typically assumes that training data is unbiased and
    not adversarially generated. When real training data deviates from these
    assumptions, trained models make erroneous predictions, sometimes with
    disastrous effects. Robust losses, such as the Huber norm, were designed to
    mitigate the effects of such contaminated data, but they are limited to the
    regression context.

    In this paper, we show how to transform any optimization problem that arises
    from fitting a machine learning model into one that (1) detects and removes
    contaminated data from the training set while (2) simultaneously fitting the
    trimmed model on the uncontaminated data that remains. To solve the resulting
    nonconvex optimization problem, we introduce a fast stochastic
    proximal-gradient algorithm that incorporates prior knowledge through nonsmooth
    regularization. For datasets of size $n$, our approach requires
    $O(n^{2/3}/\varepsilon)$ gradient evaluations to reach $\varepsilon$-accuracy
    and, when a certain error bound holds, the complexity improves to
    $O(\kappa n^{2/3}\log(1/\varepsilon))$. These rates are $n^{1/3}$ times
    better than those achieved by typical, full-gradient methods.
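
    The detect-and-trim idea can be illustrated with its simplest instance,
    least trimmed squares, solved by alternating between fitting on the trusted
    samples and re-trusting the samples with the smallest residuals. The paper
    instead solves the joint nonconvex problem with a stochastic
    proximal-gradient method; the keep fraction below is illustrative.

    ```python
    import numpy as np

    def trimmed_least_squares(X, y, keep_frac=0.8, iters=20):
        """Alternate between (1) fitting on currently trusted samples and
        (2) trusting the keep_frac samples with the smallest residuals."""
        keep = np.arange(len(y))                 # start by trusting everything
        for _ in range(iters):
            w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            resid = (X @ w - y) ** 2
            keep = np.argsort(resid)[: int(keep_frac * len(y))]
        return w, keep

    rng = np.random.default_rng(12)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=200)
    y[:20] += 50.0                               # contaminate 10% of the labels
    w_hat, trusted = trimmed_least_squares(X, y)
    print(np.round(w_hat, 2))                    # close to w_true
    ```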

    FLOCK: Combating Astroturfing on Livestreaming Platforms

    Neil Shah
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    Livestreaming platforms have become increasingly popular in recent years as a
    means of sharing and advertising creative content. Popular content streamers
    who attract large viewership to their live broadcasts can earn a living by
    means of ad revenue, donations and channel subscriptions. Unfortunately, this
    incentivized popularity has simultaneously created an incentive for
    fraudsters to provide astroturfing services, which artificially inflate
    viewership metrics by providing fake “live” views to customers. Our work
    provides a number of major
    contributions: (a) formulation: we are the first to introduce and characterize
    the viewbot fraud problem in livestreaming platforms, (b) methodology: we
    propose FLOCK, a principled and unsupervised method which efficiently and
    effectively identifies botted broadcasts and their constituent botted views,
    and (c) practicality: our approach achieves over 98% precision in identifying
    botted broadcasts and over 90% precision/recall against sizable synthetically
    generated viewbot attacks on a real-world livestreaming workload of over 16
    million views and 92 thousand broadcasts. FLOCK successfully operates on larger
    datasets in practice and is regularly used at a large, undisclosed
    livestreaming corporation.

    Tutorial on Answering Questions about Images with Deep Learning

    Mateusz Malinowski, Mario Fritz
    Comments: The tutorial was presented at ‘2nd Summer School on Integrating Vision and Language: Deep Learning’ in Malta, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Together with the development of more accurate methods in Computer Vision and
    Natural Language Understanding, holistic architectures that answer on questions
    about the content of real-world images have emerged. In this tutorial, we build
    a neural-based approach to answer questions about images. We base our tutorial
    on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks, the
    models that we present here can achieve competitive performance on both
    datasets; in fact, they are among the best methods that use a combination of an
    LSTM with a global, full-frame CNN representation of an image. We hope that
    after reading this tutorial, the reader will be able to use Deep Learning
    frameworks, such as Keras and the Kraino framework introduced here, to build
    various architectures that will lead to further performance improvements on
    this challenging task.

    Applications of Online Deep Learning for Crisis Response Using Social Media Information

    Dat Tien Nguyen, Shafiq Joty, Muhammad Imran, Hassan Sajjad, Prasenjit Mitra
    Comments: Accepted at SWDM co-located with CIKM 2016. 6 pages, 2 figures. arXiv admin note: text overlap with arXiv:1608.03902
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Learning (cs.LG)

    During natural or man-made disasters, humanitarian response organizations
    look for useful information to support their decision-making processes. Social
    media platforms such as Twitter have been considered a vital source of
    useful information for disaster response and management. Despite advances in
    natural language processing techniques, processing short and informal Twitter
    messages is a challenging task. In this paper, we propose to use Deep Neural
    Networks (DNNs) to address two types of information needs of response
    organizations: 1) identifying informative tweets and 2) classifying them into
    topical classes. DNNs use distributed representations of words and learn the
    representations as well as higher-level features automatically for the
    classification task. We propose a new online algorithm based on stochastic
    gradient descent to train DNNs in an online fashion during disaster situations.
    We test our models using a crisis-related real-world Twitter dataset.
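
    To make the online training setup concrete, here is a minimal sketch of
    per-tweet SGD updates on a logistic stand-in for the DNN. The class name,
    feature interface, and learning rate are our placeholders, not the authors'
    architecture or their online algorithm.

        import numpy as np

        class OnlineTweetClassifier:
            """One SGD step per incoming labeled tweet, so the model keeps
            adapting while the disaster unfolds."""
            def __init__(self, dim, lr=0.1):
                self.w = np.zeros(dim)
                self.lr = lr

            def partial_fit(self, x, y):
                # y = 1 for an informative tweet, 0 otherwise
                p = 1.0 / (1.0 + np.exp(-self.w @ x))
                self.w -= self.lr * (p - y) * x   # logistic-loss gradient step

            def predict(self, x):
                return int(self.w @ x > 0.0)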

    Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

    Alberto Bietti (Thoth, MSR – INRIA), Julien Mairal (Thoth)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    Stochastic optimization algorithms with variance reduction have proven
    successful for minimizing large finite sums of functions. However, in the
    context of empirical risk minimization, it is often helpful to augment the
    training set by considering random perturbations of input examples. In this
    case, the objective is no longer a finite sum, and the main candidate for
    optimization is the stochastic gradient descent method (SGD). In this paper, we
    introduce a variance reduction approach for this setting when the objective is
    strongly convex. After an initial linearly convergent phase, the algorithm
    achieves an $O(1/t)$ convergence rate in expectation like SGD, but with a
    constant factor that is typically much smaller, depending on the variance of
    gradient estimates due to perturbations on a single example.
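
    The flavor of variance reduction in this infinite-data setting can be
    sketched with a SAGA/MISO-style table of stored per-example gradients that is
    refreshed with freshly perturbed examples. This is a schematic analogue under
    our own assumptions (the grad callable, Gaussian perturbations, constant step
    size), not the algorithm introduced in the paper.

        import numpy as np

        def vr_sgd_with_perturbations(grad, data, dim, lr=0.01,
                                      steps=10_000, noise=0.1, seed=0):
            """Schematic variance-reduced SGD for objectives of the form
            E_perturbation[(1/n) sum_i f(w; x_i + perturbation)]: store one
            gradient per example and correct each fresh stochastic gradient
            with the table average."""
            rng = np.random.default_rng(seed)
            n = len(data)
            w = np.zeros(dim)
            table = np.zeros((n, dim))    # stored per-example gradients
            avg = table.mean(axis=0)
            for _ in range(steps):
                i = rng.integers(n)
                x = data[i] + noise * rng.standard_normal(dim)  # augmentation
                g = grad(w, x)
                w -= lr * (g - table[i] + avg)   # variance-reduced step
                avg += (g - table[i]) / n
                table[i] = g
            return w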

    Embracing data abundance: BookTest Dataset for Reading Comprehension

    Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
    Comments: The first two authors contributed equally to this work. Submitted to EACL 2017. Code and dataset are publicly available
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There is a practically unlimited amount of natural language data available.
    Still, recent work in text comprehension has focused on datasets which are
    small relative to current computing possibilities. This article makes a case
    for the community to move to larger data and, as a step in that direction,
    proposes the BookTest, a new dataset similar to the popular Children’s
    Book Test (CBT) but more than 60 times larger. We show that training on
    the new data improves the accuracy of our Attention-Sum Reader model on the
    original CBT test data by a much larger margin than many recent attempts to
    improve the model architecture. On one version of the dataset our ensemble even
    exceeds the human baseline provided by Facebook. We then show in our own human
    study that there is still space for further improvement.

    Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations

    Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg
    Comments: Submitted to International Conference on Robotics and Automation (ICRA) 2017
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Motivated by recent advances in Deep Learning for robot control, this paper
    considers two learning algorithms in terms of how they acquire demonstrations.
    “Human-Centric” (HC) sampling is the standard supervised learning algorithm,
    where a human supervisor demonstrates the task by teleoperating the robot to
    provide trajectories consisting of state-control pairs. “Robot-Centric” (RC)
    sampling is an increasingly popular alternative used in algorithms such as
    DAgger, where a human supervisor observes the robot executing a learned policy
    and provides corrective control labels for each state visited. RC sampling can
    be challenging for human supervisors and prone to mislabeling; it can also
    induce error in policy performance because it repeatedly visits areas of
    the state space that are harder to learn. Although policies learned with RC
    sampling can be superior to those learned with HC sampling for standard
    learning models such as linear SVMs, the two can be comparable for
    highly-expressive learning models such as deep networks and hyper-parametric
    decision trees, which have little model error. We compare HC and RC using a
    grid world and a physical robot singulation task, where in the latter the input
    is a binary image of a connected set of objects on a planar worksurface and the
    policy generates a motion of the gripper to separate one object from the rest.
    We observe in simulation that for linear SVMs, policies learned with RC
    outperformed those learned with HC but that with deep models this advantage
    disappears. We also find that with RC, the corrective control labels provided
    by humans can be highly inconsistent. We prove there exists a class of examples
    where in the limit, HC is guaranteed to converge to an optimal policy while RC
    may fail to converge.
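
    The contrast between the two sampling schemes comes down to who executes the
    actions while demonstrations are collected. The sketch below spells this out
    against a generic environment interface; the env and policy signatures are
    our placeholders, not the paper's experimental code.

        def collect_hc(expert, env, episodes):
            """Human-Centric: the supervisor both labels and drives, so the
            visited states follow the expert's trajectory distribution."""
            data = []
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    a = expert(s)
                    data.append((s, a))
                    s, done = env.step(a)        # expert action is executed
            return data

        def collect_rc(expert, learner, env, episodes):
            """Robot-Centric (DAgger-style): the robot drives its learned
            policy and the supervisor labels the states it actually visits."""
            data = []
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    data.append((s, expert(s)))     # corrective label only
                    s, done = env.step(learner(s))  # robot action is executed
            return data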

    Revisiting Role Discovery in Networks: From Node to Edge Roles

    Nesreen K. Ahmed, Ryan A. Rossi, Theodore L. Willke, Rong Zhou
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Social and Information Networks (cs.SI)

    Previous work in network analysis has focused on modeling the
    mixed-memberships of node roles in the graph, but not the roles of edges. We
    introduce the edge role discovery problem and present a generalizable framework
    for learning and extracting edge roles from arbitrary graphs automatically.
    Furthermore, while existing node-centric role models have mainly focused on
    simple degree and egonet features, this work also explores graphlet features
    for role discovery. In addition, we develop an approach for automatically
    learning and extracting important and useful edge features from an arbitrary
    graph. The experimental results demonstrate the utility of edge roles for
    network analysis tasks on a variety of graphs from various problem domains.
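
    A rough picture of edge-role extraction: build a per-edge feature matrix and
    factorize it into soft role memberships, in the spirit of feature-based
    node-role methods such as RolX. The three features and the use of NMF below
    are our illustrative choices, not the authors' framework.

        import numpy as np
        import networkx as nx
        from sklearn.decomposition import NMF

        def edge_roles(G, n_roles=3):
            """Simple per-edge features (endpoint degrees, shared neighbors),
            factorized into soft role memberships."""
            edges = list(G.edges())
            feats = np.array(
                [[G.degree(u), G.degree(v),
                  len(set(G[u]) & set(G[v]))]   # common-neighbor count
                 for u, v in edges],
                dtype=float)
            memberships = NMF(n_components=n_roles, init="nndsvda",
                              max_iter=500).fit_transform(feats)
            return dict(zip(edges, memberships))

        roles = edge_roles(nx.karate_club_graph())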

    The Search Problem in Mixture Models

    Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider the task of learning the parameters of a single component
    of a mixture model, for the case when we are given side information about
    that component; we call this the “search problem” in mixture models. We would
    like to solve this with computational and sample complexity lower than solving
    the overall original problem, where one learns parameters of all components.

    Our main contributions are the development of a simple but general model for
    the notion of side information, and a corresponding simple matrix-based
    algorithm for solving the search problem in this general setting. We then
    specialize this model and algorithm to four common scenarios: Gaussian mixture
    models, LDA topic models, subspace clustering, and mixed linear regression. For
    each one of these we show that if (and only if) the side information is
    informative, we obtain better sample complexity than existing standard mixture
    model algorithms (e.g. tensor methods). We also illustrate several natural ways
    one can obtain such side information, for specific problem instances. Our
    experiments on real datasets (NY Times, Yelp, BSDS500) further demonstrate the
    practicality of our algorithms showing significant improvement in runtime and
    accuracy.

    Sequential Low-Rank Change Detection

    Yao Xie, Lee Seversky
    Comments: Presented at Allerton Conference, 2016
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    Detecting emergence of a low-rank signal from high-dimensional data is an
    important problem arising from many applications such as camera surveillance
    and swarm monitoring using sensors. We consider a procedure based on the
    largest eigenvalue of the sample covariance matrix over a sliding window to
    detect the change. To achieve dimensionality reduction, we present a
    sketching-based approach for rank change detection using the low-dimensional
    linear sketches of the original high-dimensional observations. The premise is
    that when the sketching matrix is a random Gaussian matrix, and the dimension
    of the sketching vector is sufficiently large, the rank of sample covariance
    matrix for these sketches equals the rank of the original sample covariance
    matrix with high probability. Hence, we may be able to detect the low-rank
    change using sample covariance matrices of the sketches without having to
    recover the original covariance matrix. We characterize the performance of the
    largest eigenvalue statistic in terms of the false-alarm-rate and the expected
    detection delay, and present an efficient online implementation via subspace
    tracking.
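
    A compact version of the monitoring loop described above might look as
    follows. The Gaussian sketch and the largest-eigenvalue statistic follow the
    abstract; the window handling and the fixed threshold are our simplifying
    assumptions.

        import numpy as np

        def sketched_change_detector(stream, d, m, window, threshold, seed=0):
            """Project each d-dimensional observation through a fixed random
            Gaussian sketch to m dimensions, then alarm when the largest
            eigenvalue of the sliding-window sample covariance of the
            sketches crosses the threshold."""
            rng = np.random.default_rng(seed)
            A = rng.standard_normal((m, d)) / np.sqrt(m)   # sketching matrix
            buf = []
            for t, x in enumerate(stream):
                buf.append(A @ x)
                if len(buf) > window:
                    buf.pop(0)
                if len(buf) == window:
                    cov = np.cov(np.stack(buf), rowvar=False)
                    if np.linalg.eigvalsh(cov)[-1] > threshold:
                        return t        # change declared at this sample
            return None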

    Video Pixel Networks

    Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu
    Comments: 16 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a probabilistic video model, the Video Pixel Network (VPN), that
    estimates the discrete joint distribution of the raw pixel values in a video.
    The model and the neural architecture reflect the time, space and color
    structure of video tensors and encode it as a four-dimensional dependency
    chain. The VPN approaches the best possible performance on the Moving MNIST
    benchmark, a leap over the previous state of the art, and the generated videos
    show only minor deviations from the ground truth. The VPN also produces
    detailed samples on the action-conditional Robotic Pushing benchmark and
    generalizes to the motion of novel objects.


    Information Theory

    Energy-Aware Wireless Relay Selection in Load-Coupled OFDMA Cellular Networks

    Lei You, Di Yuan, Nikolaos Pappas, Peter Värbrand
    Comments: 4 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    We investigate transmission energy minimization via optimizing wireless relay
    selection in orthogonal-frequency-division multiple access (OFDMA) networks. We
    take into account the impact of cell load on transmission energy. We
    prove the NP-hardness of the energy-aware wireless relay selection problem. To
    tackle the computational complexity, a partial optimality condition is derived
    to provide insight into the design of an effective and efficient
    algorithm. Numerical results show that the resulting algorithm achieves high
    energy performance.

    Multiuser Detection in Multibeam Satellite Systems: Theoretical Analysis and Practical Schemes

    Giulio Colavolpe, Andrea Modenini, Amina Piemontese, Alessandro Ugolini
    Comments: 30 pages, 21 figures, submitted to IEEE Trans. Commun
    Subjects: Information Theory (cs.IT)

    We consider the rates achievable by a user in a multibeam satellite system
    for unicast applications, and propose alternatives to the conventional
    single-user symbol-by-symbol detection applied at user terminals. Single-user
    detection is known to suffer from strong degradation when the terminal is
    located near the edge of the coverage area of a beam, and when aggressive
    frequency reuse is adopted. For this reason, we consider multiuser detection,
    and take into account the strongest interfering signal. We also analyze two
    additional transmission strategies requiring modifications at medium access
    control layer. We describe an information-theoretic framework to compare the
    different strategies by computing the information rate of the user in the
    reference beam. Furthermore, we analyze the performance of coded schemes that
    could approach the information-theoretic limits. We show that classical codes
    from the DVB-S2(X) standard are not suitable when multiuser detection is
    adopted, and we propose two ways to improve the performance, based on the
    redesign of the code and of the bit mapping.

    Next-to-leading order corrections to capacity for nondispersive nonlinear optical fiber channel in intermediate power region

    A. A. Panarin, A. V. Reznichenko, I. S. Terekhov
    Comments: 9 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    We consider the optical fiber channel modelled by the nonlinear
    Schrödinger equation with zero dispersion and additive Gaussian noise. Using
    the Feynman path-integral approach for the model, we find corrections to the
    conditional probability density function, the output signal distribution, the conditional and
    output signal entropies, and the channel capacity at large signal-to-noise
    ratio. We demonstrate that the correction to the channel capacity is positive
    for large signal power. Therefore, this correction increases the earlier
    calculated capacity for a nondispersive nonlinear optical fiber channel in the
    intermediate power region.
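
    For context, the zero-dispersion channel studied here is conventionally
    written as below. This is the standard textbook form, with a generic noise
    intensity $Q$; the notation is ours and need not match the authors'.

        % Zero-dispersion nonlinear Schrödinger channel: field \psi(z,t)
        % propagating over distance z with Kerr coefficient \gamma and
        % additive white Gaussian noise \eta(z,t).
        \[
          \frac{\partial \psi}{\partial z}
            = i\gamma\,\lvert\psi\rvert^{2}\psi + \eta(z,t),
          \qquad
          \mathbb{E}\!\left[\eta(z,t)\,\bar{\eta}(z',t')\right]
            = Q\,\delta(z-z')\,\delta(t-t').
        \]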

    Self-Dual Codes over $\mathbb{Z}_2 \times (\mathbb{Z}_2+u\mathbb{Z}_2)$

    Long Yu, Qiong Huang, Hongwei Liu, Xiusheng Liu
    Comments: 18 pages
    Subjects: Information Theory (cs.IT)

    In this paper, we study self-dual codes over $\mathbb{Z}_2 \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)$, where $u^2=0$. Three types of self-dual codes
    are defined. For each type, the possible values $\alpha,\beta$ such that there
    exists a code $\mathcal{C}\subseteq \mathbb{Z}_{2}^\alpha \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)^\beta$ are established. We also present several
    approaches to construct self-dual codes over $\mathbb{Z}_2 \times
    (\mathbb{Z}_2+u\mathbb{Z}_2)$. Moreover, the structure of two-weight self-dual
    codes is completely determined for $\alpha\cdot\beta \neq 0$.
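
    To make the ambient ring concrete, here is a tiny Python model of arithmetic
    in $\mathbb{Z}_2+u\mathbb{Z}_2$ with $u^2=0$. It illustrates the ring
    structure only (the class name and printing convention are ours), not the
    paper's code constructions.

        class Z2u:
            """Element a + u*b of Z_2 + u Z_2, with a, b in {0,1} and u^2 = 0."""
            def __init__(self, a, b):
                self.a, self.b = a % 2, b % 2

            def __add__(self, other):
                return Z2u(self.a ^ other.a, self.b ^ other.b)

            def __mul__(self, other):
                # (a + u b)(c + u d) = ac + u(ad + bc), since u^2 = 0
                return Z2u(self.a & other.a,
                           (self.a & other.b) ^ (self.b & other.a))

            def __repr__(self):
                return {(0, 0): "0", (1, 0): "1",
                        (0, 1): "u", (1, 1): "1+u"}[(self.a, self.b)]

        u = Z2u(0, 1)
        print(u * u)    # -> 0, the defining relation of the ring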

    Code Design for Short Blocks: A Survey

    Gianluigi Liva, Lorenzo Gaudio, Tudor Ninacs, Thomas Jerkovits
    Comments: A preliminary version of this work was presented at the 25th Edition of the European Conference on Networks and Communications (EuCNC), June 2016. This version includes the performance of polar codes with list decoding and CRC
    Subjects: Information Theory (cs.IT)

    The design of block codes for short information blocks (e.g., a thousand or
    fewer information bits) is an open research problem which is gaining relevance
    thanks to emerging applications in wireless communication networks. In this
    work, we review some of the most recent code constructions targeting the short
    block regime, and we compare them with both finite-length performance bounds
    and classical error correction coding schemes. We will see how it is possible
    to effectively approach the theoretical bounds, with different performance vs.
    decoding complexity trade-offs.

    Broadcast Function Computation with Complementary Side Information

    Jithin Ravi, Bikash Kumar Dey
    Comments: Accepted to IEEE GLOBECOM NetCod 2016
    Subjects: Information Theory (cs.IT)

    We consider the function computation problem in a three node network with one
    encoder and two decoders. The encoder has access to two correlated sources $X$
    and $Y$. The encoder encodes $X^n$ and $Y^n$ into a message which is given to
    two decoders. Decoder 1 and decoder 2 have access to $X$ and $Y$ respectively,
    and they want to compute two functions $f(X,Y)$ and $g(X,Y)$ respectively using
    the encoded message and their respective side information. We want to find the
    optimum (minimum) encoding rate under the zero-error and $\epsilon$-error (i.e.
    vanishing error) criteria. For the special case of this problem with $f(X,Y) =
    Y$ and $g(X,Y) = X$, we show that the $\epsilon$-error optimum rate is also
    achievable with zero error. This result extends to a more general
    `complementary delivery index coding’ problem with arbitrary number of messages
    and decoders. For other functions, we show that the cut-set bound is achievable
    under $\epsilon$-error if $X$ and $Y$ are binary, or if the functions are from
    a special class of `compatible’ functions which includes the case $f=g$.
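
    For the exchange special case $f(X,Y) = Y$ and $g(X,Y) = X$, the benchmark
    rate referred to above is the classical complementary-delivery rate. We state
    it here for orientation; this is the standard result for that setting, not a
    formula quoted from the paper.

        % Decoder 1 knows X and wants Y; decoder 2 knows Y and wants X.
        % The epsilon-error optimal encoding rate (which the paper shows is
        % also achievable with zero error) is
        \[
          R^{*} \;=\; \max\bigl\{\, H(Y \mid X),\; H(X \mid Y) \,\bigr\}.
        \]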

    Multiuser Rate-Diverse Network-Coded Multiple Access

    Haoyuan Pan, Lu Lu, Soung Chang Liew
    Comments: 10 pages
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    This paper presents the first Network-Coded Multiple Access (NCMA) system
    with multiple users adopting different signal modulations, referred to as
    rate-diverse NCMA. A distinguishing feature of NCMA is the joint use of
    physical-layer network coding (PNC) and multiuser decoding (MUD) to boost
    throughput of multipacket reception systems. In previous NCMA systems, users
    adopt the same modulation regardless of their individual channel conditions.
    This leads to suboptimal throughput for many practical scenarios, especially
    when different users have widely varying channel conditions. A rate-diverse
    NCMA system allows different users to use modulations that are commensurate
    with their channel conditions. A key challenge is the design of the PNC mapping
    and decoding mechanisms in NCMA when different users adopt different
    modulations. While there has been past work on non-channel-coded rate-diverse
    PNC, this paper is the first attempt to design channel-coded rate-diverse PNC
    to ensure the reliability of the overall NCMA system. Specifically, we put
    forth a symbol-splitting channel coding and modulation design so that PNC/NCMA
    can work over different modulations. We implemented our rate-diverse NCMA
    system on software-defined radios. Experimental results show that the
    throughput of rate-diverse NCMA can outperform the state-of-the-art
    rate-homogeneous NCMA by 80%. Overall, the introduction of rate diversity
    significantly boosts the NCMA system throughput in practical scenarios.

    Economical Energy Efficiency (E3): An Advanced Performance Metric for 5G Systems

    Zhipeng Yan, Mugen Peng, Chonggang Wang
    Comments: 14 pages, 5 figures, Accepted by IEEE Wireless Commun
    Subjects: Information Theory (cs.IT)

    The performance of fifth generation (5G) wireless communication systems
    is significantly affected by the edge cache and the transport network. These emerging
    components bring substantial placement and utilization costs, and evaluating
    this cost impact is beyond the capability of traditional
    performance metrics, including spectral efficiency (SE) and energy efficiency
    (EE). In this article, economical energy efficiency (E3) is proposed, whose
    core idea is to take SE/EE and cost into account to evaluate comprehensive
    gains when different kinds of advanced technologies are used in 5G systems. The
    E3 results are shown when the transport network and edge cache are separately or
    jointly used. Open issues in terms of modeling the cost, E3-optimization-based
    radio resource allocation, and E3 optimization for the Internet of Things are
    identified as well.

    Isometrically Self-dual Cyclic Codes

    Yun Fan, Liang Zhang
    Subjects: Information Theory (cs.IT)

    General isometries of cyclic codes, including multipliers and translations,
    are introduced; and isometrically self-dual cyclic codes are defined. In terms
    of Type-I duadic splittings given by multipliers and translations, a necessary
    and sufficient condition for the existence of isometrically self-dual cyclic
    codes is obtained. A program to construct isometrically self-dual cyclic codes
    is provided, and illustrated by several examples. In particular, a class of
    isometrically self-dual MDS cyclic codes, which are alternant codes from a
    class of generalized Reed-Solomon codes, is presented.



