Joel Heck, Fathi M. Salem
Comments: 5 pages, 3 Figures, 5 Tables
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recurrent neural networks with various types of hidden units have been used
to solve a diverse range of problems involving sequence data. Two of the most
recent proposals, gated recurrent units (GRU) and minimal gated units (MGU),
have shown comparably promising results on public benchmark datasets. In this
paper, we introduce three model variants of the minimal gated unit (MGU) which
further simplify that design by reducing the number of parameters in the
forget-gate dynamic equation. These three model variants, referred to simply as
MGU1, MGU2, and MGU3, were tested on sequences generated from the MNIST dataset
and from the Reuters Newswire Topics (RNT) dataset. The new models have shown
similar accuracy to the MGU model while using fewer parameters and thus
lowering training expense. One model variant, namely MGU2, performed better
than MGU on the datasets considered, and thus may be used as an alternative to
MGU or GRU in recurrent neural networks.
Yuzhen Lu, Fathi M. Salem
Comments: 5 pages, 4 Figures, 3 Tables. arXiv admin note: substantial text overlap with arXiv:1612.03707
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The standard LSTM recurrent neural network, while very powerful in long-range
dependency sequence applications, has a highly complex structure and a
relatively large number of (adaptive) parameters. In this work, we present an
empirical comparison
between the standard LSTM recurrent neural network architecture and three new
parameter-reduced variants obtained by eliminating combinations of the input
signal, bias, and hidden unit signals from individual gating signals. The
experiments on two sequence datasets show that the three new variants, referred
to simply as LSTM1, LSTM2, and LSTM3, can achieve performance comparable to the
standard LSTM model with fewer (adaptive) parameters.
Tao Wei, Changhu Wang, Chang Wen Chen
Comments: 12 pages, 6 figures, Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this work we study the problem of network morphism, an effective learning
scheme to morph a well-trained neural network to a new one with the network
function completely preserved. Unlike existing work, which addressed basic
morphing types at the layer level, we target the central
problem of network morphism at a higher level, i.e., how a convolutional layer
can be morphed into an arbitrary module of a neural network. To simplify the
representation of a network, we abstract a module as a graph with blobs as
vertices and convolutional layers as edges, based on which the morphing process
can be formulated as a graph transformation problem. Two atomic morphing
operations are introduced to compose the graphs, based on which modules are
classified into two families, i.e., simple morphable modules and complex
modules. We present practical morphing solutions for both families,
and prove that any reasonable module can be morphed from a single
convolutional layer. Extensive experiments have been conducted based on the
state-of-the-art ResNet on benchmark datasets, and the effectiveness of the
proposed solution has been verified.
Ruotian Luo, Gregory Shakhnarovich
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider the generation and comprehension of natural language referring
expressions for objects in an image. Unlike generic “image captioning”, which
lacks natural standard evaluation criteria, the quality of a referring expression
may be measured by the receiver’s ability to correctly infer which object is
being described. Following this intuition, we propose two approaches that utilize
models trained for the comprehension task to generate better expressions. First, we
use a comprehension module trained on human-generated expressions as a
“critic” of the referring expression generator. The comprehension module serves as
a differentiable proxy of human evaluation, providing training signal to the
generation module. Second, we use the comprehension module in a
generate-and-rerank pipeline, which chooses from candidate expressions
generated by a model according to their performance on the comprehension task.
We show that both approaches lead to improved referring expression generation
on multiple benchmark datasets.
Mojtaba Sahraee-Ardakan, Mohsen Joneidi
Comments: 5 pages, 1 figure, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a new joint dictionary learning method for
example-based image super-resolution (SR), using sparse representation. The
low-resolution (LR) dictionary is trained from a set of LR sample image
patches. Using the sparse representation coefficients of these LR patches over
the LR dictionary, the high-resolution (HR) dictionary is trained by minimizing
the reconstruction error of HR sample patches. The error criterion used here is
the mean square error. In this way we guarantee that the HR patches have the
same sparse representation over HR dictionary as the LR patches over the LR
dictionary, and at the same time, these sparse representations can well
reconstruct the HR patches. Simulation results show the effectiveness of our
method compared to state-of-the-art SR algorithms.
Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
Comments: To appear in the PARMA-DITAM workshop at HiPEAC 2017, January 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Binarized neural networks (BNNs) are gaining interest in the deep learning
community due to their significantly lower computational and memory cost. They
are particularly well suited to reconfigurable logic devices, which contain an
abundance of fine-grained compute resources and can result in smaller, lower
power implementations, or conversely in higher classification rates. Towards
this end, the FINN framework was recently proposed for building fast and
flexible field-programmable gate array (FPGA) accelerators for BNNs. FINN
utilized a novel set of optimizations that enable efficient mapping of BNNs to
hardware and implemented fully connected, non-padded convolutional and pooling
layers, with per-layer compute resources being tailored to user-provided
throughput requirements. However, FINN was not evaluated on larger topologies
due to the size of the chosen FPGA, and exhibited decreased accuracy due to
lack of padding. In this paper, we improve upon FINN to show how padding can be
employed on BNNs while still maintaining a 1-bit datapath and high accuracy.
Based on this technique, we demonstrate numerous experiments to illustrate
flexibility and scalability of the approach. In particular, we show that a
large BNN requiring 1.2 billion operations per frame running on an ADM-PCIE-8K5
platform can classify images at 12 kFPS with 671 μs latency while drawing less
than 41 W board power and classifying CIFAR-10 images at 88.7% accuracy. Our
implementation of this network achieves 14.8 trillion operations per second. We
believe this is the fastest classification rate reported to date on this
benchmark at this level of accuracy.
Yuan-Hang Zhang, Xie Li, Jing-Yun Xiao
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Edge detection is a classic problem in the field of image processing, which
lays foundations for other tasks such as image segmentation. Conventionally,
this operation is performed using gradient operators such as the Roberts or
Sobel operator, which can discover local changes in intensity levels. These
operators, however, perform poorly on low contrast images. In this paper, we
propose an edge detector architecture for color images based on fuzzy theory
and the Sobel operator. First, the R, G and B channels are extracted from an
image and enhanced using fuzzy methods, in order to suppress noise and improve
the contrast between the background and the objects. The Sobel operator is then
applied to each of the channels, which are finally combined into an edge map of
the original image. Experimental results obtained through an FPGA-based
implementation have proved the proposed method effective.
Joachim Dehais, Marios Anthimopoulos, Sergey Shevchik, Stavroula Mougiakakou
Comments: 10 pages, 7 Tables, 8 Figures in IEEE Transactions on Multimedia, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The increasing prevalence of diet-related chronic diseases coupled with the
ineffectiveness of traditional diet management methods have resulted in a need
for novel tools to accurately and automatically assess meals. Recently,
computer vision based systems that use meal images to assess their content have
been proposed. Food portion estimation is the most difficult task for
individuals assessing their meals and it is also the least studied area. The
present paper proposes a three-stage system to calculate portion sizes using
two images of a dish acquired by mobile devices. The first stage consists in
understanding the configuration of the different views, after which a dense 3D
model is built from the two images; finally, this 3D model serves to extract
the volume of the different items. The system was extensively tested on 77 real
dishes of known volume, and achieved an average error of less than 10% in 5.5
seconds per dish. The proposed pipeline is computationally tractable and
requires no user input, making it a viable option for fully automated dietary
assessment.
Demian Wassermann (ATHENA), Matt Toews, Marc Niethammer, William Wells III
Journal-ref: Biomedical Image Registration, 2014, London, United Kingdom. pp.72
– 82, 2014
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel mathematical framework for representing
uncertainty in large deformation diffeomorphic image registration. The Bayesian
posterior distribution over the deformations aligning a moving and a fixed
image is approximated via a variational formulation. A stochastic differential
equation (SDE) modeling the deformations as the evolution of a time-varying
velocity field leads to a prior density over deformations in the form of a
Gaussian process. This permits estimating the full posterior distribution in
order to represent uncertainty, in contrast to methods in which the posterior
is approximated via Monte Carlo sampling or maximized in maximum a-posteriori
(MAP) estimation. The framework is demonstrated in the case of landmark-based
image registration, including simulated data and annotated pre- and
intra-operative 3D images.
Jue Wang, Anoop Cherian, Fatih Porikli
Comments: Accepted in WACV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Training of Convolutional Neural Networks (CNNs) on long video sequences is
computationally expensive due to the substantial memory requirements and the
massive number of parameters that deep architectures demand. Early fusion of
video frames is thus a standard technique, in which several consecutive frames
are first agglomerated into a compact representation, and then fed into the CNN
as an input sample. For this purpose, a summarization approach that represents
a set of consecutive RGB frames by a single dynamic image to capture pixel
dynamics was recently proposed. In this paper, we introduce a novel ordered
representation of consecutive optical flow frames as an alternative and argue
that this representation captures the action dynamics more effectively than RGB
frames. We provide intuitions on why such a representation is better for action
recognition. We validate our claims on standard benchmark datasets and
demonstrate that using summaries of flow images leads to significant
improvements over RGB frames while achieving accuracy comparable to the
state-of-the-art on UCF101 and HMDB datasets.
Wenbo Zhang, Xiaorong Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Atmosphere light value is a highly critical parameter in defogging algorithms
that are based on an atmosphere scattering model. Any error in atmosphere light
value will produce a direct impact on the accuracy of scattering computation
and thus bring chromatic distortion to restored images. To address this
problem, this paper proposes a method that relies on clustering statistics to
estimate atmosphere light value. It starts by selecting in the original image
some potential atmosphere light source points, which are grouped into point
clusters by means of a clustering technique. From these clusters, a number of
clusters containing candidate atmosphere light source points are selected, the
points are then analyzed statistically, and the cluster containing the most
candidate points is used for estimating atmosphere light value. The mean
brightness vector of the candidate atmosphere light points in the chosen point
cluster is taken as the estimate of atmosphere light value, while their
geometric center in the image is accepted as the location of atmosphere light.
Experimental results suggest that this clustering-statistics method produces
more accurate atmosphere brightness vectors and light source locations.
Subjectively, this accuracy translates into a more natural defogging effect;
objectively, it improves various image quality indicators.
Igor Barros Barbosa, Marco Cristani, Barbara Caputo, Aleksander Rognhaugen, Theoharis Theoharis
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, suggesting scenarios where people do not change
their attire. In this paper we overcome this restriction, by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and allows the understanding of what the
network learned. Second, and most notably, the training data consists of a
synthetic 100K instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful in order to customize the data w.r.t. the surveillance
scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.
Mengtian Li, Daniel Huber
Comments: WACV 2017: IEEE Winter Conference on Applications of Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Structural learning, a method to estimate the parameters for discrete energy
minimization, has been proven to be effective in solving computer vision
problems, especially in 3D scene parsing. As the complexity of the models
increases, structural learning algorithms turn to approximate inference to
retain tractability. Unfortunately, such methods often fail because the
approximation can be arbitrarily poor. In this work, we propose a method to
overcome this limitation through exploiting the properties of the joint problem
of training time inference and learning. With the help of the learning
framework, we transform the inapproximable inference problem into a polynomial
time solvable one, thereby enabling tractable exact inference while still
allowing an arbitrary graph structure and full potential interactions. Our
learning algorithm is guaranteed to return a solution with a bounded error to
the global optimum within the feasible parameter space. We demonstrate the
effectiveness of this method on two point cloud scene parsing datasets. Our
approach runs much faster and solves a problem that is intractable for
previous, well-known approaches.
Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks
Comments: Submitted to CVPR 2017 for review, 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Currently successful methods for video description are based on
encoder-decoder sentence generation using recurrent neural networks (RNNs).
Recent work has shown the advantage of integrating temporal and/or spatial
attention mechanisms into these models, in which the decoder network predicts
each word in the description by selectively giving more weight to encoded
features from specific time frames (temporal attention) or to features from
specific spatial regions (spatial attention). In this paper, we propose to
expand the attention model to selectively attend not just to specific times or
spatial regions, but to specific modalities of input such as image features,
motion features, and audio features. Our new modality-dependent attention
mechanism, which we call multimodal attention, provides a natural way to fuse
multimodal information for video description. We evaluate our method on the
Youtube2Text dataset, achieving results that are competitive with current state
of the art. More importantly, we demonstrate that our model incorporating
multimodal attention as well as temporal attention significantly outperforms
the model that uses temporal attention alone.
Xiang Xiang, Trac D. Tran
Comments: Codes available at this https URL and this https URL arXiv admin note: text overlap with arXiv:1410.1606
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The limited annotated data available for the recognition of facial expressions and
action units hampers the training of deep networks, which could otherwise learn
disentangled invariant features. However, a linear model with only a few
parameters is typically far less demanding in terms of training data. In this paper,
we propose an elegant linear model to untangle confounding factors in
challenging realistic multichannel signals such as 2D face videos. The simple
yet powerful model does not rely on huge training data and is natural for
recognizing facial actions without explicitly disentangling the identity. Based
on well-understood intuitive linear models such as Sparse Representation based
Classification (SRC), previous attempts require a preprocessing step of explicit
decoupling, which is inexact in practice. Instead, we exploit the low-rank
property across frames to subtract the underlying neutral faces which are
modeled jointly with sparse representation on the action components with group
sparsity enforced. On the extended Cohn-Kanade dataset (CK+), our one-shot
automatic method on raw face videos performs as competitively as SRC applied on
manually prepared action components and performs even better than SRC in terms
of true positive rate. We apply the model to the even more challenging task of
facial action unit recognition, verified on the MPI Face Video Database
(MPI-VDB) achieving a decent performance. All the programs and data have been
made publicly available.
Yi Zhou
Comments: arXiv admin note: text overlap with arXiv:1603.03511
Subjects: Artificial Intelligence (cs.AI)
We argue that extensibility is a key challenge for knowledge representation.
For this purpose, we propose assertional logic – a knowledge model for easier
extension with new AI building blocks. In assertional logic, all syntactic
objects are categorized as set theoretic constructs including individuals,
concepts and operators, and all kinds of knowledge are formalized by equality
assertions. When extending with a new building block, one only needs to
consider its interactions with the basic form of knowledge (i.e., equality
assertions) without going deeper into its interactions with other existing
ones. We first present a primitive form of assertional logic that uses minimal
assumed knowledge and constructs. Then, we show how to extend it by
definitions, which are special kinds of knowledge, i.e., assertions. As a case
study, we show how assertional logic can be used to unify logic and
probability, and other important AI building blocks, including time.
Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
In this paper, a novel architecture for a deep recurrent neural network,
residual LSTM, is introduced. A plain LSTM has an internal memory cell that can
learn long-term dependencies of sequential data. It also provides a temporal
shortcut path to avoid vanishing or exploding gradients in the temporal domain.
The proposed residual LSTM architecture provides an additional spatial shortcut
path from lower layers for efficient training of deep networks with multiple
LSTM layers. Compared with the previous work, highway LSTM, residual LSTM
reuses the output projection matrix and the output gate of LSTM to control the
spatial information flow instead of additional gate networks, which
reduces the number of network parameters by more than 10%. An experiment for distant speech
recognition on the AMI SDM corpus indicates that the performance of plain and
highway LSTM networks degrades with increasing network depth. For example,
10-layer plain and highway LSTM networks showed 13.7% and 6.2% increase in WER
over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM
networks provided the lowest WER of 41.0%, which corresponds to 3.3% and 2.8% WER
reduction over 3-layer plain and highway LSTM networks, respectively. When trained
with both the IHM and SDM corpora, the residual LSTM architecture provided a
larger gain from increased depth: a 10-layer residual LSTM showed 3.0% WER
reduction over the corresponding 5-layer one.
Miroslav Shaltev, Jan-Hendrik Zab, Philipp Kemkes, Stefan Siersdorfer, Sergej Zerr
Comments: 5 pages, 5 figures, SIGIR ’16, July 17-21, 2016, Pisa, Italy
Journal-ref: Proceedings of the 39th International ACM SIGIR conference on
Research and Development in Information Retrieval, Pisa, Italy, July 17 – 21,
2016
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)
Social graph construction from various sources has been of interest to
researchers due to its application potential and the broad range of technical
challenges involved. The World Wide Web provides a huge amount of continuously
updated data and information on a wide range of topics created by a variety of
content providers, and makes the study of extracted people networks and their
temporal evolution valuable for social as well as computer scientists. In this
paper we present SocGraph – an extraction and exploration system for social
relations from the content of around 2 billion web pages collected by the
Internet Archive over the 17-year period between 1996 and 2013. We
describe methods for constructing large social graphs from extracted relations
and introduce an interface to study their temporal evolution.
Angela Fan, Finale Doshi-Velez, Luke Miratrix
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Latent Dirichlet Allocation (LDA) models trained without stopword removal
often produce topics with high posterior probabilities on uninformative words,
obscuring the underlying corpus content. Even when canonical stopwords are
manually removed, uninformative words common in that corpus will still dominate
the most probable words in a topic. We propose a simple strategy for
automatically promoting terms with domain relevance and demoting these
domain-specific stop words. Our approach is easily applied within any existing
LDA framework and increases the amount of domain-relevant content and reduces
the appearance of canonical and human-evaluated stopwords in three very
different domains: Department of Labor accident reports, online health forum
posts, and NIPS abstracts. Along the way, we show that standard topic quality
measures such as coherence and pointwise mutual information act
counter-intuitively in the presence of common but irrelevant words. We also explain
why these standard metrics fall short, propose an additional topic quality
metric that targets the stopword problem, and show that it correlates with our
human subject experiments.
Noura Farra, Kathleen McKeown
Comments: To be published in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL 2017)
Subjects: Computation and Language (cs.CL)
We consider entity-level sentiment analysis in Arabic, a morphologically rich
language with increasing resources. We present a system that is applied to
complex posts written in response to Arabic newspaper articles. Our goal is to
identify important entity “targets” within the post along with the polarity
expressed about each target. We achieve significant improvements over multiple
baselines, demonstrating that the use of specific morphological representations
improves the performance of identifying both important targets and their
sentiment, and that the use of distributional semantic clusters further boosts
performance for these representations, especially when richer linguistic
resources are not available.
Tom Kocmi, Ondřej Bojar
Comments: Accepted to EACL 2017
Subjects: Computation and Language (cs.CL)
In language identification, a common first step in natural language
processing, we want to automatically determine the language of some input text.
Monolingual language identification assumes that the given document is written
in one language. In multilingual language identification, the document is
usually in two or three languages and we just want their names. We go one step
further and propose a method for textual language identification where
languages can change arbitrarily and the goal is to identify the spans of each
of the languages. Our method is based on Bidirectional Recurrent Neural
Networks and it performs well in monolingual and multilingual language
identification tasks on six datasets covering 131 languages. The method
maintains its accuracy on short documents and across domains, so it is ideal for
off-the-shelf use without preparation of training data.
Andreas van Cranenburgh, Rens Bod
Comments: To be published in EACL 2017, 11 pages
Subjects: Computation and Language (cs.CL)
We consider the task of predicting how literary a text is, with a gold
standard from human ratings. Aside from a standard bigram baseline, we apply
rich syntactic tree fragments, mined from the training set, and a series of
hand-picked features. Our model is the first to distinguish degrees of highly
and less literary novels using a variety of lexical and syntactic features, and
explains 76.0% of the variation in literary ratings.
Manuel Amunategui
Comments: 52 pages and python code included in index
Subjects: Computation and Language (cs.CL)
There are large amounts of insight and social discovery potential in mining
crowd-sourced comments left on popular news forums like Reddit.com, Tumblr.com,
Facebook.com and Hacker News. Unfortunately, due to the overwhelming amount of
participation and the varying quality of commentary, extracting value from
such data is not always straightforward or timely. By designing efficient, single-pass
and adaptive natural language filters to quickly prune spam, noise, copy-cats,
marketing diversions, and out-of-context posts, we can remove over a third of
entries and return the comments with a higher probability of relatedness to the
original article in question. The approach presented here uses an adaptive,
two-step filtering process. It first leverages the original article posted in
the thread as a starting corpus to parse comments by matching intersecting
words and term-ratio balance per sentence, and then grows the corpus by adding new
words harvested from high-matching comments to increase filtering accuracy over
time.
Angela Fan, Finale Doshi-Velez, Luke Miratrix
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Latent Dirichlet Allocation (LDA) models trained without stopword removal
often produce topics with high posterior probabilities on uninformative words,
obscuring the underlying corpus content. Even when canonical stopwords are
manually removed, uninformative words common in that corpus will still dominate
the most probable words in a topic. We propose a simple strategy for
automatically promoting terms with domain relevance and demoting these
domain-specific stop words. Our approach is easily applied within any existing
LDA framework and increases the amount of domain-relevant content and reduces
the appearance of canonical and human-evaluated stopwords in three very
different domains: Department of Labor accident reports, online health forum
posts, and NIPS abstracts. Along the way, we show that standard topic quality
measures such as coherence and pointwise mutual information act
counter-intuitively in the presence of common but irrelevant words. We also explain
why these standard metrics fall short, propose an additional topic quality
metric that targets the stopword problem, and show that it correlates with our
human subject experiments.
Chenhui Chu, Raj Dabre, Sadao Kurohashi
Comments: 5 pages
Subjects: Computation and Language (cs.CL)
In this paper, we compare two simple domain adaptation methods for neural
machine translation (NMT): (1) We append an artificial token to the source
sentences of two parallel corpora (from different domains, one of which is
resource-scarce) to indicate the domain, and then mix them to learn a
multi-domain NMT model; (2) We learn an NMT model on the resource-rich domain
corpus and then fine-tune it using the resource-poor domain corpus. We empirically
verify that fine-tuning works better than the artificial-token mechanism when the
low-resource domain corpus is of relatively poor quality (acquired via
automatic extraction), but in the case of a high-quality (manually created)
low-resource domain corpus, both methods are equally viable.
Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, Ray Kurzweil
Comments: Submitted to ICLR 2017
Subjects: Computation and Language (cs.CL)
Building general-purpose conversation agents is a very challenging task, but
necessary on the road toward intelligent agents that can interact with humans
in natural language. Neural conversation models — purely data-driven systems
trained end-to-end on dialogue corpora — have shown great promise recently,
yet they often produce short and generic responses. This work presents new
training and decoding methods that improve the quality, coherence, and
diversity of long responses generated using sequence-to-sequence models. Our
approach adds self-attention to the decoder to maintain coherence in longer
responses, and we propose a practical approach, called the glimpse-model, for
scaling to large datasets. We introduce a stochastic beam-search algorithm with
segment-by-segment reranking which lets us inject diversity earlier in the
generation process. We trained on a combined data set of over 2.3B conversation
messages mined from the web. In human evaluation studies, our method produces
longer responses overall, with a higher proportion rated as acceptable and
excellent as length increases, compared to baseline sequence-to-sequence models
with explicit length-promotion. A back-off strategy produces better responses
overall, across the full spectrum of lengths.
Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard
Comments: EACL 2017, 8+2 pages
Subjects: Computation and Language (cs.CL)
We propose UDP, the first training-free parser for Universal Dependencies
(UD). Our algorithm is based on PageRank and a small set of head attachment
rules. It features two-step decoding to guarantee that function words are
attached as leaf nodes. The parser requires no training, and it is competitive
with a delexicalized transfer system. UDP offers a linguistically sound
unsupervised alternative to cross-lingual parsing for UD, which can be used as
a baseline for such systems. The parser has very few parameters and is
distinctly robust to domain change across languages.
Besat Kassaie
Subjects: Computation and Language (cs.CL)
We report our effort to identify sensitive information, a subset of the data
items listed by HIPAA (the Health Insurance Portability and Accountability Act),
in medical text using recent advances in natural language processing and
machine learning techniques. We represent words with high-dimensional
continuous vectors learned by a variant of Word2Vec called Continuous Bag of
Words (CBOW). We feed the word vectors into a simple neural network with a Long
Short-Term Memory (LSTM) architecture. Without any attempt to extract manually
crafted features, and despite our medical dataset being rather small for training
a neural network, we obtained promising results. These results encourage us
to pursue a larger-scale version of the project with careful parameter tuning
and other possible improvements.
Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks
Comments: Submitted to CVPR 2017 for review, 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Currently successful methods for video description are based on
encoder-decoder sentence generation using recurrent neural networks (RNNs).
Recent work has shown the advantage of integrating temporal and/or spatial
attention mechanisms into these models, in which the decoder network predicts
each word in the description by selectively giving more weight to encoded
features from specific time frames (temporal attention) or to features from
specific spatial regions (spatial attention). In this paper, we propose to
expand the attention model to selectively attend not just to specific times or
spatial regions, but to specific modalities of input such as image features,
motion features, and audio features. Our new modality-dependent attention
mechanism, which we call multimodal attention, provides a natural way to fuse
multimodal information for video description. We evaluate our method on the
Youtube2Text dataset, achieving results that are competitive with current state
of the art. More importantly, we demonstrate that our model incorporating
multimodal attention as well as temporal attention significantly outperforms
the model that uses temporal attention alone.
Edelmira Pasarella (Universitat Politecnica de Catalunya), Maria-Esther Vidal (Fraunhofer IAIS), Cristina Zoltan (Universitat Politecnica de Catalunya)
Comments: In Proceedings PROLE 2016, arXiv:1701.03069
Journal-ref: EPTCS 237, 2017, pp. 20-33
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
A common method to define a parallel solution for a computational problem
consists in finding a way to use the Divide and Conquer paradigm so that
processors act on their own data and are scheduled in a parallel fashion.
MapReduce is a programming model that follows this paradigm, and allows for the
definition of efficient solutions by both decomposing a problem into steps on
subsets of the input data and combining the results of each step to produce
final results. Albeit used for the implementation of a wide variety of
computational problems, MapReduce performance can be negatively affected
whenever the replication factor grows or the size of the input is larger than
the resources available at each processor. In this paper we show an alternative
approach to implement the Divide and Conquer paradigm, named dynamic pipeline.
The main features of dynamic pipelines are illustrated on a parallel
implementation of the well-known problem of counting triangles in a graph. This
problem is especially interesting either when the input graph does not fit in
memory or is dynamically generated. To evaluate the properties of dynamic pipelines, a
dynamic pipeline of processes and an ad-hoc version of MapReduce are
implemented in the language Go, exploiting its ability to deal with channels
and spawned processes. An empirical evaluation is conducted on graphs of
different topologies, sizes, and densities. Observed results suggest that
dynamic pipelines allow for an efficient implementation of the problem of
counting triangles in a graph, particularly in dense and large graphs,
drastically reducing the execution time with respect to the MapReduce
implementation.
Ashraf A. Shahin
Journal-ref: (IJACSA) International Journal of Advanced Computer Science and
Applications,Vol. 7, No. 11, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Elasticity is one of the key features of cloud computing that attract many
SaaS providers seeking to minimize the cost of their services. Cost is minimized by
automatically provisioning and releasing computational resources depending on
actual computational needs. However, the delay in starting up new virtual resources
can cause Service Level Agreement violations. Consequently, predicting cloud
resource provisioning has gained a lot of attention as a way to scale computational
resources in advance. However, most current approaches do not consider
multi-seasonality in cloud workloads. This paper proposes a cloud resource
provisioning prediction algorithm based on Holt-Winters exponential smoothing
method. The proposed algorithm extends Holt-Winters exponential smoothing
method to model cloud workload with multi-seasonal cycles. Prediction accuracy
of the proposed algorithm has been improved by employing Artificial Bee Colony
algorithm to optimize its parameters. Performance of the proposed algorithm has
been evaluated and compared with double and triple exponential smoothing
methods. Our results have shown that the proposed algorithm outperforms other
methods.
Ashraf A. Shahin
Journal-ref: International Journal of Advanced Computer Science and
Applications (IJACSA), 7(12), 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Scalability is an important characteristic of cloud computing. With
scalability, cost is minimized by provisioning and releasing resources
according to demand. Most current Infrastructure as a Service (IaaS)
providers offer threshold-based auto-scaling techniques. However, setting
thresholds to the right values that minimize cost and meet the Service Level
Agreement is not an easy task, especially under variable and sudden workload
changes. This paper proposes dynamic threshold-based auto-scaling
algorithms that predict required resources using Long Short-Term Memory
Recurrent Neural Network and auto-scale virtual resources based on predicted
values. The proposed algorithms have been evaluated and compared with some
existing algorithms. Experimental results show that the proposed algorithms
outperform other algorithms.
Salvador Tamarit (Universidad Politécnica de Madrid), Julio Mariño (Universidad Politécnica de Madrid), Guillermo Vigueras (IMDEA Software Institute), Manuel Carro (IMDEA Software Institute)
Comments: In Proceedings PROLE 2016, arXiv:1701.03069. arXiv admin note: substantial text overlap with arXiv:1603.03011
Journal-ref: EPTCS 237, 2017, pp. 34-51
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Obtaining good performance when programming heterogeneous computing platforms
poses significant challenges. We present a program transformation environment,
implemented in Haskell, where architecture-agnostic scientific C code with
semantic annotations is transformed into functionally equivalent code better
suited for a given platform. The transformation steps are represented as rules
that can be fired when certain syntactic and semantic conditions are fulfilled.
These rules are not hard-wired into the rewriting engine: they are written in a
C-like language and are automatically processed and incorporated into the
rewriting engine. That makes it possible for end-users to add their own rules
or to provide sets of rules that are adapted to certain specific domains or
purposes.
Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
In this paper, a novel architecture for a deep recurrent neural network,
residual LSTM, is introduced. A plain LSTM has an internal memory cell that can
learn long-term dependencies of sequential data. It also provides a temporal
shortcut path to avoid vanishing or exploding gradients in the temporal domain.
The proposed residual LSTM architecture provides an additional spatial shortcut
path from lower layers for efficient training of deep networks with multiple
LSTM layers. Compared with the previous work, highway LSTM, residual LSTM
reuses the output projection matrix and the output gate of LSTM to control the
spatial information flow instead of additional gate networks, which
reduces the number of network parameters by more than 10%. An experiment for distant speech
recognition on the AMI SDM corpus indicates that the performance of plain and
highway LSTM networks degrades with increasing network depth. For example,
10-layer plain and highway LSTM networks showed 13.7% and 6.2% increase in WER
over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM
networks provided the lowest WER of 41.0%, which corresponds to 3.3% and 2.8% WER
reduction over 3-layer plain and highway LSTM networks, respectively. When trained
with both the IHM and SDM corpora, the residual LSTM architecture provided a
larger gain from increased depth: a 10-layer residual LSTM showed 3.0% WER
reduction over the corresponding 5-layer one.
Tao Wei, Changhu Wang, Chang Wen Chen
Comments: 12 pages, 6 figures, Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this work we study the problem of network morphism, an effective learning
scheme to morph a well-trained neural network to a new one with the network
function completely preserved. Unlike existing work, which addressed basic
morphing types at the layer level, we target the central
problem of network morphism at a higher level, i.e., how a convolutional layer
can be morphed into an arbitrary module of a neural network. To simplify the
representation of a network, we abstract a module as a graph with blobs as
vertices and convolutional layers as edges, based on which the morphing process
can be formulated as a graph transformation problem. Two atomic morphing
operations are introduced to compose the graphs, based on which modules are
classified into two families, i.e., simple morphable modules and complex
modules. We present practical morphing solutions for both families,
and prove that any reasonable module can be morphed from a single
convolutional layer. Extensive experiments have been conducted based on the
state-of-the-art ResNet on benchmark datasets, and the effectiveness of the
proposed solution has been verified.
Haoqi Li, Brian Baucom, Panayiotis Georgiou
Comments: Accepted by ICASSP 2017
Subjects: Learning (cs.LG); Sound (cs.SD)
Behavioral annotation using signal processing and machine learning is highly
dependent on training data and manual annotations of behavioral labels.
Previous studies have shown that speech information encodes significant
behavioral information and be used in a variety of automated behavior
recognition tasks. However, extracting behavior information from speech is
still a difficult task due to the sparseness of training data coupled with the
complex, high-dimensionality of speech, and the complex and multiple
information streams it encodes. In this work we exploit the slow varying
properties of human behavior. We hypothesize that nearby segments of speech
share the same behavioral context and hence share a similar underlying
representation in a latent space. Specifically, we propose a Deep Neural
Network (DNN) model to connect behavioral context and derive the behavioral
manifold in an unsupervised manner. We evaluate the proposed manifold in the
couples therapy domain and also provide examples from publicly available data
(e.g. stand-up comedy). We further investigate training within the couples’
therapy domain and from movie data. The results are extremely encouraging,
promise improved behavioral quantification in an unsupervised manner, and
warrant further investigation in a range of applications.
Andreas Damianou, Neil D. Lawrence, Carl Henrik Ek
Comments: NIPS workshop on Multi-Modal Machine Learning, 2015
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Probability (math.PR)
We present Manifold Alignment Determination (MAD), an algorithm for learning
alignments between data points from multiple views or modalities. The approach
is capable of learning correspondences between views as well as correspondences
between individual data-points. The proposed method requires only a few aligned
examples, from which it is capable of recovering a global alignment through a
probabilistic model. The strong, yet flexible regularization provided by the
generative model is sufficient to align the views. We provide experiments on
both synthetic and real data to highlight the benefit of the proposed approach.
Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
Comments: To appear in the PARMA-DITAM workshop at HiPEAC 2017, January 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Binarized neural networks (BNNs) are gaining interest in the deep learning
community due to their significantly lower computational and memory cost. They
are particularly well suited to reconfigurable logic devices, which contain an
abundance of fine-grained compute resources and can result in smaller, lower
power implementations, or conversely in higher classification rates. Towards
this end, the FINN framework was recently proposed for building fast and
flexible field-programmable gate array (FPGA) accelerators for BNNs. FINN
utilized a novel set of optimizations that enable efficient mapping of BNNs to
hardware and implemented fully connected, non-padded convolutional and pooling
layers, with per-layer compute resources being tailored to user-provided
throughput requirements. However, FINN was not evaluated on larger topologies
due to the size of the chosen FPGA, and exhibited decreased accuracy due to
lack of padding. In this paper, we improve upon FINN to show how padding can be
employed on BNNs while still maintaining a 1-bit datapath and high accuracy.
Based on this technique, we demonstrate numerous experiments to illustrate
flexibility and scalability of the approach. In particular, we show that a
large BNN requiring 1.2 billion operations per frame running on an ADM-PCIE-8K5
platform can classify images at 12 kFPS with 671 μs latency while drawing less
than 41 W board power and classifying CIFAR-10 images at 88.7% accuracy. Our
implementation of this network achieves 14.8 trillion operations per second. We
believe this is the fastest classification rate reported to date on this
benchmark at this level of accuracy.
Angela Fan, Finale Doshi-Velez, Luke Miratrix
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Latent Dirichlet Allocation (LDA) models trained without stopword removal
often produce topics with high posterior probabilities on uninformative words,
obscuring the underlying corpus content. Even when canonical stopwords are
manually removed, uninformative words common in that corpus will still dominate
the most probable words in a topic. We propose a simple strategy for
automatically promoting terms with domain relevance and demoting these
domain-specific stop words. Our approach is easily applied within any existing
LDA framework and increases the amount of domain-relevant content and reduces
the appearance of canonical and human-evaluated stopwords in three very
different domains: Department of Labor accident reports, online health forum
posts, and NIPS abstracts. Along the way, we show that standard topic quality
measures such as coherence and pointwise mutual information act
counter-intuitively in the presence of common but irrelevant words. We also explain
why these standard metrics fall short, propose an additional topic quality
metric that targets the stopword problem, and show that it correlates with our
human subject experiments.
Wei Guo, Krithika Manohar, Steven L. Brunton, Ashis G. Banerjee
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Topological data analysis (TDA) has emerged as one of the most promising
techniques to reconstruct the unknown shapes of high-dimensional spaces from
observed data samples. TDA, thus, yields key shape descriptors in the form of
persistent topological features that can be used for any supervised or
unsupervised learning task, including multi-way classification. Sparse
sampling, on the other hand, provides a highly efficient technique to
reconstruct signals in the spatial-temporal domain from just a few
carefully-chosen samples. Here, we present a new method, referred to as the
Sparse-TDA algorithm, that combines favorable aspects of the two techniques.
This combination is realized by selecting an optimal set of sparse pixel
samples from the persistent features generated by a vector-based TDA algorithm.
These sparse samples are selected using a low-rank matrix representation of
persistent features. We show that the Sparse-TDA method demonstrates promising
performance on three benchmark problems related to human posture recognition
and image texture classification.
Yifan Yang, Tian Qin, Yu-Heng Lei
Comments: 8 pages, 8 figures
Subjects: Applications (stat.AP); Learning (cs.LG)
In this paper, we try to predict the winning team of a match in the
multiplayer eSports game Dota 2. To address the weaknesses of previous work, we
consider more aspects of prior (pre-match) features from individual players’
match history, as well as real-time (during-match) features at each minute as
the match progresses. We use logistic regression, the proposed Attribute
Sequence Model, and their combinations as the prediction models. In a dataset
of 78,362 matches, 20,631 of which contain replay data, our experiments show
that adding more aspects of prior features improves accuracy from 58.69% to
71.49%, and introducing real-time features achieves up to 93.73% accuracy when
predicting at the 40th minute.
Elsa Dupraz
Comments: 5 pages, submitted at IEEE Signal Processing Letters
Subjects: Information Theory (cs.IT)
We consider a network of binary-valued sensors with a fusion center. The
fusion center has to perform K-means clustering on the binary data transmitted
by the sensors. In order to reduce the amount of data transmitted within the
network, the sensors compress their data with a source coding scheme based on
LDPC codes. We propose to apply the K-means algorithm directly over the
compressed data without reconstructing the original sensors measurements, in
order to avoid potentially complex decoding operations. We provide approximate
expressions for the error probabilities of the K-means steps in the compressed
domain. From these expressions, we show that applying the K-means algorithm in
the compressed domain enables recovery of the clusters of the original domain.
Monte Carlo simulations illustrate the accuracy of the obtained approximate
error probabilities, and show that the coding rate needed to perform K-means
clustering in the compressed domain is lower than the rate needed to
reconstruct all the measurements.
Rajai Nasser, Joseph M. Renes
Comments: 30 pages. Submitted to IEEE Trans. Inform. Theory and in part to ISIT2017
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)
We prove polarization theorems for arbitrary classical-quantum (cq) channels.
The input alphabet is endowed with an arbitrary Abelian group operation and an
Arıkan-style transformation is applied using this operation. It is shown
that as the number of polarization steps becomes large, the synthetic
cq-channels polarize to deterministic homomorphism channels which project their
input to a quotient group of the input alphabet. This result is used to
construct polar codes for arbitrary cq-channels and arbitrary classical-quantum
multiple access channels (cq-MAC). The encoder can be implemented in O(N log N)
operations, where N is the blocklength of the code. A quantum successive
cancellation decoder for the constructed codes is proposed. It is shown that
the probability of error of this decoder decays faster than 2^{-N^η}
for any η < 1/2.
Kanapathippillai Cumanan, George C. Alexandropoulos, Zhiguo Ding, George K. Karagiannidis
Subjects: Information Theory (cs.IT)
This paper studies the secrecy rate maximization problem of a secure wireless
communication system, in the presence of multiple eavesdroppers. The security
of the communication link is enhanced through cooperative jamming, with the
help of multiple jammers. First, a feasibility condition is derived to achieve
a positive secrecy rate at the destination. Then, we solve the original secrecy
rate maximization problem, which is not convex in terms of power allocation at
the jammers. To circumvent this non-convexity, the achievable secrecy rate is
approximated for a given power allocation at the jammers and the approximated
problem is formulated into a geometric programming one. Based on this
approximation, an iterative algorithm has been developed to obtain the optimal
power allocation at the jammers. Next, we provide a bisection approach, based
on one-dimensional search, to validate the optimality of the proposed
algorithm. In addition, by assuming Rayleigh fading, the secrecy outage
probability (SOP) of the proposed cooperative jamming scheme is analyzed. More
specifically, a single-integral form expression for SOP is derived for the most
general case as well as a closed-form expression for the special case of two
cooperative jammers and one eavesdropper. Simulation results have been provided
to validate the convergence and the optimality of the proposed algorithm as
well as the theoretical derivations of the presented SOP analysis.
Uzi Pereg, Yossef Steinberg
Subjects: Information Theory (cs.IT)
In this work, we study two models of arbitrarily varying channels, when side
information is available at the encoder in a causal manner. First,
we study the arbitrarily varying channel (AVC) with input and state
constraints, when the encoder has state information in a causal manner. Lower
and upper bounds on the random code capacity are developed. A lower bound on
the deterministic code capacity is established in the case of a
message-averaged input constraint. In the setting where a state constraint is
imposed on the jammer, while the user is under no constraints, the random code
bounds coincide, and the random code capacity is determined. Furthermore, for
this scenario, a generalized non-symmetrizability condition is stated, under
which the deterministic code capacity coincides with the random code capacity.
A second model considered in our work is the arbitrarily varying degraded
broadcast channel with causal side information at the encoder (without
constraints). We establish inner and outer bounds on both the random code
capacity region and the deterministic code capacity region. The capacity region
is then determined for a class of channels satisfying a condition on the mutual
informations between the strategy variables and the channel outputs. As an
example, we show that the condition holds for the arbitrarily varying binary
symmetric broadcast channel, and we find the corresponding capacity region.
Vaishakhi Mayya, Boyla Mainsah, Galen Reeves
Subjects: Information Theory (cs.IT)
The P300 speller is a brain-computer interface that enables people with
neuromuscular disorders to communicate based on eliciting event-related
potentials (ERP) in electroencephalography (EEG) measurements. One challenge to
reliable communication is the presence of refractory effects in the P300 ERP
that induces temporal dependence in the user’s EEG responses. We propose a
model for the P300 speller as a communication channel with memory. By studying
the maximum information rate on this channel, we gain insight into the
fundamental constraints imposed by refractory effects. We construct codebooks
based on the optimal input distribution, and compare them to existing codebooks
in the literature.
Ryo Yaguchi, Masahito Hayashi
Subjects: Information Theory (cs.IT)
We derive novel upper and lower finite-length bounds of the error probability
in joint source-channel coding when the source obeys the ergodic Markov process
and the channel is a Markovian additive channel or a Markovian conditional
additive channel. These bounds match the tight bounds in the large and
moderate deviation regimes.
George C. Alexandropoulos
Comments: 6 pages, 3 figures, submitted to IEEE ICC 2017
Subjects: Information Theory (cs.IT)
Consensus has been reached between wireless industry bodies and academia on
the indispensability of network densification in meeting the throughput and
coverage demands for fifth generation (5G) wireless networks. One of the major
challenges with the current trend of dense multi-tier network deployments is
the wireless backhauling of the various small cells. Recently, wireless
backhaul communication has been largely realized with large antennas operating
in the millimeter wave (mmWave) frequency band and implementing highly
directional beamforming. In this paper, we focus on the alignment problem of
narrow beams between fixed position network nodes in mmWave backhaul systems
that are subject to small displacements due to wind flow or ground vibration.
We consider nodes equipped with antenna arrays that are capable of performing
only analog beamforming and communicate through wireless channels including a
line-of-sight component. Aiming at minimizing the time needed to achieve beam
alignment, we present an efficient method that capitalizes on the exchange of
position information between the nodes to design their beamforming and
combining vectors. Our representative numerical results on the outage
probability with the proposed beam alignment method offer useful preliminary
insights on the impact of some system and operation parameters.
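The core of the position-based design can be sketched as follows: each node uses the exchanged coordinates to compute the line-of-sight angle to its peer and points a constant-modulus analog beam at it. The snippet below is my own illustration for a uniform linear array with half-wavelength spacing (the array is assumed to lie along the y-axis, broadside along x), not the paper's exact beamformer construction.

    import numpy as np

    def ula_steering(n_ant, angle_rad, d_over_lambda=0.5):
        # Constant-modulus (analog) steering vector of an n_ant-element ULA
        k = np.arange(n_ant)
        return np.exp(1j * 2 * np.pi * d_over_lambda * k * np.sin(angle_rad)) / np.sqrt(n_ant)

    def beam_towards(own_xy, peer_xy, n_ant):
        # Point the analog beam along the LoS direction to the peer node
        dx, dy = np.subtract(peer_xy, own_xy)
        aod = np.arctan2(dy, dx)   # LoS angle measured from the assumed array broadside
        return ula_steering(n_ant, aod)

    f_tx = beam_towards((0.0, 0.0), (80.0, 3.0), n_ant=32)   # beamforming vector at node A
    w_rx = beam_towards((80.0, 3.0), (0.0, 0.0), n_ant=32)   # combining vector at node B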
Ryo Yaguchi, Masahito Hayashi
Subjects: Information Theory (cs.IT)
We derive the second order rates of joint source-channel coding, whose source
obeys an irreducible and ergodic Markov process by introducing new distribution
family, switched Gaussian convolution distribution, when the channel is a
discrete memoryless. We also compare the joint source-channel scheme with the
separation scheme in the second order regime.
Qi Zhang, Howard H. Yang, Tony Q. S. Quek, Jemin Lee
Subjects: Information Theory (cs.IT)
We develop a framework for downlink heterogeneous cellular networks with
line-of-sight (LoS) and non-line-of-sight (NLoS) transmissions. Using
stochastic geometry, we derive a tight approximation of the achievable downlink rate
that enables us to compare the performance between densifying small cells and
expanding BS antenna arrays. Interestingly, we find that adding small cells
into the network improves the achievable rate much faster than expanding
antenna arrays at the macro BS. However, when the small cell density exceeds a
critical threshold, spatial densification will lose its benefits and
further impair the network capacity. To this end, we present the optimal small
cell density that maximizes the rate as practical deployment guidance. In
contrast, expanding the macro BS antenna array always benefits the capacity, up
to an upper bound caused by pilot contamination, and this bound also surpasses
the peak rate obtained from the deployment of small cells. Furthermore, we find
that allocating part of the antennas to distributed small-cell BSs works better
than centralizing all antennas at the macro BS, and the optimal allocation
proportion is also given as a practical configuration reference. In
summary, this work provides a further understanding on how to leverage small
cells and massive MIMO in future heterogeneous cellular networks deployment.
Zhiliang Huang, Shiyi Zhang, Feiyan Zhang, Chunjiang Duanmu, Ming Chen
Comments: 8 pages, 2 figures, submitted to ISIT2017
Subjects: Information Theory (cs.IT)
A method for efficient successive cancellation (SC) decoding of polar codes
with high-dimensional linear binary kernels (HDBK) is presented and analyzed.
We devise an (l)-expressions method which obtains simplified recursive
formulas of the SC decoder in likelihood-ratio form for arbitrary linear binary
kernels, reducing the complexity of the corresponding SC decoder. By considering
the bit-channel transition probabilities (W_{G}^{(\cdot)}(\cdot|0)) and
(W_{G}^{(\cdot)}(\cdot|1)) separately, a (W)-expressions method is proposed to
further reduce the complexity of the HDBK SC decoder. For an (m \times m) binary
kernel, the complexity of the straightforward SC decoder is (O(2^{m} N \log N)). With
(W)-expressions, we reduce the complexity of the straightforward SC decoder to
(O(m^{2} N \log N)) when (m \leq 16). Simulation results show that (16 \times 16)
kernel polar codes offer significant advantages in terms of error performance
compared with (2 \times 2) kernel polar codes under SC and list SC decoders.
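For reference, the complexity figures above are measured against the familiar (2 \times 2) (Arikan) kernel, whose SC decoder is built from the two log-likelihood-ratio recursions below; this is only the textbook two-by-two case, not the paper's (l)- or (W)-expressions for larger kernels.

    import numpy as np

    def f_llr(l1, l2):
        # Upper-branch (check-node) LLR update of the 2x2 Arikan kernel
        return 2.0 * np.arctanh(np.tanh(l1 / 2.0) * np.tanh(l2 / 2.0))

    def g_llr(l1, l2, u1):
        # Lower-branch (variable-node) LLR update, given the already decoded bit u1
        return l2 + (1.0 - 2.0 * u1) * l1

In hardware, f_llr is often replaced by the min-sum approximation sign(l1)*sign(l2)*min(|l1|, |l2|).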
Amit Agarwal, Saif Khan Mohammed
Comments: Submitted to IEEE Journal of Lightwave Technology
Subjects: Information Theory (cs.IT)
In this paper, we consider the 2 X 2 multi-user multiple-input-single-output
(MU-MISO) broadcast visible light communication (VLC) channel with two light
emitting diodes (LEDs) at the transmitter and a single photo diode (PD) at each
of the two users. We propose an achievable rate region of the Zero-Forcing (ZF)
precoder in this 2 X 2 MU-MISO VLC channel under a per-LED peak and average
power constraint, where the average optical power emitted from each LED is
fixed for constant lighting, but is controllable (referred to as dimming
control in IEEE 802.15.7 standard on VLC). We analytically characterize the
proposed rate region boundary and show that it is Pareto-optimal. Further
analysis reveals that the largest rate region is achieved when the fixed
per-LED average optical power is half of the allowed per-LED peak optical
power. We also propose a novel transceiver architecture where the channel
encoder and dimming control are separated which greatly simplifies the
complexity of the transceiver. A case study of an indoor VLC channel with the
proposed transceiver reveals that the achievable information rates are
sensitive to the placement of the LEDs and the PDs. An interesting observation
is that for a given placement of LEDs in a 5 m X 5 m X 3 m room, even with a
substantial displacement of the users from their optimum placement, reduction
in the achievable rates is not significant. This observation could therefore be
used to define “coverage zones” within a room where the reduction in the
information rates to the two users is within an acceptable tolerance limit.
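A minimal sketch of the zero-forcing step under the per-LED constraints, assuming a 2x2 real-valued optical channel, zero-mean data symbols in [-1, 1], and my own illustrative scaling choice; it is not the authors' exact transceiver with separated channel encoding and dimming control:

    import numpy as np

    def zf_vlc_drive_signal(H, s, P_avg, P_peak):
        # H: 2x2 optical channel (rows = PDs, columns = LEDs); s: (2,) symbols in [-1, 1]
        W = np.linalg.inv(H)                  # H @ W = I, removes multi-user interference
        swing = np.abs(W).sum(axis=1).max()   # worst-case per-LED signal swing for |s| <= 1
        alpha = min(P_avg, P_peak - P_avg) / swing
        return P_avg + alpha * (W @ s)        # nonnegative LED drive signal in [0, P_peak]

    H = np.array([[1.0, 0.3],
                  [0.4, 0.9]])                # hypothetical channel gains
    x = zf_vlc_drive_signal(H, np.array([0.5, -0.8]), P_avg=0.5, P_peak=1.0)

In this toy scaling, the usable modulation amplitude is min(P_avg, P_peak - P_avg), which is maximized at P_avg = P_peak/2, consistent with the observation above that the largest rate region is obtained at half the allowed peak power.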
Binnan Zhuang, Dongning Guo, Ermin Wei, Michael L. Honig
Comments: Submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
A scalable framework is developed to allocate radio resources across a large
number of densely deployed small cells with given traffic statistics on a slow
timescale. Joint user association and spectrum allocation is first formulated
as a convex optimization problem by dividing the spectrum among all possible
transmission patterns of active access points (APs). To improve scalability
with the number of APs, the problem is reformulated using local patterns of
interfering APs. To maintain global consistency among local patterns,
inter-cluster interaction is characterized as hyper-edges in a hyper-graph with
nodes corresponding to neighborhoods of APs. A scalable solution is obtained by
iteratively solving a convex optimization problem for bandwidth allocation with
reduced complexity and constructing a global spectrum allocation using
hyper-graph coloring. Numerical results demonstrate the proposed solution for a
network with 100 APs and several hundred user equipments. For a given quality
of service (QoS), the proposed scheme can increase the network capacity several
fold compared to assigning each user to the strongest AP with full-spectrum
reuse.
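The final step, turning local bandwidth decisions into a consistent global allocation, hinges on a coloring argument. As a much simpler stand-in for the paper's hyper-graph coloring over AP neighborhoods, the sketch below greedily colors an ordinary conflict graph so that mutually interfering APs never reuse the same sub-band (all names here are hypothetical):

    def greedy_coloring(n_aps, conflicts):
        # conflicts: set of frozensets {a, b} of mutually interfering APs
        color = {}
        for ap in range(n_aps):
            used = {color[other] for other in color if frozenset((ap, other)) in conflicts}
            color[ap] = next(c for c in range(n_aps) if c not in used)
        return color

    # Four APs on a line: only neighbors interfere, so two sub-bands suffice
    print(greedy_coloring(4, {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}))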
Shanyun Liu, Rui She, Jiaxun Lu, Pingyi Fan
Comments: 6 pages, 5 figures
Subjects: Information Theory (cs.IT)
Shannon entropy is the most crucial foundation of Information Theory and has
been proven effective in many fields such as communications. Renyi entropy and
Chernoff information are two other popular measures of information with wide
applications. Mutual information is effective for measuring channel information
because it reflects the relation between output variables and input variables.
In this paper, we reexamine these channel information measures from a big data
viewpoint by means of the ACE algorithm. The simulation results show that the
decompositions of Shannon and Chernoff mutual information with respect to the
channel parameters are almost the same. In this sense, Shannon shakes hands with
Chernoff, since they are different measures of the same information quantity. We
also propose a conjecture that there is an intrinsic nature of channel
information that is decided only by the channel parameters.
Rui She, Shanyun Liu, Yunquan Dong, Pingyi Fan
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT)
Message importance measure (MIM) is applicable to characterizing the importance
of information in the scenario of big data, similar to entropy in information
theory. In fact, MIM with a variable parameter can affect the characterization
of a distribution. Furthermore, by choosing an appropriate parameter of MIM, it
is possible to emphasize the message importance of a certain probability element
in a distribution. Therefore, parametric MIM can play a vital role in anomaly
detection for big data by focusing on the probability of an anomalous event. In
this paper, we propose a parameter selection method of MIM focusing on a
probability element and then present its major properties. In addition, we
discuss parameter selection with prior probability, and investigate its
applicability in a statistical processing model of big data for the anomaly
detection problem.
Wei Guo, Weile Zhang, Pengcheng Mu, Feifei Gao
Subjects: Information Theory (cs.IT)
In this correspondence, we propose a new receiver design for high-mobility
orthogonal frequency division multiplexing (OFDM) downlink transmissions with a
large-scale antenna array. The downlink signal experiences the challenging fast
time-varying propagation channel. The time-varying nature originates from the
multiple carrier frequency offsets (CFOs) due to the transceiver oscillator
frequency offset (OFO) and multiple Doppler shifts. The received signal is
first passed through a carefully designed beamforming network, which can
separate the multiple CFOs in the spatial domain given a sufficient number of
receive antennas.
A joint estimation method for the Doppler shifts and the OFO is further
developed. Then the conventional single-CFO compensation and channel estimation
method can be carried out for each beamforming branch. The proposed receiver
design avoids complicated time-varying channel estimation, which distinguishes
it from conventional methods. More importantly, the proposed scheme can be
applied to the commonly used time-varying channel models, such as the Jakes’
channel model.
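Once the beamforming network has isolated a single CFO per branch, the per-branch correction is the standard single-CFO compensation; a minimal sketch (my own notation, with eps the estimated normalized offset in cycles per sample):

    import numpy as np

    def compensate_cfo(r, eps):
        # Remove one normalized carrier frequency offset from a branch signal r
        n = np.arange(len(r))
        return r * np.exp(-1j * 2 * np.pi * eps * n)

    # Each beamforming branch applies its own estimated Doppler-plus-OFO offset
    branch_out = compensate_cfo(np.ones(64, dtype=complex), eps=0.01)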
Cheuk Ting Li, Abbas El Gamal
Comments: 18 pages, 3 figures
Subjects: Information Theory (cs.IT)
We establish the rate region of an extended Gray-Wyner system for 2-DMS
((X,Y)) with two additional decoders having complementary causal side
information. This extension is interesting because in addition to the
operationally significant extreme points of the Gray-Wyner rate region, which
include Wyner’s common information, Gács-Körner common information and
information bottleneck, the rate region for the extended system also includes
the Körner graph entropy, the privacy funnel and excess functional
information, as well as three new quantities of potential interest, as extreme
points. To simplify the investigation of the 5-dimensional rate region of the
extended Gray-Wyner system, we establish an equivalence of this region to a
3-dimensional mutual information region that consists of the set of all triples
of the form ((I(X;U), I(Y;U), I(X,Y;U))) for some (p_{U|X,Y}). We further
show that projections of this mutual information region yield the rate regions
for many settings involving a 2-DMS, including lossless source coding with
causal side information, distributed channel synthesis, and lossless source
coding with a helper.
Wasim Huleihel, Or Ordentlich
Comments: 5 pages, submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
Suppose that (Y^n) is obtained by observing a uniform Bernoulli random vector
(X^n) through a binary symmetric channel with crossover probability (\alpha).
The “most informative Boolean function” conjecture postulates that the maximal
mutual information between (Y^n) and any Boolean function (\mathrm{b}(X^n)) is
attained by a dictator function. In this paper, we consider the “complementary”
case in which the Boolean function is replaced by
(f:\{0,1\}^n \mapsto \{0,1\}^{n-1}), namely, an (n-1) bit quantizer, and show
that (I(f(X^n);Y^n) \leq (n-1)\cdot(1-h(\alpha))) for any such (f). Thus, in
this case, the optimal function is of the form (f(x^n)=(x_1,\ldots,x_{n-1})).
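The stated extremal function attains the bound through a short chain of standard identities (a sketch, using that the BSC acts independently on each coordinate and that (X_n) is independent of (X^{n-1})):

    I\big(f(X^n);Y^n\big) = I\big(X^{n-1};Y^n\big)
                          = I\big(X^{n-1};Y^{n-1}\big)
                          = \sum_{i=1}^{n-1} I(X_i;Y_i)
                          = (n-1)\big(1-h(\alpha)\big),

where the second equality holds because (Y_n) is independent of ((X^{n-1}, Y^{n-1})).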
Yuxiang Yang, Ge Bai, Giulio Chiribella, Masahito Hayashi
Comments: 14 Pages + Appendix; Comments are welcome
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We study the compression of arbitrary parametric families of (n) identically
prepared finite-dimensional quantum states, in a setting that can be regarded
as a quantum analogue of population coding. For a family with (f) free
parameters, we propose an asymptotically faithful protocol that requires a
memory of overall size ((f/2)\log n). As an intermediate step, we solve the
problem of the optimal compression of (n) identically prepared displaced
thermal states. Moreover, we show that the amount of quantum memory used by our
protocol can be made arbitrarily small compared to the overall memory cost.
Finally, we show that our protocol achieves the ultimate bound predicted by
quantum Shannon theory.
Hao-Chung Cheng, Min-Hsiu Hsieh
Comments: See also concurrent work by Christopher Chubb, Vincent Tan, and Marco Tomamichel
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
In this work, we study the optimal decay of error probability when the
transmission rate approaches channel capacity slowly, a research area known as
moderate deviation analysis. Our result shows that reliable communication
through a classical-quantum channel with positive channel dispersion is
possible when the transmission rate approaches the channel capacity at a rate
slower than (1/\sqrt{n}). The proof employs a refined sphere-packing bound in
strong large deviation theory, and the asymptotic expansions of the
error-exponent functions. Moderate deviation analysis for quantum hypothesis
testing is also established. The converse directly follows from our channel
coding result. The achievability can be proved by using a recent noncommutative
concentration inequality.
Martin Kliesch, Richard Kueng, Jens Eisert, David Gross
Comments: 26+5 pages, several figures and plots
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Quantum process tomography is the task of reconstructing unknown quantum
channels from measured data. In this work, we introduce compressed
sensing-based methods that facilitate the reconstruction of quantum channels of
low Kraus rank. Our main contribution is the analysis of a natural measurement
model for this task: We assume that data is obtained by sending pure states
into the channel and measuring expectation values on the output. Neither
ancilla systems nor coherent operations across multiple channel uses are
required. Most previous results on compressed process reconstruction reduce the
problem to quantum state tomography on the channel’s Choi matrix. While this
ansatz yields recovery guarantees from an essentially minimal number of
measurements, physical implementations of such schemes would typically involve
ancilla systems. A priori, it is unclear whether a measurement model tailored
directly to quantum process tomography might require more measurements. We
establish that this is not the case. Technically, we prove recovery guarantees
for three different reconstruction algorithms. The reconstructions are based on
a trace, diamond, and (\ell_2)-norm minimization, respectively. Our recovery
guarantees are uniform in the sense that with one random choice of measurement
settings all quantum channels can be recovered equally well. Moreover,
stability against arbitrary measurement noise and robustness against violations
of the low-rank assumption are guaranteed. Numerical studies demonstrate the
feasibility of the approach.
Christopher T. Chubb, Vincent Y. F. Tan, Marco Tomamichel
Comments: 23 pages, 1 figure. See also concurrent work by Cheng and Hsieh
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
We analyse families of codes for classical data transmission over quantum
channels that have both a vanishing probability of error and a code rate
approaching capacity as the code length increases. To characterise the
fundamental tradeoff between decoding error, code rate and code length for such
codes we introduce a quantum generalisation of the moderate deviation analysis
proposed by Altug and Wagner as well as Polyanskiy and Verdu. We derive such a
tradeoff for classical-quantum (as well as image-additive) channels in terms of
the channel capacity and the channel dispersion, giving further evidence that
the latter quantity characterises the necessary backoff from capacity when
transmitting finite blocks of classical data. To derive these results we also
study asymmetric binary quantum hypothesis testing in the moderate deviations
regime. Due to the central importance of the latter task, we expect that our
techniques will find further applications in the analysis of other quantum
information processing tasks.
Rajat Sen, Karthikeyan Shanmugam, Alexandros G. Dimakis, Sanjay Shakkottai
Comments: 33 pages, 7 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
Motivated by applications in computational advertising and systems biology,
we consider the problem of identifying the best out of several possible soft
interventions at a source node (V) in a causal DAG, to maximize the expected
value of a target node (Y) (downstream of (V)). There is a fixed total budget
for sampling under various interventions. Also, there are cost constraints on
different types of interventions. We pose this as a best arm identification
problem with (K) arms, where each arm is a soft intervention at (V). The key
difference from the classical setting is that there is information leakage
among the arms. Each soft intervention is a distinct known conditional
probability distribution between (V) and its parents (pa(V)).
We propose an efficient algorithm that uses importance sampling to adaptively
sample using different interventions and leverage information leakage to choose
the best. We provide the first gap dependent simple regret and best arm
mis-identification error bounds for this problem. This generalizes the prior
bounds available for the simpler case of no information leakage. In the case of
no leakage, the number of samples required for identification is (up to polylog
factors) (\tilde{O}(\max_i \frac{i}{\Delta_i^2})), where (\Delta_i) is the gap
between the optimal and the (i)-th best arm. We generalize this result to the
causal setting and show that (\tilde{O}(\max_i \frac{\sigma_i^2}{\Delta_i^2}))
is sufficient, where (\sigma_i^2) is the effective variance of an importance
sampling estimator that eliminates the (i)-th best arm out of a set of arms
with gaps roughly at most twice (\Delta_i). We also show that
(\sigma_i^2 \ll i) in many cases. Our result also
recovers (up to constants) prior gap independent bounds for this setting. We
demonstrate that our algorithm empirically outperforms the state of the art,
through synthetic experiments.
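The leakage can be exploited because every soft intervention fully specifies the conditional distribution of (V) given (pa(V)): a sample drawn under one arm can be re-weighted to estimate the expected reward of another. A minimal sketch of that re-weighting (hypothetical discrete setting and my own notation, not the authors' full adaptive algorithm):

    import numpy as np

    def is_estimate(samples, q_target, q_logged):
        # samples: list of (v, pa, y); q_*: dict mapping (v, pa) -> P(V = v | pa)
        w = np.array([q_target[(v, pa)] / q_logged[(v, pa)] for v, pa, _ in samples])
        y = np.array([y for _, _, y in samples])
        return float(np.sum(w * y) / np.sum(w))   # self-normalized importance sampling

    # Hypothetical binary example with a single parent configuration 'c'
    q_a = {(0, 'c'): 0.2, (1, 'c'): 0.8}          # arm being evaluated
    q_b = {(0, 'c'): 0.6, (1, 'c'): 0.4}          # arm that was actually sampled
    data = [(1, 'c', 2.0), (0, 'c', 0.5), (1, 'c', 1.5)]
    print(is_estimate(data, q_a, q_b))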