
    arXiv Paper Daily: Mon, 17 Apr 2017

    Published by 我爱机器学习 (52ml.net) on 2017-04-17 00:00:00

    Neural and Evolutionary Computing

    Runtime Analysis of the $(1+(\lambda,\lambda))$ Genetic Algorithm on Random Satisfiable 3-CNF Formulas

    Maxim Buzdalov, Benjamin Doerr
    Comments: An extended abstract of this report will appear in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017)
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The $(1+(\lambda,\lambda))$ genetic algorithm, first proposed at GECCO 2013,
    showed a surprisingly good performance on some optimization problems. The
    theoretical analysis so far was restricted to the OneMax test function, where
    this GA profited from the perfect fitness-distance correlation. In this work,
    we conduct a rigorous runtime analysis of this GA on random 3-SAT instances in
    the planted solution model having at least logarithmic average degree, which
    are known to have a weaker fitness distance correlation.

    We prove that this GA with a fixed, not too large population size again
    obtains runtimes better than $\Theta(n \log n)$, which is a lower bound for
    most evolutionary algorithms on pseudo-Boolean problems with a unique
    optimum.
    However, the self-adjusting version of the GA risks reaching population sizes
    at which the intermediate selection of the GA, due to the weaker
    fitness-distance correlation, is not able to distinguish a profitable offspring
    from others. We show that this problem can be overcome by equipping the
    self-adjusting GA with an upper limit for the population size. Apart from
    sparse instances, this limit can be chosen in a way that the asymptotic
    performance does not worsen compared to the idealistic OneMax case. Overall,
    this work shows that the $(1+(\lambda,\lambda))$ GA can provably have a good
    performance on combinatorial search and optimization problems also in the
    presence of a weaker fitness-distance correlation.
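
    For concreteness, a minimal Python sketch of the standard
    $(1+(\lambda,\lambda))$ GA with one-fifth-rule self-adjustment, run here on
    OneMax rather than 3-SAT; `lam_cap` (our name) stands in for the
    population-size limit discussed above, and the paper's exact
    parameterization may differ:

        import random

        def one_max(x):
            return sum(x)

        def flip(x, positions):
            y = list(x)
            for i in positions:
                y[i] = 1 - y[i]
            return y

        def one_plus_ll_ga(n, f, lam_cap=None, max_evals=200000):
            """One generation: lambda mutants sharing a radius ell ~ Bin(n, lam/n),
            then lambda biased crossovers between the parent and the best mutant."""
            x = [random.randint(0, 1) for _ in range(n)]
            fx, lam, F, evals = f(x), 1.0, 1.5, 0
            while fx < n and evals < max_evals:
                L = max(1, round(lam))
                ell = sum(random.random() < lam / n for _ in range(n))
                mutants = [flip(x, random.sample(range(n), ell)) for _ in range(L)]
                xp = max(mutants, key=f)
                # Crossover: take each bit from the best mutant with prob. 1/lam.
                offspring = [[b if random.random() < 1.0 / lam else a
                              for a, b in zip(x, xp)] for _ in range(L)]
                y = max(offspring, key=f)
                fy = f(y)
                evals += 2 * L
                # One-fifth success rule; lam_cap is the upper limit on the
                # population size that the abstract proposes for the
                # self-adjusting version.
                lam = max(lam / F, 1.0) if fy > fx else lam * F ** 0.25
                if lam_cap is not None:
                    lam = min(lam, lam_cap)
                if fy >= fx:
                    x, fx = y, fy
            return x, evals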

    Reward-based stochastic self-configuration of neural circuits

    David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Experimental data suggest that neural circuits configure their synaptic
    connectivity for a given computational task. They also point to dopamine-gated
    stochastic spine dynamics as an important underlying mechanism, and they show
    that the stochastic component of synaptic plasticity is surprisingly strong. We
    propose a model that elucidates how task-dependent self-configuration of neural
    circuits can emerge through these mechanisms. The Fokker-Planck equation allows
    us to relate local stochastic processes at synapses to the stationary
    distribution of network configurations, and thereby to computational properties
    of the network. This framework suggests a new model for reward-gated network
    plasticity, where one replaces the common policy gradient paradigm by
    continuously ongoing stochastic policy search (sampling) from a posterior
    distribution of network configurations. This posterior integrates priors that
    encode for example previously attained knowledge and structural constraints.
    This model can explain the experimentally found capability of neural circuits
    to configure themselves for a given task, and to compensate automatically for
    changes in the network or task. We also show that experimental data on
    dopamine-modulated spine dynamics can be modeled within this theoretical
    framework, and that a strong stochastic component of synaptic plasticity is
    essential for its performance.
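
    The core mechanism can be sketched compactly: the framework treats synaptic
    parameters as performing stochastic (Langevin-type) dynamics whose
    stationary distribution is a posterior over network configurations. A toy
    Euler-Maruyama simulation of such dynamics (all functions and constants
    below are illustrative placeholders, not the paper's model):

        import numpy as np

        rng = np.random.default_rng(0)

        def grad_log_prior(theta, sigma=1.0):
            # Gaussian prior on synaptic parameters (stand-in for structural priors).
            return -theta / sigma**2

        def grad_log_reward(theta):
            # Placeholder for the reward-dependent term; in the paper this role
            # is played by dopamine-gated plasticity. Here: a toy quadratic target.
            return -(theta - np.array([1.0, -0.5, 0.25]))

        # Euler-Maruyama discretization of
        #   d theta = b * d/dtheta log p*(theta) dt + sqrt(2 b T) dW,
        # whose stationary distribution is proportional to p*(theta)^(1/T).
        theta, b, T, dt = rng.normal(size=3), 0.1, 0.5, 1.0
        for _ in range(5000):
            drift = grad_log_prior(theta) + grad_log_reward(theta)
            theta += b * drift * dt + np.sqrt(2 * b * T * dt) * rng.normal(size=3)
        # theta keeps sampling around the posterior mode(s) rather than converging,
        # which is what allows automatic re-configuration when the task changes.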


    Computer Vision and Pattern Recognition

    TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

    Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, Gunhee Kim
    Comments: Accepted paper at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Vision and language understanding has emerged as a subject undergoing intense
    study in Artificial Intelligence. Among many tasks in this line of research,
    visual question answering (VQA) has been one of the most successful ones, where
    the goal is to learn a model that understands visual content at region-level
    details and finds their associations with pairs of questions and answers in the
    natural language form. Despite the rapid progress in the past few years, most
    existing work in VQA has focused primarily on images. In this paper, we focus
    on extending VQA to the video domain and contribute to the literature in three
    important ways. First, we propose three new tasks designed specifically for
    video VQA, which require spatio-temporal reasoning from videos to answer
    questions correctly. Next, we introduce a new large-scale dataset for video VQA
    named TGIF-QA that extends existing VQA work with our new tasks. Finally, we
    propose a dual-LSTM based approach with both spatial and temporal attention,
    and show its effectiveness over conventional VQA techniques through empirical
    evaluations.
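
    The attention component can be illustrated generically. This is not the
    authors' exact dual-LSTM architecture, just the usual additive temporal
    attention a video-QA model would place over frame features, conditioned on
    the question encoding (all weights random for the sketch):

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def temporal_attention(frames, question, rng=np.random.default_rng(1)):
            """frames: (T, d) per-frame features; question: (d,) text encoding.
            Returns a question-conditioned weighted summary of the video."""
            T, d = frames.shape
            W_f = rng.normal(scale=d**-0.5, size=(d, d))
            W_q = rng.normal(scale=d**-0.5, size=(d, d))
            v = rng.normal(scale=d**-0.5, size=d)
            scores = np.tanh(frames @ W_f + question @ W_q) @ v  # (T,)
            alpha = softmax(scores)       # attention weights over time steps
            return alpha @ frames, alpha  # summary (d,), weights (T,)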

    Deep Structured Learning for Facial Action Unit Intensity Estimation

    Robert Walecki, Ognjen (Oggi) Rudovic, Vladimir Pavlovic, Björn Schuller, Maja Pantic
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We consider the task of automated estimation of facial expression intensity.
    This involves estimation of multiple output variables (facial action units —
    AUs) that are structurally dependent. Their structure arises from statistically
    induced co-occurrence patterns of AU intensity levels. Modeling this structure
    is critical for improving the estimation performance; however, this performance
    is bounded by the quality of the input features extracted from face images. The
    goal of this paper is to model these structures and estimate complex feature
    representations simultaneously by combining conditional random field (CRF)
    encoded AU dependencies with deep learning. To this end, we propose a novel
    Copula CNN deep learning approach for modeling multivariate ordinal variables.
    Our model accounts for (ordinal) structure in output variables and their
    (non-)linear dependencies via copula functions modeled as cliques of a CRF.
    These are jointly optimized with deep CNN feature encoding layers using a newly
    introduced balanced batch iterative training algorithm. We demonstrate the
    effectiveness of our approach on the task of AU intensity estimation on two
    benchmark datasets. We show that joint learning of the deep features and the
    target output structure results in significant performance gains compared to
    existing deep structured models for analysis of facial expressions.

    3D seismic data denoising using two-dimensional sparse coding scheme

    Ming-Jun Su, Jingbo Chang, Feng Qian, Guangmin Hu, Xiao-Yang Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)

    Seismic data denoising is vital to geophysical applications and the
    transform-based function method is one of the most widely used techniques.
    However, it is challenging to design a suitable sparse representation to
    express a transform-based function group due to the complexity of seismic
    data. In this paper, we apply a seismic data denoising method based on
    learning-type overcomplete dictionaries which uses two-dimensional sparse
    coding (2DSC). First, we model the input seismic data and dictionaries as
    third-order tensors and introduce tensor-linear combinations for data
    approximation. Second, we apply a learning-type overcomplete dictionary,
    i.e., optimal sparse data representation is achieved through learning and
    training. Third, we exploit the alternating minimization algorithm to solve
    the optimization problem of seismic denoising. Finally, we evaluate its
    denoising performance on synthetic seismic data and a land data survey.
    Experimental results
    show that the two-dimensional sparse coding scheme reduces computational costs
    and enhances the signal-to-noise ratio.

    Parallel Multi Channel Convolution using General Matrix Multiplication

    Aravind Vasudevan, Andrew Anderson, David Gregg
    Comments: Under review at ASAP 2017 – The 28th Annual IEEE International Conference on Application-specific Systems, Architectures and Processors. 11 pages, 13 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Performance (cs.PF)

    Convolutional neural networks (CNNs) have emerged as one of the most
    successful machine learning technologies for image and video processing. The
    most computationally intensive parts of CNNs are the convolutional layers,
    which convolve multi-channel images with multiple kernels. A common approach to
    implementing convolutional layers is to expand the image into a column matrix
    (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using
    an existing parallel General Matrix Multiplication (GEMM) library. This im2col
    conversion greatly increases the memory footprint of the input matrix and
    reduces data locality.

    In this paper we propose a new approach to MCMK convolution that is based on
    General Matrix Multiplication (GEMM), but not on im2col. Our algorithm
    eliminates the need for data replication on the input. By splitting a single
    call to GEMM into several smaller calls, we can eliminate data size increases
    on either the input or output of the convolution layer. We have implemented
    several variants of our algorithm on CPU and GPU processors. On CPU, our
    algorithm uses much less memory than im2col and in most cases is also faster.
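
    The idea of trading one big GEMM for several small ones can be sketched in
    a few lines. The following numpy toy (not the paper's implementation)
    computes a multi-channel multi-kernel 'same' convolution as one
    (M x C) by (C x HW) GEMM per kernel element, accumulated with shifts, so no
    im2col replication of the input is ever materialized:

        import numpy as np

        def conv_mcmk_gemm(x, k):
            """x: (C, H, W) input; k: (M, C, KH, KW) kernels (odd KH, KW).
            CNN-style convolution (cross-correlation) with zero padding."""
            C, H, W = x.shape
            M, _, KH, KW = k.shape
            x_flat = x.reshape(C, H * W)
            out = np.zeros((M, H, W))
            for u in range(KH):
                for v in range(KW):
                    du, dv = u - KH // 2, v - KW // 2
                    # One small GEMM for this kernel element...
                    part = (k[:, :, u, v] @ x_flat).reshape(M, H, W)
                    # ...shift-accumulated into the output (contributions that
                    # fall off the edges implement the zero padding).
                    src = part[:, max(0, du):H - max(0, -du),
                               max(0, dv):W - max(0, -dv)]
                    out[:, max(0, -du):H - max(0, du),
                        max(0, -dv):W - max(0, dv)] += src
            return out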

    Learning a collaborative multiscale dictionary based on robust empirical mode decomposition

    Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao
    Comments: to be published in Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Dictionary learning is a challenging topic in many image processing areas.
    The basic goal is to learn a sparse representation from an overcomplete
    basis set. By combining the advantages of generic multiscale representations
    with learning-based adaptivity, multiscale dictionary representation
    approaches are powerful at capturing the structural characteristics of
    natural images. However, existing multiscale learning approaches still
    suffer from three main weaknesses: inadaptability to diverse scales of image
    data, sensitivity to noise and outliers, and difficulty in determining the
    optimal dictionary structure. In this paper, we present a novel multiscale
    dictionary learning paradigm for sparse image representations based on an
    improved empirical mode decomposition. This powerful data-driven analysis
    tool for multi-dimensional signals can fully adaptively decompose the image
    into multiscale oscillating components according to the intrinsic modes of
    the data itself. This treatment yields a robust and effective sparse
    representation, and meanwhile generates a raw base dictionary at multiple
    geometric scales and spatial frequency bands. This dictionary is refined by
    selecting optimal oscillating atoms based on frequency clustering. In order
    to further enhance sparsity and generalization, a tolerance dictionary is
    learned using a coherence-regularized model. A fast proximal scheme is
    developed to optimize this model. The multiscale dictionary is taken as the
    product of the oscillating dictionary and the tolerance dictionary.
    Experimental results demonstrate that the proposed learning approach
    achieves superior performance in sparse image representations compared with
    several competing methods. We also show promising results in an image
    denoising application.

    DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

    Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B. Choy, Philip H. S. Torr, Manmohan Chandraker
    Comments: Accepted at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a Deep Stochastic IOC RNN Encoder-Decoder framework, DESIRE,
    for the task of future prediction of multiple interacting agents in dynamic
    scenes. DESIRE effectively predicts future locations of objects in multiple
    scenes by 1) accounting for the multi-modal nature of the future prediction
    (i.e., given the same context, the future may vary), 2) foreseeing the
    potential future outcomes and making a strategic prediction based on them,
    and 3) reasoning not only from the past motion history, but also from the
    scene context as well as the interactions among the agents. DESIRE achieves
    these in a single
    end-to-end trainable neural network model, while being computationally
    efficient. The model first obtains a diverse set of hypothetical future
    prediction samples employing a conditional variational autoencoder, which are
    ranked and refined by the following RNN scoring-regression module. Samples are
    scored by accounting for accumulated future rewards, which enables better
    long-term strategic decisions similar to IOC frameworks. An RNN scene context
    fusion module jointly captures past motion histories, the semantic scene
    context and interactions among multiple agents. A feedback mechanism iterates
    over the ranking and refinement to further boost the prediction accuracy. We
    evaluate our model on two publicly available datasets: KITTI and Stanford Drone
    Dataset. Our experiments show that the proposed model significantly improves
    the prediction accuracy compared to other baseline methods.

    Camera Calibration by Global Constraints on the Motion of Silhouettes

    Gil Ben-Artzi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of epipolar geometry using the motion of silhouettes.
    Such methods match epipolar lines or frontier points across views, which are
    then used as the set of putative correspondences. We introduce an approach that
    improves by two orders of magnitude the performance over state-of-the-art
    methods, by significantly reducing the number of outliers in the putative
    matching. We model the frontier points’ correspondence problem as constrained
    flow optimization, requiring small differences between their coordinates over
    consecutive frames. Our approach is formulated as a Linear Integer Program and
    we show that due to the nature of our problem, it can be solved efficiently in
    an iterative manner. Our method was validated on four standard datasets
    providing accurate calibrations across very different viewpoints.

    Dataset Augmentation for Pose and Lighting Invariant Face Recognition

    Daniel Crispell, Octavian Biris, Nate Crosswhite, Jeffrey Byrne, Joseph L. Mundy
    Comments: Appeared in 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The performance of modern face recognition systems is a function of the
    dataset on which they are trained. Most datasets are largely biased toward
    “near-frontal” views with benign lighting conditions, negatively affecting
    recognition performance on images that do not meet these criteria. The proposed
    approach demonstrates how a baseline training set can be augmented to increase
    pose and lighting variability using semi-synthetic images with simulated pose
    and lighting conditions. The semi-synthetic images are generated using a fast
    and robust 3-d shape estimation and rendering pipeline which includes the full
    head and background. Various methods of incorporating the semi-synthetic
    renderings into the training procedure of a state of the art deep neural
    network-based recognition system without modifying the structure of the network
    itself are investigated. Quantitative results are presented on the challenging
    IJB-A identification dataset using a state of the art recognition pipeline as a
    baseline.

    CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

    Lukas Cavigelli, Philippe Degen, Luca Benini
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Extracting per-frame features using convolutional neural networks for
    real-time processing of video data is currently mainly performed on powerful
    GPU-accelerated workstations and compute clusters. However, there are many
    applications such as smart surveillance cameras that require or would benefit
    from on-site processing. To this end, we propose and evaluate a novel algorithm
    for change-based evaluation of CNNs for video data recorded with a static
    camera setting, exploiting the spatio-temporal sparsity of pixel changes. We
    achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic
    benchmark with a negligible accuracy loss of less than 0.1% and no retraining
    of the network. The resulting energy efficiency is 10x higher than per-frame
    evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.
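
    A toy version of change-based evaluation for a single 3x3 convolution layer
    (numpy, unit stride; the paper's CUDA kernels and exact change-detection
    policy differ) shows the mechanism: threshold the frame difference, dilate
    the change mask to the affected outputs, and recompute only those pixels:

        import numpy as np

        def cbinfer_conv(prev_in, cur_in, prev_out, w, tau=0.04):
            """prev_in/cur_in: (C, H, W) frames; prev_out: (M, H, W) cached
            outputs; w: (M, C, 3, 3) weights. Recomputes only changed pixels."""
            C, H, W = cur_in.shape
            changed = np.abs(cur_in - prev_in).max(axis=0) > tau
            grow = np.zeros_like(changed)
            for dy in (-1, 0, 1):          # a changed input pixel affects the
                for dx in (-1, 0, 1):      # 3x3 output neighborhood around it
                    grow[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)] |= \
                        changed[max(0, dy):H - max(0, -dy), max(0, dx):W - max(0, -dx)]
            out = prev_out.copy()
            pad = np.pad(cur_in, ((0, 0), (1, 1), (1, 1)))
            for y, x in zip(*np.nonzero(grow)):
                patch = pad[:, y:y + 3, x:x + 3]
                out[:, y, x] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
            return out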

    FastVentricle: Cardiac Segmentation with ENet

    Jesse Lieman-Sifry, Matthieu Le, Felix Lau, Sean Sall, Daniel Golden
    Comments: 11 pages, 6 figures, Accepted to Functional Imaging and Modeling of the Heart (FIMH) 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cardiac Magnetic Resonance (CMR) imaging is commonly used to assess cardiac
    structure and function. One disadvantage of CMR is that post-processing of
    exams is tedious. Without automation, precise assessment of cardiac function
    via CMR typically requires an annotator to spend tens of minutes per case
    manually contouring ventricular structures. Automatic contouring can lower the
    required time per patient by generating contour suggestions that can be lightly
    modified by the annotator. Fully convolutional networks (FCNs), a variant of
    convolutional neural networks, have been used to rapidly advance the
    state-of-the-art in automated segmentation, which makes FCNs a natural choice
    for ventricular segmentation. However, FCNs are limited by their computational
    cost, which increases the monetary cost and degrades the user experience of
    production systems. To combat this shortcoming, we have developed the
    FastVentricle architecture, an FCN architecture for ventricular segmentation
    based on the recently developed ENet architecture. FastVentricle is 4x faster
    and runs with 6x less memory than the previous state-of-the-art ventricular
    segmentation architecture while still maintaining excellent clinical accuracy.

    Visual Recognition of Paper Analytical Device Images for Detection of Falsified Pharmaceuticals

    Sandipan Banerjee, James Sweet, Christopher Sweet, Marya Lieberman
    Comments: in Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Falsification of medicines is a big problem in many developing countries,
    where technological infrastructure is inadequate to detect these harmful
    products. We have developed a set of inexpensive paper cards, called Paper
    Analytical Devices (PADs), which can efficiently classify drugs based on their
    chemical composition, as a potential solution to the problem. These cards have
    different reagents embedded in them which produce a set of distinctive color
    descriptors upon reacting with the chemical compounds that constitute
    pharmaceutical dosage forms. If a falsified version of the medicine lacks the
    active ingredient or includes substitute fillers, the difference in color is
    perceivable by humans. However, reading the cards with accuracy takes training
    and practice, which may hamper their scaling and implementation in low resource
    settings. To deal with this, we have developed an automatic visual recognition
    system to read the results from the PAD images. At first, the optimal set of
    reagents was found by running singular value decomposition on the intensity
    values of the color tones in the card images. A dataset of cards embedded with
    these reagents is produced to generate the most distinctive results for a set
    of 26 different active pharmaceutical ingredients (APIs) and excipients. Then,
    we train two popular convolutional neural network (CNN) models with the card
    images. We also extract some “hand-crafted” features from the images and train
    a nearest neighbor classifier and a non-linear support vector machine with
    them. On testing, higher-level features performed much better in accurately
    classifying the PAD images, with the CNN models reaching the highest average
    accuracy of over 94%.

    Applying High-Resolution Visible Imagery to Satellite Melt Pond Fraction Retrieval: A Neural Network Approach

    Qi Liu, Yawen Zhang, Qin Lv, Li Shang
    Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Computer Vision and Pattern Recognition (cs.CV)

    During summer, melt ponds have a significant influence on Arctic sea-ice
    albedo. The melt pond fraction (MPF) can also help forecast the Arctic sea
    ice over a certain period. It is therefore important to retrieve accurate
    MPF from satellite data for Arctic research. This paper
    proposes a satellite MPF retrieval model based on the multi-layer neural
    network, named MPF-NN. Our model uses multi-spectral satellite data as model
    input and MPF information from multi-site and multi-period visible imagery as
    prior knowledge for modeling. It can effectively model melt ponds evolution of
    different regions and periods over the Arctic. Evaluation results show that the
    MPF retrieved from MODIS data using the proposed model has an RMSE of 3.91% and
    a correlation coefficient of 0.73. The seasonal distribution of MPF is also
    consistent with previous results.

    Close Yet Distinctive Domain Adaptation

    Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Chao Wang, Yuxing Tang, Liming Chen
    Comments: 11 pages, 3 figures, ICCV 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Domain adaptation is a form of transfer learning that aims to generalize a
    learning model across training and testing data with different
    distributions. Most previous research tackles this problem by seeking a
    shared feature representation between source and target domains while
    reducing the mismatch of their data distributions. In this paper, we propose
    a close yet discriminative domain adaptation method, namely CDDA, which
    generates a latent feature representation with two interesting properties.
    First, the discrepancy between the source and target domains, measured in
    terms of both marginal and conditional probability distributions via Maximum
    Mean Discrepancy (MMD), is minimized so as to draw the two domains close to
    each other. More importantly, we also design a repulsive force term, which
    maximizes the distance between each label-dependent sub-domain and all
    others so as to push different class-dependent sub-domains far away from
    each other and thereby increase the discriminative
    power of the adapted domain. Moreover, given the fact that the underlying data
    manifold could have complex geometric structure, we further propose the
    constraints of label smoothness and geometric structure consistency for label
    propagation. Extensive experiments are conducted on 36 cross-domain image
    classification tasks over four public datasets. The comprehensive results show
    that the proposed method consistently outperforms the state-of-the-art methods
    with significant margins.
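
    The marginal term of such methods is easy to state concretely. A numpy
    estimator of the (squared) Maximum Mean Discrepancy between source and
    target samples, here with an RBF kernel (the abstract does not fix the
    kernel; this is the common choice):

        import numpy as np

        def rbf_mmd2(xs, xt, gamma=1.0):
            """Biased estimator of MMD^2 between xs (ns, d) and xt (nt, d)."""
            def k(a, b):
                d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
                return np.exp(-gamma * d2)
            return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

    The conditional variant applies the same estimator per class (using
    pseudo-labels on the target), and the repulsive term described above
    instead maximizes this quantity between sub-domains carrying different
    labels.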


    Artificial Intelligence

    Incremental learning of high-level concepts by imitation

    Mina Alibeigi, Majid Nili Ahmadabadi, Babak Nadjar Araabi
    Comments: 6 pages, 5 figures, 2 tables, supplementary material, conference
    Journal-ref: IEEE Transactions on Robotics 33.1 (2017): 153-168
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Nowadays, robots are becoming companions in everyday life. To be well
    accepted by humans, robots should efficiently understand the meanings of
    their partners’ motions and body language, and respond accordingly. Learning
    concepts by imitation brings them this ability in a user-friendly way.

    This paper presents a fast and robust model for Incremental Learning of
    Concepts by Imitation (ILoCI). In ILoCI, observed multimodal spatio-temporal
    demonstrations are incrementally abstracted and generalized based on both their
    perceptual and functional similarities during the imitation. In this method,
    perceptually similar demonstrations are abstracted by a dynamic model of mirror
    neuron system. An incremental method is proposed to learn their functional
    similarities through a limited number of interactions with the teacher.
    Learning all concepts together by the proposed memory rehearsal enables the
    robot to utilize the common structural relations among concepts, which not
    only expedites the learning process, especially at the initial stages, but
    also improves the generalization ability and the robustness against
    discrepancies between observed demonstrations.

    The performance of ILoCI is assessed using the standard LASA handwriting
    benchmark dataset. The results show the efficiency of ILoCI in concept
    acquisition, recognition and generation, in addition to its robustness
    against variability in demonstrations.

    Environment-Independent Task Specifications via GLTL

    Michael L. Littman, Ufuk Topcu, Jie Fu, Charles Isbell, Min Wen, James MacGlashan
    Subjects: Artificial Intelligence (cs.AI)

    We propose a new task-specification language for Markov decision processes
    that is designed to be an improvement over reward functions by being
    environment independent. The language is a variant of Linear Temporal Logic
    (LTL) that is extended to probabilistic specifications in a way that permits
    approximations to be learned in finite time. We provide several small
    environments that demonstrate the advantages of our geometric LTL (GLTL)
    language and illustrate how it can be used to specify standard
    reinforcement-learning tasks straightforwardly.

    Deep API Programmer: Learning to Program with APIs

    Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
    Comments: 8 pages + 4 pages of supplementary material. Submitted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present DAPIP, a Programming-By-Example system that learns to program with
    APIs to perform data transformation tasks. We design a domain-specific language
    (DSL) that allows for arbitrary concatenations of API outputs and constant
    strings. The DSL consists of three families of APIs: regular expression-based
    APIs, lookup APIs, and transformation APIs. We then present a novel neural
    synthesis algorithm to search for programs in the DSL that are consistent with
    a given set of examples. The search algorithm uses recently introduced neural
    architectures to encode input-output examples and to model the program search
    in the DSL. We show that the synthesis algorithm outperforms baseline methods for
    synthesizing programs on both synthetic and real-world benchmarks.

    Graphical Models: An Extension to Random Graphs, Trees, and Other Objects

    Neil Hallonquist
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

    In this work, we consider an extension of graphical models to random graphs,
    trees, and other objects. To do this, many fundamental concepts for
    multivariate random variables (e.g., marginal variables, Gibbs distribution,
    Markov properties) must be extended to other mathematical objects; it turns out
    that this extension is possible, as we will discuss, if we have a consistent,
    complete system of projections on a given object. Each projection defines a
    marginal random variable, allowing one to specify independence assumptions
    between them. Furthermore, these independencies can be specified in terms of a
    small subset of these marginal variables (which we call the atomic variables),
    allowing the compact representation of independencies by a directed graph.
    Projections also define factors, functions on the projected object space, and
    hence a projection family defines a set of possible factorizations for a
    distribution; these can be compactly represented by an undirected graph.

    The invariances used in graphical models are essential for learning
    distributions, not just on multivariate random variables, but also on other
    objects. When they are applied to random graphs and random trees, the result is
    a general class of models that is applicable to a broad range of problems,
    including those in which the graphs and trees have complicated edge structures.
    These models need not be conditioned on a fixed number of vertices, as is often
    the case in the literature for random graphs, and can be used for problems in
    which attributes are associated with vertices and edges. For graphs,
    applications include the modeling of molecules, neural networks, and relational
    real-world scenes; for trees, applications include the modeling of infectious
    diseases, cell fusion, the structure of language, and the structure of objects
    in visual scenes. Many classic models are particular instances of this
    framework.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.
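
    The relaxation trick itself is generic and small. Hard coreference
    decisions pick one antecedent per mention via argmax, which is
    non-differentiable; replacing the one-hot choice by softmax weights makes
    any metric written in terms of link indicators a smooth function of the
    scores. A toy numpy illustration (not the paper's exact relaxation of the
    coreference metrics):

        import numpy as np

        def softmax(z, T=1.0):
            e = np.exp((z - z.max()) / T)
            return e / e.sum()

        scores = np.array([2.0, 0.5, -1.0])  # antecedent scores for one mention
        gold = np.array([1.0, 0.0, 0.0])     # 1 if the link agrees with gold
        p = softmax(scores, T=1.0)           # relaxed (soft) antecedent choice
        expected_reward = (p * gold).sum()   # differentiable in `scores`
        # Annealing T -> 0 recovers the hard argmax decision.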

    Ultrafast photonic reinforcement learning based on laser chaos

    Makoto Naruse, Yuta Terashima, Atsushi Uchida, Song-Ju Kim
    Subjects: Optics (physics.optics); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    Reinforcement learning involves decision making in dynamic and uncertain
    environments, and constitutes one important element of artificial intelligence
    (AI). In this paper, we experimentally demonstrate that the ultrafast chaotic
    oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem
    (MAB), which requires decision making concerning a class of difficult
    trade-offs called the exploration-exploitation dilemma. To solve the MAB, a
    certain degree of randomness is required for exploration purposes. However,
    pseudo-random numbers generated using conventional electronic circuitry
    encounter severe limitations in terms of their data rate and the quality of
    randomness due to their algorithmic foundations. We generate laser chaos
    signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s,
    and combine it with a simple decision-making principle called tug-of-war with a
    variable threshold, to ensure ultrafast, adaptive and accurate decision making
    at a maximum adaptation speed of 1 GHz. We found that decision-making
    performance was maximized with an optimal sampling interval, and we highlight
    the exact coincidence between the negative autocorrelation inherent in laser
    chaos and decision-making performance. This study paves the way for a new realm
    of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of
    photons can provide new value.
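
    The decision principle can be emulated with ordinary pseudo-randomness to
    see the mechanics (Gaussian noise below is a stand-in for the laser-chaos
    signal, and the threshold update is a simplified tug-of-war rule, not the
    authors' exact scheme):

        import random

        def tug_of_war_bandit(p=(0.6, 0.4), steps=10000, delta=1.0, seed=0):
            """Two-armed bandit: compare a random signal against an adaptive
            threshold; rewards pull the threshold toward the winning arm."""
            rng = random.Random(seed)
            tw, pulls = 0.0, [0, 0]
            for _ in range(steps):
                s = rng.gauss(0.0, 1.0)      # surrogate for the chaos sample
                arm = 0 if s >= tw else 1    # threshold decision
                reward = rng.random() < p[arm]
                pulls[arm] += 1
                # Tug of war: a reward makes this arm more likely next time,
                # a miss makes the other arm more likely.
                step_dir = -1.0 if arm == 0 else 1.0
                tw += step_dir * (delta if reward else -delta)
            return pulls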

    CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

    Lukas Cavigelli, Philippe Degen, Luca Benini
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Extracting per-frame features using convolutional neural networks for
    real-time processing of video data is currently mainly performed on powerful
    GPU-accelerated workstations and compute clusters. However, there are many
    applications such as smart surveillance cameras that require or would benefit
    from on-site processing. To this end, we propose and evaluate a novel algorithm
    for change-based evaluation of CNNs for video data recorded with a static
    camera setting, exploiting the spatio-temporal sparsity of pixel changes. We
    achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic
    benchmark with a negligible accuracy loss of less than 0.1% and no retraining
    of the network. The resulting energy efficiency is 10x higher than per-frame
    evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.

    Gamma Processes, Stick-Breaking, and Variational Inference

    Anirban Roychowdhury, Brian Kulis
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    While most Bayesian nonparametric models in machine learning have focused on
    the Dirichlet process, the beta process, or their variants, the gamma process
    has recently emerged as a useful nonparametric prior in its own right. Current
    inference schemes for models involving the gamma process are restricted to
    MCMC-based methods, which limits their scalability. In this paper, we present a
    variational inference framework for models involving gamma process priors. Our
    approach is based on a novel stick-breaking constructive definition of the
    gamma process. We prove correctness of this stick-breaking process by using the
    characterization of the gamma process as a completely random measure (CRM), and
    we explicitly derive the rate measure of our construction using Poisson process
    machinery. We also derive error bounds on the truncation of the infinite
    process required for variational inference, similar to the truncation analyses
    for other nonparametric models based on the Dirichlet and beta processes. Our
    representation is then used to derive a variational inference algorithm for a
    particular Bayesian nonparametric latent structure formulation known as the
    infinite Gamma-Poisson model, where the latent variables are drawn from a gamma
    process prior with Poisson likelihoods. Finally, we present results for our
    algorithms on nonnegative matrix factorization tasks on document corpora, and
    show that we compare favorably to both sampling-based techniques and
    variational approaches based on beta-Bernoulli priors.
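
    For reference, the standard completely-random-measure facts the abstract
    builds on (the paper's new stick-breaking construction itself is not
    reproduced here): a gamma process $G$ with concentration $\alpha$ and base
    measure $H$ is the CRM $G = \sum_k \pi_k \delta_{\theta_k}$ whose atoms
    come from a Poisson process with Levy intensity

        $$ \nu(d\pi\, d\theta) = \alpha\, \pi^{-1} e^{-\pi}\, d\pi\, H(d\theta), $$

    so that $G(A) \sim \mathrm{Gamma}(\alpha H(A), 1)$ for measurable $A$, and
    normalizing $G$ yields a Dirichlet process. The paper derives a
    stick-breaking representation with this rate measure, together with
    truncation error bounds for variational inference.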


    Computation and Language

    Cardinal Virtues: Extracting Relation Cardinalities from Text

    Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum
    Comments: 5 pages, ACL 2017 (short paper)
    Subjects: Computation and Language (cs.CL)

    Information extraction (IE) from text has largely focused on relations
    between individual entities, such as who has won which award. However, some
    facts are never fully mentioned, and no IE method has perfect recall. Thus, it
    is beneficial to also tap contents about the cardinalities of these relations,
    for example, how many awards someone has won. We introduce this novel problem
    of extracting cardinalities and discuss the specific challenges that set it
    apart from standard IE. We present a distant supervision method using
    conditional random fields. A preliminary evaluation results in precision
    between 3% and 55%, depending on the difficulty of relations.

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Tobias Falke, Iryna Gurevych
    Subjects: Computation and Language (cs.CL)

    Concept maps can be used to concisely represent important information and
    bring structure into large document collections. Therefore, we study a variant
    of multi-document summarization that produces summaries in the form of concept
    maps. However, suitable evaluation datasets for this task are currently
    missing. To close this gap, we present a newly created corpus of concept maps
    that summarize heterogeneous collections of web documents on educational
    topics. It was created using a novel crowdsourcing approach that allows us to
    efficiently determine important elements in large document collections. We
    release the corpus along with a baseline system and proposed evaluation
    protocol to enable further research on this variant of summarization.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.

    How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?

    Georg Heigold, Günter Neumann, Josef van Genabith
    Comments: 9 pages
    Subjects: Computation and Language (cs.CL)

    This paper investigates the robustness of NLP against perturbed word forms.
    While neural approaches can achieve (almost) human-like accuracy for certain
    tasks and conditions, they often are sensitive to small changes in the input
    such as non-canonical input (e.g., typos). Yet both stability and robustness
    are desired properties in applications involving user-generated content, all
    the more so as humans easily cope with such noisy or adversarial conditions.
    In this
    paper, we study the impact of noisy input. We consider different noise
    distributions (one type of noise, combination of noise types) and mismatched
    noise distributions for training and testing. Moreover, we empirically evaluate
    the robustness of different models (convolutional neural networks, recurrent
    neural networks, non-neural models), different basic units (characters, byte
    pair encoding units), and different NLP tasks (morphological tagging, machine
    translation).
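
    Noise of this kind is easy to generate. A small Python corruptor in the
    spirit of the paper's title (generic swap/scramble perturbations; the
    paper's precise noise distributions are not reproduced here):

        import random

        rng = random.Random(42)

        def swap_noise(word):
            """Swap one adjacent interior character pair (typo-style noise)."""
            if len(word) <= 3:
                return word
            chars = list(word)
            i = rng.randrange(1, len(chars) - 2)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            return "".join(chars)

        def scramble_noise(word):
            """Shuffle the word's interior, keeping the first and last letter."""
            if len(word) <= 3:
                return word
            mid = list(word[1:-1])
            rng.shuffle(mid)
            return word[0] + "".join(mid) + word[-1]

        def corrupt(sentence, noise_fn, rate=0.5):
            return " ".join(w if rng.random() > rate else noise_fn(w)
                            for w in sentence.split())

        print(corrupt("character based models are surprisingly robust", scramble_noise))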

    Get To The Point: Summarization with Pointer-Generator Networks

    Abigail See, Peter J. Liu, Christopher D. Manning
    Comments: Accepted to ACL 2017
    Subjects: Computation and Language (cs.CL)

    Neural sequence-to-sequence models have provided a viable new approach for
    abstractive text summarization (meaning they are not restricted to simply
    selecting and rearranging passages from the original text). However, these
    models have two shortcomings: they are liable to reproduce factual details
    inaccurately, and they tend to repeat themselves. In this work we propose a
    novel architecture that augments the standard sequence-to-sequence attentional
    model in two orthogonal ways. First, we use a hybrid pointer-generator network
    that can copy words from the source text via pointing, which aids accurate
    reproduction of information, while retaining the ability to produce novel words
    through the generator. Second, we use coverage to keep track of what has been
    summarized, which discourages repetition. We apply our model to the CNN / Daily
    Mail summarization task, outperforming the current abstractive state-of-the-art
    by at least 2 ROUGE points.
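
    In the paper's notation, the two mechanisms are compact. With $a^t$ the
    attention distribution over source positions at decoder step $t$ and
    $p_{\mathrm{gen}} \in [0,1]$ a learned switch, the output distribution and
    the coverage terms are

        $$ P(w) = p_{\mathrm{gen}} P_{\mathrm{vocab}}(w)
                  + (1 - p_{\mathrm{gen}}) \sum_{i : w_i = w} a_i^t, \qquad
           c^t = \sum_{t' < t} a^{t'}, \qquad
           \mathrm{covloss}_t = \sum_i \min(a_i^t, c_i^t), $$

    so out-of-vocabulary source words can still be produced through the copy
    term, and attending again to already-covered positions is penalized.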

    Exploiting Cross-Sentence Context for Neural Machine Translation

    Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu
    Subjects: Computation and Language (cs.CL)

    In translation, considering the document as a whole allows certain
    ambiguities and inconsistencies to be resolved. In this paper, we propose a
    cross-sentence context-aware approach and investigate the influence of
    historical contextual information on the performance of neural machine
    translation (NMT). First, this history is summarized in a hierarchical way. We
    then integrate the historical representation into NMT in two strategies: 1) a
    warm-start of encoder and decoder states, and 2) an auxiliary context source
    for updating decoder states. Experimental results on a large Chinese-English
    translation task show that our approach significantly improves upon a strong
    attention-based NMT system by up to +2.1 BLEU points.

    An entity-driven recursive neural network model for chinese discourse coherence modeling

    Fan Xu, Shujing Du, Maoxi Li, Mingwen Wang
    Comments: 9 pages
    Journal-ref: International Journal of Artificial Intelligence and Applications
    (IJAIA), Vol.8, No.2, March 2017
    Subjects: Computation and Language (cs.CL)

    Chinese discourse coherence modeling remains a challenging task in the
    Natural Language Processing field. Existing approaches mostly focus on
    feature engineering, adopting sophisticated features to capture the logical,
    syntactic or semantic relationships across sentences within a text. In this
    paper, we present an entity-driven recursive deep model for Chinese
    discourse coherence evaluation based on a current English discourse
    coherence neural network model. Specifically, to overcome the shortcoming of
    the current model in identifying entity (noun) overlap across sentences, our
    combined model successfully incorporates entity information into the
    recursive neural network framework. Evaluation results on both sentence
    ordering and machine translation coherence rating tasks show the
    effectiveness of the proposed model, which significantly outperforms the
    existing strong baseline.

    Identity and Granularity of Events in Text

    Piek Vossen, Agata Cybulska
    Comments: Invited keynote speech by Piek Vossen at Cicling 2016
    Subjects: Computation and Language (cs.CL)

    In this paper we describe a method to detect event descriptions in
    different news articles and to model the semantics of events and their
    components using RDF representations. We compare these descriptions to solve
    a cross-document event coreference task. Our component approach to event
    semantics defines identity and granularity of events at different levels. It
    performs close to state-of-the-art approaches on the cross-document event
    coreference task, while outperforming other works when assuming similar
    quality of event detection. We demonstrate how granularity and identity are
    interconnected and we discuss how semantic anomaly could be used to define
    differences between coreference, subevent and topical relations.


    Distributed, Parallel, and Cluster Computing

    Encoding Cardinality Constraints using Generalized Selection Networks

    Michał Karpiński, Marek Piotrów
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    Boolean cardinality constraints state that at most (at least, or exactly)
    $k$ out of $n$ propositional literals can be true. We propose a new class of
    selection networks that can be used for an efficient encoding of them.
    Several comparator networks have been proposed recently for encoding
    cardinality constraints, and experiments have proved their efficiency. Those
    were based mainly on the odd-even or pairwise comparator networks. We use
    similar ideas, but we extend the model of comparator networks so that the
    basic components are not only comparators (2-sorters) but more general
    $m$-sorters, for $m \geq 2$. The inputs are organized into $m$ columns, in
    which elements are recursively selected and, after that, columns are merged
    using an idea of multi-way merging. We present two algorithms parametrized
    by $m \geq 2$. We call those networks the $m$-Wise Selection Network and the
    $m$-Odd-Even Selection Network. We give a detailed construction of the
    mergers for $m=4$. The construction can be directly applied to any values of
    $k$ and $n$. The proposed encoding of sorters is standard, therefore
    arc-consistency is preserved. We prove correctness of the constructions and
    present a theoretical and experimental evaluation, which shows that the new
    encodings are competitive with the other state-of-the-art encodings.
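
    To make the encoding idea concrete, here is a toy CNF generator using only
    2-sorters and a quadratic bubble network (the paper's $m$-column selection
    networks are far more clause-efficient; this sketch only shows how sorted
    outputs turn "at most $k$" into a single unit clause):

        def comparator(a, b, clauses, state):
            """Half 2-sorter: hi = a OR b, lo = a AND b, encoded with the
            one-directional clauses sufficient for at-most-k propagation."""
            state[0] += 2
            hi, lo = state[0] - 1, state[0]
            clauses += [(-a, hi), (-b, hi), (-a, -b, lo)]
            return hi, lo

        def bubble_sort_net(xs, clauses, state):
            xs = list(xs)
            for i in range(len(xs)):
                for j in range(len(xs) - 1 - i):
                    xs[j], xs[j + 1] = comparator(xs[j], xs[j + 1], clauses, state)
            return xs  # outputs sorted: xs[0] = OR of all inputs, and so on

        def at_most_k(literals, k, clauses, state):
            ys = bubble_sort_net(literals, clauses, state)
            clauses.append((-ys[k],))  # (k+1)-th largest output must be false

        state, clauses = [5], []        # variables 1..5 are the input literals
        at_most_k([1, 2, 3, 4, 5], 2, clauses, state)
        print(len(clauses), "clauses")  # 31: 3 per comparator + 1 unit clause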

    HPTT: A High-Performance Tensor Transposition C++ Library

    Paul Springer, Tong Su, Paolo Bientinesi
    Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

    Recently we presented TTC, a domain-specific compiler for tensor
    transpositions. Despite the fact that the performance of the generated code is
    nearly optimal, due to its offline nature, TTC cannot be utilized in all the
    application codes in which the tensor sizes and the necessary tensor
    permutations are determined at runtime. To overcome this limitation, we
    introduce the open-source C++ library High-Performance Tensor Transposition
    (HPTT). Similar to TTC, HPTT incorporates optimizations such as blocking,
    multi-threading, and explicit vectorization; furthermore it decomposes any
    transposition into multiple loops around a so called micro-kernel. This modular
    design—inspired by BLIS—makes HPTT easy to port to different architectures,
    by only replacing the hand-vectorized micro-kernel (e.g., a 4×4 transpose).
    HPTT also offers an optional autotuning framework—guided by a performance
    model—that explores a vast search space of implementations at runtime (similar
    to FFTW). Across a wide range of different tensor transpositions and
    architectures (e.g., Intel Ivy Bridge, Intel Knights Landing, ARMv7, IBM
    Power7), HPTT attains a bandwidth comparable to that of SAXPY, and yields
    remarkable speedups over Eigen’s tensor transposition implementation. Most
    importantly, the integration of HPTT into the Cyclops Tensor Framework (CTF)
    improves the overall performance of tensor contractions by up to 3.1x.


    Learning

    Lean From Thy Neighbor: Stochastic & Adversarial Bandits in a Network

    L. Elisa Celis, Farnood Salehi
    Comments: This article was first circulated in January 2015 and presented at ISMP 2015 under the title “Bandit in a Network” (this https URL&mmnno=264&ppnno=85856)
    Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

    An individual’s decisions are often guided by those of his or her peers,
    i.e., neighbors in a social network. Presumably, being privy to the experiences
    of others aids in learning and decision making, but how much advantage does an
    individual gain by observing her neighbors? Such problems make appearances in
    sociology and economics and, in this paper, we present a novel model to capture
    such decision-making processes and appeal to the classical multi-armed bandit
    framework to analyze it. Each individual, in addition to her own actions, can
    observe the actions and rewards obtained by her neighbors, and can use all of
    this information in order to minimize her own regret. We provide algorithms for
    this setting, both for stochastic and adversarial bandits, and show that their
    regret smoothly interpolates between the regret in the classical bandit setting
    and that of the full-information setting as a function of the neighbors’
    exploration. In the stochastic setting the additional information must simply
    be incorporated into the usual estimation of the rewards, while in the
    adversarial setting this is attained by constructing a new unbiased estimator
    for the rewards and appropriately bounding the amount of additional information
    provided by the neighbors. These algorithms are optimal up to log factors;
    despite the fact that the agents act independently and selfishly, this implies
    that it is an approximate Nash equilibrium for all agents to use our algorithms.
    Further, we show via empirical simulations that our algorithms, often
    significantly, outperform existing algorithms that one could apply to this
    setting.
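
    In the stochastic setting, "incorporating the additional information into
    the usual estimation" can be sketched directly on top of UCB1. This is a
    simplification for illustration, not the paper's algorithm (and the
    adversarial case needs the unbiased estimator described above):

        import math, random

        def network_ucb(adj, arm_p, T=5000, seed=1):
            """Each agent runs UCB1, but its counts and reward sums also include
            every pull it observes from its neighbors. adj: adjacency lists."""
            rng = random.Random(seed)
            n, m = len(adj), len(arm_p)
            cnt = [[0] * m for _ in range(n)]
            tot = [[0.0] * m for _ in range(n)]
            for t in range(1, T + 1):
                plays = []
                for i in range(n):
                    ucb = [tot[i][a] / cnt[i][a]
                           + math.sqrt(2 * math.log(t) / cnt[i][a])
                           if cnt[i][a] else float("inf") for a in range(m)]
                    plays.append(max(range(m), key=ucb.__getitem__))
                for i, a in enumerate(plays):
                    r = 1.0 if rng.random() < arm_p[a] else 0.0
                    for j in [i] + adj[i]:  # the agent and its neighbors observe
                        cnt[j][a] += 1
                        tot[j][a] += r
            return cnt

        # Path graph on three agents: the middle agent sees both others.
        counts = network_ucb(adj=[[1], [0, 2], [1]], arm_p=[0.7, 0.5, 0.3])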

    On Generalized Bellman Equations and Temporal-Difference Learning

    Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
    Comments: 35 pages; an abridged version to appear at The 30th Canadian Conference on Artificial Intelligence, May, 2017
    Subjects: Learning (cs.LG); Optimization and Control (math.OC)

    We consider off-policy temporal-difference (TD) learning in discounted Markov
    decision processes, where the goal is to evaluate a policy in a model-free way
    by using observations of a state process generated without executing the
    policy. To curb the high variance issue in off-policy TD learning, we propose a
    new scheme of setting the $\lambda$-parameters of TD, based on generalized
    Bellman equations. Our scheme is to set $\lambda$ according to the
    eligibility trace iterates calculated in TD, thereby easily keeping these
    traces in a desired bounded range. Compared to prior works, this scheme is
    more direct and flexible, and allows much larger $\lambda$ values for
    off-policy TD learning with bounded traces. Using Markov chain theory, we
    prove the ergodicity of the joint state-trace process under nonrestrictive
    conditions, and we show that associated with our scheme is a generalized
    Bellman equation (for the policy to be evaluated) that depends on both the
    evolution of $\lambda$ and the unique
    invariant probability measure of the state-trace process. These results not
    only lead immediately to a characterization of the convergence behavior of
    least-squares based implementation of our scheme, but also prepare the ground
    for further analysis of gradient-based implementations.
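
    Schematically, in our notation rather than the paper's exact formulation:
    with importance-sampling ratios $\rho_t$, discounts $\gamma_t$, features
    $\phi$ and TD errors $\delta_t$, off-policy TD($\lambda$) with
    state-dependent $\lambda_t$ iterates

        $$ e_t = \lambda_t \gamma_t \rho_{t-1} e_{t-1} + \phi(S_t), \qquad
           \theta_{t+1} = \theta_t + \alpha_t \rho_t \delta_t e_t, $$

    and the scheme above amounts to choosing $\lambda_t$ as a function of the
    incoming trace, for instance $\lambda_t = \min\{1, C / \|\gamma_t \rho_{t-1}
    e_{t-1}\|\}$ for a constant $C$ (a hypothetical instance), so that the
    traces $e_t$ stay in a bounded range by construction.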

    Close Yet Distinctive Domain Adaptation

    Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Chao Wang, Yuxing Tang, Liming Chen
    Comments: 11 pages, 3 figures, ICCV 2017
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Domain adaptation is a form of transfer learning that aims to generalize a
    learning model across training and testing data with different
    distributions. Most previous research tackles this problem by seeking a
    shared feature representation between source and target domains while
    reducing the mismatch of their data distributions. In this paper, we propose
    a close yet discriminative domain adaptation method, namely CDDA, which
    generates a latent feature representation with two interesting properties.
    First, the discrepancy between the source and target domains, measured in
    terms of both marginal and conditional probability distributions via Maximum
    Mean Discrepancy (MMD), is minimized so as to draw the two domains close to
    each other. More importantly, we also design a repulsive force term, which
    maximizes the distance between each label-dependent sub-domain and all
    others so as to push different class-dependent sub-domains far away from
    each other and thereby increase the discriminative
    power of the adapted domain. Moreover, given the fact that the underlying data
    manifold could have complex geometric structure, we further propose the
    constraints of label smoothness and geometric structure consistency for label
    propagation. Extensive experiments are conducted on 36 cross-domain image
    classification tasks over four public datasets. The comprehensive results show
    that the proposed method consistently outperforms the state-of-the-art methods
    with significant margins.

    Liquid Splash Modeling with Neural Networks

    Kiwon Um, Xiangyu Hu, Nils Thuerey
    Subjects: Graphics (cs.GR); Learning (cs.LG)

    This paper proposes a new data-driven approach for modeling detailed splashes
    for liquid simulations with neural networks. Our model learns to generate
    small-scale splash detail for fluid-implicit-particle methods using training
    data acquired from physically accurate, high-resolution simulations. We use
    neural networks to model the regression of splash formation using a classifier
    together with a velocity modification term. More specifically, we employ a
    heteroscedastic model for the velocity updates. Our simulation results
    demonstrate that our model significantly improves visual fidelity with a large
    amount of realistic droplet formation and yields splash detail much more
    efficiently than finer discretizations. We show this for two different spatial
    scales and simulation setups.

    Optimizing Differentiable Relaxations of Coreference Evaluation Metrics

    Phong Le, Ivan Titov
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Coreference evaluation metrics are hard to optimize directly as they are
    non-differentiable functions, not easily decomposable into elementary
    decisions. Consequently, most approaches optimize objectives only indirectly
    related to the end goal, resulting in suboptimal performance. Instead, we
    propose a differentiable relaxation that lends itself to gradient-based
    optimisation, thus bypassing the need for reinforcement learning or heuristic
    modification of cross-entropy. We show that by modifying the training objective
    of a competitive neural coreference system, we obtain a substantial gain in
    performance. This suggests that our approach can be regarded as a viable
    alternative to using reinforcement learning or more computationally expensive
    imitation learning.

    Learning a collaborative multiscale dictionary based on robust empirical mode decomposition

    Rui Chen, Huizhu Jia, Xiaodong Xie, Wen Gao
    Comments: to be published in Neurocomputing
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Dictionary learning is a challenging topic in many image processing areas. The
    basic goal is to learn a sparse representation from an overcomplete basis set.
    By combining the advantages of generic multiscale representations with
    learning-based adaptivity, multiscale dictionary representation approaches are
    powerful in capturing structural characteristics of natural images. However,
    existing multiscale learning approaches still suffer from three main
    weaknesses: inadaptability to diverse scales of image data, sensitivity to
    noise and outliers, and difficulty in determining the optimal dictionary
    structure. In this paper, we present a novel multiscale dictionary learning
    paradigm for sparse image representations based on an improved empirical mode
    decomposition. This powerful data-driven analysis tool for multi-dimensional
    signals can fully adaptively decompose the image into multiscale oscillating
    components according to the intrinsic modes of the data itself. This treatment
    yields a robust and effective sparse representation, and meanwhile generates a
    raw base dictionary at multiple geometric scales and spatial frequency bands.
    This dictionary is refined by selecting optimal oscillating atoms based on
    frequency clustering. To further enhance sparsity and generalization, a
    tolerance dictionary is learned using a coherence-regularized model. A fast
    proximal scheme is developed to optimize this model. The multiscale dictionary
    is taken as the product of the oscillating dictionary and the tolerance
    dictionary. Experimental results demonstrate that the proposed learning
    approach achieves superior performance in sparse image representation compared
    with several competing methods. We also show promising results in an image
    denoising application.
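
    The coherence-regularized model itself is not spelled out in the abstract,
    but its proximal building block is standard. A minimal sketch of one such
    scheme (plain ISTA for the sparse-coding subproblem, without the paper's
    additional coherence terms):

        import numpy as np

        def ista(D, y, lam=0.1, n_iter=100):
            # minimize 0.5*||y - D x||^2 + lam*||x||_1 by proximal gradient descent
            L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
            x = np.zeros(D.shape[1])
            for _ in range(n_iter):
                grad = D.T @ (D @ x - y)
                z = x - grad / L
                x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
            return x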

    Cross-media Similarity Metric Learning with Unified Deep Networks

    Jinwei Qi, Xin Huang, Yuxin Peng
    Comments: 19 pages, submitted to Multimedia Tools and Applications
    Subjects: Multimedia (cs.MM); Learning (cs.LG); Machine Learning (stat.ML)

    As a prominent research topic in the multimedia area, cross-media retrieval
    aims to capture the complex correlations among multiple media types. Learning
    better shared representations and distance metrics for multimedia data is
    important for boosting cross-media retrieval. Motivated by the strong ability
    of deep neural networks to learn feature representations and comparison
    functions, we propose the Unified Network for Cross-media Similarity Metric
    (UNCSM) to associate cross-media shared representation learning with distance
    metric learning in a unified framework. First, we design a two-pathway deep
    network pretrained with a contrastive loss, and employ a double triplet
    similarity loss for fine-tuning to learn the shared representation for each
    media type by modeling the relative semantic similarity. Second, a metric
    network is designed to effectively calculate the cross-media similarity of the
    shared representations by modeling the pairwise similar and dissimilar
    constraints. Compared to existing methods, which mostly ignore the dissimilar
    constraints and separately use a simple distance metric such as Euclidean
    distance, our UNCSM approach unifies representation learning and the distance
    metric, preserving the relative similarity as well as embracing more complex
    similarity functions to further improve cross-media retrieval accuracy. The
    experimental results show that our UNCSM approach outperforms 8
    state-of-the-art methods on 4 widely-used cross-media datasets.
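
    For reference, minimal PyTorch versions of the two loss ingredients named
    above (generic formulations; the paper's double triplet loss and network
    pathways are assumed, not reproduced):

        import torch
        import torch.nn.functional as F

        def contrastive_loss(a, b, same, margin=1.0):
            # same is 1.0 for matching pairs, 0.0 for non-matching pairs
            d = F.pairwise_distance(a, b)
            return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

        def triplet_loss(anchor, positive, negative, margin=0.5):
            d_pos = F.pairwise_distance(anchor, positive)
            d_neg = F.pairwise_distance(anchor, negative)
            return F.relu(d_pos - d_neg + margin).mean()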

    Deep API Programmer: Learning to Program with APIs

    Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli
    Comments: 8 pages + 4 pages of supplementary material. Submitted to IJCAI 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present DAPIP, a Programming-By-Example system that learns to program with
    APIs to perform data transformation tasks. We design a domain-specific language
    (DSL) that allows for arbitrary concatenations of API outputs and constant
    strings. The DSL consists of three families of APIs: regular expression-based
    APIs, lookup APIs, and transformation APIs. We then present a novel neural
    synthesis algorithm to search for programs in the DSL that are consistent with
    a given set of examples. The search algorithm uses recently introduced neural
    architectures to encode input-output examples and to model the program search
    in the DSL. We show that the synthesis algorithm outperforms baseline methods for
    synthesizing programs on both synthetic and real-world benchmarks.
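
    A toy, brute-force Programming-By-Example loop over a made-up string DSL
    illustrates the consistency check that any such synthesizer needs; DAPIP's
    contribution is replacing the blind enumeration below with neural models
    that encode the examples and guide the search:

        from itertools import product

        APIS = {   # hypothetical primitive "APIs"
            "identity": lambda s: s,
            "upper":    lambda s: s.upper(),
            "first":    lambda s: s.split()[0],
            "last":     lambda s: s.split()[-1],
        }

        def synthesize(examples, max_len=2):
            # programs are concatenations of API outputs applied to the input
            for length in range(1, max_len + 1):
                for prog in product(APIS, repeat=length):
                    if all("".join(APIS[p](i) for p in prog) == o
                           for i, o in examples):
                        return prog
            return None

        # consistent with both examples -> ('last',)
        print(synthesize([("grace hopper", "hopper"), ("alan turing", "turing")]))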

    Stochastic Gradient Descent as Approximate Bayesian Inference

    Stephan Mandt, Matthew D. Hoffman, David M. Blei
    Comments: 26 pages + appendix
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Stochastic Gradient Descent with a constant learning rate (constant SGD)
    simulates a Markov chain with a stationary distribution. With this perspective,
    we derive several new results. (1) We show that constant SGD can be used as an
    approximate Bayesian posterior inference algorithm. Specifically, we show how
    to adjust the tuning parameters of constant SGD to best match the stationary
    distribution to a posterior, minimizing the Kullback-Leibler divergence between
    these two distributions. (2) We demonstrate that constant SGD gives rise to a
    new variational EM algorithm that optimizes hyperparameters in complex
    probabilistic models. (3) We also propose SGD with momentum for sampling and
    show how to adjust the damping coefficient accordingly. (4) We analyze MCMC
    algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we
    quantify the approximation errors due to finite learning rates. Finally (5), we
    use the stochastic process perspective to give a short proof of why Polyak
    averaging is optimal. Based on this idea, we propose a scalable approximate
    MCMC algorithm, the Averaged Stochastic Gradient Sampler.
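
    A quick numerical illustration of claim (1) on a 1-D quadratic toy problem
    (all constants are illustrative): constant-step SGD with noisy gradients
    settles into a stationary distribution whose variance is governed by the
    learning rate, matching the Ornstein-Uhlenbeck prediction:

        import numpy as np

        rng = np.random.default_rng(0)
        A, noise_std = 2.0, 1.5          # curvature and gradient-noise level
        lr = 0.02                        # constant learning rate
        x, samples = 0.0, []
        for t in range(200000):
            grad = A * x + noise_std * rng.standard_normal()  # stochastic gradient
            x -= lr * grad
            if t > 1000:                 # discard burn-in
                samples.append(x)
        # theory (OU limit): stationary variance ~ lr * noise_std^2 / (2 A)
        print(np.var(samples), lr * noise_std**2 / (2 * A))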

    Reward-based stochastic self-configuration of neural circuits

    David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass
    Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Experimental data suggest that neural circuits configure their synaptic
    connectivity for a given computational task. They also point to dopamine-gated
    stochastic spine dynamics as an important underlying mechanism, and they show
    that the stochastic component of synaptic plasticity is surprisingly strong. We
    propose a model that elucidates how task-dependent self-configuration of neural
    circuits can emerge through these mechanisms. The Fokker-Planck equation allows
    us to relate local stochastic processes at synapses to the stationary
    distribution of network configurations, and thereby to computational properties
    of the network. This framework suggests a new model for reward-gated network
    plasticity, where one replaces the common policy gradient paradigm by
    continuously ongoing stochastic policy search (sampling) from a posterior
    distribution of network configurations. This posterior integrates priors that
    encode for example previously attained knowledge and structural constraints.
    This model can explain the experimentally found capability of neural circuits
    to configure themselves for a given task, and to compensate automatically for
    changes in the network or task. We also show that experimental data on
    dopamine-modulated spine dynamics can be modeled within this theoretical
    framework, and that a strong stochastic component of synaptic plasticity is
    essential for its performance.
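
    A minimal caricature of the sampling view, with toy stand-ins for the prior
    and reward (not the paper's spiking-network model): each parameter performs
    Langevin-type dynamics whose stationary distribution is the posterior
    p(w) proportional to prior(w) * exp(reward(w)/T):

        import numpy as np

        rng = np.random.default_rng(1)
        T = 1.0                                      # "temperature"

        def grad_log_post(w):
            # toy posterior: Gaussian prior at 0, reward peaked at w = 1
            # log p(w) = -0.5*||w||^2 - ||w - 1||^2 / T  (up to a constant)
            return -w + (2.0 / T) * (1.0 - w)

        w, eta = np.zeros(3), 1e-3
        for _ in range(50000):                       # continuously ongoing search
            w += eta * grad_log_post(w) + np.sqrt(2 * eta) * rng.standard_normal(3)
        print(w)   # fluctuates around the posterior mode rather than converging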

    Accelerated Stochastic Quasi-Newton Optimization on Riemann Manifolds

    Anirban Roychowdhury, Srinivasan Parthasarathy
    Comments: Note: the proof of Theorem 3.5 will be changed in a future revision
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Differential Geometry (math.DG); Machine Learning (stat.ML)

    We propose L-BFGS and trust-region algorithms on Riemann manifolds that use
    stochastic variance reduction techniques to speed up convergence. For the
    stochastic L-BFGS algorithm we analyze and prove linear convergence rates for
    geodesically convex problems on the manifold, without resorting to linesearch
    methods designed to satisfy Wolfe conditions on the step size. To the best of
    our knowledge our trust-region method with stochastic variance reduction
    techniques is the first of its kind in the literature. We conduct experiments
    on Karcher mean computation for positive definite matrices and computation of
    leading eigenvectors for both synthetic and real data matrices, and demonstrate
    notably better performance than recently-proposed first-order stochastic
    optimization methods on Riemann manifolds, as well as standard trust-region
    manifold optimization techniques.
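
    For context, a plain (non-stochastic, first-order) Riemannian gradient
    iteration for the Karcher mean of SPD matrices, one of the paper's test
    problems; the paper accelerates this kind of iteration with
    variance-reduced quasi-Newton and trust-region steps, not reproduced here:

        import numpy as np
        from scipy.linalg import expm, logm, sqrtm, inv

        def karcher_mean(mats, n_iter=50, step=1.0):
            X = np.mean(mats, axis=0)                 # initial guess
            for _ in range(n_iter):
                Xh = np.real(sqrtm(X)); Xhi = inv(Xh)
                # Riemannian gradient direction in the tangent space at X
                G = sum(np.real(logm(Xhi @ A @ Xhi)) for A in mats) / len(mats)
                X = Xh @ expm(step * G) @ Xh          # exponential-map update
            return X

        rng = np.random.default_rng(0)
        mats = []
        for _ in range(5):
            B = rng.standard_normal((3, 3))
            mats.append(B @ B.T + np.eye(3))          # random SPD matrices
        print(karcher_mean(mats))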

    Text-Independent Speaker Recognition for Low SNR Environments with Encryption

    Aman Chadha, Divya Jyoti, M. Mani Roja
    Comments: Biometrics, Pattern Recognition, Security, Speaker Individuality, Text-independence, Pitch Extraction, Voice Recognition, Autocorrelation; Published by Foundation of Computer Science, New York, USA
    Journal-ref: International Journal of Computer Applications 31(10):43-50, 2011
    Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Learning (cs.LG)

    Recognition systems are commonly designed to authenticate users at the access
    control levels of a system. A number of voice recognition methods have been
    developed using a pitch estimation process, which is very vulnerable in low
    Signal-to-Noise Ratio (SNR) environments; thus, these programs fail to provide
    the desired level of accuracy and robustness. Also, most text-independent
    speaker recognition programs are incapable of coping with unauthorized attempts
    to gain access by tampering with the samples or reference database. The
    proposed text-independent voice recognition system makes use of multilevel
    cryptography to preserve data integrity while in transit or storage. Encryption
    and decryption follow a transform based approach layered with pseudorandom
    noise addition whereas for pitch detection, a modified version of the
    autocorrelation pitch extraction algorithm is used. The experimental results
    show that the proposed algorithm can decrypt the signal under test with
    exponentially reducing Mean Square Error over an increasing range of SNR.
    Further, it outperforms the conventional algorithms in actual identification
    tasks even in noisy environments. The recognition rate thus obtained using the
    proposed method is compared with other conventional methods used for speaker
    identification.
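
    The unmodified autocorrelation baseline that the paper's modified pitch
    extractor builds on fits in a few lines: pick the lag of the strongest
    autocorrelation peak within a plausible pitch range:

        import numpy as np

        def pitch_autocorr(frame, fs, f_lo=60.0, f_hi=400.0):
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = int(fs / f_hi), int(fs / f_lo)   # lag range to search
            lag = lo + np.argmax(ac[lo:hi])
            return fs / lag

        fs = 8000
        t = np.arange(0, 0.04, 1 / fs)
        frame = np.sign(np.sin(2 * np.pi * 120 * t))  # crude 120 Hz "voiced" frame
        print(pitch_autocorr(frame, fs))              # ~120 Hz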


    Information Theory

    Optimal Power Splitting for Simultaneous Information Detection and Energy Harvesting

    Ali Kariminezhad, Soheil Gherekhloo, Aydin Sezgin
    Subjects: Information Theory (cs.IT)

    This letter deals with the joint information and energy processing at a
    receiver of a point-to-point communication channel. In particular, the
    trade-off between the achievable information rate and harvested energy for a
    multiple-antenna power splitting (PS) receiver is investigated. Here, the
    rate-energy region characterization is of particular interest, which is
    intrinsically a non-convex problem. In this letter, an efficient algorithm is
    proposed for obtaining an approximate solution to the problem in polynomial
    time. This algorithm is mainly based on a Taylor approximation in conjunction
    with semidefinite relaxation (SDR), which is solved by interior-point methods.
    Moreover, we utilize the Gaussian randomization procedure to obtain a feasible
    solution for the original problem. It is shown that by proper receiver design
    the rate-energy region can be significantly enlarged compared to the state of
    the art, while at the same time the receiver hardware cost is reduced by using
    fewer energy harvesting circuits.
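
    For intuition, the single-antenna version of the power-splitting trade-off
    is easy to sweep numerically (illustrative constants; the paper's
    multiple-antenna problem is non-convex and needs the SDR machinery
    described above):

        import numpy as np

        # a fraction rho of received power feeds the decoder, the rest the harvester
        P_rx, sigma2, eff = 1e-3, 1e-6, 0.5   # rx power (W), noise, EH efficiency
        for rho in np.linspace(0.1, 0.9, 5):
            rate = np.log2(1 + rho * P_rx / sigma2)   # bit/s/Hz to the decoder
            energy = eff * (1 - rho) * P_rx           # watts to the harvester
            print(f"rho={rho:.1f}  rate={rate:5.2f} bit/s/Hz  "
                  f"harvested={energy*1e3:.3f} mW")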

    Limited Feedback in Single and Multi-user MIMO Systems with Finite-Bit ADCs

    Jianhua Mo, Robert W. Heath Jr
    Comments: 30 pages, 12 figures, submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    Communication systems with low-resolution analog-to-digital-converters (ADCs)
    can exploit channel state information at the transmitter and receiver. This
    paper presents codebook designs and performance analyses for limited feedback
    MIMO systems with finite-bit ADCs. A point-to-point single-user channel is
    considered first. When the received signal is sliced by 1-bit ADCs, the
    absolute phase at the receiver is important to align the phase of the received
    signals. A new codebook design for beamforming, which separately quantizes the
    channel direction and the residual phase, is therefore proposed. For the
    multi-bit case where the optimal transmission method is unknown, suboptimal
    Gaussian signaling and eigenvector beamforming is assumed to obtain a lower
    bound of the achievable rate. It is found that to limit the rate loss, more
    feedback bits are needed in the medium SNR regime than the low and high SNR
    regimes, which is quite different from the conventional infinite-bit ADC case.
    Second, a multi-user system where a multiple-antenna transmitter sends signals
    to multiple single-antenna receivers with finite-bit ADCs is considered. Based
    on the derived performance loss due to finite-bit ADCs and finite-bit CSI
    feedback, the number of feedback bits should increase linearly with the ADC
    resolution in order to restrict the rate loss.
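
    A generic limited-feedback baseline, random vector quantization of the
    channel direction, shows the feedback mechanism in miniature (the paper's
    1-bit design adds a separate residual-phase quantizer, omitted here):

        import numpy as np

        rng = np.random.default_rng(0)
        Nt, B = 4, 6                                   # tx antennas, feedback bits
        C = rng.standard_normal((2**B, Nt)) + 1j * rng.standard_normal((2**B, Nt))
        C /= np.linalg.norm(C, axis=1, keepdims=True)  # unit-norm codewords

        h = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # channel
        idx = np.argmax(np.abs(C.conj() @ h))          # best-aligned codeword index
        f = C[idx]                                     # beamformer at the transmitter
        print(idx, np.abs(f.conj() @ h)**2 / np.linalg.norm(h)**2)  # alignment gain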

    Gbps User Rates Using mmWave Relayed Backhaul with High Gain Antennas

    Jinfeng Du, Efe Onaran, Dmitry Chizhik, Sivarama Venkatesan, Reinaldo A. Valenzuela
    Comments: Accepted for publication for IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Deployment and Performance Challenges for 5G
    Subjects: Information Theory (cs.IT)

    Delivering Gbps high user rate over long distances (around 1 km) is
    challenging, and the abundant spectrum available in the millimeter wave band
    cannot solve the challenge on its own due to the severe path loss and other
    limitations. Since it is economically challenging to deploy wired backhaul
    every few hundred meters, relays (e.g., wireless access points) have been
    proposed to extend the coverage of a base station which has wired connection to
    the core network. These relays, deployed every few hundred meters, serve the
    users in their vicinity and are backhauled to the base station through wireless
    connections. In this work, the wireless relayed backhaul design has been
    formulated as a topology-bandwidth-power joint optimization problem, and the
    influence of path loss, angular spread, array size, and RF power limitation on
    the user rate has been evaluated. It has been shown that for a linear network
    deployed along the street at 28 GHz, when high joint directional gain (50 dBi)
    is available, 1 Gbps user rate within cell range of 1 km can be delivered using
    1.5 GHz of bandwidth (using single polarization antennas). The user rates drop
    precipitously when joint directional gain is reduced, or when the path loss is
    much more severe. When the number of RF chains is limited, the benefit of
    larger arrays will eventually be surpassed by the increased channel estimation
    penalty as the effective beamforming gain saturates owing to the channel
    angular spread.
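
    A back-of-the-envelope link budget (free-space loss only, with assumed
    transmit power and noise figure) confirms that the headline numbers are
    plausible in the idealized limit; the paper's street-level model with
    angular spread and RF-chain limits is far less forgiving:

        import numpy as np

        f, d, B = 28e9, 1000.0, 1.5e9                         # Hz, m, Hz
        fspl_db = 20 * np.log10(4 * np.pi * d * f / 3e8)      # free-space path loss
        ptx_dbm, gain_db, nf_db = 30.0, 50.0, 7.0             # assumed values
        prx_dbm = ptx_dbm + gain_db - fspl_db
        noise_dbm = -174 + 10 * np.log10(B) + nf_db           # thermal noise floor
        snr = 10 ** ((prx_dbm - noise_dbm) / 10)
        print(f"FSPL={fspl_db:.1f} dB  SNR={10*np.log10(snr):.1f} dB  "
              f"rate={B*np.log2(1+snr)/1e9:.2f} Gbps")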

    How Much Spectrum is Too Much in Millimeter Wave Wireless Access

    Jinfeng Du, Reinaldo A. Valenzuela
    Comments: Accepted for publication for IEEE Journal on Selected Areas in Communications (JSAC) Special Issue on Millimeter Wave Communications for Future Mobile Networks
    Subjects: Information Theory (cs.IT)

    A great increase in wireless access rates might be attainable by using the
    large amount of spectrum available in the millimeter wave (mmWave, 30 – 300
    GHz) band. However, due to higher propagation losses inherent in these
    frequencies, using wider bandwidth for transmission at ranges beyond 100
    meters or in non-line-of-sight (NLOS) settings may be ineffective or even
    counterproductive when the penalty for estimating the channel is taken into
    account. In this work we quantify the maximum beneficial bandwidth for mmWave
    transmission in some typical deployment scenarios which use pilot-based channel
    estimation and assume a minimum mean square error (MMSE) channel estimator at
    the receiver. We find that for an I.I.D. block fading model with coherence time
    (T_c) and coherence bandwidth (B_c), for transmitters and receivers equipped
    with a single antenna, the optimal (rate-maximizing) signal-to-noise-ratio is a
    constant that only depends on the product (B_cT_c), which measures the channel
    coherence and equals the average number of orthogonal symbols per independent
    channel coefficient. That is, for fixed channel coherence, the
    optimal bandwidth scales linearly with the received signal power. Under some
    typical deployment scenarios with both transmit and receive side beamforming, 1
    GHz bandwidth can be too much.
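
    The bandwidth sweet spot can be reproduced in a toy model: with fixed
    received power, per-symbol SNR falls as bandwidth grows, and a standard
    estimation-error effective-SNR penalty (rho^2/(1+2*rho), used here purely
    for illustration, with assumed constants) makes the rate peak at a finite
    bandwidth:

        import numpy as np

        P_over_N0 = 1e9          # received SNR per unit bandwidth (assumed)
        S = 1000.0               # coherence block length B_c*T_c in symbols (assumed)
        for B in [0.1e9, 0.5e9, 1e9, 2e9, 5e9]:
            rho = P_over_N0 / B                  # per-symbol SNR shrinks with B
            rho_eff = rho**2 / (1 + 2 * rho)     # penalty from imperfect estimation
            rate = B * (1 - 1 / S) * np.log2(1 + rho_eff)  # one pilot per block
            print(f"B={B/1e9:4.1f} GHz  rate={rate/1e9:6.3f} Gbps")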

    A Low-Complexity Approach to Distributed Cooperative Caching with Geographic Constraints

    Konstantin Avrachenkov, Jasper Goseling, Berksan Serbetci
    Comments: 15 pages, 9 figures, accepted for presentation at SIGMETRICS’17 and for publication in the first issue of the PACM Series on Measurement and Analysis of Computing Systems (POMACS)
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    We consider caching in cellular networks in which each base station is
    equipped with a cache that can store a limited number of files. The popularity
    of the files is known and the goal is to place files in the caches such that
    the probability that a user at an arbitrary location in the plane will find the
    file that she requires in one of the covering caches is maximized.

    We develop distributed asynchronous algorithms for deciding which contents to
    store in which cache. Such cooperative algorithms require communication only
    between caches with overlapping coverage areas and can operate in an
    asynchronous manner. The development of the algorithms is principally based on an
    observation that the problem can be viewed as a potential game. Our basic
    algorithm is derived from the best response dynamics. We demonstrate that the
    complexity of each best response step is independent of the number of files,
    linear in the cache capacity and linear in the maximum number of base stations
    that cover a certain area. Then, we show that the overall algorithm complexity
    for a discrete cache placement is polynomial in both network size and catalog
    size. In practical examples, the algorithm converges in just a few iterations.
    Also, in most cases of interest, the basic algorithm finds the best Nash
    equilibrium corresponding to the global optimum. We provide two extensions of
    our basic algorithm based on stochastic and deterministic simulated annealing
    which find the global optimum.

    Finally, we demonstrate the hit probability evolution on real and synthetic
    networks numerically and show that our distributed caching algorithm performs
    significantly better than storing the most popular content, a probabilistic
    content placement policy, and Multi-LRU caching policies.
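
    A tiny synthetic instance shows the best-response dynamics at work; because
    the hit probability is additive over files once the other caches are fixed,
    each best response reduces to a top-K selection (setup and constants are
    made up):

        import numpy as np

        rng = np.random.default_rng(0)
        F, K, M, L = 20, 3, 4, 30                        # files, cache size, BSs, locations
        pop = 1.0 / np.arange(1, F + 1); pop /= pop.sum()  # Zipf-like popularities
        covers = rng.random((M, L)) < 0.4                # BS-location coverage map
        cache = [set(rng.choice(F, K, replace=False).tolist()) for _ in range(M)]

        def hit_prob():
            tot = 0.0
            for l in range(L):
                union = set().union(*(cache[b] for b in range(M) if covers[b, l]))
                tot += sum(pop[f] for f in union)
            return tot / L

        for sweep in range(20):                          # asynchronous best responses
            changed = False
            for b in range(M):
                score = np.zeros(F)
                for l in np.nonzero(covers[b])[0]:
                    others = set().union(*(cache[j] for j in range(M)
                                           if j != b and covers[j, l]))
                    for f in range(F):
                        if f not in others:
                            score[f] += pop[f]           # marginal gain of caching f
                best = set(np.argsort(score)[-K:].tolist())
                changed = changed or (best != cache[b])
                cache[b] = best
            if not changed:
                break
        print(f"sweeps: {sweep + 1}, hit probability: {hit_prob():.3f}")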

    General three and four person two color Hat Game

    Theo van Uem
    Comments: 7 pages. arXiv admin note: text overlap with arXiv:1612.00276
    Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)

    N distinguishable players are randomly fitted with a white or black hat,
    where the probabilities of getting a white or black hat may be different for
    each player, but known to all the players. All players guess simultaneously the
    color of their own hat observing only the hat colors of the other N-1 players.
    It is also allowed for each player to pass: no color is guessed. The team wins
    if at least one player guesses his hat color correctly and none of the players
    has an incorrect guess. No communication of any sort is allowed, except for an
    initial strategy session before the game begins. Our goal is to maximize the
    probability of winning the game and to describe winning strategies, using the
    concept of an adequate set. We find explicit solutions for the cases N = 3 and
    N = 4.
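
    For the symmetric N = 3 case with fair coins, the optimal winning
    probability 3/4 can be verified by brute force over all deterministic
    strategies (the paper's adequate-set method derives such optima
    analytically, including for biased and player-dependent probabilities):

        from itertools import product

        configs = list(product([0, 1], repeat=3))         # all 8 hat assignments
        strategies = list(product([0, 1, 2], repeat=4))   # 2 means "pass"

        def outcome(strat, player, cfg):
            a, b = (h for i, h in enumerate(cfg) if i != player)
            act = strat[2 * a + b]                        # react to the 2 hats seen
            return 0 if act == 2 else (1 if act == cfg[player] else 2)

        # per (strategy, player): bitmask of configs guessed right / guessed wrong
        right = [[0] * 3 for _ in strategies]
        wrong = [[0] * 3 for _ in strategies]
        for si, s in enumerate(strategies):
            for p in range(3):
                for ci, c in enumerate(configs):
                    r = outcome(s, p, c)
                    if r == 1: right[si][p] |= 1 << ci
                    elif r == 2: wrong[si][p] |= 1 << ci

        best = 0
        for s0 in range(81):
            for s1 in range(81):
                w01 = wrong[s0][0] | wrong[s1][1]
                r01 = right[s0][0] | right[s1][1]
                for s2 in range(81):
                    # win on a config iff nobody is wrong and someone is right
                    win = ~(w01 | wrong[s2][2]) & (r01 | right[s2][2]) & 0xFF
                    best = max(best, bin(win).count("1"))
        print(best / 8)   # -> 0.75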

    From Data to Decisions: Distributionally Robust Optimization is Optimal

    Bart P.G. Van Parys, Peyman Mohajerin Esfahani, Daniel Kuhn
    Comments: 30 pages, 2 figures. Submitted to Management Science
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    We study stochastic programs where the decision-maker cannot observe the
    distribution of the exogenous uncertainties but has access to a finite set of
    independent samples from this distribution. In this setting, the goal is to
    find a procedure that transforms the data to an estimate of the expected cost
    function under the unknown data-generating distribution, i.e., a predictor, and
    an optimizer of the estimated cost function that serves as a near-optimal
    candidate decision, i.e., a prescriptor. As functions of the data, predictors
    and prescriptors constitute statistical estimators. We propose a
    meta-optimization problem to find the least conservative predictors and
    prescriptors subject to constraints on their out-of-sample disappointment. The
    out-of-sample disappointment quantifies the probability that the actual
    expected cost of the candidate decision under the unknown true distribution
    exceeds its predicted cost. Leveraging tools from large deviations theory, we
    prove that this meta-optimization problem admits a unique solution: The best
    predictor-prescriptor pair is obtained by solving a distributionally robust
    optimization problem over all distributions within a given relative entropy
    distance from the empirical distribution of the data.
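
    For a finite sample, the resulting worst-case (distributionally robust)
    predictor over a relative entropy ball has a classic one-dimensional convex
    dual, which the sketch below evaluates on synthetic cost samples:

        import numpy as np
        from scipy.optimize import minimize_scalar

        # sup over {Q : KL(Q||P_hat) <= r} of E_Q[c] equals
        # inf over a > 0 of a*log E_{P_hat}[exp(c/a)] + a*r
        costs = np.array([1.0, 2.0, 2.5, 4.0, 1.5])      # cost samples (synthetic)
        r = 0.1                                          # relative entropy radius

        def dual(a):
            return a * np.log(np.mean(np.exp(costs / a))) + a * r

        res = minimize_scalar(dual, bounds=(0.05, 100.0), method="bounded")
        print(f"empirical mean {costs.mean():.3f}  DRO predictor {res.fun:.3f}")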



