IT博客汇 | arXiv Paper Daily: Tue, 17 Jan 2017

arXiv Paper Daily: Tue, 17 Jan 2017

我爱机器学习(52ml.net)发表于 2017-01-17 00:00:00

Neural and Evolutionary Computing

Automatic Knot Adjustment Using Dolphin Echolocation Algorithm for B-Spline Curve Approximation

Hasan Ali Akyürek, Erkan Ülker, Barış Koçer
Comments: The Journal of MacroTrends in Technology and Innovation, Vol 4. Issue 1. 2016
Subjects: Graphics (cs.GR); Neural and Evolutionary Computing (cs.NE)

In this paper, a new approach to solve the cubic B-spline curve fitting
problem is presented based on a meta-heuristic algorithm called ” dolphin
echolocation “. The method minimizes the proximity error value of the selected
nodes that measured using the least squares method and the Euclidean distance
method of the new curve generated by the reverse engineering. The results of
the proposed method are compared with the genetic algorithm. As a result, this
new method seems to be successful.

End-to-End ASR-free Keyword Search from Speech

Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury
Comments: Published in the IEEE 2017 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), scheduled for 5-9 March 2017 in New Orleans, Louisiana, USA
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

End-to-end (E2E) systems have achieved competitive results compared to
conventional hybrid hidden Markov model (HMM)-deep neural network based
automatic speech recognition (ASR) systems. Such E2E systems are attractive due
to the lack of dependence on alignments between input acoustic and output
grapheme or HMM state sequence during training. This paper explores the design
of an ASR-free end-to-end system for text query-based keyword search (KWS) from
speech trained with minimal supervision. Our E2E KWS system consists of three
sub-systems. The first sub-system is a recurrent neural network (RNN)-based
acoustic auto-encoder trained to reconstruct the audio through a
finite-dimensional representation. The second sub-system is a character-level
RNN language model using embeddings learned from a convolutional neural
network. Since the acoustic and text query embeddings occupy different
representation spaces, they are input to a third feed-forward neural network
that predicts whether the query occurs in the acoustic utterance or not. This
E2E ASR-free KWS system performs respectably despite lacking a conventional ASR
system and trains much faster.

Long Timescale Credit Assignment in NeuralNetworks with External Memory

Steven Stenberg Hansen
Comments: Accepted into the NIPS 2016 workshop on Continual Learning
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Credit assignment in traditional recurrent neural networks usually involves
back-propagating through a long chain of tied weight matrices. The length of
this chain scales linearly with the number of time-steps as the same network is
run at each time-step. This creates many problems, such as vanishing gradients,
that have been well studied. In contrast, a NNEM’s architecture recurrent
activity doesn’t involve a long chain of activity (though some architectures
such as the NTM do utilize a traditional recurrent architecture as a
controller). Rather, the externally stored embedding vectors are used at each
time-step, but no messages are passed from previous time-steps. This means that
vanishing gradients aren’t a problem, as all of the necessary gradient paths
are short. However, these paths are extremely numerous (one per embedding
vector in memory) and reused for a very long time (until it leaves the memory).
Thus, the forward-pass information of each memory must be stored for the entire
duration of the memory. This is problematic as this additional storage far
surpasses that of the actual memories, to the extent that large memories on
infeasible to back-propagate through in high dimensional settings. One way to
get around the need to hold onto forward-pass information is to recalculate the
forward-pass whenever gradient information is available. However, if the
observations are too large to store in the domain of interest, direct
reinstatement of a forward pass cannot occur. Instead, we rely on a learned
autoencoder to reinstate the observation, and then use the embedding network to
recalculate the forward-pass. Since the recalculated embedding vector is
unlikely to perfectly match the one stored in memory, we try out 2
approximations to utilize error gradient w.r.t. the vector in memory.

Computer Vision and Pattern Recognition

Hierarchical Salient Object Detection for Assisted Grasping

Dominik Alexander Klein, Boris Illing, Bastian Gaspers, Dirk Schulz, Armin Bernd Cremers
Comments: Submitted to ICRA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Visual scene decomposition into semantic entities is one of the major
challenges when creating a reliable object grasping system. Recently, we
introduced a bottom-up hierarchical clustering approach which is able to
segment objects and parts in a scene. In this paper, we introduce a transform
from such a segmentation into a corresponding, hierarchical saliency function.
In comprehensive experiments we demonstrate its ability to detect salient
objects in a scene. Furthermore, this hierarchical saliency defines a most
salient corresponding region (scale) for every point in an image. Based on
this, an easy-to-use pick and place manipulation system was developed and
tested exemplarily.

Automatic Spatial Context-Sensitive Cloud/Cloud-Shadow Detection in Multi-Source Multi-Spectral Earth Observation Images: AutoCloud+

Andrea Baraldi
Comments: Invitation to tender ESA/AO/1 8373/15/I NB, VAE: Next Generation EO based Information Services
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The proposed Earth observation (EO) based value adding system (EO VAS),
hereafter identified as AutoCloud+, consists of an innovative EO image
understanding system (EO IUS) design and implementation capable of automatic
spatial context sensitive cloud/cloud shadow detection in multi source multi
spectral (MS) EO imagery, whether or not radiometrically calibrated, acquired
by multiple platforms, either spaceborne or airborne, including unmanned aerial
vehicles (UAVs). It is worth mentioning that the same EO IUS architecture is
suitable for a large variety of EO based value adding products and services,
including: (i) low level image enhancement applications, such as automatic MS
image topographic correction, co registration, mosaicking and compositing, (ii)
high level MS image land cover (LC) and LC change (LCC) classification and
(iii) content based image storage/retrieval in massive multi source EO image
databases (big data mining).

Geometric features for voxel-based surface recognition

Dmitry Yarotsky
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We introduce a library of geometric voxel features for CAD surface
recognition/retrieval tasks. Our features include local versions of the
intrinsic volumes (the usual 3D volume, surface area, integrated mean and
Gaussian curvature) and a few closely related quantities. We also compute Haar
wavelet and statistical distribution features by aggregating raw voxel
features. We apply our features to object classification on the ESB data set
and demonstrate accurate results with a small number of shallow decision trees.

Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading

Chunlin Tian, Weijun Ji, Yuan Yuan
Comments: 9 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Aduio-visual Speech Recognition (AVSR) which employs both the video and
audio information to do Automatic Speech Recognition (ASR) is one of the
application of multimodal leaning making ASR system more robust and accuracy.
The traditional models usually treated AVSR as inference or projection but
strict prior limits its ability. As the revival of deep learning, Deep Neural
Networks (DNN) becomes an important toolkit in many traditional classification
tasks including ASR, image classification, natural language processing. Some
DNN models were used in AVSR like Multimodal Deep Autoencoders (MDAEs),
Multimodal Deep Belief Network (MDBN) and Multimodal Deep Boltzmann Machine
(MDBM) that actually work better than traditional methods. However, such DNN
models have several shortcomings: (1) They don’t balance the modal fusion and
temporal fusion, or even haven’t temporal fusion; (2)The architecture of these
models isn’t end-to-end, the training and testing getting cumbersome. We
propose a DNN model, Auxiliary Multimodal LSTM (am-LSTM), to overcome such
weakness. The am-LSTM could be trained and tested in one time, alternatively
easy to train and preventing overfitting automatically. The extensibility and
flexibility are also take into consideration. The experiments shows that
am-LSTM is much better than traditional methods and other DNN models in three
datasets: AVLetters, AVLetters2, AVDigits.

Bandwidth limited object recognition in high resolution imagery

Laura Lopez-Fuentes, Andrew D.Bagdanov, Joost van de Weijer, Harald Skinnemoen
Comments: 9 pages, 9 figures, accepted in WACV
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel method to optimize bandwidth usage for object
detection in critical communication scenarios. We develop two operating models
of active information seeking. The first model identifies promising regions in
low resolution imagery and progressively requests higher resolution regions on
which to perform recognition of higher semantic quality. The second model
identifies promising regions in low resolution imagery while simultaneously
predicting the approximate location of the object of higher semantic quality.
From this general framework, we develop a car recognition system via
identification of its license plate and evaluate the performance of both models
on a car dataset that we introduce. Results are compared with traditional JPEG
compression and demonstrate that our system saves up to one order of magnitude
of bandwidth while sacrificing little in terms of recognition performance.

Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

Wenjie Luo, Yujia Li, Raquel Urtasun, Richard Zemel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

We study characteristics of receptive fields of units in deep convolutional
networks. The receptive field size is a crucial issue in many visual tasks, as
the output must respond to large enough areas in the image to capture
information about large objects. We introduce the notion of an effective
receptive field, and show that it both has a Gaussian distribution and only
occupies a fraction of the full theoretical receptive field. We analyze the
effective receptive field in several architecture designs, and the effect of
nonlinear activations, dropout, sub-sampling and skip connections on it. This
leads to suggestions for ways to address its tendency to be too small.

Light Source Estimation with Analytical Path-tracing

Mike Kasper, Nima Keivan, Gabe Sibley, Christoffer Heckman
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a novel algorithm for light source estimation in scenes
reconstructed with a RGB-D camera based on an analytically-derived formulation
of path-tracing. Our algorithm traces the reconstructed scene with a custom
path-tracer and computes the analytical derivatives of the light transport
equation from principles in optics. These derivatives are then used to perform
gradient descent, minimizing the photometric error between one or more captured
reference images and renders of our current lighting estimation using an
environment map parameterization for light sources. We show that our approach
of modeling all light sources as points at infinity approximates lights located
near the scene with surprising accuracy. Due to the analytical formulation of
derivatives, optimization to the solution is considerably accelerated. We
verify our algorithm using both real and synthetic data.

Embedding Watermarks into Deep Neural Networks

Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, Shin'ichi Satoh
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks have recently achieved significant progress. Sharing
trained models of these deep neural networks is very important in the rapid
progress of researching or developing deep neural network systems. At the same
time, it is necessary to protect the rights of shared trained models. To this
end, we propose to use a digital watermarking technology to protect
intellectual property or detect intellectual property infringement of trained
models. Firstly, we formulate a new problem: embedding watermarks into deep
neural networks. We also define requirements, embedding situations, and attack
types for watermarking to deep neural networks. Secondly, we propose a general
framework to embed a watermark into model parameters using a parameter
regularizer. Our approach does not hurt the performance of networks into which
a watermark is embedded. Finally, we perform comprehensive experiments to
reveal the potential of watermarking to deep neural networks as a basis of this
new problem. We show that our framework can embed a watermark in the situations
of training a network from scratch, fine-tuning, and distilling without hurting
the performance of a deep neural network. The embedded watermark does not
disappear even after fine-tuning or parameter pruning; the watermark completely
remains even after removing 65% of parameters were pruned.

The implementation of this research is: this https URL

Iterative Block Tensor Singular Value Thresholding for Extraction of Low Rank Component of Image Data

Longxi Chen, Yipeng Liu, Ce Zhu
Comments: accepted by ICASSP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Tensor principal component analysis (TPCA) is a multi-linear extension of
principal component analysis which converts a set of correlated measurements
into several principal components. In this paper, we propose a new robust TPCA
method to extract the princi- pal components of the multi-way data based on
tensor singular value decomposition. The tensor is split into a number of
blocks of the same size. The low rank component of each block tensor is
extracted using iterative tensor singular value thresholding method. The prin-
cipal components of the multi-way data are the concatenation of all the low
rank components of all the block tensors. We give the block tensor incoherence
conditions to guarantee the successful decom- position. This factorization has
similar optimality properties to that of low rank matrix derived from singular
value decomposition. Ex- perimentally, we demonstrate its effectiveness in two
applications, including motion separation for surveillance videos and
illumination normalization for face images.

Boosting Dictionary Learning with Error Codes

Yigit Oktar, Mehmet Turkan
Comments: 5 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In conventional sparse representations based dictionary learning algorithms,
initial dictionaries are generally assumed to be proper representatives of the
system at hand. However, this may not be the case, especially in some systems
restricted to random initializations. Therefore, a supposedly optimal
state-update based on such an improper model might lead to undesired effects
that will be conveyed to successive iterations. In this paper, we propose a
dictionary learning method which includes a general feedback process that codes
the intermediate error left over from a less intensive initial learning
attempt, and then adjusts sparse codes accordingly. Experimental observations
show that such an additional step vastly improves rates of convergence in
high-dimensional cases, also results in better converged states in the case of
random initializations. Improvements also scale up with more lenient sparsity
constraints.

Density-Wise Two Stage Mammogram Classification using Texture Exploiting Descriptors

Deepti Tamrakar, Kapil Ahuja
Comments: 23 Pages, 8 Figures, and 6 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Breast cancer is becoming increasing pervasive, and its early detection is an
important step in saving life of any patient. Mammography is an important tool
in breast cancer diagnosis. The most important step here is classification of
mammogram patch as normal-abnormal and benign-malignant.

Texture of a breast in a mammogram patch plays a big role in these
classifications. We propose a new feature extraction descriptor called
Histogram of Oriented Texture (HOT), which is a combination of Histogram of
Gradients (HOG) and Gabor filter, and exploits this fact. We also revisit Pass
Band Discrete Cosine Transform (PB-DCT) descriptor that captures texture
information well. All features of a mammogram patch may not be useful. Hence,
we apply a feature selection technique called Discrimination Potentiality (DP).
Our resulting descriptors, DP-HOT and DP-PB-DCT, are compared with standard
descriptors.

Density of mammogram is important for classification, and has not been
studied exhaustively. IRMA database, containing MIAS and DDSM datasets, is the
standard database that provides mammogram patches, and most researchers have
tested their frameworks only on a subset of patches from this database. We
apply our two new descriptors on it all images of IRMA database for density
wise classification, and compare with standard descriptors. We achieve higher
accuracy than all of the existing standard descriptors (100% for MIAS dataset,
and more than 92% for DDSM dataset).

Learning Linear Dynamical Systems with High-Order Tensor Data for Skeleton based Action Recognition

Wenwen Ding, Kai Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, there has been renewed interest in developing methods for
skeleton-based human action recognition. A skeleton sequence can be naturally
represented as a high-order tensor time series. In this paper, we model and
analyze tensor time series with Linear Dynamical System (LDS) which is the most
common for encoding spatio-temporal time-series data in various disciplines dut
to its relative simplicity and efficiency. However, the traditional LDS treats
the latent and observation state at each frame of video as a column vector.
Such a vector representation fails to take into account the curse of
dimensionality as well as valuable structural information with human action.
Considering this fact, we propose generalized Linear Dynamical System (gLDS)
for modeling tensor observation in the time series and employ Tucker
decomposition to estimate the LDS parameters as action descriptors. Therefore,
an action can be represented as a subspace corresponding to a point on a
Grassmann manifold. Then we perform classification using dictionary learning
and sparse coding over Grassmann manifold. Experiments on MSR Action3D Dataset,
UCF Kinect Dataset and Northwestern-UCLA Multiview Action3D Dataset demonstrate
that our proposed method achieves superior performance to the state-of-the-art
algorithms.

A Watermarking Technique Using Discrete Curvelet Transform for Security of Multiple Biometric Features

Rohit M. Thanki, Ved Vyas Dwivedi, Komal R. Borisagar
Journal-ref: International Journal of Information Processing,volume 10, issue
1, pp. 103 – 114 (2016)
Subjects: Multimedia (cs.MM); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

The robustness and security of the biometric watermarking approach can be
improved by using a multiple watermarking. This multiple watermarking proposed
for improving security of biometric features and data. When the imposter tries
to create the spoofed biometric feature, the invisible biometric watermark
features can provide appropriate protection to multimedia data. In this paper,
a biometric watermarking technique with multiple biometric watermarks are
proposed in which biometric features of fingerprint, face, iris and signature
is embedded in the image. Before embedding, fingerprint, iris, face and
signature features are extracted using Shen-Castan edge detection and Principal
Component Analysis. These all biometric watermark features are embedded into
various mid band frequency curvelet coefficients of host image. All four
fingerprint features, iris features, facial features and signature features are
the biometric characteristics of the individual and they are used for cross
verification and copyright protection if any manipulation occurs. The proposed
technique is fragile enough; features cannot be extracted from the watermarked
image when an imposter tries to remove watermark features illegally. It can use
for multiple copyright authentication and verification.

3D tracking of water hazards with polarized stereo cameras

Chuong V. Nguyen, Michael Milford, Robert Mahony
Comments: 8 pages, Submitted for review for ICRA 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Current self-driving car systems operate well in sunny weather but struggle
in adverse conditions. One of the most commonly encountered adverse conditions
involves water on the road caused by rain, sleet, melting snow or flooding.
While some advances have been made in using conventional RGB camera and LIDAR
technology for detecting water hazards, other sources of information such as
polarization offer a promising and potentially superior approach to this
problem in terms of performance and cost. In this paper, we present a novel
stereo-polarization system for detecting and tracking water hazards based on
polarization and color variation of reflected light. To evaluate this system,
we present a new large water-on-road datasets spanning approximately 2 km of
driving in various on road and off road conditions and demonstrate for the
first time reliable water detection and tracking over a wide range of realistic
car driving water conditions using polarized vision as the primary sensing
modality. Our system successfully detects water hazards up to 80m and doubles
the success detection range as compared to a similar polarized camera system.
Finally, we discuss several interesting challenges and propose future research
directions for further improving robust autonomous car perception in hazardous
wet conditions using polarization sensors.

On Hölder projective divergences

Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet
Comments: 25 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

We describe a framework to build distances by measuring the tightness of
inequalities, and introduce the notion of proper statistical divergences and
improper pseudo-divergences. We then consider the H”older ordinary and reverse
inequalities, and present two novel classes of H”older divergences and
pseudo-divergences that both encapsulate the special case of the Cauchy-Schwarz
divergence. We report closed-form formulas for those statistical
dissimilarities when considering distributions belonging to the same
exponential family provided that the natural parameter space is a cone (e.g.,
multivariate Gaussians), or affine (e.g., categorical distributions). Those new
classes of H”older distances are invariant to rescaling, and thus do not
require distributions to be normalized. Finally, we show how to compute
statistical H”older centroids with respect to those divergences, and carry out
center-based clustering toy experiments on a set of Gaussian distributions that
demonstrate empirically that symmetrized H”older divergences outperform the
symmetric Cauchy-Schwarz divergence.

Artificial Intelligence

Hedera: Scalable Indexing and Exploring Entities in Wikipedia Revision History

Tuan Tran, Tu Ngoc Nguyen
Comments: Pubished via CEUR-WS.org/Vol-1272
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Much of work in semantic web relying on Wikipedia as the main source of
knowledge often work on static snapshots of the dataset. The full history of
Wikipedia revisions, while contains much more useful information, is still
difficult to access due to its exceptional volume. To enable further research
on this collection, we developed a tool, named Hedera, that efficiently
extracts semantic information from Wikipedia revision history datasets. Hedera
exploits Map-Reduce paradigm to achieve rapid extraction, it is able to handle
one entire Wikipedia articles revision history within a day in a medium-scale
cluster, and supports flexible data structures for various kinds of semantic
web study.

Minimally Naturalistic Artificial Intelligence

Steven Stenberg Hansen
Comments: Accepted into the NIPS 2016 Workshop on Machine Intelligence (M.A.I.N.)
Subjects: Artificial Intelligence (cs.AI)

The rapid advancement of machine learning techniques has re-energized
research into general artificial intelligence. While the idea of
domain-agnostic meta-learning is appealing, this emerging field must come to
terms with its relationship to human cognition and the statistics and structure
of the tasks humans perform. The position of this article is that only by
aligning our agents’ abilities and environments with those of humans do we
stand a chance at developing general artificial intelligence (GAI). A broad
reading of the famous ‘No Free Lunch’ theorem is that there is no universally
optimal inductive bias or, equivalently, bias-free learning is impossible. This
follows from the fact that there are an infinite number of ways to extrapolate
data, any of which might be the one used by the data generating environment; an
inductive bias prefers some of these extrapolations to others, which lowers
performance in environments using these adversarial extrapolations. We may
posit that the optimal GAI is the one that maximally exploits the statistics of
its environment to create its inductive bias; accepting the fact that this
agent is guaranteed to be extremely sub-optimal for some alternative
environments. This trade-off appears benign when thinking about the environment
as being the physical universe, as performance on any fictive universe is
obviously irrelevant. But, we should expect a sharper inductive bias if we
further constrain our environment. Indeed, we implicitly do so by defining GAI
in terms of accomplishing that humans consider useful. One common version of
this is need the for ‘common-sense reasoning’, which implicitly appeals to the
statistics of physical universe as perceived by humans.

Long Timescale Credit Assignment in NeuralNetworks with External Memory

Credit assignment in traditional recurrent neural networks usually involves
back-propagating through a long chain of tied weight matrices. The length of
this chain scales linearly with the number of time-steps as the same network is
run at each time-step. This creates many problems, such as vanishing gradients,
that have been well studied. In contrast, a NNEM’s architecture recurrent
activity doesn’t involve a long chain of activity (though some architectures
such as the NTM do utilize a traditional recurrent architecture as a
controller). Rather, the externally stored embedding vectors are used at each
time-step, but no messages are passed from previous time-steps. This means that
vanishing gradients aren’t a problem, as all of the necessary gradient paths
are short. However, these paths are extremely numerous (one per embedding
vector in memory) and reused for a very long time (until it leaves the memory).
Thus, the forward-pass information of each memory must be stored for the entire
duration of the memory. This is problematic as this additional storage far
surpasses that of the actual memories, to the extent that large memories on
infeasible to back-propagate through in high dimensional settings. One way to
get around the need to hold onto forward-pass information is to recalculate the
forward-pass whenever gradient information is available. However, if the
observations are too large to store in the domain of interest, direct
reinstatement of a forward pass cannot occur. Instead, we rely on a learned
autoencoder to reinstate the observation, and then use the embedding network to
recalculate the forward-pass. Since the recalculated embedding vector is
unlikely to perfectly match the one stored in memory, we try out 2
approximations to utilize error gradient w.r.t. the vector in memory.

Thompson Sampling For Stochastic Bandits with Graph Feedback

Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We present a novel extension of Thompson Sampling for stochastic sequential
decision problems with graph feedback, even when the graph structure itself is
unknown and/or changing. We provide theoretical guarantees on the Bayesian
regret of the algorithm, linking its performance to the underlying properties
of the graph. Thompson Sampling has the advantage of being applicable without
the need to construct complicated upper confidence bounds for different
problems. We illustrate its performance through extensive experimental results
on real and simulated networks with graph feedback. More specifically, we
tested our algorithms on power law, planted partitions and Erdo’s-Renyi graphs,
as well as on graphs derived from Facebook and Flixster data. These all show
that our algorithms clearly outperform related methods that employ upper
confidence bounds, even if the latter use more information about the graph.

Achieving Privacy in the Adversarial Multi-Armed Bandit

Aristide C. Y. Tossou, Christos Dimitrakakis
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

In this paper, we improve the previously best known regret bound to achieve
(epsilon)-differential privacy in oblivious adversarial bandits from
(mathcal{O}{(T^{2/3}/epsilon)}) to (mathcal{O}{(sqrt{T} ln T /epsilon)}).
This is achieved by combining a Laplace Mechanism with EXP3. We show that
though EXP3 is already differentially private, it leaks a linear amount of
information in (T). However, we can improve this privacy by relying on its
intrinsic exponential mechanism for selecting actions. This allows us to reach
(mathcal{O}{(sqrt{ln T})})-DP, with a regret of (mathcal{O}{(T^{2/3})})
that holds against an adaptive adversary, an improvement from the best known of
(mathcal{O}{(T^{3/4})}). This is done by using an algorithm that run EXP3 in a
mini-batch loop. Finally, we run experiments that clearly demonstrate the
validity of our theoretical analysis.

Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks

Vahid Behzadan, Arslan Munir
Comments: 14 pages, 5 figures, pre-print of submission to MLDM ’17
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning classifiers are known to be inherently vulnerable to
manipulation by intentionally perturbed inputs, named adversarial examples. In
this work, we establish that reinforcement learning techniques based on Deep
Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and
verify the transferability of adversarial examples across different DQN models.
Furthermore, we present a novel class of attacks based on this vulnerability
that enable policy manipulation and induction in the learning process of DQNs.
We propose an attack mechanism that exploits the transferability of adversarial
examples to implement policy induction attacks on DQNs, and demonstrate its
efficacy and impact through experimental study of a game-learning scenario.

Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

Wenjie Luo, Yujia Li, Raquel Urtasun, Richard Zemel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

We study characteristics of receptive fields of units in deep convolutional
networks. The receptive field size is a crucial issue in many visual tasks, as
the output must respond to large enough areas in the image to capture
information about large objects. We introduce the notion of an effective
receptive field, and show that it both has a Gaussian distribution and only
occupies a fraction of the full theoretical receptive field. We analyze the
effective receptive field in several architecture designs, and the effect of
nonlinear activations, dropout, sub-sampling and skip connections on it. This
leads to suggestions for ways to address its tendency to be too small.

Near Optimal Behavior via Approximate State Abstraction

David Abel, D. Ellis Hershkowitz, Michael L. Littman
Comments: Earlier version published at ICML 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

The combinatorial explosion that plagues planning and reinforcement learning
(RL) algorithms can be moderated using state abstraction. Prohibitively large
task representations can be condensed such that essential information is
preserved, and consequently, solutions are tractably computable. However, exact
abstractions, which treat only fully-identical situations as equivalent, fail
to present opportunities for abstraction in environments where no two
situations are exactly alike. In this work, we investigate approximate state
abstractions, which treat nearly-identical situations as equivalent. We present
theoretical guarantees of the quality of behaviors derived from four types of
approximate abstractions. Additionally, we empirically demonstrate that
approximate abstractions lead to reduction in task complexity and bounded loss
of optimality of behavior in a variety of environments.

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans
Comments: Presented at the NIPS Workshop on the Future of Interactive Learning Machines, 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Providing Reinforcement Learning agents with expert advice can dramatically
improve various aspects of learning. Prior work has developed teaching
protocols that enable agents to learn efficiently in complex environments; many
of these methods tailor the teacher’s guidance to agents with a particular
representation or underlying learning scheme, offering effective but
specialized teaching procedures. In this work, we explore protocol programs, an
agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is
to incorporate the beneficial properties of a human teacher into Reinforcement
Learning without making strong assumptions about the inner workings of the
agent. We show how to represent existing approaches such as action pruning,
reward shaping, and training in simulation as special cases of our schema and
conduct preliminary experiments on simple domains.

A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue

Mihail Eric, Christopher D. Manning
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Task-oriented dialogue focuses on conversational agents that participate in
user-initiated dialogues on domain-specific topics. In contrast to chatbots,
which simply seek to sustain open-ended meaningful discourse, existing
task-oriented agents usually explicitly model user intent and belief states.
This paper examines bypassing such an explicit representation by depending on a
latent neural embedding of state and learning selective attention to dialogue
history together with copying to incorporate relevant prior context. We
complement recent work by showing the effectiveness of simple
sequence-to-sequence neural architectures with a copy mechanism. Our model
outperforms more complex memory-augmented models by 7% in per-response
generation and is on par with the current state-of-the-art on DSTC2.

Learning to Invert: Signal Recovery via Deep Convolutional Networks

Ali Mousavi, Richard G. Baraniuk
Comments: Accepted at The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

The promise of compressive sensing (CS) has been offset by two significant
challenges. First, real-world data is not exactly sparse in a fixed basis.
Second, current high-performance recovery algorithms are slow to converge,
which limits CS to either non-real-time applications or scenarios where massive
back-end computing is available. In this paper, we attack both of these
challenges head-on by developing a new signal recovery framework we call {em
DeepInverse} that learns the inverse transformation from measurement vectors to
signals using a {em deep convolutional network}. When trained on a set of
representative images, the network learns both a representation for the signals
(addressing challenge one) and an inverse map approximating a greedy or convex
recovery algorithm (addressing challenge two). Our experiments indicate that
the DeepInverse network closely approximates the solution produced by
state-of-the-art CS recovery algorithms yet is hundreds of times faster in run
time. The tradeoff for the ultrafast run time is a computationally intensive,
off-line training procedure typical to deep networks. However, the training
needs to be completed only once, which makes the approach attractive for a host
of sparse recovery problems.

Real-time eSports Match Result Prediction

Yifan Yang, Tian Qin, Yu-Heng Lei
Comments: 8 pages, 8 figures
Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Learning (cs.LG)

In this paper, we try to predict the winning team of a match in the
multiplayer eSports game Dota 2. To address the weaknesses of previous work, we
consider more aspects of prior (pre-match) features from individual players’
match history, as well as real-time (during-match) features at each minute as
the match progresses. We use logistic regression, the proposed Attribute
Sequence Model, and their combinations as the prediction models. In a dataset
of 78362 matches where 20631 matches contain replay data, our experiments show
that adding more aspects of prior features improves accuracy from 58.69% to
71.49%, and introducing real-time features achieves up to 93.73% accuracy when
predicting at the 40th minute.

Information Retrieval

Semantic classifier approach to document classification

Piotr Borkowski, Krzysztof Ciesielski, Mieczysław A. Kłopotek
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

In this paper we propose a new document classification method, bridging
discrepancies (so-called semantic gap) between the training set and the
application sets of textual data. We demonstrate its superiority over classical
text classification approaches, including traditional classifier ensembles. The
method consists in combining a document categorization technique with a single
classifier or a classifier ensemble (SEMCOM algorithm – Committee with Semantic
Categorizer).

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
Comments: Proceedings of the 39th European Conference on Information Retrieval (ECIR2017)
Subjects: Information Retrieval (cs.IR)

A high degree of topical diversity is often considered to be an important
characteristic of interesting text documents. A recent proposal for measuring
topical diversity identifies three elements for assessing diversity: words,
topics, and documents as collections of words. Topic models play a central role
in this approach. Using standard topic models for measuring diversity of
documents is suboptimal due to generality and impurity. General topics only
include common information from a background corpus and are assigned to most of
the documents in the collection. Impure topics contain words that are not
related to the topic; impurity lowers the interpretability of topic models and
impure topics are likely to get assigned to documents erroneously. We propose a
hierarchical re-estimation approach for topic models to combat generality and
impurity; the proposed approach operates at three levels: words, topics, and
documents. Our re-estimation approach for measuring documents’ topical
diversity outperforms the state of the art on PubMed dataset which is commonly
used for diversity experiments.

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

David Graus, Daan Odijk, Maarten de Rijke
Comments: 37 pages, 10 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

We study how collective memories are formed online. We do so by tracking
entities that emerge in public discourse, that is, in online text streams such
as social media and news streams, before they are incorporated into Wikipedia,
which, we argue, can be viewed as an online place for collective memory. By
tracking how entities emerge in public discourse, i.e., the temporal patterns
between their first mention in online text streams and subsequent incorporation
into collective memory, we gain insights into how the collective remembrance
process happens online. Specifically, we analyze nearly 80,000 entities as they
emerge in online text streams before they are incorporated into Wikipedia. The
online text streams we use for our analysis comprise of social media and news
streams, and span over 579 million documents in a timespan of 18 months. We
discover two main emergence patterns: entities that emerge in a “bursty”
fashion, i.e., that appear in public discourse without a precedent, blast into
activity and transition into collective memory. Other entities display a
“delayed” pattern, where they appear in public discourse, experience a period
of inactivity, and then resurface before transitioning into our cultural
collective memory.

Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events

Tuan Tran, Claudia Niederée, Nattiya Kanhabua, Ujwal Gadiraju, Avishek Anand
Comments: Published via ACM to CIKM 2015
Subjects: Information Retrieval (cs.IR)

Long-running, high-impact events such as the Boston Marathon bombing often
develop through many stages and involve a large number of entities in their
unfolding. Timeline summarization of an event by key sentences eases story
digestion, but does not distinguish between what a user remembers and what she
might want to re-check. In this work, we present a novel approach for timeline
summarization of high-impact events, which uses entities instead of sentences
for summarizing the event at each individual point in time. Such entity
summaries can serve as both (1) important memory cues in a retrospective event
consideration and (2) pointers for personalized event exploration. In order to
automatically create such summaries, it is crucial to identify the “right”
entities for inclusion. We propose to learn a ranking function for entities,
with a dynamically adapted trade-off between the in-document salience of
entities and the informativeness of entities across documents, i.e., the level
of new information associated with an entity for a time point under
consideration. Furthermore, for capturing collective attention for an entity we
use an innovative soft labeling approach based on Wikipedia. Our experiments on
a real large news datasets confirm the effectiveness of the proposed methods.

Can We Find Documents in Web Archives without Knowing their Contents?

Khoi Duy Vo, Tuan Tran, Tu Ngoc Nguyen, Xiaofei Zhu, Wolfgang Nejdl
Comments: Published via ACM to Websci 2015
Subjects: Information Retrieval (cs.IR)

Recent advances of preservation technologies have led to an increasing number
of Web archive systems and collections. These collections are valuable to
explore the past of the Web, but their value can only be uncovered with
effective access and exploration mechanisms. Ideal search and rank- ing methods
must be robust to the high redundancy and the temporal noise of contents, as
well as scalable to the huge amount of data archived. Despite several attempts
in Web archive search, facilitating access to Web archive still remains a
challenging problem.

In this work, we conduct a first analysis on different ranking strategies
that exploit evidences from metadata instead of the full content of documents.
We perform a first study to compare the usefulness of non-content evidences to
Web archive search, where the evidences are mined from the metadata of file
headers, links and URL strings only. Based on these findings, we propose a
simple yet surprisingly effective learning model that combines multiple
evidences to distinguish “good” from “bad” search results. We conduct empirical
experiments quantitatively as well as qualitatively to confirm the validity of
our proposed method, as a first step towards better ranking in Web archives
taking meta- data into account.

Semantic Annotation for Microblog Topics Using Wikipedia Temporal Information

Tuan Tran, Nam Khanh Tran, Teka Hadgu Asmelash, Robert Jäschke
Comments: Published via ACL to EMNLP 2025
Subjects: Information Retrieval (cs.IR)

Trending topics in microblogs such as Twitter are valuable resources to
understand social aspects of real-world events. To enable deep analyses of such
trends, semantic annotation is an effective approach; yet the problem of
annotating microblog trending topics is largely unexplored by the research
community. In this work, we tackle the problem of mapping trending Twitter
topics to entities from Wikipedia. We propose a novel model that complements
traditional text-based approaches by rewarding entities that exhibit a high
temporal correlation with topics during their burst time period. By exploiting
temporal information from the Wikipedia edit history and page view logs, we
have improved the annotation performance by 17-28\%, as compared to the
competitive baselines.

Location Inference from Tweets using Grid-based Classification

Oluwaseun Ajao, Deepak P, Jun Hong
Comments: Location Inference from Tweets using Grid-based Classification
Subjects: Information Retrieval (cs.IR)

The impact of social media and its growing association with the sharing of
ideas and propagation of messages remains vital in everyday communication.
Twitter is one effective platform for the dissemination of news and stories
about recent events happening around the world. It has a continually growing
database currently adopted by over 300 million users. In this paper we propose
a novel grid-based approach employing supervised Multinomial Naive Bayes while
extracting geographic entities from relevant user descriptions metadata which
gives a spatial indication of the user location. To the best of our knowledge
our approach is the first to make location inference from tweets using
geo-enriched grid-based classification. Our approach performs better than
existing baselines achieving more than 57% accuracy at city-level granularity.
In addition we present a novel framework for content-based estimation of user
locations by specifying levels of granularity required in pre-defined location
grids.

End-to-End ASR-free Keyword Search from Speech

End-to-end (E2E) systems have achieved competitive results compared to
conventional hybrid hidden Markov model (HMM)-deep neural network based
automatic speech recognition (ASR) systems. Such E2E systems are attractive due
to the lack of dependence on alignments between input acoustic and output
grapheme or HMM state sequence during training. This paper explores the design
of an ASR-free end-to-end system for text query-based keyword search (KWS) from
speech trained with minimal supervision. Our E2E KWS system consists of three
sub-systems. The first sub-system is a recurrent neural network (RNN)-based
acoustic auto-encoder trained to reconstruct the audio through a
finite-dimensional representation. The second sub-system is a character-level
RNN language model using embeddings learned from a convolutional neural
network. Since the acoustic and text query embeddings occupy different
representation spaces, they are input to a third feed-forward neural network
that predicts whether the query occurs in the acoustic utterance or not. This
E2E ASR-free KWS system performs respectably despite lacking a conventional ASR
system and trains much faster.

Hedera: Scalable Indexing and Exploring Entities in Wikipedia Revision History

Tuan Tran, Tu Ngoc Nguyen
Comments: Pubished via CEUR-WS.org/Vol-1272
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Much of work in semantic web relying on Wikipedia as the main source of
knowledge often work on static snapshots of the dataset. The full history of
Wikipedia revisions, while contains much more useful information, is still
difficult to access due to its exceptional volume. To enable further research
on this collection, we developed a tool, named Hedera, that efficiently
extracts semantic information from Wikipedia revision history datasets. Hedera
exploits Map-Reduce paradigm to achieve rapid extraction, it is able to handle
one entire Wikipedia articles revision history within a day in a medium-scale
cluster, and supports flexible data structures for various kinds of semantic
web study.

Computation and Language

End-to-End ASR-free Keyword Search from Speech

End-to-end (E2E) systems have achieved competitive results compared to
conventional hybrid hidden Markov model (HMM)-deep neural network based
automatic speech recognition (ASR) systems. Such E2E systems are attractive due
to the lack of dependence on alignments between input acoustic and output
grapheme or HMM state sequence during training. This paper explores the design
of an ASR-free end-to-end system for text query-based keyword search (KWS) from
speech trained with minimal supervision. Our E2E KWS system consists of three
sub-systems. The first sub-system is a recurrent neural network (RNN)-based
acoustic auto-encoder trained to reconstruct the audio through a
finite-dimensional representation. The second sub-system is a character-level
RNN language model using embeddings learned from a convolutional neural
network. Since the acoustic and text query embeddings occupy different
representation spaces, they are input to a third feed-forward neural network
that predicts whether the query occurs in the acoustic utterance or not. This
E2E ASR-free KWS system performs respectably despite lacking a conventional ASR
system and trains much faster.

Machine Translation Approaches and Survey for Indian Languages

Nadeem Jadoon Khan, Waqas Anwar, Nadir Durrani
Subjects: Computation and Language (cs.CL)

In this study, we present an analysis regarding the performance of the
state-of-art Phrase-based Statistical Machine Translation (SMT) on multiple
Indian languages. We report baseline systems on several language pairs. The
motivation of this study is to promote the development of SMT and linguistic
resources for these language pairs, as the current state-of-the-art is quite
bleak due to sparse data resources. The success of an SMT system is contingent
on the availability of a large parallel corpus. Such data is necessary to
reliably estimate translation probabilities. We report the performance of
baseline systems translating from Indian languages (Bengali, Guajarati, Hindi,
Malayalam, Punjabi, Tamil, Telugu and Urdu) into English with average 10%
accurate results for all the language pairs.

Deep Memory Networks for Attitude Identification

Cheng Li, Xiaoxiao Guo, Qiaozhu Mei
Comments: Accepted to WSDM’17
Subjects: Computation and Language (cs.CL)

We consider the task of identifying attitudes towards a given set of entities
from text. Conventionally, this task is decomposed into two separate subtasks:
target detection that identifies whether each entity is mentioned in the text,
either explicitly or implicitly, and polarity classification that classifies
the exact sentiment towards an identified entity (the target) into positive,
negative, or neutral.

Instead, we show that attitude identification can be solved with an
end-to-end machine learning architecture, in which the two subtasks are
interleaved by a deep memory network. In this way, signals produced in target
detection provide clues for polarity classification, and reversely, the
predicted polarity provides feedback to the identification of targets.
Moreover, the treatments for the set of targets also influence each other —
the learned representations may share the same semantics for some targets but
vary for others. The proposed deep memory network, the AttNet, outperforms
methods that do not consider the interactions between the subtasks or those
among the targets, including conventional machine learning methods and the
state-of-the-art deep learning models.

Dialog Context Language Modeling with Recurrent Neural Networks

Bing Liu, Ian Lane
Comments: Accepted for publication at ICASSP 2017
Subjects: Computation and Language (cs.CL)

In this work, we propose contextual language models that incorporate dialog
level discourse information into language modeling. Previous works on
contextual language model treat preceding utterances as a sequence of inputs,
without considering dialog interactions. We design recurrent neural network
(RNN) based contextual language models that specially track the interactions
between speakers in a dialog. Experiment results on Switchboard Dialog Act
Corpus show that the proposed model outperforms conventional single turn based
RNN language model by 3.3% on perplexity. The proposed models also demonstrate
advantageous performance over other competitive contextual language models.

Neural Models for Sequence Chunking

Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou
Comments: Accepted by AAAI 2017
Subjects: Computation and Language (cs.CL)

Many natural language understanding (NLU) tasks, such as shallow parsing
(i.e., text chunking) and semantic slot filling, require the assignment of
representative labels to the meaningful chunks in a sentence. Most of the
current deep neural network (DNN) based methods consider these tasks as a
sequence labeling problem, in which a word, rather than a chunk, is treated as
the basic unit for labeling. These chunks are then inferred by the standard IOB
(Inside-Outside-Beginning) labels. In this paper, we propose an alternative
approach by investigating the use of DNN for sequence chunking, and propose
three neural models so that each chunk can be treated as a complete unit for
labeling. Experimental results show that the proposed neural sequence chunking
models can achieve start-of-the-art performance on both the text chunking and
slot filling tasks.

A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue

Mihail Eric, Christopher D. Manning
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Task-oriented dialogue focuses on conversational agents that participate in
user-initiated dialogues on domain-specific topics. In contrast to chatbots,
which simply seek to sustain open-ended meaningful discourse, existing
task-oriented agents usually explicitly model user intent and belief states.
This paper examines bypassing such an explicit representation by depending on a
latent neural embedding of state and learning selective attention to dialogue
history together with copying to incorporate relevant prior context. We
complement recent work by showing the effectiveness of simple
sequence-to-sequence neural architectures with a copy mechanism. Our model
outperforms more complex memory-augmented models by 7% in per-response
generation and is on par with the current state-of-the-art on DSTC2.

QCRI Machine Translation Systems for IWSLT 16

Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Stephan Vogel
Subjects: Computation and Language (cs.CL)

This paper describes QCRI’s machine translation systems for the IWSLT 2016
evaluation campaign. We participated in the Arabic->English and English->Arabic
tracks. We built both Phrase-based and Neural machine translation models, in an
effort to probe whether the newly emerged NMT framework surpasses the
traditional phrase-based systems in Arabic-English language pairs. We trained a
very strong phrase-based system including, a big language model, the Operation
Sequence Model, Neural Network Joint Model and Class-based models along with
different domain adaptation techniques such as MML filtering, mixture modeling
and using fine tuning over NNJM model. However, a Neural MT system, trained by
stacking data from different genres through fine-tuning, and applying ensemble
over 8 models, beat our very strong phrase-based system by a significant 2 BLEU
points margin in Arabic->English direction. We did not obtain similar gains in
the other direction but were still able to outperform the phrase-based system.
We also applied system combination on phrase-based and NMT outputs.

Deep Neural Networks for Czech Multi-label Document Classification

Ladislav Lenc, Pavel Král
Comments: Dears, this paper has been already accepted for CICLing 2016 and presented at the conference in 04 2016. Unfortunately, the proceedings is not ready yet. Therefore, we have decided to publish this paper at ArXiv. -acceptance letter begin- Dear Professor Kral, We are happy to notify you that your paper 167: “Deep Neural Networks for Czech Multi-label Document Classification” has been ACCEPTED
Subjects: Computation and Language (cs.CL)

This paper is focused on automatic multi-label document classification of
Czech text documents. The current approaches usually use some pre-processing
which can have negative impact (loss of information, additional implementation
work, etc). Therefore, we would like to omit it and use deep neural networks
that learn from simple features. This choice was motivated by their successful
usage in many other machine learning fields. Two different networks are
compared: the first one is a standard multi-layer perceptron, while the second
one is a popular convolutional network. The experiments on a Czech newspaper
corpus show that both networks significantly outperform baseline method which
uses a rich set of features with maximum entropy classifier. We have also shown
that convolutional network gives the best results.

Semantic classifier approach to document classification

Piotr Borkowski, Krzysztof Ciesielski, Mieczysław A. Kłopotek
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

In this paper we propose a new document classification method, bridging
discrepancies (so-called semantic gap) between the training set and the
application sets of textual data. We demonstrate its superiority over classical
text classification approaches, including traditional classifier ensembles. The
method consists in combining a document categorization technique with a single
classifier or a classifier ensemble (SEMCOM algorithm – Committee with Semantic
Categorizer).

The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams

David Graus, Daan Odijk, Maarten de Rijke
Comments: 37 pages, 10 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

We study how collective memories are formed online. We do so by tracking
entities that emerge in public discourse, that is, in online text streams such
as social media and news streams, before they are incorporated into Wikipedia,
which, we argue, can be viewed as an online place for collective memory. By
tracking how entities emerge in public discourse, i.e., the temporal patterns
between their first mention in online text streams and subsequent incorporation
into collective memory, we gain insights into how the collective remembrance
process happens online. Specifically, we analyze nearly 80,000 entities as they
emerge in online text streams before they are incorporated into Wikipedia. The
online text streams we use for our analysis comprise of social media and news
streams, and span over 579 million documents in a timespan of 18 months. We
discover two main emergence patterns: entities that emerge in a “bursty”
fashion, i.e., that appear in public discourse without a precedent, blast into
activity and transition into collective memory. Other entities display a
“delayed” pattern, where they appear in public discourse, experience a period
of inactivity, and then resurface before transitioning into our cultural
collective memory.

DyNet: The Dynamic Neural Network Toolkit

Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin
Comments: 33 pages
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Mathematical Software (cs.MS)

We describe DyNet, a toolkit for implementing neural network models based on
dynamic declaration of network structure. In the static declaration strategy
that is used in toolkits like Theano, CNTK, and TensorFlow, the user first
defines a computation graph (a symbolic representation of the computation), and
then examples are fed into an engine that executes this computation and
computes its derivatives. In DyNet’s dynamic declaration strategy, computation
graph construction is mostly transparent, being implicitly constructed by
executing procedural code that computes the network outputs, and the user is
free to use different network structures for each input. Dynamic declaration
thus facilitates the implementation of more complicated network architectures,
and DyNet is specifically designed to allow users to implement their models in
a way that is idiomatic in their preferred programming language (C++ or
Python). One challenge with dynamic declaration is that because the symbolic
computation graph is defined anew for every training example, its construction
must have low overhead. To achieve this, DyNet has an optimized C++ backend and
lightweight graph representation. Experiments show that DyNet’s speeds are
faster than or comparable with static declaration toolkits, and significantly
faster than Chainer, another dynamic declaration toolkit. DyNet is released
open-source under the Apache 2.0 license and available at
this http URL

Distributed, Parallel, and Cluster Computing

Automatic SDF-based Code Generation from Simulink Models for Embedded Software Development

Maher Fakih, Sebastian Warsitz
Comments: 10 pages, 9 figures, Presented at HIP3ES, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

Matlab/Simulink is a wide-spread tool for model-based design of embedded
systems. Supporting hierarchy, domain specific building blocks, functional
simulation and automatic code-generation, makes it well-suited for the design
of control and signal processing systems. In this work, we propose an automated
translation methodology for a subset of Simulink models to Synchronous dataflow
Graphs (SDFGs) including the automatic code-generation of SDF-compatible
embedded code. A translation of Simulink models to SDFGs, is very suitable due
to Simulink actor-oriented modeling nature, allowing the application of several
optimization techniques from the SDFG domain. Because of their well-defined
semantics, SDFGs can be analyzed at compiling phase to obtain deadlock-free and
memory-efficient schedules. In addition, several real-time analysis methods
exist which allow throughput-optimal mappings of SDFGs to Multiprocessor on
Chip (MPSoC) while guaranteeing upper-bounded latencies. The correctness of our
translation is justified by integrating the SDF generated code as a
software-in-the-loop (SIL) and comparing its results with the results of the
model-in-the-loop (MIL) simulation of reference Simulink models. The
translation is demonstrated with the help of two case studies: a Transmission
Controller Unit (TCU) and an Automatic Climate Control.

Analysis of Distributed ADMM Algorithm for Consensus Optimization in Presence of Error

Layla Majzoobi, Farshad Lahouti
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

ADMM is a popular algorithm for solving convex optimization problems.
Applying this algorithm to distributed consensus optimization problem results
in a fully distributed iterative solution which relies on processing at the
nodes and communication between neighbors. Local computations usually suffer
from different types of errors, due to e.g., observation or quantization noise,
which can degrade the performance of the algorithm. In this work, we focus on
analyzing the convergence behavior of distributed ADMM for consensus
optimization in presence of additive node error. We specifically show that (a
noisy) ADMM converges linearly under certain conditions and also examine the
associated convergence point. Numerical results are provided which demonstrate
the effectiveness of the presented analysis.

Computing Resource Allocation in Three-Tier IoT Fog Networks: a Joint Optimization Approach Combining Stackelberg Game and Matching

Huaqing Zhang, Yong Xiao, Shengrong Bu, Dusit Niyato, Richard Yu, Zhu Han
Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)

Fog computing is a promising architecture to provide economic and low latency
data services for future Internet of things (IoT)-based network systems. It
relies on a set of low-power fog nodes that are close to the end users to
offload the services originally targeting at cloud data centers. In this paper,
we consider a specific fog computing network consisting of a set of data
service operators (DSOs) each of which controls a set of fog nodes to provide
the required data service to a set of data service subscribers (DSSs). How to
allocate the limited computing resources of fog nodes (FNs) to all the DSSs to
achieve an optimal and stable performance is an important problem. In this
paper, we propose a joint optimization framework for all FNs, DSOs and DSSs to
achieve the optimal resource allocation schemes in a distributed fashion. In
the framework, we first formulate a Stackelberg game to analyze the pricing
problem for the DSOs as well as the resource allocation problem for the DSSs.
Under the scenarios that the DSOs can know the expected amount of resource
purchased by the DSSs, a many-to-many matching game is applied to investigate
the pairing problem between DSOs and FNs. Finally, within the same DSO, we
apply another layer of many-to-many matching between each of the paired FNs and
serving DSSs to solve the FN-DSS pairing problem. Simulation results show that
our proposed framework can significantly improve the performance of the
IoT-based network systems.

Learning

Classification of MRI data using Deep Learning and Gaussian Process-based Model Selection

Hadrien Bertrand, Matthieu Perrot, Roberto Ardon, Isabelle Bloch
Comments: Accepted at ISBI 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

The classification of MRI images according to the anatomical field of view is
a necessary task to solve when faced with the increasing quantity of medical
images. In parallel, advances in deep learning makes it a suitable tool for
computer vision problems. Using a common architecture (such as AlexNet)
provides quite good results, but not sufficient for clinical use. Improving the
model is not an easy task, due to the large number of hyper-parameters
governing both the architecture and the training of the network, and to the
limited understanding of their relevance. Since an exhaustive search is not
tractable, we propose to optimize the network first by random search, and then
by an adaptive search based on Gaussian Processes and Probability of
Improvement. Applying this method on a large and varied MRI dataset, we show a
substantial improvement between the baseline network and the final one (up to
20\% for the most difficult classes).

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

Alon Gonen, Shai Shalev-Shwartz
Subjects: Learning (cs.LG)

We derive bounds on the sample complexity of empirical risk minimization
(ERM) in the context of minimizing non-convex risks that admit the strict
saddle property. Recent progress in non-convex optimization has yielded
efficient algorithms for minimizing such functions. Our results imply that
these efficient algorithms are statistically stable and also generalize well.
In particular, we derive fast rates which resemble the bounds that are often
attained in the strongly convex setting. We specify our bounds to Principal
Component Analysis and Independent Component Analysis. Our results and
techniques may pave the way for statistical analyses of additional strict
saddle problems.

Learning Traffic as Images: A Deep Convolution Neural Network for Large-scale Transportation Network Speed Prediction

Xiaolei Ma, Zhuang Dai, Zhengbing He, Yunpeng Wang
Subjects: Learning (cs.LG)

This paper proposes a convolution neural network (CNN)-based method that
learns traffic as images and predicts large-scale, network-wide traffic speed
with high accuracy. Spatiotemporal traffic dynamics is converted to images
describing the time and space relations of traffic flow via a two-dimensional
time-space matrix. CNN is applied to the image following two consecutive steps:
abstract traffic feature extraction and network-wide traffic speed prediction.
The effectiveness of the proposed method is evaluated by taking two real-world
transportation networks, the second ring road and north-east transportation
network in Beijing, as examples, and comparing the method with four prevailing
algorithms, namely, ordinary least squares, k-nearest neighbors, artificial
neural network, and random forest. The results show that the proposed method
outperforms the four algorithms by an average accuracy improvement of 27.96%
within acceptable execution time. The CNN can train the model in reasonable
time and thus are suitable for large-scale transportation networks.

Thompson Sampling For Stochastic Bandits with Graph Feedback

Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

We present a novel extension of Thompson Sampling for stochastic sequential
decision problems with graph feedback, even when the graph structure itself is
unknown and/or changing. We provide theoretical guarantees on the Bayesian
regret of the algorithm, linking its performance to the underlying properties
of the graph. Thompson Sampling has the advantage of being applicable without
the need to construct complicated upper confidence bounds for different
problems. We illustrate its performance through extensive experimental results
on real and simulated networks with graph feedback. More specifically, we
tested our algorithms on power law, planted partitions and Erdo’s-Renyi graphs,
as well as on graphs derived from Facebook and Flixster data. These all show
that our algorithms clearly outperform related methods that employ upper
confidence bounds, even if the latter use more information about the graph.

Achieving Privacy in the Adversarial Multi-Armed Bandit

Aristide C. Y. Tossou, Christos Dimitrakakis
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

In this paper, we improve the previously best known regret bound to achieve
(epsilon)-differential privacy in oblivious adversarial bandits from
(mathcal{O}{(T^{2/3}/epsilon)}) to (mathcal{O}{(sqrt{T} ln T /epsilon)}).
This is achieved by combining a Laplace Mechanism with EXP3. We show that
though EXP3 is already differentially private, it leaks a linear amount of
information in (T). However, we can improve this privacy by relying on its
intrinsic exponential mechanism for selecting actions. This allows us to reach
(mathcal{O}{(sqrt{ln T})})-DP, with a regret of (mathcal{O}{(T^{2/3})})
that holds against an adaptive adversary, an improvement from the best known of
(mathcal{O}{(T^{3/4})}). This is done by using an algorithm that run EXP3 in a
mini-batch loop. Finally, we run experiments that clearly demonstrate the
validity of our theoretical analysis.

Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks

Vahid Behzadan, Arslan Munir
Comments: 14 pages, 5 figures, pre-print of submission to MLDM ’17
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep learning classifiers are known to be inherently vulnerable to
manipulation by intentionally perturbed inputs, named adversarial examples. In
this work, we establish that reinforcement learning techniques based on Deep
Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and
verify the transferability of adversarial examples across different DQN models.
Furthermore, we present a novel class of attacks based on this vulnerability
that enable policy manipulation and induction in the learning process of DQNs.
We propose an attack mechanism that exploits the transferability of adversarial
examples to implement policy induction attacks on DQNs, and demonstrate its
efficacy and impact through experimental study of a game-learning scenario.

Near Optimal Behavior via Approximate State Abstraction

David Abel, D. Ellis Hershkowitz, Michael L. Littman
Comments: Earlier version published at ICML 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

The combinatorial explosion that plagues planning and reinforcement learning
(RL) algorithms can be moderated using state abstraction. Prohibitively large
task representations can be condensed such that essential information is
preserved, and consequently, solutions are tractably computable. However, exact
abstractions, which treat only fully-identical situations as equivalent, fail
to present opportunities for abstraction in environments where no two
situations are exactly alike. In this work, we investigate approximate state
abstractions, which treat nearly-identical situations as equivalent. We present
theoretical guarantees of the quality of behaviors derived from four types of
approximate abstractions. Additionally, we empirically demonstrate that
approximate abstractions lead to reduction in task complexity and bounded loss
of optimality of behavior in a variety of environments.

Field-aware Factorization Machines in a Real-world Online Advertising System

Yuchin Juan, Damien Lefortier, Olivier Chapelle
Subjects: Learning (cs.LG)

Predicting user response is one of the core machine learning tasks in
computational advertising. Field-aware Factorization Machines (FFM) have
recently been established as a state-of-the-art method for that problem and in
particular won two Kaggle challenges. This paper presents some results from
implementing this method in a production system that predicts click-through and
conversion rates for display advertising and shows that this method it is not
only effective to win challenges but is also valuable in a real-world
prediction system. We also discuss some specific challenges and solutions to
reduce the training time, namely the use of an innovative seeding algorithm and
a distributed learning mechanism.

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

Providing Reinforcement Learning agents with expert advice can dramatically
improve various aspects of learning. Prior work has developed teaching
protocols that enable agents to learn efficiently in complex environments; many
of these methods tailor the teacher’s guidance to agents with a particular
representation or underlying learning scheme, offering effective but
specialized teaching procedures. In this work, we explore protocol programs, an
agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is
to incorporate the beneficial properties of a human teacher into Reinforcement
Learning without making strong assumptions about the inner workings of the
agent. We show how to represent existing approaches such as action pruning,
reward shaping, and training in simulation as special cases of our schema and
conduct preliminary experiments on simple domains.

Breeding electric zebras in the fields of Medicine

Federico Cabitza
Comments: Work-in-progress
Subjects: Learning (cs.LG)

A few notes on the use of machine learning in medicine.

Scalable and Incremental Learning of Gaussian Mixture Models

Rafael Pinto, Paulo Engel
Comments: 13 pages, 1 figure, submitted for peer-review. arXiv admin note: substantial text overlap with arXiv:1506.04422
Subjects: Learning (cs.LG)

This work presents a fast and scalable algorithm for incremental learning of
Gaussian mixture models. By performing rank-one updates on its precision
matrices and determinants, its asymptotic time complexity is of BigO{NKD^2}
for (N) data points, (K) Gaussian components and (D) dimensions. The resulting
algorithm can be applied to high dimensional tasks, and this is confirmed by
applying it to the classification datasets MNIST and CIFAR-10. Additionally, in
order to show the algorithm’s applicability to function approximation and
control tasks, it is applied to three reinforcement learning tasks and its
data-efficiency is evaluated.

Marked Temporal Dynamics Modeling based on Recurrent Neural Network

Yongqing Wang, Shenghua Liu, Huawei Shen, Xueqi Cheng
Subjects: Learning (cs.LG)

We are now witnessing the increasing availability of event stream data, i.e.,
a sequence of events with each event typically being denoted by the time it
occurs and its mark information (e.g., event type). A fundamental problem is to
model and predict such kind of marked temporal dynamics, i.e., when the next
event will take place and what its mark will be. Existing methods either
predict only the mark or the time of the next event, or predict both of them,
yet separately. Indeed, in marked temporal dynamics, the time and the mark of
the next event are highly dependent on each other, requiring a method that
could simultaneously predict both of them. To tackle this problem, in this
paper, we propose to model marked temporal dynamics by using a mark-specific
intensity function to explicitly capture the dependency between the mark and
the time of the next event. Extensive experiments on two datasets demonstrate
that the proposed method outperforms state-of-the-art methods at predicting
marked temporal dynamics.

On Hölder projective divergences

Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet
Comments: 25 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

We describe a framework to build distances by measuring the tightness of
inequalities, and introduce the notion of proper statistical divergences and
improper pseudo-divergences. We then consider the H”older ordinary and reverse
inequalities, and present two novel classes of H”older divergences and
pseudo-divergences that both encapsulate the special case of the Cauchy-Schwarz
divergence. We report closed-form formulas for those statistical
dissimilarities when considering distributions belonging to the same
exponential family provided that the natural parameter space is a cone (e.g.,
multivariate Gaussians), or affine (e.g., categorical distributions). Those new
classes of H”older distances are invariant to rescaling, and thus do not
require distributions to be normalized. Finally, we show how to compute
statistical H”older centroids with respect to those divergences, and carry out
center-based clustering toy experiments on a set of Gaussian distributions that
demonstrate empirically that symmetrized H”older divergences outperform the
symmetric Cauchy-Schwarz divergence.

End-to-End ASR-free Keyword Search from Speech

End-to-end (E2E) systems have achieved competitive results compared to
conventional hybrid hidden Markov model (HMM)-deep neural network based
automatic speech recognition (ASR) systems. Such E2E systems are attractive due
to the lack of dependence on alignments between input acoustic and output
grapheme or HMM state sequence during training. This paper explores the design
of an ASR-free end-to-end system for text query-based keyword search (KWS) from
speech trained with minimal supervision. Our E2E KWS system consists of three
sub-systems. The first sub-system is a recurrent neural network (RNN)-based
acoustic auto-encoder trained to reconstruct the audio through a
finite-dimensional representation. The second sub-system is a character-level
RNN language model using embeddings learned from a convolutional neural
network. Since the acoustic and text query embeddings occupy different
representation spaces, they are input to a third feed-forward neural network
that predicts whether the query occurs in the acoustic utterance or not. This
E2E ASR-free KWS system performs respectably despite lacking a conventional ASR
system and trains much faster.

Geometric features for voxel-based surface recognition

Dmitry Yarotsky
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We introduce a library of geometric voxel features for CAD surface
recognition/retrieval tasks. Our features include local versions of the
intrinsic volumes (the usual 3D volume, surface area, integrated mean and
Gaussian curvature) and a few closely related quantities. We also compute Haar
wavelet and statistical distribution features by aggregating raw voxel
features. We apply our features to object classification on the ESB data set
and demonstrate accurate results with a small number of shallow decision trees.

Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

Wenjie Luo, Yujia Li, Raquel Urtasun, Richard Zemel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

We study characteristics of receptive fields of units in deep convolutional
networks. The receptive field size is a crucial issue in many visual tasks, as
the output must respond to large enough areas in the image to capture
information about large objects. We introduce the notion of an effective
receptive field, and show that it both has a Gaussian distribution and only
occupies a fraction of the full theoretical receptive field. We analyze the
effective receptive field in several architecture designs, and the effect of
nonlinear activations, dropout, sub-sampling and skip connections on it. This
leads to suggestions for ways to address its tendency to be too small.

An Online Convex Optimization Approach to Dynamic Network Resource Allocation

Tianyi Chen, Qing Ling, Georgios B. Giannakis
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Existing approaches to online convex optimization (OCO) make sequential
one-slot-ahead decisions, which lead to (possibly adversarial) losses that
drive subsequent decision iterates. Their performance is evaluated by the
so-called regret that measures the difference of losses between the online
solution and the best yet fixed overall solution in hindsight. The present
paper deals with online convex optimization involving adversarial loss
functions and adversarial constraints, where the constraints are revealed after
making decisions, and can be tolerable to instantaneous violations but must be
satisfied in the long term. Performance of an online algorithm in this setting
is assessed by: i) the difference of its losses relative to the best dynamic
solution with one-slot-ahead information of the loss function and the
constraint (that is here termed dynamic regret); and, ii) the accumulated
amount of constraint violations (that is here termed dynamic fit). In this
context, a modified online saddle-point (MOSP) scheme is developed, and proved
to simultaneously yield sub-linear dynamic regret and fit, provided that the
accumulated variations of per-slot minimizers and constraints are sub-linearly
growing with time. MOSP is also applied to the dynamic network resource
allocation task, and it is compared with the well-known stochastic dual
gradient method. Under various scenarios, numerical experiments demonstrate the
performance gain of MOSP relative to the state-of-the-art.

Communication-Efficient Algorithms for Decentralized and Stochastic Optimization

Guanghui Lan, Soomin Lee, Yi Zhou
Subjects: Optimization and Control (math.OC); Learning (cs.LG)

We present a new class of decentralized first-order methods for nonsmooth and
stochastic optimization problems defined over multiagent networks. Considering
that communication is a major bottleneck in decentralized optimization, our
main goal in this paper is to develop algorithmic frameworks which can
significantly reduce the number of inter-node communications. We first propose
a decentralized primal-dual method which can find an (epsilon)-solution both
in terms of functional optimality gap and feasibility residual in
(O(1/epsilon)) inter-node communication rounds when the objective functions
are convex and the local primal subproblems are solved exactly. Our major
contribution is to present a new class of decentralized primal-dual type
algorithms, namely the decentralized communication sliding (DCS) methods, which
can skip the inter-node communications while agents solve the primal
subproblems iteratively through linearizations of their local objective
functions. By employing DCS, agents can still find an (epsilon)-solution in
(O(1/epsilon)) (resp., (O(1/sqrt{epsilon}))) communication rounds for
general convex functions (resp., strongly convex functions), while maintaining
the (O(1/epsilon^2)) (resp., (O(1/epsilon))) bound on the total number of
intra-node subgradient evaluations. We also present a stochastic counterpart
for these algorithms, denoted by SDCS, for solving stochastic optimization
problems whose objective function cannot be evaluated exactly. In comparison
with existing results for decentralized nonsmooth and stochastic optimization,
we can reduce the total number of inter-node communication rounds by orders of
magnitude while still maintaining the optimal complexity bounds on intra-node
stochastic subgradient evaluations. The bounds on the subgradient evaluations
are actually comparable to those required for centralized nonsmooth and
stochastic optimization.

Learning to Invert: Signal Recovery via Deep Convolutional Networks

The promise of compressive sensing (CS) has been offset by two significant
challenges. First, real-world data is not exactly sparse in a fixed basis.
Second, current high-performance recovery algorithms are slow to converge,
which limits CS to either non-real-time applications or scenarios where massive
back-end computing is available. In this paper, we attack both of these
challenges head-on by developing a new signal recovery framework we call {em
DeepInverse} that learns the inverse transformation from measurement vectors to
signals using a {em deep convolutional network}. When trained on a set of
representative images, the network learns both a representation for the signals
(addressing challenge one) and an inverse map approximating a greedy or convex
recovery algorithm (addressing challenge two). Our experiments indicate that
the DeepInverse network closely approximates the solution produced by
state-of-the-art CS recovery algorithms yet is hundreds of times faster in run
time. The tradeoff for the ultrafast run time is a computationally intensive,
off-line training procedure typical to deep networks. However, the training
needs to be completed only once, which makes the approach attractive for a host
of sparse recovery problems.

Long Timescale Credit Assignment in NeuralNetworks with External Memory

Credit assignment in traditional recurrent neural networks usually involves
back-propagating through a long chain of tied weight matrices. The length of
this chain scales linearly with the number of time-steps as the same network is
run at each time-step. This creates many problems, such as vanishing gradients,
that have been well studied. In contrast, a NNEM’s architecture recurrent
activity doesn’t involve a long chain of activity (though some architectures
such as the NTM do utilize a traditional recurrent architecture as a
controller). Rather, the externally stored embedding vectors are used at each
time-step, but no messages are passed from previous time-steps. This means that
vanishing gradients aren’t a problem, as all of the necessary gradient paths
are short. However, these paths are extremely numerous (one per embedding
vector in memory) and reused for a very long time (until it leaves the memory).
Thus, the forward-pass information of each memory must be stored for the entire
duration of the memory. This is problematic as this additional storage far
surpasses that of the actual memories, to the extent that large memories on
infeasible to back-propagate through in high dimensional settings. One way to
get around the need to hold onto forward-pass information is to recalculate the
forward-pass whenever gradient information is available. However, if the
observations are too large to store in the domain of interest, direct
reinstatement of a forward pass cannot occur. Instead, we rely on a learned
autoencoder to reinstate the observation, and then use the embedding network to
recalculate the forward-pass. Since the recalculated embedding vector is
unlikely to perfectly match the one stored in memory, we try out 2
approximations to utilize error gradient w.r.t. the vector in memory.

Information Theory

Better Protocol for XOR Game using Communication Protocol

Ryuhei Mori
Comments: 11 pages
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

Bri”et et al. showed that an efficient communication protocol implies a
reliable XOR game protocol. In this work, we improve this relationship, and
obtain a nontrivial lower bound (2log3approx 3.1699) of XOR-amortized
communication complexity of the equality function. The proof uses an elegant
idea of Pawl owski et al. in a paper on information causality. Although the
improvement of a lower bound of a communication complexity is at most a factor
2, all arguments and proofs in this work are quite simple and intuitive.

The Velocity of the Propagating Wave for Spatially Coupled Systems with Applications to LDPC Codes

Rafah El-Khatib, Nicolas Macris
Comments: 33 pages, 12 figures
Subjects: Information Theory (cs.IT); Pattern Formation and Solitons (nlin.PS)

We consider the dynamics of message passing for spatially coupled codes and,
in particular, the set of density evolution equations that tracks the profile
of decoding errors along the spatial direction of coupling. It is known that,
for suitable boundary conditions and after a transient phase, the error profile
exhibits a “solitonic behavior”. Namely, a uniquely-shaped wavelike solution
develops, that propagates with constant velocity. Under this assumption we
derive an analytical formula for the velocity in the framework of a continuum
limit of the spatially coupled system. The general formalism is developed for
spatially coupled low-density parity-check codes on general binary memoryless
symmetric channels which form the main system of interest in this work. We
apply the formula for special channels and illustrate that it matches the
direct numerical evaluation of the velocity for a wide range of noise values. A
possible application of the velocity formula to the evaluation of finite size
scaling law parameters is also discussed. We conduct a similar analysis for
general scalar systems and illustrate the findings with applications to
compressive sensing and generalized low-density parity-check codes on the
binary erasure or binary symmetric channels.

Generalized Expectation Consistent Signal Recovery for Nonlinear Measurements

Hengtao He, Chao-Kai Wen, Shi Jin
Comments: 6 pages, 4 figures
Subjects: Information Theory (cs.IT)

In this paper, we propose a generalized expectation consistent signal
recovery algorithm to estimate the signal (mathbf{x}) from the nonlinear
measurements of a linear transform output (mathbf{z}=mathbf{A}mathbf{x}).
This estimation problem has been encountered in many applications, such as
communications with front-end impairments, compressed sensing, and phase
retrieval. The proposed algorithm extends the prior art called generalized
turbo signal recovery from a partial discrete Fourier transform matrix
(mathbf{A}) to a class of general matrices. Numerical results show the
excellent agreement of the proposed algorithm with the theoretical
Bayesian-optimal estimator derived using the replica method.

Stable Wireless Network Control Under Service Constraints

Martin Kasparick, Gerhard Wunder
Comments: Accepted for publication in IEEE Transactions on Control of Network Systems
Subjects: Information Theory (cs.IT)

We consider the design of wireless queueing network control policies with
particular focus on combining stability with additional application-dependent
requirements. Thereby, we consequently pursue a cost function based approach
that provides the flexibility to incorporate constraints and requirements of
particular services or applications. As typical examples of such requirements,
we consider the reduction of buffer underflows in case of streaming traffic,
and energy efficiency in networks of battery powered nodes. Compared to the
classical throughput optimal control problem, such requirements significantly
complicate the control problem. We provide easily verifyable theoretical
conditions for stability, and, additionally, compare various candidate cost
functions applied to wireless networks with streaming media traffic. Moreover,
we demonstrate how the framework can be applied to the problem of energy
efficient routing, and we demonstrate the aplication of our framework in
cross-layer control problems for wireless multihop networks, using an advanced
power control scheme for interference mitigation, based on successive convex
approximation. In all scenarios, the performance of our control framework is
evaluated using extensive numerical simulations.

Control Capacity

Gireeja Ranade, Anant Sahai
Comments: 52 pages
Subjects: Information Theory (cs.IT); Systems and Control (cs.SY)

Feedback control actively dissipates uncertainty from a dynamical system by
means of actuation. We develop a notion of “control capacity” that gives a
fundamental limit (in bits) on the rate at which a controller can dissipate the
uncertainty from a system, i.e. stabilize to a known fixed point. We give a
computable single-letter characterization of control capacity for memoryless
stationary scalar multiplicative actuation channels. Control capacity allows us
to answer questions of stabilizability for scalar linear systems: a system with
actuation uncertainty is stabilizable if and only if the control capacity is
larger than the log of the unstable open-loop eigenvalue.

For second-moment senses of stability, we recover the classic uncertainty
threshold principle result. However, our definition of control capacity can
quantify the stabilizability limits for any moment of stability. Our
formulation parallels the notion of Shannon’s communication capacity, and thus
yields both a strong converse and a way to compute the value of
side-information in control. The results in our paper are motivated by
bit-level models for control that build on the deterministic models that are
widely used to understand information flows in wireless network information
theory.

Secret sharing schemes based on additive codes over (GF(4))

Jon-Lark Kim, Nari Lee
Comments: 20 pages
Subjects: Information Theory (cs.IT)

A secret sharing scheme (SSS) was introduced by Shamir in 1979 using
polynomial interpolation. Later it turned out that it is equivalent to an SSS
based on a Reed-Solomon code. SSSs based on linear codes have been studied by
many researchers. However there is little research on SSSs based on additive
codes. In this paper, we study SSSs based on additive codes over (GF(4)) and
show that they require at least two steps of calculations to reveal the secret.
We also define minimal access structures of SSSs from additive codes over
(GF(4)) and describe SSSs using some interesting additive codes over (GF(4))
which contain generalized 2-designs.

A Projection Decoding of a Binary Extremal Self-Dual Code of Length (40)

Jon-Lark Kim, Nari Lee
Comments: 23 pages
Subjects: Information Theory (cs.IT)

As far as we know, there is no decoding algorithm of any binary self-dual
([40, 20, 8]) code except for the syndrome decoding applied to the code
directly. This syndrome decoding for a binary self-dual ([40,20,8]) code is not
efficient in the sense that it cannot be done by hand due to a large syndrome
table. The purpose of this paper is to give two new efficient decoding
algorithms for an extremal binary doubly-even self-dual ([40,20, 8]) code
(C_{40,1}^{DE}) by hand with the help of a Hermitian self-dual ([10,5,4]) code
(E_{10}) over (GF(4)). The main idea of this decoding is to project codewords
of (C_{40,1}^{DE}) onto (E_{10}) so that it reduces the complexity of the
decoding of (C_{40,1}^{DE}). The first algorithm is called the representation
decoding algorithm. It is based on the pattern of codewords of (E_{10}). Using
certain automorphisms of (E_{10}), we show that only eight types of codewords
of (E_{10}) can produce all the codewords of (E_{10}). The second algorithm is
called the syndrome decoding algorithm based on (E_{10}). It first solves the
syndrome equation in (E_{10}) and finds a corresponding binary codeword of
(C_{40,1}^{DE}).

Some Bounds on Binary LCD Codes

Lucky Galvez, Jon-Lark Kim, Nari Lee, Young Gun Roe, Byung-Sun Won
Comments: 9 pages, 2 tables
Subjects: Information Theory (cs.IT)

A linear code with a complementary dual (or LCD code) is defined to be a
linear code (C) whose dual code (C^{perp}) satisfies (C cap C^{perp})=
(left{ mathbf{0}
ight} ). Let (LCD{[}n,k{]}) denote the maximum of
possible values of (d) among ([n,k,d]) binary LCD codes. We give exact values
of (LCD{[}n,k{]}) for (1 le k le n le 12).

We also show that (LCD[n,n-i]=2) for any (igeq2) and (ngeq2^{i}).
Furthermore, we show that (LCD[n,k]leq LCD[n,k-1]) for (k) odd and
(LCD[n,k]leq LCD[n,k-2]) for (k) even.

Capacity and delay scaling for broadcast transmission in highly mobile wireless networks

Rajat Talak, Sertac Karaman, Eytan Modiano
Subjects: Information Theory (cs.IT)

We study broadcast capacity and minimum delay scaling laws for highly mobile
wireless networks, in which each node has to disseminate or broadcast packets
to all other nodes in the network. In particular, we consider a cell
partitioned network under the simplified independent and identically
distributed (IID) mobility model, in which each node chooses a new cell at
random every time slot. We derive scaling laws for broadcast capacity and
minimum delay as a function of the cell size. We propose a simple
first-come-first-serve (FCFS) flooding scheme that nearly achieves both
capacity and minimum delay scaling. Our results show that high mobility does
not improve broadcast capacity, and that both capacity and delay improve with
increasing cell sizes. In contrast to what has been speculated in the
literature we show that there is (nearly) no tradeoff between capacity and
delay. Our analysis makes use of the theory of Markov Evolving Graphs (MEGs)
and develops two new bounds on flooding time in MEGs by relaxing the previously
required expander property assumption.

Optimal Repair Schemes for Some Families of Full-Length Reed-Solomon Codes

Hoang Dau, Olgica Milenkovic
Comments: 5 pages
Subjects: Information Theory (cs.IT)

Reed-Solomon codes have found many applications in practical storage systems,
but were until recently considered unsuitable for distributed storage
applications due to the widely-held belief that they have poor repair
bandwidth. The work of Guruswami and Wootters (STOC’16) has shown that one can
actually perform bandwidth-efficient linear repair with Reed-Solomon codes:
When the codes are over the field (mathbb{F}_{q^t}) and the number of parities
(r geq q^s), where ((t-s)) divides (t), there exists a linear scheme that
achieves a repair bandwidth of ((n-1)(t-s)log_2 q) bits. We extend this result
by showing the existence of such a linear repair scheme for every (1 leq s <
t). Moreover, our new schemes are optimal among all linear repair schemes for
Reed-Solomon codes when (n = q^t) and (r = q^s). Additionally, we improve the
lower bound on the repair bandwidth for Reed-Solomon codes, also established in
the work of Guruswami and Wootters.

Multiple Access Wiretap Channel with Noiseless Feedback

Bin Dai, Zheng Ma
Comments: 17pages
Subjects: Information Theory (cs.IT)

The physical layer security in the up-link of the wireless communication
systems is often modeled as the multiple access wiretap channel (MAC-WT), and
recently it has received a lot attention. In this paper, the MAC-WT has been
re-visited by considering the situation that the legitimate receiver feeds his
received channel output back to the transmitters via two noiseless channels,
respectively. This model is called the MAC-WT with noiseless feedback. Inner
and outer bounds on the secrecy capacity region of this feedback model are
provided. To be specific, we first present a decode-and-forward (DF) inner
bound on the secrecy capacity region of this feedback model, and this bound is
constructed by allowing each transmitter to decode the other one’s transmitted
message from the feedback, and then each transmitter uses the decoded message
to re-encode his own messages, i.e., this DF inner bound allows the independent
transmitters to co-operate with each other. Then, we provide a hybrid inner
bound which is strictly larger than the DF inner bound, and it is constructed
by using the feedback as a tool not only to allow the independent transmitters
to co-operate with each other, but also to generate two secret keys
respectively shared between the legitimate receiver and the two transmitters.
Finally, we give a sato-type outer bound on the secrecy capacity region of this
feedback model. The results of this paper are further explained via a Gaussian
example.

Making compression algorithms for Unicode text

Adam Gleave, Christian Steinruecken
Comments: 10 pages
Subjects: Information Theory (cs.IT)

The majority of online content is written in languages other than English,
and is most commonly encoded in UTF-8, the world’s dominant Unicode character
encoding. Traditional compression algorithms typically operate on individual
bytes. While this approach works well for the single-byte ASCII encoding, it
works poorly for UTF-8, where characters often span multiple bytes. Previous
research has focused on developing Unicode compressors from scratch, which
often failed to outperform established algorithms such as bzip2. We develop a
technique to modify byte-based compressors to operate directly on Unicode
characters, and implement variants of LZW and PPM that apply this technique. We
find that our method substantially improves compression effectiveness on a
UTF-8 corpus, with our PPM variant outperforming the state-of-the-art PPMII
compressor. On ASCII and binary files, our variants perform similarly to the
original unmodified compressors.

Using data-compressors for statistical analysis of problems on homogeneity testing and classification

Boris Ryabko, Andrey Guskov, Irina Selivanova
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

Nowadays data compressors are applied to many problems of text analysis, but
many such applications are developed outside of the framework of mathematical
statistics. In this paper we overcome this obstacle and show how several
methods of classical mathematical statistics can be developed based on
applications of the data compressors.

On Hermitian LCD codes from cyclic codes and their applications to orthogonal direct sum masking

Chengju Li
Comments: 16 pages
Subjects: Information Theory (cs.IT)

Cyclic codes are an interesting type of linear codes and have wide
applications in communication and storage systems due to their efficient
encoding and decoding algorithms. It was proved that asymptotically good
Hermitian LCD codes exist. The objective of this paper is to construct some
cyclic Hermitian LCD codes over finite fields and analyse their parameters. The
dimensions of these codes are settled and the lower bounds on their minimum
distances are presented. Most Hermitian LCD codes presented in this paper are
not BCH codes. In addition, we employ Hermitian LCD codes to propose a
Hermitian orthogonal direct sum masking scheme that achieves protection against
fault injection attacks. It is shown that the codes with great minimum
distances are desired to improve the resistance.

The Passive Eavesdropper Affects my Channel: Secret-Key Rates under Real-World Conditions (Extended Version)

Christan Zenger, Hendrik Vogt, Jan Zimmer, Aydin Sezgin, Christof Paar
Comments: Full measurement in Appendix
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

Channel-reciprocity based key generation (CRKG) has gained significant
importance as it has recently been proposed as a potential lightweight security
solution for IoT devices. However, the impact of the attacker’s position in
close range has only rarely been evaluated in practice, posing an open research
problem about the security of real-world realizations. Furthermore, this would
further bridge the gap between theoretical channel models and their
practice-oriented realizations. For security metrics, we utilize
cross-correlation, mutual information, and a lower bound on secret-key
capacity. We design a practical setup of three parties such that the channel
statistics, although based on joint randomness, are always reproducible. We run
experiments to obtain channel states and evaluate the aforementioned metrics
for the impact of an attacker depending on his position. It turns out the
attacker himself affects the outcome, which has not been adequately regarded
yet in standard channel models.

Performance of time delay estimation in a cognitive radar

Kumar Vijay Mishra, Yonina C. Eldar
Comments: 5 pages, 1 figure
Subjects: Information Theory (cs.IT)

A cognitive radar adapts the transmit waveform in response to changes in the
radar and target environment. In this work, we analyze the recently proposed
sub-Nyquist cognitive radar wherein the total transmit power in a multi-band
cognitive waveform remains the same as its full-band conventional counterpart.
For such a system, we derive lower bounds on the mean-squared-error (MSE) of a
single-target time delay estimate. We formulate a procedure to select the
optimal bands, and recommend distribution of the total power in different bands
to enhance the accuracy of delay estimation. In particular, using Cram’er-Rao
bounds, we show that equi-width subbands in cognitive radar always have better
delay estimation than the conventional radar. Further analysis using Ziv-Zakai
bound reveals that cognitive radar performs well in low signal-to-noise (SNR)
regions.

Multipermutation Ulam Sphere Analysis Toward Characterizing Maximal Code Size

Justin Kong, Manabu Hagiwara
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

Permutation codes, in the form of rank modulation, have shown promise for
applications such as flash memory. One of the metrics recently suggested as
appropriate for rank modulation is the Ulam metric, which measures the minimum
translocation distance between permutations. Multipermutation codes have also
been proposed as a generalization of permutation codes that would improve code
size (and consequently the code rate). In this paper we analyze the Ulam metric
in the context of multipermutations, noting some similarities and differences
between the Ulam metric in the context of permutations. We also consider sphere
sizes for multipermutations under the Ulam metric and resulting bounds on code
size.

Raptor Codes for Higher-Order Modulation Using a Multi-Edge Framework

Sachini Jayasooriya, Mahyar Shirvanimoghaddam, Lawrence Ong, Sarah J. Johnson
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Information Theory (cs.IT)

In this paper, we represent Raptor codes as multi-edge type low-density
parity-check (MET-LDPC) codes, which gives a general framework to design them
for higher-order modulation using MET density evolution. We then propose an
efficient Raptor code design method for higher-order modulation, where we
design distinct degree distributions for distinct bit levels. We consider a
joint decoding scheme based on belief propagation for Raptor codes and also
derive an exact expression for the stability condition. In several examples, we
demonstrate that the higher-order modulated Raptor codes designed using the
multi-edge framework outperform previously reported higher-order modulation
codes in literature.

Cooperative Multi-Sender Index Coding

Min Li, Lawrence Ong, Sarah J. Johnson
Subjects: Information Theory (cs.IT)

In this paper, we establish new capacity bounds for the multi-sender unicast
index-coding problem. We first revisit existing outer and inner bounds proposed
by Sadeghi et al. and identify their suboptimality in general. We then propose
a new multi-sender maximal-acyclic-induced-subgraph outer bound that improves
upon the existing bound in two aspects. For inner bound, we identify
shortcomings of the state-of-the-art partitioned Distributed Composite Coding
(DCC) in the strategy of sender partitioning and in the implementation of
multi-sender composite coding. We then modify the existing sender partitioning
by a new joint link-and-sender partitioning technique, which allows each sender
to split its link capacity so as to contribute to collaborative transmissions
in multiple groups if necessary. This leads to a modified DCC (mDCC) scheme
that is shown to outperform partitioned DCC and suffice to achieve optimality
for some index-coding instances. We also propose cooperative compression of
composite messages in composite coding to exploit the potential overlapping of
messages at different senders to support larger composite rates than those by
point-to-point compression in the existing DCC schemes. Combining joint
partitioning and cooperative compression said, we develop a new multi-sender
Cooperative Composite Coding (CCC) scheme for the problem. The CCC scheme
improves upon partitioned DCC and mDCC in general, and is the key to achieve
optimality for a number of index-coding instances. The usefulness of each
scheme is illuminated via examples, and the capacity region is established for
each example.

A General Sequential Delay-Doppler Estimation Scheme for Sub-Nyquist Pulse-Doppler Radar

Shengyao Chen, Feng Xi, Zhong Liu
Comments: 23 pages, 8 figures
Subjects: Information Theory (cs.IT)

Sequential estimation of the delay and Doppler parameters for sub-Nyquist
radars by analog-to-information conversion (AIC) systems has received wide
attention recently. However, the estimation methods reported are AIC-dependent
and have poor performance for off-grid targets. This paper develops a general
estimation scheme in the sense that it is applicable to all AICs regardless
whether the targets are on or off the grids. The proposed scheme estimates the
delay and Doppler parameters sequentially, in which the delay estimation is
formulated into a beamspace direction-of- arrival problem and the Doppler
estimation is translated into a line spectrum estimation problem. Then the
well-known spatial and temporal spectrum estimation techniques are used to
provide efficient and high-resolution estimates of the delay and Doppler
parameters. In addition, sufficient conditions on the AIC to guarantee the
successful estimation of off-grid targets are provided, while the existing
conditions are mostly related to the on-grid targets. Theoretical analyses and
numerical experiments show the effectiveness and the correctness of the
proposed scheme.

Performance Analysis of Low-Density Parity-Check Codes over 2D Interference Channels via Density Evolution

Jun Yao, Kah Chan Teh, Kwok Hung Li
Subjects: Information Theory (cs.IT)

The theoretical analysis of detection and decoding of low-density
parity-check (LDPC) codes transmitted over channels with two-dimensional (2D)
interference and additive white Gaussian noise (AWGN) is provided in this
paper. The detection and decoding system adopts the joint iterative detection
and decoding scheme (JIDDS) in which the log-domain sum-product algorithm is
adopted to decode the LDPC codes. The graph representations of the JIDDS are
explained. Using the graph representations, we prove that the message-flow
neighborhood of the detection and decoding system will be tree-like for a
sufficiently long code length. We further confirm that the performance of the
JIDDS will concentrate around the performance in which message-flow
neighborhood is tree-like. Based on the tree-like message-flow neighborhood, we
employ a modified density evolution algorithm to track the message densities
during the iterations. A threshold is calculated using the density evolution
algorithm which can be considered as the theoretical performance limit of the
system. Simulation results demonstrate that the modified density evolution is
effective in analyzing the performance of 2D interference systems.

On the Achievable Secrecy Diversity of Cooperative Networks with Untrusted Relays

Mohaned Chraiti, Ali Ghrayeb, Chadi Assi, Mazen O. Hasna
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

Cooperative relaying is often deployed to enhance the communication
reliability (i.e., diversity order) and consequently the end-to-end achievable
rate. However, this raises several security concerns when the relays are
untrusted since they may have access to the relayed message. In this paper, we
study the achievable secrecy diversity order of cooperative networks with
untrusted relays. In particular, we consider a network with an N-antenna
transmitter (Alice), K single-antenna relays, and a single-antenna destination
(Bob). We consider the general scenario where there is no relation between N
and K, and therefore K can be larger than N. Alice and Bob are assumed to be
far away from each other, and all communication is done through the relays,
i.e., there is no direct link. Providing secure communication while enhancing
the diversity order has been shown to be very challenging. In fact, it has been
shown in the literature that the maximum achievable secrecy diversity order for
the adopted system model is one (while using artificial noise jamming). In this
paper, we adopt a nonlinear interference alignment scheme that we have proposed
recently to transmit the signals from Alice to Bob. We analyze the proposed
scheme in terms of the achievable secrecy rate and secrecy diversity order.
Assuming Gaussian inputs, we derive an explicit expression for the achievable
secrecy rate and show analytically that a secrecy diversity order of up to
min(N,K)-1 can be achieved using the proposed technique. We provide several
numerical examples to validate the obtained analytical results and demonstrate
the superiority of the proposed technique to its counterparts that exist in the
literature.

Origin of life in a digital microcosm

Nitash C G, Thomas LaBar, Arend Hintze, Christoph Adami (Michigan State University)
Comments: 20 pages, 7 figures. To appear in special issue of Philosophical Transactions of the Royal Society A: Re-Conceptualizing the Origins of Life from a Physical Sciences Perspective
Subjects: Populations and Evolution (q-bio.PE); Information Theory (cs.IT); Adaptation and Self-Organizing Systems (nlin.AO); Molecular Networks (q-bio.MN)

While all organisms on Earth descend from a common ancestor, there is no
consensus on whether the origin of this ancestral self-replicator was a one-off
event or whether it was only the final survivor of multiple origins. Here we
use the digital evolution system Avida to study the origin of self-replicating
computer programs. By using a computational system, we avoid many of the
uncertainties inherent in any biochemical system of self-replicators (while
running the risk of ignoring a fundamental aspect of biochemistry). We
generated the exhaustive set of minimal-genome self-replicators and analyzed
the network structure of this fitness landscape. We further examined the
evolvability of these self-replicators and found that the evolvability of a
self-replicator is dependent on its genomic architecture. We studied the
differential ability of replicators to take over the population when competed
against each other (akin to a primordial-soup model of biogenesis) and found
that the probability of a self-replicator out-competing the others is not
uniform. Instead, progenitor (most-recent common ancestor) genotypes are
clustered in a small region of the replicator space. Our results demonstrate
how computational systems can be used as test systems for hypotheses concerning
the origin of life.

On Hölder projective divergences

Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet
Comments: 25 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

We describe a framework to build distances by measuring the tightness of
inequalities, and introduce the notion of proper statistical divergences and
improper pseudo-divergences. We then consider the H”older ordinary and reverse
inequalities, and present two novel classes of H”older divergences and
pseudo-divergences that both encapsulate the special case of the Cauchy-Schwarz
divergence. We report closed-form formulas for those statistical
dissimilarities when considering distributions belonging to the same
exponential family provided that the natural parameter space is a cone (e.g.,
multivariate Gaussians), or affine (e.g., categorical distributions). Those new
classes of H”older distances are invariant to rescaling, and thus do not
require distributions to be normalized. Finally, we show how to compute
statistical H”older centroids with respect to those divergences, and carry out
center-based clustering toy experiments on a set of Gaussian distributions that
demonstrate empirically that symmetrized H”older divergences outperform the
symmetric Cauchy-Schwarz divergence.

Learning to Invert: Signal Recovery via Deep Convolutional Networks

The promise of compressive sensing (CS) has been offset by two significant
challenges. First, real-world data is not exactly sparse in a fixed basis.
Second, current high-performance recovery algorithms are slow to converge,
which limits CS to either non-real-time applications or scenarios where massive
back-end computing is available. In this paper, we attack both of these
challenges head-on by developing a new signal recovery framework we call {em
DeepInverse} that learns the inverse transformation from measurement vectors to
signals using a {em deep convolutional network}. When trained on a set of
representative images, the network learns both a representation for the signals
(addressing challenge one) and an inverse map approximating a greedy or convex
recovery algorithm (addressing challenge two). Our experiments indicate that
the DeepInverse network closely approximates the solution produced by
state-of-the-art CS recovery algorithms yet is hundreds of times faster in run
time. The tradeoff for the ultrafast run time is a computationally intensive,
off-line training procedure typical to deep networks. However, the training
needs to be completed only once, which makes the approach attractive for a host
of sparse recovery problems.

Quantum and Super-quantum enhancements to two-sender, two-receiver channels

Yihui Quek, Peter Shor
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We study the consequences of ‘super-quantum non-local correlations’ as
represented by the PR-box model of Popescu and Rohrlich, and show PR-boxes can
enhance the capacity of noisy interference channels between two senders and two
receivers. PR-box correlations violate Bell/CHSH inequalities and are thus
stronger — more non-local — than quantum mechanics; yet weak enough to
respect special relativity in prohibiting faster-than-light communication.
Understanding their power will yield insight into the non-locality of quantum
mechanics. We exhibit two proof-of-concept channels: first, we show a channel
between two sender-receiver pairs where the senders are not allowed to
communicate, for which a shared super-quantum bit (a PR-box) allows perfect
communication. This feat is not achievable with the best classical (senders
share no resources) or quantum entanglement-assisted (senders share
entanglement) strategies. Second, we demonstrate a class of channels for which
a tunable parameter achieves a double separation of capacities; for some range
of epsilon, the super-quantum assisted strategy does better than the
entanglement-assisted strategy, which in turn does better than the classical
one.