IT博客汇 | arXiv Paper Daily: Wed, 9 Nov 2016

arXiv Paper Daily: Wed, 9 Nov 2016

我爱机器学习(52ml.net)发表于 2016-11-09 00:00:00

Neural and Evolutionary Computing

Adversarial Ladder Networks

Juan Maroñas Molano, Alberto Albiol Colomer, Roberto Paredes Palacios
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

The use of unsupervised data in addition to supervised data in training
discriminative neural networks has improved the performance of this clas-
sification scheme. However, the best results were achieved with a training
process that is divided in two parts: first an unsupervised pre-training step
is done for initializing the weights of the network and after these weights are
refined with the use of supervised data. On the other hand adversarial noise
has improved the results of clas- sical supervised learning. Recently, a new
neural network topology called Ladder Network, where the key idea is based in
some properties of hierar- chichal latent variable models, has been proposed as
a technique to train a neural network using supervised and unsupervised data at
the same time with what is called semi-supervised learning. This technique has
reached state of the art classification. In this work we add adversarial noise
to the ladder network and get state of the art classification, with several
important conclusions on how adversarial noise can help in addition with new
possible lines of investi- gation. We also propose an alternative to add
adversarial noise to unsu- pervised data.

Unsupervised Pretraining for Sequence to Sequence Learning

Prajit Ramachandran, Peter J. Liu, Quoc V. Le
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sequence to sequence models are successful tools for supervised sequence
learning tasks, such as machine translation. Despite their success, these
models still require much labeled data and it is unclear how to improve them
using unlabeled data, which is much less expensive to obtain. In this paper, we
present simple changes that lead to a significant improvement in the accuracy
of seq2seq models when the labeled set is small. Our method intializes the
encoder and decoder of the seq2seq model with the trained weights of two
language models, and then all weights are jointly fine-tuned with labeled data.
An additional language modeling loss can be used to regularize the model during
fine-tuning. We apply this method to low-resource tasks in machine translation
and abstractive summarization and find that it significantly improves the
subsequent supervised models. Our main finding is that the pretraining
accelerates training and improves generalization of seq2seq models, achieving
state-of-the-art results on the WMT English(
ightarrow)German task. Our model
obtains an improvement of 1.3 BLEU from the previous best models on both WMT’14
and WMT’15 English(
ightarrow)German. Our ablation study shows that
pretraining helps seq2seq models in different ways depending on the nature of
the task: translation benefits from the improved generalization whereas
summarization benefits from the improved optimization.

Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

Nat Dilokthanakul, Pedro A.M. Mediano, Marta Garnelo, Matthew C.H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan
Comments: 12 pages, 4 figures, Under review as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We study a variant of the variational autoencoder model with a Gaussian
mixture as a prior distribution, with the goal of performing unsupervised
clustering through deep generative models. We observe that the standard
variational approach in these models is unsuited for unsupervised clustering,
and mitigate this problem by leveraging a principled information-theoretic
regularisation term known as consistency violation. Adding this term to the
standard variational optimisation objective yields networks with both
meaningful internal representations and well-defined clusters. We demonstrate
the performance of this scheme on synthetic data, MNIST and SVHN, showing that
the obtained clusters are distinct, interpretable and result in achieving
higher performance on unsupervised clustering classification than previous
approaches.

Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
Comments: ICLR 2017 submission
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

We formulate sequence to sequence transduction as a noisy channel decoding
problem and use recurrent neural networks to parameterise the source and
channel models. Unlike direct models which can suffer from explaining-away
effects during training, noisy channel models must produce outputs that explain
their inputs, and their component models can be trained with not only paired
training samples but also unpaired samples from the marginal output
distribution. Using a latent variable to control how much of the conditioning
sequence the channel model needs to read in order to generate a subsequent
symbol, we obtain a tractable and effective beam search decoder. Experimental
results on abstractive sentence summarisation, morphological inflection, and
machine translation show that noisy channel models outperform direct models,
and that they significantly benefit from increased amounts of unpaired output
data that direct models cannot easily use.

Cognitive Discriminative Mappings for Rapid Learning

Wen-Chieh Fang, Yi-ting Chiang
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Humans can learn concepts or recognize items from just a handful of examples,
while machines require many more samples to perform the same task. In this
paper, we build a computational model to investigate the possibility of this
kind of rapid learning. The proposed method aims to improve the learning task
of input from sensory memory by leveraging the information retrieved from
long-term memory. We present a simple and intuitive technique called cognitive
discriminative mappings (CDM) to explore the cognitive problem. First, CDM
separates and clusters the data instances retrieved from long-term memory into
distinct classes with a discrimination method in working memory when a sensory
input triggers the algorithm. CDM then maps each sensory data instance to be as
close as possible to the median point of the data group with the same class.
The experimental results demonstrate that the CDM approach is effective for
learning the discriminative features of supervised classifications with few
training sensory input instances.

An Efficient Approach to Boosting Performance of Deep Spiking Network Training

Seongsik Park, Sang-gil Lee, Hyunha Nam, Sungroh Yoon
Comments: 9 pages, 5 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Nowadays deep learning is dominating the field of machine learning with
state-of-the-art performance in various application areas. Recently, spiking
neural networks (SNNs) have been attracting a great deal of attention, notably
owning to their power efficiency, which can potentially allow us to implement a
low-power deep learning engine suitable for real-time/mobile applications.
However, implementing SNN-based deep learning remains challenging, especially
gradient-based training of SNNs by error backpropagation. We cannot simply
propagate errors through SNNs in conventional way because of the property of
SNNs that process discrete data in the form of a series. Consequently, most of
the previous studies employ a workaround technique, which first trains a
conventional weighted-sum deep neural network and then maps the learning
weights to the SNN under training, instead of training SNN parameters directly.
In order to eliminate this workaround, recently proposed is a new class of SNN
named deep spiking networks (DSNs), which can be trained directly (without a
mapping from conventional deep networks) by error backpropagation with
stochastic gradient descent. In this paper, we show that the initialization of
the membrane potential on the backward path is an important step in DSN
training, through diverse experiments performed under various conditions.
Furthermore, we propose a simple and efficient method that can improve DSN
training by controlling the initial membrane potential on the backward path. In
our experiments, adopting the proposed approach allowed us to boost the
performance of DSN training in terms of converging time and accuracy.

Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

David Balduzzi, Brian McWilliams, Tony Butler-Yeoman
Comments: 13 pages, 6 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Modern convolutional networks, incorporating rectifiers and max-pooling, are
neither smooth nor convex. Standard guarantees therefore do not apply.
Nevertheless, methods from convex optimization such as gradient descent and
Adam are widely used as building blocks for deep learning algorithms. This
paper provides the first convergence guarantee applicable to modern convnets.
The guarantee matches a lower bound for convex nonsmooth functions. The key
technical tool is the neural Taylor approximation — a straightforward
application of Taylor expansions to neural networks — and the associated
Taylor loss. Experiments on a range of optimizers, layers, and tasks provide
evidence that the analysis accurately captures the dynamics of neural
optimization.

The second half of the paper applies the Taylor approximation to isolate the
main difficulty in training rectifier nets: that gradients are shattered. We
investigate the hypothesis that, by exploring the space of activation
configurations more thoroughly, adaptive optimizers such as RMSProp and Adam
are able to converge to better solutions.

Neuromorphic Silicon Photonics

Alexander N. Tait, Ellen Zhou, Thomas Ferreira de Lima, Allie X. Wu, Mitchell A. Nahmias, Bhavin J. Shastri, Paul R. Prucnal
Comments: 10 pages, 6 figures, submitted
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Optics (physics.optics)

We report first observations of an integrated analog photonic network, in
which connections are configured by microring weight banks, as well as the
first use of electro-optic modulators as photonic neurons. A mathematical
isomorphism between the silicon photonic circuit and a continuous neural model
is demonstrated through dynamical bifurcation analysis. Exploiting this
isomorphism, existing neural engineering tools can be adapted to silicon
photonic information processing systems. A 49-node silicon photonic neural
network programmed using a “neural compiler” is simulated and predicted to
outperform a conventional approach 1,960-fold in a toy differential system
emulation task. Photonic neural networks leveraging silicon photonic platforms
could access new regimes of ultrafast information processing for radio,
control, and scientific computing.

Computer Vision and Pattern Recognition

Multispectral Deep Neural Networks for Pedestrian Detection

Jingjing Liu, Shaoting Zhang, Shu Wang, Dimitris N. Metaxas
Comments: 13 pages, 8 figures, BMVC 2016 oral
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multispectral pedestrian detection is essential for around-the-clock
applications, e.g., surveillance and autonomous driving. We deeply analyze
Faster R-CNN for multispectral pedestrian detection task and then model it into
a convolutional network (ConvNet) fusion problem. Further, we discover that
ConvNet-based pedestrian detectors trained by color or thermal images
separately provide complementary information in discriminating human instances.
Thus there is a large potential to improve pedestrian detection by using color
and thermal images in DNNs simultaneously. We carefully design four ConvNet
fusion architectures that integrate two-branch ConvNets on different DNNs
stages, all of which yield better performance compared with the baseline
detector. Our experimental results on KAIST pedestrian benchmark show that the
Halfway Fusion model that performs fusion on the middle-level convolutional
features outperforms the baseline method by 11% and yields a missing rate 3.5%
lower than the other proposed architectures.

Estimating motion with principal component regression strategies

Felipe P. do Carmo, Vania Vieira Estrela, Joaquim Teixeira de Assis
Comments: 6 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1610.02923
Journal-ref: Proceedings of the IEEE International Workshop on Multimedia
Signal Processing, 2009, MMSP ’09, 2009
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, two simple principal component regression methods for
estimating the optical flow between frames of video sequences according to a
pel-recursive manner are introduced. These are easy alternatives to dealing
with mixtures of motion vectors in addition to the lack of prior information on
spatial-temporal statistics (although they are supposed to be normal in a local
sense). The 2D motion vector estimation approaches take into consideration
simple image properties and are used to harmonize regularized least square
estimates. Their main advantage is that no knowledge of the noise distribution
is necessary, although there is an underlying assumption of localized
smoothness. Preliminary experiments indicate that this approach provides robust
estimates of the optical flow.

The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization

Etai Littwin, Lior Wolf
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep Residual Networks present a premium in performance in comparison to
conventional networks of the same depth and are trainable at extreme depths. It
has recently been shown that Residual Networks behave like ensembles of
relatively shallow networks. We show that these ensembles are dynamic: while
initially the virtual ensemble is mostly at depths lower than half the
network’s depth, as training progresses, it becomes deeper and deeper. The main
mechanism that controls the dynamic ensemble behavior is the scaling
introduced, e.g., by the Batch Normalization technique. We explain this
behavior and demonstrate the driving force behind it. As a main tool in our
analysis, we employ generalized spin glass models, which we also use in order
to study the number of critical points in the optimization of Residual
Networks.

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, Convolutional Neural Networks (ConvNets) have shown promising
performances in many computer vision tasks, especially image-based recognition.
How to effectively use ConvNets for video-based recognition is still an open
problem. In this paper, we propose a compact, effective yet simple method to
encode spatio-temporal information carried in (3D) skeleton sequences into
multiple (2D) images, referred to as Joint Trajectory Maps (JTM), and ConvNets
are adopted to exploit the discriminative features for real-time human action
recognition. The proposed method has been evaluated on three public benchmarks,
i.e., MSRC-12 Kinect gesture dataset (MSRC-12), G3D dataset and UTD multimodal
human action dataset (UTD-MHAD) and achieved the state-of-the-art results.

Domain Adaptation with L2 constraints for classifying images from different endoscope systems

Toru Tamaki, Shoji Sonoyama, Takio Kurita, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka, Kazuaki Chayama
Comments: 28 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

This paper proposes a method for domain adaptation that extends the maximum
margin domain transfer (MMDT) proposed by Hoffman et al., by introducing L_2
distance constraints between samples of different domains; thus, our method is
denoted as MMDTL2. Motivated by the differences between the images taken by
narrow band imaging (NBI) endoscopic devices, we utilize different NBI devices
as different domains and estimate the transformations between samples of
different domains, i.e., image samples taken by different NBI endoscope
systems. We first formulate the problem in the primal form, and then derive the
dual form with much lesser computational costs as compared to the naive
approach. From our experimental results using NBI image datasets from two
different NBI endoscopic devices, we find that MMDTL2 is more stable than MMDT
and better than support vector machines without adaptation.

Multiple Object Tracking with Kernelized Correlation Filters in Urban Mixed Traffic

Yuebin Yang, Guillaume-Alexandre Bilodeau
Comments: 7 pages, 4 figures, 2 tables, 1 algorithm
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, the Kernelized Correlation Filters tracker (KCF) achieved
competitive performance and robustness in visual object tracking. On the other
hand, visual trackers are not typically used in multiple object tracking. In
this paper, we investigate how a robust visual tracker like KCF can improve
multiple object tracking. Since KCF is a fast tracker, many can be used in
parallel and still result in fast tracking. We build a multiple object tracking
system based on KCF and background subtraction. Background subtraction is
applied to extract moving objects and get their scale and size in combination
with KCF outputs, while KCF is used for data association and to handle
fragmentation and occlusion problems. As a result, KCF and background
subtraction help each other to take tracking decision at every frame. Sometimes
KCF outputs are the most trustworthy (e.g. during occlusion), while in some
other case, it is the background subtraction outputs. To validate the
effectiveness of our system, the algorithm is demonstrated on four urban video
recordings from a standard dataset. Results show that our method is competitive
with state-of-the-art trackers even if we use a much simpler data association
step.

Quantum spectral analysis: frequency at time

Mario Mastriani
Comments: 140 pages, 78 figures, 8 tables. arXiv admin note: text overlap with arXiv:0803.2507 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A quantum time-dependent spectrum analysis, or simply, quantum spectral
analysis (QuSA) is presented in this work, and it is based on Schrodinger
equation, which is a partial differential equation that describes how the
quantum state of a non-relativistic physical system changes with time. In
classic world is named frequency at time (FAT), which is presented here in
opposition and as a complement of traditional spectral analysis
frequency-dependent based on Fourier theory. Besides, FAT is a metric, which
assesses the impact of the flanks of a signal on its frequency spectrum, which
is not taken into account by Fourier theory and even less in real time. Even
more, and unlike all derived tools from Fourier Theory (i.e., continuous,
discrete, fast, short-time, fractional and quantum Fourier Transform, as well
as, Gabor) FAT has the following advantages: a) compact support with excellent
energy output treatment, b) low computational cost, O(N) for signals and O(N2)
for images, c) it does not have phase uncertainties (indeterminate phase for
magnitude = 0) as Discrete and Fast Fourier Transform (DFT, FFT, respectively),
d) among others. In fact, FAT constitutes one side of a triangle (which from
now on is closed) and it consists of the original signal in time, spectral
analysis based on Fourier Theory and FAT. Thus a toolbox is completed, which it
is essential for all applications of Digital Signal Processing (DSP) and
Digital Image Processing (DIP); and, even, in the latter, FAT allows edge
detection (which is called flank detection in case of signals), denoising,
despeckling, compression, and superresolution of still images. Such
applications include signals intelligence and imagery intelligence. On the
other hand, we will present other DIP tools, which are also derived from the
Schrodinger equation.

Gradients of Counterfactuals

Mukund Sundararajan, Ankur Taly, Qiqi Yan
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Gradients have been used to quantify feature importance in machine learning
models. Unfortunately, in nonlinear deep networks, not only individual neurons
but also the whole network can saturate, and as a result an important input
feature can have a tiny gradient. We study various networks, and observe that
this phenomena is indeed widespread, across many inputs.

We propose to examine interior gradients, which are gradients of
counterfactual inputs constructed by scaling down the original input. We apply
our method to the GoogleNet architecture for object recognition in images, as
well as a ligand-based virtual screening network with categorical features and
an LSTM based language model for the Penn Treebank dataset. We visualize how
interior gradients better capture feature importance. Furthermore, interior
gradients are applicable to a wide variety of deep networks, and have the
attribution property that the feature importance scores sum to the the
prediction score.

Best of all, interior gradients can be computed just as easily as gradients.
In contrast, previous methods are complex to implement, which hinders practical
adoption.

Artificial Intelligence

On interestingness measures of formal concepts

Sergei O. Kuznetsov, Tatiana Makhalova
Comments: 20 pages, 5 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI)

Formal concepts and closed itemsets proved to be of big importance for
knowledge discovery, both as a tool for concise representation of association
rules and a tool for clustering and constructing domain taxonomies and
ontologies. Exponential explosion makes it difficult to consider the whole
concept lattice arising from data, one needs to select most useful and
interesting concepts. In this paper interestingness measures of concepts are
considered and compared with respect to various aspects, such as efficiency of
computation and applicability to noisy data and performing ranking correlation.

Cognitive Discriminative Mappings for Rapid Learning

Wen-Chieh Fang, Yi-ting Chiang
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Humans can learn concepts or recognize items from just a handful of examples,
while machines require many more samples to perform the same task. In this
paper, we build a computational model to investigate the possibility of this
kind of rapid learning. The proposed method aims to improve the learning task
of input from sensory memory by leveraging the information retrieved from
long-term memory. We present a simple and intuitive technique called cognitive
discriminative mappings (CDM) to explore the cognitive problem. First, CDM
separates and clusters the data instances retrieved from long-term memory into
distinct classes with a discrimination method in working memory when a sensory
input triggers the algorithm. CDM then maps each sensory data instance to be as
close as possible to the median point of the data group with the same class.
The experimental results demonstrate that the CDM approach is effective for
learning the discriminative features of supervised classifications with few
training sensory input instances.

The Data Complexity of Description Logic Ontologies

Carsten Lutz, Frank Wolter
Subjects: Artificial Intelligence (cs.AI)

We analyze the data complexity of ontology-mediated querying where the
ontologies are formulated in a description logic (DL) of the ALC family and
queries are conjunctive queries, positive existential queries, or acyclic
conjunctive queries. Our approach is non-uniform in the sense that we aim to
understand the complexity of each single ontology instead of for all ontologies
formulated in a certain language. While doing so, we quantify over the queries
and are interested, for example, in the question whether all queries can be
evaluated in polynomial time w.r.t. a given ontology. Our results include a
PTime/coNP-dichotomy for ontologies of depth one in the description logic
ALCFI, the equivalence of a PTime/coNP-dichotomy for ALC and ALCI-ontologies of
unrestricted depth to the famous dichotomy conjecture for CSPs by Feder and
Vardi, and the failure of PTime/coNP-dichotomy theorem for ALCF-ontologies.
Regarding the latter DL, we also show that it is undecidable whether a given
ontology admits PTime query evaluation.

Proceedings of the First International Workshop on Argumentation in Logic Programming and Non-Monotonic Reasoning (Arg-LPNMR 2016)

Sarah Alice Gaggl, Juan Carlos Nieves, Hannes Strass
Subjects: Artificial Intelligence (cs.AI)

This volume contains the papers presented at Arg-LPNMR 2016: First
International Workshop on Argumentation in Logic Programming and Nonmonotonic
Reasoning held on July 8-10, 2016 in New York City, NY.

Combining observational and experimental data to find heterogeneous treatment effects

Alexander Peysakhovich, Akos Lada
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Every design choice will have different effects on different units. However
traditional A/B tests are often underpowered to identify these heterogeneous
effects. This is especially true when the set of unit-level attributes is
high-dimensional and our priors are weak about which particular covariates are
important. However, there are often observational data sets available that are
orders of magnitude larger. We propose a method to combine these two data
sources to estimate heterogeneous treatment effects. First, we use
observational time series data to estimate a mapping from covariates to
unit-level effects. These estimates are likely biased but under some conditions
the bias preserves unit-level relative rank orderings. If these conditions
hold, we only need sufficient experimental data to identify a monotonic,
one-dimensional transformation from observationally predicted treatment effects
to real treatment effects. This reduces power demands greatly and makes the
detection of heterogeneous effects much easier. As an application, we show how
our method can be used to improve Facebook page recommendations.

Sentence Ordering using Recurrent Neural Networks

Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Modeling the structure of coherent texts is a task of great importance in
NLP. The task of organizing a given set of sentences into a coherent order has
been commonly used to build and evaluate models that understand such structure.
In this work we propose an end-to-end neural approach based on the recently
proposed set to sequence mapping framework to address the sentence ordering
problem. Our model achieves state-of-the-art performance in the order
discrimination task on two datasets widely used in the literature. We also
consider a new interesting task of ordering abstracts from conference papers
and research proposals and demonstrate strong performance against recent
methods. Visualizing the sentence representations learned by the model shows
that the model has captured high level logical structure in these paragraphs.
The model also learns rich semantic sentence representations by learning to
order texts, performing comparably to recent unsupervised representation
learning methods in the sentence similarity and paraphrase detection tasks.

The Neural Noisy Channel

We formulate sequence to sequence transduction as a noisy channel decoding
problem and use recurrent neural networks to parameterise the source and
channel models. Unlike direct models which can suffer from explaining-away
effects during training, noisy channel models must produce outputs that explain
their inputs, and their component models can be trained with not only paired
training samples but also unpaired samples from the marginal output
distribution. Using a latent variable to control how much of the conditioning
sequence the channel model needs to read in order to generate a subsequent
symbol, we obtain a tractable and effective beam search decoder. Experimental
results on abstractive sentence summarisation, morphological inflection, and
machine translation show that noisy channel models outperform direct models,
and that they significantly benefit from increased amounts of unpaired output
data that direct models cannot easily use.

Learning from Untrusted Data

Moses Charikar, Jacob Steinhardt, Gregory Valiant
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Statistics Theory (math.ST)

The vast majority of theoretical results in machine learning and statistics
assume that the available training data is a reasonably reliable reflection of
the phenomena to be learned or estimated. Similarly, the majority of machine
learning and statistical techniques used in practice are brittle to the
presence of large amounts of biased or malicious data. In this work we propose
two novel frameworks in which to study estimation, learning, and optimization
in the presence of significant fractions of arbitrary data. The first
framework, which we term list-decodable learning, asks whether it is possible
to return a list of answers, with the guarantee that at least one of them is
accurate. For example, given a dataset of (n) points for which an unknown
subset of (alpha n) points are drawn from a distribution of interest, and no
assumptions are made about the remaining ((1-alpha)n) points, is it possible
to return a list of (poly(1/alpha)) answers, one of which is correct? The
second framework, which we term the semi-verified learning model considers the
extent to which a small dataset of trusted data (drawn from the distribution in
question) can be leveraged to enable the accurate extraction of information
from a much larger but untrusted dataset (of which only an (alpha)-fraction is
drawn from the distribution).

We show strong positive results in both settings, and provide an algorithm
for robust learning in a very general stochastic optimization setting. This
general result has immediate implications for robust estimation in a number of
settings, including for robustly estimating the mean of distributions with
bounded second moments, robustly learning mixtures of such distributions, and
robustly finding planted partitions in random graphs in which significant
portions of the graph have been perturbed by an adversary.

Normalizing Flows on Riemannian Manifolds

Mevlana C. Gemici, Danilo Rezende, Shakir Mohamed
Comments: 3 pages, 2 figures, Workshop on Bayesian Deep Learning at NIPS 2016
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)

We consider the problem of density estimation on Riemannian manifolds.
Density estimation on manifolds has many applications in fluid-mechanics,
optics and plasma physics and it appears often when dealing with angular
variables (such as used in protein folding, robot limbs, gene-expression) and
in general directional statistics. In spite of the multitude of algorithms
available for density estimation in the Euclidean spaces (mathbf{R}^n) that
scale to large n (e.g. normalizing flows, kernel methods and variational
approximations), most of these methods are not immediately suitable for density
estimation in more general Riemannian manifolds. We revisit techniques related
to homeomorphisms from differential geometry for projecting densities to
sub-manifolds and use it to generalize the idea of normalizing flows to more
general Riemannian manifolds. The resulting algorithm is scalable, simple to
implement and suitable for use with automatic differentiation. We demonstrate
concrete examples of this method on the n-sphere (mathbf{S}^n).

Dynamic Coattention Networks For Question Answering

Caiming Xiong, Victor Zhong, Richard Socher
Comments: 13 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Several deep learning models have been proposed for question answering.
However, due to their single-pass nature, they have no way to recover from
local maxima corresponding to incorrect answers. To address this problem, we
introduce the Dynamic Coattention Network (DCN) for question answering. The DCN
first fuses co-dependent representations of the question and the document in
order to focus on relevant parts of both. Then a dynamic pointing decoder
iterates over potential answer spans. This iterative procedure enables the
model to recover from initial local maxima corresponding to incorrect answers.
On the Stanford question answering dataset, a single DCN model improves the
previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains
80.4% F1.

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
Comments: Under review as a conference paper at ICLR 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Transfer and multi-task learning have traditionally focused on either a
single source-target pair or very few, similar tasks. Ideally, the linguistic
levels of morphology, syntax and semantics would benefit each other by being
trained in a single model. We introduce such a joint many-task model together
with a strategy for successively growing its depth to solve increasingly
complex tasks. All layers include shortcut connections to both word
representations and lower-level task predictions. We use a simple
regularization term to allow for optimizing all model weights to improve one
task’s loss without exhibiting catastrophic interference of the other tasks.
Our single end-to-end trainable model obtains state-of-the-art results on
chunking, dependency parsing, semantic relatedness and textual entailment. It
also performs competitively on POS tagging. Our dependency parsing layer relies
only on a single feed-forward pass and does not require a beam search.

Quasi-Recurrent Neural Networks

James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
Comments: Submitted to conference track at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

Recurrent neural networks are a powerful tool for modeling sequential data,
but the dependence of each timestep’s computation on the previous timestep’s
output limits parallelism and makes RNNs unwieldy for very long sequences. We
introduce quasi-recurrent neural networks (QRNNs), an approach to neural
sequence modeling that alternates convolutional layers, which apply in parallel
across timesteps, and a minimalist recurrent pooling function that applies in
parallel across channels. Despite lacking trainable recurrent layers, stacked
QRNNs have better predictive accuracy than stacked LSTMs of the same hidden
size. Due to their increased parallelism, they are up to 16 times faster at
train and test time. Experiments on language modeling, sentiment
classification, and character-level neural machine translation demonstrate
these advantages and underline the viability of QRNNs as a basic building block
for a variety of sequence tasks.

Information Retrieval

Balotage in Argentina 2015, a sentiment analysis of tweets

Daniel Robins, Fernando Emmanuel Frati, Jonatan Alvarez, Jose Texier
Comments: in Spanish. Jornadas de Cloud Computing, La Plata – Argentina. 2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Twitter social network contains a large amount of information generated by
its users. That information is composed of opinions and comments that may
reflect trends in social behavior. There is talk of trend when it is possible
to identify opinions and comments geared towards the same shared by a lot of
people direction. To determine if two or more written opinions share the same
address, techniques Natural Language Processing (NLP) are used. This paper
proposes a methodology for predicting reflected in Twitter from the use of
sentiment analysis functions NLP based on social behaviors. The case study was
selected the 2015 Presidential in Argentina, and a software architecture Big
Data composed Vertica data base with the component called Pulse was used.
Through the analysis it was possible to detect trends in voting intentions with
regard to the presidential candidates, achieving greater accuracy in predicting
that achieved with traditional systems surveys.

Computation and Language

Unsupervised Pretraining for Sequence to Sequence Learning

Prajit Ramachandran, Peter J. Liu, Quoc V. Le
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sequence to sequence models are successful tools for supervised sequence
learning tasks, such as machine translation. Despite their success, these
models still require much labeled data and it is unclear how to improve them
using unlabeled data, which is much less expensive to obtain. In this paper, we
present simple changes that lead to a significant improvement in the accuracy
of seq2seq models when the labeled set is small. Our method intializes the
encoder and decoder of the seq2seq model with the trained weights of two
language models, and then all weights are jointly fine-tuned with labeled data.
An additional language modeling loss can be used to regularize the model during
fine-tuning. We apply this method to low-resource tasks in machine translation
and abstractive summarization and find that it significantly improves the
subsequent supervised models. Our main finding is that the pretraining
accelerates training and improves generalization of seq2seq models, achieving
state-of-the-art results on the WMT English(
ightarrow)German task. Our model
obtains an improvement of 1.3 BLEU from the previous best models on both WMT’14
and WMT’15 English(
ightarrow)German. Our ablation study shows that
pretraining helps seq2seq models in different ways depending on the nature of
the task: translation benefits from the improved generalization whereas
summarization benefits from the improved optimization.

Sentence Ordering using Recurrent Neural Networks

Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Modeling the structure of coherent texts is a task of great importance in
NLP. The task of organizing a given set of sentences into a coherent order has
been commonly used to build and evaluate models that understand such structure.
In this work we propose an end-to-end neural approach based on the recently
proposed set to sequence mapping framework to address the sentence ordering
problem. Our model achieves state-of-the-art performance in the order
discrimination task on two datasets widely used in the literature. We also
consider a new interesting task of ordering abstracts from conference papers
and research proposals and demonstrate strong performance against recent
methods. Visualizing the sentence representations learned by the model shows
that the model has captured high level logical structure in these paragraphs.
The model also learns rich semantic sentence representations by learning to
order texts, performing comparably to recent unsupervised representation
learning methods in the sentence similarity and paraphrase detection tasks.

Veracity Computing from Lexical Cues and Perceived Certainty Trends

Uwe D. Reichel, Piroska Lendvai
Comments: to appear in: Proc. 2nd Workshop on Noisy User-generated Text, Osaka, Japan, 2016
Subjects: Computation and Language (cs.CL)

We present a data-driven method for determining the veracity of a set of
rumorous claims on social media data. Tweets from different sources pertaining
to a rumor are processed on three levels: first, factuality values are assigned
to each tweet based on four textual cue categories relevant for our journalism
use case; these amalgamate speaker support in terms of polarity and commitment
in terms of certainty and speculation. Next, the proportions of these lexical
cues are utilized as predictors for tweet certainty in a generalized linear
regression model. Subsequently, lexical cue proportions, predicted certainty,
as well as their time course characteristics are used to compute veracity for
each rumor in terms of the identity of the rumor-resolving tweet and its binary
resolution value judgment. The system operates without access to
extralinguistic resources. Evaluated on the data portion for which hand-labeled
examples were available, it achieves .74 F1-score on identifying rumor
resolving tweets and .76 F1-score on predicting if a rumor is resolved as true
or false.

Contradiction Detection for Rumorous Claims

Piroska Lendvai, Uwe D. Reichel
Comments: to appear in: Proc. Extra-Propositional Aspects of Meaning (ExProM) in Computational Linguistics, Osaka, Japan, 2016
Subjects: Computation and Language (cs.CL)

The utilization of social media material in journalistic workflows is
increasing, demanding automated methods for the identification of mis- and
disinformation. Since textual contradiction across social media posts can be a
signal of rumorousness, we seek to model how claims in Twitter posts are being
textually contradicted. We identify two different contexts in which
contradiction emerges: its broader form can be observed across independently
posted tweets and its more specific form in threaded conversations. We define
how the two scenarios differ in terms of central elements of argumentation:
claims and conversation structure. We design and evaluate models for the two
scenarios uniformly as 3-way Recognizing Textual Entailment tasks in order to
represent claims and conversation structure implicitly in a generic inference
model, while previous studies used explicit or no representation of these
properties. To address noisy text, our classifiers use simple similarity
features derived from the string and part-of-speech level. Corpus statistics
reveal distribution differences for these features in contradictory as opposed
to non-contradictory tweet relations, and the classifiers yield state of the
art performance.

The Neural Noisy Channel

We formulate sequence to sequence transduction as a noisy channel decoding
problem and use recurrent neural networks to parameterise the source and
channel models. Unlike direct models which can suffer from explaining-away
effects during training, noisy channel models must produce outputs that explain
their inputs, and their component models can be trained with not only paired
training samples but also unpaired samples from the marginal output
distribution. Using a latent variable to control how much of the conditioning
sequence the channel model needs to read in order to generate a subsequent
symbol, we obtain a tractable and effective beam search decoder. Experimental
results on abstractive sentence summarisation, morphological inflection, and
machine translation show that noisy channel models outperform direct models,
and that they significantly benefit from increased amounts of unpaired output
data that direct models cannot easily use.

Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

Shane Settle, Karen Livescu
Comments: To appear at SLT 2016
Subjects: Computation and Language (cs.CL)

Acoustic word embeddings — fixed-dimensional vector representations of
variable-length spoken word segments — have begun to be considered for tasks
such as speech recognition and query-by-example search. Such embeddings can be
learned discriminatively so that they are similar for speech segments
corresponding to the same word, while being dissimilar for segments
corresponding to different words. Recent work has found that acoustic word
embeddings can outperform dynamic time warping on query-by-example search and
related word discrimination tasks. However, the space of embedding models and
training approaches is still relatively unexplored. In this paper we present
new discriminative embedding models based on recurrent neural networks (RNNs).
We consider training losses that have been successful in prior work, in
particular a cross entropy loss for word classification and a contrastive loss
that explicitly aims to separate same-word and different-word pairs in a
“Siamese network” training setting. We find that both classifier-based and
Siamese RNN embeddings improve over previously reported results on a word
discrimination task, with Siamese RNNs outperforming classification models. In
addition, we present analyses of the learned embeddings and the effects of
variables such as dimensionality and network structure.

A Surrogate-based Generic Classifier for Chinese Movie Reviews

Yufeng Ma, Long Xia, Wenqi Shen, Weiguo Fan
Subjects: Computation and Language (cs.CL)

With the emerging of various online video platforms like Youtube, Youku and
LeTV, online movie reviews become more and more important both for movie
viewers and producers. As a result, automatically classifying reviews according
to different requirements evolves as a popular research topic and is very
essential in our daily life. In this paper, we focused on reviews of hot TV
series in China and successfully trained generic classifiers based on 8
predefined categories. The experimental results showed promising performance
and effectiveness of its generalization to different TV series.

Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents

Rui Zhang, Honglak Lee, Dragomir Radev
Comments: NAACL2016
Subjects: Computation and Language (cs.CL)

The goal of sentence and document modeling is to accurately represent the
meaning of sentences and documents for various Natural Language Processing
tasks. In this work, we present Dependency Sensitive Convolutional Neural
Networks (DSCNN) as a general-purpose classification system for both sentences
and documents. DSCNN hierarchically builds textual representations by
processing pretrained word embeddings via Long Short-Term Memory networks and
subsequently extracting features with convolution operators. Compared with
existing recursive neural models with tree structures, DSCNN does not rely on
parsers and expensive phrase labeling, and thus is not restricted to
sentence-level tasks. Moreover, unlike other CNN-based models that analyze
sentences locally by sliding windows, our system captures both the dependency
information within each sentence and relationships across sentences in the same
document. Experiment results demonstrate that our approach is achieving
state-of-the-art performance on several tasks, including sentiment analysis,
question type classification, and subjectivity classification.

Cruciform: Solving Crosswords with Natural Language Processing

Dragomir Radev, Rui Zhang, Steve Wilson, Derek Van Assche, Henrique Spyra Gubert, Alisa Krivokapic, MeiXing Dong, Chongruo Wu, Spruce Bondera, Luke Brandl, Jeremy Dohmann
Subjects: Computation and Language (cs.CL)

Crossword puzzles are popular word games that require not only a large
vocabulary, but also a broad knowledge of topics. Answering each clue is a
natural language task on its own as many clues contain nuances, puns, or
counter-intuitive word definitions. Additionally, it can be extremely difficult
to ascertain definitive answers without the constraints of the crossword grid
itself. This task is challenging for both humans and computers. We describe
here a new crossword solving system, Cruciform. We employ a group of natural
language components, each of which returns a list of candidate words with
scores when given a clue. These lists are used in conjunction with the fill
intersections in the puzzle grid to formulate a constraint satisfaction
problem, in a manner similar to the one used in the Dr. Fill system. We
describe the results of several of our experiments with the system.

A Convolutional Encoder Model for Neural Machine Translation

Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin
Comments: 13 pages
Subjects: Computation and Language (cs.CL)

The prevalent approach to neural machine translation relies on bi-directional
LSTMs to encode the source sentence. In this paper we present a faster and
conceptually simpler architecture based on a succession of convolutional
layers. This allows to encode the entire source sentence simultaneously
compared to recurrent networks for which computation is constrained by temporal
dependencies. We achieve a new state-of-the-art on WMT’16 English-Romanian
translation and outperform several recently published results on the WMT’15
English-German task. We also achieve almost the same accuracy as a very deep
LSTM setup on WMT’14 English-French translation. Our convolutional encoder
speeds up CPU decoding by more than two times at the same or higher accuracy as
a strong bi-directional LSTM baseline.

Balotage in Argentina 2015, a sentiment analysis of tweets

Twitter social network contains a large amount of information generated by
its users. That information is composed of opinions and comments that may
reflect trends in social behavior. There is talk of trend when it is possible
to identify opinions and comments geared towards the same shared by a lot of
people direction. To determine if two or more written opinions share the same
address, techniques Natural Language Processing (NLP) are used. This paper
proposes a methodology for predicting reflected in Twitter from the use of
sentiment analysis functions NLP based on social behaviors. The case study was
selected the 2015 Presidential in Argentina, and a software architecture Big
Data composed Vertica data base with the component called Pulse was used.
Through the analysis it was possible to detect trends in voting intentions with
regard to the presidential candidates, achieving greater accuracy in predicting
that achieved with traditional systems surveys.

Distributed, Parallel, and Cluster Computing

Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim
Comments: 11 pages, 10 figures, 4 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

B-spline based orbital representations are widely used in Quantum Monte Carlo
(QMC) simulations of solids, historically taking as much as 50% of the total
run time. Random accesses to a large four-dimensional array make it challenging
to efficiently utilize caches and wide vector units of modern CPUs. We present
node-level optimizations of B-spline evaluations on multi/many-core shared
memory processors. To increase SIMD efficiency and bandwidth utilization, we
first apply data layout transformation from array-of-structures to
structure-of-arrays (SoA). Then by blocking SoA objects, we optimize cache
reuse and get sustained throughput for a range of problem sizes. We implement
efficient nested threading in B-spline orbital evaluation kernels, paving the
way towards enabling strong scaling of QMC simulations. These optimizations are
portable on four distinct cache-coherent architectures and result in up to 5.6x
performance enhancements on Intel Xeon Phi processor 7250P (KNL), 5.7x on Intel
Xeon Phi coprocessor 7120P, 10x on an Intel Xeon processor E5v4 CPU and 9.5x on
BlueGene/Q processor. Our nested threading implementation shows nearly ideal
parallel efficiency on KNL up to 16 threads. We employ roofline performance
analysis to model the impacts of our optimizations. This work combined with our
current efforts of optimizing other QMC kernels, result in greater than 4.5x
speedup of miniQMC on KNL.

Multidimensional Asymptotic Consensus in Dynamic Networks

Bernadette Charron-Bost, Matthias Függer, Thomas Nowak
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

We study the problem of asymptotic consensus as it occurs in a wide range of
applications in both man-made and natural systems. In particular, we study
systems with directed communication graphs that may change over time.

We recently proposed a new family of convex combination algorithms in
dimension one whose weights depend on the received values and not only on the
communication topology. Here, we extend this approach to arbitrarily high
dimensions by introducing two new algorithms: the ExtremePoint and the Centroid
algorithm. Contrary to classical convex combination algorithms, both have
component-wise contraction rates that are constant in the number of agents.
Paired with a speed-up technique for convex combination algorithms, we get a
convergence time linear in the number of agents, which is optimal.

Besides their respective contraction rates, the two algorithms differ in the
fact that the Centroid algorithm’s update rule is independent of any coordinate
system while the ExtremePoint algorithm implicitly assumes a common agreed-upon
coordinate system among agents. The latter assumption may be realistic in some
man-made multi-agent systems but is highly questionable in systems designed for
the modelization of natural phenomena.

Finally we prove that our new algorithms also achieve asymptotic consensus
under very weak connectivity assumptions, provided that agent interactions are
bidirectional.

Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries

Tadeusz Tomczak, Roman G. Szafran
Comments: 14 pages, 14 figures, sent to Computer Physics Comunications
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

We describe a high-performance implementation of the lattice Boltzmann method
(LBM) for sparse 3D geometries on graphic processors (GPU). The main
contribution of this work is a data layout that allows to minimise the number
of redundant memory transactions during the propagation step of LBM. We show
that by using a uniform mesh of small three-dimensional tiles and a careful
data placement it is possible to utilise more than 70% of maximum theoretical
GPU memory bandwidth for D3Q19 lattice and double precision numbers. The
performance of our implementation is thoroughly examined and compared with
other GPU implementations of LBM. The proposed method performs the best for
sparse geometries with good spatial locality.

Automated Application Offloading through Ant-inspired Decision-Making

Roya Golchay (CITI), Frédéric Le Mouël (CITI), Julien Ponge (CITI), Nicolas Stouls (CITI)
Journal-ref: Proceedings of the 13th International Conference on New
Technologies in Distributed Systems (NOTERE’2016), Jul 2016, Paris, France
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

-The explosive trend of smartphone usage as the most effective and convenient
communication tools of human life in recent years make developers build ever
more complex smartphone applications. Gaming, navigation, video editing,
augmented reality, and speech recognition applications require considerable
computational power and energy. Although smart- phones have a wide range of
capabilities – GPS, WiFi, cameras – their inherent limitations – frequent
disconnections, mobility – and significant constraints – size, lower weights,
longer battery life – make difficult to exploiting their full potential to run
complex applications. Several research works have proposed solutions in
application offloading domain, but few ones concerning the highly changing
properties of the environment. To address these issues, we realize an automated
application offloading middleware, ACOMMA, with dynamic and re-adaptable
decision-making engine. The decision engine of ACOMMA is based on an ant-
inspired algorithm.

An Optimal Ancestry Labeling Scheme with Applications to XML Trees and Universal Posets

Pierre Fraigniaud (IRIF), Amos Korman (IRIF)
Journal-ref: Journal of the ACM, ACM, 2016, 63, pp.1 – 31
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we solve the ancestry-labeling scheme problem which aims at
assigning the shortest possible labels (bit strings) to nodes of rooted trees,
so that ancestry queries between any two nodes can be answered by inspecting
their assigned labels only. This problem was introduced more than twenty years
ago by Kannan et al. [STOC ’88], and is among the most well-studied problems in
the field of informative labeling schemes. We construct an ancestry-labeling
scheme for n-node trees with label size log 2 n + O(log log n) bits, thus
matching the log 2 n + (Omega)(log log n) bits lower bound given by Alstrup et
al. [SODA ’03]. Our scheme is based on a simplified ancestry scheme that
operates extremely well on a restricted set of trees. In particular, for the
set of n-node trees with depth at most d, the simplified ancestry scheme enjoys
label size of log 2 n + 2 log 2 d + O(1) bits. Since the depth of most XML
trees is at most some small constant, such an ancestry scheme may be of
practical use. In addition, we also obtain an adjacency-labeling scheme that
labels n-node trees of depth d with labels of size log 2 n + 3 log 2 d + O(1)
bits. All our schemes assign the labels in linear time, and guarantee that any
query can be answered in constant time. Finally, our ancestry scheme finds
applications to the construction of small universal partially ordered sets
(posets). Specifically, for any fixed integer k, it enables the construction of
a universal poset of size~Osize~ size~O(n k) for the family of n-element posets
with tree-dimension at most k. Up to lower order terms, this bound is tight
thanks to a lower bound of n k–o(1) due to Alon and Scheinerman [Order ’88].

GPU-Based Parallel Integration of Large Numbers of Independent ODE Systems

Kyle E Niemeyer, Chih-Jen Sung
Comments: 21 pages, 2 figures
Journal-ref: Numerical Computations with GPUs, Ch. 8 (2014) 159-182. V
Kindratenko (Ed.)
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)

The task of integrating a large number of independent ODE systems arises in
various scientific and engineering areas. For nonstiff systems, common explicit
integration algorithms can be used on GPUs, where individual GPU threads
concurrently integrate independent ODEs with different initial conditions or
parameters. One example is the fifth-order adaptive Runge-Kutta-Cash-Karp
(RKCK) algorithm. In the case of stiff ODEs, standard explicit algorithms
require impractically small time-step sizes for stability reasons, and implicit
algorithms are therefore commonly used instead to allow larger time steps and
reduce the computational expense. However, typical high-order implicit
algorithms based on backwards differentiation formulae (e.g., VODE, LSODE)
involve complex logical flow that causes severe thread divergence when
implemented on GPUs, limiting the performance. Therefore, alternate algorithms
are needed. A GPU-based Runge-Kutta-Chebyshev (RKC) algorithm can handle
moderate levels of stiffness and performs significantly faster than not only an
equivalent CPU version but also a CPU-based implicit algorithm (VODE) based on
results shown in the literature. In this chapter, we present the mathematical
background, implementation details, and source code for the RKCK and RKC
algorithms for use integrating large numbers of independent systems of ODEs on
GPUs. In addition, brief performance comparisons are shown for each algorithm,
demonstrating the potential benefit of moving to GPU-based ODE integrators.

Learning

Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

We study a variant of the variational autoencoder model with a Gaussian
mixture as a prior distribution, with the goal of performing unsupervised
clustering through deep generative models. We observe that the standard
variational approach in these models is unsuited for unsupervised clustering,
and mitigate this problem by leveraging a principled information-theoretic
regularisation term known as consistency violation. Adding this term to the
standard variational optimisation objective yields networks with both
meaningful internal representations and well-defined clusters. We demonstrate
the performance of this scheme on synthetic data, MNIST and SVHN, showing that
the obtained clusters are distinct, interpretable and result in achieving
higher performance on unsupervised clustering classification than previous
approaches.

Gradients of Counterfactuals

Mukund Sundararajan, Ankur Taly, Qiqi Yan
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Gradients have been used to quantify feature importance in machine learning
models. Unfortunately, in nonlinear deep networks, not only individual neurons
but also the whole network can saturate, and as a result an important input
feature can have a tiny gradient. We study various networks, and observe that
this phenomena is indeed widespread, across many inputs.

We propose to examine interior gradients, which are gradients of
counterfactual inputs constructed by scaling down the original input. We apply
our method to the GoogleNet architecture for object recognition in images, as
well as a ligand-based virtual screening network with categorical features and
an LSTM based language model for the Penn Treebank dataset. We visualize how
interior gradients better capture feature importance. Furthermore, interior
gradients are applicable to a wide variety of deep networks, and have the
attribution property that the feature importance scores sum to the the
prediction score.

Best of all, interior gradients can be computed just as easily as gradients.
In contrast, previous methods are complex to implement, which hinders practical
adoption.

PixelSNE: Visualizing Fast with Just Enough Precision via Pixel-Aligned Stochastic Neighbor Embedding

Minjeong Kim, Minsuk Choi, Sunwoong Lee, Jian Tang, Haesun Park, Jaegul Choo
Subjects: Learning (cs.LG)

Embedding and visualizing large-scale high-dimensional data in a
two-dimensional space is an important problem since such visualization can
reveal deep insights out of complex data. Most of the existing embedding
approaches, however, run on an excessively high precision, ignoring the fact
that at the end, embedding outputs are converted into coarse-grained discrete
pixel coordinates in a screen space. Motivated by such an observation and
directly considering pixel coordinates in an embedding optimization process, we
accelerate Barnes-Hut tree-based t-distributed stochastic neighbor embedding
(BH-SNE), known as a state-of-the-art 2D embedding method, and propose a novel
method called PixelSNE, a highly-efficient, screen resolution-driven 2D
embedding method with a linear computational complexity in terms of the number
of data items. Our experimental results show the significantly fast running
time of PixelSNE by a large margin against BH-SNE, while maintaining the
minimal degradation in the embedding quality. Finally, the source code of our
method is publicly available at this https URL

An Efficient Approach to Boosting Performance of Deep Spiking Network Training

Seongsik Park, Sang-gil Lee, Hyunha Nam, Sungroh Yoon
Comments: 9 pages, 5 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Nowadays deep learning is dominating the field of machine learning with
state-of-the-art performance in various application areas. Recently, spiking
neural networks (SNNs) have been attracting a great deal of attention, notably
owning to their power efficiency, which can potentially allow us to implement a
low-power deep learning engine suitable for real-time/mobile applications.
However, implementing SNN-based deep learning remains challenging, especially
gradient-based training of SNNs by error backpropagation. We cannot simply
propagate errors through SNNs in conventional way because of the property of
SNNs that process discrete data in the form of a series. Consequently, most of
the previous studies employ a workaround technique, which first trains a
conventional weighted-sum deep neural network and then maps the learning
weights to the SNN under training, instead of training SNN parameters directly.
In order to eliminate this workaround, recently proposed is a new class of SNN
named deep spiking networks (DSNs), which can be trained directly (without a
mapping from conventional deep networks) by error backpropagation with
stochastic gradient descent. In this paper, we show that the initialization of
the membrane potential on the backward path is an important step in DSN
training, through diverse experiments performed under various conditions.
Furthermore, we propose a simple and efficient method that can improve DSN
training by controlling the initial membrane potential on the backward path. In
our experiments, adopting the proposed approach allowed us to boost the
performance of DSN training in terms of converging time and accuracy.

Learning Dynamic Programming with Split-Merge Networks

Alex Noway, Joan Bruna
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We consider the learning of algorithmic tasks by mere observation of
input-output pairs. Rather than studying this as a black-box discrete
regression problem with no assumption whatsoever on the input-output mapping,
we concentrate on tasks that are amenable to the principle of emph{divide and
conquer}, and study what are its implications in terms of learning.

This principle creates a powerful inductive bias that we exploit with neural
architectures that are defined recursively, by learning two scale-invariant
atomic operators: how to emph{split} a given input into two disjoint sets, and
how to emph{merge} two partially solved tasks into a larger partial solution.
The scale invariance creates parameter sharing across all stages of the
architecture, and the dynamic design creates architectures whose complexity can
be tuned in a differentiable manner.

As a result, our model is trained by backpropagation not only to minimize the
errors at the output, but also to do so as efficiently as possible, by
enforcing shallower computation graphs. Moreover, thanks to the scale
invariance, the model can be trained only with only input/output pairs,
removing the need to know oracle intermediate split and merge decisions. As it
turns out, accuracy and complexity are not independent qualities, and we verify
empirically that when the learnt complexity matches the underlying complexity
of the task, this results in higher accuracy and better generalization in two
paradigmatic problems: sorting and finding planar convex hulls.

Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

David Balduzzi, Brian McWilliams, Tony Butler-Yeoman
Comments: 13 pages, 6 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Modern convolutional networks, incorporating rectifiers and max-pooling, are
neither smooth nor convex. Standard guarantees therefore do not apply.
Nevertheless, methods from convex optimization such as gradient descent and
Adam are widely used as building blocks for deep learning algorithms. This
paper provides the first convergence guarantee applicable to modern convnets.
The guarantee matches a lower bound for convex nonsmooth functions. The key
technical tool is the neural Taylor approximation — a straightforward
application of Taylor expansions to neural networks — and the associated
Taylor loss. Experiments on a range of optimizers, layers, and tasks provide
evidence that the analysis accurately captures the dynamics of neural
optimization.

The second half of the paper applies the Taylor approximation to isolate the
main difficulty in training rectifier nets: that gradients are shattered. We
investigate the hypothesis that, by exploring the space of activation
configurations more thoroughly, adaptive optimizers such as RMSProp and Adam
are able to converge to better solutions.

Learning from Untrusted Data

The vast majority of theoretical results in machine learning and statistics
assume that the available training data is a reasonably reliable reflection of
the phenomena to be learned or estimated. Similarly, the majority of machine
learning and statistical techniques used in practice are brittle to the
presence of large amounts of biased or malicious data. In this work we propose
two novel frameworks in which to study estimation, learning, and optimization
in the presence of significant fractions of arbitrary data. The first
framework, which we term list-decodable learning, asks whether it is possible
to return a list of answers, with the guarantee that at least one of them is
accurate. For example, given a dataset of (n) points for which an unknown
subset of (alpha n) points are drawn from a distribution of interest, and no
assumptions are made about the remaining ((1-alpha)n) points, is it possible
to return a list of (poly(1/alpha)) answers, one of which is correct? The
second framework, which we term the semi-verified learning model considers the
extent to which a small dataset of trusted data (drawn from the distribution in
question) can be leveraged to enable the accurate extraction of information
from a much larger but untrusted dataset (of which only an (alpha)-fraction is
drawn from the distribution).

We show strong positive results in both settings, and provide an algorithm
for robust learning in a very general stochastic optimization setting. This
general result has immediate implications for robust estimation in a number of
settings, including for robustly estimating the mean of distributions with
bounded second moments, robustly learning mixtures of such distributions, and
robustly finding planted partitions in random graphs in which significant
portions of the graph have been perturbed by an adversary.

Unsupervised Pretraining for Sequence to Sequence Learning

Prajit Ramachandran, Peter J. Liu, Quoc V. Le
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Sequence to sequence models are successful tools for supervised sequence
learning tasks, such as machine translation. Despite their success, these
models still require much labeled data and it is unclear how to improve them
using unlabeled data, which is much less expensive to obtain. In this paper, we
present simple changes that lead to a significant improvement in the accuracy
of seq2seq models when the labeled set is small. Our method intializes the
encoder and decoder of the seq2seq model with the trained weights of two
language models, and then all weights are jointly fine-tuned with labeled data.
An additional language modeling loss can be used to regularize the model during
fine-tuning. We apply this method to low-resource tasks in machine translation
and abstractive summarization and find that it significantly improves the
subsequent supervised models. Our main finding is that the pretraining
accelerates training and improves generalization of seq2seq models, achieving
state-of-the-art results on the WMT English(
ightarrow)German task. Our model
obtains an improvement of 1.3 BLEU from the previous best models on both WMT’14
and WMT’15 English(
ightarrow)German. Our ablation study shows that
pretraining helps seq2seq models in different ways depending on the nature of
the task: translation benefits from the improved generalization whereas
summarization benefits from the improved optimization.

Sentence Ordering using Recurrent Neural Networks

Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

Modeling the structure of coherent texts is a task of great importance in
NLP. The task of organizing a given set of sentences into a coherent order has
been commonly used to build and evaluate models that understand such structure.
In this work we propose an end-to-end neural approach based on the recently
proposed set to sequence mapping framework to address the sentence ordering
problem. Our model achieves state-of-the-art performance in the order
discrimination task on two datasets widely used in the literature. We also
consider a new interesting task of ordering abstracts from conference papers
and research proposals and demonstrate strong performance against recent
methods. Visualizing the sentence representations learned by the model shows
that the model has captured high level logical structure in these paragraphs.
The model also learns rich semantic sentence representations by learning to
order texts, performing comparably to recent unsupervised representation
learning methods in the sentence similarity and paraphrase detection tasks.

Cognitive Discriminative Mappings for Rapid Learning

Wen-Chieh Fang, Yi-ting Chiang
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Humans can learn concepts or recognize items from just a handful of examples,
while machines require many more samples to perform the same task. In this
paper, we build a computational model to investigate the possibility of this
kind of rapid learning. The proposed method aims to improve the learning task
of input from sensory memory by leveraging the information retrieved from
long-term memory. We present a simple and intuitive technique called cognitive
discriminative mappings (CDM) to explore the cognitive problem. First, CDM
separates and clusters the data instances retrieved from long-term memory into
distinct classes with a discrimination method in working memory when a sensory
input triggers the algorithm. CDM then maps each sensory data instance to be as
close as possible to the median point of the data group with the same class.
The experimental results demonstrate that the CDM approach is effective for
learning the discriminative features of supervised classifications with few
training sensory input instances.

Domain Adaptation with L2 constraints for classifying images from different endoscope systems

This paper proposes a method for domain adaptation that extends the maximum
margin domain transfer (MMDT) proposed by Hoffman et al., by introducing L_2
distance constraints between samples of different domains; thus, our method is
denoted as MMDTL2. Motivated by the differences between the images taken by
narrow band imaging (NBI) endoscopic devices, we utilize different NBI devices
as different domains and estimate the transformations between samples of
different domains, i.e., image samples taken by different NBI endoscope
systems. We first formulate the problem in the primal form, and then derive the
dual form with much lesser computational costs as compared to the naive
approach. From our experimental results using NBI image datasets from two
different NBI endoscopic devices, we find that MMDTL2 is more stable than MMDT
and better than support vector machines without adaptation.

An Online Prediction Framework for Non-Stationary Time Series

Christopher Xie, Avleen Bijral, Juan Lavista Ferres
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We extend an online time series prediction algorithm for ARMA processes to
describe a framework for time series prediction that can efficiently handle
non-stationarities that exist in many real time series. We show that
appropriate transformations to such time series can lead to theoretical and
empirical gains. To account for the phenomenon of cointegration in the
multivariate case, we present a novel algorithm EC-VARMA-OGD that estimates
both the auto-regressive and the cointegrating parameters. Relaxing the
assumptions for the analysis, we prove a sub-linear regret bound for all the
methods described. We note that the theoretical guarantees do not provide a
complete picture, thus we provide a data-dependent analysis of the
follow-the-leader algorithm for least squares loss that explains the success of
using non-stationary transformations. We support all of our results with
experiments on simulated and real data.

Adversarial Ladder Networks

Juan Maroñas Molano, Alberto Albiol Colomer, Roberto Paredes Palacios
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

The use of unsupervised data in addition to supervised data in training
discriminative neural networks has improved the performance of this clas-
sification scheme. However, the best results were achieved with a training
process that is divided in two parts: first an unsupervised pre-training step
is done for initializing the weights of the network and after these weights are
refined with the use of supervised data. On the other hand adversarial noise
has improved the results of clas- sical supervised learning. Recently, a new
neural network topology called Ladder Network, where the key idea is based in
some properties of hierar- chichal latent variable models, has been proposed as
a technique to train a neural network using supervised and unsupervised data at
the same time with what is called semi-supervised learning. This technique has
reached state of the art classification. In this work we add adversarial noise
to the ladder network and get state of the art classification, with several
important conclusions on how adversarial noise can help in addition with new
possible lines of investi- gation. We also propose an alternative to add
adversarial noise to unsu- pervised data.

Learning Influence Functions from Incomplete Observations

Xinran He, Ke Xu, David Kempe, Yan Liu
Comments: Full version of paper “Learning Influence Functions from Incomplete Observations” in NIPS16
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of learning influence functions under incomplete
observations of node activations. Incomplete observations are a major concern
as most (online and real-world) social networks are not fully observable. We
establish both proper and improper PAC learnability of influence functions
under randomly missing observations. Proper PAC learnability under the
Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade
(DIC) models is established by reducing incomplete observations to complete
observations in a modified graph. Our improper PAC learnability result applies
for the DLT and DIC models as well as the Continuous-Time Independent Cascade
(CIC) model. It is based on a parametrization in terms of reachability
features, and also gives rise to an efficient and practical heuristic.
Experiments on synthetic and real-world datasets demonstrate the ability of our
method to compensate even for a fairly large fraction of missing observations.

Information Theory

Tradeoff Caching Strategy of Outage Probability and Fronthaul Usage in Cloud-RAN

Zhun Ye, Cunhua Pan, Huiling Zhu, Jiangzhou Wang
Comments: submitted to one journal for possible publication
Subjects: Information Theory (cs.IT)

In this paper, optimal content caching strategy is proposed to jointly
minimize the cell average outage probability and fronthaul usage in cloud radio
access network (Cloud-RAN). At first, an accurate closed form expression of the
outage probability conditioned on the user’s location is presented, and the
cell average outage probability is obtained through the composite Simpson’s
integration. The caching strategy for jointly optimizing the cell average
outage probability and fronthaul usage is then formulated as a weighted sum
minimization problem, which is a nonlinear 0-1 integer NP-hard problem. In
order to deal with the NP-hard problem, two heuristic algorithms are proposed
in this paper. Firstly, a genetic algorithm (GA) based approach is proposed.
Numerical results show that the performance of the proposed GA-based approach
with significantly reduced computational complexity is close to the optimal
performance achieved by exhaustive search based caching strategy. Secondly, in
order to further reduce the computational complexity, a mode selection approach
is proposed. Numerical results show that this approach can achieve near-optimal
performance over a wide range of the weighting factors through a single
computation.

RSB Decoupling Property of MAP Estimators

Ali Bereyhi, Ralf R. Müller, Hermann Schulz-Baldes
Comments: 5 pages, presented in Information Theory Workshop 2016
Subjects: Information Theory (cs.IT)

The large-system decoupling property of a MAP estimator is studied when it
estimates the i.i.d. vector (oldsymbol{x}) from the observation
(oldsymbol{y}=mathbf{A}oldsymbol{x}+oldsymbol{z}) with (mathbf{A})
being chosen from a wide range of matrix ensembles, and the noise vector
(oldsymbol{z}) being i.i.d. and Gaussian. Using the replica method, we show
that the marginal joint distribution of any two corresponding input and output
symbols converges to a deterministic distribution which describes the
input-output distribution of a single user system followed by a MAP estimator.
Under the (b)RSB assumption, the single user system is a scalar channel with
additive noise where the noise term is given by the sum of an independent
Gaussian random variable and (b) correlated interference terms. As the (b)RSB
assumption reduces to RS, the interference terms vanish which results in the
formerly studied RS decoupling principle.

On the Uplink Spectral Efficiency of Full-Duplex Cooperative OFDMA Systems

Jafar Banar, S. Mohammad Razavizadeh
Comments: 5 pages, 4 figures, conference
Subjects: Information Theory (cs.IT)

In this paper, we develop a resource allocation algorithm for uplink of
in-band full-duplex (FD) cellular networks. The FD cellular network is assumed
to be based on orthogonal frequency division multiple access (OFDMA) and
consists of a base station communicating with multiple users. Some of the users
in the network act as relay for other users and help them to transmit their
data to the base station. These relays are FD and work based on amplify and
forward (AF) protocol. By appropriate selection of the relays and optimized
allocation of subcarriers and powers to all users, we try to maximize the total
sum rate of the network. During this optimization, we also impose some
constraints on the users’ quality of service (QoS) and power. We propose a new
algorithm to select the best relays based on the users’ maximum data rate and
also use Linear Assignment Problem Jonker-Volgenant (LAPJV) algorithm for
subcarrier assignment. It is proved that the resulting optimization problem can
be converted to a convex problem, and hence it can be solved by standard
numerical methods. The simulation results demonstrate the effect of the
proposed scheme on the sum rate and coverage of the network.

Spectrum sharing in energy harvesting cognitive radio networks: A cross-layer perspective

Tian Zhang, Wei Chen
Comments: Conference paper
Subjects: Information Theory (cs.IT)

In the paper, we present a cross-layer perspective on data transmission in
energy harvesting cognitive radio networks (CRNs). The physical layer power
allocation and network layer delay are jointly considered. The delay optimal
power allocation is studied taking account the randomness of harvested energy,
data generation, channel state and the grid price. To guarantee PU’s
transmission, its Signal-Interference-Ratio (SIR) should be no less than a
threshold. Each user, including primary user (PU) as well as secondary user
(SU), has energy harvesting devices, and the PU can also purchases the grid
power. Each user is rational and selfish to minimize its own the buffer delay.
We formulate a stochastic Stackelberg game in a bilevel manner. After
decoupling via rewriting the objective and constraints, an equivalent tractable
reconstruction is derived. First, we give a distributive algorithm to obtain
the Nash equilibrium (NE) of the lower level SUs’ noncooperative stochastic
game. Thereafter, the stochastic Stackelberg game is discussed under the
circumstances that there is no information exchange between PU and SU.
Distributed iterative algorithms are designed. Furthermore, a distributive
online algorithm is proposed. Finally, simulations are carried out to verify
the correctness and demonstrate the effectiveness of proposed algorithms.

Single-Tap Precoders and Decoders for Multi-User MIMO FBMC-OQAM under Strong Channel Frequency Selectivity

François Rottenberg, Xavier Mestre, François Horlin, Jérôme Louveaux
Journal-ref: IEEE Transactions on Signal Processing (2016)
Subjects: Information Theory (cs.IT)

The design of linear precoders or decoders for multiuser (MU) multiple-input
multiple-output (MIMO) filterbank multicarrier (FBMC) modulations in the case
of strong channel frequency selectivity is presented. The users and the base
station (BS) communicate using space division multiple access (SDMA). The low
complexity proposed solution is based on a single tap per-subcarrier
precoding/decoding matrix at the base station (BS) in the downlink/uplink. As
opposed to classical approaches that assume flat channel frequency selectivity
at the subcarrier level, the BS does not make this assumption and takes into
account the distortion caused by channel frequency selectivity. The expression
of the FBMC asymptotic mean squared error (MSE) in the case of strong channel
selectivity derived in earlier works is developed and extended. The linear
precoders and decoders are found by optimizing the MSE formula under two design
criteria, namely zero forcing (ZF) or minimum mean squared error (MMSE).
Finally, simulation results demonstrate the performance of the optimized
design. As long as the number of BS antennas is larger than the number of
users, it is shown that those extra degrees of freedom can be used to
compensate for the channel frequency selectivity.

Policy Optimization for Content Push via Energy Harvesting Small Cells in Heterogeneous Networks

Jie Gong, Sheng Zhou, Zhenyu Zhou, Zhisheng Niu
Comments: 30 pages, 7 figures, to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

Motivated by the rapid development of energy harvesting technology and
content-aware communication in access networks, this paper considers the push
mechanism design in small-cell base stations (SBSs) powered by renewable
energy. A user request can be satisfied by either push or unicast from the SBS.
If the SBS cannot handle the request, the user is blocked by the SBS and is
served by the macro-cell BS (MBS) instead, which typically consumes more
energy. We aim to minimize the ratio of user requests blocked by the SBS. With
finite battery capacity, Markov decision process based problem is formulated,
and the optimal policy is found by dynamic programming (DP). Two
threshold-based policies are proposed: the push-only threshold-based (POTB)
policy and the energy-efficient threshold-based (EETB) policy, and the
closed-form blocking probabilities with infinite battery capacity are derived.
Numerical results show that the proposed policies outperform the conventional
non-push policy if the content popularity changes slowly or the content request
generating rate is high, and can achieve the performance of the greedy optimal
threshold-based (GOTB) policy. In addition, the performance gap between the
threshold-based policies and the DP optimal policy is small when the energy
arrival rate is low or the request generating rate is high.

Low-Complexity QoS-Aware Coordinated Scheduling for Heterogenous Networks

Jun Zhu, Hong-Chuan Yang
Comments: 6 pages, 6 figures, accepted by IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)

In this paper, we consider a heterogenous network (HetNet), where low-power
indoor femtocells are deployed in the coverage area of the existing macro base
station (MBS). This paper proposes a novel coordinated random beamforming and
user scheduling strategy to improve the throughput of users served by the
femtocell access point (FAP) while satisfying the quality-of-service (QoS)
requirements of users served by both MBS and FAP. The strategy, termed as
QoS-Aware Coodinated Scheduling (QACS), requires limited coordination between
the MBS and FAP, i.e., only the indexes of the qualified beams are shared.
Exact statistical analysis for the ergodic achievable rate of both FAP and MBS
with the proposed strategy are presented. Scheduling fairness is also addressed
for the proposed QACS.

On a condition equivalent to the Maximal Distance Separable conjecture

Jeffery Sun, Steven Damelin, Daniel Kaiser
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

We prove that the following are equivalent. We denote by (mathcal{P}_q) the
vector space of functions from a finite field (mathbb{F}_q) to itself, which
can be represented as the space (mathcal{P}_q := mathbb{F}_q[x]/(x^q-x)) of
polynomial functions. We denote by (mathcal{O}_n subset mathcal{P}_q) the
set of polynomials that are either the zero polynomial, or have at most (n)
distinct roots in (mathbb{F}_q). Given two subspaces (Y,Z) of (mathcal{P}_q),
we denote by (langle Y,Z
angle) their span.

A) Let (k, q) integers, with (q) a prime power and (2 leq k leq q). Suppose
that either:

1) (q) is odd

2) (q) is even and (k
otin {3, q-1}).

Then there do not exist distinct subspaces (Y) and (Z) of (mathcal{P}_q)
such that:

1′) (dim(langle Y, Z
angle) = k)

2′) (dim(Y) = dim(Z) = k-1).

3′) (langle Y, Z
angle subset mathcal{O}_{k-1})

4′) (Y, Z subset mathcal{O}_{k-2})

5′) (Ycap Z subset mathcal{O}_{k-3}).

B) The MDS conjecture is true.

Optimal Dynamic Point Selection for Power Minimization in Multiuser Downlink CoMP

Duy H. N. Nguyen, Long B. Le, Tho Le-Ngoc
Comments: 14 pages, 9 figures, accepted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

This paper examines a CoMP system where multiple base-stations (BS) employ
coordinated beamforming to serve multiple mobile-stations (MS). Under the
dynamic point selection mode, each MS can be assigned to only one BS at any
time. This work then presents a solution framework to optimize the BS
associations and coordinated beamformers for all MSs. With target
signal-to-interference-plus-noise ratios at the MSs, the design objective is to
minimize either the weighted sum transmit power or the per-BS transmit power
margin. Since the original optimization problems contain binary variables
indicating the BS associations, finding their optimal solutions is a
challenging task. To circumvent this difficulty, we first relax the original
problems into new optimization problems by expanding their constraint sets.
Based on the nonconvex quadratic constrained quadratic programming framework,
we show that these relaxed problems can be solved optimally. Interestingly,
with the first design objective, the obtained solution from the relaxed problem
is also optimal to the original problem. With the second design objective, a
suboptimal solution to the original problem is then proposed, based on the
obtained solution from the relaxed problem. Simulation results show that the
resulting jointly optimal BS association and beamforming design significantly
outperforms fixed BS association schemes.

Chaos-based Wireless Communication Resisting Multipath Effect

Junliang Yao, Chen Li, Haipeng Ren
Comments: 5 pages, 3 figures, 1 table
Subjects: Information Theory (cs.IT)

In additive white gaussian noise (AWGN) channel, chaos has been proved to be
optimal coherent communication waveform in the sense of maximizing the
signal-to-noise ratio (SNR). In wireless communication system, intersymbol
interference (ISI) caused by multipath propagation is one of main obstacles to
achieve high bit transmission rate and low bit error rate (BER). How to resist
multipath effect is a fundamental problem in chaos-based wireless communication
system (CWCS). In this paper, a CWCS is built by transmitting chaotic signals
generated by a hybrid dynamical system and filtering the received signal using
the corresponding matched filter to decrease noise effect and to detect binary
information. We find that the multipath effect can be effectively resisted by
regrouping the return map of received signal and setting the corresponding
threshold using available information. We show that the optimal threshold is a
function of the channel parameters and the transmitted information symbols.
Practically, the channel parameters are time-variant, and the future
information symbols are unavailable. In this case, a suboptimal threshold (SOT)
is proposed, and the BER using the SOT is derived analytically. Numerical
experimental results show that the CWCS achieves a very competitive performance
even under inaccurate channel parameters.

Spectral Statistics of Lattice Graph Percolation Models

Stephen Kruzick, Jose M. F. Moura
Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT)

In graph signal processing, the graph adjacency matrix or the graph Laplacian
commonly define the shift operator. The spectral decomposition of the shift
operator plays an important role in that the eigenvalues represent frequencies
and the eigenvectors provide a spectral basis. This is useful, for example, in
the design of filters. However, the graph or network may be uncertain due to
stochastic influences in construction and maintenance, and, under such
conditions, the eigenvalues of the shift matrix become random variables. This
paper examines the spectral distribution of the eigenvalues of random networks
formed by including each link of a D-dimensional lattice supergraph
independently with identical probability, a percolation model. Using the
stochastic canonical equation methods developed by Girko for symmetric matrices
with independent upper triangular entries, a deterministic distribution is
found that asymptotically approximates the empirical spectral distribution of
the scaled adjacency matrix for a model with arbitrary parameters. The main
results characterize the form of the solution to an important system of
equations that leads to this deterministic distribution function and
significantly reduce the number of equations that must be solved to find the
solution for a given set of model parameters. Simulations comparing the
expected empirical spectral distributions and the computed deterministic
distributions are provided for sample parameters.

Analysis of Static Cellular Cooperation between Mutually Nearest Neighboring Nodes

Luis David Alvarez Corrales, Anastasios Giovanidis, Philippe Martins, Laurent Decreusefond
Comments: 17 pages, double column, Appendices A-D, 9 Figures, 18 total subfigures. arXiv admin note: text overlap with arXiv:1604.04640
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Probability (math.PR)

Cooperation in cellular networks is a promising scheme to improve system
performance. Existing works consider that a user dynamically chooses the
stations that cooperate for his/her service, but such assumption often has
practical limitations. Instead, cooperation groups can be predefined and
static, with nodes linked by fixed infrastructure. To analyze such a potential
network, we propose a grouping method based on node proximity. With the
Mutually Nearest Neighbour Relation, we allow the formation of singles and
pairs of nodes. Given an initial topology for the stations, two new point
processes are defined, one for the singles and one for the pairs. We derive
structural characteristics for these processes and analyse the resulting
interference fields. When the node positions follow a Poisson Point Process
(PPP) the processes of singles and pairs are not Poisson. However, the
performance of the original model can be approximated by the superposition of
two PPPs. This allows the derivation of exact expressions for the coverage
probability. Numerical evaluation shows coverage gains from different signal
cooperation that can reach up to 15% compared to the standard noncooperative
coverage. The analysis is general and can be applied to any type of cooperation
in pairs of transmitting nodes.