
    arXiv Paper Daily: Thu, 10 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-10 00:00:00

    Neural and Evolutionary Computing

    Bio-Inspired Spiking Convolutional Neural Network using Layer-wise Sparse Coding and STDP Learning

    Amirhossein Tavanaei, Anthony S. Maida
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Hierarchical feature discovery using non-spiking convolutional neural
    networks (CNNs) has attracted much recent interest in machine learning and
    computer vision. However, it is still not well understood how to create spiking
    deep networks with multi-layer, unsupervised learning. One advantage of spiking
    CNNs is their bio-realism. Another advantage is that they represent information
    using sparse spike-trains which enable power-efficient implementation. This
    paper explores a novel bio-inspired spiking CNN that is trained in a greedy,
    layer-wise fashion. The proposed network consists of a spiking
    convolutional-pooling layer followed by a feature discovery layer. Kernels for
    the convolutional layer are trained using local learning. The learning is
    implemented using a sparse, spiking auto-encoder representing primary visual
    features. The feature discovery layer is equipped with a probabilistic
    spike-timing-dependent plasticity (STDP) learning rule. This layer represents
    complex visual features using probabilistic leaky, integrate-and-fire (LIF)
    neurons. Our results show that the convolutional layer is stack-admissible,
    enabling it to support multi-layer learning. The visual features obtained
    from the proposed probabilistic LIF neurons in the feature discovery layer are
    used to train a classifier. The classification results indicate that
    independent and informative visual features are extracted in the hierarchy of
    convolutional and feature discovery layers. The proposed model is evaluated on
    the MNIST digit dataset using clean and noisy images. The recognition
    performance for clean images is above 98%. The performance loss for recognizing
    the noisy images is in the range 0.1% to 8.5% depending on noise types and
    densities. This level of performance loss indicates that the network is robust
    to additive noise.
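
    As a rough illustration of the ingredients named above, the following is a
    minimal sketch, not the authors' implementation, of a leaky integrate-and-fire
    neuron with probabilistic spiking and a simplified STDP-style weight update;
    the time constant, threshold, traces and learning rate are illustrative
    assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def lif_step(v, input_current, tau=20.0, v_thresh=1.0, dt=1.0):
        # One Euler step of a leaky integrate-and-fire membrane. Spiking is
        # probabilistic: the spike probability grows with the membrane potential,
        # a simplified stand-in for the paper's probabilistic LIF neuron.
        v = v + dt * (-v / tau + input_current)
        p_spike = 1.0 / (1.0 + np.exp(-(v - v_thresh) / 0.1))
        spike = rng.random() < p_spike
        if spike:
            v = 0.0  # reset after a spike
        return v, spike

    def stdp_update(w, pre_trace, post_spike, lr=0.01):
        # Simplified STDP-like rule: when the postsynaptic neuron spikes,
        # potentiate weights of recently active inputs (tracked by pre_trace)
        # with a mild normalizing depression, then keep weights in [0, 1].
        if post_spike:
            w = w + lr * (pre_trace - w * pre_trace.mean())
        return np.clip(w, 0.0, 1.0)

    # toy run: one neuron, 10 inputs, 100 time steps of Poisson-like input spikes
    w = rng.random(10) * 0.5
    pre_trace = np.zeros(10)
    v = 0.0
    for t in range(100):
        pre_spikes = (rng.random(10) < 0.1).astype(float)
        pre_trace = 0.9 * pre_trace + pre_spikes  # exponential presynaptic trace
        v, post_spike = lif_step(v, float(w @ pre_spikes))
        w = stdp_update(w, pre_trace, post_spike)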

    Lie-Access Neural Turing Machines

    Greg Yang, Alexander M. Rush
    Comments: Submitted to ICLR. Rewrite and improvement of this https URL
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recent work has demonstrated the effectiveness of employing explicit external
    memory structures in conjunction with deep neural models for algorithmic
    learning (Graves et al. 2014; Weston et al. 2014). These models utilize
    differentiable versions of traditional discrete memory-access structures
    (random access, stacks, tapes) to provide the variable-length storage necessary
    for computational tasks. In this work, we propose an alternative model,
    Lie-access memory, that is explicitly designed for the neural setting. In this
    paradigm, memory is accessed using a continuous head in a key-space manifold.
    The head is moved via Lie group actions, such as shifts or rotations, generated
    by a controller, and soft memory access is performed by considering the
    distance to keys associated with each memory. We argue that Lie groups provide
    a natural generalization of discrete memory structures, such as Turing
    machines, as they provide inverse and identity operators while maintaining
    differentiability. To experiment with this approach, we implement several
    simplified Lie-access neural Turing machines (LANTMs) with different Lie
    groups.
    We find that this approach is able to perform well on a range of algorithmic
    tasks.
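
    As a rough sketch of the access mechanism (under assumptions; this is not the
    paper's exact architecture), the head lives in a continuous key space, the
    controller emits a group action such as a planar rotation, and the read is a
    softmax over negative distances to the stored keys:

    import numpy as np

    def rotate(head, theta):
        # Apply a Lie group action (here a 2-D rotation) to the head position.
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]]) @ head

    def lie_access_read(head, keys, values, sharpness=5.0):
        # Soft read: weight each memory cell by the softmax-normalised
        # negative distance between the head and the cell's key.
        dists = np.linalg.norm(keys - head, axis=1)
        logits = -sharpness * dists
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        return weights @ values

    # toy memory of 4 cells with 2-D keys and 3-D values
    keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    values = np.random.randn(4, 3)
    head = np.array([1.0, 0.0])
    head = rotate(head, np.pi / 2)  # controller-generated action: quarter turn
    read_vector = lie_access_read(head, keys, values)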

    Incremental Sequence Learning

    Edwin D. de Jong
    Comments: 18 pages
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep learning research over the past years has shown that by increasing the
    scope or difficulty of the learning problem over time, increasingly complex
    learning problems can be addressed. We study incremental learning in the
    context of sequence learning, using generative RNNs in the form of multi-layer
    recurrent Mixture Density Networks. We introduce Incremental Sequence Learning,
    a simple incremental approach to sequence learning. Incremental Sequence
    Learning starts out by using only the first few steps of each sequence as
    training data. Each time a performance criterion has been reached, the length
    of the parts of the sequences used for training is increased. To evaluate
    Incremental Sequence Learning and comparison methods, we introduce and make
    available a novel sequence learning task and data set: predicting and
    classifying MNIST pen stroke sequences, where the familiar handwritten digit
    images have been transformed to pen stroke sequences representing the skeletons
    of the digits. We find that Incremental Sequence Learning greatly speeds up
    sequence learning and reaches the best test performance level of regular
    sequence learning 20 times faster, reduces the test error by 74%, and in
    general performs more robustly; it displays lower variance and achieves
    sustained progress after all three comparison methods have stopped improving. A
    trained sequence prediction model is also used in transfer learning to the task
    of sequence classification, where it is found that transfer learning realizes
    improved classification performance compared to methods that learn to classify
    from scratch.
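
    The training schedule itself is simple; a hedged sketch of the curriculum
    loop (the model, data handling and performance criterion are placeholders,
    not the paper's code) could look like this:

    def incremental_sequence_training(model, sequences, eval_loss,
                                      start_len=2, step=2, target_loss=1.0,
                                      max_epochs_per_stage=50):
        # Train on progressively longer prefixes of each sequence.
        # model.fit_epoch and eval_loss are assumed helper callables.
        max_len = max(len(s) for s in sequences)
        cur_len = start_len
        while cur_len <= max_len:
            prefixes = [s[:cur_len] for s in sequences]
            for _ in range(max_epochs_per_stage):
                model.fit_epoch(prefixes)  # one pass over the truncated data
                if eval_loss(model, prefixes) <= target_loss:
                    break  # criterion reached: grow the prefixes
            cur_len += step
        return model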

    RL(^2): Fast Reinforcement Learning via Slow Reinforcement Learning

    Yan Duan, John Schulman, Xi Chen, Peter Bartlett, Ilya Sutskever, Pieter Abbeel
    Comments: 14 pages. Under review as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Deep reinforcement learning (deep RL) has been successful in learning
    sophisticated behaviors automatically; however, the learning process requires a
    huge number of trials. In contrast, animals can learn new tasks in just a few
    trials, benefiting from their prior knowledge about the world. This paper seeks
    to bridge this gap. Rather than designing a “fast” reinforcement learning
    algorithm, we propose to represent it as a recurrent neural network (RNN) and
    learn it from data. In our proposed method, RL(^2), the algorithm is encoded in
    the weights of the RNN, which are learned slowly through a general-purpose
    (“slow”) RL algorithm. The RNN receives all information a typical RL algorithm
    would receive, including observations, actions, rewards, and termination flags;
    and it retains its state across episodes in a given Markov Decision Process
    (MDP). The activations of the RNN store the state of the “fast” RL algorithm on
    the current (previously unseen) MDP. We evaluate RL(^2) experimentally on both
    small-scale and large-scale problems. On the small-scale side, we train it to
    solve randomly generated multi-armed bandit problems and finite MDPs. After
    RL(^2) is trained, its performance on new MDPs is close to human-designed
    algorithms with optimality guarantees. On the large-scale side, we test RL(^2)
    on a vision-based navigation task and show that it scales up to
    high-dimensional problems.
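
    The key implementation detail is what the RNN sees at each step and when its
    state is reset; a minimal sketch (the environment sampler and policy network
    are placeholders, not the authors' code) follows:

    import numpy as np

    def rl2_trial(policy_rnn, env_sampler, episodes_per_trial=2):
        # One RL^2 trial: a fresh MDP is sampled, and the RNN state is kept
        # across the episodes within that MDP so it can adapt on the fly.
        env = env_sampler()                  # previously unseen MDP
        hidden = policy_rnn.initial_state()  # reset only at the start of the trial
        prev_action, prev_reward, prev_done = 0, 0.0, 0.0
        for _ in range(episodes_per_trial):
            obs, done = env.reset(), False
            while not done:
                # the input bundles observation, last action, last reward, done flag
                rnn_input = np.concatenate([np.atleast_1d(obs),
                                            [prev_action, prev_reward, prev_done]])
                action, hidden = policy_rnn.step(rnn_input, hidden)
                obs, reward, done, _ = env.step(action)
                prev_action, prev_reward, prev_done = action, reward, float(done)
        return hidden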


    Computer Vision and Pattern Recognition

    Optimal Multiple Surface Segmentation with Convex Priors in Irregularly Sampled Space

    Abhay Shah, Junjie Bai, Michael D. Abramoff, Xiaodong Wu
    Comments: 20 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Optimal surface segmentation is widely used in numerous medical image
    segmentation applications. However, nodes in the graph based optimal surface
    segmentation method typically encode uniformly distributed orthogonal voxels of
    the volume. Thus the segmentation cannot attain an accuracy greater than a
    single unit voxel, i.e. the distance between two adjoining nodes in graph
    space. Segmentation accuracy higher than a unit voxel is achievable by
    exploiting partial volume information in the voxels, which results in
    non-equidistant spacing between adjoining graph nodes. This paper reports a
    generalized graph based optimal multiple surface segmentation method with
    convex priors which segments the target surfaces in irregularly sampled space.
    The proposed method allows non-equidistant spacing between the adjoining graph
    nodes to achieve subvoxel accurate segmentation by utilizing the partial volume
    information in the voxels. The partial volume information in the voxels is
    exploited by computing a displacement field from the original volume data to
    identify the subvoxel accurate centers within each voxel resulting in
    non-equidistant spacing between the adjoining graph nodes. The smoothness of
    each surface modelled as a convex constraint governs the connectivity and
    regularity of the surface. We employ an edge-based graph representation to
    incorporate the necessary constraints and the globally optimal solution is
    obtained by computing a minimum s-t cut. The proposed method was validated on
    25 optical coherence tomography image volumes of the retina and 10
    intravascular multi-frame ultrasound image datasets for subvoxel and super
    resolution segmentation accuracy. In all cases, the approach yielded highly
    accurate results. Our approach can be readily extended to higher-dimensional
    segmentations.
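
    The global optimum rests on a standard minimum s-t cut over the constructed
    graph; a toy illustration of that primitive (a hand-made capacitated graph
    and networkx, not the authors' graph construction) is:

    import networkx as nx

    # A tiny capacitated graph standing in for the much larger segmentation graph.
    G = nx.DiGraph()
    G.add_edge("s", "a", capacity=3.0)
    G.add_edge("s", "b", capacity=2.0)
    G.add_edge("a", "b", capacity=1.0)
    G.add_edge("a", "t", capacity=2.0)
    G.add_edge("b", "t", capacity=3.0)

    cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
    # In the segmentation formulation, nodes on the source side of the cut would
    # correspond to graph nodes lying below the sought surface.
    print(cut_value, sorted(source_side), sorted(sink_side))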

    Node-Adapt, Path-Adapt and Tree-Adapt: Model-Transfer Domain Adaptation for Random Forest

    Azadeh S. Mozafari, David Vazquez, Mansour Jamzad, Antonio M. Lopez
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Random Forest (RF) is a successful paradigm for learning classifiers due to
    its ability to learn from large feature spaces and seamlessly integrate
    multi-class classification, as well as the achieved accuracy and processing
    efficiency. However, like many other classifiers, RF requires domain
    adaptation (DA) when there is a mismatch between the training (source) and
    testing (target) domains, which degrades classification performance.
    Consequently, different RF-DA methods have been proposed, which require not
    only target-domain samples but also revisiting the source-domain ones. As a
    novel contribution, we propose three inherently different methods (Node-Adapt,
    Path-Adapt and Tree-Adapt) that only require the learned source-domain RF and
    relatively few target-domain samples for DA, i.e. source-domain samples do not
    need to be
    available. To assess the performance of our proposals we focus on image-based
    object detection, using the pedestrian detection problem as challenging
    proof-of-concept. Moreover, we use the RF with expert nodes because it is a
    competitive patch-based pedestrian model. We test our Node-, Path- and
    Tree-Adapt methods in standard benchmarks, showing that DA is largely achieved.

    Audio Visual Speech Recognition using Deep Recurrent Neural Networks

    Abhinav Thanda, Shankar M Venkatesan
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    In this work, we propose a training algorithm for an audio-visual automatic
    speech recognition (AV-ASR) system using deep recurrent neural network
    (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal
    Classification (CTC) objective function. The frame labels obtained from the
    acoustic model are then used to perform a non-linear dimensionality reduction
    of the visual features using a deep bottleneck network. Audio and visual
    features are fused and used to train a fusion RNN. The use of bottleneck
    features for the visual modality helps the model converge properly during
    training. Our system is evaluated on the GRID corpus. Our results show that
    the presence of the visual modality gives a significant improvement in
    character error rate (CER) at various levels of noise even when the model is
    trained without
    noisy data. We also provide a comparison of two fusion methods: feature fusion
    and decision fusion.
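
    The distinction between the two fusion strategies can be summarised in a few
    lines (a schematic sketch with placeholder models, not the paper's networks):

    import numpy as np

    def feature_fusion(audio_feat, visual_feat, fusion_model):
        # Feature fusion: concatenate the modalities, then let one model decode.
        fused = np.concatenate([audio_feat, visual_feat], axis=-1)
        return fusion_model.predict(fused)

    def decision_fusion(audio_posteriors, visual_posteriors, audio_weight=0.7):
        # Decision fusion: combine the per-frame class posteriors of two models.
        combined = audio_weight * audio_posteriors + (1 - audio_weight) * visual_posteriors
        return combined.argmax(axis=-1)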

    The Little Engine that Could: Regularization by Denoising (RED)

    Yaniv Romano, Michael Elad, Peyman Milanfar
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA)

    Removal of noise from an image is an extensively studied problem in image
    processing. Indeed, the recent advent of sophisticated and highly effective
    denoising algorithms has led some to believe that existing methods are touching
    the ceiling in terms of noise removal performance. Can we leverage this
    impressive achievement to treat other tasks in image processing? Recent work
    has answered this question positively, in the form of the Plug-and-Play Prior
    ((P^3)) method, showing that any inverse problem can be handled by sequentially
    applying image denoising steps. This relies heavily on the ADMM optimization
    technique in order to obtain this chained denoising interpretation.

    Is this the only way in which tasks in image processing can exploit the image
    denoising engine? In this paper we provide an alternative, more powerful and
    more flexible framework for achieving the same goal. As opposed to the (P^3)
    method, we offer Regularization by Denoising (RED): using the denoising engine
    in defining the regularization of the inverse problem. We propose an explicit
    image-adaptive Laplacian-based regularization functional, making the overall
    objective functional clearer and better defined. With a complete flexibility to
    choose the iterative optimization procedure for minimizing the above
    functional, RED can incorporate any image denoising algorithm, treat general
    inverse problems very effectively, and is guaranteed to converge to the
    globally optimal result. We test this approach and demonstrate
    state-of-the-art results in the image deblurring and super-resolution problems.
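
    To give a feel for how a denoising engine can act as a regularizer, here is a
    heavily simplified gradient-descent sketch in the spirit of RED, assuming the
    regularizer gradient takes the form x - f(x) for a generic denoiser f and a
    symmetric blur operator; it is an illustration, not the authors' algorithm:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def denoiser(x, sigma=1.0):
        # Stand-in denoising engine f(x); any off-the-shelf denoiser could be used.
        return gaussian_filter(x, sigma)

    def red_deblur(y, blur, lam=0.2, step=0.1, iters=200):
        # Minimize 0.5*||blur(x) - y||^2 + lam * 0.5 * x^T (x - f(x))
        # by gradient descent, using the gradient blur(blur(x) - y) + lam*(x - f(x)).
        # The blur is assumed symmetric, so it serves as its own adjoint.
        x = y.copy()
        for _ in range(iters):
            data_grad = blur(blur(x) - y)
            reg_grad = x - denoiser(x)
            x = x - step * (data_grad + lam * reg_grad)
        return x

    # toy usage with a Gaussian blur as the (symmetric) forward operator
    blur = lambda img: gaussian_filter(img, 2.0)
    clean = np.zeros((64, 64)); clean[24:40, 24:40] = 1.0
    y = blur(clean) + 0.01 * np.random.randn(64, 64)
    restored = red_deblur(y, blur)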

    Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms

    Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Juan M. Daza
    Comments: arXiv admin note: text overlap with arXiv:1603.00841
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Animal biometrics is an important requirement for monitoring and conservation
    tasks. Classical animal biometrics methods risk the animals’ integrity, are
    expensive for numerous animals, and depend on expert criteria. Non-invasive
    biometrics techniques offer alternatives to manage the aforementioned
    problems. In this paper we propose an automatic segmentation and
    identification algorithm based on artificial vision algorithms to recognize
    Diploglossus millepunctatus, an endangered lizard species. The algorithm
    consists of two stages: automatic segmentation to remove the subjective
    evaluation, and an identification stage to reduce the analysis time. An
    average of 82.87% correct segmentation is achieved. The identification stage
    uses Euclidean-distance point-matching algorithms such as Iterative Closest
    Point and Procrustes Analysis. A performance of 92.99% is reached in the top
    1 and 96.82% in the top 5. The
    developed software, and the database used in this paper are publicly available
    for download from the web page of the project.
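
    For the point-matching step, a minimal sketch of the Procrustes comparison of
    two spot patterns (using SciPy's generic routine on toy coordinates; the
    actual extraction pipeline and decision thresholds are not shown) is:

    import numpy as np
    from scipy.spatial import procrustes

    # Two sets of 2-D spot coordinates extracted from two photographs (toy data).
    pattern_a = np.array([[0.0, 0.0], [1.0, 0.2], [2.1, 0.1], [3.0, 1.0]])
    pattern_b = pattern_a * 1.3 + 0.5 + 0.01 * np.random.randn(4, 2)

    # procrustes aligns the two point sets (translation, scaling, rotation) and
    # returns a disparity; small values suggest the same individual.
    _, _, disparity = procrustes(pattern_a, pattern_b)
    print("disparity:", disparity)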

    Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data

    Xinghua Lou, Ken Kansky, Wolfgang Lehrach, CC Laan, Bhaskara Marthi, D. Scott Phoenix, Dileep George
    Journal-ref: Advances in Neural Information Processing Systems 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We demonstrate that a generative model for object shapes can achieve
    state-of-the-art results on challenging scene text recognition tasks with
    orders of magnitude fewer training images than required for competing
    discriminative methods. In addition to transcribing text from challenging
    images, our method
    performs fine-grained instance segmentation of characters. We show that our
    model is more robust to both affine transformations and non-affine deformations
    compared to previous approaches.

    Deep Convolutional Neural Network for 6-DOF Image Localization

    Daoyuan Jia, Yongchi Su, Chunping Li
    Comments: will update soon
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an accurate and robust method for six-degree-of-freedom image
    localization. There are two key points in our method: 1) automatic synthesis
    and labeling of an immense number of photos from a point cloud model and 2)
    pose estimation with deep convolutional neural network regression. Our model
    directly regresses 6-DOF camera poses from images, accurately describing where
    and how they were captured. We achieved an accuracy within 1 meter and 1
    degree on our outdoor dataset, which covers about 2 acres of our school campus.

    A backward pass through a CNN using a generative model of its activations

    Huayan Wang, Anna Chen, Yi Liu, Dileep George, D. Scott Phoenix
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Neural networks have been shown to be a practical way of building a very
    complex mapping between a pre-specified input space and output space. For
    example, a convolutional neural network (CNN) mapping an image into one of a
    thousand object labels is approaching human performance in this particular
    task. However, the mapping (neural network) does not automatically lend itself
    to other forms
    of queries, for example, to detect/reconstruct object instances, to enforce
    top-down signal on ambiguous inputs, or to recover object instances from
    occlusion. One way to address these queries is a backward pass through the
    network that fuses top-down and bottom-up information. In this paper, we show a
    way of building such a backward pass by defining a generative model of the
    neural network’s activations. Approximate inference of the model would
    naturally take the form of a backward pass through the CNN layers, and it
    addresses the aforementioned queries in a unified framework.

    Robust Cardiac Motion Estimation using Ultrafast Ultrasound Data: A Low-Rank-Topology-Preserving Approach

    Angelica I. Aviles, Thomas Widlak, Alicia Casals, Maartje M. Nillesen, Habib Ammari
    Comments: 15 pages, 10 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cardiac motion estimation is an important diagnostic tool to detect heart
    diseases and it has been explored with modalities such as MRI and conventional
    ultrasound (US) sequences. US cardiac motion estimation still presents
    challenges because of the complex motion patterns and the presence of noise. In
    this work, we propose a novel approach to estimate the cardiac motion using
    ultrafast ultrasound data. Our solution is based on a variational
    formulation characterized by the L2-regularized class. The displacement is
    represented by a lattice of B-splines and we ensure robustness by applying a
    maximum likelihood type estimator. While this is an important part of our
    solution, the main highlight of this paper is to combine a low-rank data
    representation with topology preservation. Low-rank data representation
    (achieved by finding the k-dominant singular values of a Casorati Matrix
    arranged from the data sequence) speeds up the global solution and achieves
    noise reduction. On the other hand, topology preservation (achieved by
    monitoring the Jacobian determinant) makes it possible to radically rule out
    distortions while carefully controlling the size of allowed expansions and
    contractions.
    Our variational approach is carried out on a realistic dataset as well as on a
    simulated one. We demonstrate how our proposed variational solution deals with
    complex deformations through careful numerical experiments. While maintaining
    the accuracy of the solution, the low-rank preprocessing is shown to speed up
    the convergence of the variational problem. Beyond cardiac motion estimation,
    our approach is promising for the analysis of other organs that experience
    motion.
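
    Read this way, the low-rank preprocessing amounts to a truncated SVD of the
    Casorati matrix, with each frame flattened into one column; a short sketch
    under that reading (the rank k and toy data are arbitrary assumptions) is:

    import numpy as np

    def lowrank_denoise(frames, k=5):
        # frames: array of shape (n_frames, H, W). Build the Casorati matrix
        # (one flattened frame per column), keep the k dominant singular values,
        # and reshape back to a frame sequence.
        n, h, w = frames.shape
        casorati = frames.reshape(n, h * w).T  # shape (H*W, n_frames)
        U, s, Vt = np.linalg.svd(casorati, full_matrices=False)
        s[k:] = 0.0  # discard the non-dominant components
        approx = (U * s) @ Vt
        return approx.T.reshape(n, h, w)

    # toy usage: 20 noisy frames of a slowly varying pattern
    t = np.linspace(0, 1, 20)[:, None, None]
    frames = np.sin(2 * np.pi * t) * np.ones((20, 32, 32)) \
        + 0.1 * np.random.randn(20, 32, 32)
    smoothed = lowrank_denoise(frames, k=2)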

    Gaussian process regression can turn non-uniform and undersampled diffusion MRI data into diffusion spectrum imaging

    Jens Sjölund, Anders Eklund, Evren Özarslan, Hans Knutsson
    Comments: 5 pages
    Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We propose to use Gaussian process regression to accurately estimate the
    diffusion MRI signal at arbitrary locations in q-space. By estimating the
    signal on a grid, we can do synthetic diffusion spectrum imaging:
    reconstructing the ensemble averaged propagator (EAP) by an inverse Fourier
    transform. We also propose an alternative reconstruction method guaranteeing a
    nonnegative EAP that integrates to unity. The reconstruction is validated on
    data simulated from two Gaussians at various crossing angles. Moreover, we
    demonstrate on non-uniformly sampled in vivo data that the method is far
    superior to linear interpolation, and allows a drastic undersampling of the
    data with only a minor loss of accuracy. We envision the method as a potential
    replacement for standard diffusion spectrum imaging, in particular when
    acquisition time is limited.
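
    A bare-bones sketch of the core idea, fitting a GP to scattered q-space
    samples and predicting on a grid, is given below; the plain RBF kernel and
    synthetic signal are placeholder assumptions, whereas the paper designs a
    kernel suited to diffusion MRI:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # toy stand-in: signal sampled at 200 scattered q-space locations (3-D)
    q_train = np.random.randn(200, 3)
    signal = np.exp(-np.sum(q_train**2, axis=1))  # synthetic Gaussian decay

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-3),
                                  normalize_y=True)
    gp.fit(q_train, signal)

    # predict on a regular grid; an EAP could then be obtained by an inverse FFT
    axis = np.linspace(-2, 2, 11)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    signal_on_grid = gp.predict(grid).reshape(11, 11, 11)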

    Inferring low-dimensional microstructure representations using convolutional neural networks

    Nicholas Lubbers, Turab Lookman, Kipton Barros
    Comments: 25 Pages, 12 Figures
    Subjects: Computational Physics (physics.comp-ph); Materials Science (cond-mat.mtrl-sci); Computer Vision and Pattern Recognition (cs.CV)

    We apply recent advances in machine learning and computer vision to a central
    problem in materials informatics: The statistical representation of
    microstructural images. We use activations in a pre-trained convolutional
    neural network to provide a high-dimensional characterization of a set of
    synthetic microstructural images. Next, we use manifold learning to obtain a
    low-dimensional embedding of this statistical characterization. We show that
    the low-dimensional embedding extracts the parameters used to generate the
    images. According to a variety of metrics, the convolutional neural network
    method yields dramatically better embeddings than the analogous method derived
    from two-point correlations alone.


    Artificial Intelligence

    Encoding monotonic multi-set preferences using CI-nets: preliminary report

    Martin Diller, Anthony Hunter
    Subjects: Artificial Intelligence (cs.AI)

    CP-nets and their variants constitute one of the main AI approaches for
    specifying and reasoning about preferences. CI-nets, in particular, are a
    CP-inspired formalism for representing ordinal preferences over sets of goods,
    which are typically required to be monotonic.

    Considering also that goods often come in multi-sets rather than sets, a
    natural question is whether CI-nets can be used more or less directly to encode
    preferences over multi-sets. We here provide some initial ideas on how to
    achieve this, in the sense that at least a restricted form of reasoning on our
    framework, which we call “confined reasoning”, can be efficiently reduced to
    reasoning on CI-nets. Our framework nevertheless allows for encoding
    preferences over multi-sets with unbounded multiplicities. We also show the
    extent to which it can be used to represent preferences where multiplicities of
    the goods are not stated explicitly (“purely qualitative preferences”) as well
    as a potential use of our generalization of CI-nets as a component of a recent
    system for evidence aggregation.

    RL(^2): Fast Reinforcement Learning via Slow Reinforcement Learning

    Yan Duan, John Schulman, Xi Chen, Peter Bartlett, Ilya Sutskever, Pieter Abbeel
    Comments: 14 pages. Under review as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Deep reinforcement learning (deep RL) has been successful in learning
    sophisticated behaviors automatically; however, the learning process requires a
    huge number of trials. In contrast, animals can learn new tasks in just a few
    trials, benefiting from their prior knowledge about the world. This paper seeks
    to bridge this gap. Rather than designing a “fast” reinforcement learning
    algorithm, we propose to represent it as a recurrent neural network (RNN) and
    learn it from data. In our proposed method, RL(^2), the algorithm is encoded in
    the weights of the RNN, which are learned slowly through a general-purpose
    (“slow”) RL algorithm. The RNN receives all information a typical RL algorithm
    would receive, including observations, actions, rewards, and termination flags;
    and it retains its state across episodes in a given Markov Decision Process
    (MDP). The activations of the RNN store the state of the “fast” RL algorithm on
    the current (previously unseen) MDP. We evaluate RL(^2) experimentally on both
    small-scale and large-scale problems. On the small-scale side, we train it to
    solve randomly generated multi-armed bandit problems and finite MDPs. After
    RL(^2) is trained, its performance on new MDPs is close to human-designed
    algorithms with optimality guarantees. On the large-scale side, we test RL(^2)
    on a vision-based navigation task and show that it scales up to
    high-dimensional problems.

    Recursive Decomposition for Nonconvex Optimization

    Abram L. Friesen, Pedro Domingos
    Comments: 11 pages, 7 figures, pdflatex
    Journal-ref: Proceedings of the 24th International Joint Conference on
    Artificial Intelligence (2015), pp. 253-259
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Continuous optimization is an important problem in many areas of AI,
    including vision, robotics, probabilistic inference, and machine learning.
    Unfortunately, most real-world optimization problems are nonconvex, causing
    standard convex techniques to find only local optima, even with extensions like
    random restarts and simulated annealing. We observe that, in many cases, the
    local modes of the objective function have combinatorial structure, and thus
    ideas from combinatorial optimization can be brought to bear. Based on this, we
    propose a problem-decomposition approach to nonconvex optimization. Similarly
    to DPLL-style SAT solvers and recursive conditioning in probabilistic
    inference, our algorithm, RDIS, recursively sets variables so as to simplify
    and decompose the objective function into approximately independent
    sub-functions, until the remaining functions are simple enough to be optimized
    by standard techniques like gradient descent. The variables to set are chosen
    by graph partitioning, ensuring decomposition whenever possible. We show
    analytically that RDIS can solve a broad class of nonconvex optimization
    problems exponentially faster than gradient descent with random restarts.
    Experimentally, RDIS outperforms standard techniques on problems like structure
    from motion and protein folding.

    Tuning Recurrent Neural Networks with Reinforcement Learning

    Natasha Jaques, Shixiang Gu, Richard E. Turner, Douglas Eck
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Sequence models can be trained using supervised learning and a next-step
    prediction objective. This approach, however, suffers from known failure modes.
    For example, it is notoriously difficult to ensure multi-step generated
    sequences have coherent global structure. Motivated by the fact that
    reinforcement learning (RL) can be used to impose arbitrary properties on
    generated data by choosing appropriate reward functions, in this paper we
    propose a novel approach for sequence training which combines Maximum
    Likelihood (ML) and RL training. We refine a sequence predictor by optimizing
    for some imposed reward functions, while maintaining good predictive properties
    learned from data. We propose efficient ways to solve this by augmenting deep
    Q-learning with a cross-entropy reward and deriving novel off-policy methods
    for RNNs from stochastic optimal control (SOC). We explore the usefulness of
    our approach in the context of music generation. An LSTM is trained on a large
    corpus of songs to predict the next note in a musical sequence. This Note-RNN
    is then refined using RL, where the reward function is a combination of rewards
    based on rules of music theory, as well as the output of another trained
    Note-RNN. We show that by combining ML and RL, this RL Tuner method can not
    only produce more pleasing melodies, but also significantly reduce unwanted
    behaviors and failure modes of the RNN.
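
    The reward shaping can be sketched in a single expression (a paraphrase of
    the description above with an assumed weighting constant; the helper
    functions stand in for the trained Note-RNN and the music-theory rules):

    def rl_tuner_reward(note, context, note_rnn_log_prob, theory_reward, c=0.5):
        # Combined reward for the refined Note-RNN: stay close to what the
        # trained Note-RNN would generate, plus hand-written music-theory rewards.
        return c * note_rnn_log_prob(note, context) + theory_reward(note, context)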


    Information Retrieval

    An Automated System for Essay Scoring of Online Exams in Arabic based on Stemming Techniques and Levenshtein Edit Operations

    Emad Fawzi Al-Shalabi
    Comments: 5 pages, 2 figures
    Journal-ref: IJCSI International Journal of Computer Science Issues, Volume 13,
    Issue 5, September 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this article, an automated system is proposed for essay scoring of online
    exams in the Arabic language, based on stemming techniques and Levenshtein
    edit operations. An online exam has been developed using the proposed
    mechanisms, exploiting the capabilities of light and heavy stemming. The
    implemented online grading system has been shown to be an efficient tool for
    automated scoring of essay questions.
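
    For reference, a compact sketch of the Levenshtein edit-distance computation
    that such scoring relies on (a textbook dynamic program, not the paper's
    code) is:

    def levenshtein(a: str, b: str) -> int:
        # Minimum number of insertions, deletions and substitutions turning a into b.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,               # deletion
                               cur[j - 1] + 1,            # insertion
                               prev[j - 1] + (ca != cb))) # substitution
            prev = cur
        return prev[-1]

    # e.g. comparing a (stemmed) student answer against a model answer
    print(levenshtein("kitten", "sitting"))  # 3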


    Computation and Language

    When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

    Barbara Plank, Malvina Nissim
    Comments: Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016)
    Subjects: Computation and Language (cs.CL)

    We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter
    data, in the context of the Evalita 2016 PoSTWITA shared task. We show that
    training the tagger on native Twitter data enriched with small amounts of
    specifically selected gold data and additional silver-labelled data scraped
    from Facebook, yields better results than using large amounts of manually
    annotated data from a mix of genres.

    Distant supervision for emotion detection using Facebook reactions

    Chris Pool, Malvina Nissim
    Comments: Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES 2016), held in conjunction with COLING 2016, Osaka, Japan
    Subjects: Computation and Language (cs.CL)

    We exploit the Facebook reaction feature in a distant supervised fashion to
    train a support vector machine classifier for emotion detection, using several
    feature combinations and combining different Facebook pages. We test our models
    on existing benchmarks for emotion detection and show that by employing only
    information that is derived completely automatically, thus without relying on
    any handcrafted lexicon as is usually done, we can achieve competitive
    results. The results also show that there is large room for improvement,
    especially by tailoring the collection of Facebook pages to the target
    domain.
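
    In spirit, the pipeline is a standard supervised classifier trained on
    distantly labelled posts; a hedged scikit-learn sketch (the toy posts and
    reaction-derived labels below are placeholders, not the paper's data or
    feature combinations) is:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # toy posts with labels derived from the dominant Facebook reaction
    posts = ["what wonderful news today", "this is outrageous and unfair",
             "so sad to hear this", "haha that made my day"]
    labels = ["joy", "anger", "sadness", "joy"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(posts, labels)
    print(model.predict(["terrible and unfair decision"]))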

    A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation

    Hong Jin Kang, Tao Chen, Muthu Kumar Chandrasekaran, Min-Yen Kan
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    Word embeddings are now ubiquitous forms of word representation in natural
    language processing. There have been applications of word embeddings for
    monolingual word sense disambiguation (WSD) in English, but few comparisons
    have been done. This paper attempts to bridge that gap by examining popular
    embeddings for the task of monolingual English WSD. Our simplified method
    achieves performance comparable to the state of the art without expensive
    retraining.
    Cross-Lingual WSD – where the word senses of a word in a source language e come
    from a separate target translation language f – can also assist in language
    learning; for example, when providing translations of target vocabulary for
    learners. Thus we have also applied word embeddings to the novel task of
    cross-lingual WSD for Chinese and provide a public dataset for further
    benchmarking. We have also experimented with using word embeddings for LSTM
    networks and found surprisingly that a basic LSTM network does not work well.
    We discuss the ramifications of this outcome.

    Increasing the throughput of machine translation systems using clouds

    Jernej Vičič, Andrej Brodnik
    Comments: 20 pages, 7 figures
    Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)

    The manuscript presents an experiment in implementing a Machine
    Translation system in the MapReduce model. The empirical evaluation was done
    using fully implemented translation systems embedded into the MapReduce
    programming model. Two machine translation paradigms were studied: shallow
    transfer Rule Based Machine Translation and Statistical Machine Translation.

    The results show that the MapReduce model can be successfully used to
    increase the throughput of a machine translation system. Furthermore, this
    method enhances the throughput of a machine translation system without
    decreasing the quality of the translation output.

    Thus, the present manuscript also represents a contribution to work in
    natural language processing, specifically Machine Translation. It first
    points toward the importance of defining a throughput metric for translation
    systems and, second, toward the applicability of the machine translation task
    to the MapReduce paradigm.

    Old Content and Modern Tools – Searching Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

    Kimmo Kettunen, Eetu Mäkelä, Teemu Ruokolainen, Juha Kuokkala, Laura Löfberg
    Comments: 24 pages, 13 tables
    Subjects: Computation and Language (cs.CL)

    Named Entity Recognition (NER), the search, classification and tagging of
    names and name-like frequent informational elements in texts, has become a
    standard information extraction procedure for textual data. NER has been
    applied to many types of texts and different types of entities: newspapers,
    fiction, historical records, persons, locations, chemical compounds, protein
    families, animals etc. In general a NER system’s performance is genre and
    domain dependent, and the entity categories used also vary (Nadeau and Sekine,
    2007). The most general set of named entities is usually some version of a
    tripartite categorization of locations, persons and organizations. In this
    paper we report the first large-scale
    trials and evaluation of NER with data out of a digitized Finnish historical
    newspaper collection Digi. Experiments, results and discussion of this research
    serve development of the Web collection of historical Finnish newspapers.

    The Digi collection contains 1,960,921 pages of newspaper material from the
    years 1771-1910, in both Finnish and Swedish. We use only the Finnish-language
    material in our evaluation. The OCRed newspaper collection has many OCR
    errors; its estimated word-level correctness is about 70-75 % (Kettunen and
    Pääkkönen, 2016). Our principal NER tagger is a rule-based tagger of Finnish,
    FiNER, provided by the FIN-CLARIN consortium. We also show results of
    limited-category semantic tagging with tools of the Semantic Computing
    Research Group (SeCo) of Aalto University. Three other tools are also
    evaluated
    briefly.

    This research reports the first published large-scale results of NER in a
    historical Finnish OCRed newspaper collection. The results supplement NER
    results for other languages with similar noisy data.

    Automatic recognition of child speech for robotic applications in noisy environments

    Samuel Fernando, Roger K. Moore, David Cameron, Emily C. Collins, Abigail Millings, Amanda J. Sharkey, Tony J. Prescott
    Comments: Submission to Computer Speech and Language, special issue on Interaction Technologies for Children
    Subjects: Computation and Language (cs.CL); Sound (cs.SD)

    Automatic speech recognition (ASR) allows a natural and intuitive interface
    for robotic educational applications for children. However, there are a number
    of challenges to overcome to allow such an interface to operate robustly in
    realistic settings, including the intrinsic difficulties of recognising child
    speech and high levels of background noise often present in classrooms. As part
    of the EU EASEL project we have provided several contributions to address these
    challenges, implementing our own ASR module for use in robotics applications.
    We use the latest deep neural network algorithms, which provide a leap in
    performance over the traditional GMM approach, and apply data augmentation
    methods to improve robustness to noise and speaker variation. We provide a
    close integration between the ASR module and the rest of the dialogue system,
    allowing the ASR to receive in real-time the language models relevant to the
    current section of the dialogue, greatly improving the accuracy. We integrated
    our ASR module into an interactive, multimodal system using a small humanoid
    robot to help children learn about exercise and energy. The system was
    installed at a public museum event as part of a research study where 320
    children (aged 3 to 14) interacted with the robot, with our ASR achieving 90%
    accuracy for fluent and near-fluent speech.

    Audio Visual Speech Recognition using Deep Recurrent Neural Networks

    Abhinav Thanda, Shankar M Venkatesan
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    In this work, we propose a training algorithm for an audio-visual automatic
    speech recognition (AV-ASR) system using deep recurrent neural network
    (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal
    Classification (CTC) objective function. The frame labels obtained from the
    acoustic model are then used to perform a non-linear dimensionality reduction
    of the visual features using a deep bottleneck network. Audio and visual
    features are fused and used to train a fusion RNN. The use of bottleneck
    features for the visual modality helps the model converge properly during
    training. Our system is evaluated on the GRID corpus. Our results show that
    the presence of the visual modality gives a significant improvement in
    character error rate (CER) at various levels of noise even when the model is
    trained without
    noisy data. We also provide a comparison of two fusion methods: feature fusion
    and decision fusion.

    An Automated System for Essay Scoring of Online Exams in Arabic based on Stemming Techniques and Levenshtein Edit Operations

    Emad Fawzi Al-Shalabi
    Comments: 5 pages, 2 figures
    Journal-ref: IJCSI International Journal of Computer Science Issues, Volume 13,
    Issue 5, September 2016
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    In this article, an automated system is proposed for essay scoring of online
    exams in the Arabic language, based on stemming techniques and Levenshtein
    edit operations. An online exam has been developed using the proposed
    mechanisms, exploiting the capabilities of light and heavy stemming. The
    implemented online grading system has been shown to be an efficient tool for
    automated scoring of essay questions.


    Distributed, Parallel, and Cluster Computing

    Coarse mesh partitioning for tree based AMR

    Carsten Burstedde, Johannes Holke
    Comments: 28 pages, 11 figures, 6 tables
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In tree based adaptive mesh refinement, elements are partitioned between
    processes using a space filling curve. The curve establishes an ordering
    between all elements that derive from the same root element, the tree. When
    representing more complex geometries by patching together several trees, the
    roots of these trees form an unstructured coarse mesh. We present an algorithm
    to partition the elements of the coarse mesh such that (a) the fine mesh can be
    load-balanced to equal element counts per process regardless of the
    element-to-tree map and (b) each process that holds fine mesh elements has
    access to the meta data of all relevant trees. As an additional feature, the
    algorithm partitions the meta data of relevant ghost (halo) trees as well. We
    develop in detail how each process computes the communication pattern for the
    partition routine without handshaking and with minimal data movement. We
    demonstrate the scalability of this approach on up to 917e3 MPI ranks and
    0.37e12 coarse mesh elements, measuring run times of one second or less.
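
    The load-balancing goal in (a) can be illustrated with a small serial sketch
    that assigns each process an equal, contiguous range of fine elements in
    space-filling-curve order and derives which coarse trees that process
    therefore needs; the real algorithm computes this in parallel and without
    handshaking:

    def partition_fine_elements(elements_per_tree, num_procs):
        # elements_per_tree[i]: number of fine elements in coarse tree i, in SFC order.
        # Returns, for each process, the tree ids whose meta data it needs so that
        # every process holds (almost) the same number of fine elements.
        total = sum(elements_per_tree)
        bounds = [(p * total // num_procs, (p + 1) * total // num_procs)
                  for p in range(num_procs)]
        offsets, acc = [], 0
        for n in elements_per_tree:          # prefix offsets along the SFC
            offsets.append((acc, acc + n))
            acc += n
        trees_per_proc = []
        for lo, hi in bounds:
            trees_per_proc.append([t for t, (a, b) in enumerate(offsets)
                                   if a < hi and b > lo])  # tree overlaps the range
        return trees_per_proc

    print(partition_fine_elements([5, 1, 9, 3], num_procs=3))  # [[0, 1], [2], [2, 3]]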

    SLA-aware Interactive Workflow Assistant for HPC Parameter Sweeping Experiments

    Bruno Silva, Marco A. S. Netto, Renato L. F. Cunha
    Comments: 11 pages, 9 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    A common workflow in science and engineering is to (i) setup and deploy large
    experiments with tasks comprising an application and multiple parameter values;
    (ii) generate intermediate results; (iii) analyze them; and (iv) reprioritize
    the tasks. These steps are repeated until the desired goal is achieved, which
    can be the evaluation/simulation of complex systems or model calibration. Due
    to time and cost constraints, sweeping all possible parameter values of the
    user application is not always feasible. Experimental Design techniques can
    help users reorganize submission-execution-analysis workflows to arrive at a
    solution in a more timely manner. This paper introduces a novel tool that
    leverages users’ feedback on analyzing intermediate results of parameter
    sweeping experiments to advise them about their strategies on parameter
    selections tied to their SLA constraints. We evaluated our tool with three
    applications of distinct domains and search space shapes. Our main finding is
    that users with submission-execution-analysis workflows can benefit from their
    interaction with intermediate results and adapt themselves according to their
    domain expertise and SLA constraints.

    Helping HPC Users Specify Job Memory Requirements via Machine Learning

    Eduardo R. Rodrigues, Renato L. F. Cunha, Marco A. S. Netto, Michael Spriggs
    Comments: 8 pages, 3 figures, presented at the Third Annual Workshop on HPC User Support Tools
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Resource allocation in High Performance Computing (HPC) settings is still not
    easy for end-users due to the wide variety of application and environment
    configuration options. Users have difficulty estimating the number of
    processors and amount of memory required by their jobs, selecting the queue
    and partition, and estimating when job output will be available to plan their
    next
    experiments. Apart from wasting infrastructure resources by making wrong
    allocation decisions, overall user response time can also be negatively
    impacted. Techniques that exploit batch scheduler systems to predict waiting
    time and runtime of user jobs have already been proposed. However, we observed
    that such techniques are not suitable for predicting job memory usage. In this
    paper we introduce a tool to help users predict their memory requirements using
    machine learning. We describe the integration of the tool with a batch
    scheduler system, discuss how batch scheduler log data can be exploited to
    generate memory usage predictions through machine learning, and present
    results from two production systems containing thousands of jobs.

    Language Support for Reliable Memory Regions

    Saurabh Hukerikar, Christian Engelmann
    Comments: The 29th International Workshop on Languages and Compilers for Parallel Computing
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL); Software Engineering (cs.SE)

    The path to exascale computational capabilities in high-performance computing
    (HPC) systems is challenged by the inadequacy of present software technologies
    to adapt to the rapid evolution of architectures of supercomputing systems. The
    constraints of power have driven system designs to include increasingly
    heterogeneous architectures and diverse memory technologies and interfaces.
    Future systems are also expected to experience an increased rate of errors,
    such that the applications will no longer be able to assume correct behavior of
    the underlying machine. To enable the scientific community to succeed in
    scaling their applications, and to harness the capabilities of exascale
    systems, we need software strategies that provide mechanisms for explicit
    management of resilience to errors in the system, in addition to locality of
    reference in the complex memory hierarchies of future HPC systems.

    In prior work, we introduced the concept of explicitly reliable memory
    regions, called havens. Memory management using havens supports reliability
    management through a region-based approach to memory allocations. Havens enable
    the creation of robust memory regions, whose resilient behavior is guaranteed
    by software-based protection schemes. In this paper, we propose language
    support for havens through type annotations that make the structure of a
    program’s havens more explicit and convenient for HPC programmers to use. We
    describe how the extended haven-based memory management model is implemented,
    and demonstrate the use of the language-based annotations to affect the
    resiliency of a conjugate gradient solver application.

    ZeroTouch Provisioning (ZTP) Model and Infrastructure Components for Multi-provider Cloud Services Provisioning

    Yuri Demchenko, Paola Grosso, Cees de Laat, Sonja Filiposka, Migiel de Vos
    Comments: 6 pages, 2 figures
    Journal-ref: Fifth IEEE International Workshop on Cloud Computing Interclouds,
    Multiclouds, Federations, and Interoperability (Intercloud 2016), In Proc.
    IEEE International Conference on Cloud Engineering (IC2E), April 4 – 8, 2016,
    Berlin, Germany
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents results of the ongoing development of the Cloud Services
    Delivery Infrastructure (CSDI) that provides a basis for infrastructure centric
    cloud services provisioning, operation and management in multi-cloud
    multi-provider environment defined as a Zero Touch Provisioning, Operation and
    Management (ZTP/ZTPOM) model. The presented work refers to use cases from data
    intensive research that require high performance computation resources and
    large storage volumes that are typically distributed between datacenters often
    involving multiple cloud providers. Automation for large scale scientific (and
    industrial) applications should include provisioning of both inter-cloud
    network infrastructure and intra-cloud application resources. It should provide
    support for the complete application operation workflow together with the
    possible application infrastructure and resources changes that can occur during
    the application lifecycle. The authors investigate existing technologies for
    automation of the service provisioning and management processes aiming to
    cross-pollinate best practices from currently disconnected domains such as
    cloud based applications provisioning and multi-domain high-performance network
    provisioning. The paper refers to the previous and legacy research by authors,
    the Open Cloud eXchange (OCX), that has been proposed to address the last mile
    problem in cloud services delivery to campuses over trans-national backbone
    networks such as GEANT. OCX will serve as an integral component of the
    prospective ZTP infrastructure over the GEANT network. Another important
    component, the Marketplace, is defined for providing cloud services and
    applications discovery (in a general intercloud environment) and may also
    support additional services such as service composition and trust brokering
    for establishing customer-provider federations.

    Resilience Design Patterns – A Structured Approach to Resilience at Extreme Scale

    Saurabh Hukerikar, Christian Engelmann
    Comments: Oak Ridge National Laboratory Technical Report version 1.0
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    In this document, we develop a structured approach to the management of HPC
    resilience based on the concept of resilience-based design patterns. A design
    pattern is a general repeatable solution to a commonly occurring problem. We
    identify the commonly occurring problems and solutions used to deal with
    faults, errors and failures in HPC systems. The catalog of resilience design
    patterns provides designers with reusable design elements. We define a design
    framework that enhances our understanding of the important constraints and
    opportunities for solutions deployed at various layers of the system stack. The
    framework may be used to establish mechanisms and interfaces to coordinate
    flexible fault management across hardware and software components. The
    framework also enables optimization of the cost-benefit trade-offs among
    performance, resilience, and power consumption. The overall goal of this work
    is to enable a systematic methodology for the design and evaluation of
    resilience technologies in extreme-scale HPC systems that keep scientific
    applications running to a correct solution in a timely and cost-efficient
    manner in spite of frequent faults, errors, and failures of various types.

    Increasing the throughput of machine translation systems using clouds

    Jernej Vičič, Andrej Brodnik
    Comments: 20 pages, 7 figures
    Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)

    The manuscript presents an experiment in implementing a Machine
    Translation system in the MapReduce model. The empirical evaluation was done
    using fully implemented translation systems embedded into the MapReduce
    programming model. Two machine translation paradigms were studied: shallow
    transfer Rule Based Machine Translation and Statistical Machine Translation.

    The results show that the MapReduce model can be successfully used to
    increase the throughput of a machine translation system. Furthermore, this
    method enhances the throughput of a machine translation system without
    decreasing the quality of the translation output.

    Thus, the present manuscript also represents a contribution to work in
    natural language processing, specifically Machine Translation. It first
    points toward the importance of defining a throughput metric for translation
    systems and, second, toward the applicability of the machine translation task
    to the MapReduce paradigm.


    Learning

    Fair Learning in Markovian Environments

    Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth
    Subjects: Learning (cs.LG)

    We initiate the study of fair learning in Markovian settings, where the
    actions of a learning algorithm may affect its environment and future rewards.
    Working in the model of reinforcement learning, we define a fairness constraint
    requiring that an algorithm never prefers one action over another if the
    long-term (discounted) reward of choosing the latter action is higher.

    Our first result is negative: despite the fact that fairness is consistent
    with the optimal policy, any learning algorithm satisfying fairness must take
    exponentially many rounds in the number of states to achieve non-trivial
    approximation to the optimal policy. Our main result is a polynomial time
    algorithm that is provably fair under an approximate notion of fairness, thus
    establishing an exponential gap between exact and approximate fairness.
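
    Stated operationally, the constraint says the algorithm may place higher
    probability on one action than on another only if the first action's
    long-term value is at least as high. A tiny check of a policy against known
    action values (an illustration of the definition, not the paper's algorithm):

    def is_fair(policy_probs, q_values, tol=1e-9):
        # policy_probs[s][a]: probability of picking action a in state s.
        # q_values[s][a]: long-term (discounted) value of action a in state s.
        # Fair means: never prefer an action over one with strictly higher value.
        for s in range(len(q_values)):
            for a in range(len(q_values[s])):
                for b in range(len(q_values[s])):
                    if (q_values[s][b] > q_values[s][a] + tol
                            and policy_probs[s][a] > policy_probs[s][b] + tol):
                        return False
        return True

    # two actions, action 1 strictly better: preferring action 0 would be unfair
    print(is_fair([[0.5, 0.5]], [[1.0, 2.0]]))  # True
    print(is_fair([[0.9, 0.1]], [[1.0, 2.0]]))  # False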

    Incremental Sequence Learning

    Edwin D. de Jong
    Comments: 18 pages
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Deep learning research over the past years has shown that by increasing the
    scope or difficulty of the learning problem over time, increasingly complex
    learning problems can be addressed. We study incremental learning in the
    context of sequence learning, using generative RNNs in the form of multi-layer
    recurrent Mixture Density Networks. We introduce Incremental Sequence Learning,
    a simple incremental approach to sequence learning. Incremental Sequence
    Learning starts out by using only the first few steps of each sequence as
    training data. Each time a performance criterion has been reached, the length
    of the parts of the sequences used for training is increased. To evaluate
    Incremental Sequence Learning and comparison methods, we introduce and make
    available a novel sequence learning task and data set: predicting and
    classifying MNIST pen stroke sequences, where the familiar handwritten digit
    images have been transformed to pen stroke sequences representing the skeletons
    of the digits. We find that Incremental Sequence Learning greatly speeds up
    sequence learning and reaches the best test performance level of regular
    sequence learning 20 times faster, reduces the test error by 74%, and in
    general performs more robustly; it displays lower variance and achieves
    sustained progress after all three comparison methods have stopped improving. A
    trained sequence prediction model is also used in transfer learning to the task
    of sequence classification, where it is found that transfer learning realizes
    improved classification performance compared to methods that learn to classify
    from scratch.

    Attributing Hacks

    Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Applications (stat.AP)

    In this paper we describe an algorithm for estimating the provenance of hacks
    on websites. That is, given properties of sites and the temporal occurrence of
    attacks, we are able to attribute individual attacks to joint causes and
    vulnerabilities, as well as to estimate the evolution of these vulnerabilities
    over time. Specifically, we use hazard regression with a time-varying additive
    hazard function parameterized in a generalized linear form. The activation
    coefficients on each feature are continuous-time functions over time. We
    formulate the problem of learning these functions as a constrained variational
    maximum likelihood estimation problem with total variation penalty and show
    that the optimal solution is a 0th order spline (a piecewise constant function)
    with a finite number of known knots. This allows the inference problem to be
    solved efficiently and at scale by solving a finite dimensional optimization
    problem. Extensive experiments on real data sets show that our method
    significantly outperforms Cox’s proportional hazard model. We also conduct a
    case study and verify that the fitted functions are indeed recovering
    vulnerable features and real-life events such as the release of code to exploit
    these features in hacker blogs.
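
    For illustration only, a toy sketch of the 0th-order spline (piecewise
    constant) coefficients and the additive hazard they parameterize; the knot
    locations, segment levels, and feature values below are hypothetical, and
    the paper's penalized variational estimator itself is not shown:

    ```python
    import numpy as np

    def piecewise_constant(knots, levels, t):
        """0th-order spline: `knots` are sorted change points, `levels` holds
        the value on each of the len(knots) + 1 segments."""
        return levels[int(np.searchsorted(knots, t, side="right"))]

    def additive_hazard(coeff_fns, features, t):
        """Time-varying additive hazard: lambda(t) = sum_j beta_j(t) * x_j."""
        return sum(fn(t) * x for fn, x in zip(coeff_fns, features))

    # Example: a single coefficient jumping at t = 10 (say, an exploit release).
    beta = lambda t: piecewise_constant(np.array([10.0]), [0.1, 0.9], t)
    print(additive_hazard([beta], [1.0], t=12.0))  # -> 0.9
    ```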

    Online Learning for Wireless Distributed Computing

    Yi-Hsuan Kao, Kwame Wright, Bhaskar Krishnamachari, Fan Bai
    Comments: 10 pages, 8 figures, conference
    Subjects: Learning (cs.LG)

    There has been growing interest in Wireless Distributed Computing (WDC),
    which leverages collaborative computing over multiple wireless devices. WDC
    enables complex applications that a single device cannot support individually.
    However, the problem of assigning tasks over multiple devices becomes
    challenging in the dynamic environments encountered in real-world settings,
    considering that the resource availability and channel conditions change over
    time in unpredictable ways due to mobility and other factors. In this paper, we
    formulate a task assignment problem as an online learning problem using an
    adversarial multi-armed bandit framework. We propose MABSTA, a novel online
    learning algorithm that learns the performance of unknown devices and channel
    qualities continually through exploratory probing and makes task assignment
    decisions by exploiting the gained knowledge. For maximal adaptability, MABSTA
    is designed to make no stochastic assumption about the environment. We analyze
    it mathematically and provide a worst-case performance guarantee for any
    dynamic environment. We also compare it with the optimal offline policy as well
    as other baselines via emulations on trace-data obtained from a wireless IoT
    testbed, and show that it offers competitive and robust performance in all
    cases. To the best of our knowledge, MABSTA is the first online algorithm in
    this domain of task assignment problems and provides a provable performance
    guarantee.
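
    Since the abstract casts task assignment as an adversarial multi-armed
    bandit, here is a minimal Exp3-style sketch of that framework. This is
    generic exponential weighting over single arms, not the authors' MABSTA
    algorithm (which handles combinatorial assignments of tasks to devices);
    `reward_fn` is a hypothetical stand-in for the observed performance:

    ```python
    import numpy as np

    def exp3(reward_fn, n_arms, horizon, gamma=0.1, rng=None):
        """Exp3 for adversarial bandits: sample an arm from a mixture of the
        exponential weights and uniform exploration, then update with an
        importance-weighted reward estimate."""
        rng = rng or np.random.default_rng(0)
        weights = np.ones(n_arms)
        for t in range(horizon):
            probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
            arm = rng.choice(n_arms, p=probs)
            r = reward_fn(t, arm)                      # reward in [0, 1]
            weights[arm] *= np.exp(gamma * (r / probs[arm]) / n_arms)
        return weights / weights.sum()
    ```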

    Tuning Recurrent Neural Networks with Reinforcement Learning

    Natasha Jaques, Shixiang Gu, Richard E. Turner, Douglas Eck
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Sequence models can be trained using supervised learning and a next-step
    prediction objective. This approach, however, suffers from known failure modes.
    For example, it is notoriously difficult to ensure multi-step generated
    sequences have coherent global structure. Motivated by the fact that
    reinforcement learning (RL) can be used to impose arbitrary properties on
    generated data by choosing appropriate reward functions, in this paper we
    propose a novel approach for sequence training which combines Maximum
    Likelihood (ML) and RL training. We refine a sequence predictor by optimizing
    for some imposed reward functions, while maintaining good predictive properties
    learned from data. We propose efficient ways to solve this by augmenting deep
    Q-learning with a cross-entropy reward and deriving novel off-policy methods
    for RNNs from stochastic optimal control (SOC). We explore the usefulness of
    our approach in the context of music generation. An LSTM is trained on a large
    corpus of songs to predict the next note in a musical sequence. This Note-RNN
    is then refined using RL, where the reward function is a combination of rewards
    based on rules of music theory, as well as the output of another trained
    Note-RNN. We show that by combining ML and RL, the RL Tuner method not only
    produces more pleasing melodies, but also significantly reduces unwanted
    behaviors and failure modes of the RNN.
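
    A toy sketch of the combined reward described above: a rule-based
    music-theory term plus a log-probability term from the pretrained Note-RNN.
    The callables `theory_reward` and `note_rnn_prob` and the weight `c` are
    hypothetical placeholders, not the paper's exact formulation:

    ```python
    import math

    def rl_tuner_reward(note, history, theory_reward, note_rnn_prob, c=0.5):
        """Combined RL reward: music-theory rules plus a term that keeps the
        refined policy close to what the data-trained Note-RNN would predict."""
        r_theory = theory_reward(note, history)        # rule-based component
        r_ml = math.log(note_rnn_prob(note, history))  # log p(note | history)
        return r_theory + c * r_ml
    ```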

    Delving into Transferable Adversarial Examples and Black-box Attacks

    Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song
    Subjects: Learning (cs.LG)

    An intriguing property of deep neural networks is the existence of
    adversarial examples, which can transfer among different architectures. These
    transferable adversarial examples may severely hinder deep neural network-based
    applications. Previous works mostly study the transferability using small scale
    datasets. In this work, we are the first to conduct an extensive study of the
    transferability over large models and a large scale dataset, and we are also
    the first to study the transferability of targeted adversarial examples with
    their target labels. We study both non-targeted and targeted adversarial
    examples, and show that while transferable non-targeted adversarial examples
    are easy to find, targeted adversarial examples generated using existing
    approaches almost never transfer with their target labels. Therefore, we
    propose novel ensemble-based approaches to generating transferable adversarial
    examples. Using such approaches, we observe a large proportion of targeted
    adversarial examples that are able to transfer with their target labels for the
    first time. We also present some geometric studies to help understanding the
    transferable adversarial examples. Finally, we show that the adversarial
    examples generated using ensemble-based approaches can successfully attack
    Clarifai.com, which is a black-box image classification system.
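
    A rough sketch of the ensemble idea on toy linear softmax models: the
    targeted perturbation is optimized against the average loss of several
    models, in the hope that it transfers to a held-out one. The toy models, the
    signed-gradient update, and the step sizes are illustrative assumptions, not
    the paper's exact procedure:

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def ensemble_targeted_attack(x, target, weight_mats, eps=0.3, steps=40, lr=0.05):
        """Push x toward `target` under the average cross-entropy of all models
        (each a toy softmax classifier with logits W @ x)."""
        x_adv, x0 = x.copy(), x.copy()
        for _ in range(steps):
            grad = np.zeros_like(x_adv)
            for W in weight_mats:
                p = softmax(W @ x_adv)
                onehot = np.zeros_like(p)
                onehot[target] = 1.0
                grad += W.T @ (p - onehot) / len(weight_mats)  # d(-log p[target])/dx
            x_adv -= lr * np.sign(grad)                 # targeted descent step
            x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # stay in the eps-ball
        return x_adv
    ```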

    Recursive Regression with Neural Networks: Approximating the HJI PDE Solution

    Vicenç Rubies Royo
    Subjects: Learning (cs.LG); Dynamical Systems (math.DS)

    Most machine learning applications using neural networks seek to approximate
    some function g(x) by minimizing some cost criterion. In the simplest case, if
    one has access to pairs of the form (x, y) where y = g(x), the problem can be
    framed as a simple regression problem. Beyond this family of problems, we find
    many important cases where g(x) is unknown so this approach is not always
    viable. However, similar to what we find in the work of Mnih et al. (2013), if
    we have some known properties of the function we are seeking to approximate,
    there is still hope to frame the problem as a regression problem. In this work,
    we show this in the context of trying to approximate the solution to a
    particular partial differential equation known as the Hamilton-Jacobi-Isaacs
    PDE found in the fields of control theory and robotics.

    Variational Lossy Autoencoder

    Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Representation learning seeks to expose certain aspects of observed data in a
    learned representation that’s amenable to downstream tasks like classification.
    For instance, a good representation for 2D images might be one that describes
    only global structure and discards information about detailed texture. In this
    paper, we present a simple but principled method to learn such global
    representations by combining Variational Autoencoder (VAE) with neural
    autoregressive models such as RNN, MADE and PixelRNN/CNN. Our proposed VAE
    model allows us to have control over what the global latent code can learn
    and, by designing the architecture accordingly, we can force the global latent
    code to discard irrelevant information such as texture in 2D images, and hence
    the code only “autoencodes” data in a lossy fashion. In addition, by leveraging
    autoregressive models as both prior distribution (p(z)) and decoding
    distribution (p(x|z)), we can greatly improve generative modeling performance
    of VAEs, achieving new state-of-the-art results on MNIST, OMNIGLOT and
    Caltech-101 Silhouettes density estimation tasks.

    A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

    Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    The advent of data science has spurred interest in estimating properties of
    distributions over large alphabets. Fundamental symmetric properties such as
    support size, support coverage, entropy, and proximity to uniformity, received
    most attention, with each property estimated using a different technique and
    often intricate analysis tools.

    We prove that for all these properties, a single, simple, plug-in
    estimator—profile maximum likelihood (PML)—performs as well as the best
    specialized techniques. This raises the possibility that PML may optimally
    estimate many other symmetric properties.
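
    For concreteness, the central object here is the sample's profile: for each
    j, the number of distinct symbols appearing exactly j times. PML then picks
    the distribution maximizing the probability of that profile; only the
    profile computation is sketched below, not the maximization:

    ```python
    from collections import Counter

    def sample_profile(sample):
        """Profile of a sample: map multiplicity j -> number of distinct
        symbols that occur exactly j times."""
        multiplicities = Counter(sample).values()      # per-symbol counts
        return dict(sorted(Counter(multiplicities).items()))

    # sample_profile("abracadabra") -> {1: 2, 2: 2, 5: 1}
    ```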

    Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning

    Maryam Lotfi Shahreza, Nasser Ghadiri, Seyed Rasul Mossavi, Jaleh Varshosaz, James Green
    Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG)

    Drug repositioning offers an effective solution to drug discovery, saving
    both time and resources by finding new indications for existing drugs.
    Typically, a drug takes effect via its protein targets in the cell. As a
    result, it is necessary for drug development studies to conduct an
    investigation into the interrelationships of drugs, protein targets, and
    diseases. Although previous studies have made a strong case for the
    effectiveness of integrative network-based methods for predicting these
    interrelationships, little progress has been achieved in this regard within
    drug repositioning research. Moreover, the interactions of new drugs and
    targets (lacking any known targets and drugs, respectively) cannot be
    accurately predicted by most established methods. In this paper, we propose a
    novel semi-supervised heterogeneous label propagation algorithm named Heter-LP,
    which applies both local as well as global network features for data
    integration. To predict drug-target, disease-target, and drug-disease
    associations, we use information about drugs, diseases, and targets as
    collected from multiple sources at different levels. Our algorithm integrates
    these various types of data into a heterogeneous network and implements a label
    propagation algorithm to find new interactions. Statistical analyses of 10-fold
    cross-validation results and experimental analysis support the effectiveness of
    the proposed algorithm.
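
    As a point of reference, a minimal homogeneous label-propagation iteration
    of the usual F <- alpha * S * F + (1 - alpha) * Y form; Heter-LP itself
    operates on a heterogeneous drug/target/disease network and uses both local
    and global features, which this sketch does not capture:

    ```python
    import numpy as np

    def label_propagation(adj, labels, alpha=0.8, n_iter=50):
        """Propagate initial labels/associations Y over a graph with symmetric
        nonnegative affinity matrix `adj`."""
        d = adj.sum(axis=1).astype(float)
        d[d == 0] = 1.0
        s = adj / np.sqrt(np.outer(d, d))              # D^-1/2 A D^-1/2
        f = labels.astype(float)
        for _ in range(n_iter):
            f = alpha * s @ f + (1 - alpha) * labels   # propagate, then re-anchor
        return f
    ```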

    Audio Visual Speech Recognition using Deep Recurrent Neural Networks

    Abhinav Thanda, Shankar M Venkatesan
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    In this work, we propose a training algorithm for an audio-visual automatic
    speech recognition (AV-ASR) system using a deep recurrent neural network
    (RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal
    Classification (CTC) objective function. The frame labels obtained from the
    acoustic model are then used to perform a non-linear dimensionality reduction
    of the visual features using a deep bottleneck network. Audio and visual
    features are fused and used to train a fusion RNN. The use of bottleneck
    features for visual modality helps the model to converge properly during
    training. Our system is evaluated on the GRID corpus. Our results show that
    the presence of the visual modality gives a significant improvement in
    character error rate (CER) at various noise levels, even when the model is trained without
    noisy data. We also provide a comparison of two fusion methods: feature fusion
    and decision fusion.
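
    A small sketch contrasting the two fusion strategies mentioned at the end of
    the abstract; the concatenation axis, the log-linear posterior combination,
    and the weight `w` are illustrative assumptions rather than the paper's
    exact rules:

    ```python
    import numpy as np

    def feature_fusion(audio_feats, visual_bottleneck):
        """Feature fusion: concatenate per-frame audio and visual bottleneck
        features before the fusion RNN."""
        return np.concatenate([audio_feats, visual_bottleneck], axis=-1)

    def decision_fusion(audio_post, visual_post, w=0.7):
        """Decision fusion: combine per-frame posteriors of separately trained
        audio and visual models with a log-linear weighting."""
        log_p = w * np.log(audio_post) + (1 - w) * np.log(visual_post)
        p = np.exp(log_p - log_p.max(axis=-1, keepdims=True))
        return p / p.sum(axis=-1, keepdims=True)
    ```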

    Lie-Access Neural Turing Machines

    Greg Yang, Alexander M. Rush
    Comments: Submitted to ICLR. Rewrite and improvement of this https URL
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recent work has demonstrated the effectiveness of employing explicit external
    memory structures in conjunction with deep neural models for algorithmic
    learning (Graves et al. 2014; Weston et al. 2014). These models utilize
    differentiable versions of traditional discrete memory-access structures
    (random access, stacks, tapes) to provide the variable-length storage necessary
    for computational tasks. In this work, we propose an alternative model,
    Lie-access memory, that is explicitly designed for the neural setting. In this
    paradigm, memory is accessed using a continuous head in a key-space manifold.
    The head is moved via Lie group actions, such as shifts or rotations, generated
    by a controller, and soft memory access is performed by considering the
    distance to keys associated with each memory. We argue that Lie groups provide
    a natural generalization of discrete memory structures, such as Turing
    machines, as they provide inverse and identity operators while maintaining
    differentiability. To experiment with this approach, we implement several
    simplified Lie-access neural Turing machines (LANTMs) with different Lie groups.
    We find that this approach is able to perform well on a range of algorithmic
    tasks.
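
    A minimal numerical sketch of the two ingredients described above for a 2-D
    key space: the head is moved by a Lie group action (here a rotation), and
    memory is read softly by weighting stored values with a softmax over
    distances to the keys. This illustrates the idea, not the authors' LANTM
    architecture:

    ```python
    import numpy as np

    def rotate(head, theta):
        """Move the head by a Lie group action: a 2-D rotation."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]]) @ head

    def soft_read(head, keys, values, temperature=1.0):
        """Soft access: softmax over negative squared head-to-key distances."""
        d2 = ((keys - head) ** 2).sum(axis=1)
        w = np.exp(-d2 / temperature)
        w /= w.sum()
        return w @ values
    ```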

    RL(^2): Fast Reinforcement Learning via Slow Reinforcement Learning

    Yan Duan, John Schulman, Xi Chen, Peter Bartlett, Ilya Sutskever, Pieter Abbeel
    Comments: 14 pages. Under review as a conference paper at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Deep reinforcement learning (deep RL) has been successful in learning
    sophisticated behaviors automatically; however, the learning process requires a
    huge number of trials. In contrast, animals can learn new tasks in just a few
    trials, benefiting from their prior knowledge about the world. This paper seeks
    to bridge this gap. Rather than designing a “fast” reinforcement learning
    algorithm, we propose to represent it as a recurrent neural network (RNN) and
    learn it from data. In our proposed method, RL(^2), the algorithm is encoded in
    the weights of the RNN, which are learned slowly through a general-purpose
    (“slow”) RL algorithm. The RNN receives all information a typical RL algorithm
    would receive, including observations, actions, rewards, and termination flags;
    and it retains its state across episodes in a given Markov Decision Process
    (MDP). The activations of the RNN store the state of the “fast” RL algorithm on
    the current (previously unseen) MDP. We evaluate RL(^2) experimentally on both
    small-scale and large-scale problems. On the small-scale side, we train it to
    solve randomly generated multi-armed bandit problems and finite MDPs. After
    RL(^2) is trained, its performance on new MDPs is close to human-designed
    algorithms with optimality guarantees. On the large-scale side, we test RL(^2)
    on a vision-based navigation task and show that it scales up to
    high-dimensional problems.
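
    The per-step input described in the abstract (observation, previous action,
    reward, and termination flag) can be sketched as one concatenated vector fed
    to the RNN, whose hidden state persists across episodes within a trial; the
    one-hot encoding and the ordering are assumptions made for illustration:

    ```python
    import numpy as np

    def rl2_step_input(obs, prev_action, prev_reward, done, n_actions):
        """Concatenate everything a typical RL algorithm would observe into a
        single input vector for the recurrent policy."""
        a = np.zeros(n_actions)
        a[prev_action] = 1.0
        return np.concatenate([np.asarray(obs, dtype=float), a,
                               [prev_reward], [float(done)]])
    ```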

    Recursive Decomposition for Nonconvex Optimization

    Abram L. Friesen, Pedro Domingos
    Comments: 11 pages, 7 figures, pdflatex
    Journal-ref: Proceedings of the 24th International Joint Conference on
    Artificial Intelligence (2015), pp. 253-259
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Continuous optimization is an important problem in many areas of AI,
    including vision, robotics, probabilistic inference, and machine learning.
    Unfortunately, most real-world optimization problems are nonconvex, causing
    standard convex techniques to find only local optima, even with extensions like
    random restarts and simulated annealing. We observe that, in many cases, the
    local modes of the objective function have combinatorial structure, and thus
    ideas from combinatorial optimization can be brought to bear. Based on this, we
    propose a problem-decomposition approach to nonconvex optimization. Similarly
    to DPLL-style SAT solvers and recursive conditioning in probabilistic
    inference, our algorithm, RDIS, recursively sets variables so as to simplify
    and decompose the objective function into approximately independent
    sub-functions, until the remaining functions are simple enough to be optimized
    by standard techniques like gradient descent. The variables to set are chosen
    by graph partitioning, ensuring decomposition whenever possible. We show
    analytically that RDIS can solve a broad class of nonconvex optimization
    problems exponentially faster than gradient descent with random restarts.
    Experimentally, RDIS outperforms standard techniques on problems like structure
    from motion and protein folding.


    Information Theory

    NP-Hardness of Reed-Solomon Decoding, and the Prouhet-Tarry-Escott Problem

    Venkata Gandikota, Badih Ghazi, Elena Grigorescu
    Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC)

    Establishing the complexity of Bounded Distance Decoding for
    Reed-Solomon codes is a fundamental open problem in coding theory, explicitly
    asked by Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005). The problem is
    motivated by the large current gap between the regime when it is NP-hard, and
    the regime when it is efficiently solvable (i.e., the Johnson radius).

    We show the first NP-hardness results for asymptotically smaller decoding
    radii than the maximum likelihood decoding radius of Guruswami and Vardy.
    Specifically, for Reed-Solomon codes of length (N) and dimension (K=O(N)), we
    show that it is NP-hard to decode more than (N-K-c\frac{\log N}{\log\log N})
    errors (with (c>0) an absolute constant). Moreover, we show that the problem is
    NP-hard under quasipolynomial-time reductions for an error amount
    (> N-K-c\log N) (with (c>0) an absolute constant).

    These results follow from the NP-hardness of a generalization of the
    classical Subset Sum problem to higher moments, called Moments Subset Sum,
    which has been a known open problem, and which may be of independent
    interest.

    We further reveal a strong connection with the well-studied
    Prouhet-Tarry-Escott problem in Number Theory, which turns out to capture a
    main barrier in extending our techniques. We believe the Prouhet-Tarry-Escott
    problem deserves further study in the theoretical computer science community.

    The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions

    Qiuwei Li, Gongguo Tang
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    This work considers the minimization of a general convex function (f(X)) over
    the cone of positive semi-definite matrices whose optimal solution (X^\star) is
    of low-rank. Standard first-order convex solvers require performing an
    eigenvalue decomposition in each iteration, severely limiting their
    scalability. A natural nonconvex reformulation of the problem factors the
    variable (X) into the product of a rectangular matrix with fewer columns and
    its transpose. For a special class of matrix sensing and completion problems
    with quadratic objective functions, local search algorithms applied to the
    factored problem have been shown to be much more efficient and, in spite of
    being nonconvex, to converge to the global optimum. The purpose of this work is
    to extend this line of study to general convex objective functions (f(X)) and
    investigate the geometry of the resulting factored formulations. Specifically,
    we prove that when (f(X)) satisfies restricted strong convexity and smoothness,
    each critical point of the factored problem either corresponds to the optimal
    solution (X^\star) or is a strict saddle point where the Hessian matrix has a
    negative eigenvalue. Such a geometric structure of the factored formulation
    ensures that many local search algorithms can converge to the global optimum
    with random initializations.
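
    A minimal sketch of the factored (Burer-Monteiro-style) reformulation
    analyzed here: minimize f(X) over PSD X by writing X = U U^T and running
    plain gradient descent on U. The callable `grad_f`, returning the symmetric
    gradient of the convex objective, and the step size are hypothetical:

    ```python
    import numpy as np

    def factored_gradient_descent(grad_f, n, r, steps=500, lr=1e-2, rng=None):
        """Gradient descent on the factor U of X = U U^T."""
        rng = rng or np.random.default_rng(0)
        u = 0.1 * rng.standard_normal((n, r))      # random initialization
        for _ in range(steps):
            x = u @ u.T
            u = u - lr * 2.0 * grad_f(x) @ u       # chain rule: dL/dU = 2 grad_f(X) U
        return u @ u.T
    ```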

    Throughput Analysis of Decentralized Coded Content Caching in Cellular Networks

    Mohsen Karimzadeh Kiskani, Hamid R. Sadjadpour
    Comments: Accepted to be published in IEEE Transactions on Wireless Communications (November 2016)
    Subjects: Information Theory (cs.IT)

    Decentralized coded content caching for next generation cellular networks is
    studied. The contents are linearly combined and cached in under-utilized caches
    of User Terminals (UTs) and its throughput capacity is compared with
    decentralized uncoded content caching. In both scenarios, we consider multihop
    Device-to-Device (D2D) communications and the use of femtocaches in the
    network. It is shown that decentralized coded content caching can increase the
    network throughput capacity compared to decentralized uncoded caching by
    reducing the number of hops needed to deliver the desired content. Further, the
    throughput capacity for Zipfian content request distribution is computed and it
    is shown that the decentralized coded content cache placement can increase the
    throughput capacity of cellular networks by a factor of ((\log(n))^2) where (n)
    is the number of nodes served by a femtocache.

    Channel Estimation for Hybrid Architecture Based Wideband Millimeter Wave Systems

    Kiran Venugopal, Ahmed Alkhateeb, Nuria González Prelcic, Robert W. Heath, Jr
    Comments: 31 pages, 13 figures
    Subjects: Information Theory (cs.IT)

    Hybrid analog and digital precoding allows millimeter wave (mmWave) systems
    to achieve both array and multiplexing gain. The design of the hybrid precoders
    and combiners, though, is usually based on knowledge of the channel. Prior work
    on mmWave channel estimation with hybrid architectures focused on narrowband
    channels. Since mmWave systems will be wideband with frequency selectivity, it
    is vital to develop channel estimation solutions for hybrid architectures based
    wideband mmWave systems. In this paper, we develop a sparse formulation and
    compressed sensing based solutions for the wideband mmWave channel estimation
    problem for hybrid architectures. First, we leverage the sparse structure of
    the frequency selective mmWave channels and formulate the channel estimation
    problem as a sparse recovery in both time and frequency domains. Then, we
    propose explicit channel estimation techniques for purely time or frequency
    domains and for combined time/frequency domains. Our solutions are suitable for
    both SC-FDE and OFDM systems. Simulation results show that the proposed
    solutions achieve good channel estimation quality, while requiring small
    training overhead. Leveraging the hybrid architecture at the transceivers gives
    further improvement in estimation error performance and achievable rates.

    Energy and Bursty Packet Loss Tradeoff over Fading Channels: A System Level Model

    M. Majid Butt, Eduard A. Jorswieck, Amr Mohamed
    Journal-ref: IEEE Systems Journal, 2016
    Subjects: Information Theory (cs.IT)

    Energy efficiency and quality of service (QoS) guarantees are the key design
    goals for the 5G wireless communication systems. In this context, we discuss a
    multiuser scheduling scheme over fading channels for loss tolerant
    applications. The loss tolerance of the application is characterized in terms
    of different parameters that contribute to quality of experience for the
    application. The mobile users are scheduled opportunistically such that a
    minimum QoS is guaranteed. We propose an opportunistic scheduling scheme and
    address the cross layer design framework when channel state information is not
    perfectly available at the transmitter and the receiver. We characterize the
    system energy as a function of different QoS and channel state estimation error
    parameters. The optimization problem is formulated using Markov chain framework
    and solved using stochastic optimization techniques. The results demonstrate
    that the parameters characterizing the packet loss are tightly coupled and
    relaxation of one parameter does not benefit the system much if the other
    constraints are tight. We evaluate the energy-performance trade-off numerically
    and show the effect of channel uncertainty on the packet scheduler design.

    New User-Irrepressible Protocol Sequences

    Yijin Zhang, Yuan-Hsun Lo, Kenneth W. Shum, Wing Shing Wong
    Comments: 15 pages, 3 tables
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    Protocol sequences are binary and periodic sequences used for deterministic
    multiple access in a collision channel without feedback. In this paper, we
    focus on user-irrepressible (UI) protocol sequences that can guarantee a
    positive individual throughput per sequence period with probability one for a
    slot-synchronous channel. As the sequence period has a fundamental impact on
    the worst-case channel access delay, a common objective of designing UI
    sequences is to make the sequence period as short as possible. Consider a
    communication channel that is shared by (M) active users, and assume that each
    protocol sequence has a constant Hamming weight (w). To attain a better delay
    performance than previously known UI sequences, this paper presents a CRTm
    construction of UI sequences with (w=M+1). For all non-prime (M\geq 8), our
    construction produces the shortest known sequence period of UI sequences.
    Simulation results further show that the new construction enjoys a better
    average delay performance than other constructions. In addition, we derive an
    asymptotic lower bound on the minimum sequence period for (w=M+1) if the
    sequence structure satisfies some technical conditions, called equi-difference,
    and prove the tightness of this lower bound by using the CRTm construction.

    Bayesian data assimilation based on a family of outer measures

    Jeremie Houssineau, Daniel E. Clark
    Subjects: Information Theory (cs.IT)

    A flexible representation of uncertainty that remains within the standard
    framework of probabilistic measure theory is presented along with a study of
    its properties. This representation relies on a specific type of outer measure
    that is based on the measure of a supremum, hence combining additive and highly
    sub-additive components. It is shown that this type of outer measure enables
    the introduction of intuitive concepts such as pullback and general data
    assimilation operations.

    A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

    Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    The advent of data science has spurred interest in estimating properties of
    distributions over large alphabets. Fundamental symmetric properties such as
    support size, support coverage, entropy, and proximity to uniformity, received
    most attention, with each property estimated using a different technique and
    often intricate analysis tools.

    We prove that for all these properties, a single, simple, plug-in
    estimator—profile maximum likelihood (PML)—performs as well as the best
    specialized techniques. This raises the possibility that PML may optimally
    estimate many other symmetric properties.

    Dynamic TDD: the Asynchronous Case and the Synchronous Case

    Ming Ding, David Lopez Perez, Guoqiang Mao, Zihuai Lin
    Comments: 7 pages, 5 figures, submitted to IEEE ICC 2017
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Small cell networks (SCNs) are envisioned to embrace dynamic time division
    duplexing (TDD) in order to tailor downlink (DL)/uplink (UL) subframe resources
    to quick variations and burstiness of DL/UL traffic. The study of dynamic TDD
    is particularly important because it serves as the predecessor of the full
    duplex transmission technology, which has been identified as one of the
    candidate technologies for the 5th-generation (5G) networks. Up to now, the
    existing work on dynamic TDD only considered an asynchronous network scenario.
    However, the current 4G network is a synchronous one and it is very likely that
    the future 5G networks will follow the same system design philosophy for easy
    implementation. In this paper, for the first time, we consider dynamic TDD in
    synchronous networks and present analytical results on the probabilities of
    inter-cell inter-link interference and the DL/UL time resource utilization.
    Based on our analytical results, the area spectral efficiency is further
    investigated to shed new light on the performance and the deployment of dynamic
    TDD in future synchronous networks.

    Unit circle MVDR beamformer

    Saurav R Tuladhar, John R Buck
    Comments: Accepted to ICASSP 2015
    Subjects: Information Theory (cs.IT)

    The array polynomial is the z-transform of the array weights for a narrowband
    planewave beamformer using a uniform linear array (ULA). Evaluating the array
    polynomial on the unit circle in the complex plane yields the beampattern. The
    locations of the polynomial zeros on the unit circle indicate the nulls of the
    beampattern. For planewave signals measured with a ULA, the locations of the
    ensemble MVDR polynomial zeros are constrained on the unit circle. However,
    sample matrix inversion (SMI) MVDR polynomial zeros generally do not fall on
    the unit circle. The proposed unit circle MVDR (UC MVDR) projects the zeros of
    the SMI MVDR polynomial radially on the unit circle. This satisfies the
    constraint on the zeros of ensemble MVDR polynomial. Numerical simulations show
    that the UC MVDR beamformer suppresses interferers better than the SMI MVDR and
    the diagonal loaded MVDR beamformer and also improves the white noise gain
    (WNG).
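
    A small numpy sketch of the projection step described above: take the zeros
    of the sample (SMI) MVDR weight polynomial, push each zero radially onto the
    unit circle, rebuild the weights, and rescale toward the look-direction
    steering vector. The scaling and ordering conventions of the paper may
    differ:

    ```python
    import numpy as np

    def unit_circle_mvdr(w_smi, steering):
        """Radially project the zeros of the SMI MVDR array polynomial onto the
        unit circle and rebuild the beamformer weights."""
        zeros = np.roots(w_smi)              # zeros of the array polynomial
        zeros = zeros / np.abs(zeros)        # radial projection onto |z| = 1
        w_uc = np.poly(zeros)                # weights with projected zeros
        c = np.vdot(w_uc, steering)          # response toward the look direction
        return w_uc / np.conj(c)             # re-impose w^H v(theta0) = 1
    ```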

    Probabilistic Caching in Wireless D2D Networks: Cache Hit Optimal vs. Throughput Optimal

    Zheng Chen, Nikolaos Pappas, Marios Kountouris
    Comments: IEEE Communications Letters
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    Departing from the conventional cache hit optimization in cache-enabled
    wireless networks, we consider an alternative optimization approach for the
    probabilistic caching placement in stochastic wireless D2D caching networks
    taking into account the reliability of D2D transmissions. Using tools from
    stochastic geometry, we provide a closed-form approximation of cache-aided
    throughput, which measures the density of successfully served requests by local
    device caches, and we obtain the optimal caching probabilities with numerical
    optimization. Compared to the cache-hit-optimal case, the optimal caching
    probabilities obtained by cache-aided throughput optimization show notable gain
    in terms of the density of successfully served user requests, particularly in
    dense user environments.

    Mobile Lattice-Coded Physical-Layer Network Coding With Practical Channel Alignment

    Yihua Tan, Soung Chang Liew, Tao Huang
    Comments: 18 pages, 15 figures
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    The original concept of physical-layer network coding (PNC) was first
    proposed in a MobiCom challenge paper in 2006 as a new paradigm to boost the
    throughput of wireless relay networks. Since then, PNC has attracted a wide
    following within the research community. A high point of PNC research was a
    theoretical proof that the use of nested lattice codes in PNC could achieve the
    capacity of a two-way relay network to within half a bit. Many practical
    challenges, however, remain to be addressed before the full potential of
    lattice-coded PNC can be realized. Two major challenges are channel alignment
    of distributed nodes and complexity reduction of lattice encoding-decoding.
    This paper reports a first comprehensive implementation of a lattice-coded PNC
    system. Our contributions are twofold: 1) we design and demonstrate a
    low-overhead channel precoding scheme that can accurately align the channels of
    distributed nodes driven by independent low-cost temperature-compensated
    oscillators (TCXO); 2) we adapt the low-density lattice code (LDLC) for use in
    practical PNC systems. Our lattice-coded PNC implementation yields good
    throughput performance in a static line-of-sight (LoS) scenario and in mobile
    non-LoS scenarios.

    Minimum node degree in inhomogeneous random key graphs with unreliable links

    Rashad Eletreby, Osman Yağan
    Comments: In proceedings of the IEEE International Symposium on Information Theory (ISIT) 2016. arXiv admin note: text overlap with arXiv:1610.07576
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Probability (math.PR)

    We consider wireless sensor networks under a heterogeneous random key
    predistribution scheme and an on-off channel model. The heterogeneous key
    predistribution scheme has recently been introduced by Yağan – as an
    extension to the Eschenauer and Gligor scheme – for cases where the network
    consists of sensor nodes with varying levels of resources and/or connectivity
    requirements, e.g., regular nodes vs. cluster heads. The network is modeled by
    the intersection of the inhomogeneous random key graph (induced by the
    heterogeneous scheme) with an Erdős–Rényi graph (induced by the on/off
    channel model). We present conditions (in the form of zero-one laws) on how to
    scale the parameters of the intersection model so that with high probability
    all of its nodes are connected to at least (k) other nodes; i.e., the minimum
    node degree of the graph is no less than (k). We also present numerical results
    to support our results in the finite-node regime. The numerical results suggest
    that the conditions that ensure (k)-connectivity coincide with those ensuring
    the minimum node degree being no less than (k).



