IT博客汇 | arXiv Paper Daily: Thu, 27 Oct 2016

arXiv Paper Daily: Thu, 27 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-27 00:00:00

Computer Vision and Pattern Recognition

Mask-off: Synthesizing Face Images in the Presence of Head-mounted Displays

Yajie Zhao, Qingguo Xu, Xinyu Huang, Ruigang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A head-mounted display (HMD) could be an important component of augmented
reality system. However, as the upper face region is seriously occluded by the
device, the user experience could be affected in applications such as
telecommunication and multi-player video games. In this paper, we first present
a novel experimental setup that consists of two near-infrared (NIR) cameras to
point to the eye regions and one visible-light RGB camera to capture the
visible face region. The main purpose of this paper is to synthesize realistic
face images without occlusions based on the images captured by these cameras.
To this end, we propose a novel synthesis framework that contains four modules:
3D head reconstruction, face alignment and tracking, face synthesis, and eye
synthesis. In face synthesis, we propose a novel algorithm that can robustly
align and track a personalized 3D head model given a face that is severely
occluded by the HMD. In eye synthesis, in order to generate accurate eye
movements and dynamic wrinkle variations around eye regions, we propose another
novel algorithm to colorize the NIR eye images and further remove the “red eye”
effects caused by the colorization. Results show that both hardware setup and
system framework are robust to synthesize realistic face images in video
sequences.

Estimating the concentration of gold nanoparticles incorporated on Natural Rubber membranes using Multi-Level Starlet Optimal Segmentation

Alexandre Fioravante de Siqueira, Flávio Camargo Cabrera, Aylton Pagamisse, Aldo Eloizo Job
Comments: 22 pages, 8 figures
Journal-ref: J Nanopart Res (2014) 16: 2809
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This study consolidates Multi-Level Starlet Segmentation (MLSS) and
Multi-Level Starlet Optimal Segmentation (MLSOS), techniques for
photomicrograph segmentation that use starlet wavelet detail levels to separate
areas of interest in an input image. Several segmentation levels can be
obtained using Multi-Level Starlet Segmentation; after that, Matthews
correlation coefficient (MCC) is used to choose an optimal segmentation level,
giving rise to Multi-Level Starlet Optimal Segmentation. In this paper, MLSOS
is employed to estimate the concentration of gold nanoparticles with diameter
around 47 nm, reducted on natural rubber membranes. These samples were used on
the construction of SERS/SERRS substrates and in the study of natural rubber
membranes with incorporated gold nanoparticles influence on Leishmania
braziliensis physiology. Precision, recall and accuracy are used to evaluate
the segmentation performance, and MLSOS presents accuracy greater than 88% for
this application.

Universal adversarial perturbations

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Given a state-of-the-art deep neural network classifier, we show the
existence of a universal (image-agnostic) and very small perturbation vector
that causes natural images to be misclassified with high probability. We
propose a systematic algorithm for computing universal perturbations, and show
that state-of-the-art deep neural networks are highly vulnerable to such
perturbations, albeit being quasi-imperceptible to the human eye. We further
empirically analyze these universal perturbations and show, in particular, that
they generalize very well across neural networks. The surprising existence of
universal perturbations reveals important geometric correlations among the
high-dimensional decision boundary of classifiers. It further outlines
potential security breaches with the existence of single directions in the
input space that adversaries can possibly exploit to break a classifier on most
natural images.

Video Analysis of "YouTube Funnies" to Aid the Study of Human Gait and Falls – Preliminary Results and Proof of Concept

Babak Taati, Pranay Lohia, Avril Mansfield, Ahmed Ashraf
Comments: 4 pages, 3 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Because falls are funny, YouTube and other video sharing sites contain a
large repository of real-life falls. We propose extracting gait and balance
information from these videos to help us better understand some of the factors
that contribute to falls. Proof-of-concept is explored in a single video
containing multiple (n=14) falls/non-falls in the presence of an unexpected
obstacle. The analysis explores: computing spatiotemporal parameters of gait in
a video captured from an arbitrary viewpoint; the relationship between
parameters of gait from the last few steps before the obstacle and falling vs.
not falling; and the predictive capacity of a multivariate model in predicting
a fall in the presence of an unexpected obstacle. Homography transformations
correct the perspective projection distortion and allow for the consistent
tracking of gait parameters as an individual walks in an arbitrary direction in
the scene. A synthetic top view allows for computing the average stride length
and a synthetic side view allows for measuring up and down motions of the head.
In leave-one-out cross-validation, we were able to correctly predict whether a
person would fall or not in 11 out of the 14 cases (78.6%), just by looking at
the average stride length and the range of vertical head motion during the 1-4
most recent steps prior to reaching the obstacle.

Incremental Nonparametric Weighted Feature Extraction for OnlineSubspace Pattern Classification

Hamid Abrishami Moghaddam, Elaheh Raisi
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, a new online method based on nonparametric weighted feature
extraction (NWFE) is proposed. NWFE was introduced to enjoy optimum
characteristics of linear discriminant analysis (LDA) and nonparametric
discriminant analysis (NDA) while rectifying their drawbacks. It emphasizes the
points near decision boundary by putting greater weights on them and
deemphasizes other points. Incremental nonparametric weighted feature
extraction (INWFE) is the online version of NWFE. INWFE has advantages of NWFE
method such as extracting more than L-1 features in contrast to LDA. It is
independent of the class distribution and performs well in complex distributed
data. The effects of outliers are reduced due to the nature of its
nonparametric scatter matrix. Furthermore, it is possible to add new samples
asynchronously, i.e. whenever a new sample becomes available at any given time,
it can be added to the algorithm. This is useful for many real world
applications since all data cannot be available in advance. This method is
implemented on Gaussian and non-Gaussian multidimensional data, a number of UCI
datasets and Indian Pine dataset. Results are compared with NWFE in terms of
classification accuracy and execution time. For nearest neighbour classifier it
shows that this technique converges to NWFE at the end of learning process. In
addition, the computational complexity is reduced in comparison with NWFE in
terms of execution time.

Predicting First Impressions with Deep Learning

Mel McCurrie, Fernando Beletti, Lucas Parzianello, Allen Westendorp, Samuel Anthony, Walter Scheirer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Describable visual facial attributes are now commonplace in human biometrics
and affective computing, with existing algorithms even reaching a sufficient
point of maturity for placement into commercial products. These algorithms
model objective facets of facial appearance, such as hair and eye color,
expression, and aspects of the geometry of the face. A natural extension, which
has not been studied to any great extent thus far, is the ability to model
subjective attributes that are assigned to a face based purely on visual
judgements. For instance, with just a glance, our first impression of a face
may lead us to believe that a person is smart, worthy of our trust, and perhaps
even our admiration – regardless of the underlying truth behind such
attributes. Psychologists believe that these judgements are based on a variety
of factors such as emotional states, personality traits, and other physiognomic
cues. But work in this direction leads to an interesting question: how do we
create models for problems where there is no ground truth, only measurable
behavior? In this paper, we introduce a new convolutional neural network-based
regression framework that allows us to train predictive models of crowd
behavior for social attribute assignment. Over images from the AFLW face
database, these models demonstrate strong correlations with human crowd
ratings.

The Event-Camera Dataset: Event-based Data for Pose Estimation, Visual Odometry, and SLAM

Elias Mueggler, Henri Rebecq, Guillermo Gallego, Tobi Delbruck, Davide Scaramuzza
Comments: 7 pages, 4 figures, 3 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

New vision sensors, such as the Dynamic and Active-pixel Vision sensor
(DAVIS), incorporate a conventional global-shutter camera and an event-based
sensor in the same pixel array. These sensors have great potential for
high-speed robotics and computer vision because they allow us to combine the
benefits of conventional cameras with those of event-based sensors: low
latency, high temporal resolution, and very high dynamic range. However, new
algorithms are required to exploit the sensor characteristics and cope with its
unconventional output, which consists of a stream of asynchronous brightness
changes (called “events”) and synchronous grayscale frames. For this purpose,
we present and release a collection of datasets captured with a DAVIS in a
variety of synthetic and real environments, which we hope will motivate
research on new algorithms for high-speed and high-dynamic-range robotics and
computer-vision applications. In addition to global-shutter intensity images
and asynchronous events, we also provide inertial measurements and ground truth
from a motion-capture system. All the data are released both as standard text
files and binary files (i.e., rosbag). This paper provides an overview of the
available data.

Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards

Suchet Bargoti, James Underwood
Comments: This paper is the initial version of the manuscript submitted to The Journal of Field Robotics in May 2016. Following reviews and revisions, the paper has been accepted for publication. The reviewed version includes extended comparison between the different classification frameworks and a more in-depth literature review
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Ground vehicles equipped with monocular vision systems are a valuable source
of high resolution image data for precision agriculture applications in
orchards. This paper presents an image processing framework for fruit detection
and counting using orchard image data. A general purpose image segmentation
approach is used, including two feature learning algorithms; multi-scale
Multi-Layered Perceptrons (MLP) and Convolutional Neural Networks (CNN). These
networks were extended by including contextual information about how the image
data was captured (metadata), which correlates with some of the appearance
variations and/or class distributions observed in the data. The pixel-wise
fruit segmentation output is processed using the Watershed Segmentation (WS)
and Circular Hough Transform (CHT) algorithms to detect and count individual
fruits. Experiments were conducted in a commercial apple orchard near
Melbourne, Australia. The results show an improvement in fruit segmentation
performance with the inclusion of metadata on the previously benchmarked MLP
network. We extend this work with CNNs, bringing agrovision closer to the
state-of-the-art in computer vision, where although metadata had negligible
influence, the best pixel-wise F1-score of (0.791) was achieved. The WS
algorithm produced the best apple detection and counting results, with a
detection F1-score of (0.858). As a final step, image fruit counts were
accumulated over multiple rows at the orchard and compared against the
post-harvest fruit counts that were obtained from a grading and counting
machine. The count estimates using CNN and WS resulted in the best performance
for this dataset, with a squared correlation coefficient of (r^2=0.826).

Artificial Intelligence

New Liftable Classes for First-Order Probabilistic Inference

Seyed Mehran Kazemi, Angelika Kimmig, Guy Van den Broeck, David Poole
Comments: Accepted at NIPS-2016. 22 pages, 1 figure
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Statistical relational models provide compact encodings of probabilistic
dependencies in relational domains, but result in highly intractable graphical
models. The goal of lifted inference is to carry out probabilistic inference
without needing to reason about each individual separately, by instead treating
exchangeable, undistinguished objects as a whole. In this paper, we study the
domain recursion inference rule, which, despite its central role in early
theoretical results on domain-lifted inference, has later been believed
redundant. We show that this rule is more powerful than expected, and in fact
significantly extends the range of models for which lifted inference runs in
time polynomial in the number of individuals in the domain. This includes an
open problem called S4, the symmetric transitivity model, and a first-order
logic encoding of the birthday paradox. We further identify new classes S2FO2
and S2RU of domain-liftable theories, which respectively subsume FO2 and
recursively unary theories, the largest classes of domain-liftable theories
known so far, and show that using domain recursion can achieve exponential
speedup even in theories that cannot fully be lifted with the existing set of
inference rules.

A self-tuning Firefly algorithm to tune the parameters of Ant Colony System (ACSFA)

M. K. A. Ariyaratne, T. G. I. Fernando, S. Weerakoon
Comments: 18 pages, 21 references, 5 figures
Subjects: Artificial Intelligence (cs.AI)

Ant colony system (ACS) is a promising approach which has been widely used in
problems such as Travelling Salesman Problems (TSP), Job shop scheduling
problems (JSP) and Quadratic Assignment problems (QAP). In its original
implementation, parameters of the algorithm were selected by trial and error
approach. Over the last few years, novel approaches have been proposed on
adapting the parameters of ACS in improving its performance. The aim of this
paper is to use a framework introduced for self-tuning optimization algorithms
combined with the firefly algorithm (FA) to tune the parameters of the ACS
solving symmetric TSP problems. The FA optimizes the problem specific
parameters of ACS while the parameters of the FA are tuned by the selected
framework itself. With this approach, the user neither has to work with the
parameters of ACS nor the parameters of FA. Using common symmetric TSP problems
we demonstrate that the framework fits well for the ACS. A detailed statistical
analysis further verifies the goodness of the new ACS over the existing ACS and
also of the other techniques used to tune the parameters of ACS.

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil
Comments: Paper presented at the 32nd International Conference on Logic Programming (ICLP 2016), New York City, USA, 16-21 October 2016, 14 pages, LaTeX
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

Management of chronic diseases such as heart failure, diabetes, and chronic
obstructive pulmonary disease (COPD) is a major problem in health care. A
standard approach that the medical community has devised to manage widely
prevalent chronic diseases such as chronic heart failure (CHF) is to have a
committee of experts develop guidelines that all physicians should follow.
These guidelines typically consist of a series of complex rules that make
recommendations based on a patient’s information. Due to their complexity,
often the guidelines are either ignored or not complied with at all, which can
result in poor medical practices. It is not even clear whether it is humanly
possible to follow these guidelines due to their length and complexity. In the
case of CHF management, the guidelines run nearly 80 pages. In this paper we
describe a physician-advisory system for CHF management that codes the entire
set of clinical practice guidelines for CHF using answer set programming. Our
approach is based on developing reasoning templates (that we call knowledge
patterns) and using these patterns to systemically code the clinical guidelines
for CHF as ASP rules. Use of the knowledge patterns greatly facilitates the
development of our system. Given a patient’s medical information, our system
generates a recommendation for treatment just as a human physician would, using
the guidelines. Our system will work even in the presence of incomplete
information. Our work makes two contributions: (i) it shows that highly complex
guidelines can be successfully coded as ASP rules, and (ii) it develops a
series of knowledge patterns that facilitate the coding of knowledge expressed
in a natural language and that can be used for other application domains. This
paper is under consideration for acceptance in TPLP.

Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web

Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Ossia, Hamid R. Rabiee, Hamed Haddadi, Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, Gianluca Stringhini
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

As food becomes an important part of modern life, recipes shared on the web
are a great indicator of civilizations and culinary attitudes in different
countries. Similarly, ingredients, flavors, and nutrition information are
strong signals of the taste preferences of individuals from various parts of
the world. Yet, we do not have a thorough understanding of these palate
varieties.

In this paper, we present a large-scale study of recipes published on the Web
and their content, aiming to understand cuisines and culinary habits around the
world. Using a database of more than 157K recipes from over 200 different
cuisines, we analyze ingredients, flavors, and nutritional values which
distinguish dishes from different regions, and use this knowledge to assess the
predictability of recipes from different cuisines. We then use country health
statistics to understand the relation between these factors and health
indicators of different nations, such as obesity, diabetes, migration, and
health expenditure. Our results confirm the strong effects of geographical and
cultural similarities on recipes, health indicators, and culinary preferences
between countries around the world.

Universal adversarial perturbations

Given a state-of-the-art deep neural network classifier, we show the
existence of a universal (image-agnostic) and very small perturbation vector
that causes natural images to be misclassified with high probability. We
propose a systematic algorithm for computing universal perturbations, and show
that state-of-the-art deep neural networks are highly vulnerable to such
perturbations, albeit being quasi-imperceptible to the human eye. We further
empirically analyze these universal perturbations and show, in particular, that
they generalize very well across neural networks. The surprising existence of
universal perturbations reveals important geometric correlations among the
high-dimensional decision boundary of classifiers. It further outlines
potential security breaches with the existence of single directions in the
input space that adversaries can possibly exploit to break a classifier on most
natural images.

Quantum-enhanced machine learning

Vedran Dunjko, Jacob M. Taylor, Hans J. Briegel
Comments: 5+15 pages. This paper builds upon and mostly supersedes arXiv:1507.08482. In addition to results provided in this previous work, here we achieve learning improvements in more general environments, and provide connections to other work in quantum machine learning. Explicit constructions of oracularized environments given in arXiv:1507.08482 are omitted in this version
Journal-ref: Phys. Rev. Lett. 117, 130501 (2016)
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Learning (cs.LG)

The emerging field of quantum machine learning has the potential to
substantially aid in the problems and scope of artificial intelligence. This is
only enhanced by recent successes in the field of classical machine learning.
In this work we propose an approach for the systematic treatment of machine
learning, from the perspective of quantum information. Our approach is general
and covers all three main branches of machine learning: supervised,
unsupervised and reinforcement learning. While quantum improvements in
supervised and unsupervised learning have been reported, reinforcement learning
has received much less attention. Within our approach, we tackle the problem of
quantum enhancements in reinforcement learning as well, and propose a
systematic scheme for providing improvements. As an example, we show that
quadratic improvements in learning efficiency, and exponential improvements in
performance over limited time periods, can be obtained for a broad class of
learning problems.

Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences

Minh Ha Quang
Comments: 60 pages
Subjects: Functional Analysis (math.FA); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

This work presents a parametrized family of divergences, namely Alpha-Beta
Log- Determinant (Log-Det) divergences, between positive definite unitized
trace class operators on a Hilbert space. This is a generalization of the
Alpha-Beta Log-Determinant divergences between symmetric, positive definite
matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det
divergences is highly general and contains many divergences as special cases,
including the recently formulated infinite dimensional affine-invariant
Riemannian distance and the infinite-dimensional Alpha Log-Det divergences
between positive definite unitized trace class operators. In particular, it
includes a parametrized family of metrics between positive definite trace class
operators, with the affine-invariant Riemannian distance and the square root of
the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det
divergences between covariance operators on a Reproducing Kernel Hilbert Space
(RKHS), we obtain closed form formulas via the corresponding Gram matrices.

Information Retrieval

Inferring individual attributes from search engine queries and auxiliary information

Luca Soldaini, Elad Yom-Tov
Subjects: Information Retrieval (cs.IR)

Internet data has surfaced as a primary source for investigation of different
aspects of human behavior. A crucial step in such studies is finding a suitable
cohort (i.e., a set of users) that shares a common trait of interest to
researchers. However, direct identification of users sharing this trait is
often impossible, as the data available to researchers is usually anonymized to
preserve user privacy. To facilitate research on specific topics of interest,
especially in medicine, we introduce an algorithm for identifying a trait of
interest in anonymous users. We illustrate how a small set of labeled examples,
together with statistical information about the entire population, can be
aggregated to obtain labels on unseen examples. We validate our approach using
labeled data from the political domain.

We provide two applications of the proposed algorithm to the medical domain.
In the first, we demonstrate how to identify users whose search patterns
indicate they might be suffering from certain types of cancer. In the second,
we detail an algorithm to predict the distribution of diseases given their
incidence in a subset of the population at study, making it possible to predict
disease spread from partial epidemiological data.

Learning to Match Using Local and Distributed Representations of Text for Web Search

Bhaskar Mitra, Fernando Diaz, Nick Craswell
Subjects: Information Retrieval (cs.IR)

Models such as latent semantic analysis and those based on neural embeddings
learn distributed representations of text, and match the query against the
document in the latent semantic space. In traditional information retrieval
models, on the other hand, terms have discrete or local representations, and
the relevance of a document is determined by the exact matches of query terms
in the body text. We hypothesize that matching with distributed representations
complements matching with traditional local representations, and that a
combination of the two is favorable. We propose a novel document ranking model
composed of two separate deep neural networks, one that matches the query and
the document using a local representation, and another that matches the query
and the document using learned distributed representations. The two networks
are jointly trained as part of a single neural network. We show that this
combination or `duet’ performs significantly better than either neural network
individually on a Web page ranking task, and also significantly outperforms
traditional baselines and other recently proposed models based on neural
networks.

A recommender system for efficient discovery of new anomalies in large-scale access logs

Heju Jiang, Scott Algatt, Parvez Ahammad
Comments: 9 pages, 6 figures
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR)

We present a novel, non-standard recommender system for large-scale security
policy management(SPM). Our system Helios discovers and recommends unknown and
unseen anomalies in large-scale access logs with minimal supervision and no
starting information on users and items. Typical recommender systems assume
availability of user- and item-related information, but such information is not
usually available in access logs. To resolve this problem, we first use
discrete categorical labels to construct categorical combinations from access
logs in a bootstrapping manner. Then, we utilize rank statistics of entity rank
and order categorical combinations for recommendation. From a double-sided cold
start, with minimal supervision, Helios learns to recommend most salient
anomalies at large-scale, and provides visualizations to security experts to
explain rationale behind the recommendations. Our experiments show Helios to be
suitable for large-scale applications: from cold starts, in less than 60
minutes, Helios can analyze roughly 4.6 billion records in logs of 400GB with
about 300 million potential categorical combinations, then generate ranked
categorical combinations as recommended discoveries. We also show that, even
with limited computing resources, Helios accelerates unknown and unseen anomaly
discovery process for SPM by 1 to 3 orders of magnitude, depending on use
cases. In addition, Helios’ design is flexible with metrics and measurement
fields used for discoveries and recommendations. Overall, our system leads to
more efficient and customizable SPM processes with faster discoveries of unseen
and unknown anomalies.

Modeling Ambiguity, Subjectivity, and Diverging Viewpoints in Opinion Question Answering Systems

Mengting Wan, Julian McAuley
Comments: 10 pages, accepted by ICDM’2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Product review websites provide an incredible lens into the wide variety of
opinions and experiences of different people, and play a critical role in
helping users discover products that match their personal needs and
preferences. To help address questions that can’t easily be answered by reading
others’ reviews, some review websites also allow users to pose questions to the
community via a question-answering (QA) system. As one would expect, just as
opinions diverge among different reviewers, answers to such questions may also
be subjective, opinionated, and divergent. This means that answering such
questions automatically is quite different from traditional QA tasks, where it
is assumed that a single `correct’ answer is available. While recent work
introduced the idea of question-answering using product reviews, it did not
account for two aspects that we consider in this paper: (1) Questions have
multiple, often divergent, answers, and this full spectrum of answers should
somehow be used to train the system; and (2) What makes a `good’ answer depends
on the asker and the answerer, and these factors should be incorporated in
order for the system to be more personalized. Here we build a new QA dataset
with 800 thousand questions—and over 3.1 million answers—and show that
explicitly accounting for personalization and ambiguity leads both to
quantitatively better answers, but also a more nuanced view of the range of
supporting, but subjective, opinions.

Dis-S2V: Discourse Informed Sen2Vec

Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Vector representation of sentences is important for many text processing
tasks that involve clustering, classifying, or ranking sentences. Recently,
distributed representation of sentences learned by neural models from unlabeled
data has been shown to outperform the traditional bag-of-words representation.
However, most of these learning methods consider only the content of a sentence
and disregard the relations among sentences in a discourse by and large.

In this paper, we propose a series of novel models for learning latent
representations of sentences (Sen2Vec) that consider the content of a sentence
as well as inter-sentence relations. We first represent the inter-sentence
relations with a language network and then use the network to induce contextual
information into the content-based Sen2Vec models. Two different approaches are
introduced to exploit the information in the network. Our first approach
retrofits (already trained) Sen2Vec vectors with respect to the network in two
different ways: (1) using the adjacency relations of a node, and (2) using a
stochastic sampling method which is more flexible in sampling neighbors of a
node. The second approach uses a regularizer to encode the information in the
network into the existing Sen2Vec model. Experimental results show that our
proposed models outperform existing methods in three fundamental information
system tasks demonstrating the effectiveness of our approach. The models
leverage the computational power of multi-core CPUs to achieve fine-grained
computational efficiency. We make our code publicly available upon acceptance.

Computation and Language

Distraction-Based Neural Networks for Document Summarization

Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang
Comments: Published in IJCAI-2016: the 25th International Joint Conference on Artificial Intelligence
Journal-ref: IJCAI, 2016
Subjects: Computation and Language (cs.CL)

Distributed representation learned with neural networks has recently shown to
be effective in modeling natural languages at fine granularities such as words,
phrases, and even sentences. Whether and how such an approach can be extended
to help model larger spans of text, e.g., documents, is intriguing, and further
investigation would still be desirable. This paper aims to enhance neural
network models for such a purpose. A typical problem of document-level modeling
is automatic summarization, which aims to model documents in order to generate
summaries. In this paper, we propose neural models to train computers not just
to pay attention to specific regions and content of input documents with
attention models, but also distract them to traverse between different content
of a document so as to better grasp the overall meaning for summarization.
Without engineering any features, we train the models on two large datasets.
The models achieve the state-of-the-art performance, and they significantly
benefit from the distraction modeling, particularly when input documents are
long.

Broad Context Language Modeling as Reading Comprehension

Zewei Chu, Hai Wang, Kevin Gimpel, David McAllester
Subjects: Computation and Language (cs.CL)

Progress in text understanding has been driven by the availability of large
datasets that test particular capabilities, like recent datasets for assessing
reading comprehension. We focus here on the LAMBADA dataset, a word prediction
task requiring broader context than the immediate sentence. We view the LAMBADA
task as a reading comprehension problem and apply off-the-shelf comprehension
models based on neural networks. Though these models are constrained to choose
a word from the context, they improve the state of the art on LAMBADA from 7.3%
to 45.4%. We analyze 100 instances, finding that neural network readers perform
well in cases that involve selecting a name from the context based on dialogue
or discourse cues but struggle when coreference resolution or external
knowledge is needed.

Content Selection in Data-to-Text Systems: A Survey

Dimitra Gkatzia
Subjects: Computation and Language (cs.CL)

Data-to-text systems are powerful in generating reports from data
automatically and thus they simplify the presentation of complex data. Rather
than presenting data using visualisation techniques, data-to-text systems use
natural (human) language, which is the most common way for human-human
communication. In addition, data-to-text systems can adapt their output content
to users’ preferences, background or interests and therefore they can be
pleasant for users to interact with. Content selection is an important part of
every data-to-text system, because it is the module that determines which from
the available information should be conveyed to the user. This survey initially
introduces the field of data-to-text generation, describes the general
data-to-text system architecture and then it reviews the state-of-the-art
content selection methods. Finally, it provides recommendations for choosing an
approach and discusses opportunities for future research.

Dis-S2V: Discourse Informed Sen2Vec

Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Vector representation of sentences is important for many text processing
tasks that involve clustering, classifying, or ranking sentences. Recently,
distributed representation of sentences learned by neural models from unlabeled
data has been shown to outperform the traditional bag-of-words representation.
However, most of these learning methods consider only the content of a sentence
and disregard the relations among sentences in a discourse by and large.

In this paper, we propose a series of novel models for learning latent
representations of sentences (Sen2Vec) that consider the content of a sentence
as well as inter-sentence relations. We first represent the inter-sentence
relations with a language network and then use the network to induce contextual
information into the content-based Sen2Vec models. Two different approaches are
introduced to exploit the information in the network. Our first approach
retrofits (already trained) Sen2Vec vectors with respect to the network in two
different ways: (1) using the adjacency relations of a node, and (2) using a
stochastic sampling method which is more flexible in sampling neighbors of a
node. The second approach uses a regularizer to encode the information in the
network into the existing Sen2Vec model. Experimental results show that our
proposed models outperform existing methods in three fundamental information
system tasks demonstrating the effectiveness of our approach. The models
leverage the computational power of multi-core CPUs to achieve fine-grained
computational efficiency. We make our code publicly available upon acceptance.

Word Embeddings and Their Use In Sentence Classification Tasks

Amit Mandelbaum, Adi Shalev
Comments: 13 pages, 10 figures
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

This paper have two parts. In the first part we discuss word embeddings. We
discuss the need for them, some of the methods to create them, and some of
their interesting properties. We also compare them to image embeddings and see
how word embedding and image embedding can be combined to perform different
tasks. In the second part we implement a convolutional neural network trained
on top of pre-trained word vectors. The network is used for several
sentence-level classification tasks, and achieves state-of-art (or comparable)
results, demonstrating the great power of pre-trainted word embeddings over
random ones.

Modeling Ambiguity, Subjectivity, and Diverging Viewpoints in Opinion Question Answering Systems

Mengting Wan, Julian McAuley
Comments: 10 pages, accepted by ICDM’2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Product review websites provide an incredible lens into the wide variety of
opinions and experiences of different people, and play a critical role in
helping users discover products that match their personal needs and
preferences. To help address questions that can’t easily be answered by reading
others’ reviews, some review websites also allow users to pose questions to the
community via a question-answering (QA) system. As one would expect, just as
opinions diverge among different reviewers, answers to such questions may also
be subjective, opinionated, and divergent. This means that answering such
questions automatically is quite different from traditional QA tasks, where it
is assumed that a single `correct’ answer is available. While recent work
introduced the idea of question-answering using product reviews, it did not
account for two aspects that we consider in this paper: (1) Questions have
multiple, often divergent, answers, and this full spectrum of answers should
somehow be used to train the system; and (2) What makes a `good’ answer depends
on the asker and the answerer, and these factors should be incorporated in
order for the system to be more personalized. Here we build a new QA dataset
with 800 thousand questions—and over 3.1 million answers—and show that
explicitly accounting for personalization and ambiguity leads both to
quantitatively better answers, but also a more nuanced view of the range of
supporting, but subjective, opinions.

Distributed, Parallel, and Cluster Computing

Havens: Explicit Reliable Memory Regions for HPC Applications

Saurabh Hukerikar, Christian Engelmann
Comments: 2016 IEEE High Performance Extreme Computing Conference (HPEC ’16), September 2016, Waltham, MA, USA
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.

In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.

Oh-RAM! One and a Half Round Atomic Memory

Theophanis Hadjistasi, Nicolas Nicolaou, Alexander Schwarzmann
Comments: A Brief Announcement related to this work was presented at ACM PODC, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Emulating atomic read/write shared objects in a message-passing system is a
fundamental problem in distributed computing. Considering that network
communication is the most expensive resource, efficiency is measured first of
all in terms of the communication needed to implement read and write
operations. It is well known that 2 communication round-trip phases involving
in total 4 message exchanges are sufficient to implemented atomic operations.
It is also known that under certain constraints on the number of readers with
respect to the numbers of replica servers and failures it is possible to
implement single-writer atomic objects such that each operation involves one
round-trip phase. We present algorithms that allow operations to complete in 3
communication exchanges without imposing any constraints on the number of
readers and writers. Specifically, we present an atomic memory implementation
for the SWMR setting, where reads complete in 3 communication exchanges and
writes complete in 2 exchanges. We pose the question of whether it is possible
to implement MWMR memory where operations complete in at most 3 communication
exchanges. We answer this question in the negative by showing that an atomic
memory implementation is impossible if both read and write operations take 3
communication exchanges, even when assuming two writers, two readers, and a
single replica server failure. Motivated by this impossibility result, we
provide a MWMR atomic memory implementation where reads involve 3 and writes 4
communication exchanges. In light of our impossibility result these algorithms
are optimal in terms of the number of communication exchanges. We rigorously
reason about the correctness of the algorithms.

An Enhanced Model for Stochastic Coordination

Nuno Oliveira (HASLab INESC TEX), Luis Soares Barbosa (HASLab INESC TEC)
Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
Journal-ref: EPTCS 228, 2016, pp. 35-45
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)

Applications developed over the cloud coordinate several, often anonymous,
computational resources, distributed over different execution nodes, within
flexible architectures. Coordination models able to represent quantitative data
provide a powerful basis for their analysis and validation. This paper extends
IMCreo, a semantic model for Stochastic reo based on interactive Markov chains,
to enhance its scalability, by regarding each channel and node, as well as
interface components, as independent stochastic processes that may (or may not)
synchronise with the rest of the coordination circuit.

Modeling Deployment Decisions for Elastic Services with ABS

Einar Broch Johnsen, Ka I Pun, S. Lizeth Tapia Tarifa
Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
Journal-ref: EPTCS 228, 2016, pp. 16-26
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

The use of cloud technology can offer significant savings for the deployment
of services, provided that the service is able to make efficient use of the
available virtual resources to meet service-level requirements. To avoid
software designs that scale poorly, it is important to make deployment
decisions for the service at design time, early in the development of the
service itself. ABS offers a formal, model-based approach which integrates the
design of services with the modeling of deployment decisions. In this paper, we
illustrate the main concepts of this approach by modeling a scalable pool of
workers with an auto-scaling strategy and by using the model to compare
deployment decisions with respect to client traffic with peak loads.

Static Analysis Using the Cloud

Rahul Kumar (Microsoft Research, Redmond, WA, USA), Chetan Bansal (Microsoft Research, Redmond, WA, USA), Jakob Lichtenberg (Microsoft, Redmond, WA, USA)
Comments: In Proceedings iFMCloud 2016, arXiv:1610.07700
Journal-ref: EPTCS 228, 2016, pp. 2-15
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

In this paper we describe our experience of using Microsoft Azure cloud
computing platform for static analysis. We start by extending Static Driver
Verifier to operate in the Microsoft Azure cloud with significant improvements
in performance and scalability. We present our results of using SDV on single
drivers and driver suites using various configurations of the cloud relative to
a local machine. Finally, we describe the Static Module Verifier platform, a
highly extensible and configurable platform for static analysis of generic
modules, where we have integrated support for verification using a cloud
services provider (Microsoft Azure in this case).

The Reverse Cuthill-McKee Algorithm in Distributed-Memory

Ariful Azad, Mathias Jacquelin, Aydin Buluc, Esmond G. Ng
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)

Ordering vertices of a graph is key to minimize fill-in and data structure
size in sparse direct solvers, maximize locality in iterative solvers, and
improve performance in graph algorithms. Except for naturally parallelizable
ordering methods such as nested dissection, many important ordering methods
have not been efficiently mapped to distributed-memory architectures. In this
paper, we present the first-ever distributed-memory implementation of the
reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse
matrix. Our parallelization uses a two-dimensional sparse matrix decomposition.
We achieve high performance by decomposing the problem into a small number of
primitives and utilizing optimized implementations of these primitives. Our
implementation shows strong scaling up to 1024 cores for smaller matrices and
up to 4096 cores for larger matrices.

Evaluating load balancing policies for performance and energy-efficiency

Freek van den Berg (University of Twente), Björn F. Postema (University of Twente), Boudewijn R. Haverkort (University of Twente)
Comments: In Proceedings QAPL’16, arXiv:1610.07696
Journal-ref: EPTCS 227, 2016, pp. 98-117
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)

Nowadays, more and more increasingly hard computations are performed in
challenging fields like weather forecasting, oil and gas exploration, and
cryptanalysis. Many of such computations can be implemented using a computer
cluster with a large number of servers. Incoming computation requests are then,
via a so-called load balancing policy, distributed over the servers to ensure
optimal performance. Additionally, being able to switch-off some servers during
low period of workload, gives potential to reduced energy consumption.
Therefore, load balancing forms, albeit indirectly, a trade-off between
performance and energy consumption. In this paper, we introduce a syntax for
load-balancing policies to dynamically select a server for each request based
on relevant criteria, including the number of jobs queued in servers, power
states of servers, and transition delays between power states of servers. To
evaluate many policies, we implement two load balancers in: (i) iDSL, a
language and tool-chain for evaluating service-oriented systems, and (ii) a
simulation framework in AnyLogic. Both implementations are successfully
validated by comparison of the results.

Parameterized Dataflow (Extended Abstract)

Dominic Duggan (Stevens Institute of Technology), Jianhua Yao (Stevens Institute of Technology)
Comments: In Proceedings QAPL’16, arXiv:1610.07696
Journal-ref: EPTCS 227, 2016, pp. 63-81
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

Dataflow networks have application in various forms of stream processing, for
example for parallel processing of multimedia data. The description of dataflow
graphs, including their firing behavior, is typically non-compositional and not
amenable to separate compilation. This article considers a dataflow language
with a type and effect system that captures the firing behavior of actors. This
system allows definitions to abstract over actor firing rates, supporting the
definition and safe composition of actor definitions where firing rates are not
instantiated until a dataflow graph is launched.

Learning

Adaptive matching pursuit for sparse signal recovery

Tiep H. Vu, Hojjat S. Mousavi, Vishal Monga
Comments: ICASSP
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Spike and Slab priors have been of much recent interest in signal processing
as a means of inducing sparsity in Bayesian inference. Applications domains
that benefit from the use of these priors include sparse recovery, regression
and classification. It is well-known that solving for the sparse coefficient
vector to maximize these priors results in a hard non-convex and mixed integer
programming problem. Most existing solutions to this optimization problem
either involve simplifying assumptions/relaxations or are computationally
expensive. We propose a new greedy and adaptive matching pursuit (AMP)
algorithm to directly solve this hard problem. Essentially, in each step of the
algorithm, the set of active elements would be updated by either adding or
removing one index, whichever results in better improvement. In addition, the
intermediate steps of the algorithm are calculated via an inexpensive Cholesky
decomposition which makes the algorithm much faster. Results on simulated data
sets as well as real-world image recovery challenges confirm the benefits of
the proposed AMP, particularly in providing a superior cost-quality trade-off
over existing alternatives.

An Improved Approach for Prediction of Parkinson's Disease using Machine Learning Techniques

Kamal Nayan Reddy Challa, Venkata Sasank Pagolu, Ganapati Panda, Babita Majhi
Comments: Conference Paper
Subjects: Learning (cs.LG)

Parkinson’s disease (PD) is one of the major public health problems in the
world. It is a well-known fact that around one million people suffer from
Parkinson’s disease in the United States whereas the number of people suffering
from Parkinson’s disease worldwide is around 5 million. Thus, it is important
to predict Parkinson’s disease in early stages so that early plan for the
necessary treatment can be made. People are mostly familiar with the motor
symptoms of Parkinson’s disease, however, an increasing amount of research is
being done to predict the Parkinson’s disease from non-motor symptoms that
precede the motor ones. If an early and reliable prediction is possible then a
patient can get a proper treatment at the right time. Nonmotor symptoms
considered are Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and
olfactory loss. Developing machine learning models that can help us in
predicting the disease can play a vital role in early prediction. In this
paper, we extend a work which used the non-motor features such as RBD and
olfactory loss. Along with this the extended work also uses important
biomarkers. In this paper, we try to model this classifier using different
machine learning models that have not been used before. We developed automated
diagnostic models using Multilayer Perceptron, BayesNet, Random Forest and
Boosted Logistic Regression. It has been observed that Boosted Logistic
Regression provides the best performance with an impressive accuracy of 97.159
% and the area under the ROC curve was 98.9%. Thus, it is concluded that these
models can be used for early prediction of Parkinson’s disease.

Things Bayes can't do

Daniil Ryabko
Journal-ref: In Proceedings of ALT, LNCS 9925, pp.253-260, Bari, Italy, 2016
Subjects: Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

The problem of forecasting conditional probabilities of the next event given
the past is considered in a general probabilistic setting. Given an arbitrary
(large, uncountable) set C of predictors, we would like to construct a single
predictor that performs asymptotically as well as the best predictor in C, on
any data. Here we show that there are sets C for which such predictors exist,
but none of them is a Bayesian predictor with a prior concentrated on C. In
other words, there is a predictor with sublinear regret, but every Bayesian
predictor must have a linear regret. This negative finding is in sharp contrast
with previous results that establish the opposite for the case when one of the
predictors in (C) achieves asymptotically vanishing error. In such a case, if
there is a predictor that achieves asymptotically vanishing error for any
measure in C, then there is a Bayesian predictor that also has this property,
and whose prior is concentrated on (a countable subset of) C.

Word Embeddings and Their Use In Sentence Classification Tasks

Amit Mandelbaum, Adi Shalev
Comments: 13 pages, 10 figures
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

This paper have two parts. In the first part we discuss word embeddings. We
discuss the need for them, some of the methods to create them, and some of
their interesting properties. We also compare them to image embeddings and see
how word embedding and image embedding can be combined to perform different
tasks. In the second part we implement a convolutional neural network trained
on top of pre-trained word vectors. The network is used for several
sentence-level classification tasks, and achieves state-of-art (or comparable)
results, demonstrating the great power of pre-trainted word embeddings over
random ones.

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

Thomas Brouwer, Jes Frellsen, Pietro Lio'
Comments: NIPS 2016 Workshop on Advances in Approximate Bayesian Inference
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

We present a fast variational Bayesian algorithm for performing non-negative
matrix factorisation and tri-factorisation. We show that our approach achieves
faster convergence per iteration and timestep (wall-clock) than Gibbs sampling
and non-probabilistic approaches, and do not require additional samples to
estimate the posterior. We show that in particular for matrix tri-factorisation
convergence is difficult, but our variational Bayesian approach offers a fast
solution, allowing the tri-factorisation approach to be used more effectively.

Socratic Learning

Rose Yu, Paroma Varma, Dan Iter, Christopher De Sa, Christopher Ré
Comments: 3 figures
Subjects: Learning (cs.LG)

Modern machine learning techniques, such as deep learning, often use
discriminative models that require large amounts of labeled data. An
alternative approach is to use a generative model, which leverages heuristics
from domain experts to train on unlabeled data. Domain experts often prefer to
use generative models because they “tell a story” about their data.
Unfortunately, generative models are typically less accurate than
discriminative models. Several recent approaches combine both types of model to
exploit their strengths. In this setting, a misspecified generative model can
hurt the performance of subsequent discriminative training. To address this
issue, we propose a framework called Socratic learning that automatically uses
information from the discriminative model to correct generative model
misspecification. Furthermore, this process provides users with interpretable
feedback about how to improve their generative model. We evaluate Socratic
learning on real-world relation extraction tasks and observe an immediate
improvement in classification accuracy that could otherwise require several
weeks of effort by domain experts.

Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

As the use of automated decision making systems becomes wide-spread, there is
a growing concern about their potential unfairness towards people with certain
traits. Anti-discrimination laws in various countries prohibit unfair treatment
of individuals based on specific traits, also called sensitive attributes
(e.g., gender, race). In many learning scenarios, the trained algorithms
(classifiers) make decisions with certain inaccuracy (misclassification rate).
As learning mechanisms target minimizing the error rate for all decisions, it
is quite possible that the optimally trained algorithm makes decisions for
users belonging to different sensitive attribute groups with different error
rates (e.g., decision errors for females are higher than for males). To account
for and avoid such unfairness when learning, in this paper, we introduce a new
notion of unfairness, disparate mistreatment, which is defined in terms of
misclassification rates. We then propose an intuitive measure of disparate
mistreatment for decision boundary-based classifiers, which can be easily
incorporated into their formulation as a convex-concave constraint. Experiments
on synthetic as well as real world datasets show that our methodology is
effective at avoiding disparate mistreatment, often at a small cost in terms of
accuracy.

Counterfactual Reasoning about Intent for Interactive Navigation in Dynamic Environments

A. Bordallo, F. Previtali, N. Nardelli, S. Ramamoorthy
Journal-ref: 2015 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2015), pp. 2943-2950
Subjects: Robotics (cs.RO); Learning (cs.LG)

Many modern robotics applications require robots to function autonomously in
dynamic environments including other decision making agents, such as people or
other robots. This calls for fast and scalable interactive motion planning.
This requires models that take into consideration the other agent’s intended
actions in one’s own planning. We present a real-time motion planning framework
that brings together a few key components including intention inference by
reasoning counterfactually about potential motion of the other agents as they
work towards different goals. By using a light-weight motion model, we achieve
efficient iterative planning for fluid motion when avoiding pedestrians, in
parallel with goal inference for longer range movement prediction. This
inference framework is coupled with a novel distributed visual tracking method
that provides reliable and robust models for the current belief-state of the
monitored environment. This combined approach represents a computationally
efficient alternative to previously studied policy learning methods that often
require significant offline training or calibration and do not yet scale to
densely populated environments. We validate this framework with experiments
involving multi-robot and human-robot navigation. We further validate the
tracker component separately on much larger scale unconstrained pedestrian data
sets.

Universal adversarial perturbations

Given a state-of-the-art deep neural network classifier, we show the
existence of a universal (image-agnostic) and very small perturbation vector
that causes natural images to be misclassified with high probability. We
propose a systematic algorithm for computing universal perturbations, and show
that state-of-the-art deep neural networks are highly vulnerable to such
perturbations, albeit being quasi-imperceptible to the human eye. We further
empirically analyze these universal perturbations and show, in particular, that
they generalize very well across neural networks. The surprising existence of
universal perturbations reveals important geometric correlations among the
high-dimensional decision boundary of classifiers. It further outlines
potential security breaches with the existence of single directions in the
input space that adversaries can possibly exploit to break a classifier on most
natural images.

Quantum-enhanced machine learning

The emerging field of quantum machine learning has the potential to
substantially aid in the problems and scope of artificial intelligence. This is
only enhanced by recent successes in the field of classical machine learning.
In this work we propose an approach for the systematic treatment of machine
learning, from the perspective of quantum information. Our approach is general
and covers all three main branches of machine learning: supervised,
unsupervised and reinforcement learning. While quantum improvements in
supervised and unsupervised learning have been reported, reinforcement learning
has received much less attention. Within our approach, we tackle the problem of
quantum enhancements in reinforcement learning as well, and propose a
systematic scheme for providing improvements. As an example, we show that
quadratic improvements in learning efficiency, and exponential improvements in
performance over limited time periods, can be obtained for a broad class of
learning problems.

Universality of Baysian mixture predictors

Daniil Ryabko
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Learning (cs.LG)

The problem is that of sequential probability forecasting for finite-valued
time series. The data is generated by an unknown probability distribution over
the space of all one-way infinite sequences. It is known that this measure
belongs to a given set C, but the latter is completely arbitrary (uncountably
infinite, without any structure given). The performance is measured with
asymptotic average log loss. In this work it is shown that the minimax
asymptotic performance is always attainable, and it is attained by a convex
combination of a countably many measures from the set C (a Bayesian mixture).
This was previously only known for the case when the best achievable asymptotic
error is 0. This also contrasts previous results that show that in the
non-realizable case all Bayesian mixtures may be suboptimal, while there is a
predictor that achieves the optimal performance.

Automatic measurement of vowel duration via structured prediction

Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

A key barrier to making phonetic studies scalable and replicable is the need
to rely on subjective, manual annotation. To help meet this challenge, a
machine learning algorithm was developed for automatic measurement of a widely
used phonetic measure: vowel duration. Manually-annotated data were used to
train a model that takes as input an arbitrary length segment of the acoustic
signal containing a single vowel that is preceded and followed by consonants
and outputs the duration of the vowel. The model is based on the structured
prediction framework. The input signal and a hypothesized set of a vowel’s
onset and offset are mapped to an abstract vector space by a set of acoustic
feature functions. The learning algorithm is trained in this space to minimize
the difference in expectations between predicted and manually-measured vowel
durations. The trained model can then automatically estimate vowel durations
without phonetic or orthographic transcription. Results comparing the model to
three sets of manually annotated data suggest it out-performed the current gold
standard for duration measurement, an HMM-based forced aligner (which requires
orthographic or phonetic transcription as an input).

Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards

Ground vehicles equipped with monocular vision systems are a valuable source
of high resolution image data for precision agriculture applications in
orchards. This paper presents an image processing framework for fruit detection
and counting using orchard image data. A general purpose image segmentation
approach is used, including two feature learning algorithms; multi-scale
Multi-Layered Perceptrons (MLP) and Convolutional Neural Networks (CNN). These
networks were extended by including contextual information about how the image
data was captured (metadata), which correlates with some of the appearance
variations and/or class distributions observed in the data. The pixel-wise
fruit segmentation output is processed using the Watershed Segmentation (WS)
and Circular Hough Transform (CHT) algorithms to detect and count individual
fruits. Experiments were conducted in a commercial apple orchard near
Melbourne, Australia. The results show an improvement in fruit segmentation
performance with the inclusion of metadata on the previously benchmarked MLP
network. We extend this work with CNNs, bringing agrovision closer to the
state-of-the-art in computer vision, where although metadata had negligible
influence, the best pixel-wise F1-score of (0.791) was achieved. The WS
algorithm produced the best apple detection and counting results, with a
detection F1-score of (0.858). As a final step, image fruit counts were
accumulated over multiple rows at the orchard and compared against the
post-harvest fruit counts that were obtained from a grading and counting
machine. The count estimates using CNN and WS resulted in the best performance
for this dataset, with a squared correlation coefficient of (r^2=0.826).

A statistical framework for fair predictive algorithms

Kristian Lum, James Johndrow
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Predictive modeling is increasingly being employed to assist human
decision-makers. One purported advantage of replacing human judgment with
computer models in high stakes settings– such as sentencing, hiring, policing,
college admissions, and parole decisions– is the perceived “neutrality” of
computers. It is argued that because computer models do not hold personal
prejudice, the predictions they produce will be equally free from prejudice.
There is growing recognition that employing algorithms does not remove the
potential for bias, and can even amplify it, since training data were
inevitably generated by a process that is itself biased. In this paper, we
provide a probabilistic definition of algorithmic bias. We propose a method to
remove bias from predictive models by removing all information regarding
protected variables from the permitted training data. Unlike previous work in
this area, our framework is general enough to accommodate arbitrary data types,
e.g. binary, continuous, etc. Motivated by models currently in use in the
criminal justice system that inform decisions on pre-trial release and
paroling, we apply our proposed method to a dataset on the criminal histories
of individuals at the time of sentencing to produce “race-neutral” predictions
of re-arrest. In the process, we demonstrate that the most common approach to
creating “race-neutral” models– omitting race as a covariate– still results
in racially disparate predictions. We then demonstrate that the application of
our proposed method to these data removes racial disparities from predictions
with minimal impact on predictive accuracy.

Information Theory

Low-Delay Distributed Source Coding for Time-Varying Sources with Unknown Statistics

Fangzhou Chen, Bin Li, Can Emre Koksal
Subjects: Information Theory (cs.IT)

We consider a system in which two nodes take correlated measurements of a
random source with time-varying and unknown statistics. The observations of the
source at the first node are to be losslessly replicated with a given
probability of outage at the second node, which receives data from the first
node over a constant-rate errorless channel. We develop a system and associated
strategies for joint distributed source coding (encoding and decoding) and
transmission control in order to achieve low end-to-end delay. Slepian-Wolf
coding in its traditional form cannot be applied in our scenario, since the
encoder requires the joint statistics of the observations and the associated
decoding delay is very high. We analytically evaluate the performance of our
strategies and show that the delay achieved by them are order optimal, as the
conditional entropy of the source approaches to the channel rate. We also
evaluate the performance of our algorithms based on real-world experiments
using two cameras recording videos of a scene at different angles. Having
realized our schemes, we demonstrated that, even with a very low-complexity
quantizer, a compression ratio of approximately 50% is achievable for lossless
replication at the decoder, at an average delay of a few seconds.

Sum-networks: Dependency on Characteristic of the Finite Field under Linear Network Coding

Niladri Das, Brijesh Kumar Rai
Subjects: Information Theory (cs.IT)

Sum-networks are networks where all the terminals demand the sum of the
symbols generated at the sources. It has been shown that for any finite
set/co-finite set of prime numbers, there exists a sum-network which has a
vector linear solution if and only if the characteristic of the finite field
belongs to the given set. It has also been shown that for any positive rational
number (k/n), there exists a sum-network which has capacity equal to (k/n). It
is a natural question whether, for any positive rational number (k/n), and for
any finite set/co-finite set of primes ({p_1,p_2,ldots,p_l}), there exists a
sum-network which has a capacity achieving rate (k/n) fractional linear network
coding solution if and only if the characteristic of the finite field belongs
to the given set. We show that indeed there exists such a sum-network by
constructing such a sum-network.

Joint MOO of Transmit Precoding and Receiver Design in a Downlink Time Switching MISO SWIPT System

Nafiseh Janatian, Ivan Stupia, Luc Vandendorpe
Subjects: Information Theory (cs.IT)

In this paper, we consider a time-switching (TS) co-located simultaneous
wireless information and power transfer (SWIPT) system consisting of multiple
multi-antenna access points which serve multiple single antenna users. In this
scenario, we propose a multi-objective optimization (MOO) framework to design
jointly the Pareto optimal beamforming vector and the TS ratio for each
receiver. The objective is to maximize the utility vector including the
achieved data rates and the harvested energies of all users simultaneously.
This problem is a non-convex rank-constrained MOO problem which is relaxed and
transformed into a non-convex semidefinite program (SDP) based on the weighted
Chebycheff method. The majorization-minimization algorithm is utilized to solve
the nonconvex SDP and the optimal solution is proved to satisfy the rank
constraint. We also study the problem of optimizing the beamforming vectors in
a fixed TS ratio scenario with the same approach. Numerical results are
provided for two coordinated access points with MISO configuration. The results
illustrate the trade-off between harvested energy and information data rate
objectives and show the effect of optimizing the precoding strategy and TS
ratio on this trade-off.

Cube-Split: Structured Quantizers on the Grassmannian of Lines

Alexis Decurninge, Maxime Guillaud
Comments: submitted to IEEE Wireless Communications and Networking Conference 2017
Subjects: Information Theory (cs.IT)

This paper introduces a new quantization scheme for real and complex
Grassmannian sources. The proposed approach relies on a structured codebook
based on a geometric construction of a collection of bent grids defined from an
initial mesh on the unit-norm sphere. The associated encoding and decoding
algorithms have very low complexity (equivalent to a scalar quantizer), while
their efficiency (in terms of the achieved distortion) is on par with the best
known structured approaches, and compares well with the theoretical bounds.
These properties make this codebook suitable for high-resolutions, real-time
applications such as channel state feedback in massive multiple-input
multiple-output (MIMO) wireless communication systems.

A New Piggybacking Design for Systematic MDS Storage Codes

Chong Shangguan, Gennian Ge
Comments: 7 pages, submitted
Subjects: Information Theory (cs.IT)

Distributed storage codes have important applications in the design of modern
storage systems. In a distributed storage system, every storage node has a
probability to fail and once an individual storage node fails, it must be
reconstructed using data stored in the surviving nodes. Computation load and
network bandwidth are two important issues we need to concern when repairing a
failed node. The traditional maximal distance separable (MDS) storage codes
have low repair complexity but high repair bandwidth. On the contrary, minimal
storage regenerating (MSR) codes have low repair bandwidth but high repair
complexity. Fortunately, the newly introduced piggyback codes combine the
advantages of both ones.

In this paper, by introducing a novel piggybacking design framework for
systematic MDS codes, we construct a storage code whose average repair
bandwidth rate, i.e., the ratio of average repair bandwidth and the amount of
the original data, can be as low as (frac{sqrt{2r-1}}{r}), which
significantly improves the ratio (frac{r-1}{2r-1}) of the previous result. In
the meanwhile, every failed systematic node of the new code can be
reconstructed quickly using the decoding algorithm of an MDS code, only with
some additional additions over the underlying finite field. This is very fast
compared with the complex matrix multiplications needed in the repair of a
failed node of an MSR code.

The reversible negacyclic codes over finite fields

Shixin Zhu, Binbin Pang, Zhonghua Sun
Subjects: Information Theory (cs.IT)

In this paper, by investigating the factor of the (x^n+1), we deduce that the
structure of the reversible negacyclic code over the finite field
(mathbb{F}_{q}), where (q) is an odd prime power. Though studying
(q-)cyclotomic cosets modulo (2n), we obtain the parameters of negacyclic BCH
code of length (n=frac{q^ell+1}{2}) , (n=frac{q^m-1}{2(q-1)}) and
(n=frac{q^{tcdot2^ au}-1}{2(q^t+1)}). Some optimal linear codes from
negacyclic codes are given. Finally, we discuss a class of MDS LCD negacyclic
codes.

Multi-Antenna Transmission in Downlink Heterogeneous Cellular Networks under A Threshold-based Mobile Association Policy

Tong-Xing Zheng, Hui-Ming Wang, Moon Ho Lee
Comments: Journal paper, double column, 13 pages, 9 figures, accepted to appear on IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)

With the recent emergence of 5G era, heterogeneous cellular networks (HCNs)
have invoked a popular research interest. In this paper, we provide a
comprehensive analysis for multiantenna transmissions in a multi-tier downlink
HCN. We first propose a reliability-oriented threshold-based mobile association
policy, where each user connects to the strongest base station from which this
user can obtain the largest truncated long-term received power. Under our
mobile association policy, we derive analytical expressions for the exact
outage probability of an arbitrary randomly located user, along with
computationally convenient lower and upper bounds. Asymptotic analysis on the
outage probability shows that introducing a large access threshold into mobile
association significantly decreases the outage probability. We further
investigate the spectrum efficiency and the energy efficiency of the HCN. Our
theoretic analysis and numerical validations show that both the spectrum and
energy efficiencies can be improved by properly choosing the access threshold.

Transfer entropy in continuous time, with applications to jump and neural spiking processes

Richard E. Spinney, Mikhail Prokopenko, Joseph T. Lizier
Comments: 21 pages, 2 figures
Subjects: Information Theory (cs.IT); Adaptation and Self-Organizing Systems (nlin.AO); Data Analysis, Statistics and Probability (physics.data-an); Neurons and Cognition (q-bio.NC)

Transfer entropy has been used to quantify the directed flow of information
between source and target variables in many complex systems. Originally
formulated in discrete time, we provide a framework for considering transfer
entropy in continuous time systems. By appealing to a measure theoretic
formulation we generalise transfer entropy, describing it in terms of
Radon-Nikodym derivatives between measures of complete path realisations. The
resulting formalism introduces and emphasises the idea that transfer entropy is
an expectation of an individually fluctuating quantity along a path, in the
same way we consider the expectation of physical quantities such as work and
heat. We recognise that transfer entropy is a quantity accumulated over a
finite time interval, whilst permitting an associated instantaneous transfer
entropy rate. We use this approach to produce an explicit form for the transfer
entropy for pure jump processes, and highlight the simplified form in the
specific case of point processes (frequently used in neuroscience to model
neural spike trains). We contrast our approach with previous attempts to
formulate information flow between continuous time point processes within a
discrete time framework, which incur issues that our continuous time approach
naturally avoids. Finally, we present two synthetic spiking neuron model
examples to exhibit the pertinent features of our formalism, namely that the
information flow for point processes consists of discontinuous jump
contributions (at spikes in the target) interrupting a continuously varying
contribution (relating to waiting times between target spikes).

Controlled Barrage Regions: Stochastic Modeling, Analysis, and Optimization

Salvatore Talarico, Matthew C. Valenti, Thomas R. Halford
Comments: 7 pages, 3 images, to appear in Military Communication Conference (MILCOM). arXiv admin note: text overlap with arXiv:1408.5928
Subjects: Information Theory (cs.IT)

A barrage relay network (BRN) is a broadcast oriented ad hoc network
involving autonomous cooperative communication, a slotted time-division frame
format, and a coarse slot-level synchronization. While inherently a broadcast
protocol, BRNs can support unicast transmission by superimposing a plurality of
controlled barrage regions (CBRs) onto the network. Within each CBRs, a new
packet is injected by the unicast source during the first time slot of each new
radio frame. When a CBRs is sufficiently long that a packet might not be able
to reach the other end within a radio frame, multiple packets can be active at
the same time via spatial pipelining, resulting in interference within the
CBRs. In this paper, the dynamics of packet transmission within a CBRs is
described as a Markov process, and the outage probability of each link within
the CBRs is evaluated in closed form, thereby accounting for fading and
co-channel interference. In order to account for the linkage between
simultaneous active packets and their temporal correlation, a Viterbi-like
algorithm is used. Using this accurate analytical framework, a line network is
optimized, which identifies the code rate, the number of relays, and the length
of a radio frame that maximizes the transport capacity.

Asynchronous Transmission over Gaussian Interference Channels with Stochastic Data Arrival

Kamyar Moshksar
Subjects: Information Theory (cs.IT)

This paper addresses a Gaussian interference channel with two
transmitter-receiver~(Tx-Rx) pairs under stochastic data arrival~(GIC-SDA).
Information bits arrive at the transmitters according to independent and
asynchronous Bernoulli processes~(Tx-Tx~asynchrony). Each information source
turns off after generating a given total number of bits. The transmissions are
extit{asynchronous} (Tx-Rx~asynchrony) in the sense that each Tx sends a
codeword to its Rx immediately after there are enough bits available in its
buffer. Such asynchronous style of transmission is shown to significantly
reduce the transmission delay in comparison with the existing Tx-Rx synchronous
transmission schemes. The receivers learn the activity frames of both
transmitters by employing sequential joint-typicality detection. As a
consequence, the GIC-SDA under Tx-Rx asynchrony is represented by a standard
GIC with state known at the receivers. The cardinality of the state space is
(inom{2N_1+2N_2}{2N_2}) in which (N_1, N_2) are the numbers of transmitted
codewords by the two transmitters. Each realization of the state imposes two
sets of constraints on (N_1, N_2) referred to as the geometric and reliability
constraints. In a scenario where the transmitters are only aware of the
statistics of Tx-Tx~asynchrony, it is shown how one designs (N_1,N_2) to
achieve target transmission rates for both users and minimize the probability
of unsuccessful decoding.

Optimal Power Allocation and Active Interference Mitigation for Spatial Multiplexed MIMO Cognitive Systems

Nikolaos I. Miridakis, Minghua Xia, Theodoros A. Tsiftsis
Subjects: Information Theory (cs.IT)

In this paper, the performance of an underlay multiple-input multiple-output
(MIMO) cognitive radio system is analytically studied. In particular, the
secondary transmitter operates in a spatial multiplexing transmission mode,
while a zero-forcing (ZF) detector is employed at the secondary receiver.
Additionally, the secondary system is interfered by multiple randomly
distributed single-antenna primary users (PUs). To enhance the performance of
secondary transmission, optimal power allocation is performed at the secondary
transmitter with a constraint on the interference temperature (IT) specified by
the PUs. The outage probability of the secondary receiver is explicitly derived
in an exact closed-form expression. Also, some special cases of practical
interest, including co-located PUs and massive MIMO, are discussed. Further, to
mitigate instantaneous excessive interference onto PUs caused by the
time-average IT, an iterative antenna reduction algorithm is developed for the
secondary transmitter and, accordingly, the average number of transmit antennas
is analytically computed. Extensive numerical and simulation results
corroborate the effectiveness of our analysis.

Distributed Spatial Multiplexing Systems with Hardware Impairments and Imperfect Channel Estimation under Rank-(1) Rician Fading Channels

Nikolaos I. Miridakis, Theodoros A. Tsiftsis, Corbett Rowell
Subjects: Information Theory (cs.IT)

The performance of a multiuser communication system with single-antenna
transmitting terminals and a multi-antenna base-station receiver is
analytically investigated. The system operates under independent and
non-identically distributed rank-(1) Rician fading channels with imperfect
channel estimation and residual hardware impairments (compensation algorithms
are assumed, which mitigate the main impairments) at the transceiver. The
spatial multiplexing mode of operation is considered where all the users are
simultaneously transmitting their streams to the receiver. Zero-forcing is
applied along with successive interference cancellation (SIC) as a means for
efficient detection of the received streams. New analytical closed-form
expressions are derived for some important performance metrics, namely, the
outage probability and ergodic capacity of the entire system. Both the
analytical expressions and simulation results show the impact of imperfect
channel estimation and hardware impairments to the overall system performance
in the usage scenarios of massive MIMO and mmWave communication systems.

Low rank matrix recovery from Clifford orbits

Richard Kueng, Huangjun Zhu, David Gross
Comments: 19 pages, no figure
Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

We prove that low-rank matrices can be recovered efficiently from a small
number of measurements that are sampled from orbits of a certain matrix group.
As a special case, our theory makes statements about the phase retrieval
problem. Here, the task is to recover a vector given only the amplitudes of its
inner product with a small number of vectors from an orbit. Variants of the
group in question have appeared under different names in many areas of
mathematics. In coding theory and quantum information, it is the complex
Clifford group; in time-frequency analysis the oscillator group; and in
mathematical physics the metaplectic group. It affords one particularly small
and highly structured orbit that includes and generalizes the discrete Fourier
basis: While the Fourier vectors have coefficients of constant modulus and
phases that depend linearly on their index, the vectors in said orbit have
phases with a quadratic dependence. In quantum information, the orbit is used
extensively and is known as the set of stabilizer states. We argue that due to
their rich geometric structure and their near-optimal recovery properties,
stabilizer states form an ideal model for structured measurements for phase
retrieval. Our results hold for (mgeq C kappa_r r d log(d)) measurements,
where the oversampling factor k varies between (kappa_r=1) and (kappa_r =
r^2) depending on the orbit. The reconstruction is stable towards both additive
noise and deviations from the assumption of low rank. If the matrices of
interest are in addition positive semidefinite, reconstruction may be performed
by a simple constrained least squares regression. Our proof methods could be
adapted to cover orbits of other groups.

Universality of Baysian mixture predictors

Daniil Ryabko
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Learning (cs.LG)

The problem is that of sequential probability forecasting for finite-valued
time series. The data is generated by an unknown probability distribution over
the space of all one-way infinite sequences. It is known that this measure
belongs to a given set C, but the latter is completely arbitrary (uncountably
infinite, without any structure given). The performance is measured with
asymptotic average log loss. In this work it is shown that the minimax
asymptotic performance is always attainable, and it is attained by a convex
combination of a countably many measures from the set C (a Bayesian mixture).
This was previously only known for the case when the best achievable asymptotic
error is 0. This also contrasts previous results that show that in the
non-realizable case all Bayesian mixtures may be suboptimal, while there is a
predictor that achieves the optimal performance.

Infinite-dimensional Log-Determinant divergences II: Alpha-Beta divergences

Minh Ha Quang
Comments: 60 pages
Subjects: Functional Analysis (math.FA); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

This work presents a parametrized family of divergences, namely Alpha-Beta
Log- Determinant (Log-Det) divergences, between positive definite unitized
trace class operators on a Hilbert space. This is a generalization of the
Alpha-Beta Log-Determinant divergences between symmetric, positive definite
matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det
divergences is highly general and contains many divergences as special cases,
including the recently formulated infinite dimensional affine-invariant
Riemannian distance and the infinite-dimensional Alpha Log-Det divergences
between positive definite unitized trace class operators. In particular, it
includes a parametrized family of metrics between positive definite trace class
operators, with the affine-invariant Riemannian distance and the square root of
the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det
divergences between covariance operators on a Reproducing Kernel Hilbert Space
(RKHS), we obtain closed form formulas via the corresponding Gram matrices.