
    arXiv Paper Daily: Mon, 31 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-31 00:00:00

    Neural and Evolutionary Computing

    Learning to Reason With Adaptive Computation

    Mark Neumann, Pontus Stenetorp, Sebastian Riedel
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Multi-hop inference is necessary for machine learning systems to successfully
    solve tasks such as Recognising Textual Entailment and Machine Reading. In this
    work, we demonstrate the effectiveness of adaptive computation for learning
    the number of inference steps required for examples of different complexity,
    and show that learning the correct number of inference steps is difficult. We
    introduce the first model involving Adaptive Computation Time, which provides
    a small performance benefit over a similar model without the adaptive
    component, while also enabling considerable insight into the reasoning process
    of the model.
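
    For readers unfamiliar with the halting mechanism of Adaptive Computation
    Time, the following is a minimal sketch, not the authors' model: a toy
    step_fn and halt_fn (both hypothetical placeholders) stand in for the learned
    inference step and halting unit, and computation stops once the cumulative
    halting probability reaches 1 - eps.

        import numpy as np

        def act_pondering(state, step_fn, halt_fn, max_steps=10, eps=0.01):
            """Run up to max_steps inference steps; halt once the cumulative
            halting probability reaches 1 - eps (ACT-style)."""
            states, probs = [], []
            cum_p = 0.0
            for _ in range(max_steps):
                state = step_fn(state)          # one inference step
                p = halt_fn(state)              # halting probability of this step
                if cum_p + p >= 1.0 - eps or len(states) == max_steps - 1:
                    probs.append(1.0 - cum_p)   # final step uses the remainder
                    states.append(state)
                    break
                probs.append(p)
                states.append(state)
                cum_p += p
            # Output is the halting-probability-weighted mean of the visited states.
            weighted = np.average(np.stack(states), axis=0, weights=np.array(probs))
            return weighted, len(states)

        # Toy run: the "state" is a vector; the halting probability grows with its norm.
        rng = np.random.default_rng(0)
        out, n_steps = act_pondering(
            rng.normal(size=4),
            step_fn=lambda s: 0.9 * s + 0.1,
            halt_fn=lambda s: 1.0 / (1.0 + np.exp(2.0 - np.linalg.norm(s))),
        )
        print(n_steps, out)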


    Computer Vision and Pattern Recognition

    Real-time Online Action Detection Forests using Spatio-temporal Contexts

    Seungryul Baek, Kwang In Kim, Tae-Kyun Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Online action detection (OAD) is challenging since 1) robust yet
    computationally expensive features cannot be straightforwardly used due to the
    real-time processing requirements and 2) the localization and classification of
    actions have to be performed even before they are fully observed. We propose a
    new random forest (RF)-based online action detection framework that addresses
    these challenges. Our algorithm uses computationally efficient skeletal joint
    features. High accuracy is achieved by using robust convolutional neural
    network (CNN)-based features which are extracted from the raw RGBD images, plus
    the temporal relationships between the current frame of interest, and the past
    and future frames. While these high-quality features are not available in a
    real-time testing scenario, we demonstrate that they can be effectively
    exploited in training RF classifiers: we use these spatio-temporal contexts to
    craft the RF’s new split functions, improving the RFs’ leaf-node statistics.
    Experiments
    with challenging MSRAction3D, G3D, and OAD datasets demonstrate that our
    algorithm significantly improves the accuracy over the state-of-the-art online
    action detection algorithms while achieving the real-time efficiency of
    existing skeleton-based RF classifiers.

    The TUM LapChole dataset for the M2CAI 2016 workflow challenge

    Ralf Stauder, Daniel Ostler, Michael Kranzfelder, Sebastian Koller, Hubertus Feußner, Nassir Navab
    Comments: 5 pages, 2 figures, preliminary reference for published dataset (until larger comparison study of workshop organizers is published)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this technical report we present our collected dataset of laparoscopic
    cholecystectomies (LapChole). Laparoscopic videos of a total of 20 surgeries
    were recorded and annotated with surgical phase labels, of which 15 were
    randomly pre-determined as training data, while the remaining 5 videos were
    selected as test data. This dataset was later included as part of the M2CAI
    2016 workflow detection challenge during MICCAI 2016 in Athens.

    Learnable Visual Markers

    Oleg Grinchuk, Vadim Lebedev, Victor Lempitsky
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new approach to designing visual markers (analogous to QR-codes,
    markers for augmented reality, and robotic fiducial tags) based on the advances
    in deep generative networks. In our approach, the markers are obtained as color
    images synthesized by a deep network from input bit strings, whereas another
    deep network is trained to recover the bit strings back from the photos of
    these markers. The two networks are trained simultaneously in a joint
    backpropagation process that takes characteristic photometric and geometric
    distortions associated with marker fabrication and marker scanning into
    account. Additionally, a stylization loss based on statistics of activations in
    a pretrained classification network can be inserted into the learning in order
    to shift the marker appearance towards some texture prototype. In the
    experiments, we demonstrate that the markers obtained using our approach are
    capable of retaining bit strings that are long enough to be practical. The
    ability to automatically adapt markers according to the usage scenario and the
    desired capacity as well as the ability to combine information encoding with
    artistic stylization are the unique properties of our approach. As a byproduct,
    our approach provides insight into the structure of patterns that are most
    suitable for recognition by ConvNets and into their ability to distinguish
    composite patterns.

    Judging a Book By its Cover

    Brian Kenji Iwana, Seiichi Uchida
    Comments: 5 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Book covers communicate information to potential readers, but can the same
    information be learned by computers? We propose a method of using a
    Convolutional Neural Network (CNN) to predict the genre of a book based on the
    visual clues provided by its cover. The purpose is to investigate whether
    relationships between books and their covers can be learned. However,
    determining the genre of a book is a difficult task because covers can be
    ambiguous and genres can be overarching. Despite this, we show that a CNN can
    extract features and learn underlying design rules set by the designer to
    define a genre. Using machine learning, the large amount of available
    resources can be brought to bear on the book cover design process.
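
    As an illustration only (the paper's architecture, input size, and number of
    genre classes are not reproduced here), a small Keras CNN of the kind one
    might train for this task could look as follows; n_genres and all
    hyperparameters are assumptions.

        from tensorflow.keras import layers, models

        n_genres = 30  # hypothetical number of genre classes

        model = models.Sequential([
            layers.Input(shape=(224, 224, 3)),          # RGB cover image
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(128, 3, activation="relu"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(n_genres, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        # model.fit(cover_images, genre_labels, ...) once labeled covers are available.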

    Towards automatic pulmonary nodule management in lung cancer screening with deep learning

    Francesco Ciompi, Kaman Chung, Sarah J. van Riel, Arnaud Arindra Adiyoso Setio, Paul K. Gerke, Colin Jacobs, Ernst Th. Scholten, Cornelia Schaefer-Prokop, Mathilde M. W. Wille, Alfonso Marchiano, Ugo Pastorino, Mathias Prokop, Bram van Ginneken
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The introduction of lung cancer screening programs will produce an
    unprecedented amount of chest CT scans in the near future, which radiologists
    will have to read in order to decide on a patient follow-up strategy. According
    to the current guidelines, the workup of screen-detected nodules strongly
    relies on nodule size and nodule type. In this paper, we present a deep
    learning system based on multi-stream multi-scale convolutional networks, which
    automatically classifies all nodule types relevant for nodule workup. The
    system processes raw CT data containing a nodule without the need for any
    additional information such as nodule segmentation or nodule size and learns a
    representation of 3D data by analyzing an arbitrary number of 2D views of a
    given nodule. The deep learning system was trained with data from the Italian
    MILD screening trial and validated on an independent set of data from the
    Danish DLCST screening trial. We analyze the advantage of processing nodules at
    multiple scales with a multi-stream convolutional network architecture, and we
    show that the proposed deep learning system achieves performance at classifying
    nodule type within the inter-observer variability among four experienced human
    observers.

    Recent advances in content based video copy detection

    Sanket Shinde, Girija Chiddarwar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the immense number of videos being uploaded to video sharing sites, the
    issue of copyright infringement arises from the uploading of illicit copies or
    transformed versions of original videos. Safeguarding the copyright of digital
    media has therefore become a matter of concern. To address this concern, a
    video copy detection system is required that is sufficiently robust to detect
    these transformed videos and able to pinpoint the location of copied segments.
    This paper outlines recent advances in content-based video copy detection,
    focusing mainly on the different visual features employed by video copy
    detection systems. Finally, we evaluate the performance of existing video copy
    detection systems.

    Icon: An Interactive Approach to Train Deep Neural Networks for Segmentation of Neuronal Structures

    Felix Gonda, Verena Kaynig, Ray Thouis, Daniel Haehn, Jeff Lichtman, Toufiq Parag, Hanspeter Pfister
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an interactive approach to train a deep neural network pixel
    classifier for the segmentation of neuronal structures. An interactive training
    scheme reduces the extremely tedious manual annotation task that is typically
    required for deep networks to perform well on image segmentation problems. Our
    proposed method employs a feedback loop that captures sparse annotations using
    a graphical user interface, trains a deep neural network based on recent and
    past annotations, and displays the prediction output to users in almost
    real-time. Our implementation of the algorithm also allows multiple users to
    provide annotations in parallel and receive feedback from the same classifier.
    Quick feedback on classifier performance in an interactive setting enables
    users to identify and label examples that are more important than others for
    segmentation purposes. Our experiments show that an interactively-trained pixel
    classifier produces better region segmentation results on Electron Microscopy
    (EM) images than those generated by a network of the same architecture trained
    offline on exhaustive ground-truth labels.

    Compressive Holographic Video

    Zihao Wang, Leonidas Spinoulas, Kuan He, Huaijin Chen, Lei Tian, Aggelos K. Katsaggelos, Oliver Cossairt
    Comments: 12 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compressed sensing has been discussed separately in spatial and temporal
    domains. Compressive holography has been introduced as a method that allows 3D
    tomographic reconstruction at different depths from a single 2D image. Coded
    exposure is a temporal compressed sensing method for high speed video
    acquisition. In this work, we combine compressive holography and coded exposure
    techniques and extend the discussion to 4D reconstruction in space and time
    from one coded captured image. In our prototype, digital in-line holography was
    used for imaging macroscopic, fast moving objects. The pixel-wise temporal
    modulation was implemented by a digital micromirror device. In this paper we
    demonstrate \(10\times\) temporal super resolution with multiple depths recovery
    from a single image. Two examples are presented for the purpose of recording
    subtle vibrations and tracking small particles within 5 ms.

    Cross-Modal Scene Networks

    Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
    Comments: See more at this http URL arXiv admin note: text overlap with arXiv:1607.07295
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    People can recognize scenes across many different modalities beyond natural
    images. In this paper, we investigate how to learn cross-modal scene
    representations that transfer across modalities. To study this problem, we
    introduce a new cross-modal scene dataset. While convolutional neural networks
    can categorize scenes well, they also learn an intermediate representation not
    aligned across modalities, which is undesirable for cross-modal transfer
    applications. We present methods to regularize cross-modal convolutional neural
    networks so that they have a shared representation that is agnostic of the
    modality. Our experiments suggest that our scene representation can help
    transfer representations across modalities for retrieval. Moreover, our
    visualizations suggest that units emerge in the shared representation that tend
    to activate on consistent concepts independently of the modality.

    SoundNet: Learning Sound Representations from Unlabeled Video

    Yusuf Aytar, Carl Vondrick, Antonio Torralba
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD)

    We learn rich natural sound representations by capitalizing on large amounts
    of unlabeled sound data collected in the wild. We leverage the natural
    synchronization between vision and sound to learn an acoustic representation
    using two-million unlabeled videos. Unlabeled video has the advantage that it
    can be economically acquired at massive scales, yet contains useful signals
    about natural sound. We propose a student-teacher training procedure which
    transfers discriminative visual knowledge from well established visual
    recognition models into the sound modality using unlabeled video as a bridge.
    Our sound representation yields significant performance improvements over the
    state-of-the-art results on standard benchmarks for acoustic scene/object
    classification. Visualizations suggest some high-level semantics automatically
    emerge in the sound network, even though it is trained without ground truth
    labels.


    Artificial Intelligence

    Flexible constrained sampling with guarantees for pattern mining

    Vladimir Dzyuba, Matthijs van Leeuwen, Luc De Raedt
    Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)

    Pattern sampling has been proposed as a potential solution to the infamous
    pattern explosion. Instead of enumerating all patterns that satisfy the
    constraints, individual patterns are sampled proportional to a given quality
    measure. Several sampling algorithms have been proposed, but each of them has
    its limitations when it comes to 1) flexibility in terms of quality measures
    and constraints that can be used, and/or 2) guarantees with respect to sampling
    accuracy. We therefore present Flexics, the first flexible pattern sampler that
    supports a broad class of quality measures and constraints, while providing
    strong guarantees regarding sampling accuracy. To achieve this, we leverage the
    perspective on pattern mining as a constraint satisfaction problem and build
    upon the latest advances in sampling solutions in SAT as well as existing
    pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
    a variety of pattern languages, which allows us to introduce and tackle the
    novel task of sampling sets of patterns. We introduce and empirically evaluate
    two variants of Flexics: 1) a generic variant that addresses the well-known
    itemset sampling task and the novel pattern set sampling task as well as a wide
    range of expressive constraints within these tasks, and 2) a specialized
    variant that exploits existing frequent itemset techniques to achieve
    substantial speed-ups. Experiments show that Flexics is both accurate and
    efficient, making it a useful tool for pattern-based data exploration.

    Discovering Blind Spots of Predictive Models: Representations and Policies for Guided Exploration

    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz
    Subjects: Artificial Intelligence (cs.AI)

    Predictive models deployed in the world may assign incorrect labels to
    instances with high confidence. Such errors or unknown unknowns are rooted in
    model incompleteness, and typically arise because of the mismatch between
    training data and the cases seen in the open world. As the models are blind to
    such errors, input from an oracle is needed to identify these failures. In this
    paper, we formulate and address the problem of optimizing the discovery of
    unknown unknowns of any predictive model under a fixed budget, which limits the
    number of times an oracle can be queried for true labels. We propose a
    model-agnostic methodology which uses feedback from an oracle to both identify
    unknown unknowns and to intelligently guide the discovery. We employ a
    two-phase approach which first organizes the data into multiple partitions
    based on instance similarity, and then utilizes an explore-exploit strategy for
    discovering unknown unknowns across these partitions. We demonstrate the
    efficacy of our framework by varying the underlying causes of unknown unknowns
    across various applications. To the best of our knowledge, this paper presents
    the first algorithmic approach to the problem of discovering unknown unknowns
    of predictive models.
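
    A hedged sketch of the two-phase idea described above: partition the data by
    similarity, then spend the oracle budget with a simple explore-exploit
    (epsilon-greedy) policy over the partitions. The reward design, the
    model_predict and oracle_label callables, and the confidence threshold are
    illustrative assumptions, not the paper's algorithm.

        import numpy as np
        from sklearn.cluster import KMeans

        def discover_unknown_unknowns(X, model_predict, oracle_label, budget=100,
                                      n_partitions=10, eps=0.1, seed=0):
            """Epsilon-greedy search over similarity-based partitions for instances
            the model labels confidently but incorrectly (illustrative only)."""
            rng = np.random.default_rng(seed)
            parts = KMeans(n_clusters=n_partitions, n_init=10,
                           random_state=seed).fit_predict(X)
            hits = np.zeros(n_partitions)    # confident mistakes found per partition
            pulls = np.ones(n_partitions)    # queries per partition (avoid div by 0)
            found = []
            for _ in range(budget):
                if rng.random() < eps:                        # explore
                    k = rng.integers(n_partitions)
                else:                                         # exploit the best partition
                    k = int(np.argmax(hits / pulls))
                idx = rng.choice(np.flatnonzero(parts == k))  # pick an instance from it
                label, confidence = model_predict(X[idx])     # assumed callable
                if confidence > 0.9 and label != oracle_label(idx):  # assumed callable
                    hits[k] += 1
                    found.append(idx)
                pulls[k] += 1
            return found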

    Improving Sampling from Generative Autoencoders with Markov Chains

    Kai Arulkumaran, Antonia Creswell, Anil Anthony Bharath
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We focus on generative autoencoders, such as variational or adversarial
    autoencoders, which jointly learn a generative model alongside an inference
    model. We define generative autoencoders as autoencoders which are trained to
    softly enforce a prior on the latent distribution learned by the model.
    However, the model does not necessarily learn to match the prior. We formulate
    a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively
    encoding and decoding, which allows us to sample from the learned latent
    distribution. Using this we can improve the quality of samples drawn from the
    model, especially when the learned distribution is far from the prior. Using
    MCMC sampling, we also reveal previously unseen differences between generative
    autoencoders trained either with or without the denoising criterion.
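
    A minimal sketch of the sampling idea, assuming toy linear maps in place of
    trained encoder/decoder networks: draw an initial latent from the prior and
    alternate decoding and encoding, so the chain moves toward the latent
    distribution the model actually learned.

        import numpy as np

        rng = np.random.default_rng(0)
        d_z, d_x = 2, 5
        W = rng.normal(size=(d_x, d_z))        # toy "decoder" weights
        V = np.linalg.pinv(W)                  # toy "encoder" (pseudo-inverse of W)

        def decode(z):   # stands in for the trained decoder p(x|z)
            return W @ z + 0.05 * rng.normal(size=d_x)

        def encode(x):   # stands in for the trained encoder q(z|x)
            return V @ x + 0.05 * rng.normal(size=d_z)

        z = rng.normal(size=d_z)               # initial draw from the prior N(0, I)
        for _ in range(10):                    # Markov chain: decode, then re-encode
            x = decode(z)
            z = encode(x)
        print("sample after 10 MCMC steps:", decode(z))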

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.

    How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs

    Simon Walk, Lisette Espín-Noboa, Denis Helic, Markus Strohmaier, Mark Musen
    Comments: Under review for WWW’17
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC)

    Ontologies in the biomedical domain are numerous, highly specialized and very
    expensive to develop. Thus, a crucial prerequisite for ontology adoption and
    reuse is effective support for exploring and finding existing ontologies.
    Towards that goal, the National Center for Biomedical Ontology (NCBO) has
    developed BioPortal—an online repository designed to support users in
    exploring and finding more than 500 existing biomedical ontologies. In 2016,
    BioPortal represents one of the largest portals for exploration of semantic
    biomedical vocabularies and terminologies, which is used by many researchers
    and practitioners. While usage of this portal is high, we know very little
    about how exactly users search and explore ontologies and what kind of usage
    patterns or user groups exist in the first place. Deeper insights into user
    behavior on such portals can provide valuable information to devise strategies
    for a better support of users in exploring and finding existing ontologies, and
    thereby enable better ontology reuse. To that end, we study and group users
    according to their browsing behavior on BioPortal using data mining techniques.
    Additionally, we use the obtained groups to characterize and compare
    exploration strategies across ontologies. In particular, we were able to
    identify seven distinct browsing-behavior types, which all make use of
    different functionality provided by BioPortal. For example, Search Explorers
    make extensive use of the search functionality while Ontology Tree Explorers
    mainly rely on the class hierarchy to explore ontologies. Further, we show that
    specific characteristics of ontologies influence the way users explore and
    interact with the website. Our results may guide the development of more
    user-oriented systems for ontology exploration on the Web.

    Fuzzy Bayesian Learning

    Indranil Pan, Dirk Bester
    Comments: 13 pages, 10 figures, submitted
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

    In this paper we propose a novel approach for learning from data using
    rule-based fuzzy inference systems, where the model parameters are estimated
    using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We
    show the applicability of the method for regression and classification tasks
    using synthetic datasets and also a real-world example from the financial
    services industry. We then demonstrate how the method can be extended for
    knowledge extraction to select, in a Bayesian way, the individual rules that
    best explain the given data. Finally we discuss the advantages and pitfalls of
    using this
    method over state-of-the-art techniques and highlight the specific class of
    problems where this would be useful.

    Integrating Topic Models and Latent Factors for Recommendation

    Danis J. Wilson, Wei Zhang
    Comments: 10 pages, 3 figures
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    Research on personalized recommendation techniques has today largely split
    into two mainstream directions, i.e., factorization-based approaches and topic
    models. In practice, they aim to benefit from numerical ratings and textual
    reviews, respectively, which constitute two major information sources in
    various real-world systems. However, although the two approaches are supposed
    to be correlated through their shared goal of accurate recommendation, a clear
    theoretical understanding is still lacking of how their objective functions
    can be mathematically bridged to leverage numerical ratings and textual
    reviews collectively, and why such a bridge is intuitively reasonable for
    matching up their learning procedures for the rating prediction and top-N
    recommendation tasks, respectively.

    In this work, we show through mathematical analysis that vector-level
    randomization functions coordinating the optimization objectives of
    factorizational and topic models unfortunately do not exist at all, although
    they are usually pre-assumed and intuitively designed in the literature.
    Fortunately, we also point out that one can avoid seeking such a randomization
    function by directly optimizing a Joint Factorizational Topic (JFT) model. We
    apply our JFT model to restaurant recommendation and study its performance in
    both normal and cross-city recommendation scenarios, where the latter is an
    extremely difficult task due to its inherent cold-start nature. Experimental
    results on real-world datasets verify the appealing performance of our
    approach compared with previous methods, on both rating prediction and top-N
    recommendation tasks.

    Optimal Belief Approximation

    Reimar H. Leike, Torsten A. Enßlin
    Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Data Analysis, Statistics and Probability (physics.data-an)

    In Bayesian statistics probability distributions express beliefs. However,
    for many problems the beliefs cannot be computed analytically and
    approximations of beliefs are needed. We seek a ranking function that
    quantifies how “embarrassing” it is to communicate a given approximation. We
    show that there is only one ranking under the requirements that (1) the best
    ranked approximation is the non-approximated belief and (2) that the ranking
    judges approximations only by their predictions for actual outcomes. We find
    that this ranking is equivalent to the Kullback-Leibler divergence that is
    frequently used in the literature. However, there seems to be confusion about
    the correct order in which its functional arguments, the approximated and
    non-approximated beliefs, should be used. We hope that our elementary
    derivation settles the apparent confusion. We show for example that when
    approximating beliefs with Gaussian distributions the optimal approximation is
    given by moment matching. This is in contrast to many suggested computational
    schemes.
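
    A short worked statement of the Gaussian moment-matching claim, assuming the
    divergence is used as \(D_{\mathrm{KL}}(p\,\|\,q)\) with \(p\) the
    non-approximated belief:

        \[
          D_{\mathrm{KL}}(p \,\|\, q)
            = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx,
          \qquad
          q(x) = \mathcal{N}(x \mid \mu, \sigma^2).
        \]
        Setting \(\partial_{\mu} D_{\mathrm{KL}} = 0\) and
        \(\partial_{\sigma^2} D_{\mathrm{KL}} = 0\) yields
        \[
          \mu^\star = \mathbb{E}_{p}[x],
          \qquad
          (\sigma^2)^\star = \mathrm{Var}_{p}[x],
        \]
        i.e. the best Gaussian approximation matches the first two moments of \(p\).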


    Information Retrieval

    Sentiment Analysis of Twitter Data for Predicting Stock Market Movements

    Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, Babita Majhi
    Comments: 6 pages 4 figures Conference Paper
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Predicting stock market movements is a well-known problem of interest.
    Nowadays, social media closely reflects public sentiment and opinion about
    current events. In particular, Twitter has attracted a lot of attention from
    researchers studying public sentiment. Stock market prediction on the basis of
    public sentiment expressed on Twitter has been an intriguing field of
    research. Previous studies have concluded that the aggregate public mood
    collected from Twitter may well be correlated with the Dow Jones Industrial
    Average (DJIA) index. The thesis of this work is to observe how well the
    changes in stock prices of a company, the rises and falls, are correlated with
    the public opinions expressed in tweets about that company. Understanding an
    author’s opinion from a piece of text is the objective of sentiment analysis.
    The present paper employs two different textual representations, Word2vec and
    n-grams, for analyzing public sentiment in tweets. We apply sentiment analysis
    and supervised machine learning principles to tweets extracted from Twitter
    and analyze the correlation between a company’s stock market movements and the
    sentiment expressed in tweets about it. Intuitively, positive news and tweets
    in social media about a company would be expected to encourage people to
    invest in its stock and thereby drive its price up. At the end of the paper,
    it is shown that a strong correlation exists between the rises and falls in
    stock prices and the public sentiment in tweets.
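
    As a hedged illustration of the n-gram representation plus supervised
    learning pipeline (not the authors' exact features, classifier, or data), a
    scikit-learn sketch might look like this:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Tiny illustrative dataset; real experiments would use labeled tweets.
        tweets = ["great quarter for the company", "product recall hurts outlook",
                  "record profits announced", "ceo resigns amid heavy losses"]
        labels = [1, 0, 1, 0]                   # 1 = positive sentiment, 0 = negative

        clf = make_pipeline(
            CountVectorizer(ngram_range=(1, 2), lowercase=True),  # unigrams + bigrams
            LogisticRegression(max_iter=1000),
        )
        clf.fit(tweets, labels)
        print(clf.predict(["profits beat expectations"]))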

    Integrating Topic Models and Latent Factors for Recommendation

    Danis J. Wilson, Wei Zhang
    Comments: 10 pages, 3 figures
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

    Research on personalized recommendation techniques has today largely split
    into two mainstream directions, i.e., factorization-based approaches and topic
    models. In practice, they aim to benefit from numerical ratings and textual
    reviews, respectively, which constitute two major information sources in
    various real-world systems. However, although the two approaches are supposed
    to be correlated through their shared goal of accurate recommendation, a clear
    theoretical understanding is still lacking of how their objective functions
    can be mathematically bridged to leverage numerical ratings and textual
    reviews collectively, and why such a bridge is intuitively reasonable for
    matching up their learning procedures for the rating prediction and top-N
    recommendation tasks, respectively.

    In this work, we show through mathematical analysis that vector-level
    randomization functions coordinating the optimization objectives of
    factorizational and topic models unfortunately do not exist at all, although
    they are usually pre-assumed and intuitively designed in the literature.
    Fortunately, we also point out that one can avoid seeking such a randomization
    function by directly optimizing a Joint Factorizational Topic (JFT) model. We
    apply our JFT model to restaurant recommendation and study its performance in
    both normal and cross-city recommendation scenarios, where the latter is an
    extremely difficult task due to its inherent cold-start nature. Experimental
    results on real-world datasets verify the appealing performance of our
    approach compared with previous methods, on both rating prediction and top-N
    recommendation tasks.

    Toward Implicit Sample Noise Modeling: Deviation-driven Matrix Factorization

    Guang-He Lee, Shao-Wen Yang, Shou-De Lin
    Comments: 6 pages + 1 reference page
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    The objective function of a matrix factorization model usually aims to
    minimize the average of a regression error contributed by each element.
    However, given the existence of stochastic noises, the implicit deviations of
    sample data from their true values are almost surely diverse, which makes each
    data point not equally suitable for fitting a model. In this case, simply
    averaging the cost among data in the objective function is not ideal.
    Intuitively, we would like to place more emphasis on the reliable instances
    (i.e., those containing smaller noise) while training a model. Motivated by
    this observation, we derive our formula from a theoretical framework for optimal
    weighting under heteroscedastic noise distribution. Specifically, by modeling
    and learning the deviation of data, we design a novel matrix factorization
    model. Our model has two advantages. First, it jointly learns the deviation and
    conducts dynamic reweighting of instances, allowing the model to converge to a
    better solution. Second, during learning the deviated instances are assigned
    lower weights, which leads to faster convergence since the model does not need
    to overfit the noise. The experiments are conducted on clean recommendation and
    noisy sensor datasets to test the effectiveness of the model in various
    scenarios. The results show that our model outperforms the state-of-the-art
    factorization and deep learning models in both accuracy and efficiency.
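
    A toy sketch of weighting under heteroscedastic noise, assuming the per-entry
    noise level is known rather than learned jointly as in the paper: entries
    with larger noise variance receive a smaller weight 1/sigma^2 in a plain SGD
    matrix-factorization loop. All sizes and learning rates are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        n_users, n_items, k = 50, 40, 5
        U_true = rng.normal(size=(n_users, k))
        V_true = rng.normal(size=(n_items, k))
        sigma = rng.uniform(0.2, 2.0, size=(n_users, n_items))   # per-entry noise level
        R = U_true @ V_true.T + sigma * rng.normal(size=(n_users, n_items))

        U = 0.1 * rng.normal(size=(n_users, k))
        V = 0.1 * rng.normal(size=(n_items, k))
        lr = 0.002
        for epoch in range(30):
            for i in range(n_users):
                for j in range(n_items):
                    err = R[i, j] - U[i] @ V[j]
                    w = 1.0 / sigma[i, j] ** 2    # reliable entries get larger weight
                    ui = U[i].copy()
                    U[i] += lr * w * err * V[j]
                    V[j] += lr * w * err * ui
        print("RMSE vs. noise-free ratings:",
              np.sqrt(np.mean((U @ V.T - U_true @ V_true.T) ** 2)))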

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.


    Computation and Language

    Word Embeddings for the Construction Domain

    Antoine J.-P. Tixier, Michalis Vazirgiannis, Matthew R. Hallowell
    Subjects: Computation and Language (cs.CL)

    We introduce word vectors for the construction domain. Our vectors were
    obtained by running word2vec on an 11M-word corpus that we created from scratch
    by leveraging freely-accessible online sources of construction-related text. We
    first explore the embedding space and show that our vectors capture meaningful
    construction-specific concepts. We then evaluate the performance of our vectors
    against that of ones trained on a 100B-word corpus (Google News) within the
    framework of an injury report classification task. Without any parameter
    tuning, our embeddings give competitive results, and outperform the Google News
    vectors in many cases. Using a keyword-based compression of the reports also
    leads to a significant speed-up with only a limited loss in performance. We
    release our corpus and the data set we created for the classification task as
    publicly available, in the hope that they will be used by future studies for
    benchmarking and building on our work.
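
    A hedged sketch of training such domain-specific vectors with gensim's
    word2vec implementation (gensim 4.x parameter names); the corpus file name
    and all hyperparameters below are illustrative assumptions, not the settings
    used in the paper.

        from gensim.models import Word2Vec
        from gensim.utils import simple_preprocess

        # One sentence per line of the (hypothetical) construction-domain corpus file.
        with open("construction_corpus.txt", encoding="utf-8") as f:
            sentences = [simple_preprocess(line) for line in f]

        model = Word2Vec(sentences, vector_size=100, window=5, min_count=5,
                         sg=1, workers=4, epochs=5)
        print(model.wv.most_similar("scaffold", topn=5))  # nearby construction terms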

    Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

    Pavlina Fragkou
    Comments: 32 pages. arXiv admin note: text overlap with arXiv:1308.0661, arXiv:1204.2847 by other authors
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

    In this paper we examine the benefit of performing named entity recognition
    (NER) and co-reference resolution on an English and a Greek corpus used for
    text segmentation. The aim is to examine whether combining text segmentation
    and information extraction can be beneficial for identifying the various
    topics that appear in a document. NER was performed manually on the English
    corpus and compared with the output of publicly available annotation tools,
    while an existing tool was used for the Greek corpus. The annotations produced
    for both corpora were manually corrected and enriched to cover four types of
    named entities. Co-reference resolution, i.e., substituting every reference to
    the same instance with the same named entity identifier, was subsequently
    performed. The evaluation, using five text segmentation algorithms for the
    English corpus and four for the Greek corpus, leads to the conclusion that the
    benefit depends highly on the segment’s topic, the number of named entity
    instances appearing in it, and the segment’s length.

    Towards a continuous modeling of natural language domains

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 5 pages, 3 figures, published in Uphill Battles in Language Processing workshop, EMNLP 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Humans continuously adapt their style and language to a variety of domains.
    However, a reliable definition of ‘domain’ has eluded researchers thus far.
    Additionally, the notion of discrete domains stands in contrast to the
    multiplicity of heterogeneous domains that humans navigate, many of which
    overlap. In order to better understand the change and variation of human
    language, we draw on research in domain adaptation and extend the notion of
    discrete domains to the continuous spectrum. We propose representation
    learning-based models that can adapt to continuous domains and detail how these
    can be used to investigate variation in language. To this end, we propose to
    use dialogue modeling as a test bed due to its proximity to language modeling
    and its social component.

    Representation Learning Models for Entity Search

    Shijia E, Yang Xiang, Mohan Zhang
    Comments: submitted to WWW2017
    Subjects: Computation and Language (cs.CL)

    We focus on the problem of learning distributed representations for entity
    search queries, named entities, and their short descriptions. With our
    representation learning models, the entity search query, named entity and
    description can be represented as low-dimensional vectors. Our goal is to
    develop a simple but effective model that can make the distributed
    representations of query related entities similar to the query in the vector
    space. Hence, we propose three kinds of learning strategies, and the difference
    between them mainly lies in how to deal with the relationship between an entity
    and its description. We analyze the strengths and weaknesses of each learning
    strategy and validate our methods on public datasets which contain four kinds
    of named entities, i.e., movies, TV shows, restaurants and celebrities. The
    experimental results indicate that our proposed methods can adapt to different
    types of entity search queries, and outperform the current state-of-the-art
    methods based on keyword matching and vanilla word2vec models. In addition,
    the proposed methods can be trained quickly and easily extended to other
    similar tasks.

    Sentiment Analysis of Twitter Data for Predicting Stock Market Movements

    Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, Babita Majhi
    Comments: 6 pages 4 figures Conference Paper
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Predicting stock market movements is a well-known problem of interest.
    Nowadays, social media closely reflects public sentiment and opinion about
    current events. In particular, Twitter has attracted a lot of attention from
    researchers studying public sentiment. Stock market prediction on the basis of
    public sentiment expressed on Twitter has been an intriguing field of
    research. Previous studies have concluded that the aggregate public mood
    collected from Twitter may well be correlated with the Dow Jones Industrial
    Average (DJIA) index. The thesis of this work is to observe how well the
    changes in stock prices of a company, the rises and falls, are correlated with
    the public opinions expressed in tweets about that company. Understanding an
    author’s opinion from a piece of text is the objective of sentiment analysis.
    The present paper employs two different textual representations, Word2vec and
    n-grams, for analyzing public sentiment in tweets. We apply sentiment analysis
    and supervised machine learning principles to tweets extracted from Twitter
    and analyze the correlation between a company’s stock market movements and the
    sentiment expressed in tweets about it. Intuitively, positive news and tweets
    in social media about a company would be expected to encourage people to
    invest in its stock and thereby drive its price up. At the end of the paper,
    it is shown that a strong correlation exists between the rises and falls in
    stock prices and the public sentiment in tweets.


    Distributed, Parallel, and Cluster Computing

    Domain Specific Distributed Search Engine Based on Semantic P2P Networks

    Lican Huang
    Comments: 10 pages, 7 figures , ICNDC2016 conference
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents a distributed search engine based on semantic P2P
    networks. Users’ computers join the domains in which the user wants to share
    information; the semantic P2P network is a domain-specific virtual tree
    (VIRGO). Each user’s computer runs a search engine that indexes domain-specific
    information on the local computer or the Internet. All search results can be
    obtained through P2P messages provided by the joined computers. Through
    companies’ efforts, we have implemented a prototype of the distributed search
    engine, which demonstrates easy retrieval of domain-related information
    provided by the joined computers.

    CHOPtrey: contextual online polynomial extrapolation for enhanced multi-core co-simulation of complex systems

    Abir Ben Khaled-El Feki, Laurent Duval, Cyril Faure, Daniel Simon, Mongi Ben Gaid
    Subjects: Systems and Control (cs.SY); Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC)

    The growing complexity of Cyber-Physical Systems (CPS), together with
    increasingly available parallelism provided by multi-core chips, fosters the
    parallelization of simulation. Simulation speed-ups are expected from
    co-simulation and parallelization based on model splitting into weak-coupled
    sub-models, as for instance in the framework of Functional Mockup Interface
    (FMI). However, slackened synchronization between sub-models and their
    associated solvers running in parallel introduces integration errors, which
    must be kept inside acceptable bounds.

    CHOPtrey denotes a forecasting framework enhancing the performance of complex
    system co-simulation, with a trivalent articulation. First, we consider the
    framework of a Computationally Hasty Online Prediction system (CHOPred). It
    makes it possible to improve the trade-off between integration speed-ups,
    which need large communication steps, and simulation precision, which needs
    frequent updates of the model inputs. Second, smoothed adaptive forward
    prediction improves
    co-simulation accuracy. It is obtained by past-weighted extrapolation based on
    Causal Hopping Oblivious Polynomials (CHOPoly). And third, signal behavior is
    segmented to handle the discontinuities of the exchanged signals: the
    segmentation is performed in a Contextual & Hierarchical Ontology of Patterns
    (CHOPatt).

    Implementation strategies and simulation results demonstrate the framework’s
    ability to adaptively relax data communication constraints beyond
    synchronization points, which appreciably accelerates the simulation. The CHOPtrey
    framework extends the range of applications of standard Lagrange-type methods,
    often deemed unstable. The embedding of predictions in lag-dependent smoothing
    and discontinuity handling demonstrates its practical efficiency.
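
    As a rough illustration of past-weighted polynomial extrapolation over a
    communication step (not the CHOPoly construction itself), one might fit a
    low-order polynomial to the last few exchanged samples, down-weighting older
    ones, and evaluate it one step ahead; the decay factor and order below are
    assumptions.

        import numpy as np

        def extrapolate(t_past, y_past, t_next, order=2, decay=0.7):
            """Predict y(t_next) from past samples, geometrically down-weighting
            older samples (the newest sample gets weight 1)."""
            w = decay ** np.arange(len(t_past) - 1, -1, -1)
            coeffs = np.polyfit(t_past, y_past, deg=order, w=w)
            return np.polyval(coeffs, t_next)

        # Toy exchanged signal sampled at coarse communication steps.
        t = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
        y = np.sin(2 * np.pi * t)
        print(extrapolate(t, y, 0.5), np.sin(2 * np.pi * 0.5))  # prediction vs. truth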

    Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

    Satya P. Jammy, Christian T. Jacobs, Neil D. Sandham
    Comments: Author accepted version. Accepted for publication in Journal of Computational Science on 27 October 2016
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

    Future architectures designed to deliver exascale performance motivate the
    need for novel algorithmic changes in order to fully exploit their
    capabilities. In this paper, the performance of several numerical algorithms,
    characterised by varying degrees of memory and computational intensity, are
    evaluated in the context of finite difference methods for fluid dynamics
    problems. It is shown that, by storing some of the evaluated derivatives as
    single thread- or process-local variables in memory, or recomputing the
    derivatives on-the-fly, a speed-up of ~2 can be obtained compared to
    traditional algorithms that store all derivatives in global arrays.

    Type oriented parallel programming for Exascale

    Nick Brown
    Comments: As presented at the Exascale Applications and Software Conference (EASC), 9th-11th April 2013
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)

    Whilst there have been great advances in HPC hardware and software in recent
    years, the languages and models that we use to program these machines have
    remained much more static. This is not from a lack of effort, but instead by
    virtue of the fact that the foundation that many programming languages are
    built on is not sufficient for the level of expressivity required for parallel
    work. The result is an implicit trade-off between programmability and
    performance which is made worse due to the fact that, whilst many scientific
    users are experts within their own fields, they are not HPC experts.

    Type oriented programming looks to address this by encoding the complexity of
    a language via the type system. Most of the language functionality is contained
    within a loosely coupled type library that can be flexibly used to control many
    aspects such as parallelism. Due to the high level nature of this approach
    there is much information available during compilation which can be used for
    optimisation and, in the absence of type information, the compiler can apply
    sensible default options thus supporting both the expert programmer and novice
    alike.

    We demonstrate that, at no performance or scalability penalty when running on
    up to 8196 cores of a Cray XE6 system, codes written in this type oriented
    manner provide improved programmability. The programmer is able to write
    simple, implicitly parallel HPC code at a high level and then explicitly tune it by
    adding additional type information if required.


    Learning

    Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

    Namrata Vaswani, Han Guo
    Comments: To appear in NIPS 2016. Longer version submitted to IEEE Trans. Sig. Proc. is availabe at this http URL arXiv admin note: text overlap with arXiv:1608.04320
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Given a matrix of observed data, Principal Components Analysis (PCA) computes
    a small number of orthogonal directions that contain most of its variability.
    Provably accurate solutions for PCA have been in use for a long time. However,
    to the best of our knowledge, all existing theoretical guarantees for it assume
    that the data and the corrupting noise are mutually independent, or at least
    uncorrelated. This is often valid in practice, but not always. In this paper,
    we study the PCA problem in the setting where the data and noise can be
    correlated. Such noise is often also referred to as “data-dependent noise”. We
    obtain a correctness result for the standard eigenvalue decomposition (EVD)
    based solution to PCA under simple assumptions on the data-noise correlation.
    We also develop and analyze a generalization of EVD, cluster-EVD, that improves
    upon EVD in certain regimes.
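
    For reference, a minimal numpy sketch of the standard EVD-based PCA solution
    mentioned above (with no handling of data-dependent noise, which is the
    paper's contribution); the data and the number of retained directions are toy
    assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # toy data, rows = samples
        r = 3                                       # number of principal directions

        Xc = X - X.mean(axis=0)                     # center the data
        C = Xc.T @ Xc / (len(Xc) - 1)               # empirical covariance matrix
        evals, evecs = np.linalg.eigh(C)            # eigenvalues in ascending order
        P = evecs[:, -r:][:, ::-1]                  # top-r principal directions
        print("variance captured:", evals[-r:][::-1])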

    Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

    Antoine Gautier, Quynh Nguyen, Matthias Hein
    Comments: Long version of NIPS 2016 paper
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    The optimization problem behind neural networks is highly non-convex.
    Training with stochastic gradient descent and variants requires careful
    parameter tuning and provides no guarantee to achieve the global optimum. In
    contrast, we show under quite weak assumptions on the data that a particular
    class of feedforward neural networks can be trained to global optimality with
    a linear convergence rate using our nonlinear spectral method. To our
    knowledge, this is the first practically feasible method that achieves such a
    guarantee.
    While the method can in principle be applied to deep networks, we restrict
    ourselves for simplicity in this paper to one and two hidden layer networks.
    Our experiments confirm that these models are rich enough to achieve good
    performance on a series of real-world datasets.

    Improving Sampling from Generative Autoencoders with Markov Chains

    Kai Arulkumaran, Antonia Creswell, Anil Anthony Bharath
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We focus on generative autoencoders, such as variational or adversarial
    autoencoders, which jointly learn a generative model alongside an inference
    model. We define generative autoencoders as autoencoders which are trained to
    softly enforce a prior on the latent distribution learned by the model.
    However, the model does not necessarily learn to match the prior. We formulate
    a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively
    encoding and decoding, which allows us to sample from the learned latent
    distribution. Using this we can improve the quality of samples drawn from the
    model, especially when the learned distribution is far from the prior. Using
    MCMC sampling, we also reveal previously unseen differences between generative
    autoencoders trained either with or without the denoising criterion.

    Toward Implicit Sample Noise Modeling: Deviation-driven Matrix Factorization

    Guang-He Lee, Shao-Wen Yang, Shou-De Lin
    Comments: 6 pages + 1 reference page
    Subjects: Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    The objective function of a matrix factorization model usually aims to
    minimize the average of a regression error contributed by each element.
    However, given the existence of stochastic noises, the implicit deviations of
    sample data from their true values are almost surely diverse, which makes each
    data point not equally suitable for fitting a model. In this case, simply
    averaging the cost among data in the objective function is not ideal.
    Intuitively, we would like to place more emphasis on the reliable instances
    (i.e., those containing smaller noise) while training a model. Motivated by
    this observation, we derive our formula from a theoretical framework for optimal
    weighting under heteroscedastic noise distribution. Specifically, by modeling
    and learning the deviation of data, we design a novel matrix factorization
    model. Our model has two advantages. First, it jointly learns the deviation and
    conducts dynamic reweighting of instances, allowing the model to converge to a
    better solution. Second, during learning the deviated instances are assigned
    lower weights, which leads to faster convergence since the model does not need
    to overfit the noise. The experiments are conducted on clean recommendation and
    noisy sensor datasets to test the effectiveness of the model in various
    scenarios. The results show that our model outperforms the state-of-the-art
    factorization and deep learning models in both accuracy and efficiency.

    Hierarchical Clustering via Spreading Metrics

    Aurko Roy, Sebastian Pokutta
    Comments: Extended abstract in proceedings of NIPS 2016
    Subjects: Learning (cs.LG)

    We study the cost function for hierarchical clusterings introduced by
    [arXiv:1510.05043] where hierarchies are treated as first-class objects rather
    than deriving their cost from projections into flat clusters. It was also shown
    in [arXiv:1510.05043] that a top-down algorithm returns a hierarchical
    clustering of cost at most \(O\left(\alpha_n \log n\right)\) times the cost of
    the optimal hierarchical clustering, where \(\alpha_n\) is the approximation
    ratio of the Sparsest Cut subroutine used. Thus using the best known
    approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top
    down algorithm returns a hierarchical clustering of cost at most
    \(O\left(\log^{3/2} n\right)\) times the cost of the optimal solution. We improve
    this by giving an \(O(\log n)\)-approximation algorithm for this problem. Our
    main technical ingredients are a combinatorial characterization of ultrametrics
    induced by this cost function, deriving an Integer Linear Programming (ILP)
    formulation for this family of ultrametrics, and showing how to iteratively
    round an LP relaxation of this formulation by using the idea of sphere
    growing which has been extensively used in the context of graph partitioning.
    We also prove that our algorithm returns an \(O(\log n)\)-approximate
    hierarchical clustering for a generalization of this cost function also studied
    in [arXiv:1510.05043]. Experiments show that the hierarchies found by using the
    ILP formulation as well as our rounding algorithm often have better projections
    into flat clusters than the standard linkage based algorithms. We also give
    constant factor inapproximability results for this problem.
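
    For convenience, the hierarchical-clustering cost function of
    [arXiv:1510.05043] that is being approximated can be stated as follows
    (notation paraphrased from that reference; see the paper for the precise
    definition and the generalization mentioned above):

        \[
          \mathrm{cost}(T) \;=\; \sum_{\{i,j\} \in E} w_{ij}\,
            \bigl|\mathrm{leaves}\bigl(T[i \vee j]\bigr)\bigr| ,
        \]
        where \(T[i \vee j]\) is the subtree of \(T\) rooted at the least common
        ancestor of leaves \(i\) and \(j\), and \(\mathrm{leaves}(\cdot)\) denotes
        its set of leaves.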

    A Conceptual Development of Quench Prediction App build on LSTM and ELQA framework

    Matej Mertik, Maciej Wielgosz, Andrzej Skoczeń
    Subjects: Learning (cs.LG)

    This article presents the development of a web application for quench
    prediction in the TE-MPE-EE group at CERN. The authors describe the ELectrical
    Quality Assurance (ELQA) framework, a platform designed for rapid development
    of web-integrated data analysis applications for the different analyses needed
    during the hardware commissioning of the Large Hadron Collider (LHC). The
    second part of the article describes research carried out on data collected
    from the Quench Detection System using an LSTM recurrent neural network. The
    article discusses and presents conceptual work on implementing a quench
    prediction application for TE-MPE-EE based on ELQA and the quench prediction
    algorithm.

    SOL: A Library for Scalable Online Learning Algorithms

    Yue Wu, Steven C.H. Hoi, Chenghao Liu, Jing Lu, Doyen Sahoo, Nenghai Yu
    Comments: 5 pages
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    SOL is an open-source library for scalable online learning algorithms, and is
    particularly suitable for learning with high-dimensional data. The library
    provides a family of regular and sparse online learning algorithms for
    large-scale binary and multi-class classification tasks with high efficiency,
    scalability, portability, and extensibility. SOL was implemented in C++, and
    provided with a collection of easy-to-use command-line tools, python wrappers
    and library calls for users and developers, as well as comprehensive documents
    for both beginners and advanced users. SOL is not only a practical machine
    learning toolbox, but also a comprehensive experimental platform for online
    learning research. Experiments demonstrate that SOL is highly efficient and
    scalable for large-scale machine learning with high-dimensional data.

    Orthogonal Random Features

    Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar
    Comments: NIPS 2016
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present an intriguing discovery related to Random Fourier Features: in
    Gaussian kernel approximation, replacing the random Gaussian matrix by a
    properly scaled random orthogonal matrix significantly decreases kernel
    approximation error. We call this technique Orthogonal Random Features (ORF),
    and provide theoretical and empirical justification for this behavior.
    Motivated by this discovery, we further propose Structured Orthogonal Random
    Features (SORF), which uses a class of structured discrete orthogonal matrices
    to speed up the computation. The method reduces the time cost from
    O(d^2) to O(d log d), where d is the data
    dimensionality, with almost no compromise in kernel approximation quality
    compared to ORF. Experiments on several datasets verify the effectiveness of
    ORF and SORF over the existing methods. We also provide discussions on using
    the same type of discrete orthogonal structure for a broader range of
    applications.
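
    A minimal numpy sketch of the construction described above (assuming the number of
    features D does not exceed the input dimension d; larger D would stack several
    independent blocks): the Gaussian projection matrix of standard random Fourier
    features is replaced by a random orthogonal matrix whose rows are rescaled so their
    norms follow the chi distribution of Gaussian rows.

        import numpy as np

        def gaussian_rff(X, D, sigma=1.0, rng=None):
            # Standard random Fourier features for the Gaussian kernel.
            rng = np.random.default_rng(rng)
            d = X.shape[1]
            W = rng.normal(size=(D, d)) / sigma
            b = rng.uniform(0.0, 2.0 * np.pi, size=D)
            return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

        def orthogonal_rff(X, D, sigma=1.0, rng=None):
            # Orthogonal Random Features: same feature map, but the Gaussian
            # matrix is replaced by a scaled random orthogonal matrix.
            rng = np.random.default_rng(rng)
            d = X.shape[1]
            assert D <= d, "this minimal sketch uses a single orthogonal block"
            Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
            S = np.sqrt(rng.chisquare(df=d, size=d))       # row norms of a Gaussian matrix
            W = (S[:, None] * Q)[:D] / sigma
            b = rng.uniform(0.0, 2.0 * np.pi, size=D)
            return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)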

    Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

    Jack W Rae, Jonathan J Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P Lillicrap
    Comments: in 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain
    Subjects: Learning (cs.LG)

    Neural networks augmented with external memory have the ability to learn
    algorithmic solutions to complex tasks. These models appear promising for
    applications such as language modeling and machine translation. However, they
    scale poorly in both space and time as the amount of memory grows — limiting
    their applicability to real-world domains. Here, we present an end-to-end
    differentiable memory access scheme, which we call Sparse Access Memory (SAM),
    that retains the representational power of the original approaches whilst
    training efficiently with very large memories. We show that SAM achieves
    asymptotic lower bounds in space and time complexity, and find that an
    implementation runs 1,000 times faster and with 3,000 times less
    physical memory than non-sparse models. SAM learns with comparable data
    efficiency to existing models on a range of synthetic tasks and one-shot
    Omniglot character recognition, and can scale to tasks requiring 100,000s
    of time steps and memories. As well, we show how our approach can be adapted
    for models that maintain temporal associations between memories, as with the
    recently introduced Differentiable Neural Computer.
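
    The core of the sparse access idea is that each read touches only a handful of memory
    slots; a minimal sketch of such a read (illustrative only, not the paper's full scheme,
    which also includes sparse writes and efficient approximate nearest-neighbour indexing):

        import numpy as np

        def sparse_read(memory, query, k=4):
            # Read from only the k most similar memory slots instead of all of them.
            sims = memory @ query                      # dot-product similarities
            idx = np.argpartition(sims, -k)[-k:]       # indices of the top-k slots
            w = np.exp(sims[idx] - sims[idx].max())    # softmax over that subset only
            w /= w.sum()
            return w @ memory[idx], idx, w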

    Homotopy Method for Tensor Principal Component Analysis

    Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Developing efficient and guaranteed nonconvex algorithms has been an
    important challenge in modern machine learning. Algorithms with good empirical
    performance such as stochastic gradient descent often lack theoretical
    guarantees. In this paper, we analyze the class of homotopy or continuation
    methods for global optimization of nonconvex functions. These methods start
    from an objective function that is efficient to optimize (e.g. convex), and
    progressively modify it to obtain the required objective, and the solutions are
    passed along the homotopy path. For the challenging problem of tensor PCA, we
    prove global convergence of the homotopy method in the “high noise” regime. The
    signal-to-noise requirement for our algorithm is tight in the sense that it
    matches the recovery guarantee for the best degree-4 sum-of-squares algorithm.
    In addition, we prove a phase transition along the homotopy path for tensor
    PCA. This allows us to simplify the homotopy method to a local search algorithm,
    viz., tensor power iterations, with a specific initialization and a noise
    injection procedure, while retaining the theoretical guarantees.
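
    The local-search end point mentioned above is tensor power iteration; a minimal sketch
    for a symmetric third-order tensor follows, with an optional Gaussian noise-injection
    step standing in for the procedure the paper derives (the specific initialization and
    noise schedule are not reproduced here).

        import numpy as np

        def tensor_power_iteration(T, v0, n_iter=100, noise_std=0.0, rng=None):
            # Power iteration v <- T(I, v, v) / ||T(I, v, v)|| for a symmetric
            # third-order tensor, with optional noise injection.
            rng = np.random.default_rng(rng)
            v = v0 / np.linalg.norm(v0)
            for _ in range(n_iter):
                u = np.einsum('ijk,j,k->i', T, v, v)
                if noise_std > 0.0:
                    u = u + noise_std * rng.normal(size=u.shape)
                v = u / np.linalg.norm(u)
            return v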

    Towards a continuous modeling of natural language domains

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 5 pages, 3 figures, published in Uphill Battles in Language Processing workshop, EMNLP 2016
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Humans continuously adapt their style and language to a variety of domains.
    However, a reliable definition of 'domain' has eluded researchers thus far.
    Additionally, the notion of discrete domains stands in contrast to the
    multiplicity of heterogeneous domains that humans navigate, many of which
    overlap. In order to better understand the change and variation of human
    language, we draw on research in domain adaptation and extend the notion of
    discrete domains to the continuous spectrum. We propose representation
    learning-based models that can adapt to continuous domains and detail how these
    can be used to investigate variation in language. To this end, we propose to
    use dialogue modeling as a test bed due to its proximity to language modeling
    and its social component.

    A framework for adaptive regularization in streaming Lasso models

    Ricardo Pio Monti, Christoforos Anagnostopoulos, Giovanni Montana
    Comments: 18 pages, 4 figures. arXiv admin note: text overlap with arXiv:1511.02187
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Large scale, streaming datasets are ubiquitous in modern machine learning.
    Streaming algorithms must be scalable, amenable to incremental training and
    robust to the presence of non-stationarity. In this work we consider the problem
    of learning ℓ1-regularized linear models in the context of streaming
    data. In particular, the focus of this work is how to select the
    regularization parameter when data arrives sequentially and the underlying
    distribution is non-stationary (implying the choice of optimal regularization
    parameter is itself time-varying). We propose a novel framework through which
    to infer an adaptive regularization parameter. Our approach employs an ℓ1
    penalty constraint where the corresponding sparsity parameter is iteratively
    updated via stochastic gradient descent. This serves to reformulate the choice
    of regularization parameter in a principled framework for online learning and
    allows for the derivation of convergence guarantees in a non-stochastic
    setting. We validate our approach using simulated and real datasets and present
    an application to a neuroimaging dataset.
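
    One way to picture the proposed scheme is an online proximal-gradient lasso in which
    the sparsity parameter is itself nudged by a stochastic (hyper)gradient of the
    one-step-ahead prediction error; the sketch below is illustrative only and does not
    reproduce the authors' exact update rule or convergence conditions.

        import numpy as np

        def soft_threshold(z, t):
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def streaming_lasso(stream, d, eta=0.05, eta_lam=0.01, lam=0.1):
            # Illustrative only: an online proximal-gradient lasso whose
            # regularisation parameter lam is adjusted by a stochastic
            # (hyper)gradient step on the one-step-ahead squared error.
            beta = np.zeros(d)
            for x, y in stream:
                err = x @ beta - y                   # one-step-ahead prediction error
                lam = max(lam + eta_lam * eta * err * (x @ np.sign(beta)), 0.0)
                beta = soft_threshold(beta - eta * err * x, eta * lam)
                yield beta, lam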

    (f)-Divergence Inequalities via Functional Domination

    Igal Sason, Sergio Verdú
    Comments: A conference paper, 5 pages. To be presented in the 2016 ICSEE International Conference on the Science of Electrical Engineering, Nov. 16–18, Eilat, Israel. See this https URL for the full paper version, published as a journal paper in the IEEE Trans. on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    This paper considers derivation of (f)-divergence inequalities via the
    approach of functional domination. Bounds on an (f)-divergence based on one or
    several other (f)-divergences are introduced, dealing with pairs of probability
    measures defined on arbitrary alphabets. In addition, a variety of bounds are
    shown to hold under boundedness assumptions on the relative information. The
    journal paper, which includes more approaches for the derivation of
    f-divergence inequalities and proofs, is available on the arXiv at
    this https URL, and it has been published in the IEEE Trans.
    on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016.

    Missing Data Imputation for Supervised Learning

    Jason Poulos, Rafael Valle
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper compares methods for imputing missing categorical data for
    supervised learning tasks. The ability of researchers to accurately fit a model
    and yield unbiased estimates may be compromised by missing data, which are
    prevalent in survey-based social science research. We experiment on two machine
    learning benchmark datasets with missing categorical data, comparing
    classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with
    different degrees of missing-data perturbation. The results show imputation
    methods can increase predictive accuracy in the presence of missing-data
    perturbation. Additionally, we find that for imputed models, missing-data
    perturbation can improve prediction accuracy by regularizing the classifier.
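
    A toy sketch of the comparison described above, using made-up data (the dataset,
    split, and classifier are assumptions, not the benchmarks used in the paper): one
    pipeline keeps missingness as its own one-hot category, the other imputes each column
    with its mode before encoding.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Made-up data standing in for a benchmark with missing categorical values.
        df = pd.DataFrame({
            "colour": ["red", "blue", None, "red", "blue", None, "red", "blue"],
            "size":   ["S", None, "M", "L", "M", "S", None, "L"],
            "label":  [1, 0, 1, 1, 0, 0, 1, 0],
        })
        X, y = df.drop(columns="label"), df["label"]

        # (a) keep missingness as its own one-hot category
        X_missing_cat = pd.get_dummies(X, dummy_na=True)
        # (b) impute each column with its mode, then one-hot encode
        X_imputed = pd.get_dummies(X.fillna(X.mode().iloc[0]))

        for name, X_enc in [("missing-as-category", X_missing_cat), ("mode-imputed", X_imputed)]:
            X_tr, X_te, y_tr, y_te = train_test_split(X_enc, y, test_size=0.25,
                                                      random_state=0, stratify=y)
            clf = LogisticRegression().fit(X_tr, y_tr)
            print(name, clf.score(X_te, y_te))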

    Professor Forcing: A New Algorithm for Training Recurrent Networks

    Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio
    Comments: NIPS 2016 Accepted Paper
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The Teacher Forcing algorithm trains recurrent networks by supplying observed
    sequence values as inputs during training and using the network’s own
    one-step-ahead predictions to do multi-step sampling. We introduce the
    Professor Forcing algorithm, which uses adversarial domain adaptation to
    encourage the dynamics of the recurrent network to be the same when training
    the network and when sampling from the network over multiple time steps. We
    apply Professor Forcing to language modeling, vocal synthesis on raw waveforms,
    handwriting generation, and image generation. Empirically we find that
    Professor Forcing acts as a regularizer, improving test likelihood on character
    level Penn Treebank and sequential MNIST. We also find that the model
    qualitatively improves samples, especially when sampling for a large number of
    time steps. This is supported by human evaluation of sample quality. Trade-offs
    between Professor Forcing and Scheduled Sampling are discussed. We produce
    t-SNE visualizations showing that Professor Forcing successfully makes the dynamics
    of the network during training and sampling more similar.

    Operator Variational Inference

    Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

    Variational inference is an umbrella term for algorithms which cast Bayesian
    inference as optimization. Classically, variational inference uses the
    Kullback-Leibler divergence to define the optimization. Though this divergence
    has been widely used, the resultant posterior approximation can suffer from
    undesirable statistical properties. To address this, we reexamine variational
    inference from its roots as an optimization problem. We use operators, or
    functions of functions, to design variational objectives. As one example, we
    design a variational objective with a Langevin-Stein operator. We develop a
    black box algorithm, operator variational inference (OPVI), for optimizing any
    operator objective. Importantly, operators enable us to make explicit the
    statistical and computational tradeoffs for variational inference. We can
    characterize different properties of variational objectives, such as objectives
    that admit data subsampling—allowing inference to scale to massive data—as
    well as objectives that admit variational programs—a rich class of posterior
    approximations that does not require a tractable density. We illustrate the
    benefits of OPVI on a mixture model and a generative model of images.

    Cross-Modal Scene Networks

    Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
    Comments: See more at this http URL arXiv admin note: text overlap with arXiv:1607.07295
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    People can recognize scenes across many different modalities beyond natural
    images. In this paper, we investigate how to learn cross-modal scene
    representations that transfer across modalities. To study this problem, we
    introduce a new cross-modal scene dataset. While convolutional neural networks
    can categorize scenes well, they also learn an intermediate representation not
    aligned across modalities, which is undesirable for cross-modal transfer
    applications. We present methods to regularize cross-modal convolutional neural
    networks so that they have a shared representation that is agnostic of the
    modality. Our experiments suggest that our scene representation can help
    transfer representations across modalities for retrieval. Moreover, our
    visualizations suggest that units emerge in the shared representation that tend
    to activate on consistent concepts independently of the modality.

    SoundNet: Learning Sound Representations from Unlabeled Video

    Yusuf Aytar, Carl Vondrick, Antonio Torralba
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD)

    We learn rich natural sound representations by capitalizing on large amounts
    of unlabeled sound data collected in the wild. We leverage the natural
    synchronization between vision and sound to learn an acoustic representation
    using two million unlabeled videos. Unlabeled video has the advantage that it
    can be economically acquired at massive scales, yet contains useful signals
    about natural sound. We propose a student-teacher training procedure which
    transfers discriminative visual knowledge from well established visual
    recognition models into the sound modality using unlabeled video as a bridge.
    Our sound representation yields significant performance improvements over the
    state-of-the-art results on standard benchmarks for acoustic scene/object
    classification. Visualizations suggest some high-level semantics automatically
    emerge in the sound network, even though it is trained without ground truth
    labels.
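
    The transfer loss in such student-teacher training is a KL divergence between the
    vision network's class posterior on a video frame and the sound network's output on
    the corresponding audio; a small numpy sketch of that loss (shapes and names are
    illustrative) is:

        import numpy as np

        def distillation_kl(teacher_logits, student_logits):
            # KL(teacher || student), averaged over a batch; used as the transfer
            # loss between the vision network's posterior and the sound network's output.
            t = np.exp(teacher_logits - teacher_logits.max(axis=1, keepdims=True))
            t /= t.sum(axis=1, keepdims=True)
            s = np.exp(student_logits - student_logits.max(axis=1, keepdims=True))
            s /= s.sum(axis=1, keepdims=True)
            return np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=1))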


    Information Theory

    User Cooperation for Enhanced Throughput Fairness in Wireless Powered Communication Networks

    Mingquan Zhong, Suzhi Bi, Xiaohui Lin
    Comments: This paper has been accepted by Springer Wireless Networks. arXiv admin note: text overlap with arXiv:1606.02033
    Subjects: Information Theory (cs.IT)

    This paper studies a novel user cooperation method in a wireless powered
    cooperative communication network (WPCN) in which a pair of distributed
    terminal users first harvest wireless energy broadcasted by one energy node
    (EN) and then use the harvested energy to transmit information to a destination
    node (DN). In particular, the two cooperating users exchange their independent
    information with each other so as to form a virtual antenna array and transmit
    jointly to the DN. By allowing the users to share their harvested energy to
    transmit each other’s information, the proposed method can effectively mitigate
    the inherent user unfairness problem in WPCN, where one user may suffer from
    very low data rate due to poor energy harvesting performance and high energy
    consumption for data transmission. Depending on the availability of channel state
    information at the transmitters, we consider the two users cooperating using
    either coherent or non-coherent data transmissions. In both cases, we derive
    the maximum common throughput achieved by the cooperation schemes through
    optimizing the time allocation on wireless energy transfer, user message
    exchange, and joint information transmissions in a fixed-length time slot. We
    also perform numerical analysis to study the impact of channel conditions on
    the system performance. By comparing with some existing benchmark schemes, our
    results demonstrate the effectiveness of the proposed user cooperation in a
    WPCN under different application scenarios.

    Generalized Common Information: Common Information Extraction and Private Sources Synthesis

    Lei Yu, Houqiang Li, Chang Wen Chen
    Comments: 42 pages. arXiv admin note: text overlap with arXiv:1203.0730 by other authors
    Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

    In the literature, two different notions of common information were defined by Gács
    and Körner and by Wyner, respectively. In this paper, we generalize and unify
    them, and define a generalized version of common information, the
    information-correlation function, by exploiting maximal correlation as a
    commonness or privacy measure. The Gács-Körner common information and Wyner
    common information are two special and extreme cases of our generalized
    definition. In addition, we study the problems of common information
    extraction and private sources synthesis, and show that the information-correlation
    function is the optimal rate under a given maximal correlation constraint in
    these problems.

    Performance Impact of LOS and NLOS Transmissions in Dense Cellular Networks under Rician Fading

    Amir H. Jafari, Ming Ding, David Lopez-Perez, Jie Zhang
    Comments: 24 pages, 3 figures. Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    In this paper, we analyse the performance of dense small cell networks (SCNs).
    We derive analytical expressions for both their coverage probability and their
    area spectral efficiency (ASE) using a path loss model that considers both
    line-of-sight (LOS) and non-LOS (NLOS) components. Due to the close proximity
    of small cell base stations (BSs) and user equipments (UEs) in such dense SCNs,
    we also consider Rician fading as the multi-path fading channel model for both
    the LOS and NLOS transmissions. The Rayleigh fading used in most
    existing works analysing dense SCNs is not accurate enough. Then, we compare
    the performance impact of LOS and NLOS transmissions in dense SCNs under Rician
    fading with that based on Rayleigh fading. The analysis and the simulation
    results show that in dense SCNs where LOS transmissions dominate the
    performance, the impact of Rician fading on the overall system performance is
    minor, and does not help to address the performance losses brought by the
    transition of many interfering signals from NLOS to LOS.

    Generalized I-MMSE for K-User Gaussian Channels

    Samah A. M. Ghanem
    Comments: arXiv admin note: substantial text overlap with arXiv:1504.06884
    Subjects: Information Theory (cs.IT)

    In this paper, we generalize the fundamental relation between the mutual
    information and the minimum mean squared error (MMSE) by Guo, Shamai, and Verdu
    [1] to K-User Gaussian channels. We prove that the derivative of the multiuser
    mutual information with respect to the signal to noise ratio (SNR) is equal to
    the total MMSE plus a covariance term with respect to the cross correlation of
    the multiuser input estimates, the channels and the precoding matrices. We show
    that such a relation is a generalized I-MMSE with one-step lookahead and
    lookback, applied to Successive Interference Cancellation (SIC) in the
    decoding process.
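
    For intuition, the single-user relation being generalized can be checked numerically:
    for a scalar AWGN channel with Gaussian input, I(snr) = 0.5*log(1+snr) and
    mmse(snr) = 1/(1+snr), so dI/dsnr = mmse/2 (in nats). A short numerical check:

        import numpy as np

        # Scalar AWGN channel with Gaussian input (nats): I(snr) = 0.5*log(1+snr),
        # mmse(snr) = 1/(1+snr), and the I-MMSE relation says dI/dsnr = mmse/2.
        snr = np.linspace(0.1, 10.0, 200)
        I = 0.5 * np.log(1.0 + snr)
        mmse = 1.0 / (1.0 + snr)
        dI = np.gradient(I, snr)
        print(np.max(np.abs(dI - mmse / 2)))   # small numerical-differentiation error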

    An Application of Group Theory in Confidential Network Communications

    Juan Antonio Lopez-Ramos, Joachim Rosenthal, Davide Schipani, Reto Schnyder
    Comments: to appear in Mathematical Methods in the Applied Sciences
    Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

    A new proposal for group key exchange is introduced that proves to be both
    efficient and secure and compares favorably with state-of-the-art protocols.

    Decentralized Power Control for Slotted Spread Spectrum Aloha with Successive Interference Cancellation

    Francisco Lázaro
    Comments: Accepted for publication at the 11th International ITG Conference on Systems, Communications and Coding, SCC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we study slotted Spread Spectrum Aloha with Successive
    Interference Cancellation at the receiver over a Gaussian channel. We consider
    a decentralized power control setting in which each user chooses its transmit
    power independently at random according to a power distribution with continuous
    support. In this setting, we derive an analytical expression for the expected
    interference power experienced by a user. This allows us to derive analytically
    the power distribution that, during the Successive Interference Cancellation
    process, leads to a constant signal-to-noise-plus-interference ratio for all
    users. We consider both perfect and imperfect interference cancellation.

    Finite-Length Analysis of Frameless ALOHA

    Francisco Lázaro, Čedomir Stefanović
    Comments: Accepted for publication at SCC 2017
    Subjects: Information Theory (cs.IT)

    In this paper we present an exact finite-length analysis of frameless ALOHA,
    obtained through a dynamic programming approach. Monte Carlo
    simulations are performed in order to verify the analysis. Two examples are
    provided that illustrate how the analysis can be used to optimize the
    parameters of frameless ALOHA. To the best of the knowledge of the authors,
    this is the first contribution dealing with an exact finite-length
    characterization of a protocol from the coded slotted ALOHA family of
    protocols.

    Steerable Discrete Cosine Transform

    Giulia Fracastoro, Sophie Marie Fosson, Enrico Magli
    Subjects: Information Theory (cs.IT); Multimedia (cs.MM); Optimization and Control (math.OC)

    In image compression, classical block-based separable transforms tend to be
    inefficient when image blocks contain arbitrarily shaped discontinuities. For
    this reason, transforms incorporating directional information are an appealing
    alternative. In this paper, we propose a new approach to this problem, namely a
    discrete cosine transform (DCT) that can be steered in any chosen direction.
    Such a transform, called the steerable DCT (SDCT), allows pairs of basis vectors to
    be rotated in a flexible way and enables precise matching of directionality in each
    image block, achieving improved coding efficiency. The optimal rotation angles
    for the SDCT can be represented as the solution of a suitable rate-distortion (RD)
    problem. We propose iterative methods to search for such a solution, and we develop a
    fully fledged image encoder to practically compare our techniques with other
    competing transforms. Analytical and numerical results prove that SDCT
    outperforms both DCT and state-of-the-art directional transforms.
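
    A very rough sketch of the idea (not the paper's construction or its RD-driven angle
    selection): since rotating a pair of 2D DCT basis vectors with symmetric frequency
    indices amounts to applying the same rotation to the corresponding pair of transform
    coefficients, a block transform can be steered by Givens-rotating coefficient pairs
    after a standard 2D DCT.

        import numpy as np
        from scipy.fft import dctn

        def steered_block_transform(block, theta):
            # Rough sketch: take a standard 2D DCT, then mix each pair of
            # symmetric-frequency coefficients (i, j) / (j, i) with a Givens
            # rotation by the same angle theta (the actual SDCT chooses angles
            # per pair via a rate-distortion criterion).
            C = dctn(block, norm="ortho")
            n = C.shape[0]
            c, s = np.cos(theta), np.sin(theta)
            for i in range(n):
                for j in range(i + 1, n):
                    a, b = C[i, j], C[j, i]
                    C[i, j], C[j, i] = c * a - s * b, s * a + c * b
            return C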

    Symbol Synchronization for Diffusive Molecular Communication Systems

    Vahid Jamali, Arman Ahmadzadeh, Robert Schober
    Comments: This paper has been submitted for presentation at IEEE International Conference on Communications (ICC) 2017
    Subjects: Information Theory (cs.IT)

    Symbol synchronization refers to the estimation of the start of a symbol
    interval and is needed for reliable detection. In this paper, we develop a
    symbol synchronization framework for molecular communication (MC) systems where
    we consider some practical challenges which have not been addressed in the
    literature yet. In particular, we take into account that in MC systems, the
    transmitter may not be equipped with an internal clock and may not be able to
    emit molecules with a fixed release frequency. Such restrictions hold for
    practical nanotransmitters, e.g. modified cells, where the lengths of the
    symbol intervals may vary due to the inherent randomness in the availability of
    food and energy for molecule generation, the process for molecule production,
    and the release process. To address this issue, we propose to employ two types
    of molecules, one for synchronization and one for data transmission. We derive
    the optimal maximum likelihood (ML) symbol synchronization scheme as a
    performance upper bound. Since ML synchronization entails high complexity, we
    also propose two low-complexity synchronization schemes, namely a peak
    observation-based scheme and a threshold-trigger scheme, which are suitable for
    MC systems with limited computational capabilities. Our simulation results
    reveal the effectiveness of the proposed synchronization schemes and suggest
    that the end-to-end performance of MC systems significantly depends on the
    accuracy of symbol synchronization.

    (f)-Divergence Inequalities via Functional Domination

    Igal Sason, Sergio Verdú
    Comments: A conference paper, 5 pages. To be presented in the 2016 ICSEE International Conference on the Science of Electrical Engineering, Nov. 16–18, Eilat, Israel. See this https URL for the full paper version, published as a journal paper in the IEEE Trans. on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    This paper considers derivation of (f)-divergence inequalities via the
    approach of functional domination. Bounds on an (f)-divergence based on one or
    several other (f)-divergences are introduced, dealing with pairs of probability
    measures defined on arbitrary alphabets. In addition, a variety of bounds are
    shown to hold under boundedness assumptions on the relative information. The
    journal paper, which includes more approaches for the derivation of
    f-divergence inequalities and proofs, is available on the arXiv at
    this https URL, and it has been published in the IEEE Trans.
    on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016.

    Broadcast Coded Modulation: Multilevel and Bit-interleaved Construction

    Ahmed Abotabl, Aria Nosratinia
    Subjects: Information Theory (cs.IT)

    The capacity of the AWGN broadcast channel is achieved by superposition
    coding, but superposition of individual coded modulations expands the
    modulation alphabet and distorts its configuration. Coded modulation over a
    broadcast channel subject to a specific channel-input modulation constraint
    remains an important open problem. Some progress has been made in the related
    area of unequal-error protection modulations which can be considered
    single-user broadcast transmission, but it does not approach all points on the
    boundary of the capacity region. This paper studies broadcast coded modulation
    using multilevel coding (MLC) subject to a specific channel input
    constellation. The conditions under which multilevel codes can achieve the
    constellation-constrained capacity of the AWGN broadcast channel are derived.
    For any given constellation, we propose a pragmatic multilevel design technique
    with near constellation-constrained-capacity performance, where the coupling of
    the superposition inner and outer codes is localized to each bit level. It is
    shown that this can be further relaxed to a code coupling on only one bit
    level, with little or no penalty under natural labeling. The rate allocation
    problem between the bit levels of the two users is studied and a pragmatic
    method is proposed, again with near-capacity performance. In further pursuit of
    lower complexity, a hybrid MLC-BICM is proposed, whose performance is shown to
    be very close to the boundary of the constellation-constrained capacity region.
    Simulation results show that good point-to-point LDPC codes produce excellent
    performance in the proposed coded modulation framework.

    Through the Haze: A Non-Convex Approach to Blind Calibration for Linear Random Sensing Models

    Valerio Cambareri, Laurent Jacques
    Comments: 42 pages, 7 figures. A finalised version of this draft is being submitted to Information and Inference: a Journal of the IMA
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    Computational sensing strategies often suffer from calibration errors in the
    physical implementation of their ideal sensing models. Such uncertainties are
    typically addressed by using multiple, accurately chosen training signals to
    recover the missing information on the sensing model, an approach that can be
    resource-consuming and cumbersome. Conversely, blind calibration does not
    employ any training signal, but corresponds to a bilinear inverse problem whose
    algorithmic solution is an open issue. We here address blind calibration as a
    non-convex problem for linear random sensing models, in which we aim to recover
    an unknown signal from its projections on sub-Gaussian random vectors, each
    subject to an unknown multiplicative factor (or gain). To solve this
    optimisation problem we resort to projected gradient descent starting from a
    suitable, carefully chosen initialisation point. An analysis of this algorithm
    allows us to show that it converges to the global optimum provided a sample
    complexity requirement is met, i.e., relating convergence to the amount of
    information collected during the sensing process. Interestingly, we show that
    this requirement is actually linear (up to log factors) in the number of
    unknowns of the problem. This sample complexity holds both in the absence of
    prior information and when subspace priors are available for both the
    signal and the gains, allowing a further reduction of the number of observations
    required for their provably exact recovery. Moreover, in the presence of noise
    we show how our algorithm yields a solution whose accuracy degrades gracefully
    with the amount of noise affecting the measurements. Finally, we present some
    numerical experiments in an imaging context, for which our algorithm allows for
    a simple solution to blind calibration of the gains in an imaging sensor array.
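
    A simplified picture of the non-convex formulation (not the paper's exact algorithm,
    initialisation, or sensing model): recover a signal x and positive gains g from
    measurements y = g * (A x) by plain gradient descent on the squared residual,
    renormalising g to resolve the inherent scale ambiguity.

        import numpy as np

        def blind_calibration_gd(A, y, n_iter=500, step=0.01):
            # Recover a signal x and positive gains g from y = g * (A @ x) by
            # gradient descent on the squared residual; the mean-one renormalisation
            # of g fixes the scale ambiguity. Simplified illustrative sketch only.
            m, n = A.shape
            x = A.T @ y / m                  # simple initialisation (assumption)
            g = np.ones(m)
            for _ in range(n_iter):
                r = g * (A @ x) - y
                x -= step * (A.T @ (g * r)) / m
                g -= step * ((A @ x) * r) / m
                g = np.maximum(g, 1e-6)
                g *= m / g.sum()
            return x, g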

    Direct-dynamical entanglement-discord relations

    Virginia Feldman, Jonas Maziero, A. Auyuanet
    Comments: 7 pages, 3 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this article, by considering Bell-diagonal two-qubit initial states
    submitted to local dynamics generated by the phase damping, bit flip, phase
    flip, bit-phase flip, and depolarizing channels, we report some elegant
    direct-dynamical relations between geometric measures of entanglement and
    discord. The complex scenario appearing already in this simplified case study
    indicates that similarly simple relations are unlikely to be found in more general
    situations.



