IT博客汇

    arXiv Paper Daily: Wed, 8 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-08 00:00:00

    Neural and Evolutionary Computing

    Estimation of classrooms occupancy using a multi-layer perceptron

    Eugénio Rodrigues, Luísa Dias Pereira, Adélio Rodrigues Gaspar, Álvaro Gomes, Manuel Carlos Gameiro da Silva
    Comments: 6 pages, 2 figures, conference article
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    This paper presents a multi-layer perceptron model for estimating the number
    of occupants in classrooms from sensed indoor environmental data: relative
    humidity, air temperature, and carbon dioxide concentration. The modelling
    datasets were collected from two classrooms in the Secondary School of Pombal,
    Portugal. The number of occupants and occupation periods were obtained from
    class attendance reports. However, post-class occupancy was unknown, so the
    developed model was used to reconstruct the classrooms' occupancy by filling
    in the unreported periods. Different model structures and combinations of
    environment variables were tested. The most accurate model took as input 10
    variables: five averaged time intervals each of relative humidity and carbon
    dioxide concentration. The model achieved a mean square error of 1.99, a
    coefficient of determination of 0.96 (p-value < 0.001), and a mean absolute
    error of 1 occupant. These results show promising
    estimation capabilities in uncertain indoor environment conditions.
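
The regression task described above can be sketched with a small multi-layer perceptron. The following is an illustrative reconstruction on synthetic data (scikit-learn assumed), not the authors' model; the feature layout mirrors the reported 10-variable input, but the numbers and noise levels are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical sensor features: five averaged time intervals each of relative
# humidity (%) and CO2 concentration (ppm), echoing the 10-variable input of
# the best model reported above. Purely synthetic, for illustration only.
n = 400
occupants = rng.integers(0, 30, size=n).astype(float)
rh = 40 + 0.3 * occupants[:, None] + rng.normal(0, 1.0, (n, 5))
co2 = 420 + 25 * occupants[:, None] + rng.normal(0, 30.0, (n, 5))
X = np.hstack([rh, co2])

model = make_pipeline(
    StandardScaler(),  # scaling helps the MLP converge on raw sensor units
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X, occupants)
pred = model.predict(X)
mae = np.abs(pred - occupants).mean()
print(f"training MAE: {mae:.2f} occupants")
```

With a strong CO2-to-occupancy signal, the fitted network tracks the occupant count to within a few people on this toy data.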

    Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

    Andrew Sohn, Randal S. Olson, Jason H. Moore
    Comments: 9 pages, 4 figures, submitted to GECCO 2017 conference and currently under review
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    Machine learning has been gaining traction in recent years to meet the demand
    for tools that can efficiently analyze and make sense of the ever-growing
    databases of biomedical data in health care systems around the world. However,
    effectively using machine learning methods requires considerable domain
    expertise, which can be a barrier to entry for bioinformaticians new to
    computational data science methods. Therefore, off-the-shelf tools that make
    machine learning more accessible can prove invaluable for bioinformaticians. To
    this end, we have developed an open source pipeline optimization tool
    (TPOT-MDR) that uses genetic programming to automatically design machine
    learning pipelines for bioinformatics studies. In TPOT-MDR, we implement
    Multifactor Dimensionality Reduction (MDR) as a feature construction method for
    modeling higher-order feature interactions, and combine it with a new expert
    knowledge-guided feature selector for large biomedical data sets. We
    demonstrate TPOT-MDR’s capabilities using a combination of simulated and real
    world data sets from human genetics and find that TPOT-MDR significantly
    outperforms modern machine learning methods such as logistic regression and
    eXtreme Gradient Boosting (XGBoost). We further analyze the best pipeline
    discovered by TPOT-MDR for a real world problem and highlight TPOT-MDR’s
    ability to produce a high-accuracy solution that is also easily interpretable.

    Characterisation of speech diversity using self-organising maps

    Tom A. F. Anderson, David M. W. Powers
    Comments: 16th Speech Science and Technology Conference (SST2016)
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We report investigations into speaker classification of larger quantities of
    unlabelled speech data using small sets of manually phonemically annotated
    speech. The Kohonen speech typewriter is a semi-supervised method comprising
    self-organising maps (SOMs) that achieves low phoneme error rates. A SOM is a
    2D array of cells that learn vector representations of the data based on
    neighbourhoods. In this paper, we report a method to evaluate pronunciation
    using multilevel SOMs with /hVd/ single syllable utterances for the study of
    vowels, for Australian pronunciation.
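
A minimal self-organising map, with the neighbourhood-based update described above, can be sketched in a few lines of NumPy. This is illustrative only, not the authors' multilevel SOM for /hVd/ utterances; the toy two-cluster data stands in for speech features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "speech feature" data: two clusters standing in for two vowel classes.
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])

grid_h, grid_w, dim = 4, 4, 2
weights = rng.random((grid_h, grid_w, dim))

# Grid coordinates of each cell, used for neighbourhood distances on the map.
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1).astype(float)

def quantization_error(w, xs):
    d = np.linalg.norm(w.reshape(-1, dim)[None, :, :] - xs[:, None, :], axis=2)
    return d.min(axis=1).mean()

err_before = quantization_error(weights, data)

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Best matching unit (BMU): the cell whose weight vector is closest to x.
    d = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(d.argmin(), d.shape)
    # Learning rate and neighbourhood radius decay over time.
    lr = 0.5 * (1 - t / n_steps)
    sigma = 2.0 * (1 - t / n_steps) + 0.1
    g = np.exp(-np.sum((coords - np.array(bmu, float)) ** 2, axis=2)
               / (2 * sigma ** 2))
    # Pull the BMU and its map neighbours toward the sample.
    weights += lr * g[:, :, None] * (x - weights)

err_after = quantization_error(weights, data)
print(f"quantization error: {err_before:.3f} -> {err_after:.3f}")
```

After training, cells specialise to regions of the data, so the average distance from each sample to its nearest cell drops.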


    Computer Vision and Pattern Recognition

    An Implementation of Faster RCNN with Study for Region Sampling

    Xinlei Chen, Abhinav Gupta
    Comments: Technical Report, 3 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We adapted the joint-training scheme of the Faster RCNN framework from Caffe to
    TensorFlow as a baseline implementation for object detection. Our code is made
    publicly available. This report documents the simplifications made to the
    original pipeline, with justifications from ablation analysis on both PASCAL
    VOC 2007 and COCO 2014. We further investigated the role of non-maximal
    suppression (NMS) in selecting regions-of-interest (RoIs) for region
    classification, and found that sampling biased toward small regions helps
    performance and can achieve mAP on par with NMS-based sampling when
    sufficiently converged.
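
The NMS step whose role the report investigates is the standard greedy procedure: keep the highest-scoring box, suppress boxes that overlap it heavily, repeat. A generic NumPy sketch (boxes as x1, y1, x2, y2; not the released TensorFlow code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # suppress highly overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the second box overlaps the first
```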

    Tracking using Numerous Anchor points

    Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, an online adaptive model-free tracker is proposed to track
    single objects in video sequences to deal with real-world tracking challenges
    like low-resolution, object deformation, occlusion and motion blur. The novelty
    lies in the construction of a strong appearance model that captures features
    from the initialized bounding box, which are then assembled into anchor-point
    features. These features memorize the global pattern of the object and have an
    internal star graph-like structure. They are unique and flexible and help
    track generic and deformable objects, with no limitation to specific object
    classes. In addition, the relevance of each feature is evaluated online using
    short-term consistency and long-term consistency. These parameters are adapted
    to retain consistent features that vote for the object location and that deal
    with outliers for long-term tracking scenarios. Additionally, voting in a
    Gaussian manner helps in tackling inherent noise of the tracking system and in
    accurate object localization. Furthermore, the proposed tracker uses a pairwise
    distance measure to cope with scale variations and combines pixel-level binary
    features and global weighted color features for model update. Finally,
    experimental results on a visual tracking benchmark dataset are presented to
    demonstrate the effectiveness and competitiveness of the proposed tracker.

    Face Aging With Conditional Generative Adversarial Networks

    Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay
    Comments: 5 pages, 3 figures, submitted to ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It has been recently shown that Generative Adversarial Networks (GANs) can
    produce synthetic images of exceptional visual fidelity. In this work, we
    propose a GAN-based method for automatic face aging. Contrary to previous
    works employing GANs for altering facial attributes, we place particular
    emphasis on preserving the original person's identity in the aged version of
    their face. To this end, we introduce a novel approach for
    “Identity-Preserving” optimization of GAN’s latent vectors. The objective
    evaluation of the resulting aged and rejuvenated face images by
    state-of-the-art face recognition and age estimation solutions demonstrates
    the high potential of the proposed method.

    Image Reconstruction using Matched Wavelet Estimated from Data Sensed Compressively using Partial Canonical Identity Matrix

    Naushad Ansari, Anubha Gupta
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a joint framework wherein lifting-based, separable,
    image-matched wavelets are estimated from compressively sensed (CS) images and
    used for the reconstruction of the same. A matched wavelet can be designed
    easily if the full image is available, and it may provide better
    reconstruction results in CS applications than a standard wavelet
    sparsifying basis. Since in CS applications only the compressively sensed
    image is available rather than the full image, existing methods of designing
    matched wavelets cannot be used. Thus, we propose a joint framework that
    estimates the matched wavelet from the compressively sensed images and also
    reconstructs the full images. This paper
    has three significant contributions. First, lifting-based, image-matched
    separable wavelet is designed from compressively sensed images and is also used
    to reconstruct the same. Second, a simple sensing matrix is employed to sample
    data at sub-Nyquist rate such that sensing and reconstruction time is reduced
    considerably without any noticeable degradation in the reconstruction
    performance. Third, a new multi-level L-Pyramid wavelet decomposition strategy
    is provided for separable wavelet implementation on images that leads to
    improved reconstruction performance. Compared to CS-based reconstruction using
    standard wavelets with Gaussian sensing matrix and with existing wavelet
    decomposition strategy, the proposed methodology provides faster and better
    image reconstruction in compressive sensing application.
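
The lifting scheme mentioned above builds a wavelet from predict and update steps on even/odd samples. As a minimal illustration, one level of the Haar wavelet in lifting form (the matched-wavelet design adapts these filters to the image, which this sketch does not do):

```python
import numpy as np

def haar_lifting_forward(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - even      # predict step: detail coefficients
    s = even + d / 2    # update step: approximation (pairwise means)
    return s, d

def haar_lifting_inverse(s, d):
    even = s - d / 2    # undo update
    odd = d + even      # undo predict
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 2.0])
s, d = haar_lifting_forward(x)
print(s, d)                        # approximations and details
print(haar_lifting_inverse(s, d))  # perfect reconstruction of x
```

Because each lifting step is trivially invertible, reconstruction is exact regardless of the chosen predict/update filters, which is what makes the scheme attractive for designing image-matched wavelets.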

    Hashing in the Zero Shot Framework via Domain Adaptation

    Shubham Pachori, Ameya Deshpande, Shanmuganathan Raman
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Techniques to learn hash codes which can store and retrieve large dimensional
    multimedia data efficiently have attracted broad research interest in recent
    years. With the rapid explosion of newly emerged concepts and online data,
    existing supervised hashing algorithms suffer from the problem of scarcity of
    ground truth annotations due to the high cost of obtaining manual annotations.
    Therefore, we propose an algorithm to learn a hash function from training
    images belonging to seen classes which can efficiently encode images of unseen
    classes to binary codes. Specifically, we project the image features from
    visual space and semantic features from semantic space into a common Hamming
    subspace. Earlier works to generate hash codes have tried to relax the discrete
    constraints on hash codes and solve the continuous optimization problem.
    However, it often leads to quantization errors. In this work, we use the
    max-margin classifier to learn an efficient hash function. To address the
    concern of domain-shift which may arise due to the introduction of new classes,
    we also introduce an unsupervised domain adaptation model in the proposed
    hashing framework. Results on three image datasets show the advantage of
    using domain adaptation in learning a high-quality hash function and the
    superiority of our method for the task of image retrieval compared to several
    state-of-the-art hashing methods.

    A New Point-set Registration Algorithm for Fingerprint Matching

    A. Pasha Hosseinbor, Renat Zhdanov, Alexander Ushveridze
    Comments: Point pattern matching, point-set registration, fingerprint, minutia matching, alignment
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A novel minutia-based fingerprint matching algorithm is proposed that employs
    iterative global alignment on two minutia sets. The matcher considers all
    possible minutia pairings and iteratively aligns the two sets until the number
    of minutia pairs does not exceed the maximum number of allowable one-to-one
    pairings. The optimal alignment parameters are derived analytically via linear
    least squares. The first alignment establishes a region of overlap between the
    two minutia sets, which is then (iteratively) refined by each successive
    alignment. After each alignment, minutia pairs that exhibit weak correspondence
    are discarded. The process is repeated until the number of remaining pairs no
    longer exceeds the maximum number of allowable one-to-one pairings. The
    proposed algorithm is tested on both the FVC2000 and FVC2002 databases, and the
    results indicate that the proposed matcher is both effective and efficient for
    fingerprint authentication; it is fast and does not utilize any computationally
    expensive mathematical functions (e.g. trigonometric, exponential). In addition
    to the proposed matcher, another contribution of the paper is the analytical
    derivation of the least squares solution for the optimal alignment parameters
    for two point-sets lacking exact correspondence.
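
The closed-form least-squares alignment the paper derives is of the same family as the classical SVD-based (Kabsch/Procrustes) solution for two point sets with known one-to-one correspondence. A sketch of that standard solution in 2D (not the authors' exact formulation, which handles sets lacking exact correspondence):

```python
import numpy as np

def align(P, Q):
    """Return R, t minimising sum ||R @ p + t - q||^2 over paired rows."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)   # cross-covariance of centred point sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:    # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(2)
P = rng.random((6, 2)) * 10     # e.g. minutia coordinates
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Q = P @ R_true.T + np.array([3.0, -1.5])   # rotated and translated copy

R, t = align(P, Q)
residual = np.linalg.norm(P @ R.T + t - Q)
print(f"alignment residual: {residual:.2e}")
```

With noiseless correspondences, the recovered rotation and translation match the ground truth to machine precision; no trigonometric or exponential evaluations are needed beyond constructing the test data.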

    Development of JavaScript-based deep learning platform and application to distributed training

    Masatoshi Hidaka, Ken Miura, Tatsuya Harada
    Comments: Workshop paper for ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Deep learning is increasingly attracting attention for processing big data.
    Existing frameworks for deep learning must be set up on specialized computer
    systems. Gaining sufficient computing resources therefore entails high costs of
    deployment and maintenance. In this work, we implement a matrix library and
    deep learning framework that uses JavaScript. It can run on web browsers
    operating on ordinary personal computers and smartphones. Using JavaScript,
    deep learning can be accomplished in widely diverse environments without the
    necessity for software installation. Using GPGPU via the WebCL framework, our
    framework can train large scale convolutional neural networks such as VGGNet
    and ResNet. In the experiments, we demonstrate its practicality by training
    VGGNet in a distributed manner using web browsers as clients.

    A Region Based Easy Path Wavelet Transform For Sparse Image Representation

    Renato Budinich
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    The Easy Path Wavelet Transform is an adaptive transform for bivariate
    functions (in particular natural images) which has been proposed in [1]. It
    provides a sparse representation by finding a path in the domain of the
    function leveraging the local correlations of the function values. It then
    applies a one dimensional wavelet transform to the obtained vector, decimates
    the points and iterates the procedure. The main drawback of this method is the
    need to store, for each level of the transform, the path which vectorizes the
    two dimensional data. Here we propose a variation on the method which consists
    of firstly applying a segmentation procedure to the function domain,
    partitioning it into regions where the variation in the function values is low;
    in a second step, inside each such region, a path is found in some
    deterministic way, i.e. not data-dependent. This circumvents the need to store
    the paths at each level, while still obtaining good quality lossy compression.
    This method is particularly well suited to encode a Region of Interest in the
    image with different quality than the rest of the image.

    [1] Gerlind Plonka. The easy path wavelet transform: A new adaptive wavelet
    transform for sparse representation of two-dimensional data. Multiscale
    Modeling & Simulation, 7(3):1474-1496, 2008.

    Low Rank Matrix Recovery with Simultaneous Presence of Outliers and Sparse Corruption

    Mostafa Rahmani, George Atia
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We study a data model in which the data matrix D can be expressed as D = L +
    S + C, where L is a low rank matrix, S an element-wise sparse matrix and C a
    matrix whose non-zero columns are outlying data points. To date, robust PCA
    algorithms have solely considered models with either S or C, but not both. As
    such, existing algorithms cannot account for simultaneous element-wise and
    column-wise corruptions. In this paper, a new robust PCA algorithm that is
    robust to both types of corruption simultaneously is proposed. Our approach hinges on
    the sparse approximation of a sparsely corrupted column so that the sparse
    expansion of a column with respect to the other data points is used to
    distinguish a sparsely corrupted inlier column from an outlying data point. We
    also develop a randomized design which provides a scalable implementation of
    the proposed approach. The core idea of sparse approximation is analyzed
    theoretically, where we show that the underlying ℓ1-norm minimization can
    obtain the representation of an inlier in the presence of sparse corruptions.
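
The core step, representing one column as a sparse combination of the other columns via ℓ1-regularised least squares, can be sketched with ISTA (iterative shrinkage-thresholding). This is illustrative only; the paper's theoretical analysis and randomized design are not reproduced, and the low-rank toy data below is invented.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 by ISTA."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    obj = []
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
        obj.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, obj

rng = np.random.default_rng(3)
# Toy low-rank data matrix: every inlier column lies in a rank-3 subspace,
# so it admits a representation in terms of the other columns.
D = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 10))
b = D[:, 4]                      # the column to represent
A = np.delete(D, 4, axis=1)      # the remaining columns
x, obj = ista(A, b, lam=0.1)
print(f"objective: {obj[0]:.3f} -> {obj[-1]:.3f}")
```

With step size 1/L, the ISTA objective is non-increasing, and for an inlier column of a low-rank matrix the residual becomes small with a sparse coefficient vector.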


    Artificial Intelligence

    Extracting Lifted Mutual Exclusion Invariants from Temporal Planning Domains

    Sara Bernardini, Fabio Fagnani, David E. Smith
    Subjects: Artificial Intelligence (cs.AI)

    We present a technique for automatically extracting mutual exclusion
    invariants from temporal planning instances. It first identifies a set of
    invariant templates by inspecting the lifted representation of the domain and
    then checks these templates against properties that assure invariance. Our
    technique builds on other approaches to invariant synthesis presented in the
    literature, but departs from their limited focus on instantaneous actions by
    addressing temporal domains. To deal with time, we formulate invariance
    conditions that account for the entire structure of the actions and the
    possible concurrent interactions between them. As a result, we construct a
    significantly more comprehensive technique than previous methods, which is able
    to find not only invariants for temporal domains, but also a broader set of
    invariants for non-temporal domains. The experimental results reported in this
    paper provide evidence that identifying a broader set of invariants results in
    the generation of fewer multi-valued state variables with larger domains. We
    show that, in turn, this reduction in the number of variables reflects
    positively on the performance of a number of temporal planners that use a
    variable/value representation by significantly reducing their running time.

    ASHACL: Alternative Shapes Constraint Language

    Peter F. Patel-Schneider
    Comments: 17 pages
    Subjects: Artificial Intelligence (cs.AI)

    ASHACL, a variant of the W3C Shapes Constraint Language, is designed to
    determine whether an RDF graph meets some conditions. These conditions are
    grouped into shapes, which validate whether particular RDF terms each meet the
    constraints of the shape. Shapes are themselves expressed as RDF triples in an
    RDF graph, called a shapes graph.

    Solving the Brachistochrone Problem by an Influence Diagram

    Jiří Vomlel
    Comments: 8 pages, 2 figures
    Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)

    Influence diagrams are a decision-theoretic extension of probabilistic
    graphical models. In this paper we show how they can be used to solve the
    Brachistochrone problem. We present results of numerical experiments on this
    problem and compare the solution provided by the influence diagram with the
    optimal solution. The R code used for the experiments is presented in the
    Appendix.
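
For readers unfamiliar with the problem: a bead sliding under gravity reaches the end point faster along a path that drops steeply at first than along the straight line. A quick numerical check on polyline paths (the endpoints and waypoint here are arbitrary; this is not the influence-diagram solution from the paper):

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def descent_time(path):
    """Travel time along a polyline; x horizontal, y measured downward from
    the start, so energy conservation gives speed v = sqrt(2 g y)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path[:-1], path[1:]):
        seg = np.hypot(x1 - x0, y1 - y0)
        if y1 == y0:
            total += seg / np.sqrt(2 * G * y0)
        else:
            # closed form of seg * integral_0^1 dt / sqrt(2 g (y0 + (y1-y0) t))
            total += seg * 2 * (np.sqrt(y1) - np.sqrt(y0)) / ((y1 - y0) * np.sqrt(2 * G))
    return total

straight = [(0.0, 0.0), (1.0, 1.0)]
steep_first = [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]
t_straight = descent_time(straight)
t_steep = descent_time(steep_first)
print(f"straight: {t_straight:.4f} s, steep-first: {t_steep:.4f} s")
```

The steep-first polyline already beats the straight line; the true optimum is the cycloid, which a fine enough discretisation (as in the influence-diagram formulation) approaches.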

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    Learning what matters – Sampling interesting patterns

    Vladimir Dzyuba, Matthijs van Leeuwen
    Comments: PAKDD 2017, extended version
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Databases (cs.DB)

    In the field of exploratory data mining, local structure in data can be
    described by patterns and discovered by mining algorithms. Although many
    solutions have been proposed to address the redundancy problems in pattern
    mining, most of them either provide succinct pattern sets or take the interests
    of the user into account, but not both. Consequently, the analyst has to invest
    substantial effort in identifying those patterns that are relevant to her
    specific interests and goals. To address this problem, we propose a novel
    approach that combines pattern sampling with interactive data mining. In
    particular, we introduce the LetSIP algorithm, which builds upon recent
    advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
    pattern mining. Specifically, it exploits user feedback to directly learn the
    parameters of the sampling distribution that represents the user’s interests.
    We compare the performance of the proposed algorithm to the state-of-the-art in
    interactive pattern mining by emulating the interests of a user. The resulting
    system allows efficient and interleaved learning and sampling, and thus
    user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
    trade-offs concerning both quality-diversity and exploitation-exploration when
    compared to existing methods.


    Information Retrieval

    First Study on Data Readiness Level

    Hui Guan, Thanos Gentimis, Hamid Krim, James Keiser
    Comments: 9 pages
    Subjects: Information Retrieval (cs.IR); General Literature (cs.GL)

    We introduce the idea of Data Readiness Level (DRL) to measure the relative
    richness of data to answer specific questions often encountered by data
    scientists. We first approach the problem in its full generality explaining its
    desired mathematical properties and applications and then we propose and study
    two DRL metrics. Specifically, we define DRL as a function of at least four
    properties of data: Noisiness, Believability, Relevance, and Coherence. Two
    information-theoretic metrics, Cosine Similarity and Document Disparity, are
    proposed as indicators of Relevance and Coherence for a piece of data. The
    proposed metrics are validated through a text-based experiment using Twitter
    data.
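
Of the two proposed indicators, cosine similarity is the more standard and fits in a few lines of NumPy; the Document Disparity metric and the DRL aggregation are not reproduced here, and the toy term-count vectors are invented.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-count vectors for a query topic and two candidate documents.
topic = [3, 1, 0, 2]
doc_relevant = [2, 1, 0, 3]
doc_offtopic = [0, 0, 5, 0]
print(cosine_similarity(topic, doc_relevant))  # high: shares the topic's terms
print(cosine_similarity(topic, doc_offtopic))  # zero: no terms in common
```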

    Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models

    Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Duer, Linda Anderson
    Subjects: Information Retrieval (cs.IR); Computational Engineering, Finance, and Science (cs.CE)

    Volatility prediction, an essential concept in financial markets, has
    recently been addressed using sentiment analysis methods. We investigate the
    sentiment of annual disclosures of companies in stock markets to forecast
    volatility. We specifically explore the use of recent Information Retrieval
    (IR) term weighting models that are effectively extended by related terms using
    word embeddings. In parallel to textual information, factual market data have
    been widely used as the mainstream approach to forecast market risk. We
    therefore study different fusion methods to combine text and market data
    resources. Our word embedding-based approach significantly outperforms
    state-of-the-art methods. In addition, we investigate the characteristics of
    the reports of the companies in different financial sectors.

    Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

    Ibrahim Abu El-Khair
    Journal-ref: Abu El-Khair, Ibrahim. (2006). Effects of stop words elimination
    for Arabic information retrieval: a comparative study. International Journal
    of Computing & Information Sciences 4 (3), 119-133
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    The effectiveness of three stop word lists for Arabic Information Retrieval
    (a general stoplist, a corpus-based stoplist, and a combined stoplist) was
    investigated in this study. Three popular weighting schemes were examined:
    the inverse document frequency weight, probabilistic weighting, and
    statistical language modelling. The idea is to combine statistical approaches
    with linguistic approaches to reach optimal performance, and to compare their
    effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire
    data set was used with the Lemur Toolkit. The Best Match (BM25) weighting
    scheme used in the Okapi retrieval system had the best overall performance of
    the three weighting algorithms; stoplists improved retrieval effectiveness,
    especially when used with the BM25 weight. The overall performance of the
    general stoplist was better than that of the other two lists.
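
Stop-word elimination itself is a simple preprocessing step applied before weighting. A minimal illustration (the tiny English corpus and stoplist are invented; the study itself uses Arabic newswire and the Lemur Toolkit):

```python
import math

# Hypothetical general stoplist for a toy English corpus.
stoplist = {"the", "of", "in", "a", "and", "is"}

docs = [
    "the effectiveness of stop words in retrieval",
    "retrieval of arabic documents is a hard task",
    "weighting schemes and retrieval effectiveness",
]

def tokenize(text, remove_stopwords):
    return [t for t in text.split()
            if not (remove_stopwords and t in stoplist)]

def idf(term, tokenized_docs):
    """Inverse document frequency: log(N / df)."""
    n = sum(1 for d in tokenized_docs if term in d)
    return math.log(len(tokenized_docs) / n) if n else 0.0

no_stop = [tokenize(d, True) for d in docs]
print(no_stop[0])                           # stop words removed
print(round(idf("retrieval", no_stop), 3))  # appears in all docs: idf 0
```

Because stop words occur in nearly every document, removing them shrinks the index while leaving the discriminative idf weights of content terms unchanged.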


    Computation and Language

    Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions

    Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum
    Subjects: Computation and Language (cs.CL)

    Bi-directional LSTMs have emerged as a standard method for obtaining
    per-token vector representations serving as input to various token labeling
    tasks (whether followed by Viterbi prediction or independent classification).
    This paper proposes an alternative to Bi-LSTMs for this purpose: iterated
    dilated convolutional neural networks (ID-CNNs), which have better capacity
    than traditional CNNs for large context and structured prediction. We describe
    a distinct combination of network structure, parameter sharing and training
    procedures that is not only more accurate than Bi-LSTM-CRFs, but also 8x faster
    at test time on long sequences. Moreover, ID-CNNs with independent
    classification enable a dramatic 14x test-time speedup, while still attaining
    accuracy comparable to the Bi-LSTM-CRF. We further demonstrate the ability of
    ID-CNNs to combine evidence over long sequences by demonstrating their improved
    accuracy on whole-document (rather than per-sentence) inference. Unlike LSTMs
    whose sequential processing on sentences of length N requires O(N) time even in
    the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in
    parallel across entire documents. Today when many companies run basic NLP on
    the entire web and large-volume traffic, faster methods are paramount to saving
    time and energy costs.
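
The operation ID-CNNs stack is the dilated convolution: with dilation d, the filter taps are d positions apart, so a fixed-depth stack with growing dilations covers exponentially growing context. A generic 1-D sketch (not the paper's architecture):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D convolution with the given dilation (no kernel flipping)."""
    k = len(kernel)
    span = (k - 1) * dilation  # input positions the filter reaches across
    out_len = len(x) - span
    return np.array([sum(kernel[j] * x[i + j * dilation] for j in range(k))
                     for i in range(out_len)])

x = np.arange(1.0, 9.0)  # [1, 2, ..., 8]
print(dilated_conv1d(x, [1.0, 1.0], dilation=1))  # sums of adjacent values
print(dilated_conv1d(x, [1.0, 1.0], dilation=2))  # sums of values two apart

# Receptive field of a stack of layers: 1 + sum over layers of (k - 1) * d.
dilations, k = [1, 2, 4, 8], 3
rf = 1 + sum((k - 1) * d for d in dilations)
print(f"receptive field with kernel {k} and dilations {dilations}: {rf}")
```

Four layers of width-3 filters with doubling dilations already see 31 tokens of context, while every output position can be computed in parallel.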

    Characterisation of speech diversity using self-organising maps

    Tom A. F. Anderson, David M. W. Powers
    Comments: 16th Speech Science and Technology Conference (SST2016)
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We report investigations into speaker classification of larger quantities of
    unlabelled speech data using small sets of manually phonemically annotated
    speech. The Kohonen speech typewriter is a semi-supervised method comprising
    self-organising maps (SOMs) that achieves low phoneme error rates. A SOM is a
    2D array of cells that learn vector representations of the data based on
    neighbourhoods. In this paper, we report a method to evaluate pronunciation
    using multilevel SOMs with /hVd/ single syllable utterances for the study of
    vowels, for Australian pronunciation.

    Knowledge Adaptation: Teaching to Adapt

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 11 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Domain adaptation is crucial in many real-world applications where the
    distribution of the training data differs from the distribution of the test
    data. Previous Deep Learning-based approaches to domain adaptation need to be
    trained jointly on source and target domain data and are therefore unappealing
    in scenarios where models need to be adapted to a large number of domains or
    where a domain is evolving, e.g. spam detection where attackers continuously
    change their tactics.

    To fill this gap, we propose Knowledge Adaptation, an extension of Knowledge
    Distillation (Bucilua et al., 2006; Hinton et al., 2015) to the domain
    adaptation scenario. We show how a student model achieves state-of-the-art
    results on unsupervised domain adaptation from multiple sources on a standard
    sentiment analysis benchmark by taking into account the domain-specific
    expertise of multiple teachers and the similarities between their domains.

    When learning from a single teacher, using domain similarity to gauge
    trustworthiness is inadequate. To this end, we propose a simple metric that
    correlates well with the teacher’s accuracy in the target domain. We
    demonstrate that incorporating high-confidence examples selected by this metric
    enables the student model to achieve state-of-the-art performance in the
    single-source scenario.
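
The mechanism Knowledge Distillation (and hence Knowledge Adaptation) builds on is temperature-scaled soft targets: raising the temperature T softens the teacher's output distribution, exposing inter-class similarity to the student. A NumPy sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = np.asarray(logits, float) / T
    z -= z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

teacher_logits = [4.0, 1.0, -2.0]   # hypothetical teacher outputs
hard = softmax_with_temperature(teacher_logits, T=1.0)
soft = softmax_with_temperature(teacher_logits, T=5.0)
print(np.round(hard, 3))  # peaked: probability mass on class 0
print(np.round(soft, 3))  # softened: secondary classes become visible
```

The student is trained against the softened distribution (plus the true labels), so it receives a richer signal than one-hot targets; the predicted class itself is unchanged by the temperature.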

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    EliXa: A Modular and Flexible ABSA Platform

    Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri
    Comments: 5 pages, conference
    Journal-ref: Proceedings of the 9th International Workshop on Semantic
    Evaluation (SemEval 2015). Association for Computational Linguistics, June
    2015, Denver, Colorado, pp.748-752
    Subjects: Computation and Language (cs.CL)

    This paper presents a supervised Aspect Based Sentiment Analysis (ABSA)
    system. Our aim is to develop a modular platform which makes it easy to conduct
    experiments by replacing modules or adding new features. We obtain the best
    result in the Opinion Target Extraction (OTE) task (slot 2) using an
    off-the-shelf sequence labeler. The target polarity classification (slot 3) is
    addressed by means of a multiclass SVM algorithm which includes lexical based
    features such as the polarity values obtained from domain and open polarity
lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and
laptop domains, respectively, and performs second best on the out-of-domain
hotel reviews, achieving an accuracy of 0.80.

    A Knowledge-Grounded Neural Conversation Model

    Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL)

    Neural network models are capable of generating extremely natural sounding
    conversational interactions. Nevertheless, these models have yet to demonstrate
    that they can incorporate content in the form of factual information or
    entity-grounded opinion that would enable them to serve in more task-oriented
    conversational applications. This paper presents a novel, fully data-driven,
    and knowledge-grounded neural conversation model aimed at producing more
    contentful responses without slot filling. We generalize the widely-used
    Seq2Seq approach by conditioning responses on both conversation history and
    external “facts”, allowing the model to be versatile and applicable in an
    open-domain setting. Our approach yields significant improvements over a
    competitive Seq2Seq baseline. Human judges found that our outputs are
    significantly more informative.

    Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

    Ibrahim Abu El-Khair
    Journal-ref: Abu El-Khair, Ibrahim. (2006). Effects of stop words elimination
    for Arabic information retrieval: a comparative study. International Journal
    of Computing & Information Sciences 4 (3), 119-133
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

The effectiveness of three stop word lists for Arabic Information
Retrieval (a general stoplist, a corpus-based stoplist, and a combined
stoplist) was investigated in this study. Three popular weighting schemes were
examined: the inverse document frequency weight, probabilistic weighting, and
statistical language modelling. The idea is to combine statistical approaches
with linguistic approaches to reach optimal performance, and to compare their
effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data
set was used with the Lemur Toolkit. The Best Match (BM25) weighting scheme
used in the Okapi retrieval system had the best overall performance of the
three weighting algorithms used in the study. Stoplists improved retrieval
effectiveness, especially when used with the BM25 weight, and the overall
performance of the general stoplist was better than that of the other two lists.
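The stoplist step itself is mechanically simple: terms on the list are dropped before indexing. A minimal sketch, using a hypothetical, tiny English stand-in for the study's Arabic stoplists:

```python
# Sketch: applying a (hypothetical, tiny) general stoplist before indexing.
# The study used full Arabic stoplists with the Lemur Toolkit; the words
# below are only illustrative English stand-ins.
GENERAL_STOPLIST = {"the", "of", "a", "and", "in"}

def remove_stopwords(tokens, stoplist):
    """Drop stoplist terms; only the remaining terms are indexed."""
    return [t for t in tokens if t.lower() not in stoplist]

doc = "the effects of stop words in a retrieval system".split()
print(remove_stopwords(doc, GENERAL_STOPLIST))
# -> ['effects', 'stop', 'words', 'retrieval', 'system']
```

The study's finding is that removing such high-frequency, low-content terms improves ranking, particularly under BM25 weighting.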

    Comparative Study of CNN and RNN for Natural Language Processing

    Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze
    Comments: 7 pages, 11 figures
    Subjects: Computation and Language (cs.CL)

    Deep neural networks (DNN) have revolutionized the field of natural language
    processing (NLP). Convolutional neural network (CNN) and recurrent neural
    network (RNN), the two main types of DNN architectures, are widely explored to
handle various NLP tasks. CNNs are reputed to be good at extracting
position-invariant features, and RNNs at modeling units in sequence. The state
of the art on many NLP tasks often alternates between CNNs and RNNs. This work
is the first systematic comparison of CNN and RNN on a wide
    range of representative NLP tasks, aiming to give basic guidance for DNN
    selection.

    The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

    Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, Noah A. Smith
    Subjects: Computation and Language (cs.CL)

    A writer’s style depends not just on personal traits but also on her intent
    and mental state. In this paper, we show how variants of the same writing task
    can lead to measurable differences in writing style. We present a case study
    based on the story cloze task (Mostafazadeh et al., 2016a), where annotators
    were assigned similar writing tasks with different constraints: (1) writing an
    entire story, (2) adding a story ending for a given story context, and (3)
    adding an incoherent ending to a story. We show that a simple linear classifier
    informed with stylistic features is able to successfully distinguish between
    the three cases, without even looking at the story context. In addition, our
    style-based classifier establishes a new state-of-the-art result on the story
    cloze challenge, substantially higher than previous results based on deep
    learning models. Our results demonstrate that different task framings can
    dramatically affect the way people write.

    Neural Discourse Structure for Text Categorization

    Yangfeng Ji, Noah Smith
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We show that discourse structure, as defined by Rhetorical Structure Theory
    and provided by an existing discourse parser, benefits text categorization. Our
    approach uses a recursive neural network and a newly proposed attention
    mechanism to compute a representation of the text that focuses on salient
    content, from the perspective of both RST and the task. Experiments consider
    variants of the approach and illustrate its strengths and weaknesses.

    Living a discrete life in a continuous world: Reference with distributed representations

    Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni
    Comments: Under review at ACL 2017. 6 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    Reference is the crucial property of language that allows us to connect
    linguistic expressions to the world. Modeling it requires handling both
    continuous and discrete aspects of meaning. Data-driven models excel at the
    former, but struggle with the latter, and the reverse is true for symbolic
    models.

    We propose a fully data-driven, end-to-end trainable model that, while
    operating on continuous multimodal representations, learns to organize them
    into a discrete-like entity library. We also introduce a referential task to
    test it, cross-modal tracking. Our model beats standard neural network
    architectures, but is outperformed by some parametrizations of Memory Networks,
    another model with external memory.

    Beam Search Strategies for Neural Machine Translation

    Markus Freitag, Yaser Al-Onaizan
    Subjects: Computation and Language (cs.CL)

The basic concept in Neural Machine Translation (NMT) is to train a large
neural network that maximizes the translation performance on a given parallel
corpus. NMT then uses a simple left-to-right beam-search decoder to generate
new translations that approximately maximize the trained conditional
probability. The current beam search strategy generates the target sentence
word by word from left to right while keeping a fixed number of active
candidates at each time step. First, this simple search is less adaptive, as it
also expands candidates whose scores are much worse than the current best.
Second, it does not expand hypotheses that fall outside the best-scoring
candidates, even if their scores are close to the best one. The latter can be
avoided by increasing the beam size until no further performance improvement is
observed. While this can reach better performance, it has the drawback of
slower decoding speed. In this paper, we concentrate on speeding up the decoder
by applying a more flexible beam search strategy whose candidate size may vary
at each time step depending on the candidate scores. We speed up the original
decoder by up to 43% for the two language pairs German-English and
Chinese-English without losing any translation quality.
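The flexible strategy described above can be sketched as a per-step pruning rule: cap the beam, but also drop any hypothesis whose score falls too far below the step's best. A minimal sketch, where `rel_threshold` and `max_beam` are illustrative values (not the paper's), and scores are plain probabilities rather than the log-probabilities a real decoder would use:

```python
def prune_beam(candidates, max_beam=10, rel_threshold=0.6):
    """Flexible beam: keep at most max_beam hypotheses, but also drop any
    candidate whose score falls below rel_threshold * (best score at this
    time step), so the number of expanded candidates varies per step."""
    best = max(score for _, score in candidates)
    kept = [(hyp, s) for hyp, s in candidates if s >= rel_threshold * best]
    kept.sort(key=lambda pair: -pair[1])  # best-scoring first
    return kept[:max_beam]

step = [("hyp-a", 0.50), ("hyp-b", 0.45), ("hyp-c", 0.10)]
# hyp-c scores far below the current best and is not expanded further.
print(prune_beam(step))
```

Because weak candidates are discarded early, fewer hypotheses are expanded per step, which is the source of the decoding speed-up.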

    Ensemble Distillation for Neural Machine Translation

    Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
    Subjects: Computation and Language (cs.CL)

Knowledge distillation describes a method for training a student network to
perform better by learning from a stronger teacher network. In this work, we
run experiments with different kinds of teacher networks to enhance the
translation performance of a student Neural Machine Translation (NMT) network.
We demonstrate techniques based on an ensemble and on a best-BLEU teacher
network. We also show how to benefit from a teacher network that has the same
architecture and dimensions as the student network. Furthermore, we introduce
a data filtering technique based on the dissimilarity between the forward
translation (obtained during knowledge distillation) of a given source sentence
and its target reference, using TER to measure dissimilarity. Finally, we show
that an ensemble teacher model can significantly reduce the student model size
while still yielding performance improvements over the baseline student
network.
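At the heart of knowledge distillation is training the student against the teacher's soft output distribution rather than hard labels. A minimal sketch of that loss, with illustrative logits (the paper's NMT setup, temperatures, and ensemble averaging are not reproduced here):

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((l - m) / T) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(student_logits, teacher_logits, T=1.0):
    """Cross-entropy of the student against the teacher's soft targets.
    With an ensemble teacher, teacher_logits would be the combined
    output of the ensemble members."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

print(distill_loss([2.0, 0.5, -1.0], [2.2, 0.4, -1.1]))
```

The loss is minimized when the student's distribution matches the teacher's, which is what lets a small student absorb an ensemble teacher's behavior.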

    Multi-task Coupled Attentions for Category-specific Aspect and Opinion Terms Co-extraction

    Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier
    Subjects: Computation and Language (cs.CL)

    In aspect-based sentiment analysis, most existing methods either focus on
    aspect/opinion terms extraction or aspect terms categorization. However, each
    task by itself only provides partial information to end users. To generate more
    detailed and structured opinion analysis, we propose a finer-grained problem,
    which we call category-specific aspect and opinion terms extraction. This
    problem involves the identification of aspect and opinion terms within each
    sentence, as well as the categorization of the identified terms. To this end,
    we propose an end-to-end multi-task attention model, where each task
    corresponds to aspect/opinion terms extraction for a specific category. Our
    model benefits from exploring the commonalities and relationships among
    different tasks to address the data sparsity issue. We demonstrate its
    state-of-the-art performance on three benchmark datasets.


    Distributed, Parallel, and Cluster Computing

    Achieving Dilution without Knowledge of Coordinates in the SINR Model

    William K. Moses Jr., Shailesh Vaya
    Comments: 10 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Considerable literature has been developed for various fundamental
    distributed problems in the SINR (Signal-to-Interference-plus-Noise-Ratio)
    model for radio transmission. A setting typically studied is when all nodes
    transmit a signal of the same strength, and each device only has access to
    knowledge about the total number of nodes in the network (n), the range from
    which each node’s label is taken ([1,dots,N]), and the label of the device
itself. In addition, an assumption is made that each node also knows its
coordinates in the Euclidean plane. In this paper, we develop a technique which
allows algorithm designers to remove that last assumption; assuming that the
physical coordinates of the nodes are unavailable truly captures the `ad-hoc'
nature of wireless networks.

    Previous work in this area uses a flavor of a technique called dilution, in
    which nodes transmit in a (predetermined) round-robin fashion, and are able to
reach all their neighbors. However, without knowing their physical coordinates,
nodes cannot determine the coordinates of their containing (pivotal) grid box,
and it seems impossible to use dilution to coordinate their transmissions. We
propose a new technique to achieve dilution without using
    the knowledge of physical coordinates. This technique exploits the
    understanding that the transmitting nodes lie in 2-D space, segmented by an
    appropriate pivotal grid, without explicitly referring to the actual physical
    coordinates of these nodes. Using this technique, it is possible for every weak
    device to successfully transmit its message to all of its neighbors in
    (Theta(lg N)) rounds, as long as the density of transmitting nodes in any
    physical grid box is bounded by a known constant. This technique, we feel, is
    an important generic tool for devising practical protocols when physical
    coordinates of the nodes are not known.
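For context, the classical dilution schedule the paper emulates is coordinate-based: a node's grid box determines its transmission round. A minimal sketch of that baseline (the paper's contribution is achieving the same spacing when the box coordinates are unknown, which this sketch does not attempt):

```python
def classical_dilution_slot(x, y, d):
    """Classical coordinate-based dilution: a node whose pivotal grid box
    has coordinates (x, y) transmits only in the round whose index matches
    its box class modulo the dilution parameter d. Boxes a multiple of d
    apart in both axes share a slot, so simultaneous transmitters are
    spatially well separated and interference stays bounded."""
    return (x % d) * d + (y % d)

# Boxes (0, 0) and (3, 3) with d = 3 share slot 0 and transmit together.
print(classical_dilution_slot(0, 0, 3), classical_dilution_slot(3, 3, 3))
```

Every box falls into one of d*d slot classes, giving the round-robin schedule in which all neighbors are eventually reached.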

    Unobtrusive Deferred Update Stabilization for Efficient Geo-Replication

    Chathuri Gunawardhana, Manuel Bravo, Luís Rodrigues
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

In this paper we propose a novel approach to manage the throughput vs. latency
tradeoff that emerges when managing updates in geo-replicated systems. Our
approach consists of allowing full concurrency when processing local updates
and using a deferred local serialisation procedure before shipping updates to
remote datacenters. This strategy makes it possible to implement inexpensive
mechanisms that ensure system consistency requirements while avoiding intrusive
effects on update operations, a major performance limitation of previous
systems. We have
    implemented our approach as a variant of Riak KV. Our extensive evaluation
    shows that we outperform sequencer-based approaches by almost an order of
    magnitude in the maximum achievable throughput. Furthermore, unlike previous
    sequencer-free solutions, our approach reaches nearly optimal remote update
    visibility latencies without limiting throughput.

    Model-driven Scheduling for Distributed Stream Processing Systems

    Anshu Shukla, Yogesh Simmhan
    Comments: 54 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed Stream Processing frameworks have become common with the
evolution of the Internet of Things (IoT). These frameworks are designed to
adapt to dynamic input message rates by scaling in/out. Apache Storm,
originally developed by Twitter, is a widely used stream processing engine;
others include Flink and Spark Streaming. To run streaming applications
successfully, the optimal resource requirement must be known, as
over-estimation of resources adds extra cost. We therefore need a strategy to
determine the optimal resource requirement for a given streaming application. In
    this article, we propose a model-driven approach for scheduling streaming
    applications that effectively utilizes a priori knowledge of the applications
    to provide predictable scheduling behavior. Specifically, we use application
    performance models to offer reliable estimates of the resource allocation
    required. Further, this intuition also drives resource mapping, and helps
    narrow the estimated and actual dataflow performance and resource utilization.
    Together, this model-driven scheduling approach gives a predictable application
    performance and resource utilization behavior for executing a given DSPS
    application at a target input stream rate on distributed resources.
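The core of such model-driven allocation is using each task's measured performance model to size its resources for a target input rate. A highly simplified sketch (task names, peak rates, and the target rate below are hypothetical, and the paper's actual models are richer than a single peak-rate number):

```python
import math

def required_slots(input_rate, peak_rate_per_slot):
    """Estimate resource slots for one dataflow task from its performance
    model: enough slots that aggregate peak throughput covers the target
    input rate. A simplification of model-based allocation."""
    return math.ceil(input_rate / peak_rate_per_slot)

# Hypothetical dataflow: task -> peak messages/sec one slot sustains,
# as measured offline by the performance model.
peak = {"parse": 800.0, "filter": 2000.0, "aggregate": 500.0}
target_rate = 1500.0  # target input stream rate, msgs/sec

alloc = {task: required_slots(target_rate, r) for task, r in peak.items()}
print(alloc)  # -> {'parse': 2, 'filter': 1, 'aggregate': 3}
```

Because the allocation comes from a priori measurements rather than reactive scaling, the resulting resource usage and dataflow performance are predictable before deployment.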

    Development of JavaScript-based deep learning platform and application to distributed training

    Masatoshi Hidaka, Ken Miura, Tatsuya Harada
    Comments: Workshop paper for ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

Deep learning is increasingly attracting attention for processing big data.
Existing frameworks for deep learning must be set up on specialized computer
systems; gaining sufficient computing resources therefore entails high
deployment and maintenance costs. In this work, we implement a matrix library
and deep learning framework in JavaScript. It can run on web browsers
operating on ordinary personal computers and smartphones. Using JavaScript,
deep learning can be accomplished in widely diverse environments without the
need for software installation. Using GPGPU via the WebCL framework, our
    framework can train large scale convolutional neural networks such as VGGNet
    and ResNet. In the experiments, we demonstrate their practicality by training
    VGGNet in a distributed manner using web browsers as the client.


    Learning

    Preference-based Teaching

    Ziyuan Gao, Christoph Ries, Hans Ulrich Simon, Sandra Zilles
    Comments: 35 pages
    Subjects: Learning (cs.LG)

    We introduce a new model of teaching named “preference-based teaching” and a
    corresponding complexity parameter—the preference-based teaching dimension
    (PBTD)—representing the worst-case number of examples needed to teach any
    concept in a given concept class. Although the PBTD coincides with the
    well-known recursive teaching dimension (RTD) on finite classes, it is
    radically different on infinite ones: the RTD becomes infinite already for
trivial infinite classes (such as half-intervals), whereas the PBTD evaluates
to reasonably small values for a wide collection of infinite classes, including
classes of so-called closed sets w.r.t. a given closure operator, various
classes related to linear sets over (mathbb{N}_0) (whose RTD has been studied
quite recently), and the class of Euclidean half-spaces. On top of presenting
these concrete results, we provide the reader
    with a theoretical framework (of a combinatorial flavor) which helps to derive
    bounds on the PBTD.

    Empirical Risk Minimization for Stochastic Convex Optimization: (O(1/n))- and (O(1/n^2))-type of Risk Bounds

    Lijun Zhang, Tianbao Yang, Rong Jin
    Subjects: Learning (cs.LG)

Although there exist plentiful theories of empirical risk minimization (ERM)
for supervised learning, current theoretical understanding of ERM for a
related problem, stochastic convex optimization (SCO), is limited. In this
work, we strengthen the realm of ERM for SCO by exploiting smoothness and
    strong convexity conditions to improve the risk bounds. First, we establish an
    (widetilde{O}(d/n + sqrt{F_*/n})) risk bound when the random function is
    nonnegative, convex and smooth, and the expected function is Lipschitz
    continuous, where (d) is the dimensionality of the problem, (n) is the number
    of samples, and (F_*) is the minimal risk. Thus, when (F_*) is small we obtain
    an (widetilde{O}(d/n)) risk bound, which is analogous to the
    (widetilde{O}(1/n)) optimistic rate of ERM for supervised learning. Second, if
    the objective function is also (lambda)-strongly convex, we prove an
    (widetilde{O}(d/n + kappa F_*/n )) risk bound where (kappa) is the condition
    number, and improve it to (O(1/[lambda n^2] + kappa F_*/n)) when
    (n=widetilde{Omega}(kappa d)). As a result, we obtain an (O(kappa/n^2))
    risk bound under the condition that (n) is large and (F_*) is small, which to
    the best of our knowledge, is the first (O(1/n^2))-type of risk bound of ERM.
    Third, we stress that the above results are established in a unified framework,
    which allows us to derive new risk bounds under weaker conditions, e.g.,
    without convexity of the random function and Lipschitz continuity of the
    expected function. Finally, we demonstrate that to achieve an (O(1/[lambda
    n^2] + kappa F_*/n)) risk bound for supervised learning, the
    (widetilde{Omega}(kappa d)) requirement on (n) can be replaced with
    (Omega(kappa^2)), which is dimensionality-independent.

    Sparse Algorithm for Robust LSSVM in Primal Space

    Li Chen, Shuisheng Zhou
    Comments: 22 pages, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Thanks to its closed-form solution, the least squares support vector machine
(LSSVM) has been widely used for classification and regression problems, with
performance comparable to other types of SVMs. However, LSSVM has two
drawbacks: it is sensitive to outliers and lacks sparseness. Robust LSSVM
(R-LSSVM) partly overcomes the first drawback via a nonconvex truncated loss
function, but current algorithms for R-LSSVM, whose solutions are dense, still
face the second drawback and are inefficient for training large-scale
problems. In this paper, we interpret the robustness of R-LSSVM from a
re-weighted viewpoint and derive a primal R-LSSVM via the representer theorem.
The new model may have a sparse
    solution if the corresponding kernel matrix has low rank. Then approximating
    the kernel matrix by a low-rank matrix and smoothing the loss function by
    entropy penalty function, we propose a convergent sparse R-LSSVM (SR-LSSVM)
    algorithm to achieve the sparse solution of primal R-LSSVM, which overcomes two
    drawbacks of LSSVM simultaneously. The proposed algorithm has lower complexity
    than the existing algorithms and is very efficient for training large-scale
problems. Extensive experimental results illustrate that SR-LSSVM achieves
better or comparable performance with less training time than related
algorithms, especially on large-scale problems.

    Estimation of classrooms occupancy using a multi-layer perceptron

    Eugénio Rodrigues, Luísa Dias Pereira, Adélio Rodrigues Gaspar, Álvaro Gomes, Manuel Carlos Gameiro da Silva
    Comments: 6 pages, 2 figures, conference article
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

This paper presents a multi-layer perceptron model for estimating the number
of occupants in classrooms from sensed indoor environmental data: relative
humidity, air temperature, and carbon dioxide concentration. The modelling
datasets were collected from two classrooms in the Secondary School of Pombal,
Portugal. The number of occupants and occupation periods were obtained from
class attendance reports. However, post-class occupancy was unknown, and the
developed model is used to reconstruct the classrooms' occupancy by filling in
the unreported periods. Different model structures and combinations of
environmental variables were tested. The most accurate model had an input
vector of 10 variables: five averaged time intervals of relative humidity and
five of carbon dioxide concentration. The model presented a mean square error
of 1.99, a coefficient of determination of 0.96 (p-value < 0.001), and a mean
absolute error of 1 occupant. These results show promising estimation
capabilities under uncertain indoor environmental conditions.
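The 10-variable input described above is just two sensor series, each averaged over five sub-periods. A minimal sketch of constructing that input vector, with made-up readings (the paper's sampling rates and interval lengths are not specified here):

```python
def averaged_intervals(series, n_intervals=5):
    """Average a sensed series over n_intervals equal sub-periods."""
    k = len(series) // n_intervals
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n_intervals)]

# Hypothetical readings over one occupancy period (values are made up).
rh = [45, 46, 47, 48, 50, 51, 52, 53, 54, 55]               # rel. humidity, %
co2 = [600, 620, 650, 700, 760, 820, 880, 940, 990, 1040]   # CO2, ppm

# The best-performing model's input: five averaged intervals of relative
# humidity plus five of carbon dioxide concentration -> 10 variables.
input_vector = averaged_intervals(rh) + averaged_intervals(co2)
print(len(input_vector))  # 10
```

This vector would then be fed to the multi-layer perceptron, whose output is the estimated occupant count for that period.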

    Knowledge Adaptation: Teaching to Adapt

    Sebastian Ruder, Parsa Ghaffari, John G. Breslin
    Comments: 11 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Domain adaptation is crucial in many real-world applications where the
    distribution of the training data differs from the distribution of the test
    data. Previous Deep Learning-based approaches to domain adaptation need to be
    trained jointly on source and target domain data and are therefore unappealing
    in scenarios where models need to be adapted to a large number of domains or
    where a domain is evolving, e.g. spam detection where attackers continuously
    change their tactics.

    To fill this gap, we propose Knowledge Adaptation, an extension of Knowledge
    Distillation (Bucilua et al., 2006; Hinton et al., 2015) to the domain
    adaptation scenario. We show how a student model achieves state-of-the-art
    results on unsupervised domain adaptation from multiple sources on a standard
    sentiment analysis benchmark by taking into account the domain-specific
    expertise of multiple teachers and the similarities between their domains.

    When learning from a single teacher, using domain similarity to gauge
    trustworthiness is inadequate. To this end, we propose a simple metric that
    correlates well with the teacher’s accuracy in the target domain. We
    demonstrate that incorporating high-confidence examples selected by this metric
    enables the student model to achieve state-of-the-art performance in the
    single-source scenario.

    Truncated Variational EM for Semi-Supervised Neural Simpletrons

    Dennis Forster, Jörg Lücke
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Inference and learning for probabilistic generative networks is often very
    challenging and typically prevents scalability to as large networks as used for
    deep discriminative approaches. To obtain efficiently trainable, large-scale
    and well performing generative networks for semi-supervised learning, we here
    combine two recent developments: a neural network reformulation of hierarchical
    Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM
    approach (TV-EM). TV-EM provides theoretical guarantees for learning in
    generative networks, and its application to Neural Simpletrons results in
    particularly compact, yet approximately optimal, modifications of learning
equations. On standard benchmarks, we empirically find that learning converges
in fewer EM iterations, that the complexity per EM iteration is reduced, and
that final likelihood values are higher on average. For the task of
classification on data sets with few labels, learning improvements result in
consistently lower error rates compared to applications without truncation.
Experiments on the MNIST data set allow for comparison to standard and
state-of-the-art models in the semi-supervised setting. Further experiments on
the NIST SD19 data set show the scalability of the approach when large amounts
of additional unlabeled data are available.

    Gated Multimodal Units for Information Fusion

    John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper presents a novel model for multimodal learning based on gated
    neural networks. The Gated Multimodal Unit (GMU) model is intended to be used
    as an internal unit in a neural network architecture whose purpose is to find
    an intermediate representation based on a combination of data from different
    modalities. The GMU learns to decide how modalities influence the activation of
    the unit using multiplicative gates. It was evaluated on a multilabel scenario
    for genre classification of movies using the plot and the poster. The GMU
    improved the macro f-score performance of single-modality approaches and
    outperformed other fusion strategies, including mixture of experts models.
    Along with this work, the MM-IMDb dataset is released which, to the best of our
    knowledge, is the largest publicly available multimodal dataset for genre
    prediction on movies.
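In the bimodal case, the gating mechanism described above can be sketched directly: a learned gate z mixes the two modality-specific hidden representations. A minimal NumPy sketch with illustrative dimensions and random weights (the paper's architecture, training, and weight shapes are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

def gmu(x_visual, x_text, Wv, Wt, Wz):
    """Gated Multimodal Unit, bivariate case: a multiplicative gate z
    decides how much each modality contributes to the fused activation."""
    hv = np.tanh(Wv @ x_visual)   # visual hidden representation
    ht = np.tanh(Wt @ x_text)     # textual hidden representation
    # Gate computed from both modalities jointly (sigmoid in (0, 1)).
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([x_visual, x_text]))))
    return z * hv + (1.0 - z) * ht  # gated fusion of the two modalities

dv, dt, dh = 8, 6, 4  # illustrative input/output dimensions
Wv = rng.normal(size=(dh, dv))
Wt = rng.normal(size=(dh, dt))
Wz = rng.normal(size=(dh, dv + dt))
h = gmu(rng.normal(size=dv), rng.normal(size=dt), Wv, Wt, Wz)
print(h.shape)  # (4,)
```

Because the fused activation is a gated convex combination of two tanh outputs, each component stays in [-1, 1]; in the paper the gate weights are learned end-to-end with the rest of the network.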

    Representations of language in a model of visually grounded speech signal

    Grzegorz Chrupała, Lieke Gelderloos, Afra Alishahi
    Comments: 10 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We present a visually grounded model of speech perception which projects
    spoken utterances and images to a joint semantic space. We use a multi-layer
    recurrent highway network to model the temporal nature of spoken speech, and
    show that it learns to extract both form and meaning-based linguistic knowledge
    from the input signal. We carry out an in-depth analysis of the representations
    used by different components of the trained model and show that encoding of
    semantic aspects tends to become richer as we go up the hierarchy of layers,
    whereas encoding of form-related aspects of the language input tends to
    initially increase and then plateau or decrease.

    Continuous-Time User Modeling in the Presence of Badges: A Probabilistic Approach

    Ali Khodadadi, Seyed Abbas Hosseini, Erfan Tavakoli, Hamid R. Rabiee
    Comments: 27 pages, 7 figures
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    User modeling plays an important role in delivering customized web services
    to the users and improving their engagement. However, most user models in the
    literature do not explicitly consider the temporal behavior of users. More
    recently, continuous-time user modeling has gained considerable attention and
    many user behavior models have been proposed based on temporal point processes.
However, typical point-process-based models consider only the impact of peer
influence and content on user participation and neglect other factors.
Gamification elements are among these neglected factors, even though they
have a strong impact on user participation in online services. In this paper,
    we propose interdependent multi-dimensional temporal point processes that
    capture the impact of badges on user participation besides the peer influence
    and content factors. We extend the proposed processes to model user actions
    over the community based question and answering websites, and propose an
    inference algorithm based on Variational-EM that can efficiently learn the
    model parameters. Extensive experiments on both synthetic and real data
gathered from Stack Overflow show that our inference algorithm learns the
parameters efficiently and that the proposed method predicts user behavior
better than the alternatives.

    Low Rank Matrix Recovery with Simultaneous Presence of Outliers and Sparse Corruption

    Mostafa Rahmani, George Atia
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We study a data model in which the data matrix D can be expressed as D = L +
    S + C, where L is a low rank matrix, S an element-wise sparse matrix and C a
    matrix whose non-zero columns are outlying data points. To date, robust PCA
    algorithms have solely considered models with either S or C, but not both. As
    such, existing algorithms cannot account for simultaneous element-wise and
column-wise corruptions. In this paper, a new robust PCA algorithm that is
robust to both types of corruption simultaneously is proposed. Our approach
hinges on the sparse approximation of a sparsely corrupted column, whereby the
sparse expansion of a column with respect to the other data points is used to
distinguish a sparsely corrupted inlier column from an outlying data point. We
also develop a randomized design which provides a scalable implementation of
the proposed approach. The core idea of sparse approximation is analyzed
analytically, where we show that the underlying (ell_1)-norm minimization can
recover the representation of an inlier in the presence of sparse corruptions.
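The discriminating signal above is that an inlier column admits a good sparse representation in terms of the other columns, while an outlier does not. A small NumPy sketch of that idea using ISTA for the l1-regularized representation (a stand-in for the paper's analysis, not its actual algorithm; data, lam, and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(A, b, lam=0.1, iters=500):
    """ISTA for min_z 0.5||A z - b||^2 + lam ||z||_1: the l1-sparse
    representation of column b with respect to the columns of A."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    z = np.zeros(A.shape[1])
    for _ in range(iters):
        z = soft_threshold(z - step * (A.T @ (A @ z - b)), step * lam)
    return z

rng = np.random.default_rng(1)
U = rng.normal(size=(30, 2))          # rank-2 column space
L = U @ rng.normal(size=(2, 10))      # low-rank inlier columns
inlier = L[:, 0]                      # lies in the span of the rest
outlier = rng.normal(size=30)         # unstructured outlying column

A = L[:, 1:]
res_in = np.linalg.norm(A @ sparse_code(A, inlier) - inlier)
res_out = np.linalg.norm(A @ sparse_code(A, outlier) - outlier)
print(res_in < res_out)  # inlier is far better approximated by the rest
```

The large residual gap is what lets the algorithm flag outlying columns while tolerating element-wise sparse corruption of inliers.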

    Neural Discourse Structure for Text Categorization

    Yangfeng Ji, Noah Smith
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    We show that discourse structure, as defined by Rhetorical Structure Theory
    and provided by an existing discourse parser, benefits text categorization. Our
    approach uses a recursive neural network and a newly proposed attention
    mechanism to compute a representation of the text that focuses on salient
    content, from the perspective of both RST and the task. Experiments consider
    variants of the approach and illustrate its strengths and weaknesses.

    Learning similarity preserving representations with neural similarity encoders

    Franziska Horn, Klaus-Robert Müller
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Many dimensionality reduction or manifold learning algorithms optimize for
    retaining the pairwise similarities, distances, or local neighborhoods of data
    points. Spectral methods like Kernel PCA (kPCA) or isomap achieve this by
    computing the singular value decomposition (SVD) of some similarity matrix to
obtain a low-dimensional representation of the original data. However, this is
computationally expensive when many training examples are available and,
    additionally, representations for new (out-of-sample) data points can only be
    created when the similarities to the original training examples can be
    computed. We introduce similarity encoders (SimEc), which learn similarity
    preserving representations by using a feed-forward neural network to map data
    into an embedding space where the original similarities can be approximated
    linearly. The model optimizes the same objective as kPCA but in the process it
    learns a linear or non-linear embedding function (in the form of the tuned
    neural network), with which the representations of novel data points can be
    computed – even if the original pairwise similarities of the training set were
    generated by an unknown process such as human ratings. By creating embeddings
    for both image and text datasets, we demonstrate that SimEc can, on the one
    hand, reach the same solution as spectral methods, and, on the other hand,
    obtain meaningful embeddings from similarities based on human labels.
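
A minimal numerical sketch of the idea (a linear SimEc trained with plain gradient descent, not the authors' implementation): learn a projection W such that the inner products of the embeddings XW approximate a given similarity matrix S. The data dimensions, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))            # latent low-dimensional structure
X = Z @ rng.standard_normal((2, 10))         # observed 10-dimensional data
S = X @ X.T                                  # target pairwise similarities

W = 0.01 * rng.standard_normal((10, 2))      # linear "encoder" (one-layer net)
lr, m = 1e-4, S.size

def loss(W):
    E = X @ W                                # embeddings of all points
    return np.linalg.norm(S - E @ E.T) ** 2 / m

loss0 = loss(W)
for _ in range(5000):
    E = X @ W
    R = E @ E.T - S                          # residual of the similarity fit
    W -= lr * 4.0 * (X.T @ R @ E) / m        # gradient of ||S - E E^T||_F^2 / m
```

After training, the embedding inner products approximate S, and new points are embedded by a single matrix product with W, without recomputing any SVD.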

    Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

    Andrew Sohn, Randal S. Olson, Jason H. Moore
    Comments: 9 pages, 4 figures, submitted to GECCO 2017 conference and currently under review
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

    Machine learning has been gaining traction in recent years to meet the demand
    for tools that can efficiently analyze and make sense of the ever-growing
    databases of biomedical data in health care systems around the world. However,
    effectively using machine learning methods requires considerable domain
expertise, which can be a barrier to entry for bioinformaticians new to
    computational data science methods. Therefore, off-the-shelf tools that make
    machine learning more accessible can prove invaluable for bioinformaticians. To
    this end, we have developed an open source pipeline optimization tool
    (TPOT-MDR) that uses genetic programming to automatically design machine
    learning pipelines for bioinformatics studies. In TPOT-MDR, we implement
    Multifactor Dimensionality Reduction (MDR) as a feature construction method for
    modeling higher-order feature interactions, and combine it with a new expert
    knowledge-guided feature selector for large biomedical data sets. We
    demonstrate TPOT-MDR’s capabilities using a combination of simulated and real
    world data sets from human genetics and find that TPOT-MDR significantly
    outperforms modern machine learning methods such as logistic regression and
    eXtreme Gradient Boosting (XGBoost). We further analyze the best pipeline
    discovered by TPOT-MDR for a real world problem and highlight TPOT-MDR’s
    ability to produce a high-accuracy solution that is also easily interpretable.


    Information Theory

    Massive MIMO Beam-forming for High Speed Train Communication: Directivity vs Beamwidth

    Xuhong Chen, Jiaxun Lu, Pingyi Fan
    Subjects: Information Theory (cs.IT)

High-mobility adaptation and massive Multiple-input Multiple-output (MIMO)
    application are two primary evolving objectives for the next generation high
    speed train communication system. In this paper, we consider how to design a
location-aware beam-forming for the massive MIMO system. We first analyze the
    tradeoff between beam directivity and beamwidth, based on which we present the
    sensitivity analysis of positioning accuracy. Then, we derive the maximum beam
    directivity and corresponding beamwidth under the restriction of diverse
positioning accuracies to guarantee highly efficient transmission. Finally, we
    present a low-complexity beam-forming design with positioning robustness
    utilizing location information, which requires neither eigen-decomposing (ED)
    the uplink channel covariance matrix (CCM) nor ED the downlink CCM (DCCM).
Numerical simulations indicate that, as long as the positioning error stays
below a certain threshold, a massive MIMO system can guarantee the required
performance with satisfactory transmission efficiency in the high-mobility scenario.
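
The directivity-beamwidth tradeoff can be illustrated with the textbook uniform linear array (ULA) pattern; this is a generic antenna-theory sketch, not the paper's beam-forming design. For an n-element, half-wavelength-spaced broadside ULA, directivity grows like n while the half-power beamwidth shrinks like 1/n.

```python
import numpy as np

def array_factor(theta, n, spacing=0.5):
    """Normalized broadside array-factor magnitude of an n-element ULA
    with element spacing given in wavelengths."""
    psi = 2 * np.pi * spacing * np.cos(theta)   # inter-element phase shift
    num, den = np.sin(n * psi / 2), n * np.sin(psi / 2)
    return np.abs(np.divide(num, den, out=np.ones_like(num),
                            where=np.abs(den) > 1e-12))

theta = np.linspace(0.0, np.pi, 100001)

def hpbw_deg(n):
    """Half-power beamwidth in degrees (only the main lobe exceeds -3 dB)."""
    above = theta[array_factor(theta, n) >= 1 / np.sqrt(2)]
    return np.degrees(above[-1] - above[0])

def directivity(n):
    """2 / integral of |AF|^2 sin(theta); equals n for lambda/2 spacing."""
    af2 = array_factor(theta, n) ** 2
    return 2.0 / (np.sum(af2 * np.sin(theta)) * (theta[1] - theta[0]))

# Doubling the array narrows the beam but raises the directivity:
# hpbw_deg(8) is about 12.7 degrees, while directivity(8) is about 8.
```

Narrower beams deliver more gain but tolerate less pointing error, which is why positioning accuracy enters the beamwidth design in the high-mobility setting.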

    Robust Regularized ZF in Cooperative Broadcast Channel under Distributed CSIT

    Qianrui Li, Paul de Kerret, David Gesbert, Nicolas Gresset
Comments: 38 pages, 4 figures, submitted to the IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

In this work, we consider the sum rate performance of a joint processing
coordinated multi-point transmission network (JP-CoMP, a.k.a. Network MIMO) in a
    so-called distributed channel state information (D-CSI) setting. In the D-CSI
    setting, the various transmitters (TXs) acquire a local, TX-dependent, estimate
    of the global multi-user channel state matrix obtained via terminal feedback
    and limited backhauling. The CSI noise across TXs can be independent or
correlated, so as to reflect the degree to which TXs can exchange information
over the backhaul, hence allowing us to model a range of situations bridging
fully distributed and fully centralized CSI settings. In this context, we aim to study
    the price of CSI distributiveness in terms of sum rate at finite SNR when
    compared with conventional centralized scenarios. We consider the family of
    JP-CoMP precoders known as regularized zero-forcing (RZF). We conduct our study
    in the large scale antenna regime, as it is currently envisioned to be used in
real 5G deployments. It is then possible to obtain accurate approximations of
the signal-to-interference-and-noise ratios via so-called deterministic
equivalents. Guided by these deterministic equivalents, we propose an
    approach to derive a RZF scheme that is robust to the distributed aspect of the
    CSI, whereby the key idea lies in the optimization of a TX-dependent power
    level and regularization factor. Our analysis confirms the improved robustness
    of the proposed scheme with respect to CSI inconsistency at different TXs, even
with a moderate number of antennas and receivers (RXs).

    An Efficient Global Algorithm for Single-Group Multicast Beamforming

    Cheng Lu, Ya-Feng Liu
    Comments: 35 pages, 4 figures, 5 tables; submitted for possible publication
    Subjects: Information Theory (cs.IT)

    Consider the single-group multicast beamforming problem, where multiple users
    receive the same data stream simultaneously from a single transmitter. The
problem is NP-hard, and all existing algorithms for it find either suboptimal
approximate solutions or local stationary solutions. In this paper, we propose
    an efficient branch-and-bound algorithm for the problem that is guaranteed to
    find its global solution. To the best of our knowledge, our proposed algorithm
    is the first tailored global algorithm for the single-group multicast
    beamforming problem. Simulation results show that our proposed algorithm is
    computationally efficient (albeit its theoretical worst-case iteration
    complexity is exponential with respect to the number of receivers) and it
    significantly outperforms a state-of-the-art general-purpose global
    optimization solver called Baron. Our proposed algorithm provides an important
    benchmark for performance evaluation of existing algorithms for the same
    problem. By using it as the benchmark, we show that two state-of-the-art
algorithms, the semidefinite relaxation algorithm and the successive linear
approximation algorithm, work well when the problem dimension (i.e., the number
    of antennas at the transmitter and the number of receivers) is small but their
    performance deteriorates quickly as the problem dimension increases.

    A Region Based Easy Path Wavelet Transform For Sparse Image Representation

    Renato Budinich
    Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    The Easy Path Wavelet Transform is an adaptive transform for bivariate
    functions (in particular natural images) which has been proposed in [1]. It
    provides a sparse representation by finding a path in the domain of the
    function leveraging the local correlations of the function values. It then
    applies a one dimensional wavelet transform to the obtained vector, decimates
the points and iterates the procedure. The main drawback of this method is the
    need to store, for each level of the transform, the path which vectorizes the
    two dimensional data. Here we propose a variation on the method which consists
    of firstly applying a segmentation procedure to the function domain,
    partitioning it into regions where the variation in the function values is low;
    in a second step, inside each such region, a path is found in some
    deterministic way, i.e. not data-dependent. This circumvents the need to store
    the paths at each level, while still obtaining good quality lossy compression.
    This method is particularly well suited to encode a Region of Interest in the
    image with different quality than the rest of the image.

    [1] Gerlind Plonka. The easy path wavelet transform: A new adaptive wavelet
    transform for sparse representation of two-dimensional data. Multiscale
Modeling & Simulation, 7(3):1474-1496, 2008.

    The Meta Distribution of the SIR for Cellular Networks with Power Control

    Yuanjie Wang, Martin Haenggi, Zhenhui Tan
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    The meta distribution of the signal-to-interference ratio (SIR) provides
    fine-grained information about the performance of individual links in a
    wireless network. This paper focuses on the analysis of the meta distribution
    of the SIR for both the cellular network uplink and downlink with fractional
    power control. For the uplink scenario, an approximation of the interfering
    user point process with a non-homogeneous Poisson point process is used. The
    moments of the meta distribution for both scenarios are calculated. Some
    bounds, the analytical expression, the mean local delay, and the beta
    approximation of the meta distribution are provided. The results give
interesting insights into the effect of power control in both the uplink
    and downlink. Detailed simulations show that the approximations made in the
    analysis are well justified.
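
The beta approximation mentioned above matches the first two moments of the per-link reliability to a Beta(a, b) distribution. A method-of-moments sketch follows; the moment values used here are made up for illustration, not taken from the paper's expressions.

```python
from scipy.stats import beta as beta_dist

def beta_fit(m1, m2):
    """Beta(a, b) parameters matching the first two moments m1 and m2."""
    var = m2 - m1 ** 2
    s = m1 * (1.0 - m1) / var - 1.0    # concentration parameter a + b
    return m1 * s, (1.0 - m1) * s

a, b = beta_fit(0.8, 0.68)             # illustrative moments of P(SIR > threshold)
mean, var = beta_dist.stats(a, b, moments='mv')   # fitted mean and variance
```

By construction the fitted Beta reproduces the target mean and second moment, so the whole meta distribution is summarized by two moment calculations.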

    Joint Pushing and Caching for Bandwidth Utilization Maximization in Wireless Networks

    Yaping Sun, Ying Cui, Hui Liu
    Comments: 5 pages, 1 figure, submitted to IEEE ISIT 2017
    Subjects: Information Theory (cs.IT)

Joint pushing and caching is recognized as an efficient remedy to the problem
of spectrum scarcity caused by the tremendous growth in mobile data traffic. In
this paper, by exploiting storage resources at end-users and the predictability
of users’ demand processes, we design the optimal joint pushing and caching
policy to maximize bandwidth utilization, which is one of the most important
    concerns for network operators. In particular, we formulate the stochastic
    optimization problem as an infinite horizon average cost Markov Decision
    Process (MDP), for which there generally exist only numerical solutions without
    many insights. By structural analysis, we show how the optimal policy achieves
    a balance between the current transmission cost and the future average
    transmission cost. In addition, we show that the optimal average transmission
    cost decreases with the cache size, revealing a tradeoff between the cache size
and the bandwidth utilization. Moreover, because obtaining a numerical optimal
solution suffers from the curse of dimensionality and implementing it requires a
centralized controller and global system information, we develop
    a decentralized policy with polynomial complexity w.r.t. the numbers of users
    and files as well as cache size, by a linear approximation of the value
    function and relaxing the original policy optimization problem. The results in
    this paper offer useful guidelines for designing practical cache-enabled
    content-centric wireless networks.

    Temporal-Spatial Aggregation for Cache-Enabled Wireless Multicasting Networks with Asynchronous Content Requests

    Jifang Xing, Ying Cui, Vincent Lau
    Comments: 5 pages, 2 figures, submitted to IEEE ISIT 2017
    Subjects: Information Theory (cs.IT)

    Existing multicasting schemes for massive content delivery do not fully
    utilize multicasting opportunities in delay tolerant content-oriented
    applications. In this paper, we propose a novel temporal-spatial
    aggregation-based multicasting scheme in a large-scale cache-enabled wireless
    network. The proposed scheme can efficiently exploit multicasting opportunities
    in asynchronous content requests to improve spectral efficiency. By making use
    of the delay tolerance of elastic services, the proposed scheme achieves a
    better energy-throughput-delay tradeoff. Utilizing tools from stochastic
    geometry, we derive a tractable expression for the successful transmission
    probability in the general region. Using asymptotic approximations, we derive
closed-form successful transmission probabilities in the large delay region as
    well as the large and small user density regions. The asymptotic results reveal
    that the successful transmission probability increases and the energy
    consumption decreases at the cost of delay increase in these asymptotic
    regions. The analysis in this paper provides a new understanding of the
    energy-throughput-delay tradeoff for massive content delivery in large-scale
    cache-enabled wireless networks.

    Capacity Bounds on the Downlink of Symmetric, Multi-Relay, Single Receiver C-RAN Networks

    Shirin Saeedi Bidokhti, Gerhard Kramer, Shlomo Shamai (Shitz)
    Subjects: Information Theory (cs.IT)

    The downlink of symmetric Cloud Radio Access Networks (C-RAN) with multiple
    relays and a single receiver is studied. Lower and upper bounds are derived on
the capacity. The lower bound is achieved by Marton’s coding, which facilitates
    dependence among the multiple-access channel inputs. The upper bound uses
    Ozarow’s technique to augment the system with an auxiliary random variable. The
    bounds are studied over scalar Gaussian C-RANs and are shown to meet and
    characterize the capacity for interesting regimes of operation.

    Multiuser Communication Based on the DFT Eigenstructure

    R. M. Campello de Souza, H. M. de Oliveira, R. J. Cintra
    Comments: 5 pages, 2 figures, 3 tables
    Subjects: Information Theory (cs.IT); Applications (stat.AP)

    The eigenstructure of the discrete Fourier transform (DFT) is examined and
    new systematic procedures to generate eigenvectors of the unitary DFT are
    proposed. DFT eigenvectors are suggested as user signatures for data
    communication over the real adder channel (RAC). The proposed multiuser
    communication system over the 2-user RAC is detailed.
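
One classical systematic construction of DFT eigenvectors (a textbook device, possibly distinct from the procedures proposed in the paper) projects a random vector onto an eigenspace of the unitary DFT: since F^4 = I, the eigenvalues lie in {1, -1, i, -i}, and P = (1/4) * sum_m conj(lambda)^m F^m is the orthogonal projector onto the lambda-eigenspace.

```python
import numpy as np

n = 8
a, b = np.meshgrid(np.arange(n), np.arange(n))
F = np.exp(-2j * np.pi * a * b / n) / np.sqrt(n)     # unitary DFT matrix

lam = 1.0                                            # eigenvalue in {1, -1, 1j, -1j}
P = sum(np.conj(lam) ** m * np.linalg.matrix_power(F, m) for m in range(4)) / 4
v = P @ np.random.default_rng(1).standard_normal(n)  # project a random vector
v /= np.linalg.norm(v)                               # unit-norm DFT eigenvector
```

Repeating this with independent random vectors and Gram-Schmidt yields an orthonormal eigenbasis, the kind of signature set envisaged for RAC users.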

    Sequential Coding of Gauss-Markov Sources over Packet-Erasure Channels with Feedback

    Anatoly Khina, Ashish Khisti, Babak Hassibi
    Comments: Extended version of a paper submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    We consider the problem of sequential transmission of Gauss-Markov sources
    over packet-erasure channels with a possibly delayed output feedback. For the
    case of instantaneous feedback, we determine the optimal squared error
    distortions for given transmission rates for all time instants, and construct a
    scheme that achieves all of them simultaneously. This establishes the optimal
rate-distortion region for sequential coding of Gauss-Markov sources without
    packet erasures, as a special case. For the case of delayed feedback, we
    connect the problem to that of compression with side information that is known
    at the encoder and may be known at the decoder – where the most recent packets
    serve as side information that may have been erased. We conclude the paper by
    demonstrating that the loss due to a delay by one time instant is rather small.

    Successive Local and Successive Global Omniscience

    Anoosheh Heidarzadeh, Alex Sprintson
    Subjects: Information Theory (cs.IT)

    This paper considers two generalizations of the cooperative data exchange
    problem, referred to as the successive local omniscience (SLO) and the
successive global omniscience (SGO). The users are divided into ℓ nested
sub-groups. Each user initially knows a subset of packets in a ground set X
of size k, and all users wish to learn all packets in X. The users exchange
their packets by broadcasting coded or uncoded packets. In SLO or SGO,
respectively, the l-th (1 ≤ l ≤ ℓ) smallest sub-group of users needs
to learn all packets its members collectively hold or all packets in X, in the
l-th round of transmissions. The problem is to find the minimum sum-rate (i.e.,
the total transmission rate over all users) for each round, subject to
minimizing the sum-rate for the previous round. To solve this problem, we use a
linear-programming approach. For the cases in which the packets are randomly
distributed among users, we construct a system of linear equations whose
solution characterizes the minimum sum-rate for each round with high
probability as k tends to infinity. Moreover, for the special case of two
nested groups, we derive closed-form expressions, which hold with high
probability as k tends to infinity, for the minimum sum-rate for each round.
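
The linear-programming viewpoint can be illustrated on the classic single-round cooperative data exchange problem (the one-group special case; the successive multi-round constraints of SLO/SGO are not modeled here). The cut-set constraints require that, for each proper nonempty subset S of users, the users in S transmit at least as many packets as the complementary group is jointly missing. The example instance below is made up for illustration.

```python
from itertools import combinations
from scipy.optimize import linprog

packets = {0, 1, 2}
knows = [{1, 2}, {0, 2}, {0, 1}]    # user i holds every packet except packet i
n = len(knows)

A_ub, b_ub = [], []
for r in range(1, n):               # proper, nonempty subsets S of users
    for S in combinations(range(n), r):
        rest = set().union(*(knows[i] for i in range(n) if i not in S))
        missing = len(packets - rest)
        # sum_{i in S} rate_i >= missing, rewritten as -sum <= -missing
        A_ub.append([-1.0 if i in S else 0.0 for i in range(n)])
        b_ub.append(-float(missing))

res = linprog(c=[1.0] * n, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
# res.fun == 1.5: each user broadcasts half a coded packet's worth
```

Here every pair of users must jointly send at least one packet's worth, so the optimal fractional rates are (0.5, 0.5, 0.5) with sum-rate 1.5.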

    One shot entanglement assisted classical and quantum communication over noisy quantum channels: A hypothesis testing and convex split approach

    Anurag Anshu, Rahul Jain, Naqueeb Ahmad Warsi
Comments: 29 pages, 8 figures, version 1
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    In this work we study several communication tasks over noisy quantum channels
    and provide bounds on the amount of reliable communication between senders and
    receivers. Our results are in the one-shot setting with the extra resource of
    quantum entanglement between senders and receivers. We show 1) tight bounds on
the amount of reliable communication for the point-to-point channel, 2)
an achievability bound for the point-to-point channel with quantum side
information about the channel at the encoder (a quantum analogue of the
Gel’fand-Pinsker channel), 3) an achievability bound for the point-to-point
channel with limited quantum side information about the channel at the encoder,
and 4) an achievability bound for the broadcast channel, a quantum analogue
of the Marton inner bound.

We take a unified approach to arrive at all our results, primarily using
hypothesis testing and convex split. The forms of our results match those
    of the corresponding results for similar communication tasks over noisy
    classical channels. To obtain the quantum analogue of Marton inner bound we
    prove a (bi-partite) generalization of the convex-split lemma, which may be of
    independent interest.

    Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States

    Ryan G. James, John R. Mahoney, James P. Crutchfield
    Comments: 6 pages, 4 figures; this http URL
    Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)

    One of the most fundamental questions one can ask about a pair of random
    variables X and Y is the value of their mutual information. Unfortunately, this
    task is often stymied by the extremely large dimension of the variables. We
    might hope to replace each variable by a lower-dimensional representation that
    preserves the relationship with the other variable. The theoretically ideal
    implementation is the use of minimal sufficient statistics, where it is
    well-known that either X or Y can be replaced by their minimal sufficient
    statistic about the other while preserving the mutual information. While
    intuitively reasonable, it is not obvious or straightforward that both
    variables can be replaced simultaneously. We demonstrate that this is in fact
    possible: the information X’s minimal sufficient statistic preserves about Y is
    exactly the information that Y’s minimal sufficient statistic preserves about
    X. As an important corollary, we consider the case where one variable is a
    stochastic process’ past and the other its future and the present is viewed as
    a memoryful channel. In this case, the mutual information is the channel
    transmission rate between the channel’s effective states. That is, the
    past-future mutual information (the excess entropy) is the amount of
    information about the future that can be predicted using the past. Translating
    our result about minimal sufficient statistics, this is equivalent to the
    mutual information between the forward- and reverse-time causal states of
    computational mechanics. We close by discussing multivariate extensions to this
    use of minimal sufficient statistics.
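
The fact that replacing a variable by a sufficient statistic preserves mutual information is easy to check numerically on a toy discrete example (an illustration of the standard fact, not of the paper's process-level construction). Here Y depends on X only through T(X) = X mod 2, so T is sufficient for Y; the pmf values are arbitrary.

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in bits from a joint pmf table p[x, y]."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask])))

# X in {0,1,2,3}; Y depends on X only through T(X) = X mod 2.
p_y_given_t = np.array([[0.9, 0.1],        # distribution of Y given T = 0
                        [0.2, 0.8]])       # distribution of Y given T = 1
p_x = np.array([0.1, 0.2, 0.3, 0.4])
p_xy = p_x[:, None] * p_y_given_t[np.arange(4) % 2]   # joint pmf over (X, Y)

# Collapse X to T(X) by summing the rows with the same parity.
p_ty = np.stack([p_xy[0::2].sum(axis=0), p_xy[1::2].sum(axis=0)])
# mutual_information(p_xy) equals mutual_information(p_ty)
```

Replacing X by the 2-valued statistic T loses none of the information about Y, exactly as the sufficiency argument predicts.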

    A Digital Hardware Fast Algorithm and FPGA-based Prototype for a Novel 16-point Approximate DCT for Image Compression Applications

    F. M. Bayer, R. J. Cintra, A. Edirisuriya, A. Madanayake
    Comments: 17 pages, 6 figures, 6 tables
    Journal-ref: Measurement Science and Technology, Volume 23, Number 11, 2012
    Subjects: Multimedia (cs.MM); Hardware Architecture (cs.AR); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Methodology (stat.ME)

    The discrete cosine transform (DCT) is the key step in many image and video
    coding standards. The 8-point DCT is an important special case, possessing
several widely investigated low-complexity approximations. However, the
16-point DCT has energy compaction advantages. In this sense, this paper
presents a new 16-point DCT approximation with null multiplicative complexity.
The proposed transform matrix is orthogonal and contains only zeros and ones.
The proposed transform outperforms the well-known Walsh-Hadamard transform and
    the current state-of-the-art 16-point approximation. A fast algorithm for the
    proposed transform is also introduced. This fast algorithm is experimentally
    validated using hardware implementations that are physically realized and
    verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip for a maximum
clock rate of 342 MHz. Rapid prototypes on FPGA for 8-bit input word size show
a significant improvement in compressed image quality of up to 1-2 dB at the
cost of only eight adders compared to the state-of-the-art 16-point DCT
approximation algorithm in the literature [S. Bouguezel, M. O. Ahmad, and M. N.
S. Swamy. A novel transform for image compression. In Proceedings of the 53rd
IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010].



