IT博客汇 | arXiv Paper Daily: Wed, 19 Apr 2017

arXiv Paper Daily: Wed, 19 Apr 2017

我爱机器学习(52ml.net)发表于 2017-04-19 00:00:00

Neural and Evolutionary Computing

Diagonal RNNs in Symbolic Music Modeling

Y. Cem Subakan, Paris Smaragdis
Comments: Submitted to Waspaa 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a new Recurrent Neural Network (RNN) architecture.
The novelty is simple: We use diagonal recurrent matrices instead of full. This
results in better test likelihood and faster convergence compared to regular
full RNNs in most of our experiments. We show the benefits of using diagonal
recurrent matrices with popularly used LSTM and GRU architectures as well as
with the vanilla RNN architecture, on four standard symbolic music datasets.

A Study of Deep Learning Robustness Against Computation Failures

Jean-Charles Vialatte, François Leduc-Primeau
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

For many types of integrated circuits, accepting larger failure rates in
computations can be used to improve energy efficiency. We study the performance
of faulty implementations of certain deep neural networks based on pessimistic
and optimistic models of the effect of hardware faults. After identifying the
impact of hyperparameters such as the number of layers on robustness, we study
the ability of the network to compensate for computational failures through an
increase of the network size. We show that some networks can achieve equivalent
performance under faulty implementations, and quantify the required increase in
computational complexity.

LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques

Joao Paulo Papa, Gustavo Henrique Rosa, Douglas Rodrigues, Xin-She Yang
Subjects: Neural and Evolutionary Computing (cs.NE)

Optimization techniques play an important role in several scientific and
real-world applications, thus becoming of great interest for the community. As
a consequence, a number of open-source libraries are available in the
literature, which ends up fostering the research and development of new
techniques and applications. In this work, we present a new library for the
implementation and fast prototyping of nature-inspired techniques called
LibOPT. Currently, the library implements 15 techniques and 112 benchmarking
functions, as well as it also supports 11 hypercomplex-based optimization
approaches, which makes it one of the first of its kind. We showed how one can
easily use and also implement new techniques in LibOPT under the C paradigm.
Examples are provided with samples of source-code using benchmarking functions.

The Emergence of Canalization and Evolvability in an Open-Ended, Interactive Evolutionary System

Joost Huizinga, Kenneth O. Stanley, Jeff Clune
Comments: SI can be found at: this http URL
Subjects: Neural and Evolutionary Computing (cs.NE)

Natural evolution has produced a tremendous diversity of functional
organisms. Many believe an essential component of this process was the
evolution of evolvability, whereby evolution speeds up its ability to innovate
by generating a more adaptive pool of offspring. One hypothesized mechanism for
evolvability is developmental canalization, wherein certain dimensions of
variation become more likely to be traversed and others are prevented from
being explored (e.g. offspring tend to have similarly sized legs, and mutations
affect the length of both legs, not each leg individually). While ubiquitous in
nature, canalization almost never evolves in computational simulations of
evolution. Not only does that deprive us of in silico models in which to study
the evolution of evolvability, but it also raises the question of which
conditions give rise to this form of evolvability. Answering this question
would shed light on why such evolvability emerged naturally and could
accelerate engineering efforts to harness evolution to solve important
engineering challenges. In this paper we reveal a unique system in which
canalization did emerge in computational evolution. We document that genomes
entrench certain dimensions of variation that were frequently explored during
their evolutionary history. The genetic representation of these organisms also
evolved to be highly modular and hierarchical, and we show that these
organizational properties correlate with increased fitness. Interestingly, the
type of computational evolutionary experiment that produced this evolvability
was very different from traditional digital evolution in that there was no
objective, suggesting that open-ended, divergent evolutionary processes may be
necessary for the evolution of evolvability.

Learning Affine Feature Space Transformations in Symbolic Regression

Jan Žegklitz, Petr Pošík
Subjects: Neural and Evolutionary Computing (cs.NE)

We propose a new type of leaf node for use in Symbolic Regression (SR) that
performs linear combinations of feature variables (LCF). These nodes can be
handled in three different modes — an unsynchronized mode, where all LCFs are
free to change on their own, a synchronized mode, where LCFs are sorted into
groups in which they are forced to be identical throughout the whole
individual, and a globally synchronized mode, which is similar to the previous
mode but the grouping is done across the whole population. We also present two
methods of evolving the weights of the LCFs — a purely stochastic way via
mutation and a gradient-based way based on the backpropagation algorithm known
from neural networks — and also a combination of both. We experimentally
evaluate all configurations of LCFs in Multi-Gene Genetic Programming (MGGP),
which was chosen as baseline, on a number of benchmarks. According to the
results, we identified two configurations which increase the performance of the
algorithm.

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

Nikolaos Antoniadis, Angelo Sifaleras
Comments: 8 pages, 1 figure
Journal-ref: Electronic Notes in Discrete Mathematics, Volume 58, April 2017,
Pages 47-54, ISSN 1571-0653
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM)

In this paper, we study various parallelization schemes for the Variable
Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and
OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem
instances for the multi-product dynamic lot sizing problem with product returns
and recovery, which appears in reverse logistics and is known to be NP-hard. We
report our findings regarding these parallelization approaches and present
promising computational results.

On Improving the Capacity of Solving Large-scale Wireless Network Design Problems by Genetic Algorithms

Fabio D'Andreagiovanni
Comments: This is the authors’ final version of the paper published in Di Chio C. et al. (Eds): EvoApplications 2011, LNCS 6625, pp. 11-20, 2011. The final publication is available at Springer via this http URL
Journal-ref: Di Chio C. et al. (Eds), EvoApplications 2011, Springer LNCS vol.
6625, pp. 11-20, 2011
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

Over the last decade, wireless networks have experienced an impressive growth
and now play a main role in many telecommunications systems. As a consequence,
scarce radio resources, such as frequencies, became congested and the need for
effective and efficient assignment methods arose. In this work, we present a
Genetic Algorithm for solving large instances of the Power, Frequency and
Modulation Assignment Problem, arising in the design of wireless networks. To
our best knowledge, this is the first Genetic Algorithm that is proposed for
such problem. Compared to previous works, our approach allows a wider
exploration of the set of power solutions, while eliminating sources of
numerical problems. The performance of the algorithm is assessed by tests over
a set of large realistic instances of a Fixed WiMAX Network.

Criticality as It Could Be: organizational invariance as self-organized criticality in embodied agents

Miguel Aguilera, Manuel G. Bedia
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

This paper outlines a methodological approach to generate adaptive agents
driving themselves near points of criticality. Using a synthetic approach we
construct a conceptual model that, instead of specifying mechanistic
requirements to generate criticality, exploits the maintenance of an
organizational structure capable of reproducing critical behavior. Our approach
captures the well-known principle of universality that classifies critical
phenomena inside a few universality classes of systems without relying on
specific mechanisms or topologies. In particular, we implement an artificial
embodied agent controlled by a neural network maintaining a correlation
structure randomly sampled form a lattice Ising model at a critical point. We
evaluate the agent in two classical reinforcement learning scenarios: the
Mountain Car benchmark and the Acrobot double pendulum, finding that in both
cases the neural controller reaches a point of criticality, which coincides
with a transition point between two regimes of the agent’s behaviour,
maximizing the synergistic information between hidden neurons and sensorimotor
patterns. Finally, we discuss the possible applications of this synthetic
approach to the comprehension of deeper principles connected to the pervasive
presence of criticality in biological and cognitive systems.

Computer Vision and Pattern Recognition

Light Field Blind Motion Deblurring

Pratul P. Srinivasan, Ren Ng, Ravi Ramamoorthi
Comments: To be presented at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We study the problem of deblurring light fields of general 3D scenes captured
under 3D camera motion and present both theoretical and practical
contributions. By analyzing the motion-blurred light field in the primal and
Fourier domains, we develop intuition into the effects of camera motion on the
light field, show the advantages of capturing a 4D light field instead of a
conventional 2D image for motion deblurring, and derive simple methods of
motion deblurring in certain cases. We then present an algorithm to blindly
deblur light fields of general scenes without any estimation of scene geometry,
and demonstrate that we can recover both the sharp light field and the 3D
camera motion path of real and synthetically-blurred light fields.

Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality

Giorgio Roffo, Simone Melzi
Comments: Preprint version – Lecture Notes in Computer Science – Springer 2017
Journal-ref: New Frontiers in Mining Complex Patterns, Fifth International
workshop, nfMCP2016. Lecture Notes in Computer Science – Springer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

In an era where accumulating data is easy and storing it inexpensive, feature
selection plays a central role in helping to reduce the high-dimensionality of
huge amounts of otherwise meaningless data. In this paper, we propose a
graph-based method for feature selection that ranks features by identifying the
most important ones into arbitrary set of cues. Mapping the problem on an
affinity graph-where features are the nodes-the solution is given by assessing
the importance of nodes through some indicators of centrality, in particular,
the Eigen-vector Centrality (EC). The gist of EC is to estimate the importance
of a feature as a function of the importance of its neighbors. Ranking central
nodes individuates candidate features, which turn out to be effective from a
classification point of view, as proved by a thoroughly experimental section.
Our approach has been tested on 7 diverse datasets from recent literature
(e.g., biological data and object recognition, among others), and compared
against filter, embedded and wrappers methods. The results are remarkable in
terms of accuracy, stability and low execution time.

Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound Images

Jan Egger, Dieter Schmalstieg, Xiaojun Chen, Wolfram G. Zoller, Alexander Hann
Comments: 15 pages, 16 figures, 2 tables, 58 references
Journal-ref: Sci. Rep. 7, 892, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Ultrasound (US) is the most commonly used liver imaging modality worldwide.
Due to its low cost, it is increasingly used in the follow-up of cancer
patients with metastases localized in the liver. In this contribution, we
present the results of an interactive segmentation approach for liver
metastases in US acquisitions. A (semi-) automatic segmentation is still very
challenging because of the low image quality and the low contrast between the
metastasis and the surrounding liver tissue. Thus, the state of the art in
clinical practice is still manual measurement and outlining of the metastases
in the US images. We tackle the problem by providing an interactive
segmentation approach providing real-time feedback of the segmentation results.
The approach has been evaluated with typical US acquisitions from the clinical
routine, and the datasets consisted of pancreatic cancer metastases. Even for
difficult cases, satisfying segmentations results could be achieved because of
the interactive real-time behavior of the approach. In total, 40 clinical
images have been evaluated with our method by comparing the results against
manual ground truth segmentations. This evaluation yielded to an average Dice
Score of 85% and an average Hausdorff Distance of 13 pixels.

A Comment on "Analysis of Video Image Sequences Using Point and Line Correspondences"

Mieczysław A. Kłopotek
Journal-ref: preliminary version of: M.A. K{l}opotek: A comment on “Analysis
of video image sequences using point and line correspondences”. Pattern
Recognition 28(1995)2, pp. 283-292
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we would like to deny the results of Wang et al. raising two
fundamental claims:

* A line does not contribute anything to recognition of motion parameters
from two images

* Four traceable points are not sufficient to recover motion parameters from
two perspective

To be constructive, however, we show that four traceable points are
sufficient to recover motion parameters from two frames under orthogonal
projection and that five points are sufficient to simplify the solution of the
two-frame problem under orthogonal projection to solving a linear equation
system.

Image Fusion With Cosparse Analysis Operator

Rui Gao, Sergiy A. Vorobyov, Hong Zhao
Comments: 12 pages, 4 figures, 1 table, Submitted to IEEE Signal Processing Letters on December 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

The paper addresses the image fusion problem, where multiple images captured
with different focus distances are to be combined into a higher quality
all-in-focus image. Most current approaches for image fusion strongly rely on
the unrealistic noise-free assumption used during the image acquisition, and
then yield limited robustness in fusion processing. In our approach, we
formulate the multi-focus image fusion problem in terms of an analysis sparse
model, and simultaneously perform the restoration and fusion of multi-focus
images. Based on this model, we propose an analysis operator learning, and
define a novel fusion function to generate an all-in-focus image. Experimental
evaluations confirm the effectiveness of the proposed fusion approach both
visually and quantitatively, and show that our approach outperforms
state-of-the-art fusion methods.

Robust Optical Flow Estimation in Rainy Scenes

Ruoteng Li, Robby T. Tan, Loong-Fah Cheong
Comments: 8 pages, ICCV
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a method to estimate optical flow under rainy scenes.
Optical flow estimation in the rainy scenes is considered challenging due to
background degradation introduced by rain streaks and rain accumulation effects
in the scene. Rain accumulation effect refers to poor visibility of remote
object due to the intense rainfall. Most existing optical flow methods are
erroneous when applied to rain sequences because the conventional brightness
constancy constraint (BCC) and gradient constancy constraint (GCC) generally
break down in this situation. In this paper, our method considers the rain
streaks and rain accumulation separately. Based on the fact that the RGB color
channels receive raindrop radiance equally, we introduce a residue channel as a
new data constraint to significantly reduce rain streaks. In the case of rain
accumulation, our method proposes to separate the image into a piecewise-smooth
background layer and a high-frequency detail layer and enforce BCC on the
background layer only. Results on both synthetic dataset and real images show
that our algorithm outperforms existing methods on different types of rain
sequences. To our knowledge, this is the first optical flow method dealing with
rain.

Fast 2-D Complex Gabor Filter with Kernel Decomposition

Suhyuk Um, Jaeyoon Kim, Dongbo Min (Senior Member, IEEE)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

2-D complex Gabor filtering has found numerous applications in the fields of
computer vision and image processing. Especially, in some applications, it is
often needed to compute 2-D complex Gabor filter bank consisting of the 2-D
complex Gabor filtering outputs at multiple orientations and frequencies.
Although several approaches for fast 2-D complex Gabor filtering have been
proposed, they primarily focus on reducing the runtime of performing the 2-D
complex Gabor filtering once at specific orientation and frequency. To obtain
the 2-D complex Gabor filter bank output, existing methods are repeatedly
applied with respect to multiple orientations and frequencies. In this paper,
we propose a novel approach that efficiently computes the 2-D complex Gabor
filter bank by reducing the computational redundancy that arises when
performing the Gabor filtering at multiple orientations and frequencies. The
proposed method first decomposes the Gabor basis kernels to allow a fast
convolution with the Gaussian kernel in a separable manner. This enables
reducing the runtime of the 2-D complex Gabor filter bank by reusing
intermediate results of the 2-D complex Gabor filtering computed at a specific
orientation. Furthermore, we extend this idea into 2-D localized sliding
discrete Fourier transform (SDFT) using the Gaussian kernel in the DFT
computation, which lends a spatial localization ability as in the 2-D complex
Gabor filter. Experimental results demonstrate that our method runs faster than
state-of-the-arts methods for fast 2-D complex Gabor filtering, while
maintaining similar filtering quality.

Deep Self-Taught Learning for Weakly Supervised Object Localization

Zequn Jie, Yunchao Wei, Xiaojie Jin, Jiashi Feng, Wei Liu
Comments: Accepted as spotlight paper by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most existing weakly supervised localization (WSL) approaches learn detectors
by finding positive bounding boxes based on features learned with image-level
supervision. However, those features do not contain spatial location related
information and usually provide poor-quality positive samples for training a
detector. To overcome this issue, we propose a deep self-taught learning
approach, which makes the detector learn the object-level features reliable for
acquiring tight positive samples and afterwards re-train itself based on them.
Consequently, the detector progressively improves its detection ability and
localizes more informative positive samples. To implement such self-taught
learning, we propose a seed sample acquisition method via image-to-object
transferring and dense subgraph discovery to find reliable positive samples for
initializing the detector. An online supportive sample harvesting scheme is
further proposed to dynamically select the most confident tight positive
samples and train the detector in a mutual boosting way. To prevent the
detector from being trapped in poor optima due to overfitting, we propose a new
relative improvement of predicted CNN scores for guiding the self-taught
learning process. Extensive experiments on PASCAL 2007 and 2012 show that our
approach outperforms the state-of-the-arts, strongly validating its
effectiveness.

Video Object Segmentation using Supervoxel-Based Gerrymandering

Brent A. Griffin, Jason J. Corso
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pixels operate locally. Superpixels have some potential to collect
information across many pixels; supervoxels have more potential by implicitly
operating across time. In this paper, we explore this well established notion
thoroughly analyzing how supervoxels can be used in place of and in conjunction
with other means of aggregating information across space-time. Focusing on the
problem of strictly unsupervised video object segmentation, we devise a method
called supervoxel gerrymandering that links masks of foregroundness and
backgroundness via local and non-local consensus measures. We pose and answer a
series of critical questions about the ability of supervoxels to adequately
sway local voting; the questions regard type and scale of supervoxels as well
as local versus non-local consensus, and the questions are posed in a general
way so as to impact the broader knowledge of the use of supervoxels in video
understanding. We work with the DAVIS dataset and find that our analysis yields
an unsupervised method that outperforms all other known unsupervised methods
and even many supervised ones.

A Gabor Filter Texture Analysis Approach for Histopathological Brain Tumor Subtype Discrimination

Omar S. Al-Kadi
Comments: 14 pages,4 figures, 2 tables
Journal-ref: ISESCO Journal of Science and Technology, vol. 12, no. 22, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Meningioma brain tumour discrimination is challenging as many histological
patterns are mixed between the different subtypes. In clinical practice,
dominant patterns are investigated for signs of specific meningioma pathology;
however the simple observation could result in inter- and intra-observer
variation due to the complexity of the histopathological patterns. Also
employing a computerised feature extraction approach applied at a single
resolution scale might not suffice in accurately delineating the mixture of
histopathological patterns. In this work we propose a novel multiresolution
feature extraction approach for characterising the textural properties of the
different pathological patterns (i.e. mainly cell nuclei shape, orientation and
spatial arrangement within the cytoplasm). The pattern textural properties are
characterised at various scales and orientations for an improved separability
between the different extracted features. The Gabor filter energy output of
each magnitude response was combined with four other fixed-resolution texture
signatures (2 model-based and 2 statistical-based) with and without cell nuclei
segmentation. The highest classification accuracy of 95% was reported when
combining the Gabor filters energy and the meningioma subimage fractal
signature as a feature vector without performing any prior cell nuceli
segmentation. This indicates that characterising the cell-nuclei
self-similarity properties via Gabor filters can assists in achieving an
improved meningioma subtype classification, which can assist in overcoming
variations in reported diagnosis.

Google's Cloud Vision API Is Not Robust To Noise

Hossein Hosseini, Baicen Xiao, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Google has recently introduced the Cloud Vision API for image analysis.
According to the demonstration website, the API “quickly classifies images into
thousands of categories, detects individual objects and faces within images,
and finds and reads printed words contained within images.” It can be also used
to “detect different types of inappropriate content from adult to violent
content.”

In this paper, we evaluate the robustness of the Google’s Cloud Vision API to
input perturbation. In particular, we show that by adding sufficient noise to
the image, the API generates completely different outputs for the noisy image,
while a human observer would perceive its original content. We show that the
attack is consistently successful, by performing extensive experiments on
different image types, including natural images, images containing faces and
images with texts. Our findings indicate the vulnerability of the API in
adversarial environments. For example, an adversary can bypass an image
filtering system by adding noise to an image with inappropriate content.

Unsupervised Learning by Predicting Noise

Piotr Bojanowski, Armand Joulin
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Convolutional neural networks provide visual features that perform remarkably
well in many computer vision applications. However, training these networks
requires significant amounts of supervision. This paper introduces a generic
framework to train deep networks, end-to-end, with no supervision. We propose
to fix a set of target representations, called Noise As Targets (NAT), and to
constrain the deep features to align to them. This domain agnostic approach
avoids the standard unsupervised learning issues of trivial solutions and
collapsing of features. Thanks to a stochastic batch reassignment strategy and
a separable square loss function, it scales to millions of images. The proposed
approach produces representations that perform on par with state-of-the-art
unsupervised methods on ImageNet and Pascal VOC.

Artificial Intelligence

Synergy of all-purpose static solver and temporal reasoning tools in dynamic integrated expert systems

Galina Rybina, Alexey Mozgachev, Dmitry Demidov
Comments: 8 pages, 3 figures
Journal-ref: “Informatsionno-izmeritelnye i upravlyayushchie sistemy”
(Information-measuring and Control Systems) no.8, vol.12, 2014. pp 27-33.
ISSN 2070-0814
Subjects: Artificial Intelligence (cs.AI)

The paper discusses scientific and technological problems of dynamic
integrated expert systems development. Extensions of problem-oriented
methodology for dynamic integrated expert systems development are considered.
Attention is paid to the temporal knowledge representation and processing.

Understanding Negations in Information Processing: Learning from Replicating Human Behavior

Nicolas Pröllochs, Stefan Feuerriegel, Dirk Neumann
Comments: 39 pages
Subjects: Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)

Information systems experience an ever-growing volume of unstructured data,
particularly in the form of textual materials. This represents a rich source of
information from which one can create value for people, organizations and
businesses. For instance, recommender systems can benefit from automatically
understanding preferences based on user reviews or social media. However, it is
difficult for computer programs to correctly infer meaning from narrative
content. One major challenge is negations that invert the interpretation of
words and sentences. As a remedy, this paper proposes a novel learning strategy
to detect negations: we apply reinforcement learning to find a policy that
replicates the human perception of negations based on an exogenous response,
such as a user rating for reviews. Our method yields several benefits, as it
eliminates the former need for expensive and subjective manual labeling in an
intermediate stage. Moreover, the inferred policy can be used to derive
statistical inferences and implications regarding how humans process and act on
negations.

Anomaly detection and motif discovery in symbolic representations of time series

Fabio Guigou (ICube), Pierre Collet (ICube, UNISTRA), Pierre Parrend (ICube)
Subjects: Artificial Intelligence (cs.AI)

The advent of the Big Data hype and the consistent recollection of event logs
and real-time data from sensors, monitoring software and machine configuration
has generated a huge amount of time-varying data in about every sector of the
industry. Rule-based processing of such data has ceased to be relevant in many
scenarios where anomaly detection and pattern mining have to be entirely
accomplished by the machine. Since the early 2000s, the de-facto standard for
representing time series has been the Symbolic Aggregate approXimation (SAX).In
this document, we present a few algorithms using this representation for
anomaly detection and motif discovery, also known as pattern mining, in such
data. We propose a benchmark of anomaly detection algorithms using data from
Cloud monitoring software.

Semantic Similarity from Natural Language and Ontology Analysis

Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain
Comments: preprint version of the book Semantic Similarity from Natural Language and Ontology Analysis (Synthesis Lectures on Human Language Technologies – Morgan & Claypool publishers)
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Artificial Intelligence federates numerous scientific fields in the aim of
developing machines able to assist human operators performing complex
treatments — most of which demand high cognitive skills (e.g. learning or
decision processes). Central to this quest is to give machines the ability to
estimate the likeness or similarity between things in the way human beings
estimate the similarity between stimuli.

In this context, this book focuses on semantic measures: approaches designed
for comparing semantic entities such as units of language, e.g. words,
sentences, or concepts and instances defined into knowledge bases. The aim of
these measures is to assess the similarity or relatedness of such semantic
entities by taking into account their semantics, i.e. their meaning —
intuitively, the words tea and coffee, which both refer to stimulating
beverage, will be estimated to be more semantically similar than the words
toffee (confection) and coffee, despite that the last pair has a higher
syntactic similarity. The two state-of-the-art approaches for estimating and
quantifying semantic similarities/relatedness of semantic entities are
presented in detail: the first one relies on corpora analysis and is based on
Natural Language Processing techniques and semantic models while the second is
based on more or less formal, computer-readable and workable forms of knowledge
such as semantic networks, thesaurus or ontologies. (…) Beyond a simple
inventory and categorization of existing measures, the aim of this monograph is
to convey novices as well as researchers of these domains towards a better
understanding of semantic similarity estimation and more generally semantic
measures.

Approximations from Anywhere and General Rough Sets

A. Mani
Comments: 20 Pages. Scheduled to appear in IJCRS’2017 LNCS Proceedings, Springer
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Information Theory (cs.IT)

Not all approximations arise from information systems. The problem of fitting
approximations, subjected to some rules (and related data), to information
systems in a rough scheme of things is known as the emph{inverse problem}. The
inverse problem is more general than the duality (or abstract representation)
problems and was introduced by the present author in her earlier papers. From
the practical perspective, a few (as opposed to one) theoretical frameworks may
be suitable for formulating the problem itself. emph{Granular operator spaces}
have been recently introduced and investigated by the present author in her
recent work in the context of antichain based and dialectical semantics for
general rough sets. The nature of the inverse problem is examined from
number-theoretic and combinatorial perspectives in a higher order variant of
granular operator spaces and some necessary conditions are proved. The results
and the novel approach would be useful in a number of unsupervised and semi
supervised learning contexts and algorithms.

The Causality/Repair Connection in Databases: Causality-Programs

Leopoldo Bertossi
Comments: 7-pages, conference submission as short paper
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

In this work, answer-set programs that specify repairs of databases are used
as a basis for solving computational and reasoning problems about causes for
query answers from databases.

Information Retrieval

Mining Worse and Better Opinions. Unsupervised and Agnostic Aggregation of Online Reviews

Michela Fazzolari, Marinella Petrocchi, Alessandro Tommasi, Cesare Zavattari
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this paper, we propose a novel approach for aggregating online reviews,
according to the opinions they express. Our methodology is unsupervised – due
to the fact that it does not rely on pre-labeled reviews – and it is agnostic –
since it does not make any assumption about the domain or the language of the
review content. We measure the adherence of a review content to the domain
terminology extracted from a review set. First, we demonstrate the
informativeness of the adherence metric with respect to the score associated
with a review. Then, we exploit the metric values to group reviews, according
to the opinions they express. Our experimental campaign has been carried out on
two large datasets collected from Booking and Amazon, respectively.

FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity with Financial Word Embeddings

Pedro Saleiro, Eduarda Mendes Rodrigues, Carlos Soares, Eugénio Oliveira
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This paper presents the approach developed at the Faculty of Engineering of
University of Porto, to participate in SemEval 2017, Task 5: Fine-grained
Sentiment Analysis on Financial Microblogs and News. The task consisted in
predicting a real continuous variable from -1.0 to +1.0 representing the
polarity and intensity of sentiment concerning companies/stocks mentioned in
short texts. We modeled the task as a regression analysis problem and combined
traditional techniques such as pre-processing short texts, bag-of-words
representations and lexical-based features with enhanced financial specific
bag-of-embeddings. We used an external collection of tweets and news headlines
mentioning companies/stocks from S&P 500 to create financial word embeddings
which are able to capture domain-specific syntactic and semantic similarities.
The resulting approach obtained a cosine similarity score of 0.69 in sub-task
5.1 – Microblogs and 0.68 in sub-task 5.2 – News Headlines.

Computation and Language

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Adina Williams, Nikita Nangia, Samuel R. Bowman
Comments: 10 pages, 2 figures, 5 tables, submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL)

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI)
corpus, a dataset designed for use in the development and evaluation of machine
learning models for sentence understanding. In addition to being one of the
largest corpora available for the task of NLI, at 433k examples, this corpus
improves upon available resources in its coverage: it offers data from ten
distinct genres of written and spoken English–making it possible to evaluate
systems on nearly the full complexity of the language–and it offers an
explicit setting for the evaluation of cross-genre domain adaptation.

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith
Comments: 15 pages, 3 figures
Subjects: Computation and Language (cs.CL)

End-to-end neural machine translation has overtaken statistical machine
translation in terms of translation quality for some language pairs, specially
those with a large amount of parallel data available. Beside this palpable
improvement, neural networks embrace several new properties. A single system
can be trained to translate between many languages at almost no additional cost
other than training time. Furthermore, internal representations learned by the
network serve as a new semantic representation of words -or sentences- which,
unlike standard word embeddings, are learned in an essentially bilingual or
even multilingual context. In view of these properties, the contribution of the
present work is two-fold. First, we systematically study the context vectors,
i.e. output of the encoder, and their prowess as an interlingua representation
of a sentence. Their quality and effectiveness are assessed by similarity
measures across translations, semantically related, and semantically unrelated
sentence pairs. Second, and as extrinsic evaluation of the first point, we
identify parallel sentences in comparable corpora, obtaining an F1=98.2% on
data from a shared task when using only context vectors. F1 reaches 98.9% when
complementary similarity measures are used.

Representing Sentences as Low-Rank Subspaces

Jiaqi Mu, Suma Bhat, Pramod Viswanath
Subjects: Computation and Language (cs.CL)

Sentences are important semantic units of natural language. A generic,
distributional representation of sentences that can capture the latent
semantics is beneficial to multiple downstream applications. We observe a
simple geometry of sentences — the word representations of a given sentence
(on average 10.23 words in all SemEval datasets with a standard deviation 4.84)
roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this
observation, we represent a sentence by the low-rank subspace spanned by its
word vectors. Such an unsupervised representation is empirically validated via
semantic textual similarity tasks on 19 different datasets, where it
outperforms the sophisticated neural network models, including skip-thought
vectors, by 15% on average.

Baselines and test data for cross-lingual inference

Željko Agić, Natalie Schluter
Comments: Submitted for review at EMNLP 2017
Subjects: Computation and Language (cs.CL)

Research in natural language inference is currently exclusive to English.
Here, we propose to advance toward multilingual evaluation. To that end, we
provide test data for four major languages. We experiment with a set of
baselines based on cross-lingual embeddings and machine translation. While our
best system scores an average accuracy of just over 75%, we focus largely on
enabling further research in multilingual inference.

Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees

Mathias Kraus, Stefan Feuerriegel
Subjects: Computation and Language (cs.CL)

Prominent applications of sentiment analysis are countless, including areas
such as marketing, customer service and communication. The conventional
bag-of-words approach for measuring sentiment merely counts term frequencies;
however, it neglects the position of the terms within the discourse. As a
remedy, we thus develop a discourse-aware method that builds upon the discourse
structure of documents. For this purpose, we utilize rhetorical structure
theory to label (sub-)clauses according to their hierarchical relationships and
then assign polarity scores to individual leaves. To learn from the resulting
rhetoric structure, we propose a tensor-based, tree-structured deep neural
network (named RST-LSTM) in order to process the complete discourse tree. The
underlying attention mechanism infers the salient passages of narrative
materials. In addition, we suggest two algorithms for data augmentation (node
reordering and artificial leaf insertion) that increase our training set and
reduce overfitting. Our benchmarks demonstrate the superior performance of our
approach. Ultimately, this work advances the status quo in natural language
processing by developing algorithms that incorporate semantic information.

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

Matthew Dunn, Levent Sagun, Mike Higgins, Ugur Guney, Volkan Cirik, Kyunghyun Cho
Subjects: Computation and Language (cs.CL)

We publicly release a new large-scale dataset, called SearchQA, for machine
comprehension, or question-answering. Unlike recently released datasets, such
as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to
reflect a full pipeline of general question-answering. That is, we start not
from an existing article and generate a question-answer pair, but start from an
existing question-answer pair, crawled from J! Archive, and augment it with
text snippets retrieved by Google. Following this approach, we built SearchQA,
which consists of more than 140k question-answer pairs with each pair having
49.6 snippets on average. Each question-answer-context tuple of the SearchQA
comes with additional meta-data such as the snippet’s URL, which we believe
will be valuable resources for future research. We conduct human evaluation as
well as test two baseline methods, one simple word selection and the other deep
learning based, on the SearchQA. We show that there is a meaningful gap between
the human and machine performances. This suggests that the proposed dataset
could well serve as a benchmark for question-answering.

Automatic Disambiguation of French Discourse Connectives

Majid Laali, Leila Kosseim
Journal-ref: International Journal of Computational Linguistics and
Applications, vol. 7, no. 1, 2016, pp. 11-30
Subjects: Computation and Language (cs.CL)

Discourse connectives (e.g. however, because) are terms that can explicitly
convey a discourse relation within a text. While discourse connectives have
been shown to be an effective clue to automatically identify discourse
relations, they are not always used to convey such relations, thus they should
first be disambiguated between discourse-usage non-discourse-usage. In this
paper, we investigate the applicability of features proposed for the
disambiguation of English discourse connectives for French. Our results with
the French Discourse Treebank (FDTB) show that syntactic and lexical features
developed for English texts are as effective for French and allow the
disambiguation of French discourse connectives with an accuracy of 94.2%.

FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity with Financial Word Embeddings

Pedro Saleiro, Eduarda Mendes Rodrigues, Carlos Soares, Eugénio Oliveira
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This paper presents the approach developed at the Faculty of Engineering of
University of Porto, to participate in SemEval 2017, Task 5: Fine-grained
Sentiment Analysis on Financial Microblogs and News. The task consisted in
predicting a real continuous variable from -1.0 to +1.0 representing the
polarity and intensity of sentiment concerning companies/stocks mentioned in
short texts. We modeled the task as a regression analysis problem and combined
traditional techniques such as pre-processing short texts, bag-of-words
representations and lexical-based features with enhanced financial specific
bag-of-embeddings. We used an external collection of tweets and news headlines
mentioning companies/stocks from S&P 500 to create financial word embeddings
which are able to capture domain-specific syntactic and semantic similarities.
The resulting approach obtained a cosine similarity score of 0.69 in sub-task
5.1 – Microblogs and 0.68 in sub-task 5.2 – News Headlines.

Mining Worse and Better Opinions. Unsupervised and Agnostic Aggregation of Online Reviews

Michela Fazzolari, Marinella Petrocchi, Alessandro Tommasi, Cesare Zavattari
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

In this paper, we propose a novel approach for aggregating online reviews,
according to the opinions they express. Our methodology is unsupervised – due
to the fact that it does not rely on pre-labeled reviews – and it is agnostic –
since it does not make any assumption about the domain or the language of the
review content. We measure the adherence of a review content to the domain
terminology extracted from a review set. First, we demonstrate the
informativeness of the adherence metric with respect to the score associated
with a review. Then, we exploit the metric values to group reviews, according
to the opinions they express. Our experimental campaign has been carried out on
two large datasets collected from Booking and Amazon, respectively.

Semantic Similarity from Natural Language and Ontology Analysis

Artificial Intelligence federates numerous scientific fields in the aim of
developing machines able to assist human operators performing complex
treatments — most of which demand high cognitive skills (e.g. learning or
decision processes). Central to this quest is to give machines the ability to
estimate the likeness or similarity between things in the way human beings
estimate the similarity between stimuli.

In this context, this book focuses on semantic measures: approaches designed
for comparing semantic entities such as units of language, e.g. words,
sentences, or concepts and instances defined into knowledge bases. The aim of
these measures is to assess the similarity or relatedness of such semantic
entities by taking into account their semantics, i.e. their meaning —
intuitively, the words tea and coffee, which both refer to stimulating
beverage, will be estimated to be more semantically similar than the words
toffee (confection) and coffee, despite that the last pair has a higher
syntactic similarity. The two state-of-the-art approaches for estimating and
quantifying semantic similarities/relatedness of semantic entities are
presented in detail: the first one relies on corpora analysis and is based on
Natural Language Processing techniques and semantic models while the second is
based on more or less formal, computer-readable and workable forms of knowledge
such as semantic networks, thesaurus or ontologies. (…) Beyond a simple
inventory and categorization of existing measures, the aim of this monograph is
to convey novices as well as researchers of these domains towards a better
understanding of semantic similarity estimation and more generally semantic
measures.

Does Neural Machine Translation Benefit from Larger Context?

Sebastien Jean, Stanislas Lauly, Orhan Firat, Kyunghyun Cho
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

We propose a neural machine translation architecture that models the
surrounding text in addition to the source sentence. These models lead to
better performance, both in terms of general translation quality and pronoun
prediction, when trained on small corpora, although this improvement largely
disappears when trained with a larger corpus. We also discover that
attention-based neural machine translation is well suited for pronoun
prediction and compares favorably with other approaches that were specifically
designed for this task.

Exploring Sparsity in Recurrent Neural Networks

Sharan Narang, Gregory Diamos, Shubho Sengupta, Erich Elsen
Comments: Published as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

Recurrent Neural Networks (RNN) are widely used to solve a variety of
problems and as the quantity of data and the amount of available compute have
increased, so have model sizes. The number of parameters in recent
state-of-the-art networks makes them hard to deploy, especially on mobile
phones and embedded devices. The challenge is due to both the size of the model
and the time it takes to evaluate it. In order to deploy these RNNs
efficiently, we propose a technique to reduce the parameters of a network by
pruning weights during the initial training of the network. At the end of
training, the parameters of the network are sparse while accuracy is still
close to the original dense neural network. The network size is reduced by 8x
and the time required to train the model remains constant. Additionally, we can
prune a larger dense network to achieve better than baseline performance while
still reducing the total number of parameters significantly. Pruning RNNs
reduces the size of the model and can also help achieve significant inference
time speed-up using sparse matrix multiply. Benchmarks show that using our
technique model size can be reduced by 90% and speed-up is around 2x to 7x.

Distributed, Parallel, and Cluster Computing

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Suejb Memeti, Lu Li, Sabri Pllana, Joanna Kolodziej, Christoph Kessler
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Programming Languages (cs.PL); Software Engineering (cs.SE)

Many modern parallel computing systems are heterogeneous at their node level.
Such nodes may comprise general purpose CPUs and accelerators (such as, GPU, or
Intel Xeon Phi) that provide high performance with suitable energy-consumption
characteristics. However, exploiting the available performance of heterogeneous
architectures may be challenging. There are various parallel programming
frameworks (such as, OpenMP, OpenCL, OpenACC, CUDA) and selecting the one that
is suitable for a target context is not straightforward.

In this paper, we study empirically the characteristics of OpenMP, OpenACC,
OpenCL, and CUDA with respect to programming productivity, performance, and
energy. To evaluate the programming productivity we use our homegrown tool
CodeStat, which enables us to determine the percentage of code lines that was
required to parallelize the code using a specific framework. We use our tool
x-MeterPU to evaluate the energy consumption and the performance. Experiments
are conducted using the industry-standard SPEC benchmark suite and the Rodinia
benchmark suite for accelerated computing on heterogeneous systems that combine
Intel Xeon E5 Processors with a GPU accelerator or an Intel Xeon Phi
co-processor.

Scalable Global Grid catalogue for LHC Run3 and beyond

M Martinez Pedreira, C Grigoras for the ALICE Collaboration
Comments: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

The AliEn (ALICE Environment) file catalogue is a global unique namespace
providing mapping between a UNIX-like logical name structure and the
corresponding physical files distributed over 80 storage elements worldwide.
Powerful search tools and hierarchical metadata information are integral parts
of the system and are used by the Grid jobs as well as local users to store and
access all files on the Grid storage elements. The catalogue has been in
production since 2005 and over the past 11 years has grown to more than 2
billion logical file names. The backend is a set of distributed relational
databases, ensuring smooth growth and fast access. Due to the anticipated fast
future growth, we are looking for ways to enhance the performance and
scalability by simplifying the catalogue schema while keeping the functionality
intact. We investigated different backend solutions, such as distributed key
value stores, as replacement for the relational database. This contribution
covers the architectural changes in the system, together with the technology
evaluation, benchmark results and conclusions.

Making data center computations fast, but not so furious

Daniel Porto, João Loff, Rui Duarte, Luis Ceze, Rodrigo Rodrigues
Comments: The 7th Workshop on Multi-core and Rack Scale Systems – MARS’17
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We propose an aggressive computational sprinting variant for data center
environments. While most of previous work on computational sprinting focuses on
maximizing the sprinting process while ensuring non-faulty conditions, we take
advantage of the existing replication in data centers to push the system beyond
its safety limits. In this paper we outline this vision, we survey existing
techniques for achieving it, and we present some design ideas for future work
in this area.

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

In this paper, we study various parallelization schemes for the Variable
Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and
OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem
instances for the multi-product dynamic lot sizing problem with product returns
and recovery, which appears in reverse logistics and is known to be NP-hard. We
report our findings regarding these parallelization approaches and present
promising computational results.

Conceptual Frameworks for Building Online Citizen Science Projects

Poonam Yadav, John Darlington
Journal-ref: Human Computation (2016) 3:1:213-223 ISSN: 2330-8001, DOI:
10.15346/hc.v3i1.12
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE); Social and Information Networks (cs.SI)

In recent years, citizen science has grown in popularity due to a number of
reasons, including the emphasis on informal learning and creativity potential
associated with these initiatives. Citizen science projects address research
questions from various domains, ranging from Ecology to Astronomy. Due to the
advancement of communication technologies, which makes outreach and engagement
of wider communities easier, scientists are keen to turn their own research
into citizen science projects. However, the development, deployment and
management of these projects remains challenging. One of the most important
challenges is building the project itself. There is no single tool or
framework, which guides the step-by-step development of the project, since
every project has specific characteristics, such as geographical constraints or
volunteers’ mode of participation. Therefore, in this article, we present a
series of conceptual frameworks for categorisation, decision and deployment,
which guide a citizen science project creator in every step of creating a new
project starting from the research question to project deployment. The
frameworks are designed with consideration to the properties of already
existing citizen science projects and could be easily extended to include other
dimensions, which are not currently perceived.

A Secure Key Agreement Protocol for Dynamic Group

Muhammad Bilal, Shin-Gak Kang
Comments: This article is accepted for the publication in Cluster Computing-The Journal of Networks, Software Tools and Applications. Print ISSN 1386-7857, Online ISSN 1573-7543
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

To accomplish secure group communication, it is essential to share a unique
cryptographic key among group members. The underlying challenges to group key
agreement are scalability, efficiency, and security. In a dynamic group
environment, the rekeying process is more frequent; therefore, it is more
crucial to design an efficient group key agreement protocol. Moreover, with the
emergence of various group-based services, it is becoming common for several
multicast groups to coexist in the same network. These multicast groups may
have several shared users; a join or leave request by a single user can trigger
regeneration of multiple group keys. Under the given circumstances the rekeying
process becomes a challenging task. In this work, we propose a novel
methodology for group key agreement which exploits the state vectors of group
members. The state vector is a set of randomly generated nonce instances which
determine the logical link between group members and which empowers the group
member to generate multiple cryptographic keys independently. Using local
knowledge of a secret nonce, each member can generate and share a large number
of secure keys, indicating that SGRS inherently provides a considerable amount
of secure subgroup multicast communication using subgroup multicasting keys
derived from local state vectors. The resulting protocol is secure and
efficient in terms of both communication and computation.

Learning

Hot or not? Forecasting cellular network hot spots using sector performance indicators

Joan Serrà, Ilias Leontiadis, Alexandros Karatzoglou, Konstantina Papagiannaki
Comments: Accepted for publication at ICDE 2017 – Industrial Track
Subjects: Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (cs.SY)

To manage and maintain large-scale cellular networks, operators need to know
which sectors underperform at any given time. For this purpose, they use the
so-called hot spot score, which is the result of a combination of multiple
network measurements and reflects the instantaneous overall performance of
individual sectors. While operators have a good understanding of the current
performance of a network and its overall trend, forecasting the performance of
each sector over time is a challenging task, as it is affected by both regular
and non-regular events, triggered by human behavior and hardware failures. In
this paper, we study the spatio-temporal patterns of the hot spot score and
uncover its regularities. Based on our observations, we then explore the
possibility to use recent measurements’ history to predict future hot spots. To
this end, we consider tree-based machine learning models, and study their
performance as a function of time, amount of past data, and prediction horizon.
Our results indicate that, compared to the best baseline, tree-based models can
deliver up to 14% better forecasts for regular hot spots and 153% better
forecasts for non-regular hot spots. The latter brings strong evidence that,
for moderate horizons, forecasts can be made even for sectors exhibiting
isolated, non-regular behavior. Overall, our work provides insight into the
dynamics of cellular sectors and their predictability. It also paves the way
for more proactive network operations with greater forecasting horizons.

HPSLPred: An Ensemble Multi-label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source

Shixiang Wan, Quan Zou
Subjects: Learning (cs.LG)

Predicting the subcellular localization of proteins is an important and
challenging problem. Traditional experimental approaches are often expensive
and time-consuming. Consequently, a growing number of research efforts employ a
series of machine learning approaches to predict the subcellular location of
proteins. There are two main challenges among the state-of-the-art prediction
methods. First, most of the existing techniques are designed to deal with
multi-class rather than multi-label classification, which ignores connections
between multiple labels. In reality, multiple locations of particular proteins
implies that there are vital and unique biological significances that deserve
special focus and cannot be ignored. Second, techniques for handling imbalanced
data in multi-label classification problems are necessary, but never employed.
For solving these two issues, we have developed an ensemble multi-label
classifier called HPSLPred, which can be applied for multi-label classification
with an imbalanced protein source. For convenience, a user-friendly webserver
has been established at this http URL

Stein Variational Autoencoder

Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin
Subjects: Learning (cs.LG)

A new method for learning variational autoencoders is developed, based on an
application of Stein’s operator. The framework represents the encoder as a deep
nonlinear function through which samples from a simple distribution are fed.
One need not make parametric assumptions about the form of the encoder
distribution, and performance is further enhanced by integrating the proposed
encoder with importance sampling. Example results are demonstrated across
multiple unsupervised and semi-supervised problems, including semi-supervised
analysis of the ImageNet data, demonstrating the scalability of the model to
large datasets.

O(^2)TD: (Near)-Optimal Off-Policy TD Learning

Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz
Comments: 10 pages, 7 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Temporal difference learning and Residual Gradient methods are the most
widely used temporal difference based learning algorithms; however, it has been
shown that none of their objective functions is optimal w.r.t approximating the
true value function (V). Two novel algorithms are proposed to approximate the
true value function (V). This paper makes the following contributions: (1) A
batch algorithm that can help find the approximate optimal off-policy
prediction of the true value function (V). (2) A linear computational cost (per
step) near-optimal algorithm that can learn from a collection of off-policy
samples. (3) A new perspective of the emphatic temporal difference learning
which bridges the gap between off-policy optimality and off-policy stability.

Exploring Sparsity in Recurrent Neural Networks

Sharan Narang, Gregory Diamos, Shubho Sengupta, Erich Elsen
Comments: Published as a conference paper at ICLR 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

Recurrent Neural Networks (RNN) are widely used to solve a variety of
problems and as the quantity of data and the amount of available compute have
increased, so have model sizes. The number of parameters in recent
state-of-the-art networks makes them hard to deploy, especially on mobile
phones and embedded devices. The challenge is due to both the size of the model
and the time it takes to evaluate it. In order to deploy these RNNs
efficiently, we propose a technique to reduce the parameters of a network by
pruning weights during the initial training of the network. At the end of
training, the parameters of the network are sparse while accuracy is still
close to the original dense neural network. The network size is reduced by 8x
and the time required to train the model remains constant. Additionally, we can
prune a larger dense network to achieve better than baseline performance while
still reducing the total number of parameters significantly. Pruning RNNs
reduces the size of the model and can also help achieve significant inference
time speed-up using sparse matrix multiply. Benchmarks show that using our
technique model size can be reduced by 90% and speed-up is around 2x to 7x.

Diagonal RNNs in Symbolic Music Modeling

Y. Cem Subakan, Paris Smaragdis
Comments: Submitted to Waspaa 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we propose a new Recurrent Neural Network (RNN) architecture.
The novelty is simple: We use diagonal recurrent matrices instead of full. This
results in better test likelihood and faster convergence compared to regular
full RNNs in most of our experiments. We show the benefits of using diagonal
recurrent matrices with popularly used LSTM and GRU architectures as well as
with the vanilla RNN architecture, on four standard symbolic music datasets.

Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality

In an era where accumulating data is easy and storing it inexpensive, feature
selection plays a central role in helping to reduce the high-dimensionality of
huge amounts of otherwise meaningless data. In this paper, we propose a
graph-based method for feature selection that ranks features by identifying the
most important ones into arbitrary set of cues. Mapping the problem on an
affinity graph-where features are the nodes-the solution is given by assessing
the importance of nodes through some indicators of centrality, in particular,
the Eigen-vector Centrality (EC). The gist of EC is to estimate the importance
of a feature as a function of the importance of its neighbors. Ranking central
nodes individuates candidate features, which turn out to be effective from a
classification point of view, as proved by a thoroughly experimental section.
Our approach has been tested on 7 diverse datasets from recent literature
(e.g., biological data and object recognition, among others), and compared
against filter, embedded and wrappers methods. The results are remarkable in
terms of accuracy, stability and low execution time.

A Study of Deep Learning Robustness Against Computation Failures

Jean-Charles Vialatte, François Leduc-Primeau
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

For many types of integrated circuits, accepting larger failure rates in
computations can be used to improve energy efficiency. We study the performance
of faulty implementations of certain deep neural networks based on pessimistic
and optimistic models of the effect of hardware faults. After identifying the
impact of hyperparameters such as the number of layers on robustness, we study
the ability of the network to compensate for computational failures through an
increase of the network size. We show that some networks can achieve equivalent
performance under faulty implementations, and quantify the required increase in
computational complexity.

Unsupervised Learning by Predicting Noise

Piotr Bojanowski, Armand Joulin
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Convolutional neural networks provide visual features that perform remarkably
well in many computer vision applications. However, training these networks
requires significant amounts of supervision. This paper introduces a generic
framework to train deep networks, end-to-end, with no supervision. We propose
to fix a set of target representations, called Noise As Targets (NAT), and to
constrain the deep features to align to them. This domain agnostic approach
avoids the standard unsupervised learning issues of trivial solutions and
collapsing of features. Thanks to a stochastic batch reassignment strategy and
a separable square loss function, it scales to millions of images. The proposed
approach produces representations that perform on par with state-of-the-art
unsupervised methods on ImageNet and Pascal VOC.

Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models

Yannis Papanikolaou, Grigorios Tsoumakas, Manos Laliotis, Nikos Markantonatos, Ioannis Vlahavas
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Background: In this paper we present the approaches and methods employed in
order to deal with a large scale multi-label semantic indexing task of
biomedical papers. This work was mainly implemented within the context of the
BioASQ challenge of 2014. Methods: The main contribution of this work is a
multi-label ensemble method that incorporates a McNemar statistical
significance test in order to validate the combination of the constituent
machine learning algorithms. Some secondary contributions include a study on
the temporal aspects of the BioASQ corpus (observations apply also to the
BioASQ’s super-set, the PubMed articles collection) and the proper adaptation
of the algorithms used to deal with this challenging classification task.
Results: The ensemble method we developed is compared to other approaches in
experimental scenarios with subsets of the BioASQ corpus giving positive
results. During the BioASQ 2014 challenge we obtained the first place during
the first batch and the third in the two following batches. Our success in the
BioASQ challenge proved that a fully automated machine-learning approach, which
does not implement any heuristics and rule-based approaches, can be highly
competitive and outperform other approaches in similar challenging contexts.

Know Your Master: Driver Profiling-based Anti-theft Method

Byung Il Kwak, JiYoung Woo, Huy Kang Kim
Comments: 8 pages, 11 figures, Accepted for PST 2016 : 14th International Conference on Privacy, Security and Trust
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

Although many anti-theft technologies are implemented, auto-theft is still
increasing. Also, security vulnerabilities of cars can be used for auto-theft
by neutralizing anti-theft system. This keyless auto-theft attack will be
increased as cars adopt computerized electronic devices more. To detect
auto-theft efficiently, we propose the driver verification method that analyzes
driving patterns using measurements from the sensor in the vehicle. In our
model, we add mechanical features of automotive parts that are excluded in
previous works, but can be differentiated by drivers’ driving behaviors. We
design the model that uses significant features through feature selection to
reduce the time cost of feature processing and improve the detection
performance. Further, we enrich the feature set by deriving statistical
features such as mean, median, and standard deviation. This minimizes the
effect of fluctuation of feature values per driver and finally generates the
reliable model. We also analyze the effect of the size of sliding window on
performance to detect the time point when the detection becomes reliable and to
inform owners the theft event as soon as possible. We apply our model with real
driving and show the contribution of our work to the literature of driver
identification.

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

CTR prediction in real-world business is a difficult machine learning problem
with large scale nonlinear sparse data. In this paper, we introduce an
industrial strength solution with model named Large Scale Piece-wise Linear
Model (LS-PLM). We formulate the learning problem with (L_1) and (L_{2,1})
regularizers, leading to a non-convex and non-smooth optimization problem.
Then, we propose a novel algorithm to solve it efficiently, based on
directional derivatives and quasi-Newton method. In addition, we design a
distributed system which can run on hundreds of machines parallel and provides
us with the industrial scalability. LS-PLM model can capture nonlinear patterns
from massive sparse data, saving us from heavy feature engineering jobs. Since
2012, LS-PLM has become the main CTR prediction model in Alibaba’s online
display advertising system, serving hundreds of millions users every day.

Does Neural Machine Translation Benefit from Larger Context?

Sebastien Jean, Stanislas Lauly, Orhan Firat, Kyunghyun Cho
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

We propose a neural machine translation architecture that models the
surrounding text in addition to the source sentence. These models lead to
better performance, both in terms of general translation quality and pronoun
prediction, when trained on small corpora, although this improvement largely
disappears when trained with a larger corpus. We also discover that
attention-based neural machine translation is well suited for pronoun
prediction and compares favorably with other approaches that were specifically
designed for this task.

Does robustness imply tractability? A lower bound for planted clique in the semi-random model

Jacob Steinhardt
Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

We consider a robust analog of the planted clique problem. In this analog, a
set (S) of vertices is chosen and all edges in (S) are included; then, edges
between (S) and the rest of the graph are included with probability
(frac{1}{2}), while edges not touching (S) are allowed to vary arbitrarily.
For this semi-random model, we show that the information-theoretic threshold
for recovery is ( ilde{Theta}(sqrt{n})), in sharp contrast to the classical
information-theoretic threshold of (Theta(log(n))). This matches the
conjectured computational threshold for the classical planted clique problem,
and thus raises the intriguing possibility that, once we require robustness,
there is no computational-statistical gap for planted clique.

Google's Cloud Vision API Is Not Robust To Noise

Hossein Hosseini, Baicen Xiao, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Google has recently introduced the Cloud Vision API for image analysis.
According to the demonstration website, the API “quickly classifies images into
thousands of categories, detects individual objects and faces within images,
and finds and reads printed words contained within images.” It can be also used
to “detect different types of inappropriate content from adult to violent
content.”

In this paper, we evaluate the robustness of the Google’s Cloud Vision API to
input perturbation. In particular, we show that by adding sufficient noise to
the image, the API generates completely different outputs for the noisy image,
while a human observer would perceive its original content. We show that the
attack is consistently successful, by performing extensive experiments on
different image types, including natural images, images containing faces and
images with texts. Our findings indicate the vulnerability of the API in
adversarial environments. For example, an adversary can bypass an image
filtering system by adding noise to an image with inappropriate content.

Information Theory

Reverse Engineering of Communications Networks: Evolution and Challenges

Mehdi Teimouri, Hamidreza Kakaei Motlagh
Comments: 18 pages, 9 figures
Subjects: Information Theory (cs.IT)

Reverse engineering of a communications network is the process of identifying
the communications protocol used in the network. This problem arises in various
situations such as eavesdropping, intelligent jamming, cognitive radio, and
adaptive coding and modulation (ACM). According to the Open Systems
Interconnection (OSI) reference model, the first step in reverse engineering of
communications networks is recognition of physical layer which consists of
recognition of digital modulations and identification of physical layer
transmission techniques. The next step is recognition of data link layer
(consisting of frame synchronization, recognition of channel codes,
reconstruction of interleavers, reconstruction of scramblers, etc.) and also
recognition of network and transport layers. The final step in reverse
engineering of communications networks is recognition of upper layers which
essentially can be seen as identification of source encoders. The objective of
this paper is to provide a comprehensive overview on the current methods for
reverse engineering of communications networks. Furthermore, challenges and
open research issues in this field are introduced.

Wave-like Decoding of Tail-biting Spatially Coupled LDPC Codes Through Iterative Demapping

Sebastian Cammerer, Laurent Schmalen, Vahid Aref, Stephan ten Brink
Comments: presentat at the International Symposium on Turbo Codes & Iterative Information Processing (ISTC), Brest, Sept. 2016
Subjects: Information Theory (cs.IT)

For finite coupling lengths, terminated spatially coupled low-density
parity-check (SC-LDPC) codes show a non-negligible rate-loss. In this paper, we
investigate if this rate loss can be mitigated by tail-biting SC-LDPC codes in
conjunction with iterative demapping of higher order modulation formats.
Therefore, we examine the BP threshold of different coupled and uncoupled
ensembles. A comparison between the decoding thresholds approximated by EXIT
charts and the density evolution results of the coupled and uncoupled ensemble
is given. We investigate the effect and potential of different labelings for
such a set-up using per-bit EXIT curves, and exemplify the method for a 16-QAM
system, e.g., using set partitioning labelings. A hybrid mapping is proposed,
where different sub-blocks use different labelings in order to further optimize
the decoding thresholds of tail-biting codes, while the computational
complexity overhead through iterative demapping remains small.

Power Efficient Hybrid Beamforming for Massive MIMO Public Channel

Cheng Zhang, Yongming Huang, Yindi Jing, Luxi Yang
Comments: submitted to journal
Subjects: Information Theory (cs.IT)

For hybrid massive MIMO public channel with any sector size in either
microwave or millimeter wave (mmwave) band, this paper studies the hybrid
beamforming optimized to minimize the transmit power while guaranteeing the
quality of service (QoS) for randomly deployed users. First the ideal
beampattern is derived via Parseval Identity, based on which a beamforming
design problem is formulated to minimize the gap with the idea beampattern. The
problem is transformable to a multiconvex one and an iterative optimization
algorithm is used to obtain the full-digital beamformer. In addition, with the
help of same beampattern theorem, the power amplifier (PA) efficiency of the
beamformer is improved with unchanged beampattern. Finally, the hybrid
beamforming design is obtained that achieves the full-digital beamformer
solution. Simulations verifies the advantages of the proposed scheme over
existing ones.

Waveform Design for Wireless Power Transfer with Limited Feedback

Yang Huang, Bruno Clerckx
Comments: Submitted for journal publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Waveform design is a key technique to jointly exploit a beamforming gain, the
channel frequency-selectivity and the rectifier nonlinearity, so as to enhance
the end-to-end power transfer efficiency of Wireless Power Transfer (WPT).
Those waveforms have been designed assuming perfect channel state information
at the transmitter. This paper proposes two waveform strategies relying on
limited feedback for multi-antenna multi-sine WPT over frequency-selective
channels. In the waveform selection strategy, the Energy Transmitter (ET)
transmits over multiple timeslots with every time a different waveform precoder
within a codebook, and the Energy Receiver (ER) reports the index of the
precoder in the codebook that leads to the largest harvested energy. In the
waveform refinement strategy, the ET sequentially transmits two waveforms in
each stage, and the ER reports one feedback bit indicating an increase/decrease
in the harvested energy during this stage. Based on multiple one-bit feedback,
the ET successively refines waveform precoders in a tree-structured codebook
over multiple stages. By employing the framework of the generalized Lloyd’s
algorithm, novel algorithms are proposed for both strategies to optimize the
codebooks in both space and frequency domains. The proposed limited
feedback-based waveform strategies are shown to outperform a set of baselines,
achieving higher harvested energy.

How to exploit prior information in low-complexity models

Sajad Daei, Farzan Haddadi
Subjects: Information Theory (cs.IT)

Compressed Sensing refers to extracting a low-dimensional structured signal
of interest from its incomplete random linear observations. A line of recent
work has studied that, with the extra prior information about the signal, one
can recover the signal with much fewer observations. For this purpose, the
general approach is to solve weighted convex function minimization problem. In
such settings, the convex function is chosen to promote the low-dimensional
structure and the optimal weights are so chosen to reduce the number of
measurements required for the optimization problem. In this paper, we consider
a generalized non-uniform model in which the structured signal falls into some
partitions, with entries of each partition having a definite probability to be
an element of the structure support. Given these probabilities and regarding
the recent developments in conic integral geometry, we provide a method to
choose the unique optimal weights for any general low-dimensional signal model.
This class of low-dimensional signal model includes many popular examples such
as (ell_1) analysis (entry-wise sparsity in an arbitrary redundant
dictionary), (ell_{1,2}) norm (block sparsity) and total variation semi-norm
(for piece-wise constant signals). We show through precise analysis and
simulations that the weighted convex optimization problem significantly
improves the regular convex optimization problem as we choose the unique
optimal weights.

Improving the Performance of OTDOA based Positioning in NB-IoT System

Sha Hu, Axel Berg, Xuhong Li, Fredrik Rusek
Comments: Submitted to VTC-Fall, 2017, 7 pages, 11 figures
Subjects: Information Theory (cs.IT)

In this paper, we consider positioning with
observed-time-difference-of-arrival (OTDOA) for a device deployed in
long-term-evolution (LTE) based narrow-band Internet-of-things (NB-IoT)
systems. We propose an iterative expectation-maximization based successive
interference cancellation (EM-SIC) algorithm to jointly consider estimations of
residual frequency-offset (FO), fading-channel taps and time-of-arrival (ToA)
of the first arrival-path for each of the detected cells. In order to design a
low complexity ToA detector and also due to the limits of low-cost analog
circuits, we assume an NB-IoT device working at a low-sampling rate such as
1.92 MHz or lower. The proposed EM-SIC algorithm comprises two stages to detect
ToA, based on which OTDOA can be calculated. In a first stage, after running
the EM-SIC block a predefined number of iterations, a coarse ToA is estimated
for each of the detected cells. Then in a second stage, to improve the ToA
resolution, a low-pass filter is utilized to interpolate the correlations of
time-domain PRS signal evaluated at a low sampling-rate to a high sampling-rate
such as 30.72 MHz. To keep low-complexity, only the correlations inside a small
search window centered at the coarse ToA estimates are upsampled. Then, the
refined ToAs are estimated based on upsampled correlations. If at least three
cells are detected, with OTDOA and the locations of detected cell sites, the
position of the NB-IoT device can be estimated. We show through numerical
simulations that, the proposed EM-SIC based ToA detector is robust against
impairments introduced by inter-cell interference, fading-channel and residual
FO. Thus significant signal-to-noise (SNR) gains are obtained over traditional
ToA detectors that do not consider these impairments when positioning a device.

Joint Domain Based Massive Access for Small Packets Traffic of Uplink Wireless Channel

Qiang Song, Ronggui Xie, Huarui Yin, Guo Wei
Comments: 6 pages, 5 figures.submitted to globecom 2017
Subjects: Information Theory (cs.IT)

The fifth generation (5G) communication scenarios such as the cellular
network and the emerging machine type communications will produce massive small
packets. To support massive connectivity and avoid signaling overhead caused by
the transmission of those small packets, this paper proposes a novel method to
improve the transmission efficiency for massive connections of wireless uplink
channel. The proposed method combines compressive sensing (CS) with power
domain NOMA jointly, especially neither the scheduling nor the centralized
power allocation is necessary in the method. Both the analysis and simulation
show that the method can support up to two or three times overloading.

On Low Complexity Detection for QAM Isomorphic Constellations

Farbod Kayhan
Comments: Submitted to IEEE GLOBECOM 2017
Subjects: Information Theory (cs.IT)

Despite of the known gap from the Shannon’s capacity, several standards are
still employing QAM or star shape constellations, mainly due to the existing
low complexity detectors. In this paper, we investigate the low complexity
detection for a family of QAM isomorphic constellations. These constellations
are known to perform very close to the peak-power limited capacity,
outperforming the DVB-S2X standard constellations. The proposed strategy is to
first remap the received signals to the QAM constellation using the existing
isomorphism and then break the log likelihood ratio computations to two one
dimensional PAM constellations. Gains larger than 0.6 dB with respect to QAM
can be obtained over the peak power limited channels without any increase in
detection complexity. Our scheme also provides a systematic way to design
constellations with low complexity one dimensional detectors. Several open
problems are discussed at the end of the paper.

Coverage and Rate of Downlink Sequence Transmissions with Reliability Guarantees

Jihong Park, Petar Popovski
Comments: 4 pages, 4 figures, submitted to IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Real-time distributed control is a promising application of 5G in which
communication links should satisfy certain reliability guarantees. In this
letter, we derive closed-form maximum average rate when a device (e.g.
industrial machine) downloads a sequence of n operational commands through
cellular connection, while guaranteeing a certain coverage for all n messages.
The result is based on novel closed-form n- successive coverage bounds. The
proposed lower bound provides a simple approximation that is increasingly
accurate in the high reliability region. For moderate coverage, a linear
combination of the proposed bounds enables deriving a tractable approximation.
Both techniques can be utilized for calculating the maximum average rate with
and without using adaptive modulation and coding (AMC).

On PGZ decoding of alternant codes

R. Farré, N. Sayols, S. Xambó-Descamps
Comments: 16 pages
Subjects: Information Theory (cs.IT)

In this note we first review the classical Petterson-Gorenstein-Zierler
decoding algorithm for the class of alternant codes (which includes
Reed-Solomon, Bose-Chaudhuri-Hocquenghem and classical Goppa codes), then we
present an improvement of the method to find the number of errors and the
errorlocator polynomial, and finally we illustrate the procedure with several
examples. In two appendices we sketch the main features of the system we have
used for the computations.

Secret Key Generation from Correlated Sources and Secure Link

Daming Cao, Wei Kang
Comments: 5 pages, 1 figure
Subjects: Information Theory (cs.IT)

In this paper, we study the problem of secret key generation from both
correlated sources and a secure channel. We obtain the optimal secret key rate
in this problem and show that the optimal scheme is to conduct secret key
generation and key distribution jointly, where every bit in the secret channel
will yield more than one bit of secret key rate. This joint scheme is better
than the separation-based scheme, where the secure channel is used for key
distribution, and as a result, every bit in the secure channel can only provide
one bit of secret key rate.

ECG Signal Compression and Optimization in Remote Monitoring Networks

Pengda Wong
Comments: 12 pages, 12 figures
Subjects: Information Theory (cs.IT)

We proposed a practical ECG compression system which is beneficial for
tele-monitoring cardiovascular diseases. There are two steps in the compression
framework. First, we partition ECG signal into segments according to R- to
R-wave periods. The partition aims at achieving more stable statistical
features between segments of ECG signal which is beneficial for saving bit
rates. After the partition, we optimize the bit rate in the sense of minimizing
ECG reconstruction error under a constraint of consumed bits. From the
experiment results, the proposed compression scheme is able to reduce the
computation for updating codebook, and save channel capacity resources for
transmitting ECG signals.

Mutual Information, Relative Entropy and Estimation Error in Semi-martingale Channels

Jiantao Jiao, Kartik Venkat, Tsachy Weissman
Subjects: Information Theory (cs.IT)

Fundamental relations between information and estimation have been
established in the literature for the continuous-time Gaussian and Poisson
channels, in a long line of work starting from the classical representation
theorems by Duncan and Kabanov respectively. In this work, we demonstrate that
such relations hold for a much larger family of continuous-time channels. We
introduce the family of semi-martingale channels where the channel output is a
semi-martingale stochastic process, and the channel input modulates the
characteristics of the semi-martingale. For these channels, which includes as a
special case the continuous time Gaussian and Poisson models, we establish new
representations relating the mutual information between the channel input and
output to an optimal causal filtering loss, thereby unifying and considerably
extending results from the Gaussian and Poisson settings. Extensions to the
setting of mismatched estimation are also presented where the relative entropy
between the laws governing the output of the channel under two different input
distributions is equal to the cumulative difference between the estimation loss
incurred by using the mismatched and optimal causal filters respectively. The
main tool underlying these results is the Doob–Meyer decomposition of a class
of likelihood ratio sub-martingales. The results in this work can be viewed as
the continuous-time analogues of recent generalizations for relations between
information and estimation for discrete-time L’evy channels.

Capacity of Cellular Wireless Network

Rahul Vaze, Srikanth Iyer
Comments: A shorter version to appear in WiOpt 2017
Subjects: Information Theory (cs.IT)

Earlier definitions of capacity for wireless networks, e.g., transport or
transmission capacity, for which exact theoretical results are known, are well
suited for ad hoc networks but are not directly applicable for cellular
wireless networks, where large-scale basestation (BS) coordination is not
possible, and retransmissions/ARQ under the SINR model is a universal feature.

In this paper, cellular wireless networks, where both BS locations and mobile
user (MU) locations are distributed as independent Poisson point processes are
considered, and each MU connects to its nearest BS. With ARQ, under the SINR
model, the effective downlink rate of packet transmission is the reciprocal of
the expected delay (number of retransmissions needed till success), which we
use as our network capacity definition after scaling it with the BS density.

Exact characterization of this natural capacity metric for cellular wireless
networks is derived. The capacity is shown to first increase polynomially with
the BS density in the low BS density regime and then scale inverse
exponentially with the increasing BS density. Two distinct upper bounds are
derived that are relevant for the low and the high BS density regimes. A single
power control strategy is shown to achieve the upper bounds in both the
regimes. This result is fundamentally different from the well known capacity
results for ad hoc networks, such as transport and transmission capacity that
scale as the square root of the (high) BS density. Our results show that the
strong temporal correlations of SINRs with PPP distributed BS locations is
limiting, and the realizable capacity in cellular wireless networks in high-BS
density regime is much smaller than previously thought. A byproduct of our
analysis shows that the capacity of the ALOHA strategy with retransmissions is
zero.

Satellite Based Positioning Signal Acquisition at Higher Order Cycle Frequency

Pengda Wong
Subjects: Information Theory (cs.IT)

The acquisition of the signal from the satellite based positioning systems,
such as GPS, Galileo, and Compass, encounters challenges in the urban streets,
indoor. For improving the acquisition performance, the data accumulation is
usually performed to improve the signal-to-noise ratio which is defined on the
second order statistics. Different from the conventional approaches, the
acquisition based on higher order cyclostatistics is proposed. Using the
cyclostatistics, the estimation of the initial phase and Doppler shift of the
signal is presented respectively. Afterwards, a joint estimator is introduced.
The analysis in this paper is performed on GPS signal. Indeed, the proposed
estimation method can be straightforwardly extended to acquire the signal from
the other satellite positioning systems. The simulation and experiment results
demonstrate that the proposed signal acquisition scheme achieves the detection
probability of 0.9 at the CNR 28dBHz.

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Sanghamitra Dutta, Viveck Cadambe, Pulkit Grover
Comments: Presented at NIPS 2016, Barcelona, Spain
Subjects: Information Theory (cs.IT)

Faced with saturation of Moore’s law and increasing dimension of data, system
designers have increasingly resorted to parallel and distributed computing.
However, distributed computing is often bottle necked by a small fraction of
slow processors called “stragglers” that reduce the speed of computation
because the fusion node has to wait for all processors to finish. To combat the
effect of stragglers, recent literature introduces redundancy in computations
across processors, e.g.,~using repetition-based strategies or erasure codes.
The fusion node can exploit this redundancy by completing the computation using
outputs from only a subset of the processors, ignoring the stragglers. In this
paper, we propose a novel technique — that we call “Short-Dot” — to introduce
redundant computations in a coding theory inspired fashion, for computing
linear transforms of long vectors. Instead of computing long dot products as
required in the original linear transform, we construct a larger number of
redundant and short dot products that can be computed faster and more
efficiently at individual processors. In reference to comparable schemes that
introduce redundancy to tackle stragglers, Short-Dot reduces the cost of
computation, storage and communication since shorter portions are stored and
computed at each processor, and also shorter portions of the input is
communicated to each processor. We demonstrate through probabilistic analysis
as well as experiments that Short-Dot offers significant speed-up compared to
existing techniques. We also derive trade-offs between the length of the
dot-products and the resilience to stragglers (number of processors to wait
for), for any such strategy and compare it to that achieved by our strategy.

Performance Analysis of Slotted Secondary Transmission with Adaptive Modulation under Interweave Cognitive Radio Implementation

Wen-Jing Wang, Hong-Chuan Yang
Subjects: Information Theory (cs.IT)

In cognitive radio communication, unlicensed secondary user (SU) can access
under-utilized spectrum of the licensed primary user (PU) opportunistically for
emerging wireless applications. With interweave implementation, SU has to
perform spectrum sensing on the target frequency band and waits for
transmission if PU occupies the channel. This waiting time results in extra
delay for secondary transmission. In this paper, the delay and throughput
performance of secondary packet transmission is evaluated with slotted
transmission protocol. We propose a discrete-time Markov model to characterize
secondary slotted transmission process. Close-form solution of collision
probability is obtained. We then carry out the queuing delay and throughput
analysis based on a two-dimensional-finite-state Markov chain for small-size
packet transmission. For large-size packets, the distribution function of
extended delivery time for secondary packet transmission is also derived.
Selected numerical results are presented to illustrate the mathematical
formulas and to validate our research results.

Approximations from Anywhere and General Rough Sets

Not all approximations arise from information systems. The problem of fitting
approximations, subjected to some rules (and related data), to information
systems in a rough scheme of things is known as the emph{inverse problem}. The
inverse problem is more general than the duality (or abstract representation)
problems and was introduced by the present author in her earlier papers. From
the practical perspective, a few (as opposed to one) theoretical frameworks may
be suitable for formulating the problem itself. emph{Granular operator spaces}
have been recently introduced and investigated by the present author in her
recent work in the context of antichain based and dialectical semantics for
general rough sets. The nature of the inverse problem is examined from
number-theoretic and combinatorial perspectives in a higher order variant of
granular operator spaces and some necessary conditions are proved. The results
and the novel approach would be useful in a number of unsupervised and semi
supervised learning contexts and algorithms.

Image Fusion With Cosparse Analysis Operator

The paper addresses the image fusion problem, where multiple images captured
with different focus distances are to be combined into a higher quality
all-in-focus image. Most current approaches for image fusion strongly rely on
the unrealistic noise-free assumption used during the image acquisition, and
then yield limited robustness in fusion processing. In our approach, we
formulate the multi-focus image fusion problem in terms of an analysis sparse
model, and simultaneously perform the restoration and fusion of multi-focus
images. Based on this model, we propose an analysis operator learning, and
define a novel fusion function to generate an all-in-focus image. Experimental
evaluations confirm the effectiveness of the proposed fusion approach both
visually and quantitatively, and show that our approach outperforms
state-of-the-art fusion methods.

Competitive Resource Allocation in HetNets: the Impact of Small-cell Spectrum Constraints and Investment Costs

Cheng Chen, Randall A. Berry, Michael L. Honig, Vijay G. Subramanian
Comments: 14 pages, 8 figures. submitted to IEEE Transactions on Cognitive Communications and Networking
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

Heterogeneous wireless networks with small-cell deployments in licensed and
unlicensed spectrum bands are a promising approach for expanding wireless
connectivity and service. As a result, wireless service providers (SPs) are
adding small-cells to augment their existing macro-cell deployments. This added
flexibility complicates network management, in particular, service pricing and
spectrum allocations across macro- and small-cells. Further, these decisions
depend on the degree of competition among SPs. Restrictions on shared spectrum
access imposed by regulators, such as low power constraints that lead to
small-cell deployments, along with the investment cost needed to add small
cells to an existing network, also impact strategic decisions and market
efficiency. If the revenue generated by small-cells does not cover the
investment cost, then there will be no deployment even if it increases social
welfare. We study the implications of such spectrum constraints and investment
costs on resource allocation and pricing decisions by competitive SPs, along
with the associated social welfare. Our results show that while the optimal
resource allocation taking constraints and investment into account can be
uniquely determined, adding those features with strategic SPs can have a
substantial effect on the equilibrium market structure.

Performance Impact of Base Station Antenna Heights in Dense Cellular Networks

Ming Ding, David Lopez-Perez
Comments: IEEE journal submission. Submission on Apr. 26, 2016. First-round decision (Major Revision) on Sep. 12, 2016. Major revision submitted on Nov. 7, 2016. Second-round in progress
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

In this paper, we present a new and significant theoretical discovery. If the
absolute height difference between base station (BS) antenna and user equipment
(UE) antenna is larger than zero, then the network performance in terms of both
the coverage probability and the area spectral efficiency (ASE) will
continuously decrease toward zero as the BS density increases for ultra-dense
(UD) small cell networks (SCNs). Such findings are completely different from
the conclusions in existing works, both quantitatively and qualitatively. In
particular, this performance behavior has a tremendous impact on the deployment
of UD SCNs in the 5th-generation (5G) era. Network operators may invest large
amounts of money in deploying more network infrastructure to only obtain an
even less network capacity. Our study results reveal that one way to address
this issue is to lower the SCN BS antenna height to the UE antenna height.
However, this requires a revolutionized approach of BS architecture and
deployment, which is explored in this paper too.

Does robustness imply tractability? A lower bound for planted clique in the semi-random model

Jacob Steinhardt
Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)

We consider a robust analog of the planted clique problem. In this analog, a
set (S) of vertices is chosen and all edges in (S) are included; then, edges
between (S) and the rest of the graph are included with probability
(frac{1}{2}), while edges not touching (S) are allowed to vary arbitrarily.
For this semi-random model, we show that the information-theoretic threshold
for recovery is ( ilde{Theta}(sqrt{n})), in sharp contrast to the classical
information-theoretic threshold of (Theta(log(n))). This matches the
conjectured computational threshold for the classical planted clique problem,
and thus raises the intriguing possibility that, once we require robustness,
there is no computational-statistical gap for planted clique.