IT博客汇 | arXiv Paper Daily: Wed, 19 Oct 2016

arXiv Paper Daily: Wed, 19 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-19 00:00:00

Neural and Evolutionary Computing

Scaling Up MAP-Elites Using Centroidal Voronoi Tessellations

Vassilis Vassiliades, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
Subjects: Neural and Evolutionary Computing (cs.NE)

The recently introduced Multi-dimensional Archive of Phenotypic Elites
(MAP-Elites) is an evolutionary algorithm capable of producing a large archive
of diverse, high-performing solutions in a single run. It works by discretizing
a continuous feature space into unique regions according to the desired
discretization per dimension. While simple, this algorithm has a main drawback:
it cannot scale to high-dimensional feature spaces since the number of regions
increase exponentially with the number of dimensions. In this paper, we address
this limitation by introducing a simple extension of MAP-Elites that has a
constant, pre-defined number of regions irrespective of the dimensionality of
the feature space. Our main insight is that methods from computational geometry
could partition a high-dimensional space into well-spread geometric regions. In
particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to
divide the feature space into a desired number of regions; it then places every
generated individual in its closest region, replacing a less fit one if the
region is already occupied. We demonstrate the effectiveness of the new
“CVT-MAP-Elites” algorithm in high-dimensional feature spaces through
comparisons against MAP-Elites in a hexapod locomotion task.

Design Mining Microbial Fuel Cell Cascades

Richard J. Preen, Jiseon You, Larry Bull, Ioannis A. Ieropoulos
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

Microbial fuel cells (MFCs) perform wastewater treatment and electricity
production through the conversion of organic matter using microorganisms. For
practical applications, it has been suggested that greater efficiency can be
achieved by arranging multiple MFC units into physical stacks in a cascade with
feedstock flowing sequentially between units. In this paper, we investigate the
use of computational intelligence to physically explore and optimise
(potentially) heterogeneous MFC designs in a cascade, i.e. without simulation.
Conductive structures are 3-D printed and inserted into the anodic chamber of
each MFC unit, augmenting a carbon fibre veil anode and affecting the
hydrodynamics, including the feedstock volume and hydraulic retention time, as
well as providing unique habitats for microbial colonisation. We show that it
is possible to use design mining to identify new conductive inserts that
increase both the cascade power output and power density.

Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Conceived in the early 1990s, Experience Replay (ER) has been shown to be a
successful mechanism to allow online learning algorithms to reuse past
experiences. Traditionally, ER can be applied to all machine learning paradigms
(i.e., unsupervised, supervised, and reinforcement learning). Recently, ER has
contributed to improving the performance of deep reinforcement learning. Yet,
its application to many practical settings is still limited by the memory
requirements of ER, necessary to explicitly store previous observations. To
remedy this issue, we explore a novel approach, Online Contrastive Divergence
with Generative Replay (OCD_GR), which uses the generative capability of
Restricted Boltzmann Machines (RBMs) instead of recorded past experiences. The
RBM is trained online, and does not require the system to store any of the
observed data points. We compare OCD_GR to ER on 9 real-world datasets,
considering a worst-case scenario (data points arriving in sorted order) as
well as a more realistic one (sequential random-order data points). Our results
show that in 64.28% of the cases OCD_GR outperforms ER and in the remaining
35.72% it has an almost equal performance, while having a considerably reduced
space complexity (i.e., memory usage) at a comparable time complexity.

Computer Vision and Pattern Recognition

Fast L1-NMF for Multiple Parametric Model Estimation

Mariano Tepper, Guillermo Sapiro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this work we introduce a comprehensive algorithmic pipeline for multiple
parametric model estimation. The proposed approach analyzes the information
produced by a random sampling algorithm (e.g., RANSAC) from a machine
learning/optimization perspective, using a extit{parameterless} biclustering
algorithm based on L1 nonnegative matrix factorization (L1-NMF). The proposed
framework exploits consistent patterns that naturally arise during the RANSAC
execution, while explicitly avoiding spurious inconsistencies. Contrarily to
the main trends in the literature, the proposed technique does not impose
non-intersecting parametric models. A new accelerated algorithm to compute
L1-NMFs allows to handle medium-sized problems faster while also extending the
usability of the algorithm to much larger datasets. This accelerated algorithm
has applications in any other context where an L1-NMF is needed, beyond the
biclustering approach to parameter estimation here addressed. We accompany the
algorithmic presentation with theoretical foundations and numerous and diverse
examples.

Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences

Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter
Comments: IJCV preprint manuscript
Journal-ref: International Journal of Computer Vision, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Understanding continuous human actions is a non-trivial but important problem
in computer vision. Although there exists a large corpus of work in the
recognition of action sequences, most approaches suffer from problems relating
to vast variations in motions, action combinations, and scene contexts. In this
paper, we introduce a novel method for semantic segmentation and recognition of
long and complex manipulation action tasks, such as “preparing a breakfast” or
“making a sandwich”. We represent manipulations with our recently introduced
“Semantic Event Chain” (SEC) concept, which captures the underlying
spatiotemporal structure of an action invariant to motion, velocity, and scene
context. Solely based on the spatiotemporal interactions between manipulated
objects and hands in the extracted SEC, the framework automatically parses
individual manipulation streams performed either sequentially or concurrently.
Using event chains, our method further extracts basic primitive elements of
each parsed manipulation. Without requiring any prior object knowledge, the
proposed framework can also extract object-like scene entities that exhibit the
same role in semantically similar manipulations. We conduct extensive
experiments on various recent datasets to validate the robustness of the
framework.

From Traditional to Modern : Domain Adaptation for Action Classification in Short Social Video Clips

Aditya Singh, Saurabh Saini, Rajvi Shah, P J Narayanan
Comments: 9 pages, GCPR, 2016
Journal-ref: Pattern Recognition,38th German Conference, GCPR 2016, Hannover,
Germany, September 12-15, 2016, Proceedings,pp 245-257
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Short internet video clips like vines present a significantly wild
distribution compared to traditional video datasets. In this paper, we focus on
the problem of unsupervised action classification in wild vines using
traditional labeled datasets. To this end, we use a data augmentation based
simple domain adaptation strategy. We utilise semantic word2vec space as a
common subspace to embed video features from both, labeled source domain and
unlablled target domain. Our method incrementally augments the labeled source
with target samples and iteratively modifies the embedding function to bring
the source and target distributions together. Additionally, we utilise a
multi-modal representation that incorporates noisy semantic information
available in form of hash-tags. We show the effectiveness of this simple
adaptation technique on a test set of vines and achieve notable improvements in
performance.

Deep Identity-aware Transfer of Facial Attributes

Mu Li, Wangmeng Zuo, David Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a Deep convolutional network model for Identity-Aware
Transfer (DIAT) of facial attributes. Given the source input image and the
reference attribute, DIAT aims to generate a facial image (i.e., target image)
that not only owns the reference attribute but also keep the same or similar
identity to the input image. We develop a two-stage scheme to transfer the
input image to each reference attribute label. A feed-forward transform network
is first trained by combining perceptual identity-aware loss and GAN-based
attribute loss, and a face enhancement network is then introduced to improve
the visual quality. We further define perceptual identity loss on the
convolutional feature maps of the attribute discriminator, resulting in a
DIAT-A model. Our DIAT and DIAT-A models can provide a unified solution for
several representative facial attribute transfer tasks such as expression
transfer, accessory removal, age progression, and gender transfer. The
experimental results validate their effectiveness. Even for some
identity-related attribute (e.g., gender), our DIAT-A can obtain visually
impressive results by changing the attribute while retaining most identity
features of the source image.

Master's Thesis : Deep Learning for Visual Recognition

Rémi Cadène, Nicolas Thome, Matthieu Cord
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The goal of our research is to develop methods advancing automatic visual
recognition. In order to predict the unique or multiple labels associated to an
image, we study different kind of Deep Neural Networks architectures and
methods for supervised features learning. We first draw up a state-of-the-art
review of the Convolutional Neural Networks aiming to understand the history
behind this family of statistical models, the limit of modern architectures and
the novel techniques currently used to train deep CNNs. The originality of our
work lies in our approach focusing on tasks with a low amount of data. We
introduce different models and techniques to achieve the best accuracy on
several kind of datasets, such as a medium dataset of food recipes (100k
images) for building a web API, or a small dataset of satellite images (6,000)
for the DSG online challenge that we’ve won. We also draw up the
state-of-the-art in Weakly Supervised Learning, introducing different kind of
CNNs able to localize regions of interest. Our last contribution is a
framework, build on top of Torch7, for training and testing deep models on any
visual recognition tasks and on datasets of any scale.

Edge Based Grid Super-Imposition for Crowd Emotion Recognition

Amol Patwardhan
Comments: 6 pages, 6 figure, 1 table, emotion, crowd, group, spontaneous, indoor, edge, grid, mesh, tracking, temporal feature
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Numerous automatic continuous emotion detection system studies have examined
mostly use of videos and images containing individual person expressing
emotions. This study examines the detection of spontaneous emotions in a group
and crowd settings. Edge detection was used with a grid of lines
superimposition to extract the features. The feature movement in terms of
movement from the reference point was used to track across sequences of images
from the color channel. Additionally the video data capturing was done on
spontaneous emotions invoked by watching sports events from group of
participants. The method was view and occlusion independent and the results
were not affected by presence of multiple people chaotically expressing various
emotions. The edge thresholds of 0.2 and grid thresholds of 20 showed the best
accuracy results. The overall accuracy of the group emotion classifier was
70.9%.

M2CAI Workflow Challenge: Convolutional Neural Networks with Time Smoothing and Hidden Markov Model for Video Frames Classification

Rémi Cadène, Thomas Robert, Nicolas Thome, Matthieu Cord
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Our approach is among the three best to tackle the M2CAI Workflow challenge.
The latter consists in recognizing the operation phase for each frames of
endoscopic videos. In this technical report, we compare several classification
models and temporal smoothing methods. Our submitted solution is a fine tuned
Residual Network-200 on 80% of the training set with temporal smoothing using
simple temporal averaging of the predictions and a Hidden Markov Model modeling
the sequence.

Shape-based defect classification for Non Destructive Testing

Gianni D'Angelo, Salvatore Rampone
Comments: 5 pages, IEEE International Workshop
Journal-ref: IEEE International Workshop on Metrology for Aerospace, Benevento,
Italy, June 4-5, 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE)

The aim of this work is to classify the aerospace structure defects detected
by eddy current non-destructive testing. The proposed method is based on the
assumption that the defect is bound to the reaction of the probe coil impedance
during the test. Impedance plane analysis is used to extract a feature vector
from the shape of the coil impedance in the complex plane, through the use of
some geometric parameters. Shape recognition is tested with three different
machine-learning based classifiers: decision trees, neural networks and Naive
Bayes. The performance of the proposed detection system are measured in terms
of accuracy, sensitivity, specificity, precision and Matthews correlation
coefficient. Several experiments are performed on dataset of eddy current
signal samples for aircraft structures. The obtained results demonstrate the
usefulness of our approach and the competiveness against existing descriptors.

Real-time analysis of cataract surgery videos using statistical models

Katia Charrière, Gwenolé Quellec, Mathieu Lamard, David Martiano, Guy Cazuguel, Gouenou Coatrieux, Béatrice Cochener
Comments: This is an extended version of a paper presented at the CBMI 2016 conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The automatic analysis of the surgical process, from videos recorded during
surgeries, could be very useful to surgeons, both for training and for
acquiring new techniques. The training process could be optimized by
automatically providing some targeted recommendations or warnings, similar to
the expert surgeon’s guidance. In this paper, we propose to reuse videos
recorded and stored during cataract surgeries to perform the analysis. The
proposed system allows to automatically recognize, in real time, what the
surgeon is doing: what surgical phase or, more precisely, what surgical step he
or she is performing. This recognition relies on the inference of a multilevel
statistical model which uses 1) the conditional relations between levels of
description (steps and phases) and 2) the temporal relations among steps and
among phases. The model accepts two types of inputs: 1) the presence of
surgical tools, manually provided by the surgeons, or 2) motion in videos,
automatically analyzed through the Content Based Video retrieval (CBVR)
paradigm. Different data-driven statistical models are evaluated in this paper.
For this project, a dataset of 30 cataract surgery videos was collected at
Brest University hospital. The system was evaluated in terms of area under the
ROC curve. Promising results were obtained using either the presence of
surgical tools ((A_z) = 0.983) or motion analysis ((A_z) = 0.759). The
generality of the method allows to adapt it to any kinds of surgeries. The
proposed solution could be used in a computer assisted surgery tool to support
surgeons during the surgery.

ARTiS: Appearance-based Action Recognition in Task Space

Markus Eich, Sareh Shirazi, Gordon Wyeth
Comments: Submitted to ICRA 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

To have a robot actively supporting a human during a collaborative task, it
is crucial that robots are able to identify the current action in order to
predict the next one. Common approaches make use of high-level knowledge, such
as object affordances, semantics or understanding of actions in terms of pre-
and post-conditions. These approaches often require hand-coded a priori
knowledge, time- and resource- intensive or supervised learning techniques. We
propose to reframe this problem as an appearance- based place recognition
problem. In our framework, we regard sequences of visual images of human
actions as a map in analogy to the visual place recognition problem. Observing
the task for the second time, our approach is able to recognize pre-observed
actions in a one-shot learning approach and is thereby able to recognize the
current observation in the task space. We propose two new methods for creating
and aligning action observations within a task map. We compare and verify our
approaches with real data of humans assembling an IKEA flat pack drawer.

Artificial Intelligence

Deep Amortized Inference for Probabilistic Programs

Daniel Ritchie, Paul Horsfall, Noah D. Goodman
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Probabilistic programming languages (PPLs) are a powerful modeling tool, able
to represent any computable probability distribution. Unfortunately,
probabilistic program inference is often intractable, and existing PPLs mostly
rely on expensive, approximate sampling-based methods. To alleviate this
problem, one could try to learn from past inferences, so that future inferences
run faster. This strategy is known as amortized inference; it has recently been
applied to Bayesian networks and deep generative models. This paper proposes a
system for amortized inference in PPLs. In our system, amortization comes in
the form of a parameterized guide program. Guide programs have similar
structure to the original program, but can have richer data flow, including
neural network components. These networks can be optimized so that the guide
approximately samples from the posterior distribution defined by the original
program. We present a flexible interface for defining guide programs and a
stochastic gradient-based scheme for optimizing guide parameters, as well as
some preliminary results on automatically deriving guide programs. We explore
in detail the common machine learning pattern in which a ‘local’ model is
specified by ‘global’ random values and used to generate independent observed
data points; this gives rise to amortized local inference supporting global
model learning.

Census Signal Temporal Logic Inference for Multi-Agent Group Behavior Analysis

Zhe Xu, Agung Julius
Comments: 26 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Logic (math.LO); Optimization and Control (math.OC)

In this paper, we define a novel census signal temporal logic (CensusSTL)
that focuses on the number of agents in different subsets of a group that
complete a certain task specified by the signal temporal logic (STL). CensusSTL
consists of an “inner logic” STL formula and an “outer logic” STL formula. We
present a new inference algorithm to infer CensusSTL formulae from the
trajectory data of a group of agents. We first identify the “inner logic” STL
formula and then infer the subgroups based on whether the agents’ behaviors
satisfy the “inner logic” formula at each time point. We use two different
approaches to infer the subgroups based on similarity and complementarity,
respectively. The “outer logic” CensusSTL formula is inferred from the census
trajectories of different subgroups. We apply the algorithm in analyzing data
from a soccer match by inferring the CensusSTL formula for different subgroups
of a soccer team.

Identifiability and Transportability in Dynamic Causal Networks

Gilles Blondel, Marta Arias, Ricard Gavaldà
Comments: Presented at the 2016 ACM SIGKDD Workshop on Causal Discovery
Subjects: Artificial Intelligence (cs.AI)

In this paper we propose a causal analog to the purely observational Dynamic
Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound
and complete algorithm for identification of Dynamic Causal Net- works, namely,
for computing the effect of an intervention or experiment, based on passive
observations only, whenever possible. We note the existence of two types of
confounder variables that affect in substantially different ways the iden-
tification procedures, a distinction with no analog in either Dynamic Bayesian
Networks or standard causal graphs. We further propose a procedure for the
transportability of causal effects in Dynamic Causal Network settings, where
the re- sult of causal experiments in a source domain may be used for the
identification of causal effects in a target domain.

Weighted Positive Binary Decision Diagrams for Exact Probabilistic Inference

Giso H. Dal, Peter J.F. Lucas
Comments: 30 pages
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Recent work on weighted model counting has been very successfully applied to
the problem of probabilistic inference in Bayesian networks. The probability
distribution is encoded into a Boolean normal form and compiled to a target
language, in order to represent local structure expressed among conditional
probabilities more efficiently. We show that further improvements are possible,
by exploiting the knowledge that is lost during the encoding phase and
incorporating it into a compiler inspired by Satisfiability Modulo Theories.
Constraints among variables are used as a background theory, which allows us to
optimize the Shannon decomposition. We propose a new language, called Weighted
Positive Binary Decision Diagrams, that reduces the cost of probabilistic
inference by using this decomposition variant to induce an arithmetic circuit
of reduced size.

Diagnosis of aerospace structure defects by a HPC implemented soft computing algorithm

Gianni D'Angelo, Salvatore Rampone
Comments: 5 pages, IEEE International Workshop on Metrology for Aerospace
Journal-ref: IEEE International Workshop on Metrology for Aerospace, Benevento,
Italy, May 29-30, 2014
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Data Analysis, Statistics and Probability (physics.data-an)

This study concerns with the diagnosis of aerospace structure defects by
applying a HPC parallel implementation of a novel learning algorithm, named
U-BRAIN. The Soft Computing approach allows advanced multi-parameter data
processing in composite materials testing. The HPC parallel implementation
overcomes the limits due to the great amount of data and the complexity of data
processing. Our experimental results illustrate the effectiveness of the
U-BRAIN parallel implementation as defect classifier in aerospace structures.
The resulting system is implemented on a Linux-based cluster with multi-core
architecture.

Makespan Optimal Solving of Cooperative Path-Finding via Reductions to Propositional Satisfiability

Pavel Surynek
Subjects: Artificial Intelligence (cs.AI)

The problem of makespan optimal solving of cooperative path finding (CPF) is
addressed in this paper. The task in CPF is to relocate a group of agents in a
non-colliding way so that each agent eventually reaches its goal location from
the given initial location. The abstraction adopted in this work assumes that
agents are discrete items moving in an undirected graph by traversing edges.
Makespan optimal solving of CPF means to generate solutions that are as short
as possi-ble in terms of the total number of time steps required for the
execution of the solution.

We show that reducing CPF to propositional satisfiability (SAT) represents a
viable option for obtaining makespan optimal solutions. Several encodings of
CPF into propositional formulae are suggested and experimentally evaluated. The
evaluation indicates that SAT based CPF solving outperforms other makespan
optimal methods significantly in highly constrained situations (environments
that are densely occupied by agents).

VRPBench: A Vehicle Routing Benchmark Tool

Guilherme A. Zeni, Mauro Menzori, P. S. Martins, Luis A. A. Meira
Subjects: Artificial Intelligence (cs.AI)

The number of optimization techniques in the combinatorial domain is large
and diversified. Nevertheless, there is still a lack of real benchmarks to
validate optimization algorithms. In this work we introduce VRPBench, a tool to
create instances and visualize solutions to the Vehicle Routing Problem (VRP)
in a planar graph embedded in the Euclidean 2D space. We use VRPBench to model
a real-world mail delivery case of the city of Artur Nogueira. Such scenarios
were characterized as a multi-objective optimization of the VRP. We extracted a
weighted graph from a digital map of the city to create a challenging benchmark
for the VRP. Each instance models one generic day of mail delivery with
hundreds to thousands of delivery points, thus allowing both the comparison and
validation of optimization algorithms for routing problems.

Design Mining Microbial Fuel Cell Cascades

Microbial fuel cells (MFCs) perform wastewater treatment and electricity
production through the conversion of organic matter using microorganisms. For
practical applications, it has been suggested that greater efficiency can be
achieved by arranging multiple MFC units into physical stacks in a cascade with
feedstock flowing sequentially between units. In this paper, we investigate the
use of computational intelligence to physically explore and optimise
(potentially) heterogeneous MFC designs in a cascade, i.e. without simulation.
Conductive structures are 3-D printed and inserted into the anodic chamber of
each MFC unit, augmenting a carbon fibre veil anode and affecting the
hydrodynamics, including the feedstock volume and hydraulic retention time, as
well as providing unique habitats for microbial colonisation. We show that it
is possible to use design mining to identify new conductive inserts that
increase both the cascade power output and power density.

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Pranay Dighe, Afsaneh Asaei, Herve Bourlard
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Conventional deep neural networks (DNN) for speech acoustic modeling rely on
Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
class labels as the targets for DNN training. Subword classes in speech
recognition systems correspond to context-dependent tied states or senones. The
present work addresses some limitations of GMM-HMM senone alignments for DNN
training. We hypothesize that the senone probabilities obtained from a DNN
trained with binary labels can provide more accurate targets to learn better
acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
high dimensional unstructured noise, whereas the informative components are
structured and low-dimensional. We exploit principle component analysis (PCA)
and sparse coding to characterize the senone subspaces. Enhanced probabilities
obtained from low-rank and sparse reconstructions are used as soft-targets for
DNN acoustic modeling, that also enables training with untranscribed data.
Experiments conducted on AMI corpus shows 4.6% relative reduction in word error
rate.

Computation and Language

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Pranay Dighe, Afsaneh Asaei, Herve Bourlard
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Conventional deep neural networks (DNN) for speech acoustic modeling rely on
Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
class labels as the targets for DNN training. Subword classes in speech
recognition systems correspond to context-dependent tied states or senones. The
present work addresses some limitations of GMM-HMM senone alignments for DNN
training. We hypothesize that the senone probabilities obtained from a DNN
trained with binary labels can provide more accurate targets to learn better
acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
high dimensional unstructured noise, whereas the informative components are
structured and low-dimensional. We exploit principle component analysis (PCA)
and sparse coding to characterize the senone subspaces. Enhanced probabilities
obtained from low-rank and sparse reconstructions are used as soft-targets for
DNN acoustic modeling, that also enables training with untranscribed data.
Experiments conducted on AMI corpus shows 4.6% relative reduction in word error
rate.

Stylometric Analysis of Early Modern Period English Plays

Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Function word adjacency networks (WANs) are used to study the authorship of
plays from the Early Modern English period. In these networks, nodes are
function words and directed edges between two nodes represent the likelihood of
ordered co-appearance of the two words. For every analyzed play a WAN is
constructed and these are aggregated to generate author profile networks. We
first study the similarity of writing styles between Early English playwrights
by comparing the profile WANs. The accuracy of using WANs for authorship
attribution is then demonstrated by attributing known plays among six popular
playwrights. The WAN method is shown to additionally outperform other
frequency-based methods on attributing Early English plays. This high
classification power is then used to investigate the authorship of anonymous
plays. Moreover, WANs are shown to be reliable classifiers even when
attributing collaborative plays. For several plays of disputed co- authorship,
a deeper analysis is performed by attributing every act and scene separately,
in which we both corroborate existing breakdowns and provide evidence of new
assignments. Finally, the impact of genre on attribution accuracy is examined
revealing that the genre of a play partially conditions the choice of the
function words used in it.

Vietnamese Named Entity Recognition using Token Regular Expressions and Bidirectional Inference

Phuong Le-Hong
Comments: Submitted to the VLSP Workshop 2016
Subjects: Computation and Language (cs.CL)

This paper describes an efficient approach to improve the accuracy of a named
entity recognition system for Vietnamese. The approach combines regular
expressions over tokens and a bidirectional inference method in a sequence
labelling model, which achieves an overall (F_1) score of 89.66% on a test set
of an evaluation compaign, organized in late 2016 by the Vietnamese Language
and Speech Processing (VLSP) community.

SYSTRAN's Pure Neural Machine Translation Systems

Josep Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang, Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, Raoum Khiari, Byeongil Ko, Catherine Kobus, Jean Lorieux, Leidiana Martins, Dang-Chuan Nguyen, Alexandra Priori, Thomas Riccardi, Natalia Segal, Christophe Servan, Cyril Tiquet, Bo Wang, Jin Yang, Dakun Zhang, Jing Zhou, Peter Zoldan
Subjects: Computation and Language (cs.CL)

Since the first online demonstration of Neural Machine Translation (NMT) by
LISA, NMT development has recently moved from laboratory to production systems
as demonstrated by several entities announcing roll-out of NMT engines to
replace their existing technologies. NMT systems have a large number of
training configurations and the training process of such systems is usually
very long, often a few weeks, so role of experimentation is critical and
important to share. In this work, we present our approach to production-ready
systems simultaneously with release of online demonstrators covering a large
variety of languages (12 languages, for 32 language pairs). We explore
different practical choices: an efficient and evolutive open-source framework;
data preparation; network architecture; additional implemented features; tuning
for production; etc. We discuss about evaluation methodology, present our first
findings and we finally outline further work.

Our ultimate goal is to share our expertise to build competitive production
systems for “generic” translation. We aim at contributing to set up a
collaborative framework to speed-up adoption of the technology, foster further
research efforts and enable the delivery and adoption to/by industry of
use-case specific engines integrated in real production workflows. Mastering of
the technology would allow us to build translation engines suited for
particular needs, outperforming current simplest/uniform systems.

Addressing Community Question Answering in English and Arabic

Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Alessandro Moschitti, Shafiq Joty, Fahad A. Al Obaidli, Kateryna Tymoshenko, Antonio Uva
Comments: presented at Second WebQA workshop, SIGIR2016 (this http URL)
Subjects: Computation and Language (cs.CL)

This paper studies the impact of different types of features applied to
learning to re-rank questions in community Question Answering. We tested our
models on two datasets released in SemEval-2016 Task 3 on “Community Question
Answering”. Task 3 targeted real-life Web fora both in English and Arabic. Our
models include bag-of-words features (BoW), syntactic tree kernels (TKs), rank
features, embeddings, and machine translation evaluation features. To the best
of our knowledge, structural kernels have barely been applied to the question
reranking task, where they have to model paraphrase relations. In the case of
the English question re-ranking task, we compare our learning to rank (L2R)
algorithms against a strong baseline given by the Google-generated ranking
(GR). The results show that i) the shallow structures used in our TKs are
robust enough to noisy data and ii) improving GR is possible, but effective BoW
features and TKs along with an accurate model of GR features in the used L2R
algorithm are required. In the case of the Arabic question re-ranking task, for
the first time we applied tree kernels on syntactic trees of Arabic sentences.
Our approaches to both tasks obtained the second best results on SemEval-2016
subtasks B on English and D on Arabic.

Personalized Machine Translation: Preserving Original Author Traits

Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner
Comments: 10 pages
Subjects: Computation and Language (cs.CL)

The language that we produce reflects our personality, and various personal
and demographic characteristics can be detected in natural language texts. We
focus on one particular personal trait, gender, and study how it is manifested
in original texts and in translations. We show that gender has a powerful,
clear signal in originals, but this signal is obfuscated in human and machine
translation. We then propose simple domain-adaptation techniques that help
retain the original gender traits in the translation outcome, without harming
the quality of the translation, thereby creating more personalized machine
translation systems.

End-to-end attention-based distant speech recognition with Highway LSTM

Hassan Taherian
Comments: 8 pages, 2 figures
Subjects: Computation and Language (cs.CL)

End-to-end attention-based models have been shown to be competitive
alternatives to conventional DNN-HMM models in the Speech Recognition Systems.
In this paper, we extend existing end-to-end attention-based models that can be
applied for Distant Speech Recognition (DSR) task. Specifically, we propose an
end-to-end attention-based speech recognizer with multichannel input that
performs sequence prediction directly at the character level. To gain a better
performance, we also incorporate Highway long short-term memory (HLSTM) which
outperforms previous models on AMI distant speech recognition task.

The infochemical core

Antoni Hernández-Fernández, Ramon Ferrer-i-Cancho
Journal-ref: The infochemical core. Journal of Quantitative Linguistics 23 (2),
133-153 (2016)
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL)

Vocalizations and less often gestures have been the object of linguistic
research over decades. However, the development of a general theory of
communication with human language as a particular case requires a clear
understanding of the organization of communication through other means.
Infochemicals are chemical compounds that carry information and are employed by
small organisms that cannot emit acoustic signals of optimal frequency to
achieve successful communication. Here the distribution of infochemicals across
species is investigated when they are ranked by their degree or the number of
species with which it is associated (because they produce or they are sensitive
to it). The quality of the fit of different functions to the dependency between
degree and rank is evaluated with a penalty for the number of parameters of the
function. Surprisingly, a double Zipf (a Zipf distribution with two regimes
with a different exponent each) is the model yielding the best fit although it
is the function with the largest number of parameters. This suggests that the
world wide repertoire of infochemicals contains a chemical nucleus shared by
many species and reminiscent of the core vocabularies found for human language
in dictionaries or large corpora.

A General Framework for Content-enhanced Network Representation Learning

Xiaofei Sun, Jiang Guo, Xiao Ding, Ting Liu
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Learning (cs.LG)

This paper investigates the problem of network embedding, which aims at
learning low-dimensional vector representation of nodes in networks. Most
existing network embedding methods rely solely on the network structure, i.e.,
the linkage relationships between nodes, but ignore the rich content
information associated with it, which is common in real world networks and
beneficial to describing the characteristics of a node. In this paper, we
propose content-enhanced network embedding (CENE), which is capable of jointly
leveraging the network structure and the content information. Our approach
integrates text modeling and structure modeling in a general framework by
treating the content information as a special kind of node. Experiments on
several real world net- works with application to node classification show that
our models outperform all existing network embedding methods, demonstrating the
merits of content information and joint learning.

Distributed, Parallel, and Cluster Computing

Distributed Computation of Mixing Time

Anisur Rahaman Molla, Gopal Pandurangan
Comments: To appear in the Proceedings of ICDCN 2017, 10 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

The mixing time of a graph is an important metric, which is not only useful
in analyzing connectivity and expansion properties of the network, but also
serves as a key parameter in designing efficient algorithms. We present an
efficient distributed algorithm for computing the mixing time of undirected
graphs. Our algorithm estimates the mixing time ( au_s) (with respect to a
source node (s)) of any (n)-node undirected graph in (O( au_s log n)) rounds.
Our algorithm is based on random walks and require very little memory and use
lightweight local computations, and work in the CONGEST model. Hence our
algorithm is scalable under bandwidth constraints and can be an helpful
building block in the design of topologically aware networks.

High-performance K-means Implementation based on a Coarse-grained Map-Reduce Architecture

Zhehao Li, Jifang Jin, Lingli Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)

The k-means algorithm is one of the most common clustering algorithms and
widely used in data mining and pattern recognition. The increasing
computational requirement of big data applications makes hardware acceleration
for the k-means algorithm necessary. In this paper, a coarse-grained Map-Reduce
architecture is proposed to implement the k-means algorithm on an FPGA.
Algorithmic segmentation, data path elaboration and automatic control are
applied to optimize the architecture for high performance. In addition, high
level synthesis technique is utilized to reduce development cycles and
complexity. For a single iteration in the k-means algorithm, a throughput of
28.74 Gbps is achieved. The performance shows at least 3.93x speedup compared
with four representative existing FPGA-based implementations and can satisfy
the demand of big data applications.

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson
Comments: 10 pages, 3 figures
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

This paper presents an asynchronous incremental aggregated gradient algorithm
and its implementation in a parameter server framework for solving regularized
optimization problems. The algorithm can handle both general convex (possibly
non-smooth) regularizers and general convex constraints. When the empirical
data loss is strongly convex, we establish linear convergence rate, give
explicit expressions for step-size choices that guarantee convergence to the
optimum, and bound the associated convergence factors. The expressions have an
explicit dependence on the degree of asynchrony and recover classical results
under synchronous operation. Simulations and implementations on commercial
compute clouds validate our findings.

Learning

Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Conceived in the early 1990s, Experience Replay (ER) has been shown to be a
successful mechanism to allow online learning algorithms to reuse past
experiences. Traditionally, ER can be applied to all machine learning paradigms
(i.e., unsupervised, supervised, and reinforcement learning). Recently, ER has
contributed to improving the performance of deep reinforcement learning. Yet,
its application to many practical settings is still limited by the memory
requirements of ER, necessary to explicitly store previous observations. To
remedy this issue, we explore a novel approach, Online Contrastive Divergence
with Generative Replay (OCD_GR), which uses the generative capability of
Restricted Boltzmann Machines (RBMs) instead of recorded past experiences. The
RBM is trained online, and does not require the system to store any of the
observed data points. We compare OCD_GR to ER on 9 real-world datasets,
considering a worst-case scenario (data points arriving in sorted order) as
well as a more realistic one (sequential random-order data points). Our results
show that in 64.28% of the cases OCD_GR outperforms ER and in the remaining
35.72% it has an almost equal performance, while having a considerably reduced
space complexity (i.e., memory usage) at a comparable time complexity.

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon
Subjects: Learning (cs.LG)

Federated Learning is a machine learning setting where the goal is to train a
high-quality centralized model with training data distributed over a large
number of clients each with unreliable and relatively slow network connections.
We consider learning algorithms for this setting where on each round, each
client independently computes an update to the current model based on its local
data, and communicates this update to a central server, where the client-side
updates are aggregated to compute a new global model. The typical clients in
this setting are mobile phones, and communication efficiency is of utmost
importance. In this paper, we propose two ways to reduce the uplink
communication costs. The proposed methods are evaluated on the application of
training a deep neural network to perform image classification. Our best
approach reduces the upload communication required to train a reasonable model
by two orders of magnitude.

Provably Good Early Detection of Diseases using Non-Sparse Covariance-Regularized Linear Discriminant Analysis

Haoyi Xiong, Yanjie Fu, Wenqing Hu, Guanling Chen, Laura E. Barnes
Subjects: Learning (cs.LG)

To improve the performance of Linear Discriminant Analysis (LDA) for early
detection of diseases using Electronic Health Records (EHR) data, we propose
TheName{} — a novel framework for emph{underline{E}HR based
underline{E}arly underline{D}etection of underline{D}iseases} on top of
emph{Covariance-Regularized} LDA models. Specifically, TheName employs a
emph{non-sparse} inverse covariance matrix (or namely precision matrix)
estimator derived from graphical lasso and incorporates the estimator into LDA
classifiers to improve classification accuracy. Theoretical analysis on
TheName shows that it can bound the expected error rate of LDA
classification, under certain assumptions. Finally, we conducted extensive
experiments using a large-scale real-world EHR dataset — CHSN. We compared our
solution with other regularized LDA and downstream classifiers. The result
shows TheName outperforms all baselines and backups our theoretical analysis.

Sequential Learning without Feedback

Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama
Subjects: Learning (cs.LG)

In many security and healthcare systems a sequence of features/sensors/tests
are used for detection and diagnosis. Each test outputs a prediction of the
latent state, and carries with it inherent costs. Our objective is to {it
learn} strategies for selecting tests to optimize accuracy & costs.
Unfortunately it is often impossible to acquire in-situ ground truth
annotations and we are left with the problem of unsupervised sensor selection
(USS). We pose USS as a version of stochastic partial monitoring problem with
an {it unusual} reward structure (even noisy annotations are unavailable).
Unsurprisingly no learner can achieve sublinear regret without further
assumptions. To this end we propose the notion of weak-dominance. This is a
condition on the joint probability distribution of test outputs and latent
state and says that whenever a test is accurate on an example, a later test in
the sequence is likely to be accurate as well. We empirically verify that weak
dominance holds on real datasets and prove that it is a maximal condition for
achieving sublinear regret. We reduce USS to a special case of multi-armed
bandit problem with side information and develop polynomial time algorithms
that achieve sublinear regret.

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

Some machine learning applications involve training data that is sensitive,
such as the medical histories of patients in a clinical trial. A model may
inadvertently and implicitly store some of its training data; careful analysis
of the model may therefore reveal sensitive information.

To address this problem, we demonstrate a generally applicable approach to
providing strong privacy guarantees for training data. The approach combines,
in a black-box fashion, multiple models trained with disjoint datasets, such as
records from different subsets of users. Because they rely directly on
sensitive data, these models are not published, but instead used as teachers
for a student model. The student learns to predict an output chosen by noisy
voting among all of the teachers, and cannot directly access an individual
teacher or the underlying data or parameters. The student’s privacy properties
can be understood both intuitively (since no single teacher and thus no single
dataset dictates the student’s training) and formally, in terms of differential
privacy. These properties hold even if an adversary can not only query the
student but also inspect its internal workings.

Compared with previous work, the approach imposes only weak assumptions on
how teachers are trained: it applies to any model, including non-convex models
like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and
SVHN thanks to an improved privacy analysis and semi-supervised learning.

Deep Amortized Inference for Probabilistic Programs

Daniel Ritchie, Paul Horsfall, Noah D. Goodman
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Probabilistic programming languages (PPLs) are a powerful modeling tool, able
to represent any computable probability distribution. Unfortunately,
probabilistic program inference is often intractable, and existing PPLs mostly
rely on expensive, approximate sampling-based methods. To alleviate this
problem, one could try to learn from past inferences, so that future inferences
run faster. This strategy is known as amortized inference; it has recently been
applied to Bayesian networks and deep generative models. This paper proposes a
system for amortized inference in PPLs. In our system, amortization comes in
the form of a parameterized guide program. Guide programs have similar
structure to the original program, but can have richer data flow, including
neural network components. These networks can be optimized so that the guide
approximately samples from the posterior distribution defined by the original
program. We present a flexible interface for defining guide programs and a
stochastic gradient-based scheme for optimizing guide parameters, as well as
some preliminary results on automatically deriving guide programs. We explore
in detail the common machine learning pattern in which a ‘local’ model is
specified by ‘global’ random values and used to generate independent observed
data points; this gives rise to amortized local inference supporting global
model learning.

Fast L1-NMF for Multiple Parametric Model Estimation

Mariano Tepper, Guillermo Sapiro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this work we introduce a comprehensive algorithmic pipeline for multiple
parametric model estimation. The proposed approach analyzes the information
produced by a random sampling algorithm (e.g., RANSAC) from a machine
learning/optimization perspective, using a extit{parameterless} biclustering
algorithm based on L1 nonnegative matrix factorization (L1-NMF). The proposed
framework exploits consistent patterns that naturally arise during the RANSAC
execution, while explicitly avoiding spurious inconsistencies. Contrarily to
the main trends in the literature, the proposed technique does not impose
non-intersecting parametric models. A new accelerated algorithm to compute
L1-NMFs allows to handle medium-sized problems faster while also extending the
usability of the algorithm to much larger datasets. This accelerated algorithm
has applications in any other context where an L1-NMF is needed, beyond the
biclustering approach to parameter estimation here addressed. We accompany the
algorithmic presentation with theoretical foundations and numerous and diverse
examples.

Feasibility Based-Large Margin Nearest Neighbor Metric Learning

Babak Hosseini, Barbara Hammer
Comments: This is the preprint of a submitted conference paper as provided by the authors
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

In the area of data classification, one of the prominent algorithms is the
large margin nearest neighbor (LMNN) approach which is a metric learning to
enhance the performance of the popular k-nearest neighbor classifier. In
principles, LMNN learns a more efficient metric in the input space by using a
linear mapping as the outcome of a convex optimization problem. However, one of
the greatest weak points of LMNN is the strong reliance of its optimization
paradigm on how the neighboring points are chosen. In this paper, it is
mathematically proved for the first time that the regular way of choosing the
target points can lead to non-feasible optimization conditions regardless of
the number of chosen neighboring points. We present a mathematical approach to
categorize the target points into feasible and infeasible exemplars, an also we
provide a feasibility measure for preference of the target candidates. In our
proposed Feasibility Based-LMNN algorithm, we use the above clue to construct
the optimization problem based on the most promising general mapping directions
in the input space. Our empirical results shows that via using the proposed
FB-LMNN approach the optimization problem will converge in a better optimum
point, and therefor leads to better classification on the well-known benchmark
datasets.

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Pranay Dighe, Afsaneh Asaei, Herve Bourlard
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Conventional deep neural networks (DNN) for speech acoustic modeling rely on
Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
class labels as the targets for DNN training. Subword classes in speech
recognition systems correspond to context-dependent tied states or senones. The
present work addresses some limitations of GMM-HMM senone alignments for DNN
training. We hypothesize that the senone probabilities obtained from a DNN
trained with binary labels can provide more accurate targets to learn better
acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
high dimensional unstructured noise, whereas the informative components are
structured and low-dimensional. We exploit principle component analysis (PCA)
and sparse coding to characterize the senone subspaces. Enhanced probabilities
obtained from low-rank and sparse reconstructions are used as soft-targets for
DNN acoustic modeling, that also enables training with untranscribed data.
Experiments conducted on AMI corpus shows 4.6% relative reduction in word error
rate.

Markov Chain Truncation for Doubly-Intractable Inference

Colin Wei, Iain Murray
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Computing partition functions, the normalizing constants of probability
distributions, is often hard. Variants of importance sampling give unbiased
estimates of a normalizer Z, however, unbiased estimates of the reciprocal 1/Z
are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo
sampling of “doubly-intractable” distributions, such as the parameter posterior
for Markov Random Fields or Exponential Random Graphs. We demonstrate how to
construct unbiased estimates for 1/Z given access to black-box importance
sampling estimators for Z. We adapt recent work on random series truncation and
Markov chain coupling, producing estimators with lower variance and a higher
percentage of positive estimates than before. Our debiasing algorithms are
simple to implement, and have some theoretical and empirical advantages over
existing methods.

Stylometric Analysis of Early Modern Period English Plays

Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Function word adjacency networks (WANs) are used to study the authorship of
plays from the Early Modern English period. In these networks, nodes are
function words and directed edges between two nodes represent the likelihood of
ordered co-appearance of the two words. For every analyzed play a WAN is
constructed and these are aggregated to generate author profile networks. We
first study the similarity of writing styles between Early English playwrights
by comparing the profile WANs. The accuracy of using WANs for authorship
attribution is then demonstrated by attributing known plays among six popular
playwrights. The WAN method is shown to additionally outperform other
frequency-based methods on attributing Early English plays. This high
classification power is then used to investigate the authorship of anonymous
plays. Moreover, WANs are shown to be reliable classifiers even when
attributing collaborative plays. For several plays of disputed co- authorship,
a deeper analysis is performed by attributing every act and scene separately,
in which we both corroborate existing breakdowns and provide evidence of new
assignments. Finally, the impact of genre on attribution accuracy is examined
revealing that the genre of a play partially conditions the choice of the
function words used in it.

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

This paper presents an asynchronous incremental aggregated gradient algorithm
and its implementation in a parameter server framework for solving regularized
optimization problems. The algorithm can handle both general convex (possibly
non-smooth) regularizers and general convex constraints. When the empirical
data loss is strongly convex, we establish linear convergence rate, give
explicit expressions for step-size choices that guarantee convergence to the
optimum, and bound the associated convergence factors. The expressions have an
explicit dependence on the degree of asynchrony and recover classical results
under synchronous operation. Simulations and implementations on commercial
compute clouds validate our findings.

An Interactive Machine Learning Framework

Teng Lee, James Johnson, Steve Cheng
Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)

Machine learning (ML) is believed to be an effective and efficient tool to
build reliable prediction model or extract useful structure from an avalanche
of data. However, ML is also criticized by its difficulty in interpretation and
complicated parameter tuning. In contrast, visualization is able to well
organize and visually encode the entangled information in data and guild
audiences to simpler perceptual inferences and analytic thinking. But large
scale and high dimensional data will usually lead to the failure of many
visualization methods. In this paper, we close a loop between ML and
visualization via interaction between ML algorithm and users, so machine
intelligence and human intelligence can cooperate and improve each other in a
mutually rewarding way. In particular, we propose “transparent boosting tree
(TBT)”, which visualizes both the model structure and prediction statistics of
each step in the learning process of gradient boosting tree to user, and
involves user’s feedback operations to trees into the learning process. In TBT,
ML is in charge of updating weights in learning model and filtering information
shown to user from the big data, while visualization is in charge of providing
a visual understanding of ML model to facilitate user exploration. It combines
the advantages of both ML in big data statistics and human in decision making
based on domain knowledge. We develop a user friendly interface for this novel
learning method, and apply it to two datasets collected from real applications.
Our study shows that making ML transparent by using interactive visualization
can significantly improve the exploration of ML algorithms, give rise to novel
insights of ML models, and integrates both machine and human intelligence.

Modern WLAN Fingerprinting Indoor Positioning Methods and Deployment Challenges

Ali Khalajmehrabadi, Nikolaos Gatsis, David Akopian
Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

Wireless Local Area Network (WLAN) has become a promising choice for indoor
positioning as the only existing and established infrastructure, to localize
the mobile and stationary users indoors. However, since WLAN has been initially
designed for wireless networking and not positioning, the localization task
based on WLAN signals has several challenges. Amongst the WLAN positioning
methods, WLAN fingerprinting localization has recently achieved great attention
due to its promising results. WLAN fingerprinting faces several challenges and
hence, in this paper, our goal is to overview these challenges and the
state-of-the-art solutions. This paper consists of three main parts: 1)
Conventional localization schemes; 2) State-of-the-art approaches; 3) Practical
deployment challenges. Since all the proposed methods in WLAN literature have
been conducted and tested in different settings, the reported results are not
equally comparable. So, we compare some of the main localization schemes in a
single real environment and assess their localization accuracy, positioning
error statistics, and complexity. Our results depict illustrative evaluation of
WLAN localization systems and guide to future improvement opportunities.

Structured Group Sparsity: A Novel Indoor WLAN Localization, Outlier Detection, and Radio Map Interpolation Scheme

Ali Khalajmehrabadi, Nikolaos Gatsis, David Akopian
Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

This paper introduces novel schemes for indoor localization, outlier
detection, and radio map interpolation using Wireless Local Area Networks
(WLANs). The localization method consists of a novel multicomponent
optimization technique that minimizes the squared (ell_{2})-norm of the
residuals between the radio map and the online Received Signal Strength (RSS)
measurements, the (ell_{1})-norm of the user’s location vector, and weighted
(ell_{2})-norms of layered groups of Reference Points (RPs). RPs are grouped
using a new criterion based on the similarity between the so-called Access
Point (AP) coverage vectors. In addition, since AP readings are prone to
containing inordinate readings, called outliers, an augmented optimization
problem is proposed to detect the outliers and localize the user with cleaned
online measurements. Moreover, a novel scheme to record fingerprints from a
smaller number of RPs and estimate the radio map at RPs without recorded
fingerprints is developed using sparse recovery techniques. All localization
schemes are tested on RSS fingerprints collected from a real environment. The
overall scheme has comparable complexity with competing approaches, while it
performs with high accuracy under a small number of APs and finer granularity
of RPs.

A Joint Indoor WLAN Localization and Outlier Detection Scheme Using LASSO and Elastic-Net Optimization Techniques

Ali Khalajmehrabadi, Nikolaos Gatsis, Daniel Pack, David Akopian
Subjects: Networking and Internet Architecture (cs.NI); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

In this paper, we introduce two indoor Wireless Local Area Network (WLAN)
positioning methods using augmented sparse recovery algorithms. These schemes
render a sparse user’s position vector, and in parallel, minimize the distance
between the online measurement and radio map. The overall localization scheme
for both methods consists of three steps: 1) coarse localization, obtained from
comparing the online measurements with clustered radio map. A novel graph-based
method is proposed to cluster the offline fingerprints. In the online phase, a
Region Of Interest (ROI) is selected within which we search for the user’s
location; 2) Access Point (AP) selection; and 3) fine localization through the
novel sparse recovery algorithms. Since the online measurements are subject to
inordinate measurement readings, called outliers, the sparse recovery methods
are modified in order to jointly estimate the outliers and user’s position
vector. The outlier detection procedure identifies the APs whose readings are
either not available or erroneous. The proposed localization methods have been
tested with Received Signal Strength (RSS) measurements in a typical office
environment and the results show that they can localize the user with
significantly high accuracy and resolution which is superior to the results
from competing WLAN fingerprinting localization methods.

Information Theory

Uniform Recovery from Subgaussian Multi-Sensor Measurements

Il Yong Chu, Ben Adcock
Comments: 39 pages, 7 figures
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)

Parallel acquisition systems are employed successfully in a variety of
different sensing applications when a single sensor cannot provide enough
measurements for a high-quality reconstruction. In this paper, we consider
compressed sensing (CS) for parallel acquisition systems when the individual
sensors use subgaussian random sampling. Our main results are a series of
uniform recovery guarantees which relate the number of measurements required to
the basis in which the solution is sparse and certain characteristics of the
multi-sensor system, known as sensor profile matrices. In particular, we derive
sufficient conditions for optimal recovery, in the sense that the number of
measurements required per sensor decreases linearly with the total number of
sensors, and demonstrate explicit examples of multi-sensor systems for which
this holds. We establish these results by proving the so-called Asymmetric
Restricted Isometry Property (ARIP) for the sensing system and use this to
derive both nonuniversal and universal recovery guarantees. Compared to
existing work, our results not only lead to better stability and robustness
estimates but also provide simpler and sharper constants in the measurement
conditions. Finally, we show how the problem of CS with block-diagonal sensing
matrices can be viewed as a particular case of our multi-sensor framework.
Specializing our results to this setting leads to a new recovery guarantee
depending only on the incoherence of the sparsity basis. This improves existing
results in the literature and confirms the existence of deterministic sparsity
bases for which CS with subgaussian block-diagonal matrices has a comparable
recovery guarantee to CS with full subgaussian matrices.

Packet Error Rate Analysis of Uncoded Schemes in Block-Fading Channels using Extreme Value Theory

Aamir Mahmood, Riku Jäntti
Subjects: Information Theory (cs.IT)

We present a generic approximation of the packet error rate (PER) function of
uncoded schemes in the AWGN channel using extreme value theory (EVT). The PER
function can assume both the exponential and the Gaussian Q-function bit error
rate (BER) forms. The EVT approach leads us to a best closed-form
approximation, in terms of accuracy and computational efficiency, of the
average PER in block-fading channels. The numerical analysis shows that the
approximation holds tight for any value of SNR and packet length whereas the
earlier studies approximate the average PER only at asymptotic SNRs and packet
lengths.

Tractable Framework for the Analysis of General Multi-Tier Heterogeneous Cellular Networks

Serkan Ak, Hazer Inaltekin, H. Vincent Poor
Comments: arXiv admin note: substantial text overlap with arXiv:1601.06023, arXiv:1601.07315
Subjects: Information Theory (cs.IT)

This paper investigates the downlink performance of K-tier heteregeneous
cellular networks (HCNs) under general settings. First, Gaussian approximation
bounds for the standardized aggregate wireless interference (AWI) in dense
K-tier HCNs are obtained for when base stations (BSs) in each tier are
distributed over the plane according to a spatial and general Poisson point
process. The Kolmogorov-Smirnov (KS) distance is used to measure deviations of
the distribution of the standardized AWI from the standard normal distribution.
An explicit and analytical expression bounding the KS distance between these
two distributions is obtained as a function of a broad range of network
parameters such as per-tier transmission power levels, per-tier BS intensity,
BS locations, general fading statistics, and general bounded path-loss models.
Bounds achieve a good statistical match between the standardized AWI
distribution and its normal approximation even for moderately dense HCNs.
Second, various spatial performance metrics of interest such as outage
capacity, ergodic capacity and area spectral efficiency in the downlink of
K-tier HCNs for general signal propogation models are investigated by making
use of the derived distribution approximation results. Considering two specific
BS association policies, it is shown that the derived performance bounds track
the actual performance metrics reasonably well for a wide range of BS
intensities, with the gap among them becoming negligibly small for denser HCN
deployments.

An SMDP Approach to Optimal PHY Configuration in Wireless Networks

Mark Shifrin, Daniel S. Menasché, Asaf Cohen, Omer Gurewitz, Dennis Goeckel
Comments: arXiv admin note: text overlap with arXiv:1601.06859
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

In this work, we study the optimal configuration of the physical layer in
wireless networks by means of Semi-Markov Decision Process (SMDP) modeling. In
particular, assume the physical layer is characterized by a set of potential
operating points, with each point corresponding to a rate and reliability pair;
for example, these pairs might be obtained through a now-standard
diversity-vs-multiplexing trade-off characterization. Given the current network
state (e.g., buffer occupancies), a Decision Maker (DM) needs to dynamically
decide which operating point to use. The SMDP problem formulation allows us to
choose from these pairs an optimal selection, which is expressed by a decision
rule as a function of the number of awaiting packets in the source finite
queue, channel state, size of the packet to be transmitted. We derive a general
solution which covers various model configurations, including packet size
distributions and varying channels. For the specific case of exponential
transmission time, we analytically prove the optimal policy has a threshold
structure. Numerical results validate this finding, as well as depict
muti-threshold policies for time varying channels such as the Gilbert-Elliot
channel

Auxiliary Beam Pair Enabled AoD and AoA Estimation in Closed-Loop Large-Scale mmWave MIMO System

Dalin Zhu, Junil Choi, Robert W. Heath Jr
Comments: Submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

Channel estimation is of critical importance in millimeter-wave (mmWave)
multiple-input multiple-output (MIMO) system. Due to the use of large antenna
arrays, low-complexity mmWave specific channel estimation algorithms are
required. In this paper, an auxiliary beam pair design is proposed to provide
high-resolution estimates of the channel’s angle-of-departure (AoD) and
angle-of-arrival (AoA) for mmWave MIMO system. By performing an amplitude
comparison with respect to each auxiliary beam pair, a set of ratio measures
that characterize the channel’s AoD and AoA are obtained by the receiver.
Either the best ratio measure or the estimated AoD is quantized and fed back to
the transmitter via a feedback channel. The proposed technique can be
incorporated into control channel design to minimize initial access delay.
Though the design principles are derived assuming a high-power regime,
evaluation under more realistic assumption show that by employing the proposed
method, good angle estimation performance is achieved under various
signal-to-noise ratio levels and channel conditions.

Conic Quadratic Formulations for Wireless Communications Design

Quang-Doanh Vu, Markku Juntti, Een-Kee Hong, Le-Nam Tran
Comments: Submitted for possible publication, 13 pages, 9 figures
Subjects: Information Theory (cs.IT)

As a wide class of resource management problems in wireless communications
are nonconvex and even NP-hard in many cases, finding globally optimal
solutions to these problems is of little practical interest. Towards more
pragmatic approaches, there is a rich literature on iterative methods aiming at
finding a solution satisfying necessary optimality conditions to these
problems. These approaches have been derived under several similar mathematical
frameworks such as inner approximation algorithm, concave-convex procedure,
majorization-minimization algorithm, and successive convex approximation (SCA).
However, a large portion of existing algorithms arrive at a relatively generic
program at each iteration, which is less computationally efficient compared to
a more standard convex formulation. The purpose of this paper is to present
useful transformations and approximations to deal with nonconvexity in wireless
communications design problems. More specifically, the proposed formulations
can approximate nonconvex problems by a series of second-order cone programs
(SOCPs) in light of SCA framework. We revisit various design problems in
wireless communications to demonstrate the advantages of the proposed idea.
Theoretical analysis and numerical results show the superior efficiency in
terms of computational cost of our proposed solutions compared to the existing
ones.

On Optimizing Hierarchical Modulation in AWGN channel

Baicen Xiao, Kexin Xiao, Zhiyong Chen, Hui Liu
Comments: Submitted to IEEE ICC 2017
Subjects: Information Theory (cs.IT)

Hierarchical modulation (HM) is able to provide different levels of
protection for data streams and achieve a rate region that cannot be realized
by traditional orthogonal schemes, such as time division (TD). Nevertheless,
criterions and algorithms for general HM design are not available in existing
literatures. In this paper, we jointly optimize the constellation positions and
binary labels for HM to be used in additive white gaussian noise (AWGN)
channel. Based on bit-interleaved coded modulation (BICM) with successive
interference cancellation (SIC) capacity, our main purpose is to maximize the
rate of one data stream, with power constrains and the constrain that the rate
of other data streams should be larger than given thresholds. Multi-start
interior-point algorithm is used to carry out the constellation optimization
problems and methods to reduce optimization complexity are also proposed in
this paper. Numerical results verify the performance gains of optimized HM
compared with optimized quadrature amplidude modulation (QAM) based HM and
other orthogonal transmission methods.

Optimal Communication Scheduling and Remote Estimation over an Additive Noise Channel

Xiaobin Gao, Emrah Akyol, Tamer Basar
Comments: Extended version of a journal submission
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)

This paper considers a sequential sensor scheduling and remote estimation
problem with one sensor and one estimator. The sensor makes sequential
observations about the state of an underlying memoryless stochastic process and
makes a decision as to whether or not to send this measurement to the
estimator. The sensor and the estimator have the common objective of minimizing
expected distortion in the estimation of the state of the process, over a
finite time horizon. The sensor is either charged a cost for each transmission
or constrained on transmission times. As opposed to the prior work where
communication between the sensor and the estimator was assumed to be perfect
(noiseless), in this work an additive noise channel with fixed power constraint
is considered; hence, the sensor has to encode its message before transmission.
Under some technical assumptions, we obtain the optimal encoding and estimation
policies in conjunction with the optimal transmission schedule. The impact of
the presence of a noisy channel is analyzed numerically based on dynamic
programming. This analysis yields some rather surprising results such as a
phase transition phenomenon in the number of used transmission opportunities,
which was not encountered in the noiseless communication setting.