IT博客汇 | arXiv Paper Daily: Mon, 27 Mar 2017

arXiv Paper Daily: Mon, 27 Mar 2017

我爱机器学习(52ml.net)发表于 2017-03-27 00:00:00

Neural and Evolutionary Computing

PonyGE2: Grammatical Evolution in Python

Michael Fenton, James McDermott, David Fagan, Stefan Forstenlechner, Michael O'Neill, Erik Hemberg
Comments: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on Evolutionary Computation Software Systems (EvoSoft)
Subjects: Neural and Evolutionary Computing (cs.NE)

Grammatical Evolution (GE) is a population-based evolutionary algorithm,
where a formal grammar is used in the genotype to phenotype mapping process.
PonyGE2 is an open source implementation of GE in Python, developed at UCD’s
Natural Computing Research and Applications group. It is intended as an
advertisement and a starting-point for those new to GE, a reference for
students and researchers, a rapid-prototyping medium for our own experiments,
and a Python workout. As well as providing the characteristic genotype to
phenotype mapping of GE, a search algorithm engine is also provided. A number
of sample problems and tutorials on how to use and adapt PonyGE2 have been
developed.

Long-Term Evolution of Genetic Programming Populations

W. B. Langdon
Comments: Longer version of Langdon:2017:GECCO, July 2017, ACM, Berlin, RN/17/05 this http URL
Subjects: Neural and Evolutionary Computing (cs.NE)

We evolve binary mux-6 trees for up to 100000 generations evolving some
programs with more than a hundred million nodes. Our unbounded Long-Term
Evolution Experiment LTEE GP appears not to evolve building blocks but does
suggests a limit to bloat. We do see periods of tens even hundreds of
generations where the population is 100 percent functionally converged. The
distribution of tree sizes is not as predicted by theory.

Computer Vision and Pattern Recognition

Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer

Martin Vallières (1), Emily Kay-Rivest (2), Léo Jean Perrin (3), Xavier Liem (4), Christophe Furstoss (5), Hugo J. W. L. Aerts (6), Nader Khaouam (5), Phuc Felix Nguyen-Tan (4), Chang-Shu Wang (3), Khalil Sultanem (2), Jan Seuntjens (1), Issam El Naqa (7) ((1) Medical Physics Unit, McGill University, Montréal, Canada, (2) Radiation Oncology Division, Hôpital général juif, Montréal, Canada, (3) Department of Radiation Oncology, Centre hospitalier universitaire de Sherbrooke, Montréal, Canada, (4) Department of Radiation Oncology, Centre hospitalier de l'Université de Montréal, Montréal, Canada, (5) Department of Radiation Oncology, Hôpital Maisonneuve-Rosemont, Montréal, Canada, (6) Departments of Radiation Oncology & Radiology, Dana-Farber Cancer Institute, Boston, USA, (7) Department of Radiation Oncology, Physics Division, University of Michigan, Ann Arbor, USA)
Comments: (1) Paper: 33 pages, 4 figures, 1 table; (2) SUPP info: 41 pages, 7 figures, 8 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Quantitative extraction of high-dimensional mineable data from medical images
is a process known as radiomics. Radiomics is foreseen as an essential
prognostic tool for cancer risk assessment and the quantification of
intratumoural heterogeneity. In this work, 1615 radiomic features (quantifying
tumour image intensity, shape, texture) extracted from pre-treatment FDG-PET
and CT images of 300 patients from four different cohorts were analyzed for the
risk assessment of locoregional recurrences (LR) and distant metastases (DM) in
head-and-neck cancer. Prediction models combining radiomic and clinical
variables were constructed via random forests and imbalance-adjustment
strategies using two of the four cohorts. Independent validation of the
prediction and prognostic performance of the models was carried out on the
other two cohorts (LR: AUC = 0.69 and CI = 0.67; DM: AUC = 0.86 and CI = 0.88).
Furthermore, the results obtained via Kaplan-Meier analysis demonstrated the
potential of radiomics for assessing the risk of specific tumour outcomes using
multiple stratification groups. This could have important clinical impact,
notably by allowing for a better personalization of chemo-radiation treatments
for head-and-neck cancer patients from different risk groups.

Local Deep Neural Networks for Age and Gender Classification

Zukang Liao, Stavros Petridis, Maja Pantic
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Local deep neural networks have been recently introduced for gender
recognition. Although, they achieve very good performance they are very
computationally expensive to train. In this work, we introduce a simplified
version of local deep neural networks which significantly reduces the training
time. Instead of using hundreds of patches per image, as suggested by the
original method, we propose to use 9 overlapping patches per image which cover
the entire face region. This results in a much reduced training time, since
just 9 patches are extracted per image instead of hundreds, at the expense of a
slightly reduced performance. We tested the proposed modified local deep neural
networks approach on the LFW and Adience databases for the task of gender and
age classification. For both tasks and both databases the performance is up to
1% lower compared to the original version of the algorithm. We have also
investigated which patches are more discriminative for age and gender
classification. It turns out that the mouth and eyes regions are useful for age
classification, whereas just the eye region is useful for gender
classification.

Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection

Wei Shen, Bin Wang, Yuan Jiang, Yan Wang, Alan Yuille
Comments: 10 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the field of connectomics, neuroscientists seek to identify cortical
connectivity comprehensively. Neuronal boundary detection from the Electron
Microscopy (EM) images is often done to assist the automatic reconstruction of
neuronal circuit. But the segmentation of EM images is a challenging problem,
as it requires the detector to be able to detect both filament-like thin and
blob-like thick membrane, while suppressing the ambiguous intracellular
structure. In this paper, we propose multi-stage multi-recursive-input fully
convolutional networks to address this problem. The multiple recursive inputs
for one stage, i.e., the multiple side outputs with different receptive field
sizes learned from the lower stage, provide multi-scale contextual boundary
information for the consecutive learning. This design is
biologically-plausible, as it likes a human visual system to compare different
possible segmentation solutions to address the ambiguous boundary issue. Our
multi-stage networks are trained end-to-end. It achieves promising results on a
public available mouse piriform cortex dataset, which significantly outperforms
other competitors.

Content-Based Image Retrieval Based on Late Fusion of Binary and Local Descriptors

Nouman Ali, Danish Ali Mazhar, Zeshan Iqbal, Rehan Ashraf, Jawad Ahmed, Farrukh Zeeshan Khan
Journal-ref: International Journal of Computer Science and Information Security
(IJCSIS), Volume 14, Issue 11, Publication date 2016/11
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce
the semantic gaps between low-level features and high-level semantic concepts.
In CBIR, the images are represented in the feature space and the performance of
CBIR depends on the type of selected feature representation. Late fusion also
known as visual words integration is applied to enhance the performance of
image retrieval. The recent advances in image retrieval diverted the focus of
research towards the use of binary descriptors as they are reported
computationally efficient. In this paper, we aim to investigate the late fusion
of Fast Retina Keypoint (FREAK) and Scale Invariant Feature Transform (SIFT).
The late fusion of binary and local descriptor is selected because among binary
descriptors, FREAK has shown good results in classification-based problems
while SIFT is robust to translation, scaling, rotation and small distortions.
The late fusion of FREAK and SIFT integrates the performance of both feature
descriptors for an effective image retrieval. Experimental results and
comparisons show that the proposed late fusion enhances the performances of
image retrieval.

Medical Image Retrieval using Deep Convolutional Neural Network

Adnan Qayyum, Syed Muhammad Anwar, Muhammad Awais, Muhammad Majid
Comments: Submitted to Neurocomputing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

With a widespread use of digital imaging data in hospitals, the size of
medical image repositories is increasing rapidly. This causes difficulty in
managing and querying these large databases leading to the need of content
based medical image retrieval (CBMIR) systems. A major challenge in CBMIR
systems is the semantic gap that exists between the low level visual
information captured by imaging devices and high level semantic information
perceived by human. The efficacy of such systems is more crucial in terms of
feature representations that can characterize the high-level information
completely. In this paper, we propose a framework of deep learning for CBMIR
system by using deep Convolutional Neural Network (CNN) that is trained for
classification of medical images. An intermodal dataset that contains twenty
four classes and five modalities is used to train the network. The learned
features and the classification results are used to retrieve medical images.
For retrieval, best results are achieved when class based predictions are used.
An average classification accuracy of 99.77% and a mean average precision of
0.69 is achieved for retrieval task. The proposed method is best suited to
retrieve multimodal medical images for different body organs.

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, Shuicheng Yan
Comments: Accepted to appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We investigate a principle way to progressively mine discriminative object
regions using classification networks to address the weakly-supervised semantic
segmentation problems. Classification networks are only responsive to small and
sparse discriminative regions from the object of interest, which deviates from
the requirement of the segmentation task that needs to localize dense, interior
and integral regions for pixel-wise inference. To mitigate this gap, we propose
a new adversarial erasing approach for localizing and expanding object regions
progressively. Starting with a single small object region, our proposed
approach drives the classification network to sequentially discover new and
complement object regions by erasing the current mined regions in an
adversarial manner. These localized regions eventually constitute a dense and
complete object region for learning semantic segmentation. To further enhance
the quality of the discovered regions by adversarial erasing, an online
prohibitive segmentation learning approach is developed to collaborate with
adversarial erasing by providing auxiliary segmentation supervision modulated
by the more reliable classification scores. Despite its apparent simplicity,
the proposed approach achieves 55.0% and 55.7% mean Intersection-over-Union
(mIoU) scores on PASCAL VOC 2012 val and test sets, which are the new
state-of-the-arts.

DeepVisage: Making face recognition simple yet with powerful generalization skills

Abul Hasnat, Julien Bohné, Stéphane Gentric, Liming Chen
Comments: 9 pages, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face recognition (FR) methods report significant performance by adopting the
convolutional neural network (CNN) based learning methods. Although CNNs are
mostly trained by optimizing the softmax loss, the recent trend shows an
improvement of accuracy with different strategies, such as task-specific CNN
learning with different loss functions, fine-tuning on target dataset, metric
learning and concatenating features from multiple CNNs. Incorporating these
tasks obviously requires additional efforts. Moreover, it demotivates the
discovery of efficient CNN models for FR which are trained only with identity
labels. We focus on this fact and propose an easily trainable and single CNN
based FR method. Our CNN model exploits the residual learning framework.
Additionally, it uses normalized features to compute the loss. Our extensive
experiments show excellent generalization on different datasets. We obtain very
competitive and state-of-the-art results on the LFW, IJB-A, YouTube faces and
CACD datasets.

Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

Shenglan Liu, Muxin Sun, Wei Wang, Feilong Wang
Comments: Assembly Automation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

Robot vision is a fundamental device for human-robot interaction and robot
complex tasks. In this paper, we use Kinect and propose a feature graph fusion
(FGF) for robot recognition. Our feature fusion utilizes RGB and depth
information to construct fused feature from Kinect. FGF involves multi-Jaccard
similarity to compute a robust graph and utilize word embedding method to
enhance the recognition results. We also collect DUT RGB-D face dataset and a
benchmark datset to evaluate the effectiveness and efficiency of our method.
The experimental results illustrate FGF is robust and effective to face and
object datasets in robot applications.

A Hybrid Deep Learning Approach for Texture Analysis

Hussein Adly, Mohamed Moustafa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Texture classification is a problem that has various applications such as
remote sensing and forest species recognition. Solutions tend to be custom fit
to the dataset used but fails to generalize. The Convolutional Neural Network
(CNN) in combination with Support Vector Machine (SVM) form a robust selection
between powerful invariant feature extractor and accurate classifier. The
fusion of experts provides stability in classification rates among different
datasets.

Scalable Person Re-identification on Supervised Smoothed Manifold

Song Bai, Xiang Bai, Qi Tian
Comments: Accepted as spotlight by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most existing person re-identification algorithms either extract robust
visual features or learn discriminative metrics for person images. However, the
underlying manifold which those images reside on is rarely investigated. That
raises a problem that the learned metric is not smooth with respect to the
local geometry structure of the data manifold.

In this paper, we study person re-identification with manifold-based affinity
learning, which did not receive enough attention from this area. An
unconventional manifold-preserving algorithm is proposed, which can 1) make the
best use of supervision from training data, whose label information is given as
pairwise constraints; 2) scale up to large repositories with low on-line time
complexity; and 3) be plunged into most existing algorithms, serving as a
generic postprocessing procedure to further boost the identification
accuracies. Extensive experimental results on five popular person
re-identification benchmarks consistently demonstrate the effectiveness of our
method. Especially, on the largest CUHK03 and Market-1501, our method
outperforms the state-of-the-art alternatives by a large margin with high
efficiency, which is more appropriate for practical applications.

Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition

Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work deviates from easy-to-define class boundaries for object
interactions. For the task of object interaction recognition, often captured
using an egocentric view, we show that semantic ambiguities in verbs and
recognising sub-interactions along with concurrent interactions result in
legitimate class overlaps (Figure 1). We thus aim to model the mapping between
observations and interaction classes, as well as class overlaps, towards a
probabilistic multi-label classifier that emulates human annotators. Given a
video segment containing an object interaction, we model the probability for a
verb, out of a list of possible verbs, to be used to annotate that interaction.
The proba- bility is learnt from crowdsourced annotations, and is tested on two
public datasets, comprising 1405 video sequences for which we provide
annotations on 90 verbs. We outper- form conventional single-label
classification by 11% and 6% on the two datasets respectively, and show that
learning from annotation probabilities outperforms majority voting and enables
discovery of co-occurring labels.

Deep Direct Regression for Multi-Oriented Scene Text Detection

Wenhao He, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu
Comments: 9 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we first provide a new perspective to divide existing high
performance object detection methods into direct and indirect regressions.
Direct regression performs boundary regression by predicting the offsets from a
given point, while indirect regression predicts the offsets from some bounding
box proposals. Then we analyze the drawbacks of the indirect regression, which
the recent state-of-the-art detection structures like Faster-RCNN and SSD
follows, for multi-oriented scene text detection, and point out the potential
superiority of direct regression. To verify this point of view, we propose a
deep direct regression based method for multi-oriented scene text detection.
Our detection framework is simple and effective with a fully convolutional
network and one-step post processing. The fully convolutional network is
optimized in an end-to-end way and has bi-task outputs where one is pixel-wise
classification between text and non-text, and the other is direct regression to
determine the vertex coordinates of quadrilateral text boundaries. The proposed
method is particularly beneficial for localizing incidental scene texts. On the
ICDAR2015 Incidental Scene Text benchmark, our method achieves the F1-measure
of 81%, which is a new state-of-the-art and significantly outperforms previous
approaches. On other standard datasets with focused scene texts, our method
also reaches the state-of-the-art performance.

View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data

Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng
Comments: Technique report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Skeleton-based human action recognition has recently attracted increasing
attention due to the popularity of 3D skeleton data. One main challenge lies in
the large view variations in captured human actions. We propose a novel view
adaptation scheme to automatically regulate observation viewpoints during the
occurrence of an action. Rather than re-positioning the skeletons based on a
human defined prior criterion, we design a view adaptive recurrent neural
network (RNN) with LSTM architecture, which enables the network itself to adapt
to the most suitable observation viewpoints from end to end. Extensive
experiment analyses show that the proposed view adaptive RNN model strives to
(1) transform the skeletons of various views to much more consistent viewpoints
and (2) maintain the continuity of the action rather than transforming every
frame to the same position with the same body orientation. Our model achieves
state-of-the-art performance on three benchmark datasets. On the current
largest NTU RGB+D dataset, our scheme outperforms the state of the art by an
impressive 6% gain in accuracy.

Semi-Automatic Segmentation and Ultrasonic Characterization of Solid Breast Lesions

Mohammad Saad Billah, Tahmida Binte Mahmud
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Characterization of breast lesions is an essential prerequisite to detect
breast cancer in an early stage. Automatic segmentation makes this
categorization method robust by freeing it from subjectivity and human error.
Both spectral and morphometric features are successfully used for
differentiating between benign and malignant breast lesions. In this thesis, we
used empirical mode decomposition method for semi-automatic segmentation.
Sonographic features like ehcogenicity, heterogeneity, FNPA, margin definition,
Hurst coefficient, compactness, roundness, aspect ratio, convexity, solidity,
form factor were calculated to be used as our characterization parameters. All
of these parameters did not give desired comparative results. But some of them
namely echogenicity, heterogeneity, margin definition, aspect ratio and
convexity gave good results and were used for characterization.

Single Image Super-resolution with a Parameter Economic Residual-like Convolutional Neural Network

Yudong Liang, Ze Yang, Kai Zhang, Yihui He, Jinjun Wang, Nanning Zheng
Comments: Extentions of mmm 2017 paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent years have witnessed great success of convolutional neural network
(CNN) for various problems both in low and high level visions. Especially
noteworthy is the residual network which was originally proposed to handle
high-level vision problems and enjoys several merits. This paper aims to extend
the merits of residual network, such as skip connection induced fast training,
for a typical low-level vision problem, i.e., single image super-resolution. In
general, the two main challenges of existing deep CNN for supper-resolution lie
in the gradient exploding/vanishing problem and large amount of parameters or
computational cost as CNN goes deeper. Correspondingly, the skip connections or
identity mapping shortcuts are utilized to avoid gradient exploding/vanishing
problem. To tackle with the second problem, a parameter economic CNN
architecture which has carefully designed width, depth and skip connections was
proposed. Different residual-like architectures for image superresolution has
also been compared. Experimental results have demonstrated that the proposed
CNN model can not only achieve state-of-the-art PSNR and SSIM results for
single image super-resolution but also produce visually pleasant results. This
paper has extended the mmm 2017 paper with more experiments and explanations.

On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations

Nicholas Cheney, Martin Schrimpf, Gabriel Kreiman
Comments: under review at ICML 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks are generally regarded as robust function
approximators. So far, this intuition is based on perturbations to external
stimuli such as the images to be classified. Here we explore the robustness of
convolutional neural networks to perturbations to the internal weights and
architecture of the network itself. We show that convolutional networks are
surprisingly robust to a number of internal perturbations in the higher
convolutional layers but the bottom convolutional layers are much more fragile.
For instance, Alexnet shows less than a 30% decrease in classification
performance when randomly removing over 70% of weight connections in the top
convolutional or dense layers but performance is almost at chance with the same
perturbation in the first convolutional layer. Finally, we suggest further
investigations which could continue to inform the robustness of convolutional
networks to internal perturbations.

Artificial Intelligence

Reasoning by Cases in Structured Argumentation

Mathieu Beirlaen, Jesse Heyninck, Christian Straßer
Comments: Proceedings of SAC/KRR 2017
Subjects: Artificial Intelligence (cs.AI)

We extend the (ASPIC^+) framework for structured argumentation so as to allow
applications of the reasoning by cases inference scheme for defeasible
arguments. Given an argument with conclusion `(A) or (B)’, an argument based on
(A) with conclusion (C), and an argument based on (B) with conclusion (C), we
allow the construction of an argument with conclusion (C). We show how our
framework leads to different results than other approaches in non-monotonic
logic for dealing with disjunctive information, such as disjunctive default
theory or approaches based on the OR-rule (which allows to derive a defeasible
rule `If ((A) or (B)) then (C)’, given two defeasible rules `If (A) then (C)’
and `If (B) then (C)’). We raise new questions regarding the subtleties of
reasoning defeasibly with disjunctive information, and show that its
formalization is more intricate than one would presume.

Smart Augmentation – Learning an Optimal Data Augmentation Strategy

Joseph Lemley, Shabab Bazrafkan, Peter Corcoran
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

A recurring problem faced when training neural networks is that there is
typically not enough data to maximize the generalization capability of deep
neural networks(DNN). There are many techniques to address this, including data
augmentation, dropout, and transfer learning. In this paper, we introduce an
additional method which we call Smart Augmentation and we show how to use it to
increase the accuracy and reduce overfitting on a target network. Smart
Augmentation works by creating a network that learns how to generate augmented
data during the training process of a target network in a way that reduces that
networks loss. This allows us to learn augmentations that minimize the error of
that network.

Smart Augmentation has shown the potential to increase accuracy by
demonstrably significant measures on all datasets tested. In addition, it has
shown potential to achieve similar or improved performance levels with
significantly smaller network sizes in a number of tested cases.

Overcoming Catastrophic Forgetting by Incremental Moment Matching

Sang-Woo Lee, Jin-Hwa Kim, Jung-Woo Ha, Byoung-Tak Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Catastrophic forgetting is a problem which refers to losing the information
of the first task after training from the second task in continual learning of
neural networks. To resolve this problem, we propose the incremental moment
matching (IMM), which uses the Bayesian neural network framework. IMM assumes
that the posterior distribution of parameters of neural networks is
approximated with Gaussian distribution and incrementally matches the moment of
the posteriors, which are trained for the first and second task, respectively.
To make our Gaussian assumption reasonable, the IMM procedure utilizes various
transfer learning techniques including weight transfer, L2-norm of old and new
parameters, and a newly proposed variant of dropout using old parameters. We
analyze our methods on the MNIST and CIFAR-10 datasets, and then evaluate them
on a real-world life-log dataset collected using Google Glass. Experimental
results show that IMM produces state-of-the-art performance in a variety of
datasets.

Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Justin Cranshaw, Emad Elwany, Todd Newman, Rafal Kocielnik, Bowen Yu, Sandeep Soni, Jaime Teevan, Andrés Monroy-Hernández
Comments: 10 pages
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Although information workers may complain about meetings, they are an
essential part of their work life. Consequently, busy people spend a
significant amount of time scheduling meetings. We present Calendar.help, a
system that provides fast, efficient scheduling through structured workflows.
Users interact with the system via email, delegating their scheduling needs to
the system as if it were a human personal assistant. Common scheduling
scenarios are broken down using well-defined workflows and completed as a
series of microtasks that are automated when possible and executed by a human
otherwise. Unusual scenarios fall back to a trained human assistant who
executes them as unstructured macrotasks. We describe the iterative approach we
used to develop Calendar.help, and share the lessons learned from scheduling
thousands of meetings during a year of real-world deployments. Our findings
provide insight into how complex information tasks can be broken down into
repeatable components that can be executed efficiently to improve productivity.

Supervisor Synthesis of POMDP based on Automata Learning

Xiaobin Zhang, Bo Wu, Hai Lin
Subjects: Systems and Control (cs.SY); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)

As a general and thus popular model for autonomous systems, partially
observable Markov decision process (POMDP) can capture uncertainties from
different sources like sensing noises, actuation errors, and uncertain
environments. However, its comprehensiveness makes the planning and control in
POMDP difficult. Traditional POMDP planning problems target to find the optimal
policy to maximize the expectation of accumulated rewards. But for safety
critical applications, guarantees of system performance described by formal
specifications are desired, which motivates us to consider formal methods to
synthesize supervisor for POMDP. With system specifications given by
Probabilistic Computation Tree Logic (PCTL), we propose a supervisory control
framework with a type of deterministic finite automata (DFA), za-DFA, as the
controller form. While the existing work mainly relies on optimization
techniques to learn fixed-size finite state controllers (FSCs), we develop an
(L^*) learning based algorithm to determine both space and transitions of
za-DFA. Membership queries and different oracles for conjectures are defined.
The learning algorithm is sound and complete. An example is given in detailed
steps to illustrate the supervisor synthesis algorithm.

Computation and Language

Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching

Victor Soto, Julia Hirschberg
Comments: Submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL)

Code-switching is the phenomenon by which bilingual speakers switch between
multiple languages during communication. The importance of developing language
technologies for codeswitching data is immense, given the large populations
that routinely code-switch. High-quality linguistic annotations are extremely
valuable for any NLP task, and performance is often limited by the amount of
high-quality labeled data. However, little such data exists for code-switching.
In this paper, we describe crowd-sourcing universal part-of-speech tags for the
Miami Bangor Corpus of Spanish-English code-switched speech. We split the
annotation task into three subtasks: one in which a subset of tokens are
labeled automatically, one in which questions are specifically designed to
disambiguate a subset of high frequency words, and a more general cascaded
approach for the remaining data in which questions are displayed to the worker
following a decision tree structure. Each subtask is extended and adapted for a
multilingual setting and the universal tagset. The quality of the annotation
process is measured using hidden check questions annotated with gold labels.
The overall agreement between gold standard labels and the majority vote is
between 0.95 and 0.96 for just three labels and the average recall across
part-of-speech tags is between 0.87 and 0.99, depending on the task.

Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture

Stefan Heinrich, Stefan Wermter
Comments: To appear in Connection Science, 2017, 33 pages
Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

The human brain is one of the most complex dynamic systems that enables us to
communicate in natural language. We have a good understanding of some
principles underlying natural languages and language processing, some knowledge
about socio-cultural conditions framing acquisition, and some insights about
where activity is occurring in the brain. However, we were not yet able to
understand the behavioural and mechanistic characteristics for natural language
and how mechanisms in the brain allow to acquire and process language.

In an effort to bridge the gap between insights from behavioural psychology
and neuroscience, the goal of this paper is to contribute a computational
understanding of the appropriate characteristics that favour language
acquisition, in a brain-inspired neural architecture. Accordingly, we provide
concepts and refinements in cognitive modelling regarding principles and
mechanisms in the brain – such as the hierarchical abstraction of context – in
a plausible recurrent architecture. On this basis, we propose neurocognitively
plausible model for embodied language acquisition from real world interaction
of a humanoid robot with its environment. The model is capable of learning
language production grounded in both, temporal dynamic somatosensation and
vision. In particular, the architecture consists of a continuous time recurrent
neural network, where parts have different leakage characteristics and thus
operate on multiple timescales for every modality and the association of the
higher level nodes of all modalities into cell assemblies. Thus, this model
features hierarchical concept abstraction in sensation as well as concept
decomposition in production, multi-modal integration, and self-organisation of
latent representations.

Batch-normalized joint training for DNN-based distant speech recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Comments: arXiv admin note: text overlap with arXiv:1703.08002
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Improving distant speech recognition is a crucial step towards flexible
human-machine interfaces. Current technology, however, still exhibits a lack of
robustness, especially when adverse acoustic conditions are met. Despite the
significant progress made in the last years on both speech enhancement and
speech recognition, one potential limitation of state-of-the-art technology
lies in composing modules that are not well matched because they are not
trained jointly. To address this concern, a promising approach consists in
concatenating a speech enhancement and a speech recognition deep neural network
and to jointly update their parameters as if they were within a single bigger
network. Unfortunately, joint training can be difficult because the output
distribution of the speech enhancement system may change substantially during
the optimization procedure. The speech recognition module would have to deal
with an input distribution that is non-stationary and unnormalized. To mitigate
this issue, we propose a joint training approach based on a fully
batch-normalized architecture. Experiments, conducted using different datasets,
tasks and acoustic conditions, revealed that the proposed framework
significantly overtakes other competitive solutions, especially in challenging
environments.

TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia

Fabian Flöck, Kenan Erdogan, Maribel Acosta
Subjects: Computation and Language (cs.CL)

We present a dataset that contains every instance of all tokens (~ words)
ever written in undeleted, non-redirect English Wikipedia articles until
October 2016, in total 13,545,349,787 instances. Each token is annotated with
(i) the article revision it was originally created in, and (ii) lists with all
the revisions in which the token was ever deleted and (potentially) re-added
and re-deleted from its article, enabling a complete and straightforward
tracking of its history. This data would be exceedingly hard to create by an
average potential user as it is (i) very expensive to compute and as (ii)
accurately tracking the history of each token in revisioned documents is a
non-trivial task. Adapting a state-of-the-art algorithm, we have produced a
dataset that allows for a range of analyses and metrics, already popular in
research and going beyond, to be generated on complete-Wikipedia scale;
ensuring quality and allowing researchers to forego expensive text-comparison
computation, which so far has hindered scalable usage. We show how this data
enables, on token-level, computation of provenance, measuring survival of
content over time, very detailed conflict metrics, and fine-grained
interactions of editors like partial reverts, re-additions and other metrics,
in the process gaining several novel insights.

Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Although information workers may complain about meetings, they are an
essential part of their work life. Consequently, busy people spend a
significant amount of time scheduling meetings. We present Calendar.help, a
system that provides fast, efficient scheduling through structured workflows.
Users interact with the system via email, delegating their scheduling needs to
the system as if it were a human personal assistant. Common scheduling
scenarios are broken down using well-defined workflows and completed as a
series of microtasks that are automated when possible and executed by a human
otherwise. Unusual scenarios fall back to a trained human assistant who
executes them as unstructured macrotasks. We describe the iterative approach we
used to develop Calendar.help, and share the lessons learned from scheduling
thousands of meetings during a year of real-world deployments. Our findings
provide insight into how complex information tasks can be broken down into
repeatable components that can be executed efficiently to improve productivity.

Are crossing dependencies really scarce?

Ramon Ferrer-i-Cancho, Carlos Gomez-Rodriguez, J.L. Esteban
Subjects: Physics and Society (physics.soc-ph); Statistical Mechanics (cond-mat.stat-mech); Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an)

The syntactic structure of a sentence can be modelled as a tree, where
vertices correspond to words and edges indicate syntactic dependencies. It has
been claimed recurrently that the number of edge crossings in real sentences is
small. However, a baseline or null hypothesis has been lacking. Here we
quantify the amount of crossings of real sentences and compare it to the
predictions of a series of baselines. We conclude that crossings are really
scarce in real sentences. Their scarcity is unexpected by the hubiness of the
trees. Indeed, real sentences are close to linear trees, where the potential
number of crossings is maximized.

Interacting Conceptual Spaces I : Grammatical Composition of Concepts

Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden, Robin Piedeleu
Subjects: Logic in Computer Science (cs.LO); Computation and Language (cs.CL)

The categorical compositional approach to meaning has been successfully
applied in natural language processing, outperforming other models in
mainstream empirical language processing tasks. We show how this approach can
be generalized to conceptual space models of cognition. In order to do this,
first we introduce the category of convex relations as a new setting for
categorical compositional semantics, emphasizing the convex structure important
to conceptual space applications. We then show how to construct conceptual
spaces for various types such as nouns, adjectives and verbs. Finally we show
by means of examples how concepts can be systematically combined to establish
the meanings of composite phrases from the meanings of their constituent parts.
This provides the mathematical underpinnings of a new compositional approach to
cognition.

Distributed, Parallel, and Cluster Computing

An Algorithmic Approach to the Asynchronous Computability Theorem

Vikram Saraph, Maurice Herlihy, Eli Gafni
Comments: 16 pages, 2 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The asynchronous computability theorem (ACT) uses concepts from combinatorial
topology to characterize which tasks have wait-free solutions in read-write
memory. A task can be expressed as a relation between two chromatic simplicial
complexes. The theorem states that a task has a protocol (algorithm) if and
only if there is a certain chromatic simplicial map compatible with that
relation.

While the original proof of the ACT relied on an involved combinatorial
argument, Borowsky and Gafni later proposed an alternative proof that relied on
a algorithmic construction, termed the “convergence algorithm”. The description
of this algorithm was incomplete, and presented without proof. In this paper,
we give the first complete description, along with a proof of correctness.

Virtualization technology for distributed time sensitive domains

Carlos Antonio Perea-Gómez
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

This paper reports on the state of the art of virtualization technology for
both general purpose domains as well as real-time domains. There exits no
entirely instantaneous data transmission/transfer. There always exist a delay
while transmitting data, either in the processing or in the medium itself.
However most systems are designed to function appropriately with a delay
tolerance. This delay, inevitably, is affected when operating with an extra
layer, the virtualization. For real time systems it is crucial to know the
temporal limits in order not to surpass them. Introducing virtualization in the
real-time domain therefore requires deeper analysis by making use of techniques
that will offer results with deterministic execution times. The study of time
in systems and its behaviour under various possible circumstances is hence a
key for properly assessing this technology applied to both domains, especially
the real-time domain.

A duality-based approach for distributed min-max optimization with application to demand side management

Ivano Notarnicola, Mauro Franceschelli, Giuseppe Notarstefano
Comments: arXiv admin note: substantial text overlap with arXiv:1611.09168
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY)

In this paper we consider a distributed optimization scenario in which a set
of processors aims at minimizing the maximum of a collection of “separable
convex functions” subject to local constraints. This set-up is motivated by
peak-demand minimization problems in smart grids. Here, the goal is to minimize
the peak value over a finite horizon with: (i) the demand at each time instant
being the sum of contributions from different devices, and (ii) the local
states at different time instants being coupled through local dynamics. The
min-max structure and the double coupling (through the devices and over the
time horizon) makes this problem challenging in a distributed set-up (e.g.,
well-known distributed dual decomposition approaches cannot be applied). We
propose a distributed algorithm based on the combination of duality methods and
properties from min-max optimization. Specifically, we derive a series of
equivalent problems by introducing ad-hoc slack variables and by going back and
forth from primal and dual formulations. On the resulting problem we apply a
dual subgradient method, which turns out to be a distributed algorithm. We
prove the correctness of the proposed algorithm and show its effectiveness via
numerical computations.

A randomized primal distributed algorithm for partitioned and big-data non-convex optimization

Ivano Notarnicola, Giuseppe Notarstefano
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA)

In this paper we consider a distributed optimization scenario in which the
aggregate objective function to minimize is partitioned, big-data and possibly
non-convex. Specifically, we focus on a set-up in which the dimension of the
decision variable depends on the network size as well as the number of local
functions, but each local function handled by a node depends only on a (small)
portion of the entire optimization variable. This problem set-up has been shown
to appear in many interesting network application scenarios. As main paper
contribution, we develop a simple, primal distributed algorithm to solve the
optimization problem, based on a randomized descent approach, which works under
asynchronous gossip communication. We prove that the proposed asynchronous
algorithm is a proper, ad-hoc version of a coordinate descent method and thus
converges to a stationary point. To show the effectiveness of the proposed
algorithm, we also present numerical simulations on a non-convex quadratic
program, which confirm the theoretical results.

Taming Tail Latency for Erasure-coded, Distributed Storage Systems

Vaneet Aggarwal, Abubakr O. Al-Abbasi, Jingxian Fan, Tian Lan
Comments: 11 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Distributed storage systems are known to be susceptible to long tails in
response time. In modern online storage systems such as Bing, Facebook, and
Amazon, the long tails of the service latency are of particular concern. with
99.9th percentile response times being orders of magnitude worse than the mean.
As erasure codes emerge as a popular technique to achieve high data reliability
in distributed storage while attaining space efficiency, taming tail latency
still remains an open problem due to the lack of mathematical models for
analyzing such systems. To this end, we propose a framework for quantifying and
optimizing tail latency in erasure-coded storage systems. In particular, we
derive upper bounds on tail latency in closed form for arbitrary service time
distribution and heterogeneous files. Based on the model, we formulate an
optimization problem to jointly minimize the weighted latency tail probability
of all files over the placement of files on the servers, and the choice of
servers to access the requested files. The non-convex problem is solved using
an efficient, alternating optimization algorithm. Numerical results show
significant reduction of tail latency for erasure-coded storage systems with a
realistic workload.

LRC: Dependency-Aware Cache Management for Data Analytics Clusters

Yinghao Yu, Wei Wang, Jun Zhang, Khaled Ben Letaief
Comments: 9 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Memory caches are being aggressively used in today’s data-parallel systems
such as Spark, Tez, and Piccolo. However, prevalent systems employ rather
simple cache management policies–notably the Least Recently Used (LRU)
policy–that are oblivious to the application semantics of data dependency,
expressed as a directed acyclic graph (DAG). Without this knowledge, memory
caching can at best be performed by “guessing” the future data access patterns
based on historical information (e.g., the access recency and/or frequency),
which frequently results in inefficient, erroneous caching with low hit ratio
and a long response time. In this paper, we propose a novel cache replacement
policy, Least Reference Count (LRC), which exploits the application-specific
DAG information to optimize the cache management. LRC evicts the cached data
blocks whose reference count is the smallest. The reference count is defined,
for each data block, as the number of dependent child blocks that have not been
computed yet. We demonstrate the efficacy of LRC through both empirical
analysis and cluster deployments against popular benchmarking workloads. Our
Spark implementation shows that, compared with LRU, LRC speeds up typical
applications by 60%.

Automatically Tuning the GCC Compiler to Optimize the Performance of Applications Running on the ARM Cortex-M3

Craig Blackmore, Oliver Ray, Kerstin Eder
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

This paper introduces a novel method for automatically tuning the selection
of compiler flags in order to optimize the performance of software that is
intended to run on particular hardware platforms. Motivated by the rapid recent
expansion of so-called Internet of Things (IoT) devices, we are particularly
interested in improving the execution time of code running on embedded system
architectures. We demonstrate the effectiveness of our approach on code
compiled by the GNU C Compiler (GCC) for the ARM Cortex-M3 (CM3) processor; and
we show how our method outperforms the current industry standard -O3
optimization level across a diverse embedded system benchmark suite called
BEEBS. We begin by conducting an investigatory study to quantify the potential
gains by using existing iterative compilation approaches that time-intensively
search for optimal configurations for each given benchmark. Then we adapt
iterative compilation to output a single configuration that optimizes
performance across the entire benchmark suite as a whole. Although this is a
time consuming process, our approach eventually constructs a simple variation
of -O3, which we call -Ocm3, that realizes nearly two thirds of the known
available gains on the CM3 architecture (beyond -O3) and significantly
outperforms a far more complex state-of-the-art predictive method. Our approach
suggests that 26 flags should be removed from -O3 while three other flags
should be added. We analyze in detail two of the former and explain why turning
them off improves performance on this processor.

Speeding up TestU01 with the use of HTCondor

Joshua Beizer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Testing random number generators is a very important task that, in the resent
past, has taken upwards of twelve hours when testing with the current agship
testing suite TestU01. Through this paper we will discuss the possible
performance increases to the existing random number generator testing suite
TestU01 that are available by offering the executable to an HTCondor pool to
execute on. We will see that with a few modifications we are able to decrease
the running time of a sample run of Big Crush from about five and a half hours
to only five and a half minutes. We will also see that not only the time taken
for the testing to complete is shortened, but also the amount of time the user
is unable to use their testing computer is reduced to almost none.
Additionally, during this paper we will look at the standard implementation of
TestU01 and how it compares with a preexisting Parallel version of TestU01. We
will be comparing the performance of the standard version of TestU01 with the
parallel version so that we have a performance baseline to test our own
modifications against. Finally, this paper will also discuss the use of the
distributed computing system HTCondor, and cover some basic instructions
related to setting up and using HTCondor. HTCondor is already known to increase
the performance of a networked group of computers by allowing super users to
utilize additional resources from other lesser users idle workstation. We will
relate the model recommended for HTCondor to a standard computer lab found at a
University and explain why they are the perfect candidates for the system.

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Programming Languages (cs.PL)

The need for modern data analytics to combine relational, procedural, and
map-reduce-style functional processing is widely recognized. State-of-the-art
systems like Spark have added SQL front-ends and relational query optimization,
which promise an increase in expressiveness and performance. But how good are
these extensions at extracting high performance from modern hardware platforms?

While Spark has made impressive progress, we show that for relational
workloads, there is still a significant gap compared with best-of-breed query
engines. And when stepping outside of the relational world, query optimization
techniques are ineffective if large parts of a computation have to be treated
as user-defined functions (UDFs).

We present Flare: a new back-end for Spark that brings performance closer to
the best SQL engines, without giving up the added expressiveness of Spark. We
demonstrate order of magnitude speedups both for relational workloads such as
TPC-H, as well as for a range of machine learning kernels that combine
relational and iterative functional processing.

Flare achieves these results through (1) compilation to native code, (2)
replacing parts of the Spark runtime system, and (3) extending the scope of
optimization and code generation to large classes of UDFs.

Learning

Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks

Shuai Xiao, Junchi Yan, Mehrdad Farajtabar, Le Song, Xiaokang Yang, Hongyuan Zha
Comments: 14 pages
Subjects: Learning (cs.LG)

A variety of real-world processes (over networks) produce sequences of data
whose complex temporal dynamics need to be studied. More especially, the event
timestamps can carry important information about the underlying network
dynamics, which otherwise are not available from the time-series evenly sampled
from continuous signals. Moreover, in most complex processes, event sequences
and evenly-sampled times series data can interact with each other, which
renders joint modeling of those two sources of data necessary. To tackle the
above problems, in this paper, we utilize the rich framework of (temporal)
point processes to model event data and timely update its intensity function by
the synergic twin Recurrent Neural Networks (RNNs). In the proposed
architecture, the intensity function is synergistically modulated by one RNN
with asynchronous events as input and another RNN with time series as input.
Furthermore, to enhance the interpretability of the model, the attention
mechanism for the neural point process is introduced. The whole model with
event type and timestamp prediction output layers can be trained end-to-end and
allows a black-box treatment for modeling the intensity. We substantiate the
superiority of our model in synthetic data and three real-world benchmark
datasets.

Overcoming Catastrophic Forgetting by Incremental Moment Matching

Sang-Woo Lee, Jin-Hwa Kim, Jung-Woo Ha, Byoung-Tak Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Catastrophic forgetting is a problem which refers to losing the information
of the first task after training from the second task in continual learning of
neural networks. To resolve this problem, we propose the incremental moment
matching (IMM), which uses the Bayesian neural network framework. IMM assumes
that the posterior distribution of parameters of neural networks is
approximated with Gaussian distribution and incrementally matches the moment of
the posteriors, which are trained for the first and second task, respectively.
To make our Gaussian assumption reasonable, the IMM procedure utilizes various
transfer learning techniques including weight transfer, L2-norm of old and new
parameters, and a newly proposed variant of dropout using old parameters. We
analyze our methods on the MNIST and CIFAR-10 datasets, and then evaluate them
on a real-world life-log dataset collected using Google Glass. Experimental
results show that IMM produces state-of-the-art performance in a variety of
datasets.

K-Means Clustering using Tabu Search with Quantized Means

Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt
Comments: World Conference on Engineering and Computer Science
Subjects: Learning (cs.LG)

The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering
as an alternative to Lloyd’s algorithm, which for all its ease of
implementation and fast runtime, has the major drawback of being trapped at
local optima. While the TS approach can yield superior performance, it involves
a high computational complexity. Moreover, the difficulty in parameter
selection in the existing TS approach does not make it any more attractive.
This paper presents an alternative, low-complexity formulation of the TS
optimization procedure for K-Means clustering. This approach does not require
many parameter settings. We initially constrain the centers to points in the
dataset. We then aim at evolving these centers using a unique neighborhood
structure that makes use of gradient information of the objective function.
This results in an efficient exploration of the search space, after which the
means are refined. The proposed scheme is implemented in MATLAB and tested on
four real-world datasets, and it achieves a significant improvement over the
existing TS approach in terms of the intra cluster sum of squares and
computational time.

Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt, Elena Gaura
Subjects: Learning (cs.LG)

Under normality and homoscedasticity assumptions, Linear Discriminant
Analysis (LDA) is known to be optimal in terms of minimising the Bayes error
for binary classification. In the heteroscedastic case, LDA is not guaranteed
to minimise this error. Assuming heteroscedasticity, we derive a linear
classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the
Bayes error for binary classification. In addition, we also propose a local
neighbourhood search (LNS) algorithm to obtain a more robust classifier if the
data is known to have a non-normal distribution. We evaluate the proposed
classifiers on two artificial and ten real-world datasets that cut across a
wide range of application areas including handwriting recognition, medical
diagnosis and remote sensing, and then compare our algorithm against existing
LDA approaches and other linear classifiers. The GLD is shown to outperform the
original LDA procedure in terms of the classification accuracy under
heteroscedasticity. While it compares favourably with other existing
heteroscedastic LDA approaches, the GLD requires as much as 60 times lower
training time on some datasets. Our comparison with the support vector machine
(SVM) also shows that, the GLD, together with the LNS, requires as much as 150
times lower training time to achieve an equivalent classification accuracy on
some of the datasets. Thus, our algorithms can provide a cheap and reliable
option for classification in a lot of expert systems.

Asymmetric Learning Vector Quantization for Efficient Nearest Neighbor Classification in Dynamic Time Warping Spaces

Brijnesh Jain, David Schultz
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

The nearest neighbor method together with the dynamic time warping (DTW)
distance is one of the most popular approaches in time series classification.
This method suffers from high storage and computation requirements for large
training sets. As a solution to both drawbacks, this article extends learning
vector quantization (LVQ) from Euclidean spaces to DTW spaces. The proposed LVQ
scheme uses asymmetric weighted averaging as update rule. Empirical results
exhibited superior performance of asymmetric generalized LVQ (GLVQ) over other
state-of-the-art prototype generation methods for nearest neighbor
classification.

Multi-Level Discovery of Deep Options

Roy Fox, Sanjay Krishnan, Ion Stoica, Ken Goldberg
Subjects: Learning (cs.LG)

Augmenting an agent’s control with useful higher-level behaviors called
options can greatly reduce the sample complexity of reinforcement learning, but
manually designing options is infeasible in high-dimensional and abstract state
spaces. While recent work has proposed several techniques for automated option
discovery, they do not scale to multi-level hierarchies and to expressive
representations such as deep networks. We present Discovery of Deep Options
(DDO), a policy-gradient algorithm that discovers parametrized options from a
set of demonstration trajectories, and can be used recursively to discover
additional levels of the hierarchy. The scalability of our approach to
multi-level hierarchies stems from the decoupling of low-level option discovery
from high-level meta-control policy learning, facilitated by
under-parametrization of the high level. We demonstrate that using the
discovered options to augment the action space of Deep Q-Network agents can
accelerate learning by guiding exploration in tasks where random actions are
unlikely to reach valuable states. We show that DDO is effective in adding
options that accelerate learning in 4 out of 5 Atari RAM environments chosen in
our experiments. We also show that DDO can discover structure in robot-assisted
surgical videos and kinematics that match expert annotation with 72% accuracy.

Experimental Identification of Hard Data Sets for Classification and Feature Selection Methods with Insights on Method Selection

Cuiju Luan, Guozhu Dong
Comments: 19 pages, 3 figures, 12 tables
Subjects: Learning (cs.LG)

The paper first reports an experimentally identified list of benchmark data
sets that are hard for representative classification and feature selection
methods. This was done after systematically evaluating a total of 54
combinations of methods, involving nine state-of-the-art classification
algorithms and six commonly used feature selection methods, on 129 data sets
from the UCI repository (some data sets with known high classification accuracy
were excluded). In this paper, a data set for classification is called hard if
none of the 54 combinations can achieve an AUC over 0.8 and none of them can
achieve an F-Measure value over 0.8; it is called easy otherwise. A total of 17
out of the 129 data sets were found to be hard in that sense. This paper also
compares the performance of different methods, and it produces rankings of
classification methods, separately on the hard data sets and on the easy data
sets. This paper is the first to rank methods separately for hard data sets and
for easy data sets. It turns out that the classifier rankings resulting from
our experiments are somehow different from those in the literature and hence
they offer new insights on method selection.

On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations

Nicholas Cheney, Martin Schrimpf, Gabriel Kreiman
Comments: under review at ICML 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks are generally regarded as robust function
approximators. So far, this intuition is based on perturbations to external
stimuli such as the images to be classified. Here we explore the robustness of
convolutional neural networks to perturbations to the internal weights and
architecture of the network itself. We show that convolutional networks are
surprisingly robust to a number of internal perturbations in the higher
convolutional layers but the bottom convolutional layers are much more fragile.
For instance, Alexnet shows less than a 30% decrease in classification
performance when randomly removing over 70% of weight connections in the top
convolutional or dense layers but performance is almost at chance with the same
perturbation in the first convolutional layer. Finally, we suggest further
investigations which could continue to inform the robustness of convolutional
networks to internal perturbations.

Batch-normalized joint training for DNN-based distant speech recognition

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Comments: arXiv admin note: text overlap with arXiv:1703.08002
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Improving distant speech recognition is a crucial step towards flexible
human-machine interfaces. Current technology, however, still exhibits a lack of
robustness, especially when adverse acoustic conditions are met. Despite the
significant progress made in the last years on both speech enhancement and
speech recognition, one potential limitation of state-of-the-art technology
lies in composing modules that are not well matched because they are not
trained jointly. To address this concern, a promising approach consists in
concatenating a speech enhancement and a speech recognition deep neural network
and to jointly update their parameters as if they were within a single bigger
network. Unfortunately, joint training can be difficult because the output
distribution of the speech enhancement system may change substantially during
the optimization procedure. The speech recognition module would have to deal
with an input distribution that is non-stationary and unnormalized. To mitigate
this issue, we propose a joint training approach based on a fully
batch-normalized architecture. Experiments, conducted using different datasets,
tasks and acoustic conditions, revealed that the proposed framework
significantly overtakes other competitive solutions, especially in challenging
environments.

Smart Augmentation – Learning an Optimal Data Augmentation Strategy

Joseph Lemley, Shabab Bazrafkan, Peter Corcoran
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

A recurring problem faced when training neural networks is that there is
typically not enough data to maximize the generalization capability of deep
neural networks(DNN). There are many techniques to address this, including data
augmentation, dropout, and transfer learning. In this paper, we introduce an
additional method which we call Smart Augmentation and we show how to use it to
increase the accuracy and reduce overfitting on a target network. Smart
Augmentation works by creating a network that learns how to generate augmented
data during the training process of a target network in a way that reduces that
networks loss. This allows us to learn augmentations that minimize the error of
that network.

Smart Augmentation has shown the potential to increase accuracy by
demonstrably significant measures on all datasets tested. In addition, it has
shown potential to achieve similar or improved performance levels with
significantly smaller network sizes in a number of tested cases.

Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

Shenglan Liu, Muxin Sun, Wei Wang, Feilong Wang
Comments: Assembly Automation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

Robot vision is a fundamental device for human-robot interaction and robot
complex tasks. In this paper, we use Kinect and propose a feature graph fusion
(FGF) for robot recognition. Our feature fusion utilizes RGB and depth
information to construct fused feature from Kinect. FGF involves multi-Jaccard
similarity to compute a robust graph and utilize word embedding method to
enhance the recognition results. We also collect DUT RGB-D face dataset and a
benchmark datset to evaluate the effectiveness and efficiency of our method.
The experimental results illustrate FGF is robust and effective to face and
object datasets in robot applications.

Information Theory

Smart Meter Privacy with Renewable Energy and a Storage Device

Giulio Giaconi, Deniz Gunduz, H. Vincent Poor
Comments: 31 pages, 7 figures, submitted for Journal publication
Subjects: Information Theory (cs.IT)

A smart meter (SM) measures a consumer’s electricity consumption and reports
it automatically to a utility provider (UP) in almost real time. Despite many
advantages of SMs, their use also leads to serious concerns about consumer
privacy. In this paper, SM privacy is studied by considering the presence of a
renewable energy source (RES) and a battery, which can be used to partially
hide the consumer’s energy consumption behavior. Privacy is measured by the
information leakage rate, which denotes the average mutual information between
the user’s real energy consumption and the energy requested from the grid,
which the SM reads and reports to the UP. The impact of the knowledge of the
amount of energy generated by the RES at the UP is also considered. The minimum
information leakage rate is characterized as a computable information theoretic
single-letter expression in the two extreme cases, that is, when the battery
capacity is infinite or zero. Numerical results are presented for the finite
battery capacity case to illustrate the potential privacy gains from the
existence of a storage device. It is shown that, while the information leakage
rate decreases with increasing availability of an RES, larger storage capacity
is needed to fully exploit the available energy to improve the privacy.

A new class of three-weight linear codes from weakly regular plateaued functions

Sihem Mesnager, Ferruh Özbudak, Ahmet Sınak
Comments: The Extended Abstract of this work was submitted to WCC-2017 (the Tenth International Workshop on Coding and Cryptography)
Subjects: Information Theory (cs.IT)

Linear codes with few weights have many applications in secret sharing
schemes, authentication codes, communication and strongly regular graphs. In
this paper, we consider linear codes with three weights in arbitrary
characteristic. To do this, we generalize the recent contribution of Mesnager
given in [Cryptography and Communications 9(1), 71-84, 2017]. We first present
a new class of binary linear codes with three weights from plateaued Boolean
functions and their weight distributions. We next introduce the notion of
(weakly) regular plateaued functions in odd characteristic (p) and give
concrete examples of these functions. Moreover, we construct a new class of
three-weight linear (p)-ary codes from weakly regular plateaued functions and
determine their weight distributions. We finally analyse the constructed linear
codes for secret sharing schemes.

Diffusion L0-norm constraint improved proportionate LMS algorithm for sparse distributed estimation

Zongsheng Zheng, Zhigang Liu
Subjects: Information Theory (cs.IT)

To exploit the sparsity of the considered system, the diffusion
proportionate-type least mean square (PtLMS) algorithms assign different gains
to each tap in the convergence stage while the diffusion sparsity-constrained
LMS (ScLMS) algorithms pull the components towards zeros in the steady-state
stage. In this paper, by minimizing a differentiable cost function that
utilizes the Riemannian distance between the updated and previous weight
vectors as well as the L0 norm of the weighted updated weight vector, we
propose a diffusion L0-norm constraint improved proportionate LMS (L0-IPLMS)
algorithm, which combines the benefits of the diffusion PtLMS and diffusion
ScLMS algorithms and performs the best performance among them. Simulations in a
system identification context confirm the improvement of the proposed
algorithm.

Combinatorial metrics: MacWilliams-type identities, isometries and extension property

Jerry Anderson Pinheiro, Roberto Assis Machado, Marcelo Firer
Subjects: Information Theory (cs.IT)

In this work we characterize the combinatorial metrics admitting a
MacWilliams-type identity and describe the group of linear isometries of such
metrics. Considering coverings that are not connected, we classify the metrics
satisfying the MacWilliams extension property.

SINR and Throughput of Dense Cellular Networks with Stretched Exponential Path Loss

Ahmad AlAmmouri, Jeffrey G. Andrews, François Baccelli
Comments: Submitted to IEEE TWC
Subjects: Information Theory (cs.IT)

Distance-based attenuation is a critical aspect of wireless communications.
As opposed to the ubiquitous power-law path loss model, this paper proposes a
stretched exponential path loss model that is suitable for short-range
communication. In this model, the signal power attenuates over a distance (r)
as (e^{-alpha r^{eta}}), where (alpha,eta) are tunable parameters. Using
experimental propagation measurements, we show that the proposed model is
accurate for short to moderate distances in the range (r in (5,300)) meters
and so is a suitable model for dense and ultradense networks. We integrate this
path loss model into a downlink cellular network with base stations modeled by
a Poisson point process, and derive expressions for the coverage probability,
potential throughput, and area spectral efficiency. Although the most general
result for coverage probability has a double integral, several special cases
are given where the coverage probability has a compact or even closed form. We
then show that the potential throughput is maximized for a particular BS
density and then collapses to zero for high densities, assuming a fixed SINR
threshold. We next prove that the area spectral efficiency, which assumes an
adaptive SINR threshold, is non-decreasing with the BS density and converges to
a constant for high densities.

Millimeter Wave Small-Scale Spatial Statistics in an Urban Microcell Scenario

Shu Sun, Hangsong Yan, George R. MacCartney Jr., Theodore S. Rappaport
Comments: 7 pages, 11 figures, in 2017 IEEE International Conference on Communications (ICC), May 2017
Subjects: Information Theory (cs.IT)

This paper presents outdoor wideband small-scale spatial fading and
autocorrelation measurements and results in the 73 GHz millimeter-wave (mmWave)
band conducted in downtown Brooklyn, New York. Both directional and
omnidirectional receiver (RX) antennas are studied. Two pairs of transmitter
(TX) and RX locations were tested with one line-of-sight (LOS) and one
non-lineof- sight (NLOS) environment, where a linear track was employed at each
RX to move the antenna in half-wavelength increments. Measured data reveal that
the small-scale spatial fading of the received signal voltage amplitude are
generally Ricean-distributed for both omnidirectional and directional RX
antenna patterns under both LOS and NLOS conditions in most cases, except for
the log-normal distribution for the omnidirectional RX antenna pattern in the
NLOS environment. Sinusoidal exponential and typical exponential functions are
found to model small-scale spatial autocorrelation of the received signal
voltage amplitude in LOS and NLOS environments in most cases, respectively.
Furthermore, different decorrelation distances were observed for different RX
track orientations, i.e., for different directions of motion relative to the
TX. Results herein are valuable for characterizing smallscale spatial fading
and autocorrelation properties in multipleinput multiple-output (MIMO) systems
for fifth-generation (5G) mmWave frequencies.

A Novel Millimeter-Wave Channel Simulator and Applications for 5G Wireless Communications

Shu Sun, George R. MacCartney Jr., Theodore S. Rappaport
Comments: 7 pages, 8 figures, in 2017 IEEE International Conference on Communications (ICC), May 2017
Subjects: Information Theory (cs.IT)

This paper presents details and applications of a novel channel simulation
software named NYUSIM, which can be used to generate realistic temporal and
spatial channel responses to support realistic physical- and link-layer
simulations and design for fifth-generation (5G) cellular communications.
NYUSIM is built upon the statistical spatial channel model for broadband
millimeter-wave (mmWave) wireless communication systems developed by
researchers at New York University (NYU). The simulator is applicable for a
wide range of carrier frequencies (500 MHz to 100 GHz), radio frequency (RF)
bandwidths (0 to 800 MHz), antenna beamwidths (7? to 360? for azimuth and 7? to
45? for elevation), and operating scenarios (urban microcell, urban macrocell,
and rural macrocell), and also incorporates multiple-input multiple-output
(MIMO) antenna arrays at the transmitter and receiver. This paper also provides
examples to demonstrate how to use NYUSIM for analyzing MIMO channel conditions
and spectral efficiencies, which show that NYUSIM is an alternative and more
realistic channel model compared to the 3rd Generation Partnership Project
(3GPP) and other channel models for mmWave bands.

Millimeter Wave MIMO Channel Estimation Based on Adaptive Compressed Sensing

Shu Sun, Theodore S. Rappaport
Comments: 7 pages, 5 figures, in 2017 IEEE International Conference on Communications Workshop (ICCW), May 2017
Subjects: Information Theory (cs.IT)

Multiple-input multiple-output (MIMO) systems are well suited for
millimeter-wave (mmWave) wireless communications where large antenna arrays can
be integrated in small form factors due to tiny wavelengths, thereby providing
high array gains while supporting spatial multiplexing, beamforming, or antenna
diversity. It has been shown that mmWave channels exhibit sparsity due to the
limited number of dominant propagation paths, thus compressed sensing
techniques can be leveraged to conduct channel estimation at mmWave
frequencies. This paper presents a novel approach of constructing beamforming
dictionary matrices for sparse channel estimation using the continuous basis
pursuit (CBP) concept, and proposes two novel low-complexity algorithms to
exploit channel sparsity for adaptively estimating multipath channel parameters
in mmWave channels. We verify the performance of the proposed CBP-based
beamforming dictionary and the two algorithms using a simulator built upon a
three-dimensional mmWave statistical spatial channel model, NYUSIM, that is
based on real-world propagation measurements. Simulation results show that the
CBPbased dictionary offers substantially higher estimation accuracy and greater
spectral efficiency than the grid-based counterpart introduced by previous
researchers, and the algorithms proposed here render better performance but
require less computational effort compared with existing algorithms.

Fast and Flexible Successive-Cancellation List Decoders for Polar Codes

Seyyed Ali Hashemi, Carlo Condo, Warren J. Gross
Subjects: Information Theory (cs.IT)

Polar codes have gained significant amount of attention during the past few
years and have been selected as a coding scheme for the next generation of
mobile broadband standard. Among decoding schemes, successive-cancellation list
(SCL) decoding provides a reasonable trade-off between the error-correction
performance and hardware implementation complexity when used to decode polar
codes, at the cost of limited throughput. The simplified SCL (SSCL) and its
extension SSCL-SPC increase the speed of decoding by removing redundant
calculations when encountering particular information and frozen bit patterns
(rate one and single parity check codes), while keeping the error-correction
performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving
that the list size imposes a specific number of bit estimations required to
decode rate one and single parity check codes. Thus, the number of estimations
can be limited while guaranteeing exactly the same error-correction performance
as if all bits of the code were estimated. We call the new decoding algorithms
Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of bit
estimations in a practical application can be tuned to achieve desirable speed,
while keeping the error-correction performance almost unchanged. Hardware
architectures implementing both algorithms are then described and implemented:
it is shown that our design can achieve 1.86 Gb/s throughput, higher than the
best state-of-the-art decoders.

MSE estimates for multitaper spectral estimation and off-grid compressive sensing

Luís Daniel Abreu, José Luis Romero
Comments: This replaces arXiv: 1503.02991
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

We obtain estimates for the Mean Squared Error (MSE) for the multitaper
spectral estimator and certain compressive acquisition methods for multi-band
signals. We confirm a fact discovered by Thomson [Spectrum estimation and
harmonic analysis, Proc. IEEE, 1982]: assuming bandwidth (W) and (N) time
domain observations, the average of the square of the first (K=2NW) Slepian
functions approaches, as (K) grows, an ideal band-pass kernel for the interval
([-W,W]). We provide an analytic proof of this fact and measure the
corresponding rate of convergence in the (L^{1}) norm. This validates a
heuristic approximation used to control the MSE of the multitaper estimator.
The estimates have also consequences for the method of compressive acquisition
of multi-band signals introduced by Davenport and Wakin, giving MSE
approximation bounds for the dictionary formed by modulation of the critical
number of prolates.

Taming Tail Latency for Erasure-coded, Distributed Storage Systems

Vaneet Aggarwal, Abubakr O. Al-Abbasi, Jingxian Fan, Tian Lan
Comments: 11 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Distributed storage systems are known to be susceptible to long tails in
response time. In modern online storage systems such as Bing, Facebook, and
Amazon, the long tails of the service latency are of particular concern. with
99.9th percentile response times being orders of magnitude worse than the mean.
As erasure codes emerge as a popular technique to achieve high data reliability
in distributed storage while attaining space efficiency, taming tail latency
still remains an open problem due to the lack of mathematical models for
analyzing such systems. To this end, we propose a framework for quantifying and
optimizing tail latency in erasure-coded storage systems. In particular, we
derive upper bounds on tail latency in closed form for arbitrary service time
distribution and heterogeneous files. Based on the model, we formulate an
optimization problem to jointly minimize the weighted latency tail probability
of all files over the placement of files on the servers, and the choice of
servers to access the requested files. The non-convex problem is solved using
an efficient, alternating optimization algorithm. Numerical results show
significant reduction of tail latency for erasure-coded storage systems with a
realistic workload.

Existence of Stein Kernels under a Spectral Gap, and Discrepancy Bound

Thomas A. Courtade, Max Fathi, Ashwin Pananjady
Comments: 19 pages, comments are welcome
Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA)

We establish existence of Stein kernels for probability measures on
(mathbb{R}^d) satisfying a Poincar’e inequality, and obtain bounds on the
Stein discrepancy of such measures. Applications to quantitative central limit
theorems are discussed, including a new CLT in Wasserstein distance (W_2) with
optimal rate and dependence on the dimension. As a byproduct, we obtain a
stability version of an estimate of the Poincar’e constant of probability
measures under a second moment constraint. The results extend more generally to
the setting of converse weighted Poincar’e inequalities. The proof is based on
simple arguments of calculus of variations.

Further, we establish two general properties enjoyed by the Stein
discrepancy, holding whenever a Stein kernel exists: Stein discrepancy is
strictly decreasing along the CLT, and it controls the skewness of a random
vector.