IT博客汇 | arXiv Paper Daily: Tue, 28 Mar 2017

Posted by 我爱机器学习 (52ml.net) on 2017-03-28 00:00:00

Neural and Evolutionary Computing

Where to put the Image in an Image Caption Generator

Marc Tanti (1), Albert Gatt (1), Kenneth P. Camilleri (1) ((1) University of Malta)
Comments: under review, 29 pages, 5 figures, 6 tables
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

When a neural language model is used for caption generation, the image
information can be fed to the neural network either by directly incorporating
it in a recurrent neural network — conditioning the language model by
injecting image features — or in a layer following the recurrent neural
network — conditioning the language model by merging the image features. While
merging implies that visual features are bound at the end of the caption
generation process, injecting can bind the visual features at a variety of stages.
In this paper we empirically show that late binding is superior to early
binding in terms of different evaluation metrics. This suggests that the
different modalities (visual and linguistic) for caption generation should not
be jointly encoded by the RNN; rather, the multimodal integration should be
delayed to a subsequent stage. Furthermore, this suggests that recurrent neural
networks should not be viewed as actually generating text, but only as encoding
it for prediction in a subsequent layer.
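
The two conditioning schemes contrasted in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the plain tanh RNN, the layer sizes, the random untrained weights, and the choice of injecting the image by concatenating it to every word input are all assumptions made only to show where the image features enter the network.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID, IMG = 20, 8, 16, 12  # hypothetical sizes

EMBED = rng.normal(scale=0.1, size=(VOCAB, EMB))
W_HID = rng.normal(scale=0.1, size=(HID, HID))
W_IN_INJECT = rng.normal(scale=0.1, size=(HID, EMB + IMG))
W_IN_MERGE = rng.normal(scale=0.1, size=(HID, EMB))
W_OUT_INJECT = rng.normal(scale=0.1, size=(VOCAB, HID))
W_OUT_MERGE = rng.normal(scale=0.1, size=(VOCAB, HID + IMG))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def run_rnn(inputs, w_in, w_hid):
    """Run a plain tanh RNN over the inputs and return the last hidden state."""
    h = np.zeros(w_hid.shape[0])
    for x in inputs:
        h = np.tanh(w_in @ x + w_hid @ h)
    return h


def inject_next_word(prefix, image):
    # Inject: the RNN sees the image, so it encodes words and image jointly.
    inputs = [np.concatenate([EMBED[w], image]) for w in prefix]
    h = run_rnn(inputs, W_IN_INJECT, W_HID)
    return softmax(W_OUT_INJECT @ h)


def merge_next_word(prefix, image):
    # Merge: the RNN encodes words only; the image joins after the RNN.
    inputs = [EMBED[w] for w in prefix]
    h = run_rnn(inputs, W_IN_MERGE, W_HID)
    return softmax(W_OUT_MERGE @ np.concatenate([h, image]))


image_features = rng.normal(size=IMG)
caption_prefix = [3, 7, 1]  # hypothetical word indices
p_inject = inject_next_word(caption_prefix, image_features)
p_merge = merge_next_word(caption_prefix, image_features)
```

Both functions return a distribution over the next word; the only difference is whether the image vector passes through the recurrent layer (early binding) or is concatenated with its output (late binding).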

Deep Deterministic Policy Gradient for Urban Traffic Light Control

Noe Casas
Subjects: Neural and Evolutionary Computing (cs.NE)

Traffic light timing optimization is still an active line of research despite
the wealth of scientific literature on the topic, and the problem remains
unsolved for any non-toy scenario. One of the key issues with traffic light
optimization is the large scale of the input information that is available for
the controlling agent, namely all the traffic data that is continually sampled
by the traffic detectors that cover the urban network. This issue has in the
past forced researchers to focus on agents that work on localized parts of the
traffic network, typically on individual intersections, and to coordinate every
individual agent in a multi-agent setup. In order to overcome the large scale
of the available state information, we propose to rely on the ability of deep
learning approaches to handle large input spaces, in the form of the Deep
Deterministic Policy Gradient (DDPG) algorithm. We performed several
experiments with a range of models, from the very simple one (one intersection)
to the more complex one (a big city section).

Surrogate Model of Multi-Period Flexibility from a Home Energy Management System

Rui Pinto, Ricardo Bessa, Manuel Matos
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The operation of near-future electric distribution grids will have to rely on
demand-side flexibility, both by implementation of demand response strategies
and by taking advantage of the intelligent management of increasingly common
small-scale energy storage. Home energy management systems (HEMS) will play a
crucial role in the provision of flexibility to both system operators and market
players like aggregators. Modeling multi-period flexibility from residential
consumers (HEMS flexibility), such as battery storage and electric water
heater, while complying with internal constraints (comfort levels, data
privacy) and uncertainty is a complex task. This paper describes a
computational method that is capable of efficiently defining and learning the
feasible flexibility set from controllable resources connected to a HEMS. An
Evolutionary Particle Swarm Optimization (EPSO) algorithm is adopted and
reshaped to derive a set of feasible temporal trajectories for the residential
net-load, considering storage, flexible appliances, and predefined customer
preferences, as well as load and photovoltaic (PV) forecast uncertainty. A
support vector data description (SVDD) algorithm is used to build models
capable of classifying feasible and unfeasible HEMS operating trajectories upon
request from an optimization/control algorithm operated by a DSO or market
player.

Balancing Selection Pressures, Multiple Objectives, and Neural Modularity to Coevolve Cooperative Agent Behavior

Alex C. Rollins, Jacob Schrum
Subjects: Neural and Evolutionary Computing (cs.NE)

Previous research using evolutionary computation in Multi-Agent Systems
indicates that assigning fitness based on team vs. individual behavior has a
strong impact on the ability of evolved teams of artificial agents to exhibit
teamwork in challenging tasks. However, such research only made use of
single-objective evolution. In contrast, when a multiobjective evolutionary
algorithm is used, populations can be subject to individual-level objectives,
team-level objectives, or combinations of the two. This paper explores the
performance of cooperatively coevolved teams of agents controlled by artificial
neural networks subject to these types of objectives. Specifically, predator
agents are evolved to capture scripted prey agents in a torus-shaped grid
world. Because of the tension between individual and team behaviors, multiple
modes of behavior can be useful, and thus the effect of modular neural networks
is also explored. Results demonstrate that fitness rewarding individual
behavior is superior to fitness rewarding team behavior, despite being applied
to a cooperative task. However, the use of networks with multiple modules
allows predators to discover intelligent behavior, regardless of which type of
objective is used.
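
As a small illustration of the torus-shaped grid world mentioned above, the sketch below computes a wrap-around distance between a predator and a prey. The use of Manhattan distance, the grid size, and the function names are assumptions for illustration; the abstract does not specify them.

```python
def torus_offset(a, b, size):
    """Signed shortest offset from a to b along one wrapped axis."""
    d = (b - a) % size
    return d - size if d > size // 2 else d


def torus_distance(p, q, width, height):
    """Manhattan distance between grid cells p and q on a torus."""
    return abs(torus_offset(p[0], q[0], width)) + abs(torus_offset(p[1], q[1], height))


# On a 10x10 torus, opposite corners are neighbors through the wrapped edges.
corner_distance = torus_distance((0, 0), (9, 9), 10, 10)
print(corner_distance)  # 2
```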

Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Objective: We investigate whether deep learning techniques for natural
language processing (NLP) can be used efficiently for patient phenotyping.
Patient phenotyping is a classification task for determining whether a patient
has a medical condition, and is a crucial part of secondary analysis of
healthcare data. We assess the performance of deep learning algorithms and
compare them with classical NLP approaches.

Materials and Methods: We compare convolutional neural networks (CNNs),
n-gram models, and approaches based on cTAKES that extract pre-defined medical
concepts from clinical notes and use them to predict patient phenotypes. The
performance is tested on 10 different phenotyping tasks using 1,610 discharge
summaries extracted from the MIMIC-III database.

Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
model having an F1-score up to 37 points higher than alternative approaches. We
additionally assess the interpretability of our model by presenting a method
that extracts the most salient phrases for a particular prediction.
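
For readers who want to relate the three reported numbers: F1 is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below applies that standard formula to the averages quoted above. Because the abstract reports averages over 10 tasks, the result only needs to agree approximately with the reported average F1 of 76.

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


f1_from_averages = f1_score(83, 71)
print(round(f1_from_averages, 1))  # 76.5
```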

Conclusion: We show that NLP methods based on deep learning improve the
performance of patient phenotyping. Our CNN-based algorithm automatically
learns the phrases associated with each patient phenotype. As such, it reduces
the annotation complexity for clinical domain experts, who are normally
required to develop task-specific annotation rules and identify relevant
phrases. Our method performs well in terms of both performance and
interpretability, which indicates that deep learning is an effective approach
to patient phenotyping based on clinicians’ notes.

Computer Vision and Pattern Recognition

Coherent Online Video Style Transfer

Dongdong Chen, Jing Liao, Yuan Lu, Nenghai Yu, Gang Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Training a feed-forward network for fast neural style transfer of images is
proven to be successful. However, the naive extension to process video frame by
frame is prone to producing flickering results. We propose the first end-to-end
network for online video style transfer, which generates temporally coherent
stylized video sequences in near real-time. Two key ideas are an efficient
network that incorporates short-term coherence, and the propagation of
short-term coherence to long-term coherence, which ensures consistency over
longer periods of time. Our network can incorporate different image stylization networks. We show
that the proposed method clearly outperforms the per-frame baseline both
qualitatively and quantitatively. Moreover, it can achieve visually comparable
coherence to optimization-based video style transfer, but is three orders of
magnitude faster in runtime.

StyleBank: An Explicit Representation for Neural Image Style Transfer

Dongdong Chen, Yuan Lu, Jing Liao, Nenghai Yu, Gang Hua
Comments: Accepted by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose StyleBank for neural image style transfer. It is composed of
multiple convolution filter banks, each of which explicitly represents one
style. To transfer an image to a specific style, the corresponding filter
bank is operated on top of the intermediate feature embedding produced by a
single auto-encoder. The StyleBank and the auto-encoder are jointly learnt,
where the learning is conducted in such a way that the auto-encoder does not
encode any style information thanks to the flexibility introduced by the
explicit filter bank representation. It also enables us to conduct incremental
learning to add a new image style by learning a new filter bank while holding
the auto-encoder fixed. The explicit style representation along with the
flexible network design enables us to fuse styles at not only the image level,
but also the region level. Our method is the first style transfer network that
links back to traditional texton mapping methods, and hence provides new
understanding on neural style transfer. Our method is easy to train, runs in
real-time, and produces results that are qualitatively better or at least
comparable to existing methods.

Deep Poincare Map For Robust Medical Image Segmentation

Yuanhan Mo, Fangde Liu, Jingqing Zhang, Guang Yang, Taigang He, Yike Guo
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Precise segmentation is a prerequisite for an accurate quantification of the
imaged objects. It is a very challenging task in many medical imaging
applications due to relatively poor image quality and data scarcity. In this
work, we present an innovative segmentation paradigm, named Deep Poincare Map
(DPM), by coupling the dynamical system theory with a novel deep learning based
approach. Firstly, we model the image segmentation process as a dynamical
system, in which a limit cycle models the boundary of the region of interest
(ROI). Secondly, instead of segmenting the ROI directly, a convolutional neural
network is employed to predict the vector field of the dynamical system.
Finally, the boundary of the ROI is identified using the Poincare map and the
flow integration. We demonstrate that our segmentation model can be built using
a very limited amount of training data. By cross-validation, we can achieve a
mean Dice score of 94% compared to the manual delineation (ground truth) of the
left ventricle ROI defined by clinical experts on a cardiac MRI dataset.
Compared with other state-of-the-art methods, we can conclude that the proposed
DPM method is adaptive, accurate and robust. It is straightforward to apply
this method for other medical imaging applications.
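
The Dice score quoted above measures the overlap between a predicted mask and the manual delineation. A minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), follows; the toy masks are made up for illustration and are unrelated to the paper's data.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


pred_mask = [[1, 1, 0], [0, 1, 0]]   # hypothetical prediction
truth_mask = [[1, 1, 0], [0, 0, 1]]  # hypothetical ground truth
toy_dice = dice_score(pred_mask, truth_mask)  # 2 * 2 / (3 + 3)
```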

Introduction To The Monogenic Signal

Christopher P. Bridge
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The monogenic signal is an image analysis methodology that was introduced by
Felsberg and Sommer in 2001 and has been employed for a variety of purposes in
image processing and computer vision research. In particular, it has been found
to be useful in the analysis of ultrasound imagery in several research
scenarios, mostly in work done within the BioMedIA lab at Oxford. However, the
literature on the monogenic signal can be difficult to penetrate due to the
lack of a single resource to explain the various principles from basics. The
purpose of this document is therefore to introduce the principles, purpose,
applications, and limitations of the methodology. It assumes some background
knowledge from the fields of image and signal processing, in particular a good
knowledge of Fourier transforms as applied to signals and images. We will not
attempt to provide a thorough mathematical description or derivation of the
monogenic signal, but rather focus on developing an intuition for understanding
and using the methodology and refer the reader elsewhere for a more
mathematical treatment.
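
To give a concrete feel for the methodology, here is a minimal NumPy sketch of a monogenic signal computed with the Riesz transform in the Fourier domain. It is an illustrative assumption rather than code from the document: sign conventions for the Riesz filters vary between references, and in practice the image is usually band-pass filtered first, a step omitted here.

```python
import numpy as np


def monogenic_signal(image):
    """Return local amplitude, phase and orientation of a 2-D image."""
    rows, cols = image.shape
    v = np.fft.fftfreq(rows)[:, None]  # vertical frequency coordinates
    u = np.fft.fftfreq(cols)[None, :]  # horizontal frequency coordinates
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # avoid 0/0 at DC; the filters are zero there anyway

    spectrum = np.fft.fft2(image)
    # Two Riesz-transformed (odd) components of the image.
    odd_x = np.real(np.fft.ifft2(spectrum * (-1j * u / radius)))
    odd_y = np.real(np.fft.ifft2(spectrum * (-1j * v / radius)))

    amplitude = np.sqrt(image ** 2 + odd_x ** 2 + odd_y ** 2)
    phase = np.arctan2(np.hypot(odd_x, odd_y), image)
    orientation = np.arctan2(odd_y, odd_x)
    return amplitude, phase, orientation


# A horizontal cosine grating has constant local amplitude 1.
x = np.arange(64)
grating = np.tile(np.cos(2 * np.pi * 4 * x / 64), (64, 1))
amplitude, phase, orientation = monogenic_signal(grating)
```

For this grating the odd component is the matching sine wave, so the local amplitude is constant even though the intensity oscillates, which is the kind of contrast-independent behavior that makes the representation useful.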

Transfer learning for music classification and regression tasks

Keunwoo Choi, György Fazekas, Mark Sandler, Kyunghyun Cho
Comments: 16 pages, single column, NOT iclr submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)

In this paper, we present a transfer learning approach for music
classification and regression tasks. We propose to use a pretrained convnet
feature, a concatenated feature vector using activations of feature maps of
multiple layers in a trained convolutional network. We show how this
convnet feature can serve as a general-purpose music representation. In the
experiment, a convnet is trained for music tagging and then transferred for
many music-related classification and regression tasks as well as an
audio-related classification task. In experiments, the convnet feature
outperforms the baseline MFCC feature in all tasks and many reported approaches
of aggregating MFCCs and low- and high-level music features.
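
The idea of a concatenated multi-layer feature can be sketched as follows. The global average pooling of each feature map, the number of layers, and the array shapes are assumptions chosen for illustration; the abstract only states that activations from multiple layers are concatenated.

```python
import numpy as np


def convnet_feature(layer_activations):
    """Average-pool each layer's feature maps and concatenate the results."""
    pooled = [act.mean(axis=(1, 2)) for act in layer_activations]
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
# Hypothetical activations of three layers, each shaped (channels, freq, time).
activations = [
    rng.normal(size=(32, 48, 64)),
    rng.normal(size=(32, 24, 32)),
    rng.normal(size=(32, 12, 16)),
]
feature = convnet_feature(activations)
print(feature.shape)  # (96,)
```

The resulting fixed-length vector could then be fed to any standard classifier or regressor for the target task.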

A Study on the Extraction and Analysis of a Large Set of Eye Movement Features during Reading

Ioannis Rigas, Lee Friedman, Oleg Komogortsev
Comments: 38 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Quantitative Methods (q-bio.QM)

This work presents a study on the extraction and analysis of a set of 101
categories of eye movement features from three types of eye movement events:
fixations, saccades, and post-saccadic oscillations. The eye movements were
recorded during a reading task. For the categories of features with multiple
instances in a recording we extract corresponding feature subtypes by
calculating descriptive statistics on the distributions of these instances. A
unified framework of detailed descriptions and mathematical formulas is
provided for the extraction of the feature set. The analysis of feature values
is performed using a large database of eye movement recordings from a normative
population of 298 subjects. We demonstrate the central tendency and overall
variability of feature values over the experimental population, and more
importantly, we quantify the test-retest reliability (repeatability) of each
separate feature. The described methods and analysis can provide valuable tools
in fields exploring the eye movements, such as in behavioral studies, attention
and cognition research, medical research, biometric recognition, and
human-computer interaction.

Reweighted Infrared Patch-Tensor Model With Both Non-Local and Local Priors for Single-Frame Small Target Detection

Yimian Dai, Yiquan Wu
Comments: Submitted to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many state-of-the-art methods have been proposed for infrared small target
detection. They work well on the images with homogeneous backgrounds and
high-contrast targets. However, when facing highly heterogeneous backgrounds,
they do not perform well, mainly due to: 1) the existence of strong
edges and other interfering components, and 2) incomplete use of the priors.
Inspired by this, we propose a novel method to exploit both local and non-local
priors simultaneously. Firstly, we employ a new infrared patch-tensor (IPT)
model to represent the image and preserve its spatial correlations. Exploiting
the target sparse prior and background non-local self-correlation prior, the
target-background separation is modeled as a robust low-rank tensor recovery
problem. Moreover, with the help of the structure tensor and reweighted idea,
we design an entry-wise local-structure-adaptive and sparsity enhancing weight
to replace the globally constant weighting parameter. The decomposition could
be achieved via the element-wise reweighted higher-order robust principal
component analysis with an additional convergence condition according to the
practical situation of target detection. Extensive experiments demonstrate that
our model outperforms other state-of-the-art methods, in particular for images
with very dim targets and heavy clutter.

Multi-Path Region-Based Convolutional Neural Network for Accurate Detection of Unconstrained "Hard Faces"

Yuguang Liu, Martin D. Levine
Comments: 11 pages, 7 figures, to be presented at CRV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Large-scale variations still pose a challenge in unconstrained face
detection. To the best of our knowledge, no current face detection algorithm
can detect a face as large as 800 x 800 pixels while simultaneously detecting
another one as small as 8 x 8 pixels within a single image with equally high
accuracy. We propose a two-stage cascaded face detection framework, Multi-Path
Region-based Convolutional Neural Network (MP-RCNN), that seamlessly combines a
deep neural network with a classic learning strategy, to tackle this challenge.
The first stage is a Multi-Path Region Proposal Network (MP-RPN) that proposes
faces at three different scales. It simultaneously utilizes three parallel
outputs of the convolutional feature maps to predict multi-scale candidate face
regions. The “atrous” convolution trick (convolution with up-sampled filters)
and a newly proposed sampling layer for “hard” examples are embedded in MP-RPN
to further boost its performance. The second stage is a Boosted Forests
classifier, which utilizes deep facial features pooled from inside the
candidate face regions as well as deep contextual features pooled from a larger
region surrounding the candidate face regions. This step is included to further
remove hard negative samples. Experiments show that this approach achieves
state-of-the-art face detection performance on the WIDER FACE dataset “hard”
partition, outperforming the former best result by 9.6% for the Average
Precision.
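
The description of atrous convolution as "convolution with up-sampled filters" can be checked with a short 1-D sketch: sampling the input with gaps of `rate` gives the same result as sliding a filter that has `rate - 1` zeros inserted between its taps. The function names and the toy signal are hypothetical, and the sliding-window (correlation) form is used, as is common in CNN libraries.

```python
import numpy as np


def upsample_filter(kernel, rate):
    """Insert rate - 1 zeros between consecutive filter taps."""
    kernel = np.asarray(kernel, dtype=float)
    upsampled = np.zeros((len(kernel) - 1) * rate + 1)
    upsampled[::rate] = kernel
    return upsampled


def atrous_correlate(signal, kernel, rate):
    """Sliding-window filter whose taps are spaced `rate` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * rate + 1
    out_len = len(signal) - span + 1
    return np.array([
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(out_len)
    ])


signal = np.arange(10.0)
kernel = [1.0, 0.0, -1.0]
direct = atrous_correlate(signal, kernel, rate=2)
via_upsampled = np.correlate(signal, upsample_filter(kernel, 2), mode="valid")
```

Both paths give the same output, while the atrous form touches only the original three taps and so enlarges the receptive field without extra multiplications.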

Active Convolution: Learning the Shape of Convolution for Image Classification

Yunho Jeon, Junmo Kim
Comments: Accepted to appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, deep learning has achieved great success in many computer
vision applications. Convolutional neural networks (CNNs) have lately emerged
as a major approach to image classification. Most research on CNNs thus far has
focused on developing architectures such as the Inception and residual
networks. The convolution layer is the core of the CNN, but few studies have
addressed the convolution unit itself. In this paper, we introduce a
convolution unit called the active convolution unit (ACU). The new convolution
has no fixed shape, so we can define any form of convolution. Its
shape can be learned through backpropagation during training. Our proposed unit
has a few advantages. First, the ACU is a generalization of convolution; it can
define not only all conventional convolutions, but also convolutions with
fractional pixel coordinates. We can freely change the shape of the
convolution, which provides greater freedom to form CNN structures. Second, the
shape of the convolution is learned while training and there is no need to tune
it by hand. Third, the ACU can learn better than a conventional unit; we
obtained the improvement simply by replacing the conventional convolution with an
ACU. We tested our proposed method on plain and residual networks, and the
results showed significant improvement using our method on various datasets and
architectures in comparison with the baseline.
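
A convolution with fractional pixel coordinates needs some way to read the input between grid points. The sketch below assumes bilinear interpolation for that purpose, which the abstract does not specify, and evaluates a single output position from a list of weighted, freely placed taps. The names, offsets, and weights are hypothetical, and all sampled positions are assumed to lie inside the image.

```python
import numpy as np


def bilinear_sample(image, y, x):
    """Read the image at a fractional (y, x) position inside its bounds."""
    height, width = image.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * image[y0, x0] + dx * image[y0, x1]
    bottom = (1 - dx) * image[y1, x0] + dx * image[y1, x1]
    return (1 - dy) * top + dy * bottom


def active_conv_at(image, y, x, weights, offsets):
    """Weighted sum of samples taken at (y, x) plus each fractional offset."""
    return sum(
        w * bilinear_sample(image, y + oy, x + ox)
        for w, (oy, ox) in zip(weights, offsets)
    )


image = np.arange(16.0).reshape(4, 4)  # value at (y, x) is 4*y + x
sample = bilinear_sample(image, 1.5, 2.25)
response = active_conv_at(image, 1, 1, [1.0, -1.0], [(0.0, 0.5), (0.0, -0.5)])
```

Because the interpolated value is a smooth function of the offsets, gradients with respect to the tap positions exist, which is what allows the shape to be learned by backpropagation.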

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) are currently widely used for many artificial
intelligence (AI) applications including computer vision, speech recognition,
and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this
comes at the cost of high computational complexity. Accordingly, techniques
that enable efficient processing of deep neural networks to improve
energy-efficiency and throughput without sacrificing performance accuracy or
increasing hardware cost are critical to enabling the wide deployment of DNNs
in AI systems.

This article aims to provide a comprehensive tutorial and survey about the
recent advances towards the goal of enabling efficient processing of DNNs.
Specifically, it will provide an overview of DNNs, discuss various platforms
and architectures that support DNNs, and highlight key trends in recent
efficient processing techniques that reduce the computation cost of DNNs either
solely via hardware design changes or via joint hardware design and network
algorithm changes. It will also summarize various development resources that
can enable researchers and practitioners to quickly get started on DNN design,
and highlight important benchmarking metrics and design considerations that
should be used for evaluating the rapidly growing number of DNN hardware
designs, optionally including algorithmic co-design, being proposed in academia
and industry.

The reader will take away the following concepts from this article:
understand the key design considerations for DNNs; be able to evaluate
different DNN hardware implementations with benchmarks and comparison metrics;
understand trade-offs between various architectures and platforms; be able to
evaluate the utility of various DNN design techniques for efficient processing;
and understand recent implementation trends and opportunities.

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen
Comments: 9 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Manual annotations of temporal bounds for object interactions (i.e. start and
end times) are typical training input to recognition, localization and
detection algorithms. For three publicly available egocentric datasets, we
uncover inconsistencies in ground truth temporal bounds within and across
annotators and datasets. We systematically assess the robustness of
state-of-the-art approaches to changes in labeled temporal bounds, for object
interaction recognition. As boundaries are trespassed, a drop of up to 10% is
observed for both Improved Dense Trajectories and Two-Stream Convolutional
Neural Network. We demonstrate that such disagreement stems from a limited
understanding of the distinct phases of an action, and propose annotating based
on the Rubicon Boundaries, inspired by a similarly named cognitive model, for
consistent temporal bounds of object interactions. Evaluated on a public
dataset, we report a 4% increase in overall accuracy, and an increase in
accuracy for 55% of classes when Rubicon Boundaries are used for temporal
annotations.

Simultaneous Perception and Path Generation Using Fully Convolutional Neural Networks

Luca Caltagirone, Mauro Bellone, Lennart Svensson, Mattias Wahde
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this work, a novel learning-based approach has been developed to generate
driving paths by integrating LIDAR point clouds, GPS-IMU information, and
Google driving directions. The system is based on a fully convolutional neural
network that jointly learns to carry out perception and path generation from
real-world driving sequences and that is trained using automatically generated
training examples. Several combinations of input data were tested in order to
assess the performance gain provided by specific information modalities. The
fully convolutional neural network trained using all the available sensors
together with driving directions achieved the best MaxF score of 88.13% when
considering a region of interest of 60×60 meters. By considering a smaller
region of interest, the agreement between predicted paths and ground-truth
increased to 92.60%. The positive results obtained in this work indicate that
the proposed system may help fill the gap between low-level scene parsing and
behavior-reflex approaches by generating outputs that are close to vehicle
control and at the same time human-interpretable.

Mastering Sketching: Adversarial Augmentation for Structured Prediction

Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa
Comments: 12 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present an integral framework for training sketch simplification networks
that convert challenging rough sketches into clean line drawings. Our approach
augments a simplification network with a discriminator network, training both
networks jointly so that the discriminator network discerns whether a line
drawing is real training data or the output of the simplification network,
which in turn tries to fool it. This approach has two major advantages. First,
because the discriminator network learns the structure in line drawings, it
encourages the output sketches of the simplification network to be more similar
in appearance to the training sketches. Second, we can also train the
simplification network with additional unsupervised data, using the
discriminator network as a substitute teacher. Thus, by adding only rough
sketches without simplified line drawings, or only line drawings without the
original rough sketches, we can improve the quality of the sketch
simplification. We show how our framework can be used to train models that
significantly outperform the state of the art in the sketch simplification
task, despite using the same architecture for inference. We additionally
present an approach to optimize for a single image, which improves accuracy at
the cost of additional computation time. Finally, we show that, using the same
framework, it is possible to train the network to perform the inverse problem,
i.e., convert simple line sketches into pencil drawings, which is not possible
using the standard mean squared error loss. We validate our framework with two
user tests, where our approach is preferred to the state of the art in sketch
simplification 92.3% of the time and obtains 1.2 more points on a scale of 1 to
5.

Scaling the Scattering Transform: Deep Hybrid Networks

Edouard Oyallon (DI-ENS), Eugene Belilovsky (CVN, GALEN), Sergey Zagoruyko (ENPC)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We use the scattering network as a generic and fixed initialization of the
first layers of a supervised hybrid deep network. We show that early layers do
not necessarily need to be learned, providing the best results to-date with
pre-defined representations while being competitive with Deep CNNs. Using a
shallow cascade of 1×1 convolutions, which encodes scattering coefficients that
correspond to spatial windows of very small sizes, makes it possible to obtain AlexNet
accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding
explicitly learns invariance w.r.t. rotations. Combining scattering networks
with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet
ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10
layers. We also find that hybrid architectures can yield excellent performance
in the small sample regime, exceeding their end-to-end counterparts, through
their ability to incorporate geometrical priors. We demonstrate this on subsets
of the CIFAR-10 dataset and by setting a new state-of-the-art on the STL-10
dataset.

MIHash: Online Hashing with Mutual Information

Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff
Comments: 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning-based adaptive hashing methods are widely used for nearest neighbor
retrieval. Recently, online hashing methods have demonstrated a good
performance-complexity tradeoff by learning hash functions from streaming data.
In this paper, we aim to advance the state-of-the-art for online hashing. We
first address a key challenge that has often been ignored: the binary codes for
indexed data must be recomputed to keep pace with updates to the hash
functions. We propose an efficient quality measure for hash functions, based on
an information-theoretic quantity, mutual information, and use it successfully
as a criterion to eliminate unnecessary hash table updates. Next, we show that
mutual information can also be used as an objective in learning hash functions,
using gradient-based optimization. Experiments on image retrieval benchmarks
(including a 2.5M image dataset) confirm the effectiveness of our formulation,
both in reducing hash table recomputations and in learning high-quality hash
functions.
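
One way to picture a mutual-information quality measure for hash functions is sketched below. It assumes the quantity of interest is the mutual information between the Hamming distance from a query to each indexed item and a binary "is a true neighbor" label; the abstract does not spell out the definition, so treat this as an illustration with made-up inputs.

```python
import numpy as np


def entropy_bits(counts):
    """Shannon entropy in bits of a histogram of counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(distances, is_neighbor, n_bits):
    """I(D; C) between Hamming distances D and neighbor labels C."""
    distances = np.asarray(distances)
    is_neighbor = np.asarray(is_neighbor, dtype=bool)
    joint = np.zeros((2, n_bits + 1))
    joint[0] = np.bincount(distances[~is_neighbor], minlength=n_bits + 1)
    joint[1] = np.bincount(distances[is_neighbor], minlength=n_bits + 1)
    # I(D; C) = H(D) + H(C) - H(D, C)
    return (entropy_bits(joint.sum(axis=0))
            + entropy_bits(joint.sum(axis=1))
            - entropy_bits(joint.ravel()))


labels = [1, 1, 0, 0]  # hypothetical neighbor labels for one query
good_codes = mutual_information([0, 0, 4, 4], labels, n_bits=4)
bad_codes = mutual_information([0, 4, 0, 4], labels, n_bits=4)
```

Codes whose distances separate neighbors from non-neighbors score high, while codes whose distances are unrelated to the labels score zero, so a drop in this value could signal that a hash table update is worthwhile.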

A Visual Measure of Changes to Weighted Self-Organizing Map Patterns

Younjin Chung, Joachim Gudmundsson, Masahiro Takatsuka
Comments: 8 pages, 3 figures, conference, llncs style
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Estimating output changes by input changes is the main task in causal
analysis. In previous work, input and output Self-Organizing Maps (SOMs) were
associated for causal analysis of multivariate and nonlinear data. Based on the
association, a weight distribution of the output conditional on a given input
was obtained over the output map space. Such a weighted SOM pattern of the
output changes when the input changes. In order to analyze the change, it is
important to measure the difference of the patterns. Many methods have been
proposed for the dissimilarity measure of patterns. However, it remains a major
challenge when attempting to measure how the patterns change. In this paper, we
propose a visualization approach that simplifies the comparison of the
difference in terms of the pattern property. Using this approach, the change
can be analyzed by integrating colors and star glyph shapes representing the
property dissimilarity. Ecological data is used to demonstrate the usefulness
of our approach and the experimental results show that our approach provides
the change information effectively.

Exploiting Color Name Space for Salient Object Detection

Jing Lou, Huan Wang, Longtao Chen, Qingyuan Xia, Wei Zhu, Mingwu Ren
Comments: 13 pages, 10 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we investigate the contribution of color names for
salient object detection. Each input image is first converted to the color name
space, which consists of 11 probabilistic channels. By exploring the
topological structure relationship between the figure and the ground, we obtain
a saliency map through a linear combination of a set of sequential attention
maps. To overcome the limitation of only exploiting the surroundedness cue, two
global cues with respect to color names are invoked for guiding the computation
of another weighted saliency map. Finally, we integrate the two saliency maps
into a unified framework to infer the saliency result. In addition, an improved
post-processing procedure is introduced to effectively suppress the background
while uniformly highlighting the salient objects. Experimental results show that
the proposed model produces more accurate saliency maps and performs well
against 23 saliency models in terms of three evaluation metrics on three public
datasets.

Transductive Zero-Shot Learning with Adaptive Structural Embedding

Yunlong Yu, Zhong Ji, Jichang Guo, Yanwei Pang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot learning (ZSL) endows a computer vision system with the
inferential capability to recognize instances of a new category that has never
been seen before. Two fundamental challenges in ZSL are visual-semantic embedding and
domain adaptation in cross-modality learning and unseen class prediction steps,
respectively. To address both challenges, this paper presents two corresponding
methods named Adaptive STructural Embedding (ASTE) and Self-PAced Selective
Strategy (SPASS), respectively. Specifically, ASTE formulates the
visual-semantic interactions in a latent structural SVM framework to adaptively
adjust the slack variables to reflect the differing reliability of training
instances. In this way, reliable instances receive small
punishments, whereas less reliable instances receive more severe
punishments. Thus, it ensures a more discriminative embedding. On the other
hand, SPASS offers a framework to alleviate the domain shift problem in ZSL,
which exploits the unseen data in an easy-to-hard fashion. In particular, SPASS
borrows the idea of self-paced learning by iteratively selecting the unseen
instances from reliable to less reliable to gradually adapt the knowledge from
the seen domain to the unseen domain. Subsequently, by combining SPASS and
ASTE, we present a self-paced Transductive ASTE (TASTE) method to progressively
reinforce the classification capacity. Extensive experiments on three benchmark
datasets (i.e., AwA, CUB, and aPY) demonstrate the superiorities of ASTE and
TASTE. Furthermore, we also propose a fast training (FT) strategy to improve
the efficiency of most existing ZSL methods. The FT strategy is surprisingly
simple and general, and can speed up the training of most
existing methods by 4~300 times while maintaining their performance.

Transductive Zero-Shot Learning with a Self-training dictionary approach

Yunlong Yu, Zhong Ji, Xi Li, Jichang Guo, Zhongfei Zhang, Haibin Ling, Fei Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

As an important and challenging problem in computer vision, zero-shot
learning (ZSL) aims at automatically recognizing the instances from unseen
object classes without training data. To address this problem, ZSL is usually
carried out in the following two aspects: 1) capturing the domain distribution
connections between seen-class data and unseen-class data; and 2) modeling
the semantic interactions between the image feature space and the label
embedding space. Motivated by these observations, we propose a bidirectional
mapping-based semantic relationship modeling scheme that seeks cross-modal
knowledge transfer by simultaneously projecting the image features and label
embeddings into a common latent space. Namely, we have a bidirectional
connection relationship that takes place from the image feature space to the
latent space as well as from the label embedding space to the latent space. To
deal with the domain shift problem, we further present a transductive learning
approach that formulates the class prediction problem in an iterative refining
process, where the object classification capacity is progressively reinforced
through bootstrapping-based model updating over highly reliable instances.
Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate
the effectiveness of the proposed approach against the state-of-the-art
approaches.

Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

Lingni Ma, Jörg Stückler, Christian Kerl, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Visual scene understanding is an important capability that enables robots to
purposefully act in their environment. In this paper, we propose a novel
approach to object-class segmentation from multiple RGB-D views using deep
learning. We train a deep neural network to predict object-class semantics that
are consistent across several viewpoints in a semi-supervised way. At test time,
the semantic predictions of our network can be fused more consistently in
semantic keyframe maps than predictions of a network trained on individual
views. We base our network architecture on a recent single-view deep learning
approach to RGB and depth fusion for semantic object-class segmentation and
enhance it with multi-scale loss minimization. We obtain the camera trajectory
using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth
annotated frames in order to enforce multi-view consistency during training. At
test time, predictions from multiple views are fused into keyframes. We propose
and analyze several methods for enforcing multi-view consistency during
training and testing. We evaluate the benefit of multi-view consistency
training and demonstrate that pooling of deep features and fusion over multiple
views outperforms single-view baselines on the NYUDv2 benchmark for semantic
segmentation. Our end-to-end trained network achieves state-of-the-art
performance on the NYUDv2 dataset in single-view segmentation as well as
multi-view semantic fusion.

Person Re-Identification by Camera Correlation Aware Feature Augmentation

Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, Jian-Huang Lai
Comments: To Appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The challenge of person re-identification (re-id) is to match individual
images of the same person captured by different non-overlapping camera views
against significant and unknown cross-view feature distortion. While a large
number of distance metric/subspace learning models have been developed for
re-id, the cross-view transformations they learned are view-generic and thus
potentially less effective in quantifying the feature distortion inherent to
each camera view. Learning view-specific feature transformations for re-id
(i.e., view-specific re-id), an under-studied approach, offers an alternative
solution to this problem. In this work, we formulate a novel view-specific
person re-identification framework from the feature augmentation point of view,
called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically,
CRAFT performs cross-view adaptation by automatically measuring camera
correlation from cross-view visual data distribution and adaptively conducting
feature augmentation to transform the original features into a new adaptive
space. Through our augmentation framework, view-generic learning algorithms can
be readily generalized to learn and optimize view-specific sub-models whilst
simultaneously modelling view-generic discrimination information. Therefore,
our framework not only inherits the strength of view-generic model learning but
also provides an effective way to take into account view-specific
characteristics. Our CRAFT framework can be extended to jointly learn
view-specific feature transformations for person re-id across a large network
with more than two cameras, a largely under-investigated but realistic re-id
setting. Additionally, we present a domain-generic deep person appearance
representation designed specifically to be view invariant, thereby
facilitating cross-view adaptation by CRAFT.

Learned multi-patch similarity

Wilfried Hartmann, Silvano Galliani, Michal Havlena, Konrad Schindler, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Estimating a depth map from multiple views of a scene is a fundamental task
in computer vision. As soon as more than two viewpoints are available, one
faces the very basic question of how to measure similarity across more than two
image patches. Surprisingly, no direct solution exists; instead, it is common to fall
back on more or less robust averaging of two-view similarities. Encouraged by
the success of machine learning, and in particular convolutional neural
networks, we propose to learn a matching function which directly maps multiple
image patches to a scalar similarity score. Experiments on several multi-view
datasets demonstrate that this approach has advantages over methods based on
pairwise patch similarity.
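
To make the fallback described above concrete, the following is a minimal sketch of averaging two-view similarities over more than two patches. It is not the learned matching function the paper proposes. Normalized cross-correlation is assumed here as the two-view similarity (the abstract does not name one), and the function names `ncc` and `mean_pairwise_similarity` are hypothetical.

```python
import math
from itertools import combinations


def ncc(a, b):
    """Normalized cross-correlation of two equally sized, flattened patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A constant patch carries no structure; treat it as uncorrelated.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def mean_pairwise_similarity(patches):
    """Baseline multi-view score: the plain average of all two-view scores."""
    pairs = list(combinations(patches, 2))
    return sum(ncc(a, b) for a, b in pairs) / len(pairs)


# Three views of the same structure under brightness and contrast changes.
consistent = [[1, 2, 3, 4], [2, 4, 6, 8], [11, 12, 13, 14]]
# The second view is inverted, so two of the three pairs disagree.
inconsistent = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4]]

score_consistent = mean_pairwise_similarity(consistent)
score_inconsistent = mean_pairwise_similarity(inconsistent)
```

A single outlier view affects every pair it takes part in, which is one reason such averaging is only "more or less robust".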

SCAN: Structure Correcting Adversarial Network for Chest X-rays Organ Segmentation

Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing
Comments: 10 pages, 7 figures, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Chest X-ray (CXR) is one of the most commonly prescribed medical imaging
procedures, often with 2-10x more scans than other imaging modalities such
as MRI, CT, and PET scans. These voluminous CXR scans place significant
workloads on radiologists and medical practitioners. Organ segmentation is a
crucial step to obtain effective computer-aided detection on CXR. In this work,
we propose Structure Correcting Adversarial Network (SCAN) to segment lung
fields and the heart in CXR images. SCAN incorporates a critic network to
impose on the convolutional segmentation network the structural regularities
emerging from human physiology. During training, the critic network learns to
discriminate between the ground truth organ annotations and the masks
synthesized by the segmentation network. Through this adversarial process the
critic network learns the higher order structures and guides the segmentation
model to achieve realistic segmentation outcomes. Extensive experiments show
that our method produces highly accurate and natural segmentation. Using only
the very limited training data available, our model reaches human-level performance
without relying on any existing trained model or dataset. Our method also
generalizes well to CXR images from a different patient population and disease
profiles, surpassing the current state-of-the-art.

Open Vocabulary Scene Parsing

Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recognizing arbitrary objects in the wild has been a challenging problem due
to the limitations of existing classification models and datasets. In this
paper, we propose a new task that aims at parsing scenes with a large and open
vocabulary, and we explore several evaluation metrics for this problem. Our
proposed approach is a joint image pixel and word concept
embedding framework, where word concepts are connected by semantic relations.
We validate the open vocabulary prediction ability of our framework on the ADE20K
dataset, which covers a wide variety of scenes and objects. We further explore
the trained joint embedding space to show its interpretability.

Structured Learning of Tree Potentials in CRF for Image Segmentation

Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen
Comments: 10 pages. Appearing in IEEE Transactions on Neural Networks and Learning Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a new approach to image segmentation, which exploits the
advantages of both conditional random fields (CRFs) and decision trees. In the
literature, the potential functions of CRFs are mostly defined as a linear
combination of some pre-defined parametric models, and then methods like
structured support vector machines (SSVMs) are applied to learn those linear
coefficients. We instead formulate the unary and pairwise potentials as
nonparametric forests (ensembles of decision trees), and learn the ensemble
parameters and the trees in a unified optimization problem within the
large-margin framework. In this fashion, we easily achieve nonlinear learning
of potential functions on both unary and pairwise terms in CRFs. Moreover, we
learn class-wise decision trees for each object that appears in the image. Due
to the rich structure and flexibility of decision trees, our approach is
powerful in modelling complex data likelihoods and label relationships. The
resulting optimization problem is very challenging because it can have
exponentially many variables and constraints. We show that this challenging
optimization can be efficiently solved by combining modified column
generation and cutting-plane techniques. Experimental results on both binary
(Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC
2012) segmentation datasets demonstrate the power of the learned nonlinear
nonparametric potentials.

Sketch-based Face Editing in Video Using Identity Deformation Transfer

Long Zhao, Fangda Han, Mubbasir Kapadia, Vladimir Pavlovic, Dimitris Metaxas
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the problem of using a hand-drawn sketch to edit facial identity,
such as enlarging the shape or modifying the position of the eyes or mouth,
throughout a whole video. This task is formulated as a 3D face model reconstruction and
deformation problem. We first introduce a two-stage real-time 3D face model
fitting scheme to recover facial identity and expressions from the video. We
recognize the user’s editing intention from the input sketch as a set of facial
modifications. A novel identity deformation algorithm is then proposed to
transfer these deformations from 2D space to 3D facial identity directly, while
preserving the facial expressions. Finally, these changes are propagated to the
whole video with the modified identity. Experimental results demonstrate that
our method can effectively edit facial identity in video based on the input
sketch with high consistency and fidelity.

Count-ception: Counting by Fully Convolutional Redundant Counting

Joseph Paul Cohen, Henry Z. Lo, Yoshua Bengio
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Counting objects in digital images is a process that should be replaced by
machines. This tedious task is time consuming and prone to errors due to
fatigue of human annotators. The goal is to have a system that takes as input
an image and returns a count of the objects inside and justification for the
prediction in the form of object localization. We recast a problem, originally
posed by Lempitsky and Zisserman, to instead predict a count map which contains
redundant counts based on the receptive field of a smaller regression network.
The regression network predicts a count of the objects that exist inside this
frame. By processing the image in a fully convolutional way, each pixel is
accounted for some number of times: the number of windows which include
it, which equals the size of each window (i.e., 32×32 = 1024). To recover the true
count, we take the average over the redundant predictions. Our contribution is
redundant counting instead of predicting a density map in order to average over
errors. We also propose a novel deep neural network architecture adapted from
the Inception family of networks called the Count-ception network. Together, these
contributions yield a 20% gain over the state-of-the-art method by Xie, Noble,
and Zisserman in 2016.
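
The redundant-counting arithmetic can be illustrated with a short sketch. In the paper a regression network predicts the count map. Here, as an assumption made only for illustration, the map is built exactly from hypothetical point annotations so that the recovery step can be checked. The image is treated as padded by r - 1 pixels on every side, so that each pixel falls inside exactly r × r windows at stride 1. The names `redundant_count_map` and `recover_count` are not from the paper.

```python
def redundant_count_map(points, height, width, r):
    """Build one count per r x r window position (stride 1).

    Window (wy, wx) covers image rows wy - (r - 1) .. wy and columns
    wx - (r - 1) .. wx, so a pixel at (y, x) lies inside the r * r windows
    with wy in y .. y + r - 1 and wx in x .. x + r - 1.
    """
    out_h, out_w = height + r - 1, width + r - 1
    count_map = [[0] * out_w for _ in range(out_h)]
    for y, x in points:
        for wy in range(y, y + r):
            for wx in range(x, x + r):
                count_map[wy][wx] += 1
    return count_map


def recover_count(count_map, r):
    """Undo the redundancy: every object was counted r * r times."""
    total = sum(sum(row) for row in count_map)
    return total / (r * r)


# Three hypothetical objects, given as (row, col), in an 8 x 8 image.
objects = [(0, 0), (3, 4), (7, 7)]
window = 4
count_map = redundant_count_map(objects, 8, 8, window)
estimated = recover_count(count_map, window)
```

With a learned regressor each window's count is noisy, and dividing the summed map by r × r averages those errors over the redundant predictions.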

Improving the Accuracy of the CogniLearn System for Cognitive Behavior Assessment

Amir Ghaderi, Srujana Gattupalli, Dylan Ebert, Ali Sharifara, Vassilis Athitsos, Fillia Makedon
Subjects: Computer Vision and Pattern Recognition (cs.CV)

HTKS is a game-like cognitive assessment method, designed for children
between four and eight years of age. During the HTKS assessment, a child
responds to a sequence of requests, such as “touch your head” or “touch your
toes”. The cognitive challenge stems from the fact that the children are
instructed to interpret these requests not literally, but by touching a
different body part than the one stated. In prior work, we developed the
CogniLearn system, which captures data from subjects performing the HTKS game
and analyzes the motion of the subjects. In this paper we propose some specific
improvements that make the motion analysis module more accurate. As a result of
these improvements, the accuracy in recognizing cases where subjects touch
their toes has gone from 76.46% in our previous work to 97.19% in this paper.

Bayesian Optimization for Refining Object Proposals

Anthony D. Rhodes, Jordan Witte, Melanie Mitchell, Bruno Jedynak
Comments: 8 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We develop a general-purpose algorithm using a Bayesian optimization
framework for the efficient refinement of object proposals. While recent
research has achieved substantial progress for object localization and related
objectives in computer vision, current state-of-the-art object localization
procedures are nevertheless encumbered by inefficiency and inaccuracy. We
present a novel, computationally efficient method for refining inaccurate
bounding-box proposals for a target object using Bayesian optimization.
Offline, image features from a convolutional neural network are used to train a
model to predict the offset distance of an object proposal from a target
object. Online, this model is used in a Bayesian active search to improve
inaccurate object proposals. In experiments, we compare our approach to a
state-of-the-art bounding-box regression method for localization refinement of
pedestrian object proposals. Our method exhibits a substantial improvement for
the task of localization refinement over this baseline regression method.

More is Less: A More Complicated Network with Less Inference Complexity

Xuanyi Dong, Junshi Huang, Yi Yang, Shuicheng Yan
Comments: This paper has been accepted by the IEEE CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel and general network structure towards
accelerating the inference process of convolutional neural networks, which is
more complicated in network structure yet with less inference complexity. The
core idea is to equip each original convolutional layer with another low-cost
collaborative layer (LCCL), and the element-wise multiplication of the ReLU
outputs of these two parallel layers produces the layer-wise output. The
combined layer is potentially more discriminative than the original
convolutional layer, and its inference is faster for two reasons: 1) the zero
cells of the LCCL feature maps will remain zero after element-wise
multiplication, and thus it is safe to skip the calculation of the
corresponding high-cost convolution in the original convolutional layer, 2)
LCCL is very fast if it is implemented as a 1×1 convolution or only a single
filter shared by all channels. Extensive experiments on the CIFAR-10, CIFAR-100
and ILSVRC-2012 benchmarks show that our proposed network structure can
accelerate the inference process by 32% on average with negligible performance
drop.
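
The skipping argument can be sketched in a few lines. This is a one-dimensional toy, not the authors' implementation: the "high-cost" layer is assumed to be a small same-padded filter, the low-cost collaborative layer is assumed to be a 1×1 (per-position scalar) transform, and all names are hypothetical. It shows that positions where the cheap gate is zero never need the expensive response, and that the gated result matches the dense one.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def expensive_response(signal, kernel, i):
    """Same-padded filter response at position i (stands in for the costly conv)."""
    half = len(kernel) // 2
    acc = 0.0
    for k, w in enumerate(kernel):
        j = i + k - half
        if 0 <= j < len(signal):
            acc += w * signal[j]
    return acc


def gated_layer(signal, kernel, gate_weight, gate_bias):
    """Element-wise product of the two ReLU outputs, skipping zero-gate positions."""
    out, skipped = [], 0
    for i, x in enumerate(signal):
        gate = relu(gate_weight * x + gate_bias)  # cheap 1x1 branch
        if gate == 0.0:
            out.append(0.0)  # the product is zero whatever the costly branch returns
            skipped += 1
            continue
        out.append(relu(expensive_response(signal, kernel, i)) * gate)
    return out, skipped


def dense_layer(signal, kernel, gate_weight, gate_bias):
    """Reference result that always evaluates both branches."""
    return [
        relu(expensive_response(signal, kernel, i)) * relu(gate_weight * x + gate_bias)
        for i, x in enumerate(signal)
    ]


signal = [1.0, -2.0, 3.0, -1.0, 0.5]
kernel = [0.5, 1.0, 0.5]
fast_out, n_skipped = gated_layer(signal, kernel, 1.0, 0.0)
reference = dense_layer(signal, kernel, 1.0, 0.0)
```

The saving depends on how sparse the gate is. In this toy input, two of the five positions skip the costly branch.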

AMAT: Medial Axis Transform for Natural Images

Stavros Tsogkas, Sven Dickinson
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The medial axis transform (MAT) is a powerful shape abstraction that has been
successfully used in shape editing, matching and retrieval. Despite its long
history, the MAT has not found widespread use in tasks involving natural
images, due to the lack of a generalization that accommodates color and
texture. In this paper we introduce Appearance-MAT (AMAT), by framing the MAT
of natural images as a weighted geometric set cover problem. We make the
following contributions: i) we extend previous medial point detection methods
for color images, by associating each medial point with a local scale; ii)
inspired by the invertibility property of the binary MAT, we also associate
each medial point with a local encoding that allows us to invert the AMAT,
reconstructing the input image; iii) we describe a clustering scheme that takes
advantage of the additional scale and appearance information to group
individual points into medial branches, providing a shape decomposition of the
underlying image regions. In our experiments, we show state-of-the-art
performance in medial point detection on Berkeley Medial AXes (BMAX500), a new
dataset of medial axes based on the established BSDS500 database. We also
measure the quality of reconstructed images from the same dataset, obtained by
inverting their computed AMAT. Our approach delivers significantly better
reconstruction quality with respect to three baselines, using just 10% of the
image pixels. Our code is available at this https URL

Temporal Non-Volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition

Chi Nhan Duong, Kha Gia Quach, Khoa Luu, T. Hoang Ngan Le, Marios Savvides
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modeling the long-term facial aging process is extremely challenging due to
the presence of large and non-linear variations during the face development
stages. In order to efficiently address the problem, this work first decomposes
the aging process into multiple short-term stages. Then, a novel generative
probabilistic model, named Temporal Non-Volume Preserving (TNVP)
transformation, is presented to model the facial aging process at each stage.
Unlike Generative Adversarial Networks (GANs), which require an empirical
balance threshold, and Restricted Boltzmann Machines (RBMs), which are intractable
models, our proposed TNVP approach guarantees a tractable density function,
exact inference and evaluation for embedding the feature transformations
between faces in consecutive stages. Our model shows its advantages not only in
capturing the non-linear age-related variance in each stage but also in producing
a smooth synthesis in age progression across faces. Our approach can model any
face in the wild provided with only four basic landmark points. Moreover, the
structure can be transformed into a deep convolutional network while keeping
the advantages of probabilistic models with tractable log-likelihood density
estimation. Our method is evaluated in terms of both synthesizing
age-progressed faces and cross-age face verification, and consistently shows
state-of-the-art results on various face aging databases, i.e., FG-NET, MORPH,
AginG Faces in the Wild (AGFW), and Cross-Age Celebrity Dataset (CACD). A
large-scale face verification experiment on MegaFace Challenge 1 is also performed to
further show the advantages of our proposed approach.

Adversarial Examples for Semantic Segmentation and Object Detection

Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille
Comments: Submitted to ICCV 2017 (10 pages, 6 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

It has been well demonstrated that adversarial examples, i.e., natural images
with visually imperceptible perturbations added, generally exist and cause deep
networks to fail on image classification. In this paper, we extend adversarial
examples to semantic segmentation and object detection, which are much more
difficult. Our observation is that both segmentation and detection are based on
classifying multiple targets on an image (e.g., the basic target is a pixel or
a receptive field in segmentation, and an object proposal in detection), which
inspires us to optimize a loss function over a set of pixels/proposals for
generating adversarial perturbations. Based on this idea, we propose a novel
algorithm named Dense Adversary Generation (DAG), which generates a large
family of adversarial examples, and applies to a wide range of state-of-the-art
deep networks for segmentation and detection. We also find that the adversarial
perturbations can be transferred across networks with different training data,
based on different architectures, and even for different recognition tasks. In
particular, the transferability across networks with the same architecture is
more significant than in other cases. Besides, summing up heterogeneous
perturbations often leads to better transfer performance, which provides an
effective method of black-box adversarial attack.

Deep Residual Learning for Instrument Segmentation in Robotic Surgery

Daniil Pakhomov, Vittal Premachandran, Max Allan, Mahdi Azizian, Nassir Navab
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detection, tracking, and pose estimation of surgical instruments are crucial
tasks for computer assistance during minimally invasive robotic surgery. In the
majority of cases, the first step is the automatic segmentation of surgical
tools. Prior work has focused on binary segmentation, where the objective is to
label every pixel in an image as tool or background. We improve upon previous
work in two major ways. First, we leverage recent techniques such as deep
residual learning and dilated convolutions to advance binary-segmentation
performance. Second, we extend the approach to multi-class segmentation, which
lets us segment different parts of the tool, in addition to background. We
demonstrate the performance of this method on the MICCAI Endoscopic Vision
Challenge Robotic Instruments dataset.

A Dynamic Programming Solution to Bounded Dejittering Problems

Lukas F. Lang
Comments: The final publication is available at link.springer.com
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)

We propose a dynamic programming solution to image dejittering problems with
bounded displacements and obtain efficient algorithms for the removal of line
jitter, line pixel jitter, and pixel jitter.

Where to put the Image in an Image Caption Generator

When a neural language model is used for caption generation, the image
information can be fed to the neural network either by directly incorporating
it in a recurrent neural network — conditioning the language model by
injecting image features — or in a layer following the recurrent neural
network — conditioning the language model by merging the image features. While
merging implies that visual features are bound at the end of the caption
generation process, injecting can bind the visual features at a variety of stages.
In this paper we empirically show that late binding is superior to early
binding in terms of different evaluation metrics. This suggests that the
different modalities (visual and linguistic) for caption generation should not
be jointly encoded by the RNN; rather, the multimodal integration should be
delayed to a subsequent stage. Furthermore, this suggests that recurrent neural
networks should not be viewed as actually generating text, but only as encoding
it for prediction in a subsequent layer.

Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs

Yunzhu Li, Jiaming Song, Stefano Ermon
Comments: 10 pages, 6 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The goal of imitation learning is to match example expert behavior, without
access to a reinforcement signal. Expert demonstrations provided by humans,
however, often show significant variability due to latent factors that are not
explicitly modeled. We introduce an extension to the Generative Adversarial
Imitation Learning method that can infer the latent structure of human
decision-making in an unsupervised way. Our method can not only imitate complex
behaviors, but also learn interpretable and meaningful representations. We
demonstrate that the approach is applicable to high-dimensional environments
including raw visual inputs. In the highway driving domain, we show that a
model learned from demonstrations is able to both produce different styles of
human-like driving behaviors and accurately anticipate human actions. Our
method surpasses various baselines in terms of performance and functionality.

Who Said What: Modeling Individual Labelers Improves Classification

Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Data are often labeled by many different experts with each expert only
labeling a small fraction of the data and each data point being labeled by
several experts. This reduces the workload on individual experts and also gives
a better estimate of the unobserved ground truth. When experts disagree, the
standard approaches are to treat the majority opinion as the correct label or
to model the correct label as a distribution. These approaches, however, do not
make any use of potentially valuable information about which expert produced
which label. To make use of this extra information, we propose modeling the
experts individually and then learning averaging weights for combining them,
possibly in sample-specific ways. This allows us to give more weight to more
reliable experts and take advantage of the unique strengths of individual
experts at classifying certain types of data. Here we show that our approach
leads to improvements in computer-aided diagnosis of diabetic retinopathy. We
also show that our method performs better than competing algorithms by Welinder
and Perona, and by Mnih and Hinton. Our work offers an innovative approach for
dealing with the myriad real-world settings that use expert opinions to define
labels for training.
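
The contrast between majority voting and weighted combination of individually modeled experts can be sketched as follows. This is a toy under stated assumptions, not the authors' model: the per-expert class probabilities and the averaging weights are hypothetical fixed inputs, whereas the paper learns both, and the function names are invented for illustration.

```python
from collections import Counter


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def majority_vote(expert_probs):
    """Baseline: each expert casts one hard vote; the most common label wins."""
    votes = [argmax(p) for p in expert_probs]
    return Counter(votes).most_common(1)[0][0]


def weighted_combination(expert_probs, weights):
    """Weighted average of per-expert class distributions (weights need not sum to 1)."""
    total = sum(weights)
    n_classes = len(expert_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, expert_probs)) / total
        for c in range(n_classes)
    ]


# One confident expert who is trusted more, and two hesitant experts who disagree.
expert_probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
weights = [3.0, 1.0, 1.0]  # hypothetical reliability weights

voted_label = majority_vote(expert_probs)
combined = weighted_combination(expert_probs, weights)
combined_label = argmax(combined)
```

In this example the majority vote picks class 1, while the weighted combination favors class 0 because the more reliable expert is given more say.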

Multivariate Regression with Gross Errors on Manifold-valued Data

Xiaowei Zhang, Xudong Shi, Yu Sun, Li Cheng
Comments: Submitted to a journal
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

We consider the topic of multivariate regression on manifold-valued output,
that is, for a multivariate observation, its output response lies on a
manifold. Moreover, we propose a new regression model to deal with the presence
of grossly corrupted manifold-valued responses, a bottleneck issue commonly
encountered in practical scenarios. Our model first takes a correction step on
the grossly corrupted responses via geodesic curves on the manifold, and then
performs multivariate linear regression on the corrected data. This results in
a nonconvex and nonsmooth optimization problem on manifolds. To this end, we
propose a dedicated approach named PALMR, by utilizing and extending the
proximal alternating linearized minimization techniques. Theoretically, we
investigate its convergence property, where it is shown to converge to a
critical point under mild conditions. Empirically, we test our model on both
synthetic and real diffusion tensor imaging data, and show that our model
outperforms other multivariate regression models when manifold-valued responses
contain gross errors, and is effective in identifying gross errors.

Artificial Intelligence

On Automating the Doctrine of Double Effect

Naveen Sundar Govindarajulu, Selmer Bringsjord
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Robotics (cs.RO)

The doctrine of double effect ($\mathcal{DDE}$) is a long-studied ethical
principle that governs when actions that have both positive and negative
effects are to be allowed. The goal in this paper is to automate
$\mathcal{DDE}$. We briefly present $\mathcal{DDE}$, and use a first-order
modal logic, the deontic cognitive event calculus, as our framework to
formalize the doctrine. We present formalizations of increasingly stronger
versions of the principle, including what is known as the doctrine of triple
effect. We then use our framework to simulate successfully scenarios that have
been used to test for the presence of the principle in human subjects. Our
framework can be used in two different modes: One can use it to build
$\mathcal{DDE}$-compliant autonomous systems from scratch, or one can use it to
verify that a given AI system is $\mathcal{DDE}$-compliant, by applying a
$\mathcal{DDE}$ layer on an existing system or model. For the latter mode, the
underlying AI system can be built using any architecture (planners, deep neural
networks, Bayesian networks, knowledge-representation systems, or a hybrid); as
long as the system exposes a few parameters in its model, such verification is
possible. The role of the $\mathcal{DDE}$ layer here is akin to a (dynamic or
static) software verifier that examines existing software modules. Finally, we
end by presenting initial work on how one can apply our $\mathcal{DDE}$ layer
to the STRIPS-style planning model, and to a modified POMDP model.

Team Formation for Scheduling Educational Material in Massive Online Classes

Sanaz Bahargam, Dóra Erdos, Azer Bestavros, Evimaria Terzi
Subjects: Artificial Intelligence (cs.AI)

Whether teaching in a classroom or a Massive Open Online Course, it is crucial
to present the material in a way that benefits the audience as a whole. We
identify two important tasks to solve towards this objective: (1) group students
so that they can maximally benefit from peer interaction, and (2) find an optimal
schedule of the educational material for each group. Thus, in this paper, we
solve the problem of team formation and content scheduling for education. Given
a time frame d, a set of students S with their required need to learn different
activities T, and the number of desired groups k, we study the problem
of finding k groups of students. The goal is to teach students within time frame
d such that their potential for learning is maximized, and to find the best
schedule for each group. We show this problem to be NP-hard and develop a
polynomial algorithm for it. We show our algorithm to be effective on both
synthetic and real data sets. For our experiments, we use real data on
students’ grades in a Computer Science department. As part of our contribution,
we release a semi-synthetic dataset that mimics the properties of the real
data.

Transfer learning for music classification and regression tasks

In this paper, we present a transfer learning approach for music
classification and regression tasks. We propose to use a pretrained convnet
feature, a concatenated feature vector using activations of feature maps of
multiple layers in a trained convolutional network. We show how this
convnet feature can serve as a general-purpose music representation. In the
experiment, a convnet is trained for music tagging and then transferred for
many music-related classification and regression tasks as well as an
audio-related classification task. In experiments, the convnet feature
outperforms the baseline MFCC feature in all tasks and many reported approaches
of aggregating MFCCs and low- and high-level music features.

Intelligent bidirectional rapidly-exploring random trees for optimal motion planning in complex cluttered environments

Ahmed Hussain Qureshi, Yasar Ayaz
Comments: The article is published in Elsevier Journal of Robotics and Autonomous Systems
Journal-ref: Robotics and Autonomous Systems 68 (2015): 1-11
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

The sampling-based motion planning algorithm known as Rapidly-exploring
Random Trees (RRT) has gained the attention of many researchers due to its
computational efficiency and effectiveness. Recently, a variant of RRT called
RRT* has been proposed that ensures asymptotic optimality. Subsequently, its
bidirectional version, known as Bidirectional-RRT* (B-RRT*), has also been
introduced in the literature. We introduce a new variant called Intelligent
Bidirectional-RRT* (IB-RRT*), which is an improved variant of the optimal RRT*
and bidirectional version of RRT* (B-RRT*) algorithms and is specially designed
for complex cluttered environments. IB-RRT* utilizes the bidirectional trees
approach and introduces an intelligent sample insertion heuristic for fast
convergence to the optimal path solution using uniform sampling heuristics. The
proposed algorithm is evaluated theoretically, and experimental results are
presented that compare IB-RRT* with RRT* and B-RRT*. Moreover, experimental
results demonstrate the superior efficiency of IB-RRT* in comparison with RRT*
and B-RRT* in complex cluttered environments.

Socially Aware Motion Planning with Deep Reinforcement Learning

Yu Fan Chen, Michael Everett, Miao Liu, Jonathan P. How
Comments: 8 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

For robotic vehicles to navigate safely and efficiently in pedestrian-rich
environments, it is important to model subtle human behaviors and navigation
rules. However, while instinctive to humans, socially compliant navigation is
still difficult to quantify due to the stochasticity in people’s behaviors.
Existing works are mostly focused on using feature-matching techniques to
describe and imitate human paths, but often do not generalize well since the
feature values can vary from person to person, and even run to run. This work
notes that while it is challenging to directly specify the details of what to
do (precise mechanisms of human navigation), it is straightforward to specify
what not to do (violations of social norms). Specifically, using deep
reinforcement learning, this work develops a time-efficient navigation policy
that respects common social norms. The proposed method is shown to enable fully
autonomous navigation of a robotic vehicle moving at human walking speed in an
environment with many pedestrians.

Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs

Yunzhu Li, Jiaming Song, Stefano Ermon
Comments: 10 pages, 6 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The goal of imitation learning is to match example expert behavior, without
access to a reinforcement signal. Expert demonstrations provided by humans,
however, often show significant variability due to latent factors that are not
explicitly modeled. We introduce an extension to the Generative Adversarial
Imitation Learning method that can infer the latent structure of human
decision-making in an unsupervised way. Our method can not only imitate complex
behaviors, but also learn interpretable and meaningful representations. We
demonstrate that the approach is applicable to high-dimensional environments
including raw visual inputs. In the highway driving domain, we show that a
model learned from demonstrations is able to both produce different styles of
human-like driving behaviors and accurately anticipate human actions. Our
method surpasses various baselines in terms of performance and functionality.

Surrogate Model of Multi-Period Flexibility from a Home Energy Management System

Rui Pinto, Ricardo Bessa, Manuel Matos
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The operation of near-future electric distribution grids will have to rely on
demand-side flexibility, both by implementation of demand response strategies
and by taking advantage of the intelligent management of increasingly common
small-scale energy storage. Home energy management systems (HEMS) will play a
crucial role in the flexibility provision to both system operators and market
players like aggregators. Modeling multi-period flexibility from residential
consumers (HEMS flexibility), such as battery storage and electric water
heater, while complying with internal constraints (comfort levels, data
privacy) and uncertainty is a complex task. This paper describes a
computational method that is capable of efficiently defining and learning the
feasible flexibility set from controllable resources connected to a HEMS. An
Evolutionary Particle Swarm Optimization (EPSO) algorithm is adopted and
reshaped to derive a set of feasible temporal trajectories for the residential
net-load, considering storage, flexible appliances, and predefined customer
preferences, as well as load and photovoltaic (PV) forecast uncertainty. A
support vector data description (SVDD) algorithm is used to build models
capable of classifying feasible and unfeasible HEMS operating trajectories upon
request from an optimization/control algorithm operated by a DSO or market
player.

Open Vocabulary Scene Parsing

Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recognizing arbitrary objects in the wild has been a challenging problem due
to the limitations of existing classification models and datasets. In this
paper, we propose a new task that aims at parsing scenes with a large and open
vocabulary, and several evaluation metrics are explored for this problem. Our
proposed approach to this problem is a joint image pixel and word concept
embeddings framework, where word concepts are connected by semantic relations.
We validate the open vocabulary prediction ability of our framework on the ADE20K
dataset, which covers a wide variety of scenes and objects. We further explore
the trained joint embedding space to show its interpretability.

Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

Objective: We investigate whether deep learning techniques for natural
language processing (NLP) can be used efficiently for patient phenotyping.
Patient phenotyping is a classification task for determining whether a patient
has a medical condition, and is a crucial part of secondary analysis of
healthcare data. We assess the performance of deep learning algorithms and
compare them with classical NLP approaches.

Materials and Methods: We compare convolutional neural networks (CNNs),
n-gram models, and approaches based on cTAKES that extract pre-defined medical
concepts from clinical notes and use them to predict patient phenotypes. The
performance is tested on 10 different phenotyping tasks using 1,610 discharge
summaries extracted from the MIMIC-III database.

Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
model having an F1-score up to 37 points higher than alternative approaches. We
additionally assess the interpretability of our model by presenting a method
that extracts the most salient phrases for a particular prediction.

Conclusion: We show that NLP methods based on deep learning improve the
performance of patient phenotyping. Our CNN-based algorithm automatically
learns the phrases associated with each patient phenotype. As such, it reduces
the annotation complexity for clinical domain experts, who are normally
required to develop task-specific annotation rules and identify relevant
phrases. Our method performs well in terms of both performance and
interpretability, which indicates that deep learning is an effective approach
to patient phenotyping based on clinicians’ notes.
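
As a quick consistency check on the reported figures, the F1-score is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below plugs in the averages quoted in the abstract; the function name `f1_score` is our own, and because the abstract reports averages over 10 tasks, the harmonic mean of the averaged PPV and sensitivity need not match the averaged F1 exactly.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Averages quoted in the abstract, on a 0-100 scale.
f1 = f1_score(83, 71)
print(round(f1, 1))
```

The result lands close to the reported average F1-score of 76, so the three quoted numbers are mutually consistent.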

Information Retrieval

Towards Effective Research-Paper Recommender Systems and User Modeling based on Mind Maps

Joeran Beel
Comments: PhD Thesis, Otto-von-Guericke University Magdeburg, Germany
Subjects: Information Retrieval (cs.IR)

While user-modeling and recommender systems successfully utilize items like
emails, news, and movies, they widely neglect mind-maps as a source for user
modeling. We consider this a serious shortcoming since we assume user modeling
based on mind maps to be equally effective as user modeling based on other
items. Hence, millions of mind-mapping users could benefit from user-modeling
applications such as recommender systems. The objective of this doctoral thesis
is to develop an effective user-modeling approach based on mind maps. To
achieve this objective, we integrate a recommender system in our mind-mapping
and reference-management software Docear. The recommender system builds user
models based on the mind maps, and recommends research papers based on the user
models. As part of our research, we identify several variables relating to
mind-map-based user modeling, and evaluate the variables’ impact on
user-modeling effectiveness with an offline evaluation, a user study, and an
online evaluation based on 430,893 recommendations displayed to 4,700 users. We
find, among others, that the number of analyzed nodes, modification time,
visibility of nodes, relations between nodes, and number of children and
siblings of a node affect the effectiveness of user modeling. When all
variables are combined in a favorable way, this novel approach achieves
click-through rates of 7.20%, which is nearly twice as effective as the best
baseline. In addition, we show that user modeling based on mind maps performs
about as well as user modeling based on other items, namely the research
articles users downloaded or cited. Our findings lead us to conclude that user
modeling based on mind maps is a promising research field, and that developers
of mind-mapping applications should integrate recommender systems into their
applications. Such systems could create additional value for millions of
mind-mapping users.

Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia

Joeran Beel, Bela Gipp, Akiko Aizawa
Comments: Accepted for publication at the JCDL conference 2017
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

Recommender systems for research papers are offered by only a few digital
libraries and reference managers, although they could help users of digital
libraries etc. to better deal with information overload. One reason might be
that operators of digital libraries do not have the resources to develop and
maintain a recommender system. In this paper, we introduce Mr. DLib’s
recommender-system as-a-service. Mr. DLib’s service allows digital libraries
and reference managers to easily integrate a recommender system. The effort is
low, and no knowledge about recommender systems is required. Mr. DLib’s first
pilot partner is the digital library Sowiport. Between September 2016 and
February 2017, Mr. DLib delivered 60 million recommendations to Sowiport with a
click-through rate of 0.15% on average. Mr. DLib is open source, non-profit,
and supports open data.

Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned

Stefan Langer, Joeran Beel
Comments: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017)
Subjects: Information Retrieval (cs.IR)

For the past few years, we used Apache Lucene as the recommendation framework in
our scholarly-literature recommender system of the reference-management
software Docear. In this paper, we share three lessons learned from our work
with Lucene. First, recommendations with relevance scores below 0.025 tend to
have significantly lower click-through rates than recommendations with
relevance scores above 0.025. Second, by picking ten recommendations randomly
from Lucene’s top 50 search results, click-through rate decreased by 15%,
compared to recommending the top 10 results. Third, the number of returned
search results tends to predict how high click-through rates will be: when
Lucene returns fewer than 1,000 search results, click-through rates tend to be
around half as high as if 1,000+ results are returned.
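
The first two lessons can be illustrated with a small sketch. It does not use the Lucene API; it assumes search results have already been retrieved as plain `(doc_id, score)` pairs, and the helper names (`filter_by_score`, `top_k`, `random_from_top`) are hypothetical. How to treat a score of exactly 0.025 is our choice, since the abstract only distinguishes "below" and "above".

```python
import random


def filter_by_score(results, threshold=0.025):
    """Keep only (doc_id, score) pairs whose score reaches the threshold."""
    return [(doc, score) for doc, score in results if score >= threshold]


def top_k(results, k=10):
    """Return the k highest-scoring results, best first."""
    return sorted(results, key=lambda pair: pair[1], reverse=True)[:k]


def random_from_top(results, pool=50, k=10, rng=None):
    """Pick k results at random from the `pool` highest-scoring ones."""
    rng = rng or random.Random()
    candidates = top_k(results, pool)
    return rng.sample(candidates, min(k, len(candidates)))


# Toy data: 100 documents with made-up relevance scores.
results = [(f"doc{i}", i / 100) for i in range(100)]
strict = top_k(filter_by_score(results), k=10)       # top-10 strategy
shuffled = random_from_top(results, pool=50, k=10)   # random-from-top-50 strategy
```

According to the abstract, the second strategy cost about 15% in click-through rate relative to the first.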

Analyzing Evolving Stories in News Articles

Roberto Camacho Barranco, Arnold P. Boedihardjo, M. Shahriar Hossain
Comments: submitted to KDD 2017, 9 pages, 10 figures
Subjects: Information Retrieval (cs.IR); Information Theory (cs.IT)

There is an overwhelming number of news articles published every day around
the globe. Following the evolution of a news-story is a difficult task given
that there is no such mechanism available to track back in time to study the
diffusion of the relevant events in digital news feeds. The techniques
developed so far to extract meaningful information from a massive corpus rely
on similarity search, which results in a myopic loopback to the same topic
without providing the needed insights to hypothesize the origin of a story that
may be completely different than the news today. In this paper, we present an
algorithm that mines historical data to detect the origin of an event, segments
the timeline into disjoint groups of coherent news articles, and outlines the
most important documents in a timeline with a soft probability to provide a
better understanding of the evolution of a story. Qualitative and quantitative
approaches to evaluate our framework demonstrate that our algorithm discovers
statistically significant and meaningful stories in reasonable time.
Additionally, a relevant case study on a set of news articles demonstrates that
the generated output of the algorithm holds the promise to aid prediction of
future entities in a story.

Computation and Language

A Sentence Simplification System for Improving Relation Extraction

Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, André Freitas
Comments: 26th International Conference on Computational Linguistics (COLING 2016)
Subjects: Computation and Language (cs.CL)

In this demo paper, we present a text simplification approach that is
directed at improving the performance of state-of-the-art Open Relation
Extraction (RE) systems. As syntactically complex sentences often pose a
challenge for current Open RE approaches, we have developed a simplification
framework that performs a pre-processing step by taking a single sentence as
input and using a set of syntactic-based transformation rules to create a
textual input that is easier to process for subsequently applied Open RE
systems.

Question Answering from Unstructured Text by Retrieval and Comprehension

Yusuke Watanabe, Bhuwan Dhingra, Ruslan Salakhutdinov
Subjects: Computation and Language (cs.CL)

Open domain Question Answering (QA) systems must interact with external
knowledge sources, such as web pages, to find relevant information. Information
sources like Wikipedia, however, are not well structured and are difficult to
utilize in comparison with Knowledge Bases (KBs). In this work we present a
two-step approach to question answering from unstructured text, consisting of a
retrieval step and a comprehension step. For comprehension, we present an RNN
based attention model with a novel mixture mechanism for selecting answers from
either retrieved articles or a fixed vocabulary. For retrieval we introduce a
hand-crafted model and a neural model for ranking relevant articles. We achieve
state-of-the-art performance on the WikiMovies dataset, reducing the error by
40%. Our experimental results further demonstrate the importance of each of the
introduced components.

Learning Simpler Language Models with the Delta Recurrent Neural Network Framework

Alexander G. Ororbia II, Tomas Mikolov, David Reitter
Subjects: Computation and Language (cs.CL)

Learning useful information across long time lags is a critical and difficult
problem for temporal neural models in tasks like language modeling. Existing
architectures that address the issue are often complex and costly to train. The
Delta Recurrent Neural Network (Delta-RNN) framework is a simple and
high-performing design that unifies previously proposed gated neural models.
The Delta-RNN models maintain longer-term memory by learning to interpolate
between a fast-changing data-driven representation and a slowly changing,
implicitly stable state. This requires hardly any more parameters than a
classical simple recurrent network. The models outperform popular complex
architectures, such as the Long Short Term Memory (LSTM) and the Gated
Recurrent Unit (GRU) and achieve state-of-the-art performance in language
modeling at character and word levels and yield comparable performance at the
subword level.
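
The interpolation idea described above can be sketched in a few lines. This is a simplified illustration and not the paper's exact formulation: it assumes elementwise (diagonal) weights, a tanh candidate, and a sigmoid gate driven only by the input, and all names (`interpolated_step`, `w_in`, `w_rec`, `gate_bias`) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interpolated_step(h_prev, x, w_in, w_rec, gate_bias):
    """One recurrent step that mixes a new candidate with the old state.

    All arguments are equal-length lists of floats (elementwise weights).
    """
    h_next = []
    for hp, xi, wi, wr, b in zip(h_prev, x, w_in, w_rec, gate_bias):
        candidate = math.tanh(wi * xi + wr * hp)  # fast-changing, data-driven
        r = sigmoid(wi * xi + b)                  # gate value in (0, 1)
        # r near 1 keeps the slowly changing state; r near 0 takes the candidate.
        h_next.append((1.0 - r) * candidate + r * hp)
    return h_next


# Run a short sequence through a two-unit state.
h = [0.0, 0.0]
for x_t in ([1.0, -1.0], [0.5, 0.5], [0.0, 2.0]):
    h = interpolated_step(h, x_t, w_in=[1.0, 1.0], w_rec=[0.5, 0.5],
                          gate_bias=[2.0, -2.0])
```

With a large positive gate bias a unit mostly preserves its previous value, which is one way to picture how longer-term memory can be kept with few extra parameters.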

LEPOR: An Augmented Machine Translation Evaluation Metric

Lifeng Han
Comments: 132 pages, thesis
Subjects: Computation and Language (cs.CL)

Machine translation (MT) has developed into one of the hottest research topics
in the natural language processing (NLP) literature. One important issue in MT
is how to evaluate an MT system reasonably and tell whether the
translation system makes an improvement or not. The traditional manual judgment
methods are expensive, time-consuming, unrepeatable, and sometimes have low
agreement. On the other hand, the popular automatic MT evaluation methods have
some weaknesses. Firstly, they tend to perform well on language pairs with
English as the target language, but poorly when English is the source language.
Secondly, some methods rely on many additional linguistic features to achieve
good performance, which makes the metric hard to replicate and apply to other
language pairs. Thirdly, some popular metrics use an incomplete set of
factors, which results in low performance on some practical tasks. In this
thesis, to address the existing problems, we design novel MT evaluation methods
and investigate their performance on different languages. Firstly, we design
augmented factors to yield highly accurate evaluation. Secondly, we design a
tunable evaluation model where the weighting of factors can be optimised according
to the characteristics of languages. Thirdly, in the enhanced version of our
methods, we design concise linguistic features using POS to show that our
methods can yield even higher performance when using some external linguistic
resources. Finally, we introduce the practical performance of our metrics in
the ACL-WMT workshop shared tasks, which show that the proposed methods are
robust across different languages.

Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

Objective: We investigate whether deep learning techniques for natural
language processing (NLP) can be used efficiently for patient phenotyping.
Patient phenotyping is a classification task for determining whether a patient
has a medical condition, and is a crucial part of secondary analysis of
healthcare data. We assess the performance of deep learning algorithms and
compare them with classical NLP approaches.

Materials and Methods: We compare convolutional neural networks (CNNs),
n-gram models, and approaches based on cTAKES that extract pre-defined medical
concepts from clinical notes and use them to predict patient phenotypes. The
performance is tested on 10 different phenotyping tasks using 1,610 discharge
summaries extracted from the MIMIC-III database.

Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
model having an F1-score up to 37 points higher than alternative approaches. We
additionally assess the interpretability of our model by presenting a method
that extracts the most salient phrases for a particular prediction.

Conclusion: We show that NLP methods based on deep learning improve the
performance of patient phenotyping. Our CNN-based algorithm automatically
learns the phrases associated with each patient phenotype. As such, it reduces
the annotation complexity for clinical domain experts, who are normally
required to develop task-specific annotation rules and identify relevant
phrases. Our method performs well in terms of both performance and
interpretability, which indicates that deep learning is an effective approach
to patient phenotyping based on clinicians’ notes.

Morphological Analysis for the Maltese Language: The Challenges of a Hybrid System

Claudia Borg, Albert Gatt
Comments: 11 pages, Proceedings of the 3rd Arabic Natural Language Processing Workshop (WANLP’17)
Subjects: Computation and Language (cs.CL)

Maltese is a morphologically rich language with a hybrid morphological system
which features both concatenative and non-concatenative processes. This paper
analyses the impact of this hybridity on the performance of machine learning
techniques for morphological labelling and clustering. In particular, we
analyse a dataset of morphologically related word clusters to evaluate the
difference in results for concatenative and nonconcatenative clusters. We also
describe research carried out in morphological labelling, with a particular
focus on the verb category. Two evaluations were carried out, one using an
unseen dataset, and another one using a gold standard dataset which was
manually labelled. The gold standard dataset was split into concatenative and
non-concatenative to analyse the difference in results between the two
morphological systems.

Simplifying the Bible and Wikipedia Using Statistical Machine Translation

Yohan Jo
Subjects: Computation and Language (cs.CL)

I started this work with the hope of generating a text synthesizer (like a
musical synthesizer) that can imitate certain linguistic styles. Most of the
report focuses on text simplification using statistical machine translation
(SMT) techniques. I applied MOSES to a parallel corpus of the Bible (King James
Version and Easy-to-Read Version) and that of Wikipedia articles (normal and
simplified). I report the importance of the three main components of
SMT (phrase translation, language model, and reordering) by changing their
weights and comparing the resulting quality of simplified text in terms of
METEOR and BLEU. Toward the end of the report, I present some examples
of text “synthesized” into the King James style.

Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen
Comments: Submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

We present a recurrent encoder-decoder deep neural network architecture that
directly translates speech in one language into text in another. The model does
not explicitly transcribe the speech into text in the source language, nor does
it require supervision from the ground truth source language transcription
during training. We apply a slightly modified sequence-to-sequence with
attention architecture that has previously been used for speech recognition and
show that it can be repurposed for this more complex task, illustrating the
power of attention-based models. A single model trained end-to-end obtains
state-of-the-art performance on the Fisher Callhome Spanish-English speech
translation task, outperforming a cascade of independently trained
sequence-to-sequence speech recognition and machine translation models by 1.8
BLEU points on the Fisher test set. In addition, we find that making use of the
training data in both languages by multi-task training sequence-to-sequence
speech translation and recognition models with a shared encoder network can
improve performance by a further 1.4 BLEU points.

Where to put the Image in an Image Caption Generator

When a neural language model is used for caption generation, the image
information can be fed to the neural network either by directly incorporating
it in a recurrent neural network — conditioning the language model by
injecting image features — or in a layer following the recurrent neural
network — conditioning the language model by merging the image features. While
merging implies that visual features are bound at the end of the caption
generation process, injecting can bind the visual features at a variety of stages.
In this paper we empirically show that late binding is superior to early
binding in terms of different evaluation metrics. This suggests that the
different modalities (visual and linguistic) for caption generation should not
be jointly encoded by the RNN; rather, the multimodal integration should be
delayed to a subsequent stage. Furthermore, this suggests that recurrent neural
networks should not be viewed as actually generating text, but only as encoding
it for prediction in a subsequent layer.

Bootstrapping a Lexicon for Emotional Arousal in Software Engineering

Mika V. Mäntylä, Nicole Novielli, Filippo Lanubile, Maëlick Claes, Miikka Kuutila
Comments: 5 pages. Accepted version. Copyright IEEE
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)

Emotional arousal increases activation and performance but may also lead to
burnout in software development. We present the first version of a Software
Engineering Arousal lexicon (SEA) that is specifically designed to address the
problem of emotional arousal in the software developer ecosystem. SEA is built
using a bootstrapping approach that combines a word embedding model trained on
issue-tracking data and manual scoring of items in the lexicon. We show that
our lexicon is able to differentiate between issue priorities, which are a
source of emotional activation and then act as a proxy for arousal. The best
performance is obtained by combining SEA (428 words) with a previously created
general purpose lexicon by Warriner et al. (13,915 words) and it achieves
Cohen’s d effect sizes up to 0.5.

D.TRUMP: Data-mining Textual Responses to Uncover Misconception Patterns

Joshua J. Michalenko, Andrew S. Lan, Richard G. Baraniuk
Comments: 7 Pages, Submitted to EDM 2017, Workshop version accepted to L@S 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL)

An important, yet largely unstudied, problem in student data analysis is to
detect misconceptions from students’ responses to open-response questions.
Misconception detection enables instructors to deliver more targeted feedback
on the misconceptions exhibited by many students in their class, thus improving
the quality of instruction. In this paper, we propose D.TRUMP, a new natural
language processing-based framework to detect the common misconceptions among
students’ textual responses to short-answer questions. We propose a
probabilistic model for students’ textual responses involving misconceptions
and experimentally validate it on a real-world student-response dataset.
Experimental results show that D.TRUMP excels at classifying whether a response
exhibits one or more misconceptions. More importantly, it can also
automatically detect the common misconceptions exhibited across responses from
multiple students to multiple questions; this property is especially important
at large scale, since instructors will no longer need to manually specify all
possible misconceptions that students might exhibit.

Distributed, Parallel, and Cluster Computing

Private Learning on Networks: Part II

Shripad Gade, Nitin H. Vaidya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

Widespread deployment of distributed machine learning algorithms has raised
new privacy challenges. The focus of this paper is on improving privacy of each
participant’s local information (such as dataset or loss function) while
collaboratively learning the underlying model. We present two iterative algorithms
for privacy-preserving distributed learning. Our algorithms involve adding
structured randomization to the state estimates. We prove deterministic
correctness (in every execution) of our algorithms despite the iterates being
perturbed by non-zero mean random variables. We motivate privacy using privacy
analysis of a special case of our algorithm referred to as Function Sharing
strategy (presented in [1]).

MURS: Mitigating Memory Pressure in Data Processing Systems for Service

Xuanhua Shi, Xiong Zhang, Ligang He, Hai Jin, Zhixiang Ke, Song Wu
Comments: 10 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

It has been shown that in-memory computing systems suffer from serious memory
pressure. The memory pressure will affect all submitted jobs. Memory pressure
comes from the running tasks as they produce massive long-living data objects
in the limited memory space. The long-living objects incur significant memory
and CPU overheads. Some tasks cause heavy memory pressure because of the
operations and datasets they process, which in turn affects all running tasks in
the system. Our studies show that a task often calls several API functions,
some of which need constant memory space, while others need linear
memory space. As different models have different impacts on memory pressure, we
propose a method of classifying the models that the tasks belong to. The method
uses the memory usage rate as the classification criterion. Further, we design a
scheduler called MURS to mitigate the memory pressure. We implement MURS in
Spark and conduct experiments to evaluate the performance of MURS. The
results show that, compared to Spark, our scheduler can 1) decrease the
execution time of submitted jobs by up to 65.8%, 2) mitigate the memory
pressure in the server by decreasing the garbage collection time by up to 81%,
and 3) reduce the data spilling, and hence disk I/Os, by approximately 90%.

Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm

C Rashmi
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed Computation has been a recent trend in engineering research.
Parallel Computation is widely used in different areas of Data Mining, Image
Processing, Simulating Models, Aerodynamics and so forth. One major use of
parallel processing is clustering satellite images larger than 1000×1000 in
dimension on a legacy system. This paper mainly focuses on different approaches
to parallel block processing: row-shaped, column-shaped and square-shaped.
These approaches are applied to a classification problem using the K-Means
clustering algorithm, which is widely used to detect features in
high-resolution orthoimagery satellite images. The different approaches are
analyzed; they reduce execution time and improve measured performance compared
to the sequential K-Means
Clustering algorithm.
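
As a rough illustration of the three block shapes named above, the following sketch splits a small image (a list of pixel rows) into row-shaped, column-shaped and square-shaped blocks. The function names and the toy image are hypothetical; the abstract does not describe the paper's implementation, and in a real setting each block would be handed to a separate worker running K-Means.

```python
def row_blocks(image, n):
    """Split the image into n horizontal strips (row-shaped blocks)."""
    h = len(image)
    bounds = [h * i // n for i in range(n + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(n)]


def column_blocks(image, n):
    """Split the image into n vertical strips (column-shaped blocks)."""
    w = len(image[0])
    bounds = [w * i // n for i in range(n + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in image] for i in range(n)]


def square_blocks(image, size):
    """Split the image into size x size tiles (square-shaped blocks)."""
    h, w = len(image), len(image[0])
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, h, size)
            for c in range(0, w, size)]


# Toy 4 x 6 "image" whose pixel values are just running indices.
image = [[r * 6 + c for c in range(6)] for r in range(4)]

rows = row_blocks(image, 2)        # 2 blocks of 2 x 6
cols = column_blocks(image, 3)     # 3 blocks of 4 x 2
tiles = square_blocks(image, 2)    # 6 blocks of 2 x 2
print(len(rows), len(cols), len(tiles))
```

Every pixel lands in exactly one block under each scheme, so the per-block results can be merged without double counting.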

Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus

Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Tevfik Kosar
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present WPaxos, a multileader wide area network (WAN) Paxos protocol that
achieves low-latency, high-throughput consensus across WAN deployments. WPaxos
dynamically partitions the global object-space across multiple concurrent
leaders that are deployed strategically using flexible quorums. This
partitioning and emphasis on local operations allow our protocol to
significantly outperform leaderless approaches, such as EPaxos, while
maintaining the same consistency guarantees. Unlike statically partitioned
multiple Paxos deployments, WPaxos adapts dynamically to the changing access
locality through adaptive object stealing. The ability to quickly react to
changing access locality not only speeds up the protocol, but also enables
support for mini-transactions.

We implemented WPaxos and evaluated it across WAN deployments using the
benchmarks introduced in the EPaxos work. Our results show that WPaxos achieves
up to 18 times faster average request latency and 65 times faster median
latency than EPaxos due to the reduction in WAN communication.

Distributed Voting/Ranking with Optimal Number of States per Node

Saber Salehkaleybar, Arsalan Sharif-Nassab, S. Jamaloddin Golestani
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Considering a network with \(n\) nodes, where each node initially votes for one
(or more) choices out of \(K\) possible choices, we present a Distributed
Multi-choice Voting/Ranking (DMVR) algorithm to determine either the choice
with maximum vote (the voting problem) or to rank all the choices in terms of
their acquired votes (the ranking problem). The algorithm consolidates node
votes across the network by updating the states of interacting nodes using two
key operations, the union and the intersection. The proposed algorithm is
simple, independent from network size, and easily scalable in terms of the
number of choices \(K\), using only \(K \times 2^{K-1}\) nodal states for voting,
and \(K \times K!\) nodal states for ranking. We prove the number of states to be
optimal in the ranking case; this optimality is conjectured to also apply to
the voting case. The time complexity of the algorithm is analyzed in complete
graphs. We show that the time complexity for both ranking and voting is
\(O(\log(n))\) for given vote percentages, and is inversely proportional to the
minimum of the vote percentage differences among various choices.
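
To give a feel for the state counts quoted above, K × 2^(K-1) nodal states for voting and K × K! for ranking, here is a small worked computation. The helper names are hypothetical; only the two formulas come from the abstract.

```python
from math import factorial


def voting_states(k):
    """Nodal states quoted for the voting problem: K * 2^(K-1)."""
    return k * 2 ** (k - 1)


def ranking_states(k):
    """Nodal states quoted for the ranking problem: K * K!."""
    return k * factorial(k)


for k in (2, 3, 4):
    print(k, voting_states(k), ranking_states(k))
```

For K = 3 this gives 12 states for voting and 18 for ranking; neither count depends on the network size n.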

Token-based Function Computation with Memory

Saber Salehkaleybar, S. Jamaloddin Golestani
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

In distributed function computation, each node has an initial value and the
goal is to compute a function of these values in a distributed manner. In this
paper, we propose a novel token-based approach to compute a wide class of
target functions to which we refer as “Token-based function Computation with
Memory” (TCM) algorithm. In this approach, node values are attached to tokens
and travel across the network. Each pair of travelling tokens would coalesce
when they meet, forming a token with a new value as a function of the original
token values. In contrast to the Coalescing Random Walk (CRW) algorithm, where
token movement is governed by random walk, meeting of tokens in our scheme is
accelerated by adopting a novel chasing mechanism. We proved that, compared to
the CRW algorithm, the TCM algorithm results in a reduction of time complexity
by a factor of at least \(\sqrt{n/\log(n)}\) in Erdős-Rényi and complete
graphs, and by a factor of \(\log(n)/\log(\log(n))\) in torus networks.
Simulation results show that there is at least a constant factor improvement in
the message complexity of the TCM algorithm in all considered topologies.
Robustness of the CRW and TCM algorithms in the presence of node failure is
analyzed. We show that their robustness can be improved by running multiple
instances of the algorithms in parallel.
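
The baseline mentioned above, the Coalescing Random Walk, can be sketched as follows. This is a minimal simulation on a complete graph and not the TCM chasing mechanism, which the abstract does not specify. The asynchronous one-token-per-step schedule, the function name and the `combine` parameter are assumptions made for illustration.

```python
import random


def coalescing_random_walk(values, combine, rng):
    """Simulate CRW on a complete graph with one token per node initially.

    At each step a randomly chosen token moves to a uniformly random other
    node; if a token already sits there, the two coalesce via `combine`.
    Returns the value of the last remaining token and the number of moves.
    """
    n = len(values)
    tokens = dict(enumerate(values))          # node index -> token value
    steps = 0
    while len(tokens) > 1:
        src = rng.choice(sorted(tokens))      # pick a token to move
        dst = rng.choice([v for v in range(n) if v != src])
        value = tokens.pop(src)
        if dst in tokens:
            tokens[dst] = combine(tokens[dst], value)   # tokens meet and merge
        else:
            tokens[dst] = value
        steps += 1
    (result,) = tokens.values()
    return result, steps


values = [3, 1, 4, 1, 5, 9, 2, 6]
total, moves = coalescing_random_walk(values, lambda a, b: a + b, random.Random(1))
print(total, moves)
```

Because the combining function here is associative and commutative, the final token carries the sum of all initial values regardless of the order in which tokens happen to meet.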

Randomized Load Balancing on Networks with Stochastic Inputs

Leran Cai, Thomas Sauerwald
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Iterative load balancing algorithms for indivisible tokens have been studied
intensively in the past. Complementing previous worst-case analyses, we study
an average-case scenario where the load inputs are drawn from a fixed
probability distribution. For cycles, tori, hypercubes and expanders, we obtain
almost matching upper and lower bounds on the discrepancy, the difference
between the maximum and the minimum load. Our bounds hold for a variety of
probability distributions including the uniform and binomial distribution but
also distributions with unbounded range such as the Poisson and geometric
distribution. For graphs with slow convergence like cycles and tori, our
results demonstrate a substantial difference between the convergence in the
worst- and average-case. An important ingredient in our analysis is a new upper
bound on the t-step transition probability of a general Markov chain, which is
derived by invoking the evolving set process.

Learning

Deep Architectures for Modulation Recognition

Nathan E West, Timothy J. O'Shea
Comments: 7 pages, 14 figures, to be published in proceedings of IEEE DySPAN 2017
Subjects: Learning (cs.LG)

We survey the latest advances in machine learning with deep neural networks
by applying them to the task of radio modulation recognition. Results show that
radio modulation recognition is not limited by network depth and further work
should focus on improving learned synchronization and equalization. Advances in
these areas will likely come from novel architectures designed for these tasks
or through novel training methods.

GPU Activity Prediction using Representation Learning

Aswin Raghavan, Mohamed Amer, Timothy Shields, David Zhang, Sek Chai
Comments: Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s)
Subjects: Learning (cs.LG)

GPU activity prediction is an important and complex problem. This is due to
the high level of contention among thousands of parallel threads. This problem
was mostly addressed using heuristics. We propose a representation learning
approach to address this problem. We model any performance metric as a temporal
function of the executed instructions with the intuition that the flow of
instructions can be identified as distinct activities of the code. Our
experiments show high accuracy and non-trivial predictive power of
representation learning on a benchmark.

Automatic Decomposition of Self-Triggering Kernels of Hawkes Processes

Rafael Lima, Jaesik Choi
Subjects: Learning (cs.LG)

Hawkes Processes capture self- and mutual-excitation between events when the
arrival of one event makes future ones more likely to happen in time-series
data. Identification of the temporal covariance kernel can reveal the
underlying structure to better predict future events. In this paper, we present
a new framework to represent time-series events with a composition of
self-triggering kernels of Hawkes Processes. That is, the input time-series
events are decomposed into multiple Hawkes Processes with heterogeneous
kernels. Our automatic decomposition procedure is composed of three main steps:
(1) discretized kernel estimation through frequency domain inversion equation
associated with the covariance density, (2) greedy kernel decomposition through
four base kernels and their combinations (addition and multiplication), and (3)
automated report generation. We demonstrate that the new automatic
decomposition procedure predicts future events better than the
existing framework on real-world data.

Multimodal deep learning approach for joint EEG-EMG data compression and classification

Ahmed Ben Said, Amr Mohamed, Tarek Elfouly, Khaled Harras, Z. Jane Wang
Comments: IEEE Wireless Communications and Networking Conference (WCNC), 2017
Subjects: Learning (cs.LG)

In this paper, we present a joint compression and classification approach for
EEG and EMG signals using deep learning. Specifically, we build our
system based on the deep autoencoder architecture, which is designed not only to
extract discriminant features in the multimodal data representation but also to
reconstruct the data from the latent representation using encoder-decoder
layers. Since an autoencoder can be seen as a compression approach, we extend it
to handle multimodal data at the encoder layer, which is then reconstructed and
retrieved at the decoder layer. We show through experimental results that
exploiting both multimodal data intercorrelation and intracorrelation
1) significantly reduces signal distortion, particularly for high compression
levels, and 2) achieves better accuracy in classifying EEG and EMG signals
recorded and labeled according to the sentiments of the volunteer.

Multiple Instance Learning with the Optimal Sub-Pattern Assignment Metric

Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo, Thuong Nguyen
Subjects: Learning (cs.LG)

Multiple instance data are sets or multi-sets of unordered elements. Using
metrics or distances for sets, we propose an approach to several multiple
instance learning tasks, such as clustering (unsupervised learning),
classification (supervised learning), and novelty detection (semi-supervised
learning). In particular, we introduce the Optimal Sub-Pattern Assignment
metric to multiple instance learning so as to provide versatile design choices.
Numerical experiments on both simulated and real data are presented to
illustrate the versatility of the proposed solution.
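
For readers unfamiliar with the Optimal Sub-Pattern Assignment (OSPA) metric, the following is a minimal brute-force sketch of its standard definition with cut-off `c` and order `p`. It enumerates all assignments, so it is only suitable for tiny sets; the function name and the example data are illustrative and are not taken from the paper.

```python
from itertools import permutations


def ospa(xs, ys, dist, c=1.0, p=1.0):
    """OSPA distance between two finite sets given a base distance `dist`.

    Distances are cut off at c, the smaller set is optimally assigned to
    elements of the larger one, and each unassigned element costs c.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs                       # ensure len(xs) <= len(ys)
    m, n = len(xs), len(ys)
    if n == 0:
        return 0.0                            # both sets are empty
    best = min(
        sum(min(dist(x, ys[j]), c) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(n), m)
    )
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)


def abs_diff(a, b):
    return abs(a - b)


print(ospa([0.0], [0.0, 10.0], abs_diff, c=2.0, p=1.0))   # cardinality penalty only
print(ospa([0.0, 1.0], [1.2, 0.1], abs_diff, c=2.0, p=1.0))
```

The cut-off `c` controls how heavily a difference in set size is penalized relative to localization error, which is one of the design choices the abstract alludes to.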

Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs

Yunzhu Li, Jiaming Song, Stefano Ermon
Comments: 10 pages, 6 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

The goal of imitation learning is to match example expert behavior, without
access to a reinforcement signal. Expert demonstrations provided by humans,
however, often show significant variability due to latent factors that are not
explicitly modeled. We introduce an extension to the Generative Adversarial
Imitation Learning method that can infer the latent structure of human
decision-making in an unsupervised way. Our method can not only imitate complex
behaviors, but also learn interpretable and meaningful representations. We
demonstrate that the approach is applicable to high-dimensional environments
including raw visual inputs. In the highway driving domain, we show that a
model learned from demonstrations is able to both produce different styles of
human-like driving behaviors and accurately anticipate human actions. Our
method surpasses various baselines in terms of performance and functionality.

Uncertainty Quantification in the Classification of High Dimensional Data

Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis
Comments: 33 pages, 14 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Classification of high dimensional data finds wide-ranging applications. In
many of these applications equipping the resulting classification with a
measure of uncertainty may be as important as the classification itself. In
this paper we introduce, develop algorithms for, and investigate the properties
of, a variety of Bayesian models for the task of binary classification; via the
posterior distribution on the classification labels, these methods
automatically give measures of uncertainty. The methods are all based around
the graph formulation of semi-supervised learning.

We provide a unified framework which brings together a variety of methods
which have been introduced in different communities within the mathematical
sciences. We study probit classification, generalize the level-set method for
Bayesian inverse problems to the classification setting, and generalize the
Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also
show that the probit and level set approaches are natural relaxations of the
harmonic function approach.

We introduce efficient numerical methods, suited to large data-sets, for both
MCMC-based sampling and gradient-based MAP estimation. Through numerical
experiments we study classification accuracy and uncertainty quantification for
our models; these experiments showcase a suite of datasets commonly used to
evaluate graph-based semi-supervised learning algorithms.

Who Said What: Modeling Individual Labelers Improves Classification

Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Data are often labeled by many different experts with each expert only
labeling a small fraction of the data and each data point being labeled by
several experts. This reduces the workload on individual experts and also gives
a better estimate of the unobserved ground truth. When experts disagree, the
standard approaches are to treat the majority opinion as the correct label or
to model the correct label as a distribution. These approaches, however, do not
make any use of potentially valuable information about which expert produced
which label. To make use of this extra information, we propose modeling the
experts individually and then learning averaging weights for combining them,
possibly in sample-specific ways. This allows us to give more weight to more
reliable experts and take advantage of the unique strengths of individual
experts at classifying certain types of data. Here we show that our approach
leads to improvements in computer-aided diagnosis of diabetic retinopathy. We
also show that our method performs better than competing algorithms by Welinder
and Perona, and by Mnih and Hinton. Our work offers an innovative approach for
dealing with the myriad real-world settings that use expert opinions to define
labels for training.

Exploration–Exploitation in MDPs with Options

Ronan Fruit, Alessandro Lazaric
Subjects: Learning (cs.LG)

While a large body of empirical results shows that temporally-extended actions
and options may significantly affect the learning performance of an agent, the
theoretical understanding of how and when options can be beneficial in online
reinforcement learning is relatively limited. In this paper, we derive an upper
and lower bound on the regret of a variant of UCRL using options. While we
first analyze the algorithm in the general case of semi-Markov decision
processes (SMDPs), we show how these results can be translated to the specific
case of MDPs with options and we illustrate simple scenarios in which the
regret of learning with options can be provably much smaller than the
regret suffered when learning with primitive actions.

Low Precision Neural Networks using Subband Decomposition

Sek Chai, Aswin Raghavan, David Zhang, Mohamed Amer, Tim Shields
Comments: Presented at CogArch Workshop, Atlanta, GA, April 2016
Subjects: Learning (cs.LG)

Large-scale deep neural networks (DNN) have been successfully used in a
number of tasks from image recognition to natural language processing. They are
trained using large training sets on large models, making them computationally
and memory intensive. As such, there is much interest in research development
for faster training and test time. In this paper, we present a unique approach
using lower precision weights for a more efficient and faster training phase. We
separate imagery into different frequency bands (e.g. with different
information content) such that the neural net can better learn using fewer bits.
We present this approach as a complement to existing methods such as pruning
network connections and encoding learning weights. We show results where this
approach supports more stable learning with a 2-4X reduction in precision along
with a 17X reduction in DNN parameters.

Biologically inspired protection of deep networks from adversarial attacks

Aran Nayebi, Surya Ganguli
Comments: 11 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Inspired by biophysical principles underlying nonlinear dendritic computation
in neural circuits, we develop a scheme to train deep neural networks to make
them robust to adversarial attacks. Our scheme generates highly nonlinear,
saturated neural networks that achieve state of the art performance on gradient
based adversarial examples on MNIST, despite never being exposed to
adversarially chosen examples during training. Moreover, these networks exhibit
unprecedented robustness to targeted, iterative schemes for generating
adversarial examples, including second-order methods. We further identify
principles governing how these networks achieve their robustness, drawing on
methods from information geometry. We find these networks progressively create
highly flat and compressed internal representations that are sensitive to very
few input dimensions, while still solving the task. Moreover, they employ
highly kurtotic weight distributions, also found in the brain, and we
demonstrate how such kurtosis can protect even linear classifiers from
adversarial attack.

Sticking the Landing: An Asymptotically Zero-Variance Gradient Estimator for Variational Inference

Geoffrey Roeder, Yuhuai Wu, David Duvenaud
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose a simple and general variant of the standard reparameterized
gradient estimator for the variational evidence lower bound. Specifically, we
remove a part of the total derivative with respect to the variational
parameters that corresponds to the score function. Removing this term produces
an unbiased gradient estimator whose variance approaches zero as the
approximate posterior approaches the exact posterior. We analyze the behavior
of this gradient estimator theoretically and empirically, and generalize it to
more complex variational distributions such as mixtures and importance-weighted
posteriors.
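
The zero-variance claim above can be checked on a toy model. The sketch below assumes z ~ N(0, 1) and x | z ~ N(z, 1), so the exact posterior is N(x/2, 1/2), and a Gaussian q(z) = N(mu, sigma^2). The model, the function name and the hand-written derivatives are assumptions made for illustration; the code compares the standard total-derivative gradient with respect to mu against the variant with the score-function term removed.

```python
import math
import random
import statistics


def mu_gradients(x, mu, sigma, n_samples, rng):
    """Return (total, path) single-sample ELBO gradients w.r.t. mu."""
    total, path = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps                  # reparameterized sample, dz/dmu = 1
        dlogp_dz = -z + (x - z)               # d/dz [log p(z) + log p(x | z)]
        dlogq_dz = -(z - mu) / sigma ** 2     # d/dz log q(z), parameters held fixed
        score_mu = (z - mu) / sigma ** 2      # d/dmu log q(z), z held fixed
        path_grad = dlogp_dz - dlogq_dz       # estimator with the score term removed
        total.append(path_grad - score_mu)    # standard total-derivative estimator
        path.append(path_grad)
    return total, path


x = 1.0
rng = random.Random(0)
# Evaluate both estimators with q set to the exact posterior N(x / 2, 1 / 2).
total, path = mu_gradients(x, mu=x / 2, sigma=math.sqrt(0.5), n_samples=2000, rng=rng)
print("variance of total-derivative estimator:", statistics.pvariance(total))
print("largest |path-derivative gradient|:", max(abs(g) for g in path))
```

When q matches the posterior, log p(x, z) - log q(z) is constant in z, so the path-derivative gradient vanishes for every sample (up to rounding), while the standard estimator is still noisy around its zero mean.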

Scaling the Scattering Transform: Deep Hybrid Networks

Edouard Oyallon (DI-ENS), Eugene Belilovsky (CVN, GALEN), Sergey Zagoruyko (ENPC)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We use the scattering network as a generic and fixed initialization of the
first layers of a supervised hybrid deep network. We show that early layers do
not necessarily need to be learned, providing the best results to date with
pre-defined representations while being competitive with Deep CNNs. Using a
shallow cascade of 1×1 convolutions, which encodes scattering coefficients that
correspond to spatial windows of very small sizes, makes it possible to obtain
AlexNet accuracy on ImageNet ILSVRC2012. We demonstrate that this local encoding
explicitly learns invariance w.r.t. rotations. Combining scattering networks
with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet
ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10
layers. We also find that hybrid architectures can yield excellent performance
in the small sample regime, exceeding their end-to-end counterparts, through
their ability to incorporate geometrical priors. We demonstrate this on subsets
of the CIFAR-10 dataset and by setting a new state-of-the-art on the STL-10
dataset.

Learned multi-patch similarity

Wilfried Hartmann, Silvano Galliani, Michal Havlena, Konrad Schindler, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Estimating a depth map from multiple views of a scene is a fundamental task
in computer vision. As soon as more than two viewpoints are available, one
faces the very basic question of how to measure similarity across >2 image
patches. Surprisingly, no direct solution exists; instead, it is common to fall
back to more or less robust averaging of two-view similarities. Encouraged by
the success of machine learning, and in particular convolutional neural
networks, we propose to learn a matching function which directly maps multiple
image patches to a scalar similarity score. Experiments on several multi-view
datasets demonstrate that this approach has advantages over methods based on
pairwise patch similarity.

Count-ception: Counting by Fully Convolutional Redundant Counting

Joseph Paul Cohen, Henry Z. Lo, Yoshua Bengio
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Counting objects in digital images is a process that should be replaced by
machines. This tedious task is time consuming and prone to errors due to
fatigue of human annotators. The goal is to have a system that takes as input
an image and returns a count of the objects inside and justification for the
prediction in the form of object localization. We repose a problem, originally
posed by Lempitsky and Zisserman, to instead predict a count map which contains
redundant counts based on the receptive field of a smaller regression network.
The regression network predicts a count of the objects that exist inside this
frame. By processing the image in a fully convolutional way, each pixel is
accounted for some number of times: the number of windows which include
it, which is the size of each window (i.e., 32×32 = 1024). To recover the true
count, we take the average over the redundant predictions. Our contribution is
redundant counting instead of predicting a density map in order to average over
errors. We also propose a novel deep neural network architecture adapted from
the Inception family of networks called the Count-ception network. Together our
approach results in a 20% gain over the state of the art method by Xie, Noble,
and Zisserman in 2016.
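
The redundant-counting idea can be illustrated without a neural network. In the sketch below an exact per-window counter stands in for the regression network, point annotations stand in for objects, and the image is implicitly padded so that every pixel falls inside exactly r × r windows. All names and the small window size r = 4 (instead of 32) are assumptions made for illustration.

```python
def redundant_count_map(points, height, width, r):
    """Count map in which entry (i, j) is the number of objects inside the
    r x r window covering rows i-r+1..i and columns j-r+1..j."""
    out_h, out_w = height + r - 1, width + r - 1
    count_map = [[0] * out_w for _ in range(out_h)]
    for (y, x) in points:
        # An object at (y, x) lies inside every window whose index (i, j)
        # satisfies y <= i <= y + r - 1 and x <= j <= x + r - 1.
        for i in range(y, y + r):
            for j in range(x, x + r):
                count_map[i][j] += 1
    return count_map


def recover_count(count_map, r):
    """Each object was counted r * r times, so divide the total by r * r."""
    return sum(map(sum, count_map)) / (r * r)


points = [(0, 0), (2, 5), (7, 7)]            # three objects in an 8 x 8 image
cmap = redundant_count_map(points, height=8, width=8, r=4)
print(recover_count(cmap, r=4))
```

With a learned regressor each window's count would carry some error, and dividing the summed map by the window area averages those errors over the redundant predictions.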

Jointly Optimizing Placement and Inference for Beacon-based Localization

Charles Schaff, David Yunis, Ayan Chakrabarti, Matthew R. Walter
Subjects: Robotics (cs.RO); Learning (cs.LG)

The ability of robots to estimate their location is crucial for a wide
variety of autonomous operations. In settings where GPS is unavailable, range-
or bearing-only observations relative to a set of fixed beacons provide an
effective means of estimating a robot’s location as it navigates. The accuracy
of such a beacon-based localization system depends both on how beacons are
spatially distributed in the environment, and how the robot’s location is
inferred based on noisy measurements of range or bearing. However, it is
computationally challenging to search for a placement and an inference strategy
that, together, are optimal. Existing methods decouple these decisions,
forgoing optimality for tractability. We propose a new optimization approach to
jointly determine the beacon placement and inference algorithm. We model
inference as a neural network and incorporate beacon placement as a
differentiable neural layer. This formulation allows us to optimize placement
and inference by jointly training the inference network and beacon layer. We
evaluate our method on different localization problems and demonstrate
performance that exceeds hand-crafted baselines.

Sequence-to-Sequence Models Can Directly Transcribe Foreign Speech

We present a recurrent encoder-decoder deep neural network architecture that
directly translates speech in one language into text in another. The model does
not explicitly transcribe the speech into text in the source language, nor does
it require supervision from the ground truth source language transcription
during training. We apply a slightly modified sequence-to-sequence with
attention architecture that has previously been used for speech recognition and
show that it can be repurposed for this more complex task, illustrating the
power of attention-based models. A single model trained end-to-end obtains
state-of-the-art performance on the Fisher Callhome Spanish-English speech
translation task, outperforming a cascade of independently trained
sequence-to-sequence speech recognition and machine translation models by 1.8
BLEU points on the Fisher test set. In addition, we find that making use of the
training data in both languages by multi-task training sequence-to-sequence
speech translation and recognition models with a shared encoder network can
improve performance by a further 1.4 BLEU points.

Information Theory

Generalized Gabidulin codes over fields of any characteristic

Daniel Augot, Pierre Loidreau, Gwezheneg Robert
Subjects: Information Theory (cs.IT)

We generalise Gabidulin codes to the case of infinite fields, possibly with
characteristic zero. For this purpose, we consider an abstract field extension
and any automorphism in the Galois group. We derive some conditions on the
automorphism to be able to have a proper notion of rank metric which is
consistent with linearized polynomials. Under these conditions, we generalize
Gabidulin codes and provide a decoding algorithm which decodes both errors and
erasures. Then, we focus on codes over integer rings and how to decode them. We
are then faced with the problem of the exponential growth of intermediate
values, and to circumvent the problem, it is natural to propose to do
computations modulo a prime ideal. For this, we study the reduction of
generalized Gabidulin codes over number ideals codes modulo a prime ideal, and
show they are classical Gabidulin codes. As a consequence, knowing side
information on the size of the errors or the message, we can reduce the
decoding problem over the integer ring to a decoding problem over a finite
field. We also give examples and timings.

Multiple Access for 5G New Radio: Categorization, Evaluation, and Challenges

Hyunsoo Kim, Yeon-Geun Lim, Chan-Byoung Chae, Daesik Hong
Comments: 9 pages, 4 figures, 2 tables
Subjects: Information Theory (cs.IT)

Next generation wireless networks require massive uplink connections as well
as high spectral efficiency. It is well known that, theoretically, it is not
possible to achieve the sum capacity of multi-user communications with
orthogonal multiple access. To meet the challenging requirements of next
generation networks, researchers have explored non-orthogonal and overloaded
transmission technologies, known as new radio multiple access (NR-MA)
schemes, for fifth generation (5G) networks. In this article, we discuss the key
features of the promising NR-MA schemes for the massive uplink connections. The
candidate schemes of NR-MA can be characterized by multiple access signatures
(MA-signatures), such as codebook, sequence, and interleaver/scrambler. At the
receiver side, advanced multi-user detection (MUD) schemes are employed to
extract each user’s data from non-orthogonally superposed data according to
MA-signatures. Through link-level simulations, we compare the performances of
NR-MA candidates under the same conditions. We further evaluate the sum rate
performances of the NR-MA schemes using a system-level simulator, based on a
3-dimensional (3D) ray tracing tool, that reflects realistic environments.
Lastly, we discuss tips for system operations and call attention to the
remaining technical challenges.

One- and Two-Way Relay Optimization for MIMO Interference Networks

Muhammad R A Khandaker, Kai-Kit Wong
Comments: Accepted in EURASIP Journal on Advances in Signal Processing
Journal-ref: EURASIP J. Adv. Signal Process., 2017:24
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)

This paper considers multiple-input multiple-output (MIMO) relay
communication in multi-cellular (interference) systems in which MIMO
source-destination pairs communicate simultaneously. It is assumed that due to
severe attenuation and/or shadowing effects, communication links can be
established only with the aid of a relay node. The aim is to minimize the
maximal mean-square-error (MSE) among all the receiving nodes under constrained
source and relay transmit powers. Both one- and two-way amplify-and-forward
(AF) relaying mechanisms are considered. Since the exactly optimal solution for
this practically appealing problem is intractable, we first propose optimizing
the source, relay, and receiver matrices in an alternating fashion. Then we
contrive a simplified semidefinite programming (SDP) solution based on the
error covariance matrix decomposition technique, avoiding the high complexity
of the iterative process. Numerical results reveal the effectiveness of the
proposed schemes.

Group Cooperation with Optimal Resource Allocation in Wireless Powered Communication Networks

Ke Xiong, Chen Chen, Gang Qu, Pingyi Fan (IEEE), Khaled Ben Letaief
Comments: 13 pages, 14 figures, to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

This paper considers a wireless powered communication network (WPCN) with
group cooperation, where two communication groups cooperate with each other via
wireless power transfer and time sharing to fulfill their expected information
delivery and achieve “win-win” collaboration. To explore the system
performance limits, we formulate optimization problems to respectively maximize
the weighted sum-rate and minimize the total consumed power. The time
assignment, beamforming vector and power allocation are jointly optimized under
available power and quality of service requirement constraints of both groups.
For the WSR-maximization, both fixed and flexible power scenarios are
investigated. As all problems are non-convex and have no known solution
methods, we solve them by using proper variable substitutions and the
semi-definite relaxation. We theoretically prove that our proposed solution
method guarantees the global optimum for each problem. Numerical results are
presented to show the system performance behaviors, which provide some useful
insights for future WPCN design. The results show that in such a group
cooperation-aware WPCN, optimal time assignment has a greater effect on
system performance than the other factors.

Physical Layer Security in Wireless Ad Hoc Networks Under A Hybrid Full-/Half-Duplex Receiver Deployment Strategy

Tong-Xing Zheng, Hui-Ming Wang, Jinhong Yuan, Zhu Han, Moon Ho Lee
Comments: Journal paper, double-column 12 pages, 9 figures, accepted by IEEE Transactions on Wireless Communications, 2017
Subjects: Information Theory (cs.IT)

This paper studies physical layer security in a wireless ad hoc network with
numerous legitimate transmitter-receiver pairs and eavesdroppers. A hybrid
full-/half-duplex receiver deployment strategy is proposed to secure legitimate
transmissions, by letting a fraction of legitimate receivers work in the
full-duplex (FD) mode sending jamming signals to confuse eavesdroppers upon
their information receptions, and letting the other receivers work in the
half-duplex mode just receiving their desired signals. The objective of this
paper is to choose properly the fraction of FD receivers for achieving the
optimal network security performance. Both accurate expressions and tractable
approximations for the connection outage probability and the secrecy outage
probability of an arbitrary legitimate link are derived, based on which the
area secure link number, network-wide secrecy throughput and network-wide
secrecy energy efficiency are optimized respectively. Various insights into the
optimal fraction are further developed and its closed-form expressions are also
derived under perfect self-interference cancellation or in a dense network. It
is concluded that the fraction of FD receivers triggers a non-trivial trade-off
between reliability and secrecy, and the proposed strategy can significantly
enhance the network security performance.

Some new bounds of placement delivery arrays

X. Niu, H. Cao
Comments: Coded caching scheme, placement delivery array, optimal
Subjects: Information Theory (cs.IT)

Coded caching is a technique that reduces the load during peak traffic
times in a wireless network system. The placement delivery array (PDA) was
first introduced by Yan et al. It can be used to design coded caching schemes.
In this paper, we prove some lower bounds of PDA on the element and some lower
bounds of PDA on the column. We also give some constructions for optimal PDA.

A categorical characterization of relative entropy on Polish spaces

Nicolas Gagne, Prakash Panangaden
Comments: 19 pages
Subjects: Information Theory (cs.IT)

We give a categorical treatment, in the spirit of Baez and Fritz, of relative
entropy for probability distributions defined on Polish spaces. We define a
category called PolStat suitable for reasoning about statistical inference on
Polish spaces. We define relative entropy as a functor into Lawvere’s category
([0,\infty]) and we show convexity, lower semicontinuity and uniqueness.

A Unified Ensemble of Concatenated Convolutional Codes

Saeedeh Moloudi, Michael Lentmaier, Alexandre Graell i Amat
Subjects: Information Theory (cs.IT)

We introduce a unified ensemble for turbo-like codes (TCs) that contains the
four main classes of TCs: parallel concatenated codes, serially concatenated
codes, hybrid concatenated codes, and braided convolutional codes. We show that
for each of the original classes of TCs, it is possible to find an equivalent
ensemble by proper selection of the design parameters in the unified ensemble.
We also derive the density evolution (DE) equations for this ensemble over the
binary erasure channel. The thresholds obtained from the DE indicate that the
TC ensembles from the unified ensemble have similar asymptotic behavior to the
original TC ensembles.

Denoising-based Turbo Compressed Sensing

Zhipeng Xue, Junjie Ma, Xiaojun Yuan
Comments: 11 pages, 10 figures
Subjects: Information Theory (cs.IT)

Turbo compressed sensing (Turbo-CS) is an efficient iterative algorithm for
sparse signal recovery with partial orthogonal sensing matrices. In this paper,
we extend the Turbo-CS algorithm to solve compressed sensing problems involving
more general signal structure, including compressive image recovery and
low-rank matrix recovery. A main difficulty for such an extension is that the
original Turbo-CS algorithm requires prior knowledge of the signal distribution
that is usually unavailable in practice. To overcome this difficulty, we
propose to redesign the Turbo-CS algorithm by employing a generic denoiser that
does not depend on the prior distribution and hence the name denoising-based
Turbo-CS (D-Turbo-CS). We then derive the extrinsic information for a generic
denoiser by following the Turbo-CS principle. Based on that, we optimize the
parametric extrinsic denoisers to minimize the output mean-square error (MSE).
Explicit expressions are derived for the extrinsic SURE-LET denoiser used in
compressive image denoising and also for the singular value thresholding (SVT)
denoiser used in low-rank matrix denoising. We find that the dynamics of
D-Turbo-CS can be well described by a scalar recursion called MSE evolution,
similar to the case for Turbo-CS. Numerical results demonstrate that D-Turbo-CS
considerably outperforms the counterpart algorithms in both reconstruction
quality and running time.

Multipair Massive MIMO Relaying Systems with One-Bit ADCs and DACs

Chuili Kong, Amine Mezghani, Caijun Zhong, A. Lee Swindlehurst, Zhaoyang Zhang
Comments: 14 pages, 10 figures, submitted to IEEE Trans. Signal Processing
Subjects: Information Theory (cs.IT)

This paper considers a multipair amplify-and-forward massive MIMO relaying
system with one-bit ADCs and one-bit DACs at the relay. The channel state
information is estimated via pilot training, and then utilized by the relay to
perform simple maximum-ratio combining/maximum-ratio transmission processing.
Leveraging the Bussgang decomposition, an exact achievable rate is derived
for the system with correlated quantization noise. Based on this, a closed-form
asymptotic approximation for the achievable rate is presented, thereby enabling
efficient evaluation of the impact of key parameters on the system performance.
Furthermore, power scaling laws are characterized to study the potential energy
efficiency associated with deploying massive one-bit antenna arrays at the
relay. In addition, a power allocation strategy is designed to compensate for
the rate degradation caused by the coarse quantization. Our results suggest
that the quality of the channel estimates depends on the specific orthogonal
pilot sequences that are used, contrary to unquantized systems where any set of
orthogonal pilot sequences gives the same result. Moreover, the sum rate gap
between the double-quantized relay system and an ideal non-quantized system is
a moderate factor of (4/\pi^2) in the low power regime.
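
As a rough sanity check on the (4/\pi^2) factor, the following is a minimal sketch of the standard Bussgang argument for a single one-bit quantizer, under assumptions that are not taken from the abstract: a real scalar Gaussian input (x) with variance (\sigma^2) and the quantizer (Q(x) = \mathrm{sign}(x)). The fraction of quantizer output power that is linearly correlated with the input comes out as (2/\pi); chaining two such stages (one-bit ADC, then one-bit DAC) heuristically gives the product. The paper's own derivation for the full relay system may differ in detail.

```latex
% Assumed setting (illustrative only): x ~ N(0, \sigma^2), Q(x) = sign(x).
% Bussgang gain B, useful-power fraction of one stage, and the two-stage product.
\begin{align}
B &= \frac{\mathbb{E}[x\,Q(x)]}{\mathbb{E}[x^2]}
   = \frac{\sigma\sqrt{2/\pi}}{\sigma^2}
   = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}, \\
\frac{\mathbb{E}[(Bx)^2]}{\mathbb{E}[Q(x)^2]}
  &= \frac{B^2\sigma^2}{1} = \frac{2}{\pi}, \\
\left(\frac{2}{\pi}\right)^{2} &= \frac{4}{\pi^2} \approx 0.405 .
\end{align}
```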

Regularized Gradient Descent: A Nonconvex Recipe for Fast Joint Blind Deconvolution and Demixing

Shuyang Ling, Thomas Strohmer
Subjects: Information Theory (cs.IT)

We study the question of extracting a sequence of functions
(\{\boldsymbol{f}_i, \boldsymbol{g}_i\}_{i=1}^s) from observing only the sum of
their convolutions, i.e., from (\boldsymbol{y} = \sum_{i=1}^s
\boldsymbol{f}_i \ast \boldsymbol{g}_i). While convex optimization techniques
are able to solve this joint blind deconvolution-demixing problem provably and
robustly under certain conditions, for medium-size or large-size problems we
need computationally faster methods without sacrificing the benefits of
mathematical rigor that come with convex methods. In this paper we present a
non-convex algorithm which guarantees exact recovery under conditions that are
competitive with convex optimization methods, with the additional advantage of
being computationally much more efficient. Our two-step algorithm converges to
the global minimum linearly and is also robust in the presence of additive
noise. While the derived performance bounds are suboptimal in terms of the
information-theoretic limit, numerical simulations show remarkable performance
even if the number of measurements is close to the number of degrees of
freedom. We discuss an application of the proposed framework in wireless
communications in connection with the Internet-of-Things.
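
To make the measurement model concrete, here is a minimal Python sketch of the forward map only, i.e. how (\boldsymbol{y}) is formed from the pairs ((\boldsymbol{f}_i, \boldsymbol{g}_i)). It is not the recovery algorithm from the paper. The choice of circular convolution, real-valued signals, and the helper names `circular_convolve` and `mixed_observation` are assumptions made for illustration.

```python
import numpy as np


def circular_convolve(f, g):
    """Circular convolution of two equal-length real 1-D arrays via the FFT.

    Assumption: the convolution is circular; the abstract does not specify this.
    """
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))


def mixed_observation(fs, gs):
    """Return y = sum_i f_i * g_i for lists of equal-length signal pairs."""
    return sum(circular_convolve(f, g) for f, g in zip(fs, gs))


# Hypothetical toy instance: s = 3 pairs of length-8 random signals.
rng = np.random.default_rng(0)
s, n = 3, 8
fs = [rng.standard_normal(n) for _ in range(s)]
gs = [rng.standard_normal(n) for _ in range(s)]
y = mixed_observation(fs, gs)
```

The recovery problem is the inverse of this map: given only `y` (plus structural assumptions on the signals), estimate every `fs[i]` and `gs[i]`. The sketch shows why this is hard: `y` has only `n` entries, while the unknowns have `2 * s * n`.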

Computing the capacity of a Markoff channel with perfect feedback is PSPACE-hard

Mukul Agarwal, Sanjoy Mitter
Subjects: Information Theory (cs.IT)

It will be proved that computing the capacity of a Markoff channel with
perfect feedback is PSPACE-hard.

Channel Impulse Response-based Distributed Physical Layer Authentication

Ammar Mahmood, Waqas Aman, M. Ozair Iqbal, M. Mahboob Ur Rahman, Qammer H. Abbasi
Comments: 6 pages, 5 figures, accepted for presentation at IEEE VTC 2017 Spring
Subjects: Information Theory (cs.IT)

In this preliminary work, we study the problem of distributed
authentication in wireless networks. Specifically, we consider a system where
multiple Bob (sensor) nodes listen to a channel and report their
correlated measurements to a Fusion Center (FC) which makes the ultimate
authentication decision. For the feature-based authentication at the FC,
channel impulse response has been utilized as the device fingerprint.
Additionally, the correlated measurements by the Bob nodes allow us to
invoke compressed sensing to significantly reduce the reporting overhead to the
FC. Numerical results show that: i) the detection performance of the FC is
superior to that of a single Bob node, ii) compressed sensing leads to at least
20% overhead reduction on the reporting channel at the expense of a small
(<1 dB) SNR margin to achieve the same detection performance.

On period polynomials of degree (2^m) and weight distributions of certain irreducible cyclic codes

Ioulia N. Baoulina
Comments: 17 pages, 5 tables
Subjects: Number Theory (math.NT); Information Theory (cs.IT)

We explicitly determine the values of reduced cyclotomic periods of order
(2^m), (m\ge 4), for finite fields of characteristic (p\equiv 3) or
(5\pmod{8}). These evaluations are applied to obtain explicit factorizations of
the corresponding reduced period polynomials. As another application, the
weight distributions of certain irreducible cyclic codes are described.

Multi-sensor Transmission Management for Remote State Estimation under Coordination

Kemi Ding, Yuzhe Li, Subhrakanti Dey, Ling Shi
Subjects: Methodology (stat.ME); Information Theory (cs.IT)

This paper considers the remote state estimation in a cyber-physical system
(CPS) using multiple sensors. The measurements of each sensor are transmitted
to a remote estimator over a shared channel, where simultaneous transmissions
from other sensors are regarded as interference signals. In such a competitive
environment, each sensor needs to choose its transmission power for sending
data packets, taking into account the other sensors’ behavior. To model this
interactive decision-making process among the sensors, we introduce a
multi-player non-cooperative game framework. To overcome the inefficiency
arising from the Nash equilibrium (NE) solution, we propose a correlation
policy, along with the notion of correlation equilibrium (CE). An analytical
comparison of the game value between the NE and the CE is provided,
with/without the power expenditure constraints for each sensor. Also, numerical
simulations demonstrate the comparison results.

TCP in 5G mmWave Networks: Link Level Retransmissions and MP-TCP

Michele Polese, Rittwik Jana, Michele Zorzi
Comments: 6 pages, 11 figures, accepted for presentation at the 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

MmWave communications, one of the cornerstones of future 5G mobile networks,
are characterized at the same time by a potential multi-gigabit capacity and by
a very dynamic channel, sensitive to blockage, wide fluctuations in the
received signal quality, and possibly also sudden link disruption. While the
performance of physical and MAC layer schemes that address these issues has
been thoroughly investigated in the literature, the complex interactions
between mmWave links and transport layer protocols such as TCP are still
relatively unexplored. This paper uses the ns-3 mmWave module, with its channel
model based on real measurements in New York City, to analyze the performance
of the Linux TCP/IP stack (i) with and without link-layer retransmissions,
showing that they are fundamental to reach a high TCP throughput on mmWave
links and (ii) with Multipath TCP (MP-TCP) over multiple LTE and mmWave links,
illustrating which are the throughput-optimal combinations of secondary paths
and congestion control algorithms in different conditions.

Analyzing Evolving Stories in News Articles

Roberto Camacho Barranco, Arnold P. Boedihardjo, M. Shahriar Hossain
Comments: submitted to KDD 2017, 9 pages, 10 figures
Subjects: Information Retrieval (cs.IR); Information Theory (cs.IT)

There is an overwhelming number of news articles published every day around
the globe. Following the evolution of a news story is a difficult task given
that no mechanism is available to track back in time and study the
diffusion of the relevant events in digital news feeds. The techniques
developed so far to extract meaningful information from a massive corpus rely
on similarity search, which results in a myopic loopback to the same topic
without providing the needed insights to hypothesize the origin of a story that
may be completely different than the news today. In this paper, we present an
algorithm that mines historical data to detect the origin of an event, segments
the timeline into disjoint groups of coherent news articles, and outlines the
most important documents in a timeline with a soft probability to provide a
better understanding of the evolution of a story. Qualitative and quantitative
approaches to evaluate our framework demonstrate that our algorithm discovers
statistically significant and meaningful stories in reasonable time.
Additionally, a relevant case study on a set of news articles demonstrates that
the generated output of the algorithm holds the promise to aid prediction of
future entities in a story.