IT博客汇 | arXiv Paper Daily: Thu, 16 Mar 2017

arXiv Paper Daily: Thu, 16 Mar 2017

我爱机器学习(52ml.net)发表于 2017-03-16 00:00:00

Neural and Evolutionary Computing

Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG

Robin Tibor Schirrmeister, Jost Tobias Springenberg, Lukas Dominique Josef Fiederer, Martin Glasstetter, Katharina Eggensperger, Michael Tangermann, Frank Hutter, Wolfram Burgard, Tonio Ball
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Deep learning with convolutional neural networks (deep ConvNets) has
revolutionized computer vision through end-to-end learning, i.e. learning from
the raw data. Now, there is increasing interest in using deep ConvNets for
end-to-end EEG analysis. However, little is known about many important aspects
of how to design and train ConvNets for end-to-end EEG decoding, and there is
still a lack of techniques to visualize the informative EEG features the
ConvNets learn.

Here, we studied deep ConvNets with a range of different architectures,
designed for decoding imagined or executed movements from raw EEG. Our results
show that recent advances from the machine learning field, including batch
normalization and exponential linear units, together with a cropped training
strategy, boosted the deep ConvNets decoding performance, reaching or
surpassing that of the widely-used filter bank common spatial patterns (FBCSP)
decoding algorithm. While FBCSP is designed to use spectral power modulations,
the features used by ConvNets are not fixed a priori. Our novel methods for
visualizing the learned features demonstrated that ConvNets indeed learned to
use spectral power modulations in the alpha, beta and high gamma frequencies.
These methods also proved useful as a technique for spatially mapping the
learned features, revealing the topography of the causal contributions of
features in different frequency bands to decoding the movement classes.

Our study thus shows how to design and train ConvNets to decode
movement-related information from the raw EEG without handcrafted features and
highlights the potential of deep ConvNets combined with advanced visualization
techniques for EEG-based brain mapping.

Neural Programming by Example

Chengxun Shu, Hongyu Zhang
Comments: 7 pages, Association for the Advancement of Artificial Intelligence (AAAI)
Journal-ref: AAAI-2017
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Software Engineering (cs.SE)

Programming by Example (PBE) targets at automatically inferring a computer
program for accomplishing a certain task from sample input and output. In this
paper, we propose a deep neural networks (DNN) based PBE model called Neural
Programming by Example (NPBE), which can learn from input-output strings and
induce programs that solve the string manipulation problems. Our NPBE model has
four neural network based components: a string encoder, an input-output
analyzer, a program generator, and a symbol selector. We demonstrate the
effectiveness of NPBE by training it end-to-end to solve some common string
manipulation problems in spreadsheet systems. The results show that our model
can induce string manipulation programs effectively. Our work is one step
towards teaching DNN to generate computer programs.

Neural Graph Machines: Learning Neural Networks Using Graphs

Thang D. Bui, Sujith Ravi, Vivek Ramavajjala
Comments: 9 pages
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Label propagation is a powerful and flexible semi-supervised learning
technique on graphs. Neural networks, on the other hand, have proven track
records in many supervised learning tasks. In this work, we propose a training
framework with a graph-regularised objective, namely “Neural Graph Machines”,
that can combine the power of neural networks and label propagation. This work
generalises previous literature on graph-augmented training of neural networks,
enabling it to be applied to multiple neural architectures (Feed-forward NNs,
CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the
neural networks to harness both labeled and unlabeled data by: (a) allowing the
network to train using labeled data as in the supervised setting, (b) biasing
the network to learn similar hidden representations for neighboring nodes on a
graph, in the same vein as label propagation. Such architectures with the
proposed objective can be trained efficiently using stochastic gradient descent
and scaled to large graphs, with a runtime that is linear in the number of
edges. The proposed joint training approach convincingly outperforms many
existing methods on a wide range of tasks (multi-label classification on social
graphs, news categorization, document classification and semantic intent
classification), with multiple forms of graph inputs (including graphs with and
without node-level features) and using different types of neural networks.

FastQA: A Simple and Efficient Neural Architecture for Question Answering

Dirk Weissenborn, Georg Wiese, Laura Seiffe
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Recent development of large-scale question answering (QA) datasets triggered
a substantial amount of research into end-to-end neural architectures for QA.
Increasingly complex systems have been conceived without comparison to a
simpler neural baseline system that would justify their complexity. In this
work, we propose a simple heuristic that guided the development of FastQA, an
efficient end-to-end neural model for question answering that is very
competitive with existing models. We further demonstrate, that an extended
version (FastQAExt) achieves state-of-the-art results on recent benchmark
datasets, namely SQuAD, NewsQA and MsMARCO, outperforming most existing models.
However, we show that increasing the complexity of FastQA to FastQAExt does not
yield any systematic improvements. We argue that the same holds true for most
existing systems that are similar to FastQAExt. A manual analysis reveals that
our proposed heuristic explains most predictions of our model, which indicates
that modeling a simple heuristic is enough to achieve strong performance on
extractive QA datasets. The overall strong performance of FastQA puts results
of existing, more complex models into perspective.

Learned Optimizers that Scale and Generalize

Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Learning to learn has emerged as an important direction for achieving
artificial intelligence. Two of the primary barriers to its adoption are an
inability to scale to larger problems and a limited ability to generalize to
new tasks. We introduce a learned gradient descent optimizer that generalizes
well to new tasks, and which has significantly reduced memory and computation
overhead. We achieve this by introducing a novel hierarchical RNN architecture,
with minimal per-parameter overhead, augmented with additional architectural
features that mirror the known structure of optimization tasks. We also develop
a meta-training ensemble of small, diverse, optimization tasks capturing common
properties of loss landscapes. The optimizer learns to out-perform RMSProp/ADAM
on problems in this corpus. More importantly, it performs comparably or better
when applied to small convolutional neural networks, despite seeing no neural
networks in its meta-training set. Finally, it generalizes to train Inception
V3 and ResNet V2 architectures on the ImageNet dataset, optimization problems
that are of a vastly different scale than those it was trained on.

Tree Memory Networks for Modelling Long-term Temporal Dependencies

Tharindu Fernando, Simon Denman, Aaron McFadyen, Sridha Sridharan, Clinton Fookes
Comments: This paper has been submitted to JMLR
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in depth analysis on
the evolution of the memory module content over time and provide visual
evidence on how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure.

Drone Squadron Optimization: a Self-adaptive Algorithm for Global Numerical Optimization

Vinícius Veloso de Melo, Wolfgang Banzhaf
Comments: Short version – Full version published by Springer Neural Computing and Applications
Journal-ref: Neural Computing and Applications, 2017, pp 1-28
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE)

This paper proposes Drone Squadron Optimization, a new self-adaptive
metaheuristic for global numerical optimization which is updated online by a
hyper-heuristic. DSO is an artifact-inspired technique, as opposed to many
algorithms used nowadays, which are nature-inspired. DSO is very flexible
because it is not related to behaviors or natural phenomena. DSO has two core
parts: the semi-autonomous drones that fly over a landscape to explore, and the
Command Center that processes the retrieved data and updates the drones’
firmware whenever necessary. The self-adaptive aspect of DSO in this work is
the perturbation/movement scheme, which is the procedure used to generate
target coordinates. This procedure is evolved by the Command Center during the
global optimization process in order to adapt DSO to the search landscape. DSO
was evaluated on a set of widely employed benchmark functions. The statistical
analysis of the results shows that the proposed method is competitive with the
other methods in the comparison, the performance is promising, but several
future improvements are planned.

Sensor Fusion for Robot Control through Deep Reinforcement Learning

Steven Bohez, Tim Verbelen, Elias De Coninck, Bert Vankeirsbilck, Pieter Simoens, Bart Dhoedt
Comments: 6 pages, 6 figures, submitted to IROS 2017
Subjects: Robotics (cs.RO); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

Deep reinforcement learning is becoming increasingly popular for robot
control algorithms, with the aim for a robot to self-learn useful feature
representations from unstructured sensory input leading to the optimal
actuation policy. In addition to sensors mounted on the robot, sensors might
also be deployed in the environment, although these might need to be accessed
via an unreliable wireless connection. In this paper, we demonstrate deep
neural network architectures that are able to fuse information coming from
multiple sensors and are robust to sensor failures at runtime. We evaluate our
method on a search and pick task for a robot both in simulation and the real
world.

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Chunpeng Wu, Wei Wen, Tariq Afzal, Yongmei Zhang, Yiran Chen, Hai Li
Comments: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Recently, DNN model compression based on network architecture design, e.g.,
SqueezeNet, attracted a lot attention. No accuracy drop on image classification
is observed on these extremely compact networks, compared to well-known models.
An emerging question, however, is whether these model compression techniques
hurt DNN’s learning ability other than classifying images on a single dataset.
Our preliminary experiment shows that these compression methods could degrade
domain adaptation (DA) ability, though the classification performance is
preserved. Therefore, we propose a new compact network architecture and
unsupervised DA method in this paper. The DNN is built on a new basic module
Conv-M which provides more diverse feature extractors without significantly
increasing parameters. The unified framework of our DA method will
simultaneously learn invariance across domains, reduce divergence of feature
representations, and adapt label prediction. Our DNN has 4.1M parameters, which
is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN
obtains GoogLeNet-level accuracy both on classification and DA, and our DA
method slightly outperforms previous competitive ones. Put all together, our DA
strategy based on our DNN achieves state-of-the-art on sixteen of total
eighteen DA tasks on popular Office-31 and Office-Caltech datasets.

Computer Vision and Pattern Recognition

A clever elimination strategy for efficient minimal solvers

Zuzana Kukelova, Joe Kileel, Bernd Sturmfels, Tomas Pajdla
Comments: 13 pages, 7 figures
Journal-ref: IEEE Conference on Computer Vision and Pattern Recognition 2017
(CVPR 2017)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Symbolic Computation (cs.SC)

We present a new insight into the systematic generation of minimal solvers in
computer vision, which leads to smaller and faster solvers. Many minimal
problem formulations are coupled sets of linear and polynomial equations where
image measurements enter the linear equations only. We show that it is useful
to solve such systems by first eliminating all the unknowns that do not appear
in the linear equations and then extending solutions to the rest of unknowns.
This can be generalized to fully non-linear systems by linearization via
lifting. We demonstrate that this approach leads to more efficient solvers in
three problems of partially calibrated relative camera pose computation with
unknown focal length and/or radial distortion. Our approach also generates new
interesting constraints on the fundamental matrices of partially calibrated
cameras, which were not known before.

Hybrid Supervised-unsupervised Image Topic Visualization with Convolutional Neural Network and LDA

Kai Zhen, Mridul Birla, David Crandall, Bingjing Zhang, Judy Qiu
Comments: 6 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The system generates three errors of “Bad character(s) in field Abstract” for
no reason. Please refer to manuscript for the full abstract.

Transfer Learning for Melanoma Detection: Participation in ISIC 2017 Skin Lesion Classification Challenge

Dennis H. Murphree, Che Ngufor
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This manuscript describes our participation in the International Skin Imaging
Collaboration’s 2017 Skin Lesion Analysis Towards Melanoma Detection
competition. We participated in Part 3: Lesion Classification. The two stated
goals of this binary image classification challenge were to distinguish between
(a) melanoma and (b) nevus and seborrheic keratosis, followed by distinguishing
between (a) seborrheic keratosis and (b) nevus and melanoma. We chose a deep
neural network approach with a transfer learning strategy, using a pre-trained
Inception V3 network as both a feature extractor to provide input for a
multi-layer perceptron as well as fine-tuning an augmented Inception network.
This approach yielded validation set AUC’s of 0.84 on the second task and 0.76
on the first task, for an average AUC of 0.80. We joined the competition
unfortunately late, and we look forward to improving on these results.

Texture segmentation with Fully Convolutional Networks

Vincent Andrearczyk, Paul F. Whelan
Comments: 13 pages, 4 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the last decade, deep learning has contributed to advances in a wide range
computer vision tasks including texture analysis. This paper explores a new
approach for texture segmentation using deep convolutional neural networks,
sharing important ideas with classic filter bank based texture segmentation
methods. Several methods are developed to train Fully Convolutional Networks to
segment textures in various applications. We show in particular that these
networks can learn to recognize and segment a type of texture, e.g. wood and
grass from texture recognition datasets (no training segmentation). We
demonstrate that Fully Convolutional Networks can learn from repetitive
patterns to segment a particular texture from a single image or even a part of
an image. We take advantage of these findings to develop a method that is
evaluated on a series of supervised and unsupervised experiments and improve
the state of the art on the Prague texture segmentation datasets.

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks

Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jungkwon Lee, Jiwon Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While humans easily recognize relations between data from different domains
without any supervision, learning to automatically discover them is in general
very challenging and needs many ground-truth pairs that illustrate the
relations. To avoid costly pairing, we address the task of discovering
cross-domain relations given unpaired data. We propose a method based on
generative adversarial networks that learns to discover relations between
different domains (DiscoGAN). Using the discovered relations, our proposed
network successfully transfers style from one domain to another while
preserving key attributes such as orientation and face identity.

Automatic skin lesion segmentation with fully convolutional-deconvolutional networks

Yading Yuan, Ming Chao, Yeh-Chi Lo
Comments: ISIC2017 challenge, 4 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper summarizes our method and validation results for the ISBI
Challenge 2017 – Skin Lesion Analysis Towards Melanoma Detection – Part I:
Lesion Segmentation

Real-Time Panoramic Tracking for Event Cameras

Christian Reinbacher, Gottfried Munda, Thomas Pock
Comments: Accepted to International Conference on Computational Photography 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Event cameras are a paradigm shift in camera technology. Instead of full
frames, the sensor captures a sparse set of {em events} caused by intensity
changes. Since only the changes are transferred, those cameras are able to
capture quick movements of objects in the scene or of the camera itself. In
this work we propose a novel method to perform camera tracking of event cameras
in a panoramic setting with three degrees of freedom. We propose a direct
camera tracking formulation, similar to state-of-the-art in visual odometry. We
show that the minimal information needed for simultaneous tracking and mapping
is the spatial position of events, without using the appearance of the imaged
scene point. We verify the robustness to fast camera movements and dynamic
objects in the scene on a recently proposed dataset cite{Mueggler2016} and
self-recorded sequences.

Random Forests and VGG-NET: An Algorithm for the ISIC 2017 Skin Lesion Classification Challenge

Songtao Guo, Yixin Luo, Yanzhi Song
Comments: ISIC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This manuscript briefly describes an algorithm developed for the ISIC 2017
Skin Lesion Classification Competition. In this task, participants are asked to
complete two independent binary image classification tasks that involve three
unique diagnoses of skin lesions (melanoma, nevus, and seborrheic keratosis).
In the first binary classification task, participants are asked to distinguish
between (a) melanoma and (b) nevus and seborrheic keratosis. In the second
binary classification task, participants are asked to distinguish between (a)
seborrheic keratosis and (b) nevus and melanoma. The other phases of the
competition are not considered. Our proposed algorithm consists of three steps:
preprocessing, classification using VGG-NET and Random Forests, and calculation
of a final score.

Block Compressive Sensing of Image and Video with Nonlocal Lagrangian Multiplier and Patch-based Sparse Representation

Trinh Van Chien, Khanh Quoc Dinh, Byeungwoo Jeon, Martin Burger
Comments: 30 pages, 13 figures, 2 tables; Accepted by Elsevier Signal Processing:Image Communication
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Although block compressive sensing (BCS) makes it tractable to sense
large-sized images and video, its recovery performance has yet to be
significantly improved because its recovered images or video usually suffer
from blurred edges, loss of details, and high-frequency oscillatory artifacts,
especially at a low subrate. This paper addresses these problems by designing a
modified total variation technique that employs multi-block gradient
processing, a denoised Lagrangian multiplier, and patch-based sparse
representation. In the case of video, the proposed recovery method is able to
exploit both spatial and temporal similarities. Simulation results confirm the
improved performance of the proposed method for compressive sensing of images
and video in terms of both objective and subjective qualities.

A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks

Satoshi Tsutsui, David Crandall
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A key problem in automatic analysis and understanding of scientific papers is
to extract semantic information from non-textual paper components like figures,
diagrams, tables, etc. This research always requires a very first preprocessing
step: decomposing compound multi-part figures into individual subfigures.
Previous work in compound figure separation has been based on manually designed
features and separation rules, which often fail for less common figure types
and layouts. Moreover, no implementation for compound figure decomposition is
publicly available.

This paper proposes a data driven approach to separate compound figures using
modern deep Convolutional Neural Networks (CNNs) to train the separator in an
end-to-end manner. CNNs eliminate the need for manually designing features and
separation rules, but require large amount of annotated training data. We
overcome this challenge using transfer learning as well as automatically
synthesizing training exemplars. We evaluate our technique on the ImageCLEF
Medical dataset, achieving 85.9% accuracy and outperforming manually engineered
previous techniques. We made the resulting approach available as an easy-to-use
Python library, aiming to promote further research in scientific figure mining.

Joint Epipolar Tracking (JET): Simultaneous optimization of epipolar geometry and feature correspondences

Henry Bradler, Matthias Ochs, Rudolf Mester
Comments: Accepted at IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, USA, March 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Traditionally, pose estimation is considered as a two step problem. First,
feature correspondences are determined by direct comparison of image patches,
or by associating feature descriptors. In a second step, the relative pose and
the coordinates of corresponding points are estimated, most often by minimizing
the reprojection error (RPE). RPE optimization is based on a loss function that
is merely aware of the feature pixel positions but not of the underlying image
intensities. In this paper, we propose a sparse direct method which introduces
a loss function that allows to simultaneously optimize the unscaled relative
pose, as well as the set of feature correspondences directly considering the
image intensity values. Furthermore, we show how to integrate statistical prior
information on the motion into the optimization process. This constructive
inclusion of a Bayesian bias term is particularly efficient in application
cases with a strongly predictable (short term) dynamic, e.g. in a driving
scenario. In our experiments, we demonstrate that the JET algorithm we propose
outperforms the classical reprojection error optimization on two synthetic
datasets and on the KITTI dataset. The JET algorithm runs in real-time on a
single CPU thread.

Learning Rank Reduced Interpolation with Principal Component Analysis

Matthias Ochs, Henry Bradler, Rudolf Mester
Comments: Accepted at Intelligent Vehicles Symposium (IV), Los Angeles, USA, June 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In computer vision most iterative optimization algorithms, both sparse and
dense, rely on a coarse and reliable dense initialization to bootstrap their
optimization procedure. For example, dense optical flow algorithms profit
massively in speed and robustness if they are initialized well in the basin of
convergence of the used loss function. The same holds true for methods as
sparse feature tracking when initial flow or depth information for new features
at arbitrary positions is needed. This makes it extremely important to have
techniques at hand that allow to obtain from only very few available
measurements a dense but still approximative sketch of a desired 2D structure
(e.g. depth maps, optical flow, disparity maps, etc.). The 2D map is regarded
as sample from a 2D random process. The method presented here exploits the
complete information given by the principal component analysis (PCA) of that
process, the principal basis and its prior distribution. The method is able to
determine a dense reconstruction from sparse measurement. When facing
situations with only very sparse measurements, typically the number of
principal components is further reduced which results in a loss of
expressiveness of the basis. We overcome this problem and inject prior
knowledge in a maximum a posterior (MAP) approach. We test our approach on the
KITTI and the virtual KITTI datasets and focus on the interpolation of depth
maps for driving scenes. The evaluation of the results show good agreement to
the ground truth and are clearly better than results of interpolation by the
nearest neighbor method which disregards statistical information.

Large Margin Object Tracking with Circulant Feature Maps

Mengmeng Wang, Yong Liu, Zeyi Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Structured output support vector machine (SVM) based tracking algorithms have
shown favorable performance recently. Nonetheless, the time-consuming candidate
sampling and complex optimization limit their real-time applications. In this
paper, we propose a novel large margin object tracking method which absorbs the
strong discriminative ability from structured output SVM and speeds up by the
correlation filter algorithm significantly. Secondly, a multimodal target
detection technique is proposed to improve the target localization precision
and prevent model drift introduced by similar objects or background noise.
Thirdly, we exploit the feedback from high-confidence tracking results to avoid
the model corruption problem. We implement two versions of the proposed tracker
with the representations from both conventional hand-crafted and deep
convolution neural networks (CNNs) based features to validate the strong
compatibility of the algorithm. The experimental results demonstrate that the
proposed tracker performs superiorly against several state-of-the-art
algorithms on the challenging benchmark sequences while runs at speed in excess
of 80 frames per second. The source code and experimental results will be made
publicly available.

Zero-Shot Recognition using Dual Visual-Semantic Mapping Paths

Yanan Li, Donghui Wang, Huanhang Hu, Yuetan Lin, Yueting Zhuang
Comments: Accepted as a full paper in IEEE Computer Vision and Pattern Recognition (CVPR) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot recognition aims to accurately recognize objects of unseen classes
by using a shared visual-semantic mapping between the image feature space and
the semantic embedding space. This mapping is learned on training data of seen
classes and is expected to have transfer ability to unseen classes. In this
paper, we tackle this problem by exploiting the intrinsic relationship between
the semantic space manifold and the transfer ability of visual-semantic
mapping. We formalize their connection and cast zero-shot recognition as a
joint optimization problem. Motivated by this, we propose a novel framework for
zero-shot recognition, which contains dual visual-semantic mapping paths. Our
analysis shows this framework can not only apply prior semantic knowledge to
infer underlying semantic manifold in the image feature space, but also
generate optimized semantic embedding space, which can enhance the transfer
ability of the visual-semantic mapping to unseen classes. The proposed method
is evaluated for zero-shot recognition on four benchmark datasets, achieving
outstanding results.

Label Stability in Multiple Instance Learning

Veronika Cheplygina, Lauge Sørensen, David M. J. Tax, Marleen de Bruijne, Marco Loog
Comments: Published at MICCAI 2015
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We address the problem of emph{instance label stability} in multiple
instance learning (MIL) classifiers. These classifiers are trained only on
globally annotated images (bags), but often can provide fine-grained
annotations for image pixels or patches (instances). This is interesting for
computer aided diagnosis (CAD) and other medical image analysis tasks for which
only a coarse labeling is provided. Unfortunately, the instance labels may be
unstable. This means that a slight change in training data could potentially
lead to abnormalities being detected in different parts of the image, which is
undesirable from a CAD point of view. Despite MIL gaining popularity in the CAD
literature, this issue has not yet been addressed. We investigate the stability
of instance labels provided by several MIL classifiers on 5 different datasets,
of which 3 are medical image datasets (breast histopathology, diabetic
retinopathy and computed tomography lung images). We propose an unsupervised
measure to evaluate instance stability, and demonstrate that a
performance-stability trade-off can be made when comparing MIL classifiers.

Transfer Learning by Asymmetric Image Weighting for Segmentation across Scanners

Veronika Cheplygina, Annegreet van Opbroek, M. Arfan Ikram, Meike W. Vernooij, Marleen de Bruijne
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Supervised learning has been very successful for automatic segmentation of
images from a single scanner. However, several papers report deteriorated
performances when using classifiers trained on images from one scanner to
segment images from other scanners. We propose a transfer learning classifier
that adapts to differences between training and test images. This method uses a
weighted ensemble of classifiers trained on individual images. The weight of
each classifier is determined by the similarity between its training image and
the test image.

We examine three unsupervised similarity measures, which can be used in
scenarios where no labeled data from a newly introduced scanner or scanning
protocol is available. The measures are based on a divergence, a bag distance,
and on estimating the labels with a clustering procedure. These measures are
asymmetric. We study whether the asymmetry can improve classification. Out of
the three similarity measures, the bag similarity measure is the most robust
across different studies and achieves excellent results on four brain tissue
segmentation datasets and three white matter lesion segmentation datasets,
acquired at different centers and with different scanners and scanning
protocols. We show that the asymmetry can indeed be informative, and that
computing the similarity from the test image to the training images is more
appropriate than the opposite direction.

Classification of COPD with Multiple Instance Learning

Veronika Cheplygina, Lauge Sørensen, David M. J. Tax, Jesper Holst Pedersen, Marco Loog, Marleen de Bruijne
Comments: Published at International Conference on Pattern Recognition (ICPR) 2014
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Chronic obstructive pulmonary disease (COPD) is a lung disease where early
detection benefits the survival rate. COPD can be quantified by classifying
patches of computed tomography images, and combining patch labels into an
overall diagnosis for the image. As labeled patches are often not available,
image labels are propagated to the patches, incorrectly labeling healthy
patches in COPD patients as being affected by the disease. We approach
quantification of COPD from lung images as a multiple instance learning (MIL)
problem, which is more suitable for such weakly labeled data. We investigate
various MIL assumptions in the context of COPD and show that although a concept
region with COPD-related disease patterns is present, considering the whole
distribution of lung tissue patches improves the performance. The best method
is based on averaging instances and obtains an AUC of 0.742, which is higher
than the previously reported best of 0.713 on the same dataset. Using the full
training set further increases performance to 0.776, which is significantly
higher (DeLong test) than previous results.

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

Alex Kendall, Yarin Gal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

There are two major types of uncertainty one can model. Aleatoric uncertainty
captures noise inherent in the observations. On the other hand, epistemic
uncertainty accounts for uncertainty in the model — uncertainty which can be
explained away given enough data. Traditionally it has been difficult to model
epistemic uncertainty in computer vision, but with new Bayesian deep learning
tools this is now possible. We study the benefits of modeling epistemic vs.
aleatoric uncertainty in Bayesian deep learning models for vision tasks. For
this we present a Bayesian deep learning framework combining input-dependent
aleatoric uncertainty together with epistemic uncertainty. We study models
under the framework with per-pixel semantic segmentation and depth regression
tasks. Further, our explicit uncertainty formulation leads to new loss
functions for these tasks, which can be interpreted as learned attenuation.
This makes the loss more robust to noisy data, also giving new state-of-the-art
results on segmentation and depth regression benchmarks.

Comparison of the Deep-Learning-Based Automated Segmentation Methods for the Head Sectioned Images of the Virtual Korean Human Project

Mohammad Eshghi, Holger R. Roth, Masahiro Oda, Min Suk Chung, Kensaku Mori
Comments: Accepted for presentation at the 15th IAPR Conference on Machine Vision Applications (MVA2017), Nagoya, Japan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents an end-to-end pixelwise fully automated segmentation of
the head sectioned images of the Visible Korean Human (VKH) project based on
Deep Convolutional Neural Networks (DCNNs). By converting classification
networks into Fully Convolutional Networks (FCNs), a coarse prediction map,
with smaller size than the original input image, can be created for
segmentation purposes. To refine this map and to obtain a dense pixel-wise
output, standard FCNs use deconvolution layers to upsample the coarse map.
However, upsampling based on deconvolution increases the number of network
parameters and causes loss of detail because of interpolation. On the other
hand, dilated convolution is a new technique introduced recently that attempts
to capture multi-scale contextual information without increasing the network
parameters while keeping the resolution of the prediction maps high. We used
both a standard FCN and a dilated convolution based FCN for semantic
segmentation of the head sectioned images of the VKH dataset. Quantitative
results showed approximately 20% improvement in the segmentation accuracy when
using FCNs with dilated convolutions.

End-to-end Binary Representation Learning via Direct Binary Embedding

Liu Liu, Alireza Rahimpour, Ali Taalimi, Hairong Qi
Comments: Submitted to ICIP’17
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Learning binary representation is essential to large-scale computer vision
tasks. Most existing algorithms require a separate quantization constraint to
learn effective hashing functions. In this work, we present Direct Binary
Embedding (DBE), a simple yet very effective algorithm to learn binary
representation in an end-to-end fashion. By appending an ingeniously designed
DBE layer to the deep convolutional neural network (DCNN), DBE learns binary
code directly from the continuous DBE layer activation without quantization
error. By employing the deep residual network (ResNet) as DCNN component, DBE
captures rich semantics from images. Furthermore, in the effort of handling
multilabel images, we design a joint cross entropy loss that includes both
softmax cross entropy and weighted binary cross entropy in consideration of the
correlation and independence of labels, respectively. Extensive experiments
demonstrate the significant superiority of DBE over state-of-the-art methods on
tasks of natural object recognition, image retrieval and image annotation.

Robust Non-Rigid Registration With Reweighted Dual Sparsities

Jingyu Yang, Kun Li, Yu-Kun Lai, Daoliang Guo
Comments: 12 pages, submitted to IEEE Transactions on Visualization and Computer Graphics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Graphics (cs.GR)

Non-rigid registration is challenging because it is ill-posed with high
degrees of freedom and is thus sensitive to noise and outliers. We propose a
robust non-rigid registration method using reweighted sparsities on position
and transformation to estimate the deformations between 3-D shapes. We
formulate the energy function with dual sparsities on both the data term and
the smoothness term, and define the smoothness constraint using local rigidity.
The dual-sparsity based non-rigid registration model is enhanced with a
reweighting scheme, and solved by transferring the model into some alternating
optimized subproblems which have exact solutions and guaranteed convergence.
Experimental results on both public datasets and real scanned datasets show
that our method outperforms the state-of-the-art methods and is more robust to
noise and outliers than conventional non-rigid registration methods.

Source Camera Identification Based On Content-Adaptive Fusion Network

Pengpeng Yang, Wei Zhao, Rongrong Ni, Yao Zhao
Comments: This article has been submitted to the 2017 IEEE International Conference on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Source camera identification is still a hard task in forensics community,
especially for the case of the small query image size. In this paper, we
propose a solution to identify the source camera of the small-size images:
content-adaptive fusion network. In order to learn better feature
representation from the input data, content-adaptive convolutional neural
networks(CA-CNN) are constructed. We add a convolutional layer in preprocessing
stage. Moreover, with the purpose of capturing more comprehensive information,
we parallel three CA-CNNs: CA3-CNN, CA5-CNN, CA7-CNN to get the
content-adaptive fusion network. The difference of three CA-CNNs lies in the
convolutional kernel size of pre-processing layer. The experimental results
show that the proposed method is practicable and satisfactory.

Face Recognition using Multi-Modal Low-Rank Dictionary Learning

Homa Foroughi, Moein Shakeri, Nilanjan Ray, Hong Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face recognition has been widely studied due to its importance in different
applications; however, most of the proposed methods fail when face images are
occluded or captured under illumination and pose variations. Recently several
low-rank dictionary learning methods have been proposed and achieved promising
results for noisy observations. While these methods are mostly developed for
single-modality scenarios, recent studies demonstrated the advantages of
feature fusion from multiple inputs. We propose a multi-modal structured
low-rank dictionary learning method for robust face recognition, using raw
pixels of face images and their illumination invariant representation. The
proposed method learns robust and discriminative representations from
contaminated face images, even if there are few training samples with large
intra-class variations. Extensive experiments on different datasets validate
the superior performance and robustness of our method to severe illumination
variations and occlusion.

Skin lesion segmentation based on preprocessing, thresholding and neural networks

Juana M. Gutiérrez-Arriola, Marta Gómez-Álvarez, Victor Osma-Ruiz, Nicolás Sáenz-Lechón, Rubén Fraile
Comments: 4 pages, 4 figures, abstract submitted to participate in the challenge ISIC 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This abstract describes the segmentation system used to participate in the
challenge ISIC 2017: Skin Lesion Analysis Towards Melanoma Detection. Several
preprocessing techniques have been tested for three color representations (RGB,
YCbCr and HSV) of 392 images. Results have been used to choose the better
preprocessing for each channel. In each case a neural network is trained to
predict the Jaccard Index based on object characteristics. The system includes
black frames and reference circle detection algorithms but no special treatment
is done for hair removal. Segmentation is performed in two steps first the best
channel to be segmented is chosen by selecting the best neural network output.
If this output does not predict a Jaccard Index over 0.5 a more aggressive
preprocessing is performed using open and close morphological operations and
the segmentation of the channel that obtains the best output from the neural
networks is selected as the lesion.

A Proximity-Aware Hierarchical Clustering of Faces

Wei-An Lin, Jun-Cheng Chen, Rama Chellappa
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose an unsupervised face clustering algorithm called
“Proximity-Aware Hierarchical Clustering” (PAHC) that exploits the local
structure of deep representations. In the proposed method, a similarity measure
between deep features is computed by evaluating linear SVM margins. SVMs are
trained using nearest neighbors of sample data, and thus do not require any
external training data. Clusters are then formed by thresholding the similarity
scores. We evaluate the clustering performance using three challenging
unconstrained face datasets, including Celebrity in Frontal-Profile (CFP),
IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3)
datasets. Experimental results demonstrate that the proposed approach can
achieve significant improvements over state-of-the-art methods. Moreover, we
also show that the proposed clustering algorithm can be applied to curate a set
of large-scale and noisy training dataset while maintaining sufficient amount
of images and their variations due to nuisance factors. The face verification
performance on JANUS CS3 improves significantly by finetuning a DCNN model with
the curated MS-Celeb-1M dataset which contains over three million face images.

In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++

Jan Hajič jr., Pavel Pecina
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Optical Music Recognition (OMR) has long been without an adequate dataset and
ground truth for evaluating OMR systems, which has been a major problem for
establishing a state of the art in the field. Furthermore, machine learning
methods require training data. We analyze how the OMR processing pipeline can
be expressed in terms of gradually more complex ground truth, and based on this
analysis, we design the MUSCIMA++ dataset of handwritten music notation that
addresses musical symbol recognition and notation reconstruction. The MUSCIMA++
dataset version 0.9 consists of 140 pages of handwritten music, with 91255
manually annotated notation symbols and 82261 explicitly marked relationships
between symbol pairs. The dataset allows training and evaluating models for
symbol classification, symbol localization, and notation graph assembly, both
in isolation and jointly. Open-source tools are provided for manipulating the
dataset, visualizing the data and further annotation, and the dataset itself is
made available under an open license.

RECOD Titans at ISIC Challenge 2017

Afonso Menegola, Julia Tavares, Michel Fornaciali, Lin Tzy Li, Sandra Avila, Eduardo Valle
Comments: 5 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This extended abstract describes the participation of RECOD Titans in parts 1
and 3 of the ISIC Challenge 2017 “Skin Lesion Analysis Towards Melanoma
Detection” (ISBI 2017). Although our team has a long experience with melanoma
classification, the ISIC Challenge 2017 was the very first time we worked on
skin-lesion segmentation. For part 1 (segmentation), our final submission used
four of our models: two trained with all 2000 samples, without a validation
split, for 250 and for 500 epochs respectively; and other two trained and
validated with two different 1600/400 splits, for 220 epochs. Those four
models, individually, achieved between 0.780 and 0.783 official validation
scores. Our final submission averaged the output of those four models achieved
a score of 0.793. For part 3 (classification), the submitted test run as well
as our last official validation run were the result from a meta-model that
assembled seven base deep-learning models: three based on Inception-V4 trained
on our largest dataset; three based on Inception trained on our smallest
dataset; and one based on ResNet-101 trained on our smaller dataset. The
results of those component models were stacked in a meta-learning layer based
on an SVM trained on the validation set of our largest dataset.

Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets

Andrea Tacchetti, Stephen Voinea, Georgios Evangelopoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

The complexity of a learning task is increased by transformations in the
input space that preserve class identity. Visual object recognition for example
is affected by changes in viewpoint, scale, illumination or planar
transformations. While drastically altering the visual appearance, these
changes are orthogonal to recognition and should not be reflected in the
representation or feature encoding used for learning. We introduce a framework
for weakly supervised learning of image embeddings that are robust to
transformations and selective to the class distribution, using sets of
transforming examples (orbit sets), deep parametrizations and a novel
orbit-based loss. The proposed loss combines a discriminative, contrastive part
for orbits with a reconstruction error that learns to rectify orbit
transformations. The learned embeddings are evaluated in distance metric-based
tasks, such as one-shot classification under geometric transformations, as well
as face verification and retrieval under more realistic visual variability. Our
results suggest that orbit sets, suitably computed or observed, can be used for
efficient, weakly-supervised learning of semantically relevant image
embeddings.

Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction

Benoît Massé, Silèye Ba, Radu Horaud
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The visual focus of attention (VFOA) has been recognized as a prominent
conversational cue. We are interested in the VFOA tracking of a group of people
involved in social interaction. We note that in this case the participants look
either at each other or at an object of interest; therefore they don’t always
face a camera and, consequently, their gazes (and their VFOAs) cannot be based
on eye detection and tracking. We propose a method that exploits the
correlation between gaze direction and head orientation. Both VFOA and gaze are
modeled as latent variables in a Bayesian switching linear dynamic model. The
proposed formulation leads to a tractable learning procedure and to an
efficient gaze-and-VFOA tracking algorithm. The method is tested and
benchmarked using a publicly available dataset that contains typical
multi-party human-robot interaction scenarios, and that was recorded with both
a motion capture system, and with a camera mounted onto a robot head.

A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition

Cheng Zhao, Li Sun, Rustam Stolkin
Comments: 8 pages, 7 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the problem of simultaneous 3D reconstruction and
material recognition and segmentation. Enabling robots to recognise different
materials (concrete, metal etc.) in a scene is important for many tasks, e.g.
robotic interventions in nuclear decommissioning. Previous work on 3D semantic
reconstruction has predominantly focused on recognition of everyday domestic
objects (tables, chairs etc.), whereas previous work on material recognition
has largely been confined to single 2D images without any 3D reconstruction.
Meanwhile, most 3D semantic reconstruction methods rely on computationally
expensive post-processing, using Fully-Connected Conditional Random Fields
(CRFs), to achieve consistent segmentations. In contrast, we propose a deep
learning method which performs 3D reconstruction while simultaneously
recognising different types of materials and labelling them at the pixel level.
Unlike previous methods, we propose a fully end-to-end approach, which does not
require hand-crafted features or CRF post-processing. Instead, we use only
learned features, and the CRF segmentation constraints are incorporated inside
the fully end-to-end learned system. We present the results of experiments, in
which we trained our system to perform real-time 3D semantic reconstruction for
23 different materials in a real-world application. The run-time performance of
the system can be boosted to around 10Hz, using a conventional GPU, which is
enough to achieve real-time semantic reconstruction using a 30fps RGB-D camera.
To the best of our knowledge, this work is the first real-time end-to-end
system for simultaneous 3D reconstruction and material recognition.

6-DoF Object Pose from Semantic Keypoints

Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, Kostas Daniilidis
Comments: IEEE International Conference on Robotics and Automation (ICRA), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This paper presents a novel approach to estimating the continuous six degree
of freedom (6-DoF) pose (3D translation and rotation) of an object from a
single RGB image. The approach combines semantic keypoints predicted by a
convolutional network (convnet) with a deformable shape model. Unlike prior
work, we are agnostic to whether the object is textured or textureless, as the
convnet learns the optimal representation from the available training image
data. Furthermore, the approach can be applied to instance- and class-based
pose recovery. Empirically, we show that the proposed approach can accurately
recover the 6-DoF object pose for both instance- and class-based scenarios with
a cluttered background. For class-based object pose estimation,
state-of-the-art accuracy is shown on the large-scale PASCAL3D+ dataset.

A Framework for Dynamic Image Sampling Based on Supervised Learning (SLADS)

G. M. Dilshan P. Godaliyadda, Dong Hye Ye, Michael D.Uchic, Michael A. Groeber, Gregery T. Buzzard, Charles A. Bouman
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Sparse sampling schemes have the potential to dramatically reduce image
acquisition time while simultaneously reducing radiation damage to samples.
However, for a sparse sampling scheme to be useful it is important that we are
able to reconstruct the underlying object with sufficient clarity using the
sparse measurements. In dynamic sampling, each new measurement location is
selected based on information obtained from previous measurements. Therefore,
dynamic sampling schemes have the potential to dramatically reduce the number
of measurements needed for high fidelity reconstructions. However, most
existing dynamic sampling methods for point-wise measurement acquisition tend
to be computationally expensive and are therefore too slow for practical
applications.

In this paper, we present a framework for dynamic sampling based on machine
learning techniques, which we call a supervised learning approach for dynamic
sampling (SLADS). In each step of SLADS, the objective is to find the pixel
that maximizes the expected reduction in distortion (ERD) given previous
measurements. SLADS is fast because we use a simple regression function to
compute the ERD, and it is accurate because the regression function is trained
using data sets that are representative of the specific application. In
addition, we introduce a method to terminate dynamic sampling at a desired
level of distortion, and we extended the SLADS methodology to sample groups of
pixels at each step. Finally, we present results on computationally-generated
synthetic data and experimentally-collected data to demonstrate a dramatic
improvement over state-of-the-art static sampling methods.

A PatchMatch-based Dense-field Algorithm for Video Copy-Move Detection and Localization

Luca D'Amiano, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a new algorithm for the reliable detection and localization of
video copy-move forgeries. Discovering well crafted video copy-moves may be
very difficult, especially when some uniform background is copied to occlude
foreground objects. To reliably detect both additive and occlusive copy-moves
we use a dense-field approach, with invariant features that guarantee
robustness to several post-processing operations. To limit complexity, a
suitable video-oriented version of PatchMatch is used, with a multiresolution
search strategy, and a focus on volumes of interest. Performance assessment
relies on a new dataset, designed ad hoc, with realistic copy-moves and a wide
variety of challenging situations. Experimental results show the proposed
method to detect and localize video copy-moves with good accuracy even in
adverse conditions.

Recasting Residual-based Local Descriptors as Convolutional Neural Networks: an Application to Image Forgery Detection

Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Local descriptors based on the image noise residual have proven extremely
effective for a number of forensic applications, like forgery detection and
localization. Nonetheless, motivated by promising results in computer vision,
the focus of the research community is now shifting on deep learning. In this
paper we show that a class of residual-based descriptors can be actually
regarded as a simple constrained convolutional neural network (CNN). Then, by
relaxing the constraints, and fine-tuning the net on a relatively small
training set, we obtain a significant performance improvement with respect to
the conventional detector.

Subspace Learning in The Presence of Sparse Structured Outliers and Noise

Shervin Minaee, Yao Wang
Journal-ref: IEEE International Symposium on Circuits and Systems, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Subspace learning is an important problem, which has many applications in
image and video processing. It can be used to find a low-dimensional
representation of signals and images. But in many applications, the desired
signal is heavily distorted by outliers and noise, which negatively affect the
learned subspace. In this work, we present a novel algorithm for learning a
subspace for signal representation, in the presence of structured outliers and
noise. The proposed algorithm tries to jointly detect the outliers and learn
the subspace for images. We present an alternating optimization algorithm for
solving this problem, which iterates between learning the subspace and finding
the outliers. This algorithm has been trained on a large number of image
patches, and the learned subspace is used for image segmentation, and is shown
to achieve better segmentation results than prior methods, including least
absolute deviation fitting, k-means clustering based segmentation in DjVu, and
shape primitive extraction and coding algorithm.

Learning Background-Aware Correlation Filters for Visual Tracking

Hamed Kiani Galoogahi, Ashton Fagg, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Correlation Filters (CFs) have recently demonstrated excellent performance in
terms of rapidly tracking objects under challenging photometric and geometric
variations. The strength of the approach comes from its ability to efficiently
learn – “on the fly” – how the object is changing over time. A fundamental
drawback to CFs, however, is that the background of the object is not be
modelled over time which can result in suboptimal results. In this paper we
propose a Background-Aware CF that can model how both the foreground and
background of the object varies over time. Our approach, like conventional CFs,
is extremely computationally efficient – and extensive experiments over
multiple tracking benchmarks demonstrate the superior accuracy and real-time
performance of our method compared to the state-of-the-art trackers including
those based on a deep learning paradigm.

Fully Convolutional Networks to Detect Clinical Dermoscopic Features

Jeremy Kawahara, Ghassan Hamarneh
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We use a pretrained fully convolutional neural network to detect clinical
dermoscopic features from dermoscopy skin lesion images. We reformulate the
superpixel classification task as an image segmentation problem, and extend a
neural network architecture originally designed for image classification to
detect dermoscopic features. Specifically, we interpolate the feature maps from
several layers in the network to match the size of the input, concatenate the
resized feature maps, and train the network to minimize a smoothed negative F1
score. Over the public validation leaderboard of the 2017 ISIC/ISBI Lesion
Dermoscopic Feature Extraction Challenge, our approach achieves 89.3% AUROC,
the highest averaged score when compared to the other two entries. Results over
the private test leaderboard are still to be announced.

Neural Networks for Beginners. A fast implemention in Matlab, Torch, TensorFlow

Francesco Giannini, Vincenzo Laveglia, Alessandro Rossi, Dario Zanca, Andrea Zugarini
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Mathematical Software (cs.MS); Machine Learning (stat.ML)

This report provides an introduction to some Machine Learning tools within
the most common development environments. It mainly focuses on practical
problems, skipping any theoretical introduction. It is oriented to both
students trying to approach Machine Learning and experts looking for new
frameworks.

Real-time 3D Human Tracking for Mobile Robots with Multisensors

Mengmeng Wang, Daobilige Su, Lei Shi, Yong Liu, Jaime Valls Miro
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Acquiring the accurate 3-D position of a target person around a robot
provides fundamental and valuable information that is applicable to a wide
range of robotic tasks, including home service, navigation and entertainment.
This paper presents a real-time robotic 3-D human tracking system which
combines a monocular camera with an ultrasonic sensor by the extended Kalman
filter (EKF). The proposed system consists of three sub-modules: monocular
camera sensor tracking model, ultrasonic sensor tracking model and multi-sensor
fusion. An improved visual tracking algorithm is presented to provide partial
location estimation (2-D). The algorithm is designed to overcome severe
occlusions, scale variation, target missing and achieve robust re-detection.
The scale accuracy is further enhanced by the estimated 3-D information. An
ultrasonic sensor array is employed to provide the range information from the
target person to the robot and Gaussian Process Regression is used for partial
location estimation (2-D). EKF is adopted to sequentially process multiple,
heterogeneous measurements arriving in an asynchronous order from the vision
sensor and the ultrasonic sensor separately. In the experiments, the proposed
tracking system is tested in both simulation platform and actual mobile robot
for various indoor and outdoor scenes. The experimental results show the
superior performance of the 3-D tracking system in terms of both the accuracy
and robustness.

Tree Memory Networks for Modelling Long-term Temporal Dependencies

In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in depth analysis on
the evolution of the memory module content over time and provide visual
evidence on how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure.

Geometry-Based Region Proposals for Real-Time Robot Detection of Tabletop Objects

Alexander Broad, Brenna Argall
Comments: Update based on work presented at RSS 2016 Deep Learning Workshop
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

We present a novel object detection pipeline for localization and recognition
in three dimensional environments. Our approach makes use of an RGB-D sensor
and combines state-of-the-art techniques from the robotics and computer vision
communities to create a robust, real-time detection system. We focus
specifically on solving the object detection problem for tabletop scenes, a
common environment for assistive manipulators. Our detection pipeline locates
objects in a point cloud representation of the scene. These clusters are
subsequently used to compute a bounding box around each object in the RGB
space. Each defined patch is then fed into a Convolutional Neural Network (CNN)
for object recognition. We also demonstrate that our region proposal method can
be used to develop novel datasets that are both large and diverse enough to
train deep learning models, and easy enough to collect that end-users can
develop their own datasets. Lastly, we validate the resulting system through an
extensive analysis of the accuracy and run-time of the full pipeline.

Artificial Intelligence

On Inconsistency Indices and Inconsistency Axioms in Pairwise Comparisons

Jiri Mazurek
Comments: 13 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)

Pairwise comparisons are an important tool of modern (multiple criteria)
decision making. Since human judgments are often inconsistent, many studies
focused on the ways how to express and measure this inconsistency, and several
inconsistency indices were proposed as an alternative to Saaty inconsistency
index and inconsistency ratio for reciprocal pairwise comparisons matrices.
This paper aims to: firstly, introduce a new measure of inconsistency of
pairwise comparisons and to prove its basic properties; secondly, to postulate
an additional axiom, an upper boundary axiom, to an existing set of axioms; and
the last, but not least, the paper provides proofs of satisfaction of this
additional axiom by selected inconsistency indices as well as it provides their
numerical comparison.

Fuzzy Rankings: Properties and Applications

Jiří Mazurek
Comments: 11 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI)

In practice, a ranking of objects with respect to given set of criteria is of
considerable importance. However, due to lack of knowledge, information of time
pressure, decision makers might not be able to provide a (crisp) ranking of
objects from the top to the bottom. Instead, some objects might be ranked
equally, or better than other objects only to some degree. In such cases, a
generalization of crisp rankings to fuzzy rankings can be more useful. The aim
of the article is to introduce the notion of a fuzzy ranking and to discuss its
several properties, namely orderings, similarity and indecisiveness. The
proposed approach can be used both for group decision making or multiple
criteria decision making when uncertainty is involved.

Neural Programming by Example

Programming by Example (PBE) targets at automatically inferring a computer
program for accomplishing a certain task from sample input and output. In this
paper, we propose a deep neural networks (DNN) based PBE model called Neural
Programming by Example (NPBE), which can learn from input-output strings and
induce programs that solve the string manipulation problems. Our NPBE model has
four neural network based components: a string encoder, an input-output
analyzer, a program generator, and a symbol selector. We demonstrate the
effectiveness of NPBE by training it end-to-end to solve some common string
manipulation problems in spreadsheet systems. The results show that our model
can induce string manipulation programs effectively. Our work is one step
towards teaching DNN to generate computer programs.

Syntax-Preserving Belief Change Operators for Logic Programs

Sebastian Binnewies (1), Zhiqiang Zhuang (1), Kewen Wang (1), Bela Stantic (1) ((1) School of Information and Communication Technology, Griffith University, Australia)
Comments: 44 pages, submitted to ACM Transactions on Computational Logic
Subjects: Artificial Intelligence (cs.AI)

Recent methods have adapted the well-established AGM and belief base
frameworks for belief change to cover belief revision in logic programs. In
this study here, we present two new sets of belief change operators for logic
programs. They focus on preserving the explicit relationships expressed in the
rules of a program, a feature that is missing in purely semantic approaches
that consider programs only in their entirety. In particular, operators of the
latter class fail to satisfy preservation and support, two important properties
for belief change in logic programs required to ensure intuitive results.

We address this shortcoming of existing approaches by introducing partial
meet and ensconcement constructions for logic program belief change, which
allow us to define syntax-preserving operators that satisfy preservation and
support. Our work is novel in that our constructions not only preserve more
information from a logic program during a change operation than existing ones,
but they also facilitate natural definitions of contraction operators, the
first in the field to the best of our knowledge.

In order to evaluate the rationality of our operators, we translate the
revision and contraction postulates from the AGM and belief base frameworks to
the logic programming setting. We show that our operators fully comply with the
belief base framework and formally state the interdefinability between our
operators. We further propose an algorithm that is based on modularising a
logic program to reduce partial meet and ensconcement revisions or contractions
to performing the operation only on the relevant modules of that program.
Finally, we compare our approach to two state-of-the-art logic program revision
methods and demonstrate that our operators address the shortcomings of one and
generalise the other method.

Emergence of Grounded Compositional Language in Multi-Agent Populations

Igor Mordatch, Pieter Abbeel
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

By capturing statistical patterns in large corpora, machine learning has
enabled significant advances in natural language processing, including in
machine translation, question answering, and sentiment analysis. However, for
agents to intelligently interact with humans, simply capturing the statistical
patterns is insufficient. In this paper we investigate if, and how, grounded
compositional language can emerge as a means to achieve goals in multi-agent
populations. Towards this end, we propose a multi-agent learning environment
and learning methods that bring about emergence of a basic compositional
language. This language is represented as streams of abstract discrete symbols
uttered by agents over time, but nonetheless has a coherent structure that
possesses a defined vocabulary and syntax. We also observe emergence of
non-verbal communication such as pointing and guiding when language
communication is unavailable.

Exploring the Combination Rules of D Numbers From a Perspective of Conflict Redistribution

Xinyang Deng, Wen Jiang
Comments: 6 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)

Dempster-Shafer theory of evidence is widely applied to uncertainty modelling
and knowledge reasoning because of its advantages in dealing with uncertain
information. But some conditions or requirements, such as exclusiveness
hypothesis and completeness constraint, limit the development and application
of that theory to a large extend. To overcome the shortcomings and enhance its
capability of representing the uncertainty, a novel model, called D numbers,
has been proposed recently. However, many key issues, for example how to
implement the combination of D numbers, remain unsolved. In the paper, we have
explored the combination of D Numbers from a perspective of conflict
redistribution, and proposed two combination rules being suitable for different
situations for the fusion of two D numbers. The proposed combination rules can
reduce to the classical Dempster’s rule in Dempster-Shafer theory under a
certain conditions. Numerical examples and discussion about the proposed rules
are also given in the paper.

Towards Moral Autonomous Systems

Vicky Charisi, Louise Dennis, Michael Fisher Robert Lieck, Andreas Matthias, Marija Slavkovik Janina Sombetzki, Alan F. T. Winfield, Roman Yampolskiy
Subjects: Artificial Intelligence (cs.AI)

Both the ethics of autonomous systems and the problems of their technical
implementation have by now been studied in some detail. Less attention has been
given to the areas in which these two separate concerns meet. This paper,
written by both philosophers and engineers of autonomous systems, addresses a
number of issues in machine ethics that are located at precisely the
intersection between ethics and engineering. We first discuss different
approaches towards the conceptual design of autonomous systems and their
implications on the ethics implementation in such systems. Then we examine
problematic areas regarding the specification and verification of ethical
behavior in autonomous systems, particularly with a view towards the
requirements of future legislation. We discuss transparency and accountability
issues that will be crucial for any future wide deployment of autonomous
systems in society. Finally we consider the, often overlooked, possibility of
intentional misuse of AI systems and the possible dangers arising out of
deliberately unethical design, implementation, and use of autonomous robots.

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making

Qi Zhang, Satinder Singh, Edmund Durfee
Subjects: Artificial Intelligence (cs.AI)

In cooperative multiagent planning, it can often be beneficial for an agent
to make commitments about aspects of its behavior to others, allowing them in
turn to plan their own behaviors without taking the agent’s detailed behavior
into account. Extending previous work in the Bayesian setting, we consider
instead a worst-case setting in which the agent has a set of possible
environments (MDPs) it could be in, and develop a commitment semantics that
allows for probabilistic guarantees on the agent’s behavior in any of the
environments it could end up facing. Crucially, an agent receives observations
(of reward and state transitions) that allow it to potentially eliminate
possible environments and thus obtain higher utility by adapting its policy to
the history of observations. We develop algorithms and provide theory and some
preliminary empirical results showing that they ensure an agent meets its
commitments with history-dependent policies while minimizing maximum regret
over the possible environments.

InScript: Narrative texts annotated with script information

Ashutosh Modi, Tatjana Anikina, Simon Ostermann, Manfred Pinkal
Comments: Paper accepted at LREC 2016, 9 pages, The corpus can be downloaded at: this http URL
Journal-ref: LREC 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper presents the InScript corpus (Narrative Texts Instantiating Script
structure). InScript is a corpus of 1,000 stories centered around 10 different
scenarios. Verbs and noun phrases are annotated with event and participant
types, respectively. Additionally, the text is annotated with coreference
information. The corpus shows rich lexical variation and will serve as a unique
resource for the study of the role of script knowledge in natural language
processing.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Jacob Steinhardt, Moses Charikar, Gregory Valiant
Comments: 18 pages, pre-print
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We introduce a criterion, resilience, which allows properties of a dataset
(such as its mean or best low rank approximation) to be robustly computed, even
in the presence of a large fraction of arbitrary additional data. Resilience is
a weaker condition than most other properties considered so far in the
literature, and yet enables robust estimation in a broader variety of settings,
including the previously unstudied problem of robust mean estimation in
(ell_p)-norms.

FastQA: A Simple and Efficient Neural Architecture for Question Answering

Dirk Weissenborn, Georg Wiese, Laura Seiffe
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Recent development of large-scale question answering (QA) datasets triggered
a substantial amount of research into end-to-end neural architectures for QA.
Increasingly complex systems have been conceived without comparison to a
simpler neural baseline system that would justify their complexity. In this
work, we propose a simple heuristic that guided the development of FastQA, an
efficient end-to-end neural model for question answering that is very
competitive with existing models. We further demonstrate, that an extended
version (FastQAExt) achieves state-of-the-art results on recent benchmark
datasets, namely SQuAD, NewsQA and MsMARCO, outperforming most existing models.
However, we show that increasing the complexity of FastQA to FastQAExt does not
yield any systematic improvements. We argue that the same holds true for most
existing systems that are similar to FastQAExt. A manual analysis reveals that
our proposed heuristic explains most predictions of our model, which indicates
that modeling a simple heuristic is enough to achieve strong performance on
extractive QA datasets. The overall strong performance of FastQA puts results
of existing, more complex models into perspective.

Weighted Voting Via No-Regret Learning

Nika Haghtalab, Ritesh Noothigattu, Ariel D. Procaccia
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

Voting systems typically treat all voters equally. We argue that perhaps they
should not: Voters who have supported good choices in the past should be given
higher weight than voters who have supported bad ones. To develop a formal
framework for desirable weighting schemes, we draw on no-regret learning.
Specifically, given a voting rule, we wish to design a weighting scheme such
that applying the voting rule, with voters weighted by the scheme, leads to
choices that are almost as good as those endorsed by the best voter in
hindsight. We derive possibility and impossibility results for the existence of
such weighting schemes, depending on whether the voting rule and the weighting
scheme are deterministic or randomized, as well as on the social choice axioms
satisfied by the voting rule.

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

How can we explain the predictions of a black-box model? In this paper, we
use influence functions — a classic technique from robust statistics — to
trace a model’s prediction through the learning algorithm and back to its
training data, identifying the points most responsible for a given prediction.
Applying ideas from second-order optimization, we scale up influence functions
to modern machine learning settings and show that they can be applied to
high-dimensional black-box models, even in non-convex and non-differentiable
settings. We give a simple, efficient implementation that requires only oracle
access to gradients and Hessian-vector products. On linear models and
convolutional neural networks, we demonstrate that influence functions are
useful for many different purposes: to understand model behavior, debug models
and detect dataset errors, and even identify and exploit vulnerabilities to
adversarial training-set attacks.

A computational investigation of sources of variability in sentence comprehension difficulty in aphasia

Paul Mätzig, Shravan Vasishth, Felix Engelmann, David Caplan
Comments: 6 pages, 2 figures, 2 tables, submitted to MathPsych/ICCM 2017, Warwick, UK
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present a computational evaluation of three hypotheses about sources of
deficit in sentence comprehension in aphasia: slowed processing, intermittent
deficiency, and resource reduction. The ACT-R based Lewis & Vasishth 2005 model
is used to implement these three proposals. Slowed processing is implemented as
slowed default production-rule firing time; intermittent deficiency as
increased random noise in activation of chunks in memory; and resource
reduction as reduced goal activation. As data, we considered subject vs. object
relatives presented in a self-paced listening modality to 56 individuals with
aphasia (IWA) and 46 matched controls. The participants heard the sentences and
carried out a picture verification task to decide on an interpretation of the
sentence. These response accuracies are used to identify the best parameters
(for each participant) that correspond to the three hypotheses mentioned above.
We show that controls have more tightly clustered (less variable) parameter
values than IWA; specifically, compared to controls, among IWA there are more
individuals with low goal activations, high noise, and slow default action
times. This suggests that (i) individual patients show differential amounts of
deficit along the three dimensions of slowed processing, intermittent
deficient, and resource reduction, (ii) overall, there is evidence for all
three sources of deficit playing a role, and (iii) IWA have a more variable
range of parameter values than controls. In sum, this study contributes a proof
of concept of a quantitative implementation of, and evidence for, these three
accounts of comprehension deficits in aphasia.

Learning best K analogies from data distribution for case-based software effort estimation

Mohammad Azzeh, Yousef Elsheikh
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Case-Based Reasoning (CBR) has been widely used to generate good software
effort estimates. The predictive performance of CBR is a dataset dependent and
subject to extremely large space of configuration possibilities. Regardless of
the type of adaptation technique, deciding on the optimal number of similar
cases to be used before applying CBR is a key challenge. In this paper we
propose a new technique based on Bisecting k-medoids clustering algorithm to
better understanding the structure of a dataset and discovering the …

Fuzzy Model Tree For Early Effort Estimation

Mohammad Azzeh, Ali Bou Nassif
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Use Case Points (UCP) is a well-known method to estimate the project size,
based on Use Case diagram, at early phases of software development. Although
the Use Case diagram is widely accepted as a de-facto model for analyzing
object oriented software requirements over the world, UCP method did not take
sufficient amount of attention because, as yet, there is no consensus on how to
produce software effort from UCP. This paper aims to study the potential of
using Fuzzy Model Tree to derive effort estimates …

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Recently, DNN model compression based on network architecture design, e.g.,
SqueezeNet, attracted a lot attention. No accuracy drop on image classification
is observed on these extremely compact networks, compared to well-known models.
An emerging question, however, is whether these model compression techniques
hurt DNN’s learning ability other than classifying images on a single dataset.
Our preliminary experiment shows that these compression methods could degrade
domain adaptation (DA) ability, though the classification performance is
preserved. Therefore, we propose a new compact network architecture and
unsupervised DA method in this paper. The DNN is built on a new basic module
Conv-M which provides more diverse feature extractors without significantly
increasing parameters. The unified framework of our DA method will
simultaneously learn invariance across domains, reduce divergence of feature
representations, and adapt label prediction. Our DNN has 4.1M parameters, which
is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN
obtains GoogLeNet-level accuracy both on classification and DA, and our DA
method slightly outperforms previous competitive ones. Put all together, our DA
strategy based on our DNN achieves state-of-the-art on sixteen of total
eighteen DA tasks on popular Office-31 and Office-Caltech datasets.

Information Retrieval

Character-based Neural Embeddings for Tweet Clustering

Svitlana Vakulenko, Lyndon Nixon, Mihai Lupu
Comments: Accepted at the SocialNLP 2017 workshop held in conjunction with EACL 2017, April 3, 2017, Valencia, Spain
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

In this paper we show how the performance of tweet clustering can be improved
by leveraging character-based neural networks. The proposed approach overcomes
the limitations related to the vocabulary explosion in the word-based models
and allows for the seamless processing of the multilingual content. Our
evaluation results and code are available on-line at
this https URL clustering

Distributed-Representation Based Hybrid Recommender System with Short Item Descriptions

Junhua He, Hankz Hankui Zhuo, Jarvan Law
Comments: 10 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Collaborative filtering (CF) aims to build a model from users’ past behaviors
and/or similar decisions made by other users, and use the model to recommend
items for users. Despite of the success of previous collaborative filtering
approaches, they are all based on the assumption that there are sufficient
rating scores available for building high-quality recommendation models. In
real world applications, however, it is often difficult to collect sufficient
rating scores, especially when new items are introduced into the system, which
makes the recommendation task challenging. We find that there are often “short”
texts describing features of items, based on which we can approximate the
similarity of items and make recommendation together with rating scores. In
this paper we “borrow” the idea of vector representation of words to capture
the information of short texts and embed it into a matrix factorization
framework. We empirically show that our approach is effective by comparing it
with state-of-the-art approaches.

End-to-end Binary Representation Learning via Direct Binary Embedding

Liu Liu, Alireza Rahimpour, Ali Taalimi, Hairong Qi
Comments: Submitted to ICIP’17
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Learning binary representation is essential to large-scale computer vision
tasks. Most existing algorithms require a separate quantization constraint to
learn effective hashing functions. In this work, we present Direct Binary
Embedding (DBE), a simple yet very effective algorithm to learn binary
representation in an end-to-end fashion. By appending an ingeniously designed
DBE layer to the deep convolutional neural network (DCNN), DBE learns binary
code directly from the continuous DBE layer activation without quantization
error. By employing the deep residual network (ResNet) as DCNN component, DBE
captures rich semantics from images. Furthermore, in the effort of handling
multilabel images, we design a joint cross entropy loss that includes both
softmax cross entropy and weighted binary cross entropy in consideration of the
correlation and independence of labels, respectively. Extensive experiments
demonstrate the significant superiority of DBE over state-of-the-art methods on
tasks of natural object recognition, image retrieval and image annotation.

Ensemble of Neural Classifiers for Scoring Knowledge Base Triples

Ikuya Yamada, Motoki Sato, Hiroyuki Shindo
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This paper describes our approach for the triple scoring task at WSDM Cup
2017. The task aims to assign a relevance score for each pair of entities and
their types in a knowledge base in order to enhance the ranking results in
entity retrieval tasks. We propose an approach wherein the outputs of multiple
neural network classifiers are combined using a supervised machine learning
model. The experimental results show that our proposed method achieves the best
performance in one out of three measures, and performs competitively in the
other two measures.

Computation and Language

InScript: Narrative texts annotated with script information

This paper presents the InScript corpus (Narrative Texts Instantiating Script
structure). InScript is a corpus of 1,000 stories centered around 10 different
scenarios. Verbs and noun phrases are annotated with event and participant
types, respectively. Additionally, the text is annotated with coreference
information. The corpus shows rich lexical variation and will serve as a unique
resource for the study of the role of script knowledge in natural language
processing.

Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media

Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee
Comments: 11 pages, 3 Figures
Subjects: Computation and Language (cs.CL)

Code-mixing or code-switching are the effortless phenomena of natural
switching between two or more languages in a single conversation. Use of a
foreign word in a language; however, does not necessarily mean that the speaker
is code-switching because often languages borrow lexical items from other
languages. If a word is borrowed, it becomes a part of the lexicon of a
language; whereas, during code-switching, the speaker is aware that the
conversation involves foreign words or phrases. Identifying whether a foreign
word used by a bilingual speaker is due to borrowing or code-switching is a
fundamental importance to theories of multilingualism, and an essential
prerequisite towards the development of language and speech technologies for
multilingual communities. In this paper, we present a series of novel
computational methods to identify the borrowed likeliness of a word, based on
the social media signals. We first propose context based clustering method to
sample a set of candidate words from the social media data.Next, we propose
three novel and similar metrics based on the usage of these words by the users
in different tweets; these metrics were used to score and rank the candidate
words indicating their borrowed likeliness. We compare these rankings with a
ground truth ranking constructed through a human judgment experiment. The
Spearman’s rank correlation between the two rankings (nearly 0.62 for all the
three metric variants) is more than double the value (0.26) of the most
competitive existing baseline reported in the literature. Some other striking
observations are, (i) the correlation is higher for the ground truth data
elicited from the younger participants (age less than 30) than that from the
older participants, and (ii )those participants who use mixed-language for
tweeting the least, provide the best signals of borrowing.

SyntaxNet Models for the CoNLL 2017 Shared Task

Chris Alberti, Daniel Andor, Ivan Bogatyy, Michael Collins, Dan Gillick, Lingpeng Kong, Terry Koo, Ji Ma, Mark Omernick, Slav Petrov, Chayut Thanapirom, Zora Tung, David Weiss
Comments: Tech report
Subjects: Computation and Language (cs.CL)

We describe a baseline dependency parsing system for the CoNLL2017 Shared
Task. This system, which we call “ParseySaurus,” uses the DRAGNN framework
[Kong et al, 2017] to combine transition-based recurrent parsing and tagging
with character-based word representations. On the v1.3 Universal Dependencies
Treebanks, the new system outpeforms the publicly available, state-of-the-art
“Parsey’s Cousins” models by 3.47% absolute Labeled Accuracy Score (LAS) across
52 treebanks.

Ensemble of Neural Classifiers for Scoring Knowledge Base Triples

Ikuya Yamada, Motoki Sato, Hiroyuki Shindo
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This paper describes our approach for the triple scoring task at WSDM Cup
2017. The task aims to assign a relevance score for each pair of entities and
their types in a knowledge base in order to enhance the ranking results in
entity retrieval tasks. We propose an approach wherein the outputs of multiple
neural network classifiers are combined using a supervised machine learning
model. The experimental results show that our proposed method achieves the best
performance in one out of three measures, and performs competitively in the
other two measures.

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

Zhen Yang, Wei Chen, Feng Wang, Bo Xu
Comments: 9 pages , 3 figures, 3 tables
Subjects: Computation and Language (cs.CL)

This paper proposes a new route for applying the generative adversarial nets
(GANs) to NLP tasks (taking the neural machine translation as an instance) and
the widespread perspective that GANs can’t work well in the NLP area turns out
to be unreasonable. In this work, we build a conditional sequence generative
adversarial net which comprises of two adversarial sub models, a generative
model (generator) which translates the source sentence into the target sentence
as the traditional NMT models do and a discriminative model (discriminator)
which discriminates the machine-translated target sentence from the
human-translated sentence. From the perspective of Turing test, the proposed
model is to generate the translation which is indistinguishable from the
human-translated one. Experiments show that the proposed model achieves
significant improvements than the traditional NMT model. In Chinese-English
translation tasks, we obtain up to +2.0 BLEU points improvement. To the best of
our knowledge, this is the first time that the quantitative results about the
application of GANs in the traditional NLP task is reported. Meanwhile, we
present detailed strategies for GAN training. In addition, We find that the
discriminator of the proposed model shows great capability in data cleaning.

Sparse Named Entity Classification using Factorization Machines

Ai Hirata, Mamoru Komachi
Comments: 4+1 pages
Subjects: Computation and Language (cs.CL)

Named entity classification is the task of classifying text-based elements
into various categories, including places, names, dates, times, and monetary
values. A bottleneck in named entity classification, however, is the data
problem of sparseness, because new named entities continually emerge, making it
rather difficult to maintain a dictionary for named entity classification.
Thus, in this paper, we address the problem of named entity classification
using matrix factorization to overcome the problem of feature sparsity.
Experimental results show that our proposed model, with fewer features and a
smaller size, achieves competitive accuracy to state-of-the-art models.

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

Diego Marcheggiani, Ivan Titov
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Semantic role labeling (SRL) is the task of identifying the
predicate-argument structure of a sentence. It is typically regarded as an
important step in the standard natural language processing pipeline, providing
information to downstream tasks such as information extraction and question
answering. As the semantic representations are closely related to syntactic
ones, we exploit syntactic information in our model. We propose a version of
graph convolutional networks (GCNs), a recent class of multilayer neural
networks operating on graphs, suited to modeling syntactic dependency graphs.
GCNs over syntactic dependency trees are used as sentence encoders, producing
latent feature representations of words in a sentence and capturing information
relevant to predicting the semantic representations. We observe that GCN layers
are complementary to LSTM ones: when we stack both GCN and LSTM layers, we
obtain a substantial improvement over an already state-of-the-art LSTM SRL
model, resulting in the best reported scores on the standard benchmark
(CoNLL-2009) both for Chinese and English.

FastQA: A Simple and Efficient Neural Architecture for Question Answering

Dirk Weissenborn, Georg Wiese, Laura Seiffe
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Recent development of large-scale question answering (QA) datasets triggered
a substantial amount of research into end-to-end neural architectures for QA.
Increasingly complex systems have been conceived without comparison to a
simpler neural baseline system that would justify their complexity. In this
work, we propose a simple heuristic that guided the development of FastQA, an
efficient end-to-end neural model for question answering that is very
competitive with existing models. We further demonstrate, that an extended
version (FastQAExt) achieves state-of-the-art results on recent benchmark
datasets, namely SQuAD, NewsQA and MsMARCO, outperforming most existing models.
However, we show that increasing the complexity of FastQA to FastQAExt does not
yield any systematic improvements. We argue that the same holds true for most
existing systems that are similar to FastQAExt. A manual analysis reveals that
our proposed heuristic explains most predictions of our model, which indicates
that modeling a simple heuristic is enough to achieve strong performance on
extractive QA datasets. The overall strong performance of FastQA puts results
of existing, more complex models into perspective.

Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan

Iria da Cunha, Eric SanJuan, Juan-Manuel Torres-Moreno, Irene Castellón
Journal-ref: Proceedings of the First Workshop on Modeling, Learning and Mining
for Cross/Multilinguality (MultiLingMine 2016), 38th European Conference on
Information Retrieval (ECIR 2016)
Subjects: Computation and Language (cs.CL)

At present, automatic discourse analysis is a relevant research topic in the
field of NLP. However, discourse is one of the phenomena most difficult to
process. Although discourse parsers have been already developed for several
languages, this tool does not exist for Catalan. In order to implement this
kind of parser, the first step is to develop a discourse segmenter. In this
article we present the first discourse segmenter for texts in Catalan. This
segmenter is based on Rhetorical Structure Theory (RST) for Spanish, and uses
lexical and syntactic information to translate rules valid for Spanish into
rules for Catalan. We have evaluated the system by using a gold standard corpus
including manually segmented texts and results are promising.

A computational investigation of sources of variability in sentence comprehension difficulty in aphasia

We present a computational evaluation of three hypotheses about sources of
deficit in sentence comprehension in aphasia: slowed processing, intermittent
deficiency, and resource reduction. The ACT-R based Lewis & Vasishth 2005 model
is used to implement these three proposals. Slowed processing is implemented as
slowed default production-rule firing time; intermittent deficiency as
increased random noise in activation of chunks in memory; and resource
reduction as reduced goal activation. As data, we considered subject vs. object
relatives presented in a self-paced listening modality to 56 individuals with
aphasia (IWA) and 46 matched controls. The participants heard the sentences and
carried out a picture verification task to decide on an interpretation of the
sentence. These response accuracies are used to identify the best parameters
(for each participant) that correspond to the three hypotheses mentioned above.
We show that controls have more tightly clustered (less variable) parameter
values than IWA; specifically, compared to controls, among IWA there are more
individuals with low goal activations, high noise, and slow default action
times. This suggests that (i) individual patients show differential amounts of
deficit along the three dimensions of slowed processing, intermittent
deficient, and resource reduction, (ii) overall, there is evidence for all
three sources of deficit playing a role, and (iii) IWA have a more variable
range of parameter values than controls. In sum, this study contributes a proof
of concept of a quantitative implementation of, and evidence for, these three
accounts of comprehension deficits in aphasia.

Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev
Comments: Submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL)

The stream of words produced by Automatic Speech Recognition (ASR) systems is
devoid of any punctuations and formatting. Most natural language processing
applications usually expect segmented and well-formatted texts as input, which
is not available in ASR output. This paper proposes a novel technique of
jointly modelling multiple correlated tasks such as punctuation and
capitalization using bidirectional recurrent neural networks, which leads to
improved performance for each of these tasks. This method can be extended for
joint modelling of any other correlated multiple sequence labelling tasks.

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering

Junbei Zhang, Xiaodan Zhu, Qian Chen, Lirong Dai, Hui Jiang
Subjects: Computation and Language (cs.CL)

The last several years have seen intensive interest in exploring
neural-network-based models for machine comprehension (MC) and question
answering (QA). In this paper, we approach the problems by closely modelling
questions in a neural network framework. We first introduce syntactic
information to help encode questions. We then view and model different types of
questions and the information shared among them as an adaptation task and
proposed adaptation models for them. On the Stanford Question Answering Dataset
(SQuAD), we show that these approaches can help attain better results over a
competitive baseline.

Character-based Neural Embeddings for Tweet Clustering

In this paper we show how the performance of tweet clustering can be improved
by leveraging character-based neural networks. The proposed approach overcomes
the limitations related to the vocabulary explosion in the word-based models
and allows for the seamless processing of the multilingual content. Our
evaluation results and code are available on-line at
this https URL clustering

Emergence of Grounded Compositional Language in Multi-Agent Populations

Igor Mordatch, Pieter Abbeel
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

By capturing statistical patterns in large corpora, machine learning has
enabled significant advances in natural language processing, including in
machine translation, question answering, and sentiment analysis. However, for
agents to intelligently interact with humans, simply capturing the statistical
patterns is insufficient. In this paper we investigate if, and how, grounded
compositional language can emerge as a means to achieve goals in multi-agent
populations. Towards this end, we propose a multi-agent learning environment
and learning methods that bring about emergence of a basic compositional
language. This language is represented as streams of abstract discrete symbols
uttered by agents over time, but nonetheless has a coherent structure that
possesses a defined vocabulary and syntax. We also observe emergence of
non-verbal communication such as pointing and guiding when language
communication is unavailable.

Distributed-Representation Based Hybrid Recommender System with Short Item Descriptions

Junhua He, Hankz Hankui Zhuo, Jarvan Law
Comments: 10 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Collaborative filtering (CF) aims to build a model from users’ past behaviors
and/or similar decisions made by other users, and use the model to recommend
items for users. Despite of the success of previous collaborative filtering
approaches, they are all based on the assumption that there are sufficient
rating scores available for building high-quality recommendation models. In
real world applications, however, it is often difficult to collect sufficient
rating scores, especially when new items are introduced into the system, which
makes the recommendation task challenging. We find that there are often “short”
texts describing features of items, based on which we can approximate the
similarity of items and make recommendation together with rating scores. In
this paper we “borrow” the idea of vector representation of words to capture
the information of short texts and embed it into a matrix factorization
framework. We empirically show that our approach is effective by comparing it
with state-of-the-art approaches.

Multichannel End-to-end Speech Recognition

Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
Subjects: Sound (cs.SD); Computation and Language (cs.CL)

The field of speech recognition is in the midst of a paradigm shift:
end-to-end neural networks are challenging the dominance of hidden Markov
models as a core technology. Using an attention mechanism in a recurrent
encoder-decoder architecture solves the dynamic time alignment problem,
allowing joint end-to-end training of the acoustic and language modeling
components. In this paper we extend the end-to-end framework to encompass
microphone array signal processing for noise suppression and speech enhancement
within the acoustic encoding network. This allows the beamforming components to
be optimized jointly within the recognition architecture to improve the
end-to-end speech recognition objective. Experiments on the noisy speech
benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system
outperformed the attention-based baseline with input from a conventional
adaptive beamformer.

Distributed, Parallel, and Cluster Computing

How to Stop Consensus Algorithms, locally?

Pei Xie, Keyou You, Cheng Wu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This paper studies problems on locally stopping distributed consensus
algorithms over networks where each node updates its state by interacting with
its neighbors and decides by itself whether certain level of agreement has been
achieved among nodes. Since an individual node is unable to access the states
of those beyond its neighbors, this problem becomes challenging. In this work,
we first define the stopping problem for generic distributed algorithms. Then,
a distributed algorithm is explicitly provided for each node to stop consensus
updating by exploring the relationship between the so-called local and global
consensus. Finally, we show both in theory and simulation that its
effectiveness depends both on the network size and the structure.

Network Constrained Distributed Dual Coordinate Ascent for Machine Learning

Myung Cho, Lifeng Lai, Weiyu Xu
Comments: 5 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

With explosion of data size and limited storage space at a single location,
data are often distributed at different locations. We thus face the challenge
of performing large-scale machine learning from these distributed data through
communication networks. In this paper, we study how the network communication
constraints will impact the convergence speed of distributed machine learning
optimization algorithms. In particular, we give the convergence rate analysis
of the distributed dual coordinate ascent in a general tree structured network.
Furthermore, by considering network communication delays, we optimize the
network-constrained dual coordinate ascent algorithms to maximize its
convergence speed. Our results show that under different network communication
delays, to achieve maximum convergence speed, one needs to adopt
delay-dependent numbers of local and global iterations for distributed dual
coordinate ascent.

Locality and Singularity for Store-Atomic Memory Models

Egor Derevenetc, Roland Meyer, Sebastian Schweizer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Robustness is a correctness notion for concurrent programs running under
relaxed consistency models. The task is to check that the relaxed behavior
coincides (up to traces) with sequential consistency (SC). Although
computationally simple on paper (robustness has been shown to be
PSPACE-complete for TSO, PGAS, and Power), building a practical robustness
checker remains a challenge. The problem is that the various relaxations lead
to a dramatic number of computations, only few of which violate robustness.

In the present paper, we set out to reduce the search space for robustness
checkers. We focus on store-atomic consistency models and establish two
completeness results. The first result, called locality, states that a
non-robust program always contains a violating computation where only one
thread delays commands. The second result, called singularity, is even stronger
but restricted to programs without lightweight fences. It states that there is
a violating computation where a single store is delayed.

As an application of the results, we derive a linear-size source-to-source
translation of robustness to SC-reachability. It applies to general programs,
regardless of the data domain and potentially with an unbounded number of
threads and with unbounded buffers. We have implemented the translation and
verified, for the first time, PGAS algorithms in a fully automated fashion. For
TSO, our analysis outperforms existing tools.

Optimization of Lattice Boltzmann Simulations on Heterogeneous Computers

E. Calore, A. Gabbana, S. F. Schifano, R. Tripiccione
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

High-performance computing systems are more and more often based on
accelerators. Computing applications targeting those systems often follow a
host-driven approach in which hosts offload almost all compute-intensive
sections of the code onto accelerators; this approach only marginally exploits
the computational resources available on the host CPUs, limiting performance
and energy efficiency. The obvious step forward is to run compute-intensive
kernels in a concurrent and balanced way on both hosts and accelerators. In
this paper we consider exactly this problem for a class of applications based
on Lattice Boltzmann Methods, widely used in computational fluid-dynamics. Our
goal is to develop just one program, portable and able to run efficiently on
several different combinations of hosts and accelerators. To reach this goal,
we define common data layouts enabling the code to exploit efficiently the
different parallel and vector options of the various accelerators, and matching
the possibly different requirements of the compute-bound and memory-bound
kernels of the application. We also define models and metrics that predict the
best partitioning of workloads among host and accelerator, and the optimally
achievable overall performance level. We test the performance of our codes and
their scaling properties using as testbeds HPC clusters incorporating different
accelerators: Intel Xeon-Phi many-core processors, NVIDIA GPUs and AMD GPUs.

Bandwidth-efficient Storage Services for Mitigating Side Channel Attack

Pengfei Zuo, Yu Hua, Cong Wang, Wen Xia, Shunde Cao, Yukun Zhou, Yuanyuan Sun
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Data deduplication is able to effectively identify and eliminate redundant
data and only maintain a single copy of files and chunks. Hence, it is widely
used in cloud storage systems to save storage space and network bandwidth.
However, the occurrence of deduplication can be easily identified by monitoring
and analyzing network traffic, which leads to the risk of user privacy leakage.
The attacker can carry out a very dangerous side channel attack, i.e.,
learn-the-remaining-information (LRI) attack, to reveal users’ privacy
information by exploiting the side channel of network traffic in deduplication.
Existing work addresses the LRI attack at the cost of the high bandwidth
efficiency of deduplication. In order to address this problem, we propose a
simple yet effective scheme, called randomized redundant chunk scheme (RRCS),
to significantly mitigate the risk of the LRI attack while maintaining the high
bandwidth efficiency of deduplication. The basic idea behind RRCS is to add
randomized redundant chunks to mix up the real deduplication states of files
used for the LRI attack, which effectively obfuscates the view of the attacker,
who attempts to exploit the side channel of network traffic for the LRI attack.
Our security analysis shows that RRCS could significantly mitigate the risk of
the LRI attack. We implement the RRCS prototype and evaluate it by using three
large-scale real-world datasets. Experimental results demonstrate the
efficiency and efficacy of RRCS.

Friend or Foe? Population Protocols can perform Community Detection

Luca Becchetti, Andrea Clementi, Emanuele Natale, Francesco Pasquale, Prasad Raghavendra, Luca Trevisan
Comments: 26 pages
Subjects: Discrete Mathematics (cs.DM); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

We present a simple distributed algorithm that, given a regular graph
consisting of two communities (or clusters), each inducing a good expander and
such that the cut between them has sparsity (1/mbox{polylog}(n)), recovers the
two communities.

More precisely, upon running the protocol, every node assigns itself a binary
label of (m = Theta(log n)) bits, so that with high probability, for all but
a small number of outliers, nodes within the same community are assigned labels
with Hamming distance (o(m)), while nodes belonging to different communities
receive labels with Hamming distance at least (m/2 – o(m)). We refer to such an
outcome as a “community sensitive labeling” of the graph.

Our algorithm uses (Theta(log^2 n)) local memory and computes the community
sensitive labeling after each node performs (Theta(log^2 n)) steps of local
work.

Our algorithm and its analysis work in the “(random) population protocol”
model, in which anonymous nodes do not share any global clock (the model is
asynchronous) and communication occurs over one single (random) edge per round.
We believe, this is the first provably-effective protocol for community
detection that works in this model.

Distributed Optimal Vehicle Grid Integration Strategy with User Behavior Prediction

Yingqi Xiong, Bin Wang, Chi-cheng Chu, Rajit Gadh
Comments: IEEE PES General Meeting 2017
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)

With the increasing of electric vehicle (EV) adoption in recent years, the
impact of EV charging activities to the power grid becomes more and more
significant. In this article, an optimal scheduling algorithm which combines
smart EV charging and V2G gird service is developed to integrate EVs into power
grid as distributed energy resources, with improved system cost performance.
Specifically, an optimization problem is formulated and solved at each EV
charging station according to control signal from aggregated control center and
user charging behavior prediction by mean estimation and linear regression. The
control center collects distributed optimization results and updates the
control signal, periodically. The iteration continues until it converges to
optimal scheduling. Experimental result shows this algorithm helps fill the
valley and shave the peak in electric load profiles within a microgrid, while
the energy demand of individual driver can be satisfied.

Learning

Neural Networks for Beginners. A fast implemention in Matlab, Torch, TensorFlow

This report provides an introduction to some Machine Learning tools within
the most common development environments. It mainly focuses on practical
problems, skipping any theoretical introduction. It is oriented to both
students trying to approach Machine Learning and experts looking for new
frameworks.

Deep Embedding Forest: Forest-based Serving with Deep Embedding Features

Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, Yi Zhang
Comments: 14 pages, 3 figures, 5 tables
Subjects: Learning (cs.LG)

Deep Neural Networks (DNN) have demonstrated superior ability to extract high
level embedding vectors from low level features. Despite the success, the
serving time is still the bottleneck due to expensive run-time computation of
multiple layers of dense matrices. GPGPU, FPGA, or ASIC-based serving systems
require additional hardware that are not in the mainstream design of most
commercial applications. In contrast, tree or forest-based models are widely
adopted because of low serving cost, but heavily depend on carefully engineered
features. This work proposes a Deep Embedding Forest model that benefits from
the best of both worlds. The model consists of a number of embedding layers and
a forest/tree layer. The former maps high dimensional (hundreds of thousands to
millions) and heterogeneous low-level features to the lower dimensional
(thousands) vectors, and the latter ensures fast serving.

Built on top of a representative DNN model called Deep Crossing, and two
forest/tree-based models including XGBoost and LightGBM, a two-step Deep
Embedding Forest algorithm is demonstrated to achieve on-par or slightly better
performance as compared with the DNN counterpart, with only a fraction of
serving time on conventional hardware. After comparing with a joint
optimization algorithm called partial fuzzification, also proposed in this
paper, it is concluded that the two-step Deep Embedding Forest has achieved
near optimal performance. Experiments based on large scale data sets (up to 1
billion samples) from a major sponsored search engine proves the efficacy of
the proposed model.

Prototypical Networks for Few-shot Learning

Jake Snell, Kevin Swersky, Richard S. Zemel
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose prototypical networks for the problem of few-shot classification,
where a classifier must generalize to new classes not seen in the training set,
given only a small number of examples of each new class. Prototypical networks
learn a metric space in which classification can be performed by computing
Euclidean distances to prototype representations of each class. Compared to
recent approaches for few-shot learning, they reflect a simpler inductive bias
that is beneficial in this limited-data regime, and achieve state-of-the-art
results. We provide an analysis showing that some simple design decisions can
yield substantial improvements over recent approaches involving complicated
architectural choices and meta-learning. We further extend prototypical
networks to the case of zero-shot learning and achieve state-of-the-art
zero-shot results on the CU-Birds dataset.

Online Learning for Distribution-Free Prediction

Dave Zachariah, Petre Stoica, Thomas B. Schön
Subjects: Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

We develop an online learning method for prediction, which is important in
problems with large and/or streaming data sets. We formulate the learning
approach using a covariance-fitting methodology, and show that the resulting
predictor has desirable computational and distribution-free properties: It is
implemented online with a runtime that scales linearly in the number of
samples; has a constant memory requirement; avoids local minima problems; and
prunes away redundant feature dimensions without relying on restrictive
assumptions on the data distribution. In conjunction with the split conformal
approach, it also produces distribution-free prediction confidence intervals in
a computationally efficient manner. The method is demonstrated on both real and
synthetic datasets.

Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG

Deep learning with convolutional neural networks (deep ConvNets) has
revolutionized computer vision through end-to-end learning, i.e. learning from
the raw data. Now, there is increasing interest in using deep ConvNets for
end-to-end EEG analysis. However, little is known about many important aspects
of how to design and train ConvNets for end-to-end EEG decoding, and there is
still a lack of techniques to visualize the informative EEG features the
ConvNets learn.

Here, we studied deep ConvNets with a range of different architectures,
designed for decoding imagined or executed movements from raw EEG. Our results
show that recent advances from the machine learning field, including batch
normalization and exponential linear units, together with a cropped training
strategy, boosted the deep ConvNets decoding performance, reaching or
surpassing that of the widely-used filter bank common spatial patterns (FBCSP)
decoding algorithm. While FBCSP is designed to use spectral power modulations,
the features used by ConvNets are not fixed a priori. Our novel methods for
visualizing the learned features demonstrated that ConvNets indeed learned to
use spectral power modulations in the alpha, beta and high gamma frequencies.
These methods also proved useful as a technique for spatially mapping the
learned features, revealing the topography of the causal contributions of
features in different frequency bands to decoding the movement classes.

Our study thus shows how to design and train ConvNets to decode
movement-related information from the raw EEG without handcrafted features and
highlights the potential of deep ConvNets combined with advanced visualization
techniques for EEG-based brain mapping.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

We introduce a criterion, resilience, which allows properties of a dataset
(such as its mean or best low rank approximation) to be robustly computed, even
in the presence of a large fraction of arbitrary additional data. Resilience is
a weaker condition than most other properties considered so far in the
literature, and yet enables robust estimation in a broader variety of settings,
including the previously unstudied problem of robust mean estimation in
(ell_p)-norms.

Sharp Minima Can Generalize For Deep Nets

Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
Comments: 8.5 pages of main content, 2.5 of bibliography and 1 page of appendix
Subjects: Learning (cs.LG)

Despite their overwhelming capacity to overfit, deep learning architectures
tend to generalize relatively well to unseen data, allowing them to be deployed
in practice. However, explaining why this is the case is still an open area of
research. One standing hypothesis that is gaining popularity, e.g. Hochreiter &
Schmidhuber (1997); Keskar et al. (2017), is that the flatness of minima of the
loss function found by stochastic gradient based methods results in good
generalization. This paper argues that most notions of flatness are problematic
for deep models and can not be directly applied to explain generalization.
Specifically, when focusing on deep networks with rectifier units, we can
exploit the particular geometry of parameter space induced by the inherent
symmetries that these architectures exhibit to build equivalent models
corresponding to arbitrarily sharper minima. Furthermore, if we allow to
reparametrize a function, the geometry of its parameters can change drastically
without affecting its generalization properties.

Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis

Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra
Subjects: Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)

Stochastic variance reduction algorithms have recently become popular for
minimizing the average of a large, but finite number of loss functions. The
present paper proposes a Riemannian stochastic quasi-Newton algorithm with
variance reduction (R-SQN-VR). The key challenges of averaging, adding, and
subtracting multiple gradients are addressed with notions of retraction and
vector transport. We present a global convergence analysis and a local
convergence rate analysis of R-SQN-VR under some natural assumptions. The
proposed algorithm is applied to the Karcher mean computation on the symmetric
positive-definite manifold and low-rank matrix completion on the Grassmann
manifold. In all cases, the proposed algorithm outperforms the Riemannian
stochastic gradient descent and the Riemannian stochastic variance reduction
algorithms.

Towards Optimal Sparse Inverse Covariance Selection through Non-Convex Optimization

Sidhant Misra, Marc Vuffray, Andrey Y. Lokhov, Michael Chertkov
Subjects: Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST)

We study the problem of reconstructing the graph of a sparse Gaussian
Graphical Model from independent observations, which is equivalent to finding
non-zero elements of an inverse covariance matrix. For a model of size (p) and
maximum degree (d), information theoretic lower bounds established in prior
works require that the number of samples needed for recovering the graph
perfectly is at least (d log p/kappa^2), where (kappa) is the minimum
normalized non-zero entry of the inverse covariance matrix. Existing algorithms
require additional assumptions to guarantee perfect graph reconstruction, and
consequently, their sample complexity is dependent on parameters that are not
present in the information theoretic lower bound. We propose an estimator,
called SLICE, that consists of a cardinality constrained least-squares
regression followed by a thresholding procedure. Without any additional
assumptions we show that SLICE attains a sample complexity of
(frac{64}{kappa^4}d log p), which differs from the lower bound by only a
factor proportional to (1/kappa^2) and depends only on parameters present in
the lower bound.

Budgeted Batch Bayesian Optimization With Unknown Batch Sizes

Vu Nguyen, Santu Rana, Sunil Gupta, Cheng Li, Svetha Venkatesh
Comments: 22 pages
Subjects: Learning (cs.LG)

Parameter settings profoundly impact the performance of machine learning
algorithms and laboratory experiments. The classical grid search or trial-error
methods are exponentially expensive in large parameter spaces, and Bayesian
optimization (BO) offers an elegant alternative for global optimization of
black box functions. In situations where the black box function can be
evaluated at multiple points simultaneously, batch Bayesian optimization is
used. Current batch BO approaches are restrictive in that they fix the number
of evaluations per batch, and this can be wasteful when the number of specified
evaluations is larger than the number of real maxima in the underlying
acquisition function. We present the Budgeted Batch Bayesian Optimization (B3O)
for hyper-parameter tuning and experimental design – we identify the
appropriate batch size for each iteration in an elegant way. To set the batch
size flexible, we use the infinite Gaussian mixture model (IGMM) for
automatically identifying the number of peaks in the underlying acquisition
functions. We solve the intractability of estimating the IGMM directly from the
acquisition function by formulating the batch generalized slice sampling to
efficiently draw samples from the acquisition function. We perform extensive
experiments for both synthetic functions and two real world applications –
machine learning hyper-parameter tuning and experimental design for alloy
hardening. We show empirically that the proposed B3O outperforms the existing
fixed batch BO approaches in finding the optimum whilst requiring a fewer
number of evaluations, thus saving cost and time.

Neural Graph Machines: Learning Neural Networks Using Graphs

Thang D. Bui, Sujith Ravi, Vivek Ramavajjala
Comments: 9 pages
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Label propagation is a powerful and flexible semi-supervised learning
technique on graphs. Neural networks, on the other hand, have proven track
records in many supervised learning tasks. In this work, we propose a training
framework with a graph-regularised objective, namely “Neural Graph Machines”,
that can combine the power of neural networks and label propagation. This work
generalises previous literature on graph-augmented training of neural networks,
enabling it to be applied to multiple neural architectures (Feed-forward NNs,
CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the
neural networks to harness both labeled and unlabeled data by: (a) allowing the
network to train using labeled data as in the supervised setting, (b) biasing
the network to learn similar hidden representations for neighboring nodes on a
graph, in the same vein as label propagation. Such architectures with the
proposed objective can be trained efficiently using stochastic gradient descent
and scaled to large graphs, with a runtime that is linear in the number of
edges. The proposed joint training approach convincingly outperforms many
existing methods on a wide range of tasks (multi-label classification on social
graphs, news categorization, document classification and semantic intent
classification), with multiple forms of graph inputs (including graphs with and
without node-level features) and using different types of neural networks.

Learned Optimizers that Scale and Generalize

Learning to learn has emerged as an important direction for achieving
artificial intelligence. Two of the primary barriers to its adoption are an
inability to scale to larger problems and a limited ability to generalize to
new tasks. We introduce a learned gradient descent optimizer that generalizes
well to new tasks, and which has significantly reduced memory and computation
overhead. We achieve this by introducing a novel hierarchical RNN architecture,
with minimal per-parameter overhead, augmented with additional architectural
features that mirror the known structure of optimization tasks. We also develop
a meta-training ensemble of small, diverse, optimization tasks capturing common
properties of loss landscapes. The optimizer learns to out-perform RMSProp/ADAM
on problems in this corpus. More importantly, it performs comparably or better
when applied to small convolutional neural networks, despite seeing no neural
networks in its meta-training set. Finally, it generalizes to train Inception
V3 and ResNet V2 architectures on the ImageNet dataset, optimization problems
that are of a vastly different scale than those it was trained on.

Online Learning Rate Adaptation with Hypergradient Descent

Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood
Comments: 9 pages, 4 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We introduce a general method for improving the convergence rate of
gradient-based optimizers that is easy to implement and works well in practice.
We analyze the effectiveness of the method by applying it to stochastic
gradient descent, stochastic gradient descent with Nesterov momentum, and Adam,
showing that it improves upon these commonly used algorithms on a range of
optimization problems; in particular the kinds of objective functions that
arise frequently in deep neural network training. Our method works by
dynamically updating the learning rate during optimization using the gradient
with respect to the learning rate of the update rule itself. Computing this
“hypergradient” needs little additional computation, requires only one extra
copy of the original gradient to be stored in memory, and relies upon nothing
more than what is provided by reverse-mode automatic differentiation.

Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition

Nima Dehmamy, Neda Rohani, Aggelos Katsaggelos
Subjects: Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We show that in a deep neural network trained with ReLU, the low-lying layers
should be replaceable with truncated linearly activated layers. We derive the
gradient descent equations in this truncated linear model and demonstrate that
–if the distribution of the training data is stationary during training– the
optimal choice for weights in these low-lying layers is the eigenvectors of the
covariance matrix of the data. If the training data is random and uniform
enough, these eigenvectors can be found using a small fraction of the training
data, thus reducing the computational complexity of training. We show how this
can be done recursively to form successive, trained layers. At least in the
first layer, our tests show that this approach improves classification of
images while reducing network size.

Tree Memory Networks for Modelling Long-term Temporal Dependencies

In the domain of sequence modelling, Recurrent Neural Networks (RNN) have
been capable of achieving impressive results in a variety of application areas
including visual question answering, part-of-speech tagging and machine
translation. However this success in modelling short term dependencies has not
successfully transitioned to application areas such as trajectory prediction,
which require capturing both short term and long term relationships. In this
paper, we propose a Tree Memory Network (TMN) for modelling long term and short
term relationships in sequence-to-sequence mapping problems. The proposed
network architecture is composed of an input module, controller and a memory
module. In contrast to related literature, which models the memory as a
sequence of historical states, we model the memory as a recursive tree
structure. This structure more effectively captures temporal dependencies
across both short term and long term sequences using its hierarchical
structure. We demonstrate the effectiveness and flexibility of the proposed TMN
in two practical problems, aircraft trajectory modelling and pedestrian
trajectory modelling in a surveillance setting, and in both cases we outperform
the current state-of-the-art. Furthermore, we perform an in depth analysis on
the evolution of the memory module content over time and provide visual
evidence on how the proposed TMN is able to map both long term and short term
relationships efficiently via a hierarchical structure.

A New Unbiased and Efficient Class of LSH-Based Samplers and Estimators for Partition Function Computation in Log-Linear Models

Ryan Spring, Anshumali Shrivastava
Subjects: Machine Learning (stat.ML); Databases (cs.DB); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

Log-linear models are arguably the most successful class of graphical models
for large-scale applications because of their simplicity and tractability.
Learning and inference with these models require calculating the partition
function, which is a major bottleneck and intractable for large state spaces.
Importance Sampling (IS) and MCMC-based approaches are lucrative. However, the
condition of having a “good” proposal distribution is often not satisfied in
practice.

In this paper, we add a new dimension to efficient estimation via sampling.
We propose a new sampling scheme and an unbiased estimator that estimates the
partition function accurately in sub-linear time. Our samples are generated in
near-constant time using locality sensitive hashing (LSH), and so are
correlated and unnormalized. We demonstrate the effectiveness of our proposed
approach by comparing the accuracy and speed of estimating the partition
function against other state-of-the-art estimation techniques including IS and
the efficient variant of Gumbel-Max sampling. With our efficient sampling
scheme, we accurately train real-world language models using only 1-2% of
computations.

Selective Harvesting over Networks

Fabricio Murai, Diogo Rennó, Bruno Ribeiro, Gisele L. Pappa, Don Towsley, Krista Gile
Comments: 28 pages, 9 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

Active search (AS) on graphs focuses on collecting certain labeled nodes
(targets) given global knowledge of the network topology and its edge weights
under a query budget. However, in most networks, nodes, topology and edge
weights are all initially unknown. We introduce selective harvesting, a variant
of AS where the next node to be queried must be chosen among the neighbors of
the current queried node set; the available training data for deciding which
node to query is restricted to the subgraph induced by the queried set (and
their node attributes) and their neighbors (without any node or edge
attributes). Therefore, selective harvesting is a sequential decision problem,
where we must decide which node to query at each step. A classifier trained in
this scenario suffers from a tunnel vision effect: without recourse to
independent sampling, the urge to query promising nodes forces classifiers to
gather increasingly biased training data, which we show significantly hurts the
performance of AS methods and standard classifiers. We find that it is possible
to collect a much larger set of targets by using multiple classifiers, not by
combining their predictions as an ensemble, but switching between classifiers
used at each step, as a way to ease the tunnel vision effect. We discover that
switching classifiers collects more targets by (a) diversifying the training
data and (b) broadening the choices of nodes that can be queried next. This
highlights an exploration, exploitation, and diversification trade-off in our
problem that goes beyond the exploration and exploitation duality found in
classic sequential decision problems. From these observations we propose D3TS,
a method based on multi-armed bandits for non-stationary stochastic processes
that enforces classifier diversity, matching or exceeding the performance of
competing methods on seven real network datasets in our evaluation.

Matched bipartite block model with covariates

Zahra S. Razaee, Arash A. Amini, Jingyi Jessica Li
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

Community detection or clustering is a fundamental task in the analysis of
network data. Many real networks have a bipartite structure which makes
community detection challenging. In this paper, we consider a model which
allows for matched communities in the bipartite setting, in addition to node
covariates with information about the matching. We derive a simple fast
algorithm for fitting the model based on variational inference ideas and show
its effectiveness on both simulated and real data. A variation of the model to
allow for degree-correction is also considered, in addition to a novel approach
to fitting such degree-corrected models.

Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling

Diego Marcheggiani, Ivan Titov
Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Semantic role labeling (SRL) is the task of identifying the
predicate-argument structure of a sentence. It is typically regarded as an
important step in the standard natural language processing pipeline, providing
information to downstream tasks such as information extraction and question
answering. As the semantic representations are closely related to syntactic
ones, we exploit syntactic information in our model. We propose a version of
graph convolutional networks (GCNs), a recent class of multilayer neural
networks operating on graphs, suited to modeling syntactic dependency graphs.
GCNs over syntactic dependency trees are used as sentence encoders, producing
latent feature representations of words in a sentence and capturing information
relevant to predicting the semantic representations. We observe that GCN layers
are complementary to LSTM ones: when we stack both GCN and LSTM layers, we
obtain a substantial improvement over an already state-of-the-art LSTM SRL
model, resulting in the best reported scores on the standard benchmark
(CoNLL-2009) both for Chinese and English.

Classification in biological networks with hypergraphlet kernels

Jose Lugo-Martinez, Predrag Radivojac
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Biological and cellular systems are often modeled as graphs in which vertices
represent objects of interest (genes, proteins, drugs) and edges represent
relational ties among these objects (binds-to, interacts-with, regulates). This
approach has been highly successful owing to the theory, methodology and
software that support analysis and learning on graphs. Graphs, however, often
suffer from information loss when modeling physical systems due to their
inability to accurately represent multiobject relationships. Hypergraphs, a
generalization of graphs, provide a framework to mitigate information loss and
unify disparate graph-based methodologies. In this paper, we present a
hypergraph-based approach for modeling physical systems and formulate vertex
classification, edge classification and link prediction problems on
(hyper)graphs as instances of vertex classification on (extended, dual)
hypergraphs in a semi-supervised setting. We introduce a novel kernel method on
vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The
method is based on exact and inexact (via hypergraph edit distances)
enumeration of small simple hypergraphs, referred to as hypergraphlets, rooted
at a vertex of interest. We extensively evaluate this method and show its
potential use in a positive-unlabeled setting to estimate the number of missing
and false positive links in protein-protein interaction networks.

Network Constrained Distributed Dual Coordinate Ascent for Machine Learning

Myung Cho, Lifeng Lai, Weiyu Xu
Comments: 5 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

With explosion of data size and limited storage space at a single location,
data are often distributed at different locations. We thus face the challenge
of performing large-scale machine learning from these distributed data through
communication networks. In this paper, we study how the network communication
constraints will impact the convergence speed of distributed machine learning
optimization algorithms. In particular, we give the convergence rate analysis
of the distributed dual coordinate ascent in a general tree structured network.
Furthermore, by considering network communication delays, we optimize the
network-constrained dual coordinate ascent algorithms to maximize its
convergence speed. Our results show that under different network communication
delays, to achieve maximum convergence speed, one needs to adopt
delay-dependent numbers of local and global iterations for distributed dual
coordinate ascent.

Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets

Andrea Tacchetti, Stephen Voinea, Georgios Evangelopoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

The complexity of a learning task is increased by transformations in the
input space that preserve class identity. Visual object recognition for example
is affected by changes in viewpoint, scale, illumination or planar
transformations. While drastically altering the visual appearance, these
changes are orthogonal to recognition and should not be reflected in the
representation or feature encoding used for learning. We introduce a framework
for weakly supervised learning of image embeddings that are robust to
transformations and selective to the class distribution, using sets of
transforming examples (orbit sets), deep parametrizations and a novel
orbit-based loss. The proposed loss combines a discriminative, contrastive part
for orbits with a reconstruction error that learns to rectify orbit
transformations. The learned embeddings are evaluated in distance metric-based
tasks, such as one-shot classification under geometric transformations, as well
as face verification and retrieval under more realistic visual variability. Our
results suggest that orbit sets, suitably computed or observed, can be used for
efficient, weakly-supervised learning of semantically relevant image
embeddings.

Audio Scene Classification with Deep Recurrent Neural Networks

Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, Alfred Mertins
Comments: submitted for Interspeech 2017
Subjects: Sound (cs.SD); Learning (cs.LG)

We introduce in this work an efficient approach for audio scene
classification using deep recurrent neural networks. A scene audio signal is
firstly transformed into a sequence of high-level label tree embedding feature
vectors. The vector sequence is then divided into multiple subsequences on
which a deep GRU-based recurrent neural network is trained for
sequence-to-label classification. The global predicted label for the entire
sequence is finally obtained via aggregation of subsequence classification
outputs. We will show that our approach obtain an F1-score of 97.7% on the
LITIS Rouen dataset, which is the largest dataset publicly available for the
task. Compared to the best previously reported result on the dataset, our
approach is able to reduce the relative classification error by 35.3%.

Weighted Voting Via No-Regret Learning

Nika Haghtalab, Ritesh Noothigattu, Ariel D. Procaccia
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA)

Voting systems typically treat all voters equally. We argue that perhaps they
should not: Voters who have supported good choices in the past should be given
higher weight than voters who have supported bad ones. To develop a formal
framework for desirable weighting schemes, we draw on no-regret learning.
Specifically, given a voting rule, we wish to design a weighting scheme such
that applying the voting rule, with voters weighted by the scheme, leads to
choices that are almost as good as those endorsed by the best voter in
hindsight. We derive possibility and impossibility results for the existence of
such weighting schemes, depending on whether the voting rule and the weighting
scheme are deterministic or randomized, as well as on the social choice axioms
satisfied by the voting rule.

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

How can we explain the predictions of a black-box model? In this paper, we
use influence functions — a classic technique from robust statistics — to
trace a model’s prediction through the learning algorithm and back to its
training data, identifying the points most responsible for a given prediction.
Applying ideas from second-order optimization, we scale up influence functions
to modern machine learning settings and show that they can be applied to
high-dimensional black-box models, even in non-convex and non-differentiable
settings. We give a simple, efficient implementation that requires only oracle
access to gradients and Hessian-vector products. On linear models and
convolutional neural networks, we demonstrate that influence functions are
useful for many different purposes: to understand model behavior, debug models
and detect dataset errors, and even identify and exploit vulnerabilities to
adversarial training-set attacks.

On the benefits of output sparsity for multi-label classification

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Joseph Salmon
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)

The multi-label classification framework, where each observation can be
associated with a set of labels, has generated a tremendous amount of attention
over recent years. The modern multi-label problems are typically large-scale in
terms of number of observations, features and labels, and the amount of labels
can even be comparable with the amount of observations. In this context,
different remedies have been proposed to overcome the curse of dimensionality.
In this work, we aim at exploiting the output sparsity by introducing a new
loss, called the sparse weighted Hamming loss. This proposed loss can be seen
as a weighted version of classical ones, where active and inactive labels are
weighted separately. Leveraging the influence of sparsity in the loss function,
we provide improved generalization bounds for the empirical risk minimizer, a
suitable property for large-scale problems. For this new loss, we derive rates
of convergence linear in the underlying output-sparsity rather than linear in
the number of labels. In practice, minimizing the associated risk can be
performed efficiently by using convex surrogates and modern convex optimization
algorithms. We provide experiments on various real-world datasets demonstrating
the pertinence of our approach when compared to non-weighted techniques.

Optimal Densification for Fast and Accurate Minwise Hashing

Anshumali Shrivastava
Comments: Fast Minwise Hashing
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

Minwise hashing is a fundamental and one of the most successful hashing
algorithm in the literature. Recent advances based on the idea of
densification~cite{Proc:OneHashLSH_ICML14,Proc:Shrivastava_UAI14} have shown
that it is possible to compute (k) minwise hashes, of a vector with (d)
nonzeros, in mere ((d + k)) computations, a significant improvement over the
classical (O(dk)). These advances have led to an algorithmic improvement in the
query complexity of traditional indexing algorithms based on minwise hashing.
Unfortunately, the variance of the current densification techniques is
unnecessarily high, which leads to significantly poor accuracy compared to
vanilla minwise hashing, especially when the data is sparse. In this paper, we
provide a novel densification scheme which relies on carefully tailored
2-universal hashes. We show that the proposed scheme is variance-optimal, and
without losing the runtime efficiency, it is significantly more accurate than
existing densification techniques. As a result, we obtain a significantly
efficient hashing scheme which has the same variance and collision probability
as minwise hashing. Experimental evaluations on real sparse and
high-dimensional datasets validate our claims. We believe that given the
significant advantages, our method will replace minwise hashing implementations
in practice.

Sensor Fusion for Robot Control through Deep Reinforcement Learning

Deep reinforcement learning is becoming increasingly popular for robot
control algorithms, with the aim for a robot to self-learn useful feature
representations from unstructured sensory input leading to the optimal
actuation policy. In addition to sensors mounted on the robot, sensors might
also be deployed in the environment, although these might need to be accessed
via an unreliable wireless connection. In this paper, we demonstrate deep
neural network architectures that are able to fuse information coming from
multiple sensors and are robust to sensor failures at runtime. We evaluate our
method on a search and pick task for a robot both in simulation and the real
world.

Information Theory

A Real-Time Millimeter-Wave Phased Array MIMO Channel Sounder

C. Umit Bas, Rui Wang, Dimitris Psychoudakis, Thomas Henige, Robert Monroe, Jeongho Park, Jianzhong Zhang, Andreas F. Molisch
Comments: 6 pages, 15 figures, conference paper
Subjects: Information Theory (cs.IT)

In this paper, we present a novel real-time MIMO channel sounder for 28 GHz.
Until now, the common practice to investigate the directional characteristics
of millimeter-wave channels has been using a rotating horn antenna. The sounder
presented here is capable of performing horizontal and vertical beam steering
with the help of phased arrays. Thanks to fast beam-switching capability, the
proposed sounder can perform measurements that are directionally resolved both
at the transmitter (TX) and receiver (RX) as fast as 1.44 milliseconds compared
to the minutes or even hours required for rotating horn antenna sounders. This
does not only enable us to measure more points for better statistical inference
but also allows to perform directional analysis in dynamic environments.
Equally importantly, the short measurement time combined with the high phase
stability of our setup limits the phase drift between TX and RX, enabling
phase-coherent sounding of all beam pairs even when TX and RX are physically
separated and have no cabled connection for synchronization. This ensures that
the measurement data is suitable for high-resolution parameter extraction
algorithms. Along with the system design and specifications, this paper also
discusses the measurements performed for verification of the sounder.
Furthermore, we present sample measurements from a channel sounding campaign
performed on a residential street.

Throughput Analysis for Wavelet OFDM in Broadband Power Line Communications

Freddy A. Pinto-Benel, Manuel Blanco-Velasco, Fernando Cruz-Roldán
Subjects: Information Theory (cs.IT)

Windowed orthogonal frequency-division multiplexing (OFDM) and wavelet OFDM
have been proposed as medium access techniques for broadband communications
over the power line network by the standard IEEE 1901. Windowed OFDM has been
extensively researched and employed in different fields of communication, while
wavelet OFDM, which has been recently recommended for the first time in a
standard, has received less attention. This work is aimed to show that wavelet
OFDM, which basically is an Extended Lapped Transform-based multicarrier
modulation (ELT-MCM), is a viable and attractive alternative for data
transmission in hostile scenarios, such as in-home PLC. To this end, we obtain
theoretical expressions for ELT-MCM of: 1) the useful signal power, 2) the
inter-symbol interference (ISI) power, 3) the inter-carrier interference (ICI)
power, and 4) the noise power at the receiver side. The system capacity and the
achievable throughput are derived from these. This study includes several
computer simulations that show that ELT-MCM is an efficient alternative to
improve data rates in PLC networks.

Steerable Discrete Fourier Transform

Giulia Fracastoro, Enrico Magli
Subjects: Information Theory (cs.IT)

Directional transforms have recently raised a lot of interest thanks to their
numerous applications in signal compression and analysis. In this letter, we
introduce a generalization of the discrete Fourier transform, called steerable
DFT (SDFT). Since the DFT is used in numerous fields, it may be of interest in
a wide range of applications. Moreover, we also show that the SDFT is highly
related to other well-known transforms, such as the Fourier sine and cosine
transforms and the Hilbert transforms.

Statistical Multiplexing of Computations in C-RAN with Tradeoffs in Latency and Energy

Anders E. Kalør, Mauricio I.Agurto, Nuno K. Pratas, Jimmy J. Nielsen, Petar Popovski
Comments: Accepted for presentation at the 3rd International Workshop on 5G RAN Design (ICC 2017)
Subjects: Information Theory (cs.IT)

In the Cloud Radio Access Network (C-RAN) architecture, the baseband signals
from multiple remote radio heads are processed in a centralized baseband unit
(BBU) pool. This architecture allows network operators to adapt the BBU’s
computational resources to the aggregate access load experienced at the BBU,
which can change in every air-interface access frame. The degree of savings
that can be achieved by adapting the resources is a tradeoff between savings,
adaptation frequency, and increased queuing time. If the time scale for
adaptation of the resource multiplexing is greater than the access frame
duration, then this may result in additional access latency and limit the
energy savings. In this paper we investigate the tradeoff by considering two
extreme time-scales for the resource multiplexing: (i) long-term, where the
computational resources are adapted over periods much larger than the access
frame durations; (ii) short-term, where the adaption is below the access frame
duration. We develop a general C-RAN queuing model that describes the access
latency and show, for Poisson arrivals, that long-term multiplexing achieves
savings comparable to short-term multiplexing, while offering low
implementation complexity.

Several Classes of Trace Codes With Either Optimal Two Weights or a Few Weights over (mathbb{F}_{q}+umathbb{F}_{q})

Hongwei Liu, Youcef Maouche
Subjects: Information Theory (cs.IT)

Let (p) be a prime number, (q=p^s) for a positive integer (s). For any
positive divisor (e) of (q-1), we construct an infinite families codes of
dimension (2m) with few Lee-weight. These codes are defined as trace codes over
the ring (R=mathbb{F}_q + umathbb{F}_q), (u^2 = 0). Using Gauss sums, their
Lee weight distribution is provided. When (gcd(e,m)=1), we obtain an infinite
family of two-weight codes which meet the Griesmer bound. Moreover, if
(gcd(e,m)=2, 3) or (4) we construct new infinite families with at most
five-weight. These codes can be used in authentication codes and secret sharing
schemes.

Fairness Comparison of Uplink NOMA and OMA

Zhiqiang Wei, Jiajia Guo, Derrick Wing Kwan Ng, Jinhong Yuan
Comments: 6 pages, accepted for publication, VTC 2017, Spring, Sydney
Subjects: Information Theory (cs.IT)

In this paper, we compare the resource allocation fairness of uplink
communications between non-orthogonal multiple access (NOMA) schemes and
orthogonal multiple access (OMA) schemes. Through characterizing the
contribution of the individual user data rate to the system sum rate, we
analyze the fundamental reasons that NOMA offers a more fair resource
allocation than that of OMA in asymmetric channels. Furthermore, a fairness
indicator metric based on Jain’s index is proposed to measure the asymmetry of
multiuser channels. More importantly, the proposed metric provides a selection
criterion for choosing between NOMA and OMA for fair resource allocation. Based
on this discussion, we propose a hybrid NOMA-OMA scheme to further enhance the
users fairness. Simulation results confirm the accuracy of the proposed metric
and demonstrate the fairness enhancement of the proposed hybrid NOMA-OMA scheme
compared to the conventional OMA and NOMA schemes.

D2D-Aware Device Caching in MmWave-Cellular Networks

Nikolaos Giatsoglou, Konstantinos Ntontin, Elli Kartsakli, Angelos Antonopoulos, Christos Verikoukis
Subjects: Information Theory (cs.IT)

In this paper, we propose a novel device caching policy that facilitates
popular content exchange through high-rate device-to-device (D2D)
millimeter-wave (mmWave) communication. The proposed policy, which is termed
D2D-aware caching (DAC) policy, splits the cacheable content into two content
groups and distributes it evenly at the paired devices. By exploiting the
high-bandwidth availability and the directionality of mmWaves, we ensure high
rates for the D2D transmissions, while mitigating the co-channel interference
that limits the D2D-communication potentials in the sub-6 GHz bands.
Furthermore, based on a stochastic-geometry approach for the modeling of the
location and number of the base stations (BSs) and users (UEs) in the network,
we analytically derive the offloading gain that the proposed policy achieves
and the distribution of the content retrieval delay for both half- and
full-duplex D2D communication. The accuracy of the proposed analytical
framework is validated against Monte-Carlo simulations. In addition, for a wide
range of content-popularity indicator the results show that the proposed policy
achieves better offloading and smaller content-retrieval delays than a widely
considered state-of-the-art policy that caches the same content at every UE.

On the Support Recovery of Jointly Sparse Gaussian Sources using Sparse Bayesian Learning

Saurabh Khanna, Chandra R. Murthy
Comments: 14 pages
Subjects: Information Theory (cs.IT)

In this work, we provide non-asymptotic, probabilistic guarantees for
successful sparse support recovery by the Multiple Sparse Bayesian Learning
(M-SBL) algorithm in the multiple measurement vector (MMV) framework. For joint
sparse Gaussian sources, we show that M-SBL can perfectly recover their common
nonzero support with arbitrarily high probability using only finitely many
MMVs. In fact, the support error probability decays exponentially fast with the
number of MMVs, with the decay rate depending on the restricted isometry
constant of the self Khatri-Rao product of the measurement matrix. Our analysis
theoretically confirms that M-SBL can indeed recover supports of size up to
(mathcal{O}(m^{2})), where (m) is the number of measurements per MMV. This is
in contrast to popular MMV algorithms in compressed sensing such as
simultaneous orthogonal matching pursuit and group LASSO, which have been shown
to succeed only when the sparsity is at most (mathcal{O}(m)). Next, in the
presence of measurement noise, we show that M-SBL can recover the true
(k)-sparse support of (n)-dimensional joint sparse Gaussian sources using at
most (mathcal{O}(k log{n})) MMVs, while only a single MMV suffices in the
noiseless case. Unlike existing support recovery guarantees for M-SBL, our
sufficient conditions are truly non-asymptotic and do not require the
orthogonality of the nonzero rows of the joint sparse signals.

Greedy-Merge Degrading has Optimal Power-Law

Assaf Kartowsky, Ido Tal
Comments: 17 pages, 4 figures, 1 table, to be submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)

Consider a channel with a given input alphabet size and a given input
distribution. Our aim is to degrade or upgrade it to a channel with at most L
output letters.

The paper contains four main results. The first result, from which the paper
title is derived, deals with the so called “greedy-merge” algorithm. We derive
an upper bound on the reduction in mutual information between input and output.
This upper bound is within a constant factor of an algorithm-independent lower
bound. Thus, we establish that greedy-merge is optimal in the power-law sense.

The other main results deal with upgrading. The second result shows that a
certain sequence of channels that was previously shown to be “hard” for
degrading, displays the same hardness in the context of upgrading. That is,
suppose we are given such a channel and a corresponding input distribution. If
we upgrade (degrade) to a new channel with L output letters, we incur an
increase (decrease) in mutual information between input and output. We show
that a previously derived bound on the decrease in mutual information for the
degrading case is also a lower bound on the increase for the upgrading case.

The third result is an efficient algorithm for optimal upgrading, in the
binary-input case. That is, we are given a channel and an input distribution.
We must find an upgraded channel with L output letters, for which the increase
in mutual information is minimal. We give a simple characterization of such a
channel, which implies an efficient algorithm.

The fourth result is an analog of the first result for the upgrading case,
when the input is binary. That is, we first present a sub-optimal algorithm for
the setting considered in the third result. By analyzing the algorithm, we show
that the increase incurred in mutual information is within a constant factor of
the lower bound derived in the second result.

Low-complexity Location-aware Multi-user Massive MIMO Beamforming for High Speed Train Communications

Xuhong Chen, Pingyi Fan
Comments: This paper has been accepted for future publication by VTC2017-Spring
Subjects: Information Theory (cs.IT)

Massive Multiple-input Multiple-output (MIMO) adaption is one of the primary
evolving objectives for the next generation high speed train (HST)
communication system. In this paper, we consider how to design an efficient
low-complexity location-aware beamforming for the multi-user (MU) massive MIMO
system in HST scenario. We first put forward a low-complexity beamforming based
on location information, where multiple users are considered. Then, without
considering inter-beam interference, a closed-form solution to maximize the
total service competence of base station (BS) is proposed in this MU HST
scenario. Finally, we present a location-aid searching-based suboptimal
solution to eliminate the inter-beam interference and maximize the BS service
competence. Various simulations are given to exhibit the advantages of our
proposed massive MIMO beamforming method.

Revisiting Frequency Reuse towards Supporting Ultra-Reliable Ubiquitous-Rate Communication

Jihong Park, Dong Min Kim, Petar Popovski, Seong-Lyun Kim
Comments: 7 pages, 6 figures, to appear in proc. IEEE WiOpt Wksp. SpaSWiN 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

One of the goals of 5G wireless systems stated by the NGMN alliance is to
provide moderate rates (50+ Mbps) everywhere and with very high reliability. We
term this service Ultra-Reliable Ubiquitous-Rate Communication (UR2C). This
paper investigates the role of frequency reuse in supporting UR2C in the
downlink. To this end, two frequency reuse schemes are considered:
user-specific frequency reuse (FRu) and BS-specific frequency reuse (FRb). For
a given unit frequency channel, FRu reduces the number of serving user
equipments (UEs), whereas FRb directly decreases the number of interfering base
stations (BSs). This increases the distance from the interfering BSs and the
signal-to-interference ratio (SIR) attains ultra-reliability, e.g. 99% SIR
coverage at a randomly picked UE. The ultra-reliability is, however, achieved
at the cost of the reduced frequency allocation, which may degrade overall
downlink rate. To fairly capture this reliability-rate tradeoff, we propose
ubiquitous rate defined as the maximum downlink rate whose required SIR can be
achieved with ultra-reliability. By using stochastic geometry, we derive
closed-form ubiquitous rate as well as the optimal frequency reuse rules for
UR2C.

Temporal Analysis of Measured LOS Massive MIMO Channels with Mobility

Paul Harris, Steffen Malkowsky, Joao Vieira, Fredrik Tufvesson, Wael Boukley Hasan, Liang Liu, Mark Beach, Simon Armour, Ove Edfors
Comments: Accepted for presentation at the 85th IEEE Vehicular Technology Conference in Sydney. 5 Pages. arXiv admin note: substantial text overlap with arXiv:1701.08818
Subjects: Information Theory (cs.IT)

The first measured results for massive multiple-input, multiple-output (MIMO)
performance in a line-of-sight (LOS) scenario with moderate mobility are
presented, with 8 users served by a 100 antenna base Station (BS) at 3.7 GHz.
When such a large number of channels dynamically change, the inherent
propagation and processing delay has a critical relationship with the rate of
change, as the use of outdated channel information can result in severe
detection and precoding inaccuracies. For the downlink (DL) in particular, a
time division duplex (TDD) configuration synonymous with massive MIMO
deployments could mean only the uplink (UL) is usable in extreme cases.
Therefore, it is of great interest to investigate the impact of mobility on
massive MIMO performance and consider ways to combat the potential limitations.
In a mobile scenario with moving cars and pedestrians, the correlation of the
MIMO channel vector over time is inspected for vehicles moving up to 29 km/h.
For a 100 antenna system, it is found that the channel state information (CSI)
update rate requirement may increase by 7 times when compared to an 8 antenna
system, whilst the power control update rate could be decreased by at least 5
times relative to a single antenna system.

Beamforming and Power Splitting Designs for AN-aided Secure Multi-user MIMO SWIPT Systems

Zhengyu Zhu, Zheng Chu, Ning Wang, Sai Huang, Zhongyong Wang, Inkyu Lee
Comments: 12 pages, 6 figures, submitted for possible publication
Subjects: Information Theory (cs.IT)

In this paper, an energy harvesting scheme for a multi-user
multiple-input-multiple-output (MIMO) secrecy channel with artificial noise
(AN) transmission is investigated. Joint optimization of the transmit
beamforming matrix, the AN covariance matrix, and the power splitting ratio is
conducted to minimize the transmit power under the target secrecy rate, the
total transmit power, and the harvested energy constraints. The original
problem is shown to be non-convex, which is tackled by a two-layer
decomposition approach. The inner layer problem is solved through semi-definite
relaxation, and the outer problem, on the other hand, is shown to be a single-
variable optimization that can be solved by one-dimensional (1- D) line search.
To reduce computational complexity, a sequential parametric convex
approximation (SPCA) method is proposed to find a near-optimal solution. The
work is then extended to the imperfect channel state information case with
norm-bounded channel errors. Furthermore, tightness of the relaxation for the
proposed schemes are validated by showing that the optimal solution of the
relaxed problem is rank-one. Simulation results demonstrate that the proposed
SPCA method achieves the same performance as the scheme based on 1-D but with
much lower complexity.

Tuning Free Orthogonal Matching Pursuit

Sreejith Kallummil, Sheetal Kalyani
Comments: 13 pages. 9 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

Orthogonal matching pursuit (OMP) is a widely used compressive sensing (CS)
algorithm for recovering sparse signals in noisy linear regression models. The
performance of OMP depends on its stopping criteria (SC). SC for OMP discussed
in literature typically assumes knowledge of either the sparsity of the signal
to be estimated (k_0) or noise variance (sigma^2), both of which are
unavailable in many practical applications. In this article we develop a
modified version of OMP called tuning free OMP or TF-OMP which does not require
a SC. TF-OMP is proved to accomplish successful sparse recovery under the usual
assumptions on restricted isometry constants (RIC) and mutual coherence of
design matrix. TF-OMP is numerically shown to deliver a highly competitive
performance in comparison with OMP having extit{a priori} knowledge of (k_0)
or (sigma^2). Greedy algorithm for robust de-noising (GARD) is an OMP like
algorithm proposed for efficient estimation in classical overdetermined linear
regression models corrupted by sparse outliers. However, GARD requires the
knowledge of inlier noise variance which is difficult to estimate. We also
produce a tuning free algorithm (TF-GARD) for efficient estimation in the
presence of sparse outliers by extending the operating principle of TF-OMP to
GARD. TF-GARD is numerically shown to achieve a performance comparable to that
of the existing implementation of GARD.

Tail asymptotics of signal-to-interference ratio distribution in spatial cellular network models

Naoto Miyoshi, Tomoyuki Shirai
Comments: Dedicated to Tomasz Rolski on the occasion of his 70th birthday
Subjects: Probability (math.PR); Information Theory (cs.IT)

We consider a spatial stochastic model of wireless cellular networks, where
the base stations (BSs) are deployed according to a simple and stationary point
process on (mathbb{R}^d), (dge2). In this model, we investigate tail
asymptotics of the distribution of signal-to-interference ratio (SIR), which is
a key quantity in wireless communications. In the case where the path-loss
function representing signal attenuation is unbounded at the origin, we derive
the exact tail asymptotics of the SIR distribution under an appropriate
sufficient condition. While we show that widely-used models based on a Poisson
point process and on a determinantal point process meet the sufficient
condition, we also give a counterexample violating it. In the case of bounded
path-loss functions, we derive a logarithmically asymptotic upper bound on the
SIR tail distribution for the Poisson-based and (alpha)-Ginibre-based models.
A logarithmically asymptotic lower bound with the same order as the upper bound
is also obtained for the Poisson-based model.

Holevo Superadditivity Bounds for Heralded Channels

Li Gao, Marius Junge, Nicholas LaRacuente
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We show that for a particular class of quantum channels, which we call
heralded channels, monogamy of squashed entanglement limits the superadditivity
of Holevo capacity. Heralded channels provide a means to understand the quantum
erasure channel composed with an arbitrary other quantum channel, as well as
common situations in experimental quantum information that involve frequent
loss of qubits or failure of trials. We also show how entanglement monogamy
applies to non-classicality in quantum games, and we consider how any faithful,
monogamous entanglement measure may bound the achievable value of
entanglement-dependent quantities in many-party scenarios.

Towards Optimal Sparse Inverse Covariance Selection through Non-Convex Optimization

Sidhant Misra, Marc Vuffray, Andrey Y. Lokhov, Michael Chertkov
Subjects: Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST)

We study the problem of reconstructing the graph of a sparse Gaussian
Graphical Model from independent observations, which is equivalent to finding
non-zero elements of an inverse covariance matrix. For a model of size (p) and
maximum degree (d), information theoretic lower bounds established in prior
works require that the number of samples needed for recovering the graph
perfectly is at least (d log p/kappa^2), where (kappa) is the minimum
normalized non-zero entry of the inverse covariance matrix. Existing algorithms
require additional assumptions to guarantee perfect graph reconstruction, and
consequently, their sample complexity is dependent on parameters that are not
present in the information theoretic lower bound. We propose an estimator,
called SLICE, that consists of a cardinality constrained least-squares
regression followed by a thresholding procedure. Without any additional
assumptions we show that SLICE attains a sample complexity of
(frac{64}{kappa^4}d log p), which differs from the lower bound by only a
factor proportional to (1/kappa^2) and depends only on parameters present in
the lower bound.

Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models

Marc Vuffray, Sidhant Misra, Andrey Y. Lokhov, Michael Chertkov
Comments: To be published in Advances in Neural Information Processing Systems 30
Subjects: Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)

We consider the problem of learning the underlying graph of an unknown Ising
model on p spins from a collection of i.i.d. samples generated from the model.
We suggest a new estimator that is computationally efficient and requires a
number of samples that is near-optimal with respect to previously established
information-theoretic lower-bound. Our statistical estimator has a physical
interpretation in terms of “interaction screening”. The estimator is consistent
and is efficiently implemented using convex optimization. We prove that with
appropriate regularization, the estimator recovers the underlying graph using a
number of samples that is logarithmic in the system size p and exponential in
the maximum coupling-intensity and maximum node-degree.