
    arXiv Paper Daily: Wed, 24 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-24 00:00:00

    Neural and Evolutionary Computing

    Unbiasing Truncated Backpropagation Through Time

    Corentin Tallec, Yann Ollivier
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Truncated Backpropagation Through Time (truncated BPTT) is a widespread
    method for learning recurrent computational graphs. Truncated BPTT keeps the
    computational benefits of Backpropagation Through Time (BPTT) while relieving
    the need for a complete backtrack through the whole data sequence at every
    step. However, truncation favors short-term dependencies: the gradient estimate
    of truncated BPTT is biased, so that it does not benefit from the convergence
    guarantees from stochastic gradient theory. We introduce Anticipated Reweighted
    Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
    benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
    variable truncation lengths together with carefully chosen compensation factors
    in the backpropagation equation. We check the viability of ARTBP on two tasks.
    First, a simple synthetic task where careful balancing of temporal dependencies
    at different scales is needed: truncated BPTT displays unreliable performance,
    and in worst case scenarios, divergence, while ARTBP converges reliably.
    Second, on Penn Treebank character-level language modelling, ARTBP slightly
    outperforms truncated BPTT.
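
    The compensation idea can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the paper's ARTBP equations: it estimates a sum of per-timestep gradient contributions by truncating at a random point and dividing each kept term by its survival probability, which restores the correct expectation.

```python
import numpy as np

# Hedged illustration of the reweighting idea behind ARTBP on a toy sum.
# True quantity: G = sum_t g_t. A fixed truncation keeps only the first K
# terms (biased). Randomising the truncation point T and dividing each kept
# term g_t by the survival probability P(T >= t) makes the estimate unbiased.
rng = np.random.default_rng(0)
g = 0.9 ** np.arange(50)          # toy per-timestep gradient contributions
p_continue = 0.8                  # chance of not truncating at each step

def reweighted_truncated_sum(g, p, rng):
    est, survive = 0.0, 1.0       # survive = P(T >= t)
    for g_t in g:
        est += g_t / survive      # compensation factor 1 / P(T >= t)
        if rng.random() > p:      # truncate here with probability 1 - p
            break
        survive *= p
    return est

samples = [reweighted_truncated_sum(g, p_continue, rng) for _ in range(20000)]
print(g.sum(), np.mean(samples))  # the two agree up to Monte Carlo noise
```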

    A divide and conquer method for symbolic regression

    Changtong Luo, Chen Chen, Zonglin Jiang
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Symbolic regression aims to find a function that best explains the
    relationship between independent variables and the objective value based on a
    given set of sample data. Genetic programming (GP) is usually considered an
    appropriate method for the problem since it can optimize functional structure
    and coefficients simultaneously. However, the convergence speed of GP might be
    too slow for large-scale problems that involve a large number of variables.
    Fortunately, in many applications, the target function is separable or
    partially separable. This feature motivated us to design a new method, divide
    and conquer (D&C), for symbolic regression, in which the target function is
    divided into a number of sub-functions that are then determined by any
    available GP algorithm. The separability is probed by a newly proposed
    technique, the Bi-Correlation test (BiCT). D&C-powered GP has been tested on
    some real-world applications, and the study shows that D&C can help GP find
    the target function more rapidly and more reliably.
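
    The abstract does not spell out BiCT, but the underlying idea of probing separability via correlation can be sketched as follows; the test, tolerance, and function names here are illustrative guesses rather than the paper's exact procedure.

```python
import numpy as np

# Hedged sketch of a bi-correlation style separability probe. If
# f(x, y) = g(x) * h(y) (or g(x) + h(y)), then the curves x -> f(x, y1)
# and x -> f(x, y2) are proportional (or shifted) for any fixed y1, y2,
# so their Pearson correlation is close to 1 in absolute value.
def looks_separable_in_x(f, y1, y2, n=200, tol=1e-6):
    xs = np.linspace(-3, 3, n)
    a, b = f(xs, y1), f(xs, y2)
    r = np.corrcoef(a, b)[0, 1]
    return abs(r) > 1 - tol

f_sep  = lambda x, y: np.exp(-x**2) * np.sin(y)   # separable target
f_nsep = lambda x, y: np.sin(x * y)               # non-separable target
print(looks_separable_in_x(f_sep, 0.5, 1.7))      # True
print(looks_separable_in_x(f_nsep, 0.5, 1.7))     # False
```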

    An evolutionary strategy for ΔE – E identification

    Katarzyna Schmidt, Oskar Wyszynski
    Subjects: Instrumentation and Detectors (physics.ins-det); Neural and Evolutionary Computing (cs.NE); Software Engineering (cs.SE); Nuclear Experiment (nucl-ex)

    In this article we present an automatic method for charge and mass
    identification of charged nuclear fragments produced in heavy ion collisions at
    intermediate energies. The algorithm combines a generative model of the ΔE – E
    relation and a Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES). The
    CMA-ES is a stochastic and derivative-free method employed to search parameter
    space of the model by means of a fitness function. The article describes
    details of the method along with results of an application on simulated labeled
    data.
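
    As a rough illustration of the derivative-free search ingredient, here is a minimal (mu, lambda) evolution strategy; real CMA-ES additionally adapts a full covariance matrix, and the fitness function below is a toy placeholder for the paper's model fit.

```python
import numpy as np

# A minimal derivative-free (mu, lambda) evolution strategy, standing in for
# the full CMA-ES the paper uses (real CMA-ES also adapts the sampling
# covariance). The sphere fitness is a hypothetical placeholder; the paper
# instead scores how well a generative DeltaE-E model fits detector data.
def es_minimize(fitness, x0, sigma=0.5, pop=32, elite=8, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(iters):
        cand = mean + sigma * rng.standard_normal((pop, mean.size))
        scores = np.array([fitness(c) for c in cand])
        best = cand[np.argsort(scores)[:elite]]
        mean = best.mean(axis=0)            # move toward the elite samples
        sigma *= 0.98                       # simple step-size decay
    return mean

sphere = lambda x: float(np.sum(x ** 2))    # toy fitness for demonstration
print(es_minimize(sphere, x0=[2.0, -1.5]))  # converges near [0, 0]
```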

    Matching neural paths: transfer from recognition to correspondence search

    Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many machine learning tasks require finding per-part correspondences between
    objects. In this work we focus on low-level correspondences – a highly
    ambiguous matching problem. We propose to use a hierarchical semantic
    representation of the objects, coming from a convolutional neural network, to
    solve this ambiguity. Training it for low-level correspondence prediction
    directly might not be an option in some domains where the ground-truth
    correspondences are hard to obtain. We show how transfer from recognition can
    be used to avoid such training. Our idea is to mark parts as “matching” if
    their features are close to each other at all the levels of convolutional
    feature hierarchy (neural paths). Although the overall number of such paths is
    exponential in the number of layers, we propose a polynomial algorithm for
    aggregating all of them in a single backward pass. The empirical validation is
    done on the task of stereo correspondence and demonstrates that we achieve
    competitive results among the methods which do not use labeled target domain
    data.
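
    The polynomial aggregation claim can be sanity-checked on a toy layered graph: summing path products factorizes into a chain of matrix products. This is a hedged illustration of the counting argument, not the paper's matching algorithm.

```python
import numpy as np

# Hedged sketch of why exponentially many layer-wise paths can be aggregated
# in polynomial time: the sum over all paths of per-layer score products
# factorizes into a chain of matrix-vector products (a single backward
# pass). Shapes and scores here are illustrative, not the paper's features.
rng = np.random.default_rng(0)
layers = [rng.random((4, 5)), rng.random((5, 3)), rng.random((3, 2))]

# Brute force: enumerate every path, multiply scores along it, then sum.
def brute_force(mats):
    total = 0.0
    def walk(layer, node, prod):
        nonlocal total
        if layer == len(mats):
            total += prod
            return
        for nxt in range(mats[layer].shape[1]):
            walk(layer + 1, nxt, prod * mats[layer][node, nxt])
    for start in range(mats[0].shape[0]):
        walk(0, start, 1.0)
    return total

# Polynomial aggregation: one pass of matrix products.
def aggregate(mats):
    v = np.ones(mats[-1].shape[1])
    for m in reversed(mats):            # the "backward pass"
        v = m @ v
    return v.sum()

print(brute_force(layers), aggregate(layers))   # identical results
```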

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework goes beyond and generalizes over previous proposals in
    enabling hard or soft sharing of all combinations of subspaces, layers, and
    skip connections. We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
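
    A minimal sketch of the core sharing mechanism, assuming a simplified two-task, whole-layer version of the paper's subspace mixing; sizes and wiring are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of the sluice idea: learned alpha parameters linearly mix
# the hidden states of two task networks at each layer, so the amount of
# sharing is itself trained. The 2x2 whole-layer mixing is a simplification
# of the paper's subspace version.
class SluicePair(nn.Module):
    def __init__(self, dim_in, dim_h):
        super().__init__()
        self.layer_a = nn.Linear(dim_in, dim_h)
        self.layer_b = nn.Linear(dim_in, dim_h)
        # alpha[i, j] = how much task j's next layer reads from task i.
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x):
        h_a, h_b = torch.relu(self.layer_a(x)), torch.relu(self.layer_b(x))
        mixed_a = self.alpha[0, 0] * h_a + self.alpha[1, 0] * h_b
        mixed_b = self.alpha[0, 1] * h_a + self.alpha[1, 1] * h_b
        return mixed_a, mixed_b

block = SluicePair(16, 32)
a, b = block(torch.randn(8, 16))
print(a.shape, b.shape)   # torch.Size([8, 32]) torch.Size([8, 32])
```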

    Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

    Tayfun Gokmen, O. Murat Onen, Wilfried Haensch
    Comments: 22 pages, 6 figures, 2 tables
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    In previous work we detailed the requirements for obtaining a maximal
    performance benefit by implementing fully connected deep neural networks
    (DNNs) in the form of arrays of resistive devices for deep learning. Here we
    extend this concept of Resistive Processing Unit (RPU) devices to
    convolutional neural networks (CNNs). We show how to map the convolutional
    layers to RPU arrays such that the parallelism of the hardware can be fully
    utilized in all three cycles of the backpropagation algorithm. We find that
    the noise and bound limitations imposed by the analog nature of the
    computations performed on the arrays affect the training accuracy of the
    CNNs. We present noise and bound management techniques that mitigate these
    problems without introducing any additional complexity in the analog
    circuits, as they can be handled by the digital circuits. In addition, we
    discuss digitally programmable update management and device variability
    reduction techniques that can be used selectively for some of the layers in
    a CNN. We show that the combination of all these techniques enables a
    successful application of the RPU concept to training CNNs. The techniques
    discussed here are more general and can be applied beyond CNN architectures,
    enabling the applicability of the RPU approach to a large class of neural
    network architectures.
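
    A hedged sketch of what bound management might look like around a saturating analog matrix-vector product; the clipping model, bound value, and scaling rule below are assumptions for illustration, not the paper's circuit-level scheme.

```python
import numpy as np

# Hedged sketch of bound management for analog matrix-vector products: the
# analog array clips results to a fixed bound, so the digital side scales
# the input down before the analog op and scales the result back up.
BOUND = 12.0

def analog_matvec(W, x, bound=BOUND):
    return np.clip(W @ x, -bound, bound)        # saturating analog hardware

def managed_matvec(W, x, bound=BOUND):
    s = max(1.0, np.abs(x).max())               # digital pre-scaling factor
    return s * analog_matvec(W, x / s, bound)   # digital post-rescaling

rng = np.random.default_rng(0)
W, x = rng.standard_normal((4, 4)), 10 * rng.standard_normal(4)
print(W @ x)                  # exact result
print(analog_matvec(W, x))    # may saturate at +/- 12
print(managed_matvec(W, x))   # closer to exact
```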

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.
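
    A minimal image-to-sequence sketch in the spirit of the described approach, assuming a CNN encoder conditioning an LSTM token decoder; the vocabulary, layer sizes, and wiring are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of an image-to-code model: a CNN encodes the screenshot and
# an LSTM decodes DSL tokens conditioned on the image embedding. All sizes
# are assumptions; pix2code's exact architecture differs in detail.
class ScreenshotToCode(nn.Module):
    def __init__(self, vocab=64, emb=64, hid=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, hid))
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, image, token_ids):
        ctx = self.cnn(image)                        # image embedding
        h0 = ctx.unsqueeze(0)                        # init decoder state
        c0 = torch.zeros_like(h0)
        seq, _ = self.lstm(self.embed(token_ids), (h0, c0))
        return self.out(seq)                         # next-token logits

model = ScreenshotToCode()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 64, (2, 10)))
print(logits.shape)    # torch.Size([2, 10, 64])
```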

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic and distinct while appearing to depict
    the same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.
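
    The latent decomposition can be sketched as follows, with a stand-in MLP generator; the paper uses DCGAN/BEGAN generators and a Siamese discriminator on the produced pairs.

```python
import torch
import torch.nn as nn

# Hedged sketch of the SD-GAN latent split: each generated pair shares the
# identity part of the latent code but draws fresh observation parts, so a
# Siamese discriminator can demand "same subject, different photos".
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784))

def sample_pair(batch, id_dim=50, obs_dim=50):
    z_id = torch.randn(batch, id_dim)            # shared identity code
    z_obs = torch.randn(2, batch, obs_dim)       # two observation codes
    x1 = G(torch.cat([z_id, z_obs[0]], dim=1))
    x2 = G(torch.cat([z_id, z_obs[1]], dim=1))
    return x1, x2                                # same identity, two "photos"

a, b = sample_pair(4)
print(a.shape, b.shape)   # torch.Size([4, 784]) torch.Size([4, 784])
```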


    Computer Vision and Pattern Recognition

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    Chunhui Gu, Chen Sun, Sudheendra Vijayanarasimhan, Caroline Pantofaru, David A. Ross, George Toderici, Yeqing Li, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
    Comments: 15 pages, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a video dataset of spatio-temporally localized Atomic
    Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual
    actions in 64k movie clips with actions localized in space and time, resulting
    in 197k action labels with multiple labels per human occurring frequently. The
    main differences with existing video datasets are: (1) the definition of atomic
    visual actions, which avoids collecting data for each and every complex action;
    (2) precise spatio-temporal annotations with possibly multiple annotations for
    each human; (3) the use of diverse, realistic video material (movies). This
    departs from existing datasets for spatio-temporal action recognition, such as
    JHMDB and UCF datasets, which provide annotations for at most 24 composite
    actions, such as basketball dunk, captured in specific environments, i.e.,
    basketball court.

    We implement a state-of-the-art approach for action localization. Despite
    this, the performance on our dataset remains low and underscores the need for
    developing new approaches for video understanding. The AVA dataset is the first
    step in this direction, and enables the measurement of performance and progress
    in realistic scenarios.

    Classification of Aerial Photogrammetric 3D Point Clouds

    Carlos Becker, Nicolai Häni, Elena Rosinskaya, Emmanuel d'Angelo, Christoph Strecha
    Comments: ISPRS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a powerful method to extract per-point semantic class labels from
    aerial photogrammetry data. Labeling this kind of data is important for tasks
    such as environmental modelling, object classification and scene understanding.
    Unlike previous point cloud classification methods that rely exclusively on
    geometric features, we show that incorporating color information yields a
    significant increase in accuracy in detecting semantic classes. We test our
    classification method on three real-world photogrammetry datasets that were
    generated with Pix4Dmapper Pro, and with varying point densities. We show that
    off-the-shelf machine learning techniques coupled with our new features allow
    us to train highly accurate classifiers that generalize well to unseen data,
    processing point clouds containing 10 million points in less than 3 minutes on
    a desktop computer.

    Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

    Talha Qaiser, Abhik Mukherjee, Chaitanya Reddy Pb, Sai Dileep Munugoti, Vamsi Tallam, Tomi Pitkäaho, Taina Lehtimäki, Thomas Naughton, Matt Berseth, Aníbal Pedraza, Ramakrishnan Mukundan, Matthew Smith, Abhir Bhalerao, Erik Rodner, Marcel Simon, Joachim Denzler, Chao-Hui Huang, Gloria Bueno, David Snead, Ian Ellis, Mohammad Ilyas, Nasir Rajpoot
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Evaluating expression of the Human epidermal growth factor receptor 2 (Her2)
    by visual examination of immunohistochemistry (IHC) on invasive breast cancer
    (BCa) is a key part of the diagnostic assessment of BCa due to its recognised
    importance as a predictive and prognostic marker in clinical practice. However,
    visual scoring of Her2 is subjective and consequently prone to inter-observer
    variability. Given the prognostic and therapeutic implications of Her2 scoring,
    a more objective method is required. In this paper, we report on a recent
    automated Her2 scoring contest, held in conjunction with the annual PathSoc
    meeting held in Nottingham in June 2016, aimed at systematically comparing and
    advancing the state-of-the-art Artificial Intelligence (AI) based automated
    methods for Her2 scoring. The contest dataset comprised digitised whole
    slide images (WSI) of sections from 86 cases of invasive breast carcinoma
    stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting
    algorithms automatically predicted scores of the IHC slides for an unseen
    subset of the dataset and the predicted scores were compared with the ‘ground
    truth’ (a consensus score from at least two experts). We also report on a
    simple Man vs Machine contest for the scoring of Her2 and show that the
    automated methods could beat the pathology experts on this contest dataset.
    This paper presents a benchmark for comparing the performance of automated
    algorithms for scoring of Her2. It also demonstrates the enormous potential of
    automated algorithms in assisting the pathologist with objective IHC scoring.

    Improvements to Frank-Wolfe optimization for multi-detector multi-object tracking

    Roberto Henschel, Laura Leal-Taixé, Daniel Cremers, Bodo Rosenhahn
    Comments: 13 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel formulation for the multi-object
    tracking-by-detection paradigm for two (or more) input detectors. Using
    full-body and heads detections, the fusion helps to recover heavily occluded
    persons and to reduce false positives. The assignment of the two input features
    to a person and the extraction of the trajectories are jointly solved from one
    binary quadratic program (BQP). Due to the computational complexity of the
    NP-hard QP, we approximate the solution using the Frank-Wolfe algorithm. We
    propose several improvements to this solver that yield better minimization and
    shorter computation times, compared to off-the-shelf BQP solvers and the standard
    Frank-Wolfe algorithm. Evaluation on pedestrian tracking is provided for
    multiple scenarios, showing improved tracking quality over single input feature
    trackers and standard QP-solvers. Finally we present the performance of our
    tracker on the challenging MOTNEW benchmark, being comparable to
    state-of-the-art trackers.
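
    For reference, the vanilla Frank-Wolfe loop that the paper's improvements build on, shown here on a toy quadratic over the probability simplex rather than the paper's binary QP.

```python
import numpy as np

# Hedged sketch of the standard Frank-Wolfe iteration: each step solves a
# linear minimization over the feasible set (here the probability simplex,
# whose minimizer is a vertex), then moves by a convex combination.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); Q = A.T @ A        # PSD objective x'Qx
x = np.full(5, 0.2)                                 # start in the simplex
for k in range(200):
    grad = 2 * Q @ x
    s = np.zeros(5); s[np.argmin(grad)] = 1.0       # linear oracle: a vertex
    gamma = 2.0 / (k + 2)                           # standard step size
    x = (1 - gamma) * x + gamma * s
print(x, x @ Q @ x)    # feasible point with a small objective value
```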

    Anatomically Constrained Neural Networks (ACNN): Application to Cardiac Image Enhancement and Segmentation

    Ozan Oktay, Enzo Ferrante, Konstantinos Kamnitsas, Mattias Heinrich, Wenjia Bai, Jose Caballero, Ricardo Guerrero, Stuart Cook, Antonio de Marvao, Declan O'Regan, Bernhard Kainz, Ben Glocker, Daniel Rueckert
    Comments: Submission to IEEE Transactions on Medical Imaging / May’17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Incorporation of prior knowledge about organ shape and location is key to
    improving the performance of image analysis approaches. In particular, priors can be
    useful in cases where images are corrupted and contain artefacts due to
    limitations in image acquisition. The highly constrained nature of anatomical
    objects can be well captured with learning based techniques. However, in most
    recent and promising techniques such as CNN based segmentation it is not
    obvious how to incorporate such prior knowledge. State-of-the-art methods
    operate as pixel-wise classifiers where the training objectives do not
    incorporate the structure and inter-dependencies of the output. To overcome
    this limitation, we propose a generic training strategy that incorporates
    anatomical prior knowledge into CNNs through a new regularisation model, which
    is trained end-to-end. The new framework encourages models to follow the global
    anatomical properties of the underlying anatomy (e.g. shape, label structure)
    via learned non-linear representations of the shape. We show that the proposed
    approach can be easily adapted to different analysis tasks (e.g. image
    enhancement, segmentation) and improve the prediction accuracy of the
    state-of-the-art models. The applicability of our approach is shown on
    multi-modal cardiac datasets and public benchmarks. Additionally, we
    demonstrate how the learned deep models of 3D shapes can be interpreted and
    used as biomarkers for classification of cardiac pathologies.

    An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions

    Yuping Shen, Hassan Foroosh
    Comments: arXiv admin note: substantial text overlap with arXiv:1705.04641, arXiv:1705.05741, arXiv:1705.04433
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we show that different body parts do not play equally
    important roles in recognizing a human action in video data. We investigate to
    what extent a body part plays a role in recognition of different actions and
    hence propose a generic method of assigning weights to different body points.
    The approach is inspired by the strong evidence in the applied perception
    community that humans perform recognition in a foveated manner, that is, they
    recognize events or objects by only focusing on visually significant aspects.
    An important contribution of our method is that the computation of the weights
    assigned to body parts is invariant to viewing directions and camera parameters
    in the input data. We have performed extensive experiments to validate the
    proposed approach and demonstrate its significance. In particular, results show
    that considerable improvement in performance is gained by taking into account
    the relative importance of different body parts as defined by our approach.

    How hard can it be? Estimating the difficulty of visual search in an image

    Radu Tudor Ionescu, Bogdan Alexe, Marius Leordeanu, Marius Popescu, Dim P. Papadopoulos, Vittorio Ferrari
    Comments: Published at CVPR 2016
    Journal-ref: In Proceedings of CVPR, pp. 2157-2166, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of estimating image difficulty defined as the human
    response time for solving a visual search task. We collect human annotations of
    image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing
    platform. We then analyze what human interpretable image properties can have an
    impact on visual search difficulty, and how accurate those properties are for
    predicting difficulty. Next, we build a regression model based on deep features
    learned with state-of-the-art convolutional neural networks and show better
    results for predicting the ground-truth visual search difficulty scores
    produced by human annotators. Our model is able to correctly rank about 75% of
    image pairs according to their difficulty score. We also show that our
    difficulty predictor generalizes well to new classes not seen during training.
    Finally, we demonstrate that our predicted difficulty scores are useful for
    weakly supervised object localization (8% improvement) and semi-supervised
    object classification (1% improvement).

    A New 3D Segmentation Technique for QCT Scans of the Lumbar Spine to Determine BMD and Vertebral Geometry

    Andre Mastmeyer, Klaus Engelke, Christina Fuchs, Willi Kalender
    Comments: 2 pages, 2 figures, International Congress of Medical Physics 2005
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Quantitative computed tomography (QCT) is a standard method to determine bone
    mineral density (BMD) in the spine. Traditionally, only single 8–10 mm thick
    slices have been analyzed. Current spiral CT scanners provide true 3D acquisition
    schemes allowing for a more differential BMD analysis and an assessment of
    geometric parameters, which may improve fracture prediction. We developed a
    novel 3D segmentation approach that combines deformable balloons, multi seeded
    volume growing, and dedicated morphological operations to extract the vertebral
    bodies. An anatomy-oriented coordinate system attached automatically to each
    vertebra is used to define volumes of interest. We analyzed intra-operator
    precision of the segmentation procedure using abdominal scans from 10 patients
    (60 mAs, 120 kV, slice thickness 1 mm, B40s, Siemens Sensation 16). Our new
    segmentation method shows excellent precision, with errors on the order of < 1%
    for BMD and < 2% for volume.

    Accelerating Discrete Wavelet Transforms on GPUs

    David Barina, Michal Kula, Michal Matysek, Pavel Zemcik
    Comments: preprint submitted to ICIP 2017. arXiv admin note: substantial text overlap with arXiv:1704.08657
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    The two-dimensional discrete wavelet transform has a huge number of
    applications in image-processing techniques. Until now, several papers compared
    the performance of such transform on graphics processing units (GPUs). However,
    all of them only dealt with lifting and convolution computation schemes. In
    this paper, we show that corresponding horizontal and vertical lifting parts of
    the lifting scheme can be merged into non-separable lifting units, which halves
    the number of steps. We also discuss an optimization strategy leading to a
    reduction in the number of arithmetic operations. The schemes were assessed
    using OpenCL and pixel shaders. The proposed non-separable lifting scheme
    outperforms the existing schemes in many cases, despite its higher
    complexity.
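
    For readers unfamiliar with the lifting scheme being fused, a textbook one-dimensional lifting step pair (CDF 5/3, with periodic boundaries) looks like this; the paper's contribution concerns merging the 2D horizontal and vertical passes on GPUs, not this transform itself.

```python
import numpy as np

# Hedged sketch of one 1D lifting step pair (CDF 5/3 wavelet): a predict
# step turns odd samples into detail coefficients, an update step turns
# even samples into the approximation. Boundaries are handled periodically.
def lifting_53(signal):
    s = signal.astype(float)
    even, odd = s[0::2], s[1::2]              # in-place views into s
    odd -= 0.5 * (even + np.roll(even, -1))   # predict step -> detail
    even += 0.25 * (odd + np.roll(odd, 1))    # update step -> approximation
    return even, odd

approx, detail = lifting_53(np.arange(16))
# detail is zero on a linear ramp (away from the periodic boundary)
print(approx[:4], detail[:4])
```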

    Isomorphism between Differential and Moment Invariants under Affine Transform

    Erbo Li, Hua Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Invariants are one of the central topics in science, technology and
    engineering. Differential invariants are essential in understanding or
    describing important phenomena or procedures in mathematics, physics,
    chemistry, biology, computer science and other fields. The derivation of
    differential invariants is usually difficult or complicated. This paper
    reports the discovery that, under the affine transform, differential
    invariants have structures similar to those of moment invariants up to a
    scalar function of the transform parameters. If moment invariants are known,
    relative differential invariants can be obtained by substituting moments
    with derivatives of the same order. Whereas moment invariants are calculated
    by multiple integrals, this method provides a simple way to derive
    differential invariants without the need to solve any equation system. Since
    the definition of moments on different manifolds or in spaces of different
    dimension is well established, differential invariants on or in them are
    also well defined. Considering that moments have a strong background in
    mathematics and physics, this technique offers a new viewpoint on the inner
    structure of invariants. Projective differential invariants can also be
    found in this way with a screening process.

    Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

    Menglong Ye, Edward Johns, Ankur Handa, Lin Zhang, Philip Pratt, Guang-Zhong Yang
    Comments: A two-page short report to be presented at the Hamlyn Symposium on Medical Robotics 2017. An extension of this work is in progress
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Robotic surgery has become a powerful tool for performing minimally invasive
    procedures, providing advantages in dexterity, precision, and 3D vision over
    traditional surgery. One popular robotic system is the da Vinci surgical
    platform, which allows preoperative information to be incorporated into live
    procedures using Augmented Reality (AR). Scene depth estimation is a
    prerequisite for AR, as accurate registration requires 3D correspondences
    between preoperative and intraoperative organ models. In the past decade, there
    has been much progress on depth estimation for surgical scenes, such as using
    monocular or binocular laparoscopes [1,2]. More recently, advances in deep
    learning have enabled depth estimation via Convolutional Neural Networks (CNNs)
    [3], but training requires a large image dataset with ground truth depths.
    Inspired by [4], we propose a deep learning framework for surgical scene depth
    estimation using self-supervision for scalable data acquisition. Our framework
    consists of an autoencoder for depth prediction, and a differentiable spatial
    transformer for training the autoencoder on stereo image pairs without ground
    truth depths. Validation was conducted on stereo videos collected in robotic
    partial nephrectomy.
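
    The self-supervision signal can be sketched as a differentiable warp plus a photometric loss, in the spirit of the described spatial transformer; the coordinate convention and the loss below are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the self-supervision: predict a disparity map, warp the
# right image toward the left with a differentiable sampler, and penalise
# the photometric difference, so no ground-truth depth is needed. The grid
# below assumes normalised coordinates in [-1, 1].
def photometric_loss(left, right, disparity):
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs.expand(b, h, w) - disparity,
                        ys.expand(b, h, w)], dim=-1)
    warped = F.grid_sample(right, grid, align_corners=True)
    return (left - warped).abs().mean()

left, right = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
disp = torch.zeros(2, 32, 32, requires_grad=True)  # a network output in practice
loss = photometric_loss(left, right, disp)
loss.backward()                                     # gradients flow to disp
print(float(loss))
```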

    Distributed Algorithms for Feature Extraction Off-loading in Multi-Camera Visual Sensor Networks

    Emil Eriksson, György Dán, Viktoria Fodor
    Comments: 12 pages, 7 figures, submitted to Transactions on Circuits and Systems for Video Technology
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Real-time visual analysis tasks, like tracking and recognition, require swift
    execution of computationally intensive algorithms. Visual sensor networks can
    be enabled to perform such tasks by augmenting the sensor network with
    processing nodes and distributing the computational burden in a way that the
    cameras contend for the processing nodes while trying to minimize their task
    completion times. In this paper, we formulate the problem of minimizing the
    completion time of all camera sensors as an optimization problem. We propose
    algorithms for fully distributed optimization, analyze the existence of
    equilibrium allocations, evaluate the effect of the network topology and of the
    video characteristics, and the benefits of central coordination. Our results
    demonstrate that with sufficient information available, distributed
    optimization can provide low completion times; moreover, predictable and stable
    performance can be achieved with additional, sparse central coordination.

    On the mathematics of beauty: beautiful images

    Abdullah Khalili
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we will study the simplest kind of beauty that can be found in
    a simple visual pattern and can be appreciated universally. The proposed model
    suggests that there is a link between beautiful patterns and the Dyson
    distribution, which seems to be the result of a deeper optimisation process
    between randomness and regularity. Then we show that beautiful patterns need to
    satisfy a more fundamental law that seeks to deliver the highest amount of
    information using the least amount of energy. The proposed model is used to
    classify and generate beautiful patterns.

    Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

    Michael Gygli
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    Shot boundary detection (SBD) is an important component of many video
    analysis tasks, such as action recognition, video indexing, summarization and
    editing. Previous work typically used a combination of low-level features like
    color histograms, in conjunction with simple models such as SVMs. Instead, we
    propose to learn shot detection end-to-end, from pixels to final shot
    boundaries. For training such a model, we rely on our insight that all shot
    boundaries are generated. Thus, we create a dataset with one million frames and
    automatically generated transitions such as cuts, dissolves and fades. In order
    to efficiently analyze hours of videos, we propose a Convolutional Neural
    Network (CNN) which is fully convolutional in time, thus allowing it to use a
    large temporal context without the need to repeatedly process frames. With
    this architecture our method obtains state-of-the-art results while running at
    an unprecedented speed of more than 120x real-time.
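
    A minimal sketch of "fully convolutional in time": 3D convolutions without temporal pooling yield one boundary score per frame in a single pass. Channel counts and depths are illustrative, far smaller than the paper's network.

```python
import torch
import torch.nn as nn

# Hedged sketch: 3D convolutions with temporal padding and no temporal
# pooling map a clip of T frames to T per-frame boundary scores, so long
# videos can be scored in one pass with a shared temporal context.
net = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
    nn.Conv3d(16, 1, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
    nn.AdaptiveAvgPool3d((None, 1, 1)))      # keep time, pool space

clip = torch.randn(1, 3, 100, 64, 64)        # (batch, rgb, frames, H, W)
scores = net(clip).squeeze()                 # one boundary score per frame
print(scores.shape)                          # torch.Size([100])
```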

    Salient Object Detection with Semantic Priors

    Tam V. Nguyen, Luoqi Liu
    Comments: accepted to IJCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Salient object detection has increasingly become a popular topic in cognitive
    and computational sciences, including computer vision and artificial
    intelligence research. In this paper, we propose integrating semantic
    priors into the salient object detection process. Our algorithm consists of
    three basic steps. Firstly, the explicit saliency map is obtained based on the
    semantic segmentation refined by the explicit saliency priors learned from the
    data. Next, the implicit saliency map is computed based on a trained model
    which maps the implicit saliency priors embedded into regional features with
    the saliency values. Finally, the explicit semantic map and the implicit map
    are adaptively fused to form a pixel-accurate saliency map which uniformly
    covers the objects of interest. We further evaluate the proposed framework on
    two challenging datasets, namely, ECSSD and HKU-IS. The extensive experimental
    results demonstrate that our method outperforms other state-of-the-art methods.

    Unmasking the abnormal events in video

    Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu
    Comments: Submitted at a conference for review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel framework for abnormal event detection in video that
    requires no training sequences. Our framework is based on unmasking, a
    technique previously used for authorship verification in text documents, which
    we adapt to our task. We iteratively train a binary classifier to distinguish
    between two consecutive video sequences while removing at each step the most
    discriminant features. Higher training accuracy rates of the intermediately
    obtained classifiers indicate abnormal events. To the best of our knowledge,
    this is the first work to apply unmasking for a computer vision task. We
    compare our method with several state-of-the-art supervised and unsupervised
    methods on four benchmark data sets. The empirical results indicate that our
    abnormal event detection framework can achieve state-of-the-art results, while
    running in real-time at 20 frames per second.
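
    The unmasking loop itself is easy to sketch on synthetic data; the feature type, classifier, and schedule below are illustrative stand-ins for the paper's video features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of unmasking: repeatedly train a linear classifier to tell
# two windows apart and delete the most discriminant features each round.
# If accuracy stays high as features are removed, the windows differ
# "deeply" (a candidate abnormal event); here the windows are synthetic.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 50)) + 0.8      # window A (shifted: "abnormal")
X2 = rng.standard_normal((200, 50))            # window B
X = np.vstack([X1, X2]); y = np.repeat([0, 1], 200)

active = np.arange(50)
for step in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[:, active], y)
    acc = clf.score(X[:, active], y)
    print(f"round {step}: {len(active)} features, accuracy {acc:.2f}")
    drop = np.argsort(np.abs(clf.coef_[0]))[-5:]   # most discriminant
    active = np.delete(active, drop)
```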

    Correlation Alignment by Riemannian Metric for Domain Adaptation

    Pietro Morerio, Vittorio Murino
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Domain adaptation techniques address the problem of reducing the sensitivity
    of machine learning methods to the so-called domain shift, namely the
    difference between source (training) and target (test) data distributions. In
    particular, unsupervised domain adaptation assumes no labels are available in
    the target domain. To this end, aligning second-order statistics (covariances)
    of the target and source domains has proven to be an effective approach to fill
    the gap between the domains. However, covariance matrices do not form a
    subspace of the Euclidean space, but live in a Riemannian manifold with
    non-positive curvature, making the usual Euclidean metric suboptimal to measure
    distances. In this paper, we extend the idea of training a neural network with
    a constraint on the covariances of the hidden layer features, by rigorously
    accounting for the curved structure of the manifold of symmetric positive
    definite matrices. The resulting loss function exploits a theoretically sound
    geodesic distance on this manifold. Results indeed show the suboptimal nature
    of the Euclidean distance, and enable us to perform better than previous
    approaches on the standard Office dataset, a benchmark for domain adaptation
    techniques.
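
    One concrete geodesic-style distance consistent with this idea is the Log-Euclidean distance between covariances, sketched below; whether this matches the paper's exact metric is an assumption.

```python
import numpy as np

# Hedged sketch: the Log-Euclidean distance ||log(Cs) - log(Ct)||_F respects
# the manifold of symmetric positive definite matrices, unlike the plain
# Euclidean ||Cs - Ct||_F used in Euclidean correlation alignment.
def spd_log(C):
    w, V = np.linalg.eigh(C)            # SPD: real spectral decomposition
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(Cs, Ct):
    return np.linalg.norm(spd_log(Cs) - spd_log(Ct), "fro")

rng = np.random.default_rng(0)
src = rng.standard_normal((500, 8))
tgt = 2.0 * rng.standard_normal((500, 8))    # domain shift: scaled features
Cs = np.cov(src, rowvar=False) + 1e-3 * np.eye(8)
Ct = np.cov(tgt, rowvar=False) + 1e-3 * np.eye(8)
print(log_euclidean_dist(Cs, Ct))
```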

    Look, Listen and Learn

    Relja Arandjelović, Andrew Zisserman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the question: what can be learnt by looking at and listening to a
    large amount of unlabelled videos? There is a valuable, but so far untapped,
    source of information contained in the video itself — the correspondence
    between the visual and the audio streams, and we introduce a novel
    “Audio-Visual Correspondence” learning task that makes use of this. Training
    visual and audio networks from scratch, without any additional supervision
    other than the raw unconstrained videos themselves, is shown to successfully
    solve this task, and, more interestingly, result in good vision and audio
    representations. These features set the new state-of-the-art on two sound
    classification benchmarks, and perform on par with the state-of-the-art
    self-supervised approaches on ImageNet classification. We also demonstrate that
    the network is able to localize objects in both modalities, as well as perform
    fine-grained recognition tasks.

    A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

    Benjamin Gutierrez, Loic Peter, Tassilo Klein, Christian Wachinger
    Comments: MICCAI 2017 Proceedings
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the availability of big medical image data, the selection of an adequate
    training set is becoming more important to address the heterogeneity of
    different datasets. Simply including all the data not only incurs high
    processing costs but can even harm the prediction. We formulate the smart and
    efficient selection of a training dataset from big medical image data as a
    multi-armed bandit problem, solved by Thompson sampling. Our method assumes
    that image features are not available at the time of the selection of the
    samples, and therefore relies only on meta information associated with the
    images. Our strategy simultaneously exploits data sources with high chances of
    yielding useful samples and explores new data regions. For our evaluation, we
    focus on the application of estimating the age from a brain MRI. Our results on
    7,250 subjects from 10 datasets show that our approach leads to higher accuracy
    while only requiring a fraction of the training data.
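
    A minimal Thompson-sampling loop over data sources, with a synthetic per-source usefulness model standing in for the paper's meta-information-driven rewards.

```python
import numpy as np

# Hedged sketch of Thompson sampling over data sources: each "arm" is a
# dataset, its usefulness is modelled with a Beta posterior, and at every
# step we draw from each posterior and sample from the winning source.
rng = np.random.default_rng(0)
true_useful = np.array([0.2, 0.5, 0.8])       # hidden quality per source
alpha = np.ones(3); beta = np.ones(3)         # Beta(1, 1) priors

picks = np.zeros(3, dtype=int)
for _ in range(2000):
    arm = int(np.argmax(rng.beta(alpha, beta)))   # posterior draw per arm
    reward = rng.random() < true_useful[arm]      # did the sample help?
    alpha[arm] += reward; beta[arm] += 1 - reward
    picks[arm] += 1
print(picks)    # concentrates on the most useful source
```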

    Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

    Hong Liu, Juanhui Tu, Mengyuan Liu
    Comments: 5 pages, 6 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It remains a challenge to efficiently extract spatial-temporal information
    from skeleton sequences for 3D human action recognition. Since a 3D
    convolutional neural network (3DCNN) is a powerful tool to simultaneously
    learn features from both the spatial and temporal dimensions, in this paper
    we propose a two-stream model using 3DCNNs. To the best of our knowledge,
    this is the first application of 3DCNNs to action recognition based on
    skeleton sequences. Our approach contains three stages. First, skeleton
    joint positions are encoded as spatial and temporal values. Second, 3DCNN
    models are adopted to extract deep features from each of the two streams.
    Third, to enhance the ability of the deep features to capture global
    relationships, we extend every stream into a multi-temporal version.
    Extensive experiments on the currently largest benchmark, the NTU RGB-D
    dataset, demonstrate that the two streams reinforce each other and that our
    method is superior to state-of-the-art 3DCNN-based action recognition
    approaches.

    Towards seamless multi-view scene analysis from satellite to street-level

    Sébastien Lefèvre, Devis Tuia, Jan Dirk Wegner, Timothée Produit, Ahmed Samy Nassar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we discuss and review how combined multi-view imagery from
    satellite to street-level can benefit scene analysis. Numerous works exist that
    merge information from remote sensing and images acquired from the ground for
    tasks like land cover mapping, object detection, or scene understanding. What
    makes the combination of overhead and street-level images challenging is the
    strongly varying viewpoint, scale, illumination, sensor modality and
    time of acquisition. Direct (dense) matching of images on a per-pixel basis is
    thus often impossible, and one has to resort to alternative strategies that
    will be discussed in this paper. We review recent works that attempt to combine
    images taken from the ground and overhead views for purposes like scene
    registration, reconstruction, or classification. Three methods that represent
    the wide range of potential methods and applications (change detection, image
    orientation, and tree cataloging) are described in detail. We show that
    cross-fertilization between remote sensing, computer vision and machine
    learning is very valuable to make the best of geographic data available from
    Earth Observation sensors and ground imagery. Despite its challenges, we
    believe that integrating these complementary data sources will lead to major
    breakthroughs in Big GeoData.

    Universal Style Transfer via Feature Transforms

    Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Universal style transfer aims to transfer any arbitrary visual styles to
    content images. Existing feed-forward based methods, while enjoying inference
    efficiency, are mainly limited by their inability to generalize to unseen
    styles or by compromised visual quality. In this paper, we present a simple yet
    effective method that tackles these limitations without training on any
    pre-defined styles. The key ingredient of our method is a pair of feature
    transforms, whitening and coloring, that are embedded in an image
    reconstruction network. The whitening and coloring transforms reflect a direct
    matching of feature covariance of the content image to a given style image,
    which shares similar spirits with the optimization of Gram matrix based cost in
    neural style transfer. We demonstrate the effectiveness of our algorithm by
    generating high-quality stylized images with comparisons to a number of recent
    methods. We also analyze our method by visualizing the whitened features and
    synthesizing textures via simple feature coloring.
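
    The whitening and coloring pair can be sketched directly on flattened feature maps; real WCT operates on VGG features inside an encoder-decoder, so the random features here are purely illustrative.

```python
import numpy as np

# Hedged sketch of the whitening-and-coloring transform: whiten the content
# features to identity covariance, then colour them with the style
# features' covariance and mean.
def wct(content, style, eps=1e-5):
    def center_and_cov(F):
        mu = F.mean(axis=1, keepdims=True)
        Fc = F - mu
        return mu, Fc, Fc @ Fc.T / (F.shape[1] - 1)

    def mat_pow(C, p):
        w, V = np.linalg.eigh(C)
        return (V * np.maximum(w, eps) ** p) @ V.T

    _, Fc, Cc = center_and_cov(content)
    mu_s, _, Cs = center_and_cov(style)
    whitened = mat_pow(Cc, -0.5) @ Fc            # identity covariance
    return mat_pow(Cs, 0.5) @ whitened + mu_s    # style covariance + mean

content = np.random.default_rng(0).standard_normal((16, 400))
style = 3.0 * np.random.default_rng(1).standard_normal((16, 400)) + 1.0
out = wct(content, style)
print(np.round(np.cov(out)[:2, :2], 1))   # approximately the style covariance
```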

    Visual Semantic Planning using Deep Successor Representations

    Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    A crucial capability of real-world intelligent agents is their ability to
    plan a sequence of actions to achieve their goals in the visual world. In this
    work, we address the problem of visual semantic planning: the task of
    predicting a sequence of actions from visual observations that transform a
    dynamic environment from an initial state to a goal state. Doing so entails
    knowledge about objects and their affordances, as well as actions and their
    preconditions and effects. We propose learning these through interacting with a
    visual and dynamic environment. Our proposed solution involves bootstrapping
    reinforcement learning with imitation learning. To ensure cross-task
    generalization, we develop a deep predictive model based on successor
    representations. Our experimental results show near optimal results across a
    wide range of tasks in the challenging THOR environment. The supplementary
    video can be accessed at the following link: this https URL

    Patchnet: Interpretable Neural Networks for Image Classification

    Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The ability to visually understand and interpret learned features from
    complex predictive models is crucial for their acceptance in sensitive areas
    such as health care. To move closer to this goal of truly interpretable complex
    models, we present PatchNet, a network that restricts global context for image
    classification tasks in order to easily provide visual representations of
    learned texture features on a predetermined local scale. We demonstrate how
    PatchNet provides visual heatmap representations of the learned features, and
    we mathematically analyze the behavior of the network during convergence. We
    also present a version of PatchNet that is particularly well suited for
    lowering false positive rates in image classification tasks. We apply PatchNet
    to the classification of textures from the Describable Textures Dataset and to
    the ISBI-ISIC 2016 melanoma classification challenge.

    Multiple Images Recovery Using a Single Affine Transformation

    Bo Jiang, Chris Ding, Bin Luo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In many real-world applications, image data often come with noise,
    corruptions or large errors. One approach to dealing with noisy image data is
    to use data recovery techniques, which aim to recover the true uncorrupted
    signals from the observed noisy images. In this paper, we first introduce a novel
    corruption recovery transformation (CRT) model which aims to recover multiple
    (or a collection of) corrupted images using a single affine transformation.
    Then, we show that the introduced CRT can be efficiently constructed through
    learning from training data. Once CRT is learned, we can recover the true
    signals from the new incoming/test corrupted images explicitly. As an
    application, we apply our CRT to the image recognition task. Experimental results
    on six image datasets demonstrate that the proposed CRT model is effective in
    recovering noisy image data and thus leads to better recognition results.

    Learning multiple visual domains with residual adapters

    Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    There is a growing interest in learning data representations that work well
    for many different types of problems and data. In this paper, we look in
    particular at the task of learning a single visual representation that can be
    successfully utilized in the analysis of very different types of images, from
    dog breeds to stop signs and digits. Inspired by recent work on learning
    networks that predict the parameters of another network, we develop a tunable deep
    network architecture that, by means of adapter residual modules, can be steered
    on the fly to diverse visual domains. Our method achieves a high degree of
    parameter sharing while maintaining or even improving the accuracy of
    domain-specific representations. We also introduce the Visual Decathlon
    Challenge, a benchmark that evaluates the ability of representations to capture
    ten very different visual domains simultaneously and measures their ability to
    recognize them uniformly well.
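
    A hedged sketch of one adapter residual module, assuming the common 1x1-convolution form with a frozen shared convolution; the exact placement and normalization in the paper may differ.

```python
import torch
import torch.nn as nn

# Hedged sketch of a residual adapter: a small domain-specific 1x1
# convolution added residually around a shared frozen convolution, so only
# the adapter (and batch norm) is trained per visual domain.
class ResidualAdapter(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)
        self.shared.requires_grad_(False)                # shared across domains
        self.adapter = nn.Conv2d(channels, channels, 1)  # per-domain module
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = self.shared(x)
        return torch.relu(self.bn(h + self.adapter(h)))

block = ResidualAdapter(8)
print(block(torch.randn(2, 8, 16, 16)).shape)   # torch.Size([2, 8, 16, 16])
```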

    Unrolled Optimization with Deep Priors

    Steven Diamond, Vincent Sitzmann, Felix Heide, Gordon Wetzstein
    Comments: First two authors contributed equally
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A broad class of problems at the core of computational imaging, sensing, and
    low-level computer vision reduces to the inverse problem of extracting latent
    images that follow a prior distribution, from measurements taken under a known
    physical image formation model. Traditionally, hand-crafted priors along with
    iterative optimization methods have been used to solve such problems. In this
    paper we present unrolled optimization with deep priors, a principled framework
    for infusing knowledge of the image formation into deep networks that solve
    inverse problems in imaging, inspired by classical iterative methods. We show
    that instances of the framework outperform the state-of-the-art by a
    substantial margin for a wide variety of imaging problems, such as denoising,
    deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover,
    we conduct experiments that explain how the framework is best used and why it
    outperforms previous methods.

    Training with Confusion for Fine-Grained Visual Classification

    Abhimanyu Dubey, Otkrist Gupta, Pei Guo, Ramesh Raskar, Ryan Farrell, Nikhil Naik
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Research in Fine-Grained Visual Classification has focused on tackling the
    variations in pose, lighting, and viewpoint using sophisticated localization
    and segmentation techniques, and the usage of robust texture features to
    improve performance. In this work, we look at the fundamental optimization of
    neural network training for fine-grained classification tasks with minimal
    inter-class variance, and attempt to learn features with increased
    generalization to prevent overfitting. We introduce Training-with-Confusion, an
    optimization procedure for fine-grained classification tasks that regularizes
    training by introducing confusion in activations. Our method can be generalized
    to any fine-tuning task; it is robust to the presence of small training sets
    and label noise; and adds no overhead to the prediction time. We find that
    Training-with-Confusion improves the state-of-the-art on all major fine-grained
    classification datasets.

    GP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network

    Florian Dubost, Gerda Bortsova, Hieab Adams, Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen De Bruijne
    Comments: Article accepted in MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel convolutional neural network for lesion detection from
    weak labels. Only a single, global label per image – the lesion count – is
    needed for training. We train a regression network with a fully convolutional
    architecture combined with a global pooling layer to aggregate the 3D output
    into a scalar indicating the lesion count. When testing on unseen images, we
    first run the network to estimate the number of lesions. Then we remove the
    global pooling layer to compute localization maps of the size of the input
    image. We evaluate the proposed network on the detection of enlarged
    perivascular spaces in the basal ganglia in MRI. Our method achieves a
    sensitivity of 62% with on average 1.5 false positives per image. Compared with
    four other approaches based on intensity thresholding, saliency and class maps,
    our method has a 20% higher sensitivity.
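
    The weak-label trick can be sketched in a few lines: supervise only a pooled scalar during training, and read out the pre-pooling map at test time. The 2D toy below, with sum pooling, stands in for the paper's 3D network.

```python
import torch
import torch.nn as nn

# Hedged sketch of the weak-label trick: train a fully convolutional
# regressor through a global pooling head so only the scalar lesion count
# is supervised, then drop the pooling at test time to obtain a spatial
# localization map. Layer sizes are illustrative; the paper works in 3D.
body = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1))

x = torch.randn(2, 1, 64, 64)
count = body(x).sum(dim=(1, 2, 3))   # training output: predicted count
heatmap = body(x)                    # test-time output: localization map
print(count.shape, heatmap.shape)    # torch.Size([2]) torch.Size([2, 1, 64, 64])
```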

    Universal 3D Wearable Fingerprint Targets: Advancing Fingerprint Reader Evaluations

    Joshua J. Engelsma, Sunpreet S. Arora, Anil K. Jain, Nicholas G. Paulter Jr
    Comments: 14 pages, 14 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present the design and manufacturing of high fidelity universal 3D
    fingerprint targets, which can be imaged on a variety of fingerprint sensing
    technologies, namely capacitive, contact-optical, and contactless-optical.
    Universal 3D fingerprint targets enable, for the first time, not only a
    repeatable and controlled evaluation of fingerprint readers, but also the
    ability to conduct fingerprint reader interoperability studies. Fingerprint
    reader interoperability refers to how robust fingerprint recognition systems
    are to variations in the images acquired by different types of fingerprint
    readers. To build universal 3D fingerprint targets, we adopt a molding and
    casting framework consisting of (i) digital mapping of fingerprint images to a
    negative mold, (ii) CAD modeling a scaffolding system to hold the negative
    mold, (iii) fabricating the mold and scaffolding system with a high resolution
    3D printer, (iv) producing or mixing a material with similar electrical,
    optical, and mechanical properties to that of the human finger, and (v)
    fabricating a 3D fingerprint target using controlled casting. Our experiments
    conducted with PIV and Appendix F certified optical (contact and contactless)
    and capacitive fingerprint readers demonstrate the usefulness of universal 3D
    fingerprint targets for controlled and repeatable fingerprint reader
    evaluations and also fingerprint reader interoperability studies.

    Better Text Understanding Through Image-To-Text Transfer

    Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Generic text embeddings are successfully used in a variety of tasks. However,
    they are often learnt by capturing the co-occurrence structure from pure text
    corpora, which limits their ability to generalize. In this
    paper, we explore models that incorporate visual information into the text
    representation. Based on comprehensive ablation studies, we propose a
    conceptually simple, yet well performing architecture. It outperforms previous
    multimodal approaches on a set of well established benchmarks. We also improve
    the state-of-the-art results for image-related text datasets, using orders of
    magnitude less data.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% of accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are both photorealistic, distinct, and appear to depict the
    same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.
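
    The pairwise sampling scheme is easy to state in code. A minimal sketch
    follows, assuming a generator G that consumes a concatenated latent vector;
    the latent dimensions are illustrative.

        # Each generator sample shares one identity code across two
        # observation codes, yielding two images of the "same" identity for
        # the Siamese discriminator (dimensions are illustrative).
        import torch

        def sample_pair(G, batch, id_dim=50, obs_dim=50):
            z_id = torch.randn(batch, id_dim)       # shared identity code
            z_o1 = torch.randn(batch, obs_dim)      # observation code 1
            z_o2 = torch.randn(batch, obs_dim)      # observation code 2
            x1 = G(torch.cat([z_id, z_o1], dim=1))
            x2 = G(torch.cat([z_id, z_o2], dim=1))
            return x1, x2   # image pair fed to the Siamese discriminator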


    Artificial Intelligence

    Knowledge Acquisition, Representation & Manipulation in Decision Support Systems

    M. Michalewicz, S.T. Wierzchoń, M.A. Kłopotek
    Comments: Intelligent Information Systems: Proceedings of a Workshop held in Augustów, Poland, 7–11 June 1993, pages 210–238
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we present a methodology and discuss some implementation issues
    for a project on statistical/expert approach to data analysis and knowledge
    acquisition. We discuss some general assumptions underlying the project.
    Further, the requirements for a user-friendly computer assistant are specified
    along with the nature of the tools aiding the researcher. Next, we show some
    aspects of the belief network approach and the Dempster-Shafer (DST)
    methodology as introduced in practice in the SEAD system. Specifically, we
    present the application of DST methodology to the belief revision problem.
    Finally, we present a concept of an interface to probabilistic and DS belief
    networks that enables a user to understand the communication with a
    belief-network-based reasoning system.

    Thinking Fast and Slow with Deep Learning and Tree Search

    Thomas Anthony, Zheng Tian, David Barber
    Subjects: Artificial Intelligence (cs.AI)

    Solving sequential decision making problems, such as text parsing, robotic
    control, and game playing, requires a combination of planning policies and
    generalisation of those plans. In this paper, we present Expert Iteration, a
    novel algorithm which decomposes the problem into separate planning and
    generalisation tasks. Planning new policies is performed by tree search, while
    a deep neural network generalises those plans. In contrast, standard Deep
    Reinforcement Learning algorithms rely on a neural network not only to
    generalise plans, but to discover them too. We show that our method
    substantially outperforms Policy Gradients in the board game Hex, winning 84.4%
    of games against it when trained for equal time.
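
    The division of labour between the tree-search "expert" and the
    neural-network "apprentice" can be summarised in a schematic loop. The
    sketch below is pseudocode-level Python under assumed interfaces
    (tree_search, env, policy_net.fit), not the authors' implementation.

        def expert_iteration(policy_net, tree_search, env, n_iters, n_games):
            for _ in range(n_iters):
                dataset = []
                for _ in range(n_games):
                    state = env.reset()
                    while not env.done(state):
                        # Expert: tree search, guided by the current network,
                        # produces an improved move distribution.
                        expert_policy = tree_search(state, prior=policy_net)
                        dataset.append((state, expert_policy))
                        state = env.play(state, expert_policy.sample())
                # Apprentice: the network generalises the expert's plans by
                # imitating the collected search targets.
                policy_net.fit(dataset)
            return policy_net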

    Reinforcement Learning with a Corrupted Reward Channel

    Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
    Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
    Subjects: Artificial Intelligence (cs.AI)

    No real-world reward function is perfect. Sensory errors and software bugs
    may result in RL agents observing higher (or lower) rewards than they should.
    For example, a reinforcement learning agent may prefer states where a sensory
    error gives it the maximum reward, but where the true reward is actually small.
    We formalise this problem as a generalised Markov Decision Problem called
    Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under
    strong simplifying assumptions and when trying to compensate for the possibly
    corrupt rewards. Two ways around the problem are investigated. First, by giving
    the agent richer data, such as in inverse reinforcement learning and
    semi-supervised reinforcement learning, reward corruption stemming from
    systematic sensory errors may sometimes be completely managed. Second, by using
    randomisation to blunt the agent’s optimisation, reward corruption can be
    partially managed under some assumptions.
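
    A toy illustration of a corrupt reward channel, assuming a hypothetical
    environment interface (not from the paper): in designated states the
    observed reward differs from the true reward, so an agent that maximises
    observed reward is systematically misled.

        class CorruptRewardEnv:
            """Wraps an environment so some states report a fake reward."""
            def __init__(self, env, corrupt_states, fake_reward=1.0):
                self.env = env
                self.corrupt_states = corrupt_states
                self.fake_reward = fake_reward

            def step(self, action):
                state, true_reward, done = self.env.step(action)
                observed = (self.fake_reward if state in self.corrupt_states
                            else true_reward)
                return state, observed, done   # true_reward is never exposed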

    Explaining Transition Systems through Program Induction

    Svetlin Penkov, Subramanian Ramamoorthy
    Subjects: Artificial Intelligence (cs.AI)

    Explaining and reasoning about processes which underlie observed black-box
    phenomena enables the discovery of causal mechanisms, derivation of suitable
    abstract representations and the formulation of more robust predictions. We
    propose to learn high level functional programs in order to represent abstract
    models which capture the invariant structure in the observed data. We introduce
    the \(\pi\)-machine (program-induction machine) — an architecture able to induce
    interpretable LISP-like programs from observed data traces. We propose an
    optimisation procedure for program learning based on backpropagation, gradient
    descent and A* search. We apply the proposed method to three problems: system
    identification of dynamical systems, explaining the behaviour of a DQN agent
    and learning by demonstration in a human-robot interaction scenario. Our
    experimental results show that the \(\pi\)-machine can efficiently induce
    interpretable programs from individual data traces.

    Enhanced Experience Replay Generation for Efficient Reinforcement Learning

    Vincent Huang, Tobias Ley, Martha Vlachou-Konchylaki, Wenfeng Hu
    Subjects: Artificial Intelligence (cs.AI)

    Applying deep reinforcement learning (RL) on real systems suffers from slow
    data sampling. We propose an enhanced generative adversarial network (EGAN) to
    initialize an RL agent in order to achieve faster learning. The EGAN utilizes
    the relation between states and actions to enhance the quality of data samples
    generated by a GAN. Pre-training the agent with the EGAN yields a steeper
    learning curve, with a roughly 20% improvement in training time at the
    beginning of learning compared to no pre-training, and about a 5%
    improvement with smaller variance compared to pre-training with a plain GAN.
    For real-time systems with sparse and slow data sampling, the EGAN could be
    used to speed up the early phases of the training process.

    XOR-Sampling for Network Design with Correlated Stochastic Events

    Xiaojian Wu, Yexiang Xue, Bart Selman, Carla P. Gomes
    Comments: In Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
    Subjects: Artificial Intelligence (cs.AI)

    Many network optimization problems can be formulated as stochastic network
    design problems in which edges are present or absent stochastically.
    Furthermore, protective actions can guarantee that edges will remain present.
    We consider the problem of finding the optimal protection strategy under a
    budget limit in order to maximize some connectivity measures of the
    network. Previous approaches rely on the assumption that edges are independent.
    In this paper, we consider a more realistic setting where multiple edges are
    not independent due to natural disasters or regional events that make the
    states of multiple edges stochastically correlated. We use Markov Random Fields
    to model the correlation and define a new stochastic network design framework.
    We provide a novel algorithm based on Sample Average Approximation (SAA)
    coupled with a Gibbs or XOR sampler. Experimental results on real road
    network data show that the policies produced by SAA with the XOR sampler
    have higher quality and lower variance than those produced by SAA with the
    Gibbs sampler.

    Logical Learning Through a Hybrid Neural Network with Auxiliary Inputs

    Fang Wan, Chaoyang Song
    Comments: 11 pages, 9 figures, 4 tables
    Subjects: Artificial Intelligence (cs.AI)

    The human reasoning process is seldom a one-way process from an input leading
    to an output. Instead, it often involves a systematic deduction by ruling out
    other possible outcomes as a self-checking mechanism. In this paper, we
    describe the design of a hybrid neural network for logical learning that
    mimics human reasoning through the introduction of an auxiliary input,
    namely the indicators, which act as hints to suggest logical outcomes. We
    generate these indicators by digging into the hidden information buried
    underneath the original training data for direct or indirect suggestions. We
    used the MNIST data to demonstrate the design and use of these indicators in a
    convolutional neural network. We trained a series of such hybrid neural
    networks with variations of the indicators. Our results show that these hybrid
    neural networks are very robust in generating logical outcomes with inherently
    higher prediction accuracy than the direct use of the original input and output
    in apparent models. Such improved predictability with reassured logical
    confidence is obtained through the exhaustion of all possible indicators to
    rule out all illogical outcomes, which is not available in the apparent models.
    Our logical learning process can effectively cope with the unknown unknowns
    using a full exploitation of all existing knowledge available for learning. The
    design and implementation of the hints, namely the indicators, become an
    essential part of artificial intelligence for logical learning. We also
    introduce an ongoing application setup for this hybrid neural network in an
    autonomous grasping robot, namely as_DeepClaw, aiming at learning an optimized
    grasping pose through logical learning.

    Poincaré Embeddings for Learning Hierarchical Representations

    Maximilian Nickel, Douwe Kiela
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representation learning has become an invaluable approach for learning from
    symbolic data such as text and graphs. However, while complex symbolic datasets
    often exhibit a latent hierarchical structure, state-of-the-art methods
    typically learn embeddings in Euclidean vector spaces, which do not account for
    this property. For this purpose, we introduce a new approach for learning
    hierarchical representations of symbolic data by embedding them into hyperbolic
    space — or more precisely into an \(n\)-dimensional Poincaré ball. Due to the
    underlying hyperbolic geometry, this allows us to learn parsimonious
    representations of symbolic data by simultaneously capturing hierarchy and
    similarity. We introduce an efficient algorithm to learn the embeddings based
    on Riemannian optimization and show experimentally that Poincaré embeddings
    outperform Euclidean embeddings significantly on data with latent hierarchies,
    both in terms of representation capacity and in terms of generalization
    ability.
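
    For reference, a minimal numpy sketch of the two formulas at the core of the
    method, the Poincaré distance and the conversion of a Euclidean gradient
    into a Riemannian one, follows; the training loop itself is omitted.

        import numpy as np

        def poincare_distance(u, v):
            # d(u,v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))
            sq = np.sum((u - v) ** 2)
            denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
            return np.arccosh(1.0 + 2.0 * sq / denom)

        def riemannian_grad(theta, euclidean_grad):
            # The ball's metric rescales gradients by ((1-||theta||^2)^2)/4.
            return ((1.0 - np.sum(theta ** 2)) ** 2 / 4.0) * euclidean_grad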

    Living Together: Mind and Machine Intelligence

    Neil D. Lawrence
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we consider the nature of the machine intelligences we have
    created in the context of our human intelligence. We suggest that the
    fundamental difference between human and machine intelligence comes down to
    embodiment factors. We define an embodiment factor as the ratio between an
    entity’s ability to communicate information and its ability to compute
    information. We
    speculate on the role of embodiment factors in driving our own intelligence and
    consciousness. We briefly review dual process models of cognition and cast
    machine intelligence within that framework, characterising it as a dominant
    System Zero, which can drive behaviour through interfacing with us
    subconsciously. Driven by concerns about the consequence of such a system we
    suggest prophylactic courses of action that could be considered. Our main
    conclusion is that it is not sentient intelligence we should fear but
    non-sentient intelligence.

    Compatible extensions and consistent closures: a fuzzy approach

    Irina Georgescu
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, \(\ast\)-compatible extensions of fuzzy relations are studied,
    generalizing some results obtained by Duggan for crisp relations. From this
    general result, fuzzy versions of some important extension theorems for
    crisp relations (Szpilrajn, Hansson, Suzumura) are obtained as particular
    cases. Two notions of consistent closure of a fuzzy relation are introduced.

    Symbolic LTLf Synthesis

    Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

    LTLf synthesis is the process of finding a strategy that satisfies a linear
    temporal specification over finite traces. An existing solution to this problem
    relies on a reduction to a DFA game. In this paper, we propose a symbolic
    framework for LTLf synthesis based on this technique, by performing the
    computation over a representation of the DFA as a boolean formula rather than
    as an explicit graph. This approach enables strategy generation by utilizing
    the mechanism of boolean synthesis. We implement this symbolic synthesis method
    in a tool called Syft, and demonstrate by experiments on scalable benchmarks
    that the symbolic approach scales better than the explicit one.

    Continual Learning in Generative Adversarial Nets

    Ari Seff, Alex Beatson, Daniel Suo, Han Liu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Developments in deep generative models have allowed for tractable learning of
    high-dimensional data distributions. While the employed learning procedures
    typically assume that training data is drawn i.i.d. from the distribution of
    interest, it may be desirable to model distinct distributions which are
    observed sequentially, such as when different classes are encountered over
    time. Although conditional variations of deep generative models permit multiple
    distributions to be modeled by a single network in a disentangled fashion, they
    are susceptible to catastrophic forgetting when the distributions are
    encountered sequentially. In this paper, we adapt recent work in reducing
    catastrophic forgetting to the task of training generative adversarial networks
    on a sequence of distinct distributions, enabling continual generative
    modeling.

    Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

    Talha Qaiser, Abhik Mukherjee, Chaitanya Reddy Pb, Sai Dileep Munugoti, Vamsi Tallam, Tomi Pitkäaho, Taina Lehtimäki, Thomas Naughton, Matt Berseth, Aníbal Pedraza, Ramakrishnan Mukundan, Matthew Smith, Abhir Bhalerao, Erik Rodner, Marcel Simon, Joachim Denzler, Chao-Hui Huang, Gloria Bueno, David Snead, Ian Ellis, Mohammad Ilyas, Nasir Rajpoot
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Evaluating expression of the Human epidermal growth factor receptor 2 (Her2)
    by visual examination of immunohistochemistry (IHC) on invasive breast cancer
    (BCa) is a key part of the diagnostic assessment of BCa due to its recognised
    importance as a predictive and prognostic marker in clinical practice. However,
    visual scoring of Her2 is subjective and consequently prone to inter-observer
    variability. Given the prognostic and therapeutic implications of Her2 scoring,
    a more objective method is required. In this paper, we report on a recent
    automated Her2 scoring contest, held in conjunction with the annual PathSoc
    meeting held in Nottingham in June 2016, aimed at systematically comparing and
    advancing the state-of-the-art Artificial Intelligence (AI) based automated
    methods for Her2 scoring. The contest dataset comprised digitised whole
    slide images (WSI) of sections from 86 cases of invasive breast carcinoma
    stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting
    algorithms automatically predicted scores of the IHC slides for an unseen
    subset of the dataset and the predicted scores were compared with the ‘ground
    truth’ (a consensus score from at least two experts). We also report on a
    simple Man vs Machine contest for the scoring of Her2 and show that the
    automated methods could beat the pathology experts on this contest dataset.
    This paper presents a benchmark for comparing the performance of automated
    algorithms for scoring of Her2. It also demonstrates the enormous potential of
    automated algorithms in assisting the pathologist with objective IHC scoring.

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework generalizes previous proposals by enabling hard or
    soft sharing of all combinations of subspaces, layers, and skip connections.
    We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
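
    A minimal sketch of the central mechanism, trainable parameters that control
    how much two task-specific columns share at a given layer, is shown below in
    PyTorch; it is illustrative, not the authors' code.

        import torch
        import torch.nn as nn

        class SluiceLayer(nn.Module):
            def __init__(self, dim):
                super().__init__()
                self.task_a = nn.Linear(dim, dim)
                self.task_b = nn.Linear(dim, dim)
                # Trainable 2x2 mixing matrix, initialised at the identity
                # (i.e. no sharing at the start of training).
                self.alpha = nn.Parameter(torch.eye(2))

            def forward(self, h_a, h_b):
                out_a = torch.relu(self.task_a(h_a))
                out_b = torch.relu(self.task_b(h_b))
                # Each task's next input is a learned mixture of both columns.
                mix_a = self.alpha[0, 0] * out_a + self.alpha[0, 1] * out_b
                mix_b = self.alpha[1, 0] * out_a + self.alpha[1, 1] * out_b
                return mix_a, mix_b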

    Effective injury prediction in professional soccer with GPS data and machine learning

    Alessio Rossi, Luca Pappalardo, Paolo Cintia, Marcello Iaia, Javier Fernandez, Daniel Medina
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Applications (stat.AP)

    Injuries have a great impact on professional soccer, due to their large
    influence on team performance and the considerable costs of rehabilitation for
    players. Existing studies in the literature provide just a preliminary
    understanding of which factors mostly affect injury risk, while an evaluation
    of the potential of statistical models in forecasting injuries is still
    missing. In this paper, we propose a multidimensional approach to injury
    prediction in professional soccer which is based on GPS measurements and
    machine learning. By using GPS tracking technology, we collect data describing
    the training workload of players in a professional soccer club during a season.
    We show that our injury predictors are both accurate and interpretable by
    providing a set of case studies of interest to soccer practitioners. Our
    approach opens a novel perspective on injury prevention, providing a set of
    simple and practical rules for evaluating and interpreting the complex
    relations between injury risk and training performance in professional soccer.

    Detection Algorithms for Communication Systems Using Deep Learning

    Nariman Farsad, Andrea Goldsmith
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    The design and analysis of communication systems typically rely on the
    development of mathematical models that describe the underlying communication
    channel, which dictates the relationship between the transmitted and the
    received signals. However, in some systems, such as molecular communication
    systems where chemical signals are used for transfer of information, it is not
    possible to accurately model this relationship. In these scenarios, because of
    the lack of mathematical channel models, a completely new approach to design
    and analysis is required. In this work, we focus on one important aspect of
    communication systems, the detection algorithms, and demonstrate that by
    borrowing tools from deep learning, it is possible to train detectors that
    perform well, without any knowledge of the underlying channel models. We
    evaluate these algorithms using experimental data that is collected by a
    chemical communication platform, where the channel model is unknown and
    difficult to model analytically. We show that deep learning algorithms perform
    significantly better than a simple detector that was used in previous works,
    which also did not assume any knowledge of the channel.
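
    The core recipe, training a detector purely from transmitted/received pairs
    with no channel model, reduces to supervised classification. A minimal
    scikit-learn sketch under assumed data shapes:

        from sklearn.neural_network import MLPClassifier

        # X_train: windows of received-signal features recorded over the real
        # channel; y_train: the symbols that were actually transmitted.
        def train_detector(X_train, y_train):
            detector = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
            detector.fit(X_train, y_train)
            return detector

        def detect(detector, X_received):
            return detector.predict(X_received)   # recovered symbols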

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic, distinct from each other, and appear
    to depict the same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.

    Quantifying the relation between performance and success in soccer

    Luca Pappalardo, Paolo Cintia
    Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    The availability of massive data about sports activities offers nowadays the
    opportunity to quantify the relation between performance and success. In this
    work, we analyze more than 6,000 games and 10 million events in the six major
    European leagues and investigate the relation between team performance and
    success in soccer competitions. We discover that a team’s success in the
    national tournament is significantly related to its typical performance.
    Moreover, we observe that while victory and defeats can be explained by the
    team’s performance during a game, draws are difficult to describe with a
    machine learning approach. We then perform a simulation of an entire season of
    the six leagues where the outcome of every game is replaced by a synthetic
    outcome (victory, defeat, or draw) based on a machine learning model trained on
    the previous seasons. We find that the final rankings in the simulated
    tournaments are close to the actual rankings in the real tournaments,
    suggesting that a complex systems’ view on soccer has the potential of
    revealing hidden patterns regarding the relation between performance and
    success.


    Information Retrieval

    Increasing Papers' Discoverability with Precise Semantic Labeling: the sci.AI Platform

    Roman Gurinovich, Alexander Pashuk, Yuriy Petrovskiy, Alex Dmitrievskij, Oleg Kuryan, Alexei Scerbacov, Antonia Tiggre, Elena Moroz, Yuri Nikolsky
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of published findings in biomedicine increases continually. At the
    same time, the specifics of the domain’s terminology complicate the task of
    retrieving relevant publications. In this research, we investigate the
    influence of term variability and ambiguity on a paper’s likelihood of being
    retrieved. We present statistics that demonstrate the significance of the
    issue and its challenges, and then present the sci.AI platform, which
    enables precise term labeling as a resolution.

    Music Playlist Continuation by Learning from Hand-Curated Examples and Song Features

    Andreu Vall, Hamid Eghbal-zadeh, Matthias Dorfer, Markus Schedl
    Subjects: Information Retrieval (cs.IR)

    Automated music playlist generation is a specific form of music
    recommendation. Generally stated, the user receives a set of song suggestions
    defining a coherent listening session. We hypothesize that the best way to
    convey such playlist coherence to new recommendations is by learning it from
    actual curated examples, in contrast to imposing ad hoc constraints. Examples
    of thoroughly curated playlists are rather scarce, especially compared to the
    amount of listening logs available (e.g., in music streaming services).
    Collaborative filtering methods can be used to capture underlying patterns in
    hand-curated playlists, but their performance is severely affected by the size
    of the data and the low number of song observations. We propose an alternative
    method based on a song-to-playlist classifier, which learns the underlying
    structure from actual playlist examples, while leveraging song features based
    on audio, social tags and independent listening logs. Experiments on two
    datasets of hand-curated playlists show competitive performance compared to
    collaborative filtering when enough training data is available and also more
    robust performance in cold-start situations. For example, both methods achieve
    a recall@100 of roughly 35% for songs observed 5 or more times in the training
    set, whereas the proposed model achieves a recall@100 of roughly 15% for songs
    observed 4 or fewer times in the training set, compared to the 3% achieved by
    collaborative filtering.

    Reference String Extraction Using Line-Based Conditional Random Fields

    Martin Körner
    Comments: 5 pages, preprint
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    The extraction of individual reference strings from the reference section of
    scientific publications is an important step in the citation extraction
    pipeline. Current approaches divide this task into two steps by first detecting
    the reference section areas and then grouping the text lines in such areas into
    reference strings. We propose a classification model that considers every line
    in a publication as a potential part of a reference string. By applying
    line-based conditional random fields rather than constructing the graphical
    model based on the individual words, dependencies and patterns that are typical
    in reference sections provide strong features while the overall complexity of
    the model is reduced.
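
    A minimal sketch of line-based CRF labelling using the third-party
    sklearn-crfsuite package; the per-line features and label scheme below are
    illustrative assumptions, not the paper's feature set.

        import sklearn_crfsuite

        def line_features(lines, i):
            line = lines[i]
            return {
                "starts_with_bracket": line.lstrip().startswith("["),
                "contains_year": any(t.isdigit() and len(t) == 4
                                     for t in line.split()),
                "num_commas": line.count(","),
                "is_first_line": i == 0,
            }

        def to_features(doc_lines):
            return [line_features(doc_lines, i) for i in range(len(doc_lines))]

        # docs: list of documents (each a list of text lines);
        # labels: per-line tag sequences, e.g. "B-REF" / "I-REF" / "O".
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
        # crf.fit([to_features(d) for d in docs], labels)
        # predicted = crf.predict([to_features(new_doc)])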

    TwiInsight: Discovering Topics and Sentiments from Social Media Datasets

    Zhengkui Wang, Guangdong Bai, Soumyadeb Chowdhury, Quanqing Xu, Zhi Lin Seow
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Social media platforms contain a great wealth of information which provides
    opportunities for us to explore hidden patterns or unknown correlations, and
    understand people’s satisfaction with what they are discussing. As one
    showcase, in this paper, we present a system, TwiInsight which explores the
    insight of Twitter data. Different from other Twitter analysis systems,
    TwiInsight automatically extracts the popular topics under different categories
    (e.g., healthcare, food, technology, sports and transport) discussed in Twitter
    via topic modeling and also identifies the correlated topics across different
    categories. Additionally, it discovers people’s opinions on the tweets and
    topics via sentiment analysis. The system also employs an intuitive and
    informative visualization to show the uncovered insights. Furthermore, we
    develop and compare six of the most popular algorithms – three for sentiment
    analysis and three for topic modeling.

    Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

    Arman Cohan, Nazli Goharian
    Comments: SIGIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Citation texts are sometimes not very informative or in some cases inaccurate
    by themselves; they need the appropriate context from the referenced paper to
    reflect its exact contributions. To address this problem, we propose an
    unsupervised model that uses distributed representation of words as well as
    domain knowledge to extract the appropriate context from the reference paper.
    Evaluation results show the effectiveness of our model by significantly
    outperforming the state-of-the-art. We furthermore demonstrate how an effective
    contextualization method results in improving citation-based summarization of
    the scientific articles.
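
    One simple way to apply distributed word representations to this task (an
    illustrative sketch, not the authors' model) is to rank the reference
    paper's sentences by embedding similarity to the citation text:

        import numpy as np

        def sentence_vector(sentence, word_vectors, dim):
            vecs = [word_vectors[w] for w in sentence.split()
                    if w in word_vectors]
            return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

        def rank_context(citation_text, reference_sentences, word_vectors,
                         dim=300):
            q = sentence_vector(citation_text, word_vectors, dim)
            def cos(a, b):
                n = np.linalg.norm(a) * np.linalg.norm(b)
                return float(a @ b / n) if n else 0.0
            scored = [(cos(q, sentence_vector(s, word_vectors, dim)), s)
                      for s in reference_sentences]
            return sorted(scored, reverse=True)   # best candidates first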


    Computation and Language

    Deep Learning of Grammatically-Interpretable Representations Through Question-Answering

    Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng
    Subjects: Computation and Language (cs.CL)

    We introduce an architecture in which internal representations, learned by
    end-to-end optimization in a deep neural network performing a textual
    question-answering task, can be interpreted using basic concepts from
    linguistic theory. This interpretability comes at a cost of only a few
    percentage-point reduction in accuracy relative to the original model on which
    the new one is based (BiDAF [1]). The internal representation that is
    interpreted is a Tensor Product Representation: for each input word, the model
    selects a symbol to encode the word, and a role in which to place the symbol,
    and binds the two together. The selection is via soft attention. The overall
    interpretation is built from interpretations of the symbols, as recruited by
    the trained model, and interpretations of the roles as used by the model. We
    find support for our initial hypothesis that symbols can be interpreted as
    lexical-semantic word meanings, while roles can be interpreted as
    approximations of grammatical roles (or categories) such as subject, wh-word,
    determiner, etc. Through extremely detailed, fine-grained analysis, we find
    specific correspondences between the learned roles and parts of speech as
    assigned by a standard parser [2], and find several discrepancies in the
    model’s favor. In this sense, the model learns significant aspects of grammar,
    after having been exposed solely to linguistically unannotated text, questions,
    and answers: no prior linguistic knowledge is given to the model. What is given
    is the means to represent using symbols and roles and an inductive bias
    favoring use of these in an approximately discrete manner.

    Better Text Understanding Through Image-To-Text Transfer

    Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Generic text embeddings are successfully used in a variety of tasks. However,
    they are often learnt by capturing the co-occurrence structure from pure text
    corpora, which limits their ability to generalize. In this
    paper, we explore models that incorporate visual information into the text
    representation. Based on comprehensive ablation studies, we propose a
    conceptually simple, yet well performing architecture. It outperforms previous
    multimodal approaches on a set of well established benchmarks. We also improve
    the state-of-the-art results for image-related text datasets, using orders of
    magnitude less data.

    Local Monotonic Attention Mechanism for End-to-End Speech Recognition

    Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
    Comments: 12 pages, 2 figures
    Subjects: Computation and Language (cs.CL)

    Recently, sequence-to-sequence models using encoder-decoder neural networks
    have gained popularity for automatic speech recognition (ASR). The
    architecture commonly uses an attention mechanism which allows the model to
    learn alignments between the source speech sequence and the target text
    sequence. Most attention mechanisms used today are based on a global
    attention property, which requires computing a weighted summarization of the
    whole input sequence generated by the encoder states. However, this is
    computationally expensive, often produces misalignments on longer input
    sequences, and does not fit the monotonic, left-to-right nature of the
    speech recognition task. In this paper, we propose a novel attention
    mechanism that has local and monotonic properties. Various ways to control
    those properties are also explored. Experimental results demonstrate that
    encoder-decoder based ASR with local monotonic attention achieves
    significant performance improvements and reduces computational complexity
    compared with the standard global attention architecture.
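
    A minimal sketch of attention that is both local and monotonic, restricting
    the weights to a window around a centre position that can only move forward;
    the paper explores several ways to predict and constrain this position, so
    the hard one-step advance below is an illustrative simplification.

        import torch
        import torch.nn.functional as F

        def local_monotonic_attention(enc, prev_center, query, window=3):
            # enc: (T, H) encoder states; query: (H,) decoder state.
            T = enc.size(0)
            center = min(prev_center + 1, T - 1)   # may only move forward
            lo, hi = max(0, center - window), min(T, center + window + 1)
            local = enc[lo:hi]                     # (W, H) window of states
            weights = F.softmax(local @ query, dim=0)
            context = weights @ local              # (H,) attention context
            return context, center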

    Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

    Arman Cohan, Nazli Goharian
    Comments: SIGIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Citation texts are sometimes not very informative or in some cases inaccurate
    by themselves; they need the appropriate context from the referenced paper to
    reflect its exact contributions. To address this problem, we propose an
    unsupervised model that uses distributed representation of words as well as
    domain knowledge to extract the appropriate context from the reference paper.
    Evaluation results show the effectiveness of our model by significantly
    outperforming the state-of-the-art. We furthermore demonstrate how an effective
    contextualization method results in improving citation-based summarization of
    the scientific articles.

    Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

    Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz
    Comments: In submission to PLOS One
    Subjects: Computation and Language (cs.CL)

    Over the past century, personality theory and research have successfully
    identified core sets of characteristics that consistently describe and explain
    fundamental differences in the way people think, feel and behave. Such
    characteristics were derived through theory, dictionary analyses, and survey
    research using explicit self-reports. The availability of social media data
    spanning millions of users now makes it possible to automatically derive
    characteristics from language use — at large scale. Taking advantage of
    linguistic information available through Facebook, we study the process of
    inferring a new set of potential human traits based on unprompted language use.
    We subject these new traits to a comprehensive set of evaluations and compare
    them with a popular five factor model of personality. We find that our
    language-based trait construct is often more generalizable, in that it
    predicts non-questionnaire-based outcomes better than questionnaire-based
    traits (e.g. entities someone likes, income and intelligence quotient), while
    the factors remain nearly as stable as traditional factors. Our approach
    suggests a value in new constructs of personality derived from everyday human
    language use.

    Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

    Ashwini Jaya Kumar, Camilo Morales, Maria-Esther Vidal, Christoph Schmidt, Sören Auer
    Subjects: Computation and Language (cs.CL)

    With the evolution of neural-network-based methods, the field of automatic
    speech recognition (ASR) has advanced to a level where building an
    application with a speech interface is a reality. In spite of these
    advances, building a real-time speech recogniser faces several problems such
    as low recognition accuracy, domain constraints, and out-of-vocabulary
    words. The low recognition accuracy problem is addressed by improving the
    acoustic model, the language model, the decoder, and by rescoring the N-best
    list at the output of the decoder. We consider the N-best list rescoring
    approach to improve recognition accuracy. Most methods in the literature use
    the grammatical, lexical, syntactic and semantic connections between the
    words in a recognised sentence as features for rescoring. In this paper, we
    instead use the semantic relatedness between the words in a sentence to
    rescore the N-best list. Semantic relatedness is computed using TransE
    (Bordes et al., 2013), a method for low-dimensional embedding of triples in
    a knowledge graph. The novelty of the paper is the application of the
    semantic web to automatic speech recognition.
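
    A schematic sketch of such a rescoring pass, assuming entity linking,
    pretrained TransE entity vectors, and an interpolation weight lambda are
    given (all of these are illustrative assumptions):

        import numpy as np

        def relatedness(e1, e2, entity_vecs):
            # Entities close in the TransE embedding space count as related.
            return -np.linalg.norm(entity_vecs[e1] - entity_vecs[e2])

        def rescore(nbest, entity_vecs, link_entities, lam=0.1):
            # nbest: list of (hypothesis_text, asr_score) pairs.
            rescored = []
            for text, asr_score in nbest:
                ents = link_entities(text)   # entities in the hypothesis
                pairs = [(a, b) for i, a in enumerate(ents)
                         for b in ents[i + 1:]]
                sem = (np.mean([relatedness(a, b, entity_vecs)
                                for a, b in pairs]) if pairs else 0.0)
                rescored.append((asr_score + lam * sem, text))
            return sorted(rescored, reverse=True)   # best hypothesis first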

    Increasing Papers' Discoverability with Precise Semantic Labeling: the sci.AI Platform

    Roman Gurinovich, Alexander Pashuk, Yuriy Petrovskiy, Alex Dmitrievskij, Oleg Kuryan, Alexei Scerbacov, Antonia Tiggre, Elena Moroz, Yuri Nikolsky
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of published findings in biomedicine increases continually. At the
    same time, the specifics of the domain’s terminology complicate the task of
    retrieving relevant publications. In this research, we investigate the
    influence of term variability and ambiguity on a paper’s likelihood of being
    retrieved. We present statistics that demonstrate the significance of the
    issue and its challenges, and then present the sci.AI platform, which
    enables precise term labeling as a resolution.

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework generalizes previous proposals by enabling hard or
    soft sharing of all combinations of subspaces, layers, and skip connections.
    We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.

    TwiInsight: Discovering Topics and Sentiments from Social Media Datasets

    Zhengkui Wang, Guangdong Bai, Soumyadeb Chowdhury, Quanqing Xu, Zhi Lin Seow
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Social media platforms contain a great wealth of information which provides
    opportunities for us to explore hidden patterns or unknown correlations, and
    understand people’s satisfaction with what they are discussing. As one
    showcase, in this paper, we present a system, TwiInsight which explores the
    insight of Twitter data. Different from other Twitter analysis systems,
    TwiInsight automatically extracts the popular topics under different categories
    (e.g., healthcare, food, technology, sports and transport) discussed in Twitter
    via topic modeling and also identifies the correlated topics across different
    categories. Additionally, it discovers people’s opinions on the tweets and
    topics via sentiment analysis. The system also employs an intuitive and
    informative visualization to show the uncovered insights. Furthermore, we
    develop and compare six of the most popular algorithms – three for sentiment
    analysis and three for topic modeling.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.


    Distributed, Parallel, and Cluster Computing

    Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers

    Richard Cole, Vijaya Ramachandran
    Comments: To appear in Proceedings of ACM Symp. on Parallel Alg. and Architectures (SPAA) 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We analyze the caching overhead incurred by a class of multithreaded
    algorithms when scheduled by an arbitrary scheduler. We obtain bounds that
    match or improve upon the well-known \(O(Q + S \cdot (M/B))\) caching cost
    for the randomized work stealing (RWS) scheduler, where \(S\) is the number
    of steals, \(Q\) is the sequential caching cost, and \(M\) and \(B\) are the
    cache size and block (or cache line) size, respectively.

    Towards Blockchain-based Auditable Storage and Sharing of IoT Data

    Hossein Shafagh, Anwar Hithnawi, Simon Duquennoy
    Comments: USENIX NSDI’17, poster
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Today the cloud plays a central role in storing, processing, and distributing
    data. Despite contributing to the rapid development of various applications,
    including the IoT, the current centralized storage architecture has led to a
    myriad of isolated data silos and is preventing the full potential of holistic
    data-driven analytics for IoT data. In this abstract, we advocate a
    data-centric design for IoT with focus on resilience, sharing, and auditable
    protection of information. We introduce the initial design of our
    blockchain-based end-to-end encrypted data storage system. We enable secure
    and persistent data management by utilizing the blockchain as an auditable
    access control layer on top of a decentralized storage layer.

    Parallel Accelerated Custom Correlation Coefficient Calculations for Genomics Applications

    Wayne Joubert, James Nance, Sharlee Climer, Deborah Weighill, Daniel Jacobson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    The massive quantities of genomic data being made available through gene
    sequencing techniques are enabling breakthroughs in genomic science in many
    areas such as medical advances in the diagnosis and treatment of diseases.
    Analyzing this data, however, is a computational challenge insofar as the
    computational costs of the relevant algorithms can grow with quadratic, cubic
    or higher complexity, leading to the need for leadership-scale computing. In
    this paper we describe a new approach to calculations of the Custom Correlation
    Coefficient (CCC) between Single Nucleotide Polymorphisms (SNPs) across a
    population, suitable for parallel systems equipped with graphics processing
    units (GPUs) or Intel Xeon Phi processors. We describe the mapping of the
    algorithms to accelerated processors, techniques used for eliminating redundant
    calculations due to symmetries, and strategies for efficient mapping of the
    calculations to many-node parallel systems. Results are presented demonstrating
    high per-node performance and near-ideal parallel scalability with rates of
    more than four quadrillion elementwise comparisons achieved per second on the
    ORNL Titan system. In a companion paper we describe corresponding techniques
    applied to calculations of the Proportional Similarity metric for comparative
    genomics applications.

    Parallel Accelerated Vector Similarity Calculations for Genomics Applications

    Wayne Joubert, James Nance, Deborah Weighill, Daniel Jacobson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    The surge in availability of genomic data holds promise for enabling
    determination of genetic causes of observed individual traits, with
    applications to problems such as discovery of the genetic roots of phenotypes,
    be they molecular phenotypes such as gene expression or metabolite
    concentrations, or complex phenotypes such as diseases. However, the growing
    sizes of these datasets and the quadratic, cubic or higher scaling
    characteristics of the relevant algorithms pose a serious computational
    challenge necessitating use of leadership scale computing. In this paper we
    describe a new approach to performing vector similarity metrics calculations,
    suitable for parallel systems equipped with graphics processing units (GPUs) or
    Intel Xeon Phi processors. Our primary focus is the Proportional Similarity
    metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide
    Association Studies (PheWAS). We describe the implementation of the algorithms
    on accelerated processors, methods used for eliminating redundant calculations
    due to symmetries, and techniques for efficient mapping of the calculations to
    many-node parallel systems. Results are presented demonstrating high per-node
    performance and parallel scalability with rates of more than five quadrillion
    elementwise comparisons achieved per second on the ORNL Titan system. In a
    companion paper we describe corresponding techniques applied to calculations of
    the Custom Correlation Coefficient for comparative genomics applications.
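
    For orientation, a naive numpy sketch of the Proportional Similarity
    (Czekanowski) metric and of the symmetric all-pairs computation follows; the
    accelerated many-node mapping described above is far more involved.

        import numpy as np

        def proportional_similarity(x, y):
            # S(x, y) = 2 * sum_i min(x_i, y_i) / (sum_i x_i + sum_i y_i)
            return 2.0 * np.minimum(x, y).sum() / (x.sum() + y.sum())

        def all_pairs(vectors):
            n = len(vectors)
            sims = np.zeros((n, n))
            for i in range(n):
                for j in range(i, n):   # exploit symmetry: compute one half
                    s = proportional_similarity(vectors[i], vectors[j])
                    sims[i, j] = sims[j, i] = s
            return sims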

    Distributed Testing of Conductance

    Hendrik Fichtenberger, Yadu Vasudev
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We study the problem of testing conductance in the distributed computing
    model and give a two-sided tester that takes \(\mathcal{O}(\log n)\) rounds
    to decide if a graph has conductance at least \(\Phi\) or is
    \(\epsilon\)-far from having conductance at least \(\Theta(\Phi^2)\) in the
    distributed CONGEST model.
    We also show that \(\Omega(\log n)\) rounds are necessary for testing conductance
    even in the LOCAL model. In the case of a connected graph, we show that we can
    perform the test even when the number of vertices in the graph is not known a
    priori. This is the first two-sided tester in the distributed model we are
    aware of. The key idea in our algorithm is a way to perform a polynomial number
    of random walks from a set of vertices, avoiding the congestion on the edges.

    Transformation of Python Applications into Function-as-a-Service Deployments

    Josef Spillner
    Comments: 14 pages, 2 tables, 5 figures, repeatable, unreviewed
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    New cloud programming and deployment models pose challenges to software
    application engineers who are looking, often in vain, for tools to automate any
    necessary code adaptation and transformation. Function-as-a-Service interfaces
    are particular non-trivial targets when considering that most cloud
    applications are implemented in non-functional languages. Among the most widely
    used of these languages is Python. This starting position calls for an
    automated approach to transform monolithic Python code into modular FaaS units
    by partially automated decomposition. Hence, this paper introduces and
    evaluates Lambada, a Python module to dynamically decompose, convert and deploy
    unmodified Python code into AWS Lambda functions. Beyond the tooling in the
    form of a measured open source prototype implementation, the paper contributes
    a description of the algorithms and code rewriting rules as blueprints for
    transformations of other scripting languages.
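
    To make the target concrete, here is a minimal sketch of the final
    deployment step such a tool must perform: packaging a single-file Python
    handler and registering it as an AWS Lambda function via boto3. This is
    illustrative of the FaaS interface, not Lambada's own decomposition
    machinery, and the role ARN is a placeholder.

        import io
        import zipfile
        import boto3

        def deploy(function_name, source_code, role_arn):
            # Zip a module whose handler(event, context) is the entry point.
            buf = io.BytesIO()
            with zipfile.ZipFile(buf, "w") as zf:
                zf.writestr("lambda_function.py", source_code)
            boto3.client("lambda").create_function(
                FunctionName=function_name,
                Runtime="python3.9",
                Role=role_arn,                       # placeholder IAM role
                Handler="lambda_function.handler",
                Code={"ZipFile": buf.getvalue()},
            )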

    Liquid Cloud Storage

    Michael G. Luby, Roberto Padovani, Thomas J. Richardson, Lorenz Minder, Pooja Aggarwal
    Comments: 44 pages, 21 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Systems and Control (cs.SY)

    A liquid system provides durable object storage based on spreading
    redundantly generated data across a network of hundreds to thousands of
    potentially unreliable storage nodes. A liquid system uses a combination of a
    large code, lazy repair, and a flow storage organization. We show that a liquid
    system can be operated to enable flexible and essentially optimal combinations
    of storage durability, storage overhead, repair bandwidth usage, and access
    performance.

    Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning

    Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi
    Comments: 18 pages
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    Consider a set of agents in a peer-to-peer communication network, where each
    agent has a personal dataset and a personal learning objective. The main
    question addressed in this paper is: how can agents collaborate to improve upon
    their locally learned model without leaking sensitive information about their
    data? Our first contribution is to reformulate this problem so that it can be
    solved by a block coordinate descent algorithm. We obtain an efficient and
    fully decentralized protocol working in an asynchronous fashion. Our second
    contribution is to make our algorithm differentially private to protect against
    the disclosure of any information about personal datasets. We prove convergence
    rates and exhibit the trade-off between utility and privacy. Our experiments
    show that our approach dramatically outperforms previous work in the
    non-private case, and that under privacy constraints we significantly improve
    over purely local models.
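
    A schematic sketch of one asynchronous, noisy update in such a protocol
    (illustrative; the paper's block coordinate descent and privacy calibration
    are more involved): an awake agent mixes its model with a neighbour's and
    takes a perturbed local gradient step.

        import numpy as np

        def private_update(theta_self, theta_neighbor, local_grad,
                           step=0.1, mix=0.5, noise_scale=0.01,
                           rng=np.random.default_rng()):
            # Laplace noise masks the contribution of any single data point.
            noise = rng.laplace(0.0, noise_scale, size=theta_self.shape)
            mixed = mix * theta_self + (1.0 - mix) * theta_neighbor
            return mixed - step * (local_grad + noise)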

    Randomized Composable Coresets for Matching and Vertex Cover

    Sepehr Assadi, Sanjeev Khanna
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    A common approach for designing scalable algorithms for massive data sets is
    to distribute the computation across, say, \(k\) machines and process the data
    using limited communication between them. A particularly appealing framework
    here is the simultaneous communication model whereby each machine constructs a
    small representative summary of its own data and one obtains an
    approximate/exact solution from the union of the representative summaries. If
    the representative summaries needed for a problem are small, then this results
    in a communication-efficient and round-optimal protocol. While many fundamental
    graph problems admit efficient solutions in this model, two prominent problems
    are notably absent from the list of successes, namely, the maximum matching
    problem and the minimum vertex cover problem. Indeed, it was shown recently
    that for both these problems, even achieving a \(\mathrm{polylog}(n)\) approximation
    requires essentially sending the entire input graph from each machine.

    The main insight of our work is that the intractability of matching and
    vertex cover in the simultaneous communication model is inherently connected to
    an adversarial partitioning of the underlying graph across machines. We show
    that when the underlying graph is randomly partitioned across machines, both
    these problems admit randomized composable coresets of size
    \(\widetilde{O}(n)\) that yield an \(\widetilde{O}(1)\)-approximate
    solution. This results in an \(\widetilde{O}(1)\)-approximation simultaneous
    protocol for these problems with \(\widetilde{O}(nk)\) total communication
    when the input is randomly partitioned across \(k\) machines. We further
    prove the optimality of our results. Finally,
    by a standard application of composable coresets, our results also imply
    MapReduce algorithms with the same approximation guarantee in one or two rounds
    of communication.
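
    The randomized composable coreset recipe can be sketched in a few lines with
    networkx (illustrative; a simple matching of each machine's share stands in
    for the paper's coreset construction):

        import random
        import networkx as nx

        def coreset_matching(G, k=4, seed=0):
            random.seed(seed)
            shares = [nx.Graph() for _ in range(k)]
            for u, v in G.edges():
                shares[random.randrange(k)].add_edge(u, v)  # random partition
            union = nx.Graph()
            for share in shares:
                # Each "machine" sends only a matching of its share.
                union.add_edges_from(nx.maximal_matching(share))
            return nx.maximal_matching(union)  # solve on the union of coresets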


    Learning

    Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning

    Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi
    Comments: 18 pages
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    Consider a set of agents in a peer-to-peer communication network, where each
    agent has a personal dataset and a personal learning objective. The main
    question addressed in this paper is: how can agents collaborate to improve upon
    their locally learned model without leaking sensitive information about their
    data? Our first contribution is to reformulate this problem so that it can be
    solved by a block coordinate descent algorithm. We obtain an efficient and
    fully decentralized protocol working in an asynchronous fashion. Our second
    contribution is to make our algorithm differentially private to protect against
    the disclosure of any information about personal datasets. We prove convergence
    rates and exhibit the trade-off between utility and privacy. Our experiments
    show that our approach dramatically outperforms previous work in the
    non-private case, and that under privacy constraints we significantly improve
    over purely local models.
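
    The following Python sketch illustrates the flavor of such a protocol: a noisy,
    clipped collaborative update in which an agent mixes its model with its
    neighbors' and adds Laplace noise before sharing. The averaging rule, clipping,
    and noise calibration here are our simplifying assumptions for illustration,
    not the paper's exact algorithm.

        # Hypothetical sketch of one differentially private collaborative update.
        import numpy as np

        def private_update(w_local, neighbor_models, grad_fn, step=0.1,
                           clip=1.0, epsilon=0.5, rng=np.random.default_rng(0)):
            w_avg = np.mean(neighbor_models, axis=0)      # collaboration term
            g = grad_fn(w_local)                          # local objective gradient
            g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # bound sensitivity
            noise = rng.laplace(scale=clip / epsilon, size=w_local.shape)
            return 0.5 * (w_local + w_avg) - step * (g + noise)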

    Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues

    Noga Alon, Moshe Babaioff, Yannai A. Gonczarowski, Yishay Mansour, Shay Moran, Amir Yehudayoff
    Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)

    In this work we derive a variant of the classic Glivenko-Cantelli Theorem,
    which asserts uniform convergence of the empirical Cumulative Distribution
    Function (CDF) to the CDF of the underlying distribution. Our variant allows
    for tighter convergence bounds for extreme values of the CDF.

    We apply our bound in the context of revenue learning, which is a
    well-studied problem in economics and algorithmic game theory. We derive
    sample-complexity bounds on the uniform convergence rate of the empirical
    revenues to the true revenues, assuming a bound on the (k)th moment of the
    valuations, for any (possibly fractional) (k>1).

    For uniform convergence in the limit, we give a complete characterization and
    a zero-one law: if the first moment of the valuations is finite, then uniform
    convergence almost surely occurs; conversely, if the first moment is infinite,
    then uniform convergence almost never occurs.

    Continuous State-Space Models for Optimal Sepsis Treatment – a Deep Reinforcement Learning Approach

    Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
    Subjects: Learning (cs.LG)

    Sepsis is a leading cause of mortality in intensive care units (ICUs) and
    costs hospitals billions annually. Treating a septic patient is highly
    challenging, because individual patients respond very differently to medical
    interventions and there is no universally agreed-upon treatment for sepsis.
    Understanding more about a patient’s physiological state at a given time could
    hold the key to effective treatment policies. In this work, we propose a new
    approach to deduce optimal treatment policies for septic patients by using
    continuous state-space models and deep reinforcement learning. Learning
    treatment policies over continuous spaces is important, because we retain more
    of the patient’s physiological information. Our model is able to learn
    clinically interpretable treatment policies, similar in important aspects to
    the treatment policies of physicians. Evaluating our algorithm on past ICU
    patient data, we find that our model could reduce patient mortality in the
    hospital by up to 3.6% over observed clinical policies, from a baseline
    mortality of 13.7%. The learned treatment policies could be used to aid
    intensive care clinicians in medical decision making and improve the likelihood
    of patient survival.

    Ridesourcing Car Detection by Transfer Learning

    Leye Wang, Xu Geng, Jintao Ke, Chen Peng, Xiaojuan Ma, Daqing Zhang, Qiang Yang
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Ridesourcing platforms like Uber and Didi are getting more and more popular
    around the world. However, unauthorized ridesourcing activities taking
    advantage of the sharing economy can greatly impair the healthy development of
    this emerging industry. As the first step to regulate on-demand ride services
    and eliminate the black market, we design a method to detect ridesourcing cars from
    a pool of cars based on their trajectories. Since licensed ridesourcing car
    traces are not openly available and may be completely missing in some cities
    due to legal issues, we turn to transferring knowledge from public transport
    open data, i.e., taxis and buses, to ridesourcing detection among ordinary
    vehicles. We propose a two-stage transfer learning framework. In Stage 1, we
    take taxi and bus data as input to learn a random forest (RF) classifier using
    trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we
    use the RF to label all the candidate cars. In Stage 2, leveraging the subset
    of high-confidence labels from the previous stage as input, we further learn a
    convolutional neural network (CNN) classifier for ridesourcing detection, and
    iteratively refine RF and CNN, as well as the feature set, via a co-training
    process. Finally, we use the resulting ensemble of RF and CNN to identify the
    ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus
    traces show that our transfer learning framework, with no need of a pre-labeled
    ridesourcing dataset, can achieve similar accuracy as the supervised learning
    methods.
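
    A minimal sketch of Stage 1 under stated assumptions (trajectory feature
    extraction happens elsewhere; the confidence threshold is illustrative):

        # Train a random forest on public taxi/bus features, then keep only
        # high-confidence labels on the candidate pool for Stage 2.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def stage_one(X_public, y_public, X_candidates, confidence=0.9):
            rf = RandomForestClassifier(n_estimators=200, random_state=0)
            rf.fit(X_public, y_public)
            proba = rf.predict_proba(X_candidates)
            keep = proba.max(axis=1) >= confidence
            return X_candidates[keep], proba[keep].argmax(axis=1), rf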

    Continual Learning in Generative Adversarial Nets

    Ari Seff, Alex Beatson, Daniel Suo, Han Liu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Developments in deep generative models have allowed for tractable learning of
    high-dimensional data distributions. While the employed learning procedures
    typically assume that training data is drawn i.i.d. from the distribution of
    interest, it may be desirable to model distinct distributions which are
    observed sequentially, such as when different classes are encountered over
    time. Although conditional variations of deep generative models permit multiple
    distributions to be modeled by a single network in a disentangled fashion, they
    are susceptible to catastrophic forgetting when the distributions are
    encountered sequentially. In this paper, we adapt recent work in reducing
    catastrophic forgetting to the task of training generative adversarial networks
    on a sequence of distinct distributions, enabling continual generative
    modeling.
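
    One common way to reduce catastrophic forgetting, sketched below in PyTorch,
    is an elastic (EWC-style) penalty that keeps parameters important for earlier
    distributions close to their old values; whether this matches the paper's
    exact formulation is an assumption on our part.

        import torch

        def ewc_penalty(model, old_params, fisher, lam=1000.0):
            # old_params/fisher: per-parameter tensors saved after the previous task.
            loss = 0.0
            for name, p in model.named_parameters():
                loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
            return 0.5 * lam * loss  # added to the usual generator loss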

    Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions

    Aryeh Kontorovich, Sivan Sabato, Roi Weiss
    Subjects: Learning (cs.LG); Statistics Theory (math.ST)

    We examine the Bayes-consistency of a recently proposed
    1-nearest-neighbor-based multiclass learning algorithm. This algorithm is
    derived from sample compression bounds and enjoys the statistical advantages of
    tight, fully empirical generalization bounds, as well as the algorithmic
    advantages of runtime and memory savings. We prove that this algorithm is
    strongly Bayes-consistent in metric spaces with finite doubling dimension —
    the first consistency result for an efficient nearest-neighbor sample
    compression scheme. Rather surprisingly, we discover that this algorithm
    continues to be Bayes-consistent even in a certain infinite-dimensional
    setting, in which the basic measure-theoretic conditions on which classic
    consistency proofs hinge are violated. This is all the more surprising, since
    it is known that k-NN is not Bayes-consistent in this setting. We pose several
    challenging open problems for future research.

    Black-Box Attacks against RNN based Malware Detection Algorithms

    Weiwei Hu, Ying Tan
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)

    Recent research has shown that machine learning based malware detection
    algorithms are very vulnerable to adversarial example attacks. These works
    mainly focused on detection algorithms that use features of fixed dimension,
    while some researchers have begun to use recurrent neural networks (RNN) to
    detect malware based on sequential API features. This paper proposes a
    novel algorithm to generate sequential adversarial examples, which are used to
    attack a RNN based malware detection system. It is usually hard for malicious
    attackers to know the exact structures and weights of the victim RNN. A
    substitute RNN is trained to approximate the victim RNN. Then we propose a
    generative RNN to output sequential adversarial examples from the original
    sequential malware inputs. Experimental results showed that RNN based malware
    detection algorithms fail to detect most of the generated malicious adversarial
    examples, which means the proposed model is able to effectively bypass the
    detection algorithms.

    Consistent Multitask Learning with Nonlinear Output Relations

    Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil
    Comments: 25 pages, 1 figure, 1 table
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Key to multitask learning is exploiting relationships between different tasks
    to improve prediction performance. If the relations are linear, regularization
    approaches can be used successfully. However, in practice assuming the tasks to
    be linearly related might be restrictive, and allowing for nonlinear structures
    is a challenge. In this paper, we tackle this issue by casting the problem
    within the framework of structured prediction. Our main contribution is a novel
    algorithm for learning multiple tasks which are related by a system of
    nonlinear equations that their joint outputs need to satisfy. We show that the
    algorithm is consistent and can be efficiently implemented. Experimental
    results show the potential of the proposed method.

    Semi-Bandits with Knapsacks

    Karthik Abinav Sankararaman, Aleksandrs Slivkins
    Subjects: Learning (cs.LG)

    This paper unifies two lines of work on multi-armed bandits, Bandits with
    Knapsacks (BwK) and semi-bandits. The former concerns scenarios with limited
    “resources” consumed by the algorithm, e.g., limited inventory in a dynamic
    pricing problem. The latter has a huge number of actions, but there is
    combinatorial structure and additional feedback which makes the problem
    tractable. Both lines of work have received considerable recent attention, and
    are supported by numerous application examples. We define a common
    generalization, and design a general algorithm for this model. Our regret rates
    are comparable with those for BwK and semi-bandits in general, and essentially
    optimal for important special cases.

    Learning from partial correction

    Sanjoy Dasgupta, Michael Luby
    Subjects: Learning (cs.LG)

    We introduce a new model of interactive learning in which an expert examines
    the predictions of a learner and partially fixes them if they are wrong.
    Although this kind of feedback is not i.i.d., we show statistical
    generalization bounds on the quality of the learned model.

    Compressing Recurrent Neural Network with Tensor Train

    Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
    Comments: Accepted at IJCNN 2017
    Subjects: Learning (cs.LG)

    Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and
    sequential tasks and achieve state-of-the-art performance on many complex
    problems. However, most of the state-of-the-art RNNs have millions of
    parameters and require many computational resources for training and predicting
    new data. This paper proposes an alternative RNN model to reduce the number of
    parameters significantly by representing the weight parameters based on Tensor
    Train (TT) format. In this paper, we implement the TT-format representation for
    several RNN architectures such as simple RNN and Gated Recurrent Unit (GRU). We
    compare and evaluate our proposed RNN model with uncompressed RNN model on
    sequence classification and sequence prediction tasks. Our proposed RNNs with
    TT-format are able to preserve the performance while reducing the number of RNN
    parameters significantly, by up to a factor of 40.
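
    For intuition, the TT format factors a reshaped weight tensor into a chain of
    small three-way cores. The sketch below (numpy, illustrative ranks; the paper
    trains the cores directly rather than factoring a trained weight) compresses a
    dense tensor with sequential truncated SVDs:

        import numpy as np

        def tt_svd(tensor, ranks):
            dims, cores, r_prev = tensor.shape, [], 1
            mat = tensor.reshape(1, -1)
            for k, d in enumerate(dims[:-1]):
                mat = mat.reshape(r_prev * d, -1)
                U, S, Vt = np.linalg.svd(mat, full_matrices=False)
                r = min(ranks[k], len(S))
                cores.append(U[:, :r].reshape(r_prev, d, r))
                mat = S[:r, None] * Vt[:r]   # carry the remainder forward
                r_prev = r
            cores.append(mat.reshape(r_prev, dims[-1], 1))
            return cores  # contracting the cores approximately reconstructs the tensor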

    Wasserstein Learning of Deep Generative Point Process Models

    Shuai Xiao, Mehrdad Farajtabar, Xiaojing Ye, Junchi Yan, Le Song, Hongyuan Zha
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Point processes are becoming very popular in modeling asynchronous sequential
    data due to their sound mathematical foundation and strength in modeling a
    variety of real-world phenomena. Currently, they are often characterized via an
    intensity function, which limits the model’s expressiveness due to the
    unrealistic parametric assumptions used in practice. Furthermore, they are
    learned via a maximum likelihood approach, which is prone to failure on
    multi-modal distributions of sequences. In this paper, we propose an
    intensity-free approach to point process modeling that transforms nuisance
    processes into a target one. Furthermore, we train the model using a
    likelihood-free approach that leverages the Wasserstein distance between point
    processes.
    Experiments on various synthetic and real-world data substantiate the
    superiority of the proposed point process model over conventional ones.

    Detection Algorithms for Communication Systems Using Deep Learning

    Nariman Farsad, Andrea Goldsmith
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    The design and analysis of communication systems typically rely on the
    development of mathematical models that describe the underlying communication
    channel, which dictates the relationship between the transmitted and the
    received signals. However, in some systems, such as molecular communication
    systems where chemical signals are used for transfer of information, it is not
    possible to accurately model this relationship. In these scenarios, because of
    the lack of mathematical channel models, a completely new approach to design
    and analysis is required. In this work, we focus on one important aspect of
    communication systems, the detection algorithms, and demonstrate that by
    borrowing tools from deep learning, it is possible to train detectors that
    perform well, without any knowledge of the underlying channel models. We
    evaluate these algorithms using experimental data that is collected by a
    chemical communication platform, where the channel model is unknown and
    difficult to model analytically. We show that deep learning algorithms perform
    significantly better than a simple detector that was used in previous works,
    which also did not assume any knowledge of the channel.

    Parallel Stochastic Gradient Descent with Sound Combiners

    Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz
    Comments: 16 pages, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Stochastic gradient descent (SGD) is a well known method for regression and
    classification tasks. However, it is an inherently sequential algorithm: at each
    step, the processing of the current example depends on the parameters learned
    from the previous examples. Prior approaches to parallelizing linear learners
    using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies
    across threads and thus can potentially suffer poor convergence rates and/or
    poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to
    a first-order approximation, retains the sequential semantics of SGD. Each
    thread learns a local model in addition to a model combiner, which allows local
    models to be combined to produce the same result as what a sequential SGD would
    have produced. This paper evaluates SYMSGD’s accuracy and performance on 6
    datasets on a shared-memory machine, showing up to an 11x speedup over our
    heavily optimized sequential baseline on 16 cores, and a 2.2x average speedup
    over HOGWILD!.
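
    The "sound combiner" idea is easiest to see for linear least-squares SGD,
    where each update is affine in the weights; a thread's whole pass can then be
    summarized as a matrix-vector pair and composed exactly. A sketch under that
    assumption (the real SYMSGD additionally projects the matrix to low dimension
    for scalability, which we omit):

        import numpy as np

        def local_pass(X, y, lr):
            # Each step w <- w - lr*(x.w - y)*x equals w <- A_i @ w + b_i,
            # so the whole pass composes to w_out = A @ w_in + b.
            d = X.shape[1]
            A, b = np.eye(d), np.zeros(d)
            for x_i, y_i in zip(X, y):
                step = np.eye(d) - lr * np.outer(x_i, x_i)
                A, b = step @ A, step @ b + lr * y_i * x_i
            return A, b

        def combine(w0, passes):
            w = w0
            for A, b in passes:   # reproduces one long sequential SGD run
                w = A @ w + b
            return w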

    Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

    Tayfun Gokmen, O. Murat Onen, Wilfried Haensch
    Comments: 22 pages, 6 figures, 2 tables
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    In a previous work we have detailed the requirements to obtain a maximal
    performance benefit by implementing fully connected deep neural networks (DNN)
    in the form of arrays of resistive devices for deep learning. Here we extend
    this concept of Resistive Processing Unit (RPU) devices towards convolutional
    neural networks (CNNs). We show how to map the convolutional layers to RPU
    arrays such that the parallelism of the hardware can be fully utilized in all
    three cycles of the backpropagation algorithm. We find that the noise and bound
    limitations imposed by the analog nature of the computations performed on the
    arrays affect the training accuracy of the CNNs. Noise and bound management
    techniques are presented that mitigate these problems without introducing any
    additional complexity in the analog circuits, as they can be addressed by the
    digital circuits. In addition, we discuss digitally programmable update
    management and device variability reduction techniques that can be used
    selectively for some of the layers in a CNN. We show that a combination of all
    these techniques enables a successful application of the RPU concept to
    training CNNs. The techniques discussed here are more general, can be applied
    beyond CNN architectures, and therefore enable the applicability of the RPU
    approach to a large class of neural network architectures.

    Convergence Analysis of Batch Normalization for Deep Neural Nets

    Yintai Ma, Diego Klabjan
    Subjects: Learning (cs.LG)

    Batch normalization (BN) is so effective in accelerating the convergence of
    neural network training that it has become common practice. We propose a
    generalization of BN, the diminishing batch normalization (DBN) algorithm. We
    provide an analysis showing that the DBN algorithm converges to a stationary
    point with respect to the trainable parameters. We analyze a two-layer model
    with linear activation. The main challenge of the analysis is the fact that
    some parameters are updated by gradient while others are not. In the numerical
    experiments, we use models with more layers and ReLU activation. We observe
    that DBN outperforms the original BN algorithm on the MNIST, NI and CIFAR-10
    datasets with reasonably complex FNN and CNN models.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic and distinct, yet appear to depict the
    same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.

    Detecting Adversarial Examples in Deep Networks with Adaptive Noise Reduction

    Bin Liang, Hongcheng Li, Miaoqiang Su, Xirong Li, Wenchang Shi, Xiaofeng Wang
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Deep neural networks (DNNs) play a key role in many applications.
    Unsurprisingly, they have also become a potential attack target for
    adversaries. Some studies have demonstrated that DNN classifiers can be fooled
    by adversarial examples, which are crafted by introducing small perturbations
    into an original sample. Accordingly, some powerful defense techniques have
    been proposed against
    adversarial examples. However, existing defense techniques require modifying
    the target model or depend on the prior knowledge of attack techniques to
    different degrees. In this paper, we propose a straightforward method for
    detecting adversarial image examples. It doesn’t require any prior knowledge of
    attack techniques and can be directly deployed into unmodified off-the-shelf
    DNN models. Specifically, we consider the perturbation to images as a kind of
    noise and introduce two classical image processing techniques, scalar
    quantization and smoothing spatial filter, to reduce its effect. The image
    two-dimensional entropy is employed as a metric to implement an adaptive noise
    reduction for different kinds of images. As a result, the adversarial example
    can be effectively detected by comparing the classification results of a given
    sample and its denoised version. Thousands of adversarial examples, crafted
    with different attack techniques against state-of-the-art DNN models, are used
    to evaluate the proposed method. The experiments show that our
    detection method can achieve an overall recall of 93.73% and an overall
    precision of 95.45% without referring to any prior knowledge of attack
    techniques.
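
    A minimal sketch of the detection rule, with the entropy-based adaptivity
    simplified to fixed parameters (quantization levels and filter size are
    illustrative):

        import numpy as np
        from scipy.ndimage import uniform_filter

        def is_adversarial(image, predict, levels=16, window=3):
            # predict: any callable mapping an image array to a class label.
            q = np.round(image * (levels - 1)) / (levels - 1)  # scalar quantization
            smoothed = uniform_filter(q, size=window)          # smoothing filter
            return predict(image) != predict(smoothed)         # label change => flag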

    Efficient and principled score estimation

    Dougal J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton
    Comments: Full version including appendices
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    We propose a fast method with statistical guarantees for learning an
    exponential family density model where the natural parameter is in a
    reproducing kernel Hilbert space, and may be infinite dimensional. The model is
    learned by fitting the derivative of the log density, the score, thus avoiding
    the need to compute a normalization constant. We improved the computational
    efficiency of an earlier solution with a low-rank, Nyström-like solution. The
    new solution remains consistent, and is shown to converge in Fisher distance at
    the same rate as a full-rank solution, with guarantees on the degree of cost
    and storage reduction. We compare to a popular score learning approach using a
    denoising autoencoder, in experiments on density estimation and in the
    construction of an adaptive Hamiltonian Monte Carlo sampler. Apart from the
    lack of statistical guarantees for the autoencoder, our estimator is more
    data-efficient when estimating the score, runs faster, and has fewer parameters
    (which can be tuned in a principled and interpretable way).

    The Marginal Value of Adaptive Gradient Methods in Machine Learning

    Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Adaptive optimization methods, which perform local optimization with a metric
    constructed from the history of iterates, are becoming increasingly popular for
    training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We
    show that for simple overparameterized problems, adaptive methods often find
    drastically different solutions than gradient descent (GD) or stochastic
    gradient descent (SGD). We construct an illustrative binary classification
    problem where the data is linearly separable, GD and SGD achieve zero test
    error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to
    half. We additionally study the empirical generalization capability of adaptive
    methods on several state-of-the-art deep learning models. We observe that the
    solutions found by adaptive methods generalize worse (often significantly
    worse) than SGD, even when these solutions have better training performance.
    These results suggest that practitioners should reconsider the use of adaptive
    methods to train neural networks.

    Matching neural paths: transfer from recognition to correspondence search

    Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many machine learning tasks require finding per-part correspondences between
    objects. In this work we focus on low-level correspondences – a highly
    ambiguous matching problem. We propose to use a hierarchical semantic
    representation of the objects, coming from a convolutional neural network, to
    solve this ambiguity. Training it for low-level correspondence prediction
    directly might not be an option in some domains where the ground-truth
    correspondences are hard to obtain. We show how transfer from recognition can
    be used to avoid such training. Our idea is to mark parts as “matching” if
    their features are close to each other at all the levels of convolutional
    feature hierarchy (neural paths). Although the overall number of such paths is
    exponential in the number of layers, we propose a polynomial algorithm for
    aggregating all of them in a single backward pass. The empirical validation is
    done on the task of stereo correspondence and demonstrates that we achieve
    competitive results among the methods which do not use labeled target domain
    data.

    Unbiasing Truncated Backpropagation Through Time

    Corentin Tallec, Yann Ollivier
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Truncated Backpropagation Through Time (truncated BPTT) is a widespread
    method for learning recurrent computational graphs. Truncated BPTT keeps the
    computational benefits of Backpropagation Through Time (BPTT) while relieving
    the need for a complete backtrack through the whole data sequence at every
    step. However, truncation favors short-term dependencies: the gradient estimate
    of truncated BPTT is biased, so that it does not benefit from the convergence
    guarantees from stochastic gradient theory. We introduce Anticipated Reweighted
    Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
    benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
    variable truncation lengths together with carefully chosen compensation factors
    in the backpropagation equation. We check the viability of ARTBP on two tasks.
    First, a simple synthetic task where careful balancing of temporal dependencies
    at different scales is needed: truncated BPTT displays unreliable performance,
    and in worst case scenarios, divergence, while ARTBP converges reliably.
    Second, on Penn Treebank character-level language modelling, ARTBP slightly
    outperforms truncated BPTT.
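
    The mechanism behind the compensation factors is the classic
    inverse-probability (Russian-roulette) trick: truncate at a random point, but
    reweight surviving terms so the estimator stays unbiased. A toy illustration
    on a plain sum (the truncation distribution here is our choice, not the
    paper's):

        import numpy as np

        def unbiased_truncated_sum(terms, p=0.5, rng=np.random.default_rng(0)):
            """E[return value] == sum(terms), despite random early stopping."""
            total, weight = 0.0, 1.0
            for t in terms:
                total += weight * t
                if rng.random() < p:     # truncate here
                    break
                weight /= (1.0 - p)      # compensation factor for surviving
            return total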

    Learning to Succeed while Teaching to Fail: Privacy in Closed Machine Learning Systems

    Jure Sokolic, Qiang Qiu, Miguel R. D. Rodrigues, Guillermo Sapiro
    Comments: 14 pages, 1 figure
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Security, privacy, and fairness have become critical in the era of data
    science and machine learning. More and more we see that achieving universally
    secure, private, and fair systems is practically impossible. We have seen for
    example how generative adversarial networks can be used to learn about the
    expected private training data; how the exploitation of additional data can
    reveal private information in the original one; and how seemingly unrelated
    features can teach us about each other. Confronted with this
    challenge, in this paper we open a new line of research, where the security,
    privacy, and fairness are learned and used in a closed environment. The goal is
    to ensure that a given entity (e.g., the company or the government), trusted to
    infer certain information with our data, is blocked from inferring protected
    information from it. For example, a hospital might be allowed to produce
    diagnosis on the patient (the positive task), without being able to infer the
    gender of the subject (negative task). Similarly, a company can guarantee that
    internally it is not using the provided data for any undesired task, an
    important goal that is not contradicting the virtually impossible challenge of
    blocking everybody from the undesired task. We design a system that learns to
    succeed on the positive task while simultaneously failing at the negative one, and
    illustrate this with challenging cases where the positive task is actually
    harder than the negative one being blocked. Fairness, to the information in the
    negative task, is often automatically obtained as a result of this proposed
    approach. The particular framework and examples open the door to security,
    privacy, and fairness in very important closed scenarios, ranging from private
    data accumulation companies like social networks to law-enforcement and
    hospitals.

    Look, Listen and Learn

    Relja Arandjelović, Andrew Zisserman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the question: what can be learnt by looking at and listening to a
    large amount of unlabelled videos? There is a valuable, but so far untapped,
    source of information contained in the video itself — the correspondence
    between the visual and the audio streams, and we introduce a novel
    “Audio-Visual Correspondence” learning task that makes use of this. Training
    visual and audio networks from scratch, without any additional supervision
    other than the raw unconstrained videos themselves, is shown to successfully
    solve this task, and, more interestingly, result in good vision and audio
    representations. These features set the new state-of-the-art on two sound
    classification benchmarks, and perform on par with the state-of-the-art
    self-supervised approaches on ImageNet classification. We also demonstrate that
    the network is able to localize objects in both modalities, as well as perform
    fine-grained recognition tasks.
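
    A sketch of how AVC training pairs can be assembled (the video objects with
    frame_at/audio_clip helpers are hypothetical):

        import random

        def avc_pair(videos, rng=random.Random(0)):
            v = rng.choice(videos)
            t = rng.uniform(0.0, v.duration - 1.0)
            frame = v.frame_at(t)                        # assumed helper
            if rng.random() < 0.5:
                return frame, v.audio_clip(t, 1.0), 1    # corresponding pair
            other = rng.choice([u for u in videos if u is not v])
            t2 = rng.uniform(0.0, other.duration - 1.0)
            return frame, other.audio_clip(t2, 1.0), 0   # mismatched pair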

    Visualizing LSTM decisions

    Jos van der Westhuizen, Joan Lasenby
    Comments: 10 pages, 7 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

    Long Short-Term Memory (LSTM) recurrent neural networks are renowned for
    being uninterpretable “black boxes”. In the medical domain where LSTMs have
    shown promise, this is especially concerning because it is imperative to
    understand the decisions made by machine learning models in such acute
    situations. This study employs techniques used in the Convolutional Neural
    Network domain to elucidate the operations that LSTMs perform on time series.
    The visualization techniques include input saliency by means of occlusion and
    derivatives, class mode visualization, and temporal outputs. Moreover, we
    demonstrate that LSTMs appear to extract features similar to those extracted by
    wavelets. It was found that differentiating with respect to the inputs is a
    poor approximation of saliency and that occlusion is a better approach.
    Moreover, analyzing LSTMs on different sets of data provides novel
    interpretations.
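
    Occlusion saliency for a sequence model can be sketched as follows (window
    size and baseline value are illustrative choices):

        import numpy as np

        def occlusion_saliency(model, series, window=5, baseline=0.0):
            # model: any callable returning a scalar score for a 1-D series.
            base_score = model(series)
            saliency = np.zeros(len(series))
            for start in range(len(series) - window + 1):
                occluded = series.copy()
                occluded[start:start + window] = baseline
                saliency[start:start + window] += (base_score - model(occluded)) / window
            return saliency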

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework goes beyond and generalizes over previous proposals in
    enabling hard or soft sharing of all combinations of subspaces, layers, and
    skip connections. We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
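
    The core sharing unit can be sketched in PyTorch as a learned mixing of two
    tasks' hidden states; the softmax constraint on the mixing weights is our
    simplification:

        import torch
        import torch.nn as nn

        class AlphaShare(nn.Module):
            def __init__(self):
                super().__init__()
                self.alpha = nn.Parameter(torch.eye(2))  # trainable 2x2 mixing

            def forward(self, h_a, h_b):
                w = torch.softmax(self.alpha, dim=1)     # rows mix the two streams
                return (w[0, 0] * h_a + w[0, 1] * h_b,
                        w[1, 0] * h_a + w[1, 1] * h_b)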

    Visual Semantic Planning using Deep Successor Representations

    Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    A crucial capability of real-world intelligent agents is their ability to
    plan a sequence of actions to achieve their goals in the visual world. In this
    work, we address the problem of visual semantic planning: the task of
    predicting a sequence of actions from visual observations that transform a
    dynamic environment from an initial state to a goal state. Doing so entails
    knowledge about objects and their affordances, as well as actions and their
    preconditions and effects. We propose learning these through interacting with a
    visual and dynamic environment. Our proposed solution involves bootstrapping
    reinforcement learning with imitation learning. To ensure cross-task
    generalization, we develop a deep predictive model based on successor
    representations. Our experimental results show near optimal results across a
    wide range of tasks in the challenging THOR environment. The supplementary
    video can be accessed at the following link: this https URL

    Ambiguity set and learning via Bregman and Wasserstein

    Xin Guo, Johnny Hong, Nan Yang
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Construction of ambiguity set in robust optimization relies on the choice of
    divergences between probability distributions. In distribution learning,
    choosing appropriate probability distributions based on observed data is
    critical for approximating the true distribution. To improve the performance of
    machine learning models, there has recently been interest in designing
    objective functions based on Lp-Wasserstein distance rather than the classical
    Kullback-Leibler (KL) divergence. In this paper, we derive concentration and
    asymptotic results using Bregman divergence. We propose a novel asymmetric
    statistical divergence called Wasserstein-Bregman divergence as a
    generalization of L2-Wasserstein distance. We discuss how these results can be
    applied to the construction of ambiguity set in robust optimization.

    Neural Network Memory Architectures for Autonomous Robot Navigation

    Steven W Chen, Nikolay Atanasov, Arbaaz Khan, Konstantinos Karydis, Daniel D. Lee, Vijay Kumar
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    This paper highlights the significance of including memory structures in
    neural networks when the latter are used to learn perception-action loops for
    autonomous robot navigation. Traditional navigation approaches rely on global
    maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet,
    maintaining an accurate global map may be challenging in real-world settings. A
    possible way to mitigate this limitation is to use learning techniques that
    forgo hand-engineered map representations and infer appropriate control
    responses directly from sensed information. An important but unexplored aspect
    of such approaches is the effect of memory on their performance. This work is a
    first thorough study of memory structures for deep-neural-network-based robot
    navigation, and offers novel tools to train such networks from supervision and
    quantify their ability to generalize to unseen scenarios. We analyze the
    separation and generalization abilities of feedforward, long short-term memory,
    and differentiable neural computer networks. We introduce a new method to
    evaluate the generalization ability by estimating the VC-dimension of networks
    with a final linear readout layer. We validate that the VC estimates are good
    predictors of actual test performance. The reported method can be applied to
    deep learning problems beyond robotics.

    Poincaré Embeddings for Learning Hierarchical Representations

    Maximilian Nickel, Douwe Kiela
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representation learning has become an invaluable approach for learning from
    symbolic data such as text and graphs. However, while complex symbolic datasets
    often exhibit a latent hierarchical structure, state-of-the-art methods
    typically learn embeddings in Euclidean vector spaces, which do not account for
    this property. For this purpose, we introduce a new approach for learning
    hierarchical representations of symbolic data by embedding them into hyperbolic
    space — or more precisely into an n-dimensional Poincaré ball. Due to the
    underlying hyperbolic geometry, this allows us to learn parsimonious
    representations of symbolic data by simultaneously capturing hierarchy and
    similarity. We introduce an efficient algorithm to learn the embeddings based
    on Riemannian optimization and show experimentally that Poincaré embeddings
    outperform Euclidean embeddings significantly on data with latent hierarchies,
    both in terms of representation capacity and in terms of generalization
    ability.
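
    The two standard hyperbolic-geometry formulas doing the work here are the
    Poincaré ball distance and the conformal rescaling of Euclidean gradients:

        import numpy as np

        def poincare_distance(u, v, eps=1e-9):
            duv = np.dot(u - v, u - v)
            denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v)) + eps
            return np.arccosh(1.0 + 2.0 * duv / denom)

        def riemannian_grad(theta, euclidean_grad):
            # The Poincare-ball metric is conformal; the inverse metric scales
            # Euclidean gradients by ((1 - |theta|^2)^2) / 4.
            return ((1.0 - np.dot(theta, theta)) ** 2 / 4.0) * euclidean_grad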

    Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

    Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    We consider large scale empirical risk minimization (ERM) problems, where
    both the problem dimension and variable size are large. In these cases, most
    second order methods are infeasible due to the high cost in both computing the
    Hessian over all samples and computing its inverse in high dimensions. In this
    paper, we propose a novel adaptive sample size second-order method, which
    reduces the cost of computing the Hessian by solving a sequence of ERM problems
    corresponding to a subset of samples and lowers the cost of computing the
    Hessian inverse using a truncated eigenvalue decomposition. We show that while
    we geometrically increase the size of the training set at each stage, a single
    iteration of the truncated Newton method is sufficient to solve the new ERM
    within its statistical accuracy. Moreover, for a large number of samples we are
    allowed to double the size of the training set at each stage, and the proposed
    method subsequently reaches the statistical accuracy of the full training set
    approximately after two effective passes. In addition to this theoretical
    result, we show empirically on a number of well known data sets that the
    proposed truncated adaptive sample size algorithm outperforms stochastic
    alternatives for solving ERM problems.

    Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

    Yichen Chen, Mengdi Wang
    Subjects: Computational Complexity (cs.CC); Learning (cs.LG)

    We study the computational complexity of the infinite-horizon
    discounted-reward Markov Decision Problem (MDP) with a finite state space
    \(|\mathcal{S}|\) and a finite action space \(|\mathcal{A}|\). We show that any
    randomized algorithm needs a running time of at least
    \(\Omega(|\mathcal{S}|^2|\mathcal{A}|)\) to compute an \(\epsilon\)-optimal policy
    with high probability. We consider two variants of the MDP where the input is
    given in specific data structures, including arrays of cumulative probabilities
    and binary trees of transition probabilities. For these cases, we show that the
    complexity lower bound reduces to
    \(\Omega\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon}\right)\). These
    results reveal the surprising observation that the computational complexity of
    the MDP depends on the data structure of the input.

    Modelling spatio-temporal routines in human mobility

    Luca Pappalardo, Filippo Simini
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Other Statistics (stat.OT)

    Human mobility modelling is of fundamental importance in a wide range of
    applications, such as the developing of protocols for mobile ad-hoc networks or
    what-if analysis in urban ecosystems. Current generative models fail to
    accurately reproduce individuals’ recurrent schedules while also accounting
    for the possibility that individuals may break their routine during periods of
    variable duration. In this article we present DITRAS
    (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal
    patterns of human mobility. DITRAS operates in two steps: the generation of a
    mobility diary and the translation of the mobility diary into a mobility
    trajectory. We propose a data-driven algorithm which constructs a diary
    generator from real data, capturing the tendency of individuals to follow or
    break their routine. We also propose a trajectory generator based on the
    concept of preferential exploration and preferential return. We instantiate
    DITRAS with the proposed diary and trajectory generators and compare the
    resulting spatio-temporal model with real data and synthetic data produced by
    other spatio-temporal mobility models, built by instantiating DITRAS with
    several combinations of diary and trajectory generators. We show that the
    proposed model reproduces the statistical properties of real trajectories in
    the most accurate way, taking a step forward in understanding the origin of
    the spatio-temporal patterns of human mobility.


    Information Theory

    Preserving Privacy while Broadcasting: \(k\)-Limited-Access Schemes

    Mohammed Karmoose, Linqi Song, Martina Cardone, Christina Fragouli
    Subjects: Information Theory (cs.IT)

    Index coding employs coding across clients within the same broadcast domain.
    This typically assumes that all clients learn the coding matrix so that they
    can decode and retrieve their requested data. However, learning the coding
    matrix can pose privacy concerns: it may enable clients to infer information
    about the requests and side information of other clients [1]. In this paper, we
    formalize the intuition that the achieved privacy can increase by decreasing
    the number of rows of the coding matrix that a client learns. Based on this, we
    propose the use of \(k\)-limited-access schemes: given an index coding scheme
    that employs \(T\) transmissions, we create a \(k\)-limited-access scheme with
    \(T_k \geq T\) transmissions, and with the property that each client learns at
    most \(k\) rows of the coding matrix to decode its message. We derive upper and
    lower bounds on \(T_k\) for all values of \(k\), and develop deterministic
    designs for these schemes for which \(T_k\) has an order-optimal exponent for
    some regimes.

    Information Theoretic Principles of Universal Discrete Denoising

    Janis Nötzel, Andreas Winter
    Comments: 10 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Today, the internet makes tremendous amounts of data widely available. Often,
    the same information is behind multiple different available data sets. This
    lends growing importance to latent variable models that try to learn the hidden
    information from the available imperfect versions. For example, social media
    platforms can contain an abundance of pictures of the same person or object,
    yet all of which are taken from different perspectives. In a simplified
    scenario, one may consider pictures taken from the same perspective, which are
    distorted by noise. This latter application allows for a rigorous mathematical
    treatment, which is the content of this contribution. We apply a recently
    developed method of dependent component analysis to image denoising when
    multiple distorted copies of one and the same image are available, each being
    corrupted by a different and unknown noise process. In a simplified scenario,
    we assume that the distorted image is corrupted by noise that acts
    independently on each pixel. We answer completely the question of how to
    perform optimal denoising, when at least three distorted copies are available:
    First we define optimality of an algorithm in the presented scenario, and then
    we describe an asymptotically optimal universal discrete denoising algorithm
    (UDDA). In the case of binary data and binary symmetric noise, we develop a
    simplified variant of the algorithm, dubbed BUDDA, which we prove to attain
    universal denoising uniformly.

    Distributed Precoding Systems in Multi-Gateway Multibeam Satellites: Regularization and Coarse Beamforming

    Carlos Mosquera, Roberto Lopez-Valcarce, Vahid Joroughi
    Comments: Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper deals with the problem of beamforming design in a multibeam
    satellite, which is shared by different groups of terminals -clusters-, each
    served by an Earth station or gateway. Each gateway precodes the symbols
    addressed to its respective users; the design follows an MMSE criterion, and a
    regularization factor judiciously chosen allows to account for the presence of
    mutually interfering clusters, extending more classical results applicable to
    one centralized station. More importantly, channel statistics can be used
    instead of instantaneous channel state information, avoiding the exchange of
    information among gateways through backhaul links. The on-board satellite
    beamforming weights are designed to exploit the degrees of freedom of the
    satellite antennas to minimize the noise impact and the interference to some
    specific users. On-ground beamforming results are provided as a reference to
    compare the joint performance of MMSE precoders and on-board beamforming
    network. A non-adaptive design complements the results and makes them more
    amenable to practical use by designing a coarse beamforming network.

    Capacity Outer Bound and Degrees of Freedom of Wiener Phase Noise Channels with Oversampling

    Luca Barletta, Stefano Rini
    Comments: 5 pages, 1 figure, Submitted to Intern. Workshop Inf. Theory (ITW) 2017
    Subjects: Information Theory (cs.IT)

    The discrete-time Wiener phase noise channel with an integrate-and-dump
    multi-sample receiver is studied.

    A novel outer bound on the capacity with an average input power constraint is
    derived as a function of the oversampling factor.

    This outer bound yields the degrees of freedom for the scenario in which the
    oversampling factor grows with the transmit power \(P\) as \(P^{\alpha}\).

    The result shows, perhaps surprisingly, that the largest pre-log that can be
    attained with phase modulation at high signal-to-noise ratio is at most \(1/4\).

    An Improved Secretive Coded Caching Scheme exploiting Common Demands

    Hari Hara Suthan C, Ishani Chugh, Prasad Krishnan
    Comments: Submitted to Information Theory Workshop 2017
    Subjects: Information Theory (cs.IT)

    Coded caching schemes on broadcast networks with user caches help to offload
    traffic from peak times to off-peak times by prefetching information from the
    server to the receivers during off-peak times and thus serving the users more
    efficiently during peak times using coded transmissions. We consider the
    problem of secretive coded caching which was proposed recently, in which a user
    should not be able to decode any information about any file that the user has
    not demanded. We propose a new secretive coded caching scheme which has a lower
    average rate compared to the existing state-of-the-art scheme, for the same
    memory available at the receivers. The proposed scheme is based on exploiting
    the presence of common demands between multiple receivers.

    Asymptotically optimal codebooks based on generalized Jacobi sums

    Ziling Heng
    Subjects: Information Theory (cs.IT)

    Codebooks with small inner-product correlation are applied in many practical
    applications including direct spread code division multiple access
    communications, space-time codes and compressed sensing. It is extremely
    difficult to construct codebooks achieving the Welch or Levenshtein bound. In
    this paper, we firstly generalize Jacobi sums over finite fields and secondly
    obtain asymptotically optimal codebooks with respect to the Welch or
    Levenshtein bound. Our main results generalize those in [11] and the codebooks
    in this paper have more flexible parameters than those in [11].

    Capacity of Molecular Channels with Imperfect Particle-Intensity Modulation and Detection

    Nariman Farsad, Christopher Rose, Muriel Médard, Andrea Goldsmith
    Comments: Accepted at IEEE International Symposium on Information Theory (ISIT)
    Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)

    This work introduces the particle-intensity channel (PIC) as a model for
    molecular communication systems and characterizes the properties of the optimal
    input distribution and the capacity limits for this system. In the PIC, the
    transmitter encodes information, in symbols of a given duration, based on the
    number of particles released, and the receiver detects and decodes the message
    based on the number of particles detected during the symbol interval. In this
    channel, the transmitter may be unable to control precisely the number of
    particles released, and the receiver may not detect all the particles that
    arrive. We demonstrate that the optimal input distribution for this channel
    always has mass points at zero and the maximum number of particles that can be
    released. We then consider diffusive particle transport, derive the capacity
    expression when the input distribution is binary, and show conditions under
    which the binary input is capacity-achieving. In particular, we demonstrate
    that when the transmitter cannot generate particles at a high rate, the optimal
    input distribution is binary.

    Joint Uplink/Downlink Resource Allocation and Data Offloading in OFDMA-Based Wireless Powered HetNets

    Sepehr Rezvani, Nader Mokari, Mohammad R. Javan
    Comments: 36 pages, 11 figures, Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper considers joint uplink/downlink of an orthogonal frequency
    division multiple access (OFDMA)-based heterogeneous network (HetNet)
    consisting of a single macro base station (MBS), multiple femto base stations
    (FBSs) and access points (APs) where base stations (BSs) can offload data to
    APs and each mobile user (MU) is able to harvest the received energy using the
    simultaneous wireless information and power transfer (SWIPT) technique. We also
    suppose that the harvested energy of the MUs is used for their uplink information
    transmission. We devise a radio resource allocation (RRA) algorithm to maximize
    the uplink sum data rate of MUs subject to a minimum required downlink data
    rate of each MU and maximum allowable transmit power of each BS, AP, and MU.
    More specifically, both the frequency division duplex (FDD) and time division
    duplex (TDD) schemes are investigated. The proposed non-convex optimization
    problems are solved using an iterative algorithm. It is also proved that the
    proposed algorithm converges to a near-optimal solution. Simulation results
    illustrate that the TDD scheme improves the performance compared to the FDD
    scheme. In addition, it is shown that utilizing the data offloading technique
    improves the uplink sum data rate of MUs compared to the scenario without any
    AP.

    Exponential error rates of SDP for block models: Beyond Grothendieck's inequality

    Yingjie Fei, Yudong Chen
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Social and Information Networks (cs.SI); Statistics Theory (math.ST)

    In this paper we consider the cluster estimation problem under the Stochastic
    Block Model. We show that the semidefinite programming (SDP) formulation for
    this problem achieves an error rate that decays exponentially in the
    signal-to-noise ratio. The error bound implies weak recovery in the sparse
    graph regime with bounded expected degrees, as well as exact recovery in the
    dense regime. An immediate corollary of our results yields error bounds under
    the Censored Block Model. Moreover, these error bounds are robust, continuing
    to hold under heterogeneous edge probabilities and a form of the so-called
    monotone attack.

    Significantly, this error rate is achieved by the SDP solution itself without
    any further pre- or post-processing, and improves upon existing
    polynomially-decaying error bounds proved using Grothendieck’s inequality. Our
    analysis has two key ingredients: (i) showing that the graph
    has a well-behaved spectrum, even in the sparse regime, after discounting an
    exponentially small number of edges, and (ii) an order-statistics argument that
    governs the final error rate. Both arguments highlight the implicit
    regularization effect of the SDP formulation.

    Spectral Simplicity of Apparent Complexity, Part I: The Nondiagonalizable Metadynamics of Prediction

    Paul M. Riechers, James P. Crutchfield
    Comments: 24 pages, 3 figures, 4 tables; current version always at this http URL
    Subjects: Chaotic Dynamics (nlin.CD); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Dynamical Systems (math.DS); Functional Analysis (math.FA)

    Virtually all questions that one can ask about the behavioral and structural
    complexity of a stochastic process reduce to a linear algebraic framing of a
    time evolution governed by an appropriate hidden-Markov process generator. Each
    type of question—correlation, predictability, predictive cost, observer
    synchronization, and the like—induces a distinct generator class. Answers are
    then functions of the class-appropriate transition dynamic. Unfortunately,
    these dynamics are generically nonnormal, nondiagonalizable, singular, and so
    on. Tractably analyzing these dynamics relies on adapting the recently
    introduced meromorphic functional calculus, which specifies the spectral
    decomposition of functions of nondiagonalizable linear operators, even when the
    function poles and zeros coincide with the operator’s spectrum. Along the way,
    we establish special properties of the projection operators that demonstrate
    how they capture the organization of subprocesses within a complex system.
    Circumventing the spurious infinities of alternative calculi, this leads in the
    sequel, Part II, to the first closed-form expressions for complexity measures,
    couched either in terms of the Drazin inverse (negative-one power of a singular
    operator) or the eigenvalues and projection operators of the appropriate
    transition dynamic.

    Predicting stock market movements using network science: An information theoretic approach

    Minjun Kim, Hiroki Sayama
    Comments: 13 pages, 7 figures, 3 tables
    Subjects: Social and Information Networks (cs.SI); Computational Engineering, Finance, and Science (cs.CE); Information Theory (cs.IT); Physics and Society (physics.soc-ph)

    A stock market is considered as one of the highly complex systems, which
    consists of many components whose prices move up and down without having a
    clear pattern. The complex nature of a stock market challenges us on making a
    reliable prediction of its future movements. In this paper, we aim at building
    a new method to forecast the future movements of the Standard & Poor’s 500
    Index (S&P 500) by constructing time-series complex networks of the S&P 500
    underlying companies, connecting them with links whose weights are given by the
    mutual information of the 60-minute price movements of pairs of companies over
    5,340 consecutive minutes of price records. We showed that changes in the
    strength distributions of the networks provide important information on the
    market’s future movements. We built several metrics using the strength
    distributions and network measurements such as centrality, and we combined the
    best two predictors by performing a linear combination. We found that the
    combined predictor and the changes in S&P 500 show a quadratic relationship,
    and it allows us to predict the amplitude of the one step future change in S&P
    500. The result showed significant fluctuations in S&P 500 Index when the
    combined predictor was high. In terms of making the actual index predictions,
    we built ARIMA models. We found that adding the network measurements into the
    ARIMA models improves the model accuracy. These findings are useful for
    financial market policy makers as an indicator based on which they can
    interfere with the markets before the markets make a drastic change, and for
    quantitative investors to improve their forecasting models.
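
    A sketch of the network-construction step (the binning scheme and the strength
    computation are our illustrative choices):

        import numpy as np
        from itertools import combinations
        from sklearn.metrics import mutual_info_score

        def build_mi_network(returns, bins=3):
            # returns: dict ticker -> 1-D array of 60-minute price movements.
            digitized = {k: np.digitize(v, np.histogram_bin_edges(v, bins=bins))
                         for k, v in returns.items()}
            edges = {(a, b): mutual_info_score(digitized[a], digitized[b])
                     for a, b in combinations(digitized, 2)}
            strength = {k: sum(w for e, w in edges.items() if k in e)
                        for k in digitized}
            return edges, strength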



