IT博客汇 | arXiv Paper Daily: Fri, 2 Jun 2017

arXiv Paper Daily: Fri, 2 Jun 2017

我爱机器学习(52ml.net)发表于 2017-06-02 00:00:00

Neural and Evolutionary Computing

Integer Echo State Networks: Hyperdimensional Reservoir Computing

Denis Kleyko, Edward Paxon Frady, Evgeny Osipov
Comments: 10 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE)

We propose an integer approximation of Echo State Networks (ESN) based on the
mathematics of hyperdimensional computing. The reservoir of the proposed
Integer Echo State Network (intESN) contains only n-bits integers and replaces
the recurrent matrix multiply with an efficient cyclic shift operation. Such an
architecture results in dramatic improvements in memory footprint and
computational efficiency, with minimal performance loss. Our architecture
naturally supports the usage of the trained reservoir in symbolic processing
tasks of analogy making and logical inference.

Blind nonnegative source separation using biological neural networks

Cengiz Pehlevan, Sreyas Mohan, Dmitri B. Chklovskii
Comments: Accepted for publication in Neural Computation
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

Blind source separation, i.e. extraction of independent sources from a
mixture, is an important problem for both artificial and natural signal
processing. Here, we address a special case of this problem when sources (but
not the mixing matrix) are known to be nonnegative, for example, due to the
physical nature of the sources. We search for the solution to this problem that
can be implemented using biologically plausible neural networks. Specifically,
we consider the online setting where the dataset is streamed to a neural
network. The novelty of our approach is that we formulate blind nonnegative
source separation as a similarity matching problem and derive neural networks
from the similarity matching objective. Importantly, synaptic weights in our
networks are updated according to biologically plausible local learning rules.

Transfer Learning for Speech Recognition on a Budget

Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
Comments: Accepted for 2nd ACL Workshop on Representation Learning for NLP
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network’s weights were sufficient for good performance, especially for
inner layers.

Free energy-based reinforcement learning using a quantum processor

Anna Levit, Daniel Crawford, Navid Ghadermarzy, Jaspreet S. Oberoi, Ehsan Zahedinejad, Pooya Ronagh
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Quantum Physics (quant-ph)

Recent theoretical and experimental results suggest the possibility of using
current and near-future quantum hardware in challenging sampling tasks. In this
paper, we introduce free energy-based reinforcement learning (FERL) as an
application of quantum hardware. We propose a method for processing a quantum
annealer’s measured qubit spin configurations in approximating the free energy
of a quantum Boltzmann machine (QBM). We then apply this method to perform
reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
annealer. The experimental results show that our technique is a promising
method for harnessing the power of quantum sampling in reinforcement learning
tasks.

Computer Vision and Pattern Recognition

Fader Networks: Manipulating Images by Sliding Attributes

Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a new encoder-decoder architecture that is trained to
reconstruct images by disentangling the salient information of the image and
the values of attributes directly in the latent space. As a result, after
training, our model can generate different realistic versions of an input image
by varying the attribute values. By using continuous attribute values, we can
choose how much a specific attribute is perceivable in the generated image.
This property could allow for applications where users can modify an image
using sliding knobs, like faders on a mixing console, to change the facial
expression of a portrait, or to update the color of some objects. Compared to
the state-of-the-art which mostly relies on training adversarial networks in
pixel space by altering attribute values at train time, our approach results in
much simpler training schemes and nicely scales to multiple attributes. We
present evidence that our model can significantly change the perceived value of
the attributes while preserving the naturalness of images.

Line Profile Based Segmentation Algorithm for Touching Corn Kernels

Ali Mahdi, Jun Qin
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image segmentation of touching objects plays a key role in providing accurate
classification for computer vision technologies. A new line profile based
imaging segmentation algorithm has been developed to provide a robust and
accurate segmentation of a group of touching corns. The performance of the line
profile based algorithm has been compared to a watershed based imaging
segmentation algorithm. Both algorithms are tested on three different patterns
of images, which are isolated corns, single-lines, and random distributed
formations. The experimental results show that the algorithm can segment a
large number of touching corn kernels efficiently and accurately.

DiracNets: Training Very Deep Neural Networks Without Skip-Connections

Sergey Zagoruyko, Nikos Komodakis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks with skip-connections, such as ResNet, show excellent
performance in various image classification benchmarks. It is though observed
that the initial motivation behind them – training deeper networks – does not
actually hold true, and the benefits come from increased capacity, rather than
from depth. Motivated by this, and inspired from ResNet, we propose a simple
Dirac weight parameterization, which allows us to train very deep plain
networks without skip-connections, and achieve nearly the same performance.
This parameterization has a minor computational cost at training time and no
cost at all at inference. We’re able to achieve 95.5% accuracy on CIFAR-10 with
34-layer deep plain network, surpassing 1001-layer deep ResNet, and approaching
Wide ResNet. Our parameterization also mostly eliminates the need of careful
initialization in residual and non-residual networks. The code and models for
our experiments are available at this https URL

Deep Mutual Learning

Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Model distillation is an effective and widely used technique to transfer
knowledge from a teacher to a student network. The typical application is to
transfer from a powerful large network or ensemble to a small network, that is
better suited to low-memory or fast execution requirements. In this paper, we
present a deep mutual learning (DML) strategy where, rather than one way
transfer between a static pre-defined teacher and a student, an ensemble of
students learn collaboratively and teach each other throughout the training
process. Our experiments show that a variety of network architectures benefit
from mutual learning and achieve compelling results on CIFAR-100 recognition
and Market-1501 person re-identification benchmarks. Surprisingly, it is
revealed that no prior powerful teacher network is necessary — mutual learning
of a collection of simple student networks works, and moreover outperforms
distillation from a more powerful yet static teacher.

TransFlow: Unsupervised Motion Flow by Joint Geometric and Pixel-level Estimation

Stefano Alletto, Davide Abati, Simone Calderara, Rita Cucchiara, Luca Rigazio
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address unsupervised optical flow estimation for ego-centric motion. We
argue that optical flow can be cast as a geometrical warping between two
successive video frames and devise a deep architecture to estimate such
transformation in two stages. First, a dense pixel-level flow is computed with
a geometric prior imposing strong spatial constraints. Such prior is typical of
driving scenes, where the point of view is coherent with the vehicle motion. We
show how such global transformation can be approximated with an homography and
how spatial transformer layers can be employed to compute the flow field
implied by such transformation. The second stage then refines the prediction
feeding a second deeper network. A final reconstruction loss compares the
warping of frame X(t) with the subsequent frame X(t+1) and guides both
estimates. The model, which we named TransFlow, performs favorably compared to
other unsupervised algorithms, and shows better generalization compared to
supervised methods with a 3x reduction in error on unseen data.

An Effective Approach for Point Clouds Registration Based on the Hard and Soft Assignments

Congcong Jin, Jihua Zhu, Yaochen Li, Shaoyi Du, Zhongyu Li, Huimin Lu
Comments: 23 pages, 6 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For the registration of partially overlapping point clouds, this paper
proposes an effective approach based on both the hard and soft assignments.
Given two initially posed clouds, it firstly establishes the forward
correspondence for each point in the data shape and calculates the value of
binary variable, which can indicate whether this point correspondence is
located in the overlapping areas or not. Then, it establishes the bilateral
correspondence and computes bidirectional distances for each point in the
overlapping areas. Based on the ratio of bidirectional distances, the
exponential function is selected and utilized to calculate the probability
value, which can indicate the reliability of the point correspondence.
Subsequently, both the values of hard and soft assignments are embedded into
the proposed objective function for registration of partially overlapping point
clouds and a novel variant of ICP algorithm is proposed to obtain the optimal
rigid transformation. The proposed approach can achieve good registration of
point clouds, even when their overlap percentage is low. Experimental results
tested on public data sets illustrate its superiority over previous approaches
on accuracy and robustness.

Depth Structure Preserving Scene Image Generation

Wendong Zhang, Bingbing Ni, Yichao Yan, Jingwei Xu, Xiaokang Yang
Comments: 9 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Key to automatically generate natural scene images is to properly arrange
among various spatial elements, especially in the depth direction. To this end,
we introduce a novel depth structure preserving scene image generation network
(DSP-GAN), which favors a hierarchical and heterogeneous architecture, for the
purpose of depth structure preserving scene generation. The main trunk of the
proposedinfrastructureisbuiltonaHawkespointprocessthat models the spatial
dependency between different depth layers. Within each layer generative
adversarial sub-networks are trained collaboratively to generate realistic
scene components, conditioned on the layer information produced by the point
process. We experiment our model on a sub-set of SUNdataset with annotated
scene images and demonstrate that our models are capable of generating
depth-realistic natural scene image.

Shape and Positional Geometry of Multi-Object Configurations

James Damon, Ellen Gasparovic
Comments: This paper presents material relevant for two and three dimensional images that builds on and makes many references to a previous paper by the authors, arXiv:1402.5517
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In previous work, we introduced a method for modeling a configuration of
objects in 2D and 3D images using a mathematical “medial/skeletal linking
structure.” In this paper, we show how these structures allow us to capture
positional properties of a multi-object configuration in addition to the shape
properties of the individual objects. In particular, we introduce numerical
invariants for positional properties which measure the closeness of neighboring
objects, including identifying the parts of the objects which are close, and
the “relative significance” of objects compared with the other objects in the
configuration. Using these numerical measures, we introduce a hierarchical
ordering and relations between the individual objects, and quantitative
criteria for identifying subconfigurations. In addition, the invariants provide
a “proximity matrix” which yields a unique set of weightings measuring overall
proximity of objects in the configuration. Furthermore, we show that these
invariants, which are volumetrically defined and involve external regions, may
be computed via integral formulas in terms of “skeletal linking integrals”
defined on the internal skeletal structures of the objects.

Faster Spatially Regularized Correlation Filters for Visual Tracking

Xiaoxiang Hu, Yujiu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Discriminatively learned correlation filters (DCF) have been widely used in
online visual tracking filed due to its simplicity and efficiency. These
methods utilize a periodic assumption of the training samples to construct a
circulant data matrix, which implicitly increases the training samples and
reduces both storage and computational complexity.The periodic assumption also
introduces unwanted boundary effects. Recently, Spatially Regularized
Correlation Filters (SRDCF) solved this issue by introducing penalization on
correlation filter coefficients depending on their spatial location. However,
SRDCF’s efficiency dramatically decreased due to the breaking of circulant
structure.

We propose Faster Spatially Regularized Discriminative Correlation Filters
(FSRDCF) for tracking. The FSRDCF is constructed from Ridge Regression, the
circulant structure of training samples in the spatial domain is fully used,
more importantly, we further exploit the circulant structure of regularization
function in the Fourier domain, which allows our problem to be solved more
directly and efficiently. Experiments are conducted on three benchmark
datasets: OTB-2013, OTB-2015 and VOT2016. Our approach achieves equivalent
performance to the baseline tracker SRDCF on all three datasets. On OTB-2013
and OTB-2015 datasets, our approach obtains a more than twice faster running
speed and a more than third times shorter start-up time than the SRDCF. For
state-of-the-art comparison, our approach demonstrates superior performance
compared to other non-spatial-regularization trackers.

Superhuman Accuracy on the SNEMI3D Connectomics Challenge

Kisuk Lee, Jonathan Zung, Peter Li, Viren Jain, H. Sebastian Seung
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For the past decade, convolutional networks have been used for 3D
reconstruction of neurons from electron microscopic (EM) brain images. Recent
years have seen great improvements in accuracy, as evidenced by submissions to
the SNEMI3D benchmark challenge. Here we report the first submission to surpass
the estimate of human accuracy provided by the SNEMI3D leaderboard. A variant
of 3D U-Net is trained on a primary task of predicting affinities between
nearest neighbor voxels, and an auxiliary task of predicting long-range
affinities. The training data is augmented by simulated image defects. The
nearest neighbor affinities are used to create an oversegmentation, and then
supervoxels are greedily agglomerated based on mean affinity. The resulting
SNEMI3D score exceeds the estimate of human accuracy by a large margin. While
one should be cautious about extrapolating from the SNEMI3D benchmark to
real-world accuracy of large-scale neural circuit reconstruction, our result
inspires optimism that the goal of full automation may be realizable in the
future.

Blood capillaries and vessels segmentation in optical coherence tomography angiogram using fuzzy C-means and Curvelet transform

Fariborz Taherkhani
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Blood capillaries density and vessels structure in vivo brain provide
information about diseases and hemodynamic in the brain network. Capillary
density in animal brain is one of the indexes that determines if it has
diabetes. Capillary density in the animal brain subjected to diabetes
diminishes over time. Variation in physical structure and diameter of the
vessels as a function of time gives us information about hemodynamic in the
brain network as well. Capillaries and vessels segmentation is a process that
can help us in predicting some of diseases such as diabetes as well as
monitoring certain area of the brain that are supposed to be stimulated by
Optogenetics technique for controlling blood flow in the brain network. In this
work, we propose an unsupervised learning approach to distinguish capillaries
from vessels. The proposed method is based on curvelet transform which measures
thickness of curve singularities in Optical Coherence Tomography (OCT) images.
In this work, first we enhance contrast of image by strengthening faint
capillaries and softening vessels; we suppress noise level of image in this
step as well. This step of the algorithm is performed by modifying curvelet
coefficients using a non- linear function. Second, we extract a feature vector
from each pixel of image using curvelet transform and finally, we apply a
well-tuned fuzzy C-mean clustering approach to segment capillaries and vessels
from background. Improvement in segmentation accuracy is one of the main
motivations of this work; experimental results on test images indicate that the
proposed method outperforms some of existing segmentation methods.

Megapixel Size Image Creation using Generative Adversarial Networks

Marco Marchesi
Comments: 3 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

Since its appearance, Generative Adversarial Networks (GANs) have received a
lot of interest in the AI community. In image generation several projects
showed how GANs are able to generate photorealistic images but the results so
far did not look adequate for the quality standard of visual media production
industry. We present an optimized image generation process based on a Deep
Convolutional Generative Adversarial Networks (DCGANs), in order to create
photorealistic high-resolution images (up to 1024×1024 pixels). Furthermore,
the system was fed with a limited dataset of images, less than two thousand
images. All these results give more clue about future exploitation of GANs in
Computer Graphics and Visual Effects.

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear
inverse task demanding time and resource intensive computations that can
substantially trade off {it accuracy} for {it speed} in real-time imaging. In
addition, state-of-the-art compressed sensing (CS) analytics are not cognizant
of the image {it diagnostic quality}. To cope with these challenges we put
forth a novel CS framework that permeates benefits from generative adversarial
networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR
images from historical patients. Leveraging a mixture of least-squares (LS)
GANs and pixel-wise (ell_1) cost, a deep residual network with skip
connections is trained as the generator that learns to remove the {it
aliasing} artifacts by projecting onto the manifold. LSGAN learns the texture
details, while (ell_1) controls the high-frequency noise. A multilayer
convolutional neural network is then jointly trained based on diagnostic
quality images to discriminate the projection quality. The test phase performs
feed-forward propagation over the generator network that demands a very low
computational overhead. Extensive evaluations are performed on a large
contrast-enhanced MR dataset of pediatric patients. In particular, images rated
based on expert radiologists corroborate that GANCS retrieves high contrast
images with detailed texture relative to conventional CS, and pixel-wise
schemes. In addition, it offers reconstruction under a few milliseconds, two
orders of magnitude faster than state-of-the-art CS-MRI schemes.

Cross-modal Common Representation Learning by Hybrid Transfer Network

Xin Huang, Yuxin Peng, Mingkuan Yuan
Comments: To appear in the proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, Aug. 19-25, 2017
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

DNN-based cross-modal retrieval is a research hotspot to retrieve across
different modalities as image and text, but existing methods often face the
challenge of insufficient cross-modal training data. In single-modal scenario,
similar problem is usually relieved by transferring knowledge from large-scale
auxiliary datasets (as ImageNet). Knowledge from such single-modal datasets is
also very useful for cross-modal retrieval, which can provide rich general
semantic information that can be shared across different modalities. However,
it is challenging to transfer useful knowledge from single-modal (as image)
source domain to cross-modal (as image/text) target domain. Knowledge in source
domain cannot be directly transferred to both two different modalities in
target domain, and the inherent cross-modal correlation contained in target
domain provides key hints for cross-modal retrieval which should be preserved
during transfer process. This paper proposes Cross-modal Hybrid Transfer
Network (CHTN) with two subnetworks: Modal-sharing transfer subnetwork utilizes
the modality in both source and target domains as a bridge, for transferring
knowledge to both two modalities simultaneously; Layer-sharing correlation
subnetwork preserves the inherent cross-modal semantic correlation to further
adapt to cross-modal retrieval task. Cross-modal data can be converted to
common representation by CHTN for retrieval, and comprehensive experiment on 3
datasets shows its effectiveness.

Teaching Machines to Describe Images via Natural Language Feedback

Huan Ling, Sanja Fidler
Comments: 13 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Robots will eventually be part of every household. It is thus critical to
enable algorithms to learn from and be guided by non-expert users. In this
paper, we bring a human in the loop, and enable a human teacher to give
feedback to a learning agent in the form of natural language. We argue that a
descriptive sentence can provide a much stronger learning signal than a numeric
reward in that it can easily point to where the mistakes are and how to correct
them. We focus on the problem of image captioning in which the quality of the
output can easily be judged by non-experts. We propose a hierarchical
phrase-based captioning model trained with policy gradients, and design a
feedback network that provides reward to the learner by conditioning on the
human-provided feedback. We show that by exploiting descriptive feedback our
model learns to perform better than when given independently written human
captions.

Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers

Ken Hoover, Sourish Chaudhuri, Caroline Pantofaru, Malcolm Slaney, Ian Sturdy
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)

In this paper, we present a system that associates faces with voices in a
video by fusing information from the audio and visual signals. The thesis
underlying our work is that an extremely simple approach to generating (weak)
speech clusters can be combined with visual signals to effectively associate
faces and voices by aggregating statistics across a video. This approach does
not need any training data specific to this task and leverages the natural
coherence of information in the audio and visual streams. It is particularly
applicable to tracking speakers in videos on the web where a priori information
about the environment (e.g., number of speakers, spatial signals for
beamforming) is not available. We performed experiments on a real-world dataset
using this analysis framework to determine the speaker in a video. Given a
ground truth labeling determined by human rater consensus, our approach had
~71% accuracy.

Artificial Intelligence

Enhancing workflow-nets with data for trace completion

Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini, Sergio Tessaris
Subjects: Artificial Intelligence (cs.AI)

The growing adoption of IT-systems for modeling and executing (business)
processes or services has thrust the scientific investigation towards
techniques and tools which support more complex forms of process analysis. Many
of them, such as conformance checking, process alignment, mining and
enhancement, rely on complete observation of past (tracked and logged)
executions. In many real cases, however, the lack of human or IT-support on all
the steps of process execution, as well as information hiding and abstraction
of model and data, result in incomplete log information of both data and
activities. This paper tackles the issue of automatically repairing traces with
missing information by notably considering not only activities but also data
manipulated by them. Our technique recasts such a problem in a reachability
problem and provides an encoding in an action language which allows to
virtually use any state-of-the-art planning to return solutions.

Grounding Symbols in Multi-Modal Instructions

Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy
Comments: 9 pages, 8 figures, To appear in the Proceedings of the ACL workshop Language Grounding for Robotics, Vancouver, Canada
Subjects: Artificial Intelligence (cs.AI)

As robots begin to cohabit with humans in semi-structured environments, the
need arises to understand instructions involving rich variability—for
instance, learning to ground symbols in the physical world. Realistically, this
task must cope with small datasets consisting of a particular users’ contextual
assignment of meaning to terms. We present a method for processing a raw stream
of cross-modal input—i.e., linguistic instructions, visual perception of a
scene and a concurrent trace of 3D eye tracking fixations—to produce the
segmentation of objects with a correspondent association to high-level
concepts. To test our framework we present experiments in a table-top object
manipulation scenario. Our results show our model learns the user’s notion of
colour and shape from a small number of physical demonstrations, generalising
to identifying physical referents for novel combinations of the words.

Diversified Top-k Partial MaxSAT Solving

Junping Zhou, Huanyao Sun, Feifei Ma, Jian Gao, Ke Xu, Minghao Yin
Subjects: Artificial Intelligence (cs.AI)

We introduce a diversified top-k partial MaxSAT problem, a combination of
partial MaxSAT problem and enumeration problem. Given a partial MaxSAT formula
F and a positive integer k, the diversified top-k partial MaxSAT is to find k
maximal solutions for F such that the k maximal solutions satisfy the maximum
number of soft clauses of F. This problem can be widely used in many
applications including community detection, sensor place, motif discovery, and
combinatorial testing. We prove the problem is NP-hard and propose an approach
for solving the problem. The concrete idea of the approach is to design an
encoding EE which reduces diversified top-k partial MaxSAT problem into partial
MaxSAT problem, and then solve the resulting problem with state-of-art solvers.
In addition, we present an algorithm MEMKC exactly solving the diversified
top-k partial MaxSAT. Through several experiments we show that our approach can
be successfully applied to the interesting problem.

Descriptions of Objectives and Processes of Mechanical Learning

Chuyu Xiong
Subjects: Artificial Intelligence (cs.AI)

In [1], we introduced mechanical learning and proposed 2 approaches to
mechanical learning. Here, we follow one such approach to well describe the
objects and the processes of learning. We discuss 2 kinds of patterns:
objective and subjective pattern. Subjective pattern is crucial for learning
machine. We prove that for any objective pattern we can find a proper
subjective pattern based upon least base patterns to express the objective
pattern well. X-form is algebraic expression for subjective pattern. Collection
of X-forms form internal representation space, which is center of learning
machine. We discuss learning by teaching and without teaching. We define data
sufficiency by X-form. We then discussed some learning strategies. We show, in
each strategy, with sufficient data, and with certain capabilities, learning
machine indeed can learn any pattern (universal learning machine). In appendix,
with knowledge of learning machine, we try to view deep learning from a
different angle, i.e. its internal representation space and its learning
dynamics.

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit

Mark W. Lewis
Comments: Quality solutions quickly obtained for xQx using the GPU to perform matrix multiplication, however improvements to solution intensification are needed
Subjects: Artificial Intelligence (cs.AI)

Multi-start algorithms are a common and effective tool for metaheuristic
searches. In this paper we amplify multi-start capabilities by employing the
parallel processing power of the graphics processer unit (GPU) to quickly
generate a diverse starting set of solutions for the Unconstrained Binary
Quadratic Optimization Problem which are evaluated and used to implement
screening methods to select solutions for further optimization. This method is
implemented as an initial high quality solution generation phase prior to a
secondary steepest ascent search and a comparison of results to best known
approaches on benchmark unconstrained binary quadratic problems demonstrates
that GPU-enabled diversified multi-start with screening quickly yields very
good results.

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

N. Siddharth, Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Variational autoencoders (VAEs) learn representations of data by jointly
training a probabilistic encoder and decoder network. Typically these models
encode all features of the data into a single variable. Here we are interested
in learning disentangled representations that encode distinct aspects of the
data into separate variables. We propose to learn such representations using
model architectures that generalize from standard VAEs, employing a general
graphical model structure in the encoder and decoder. This allows us to train
partially-specified models that make relatively strong assumptions about a
subset of interpretable variables and rely on the flexibility of neural
networks to learn representations for the remaining variables. We further
define a general objective for semi-supervised learning in this model class,
which can be approximated using an importance sampling procedure. We evaluate
our framework’s ability to learn disentangled representations, both by
qualitative exploration of its generative capacity, and quantitative evaluation
of its discriminative ability on a variety of models and datasets.

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Off-policy model-free deep reinforcement learning methods using previously
collected data can improve sample efficiency over on-policy policy gradient
techniques. On the other hand, on-policy algorithms are often more stable and
easier to use. This paper examines, both theoretically and empirically,
approaches to merging on- and off-policy updates for deep reinforcement
learning. Theoretical results show that off-policy updates with a value
function estimator can be interpolated with on-policy policy gradient updates
whilst still satisfying performance bounds. Our analysis uses control variate
methods to produce a family of policy gradient algorithms, with several
recently proposed algorithms being special cases of this family. We then
provide an empirical comparison of these techniques with the remaining
algorithmic details fixed, and show how different mixing of off-policy gradient
estimates with on-policy samples contribute to improvements in empirical
performance. The final algorithm provides a generalization and unification of
existing deep policy gradient techniques, has theoretical guarantees on the
bias introduced by off-policy updates, and improves on the state-of-the-art
model-free deep RL methods on a number of OpenAI Gym continuous control
benchmarks.

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Nikola Mrkšić, Ivan Vulić, Diarmuid Ó Séaghdha, Ira Leviant, Roi Reichart, Milica Gašić, Anna Korhonen, Steve Young
Comments: Accepted for publication at TACL (to be presented at EMNLP 2017)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present Attract-Repel, an algorithm for improving the semantic quality of
word vectors by injecting constraints extracted from lexical resources.
Attract-Repel facilitates the use of constraints from mono- and cross-lingual
resources, yielding semantically specialised cross-lingual vector spaces. Our
evaluation shows that the method can make use of existing cross-lingual
lexicons to construct high-quality vector spaces for a plethora of different
languages, facilitating semantic transfer from high- to lower-resource ones.
The effectiveness of our approach is demonstrated with state-of-the-art results
on semantic similarity datasets in six languages. We next show that
Attract-Repel-specialised vectors boost performance in the downstream task of
dialogue state tracking (DST) across multiple languages. Finally, we show that
cross-lingual vector spaces produced by our algorithm facilitate the training
of multilingual DST models, which brings further performance improvements.

Discovering Discrete Latent Topics with Neural Variational Inference

Yishu Miao, Edward Grefenstette, Phil Blunsom
Comments: ICML 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

Topic models have been widely explored as probabilistic generative models of
documents. Traditional inference methods have sought closed-form derivations
for updating the models, however as the expressiveness of these models grows,
so does the difficulty of performing fast and accurate inference over their
parameters. This paper presents alternative neural approaches to topic
modelling by providing parameterisable distributions over topics which permit
training by backpropagation in the framework of neural variational inference.
In addition, with the help of a stick-breaking construction, we propose a
recurrent network that is able to discover a notionally unbounded number of
topics, analogous to Bayesian non-parametric topic models. Experimental results
on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
effectiveness and efficiency of these neural topic models.

One button machine for automating feature engineering in relational databases

Hoang Thanh Lam, Johann-Michael Thiebaut, Mathieu Sinn, Bei Chen, Tiep Mai, Oznur Alkan
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

Feature engineering is one of the most important and time consuming tasks in
predictive analytics projects. It involves understanding domain knowledge and
data exploration to discover relevant hand-crafted features from raw data. In
this paper, we introduce a system called One Button Machine, or OneBM for
short, which automates feature discovery in relational databases. OneBM
automatically performs a key activity of data scientists, namely, joining of
database tables and applying advanced data transformations to extract useful
features from data. We validated OneBM in Kaggle competitions in which OneBM
achieved performance as good as top 16% to 24% data scientists in three Kaggle
competitions. More importantly, OneBM outperformed the state-of-the-art system
in a Kaggle competition in terms of prediction accuracy and ranking on Kaggle
leaderboard. The results show that OneBM can be useful for both data scientists
and non-experts. It helps data scientists reduce data exploration time allowing
them to try and error many ideas in short time. On the other hand, it enables
non-experts, who are not familiar with data science, to quickly extract value
from their data with a little effort, time and cost.

Teaching Machines to Describe Images via Natural Language Feedback

Robots will eventually be part of every household. It is thus critical to
enable algorithms to learn from and be guided by non-expert users. In this
paper, we bring a human in the loop, and enable a human teacher to give
feedback to a learning agent in the form of natural language. We argue that a
descriptive sentence can provide a much stronger learning signal than a numeric
reward in that it can easily point to where the mistakes are and how to correct
them. We focus on the problem of image captioning in which the quality of the
output can easily be judged by non-experts. We propose a hierarchical
phrase-based captioning model trained with policy gradients, and design a
feedback network that provides reward to the learner by conditioning on the
human-provided feedback. We show that by exploiting descriptive feedback our
model learns to perform better than when given independently written human
captions.

Free energy-based reinforcement learning using a quantum processor

Recent theoretical and experimental results suggest the possibility of using
current and near-future quantum hardware in challenging sampling tasks. In this
paper, we introduce free energy-based reinforcement learning (FERL) as an
application of quantum hardware. We propose a method for processing a quantum
annealer’s measured qubit spin configurations in approximating the free energy
of a quantum Boltzmann machine (QBM). We then apply this method to perform
reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
annealer. The experimental results show that our technique is a promising
method for harnessing the power of quantum sampling in reinforcement learning
tasks.

The Sample Complexity of Online One-Class Collaborative Filtering

Reinhard Heckel, Kannan Ramchandran
Comments: ICML 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

We consider the online one-class collaborative filtering (CF) problem that
consists of recommending items to users over time in an online fashion based on
positive ratings only. This problem arises when users respond only occasionally
to a recommendation with a positive rating, and never with a negative one. We
study the impact of the probability of a user responding to a recommendation,
p_f, on the sample complexity, i.e., the number of ratings required to make
`good’ recommendations, and ask whether receiving positive and negative
ratings, instead of positive ratings only, improves the sample complexity. Both
questions arise in the design of recommender systems. We introduce a simple
probabilistic user model, and analyze the performance of an online user-based
CF algorithm. We prove that after an initial cold start phase, where
recommendations are invested in exploring the user’s preferences, this
algorithm makes—up to a fraction of the recommendations required for updating
the user’s preferences—perfect recommendations. The number of ratings
required for the cold start phase is nearly proportional to 1/p_f, and that for
updating the user’s preferences is essentially independent of p_f. As a
consequence we find that, receiving positive and negative ratings instead of
only positive ones improves the number of ratings required for initial
exploration by a factor of 1/p_f, which can be significant.

A Learning Based Optimal Human Robot Collaboration with Linear Temporal Logic Constraints

Bo Wu, Bin Hu, Hai Lin
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

This paper considers an optimal task allocation problem for human robot
collaboration in human robot systems with persistent tasks. Such human robot
systems consist of human operators and intelligent robots collaborating with
each other to accomplish complex tasks that cannot be done by either part
alone. The system objective is to maximize the probability of successfully
executing persistent tasks that are formulated as linear temporal logic
specifications and minimize the average cost between consecutive visits of a
particular proposition. This paper proposes to model the human robot
collaboration under a framework with the composition of multiple Markov
Decision Process (MDP) with possibly unknown transition probabilities, which
characterizes how human cognitive states, such as human trust and fatigue,
stochastically change with the robot performance. Under the unknown MDP models,
an algorithm is developed to learn the model and obtain an optimal task
allocation policy that minimizes the expected average cost for each task cycle
and maximizes the probability of satisfying linear temporal logic constraints.
Moreover, this paper shows that the difference between the optimal policy based
on the learned model and that based on the underlying ground truth model can be
bounded by arbitrarily small constant and large confidence level with
sufficient samples. The case study of an assembly process demonstrates the
effectiveness and benefits of our proposed learning based human robot
collaboration.

Information Retrieval

Item-Item Music Recommendations With Side Information

Özgür Demir, Alexey Rodriguez Yakushev, Rany Keddo, Ursula Kallio
Subjects: Information Retrieval (cs.IR)

Online music services have tens of millions of tracks. The content itself is
broad and covers various musical genres as well as non-musical audio content
such as radio plays and podcasts. The sheer scale and diversity of content
makes it difficult for a user to find relevant tracks. Relevant recommendations
are therefore crucial for a good user experience. Here we present a method to
compute track-track similarities using collaborative filtering signals with
side information. On a data set from music streaming service SoundCloud, the
method here outperforms the widely adopted implicit matrix factorization
technique. The implementation of our method is open sourced and can be applied
to related item-item recommendation tasks with side information.

Discovering Discrete Latent Topics with Neural Variational Inference

Yishu Miao, Edward Grefenstette, Phil Blunsom
Comments: ICML 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

Topic models have been widely explored as probabilistic generative models of
documents. Traditional inference methods have sought closed-form derivations
for updating the models, however as the expressiveness of these models grows,
so does the difficulty of performing fast and accurate inference over their
parameters. This paper presents alternative neural approaches to topic
modelling by providing parameterisable distributions over topics which permit
training by backpropagation in the framework of neural variational inference.
In addition, with the help of a stick-breaking construction, we propose a
recurrent network that is able to discover a notionally unbounded number of
topics, analogous to Bayesian non-parametric topic models. Experimental results
on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
effectiveness and efficiency of these neural topic models.

Deep Learning for Hate Speech Detection in Tweets

Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma
Comments: In Proceedings of ACM WWW’17 Companion, Perth, Western Australia, Apr 2017 (WWW’17), 2 pages
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

Hate speech detection on Twitter is critical for applications like
controversial event extraction, building AI chatterbots, content
recommendation, and sentiment analysis. We define this task as being able to
classify a tweet as racist, sexist or neither. The complexity of the natural
language constructs makes this task very challenging. We perform extensive
experiments with multiple deep learning architectures to learn semantic word
embeddings to handle this complexity. Our experiments on a benchmark dataset of
16K annotated tweets show that such deep learning methods outperform
state-of-the-art char/word n-gram methods by ~18 F1 points.

Network Capacity Bound for Personalized PageRank in Multimodal Networks

M.A. Kłopotek, S.T. Wierzchoń, R.A. Kłopotek
Comments: 28 pages. arXiv admin note: text overlap with arXiv:1702.03734
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

In a former paper the concept of Bipartite PageRank was introduced and a
theorem on the limit of authority flowing between nodes for personalized
PageRank has been generalized. In this paper we want to extend those results to
multimodal networks. In particular we introduce a hypergraph type that may be
used for describing multimodal network where a hyperlink connects nodes from
each of the modalities. We introduce a generalisation of PageRank for such
graphs and define the respective random walk model that can be used for
computations. we finally state and prove theorems on the limit of outflow of
authority for cases where individual modalities have identical and distinct
damping factors.

Computation and Language

Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, Anna Korhonen
Comments: ACL 2017 (Long paper)
Subjects: Computation and Language (cs.CL)

Morphologically rich languages accentuate two properties of distributional
vector space models: 1) the difficulty of inducing accurate representations for
low-frequency word forms; and 2) insensitivity to distinct lexical relations
that have similar distributional signatures. These effects are detrimental for
language understanding systems, which may infer that ‘inexpensive’ is a
rephrasing for ‘expensive’ or may not associate ‘acquire’ with ‘acquires’. In
this work, we propose a novel morph-fitting procedure which moves past the use
of curated semantic lexicons for improving distributional vector spaces.
Instead, our method injects morphological constraints generated using simple
language-specific rules, pulling inflectional forms of the same word close
together and pushing derivational antonyms far apart. In intrinsic evaluation
over four languages, we show that our approach: 1) improves low-frequency word
estimates; and 2) boosts the semantic quality of the entire word vector
collection. Finally, we show that morph-fitted vectors yield large gains in the
downstream task of dialogue state tracking, highlighting the importance of
morphology for tackling long-tail phenomena in language understanding tasks.

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

We present Attract-Repel, an algorithm for improving the semantic quality of
word vectors by injecting constraints extracted from lexical resources.
Attract-Repel facilitates the use of constraints from mono- and cross-lingual
resources, yielding semantically specialised cross-lingual vector spaces. Our
evaluation shows that the method can make use of existing cross-lingual
lexicons to construct high-quality vector spaces for a plethora of different
languages, facilitating semantic transfer from high- to lower-resource ones.
The effectiveness of our approach is demonstrated with state-of-the-art results
on semantic similarity datasets in six languages. We next show that
Attract-Repel-specialised vectors boost performance in the downstream task of
dialogue state tracking (DST) across multiple languages. Finally, we show that
cross-lingual vector spaces produced by our algorithm facilitate the training
of multilingual DST models, which brings further performance improvements.

Discovering Discrete Latent Topics with Neural Variational Inference

Yishu Miao, Edward Grefenstette, Phil Blunsom
Comments: ICML 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

Topic models have been widely explored as probabilistic generative models of
documents. Traditional inference methods have sought closed-form derivations
for updating the models, however as the expressiveness of these models grows,
so does the difficulty of performing fast and accurate inference over their
parameters. This paper presents alternative neural approaches to topic
modelling by providing parameterisable distributions over topics which permit
training by backpropagation in the framework of neural variational inference.
In addition, with the help of a stick-breaking construction, we propose a
recurrent network that is able to discover a notionally unbounded number of
topics, analogous to Bayesian non-parametric topic models. Experimental results
on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
effectiveness and efficiency of these neural topic models.

Using of heterogeneous corpora for training of an ASR system

Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee
Subjects: Computation and Language (cs.CL)

The paper summarizes the development of the LVCSR system built as a part of
the Pashto speech-translation system at the SCALE (Summer Camp for Applied
Language Exploration) 2015 workshop on “Speech-to-text-translation for
low-resource languages”. The Pashto language was chosen as a good “proxy”
low-resource language, exhibiting multiple phenomena which make the
speech-recognition and and speech-to-text-translation systems development hard.

Even when the amount of data is seemingly sufficient, given the fact that the
data originates from multiple sources, the preliminary experiments reveal that
there is little to no benefit in merging (concatenating) the corpora and more
elaborate ways of making use of all of the data must be worked out.

This paper concentrates only on the LVCSR part and presents a range of
different techniques that were found to be useful in order to benefit from
multiple different corpora

Polish Read Speech Corpus for Speech Tools and Services

Danijel Koržinek, Krzysztof Marasek, Łukasz Brocki, Krzysztof Wołk
Subjects: Computation and Language (cs.CL)

This paper describes the speech processing activities conducted at the Polish
consortium of the CLARIN project. The purpose of this segment of the project
was to develop specific tools that would allow for automatic and semi-automatic
processing of large quantities of acoustic speech data. The tools include the
following: grapheme-to-phoneme conversion, speech-to-text alignment, voice
activity detection, speaker diarization, keyword spotting and automatic speech
transcription. Furthermore, in order to develop these tools, a large
high-quality studio speech corpus was recorded and released under an open
license, to encourage development in the area of Polish speech research.
Another purpose of the corpus was to serve as a reference for studies in
phonetics and pronunciation. All the tools and resources were released on the
the Polish CLARIN website. This paper discusses the current status and future
plans for the project.

Deep Learning for Hate Speech Detection in Tweets

Hate speech detection on Twitter is critical for applications like
controversial event extraction, building AI chatterbots, content
recommendation, and sentiment analysis. We define this task as being able to
classify a tweet as racist, sexist or neither. The complexity of the natural
language constructs makes this task very challenging. We perform extensive
experiments with multiple deep learning architectures to learn semantic word
embeddings to handle this complexity. Our experiments on a benchmark dataset of
16K annotated tweets show that such deep learning methods outperform
state-of-the-art char/word n-gram methods by ~18 F1 points.

Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks

Van-Khanh Tran, Le-Minh Nguyen
Comments: has been accepted to appear at CoNLL 2017
Subjects: Computation and Language (cs.CL)

Natural language generation (NLG) is a critical component in a spoken
dialogue system. This paper presents a Recurrent Neural Network based
Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to
select, aggregate semantic elements produced by an attention mechanism over the
input elements, and to produce the required utterances. The proposed generator
can be jointly trained both sentence planning and surface realization to
produce natural language sentences. The proposed model was extensively
evaluated on four different NLG datasets. The experimental results showed that
the proposed generators not only consistently outperform the previous methods
across all the NLG domains but also show an ability to generalize from a new,
unseen domain and learn from multi-domain datasets.

Semantic Refinement GRU-based Neural Language Generation for Spoken Dialogue Systems

Van-Khanh Tran, Le-Minh Nguyen
Comments: has been accepted to appear at PACLING 2017
Subjects: Computation and Language (cs.CL)

Natural language generation (NLG) plays a critical role in spoken dialogue
systems. This paper presents a new approach to NLG by using recurrent neural
networks (RNN), in which a gating mechanism is applied before RNN computation.
This allows the proposed model to generate appropriate sentences. The RNN-based
generator can be learned from unaligned data by jointly training sentence
planning and surface realization to produce natural language responses. The
model was extensively evaluated on four different NLG domains. The results show
that the proposed generator achieved better performance on all the NLG domains
compared to previous generators.

Teaching Machines to Describe Images via Natural Language Feedback

Robots will eventually be part of every household. It is thus critical to
enable algorithms to learn from and be guided by non-expert users. In this
paper, we bring a human in the loop, and enable a human teacher to give
feedback to a learning agent in the form of natural language. We argue that a
descriptive sentence can provide a much stronger learning signal than a numeric
reward in that it can easily point to where the mistakes are and how to correct
them. We focus on the problem of image captioning in which the quality of the
output can easily be judged by non-experts. We propose a hierarchical
phrase-based captioning model trained with policy gradients, and design a
feedback network that provides reward to the learner by conditioning on the
human-provided feedback. We show that by exploiting descriptive feedback our
model learns to perform better than when given independently written human
captions.

Transfer Learning for Speech Recognition on a Budget

End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network’s weights were sufficient for good performance, especially for
inner layers.

Learning to Compute Word Embeddings on the Fly

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

Words in natural language follow a Zipfian distribution whereby some words
are frequent but most are rare. Learning representations for words in the “long
tail” of this distribution requires enormous amounts of data. Representations
of rare words trained directly on end-tasks are usually poor, requiring us to
pre-train embeddings on external data, or treat all rare words as
out-of-vocabulary words with a unique representation. We provide a method for
predicting embeddings of rare words on the fly from small amounts of auxiliary
data with a network trained against the end task. We show that this improves
results against baselines where embeddings are trained on the end task in a
reading comprehension task, a recognizing textual entailment task, and in
language modelling.

Distributed, Parallel, and Cluster Computing

A graph model of message passing processes

Andrew M. Mironov
Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC)

In the paper we consider a graph model of message passing processes and
present a method verification of message passing processes. The method is
illustrated by an example of a verification of sliding window protocol.

Using GPI-2 for Distributed Memory Paralleliziation of the Caffe Toolbox to Speed up Deep Neural Network Training

Martin Kuehn, Janis Keuper, Franz-Josef Pfreundt
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Deep Neural Network (DNN) are currently of great inter- est in research and
application. The training of these net- works is a compute intensive and time
consuming task. To reduce training times to a bearable amount at reasonable
cost we extend the popular Caffe toolbox for DNN with an efficient distributed
memory communication pattern. To achieve good scalability we emphasize the
overlap of computation and communication and prefer fine granu- lar
synchronization patterns over global barriers. To im- plement these
communication patterns we rely on the the Global address space Programming
Interface version 2 (GPI-2) communication library. This interface provides a
light-weight set of asynchronous one-sided communica- tion primitives
supplemented by non-blocking fine gran- ular data synchronization mechanisms.
Therefore, Caf- feGPI is the name of our parallel version of Caffe. First
benchmarks demonstrate better scaling behavior com- pared with other
extensions, e.g., the Intel TM Caffe. Even within a single symmetric
multiprocessing machine with four graphics processing units, the CaffeGPI
scales bet- ter than the standard Caffe toolbox. These first results
demonstrate that the use of standard High Performance Computing (HPC) hardware
is a valid cost saving ap- proach to train large DDNs. I/O is an other
bottleneck to work with DDNs in a standard parallel HPC setting, which we will
consider in more detail in a forthcoming paper.

Learning

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Off-policy model-free deep reinforcement learning methods using previously
collected data can improve sample efficiency over on-policy policy gradient
techniques. On the other hand, on-policy algorithms are often more stable and
easier to use. This paper examines, both theoretically and empirically,
approaches to merging on- and off-policy updates for deep reinforcement
learning. Theoretical results show that off-policy updates with a value
function estimator can be interpolated with on-policy policy gradient updates
whilst still satisfying performance bounds. Our analysis uses control variate
methods to produce a family of policy gradient algorithms, with several
recently proposed algorithms being special cases of this family. We then
provide an empirical comparison of these techniques with the remaining
algorithmic details fixed, and show how different mixing of off-policy gradient
estimates with on-policy samples contribute to improvements in empirical
performance. The final algorithm provides a generalization and unification of
existing deep policy gradient techniques, has theoretical guarantees on the
bias introduced by off-policy updates, and improves on the state-of-the-art
model-free deep RL methods on a number of OpenAI Gym continuous control
benchmarks.

Transfer Learning for Speech Recognition on a Budget

End-to-end training of automated speech recognition (ASR) systems requires
massive data and compute resources. We explore transfer learning based on model
adaptation as an approach for training ASR models under constrained GPU memory,
throughput and training data. We conduct several systematic experiments
adapting a Wav2Letter convolutional neural network originally trained for
English ASR to the German language. We show that this technique allows faster
training on consumer-grade resources while requiring less training data in
order to achieve the same accuracy, thereby lowering the cost of training ASR
models in other languages. Model introspection revealed that small adaptations
to the network’s weights were sufficient for good performance, especially for
inner layers.

Learning to Compute Word Embeddings on the Fly

Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio
Subjects: Learning (cs.LG); Computation and Language (cs.CL)

Words in natural language follow a Zipfian distribution whereby some words
are frequent but most are rare. Learning representations for words in the “long
tail” of this distribution requires enormous amounts of data. Representations
of rare words trained directly on end-tasks are usually poor, requiring us to
pre-train embeddings on external data, or treat all rare words as
out-of-vocabulary words with a unique representation. We provide a method for
predicting embeddings of rare words on the fly from small amounts of auxiliary
data with a network trained against the end task. We show that this improves
results against baselines where embeddings are trained on the end task in a
reading comprehension task, a recognizing textual entailment task, and in
language modelling.

Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

Filip de Roos, Philipp Hennig
Subjects: Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Solving symmetric positive definite linear problems is a fundamental
computational task in machine learning. The exact solution, famously, is
cubicly expensive in the size of the matrix. To alleviate this problem, several
linear-time approximations, such as spectral and inducing-point methods, have
been suggested and are now in wide use. These are low-rank approximations that
choose the low-rank space a priori and do not refine it over time. While this
allows linear cost in the data-set size, it also causes a finite, uncorrected
approximation error. Authors from numerical linear algebra have explored ways
to iteratively refine such low-rank approximations, at a cost of a small number
of matrix-vector multiplications. This idea is particularly interesting in the
many situations in machine learning where one has to solve a sequence of
related symmetric positive definite linear problems. From the machine learning
perspective, such deflation methods can be interpreted as transfer learning of
a low-rank approximation across a time-series of numerical tasks. We study the
use of such methods for our field. Our empirical results show that, on
regression and classification problems of intermediate size, this approach can
interpolate between low computational cost and numerical precision.

Subjective fairness: Fairness is in the eye of the beholder

Christos Dimitrakakis, Yang Liu, David Parkes, Goran Radanovic
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We analyze different notions of fairness in decision making when the
underlying model is not known with certainty. We argue that recent notions of
fairness in machine learning need to be modified to incorporate uncertainties
about model parameters. We introduce the notion of {em subjective fairness} as
a suitable candidate for fair Bayesian decision making rules, relate this
definition with existing ones, and experimentally demonstrate the inherent
accuracy-fairness tradeoff under this definition.

Using GPI-2 for Distributed Memory Paralleliziation of the Caffe Toolbox to Speed up Deep Neural Network Training

Martin Kuehn, Janis Keuper, Franz-Josef Pfreundt
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Deep Neural Network (DNN) are currently of great inter- est in research and
application. The training of these net- works is a compute intensive and time
consuming task. To reduce training times to a bearable amount at reasonable
cost we extend the popular Caffe toolbox for DNN with an efficient distributed
memory communication pattern. To achieve good scalability we emphasize the
overlap of computation and communication and prefer fine granu- lar
synchronization patterns over global barriers. To im- plement these
communication patterns we rely on the the Global address space Programming
Interface version 2 (GPI-2) communication library. This interface provides a
light-weight set of asynchronous one-sided communica- tion primitives
supplemented by non-blocking fine gran- ular data synchronization mechanisms.
Therefore, Caf- feGPI is the name of our parallel version of Caffe. First
benchmarks demonstrate better scaling behavior com- pared with other
extensions, e.g., the Intel TM Caffe. Even within a single symmetric
multiprocessing machine with four graphics processing units, the CaffeGPI
scales bet- ter than the standard Caffe toolbox. These first results
demonstrate that the use of standard High Performance Computing (HPC) hardware
is a valid cost saving ap- proach to train large DDNs. I/O is an other
bottleneck to work with DDNs in a standard parallel HPC setting, which we will
consider in more detail in a forthcoming paper.

Free energy-based reinforcement learning using a quantum processor

Recent theoretical and experimental results suggest the possibility of using
current and near-future quantum hardware in challenging sampling tasks. In this
paper, we introduce free energy-based reinforcement learning (FERL) as an
application of quantum hardware. We propose a method for processing a quantum
annealer’s measured qubit spin configurations in approximating the free energy
of a quantum Boltzmann machine (QBM). We then apply this method to perform
reinforcement learning on the grid-world problem using the D-Wave 2000Q quantum
annealer. The experimental results show that our technique is a promising
method for harnessing the power of quantum sampling in reinforcement learning
tasks.

The Sample Complexity of Online One-Class Collaborative Filtering

Reinhard Heckel, Kannan Ramchandran
Comments: ICML 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

We consider the online one-class collaborative filtering (CF) problem that
consists of recommending items to users over time in an online fashion based on
positive ratings only. This problem arises when users respond only occasionally
to a recommendation with a positive rating, and never with a negative one. We
study the impact of the probability of a user responding to a recommendation,
p_f, on the sample complexity, i.e., the number of ratings required to make
`good’ recommendations, and ask whether receiving positive and negative
ratings, instead of positive ratings only, improves the sample complexity. Both
questions arise in the design of recommender systems. We introduce a simple
probabilistic user model, and analyze the performance of an online user-based
CF algorithm. We prove that after an initial cold start phase, where
recommendations are invested in exploring the user’s preferences, this
algorithm makes—up to a fraction of the recommendations required for updating
the user’s preferences—perfect recommendations. The number of ratings
required for the cold start phase is nearly proportional to 1/p_f, and that for
updating the user’s preferences is essentially independent of p_f. As a
consequence we find that, receiving positive and negative ratings instead of
only positive ones improves the number of ratings required for initial
exploration by a factor of 1/p_f, which can be significant.

Learning Time-Efficient Deep Architectures with Budgeted Super Networks

Tom Veniat, Ludovic Denoyer
Subjects: Learning (cs.LG)

Learning neural network architectures is a way to discover new highly
predictive models. We propose to focus on this problem from a different
perspective where the goal is to discover architectures efficient in terms of
both prediction quality and computation cost, e.g time in milliseconds, number
of operations… For instance, our approach is able to solve the following
task: find the best neural network architecture (in a very large set of
possible architectures) able to predict well in less than 100 milliseconds on
my mobile phone.

Our contribution is based on a new family of models called Budgeted Super
Networks that are learned using reinforcement-learning inspired techniques
applied to a budgeted learning objective function which includes the
computation cost during disk/memory operations at inference. We present a set
of experiments on computer vision problems and show the ability of our method
to discover efficient architectures in terms of both predictive quality and
computation time.

Biased Importance Sampling for Deep Neural Network Training

Angelos Katharopoulos, François Fleuret
Subjects: Learning (cs.LG)

Importance sampling has been successfully used to accelerate stochastic
optimization in many convex problems. However, the lack of an efficient way to
calculate the importance still hinders its application to Deep Learning. In
this paper, we show that the loss value can be used as an alternative
importance metric, and propose a way to efficiently approximate it for a deep
model, using a small model trained for that purpose in parallel.

This method allows in particular to utilize a biased gradient estimate that
implicitly optimizes a soft max-loss, and leads to better generalization
performance. While such method suffers from a prohibitively high variance of
the gradient estimate when using a standard stochastic optimizer, we show that
when it is combined with our sampling mechanism, it results in a reliable
procedure.

We showcase the generality of our method by testing it on both image
classification and language modeling tasks using deep convolutional and
recurrent neural networks. In particular, in case of CIFAR10 we reach 10%
classification error 50 epochs faster than when using uniform sampling.

Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

Arash Vahdat
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Collecting large training datasets, annotated with high quality labels, is a
costly process. This paper proposes a novel framework for training deep
convolutional neural networks from noisy labeled datasets. The problem is
formulated using an undirected graphical model that represents the relationship
between noisy and clean labels, trained in a semi-supervised setting. In the
proposed structure, the inference over latent clean labels is tractable and is
regularized during training using auxiliary sources of information. The
proposed model is applied to the image labeling problem and is shown to be
effective in labeling unseen images as well as reducing label noise in training
on CIFAR-10 and MS COCO datasets.

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Variational autoencoders (VAEs) learn representations of data by jointly
training a probabilistic encoder and decoder network. Typically these models
encode all features of the data into a single variable. Here we are interested
in learning disentangled representations that encode distinct aspects of the
data into separate variables. We propose to learn such representations using
model architectures that generalize from standard VAEs, employing a general
graphical model structure in the encoder and decoder. This allows us to train
partially-specified models that make relatively strong assumptions about a
subset of interpretable variables and rely on the flexibility of neural
networks to learn representations for the remaining variables. We further
define a general objective for semi-supervised learning in this model class,
which can be approximated using an importance sampling procedure. We evaluate
our framework’s ability to learn disentangled representations, both by
qualitative exploration of its generative capacity, and quantitative evaluation
of its discriminative ability on a variety of models and datasets.

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

We present Attract-Repel, an algorithm for improving the semantic quality of
word vectors by injecting constraints extracted from lexical resources.
Attract-Repel facilitates the use of constraints from mono- and cross-lingual
resources, yielding semantically specialised cross-lingual vector spaces. Our
evaluation shows that the method can make use of existing cross-lingual
lexicons to construct high-quality vector spaces for a plethora of different
languages, facilitating semantic transfer from high- to lower-resource ones.
The effectiveness of our approach is demonstrated with state-of-the-art results
on semantic similarity datasets in six languages. We next show that
Attract-Repel-specialised vectors boost performance in the downstream task of
dialogue state tracking (DST) across multiple languages. Finally, we show that
cross-lingual vector spaces produced by our algorithm facilitate the training
of multilingual DST models, which brings further performance improvements.

Discovering Discrete Latent Topics with Neural Variational Inference

Yishu Miao, Edward Grefenstette, Phil Blunsom
Comments: ICML 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG)

Topic models have been widely explored as probabilistic generative models of
documents. Traditional inference methods have sought closed-form derivations
for updating the models, however as the expressiveness of these models grows,
so does the difficulty of performing fast and accurate inference over their
parameters. This paper presents alternative neural approaches to topic
modelling by providing parameterisable distributions over topics which permit
training by backpropagation in the framework of neural variational inference.
In addition, with the help of a stick-breaking construction, we propose a
recurrent network that is able to discover a notionally unbounded number of
topics, analogous to Bayesian non-parametric topic models. Experimental results
on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
effectiveness and efficiency of these neural topic models.

Stable recovery of the factors from a deep matrix product and application to convolutional networks. Focus on sparsity constraints

Francois Malgouyres (IMT), Joseph Landsberg (TAMU)
Comments: arXiv admin note: substantial text overlap with arXiv:1703.08044
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Statistics Theory (math.ST)

We study a deep matrix factorization problem. It takes as input a matrix X
obtained by multiplying K matrices (called factors). Each factor is obtained by
applying a fixed linear operator to a vector of parameters satisfying a
sparsity constraint. We provide sharp conditions on the structure of the model
that guarantee the stable recovery of the factors from the knowledge of X and
the model for the factors. This is crucial in order to interpret the factors
and the intermediate features obtained when applying a few factors to a datum.
When K = 1: the paper provides compressed sensing statements; K = 2 covers (for
instance) Non-negative Matrix Factorization, Dictionary learning, low rank
approximation, phase recovery. The particularity of this paper is to extend the
study to deep problems. As an illustration, we detail the analysis and provide
(entirely computable) guarantees for the stable recovery of a (non-neural)
sparse convolutional network.

Discriminative k-shot learning using probabilistic models

Matthias Bauer, Mateo Rojas-Carulla, Jakub Bartłomiej Świątkowski, Bernhard Schölkopf, Richard E. Turner
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This paper introduces a probabilistic framework for k-shot image
classification. The goal is to generalise from an initial large-scale
classification task to a separate task comprising new classes and small numbers
of examples. The new approach not only leverages the feature-based
representation learned by a neural network from the initial task
(representational transfer), but also information about the form of the classes
(concept transfer). The concept information is encapsulated in a probabilistic
model for the final layer weights of the neural network which then acts as a
prior when probabilistic k-shot learning is performed. Surprisingly, simple
probabilistic models and inference schemes outperform many existing k-shot
learning approaches and compare favourably with the state-of-the-art method in
terms of error-rate. The new probabilistic methods are also able to accurately
model uncertainty, leading to well calibrated classifiers, and they are easily
extensible and flexible, unlike many recent approaches to k-shot learning.

Supervised Quantile Normalisation

Marine Le Morvan (CBIO), Jean-Philippe Vert (DMA, CBIO)
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

Quantile normalisation is a popular normalisation method for data subject to
unwanted variations such as images, speech, or genomic data. It applies a
monotonic transformation to the feature values of each sample to ensure that
after normalisation, they follow the same target distribution for each sample.
Choosing a “good” target distribution remains however largely empirical and
heuristic, and is usually done independently of the subsequent analysis of
normalised data. We propose instead to couple the quantile normalisation step
with the subsequent analysis, and to optimise the target distribution jointly
with the other parameters in the analysis. We illustrate this principle on the
problem of estimating a linear model over normalised data, and show that it
leads to a particular low-rank matrix regression problem that can be solved
efficiently. We illustrate the potential of our method, which we term SUQUAN,
on simulated data, images and genomic data, where it outperforms standard
quantile normalisation.

Cross-modal Common Representation Learning by Hybrid Transfer Network

DNN-based cross-modal retrieval is a research hotspot to retrieve across
different modalities as image and text, but existing methods often face the
challenge of insufficient cross-modal training data. In single-modal scenario,
similar problem is usually relieved by transferring knowledge from large-scale
auxiliary datasets (as ImageNet). Knowledge from such single-modal datasets is
also very useful for cross-modal retrieval, which can provide rich general
semantic information that can be shared across different modalities. However,
it is challenging to transfer useful knowledge from single-modal (as image)
source domain to cross-modal (as image/text) target domain. Knowledge in source
domain cannot be directly transferred to both two different modalities in
target domain, and the inherent cross-modal correlation contained in target
domain provides key hints for cross-modal retrieval which should be preserved
during transfer process. This paper proposes Cross-modal Hybrid Transfer
Network (CHTN) with two subnetworks: Modal-sharing transfer subnetwork utilizes
the modality in both source and target domains as a bridge, for transferring
knowledge to both two modalities simultaneously; Layer-sharing correlation
subnetwork preserves the inherent cross-modal semantic correlation to further
adapt to cross-modal retrieval task. Cross-modal data can be converted to
common representation by CHTN for retrieval, and comprehensive experiment on 3
datasets shows its effectiveness.

Scalable Generalized Linear Bandits: Online Computation and Hashing

Kwang-Sung Jun, Aniruddha Bhargava, Robert Nowak, Rebecca Willett
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Generalized Linear Bandits (GLBs), a natural extension of the stochastic
linear bandits, has been popular and successful in recent years. However,
existing GLBs scale poorly with the number of rounds and the number of arms,
limiting their utility in practice. This paper proposes new, scalable solutions
to the GLB problem in two respects. First, unlike existing GLBs, whose
per-time-step space and time complexity grow at least linearly with time (t),
we propose a new algorithm that performs online computations to enjoy a
constant space and time complexity. At its heart is a novel Generalized Linear
extension of the Online-to-confidence-set Conversion (GLOC method) that takes
emph{any} online learning algorithm and turns it into a GLB algorithm. As a
special case, we apply GLOC to the online Newton step algorithm, which results
in a low-regret GLB algorithm with much lower time and memory complexity than
prior work. Second, for the case where the number (N) of arms is very large, we
propose new algorithms in which each next arm is selected via an inner product
search. Such methods can be implemented via hashing algorithms (i.e.,
“hash-amenable”) and result in a time complexity sublinear in (N). While a
Thompson sampling extension of GLOC is hash-amenable, its regret bound for
(d)-dimensional arm sets scales with (d^{3/2}), whereas GLOC’s regret bound is
linear in (d). Towards closing this gap, we propose a new hash-amenable
algorithm whose regret bound scales with (d^{5/4}). Finally, we propose a fast
approximate hash-key computation (inner product) that has a better accuracy
than the state-of-the-art, which can be of independent interest. We conclude
the paper with preliminary experimental results confirming the merits of our
methods.

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Jonathan Scarlett, Ilijia Bogunovic, Volkan Cevher
Comments: Accepted to COLT 2017
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

In this paper, we consider the problem of sequentially optimizing a black-box
function (f) based on noisy samples and bandit feedback. We assume that (f) is
smooth in the sense of having a bounded norm in some reproducing kernel Hilbert
space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian
process bandit optimization. We provide algorithm-independent lower bounds on
the simple regret, measuring the suboptimality of a single point reported after
(T) rounds, and on the cumulative regret, measuring the sum of regrets over the
(T) chosen points.

For the isotropic squared-exponential kernel in (d) dimensions, we find that
an average simple regret of (epsilon) requires (T =
Omegaig(frac{1}{epsilon^2} (logfrac{1}{epsilon})^{d/2}ig)), and the
average cumulative regret is at least (Omegaig( sqrt{T(log T)^d} ig)),
thus matching existing upper bounds up to the replacement of (d/2) by (d+O(1))
in both cases. For the Mat’ern-(
u) kernel, we give analogous bounds of the
form (Omegaig( (frac{1}{epsilon})^{2+d/
u}ig)) and (Omegaig(
T^{frac{
u + d}{2
u + d}} ig)), and discuss the resulting gaps to the
existing upper bounds.

Megapixel Size Image Creation using Generative Adversarial Networks

Marco Marchesi
Comments: 3 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Learning (cs.LG)

Since its appearance, Generative Adversarial Networks (GANs) have received a
lot of interest in the AI community. In image generation several projects
showed how GANs are able to generate photorealistic images but the results so
far did not look adequate for the quality standard of visual media production
industry. We present an optimized image generation process based on a Deep
Convolutional Generative Adversarial Networks (DCGANs), in order to create
photorealistic high-resolution images (up to 1024×1024 pixels). Furthermore,
the system was fed with a limited dataset of images, less than two thousand
images. All these results give more clue about future exploitation of GANs in
Computer Graphics and Visual Effects.

Low-Rank Matrix Approximation in the Infinity Norm

Nicolas Gillis, Yaroslav Shitov
Comments: 12 pages, 3 tables
Subjects: Computational Complexity (cs.CC); Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

The low-rank matrix approximation problem with respect to the entry-wise
(ell_{infty})-norm is the following: given a matrix (M) and a factorization
rank (r), find a matrix (X) whose rank is at most (r) and that minimizes
(max_{i,j} |M_{ij} – X_{ij}|). In this paper, we prove that the decision
variant of this problem for (r=1) is NP-complete using a reduction from the
problem `not all equal 3SAT’. We also analyze several cases when the problem
can be solved in polynomial time, and propose a simple practical heuristic
algorithm which we apply on the problem of the recovery of a quantized low-rank
matrix.

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

Magnetic resonance image (MRI) reconstruction is a severely ill-posed linear
inverse task demanding time and resource intensive computations that can
substantially trade off {it accuracy} for {it speed} in real-time imaging. In
addition, state-of-the-art compressed sensing (CS) analytics are not cognizant
of the image {it diagnostic quality}. To cope with these challenges we put
forth a novel CS framework that permeates benefits from generative adversarial
networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR
images from historical patients. Leveraging a mixture of least-squares (LS)
GANs and pixel-wise (ell_1) cost, a deep residual network with skip
connections is trained as the generator that learns to remove the {it
aliasing} artifacts by projecting onto the manifold. LSGAN learns the texture
details, while (ell_1) controls the high-frequency noise. A multilayer
convolutional neural network is then jointly trained based on diagnostic
quality images to discriminate the projection quality. The test phase performs
feed-forward propagation over the generator network that demands a very low
computational overhead. Extensive evaluations are performed on a large
contrast-enhanced MR dataset of pediatric patients. In particular, images rated
based on expert radiologists corroborate that GANCS retrieves high contrast
images with detailed texture relative to conventional CS, and pixel-wise
schemes. In addition, it offers reconstruction under a few milliseconds, two
orders of magnitude faster than state-of-the-art CS-MRI schemes.

Machine Learning Based Crackle Detection in Lung Sounds

Morten Grønnesby, Juan Carlos Aviles Solis, Einar Holsbø, Hasse Melbye, Lars Ailo Bongo
Subjects: Sound (cs.SD); Learning (cs.LG)

The stethoscope is a well-known and widely available diagnostic instrument.
In recent years, many innovative solutions for recording and viewing sounds
from a stethoscope have become available. However, to fully utilize such
devices, there is a need for an automated approach for detecting abnormal lung
sounds, which is better than the existing methods that typically have been
developed and evaluated using a small and non-diverse dataset.

We propose a machine learning based approach for detecting crackles in lung
sounds recorded using a stethoscope in a large health survey. Our method is
trained and evaluated using 209 files with crackles classified by expert
listeners. Our analysis pipeline is based on features extracted from small
windows in audio files. We evaluated several feature extraction methods and
classifiers. We evaluated the pipeline using a training set of 175 crackle
windows and 208 normal windows.

We found and evaluated a 5-dimenstional vector with four features from the
time domain and one from the spectrum domain. We evaluated several classifiers
and found SVM with a Radial Basis Function Kernel to perform best for our
5-dimensional feature vector. Our approach had a precision of 86% and recall of
84% for classifying a crackle in a window, which is more accurate than found in
studies of health personnel. The low-dimensional feature vector makes the SVM
very fast. The model can be trained on a regular computer in 1.44 seconds, and
319 crackles can be classified in 1.08 seconds.

Our approach detects and visualizes individual crackles in recorded audio
files. It is accurate, fast, and has low resource requirements. The approach is
therefore well suited for deployment on smart devices and phones or as a web
application. It can be used to train health personnel or as part of a
smartphone application for Bluetooth stethoscopes.

Surface Networks

Ilya Kostrikov, Joan Bruna, Daniele Panozzo, Denis Zorin
Subjects: Machine Learning (stat.ML); Graphics (cs.GR); Learning (cs.LG)

We study data-driven representations for three-dimensional triangle meshes,
which are one of the prevalent objects used to represent 3D geometry. Recent
works have developed models that exploit the intrinsic geometry of manifolds
and graphs, namely the Graph Neural Networks (GNNs) and its spectral variants,
which learn from the local metric tensor via the Laplacian operator.

Despite offering excellent sample complexity and built-in invariances,
intrinsic geometry alone is invariant to isometric deformations, making it
unsuitable for many applications. To overcome this limitation, we propose
several upgrades to GNNs to leverage extrinsic differential geometry properties
of three-dimensional surfaces, increasing its modeling power. In particular, we
propose to exploit the Dirac operator, whose spectrum detects principal
curvature directions — this is in stark contrast with the classical Laplace
operator, which directly measures mean curvature. We coin the resulting model
the emph{Surface Network (SN)}. We demonstrate the efficiency and versatility
of SNs on two challenging tasks: temporal prediction of mesh deformations under
non-linear dynamics and generative models using a variational autoencoder
framework with encoders/decoders given by SNs.

Better Text Understanding Through Image-To-Text Transfer

Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Generic text embeddings are successfully used in a variety of tasks. However,
they are often learnt by capturing the co-occurrence structure from pure text
corpora, resulting in limitations of their ability to generalize. In this
paper, we explore models that incorporate visual information into the text
representation. Based on comprehensive ablation studies, we propose a
conceptually simple, yet well performing architecture. It outperforms previous
multimodal approaches on a set of well established benchmarks. We also improve
the state-of-the-art results for image-related text datasets, using orders of
magnitude less data.

Information Theory

Benchmark problems for phase retrieval

Veit Elser, Ti-Yen Lan
Comments: 7 pages, 6 figures
Subjects: Information Theory (cs.IT); Data Analysis, Statistics and Probability (physics.data-an)

The hardest instances of phase retrieval arise in crystallography, where the
signal is periodic and comprised of atomic distributions arranged uniformly in
the unit cell of the crystal. We have constructed a graded set of benchmark
problems for evaluating algorithms that perform this type of phase retrieval. A
simple iterative algorithm was used to establish baseline runtimes that
empirically grow exponentially in the sparsity of the signal autocorrelation.
We also review the algorithms used by the leading software packages for
crystallographic phase retrieval.

More new classes of permutation trinomials over (mathbb{F}_{2^n})

Yanping Wang, WeiGuo Zhang, Zhengbang Zha
Comments: 17 pages
Subjects: Information Theory (cs.IT); Number Theory (math.NT)

Permutation polynomials over finite fields have wide applications in many
areas of science and engineering. In this paper, we present six new classes of
permutation trinomials over (mathbb{F}_{2^n}) which have explicit forms by
determining the solutions of some equations.

Optimal Slotted ALOHA under Delivery Deadline Constraint for Multiple-Packet Reception

Yijin Zhang, Yuan-Hsun Lo, Feng Shu, Jun Li
Subjects: Information Theory (cs.IT)

This paper considers a time-slotted communication channel that is shared by
(N) users transmitting to a single receiver under saturated traffic. It is
assumed that the receiver has the ability of the multiple-packet reception
(MPR) to correctly decode up to (M) ((1 leq M < N)) time-overlapping packets.
Each user attempts to access the channel following a slotted ALOHA protocol,
and transmits a packet within a channel slot with a common transmission
probability. To evaluate the reliability in the scenario that a packet needs to
be transmitted within a strict delivery deadline (D) ((Dgeq 1)) slots since
its arrival at the head of queue, we consider the successful delivery
probability of a packet as performance metric of interest. Generalizing
previous studies that only focused on the single-packet reception (SPR) channel
(i.e., (M=1)) or the throughput performance (i.e., (D=1)), we derive the
optimal transmission probability that maximizes the successful delivery
probability for any (1leq M<N) and any (Dgeq1). In particular, since that the
previous result on the optimal transmission probability under MPR for (D=1) was
obtained relying on an unproved technical condition, the throughput
maximization issue under MPR is first completely addressed in this paper.
Furthermore, by noting that maximizing the successful delivery probability for
(D>1) would degrade the throughput performance, we obtain the optimal
transmission probability subject to throughput constraint.

Multi-point Codes from the GGS Curves

Chuangqiang Hu, Shudi Yang
Comments: 24 pages. arXiv admin note: text overlap with arXiv:1607.05462
Subjects: Information Theory (cs.IT)

This paper is concerned with the construction of algebraic geometric codes
defined from GGS curves. It is of significant use to describe bases for the
Riemann-Roch spaces associated with totally ramified places, which enables us
to study multi-point AG codes. Along this line, we characterize explicitly the
Weierstrass semigroups and pure gaps. Additionally, we determine the floor of a
certain type of divisor and investigate the properties of AG codes from GGS
curves. Finally, we apply these results to find multi-point codes with
excellent parameters. As one of the examples, a presented code with parameters
( [216,190,geqslant 18] ) over ( mathbb{F}_{64} ) yields a new record.

Energy Harvesting Networks with General Utility Functions: Near Optimal Online Policies

Ahmed Arafa, Abdulrahman Baknina, Sennur Ulukus
Comments: To appear in the 2017 IEEE International Symposium on Information Theory. arXiv admin note: text overlap with arXiv:1705.10305
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

We consider online scheduling policies for single-user energy harvesting
communication systems, where the goal is to characterize online policies that
maximize the long term average utility, for some general concave and
monotonically increasing utility function. In our setting, the transmitter
relies on energy harvested from nature to send its messages to the receiver,
and is equipped with a finite-sized battery to store its energy. Energy packets
are independent and identically distributed (i.i.d.) over time slots, and are
revealed causally to the transmitter. Only the average arrival rate is known a
priori. We first characterize the optimal solution for the case of Bernoulli
arrivals. Then, for general i.i.d. arrivals, we first show that fixed fraction
policies [Shaviv-Ozgur] are within a constant multiplicative gap from the
optimal solution for all energy arrivals and battery sizes. We then derive a
set of sufficient conditions on the utility function to guarantee that fixed
fraction policies are within a constant additive gap as well from the optimal
solution.

Modeling and Design of Millimeter-Wave Networks for Highway Vehicular Communication

Andrea Tassi, Malcolm Egan, Robert J. Piechocki, Andrew Nix
Subjects: Information Theory (cs.IT); Performance (cs.PF)

Connected and autonomous vehicles will play a pivotal role in future
Intelligent Transportation Systems (ITSs) and smart cities, in general.
High-speed and low-latency wireless communication links will allow
municipalities to warn vehicles against safety hazards, as well as support
cloud-driving solutions to drastically reduce traffic jams and air pollution.
To achieve these goals, vehicles need to be equipped with a wide range of
sensors generating and exchanging high rate data streams. Recently, millimeter
wave (mmWave) techniques have been introduced as a means of fulfilling such
high data rate requirements. In this paper, we model a highway communication
network and characterize its fundamental link budget metrics. In particular, we
specifically consider a network where vehicles are served by mmWave Base
Stations (BSs) deployed alongside the road. To evaluate our highway network, we
develop a new theoretical model that accounts for a typical scenario where
heavy vehicles (such as buses and lorries) in slow lanes obstruct Line-of-Sight
(LOS) paths of vehicles in fast lanes and, hence, act as blockages. Using tools
from stochastic geometry, we derive approximations for the
Signal-to-Interference-plus-Noise ratio (SINR) outage probability, as well as
the probability that a user achieves a target communication rate (rate coverage
probability). Our analysis provides new design insights for mmWave highway
communication networks. In particular, we show that smaller antenna beamwidths
and, unlike bi-dimensional mmWave cellular networks, smaller BS densities do
not necessarily have a disruptive impact on improving the SINR outage
probability, and consequently the rate coverage probability.

Coding Method for Parallel Iterative Linear Solver

Yaoqing Yang, Pulkit Grover, Soummya Kar
Comments: submitted
Subjects: Information Theory (cs.IT)

Computationally intensive distributed and parallel computing is often
bottlenecked by a small set of slow workers known as stragglers. In this paper,
we utilize the emerging idea of “coded computation” to design a novel
error-correcting-code inspired technique for solving linear inverse problems
under specific iterative methods in a parallelized implementation affected by
stragglers. Example applications include inverse problems in machine learning
on graphs, such as personalized PageRank and sampling on graphs. We provably
show that our coded-computation technique can reduce the mean-squared error
under a computational deadline constraint. In fact, the ratio of mean-squared
error of replication-based and coded techniques diverges to infinity as the
deadline increases. Our experiments for personalized PageRank performed on real
systems and real social networks show that this ratio can be as large as
(10^4). Further, unlike coded-computation techniques proposed thus far, our
strategy combines outputs of all workers, including the stragglers, to produce
more accurate estimates at the computational deadline. This also ensures that
the accuracy degrades “gracefully” in the event that the number of stragglers
is large.

Asymptotic Outage Analysis of HARQ-IR over Time-Correlated Nakagami-(m) Fading Channels

Zheng Shi, Shaodan Ma, Guanghua Yang, Kam-Weng Tam, Minghua Xia
Subjects: Information Theory (cs.IT)

In this paper, outage performance of hybrid automatic repeat request with
incremental redundancy (HARQ-IR) is analyzed. Unlike prior analyses,
time-correlated Nakagami-(m) fading channel is considered. The outage analysis
thus involves the probability distribution analysis of a product of multiple
correlated shifted Gamma random variables and is more challenging than prior
analyses. Based on the finding of the conditional independence of the received
signal-to-noise ratios (SNRs), the outage probability is exactly derived by
using conditional Mellin transform. Specifically, the outage probability of
HARQ-IR under time-correlated Nakagami-(m) fading channels can be written as a
weighted sum of outage probabilities of HARQ-IR over independent Nakagami
fading channels, where the weightings are determined by a negative multinomial
distribution. This result enables not only an efficient truncation
approximation of the outage probability with uniform convergence but also
asymptotic outage analysis to further extract clear insights which have never
been discovered for HARQ-IR even under fast fading channels. The asymptotic
outage probability is then derived in a simple form which clearly quantifies
the impacts of transmit powers, channel time correlation and information
transmission rate. It is proved that the asymptotic outage probability is an
inverse power function of the product of transmission powers in all HARQ
rounds, an increasing function of the channel time correlation coefficients,
and a monotonically increasing and convex function of information transmission
rate. The simple expression of the asymptotic result enables optimal power
allocation and optimal rate selection of HARQ-IR with low complexity. Finally,
numerical results are provided to verify our analytical results and justify the
application of the asymptotic result for optimal system design.

Optimal repair of Reed-Solomon codes: Achieving the cut-set bound

Itzhak Tamo, Min Ye, Alexander Barg
Subjects: Information Theory (cs.IT)

Coding for distributed storage gives rise to a new set of problems in coding
theory related to the need of reducing inter-node communication in the system.
A large number of recent papers addressed the problem of optimizing the total
amount of information downloaded for repair of a single failed node (the repair
bandwidth) by accessing information on (d) {em helper nodes}, where (kle dle
n-1.) By the so-called cut-set bound (Dimakis et al., 2010), the repair
bandwidth of an ((n,k=n-r)) MDS code using (d) helper nodes is at least
(dl/(d+1-k),) where (l) is the size of the node. Also, a number of known
constructions of MDS array codes meet this bound with equality. In a related
but separate line of work, Guruswami and Wootters (2016) studied repair of
Reed-Solomon (RS) codes, showing that these codes can be repaired using a
smaller bandwidth than under the trivial approach. At the same time, their work
as well as follow-up papers stopped short of constructing RS codes (or any
scalar MDS codes) that meet the cut-set bound with equality, which has been an
open problem in coding theory. In this work we present a solution to this
problem, constructing RS codes of length (n) over the field (q^l,
l=exp((1+o(1))nlog n)) that meet the cut-set bound. We also prove an almost
matching lower bound on (l), showing that the super-exponential scaling is both
necessary and sufficient for achieving the cut-set bound using linear repair
schemes. More precisely, we prove that for scalar MDS codes (including the RS
codes) to meet this bound, the sub-packetization (l) must satisfy (l ge
exp((1+o(1)) klog k).)

Low Subpacketization Schemes for Coded Caching

Li Tang, Aditya Ramamoorthy
Comments: This paper was presented in part at the 2016 IEEE Workshop on Network Coding and Applications (NetCod) and will be presented in part at the 2017 IEEE International Symposium on Information Theory (ISIT)
Subjects: Information Theory (cs.IT)

Coded caching is a technique that generalizes conventional caching and
promises significant reductions in traffic over caching networks. However, the
basic coded caching scheme requires that each file hosted in the server be
partitioned into a large number (i.e., the subpacketization level) of
non-overlapping subfiles. From a practical perspective, this is problematic as
it means that prior schemes are only applicable when the size of the files is
extremely large. In this work, we propose coded caching schemes based on
combinatorial structures called resolvable designs. These structures can be
obtained in a natural manner from linear block codes whose generator matrices
possess certain rank properties. We demonstrate that several schemes with
subpacketization levels that are exponentially smaller than the basic scheme
can be obtained.

Inexact Gradient Projection and Fast Data Driven Compressed Sensing

Mohammad Golbabaee, Mike E. Davies
Subjects: Information Theory (cs.IT)

We study convergence of the iterative projected gradient (IPG) algorithm for
arbitrary (possibly nonconvex) sets and when both the gradient and projection
oracles are computed approximately. We consider different notions of
approximation of which we show that the Progressive Fixed Precision (PFP) and
the ((1+epsilon))-optimal oracles can achieve the same accuracy as for the
exact IPG algorithm. We show that the former scheme is also able to maintain
the (linear) rate of convergence of the exact algorithm, under the same
embedding assumption. In contrast, the ((1+epsilon))-approximate oracle
requires a stronger embedding condition, moderate compression ratios and it
typically slows down the convergence. We apply our results to accelerate
solving a class of data driven compressed sensing problems, where we replace
iterative exhaustive searches over large datasets by fast approximate nearest
neighbour search strategies based on the cover tree data structure. For
datasets with low intrinsic dimensions our proposed algorithm achieves a
complexity logarithmic in terms of the dataset population as opposed to the
linear complexity of a brute force search. By running several numerical
experiments we conclude similar observations as predicted by our theoretical
analysis.

Interference Modeling for Cellular Networks under Beamforming Transmission

Hussain Elkotby, Mai Vu
Comments: 32 pages, 9 figures, accepted in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

We propose analytical models for the interference power distribution in a
cellular system employing MIMO beamforming in rich and limited scattering
environments, which capture non line-of-sight signal propagation in the
microwave and mmWave bands, respectively. Two candidate models are considered:
the Inverse Gaussian and the Inverse Weibull, both are two-parameter heavy tail
distributions. We further propose a mixture of these two distributions as a
model with three parameters. To estimate the parameters of these distributions,
three approaches are used: moment matching, individual distribution maximum
likelihood estimation (MLE), and mixture distribution MLE with a designed
expectation maximization algorithm. We then introduce simple fitted functions
for the mixture model parameters as polynomials of the channel path loss
exponent and shadowing variance. To measure the goodness of these models, the
information-theoretic metric relative entropy is used to capture the distance
from the model distribution to a reference one. The interference models are
tested against data obtained by simulating a cellular network based on
stochastic geometry. The results show that the three-parameter mixture model
offers remarkably good fit to simulated interference power. The mixture model
is further used to analyze the capacity of a cellular network employing joint
transmit and receive beamforming and confirms a good fit with simulation.

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

Jonathan Scarlett, Ilijia Bogunovic, Volkan Cevher
Comments: Accepted to COLT 2017
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

In this paper, we consider the problem of sequentially optimizing a black-box
function (f) based on noisy samples and bandit feedback. We assume that (f) is
smooth in the sense of having a bounded norm in some reproducing kernel Hilbert
space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian
process bandit optimization. We provide algorithm-independent lower bounds on
the simple regret, measuring the suboptimality of a single point reported after
(T) rounds, and on the cumulative regret, measuring the sum of regrets over the
(T) chosen points.

For the isotropic squared-exponential kernel in (d) dimensions, we find that
an average simple regret of (epsilon) requires (T =
Omegaig(frac{1}{epsilon^2} (logfrac{1}{epsilon})^{d/2}ig)), and the
average cumulative regret is at least (Omegaig( sqrt{T(log T)^d} ig)),
thus matching existing upper bounds up to the replacement of (d/2) by (d+O(1))
in both cases. For the Mat’ern-(
u) kernel, we give analogous bounds of the
form (Omegaig( (frac{1}{epsilon})^{2+d/
u}ig)) and (Omegaig(
T^{frac{
u + d}{2
u + d}} ig)), and discuss the resulting gaps to the
existing upper bounds.

The Sample Complexity of Online One-Class Collaborative Filtering

Reinhard Heckel, Kannan Ramchandran
Comments: ICML 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

We consider the online one-class collaborative filtering (CF) problem that
consists of recommending items to users over time in an online fashion based on
positive ratings only. This problem arises when users respond only occasionally
to a recommendation with a positive rating, and never with a negative one. We
study the impact of the probability of a user responding to a recommendation,
p_f, on the sample complexity, i.e., the number of ratings required to make
`good’ recommendations, and ask whether receiving positive and negative
ratings, instead of positive ratings only, improves the sample complexity. Both
questions arise in the design of recommender systems. We introduce a simple
probabilistic user model, and analyze the performance of an online user-based
CF algorithm. We prove that after an initial cold start phase, where
recommendations are invested in exploring the user’s preferences, this
algorithm makes—up to a fraction of the recommendations required for updating
the user’s preferences—perfect recommendations. The number of ratings
required for the cold start phase is nearly proportional to 1/p_f, and that for
updating the user’s preferences is essentially independent of p_f. As a
consequence we find that, receiving positive and negative ratings instead of
only positive ones improves the number of ratings required for initial
exploration by a factor of 1/p_f, which can be significant.