arXiv Paper Daily: Fri, 16 Dec 2016

Posted by 我爱机器学习 (52ml.net) on 2016-12-16 00:00:00

Neural and Evolutionary Computing

Graphical RNN Models

Ashish Bora, Sugato Basu, Joydeep Ghosh
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Many time series are generated by a set of entities that interact with one
another over time. This paper introduces a broad, flexible framework to learn
from multiple inter-dependent time series generated by such entities. Our
framework explicitly models the entities and their interactions through time.
It achieves this by building on the capabilities of Recurrent Neural Networks,
while also offering several ways to incorporate domain knowledge/constraints
into the model architecture. The capabilities of our approach are showcased
through an application to weather prediction, which shows gains over strong
baselines.

Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

Kien Tuong Phan, Tomas Henrique Maul, Tuong Thuy Vu, Lai Weng Kin
Comments: Pre-print. The final publication is available at Springer via this http URL
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

In an attempt to solve the lengthy training times of neural networks, we
proposed Parallel Circuits (PCs), a biologically inspired architecture.
Previous work has shown that this approach fails to maintain generalization
performance in spite of achieving sharp speed gains. To address this issue, and
motivated by the way Dropout prevents node co-adaption, in this paper, we
suggest an improvement by extending Dropout to the PC architecture. The paper
provides multiple insights into this combination, including a variety of fusion
approaches. Experiments show promising results in which improved error rates
are achieved in most cases, whilst maintaining the speed advantage of the PC
approach.

Learning binary or real-valued time-series via spike-timing dependent plasticity

Takayuki Osogami
Comments: This paper was accepted and presented at Computing with Spikes NIPS 2016 Workshop, Barcelona, Spain, December 2016
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

A dynamic Boltzmann machine (DyBM) has been proposed as a model of a spiking
neural network, and its learning rule of maximizing the log-likelihood of given
time-series has been shown to exhibit key properties of spike-timing dependent
plasticity (STDP), which had been postulated and experimentally confirmed in
the field of neuroscience as a learning rule that refines the Hebbian rule.
Here, we relax some of the constraints in the DyBM in a way that it becomes
more suitable for computation and learning. We show that learning the DyBM can
be considered as logistic regression for binary-valued time-series. We also
show how the DyBM can learn real-valued data in the form of a Gaussian DyBM and
discuss its relation to the vector autoregressive (VAR) model. The Gaussian
DyBM extends the VAR by using additional explanatory variables, which
correspond to the eligibility traces of the DyBM and capture long term
dependency of the time-series. Numerical experiments show that the Gaussian
DyBM significantly improves the predictive accuracy over VAR.
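
As a rough illustration of the relationship to VAR described above, the sketch below predicts the next value of a scalar series from a few lagged values plus an exponentially decaying trace of older values. The function names, weights, decay rate, and the exact trace update are hypothetical choices made for this example and are not taken from the paper.

```python
def eligibility_trace(series, decay):
    """Return traces e[t] = decay * e[t-1] + series[t], starting from e = 0."""
    trace, out = 0.0, []
    for x in series:
        trace = decay * trace + x
        out.append(trace)
    return out


def predict_next(series, lag_weights, trace_weight, decay, bias=0.0):
    """Autoregressive prediction plus one trace feature (assumes len(lag_weights) >= 1)."""
    lags = len(lag_weights)
    recent = series[-lags:]   # values inside the autoregressive window
    older = series[:-lags]    # values that only the trace can see
    # Plain autoregressive part: the most recent value is paired with lag_weights[0].
    ar_part = sum(w * x for w, x in zip(lag_weights, reversed(recent)))
    # Extra explanatory variable: a decaying summary of everything before the window.
    trace = eligibility_trace(older, decay)[-1] if older else 0.0
    return bias + ar_part + trace_weight * trace
```

Setting `trace_weight` to zero reduces this sketch to an ordinary autoregressive predictor, which is the sense in which the trace acts as an additional explanatory variable.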

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Franck Dernoncourt, Ji Young Lee, Peter Szolovits
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Existing models based on artificial neural networks (ANNs) for sentence
classification often do not incorporate the context in which sentences appear,
and classify sentences individually. However, traditional sentence
classification approaches have been shown to greatly benefit from jointly
classifying subsequent sentences, such as with conditional random fields. In
this work, we present an ANN architecture that combines the effectiveness of
typical ANN models to classify sentences in isolation, with the strength of
structured prediction. Our model achieves state-of-the-art results on two
different datasets for sequential sentence classification in medical abstracts.

Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Max Tegmark, Marin Soljačić
Comments: 9 pages, 4 figures
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

We present a method for implementing an Efficient Unitary Neural Network
(EUNN) whose computational complexity is merely \(\mathcal{O}(1)\) per parameter
and has full tunability, from spanning part of unitary space to all of it. We
apply the EUNN in Recurrent Neural Networks, and test its performance on the
standard copying task and the MNIST digit recognition benchmark, finding that
it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
partial space URNN and a projective URNN with comparable parameter numbers.
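
To give a feel for what constant cost per parameter can look like, the sketch below applies one layer of 2x2 rotations to disjoint pairs of vector entries. Each angle costs a fixed number of multiplications and the layer preserves the vector norm. This is a generic, real-valued illustration (an orthogonal special case), not the construction used in the paper; `rotation_layer` and its arguments are hypothetical names.

```python
import math


def rotation_layer(vec, angles):
    """Rotate each adjacent pair (vec[0], vec[1]), (vec[2], vec[3]), ... by its own angle."""
    if len(vec) != 2 * len(angles):
        raise ValueError("need exactly one angle per pair of entries")
    out = []
    for k, theta in enumerate(angles):
        a, b = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        # A fixed amount of work per angle, regardless of the vector length.
        out.extend([c * a - s * b, s * a + c * b])
    return out
```

Stacking several such layers with different pairings would mix more coordinates; how many layers are needed to cover a given part of the unitary space is the kind of tunability the abstract refers to.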

Computer Vision and Pattern Recognition

Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator

Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade
Comments: submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce the concept of a Visual Compiler that generates a scene specific
pedestrian detector and pose estimator without any pedestrian observations.
Given a single image and auxiliary scene information in the form of camera
parameters and geometric layout of the scene, the Visual Compiler first infers
geometrically and photometrically accurate images of humans in that scene
through the use of computer graphics rendering. Using these renders we learn a
scene-and-region specific spatially-varying fully convolutional neural network,
for simultaneous detection, pose estimation and segmentation of pedestrians. We
demonstrate that when real human annotated data is scarce or non-existent, our
data generation strategy can provide an excellent solution for bootstrapping
human detection and pose estimation. Experimental results show that our
approach outperforms off-the-shelf state-of-the-art pedestrian detectors and
pose estimators that are trained on real data.

CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

Kai Xu, Fengbo Ren
Comments: 10 pages, 6 pages, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this paper, we develop a deep neural network architecture called
“CSVideoNet” that can learn visual representations from random measurements for
compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
trainable and non-iterative model that combines convolutional neural networks
(CNNs) with a recurrent neural network (RNN) to facilitate video
reconstruction by leveraging temporal-spatial features. The proposed network
can accept random measurements with a multi-level compression ratio (CR). The
lightly and aggressively compressed measurements offer background information
and object details, respectively. This is similar to the variable bit rate
techniques widely used in conventional video coding approaches. The RNN
employed by CSVideoNet can leverage temporal coherence that exists in adjacent
video frames to extrapolate motion features and merge them with spatial visual
features extracted by the CNNs to further enhance reconstruction quality,
especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
Experimental results show that CSVideoNet outperforms the existing video CS
reconstruction approaches. The results demonstrate that our method can preserve
relatively excellent visual details from original videos even at a 100x CR,
which is difficult to realize with the reference approaches. Also, the
non-iterative nature of CSVideoNet results in a decrease in runtime by three
orders of magnitude over iterative reconstruction algorithms. Furthermore,
CSVideoNet can enhance the CR of CS cameras beyond the limitation of
conventional approaches, ensuring a reduction in bandwidth for data
transmission. These benefits are especially favorable to high-frame-rate video
applications.

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to
enable large scale photorealistic rendering of indoor scene trajectories. It
provides pixel-perfect ground truth for scene understanding problems such as
semantic segmentation, instance segmentation, and object detection, and also
for geometric computer vision problems such as optical flow, depth estimation,
camera pose estimation, and 3D reconstruction. Random sampling permits
virtually unlimited scene configurations, and here we provide a set of 5M
rendered RGB-D images from over 15K trajectories in synthetic layouts with
random but physically simulated object poses. Each layout also has random
lighting, camera trajectories, and textures. The scale of this dataset is well
suited for pre-training data-driven computer vision techniques from scratch
with RGB-D inputs, which previously has been limited by relatively small
labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for
investigating 3D scene labelling tasks by providing perfect camera poses and
depth data as proxy for a SLAM system. We host the dataset at
this http URL

Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Thomas Nestmeyer, Peter V. Gehler
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Separation of an input image into its reflectance and shading layers poses a
challenge for learning approaches because no large corpus of precise and
realistic ground truth decompositions exists. The Intrinsic Images in the Wild
dataset (IIW) provides a sparse set of relative human reflectance judgments,
which serves as a standard benchmark for intrinsic images. This dataset led to
an increase in methods that learn statistical dependencies between the images
and their reflectance layer. Although learning plays a role in pushing
state-of-the-art performance, we show that a standard signal processing
technique achieves performance on par with recent developments. We propose a
loss function that enables learning dense reflectance predictions with a CNN.
Our results show a simple pixel-wise decision, without any context or prior
knowledge, is sufficient to provide a strong baseline on IIW. This sets a
competitive bar and we find that only two approaches surpass this result. We
then develop a joint bilateral filtering method that implements strong prior
knowledge about reflectance constancy. This filtering operation can be applied
to any intrinsic image algorithm and we improve several previous results
achieving a new state-of-the-art on IIW. Our findings suggest that the effect
of learning-based approaches may be over-estimated and that it is still the use
of explicit prior knowledge that drives performance on intrinsic image
decompositions.

Objective Micro-Facial Movement Detection Using FACS-Based Regions and Baseline Evaluation

Adrian K. Davison, Cliff Lansley, Choon Ching Ng, Kevin Tan, Moi Hoon Yap
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Micro-facial expressions are regarded as an important human behavioural event
that can highlight emotional deception. Spotting these movements is difficult
for humans and machines; however, research into using computer vision to detect
subtle facial expressions is growing in popularity. This paper proposes an
individualised baseline micro-movement detection method using a 3D Histogram of
Oriented Gradients (3D HOG) temporal difference method. We define a face
template consisting of 26 regions based on the Facial Action Coding System
(FACS). We extract the temporal features of each region using 3D HOG. Then, we
use Chi-square distance to find subtle facial motion in the local regions.
Finally, an automatic peak detector is used to detect micro-movements above the
newly proposed adaptive baseline threshold. The performance is validated on two
FACS coded datasets: SAMM and CASME II. This objective method focuses on the
movement of the 26 face regions. Compared with the ground truth, the best
results were AUCs of 0.7512 and 0.7261 on SAMM and CASME II, respectively. The
results show that, for micro-movement detection, 3D HOG outperformed two
state-of-the-art feature representations: Local Binary Patterns in Three
Orthogonal Planes and Histograms of Oriented Optical Flow.
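
The Chi-square distance mentioned above compares two histograms bin by bin. A minimal sketch follows, assuming the feature vectors are non-negative histograms of equal length. Some definitions include an extra factor of 1/2, and the abstract does not say which variant the paper uses. The `eps` term is a hypothetical guard against empty bins.

```python
def chi_square_distance(p, q, eps=1e-12):
    """Chi-square distance between two equal-length, non-negative histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # Each bin contributes its squared difference, scaled by the bin's total mass.
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))
```

Identical histograms give a distance of zero, and larger values indicate a bigger change in the local region between the compared frames.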

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model was
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction was performed by
using a minimally supervised method that uses as basis an image segmentation
approach and a template fitting technique. Furthermore, it uses image denoising
to deal with possibly corrupt data, palate surface information reconstruction
to handle palatal tongue contacts, and a bootstrap strategy to refine the
obtained shapes. Our experiments concluded that limiting the degrees of freedom
for the anatomical and speech related variations to 5 and 4 respectively
produces a model that can reliably register unknown data while avoiding
overfitting effects.

Development of a Real-time Colorectal Tumor Classification System for Narrow-band Imaging zoom-videoendoscopy

Tsubasa Hirakawa, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka
Comments: 9 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colorectal endoscopy is important for the early detection and treatment of
colorectal cancer and is used worldwide. A computer-aided diagnosis (CAD)
system that provides an objective measure to endoscopists during colorectal
endoscopic examinations would be of great value. In this study, we describe a
newly developed CAD system that provides real-time objective measures. Our
system captures the video stream from an endoscopic system and transfers it to
a desktop computer. The captured video stream is then classified by a
pretrained classifier and the results are displayed on a monitor. The
experimental results show that our developed system works efficiently in actual
endoscopic examinations and is medically significant.

Design of Image Matched Non-Separable Wavelet using Convolutional Neural Network

Naushad Ansari, Anubha Gupta, Rahul Duggal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image-matched non-separable wavelets have potential uses in many
applications, including image classification, segmentation, and compressive
sensing. This paper proposes a novel design methodology that uses a
convolutional neural network (CNN) to design a two-channel non-separable
wavelet matched to a given image. The design is proposed on a quincunx lattice.
The loss function of the CNN is set up as the total squared error between the
input image given to the CNN and the reconstructed image at its output, leading
to perfect reconstruction at the end of training. Simulation results are shown
on some standard images.

Cloud Dictionary: Sparse Coding and Modeling for Point Clouds

Or Litany, Tal Remez, Alex Bronstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

With the development of range sensors such as LIDAR and time-of-flight
cameras, 3D point cloud scans have become ubiquitous in computer vision
applications, the most prominent ones being gesture recognition and autonomous
driving. Parsimony-based algorithms have shown great success on images and
videos where data points are sampled on a regular Cartesian grid. We propose an
adaptation of these techniques to irregularly sampled signals by using
continuous dictionaries. We present an example application in the form of point
cloud denoising.

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Thanks to the success of recurrent neural networks in modelling sequential
data and the power of attention mechanisms in automatically identifying salient
information, image captioning, a.k.a. image description, has advanced remarkably
in recent years. Nonetheless, most existing paradigms may lack invariance to
images with different scaling, rotation, etc., and may not effectively integrate
standalone attention into a holistic end-to-end system. In this paper, we
propose a novel image captioning architecture, termed Recurrent Image Captioner
(RIC), which allows the visual encoder and language decoder to cooperate
coherently in a recurrent manner. Specifically, we first equip the CNN-based
visual encoder with a differentiable layer to enable spatially invariant
transformation of visual signals. Moreover, we deploy a differentiable
attention filter module between encoder and decoder to dynamically determine
salient visual parts. We also employ a bidirectional LSTM to preprocess
sentences for generating better textual representations. In addition, we
propose to exploit variational inference to optimize the whole architecture.
Extensive experimental results on three benchmark datasets (i.e., Flickr8k,
Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture
as compared to most of the state-of-the-art methods.

Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network

Anh Tuan Tran, Tal Hassner, Iacopo Masi, Gerard Medioni
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The 3D shapes of faces are well known to be discriminative. Yet despite this,
they are rarely used for face recognition and always under controlled viewing
conditions. We claim that this is a symptom of a serious but often overlooked
problem with existing methods for single view 3D face reconstruction: when
applied “in the wild”, their 3D estimates are either unstable and change for
different photos of the same subject or they are over-regularized and generic.
In response, we describe a robust method for regressing discriminative 3D
morphable face models (3DMM). We use a convolutional neural network (CNN) to
regress 3DMM shape and texture parameters directly from an input photo. We
overcome the shortage of training data required for this purpose by offering a
method for generating huge numbers of labeled examples. The 3D estimates
produced by our CNN surpass state of the art accuracy on the MICC data set.
Coupled with a 3D-3D face matching pipeline, we show the first competitive face
recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes
as representations, rather than the opaque deep feature vectors used by other
modern systems.

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

Vivek Krishnan, Deva Ramanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We consider the task of visual net surgery, in which a CNN can be
reconfigured without extra data to recognize novel concepts that may be omitted
from the training set. While most prior work makes use of linguistic cues for
such “zero-shot” learning, we do so by using a pictorial language
representation of the training set, implicitly learned by a CNN, to generalize
to new classes. To this end, we introduce a set of visualization techniques
that better reveal the activation patterns and relations between groups of CNN
filters. We next demonstrate that knowledge of pictorial languages can be used
to rewire certain CNN neurons into a part model, which we call a pictorial
language classifier. We demonstrate the robustness of simple PLCs by applying
them in a weakly supervised manner: labeling unlabeled concepts for visual
classes present in the training data. Specifically we show that a PLC built on
top of a CNN trained for ImageNet classification can localize humans in Graz-02
and determine the pose of birds in PASCAL-VOC without extra labeled data or
additional training. We then apply PLCs in an interactive zero-shot manner,
demonstrating that pictorial languages are expressive enough to detect a set of
visual classes in MS-COCO that never appear in the ImageNet training set.

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

Fahad Shahbaz Khan, Joost van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most approaches to human attribute and action recognition in still images are
based on image representation in which multi-scale local features are pooled
across scale into a single, scale-invariant encoding. Both in bag-of-words and
the recently popular representations based on convolutional neural networks,
local features are computed at multiple scales. However, these multi-scale
convolutional features are pooled into a single scale-invariant representation.
We argue that entirely scale-invariant image representations are sub-optimal
and investigate approaches to scale coding within a Bag of Deep Features
framework.

Our approach encodes multi-scale information explicitly during the image
encoding stage. We propose two strategies to encode multi-scale information
explicitly in the final image representation. We validate our two scale coding
techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012,
Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale
coding approaches outperform both the scale-invariant method and the standard
deep features of the same network. Further, combining our scale coding
approaches with standard deep features leads to consistent improvement over the
state-of-the-art.

Border-Peeling Clustering

Nadav Bar, Hadar Averbuch-Elor, Daniel Cohen-Or
Comments: 9 pages, 9 figures, supplementary material added as ancillary file
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel non-parametric clustering technique, which
is based on an iterative algorithm that peels off layers of points around the
clusters. Our technique is based on the notion that each latent cluster is
comprised of layers that surround its core, where the external layers, or
border points, implicitly separate the clusters. Analyzing the K-nearest
neighbors of the points makes it possible to identify the border points and
associate them with points of inner layers. Our clustering algorithm
iteratively identifies border points, peels them, and separates the latent
clusters. We show that the peeling process adapts to the local density and
successfully separates adjacent clusters. A notable quality of the
Border-Peeling algorithm is that it does not require any parameter tuning in
order to outperform state-of-the-art finely-tuned non-parametric clustering
methods, including Mean-Shift and DBSCAN. We further assess our technique on
high-dimensional datasets that vary in size and characteristics. In particular,
we analyze the space of deep features that were trained by a convolutional
neural network.

A fuzzy approach for segmentation of touching characters

Giuseppe Airò Farulla, Nadir Murru, Rosaria Rossini
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The problem of correctly segmenting touching characters is hard to solve and
is of major relevance in pattern recognition. In recent years, many methods and
algorithms have been proposed; still, a definitive solution is far from being
found. In this paper, we propose a novel method based on fuzzy logic. The
proposed method combines, in a novel way, three features for segmenting touching
characters that have already been proposed in other studies but have so far
been exploited only individually. The proposed strategy is based
on a 3–input/1–output fuzzy inference system with fuzzy rules specifically
optimized for segmenting touching characters in the case of Latin printed and
handwritten characters. The system's performance is illustrated and supported
by numerical examples showing that our approach can achieve reasonably good
overall accuracy in segmenting characters even under tricky conditions of touching
characters. Moreover, numerical results suggest that the method can be applied
to many different datasets of characters by means of a convenient tuning of the
fuzzy sets and rules.

Temporal-Needle: A view and appearance invariant video descriptor

Michal Yarom, Michal Irani
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The ability to detect similar actions across videos can be very useful for
real-world applications in many fields. However, this task is still challenging
for existing systems, since videos that present the same action can be taken
from significantly different viewing directions, with different actors and
backgrounds, and at various video qualities. Video descriptors play a
significant role in these systems. In this work we propose the
“temporal-needle” descriptor which captures the dynamic behavior, while being
invariant to viewpoint and appearance. The descriptor is computed using multiple
temporal scales of the video and by computing self-similarity for every patch
through time in every temporal scale. The descriptor is computed for every
pixel in the video. However, to find similar actions across videos, we consider
only a small subset of the descriptors: the statistically significant
descriptors. This allows us to find good correspondences across videos more
efficiently. Using the descriptor, we were able to detect the same behavior
across videos in a variety of scenarios. We demonstrate the use of the
descriptor in tasks such as temporal and spatial alignment and action detection,
and show its potential for unsupervised clustering of videos into categories.
In this work we handled only videos taken with stationary cameras, but the
descriptor can be extended to handle moving cameras as well.

The More You Know: Using Knowledge Graphs for Image Classification

Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Humans have the remarkable capability to learn a large variety of visual
concepts, often with very few examples, whereas current state-of-the-art vision
algorithms require hundreds or thousands of examples per category and struggle
with ambiguity. One characteristic that sets humans apart is our ability to
acquire knowledge about the world and reason using this knowledge. This paper
investigates the use of structured prior knowledge in the form of knowledge
graphs and shows that using this knowledge improves performance on image
classification. Specifically, we introduce the Graph Search Neural Network as a
way of efficiently incorporating large knowledge graphs into a fully end-to-end
learning system. We show in a number of experiments that our method outperforms
baselines for multi-label classification, even under low data and few-shot
settings.

Coupling Adaptive Batch Sizes with Learning Rates

Lukas Balles, Javier Romero, Philipp Hennig
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Mini-batch stochastic gradient descent and variants thereof have become
standard for large-scale empirical risk minimization like the training of
neural networks. These methods are usually used with a constant batch size
chosen by simple empirical inspection. The batch size significantly influences
the behavior of the stochastic optimization algorithm, though, since it
determines the variance of the gradient estimates. This variance also changes
over the optimization process; when using a constant batch size, stability and
convergence are thus often enforced by means of a (manually tuned) decreasing
learning rate schedule. We propose a practical method for dynamic batch size
adaptation. It estimates the variance of the stochastic gradients and adapts
the batch size to decrease the variance proportionally to the value of the
objective function, removing the need for the aforementioned learning rate
decrease. In contrast to recent related work, our algorithm couples the batch
size to the learning rate, directly reflecting the known relationship between
the two. On three image classification benchmarks, our batch size adaptation
yields faster optimization convergence, while simultaneously simplifying
learning rate tuning. A TensorFlow implementation is available.
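
One way to read the coupling described above is that the batch size grows with the learning rate and the gradient variance, and shrinks as the objective value grows. The sketch below encodes that reading as a simple rule. The exact formula, the clipping bounds, and the fallback for a non-positive loss are assumptions made for illustration; the paper and its TensorFlow implementation should be consulted for the actual method.

```python
import math


def adapt_batch_size(learning_rate, gradient_variance, loss, min_size=16, max_size=2048):
    """Pick a batch size proportional to learning_rate * variance / loss, then clip it."""
    if loss <= 0:
        return max_size  # hypothetical fallback when the objective is not positive
    raw = learning_rate * gradient_variance / loss
    # Clip to a practical range so that noisy estimates cannot produce extreme sizes.
    return max(min_size, min(max_size, math.ceil(raw)))
```

Under this rule the batch size tends to grow as the loss falls during training, which averages away more gradient noise and plays the role that a decreasing learning rate schedule would otherwise play.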

Towards Score Following in Sheet Music Images

Matthias Dorfer, Andreas Arzt, Gerhard Widmer
Comments: Published In Proceedings of the 17th International Society for Music Information Retrieval Conference (2016)
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the matching of short music audio snippets to the
corresponding pixel location in images of sheet music. A system is presented
that simultaneously learns to read notes, listens to music and matches the
currently played music to its corresponding notes in the sheet. It consists of
an end-to-end multi-modal convolutional neural network that takes as input
images of sheet music and spectrograms of the respective audio snippets. It
learns to predict, for a given unseen audio snippet (covering approximately one
bar of music), the corresponding position in the respective score line. Our
results suggest that with the use of (deep) neural networks — which have
proven to be powerful image processing models — working with sheet music
becomes feasible and a promising future research direction.

Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee
Comments: 4 Figures, 1 Table
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Objective: The advent of Electronic Medical Records (EMR) with large
electronic imaging databases along with advances in deep neural networks with
machine learning has provided a unique opportunity to achieve milestones in
automated image analysis. Optical coherence tomography (OCT) is the most
commonly obtained imaging modality in ophthalmology and represents a dense and
rich dataset when combined with labels derived from the EMR. We sought to
determine if deep learning could be utilized to distinguish normal OCT images
from images from patients with Age-related Macular Degeneration (AMD). Methods:
Automated extraction of an OCT imaging database was performed and linked to
clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
from EPIC. The central 11 images were selected from each OCT scan of two
cohorts of patients: normal and AMD. Cross-validation was performed using a
random subset of patients. The area under the receiver operating characteristic
curve (auROC) was computed at an independent image level, macular OCT level, and patient
level. Results: Of an extraction of 2.6 million OCT images linked to clinical
datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
selected. A deep neural network was trained to categorize images as either
normal or AMD. At the image level, we achieved an auROC of 92.78% with an
accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
effective for classifying OCT images. These findings have important
implications in utilizing OCT in automated screening and computer aided
diagnosis tools.
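
The three evaluation levels mentioned above (image, macular OCT, patient) can be illustrated with a small sketch. The abstract does not say how image scores are combined per patient; averaging is an assumption here, and the helper names and all numbers are synthetic illustrations, not results from the paper.

```python
from collections import defaultdict

def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_score_per_group(group_ids, labels, scores):
    """Average image scores within each group (assumed aggregation rule)."""
    grouped = defaultdict(list)
    group_label = {}
    for gid, y, s in zip(group_ids, labels, scores):
        grouped[gid].append(s)
        group_label[gid] = y
    ids = sorted(grouped)
    return ([group_label[g] for g in ids],
            [sum(grouped[g]) / len(grouped[g]) for g in ids])

# Synthetic toy data: two images per patient, label 1 = AMD, 0 = normal.
patients = ["A", "A", "B", "B", "C", "C", "D", "D"]
labels   = [1, 1, 0, 0, 1, 1, 0, 0]
scores   = [0.9, 0.4, 0.5, 0.1, 0.6, 0.7, 0.2, 0.3]

image_auc = auroc(labels, scores)
patient_auc = auroc(*mean_score_per_group(patients, labels, scores))
print(image_auc, patient_auc)
```

In this toy example, one noisy AMD image lowers the image-level score, while averaging per patient separates the two groups.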

Artificial Intelligence

Ontohub: A semantic repository for heterogeneous ontologies

Mihai Codescu, Eugen Kuksa, Oliver Kutz, Till Mossakowski, Fabian Neuhaus
Comments: Preprint, journal special issue
Subjects: Artificial Intelligence (cs.AI)

Ontohub is a repository engine for managing distributed heterogeneous
ontologies. The distributed nature enables communities to share and exchange
their contributions easily. The heterogeneous nature makes it possible to
integrate ontologies written in various ontology languages. Ontohub supports a
wide range of formal logical and ontology languages, as well as various
structuring and modularity constructs and inter-theory (concept) mappings,
building on the OMG-standardized DOL language. Ontohub repositories are
organised as Git repositories, thus inheriting all features of this popular
version control system. Moreover, Ontohub is the first repository engine
meeting a substantial amount of the requirements formulated in the context of
the Open Ontology Repository (OOR) initiative, including an API for federation
as well as support for logical inference and axiom selection.

Crowdsourced Outcome Determination in Prediction Markets

Rupert Freeman, Sebastien Lahaie, David M. Pennock
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

A prediction market is a useful means of aggregating information about a
future event. To function, the market needs a trusted entity who will verify
the true outcome in the end. Motivated by the recent introduction of
decentralized prediction markets, we introduce a mechanism that allows for the
outcome to be determined by the votes of a group of arbiters who may themselves
hold stakes in the market. Despite the potential conflict of interest, we
derive conditions under which we can incentivize arbiters to vote truthfully by
using funds raised from market fees to implement a peer prediction mechanism.
Finally, we investigate what parameter values could be used in a real-world
implementation of our mechanism.

Collaborative creativity with Monte-Carlo Tree Search and Convolutional Neural Networks

Memo Akten, Mick Grierson
Comments: Presented at the Constructive Machine Learning workshop at NIPS 2016 as a poster and spotlight talk. 8 pages including 2 page references, 2 page appendix, 3 figures. Blog post (including videos) at this https URL
Subjects: Artificial Intelligence (cs.AI)

We investigate a human-machine collaborative drawing environment in which an
autonomous agent sketches images while optionally allowing a user to directly
influence the agent’s trajectory. We combine Monte Carlo Tree Search with image
classifiers and test both shallow models (e.g. multinomial logistic regression)
and deep Convolutional Neural Networks (e.g. LeNet, Inception v3). We found
that using the shallow model, the agent produces a limited variety of images,
which are noticeably recognisable by humans. However, using the deeper models,
the agent produces a more diverse range of images, and while the agent remains
very confident (99.99%) in having achieved its objective, to humans they mostly
resemble unrecognisable ‘random’ noise. We relate this to recent research which
also discovered that ‘deep neural networks are easily fooled’ (Nguyen et al., 2015)
and we discuss possible solutions and future directions for the research.

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Existing models based on artificial neural networks (ANNs) for sentence
classification often do not incorporate the context in which sentences appear,
and classify sentences individually. However, traditional sentence
classification approaches have been shown to greatly benefit from jointly
classifying subsequent sentences, such as with conditional random fields. In
this work, we present an ANN architecture that combines the effectiveness of
typical ANN models to classify sentences in isolation, with the strength of
structured prediction. Our model achieves state-of-the-art results on two
different datasets for sequential sentence classification in medical abstracts.

Improving Scalability of Reinforcement Learning by Separation of Concerns

Harm van Seijen, Mehdi Fatemi, Joshua Romoff
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we propose a framework for solving a single-agent task by
using multiple agents, each focusing on different aspects of the task. This
approach has two main advantages: 1) it allows for specialized agents for
different parts of the task, and 2) it provides a new way to transfer
knowledge, by transferring trained agents. Our framework generalizes the
traditional hierarchical decomposition, in which, at any moment in time, a
single agent has control until it has solved its particular subtask. We
illustrate our framework using a number of examples.

Adversarial Message Passing For Graphical Models

Theofanis Karaletsos
Comments: (12 pages, 2 figures) Presented at NIPS Advances In Approximate Inference 2016 (AABI 2016)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI)

Bayesian inference on structured models typically relies on the ability to
infer posterior distributions of underlying hidden variables. However,
inference in implicit models or complex posterior distributions is hard. A
popular tool for learning implicit models is the generative adversarial network
(GAN), which learns the parameters of a generator by fooling a discriminator.
Typically, GANs are considered to be models themselves and are not understood
in the context of inference. Current techniques rely on inefficient global
discrimination of joint distributions to perform learning, or only consider
discriminating a single output variable. We overcome these limitations by
treating GANs as a basis for likelihood-free inference in generative models and
generalize them to Bayesian posterior inference over factor graphs. We propose
local learning rules based on message passing minimizing a global divergence
criterion involving cooperating local adversaries used to sidestep explicit
likelihood evaluations. This allows us to compose models and yields a unified
inference and learning framework for adversarial learning. Our framework treats
model specification and inference separately and facilitates richly structured
models within the family of Directed Acyclic Graphs, including components such
as intractable likelihoods, non-differentiable models, simulators and generally
cumbersome models. A key result of our treatment is the insight that Bayesian
inference on structured models can be performed only with sampling and
discrimination when using nonparametric variational families, without access to
explicit distributions. As a side-result, we discuss the link to likelihood
maximization. These approaches hold promise to be useful in the toolbox of
probabilistic modelers and enrich the gamut of current probabilistic
programming applications.

TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we describe the construction of TeKnowbase, a knowledge-base
of technical concepts in computer science. Our main information sources are
technical websites such as Webopedia and Techtarget as well as Wikipedia and
online textbooks. We divide the knowledge-base construction problem into two
parts — the acquisition of entities and the extraction of relationships among
these entities. Our knowledge-base consists of approximately 100,000 triples.
We conducted an evaluation on a sample of triples and report an accuracy of a
little over 90%. We additionally conducted classification experiments on
StackOverflow data with features from TeKnowbase and achieved improved
classification accuracy.

Learning Through Dialogue Interactions

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A good dialogue agent should have the ability to interact with users. In this
work, we explore this direction by designing a simulator and a set of synthetic
tasks in the movie domain that allow the learner to interact with a teacher by
both asking and answering questions. We investigate how a learner can benefit
from asking questions in both an offline and online reinforcement learning
setting. We demonstrate that the learner improves when asking questions. Our
work represents a first step in developing end-to-end learned interactive
dialogue agents.

Dynamical Kinds and their Discovery

Benjamin C. Jantzen
Comments: Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

We demonstrate the possibility of classifying causal systems into kinds that
share a common structure without first constructing an explicit dynamical model
or using prior knowledge of the system dynamics. The algorithmic ability to
determine whether arbitrary systems are governed by causal relations of the
same form offers significant practical applications in the development and
validation of dynamical models. It is also of theoretical interest as an
essential stage in the scientific inference of laws from empirical data. The
algorithm presented is based on the dynamical symmetry approach to dynamical
kinds. A dynamical symmetry with respect to time is an intervention on one or
more variables of a system that commutes with the time evolution of the system.
A dynamical kind is a class of systems sharing a set of dynamical symmetries.
The algorithm presented classifies deterministic, time-dependent causal systems
by directly comparing their exhibited symmetries. Using simulated, noisy data
from a variety of nonlinear systems, we show that this algorithm correctly
sorts systems into dynamical kinds. It is robust under significant sampling
error, is immune to violations of normality in sampling error, and fails
gracefully with increasing dynamical similarity. The algorithm we demonstrate
is the first to address this aspect of automated scientific discovery.
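
The definition of a dynamical symmetry given above (an intervention that commutes with time evolution) can be checked numerically on toy systems. The sketch below is only an illustration of that definition, not the paper's algorithm; the two growth models, the scaling intervention and all function names are assumptions chosen for the example.

```python
import math

def evolve_exponential(x, dt, r=0.5):
    """Closed-form solution of dx/dt = r*x after time dt."""
    return x * math.exp(r * dt)

def evolve_logistic(x, dt, r=0.5, k=10.0):
    """Closed-form solution of dx/dt = r*x*(1 - x/k) after time dt."""
    return k / (1.0 + (k / x - 1.0) * math.exp(-r * dt))

def scale(x, factor=2.0):
    """Intervention: multiply the state variable by a constant."""
    return factor * x

def commutes(evolve, intervene, x0, dt, tol=1e-9):
    """True if intervening then evolving equals evolving then intervening."""
    return abs(evolve(intervene(x0), dt) - intervene(evolve(x0, dt))) < tol

exp_symmetric = commutes(evolve_exponential, scale, x0=1.0, dt=3.0)
logistic_symmetric = commutes(evolve_logistic, scale, x0=1.0, dt=3.0)
print(exp_symmetric, logistic_symmetric)
```

Scaling is a symmetry of exponential growth but not of logistic growth, so under this definition the two toy systems would not share that symmetry.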

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre
Comments: Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

User acceptance of artificial intelligence agents might depend on their
ability to explain their reasoning, which requires adding an interpretability
layer that helps users understand their behavior. This paper focuses
on adding an interpretable layer on top of Semantic Textual Similarity (STS),
which measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of segments
across the two sentences, where the relation between the segments is labeled
with a relation type and a similarity score. We present a publicly available
dataset of sentence pairs annotated following the formalization. We then
develop a system trained on this dataset which, given a sentence pair, explains
what is similar and different, in the form of graded and typed segment
alignments. When evaluated on the dataset, the system performs better than an
informed baseline, showing that the dataset and task are well-defined and
feasible. Most importantly, two user studies show how the system output can be
used to automatically produce explanations in natural language. Users performed
better when having access to the explanations, providing preliminary evidence
that our dataset and method to automatically produce explanations are useful in
real applications.

Information Retrieval

Using the Context of User Feedback in Recommender Systems

Ladislav Peska (Charles University in Prague, Faculty of Mathematics and Physics)
Comments: In Proceedings MEMICS 2016, arXiv:1612.04037
Journal-ref: EPTCS 233, 2016, pp. 1-12
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC)

Our work is generally focused on recommending for small or medium-sized
e-commerce portals, where explicit feedback is absent and thus the usage of
implicit feedback is necessary. Nonetheless, for some implicit feedback
features, the presentation context may be of high importance. In this paper, we
present a model of relevant contextual features affecting user feedback,
propose methods leveraging those features, publish a dataset of real e-commerce
users containing multiple user feedback indicators as well as its context and
finally present results of purchase prediction and recommendation experiments.
Off-line experiments with real users of a Czech travel agency website
corroborated the importance of leveraging presentation context in both purchase
prediction and recommendation tasks.

Graph Summarization: A Survey

Yike Liu, Abhilash Dighe, Tara Safavi, Danai Koutra
Subjects: Information Retrieval (cs.IR)

While advances in computing resources have made processing enormous amounts
of data possible, human ability to identify patterns in such data has not
scaled accordingly. Thus, efficient computational methods for condensing and
simplifying data are becoming vital for extracting actionable insights. In
particular, while data summarization techniques have been studied extensively,
only recently has summarizing interconnected data, or graphs, become popular.
This survey is a structured, comprehensive overview of the state-of-the-art
methods for summarizing graph data. We first broach the motivation behind and
the challenges of graph summarization. We then categorize summarization
approaches by the type of graphs taken as input and further organize each
category by core methodology. Finally, we discuss applications of summarization
on real-world graphs and conclude by describing some open problems in the
field.

Towards End-to-End Audio-Sheet-Music Retrieval

Matthias Dorfer, Andreas Arzt, Gerhard Widmer
Comments: In NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop, Barcelona, Spain
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Learning (cs.LG)

This paper demonstrates the feasibility of learning to retrieve short
snippets of sheet music (images) when given a short query excerpt of music
(audio), and vice versa, without any symbolic representation of music or
scores. This would be highly useful in many content-based musical retrieval
scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
and learns correlated latent spaces allowing for cross-modality retrieval in
both directions. Initial experiments with relatively simple monophonic music
show promising results.

Computation and Language

Neural Networks for Joint Sentence Classification in Medical Paper Abstracts

Existing models based on artificial neural networks (ANNs) for sentence
classification often do not incorporate the context in which sentences appear,
and classify sentences individually. However, traditional sentence
classification approaches have been shown to greatly benefit from jointly
classifying subsequent sentences, such as with conditional random fields. In
this work, we present an ANN architecture that combines the effectiveness of
typical ANN models to classify sentences in isolation, with the strength of
structured prediction. Our model achieves state-of-the-art results on two
different datasets for sequential sentence classification in medical abstracts.

Building a robust sentiment lexicon with (almost) no resource

Mickael Rouvier, Benoit Favre
Subjects: Computation and Language (cs.CL)

Creating sentiment polarity lexicons is labor intensive. Automatically
translating them from resourceful languages requires in-domain machine
translation systems, which rely on large quantities of bi-texts. In this paper,
we propose to replace machine translation by transferring words from the
lexicon through word embeddings aligned across languages with a simple linear
transform. The approach leads to no degradation, compared to machine
translation, when tested on sentiment polarity classification on tweets from
four languages.

Transition-based Parsing with Context Enhancement and Future Reward Reranking

Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong
Subjects: Computation and Language (cs.CL)

This paper presents a novel reranking model, future reward reranking, to
re-score the actions in a transition-based parser by using a global scorer.
Unlike conventional reranking parsers, the model searches for the best
dependency tree among all feasible trees constrained by a sequence of actions to
get the future reward of the sequence. The scorer is based on a first-order
graph-based parser with bidirectional LSTM, which captures a different parsing
view from that of the transition-based parser. In addition, since context
enhancement has substantially improved the accuracy of arc-standard
transition-based parsing, we implement context enhancement on an
arc-eager transition-based parser with stack LSTMs, a dynamic oracle and
dropout, and achieve further improvement. With the global scorer and
context enhancement, the results show that UAS of the parser increases as much
as 1.20% for English and 1.66% for Chinese, and LAS increases as much as 1.32%
for English and 1.63% for Chinese. Moreover, we get state-of-the-art LASs,
achieving 87.58% for Chinese and 93.37% for English.

TeKnowbase: Towards Construction of a Knowledge-base of Technical Concepts

Prajna Upadhyay, Tanuma Patra, Ashwini Purkar, Maya Ramanath
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we describe the construction of TeKnowbase, a knowledge-base
of technical concepts in computer science. Our main information sources are
technical websites such as Webopedia and Techtarget as well as Wikipedia and
online textbooks. We divide the knowledge-base construction problem into two
parts — the acquisition of entities and the extraction of relationships among
these entities. Our knowledge-base consists of approximately 100,000 triples.
We conducted an evaluation on a sample of triples and report an accuracy of a
little over 90%. We additionally conducted classification experiments on
StackOverflow data with features from TeKnowbase and achieved improved
classification accuracy.

Learning Through Dialogue Interactions

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

A good dialogue agent should have the ability to interact with users. In this
work, we explore this direction by designing a simulator and a set of synthetic
tasks in the movie domain that allow the learner to interact with a teacher by
both asking and answering questions. We investigate how a learner can benefit
from asking questions in both an offline and online reinforcement learning
setting. We demonstrate that the learner improves when asking questions. Our
work represents a first step in developing end-to-end learned interactive
dialogue agents.

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

User acceptance of artificial intelligence agents might depend on their
ability to explain their reasoning, which requires adding an interpretability
layer that helps users understand their behavior. This paper focuses
on adding an interpretable layer on top of Semantic Textual Similarity (STS),
which measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of segments
across the two sentences, where the relation between the segments is labeled
with a relation type and a similarity score. We present a publicly available
dataset of sentence pairs annotated following the formalization. We then
develop a system trained on this dataset which, given a sentence pair, explains
what is similar and different, in the form of graded and typed segment
alignments. When evaluated on the dataset, the system performs better than an
informed baseline, showing that the dataset and task are well-defined and
feasible. Most importantly, two user studies show how the system output can be
used to automatically produce explanations in natural language. Users performed
better when having access to the explanations, providing preliminary evidence
that our dataset and method to automatically produce explanations are useful in
real applications.

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Along with the success of recurrent neural networks in modelling sequential
data and the power of attention mechanisms in automatically identifying salient
information, image captioning, a.k.a. image description, has advanced
remarkably in recent years. Nonetheless, most existing paradigms may lack
invariance to images with different scaling, rotation, etc., as well as an
effective integration of standalone attention into a holistic end-to-end
system. In this paper, we propose a novel image captioning architecture, termed
Recurrent Image Captioner (RIC), which allows the visual encoder and
language decoder to coherently cooperate in a recurrent manner. Specifically,
we first equip CNN-based visual encoder with a differentiable layer to enable
spatially invariant transformation of visual signals. Moreover, we deploy an
attention filter module (differentiable) between encoder and decoder to
dynamically determine salient visual parts. We also employ bidirectional LSTM
to preprocess sentences for generating better textual representations. Besides,
we propose to exploit variational inference to optimize the whole architecture.
Extensive experimental results on three benchmark datasets (i.e., Flickr8k,
Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture
as compared to most of the state-of-the-art methods.

Distributed, Parallel, and Cluster Computing

Private Learning on Networks

Shripad Gade, Nitin H. Vaidya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

Continual data collection and widespread deployment of machine learning
algorithms, particularly the distributed variants, have raised new privacy
challenges. In a distributed machine learning scenario, the dataset is stored
among several machines and they solve a distributed optimization problem to
collectively learn the underlying model. We present a secure multi-party
computation inspired privacy preserving distributed algorithm for optimizing a
convex function consisting of several possibly non-convex functions. Each
individual objective function is privately stored with an agent while the
agents communicate model parameters with neighbor machines connected in a
network. We show that our algorithm can correctly optimize the overall
objective function and learn the underlying model accurately. We further prove
that under a vertex connectivity condition on the topology, our algorithm
preserves privacy of individual objective functions. We establish limits on
what a coalition of adversaries can learn by observing the messages and states
shared over a network.

GentleRain+: Making GentleRain Robust on Clock Anomalies

Mohammad Roohitavaf, Sandeep Kulkarni
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Causal consistency is an intermediate consistency model that can be
achieved together with high availability and high performance, even
in the presence of network partitions. There are several proposals in the
literature for causally consistent data stores. Thanks to its use of single
scalar physical clocks, GentleRain has higher throughput than other proposals
such as COPS or Orbe. However, both its correctness and its performance rely on
monotonic, synchronized physical clocks. Specifically, if physical clocks go
backward, its correctness is violated. In addition, GentleRain is sensitive to
clock synchronization, and clock skew may slow write operations in
GentleRain. In this paper, we address these issues in GentleRain by using
Hybrid Logical Clocks (HLC) instead of physical clocks. With HLC, the GentleRain
protocol is no longer sensitive to clock skew. In addition, even if clocks
go backward, the correctness of the system is not violated. Furthermore, with
HLC, we timestamp versions with a clock very close to the physical clocks.
Thus, we can take a causally consistent snapshot of the system at any given
physical time. We call the GentleRain protocol with HLCs GentleRain+. We have
implemented the GentleRain+ protocol and evaluated it experimentally.
GentleRain+ provides faster write operations than GentleRain, which relies
solely on physical clocks to achieve causal consistency. We have also shown
that using HLC instead of physical clocks does not add any overhead. Thus, it
makes GentleRain more robust to clock anomalies at no cost.
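
To make the role of the Hybrid Logical Clock concrete, here is a minimal sketch of the commonly described HLC update rules, where a timestamp is a pair (l, c) of a physical-time component and a logical counter. It is not taken from the GentleRain+ implementation; the class and method names are hypothetical, and the physical clock is simulated with a fixed list of readings that deliberately go backward.

```python
class HybridLogicalClock:
    """Sketch of an HLC; timestamps are (l, c) tuples compared lexicographically."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning an integer time
        self.l = 0  # largest physical time seen so far
        self.c = 0  # logical counter breaking ties within the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        prev = self.l
        self.l = max(prev, pt)
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote):
        """Timestamp a receive event carrying the sender's (l, c)."""
        rl, rc = remote
        pt = self.physical_clock()
        prev = self.l
        self.l = max(prev, rl, pt)
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)

# Simulated physical readings that move backward (10 -> 9 -> 8) before recovering.
readings = iter([10, 9, 8, 12])
clock = HybridLogicalClock(lambda: next(readings))

t1 = clock.now()
t2 = clock.now()
t3 = clock.update((10, 5))
t4 = clock.now()
print(t1, t2, t3, t4)
```

Even though the simulated physical clock goes backward, the issued timestamps keep increasing, which is the property the abstract relies on.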

Scalable Byzantine Consensus via Hardware-assisted Secret Sharing

Jian Liu, Wenting Li, Ghassan O. Karame, N. Asokan
Comments: 11 pages, 10 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The surging interest in blockchain technology has revitalized the search for
effective Byzantine consensus schemes. In particular, the blockchain community
has been looking for ways to effectively integrate traditional Byzantine
fault-tolerant (BFT) protocols into a blockchain consensus layer allowing
various financial institutions to securely agree on the order of transactions.
However, existing BFT protocols can only scale to tens of nodes due to their
O(n^2) message complexity.

In this paper, we propose FastBFT, the fastest and most scalable BFT protocol
to-date. At the heart of FastBFT is a novel message aggregation technique that
combines hardware-based trusted execution environments (TEEs) with lightweight
secret sharing primitives. Combining this technique with several other
optimizations (i.e., optimistic execution, tree topology and failure
detection), FastBFT achieves low latency and high throughput even for large
scale networks. Via systematic analysis and experiments, we demonstrate that
FastBFT has better scalability and performance than previous BFT protocols.

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon
Comments: NIPS 2016 Workshop on Machine Learning Systems
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We describe a computationally efficient, stochastic graph-regularization
technique that can be utilized for the semi-supervised training of deep neural
networks in a parallel or distributed setting. We utilize a technique, first
described in [13] for the construction of mini-batches for stochastic gradient
descent (SGD) based on synthesized partitions of an affinity graph that are
consistent with the graph structure, but also preserve enough stochasticity for
convergence of SGD to good local minima. We show how our technique allows a
graph-based semi-supervised loss function to be decomposed into a sum over
objectives, facilitating data parallelism for scalable training of machine
learning models. Empirical results indicate that our method significantly
improves classification accuracy compared to the fully-supervised case when the
fraction of labeled data is low, and in the parallel case, achieves significant
speed-up in terms of wall-clock time to convergence. We show the results for
both sequential and distributed-memory semi-supervised DNN training on a speech
corpus.

Learning

Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN

We present a method for implementing an Efficient Unitary Neural Network
(EUNN) whose computational complexity is merely O(1) per parameter
and has full tunability, from spanning part of unitary space to all of it. We
apply the EUNN in Recurrent Neural Networks, and test its performance on the
standard copying task and the MNIST digit recognition benchmark, finding that
it significantly outperforms a non-unitary RNN, an LSTM network, an exclusively
partial space URNN and a projective URNN with comparable parameter numbers.

Improving Scalability of Reinforcement Learning by Separation of Concerns

Harm van Seijen, Mehdi Fatemi, Joshua Romoff
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we propose a framework for solving a single-agent task by
using multiple agents, each focusing on different aspects of the task. This
approach has two main advantages: 1) it allows for specialized agents for
different parts of the task, and 2) it provides a new way to transfer
knowledge, by transferring trained agents. Our framework generalizes the
traditional hierarchical decomposition, in which, at any moment in time, a
single agent has control until it has solved its particular subtask. We
illustrate our framework using a number of examples.

Coupling Adaptive Batch Sizes with Learning Rates

Lukas Balles, Javier Romero, Philipp Hennig
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Mini-batch stochastic gradient descent and variants thereof have become
standard for large-scale empirical risk minimization like the training of
neural networks. These methods are usually used with a constant batch size
chosen by simple empirical inspection. The batch size significantly influences
the behavior of the stochastic optimization algorithm, though, since it
determines the variance of the gradient estimates. This variance also changes
over the optimization process; when using a constant batch size, stability and
convergence are thus often enforced by means of a (manually tuned) decreasing
learning rate schedule. We propose a practical method for dynamic batch size
adaptation. It estimates the variance of the stochastic gradients and adapts
the batch size to decrease the variance proportionally to the value of the
objective function, removing the need for the aforementioned learning rate
decrease. In contrast to recent related work, our algorithm couples the batch
size to the learning rate, directly reflecting the known relationship between
the two. On three image classification benchmarks, our batch size adaptation
yields faster optimization convergence, while simultaneously simplifying
learning rate tuning. A TensorFlow implementation is available.
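
The coupling described above can be sketched as a simple rule: estimate the gradient variance from per-example gradients, then pick a batch size proportional to learning rate times variance divided by the current objective value. This is one plausible reading of the abstract, not the authors' TensorFlow implementation; the exact formula, the clipping bounds and the function names are assumptions.

```python
import math

def gradient_variance(per_example_grads):
    """Sum over coordinates of the sample variance of per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = 0.0
    for j in range(dim):
        column = [g[j] for g in per_example_grads]
        mean = sum(column) / n
        total += sum((x - mean) ** 2 for x in column) / (n - 1)
    return total

def next_batch_size(per_example_grads, loss, learning_rate,
                    min_size=1, max_size=4096):
    """Assumed rule: batch size ~ learning_rate * variance / loss, clipped."""
    variance = gradient_variance(per_example_grads)
    proposed = math.ceil(learning_rate * variance / max(loss, 1e-12))
    return max(min_size, min(max_size, proposed))

# Toy per-example gradients (made up for illustration).
grads = [[1.0, 0.0], [3.0, 0.0], [1.0, 4.0], [3.0, 4.0]]

early = next_batch_size(grads, loss=0.5, learning_rate=0.1)
late = next_batch_size(grads, loss=0.05, learning_rate=0.1)
print(early, late)
```

With the gradient noise held fixed, a smaller objective value yields a larger batch, so the variance shrinks as training progresses without lowering the learning rate.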

A Fully Convolutional Deep Auditory Model for Musical Chord Recognition

Filip Korzeniowski, Gerhard Widmer
Comments: In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy
Subjects: Learning (cs.LG); Sound (cs.SD)

Chord recognition systems depend on robust feature extraction pipelines.
While these pipelines are traditionally hand-crafted, recent advances in
end-to-end machine learning have begun to inspire researchers to explore
data-driven methods for such tasks. In this paper, we present a chord
recognition system that uses a fully convolutional deep auditory model for
feature extraction. The extracted features are processed by a Conditional
Random Field that decodes the final chord sequence. Both processing stages are
trained automatically and do not require expert knowledge for optimising
parameters. We show that the learned auditory system extracts musically
interpretable features, and that the proposed chord recognition system achieves
results on par with or better than state-of-the-art algorithms.

Towards Score Following in Sheet Music Images

This paper addresses the matching of short music audio snippets to the
corresponding pixel location in images of sheet music. A system is presented
that simultaneously learns to read notes, listens to music and matches the
currently played music to its corresponding notes in the sheet. It consists of
an end-to-end multi-modal convolutional neural network that takes as input
images of sheet music and spectrograms of the respective audio snippets. It
learns to predict, for a given unseen audio snippet (covering approximately one
bar of music), the corresponding position in the respective score line. Our
results suggest that with the use of (deep) neural networks — which have
proven to be powerful image processing models — working with sheet music
becomes feasible and a promising future research direction.

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

Kai Xu, Yixing Li, Fengbo Ren
Comments: Accepted as an oral presentation in 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Learning (cs.LG); Information Theory (cs.IT)

Compressive sensing (CS) is a promising technology for realizing
energy-efficient wireless sensors for long-term health monitoring. However,
conventional model-driven CS frameworks suffer from limited compression ratio
and reconstruction quality when dealing with physiological signals due to
inaccurate models and the neglect of individual variability. In this paper, we
propose a data-driven CS framework that can learn signal characteristics and
personalized features from any individual recording of physiologic signals to
enhance CS performance with a minimized number of measurements. Such
improvements are accomplished by a co-training approach that optimizes the
sensing matrix and the dictionary towards improved restricted isometry property
and signal sparsity, respectively. Experimental results upon ECG signals show
that the proposed method, at a compression ratio of 10x, successfully reduces
the isometry constant of the trained sensing matrices by 86% against random
matrices and improves the overall reconstructed signal-to-noise ratio by 15dB
over conventional model-driven approaches.
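
For readers unfamiliar with the two figures of merit quoted above, the following Python sketch shows how they are commonly computed. It assumes the usual definitions (compression ratio as signal length over number of measurements, reconstructed SNR as 20 log10 of signal norm over error norm); the abstract does not state which definitions the authors use, and the function names are hypothetical.

```python
import numpy as np

def compression_ratio(signal_length, num_measurements):
    """Ratio of original samples to compressive measurements (assumed definition)."""
    return signal_length / num_measurements

def reconstruction_snr_db(original, reconstructed):
    """Reconstructed signal-to-noise ratio in dB (assumed definition)."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reconstructed, dtype=float)
    # Ratio of signal energy to reconstruction-error energy, on a dB scale.
    return 20.0 * np.log10(np.linalg.norm(original) / np.linalg.norm(error))
```

Under this definition, a 15 dB gain corresponds to the reconstruction error norm shrinking by a factor of about 5.6.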

Bayesian Optimization for Machine Learning : A Practical Guidebook

Ian Dewancker, Michael McCourt, Scott Clark
Subjects: Learning (cs.LG)

The engineering of machine learning systems is still a nascent field, relying
on a seemingly daunting collection of quickly evolving tools and best
practices. It is our hope that this guidebook will serve as a useful resource
for machine learning practitioners looking to take advantage of Bayesian
optimization techniques. We outline four example machine learning problems that
can be solved using open source machine learning libraries, and highlight the
benefits of using Bayesian optimization in the context of these common machine
learning applications.

Constraint Selection in Metric Learning

Hoel Le Capitaine
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

A number of machine learning algorithms use a metric, or a distance, in
order to compare individuals. The Euclidean distance is usually employed, but
it may be more efficient to learn a parametric distance such as the Mahalanobis
metric. Learning such a metric has been a hot topic for more than ten years,
and a number of methods have been proposed to efficiently learn it. However,
the nature of the problem makes it quite difficult for large scale data, as
well as data for which classes overlap. This paper presents a simple way of
improving accuracy and scalability of any iterative metric learning algorithm,
where constraints are obtained prior to the algorithm. The proposed approach
relies on a loss-dependent weighted selection of constraints that are used for
learning the metric. Using the corresponding dedicated loss function, the
method clearly obtains better results than state-of-the-art methods,
both in terms of accuracy and time complexity. Experimental results on
real-world, and potentially large, datasets demonstrate the effectiveness
of our proposition.
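
As a small illustration of the two ingredients named above, the Python sketch below computes a Mahalanobis distance for a given positive semi-definite matrix M and turns per-constraint losses into selection probabilities. The proportional-to-loss weighting is an assumption made for illustration; the abstract does not specify the weighting scheme, and both function names are hypothetical.

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.asarray(M, dtype=float) @ diff))

def selection_probabilities(losses):
    """Assumed scheme: pick each constraint with probability proportional to its loss."""
    losses = np.asarray(losses, dtype=float)
    return losses / losses.sum()
```

With M set to the identity matrix the distance reduces to the Euclidean distance, which is the baseline the abstract refers to.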

Private Learning on Networks

Shripad Gade, Nitin H. Vaidya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Optimization and Control (math.OC)

Continual data collection and widespread deployment of machine learning
algorithms, particularly the distributed variants, have raised new privacy
challenges. In a distributed machine learning scenario, the dataset is stored
among several machines and they solve a distributed optimization problem to
collectively learn the underlying model. We present a secure multi-party
computation inspired privacy preserving distributed algorithm for optimizing a
convex function consisting of several possibly non-convex functions. Each
individual objective function is privately stored with an agent while the
agents communicate model parameters with neighbor machines connected in a
network. We show that our algorithm can correctly optimize the overall
objective function and learn the underlying model accurately. We further prove
that under a vertex connectivity condition on the topology, our algorithm
preserves privacy of individual objective functions. We establish limits on
what a coalition of adversaries can learn by observing the messages and states
shared over a network.
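
One common idea behind secure-multi-party-inspired schemes of this kind is that neighbors exchange random perturbations that cancel in the aggregate. The Python sketch below illustrates that idea on plain numbers rather than objective functions. It is an assumption-laden toy, not the authors' algorithm: the function name, the Gaussian noise, and its scale are all hypothetical.

```python
import numpy as np

def mask_values(values, edges, seed=0):
    """Perturb each agent's private value with pairwise noise that sums to zero."""
    rng = np.random.default_rng(seed)
    masked = np.asarray(values, dtype=float).copy()
    for i, j in edges:
        noise = rng.normal(scale=10.0)  # random value shared by neighbors i and j
        masked[i] += noise              # agent i adds it
        masked[j] -= noise              # agent j subtracts it, so the pair cancels
    return masked

private = [1.0, 2.0, 3.0]
masked = mask_values(private, edges=[(0, 1), (1, 2), (0, 2)])
```

Each masked value differs from the private one, yet the total is unchanged, so an aggregate computed from the masked values matches the one computed from the originals.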

CSVideoNet: A Recurrent Convolutional Neural Network for Compressive Sensing Video Reconstruction

Kai Xu, Fengbo Ren
Comments: 10 pages, 6 pages, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

In this paper, we develop a deep neural network architecture called
“CSVideoNet” that can learn visual representations from random measurements for
compressive sensing (CS) video reconstruction. CSVideoNet is an end-to-end
trainable and non-iterative model that combines convolutional neural networks
(CNNs) with a recurrent neural network (RNN) to facilitate video
reconstruction by leveraging temporal-spatial features. The proposed network
can accept random measurements with a multi-level compression ratio (CR). The
lightly and aggressively compressed measurements offer background information
and object details, respectively. This is similar to the variable bit rate
techniques widely used in conventional video coding approaches. The RNN
employed by CSVideoNet can leverage temporal coherence that exists in adjacent
video frames to extrapolate motion features and merge them with spatial visual
features extracted by the CNNs to further enhance reconstruction quality,
especially at high CRs. We test our CSVideoNet on the UCF-101 dataset.
Experimental results show that CSVideoNet outperforms the existing video CS
reconstruction approaches. The results demonstrate that our method can preserve
relatively excellent visual details from original videos even at a 100x CR,
which is difficult to realize with the reference approaches. Also, the
non-iterative nature of CSVideoNet results in a decrease in runtime by three
orders of magnitude over iterative reconstruction algorithms. Furthermore,
CSVideoNet can enhance the CR of CS cameras beyond the limitation of
conventional approaches, ensuring a reduction in bandwidth for data
transmission. These benefits are especially favorable to high-frame-rate video
applications.

On the Potential of Simple Framewise Approaches to Piano Transcription

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer
Comments: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY
Subjects: Sound (cs.SD); Learning (cs.LG)

In an attempt at exploring the limitations of simple approaches to the task
of piano transcription (as usually defined in MIR), we conduct an in-depth
analysis of neural network-based framewise transcription. We systematically
compare different popular input representations for transcription systems to
determine the ones most suitable for use with neural networks. Exploiting
recent advances in training techniques and new regularizers, and taking into
account hyper-parameter tuning, we show that it is possible, by simple
bottom-up frame-wise processing, to obtain a piano transcriber that outperforms
the current published state of the art on the publicly available MAPS dataset
— without any complex post-processing steps. Thus, we propose this simple
approach as a new baseline for this dataset, for future transcription research
to build on and improve.

Towards End-to-End Audio-Sheet-Music Retrieval

This paper demonstrates the feasibility of learning to retrieve short
snippets of sheet music (images) when given a short query excerpt of music
(audio), and vice versa, without any symbolic representation of music or
scores. This would be highly useful in many content-based musical retrieval
scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA)
and learns correlated latent spaces allowing for cross-modality retrieval in
both directions. Initial experiments with relatively simple monophonic music
show promising results.

Feature Learning for Chord Recognition: The Deep Chroma Extractor

Filip Korzeniowski, Gerhard Widmer
Comments: In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA, 2016
Subjects: Sound (cs.SD); Learning (cs.LG)

We explore frame-level audio feature learning for chord recognition using
artificial neural networks. We present the argument that chroma vectors
potentially hold enough information to model harmonic content of audio for
chord recognition, but that standard chroma extractors compute features that
are too noisy. This leads us to propose a learned chroma feature extractor based on
artificial neural networks. It is trained to compute chroma features that
encode harmonic information important for chord recognition, while being robust
to irrelevant interferences. We achieve this by feeding the network an audio
spectrum with context instead of a single frame as input. This way, the network
can learn to selectively compensate for noise and resolve harmonic ambiguities.

We compare the resulting features to hand-crafted ones by using a simple
linear frame-wise classifier for chord recognition on various data sets. The
results show that the learned feature extractor produces superior chroma
vectors for chord recognition.

Graphical RNN Models

Ashish Bora, Sugato Basu, Joydeep Ghosh
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

Many time series are generated by a set of entities that interact with one
another over time. This paper introduces a broad, flexible framework to learn
from multiple inter-dependent time series generated by such entities. Our
framework explicitly models the entities and their interactions through time.
It achieves this by building on the capabilities of Recurrent Neural Networks,
while also offering several ways to incorporate domain knowledge/constraints
into the model architecture. The capabilities of our approach are showcased
through an application to weather prediction, which shows gains over strong
baselines.

Optimal structure and parameter learning of Ising models

Andrey Y. Lokhov, Marc Vuffray, Sidhant Misra, Michael Chertkov
Comments: 4 pages, 11 pages of supplementary information
Subjects: Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Reconstruction of structure and parameters of a graphical model from binary
samples is a problem of practical importance in a variety of disciplines,
ranging from statistical physics and computational biology to image processing
and machine learning. The focus of the research community has shifted towards
developing universal reconstruction algorithms which are both computationally
efficient and require the minimal amount of expensive data. We introduce a new
method, Interaction Screening, which accurately estimates the model parameters
using local optimization problems. The algorithm provably achieves perfect
graph structure recovery with an information-theoretically optimal number of
samples and outperforms state of the art techniques, especially in the
low-temperature regime which is known to be the hardest for learning. We assess
the efficacy of Interaction Screening through extensive numerical tests on
Ising models of various topologies and with different types of interactions,
ranging from ferromagnetic to spin-glass.
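
To make "local optimization problems" concrete, the Python sketch below evaluates one possible per-node objective. It assumes the objective is the empirical average of exp(-sigma_u * sum_i theta_ui * sigma_i) with no external field; this form is not given in the abstract and is stated here as an assumption, as are the function name and argument layout.

```python
import numpy as np

def interaction_screening_objective(samples, u, couplings):
    """Assumed local objective for node u: mean of exp(-sigma_u * sum_i theta_ui * sigma_i).

    samples   : array of shape (num_samples, num_spins) with entries +1 or -1
    u         : index of the node whose couplings are being estimated
    couplings : candidate couplings theta_ui for all nodes i != u, in index order
    """
    samples = np.asarray(samples, dtype=float)
    others = np.delete(samples, u, axis=1)  # spins of every node except u
    local_field = others @ np.asarray(couplings, dtype=float)
    return float(np.mean(np.exp(-samples[:, u] * local_field)))
```

For two independent, unbiased spins the objective equals cosh(theta), which is smallest at theta = 0, so minimizing it correctly reports no interaction in that toy case.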

Graph-based semi-supervised learning for relational networks

Leto Peel
Comments: 11 pages, 8 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We address the problem of semi-supervised learning in relational networks,
networks in which nodes are entities and links are the relationships or
interactions between them. Typically this problem is confounded with the
problem of graph-based semi-supervised learning (GSSL), because both problems
represent the data as a graph and predict the missing class labels of nodes.
However, not all graphs are created equal. In GSSL a graph is constructed,
often from independent data, based on similarity. As such, edges tend to
connect instances with the same class label. Relational networks, however, can
be more heterogeneous and edges do not always indicate similarity. For
instance, instead of links being more likely to connect nodes with the same
class label, they may occur more frequently between nodes with different class
labels (link-heterogeneity). Or nodes with the same class label do not
necessarily have the same type of connectivity across the whole network
(class-heterogeneity), e.g. in a network of sexual interactions we may observe
links between opposite genders in some parts of the graph and links between the
same genders in others. Performing classification in networks with different
types of heterogeneity is a hard problem that is made harder still when we do
not know a-priori the type or level of heterogeneity. Here we present two
scalable approaches for graph-based semi-supervised learning for the more
general case of relational networks. We demonstrate these approaches on
synthetic and real-world networks that display different link patterns within
and between classes. Compared to state-of-the-art approaches, ours give better
classification performance without prior knowledge of how classes interact. In
particular, our two-step label propagation algorithm gives consistently good
accuracy and runs on networks of over 1.6 million nodes and 30 million edges in
around 12 seconds.
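
The link-heterogeneity point can be illustrated with a toy Python sketch in which labels are spread over paths of length two instead of over direct links. Reading "two-step" as a single pass over the squared adjacency matrix is an assumption made for illustration; the paper's algorithm may differ, and the function name is hypothetical.

```python
import numpy as np

def two_step_label_scores(adjacency, labels, num_classes):
    """Score each node's classes by the labels of its neighbors' neighbors."""
    A = np.asarray(adjacency, dtype=float)
    two_step = A @ A                 # entry (i, j) counts paths of length two
    np.fill_diagonal(two_step, 0.0)  # ignore paths that return to the start node
    one_hot = np.zeros((len(labels), num_classes))
    for node, label in enumerate(labels):
        if label is not None:        # None marks an unlabeled node
            one_hot[node, label] = 1.0
    return two_step @ one_hot

# Toy network where links only join nodes of different classes.
adjacency = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [1, 1, 0, 0],
             [1, 1, 0, 0]]
scores = two_step_label_scores(adjacency, [0, None, 1, None], num_classes=2)
predicted = scores.argmax(axis=1)
```

In this toy network, copying labels from direct neighbors would assign the unlabeled nodes the wrong class, while the two-step scores group node 1 with node 0 and node 3 with node 2.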

Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

In an attempt to solve the lengthy training times of neural networks, we
proposed Parallel Circuits (PCs), a biologically inspired architecture.
Previous work has shown that this approach fails to maintain generalization
performance in spite of achieving sharp speed gains. To address this issue, and
motivated by the way Dropout prevents node co-adaption, in this paper, we
suggest an improvement by extending Dropout to the PC architecture. The paper
provides multiple insights into this combination, including a variety of fusion
approaches. Experiments show promising results in which improved error rates
are achieved in most cases, whilst maintaining the speed advantage of the PC
approach.

Dynamical Kinds and their Discovery

We demonstrate the possibility of classifying causal systems into kinds that
share a common structure without first constructing an explicit dynamical model
or using prior knowledge of the system dynamics. The algorithmic ability to
determine whether arbitrary systems are governed by causal relations of the
same form offers significant practical applications in the development and
validation of dynamical models. It is also of theoretical interest as an
essential stage in the scientific inference of laws from empirical data. The
algorithm presented is based on the dynamical symmetry approach to dynamical
kinds. A dynamical symmetry with respect to time is an intervention on one or
more variables of a system that commutes with the time evolution of the system.
A dynamical kind is a class of systems sharing a set of dynamical symmetries.
The algorithm presented classifies deterministic, time-dependent causal systems
by directly comparing their exhibited symmetries. Using simulated, noisy data
from a variety of nonlinear systems, we show that this algorithm correctly
sorts systems into dynamical kinds. It is robust under significant sampling
error, is immune to violations of normality in sampling error, and fails
gracefully with increasing dynamical similarity. The algorithm we demonstrate
is the first to address this aspect of automated scientific discovery.
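
The definition of a dynamical symmetry given above (an intervention that commutes with time evolution) can be checked numerically on textbook systems. The Python sketch below is a toy illustration using closed-form solutions of exponential and logistic growth; the systems, parameter values, and function names are chosen for illustration and are not taken from the paper.

```python
import numpy as np

def evolve_exponential(x0, t, r=0.5):
    """Closed-form solution of dx/dt = r * x."""
    return x0 * np.exp(r * t)

def evolve_logistic(x0, t, r=0.5, K=10.0):
    """Closed-form solution of dx/dt = r * x * (1 - x / K)."""
    growth = np.exp(r * t)
    return K * x0 * growth / (K + x0 * (growth - 1.0))

def commutes(intervention, evolve, x0, t):
    """True if intervening then evolving matches evolving then intervening."""
    return bool(np.isclose(evolve(intervention(x0), t),
                           intervention(evolve(x0, t))))

def double(x):
    """Intervention that scales the state by a factor of two."""
    return 2.0 * x
```

Doubling the state commutes with exponential growth but not with logistic growth, so under the definition above the two systems do not share this symmetry and would fall into different dynamical kinds.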

Semi-Supervised Phone Classification using Deep Neural Networks and Stochastic Graph-Based Entropic Regularization

Sunil Thulasidasan, Jeffrey Bilmes
Comments: InterSpeech Workshop on Machine Learning in Speech and Language Processing, 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We describe a graph-based semi-supervised learning framework in the context
of deep neural networks that uses a graph-based entropic regularizer to favor
smooth solutions over a graph induced by the data. The main contribution of
this work is a computationally efficient, stochastic graph-regularization
technique that uses mini-batches that are consistent with the graph structure,
but also provides enough stochasticity (in terms of mini-batch data diversity)
for convergence of stochastic gradient descent methods to good solutions. For
this work, we focus on results of frame-level phone classification accuracy on
the TIMIT speech corpus but our method is general and scalable to much larger
data sets. Results indicate that our method significantly improves
classification accuracy compared to the fully-supervised case when the fraction
of labeled data is low, and it is competitive with other methods in the fully
labeled case.

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization
technique that can be utilized for the semi-supervised training of deep neural
networks in a parallel or distributed setting. We utilize a technique, first
described in [13] for the construction of mini-batches for stochastic gradient
descent (SGD) based on synthesized partitions of an affinity graph that are
consistent with the graph structure, but also preserve enough stochasticity for
convergence of SGD to good local minima. We show how our technique allows a
graph-based semi-supervised loss function to be decomposed into a sum over
objectives, facilitating data parallelism for scalable training of machine
learning models. Empirical results indicate that our method significantly
improves classification accuracy compared to the fully-supervised case when the
fraction of labeled data is low, and in the parallel case, achieves significant
speed-up in terms of wall-clock time to convergence. We show the results for
both sequential and distributed-memory semi-supervised DNN training on a speech
corpus.

Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration

Cecilia S. Lee, Doug M. Baughman, Aaron Y. Lee
Comments: 4 Figures, 1 Table
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Objective: The advent of Electronic Medical Records (EMR) with large
electronic imaging databases along with advances in deep neural networks with
machine learning has provided a unique opportunity to achieve milestones in
automated image analysis. Optical coherence tomography (OCT) is the most
commonly obtained imaging modality in ophthalmology and represents a dense and
rich dataset when combined with labels derived from the EMR. We sought to
determine if deep learning could be utilized to distinguish normal OCT images
from images from patients with Age-related Macular Degeneration (AMD). Methods:
Automated extraction of an OCT imaging database was performed and linked to
clinical endpoints from the EMR. OCT macula scans were obtained by Heidelberg
Spectralis, and each OCT scan was linked to EMR clinical endpoints extracted
from EPIC. The central 11 images were selected from each OCT scan of two
cohorts of patients: normal and AMD. Cross-validation was performed using a
random subset of patients. Areas under the receiver operating characteristic
curve (auROC) were computed at an independent image level, macular OCT level, and patient
level. Results: Of an extraction of 2.6 million OCT images linked to clinical
datapoints from the EMR, 52,690 normal and 48,312 AMD macular OCT images were
selected. A deep neural network was trained to categorize images as either
normal or AMD. At the image level, we achieved an auROC of 92.78% with an
accuracy of 87.63%. At the macula level, we achieved an auROC of 93.83% with an
accuracy of 88.98%. At a patient level, we achieved an auROC of 97.45% with an
accuracy of 93.45%. Peak sensitivity and specificity with optimal cutoffs were
92.64% and 93.69% respectively. Conclusions: Deep learning techniques are
effective for classifying OCT images. These findings have important
implications in utilizing OCT in automated screening and computer aided
diagnosis tools.
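
The evaluation at different levels can be sketched in plain Python. The auROC function below uses the standard pairwise-ranking definition; averaging image-level scores within each patient is an assumed aggregation rule, since the abstract does not say how image scores were combined, and both function names are hypothetical.

```python
def au_roc(scores, labels):
    """Probability that a random positive scores above a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def mean_score_per_group(image_scores, group_ids):
    """Assumed aggregation: average the image-level scores within each group."""
    totals, counts = {}, {}
    for score, group in zip(image_scores, group_ids):
        totals[group] = totals.get(group, 0.0) + score
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}
```

Averaging over the images of a scan or a patient smooths out individual misclassified images, which is one plausible reason the reported auROC rises from the image level to the patient level.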

Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences

User acceptance of artificial intelligence agents might depend on their
ability to explain their reasoning, which requires adding an interpretability
layer that helps users understand their behavior. This paper focuses
on adding an interpretable layer on top of Semantic Textual Similarity (STS),
which measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of segments
across the two sentences, where the relation between the segments is labeled
with a relation type and a similarity score. We present a publicly available
dataset of sentence pairs annotated following the formalization. We then
develop a system trained on this dataset which, given a sentence pair, explains
what is similar and different, in the form of graded and typed segment
alignments. When evaluated on the dataset, the system performs better than an
informed baseline, showing that the dataset and task are well-defined and
feasible. Most importantly, two user studies show how the system output can be
used to automatically produce explanations in natural language. Users performed
better when they had access to the explanations, providing preliminary evidence
that our dataset and method for automatically producing explanations are useful in
real applications.

Uncovering the Dynamics of Crowdlearning and the Value of Knowledge

Utkarsh Upadhyay, Isabel Valera, Manuel Gomez-Rodriguez
Comments: To appear in Tenth ACM International conference on Web Search and Data Mining (WSDM) in 2017
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

Learning from the crowd has become increasingly popular in the Web and social
media. There is a wide variety of crowdlearning sites in which, on the one
hand, users learn from the knowledge that other users contribute to the site,
and, on the other hand, knowledge is reviewed and curated by the same users
using assessment measures such as upvotes or likes.

In this paper, we present a probabilistic modeling framework of
crowdlearning, which uncovers the evolution of a user’s expertise over time by
leveraging other users’ assessments of her contributions. The model allows for
both off-site and on-site learning and captures forgetting of knowledge. We
then develop a scalable estimation method to fit the model parameters from
millions of recorded learning and contributing events. We show the
effectiveness of our model by tracing activity of ~25 thousand users in Stack
Overflow over a 4.5 year period. We find that answers with high knowledge value
are rare. Newbies and experts tend to acquire less knowledge than users in the
middle range. Prolific learners also tend to be proficient contributors that
post answers with high knowledge value.

Information Theory

Lossy Transmission of Correlated Sources over a Multiple Access Channel: Necessary Conditions and Separation Results

Basak Guler, Deniz Gunduz, Aylin Yener
Comments: Submitted to IEEE Transactions on Information Theory on Nov 30, 2016
Subjects: Information Theory (cs.IT)

Lossy communication of correlated sources over a multiple access channel is
studied. First, lossy communication is investigated in the presence of
correlated decoder side information. An achievable joint source-channel coding
scheme is presented, and the conditions under which separate source and channel
coding is optimal are explored. It is shown that separation is optimal when the
encoders and the decoder have access to a common observation conditioned on
which the two sources are independent. Separation is shown to be optimal also
when only the encoders have access to such a common observation whose lossless
recovery is required at the decoder. Moreover, the optimality of separation is
shown for sources with a common part, and sources with reconstruction
constraints. Next, these results obtained for the system in presence of side
information are utilized to provide a set of necessary conditions for the
transmission of correlated sources over a multiple access channel without side
information. The identified necessary conditions are specialized to the case of
bivariate Gaussian sources over a Gaussian multiple access channel, and are
shown to be tighter than known results in the literature in certain cases. Our
results indicate that side information can have a significant impact on the
optimality of source-channel separation in lossy transmission, in addition to
being instrumental in identifying necessary conditions for the transmission of
correlated sources when no side information is present.

Privacy-Protecting Energy Management Unit through Model-Distribution Predictive Control

Jun-Xing Chin, Tomas Tinoco De Rubira, Gabriela Hug
Comments: Pre-print, submitted for review
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

The roll-out of smart meters in electricity networks introduces risks for
consumer privacy due to increased measurement frequency and granularity.
Through various Non-Intrusive Load Monitoring techniques, consumer behavior may
be inferred from their metering data. In this paper, we propose an energy
management method that protects privacy through the minimization of information
leakage. The method is based on a Model Predictive Controller that utilizes
energy storage and local generation, and that predicts the effects of its
actions on the statistics of the actual energy consumption of a consumer and
that seen by the grid. Computationally, the method requires solving a
Mixed-Integer Quadratic Program of manageable size whenever new meter readings
are available. We simulate the controller on generated residential load
profiles with different privacy costs in a two-tier time-of-use energy pricing
environment. Results show that information leakage is effectively reduced at
the expense of increased energy cost. The results also show that, using the
proposed controller, the consumer load profile seen by the grid resembles a
mixture between that obtained with Non-Intrusive Load Leveling and Lazy
Stepping.

Variations of the McEliece Cryptosystem

Jessalyn Bolkema, Heide Gluesing-Luerssen, Christine A. Kelley, Kristin Lauter, Beth Malmskog, Joachim Rosenthal
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)

Two variations of the McEliece cryptosystem are presented. The first one is
based on a relaxation of the column permutation in the classical McEliece
scrambling process. This is done in such a way that the Hamming weight of the
error, added in the encryption process, can be controlled so that efficient
decryption remains possible. The second variation is based on the use of
spatially coupled moderate-density parity-check codes as secret codes. These
codes are known for their excellent error-correction performance and allow for
a relatively low key size in the cryptosystem. For both variants the security
with respect to known attacks is discussed.

QoS-Based Linear Transceiver Optimization for Full-Duplex Multi-User Communications

Tsung-Hui Chang, Ya-Feng Liu, Shih-Chun Lin
Comments: submitted for publication
Subjects: Information Theory (cs.IT)

In this paper, we consider a multi-user wireless system with one full duplex
(FD) base station (BS) serving a set of half duplex (HD) mobile users. To cope
with the in-band self-interference (SI) and co-channel interference, we
formulate a quality-of-service (QoS) based linear transceiver design problem.
The problem jointly optimizes the downlink (DL) and uplink (UL) beamforming
vectors of the BS and the transmission powers of UL users so as to provide both
the DL and UL users with guaranteed signal-to-interference-plus-noise ratio
performance, using a minimum UL and DL transmission sum power. The considered
system model not only takes into account noise caused by non-ideal RF circuits
and analog/digital SI cancellation, but also constrains the maximum signal power at
the input of the analog-to-digital converter (ADC) for avoiding signal
distortion due to finite ADC precision. The formulated design problem is not
convex and challenging to solve in general. We first show that for a special
case where the SI channel estimation errors are independent and identically
distributed, the QoS-based linear transceiver design problem is globally
solvable by a polynomial-time bisection algorithm. For the general case, we
propose a suboptimal algorithm based on alternating optimization (AO). The AO
algorithm is guaranteed to converge to a Karush-Kuhn-Tucker solution. To reduce
the complexity of the AO algorithm, we further develop a fixed-point method by
extending the classical uplink-downlink duality in HD systems to the FD
system. Simulation results are presented to demonstrate the performance of the
proposed algorithms and the comparison with HD systems.

Antenna Selection for MIMO Non-orthogonal Multiple Access Systems

Yuehua Yu, He Chen, Yonghui Li, Zhiguo Ding, Branka Vucetic
Comments: Submitted for possible journal publication
Subjects: Information Theory (cs.IT)

This paper considers the joint antenna selection (AS) problem for a classical
two-user MIMO non-orthogonal multiple access (NOMA) system, where both the base
station (BS) and users (UEs) are equipped with multiple antennas. Specifically,
several computationally-efficient AS algorithms are developed for two
commonly-used NOMA scenarios: fixed power allocation NOMA (F-NOMA) and
cognitive radio-inspired NOMA (CR-NOMA). For the F-NOMA system, two novel AS
schemes, namely max-max-max AS (A^3-AS) and max-min-max AS (AIA-AS), are
proposed to maximize the system sum-rate, without and with the consideration of
user fairness, respectively. In the CR-NOMA network, a novel AS algorithm,
termed maximum-channel-gain-based AS (MCG-AS), is proposed to maximize the
achievable rate of the secondary user, under the condition that the primary
user’s quality of service requirement is satisfied. Asymptotic closed-form
expressions are derived for the average sum-rate of A^3-AS and AIA-AS and for
the average rate of the secondary user under MCG-AS.
Numerical results demonstrate that AIA-AS provides better user fairness,
while A^3-AS achieves a near-optimal sum-rate in F-NOMA systems. For the
CR-NOMA scenario, MCG-AS achieves near-optimal performance over a wide SNR
regime. Furthermore, all the proposed AS algorithms yield a significant
computational complexity reduction, compared to exhaustive search-based
counterparts.

Optical Adaptive Precoding for Visible Light Communications

Hanaa Marshoud, Paschalis C. Sofotasios, Sami Muhaidat, Bayan S. Sharif, George K. Karagiannidis
Subjects: Information Theory (cs.IT)

Multiple-input multiple-output (MIMO) techniques have recently demonstrated
significant potential in visible light communications (VLC), as they can
overcome the modulation bandwidth limitation and provide substantial
improvement in terms of spectral efficiency and link reliability. However, MIMO
systems typically suffer from inter-channel interference, which causes severe
degradation to the system performance. In this context, we propose a novel
optical adaptive precoding (OAP) scheme for the downlink of MIMO VLC systems,
which exploits the knowledge of transmitted symbols to enhance the effective
signal-to-interference-plus-noise ratio. We also derive bit-error-rate
expressions for the OAP under perfect and outdated channel state information
(CSI). Our results demonstrate that the proposed scheme is more robust to both
CSI error and channel correlation, compared to conventional channel inversion
precoding.
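
As a point of reference for the baseline named above, conventional channel inversion precoding can be sketched as follows. This is only a minimal illustration, not the proposed OAP scheme: the 2x2 channel matrix `H`, the symbol vector `s`, and the function name `channel_inversion_precoder` are hypothetical, and the sketch ignores noise and the non-negativity (DC bias) constraint that intensity-modulated VLC transmitters would impose.

```python
import numpy as np

def channel_inversion_precoder(H):
    """Return the pseudo-inverse of H, used here as the precoding matrix."""
    return np.linalg.pinv(H)

# Hypothetical 2x2 channel gain matrix (rows: receivers, columns: transmitters).
H = np.array([[1.0, 0.6],
              [0.5, 1.0]])
s = np.array([1.0, 0.0])  # hypothetical symbols intended for the two receivers

W = channel_inversion_precoder(H)
x = W @ s  # precoded transmit vector
y = H @ x  # noiseless received vector; inter-channel interference cancels
```

Because `H @ W` is the identity for an invertible channel, each receiver sees only its own symbol. An ill-conditioned `H` (for example, strongly correlated channels) makes the entries of `W` large, which is one commonly cited weakness of plain channel inversion.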

State Estimation with Secrecy against Eavesdroppers

Anastasios Tsiamis, Konstantinos Gatsis, George J. Pappas
Subjects: Systems and Control (cs.SY); Cryptography and Security (cs.CR); Information Theory (cs.IT)

We study the problem of remote state estimation, in the presence of an
eavesdropper. An authorized user estimates the state of a linear plant, based
on the data received from a sensor, while the data may also be intercepted by
the eavesdropper. To maintain confidentiality with respect to the state, we
introduce a novel control-theoretic definition of perfect secrecy requiring
that the user’s expected error remains bounded while the eavesdropper’s
expected error grows unbounded. We propose a secrecy mechanism which guarantees
perfect secrecy by randomly withholding sensor information, under the condition
that the user’s packet reception rate is larger than the eavesdropper’s
interception rate. Given this mechanism, we also explore the tradeoff between
the user’s utility and confidentiality with respect to the eavesdropper, via an
optimization problem. Finally, some examples are studied to provide insights
about this tradeoff.
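
The bounded-versus-unbounded contrast in this definition can be illustrated with a heavily simplified scalar model. This is a sketch under stated assumptions, not the authors' mechanism: it assumes a scalar plant x_{k+1} = a*x_k + w_k with noise variance q, i.i.d. packet reception, and that any received packet reveals the state exactly, so the expected squared error obeys e_{k+1} = (1 - r) * (a^2 * e_k + q) for reception rate r. All parameter values and names (`expected_error`, `send_prob`, `p_user`, `p_eve`) are hypothetical.

```python
def expected_error(a, q, rate, steps):
    """Iterate e_{k+1} = (1 - rate) * (a**2 * e_k + q), starting from e_0 = 0."""
    e = 0.0
    history = []
    for _ in range(steps):
        e = (1.0 - rate) * (a * a * e + q)
        history.append(e)
    return history

a, q = 1.5, 1.0            # hypothetical unstable plant gain and noise variance
p_user, p_eve = 0.9, 0.5   # hypothetical reception and interception probabilities
send_prob = 0.8            # hypothetical probability that the sensor transmits

# Withholding scales both effective rates by the same transmit probability.
user = expected_error(a, q, send_prob * p_user, 200)
eve = expected_error(a, q, send_prob * p_eve, 200)
```

In this toy model the recursion stays bounded when (1 - r) * a^2 < 1 and diverges when (1 - r) * a^2 > 1, so a transmit probability can separate the two parties only if the user's reception rate exceeds the eavesdropper's interception rate.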

A Data-Driven Compressive Sensing Framework Tailored For Energy-Efficient Wearable Sensing

Compressive sensing (CS) is a promising technology for realizing
energy-efficient wireless sensors for long-term health monitoring. However,
conventional model-driven CS frameworks suffer from limited compression ratio
and reconstruction quality when dealing with physiological signals due to
inaccurate models and neglect of individual variability. In this paper, we
propose a data-driven CS framework that can learn signal characteristics and
personalized features from any individual recording of physiological signals to
enhance CS performance with a minimized number of measurements. Such
improvements are accomplished by a co-training approach that optimizes the
sensing matrix and the dictionary towards improved restricted isometry property
and signal sparsity, respectively. Experimental results on ECG signals show
that the proposed method, at a compression ratio of 10x, successfully reduces
the isometry constant of the trained sensing matrices by 86% against random
matrices and improves the overall reconstructed signal-to-noise ratio by 15 dB
over conventional model-driven approaches.
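
For readers unfamiliar with the setting, the conventional baseline (a random sensing matrix with sparse recovery) can be sketched as below. This is not the proposed co-training method: it assumes a random Gaussian sensing matrix `Phi`, a synthetic signal that is sparse in the identity basis rather than in a learned dictionary, and orthogonal matching pursuit (OMP) as a stand-in recovery algorithm. The sizes `n`, `m`, `k` and the function name `omp` are hypothetical.

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy sparse recovery: run k rounds of column selection and refitting."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 128, 32, 3                      # signal length, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = Phi @ x                               # compressed measurements
x_hat = omp(Phi, y, k)                    # sparse estimate of x
```

The compression ratio here is n/m = 4, chosen only to keep the toy problem easy. A data-driven approach of the kind described would replace both the random `Phi` and the identity basis with trained counterparts.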