IT博客汇 | arXiv Paper Daily: Mon, 6 Mar 2017

arXiv Paper Daily: Mon, 6 Mar 2017

我爱机器学习(52ml.net)发表于 2017-03-06 00:00:00

Neural and Evolutionary Computing

On the Behavior of Convolutional Nets for Feature Extraction

Dario Garcia-Gasulla, Ferran Parés, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura
Comments: Submitted to Journal of Artificial Intelligence Research Special Track on Deep Learning, Knowledge Representation, and Reasoning
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Convolutional neural networks (CNN) are representation learning techniques
that achieve state-of-the-art performance on almost every image-related,
machine learning task. Applying the representation languages build by these
models to tasks beyond the one they were originally trained for is a field of
interest known as transfer learning for feature extraction. Through this
approach, one can apply the image descriptors learnt by a CNN after processing
millions of images to any dataset, without an expensive training phase.
Contributions to this field have so far focused on extracting CNN features from
layers close to the output (e.g., fully connected layers), particularly because
they work better when used out-of-the-box to feed a classifier. Nevertheless,
the rest of CNN features is known to encode a wide variety of visual
information, which could be potentially exploited on knowledge representation
and reasoning tasks. In this paper we analyze the behavior of each feature
individually, exploring their intra/inter class activations for all classes of
three different datasets. From this study we learn that low and middle level
features behave very differently to high level features, the former being more
descriptive and the latter being more discriminant. We show how low and middle
level features can be used for knowledge representation purposes both by their
presence or by their absence. We also study how much noise these features may
encode, and propose a thresholding approach to discard most of it. Finally, we
discuss the potential implications of these results in the context of knowledge
representation using features extracted from a CNN.

Large-Scale Evolution of Image Classifiers

Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Quoc Le, Alex Kurakin
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

Neural networks have proven effective at solving difficult problems but
designing their architectures can be challenging, even for image classification
problems alone. Evolutionary algorithms provide a technique to discover such
networks automatically. Despite significant computational requirements, we show
that evolving models that rival large, hand-designed architectures is possible
today. We employ simple evolutionary techniques at unprecedented scales to
discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial
initial conditions. To do this, we use novel and intuitive mutation operators
that navigate large search spaces. We stress that no human participation is
required once evolution starts and that the output is a fully-trained model.
Throughout this work, we place special emphasis on the repeatability of
results, the variability in the outcomes and the computational requirements.

Adversarial Examples for Semantic Image Segmentation

Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, Thomas Brox
Comments: ICLR 2017 workshop submission
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Machine learning methods in general and Deep Neural Networks in particular
have shown to be vulnerable to adversarial perturbations. So far this
phenomenon has mainly been studied in the context of whole-image
classification. In this contribution, we analyse how adversarial perturbations
can affect the task of semantic segmentation. We show how existing adversarial
attackers can be transferred to this task and that it is possible to create
imperceptible adversarial perturbations that lead a deep network to misclassify
almost all pixels of a chosen class while leaving network prediction nearly
unchanged outside this class.

Computer Vision and Pattern Recognition

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-paced Curriculum Learning

Dingwen Zhang, Deyu Meng, Long Zhao, Junwei Han
Comments: Has published in IJCAI 16
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly-supervised object detection (WOD) is a challenging problems in
computer vision. The key problem is to simultaneously infer the exact object
locations in the training images and train the object detectors, given only the
training images with weak image-level labels. Intuitively, by simulating the
selective attention mechanism of human visual system, saliency detection
technique can select attractive objects in scenes and thus is a potential way
to provide useful priors for WOD. However, the way to adopt saliency detection
in WOD is not trivial since the detected saliency region might be possibly
highly ambiguous in complex cases. To this end, this paper first
comprehensively analyzes the challenges in applying saliency detection to WOD.
Then, we make one of the earliest efforts to bridge saliency detection to WOD
via the self-paced curriculum learning, which can guide the learning procedure
to gradually achieve faithful knowledge of multi-class objects from easy to
hard. The experimental results demonstrate that the proposed approach can
successfully bridge saliency detection and WOD tasks and achieve the
state-of-the-art object detection results under the weak supervision.

Instance Flow Based Online Multiple Object Tracking

Sebastian Bullinger, Christoph Bodensteiner, Michael Arens
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a method to perform online Multiple Object Tracking (MOT) of known
object categories in monocular video data. Current Tracking-by-Detection MOT
approaches build on top of 2D bounding box detections. In contrast, we exploit
state-of-the-art instance aware semantic segmentation techniques to compute 2D
shape representations of target objects in each frame. We predict position and
shape of segmented instances in subsequent frames by exploiting optical flow
cues. We define an affinity matrix between instances of subsequent frames which
reflects locality and visual similarity. The instance association is solved by
applying the Hungarian method. We evaluate different configurations of our
algorithm using the MOT 2D 2015 train dataset. The evaluation shows that our
tracking approach is able to track objects with high relative motions. In
addition, we provide results of our approach on the MOT 2D 2015 test set for
comparison with previous works. We achieve a MOTA score of 32.1.

Incident Light Frequency-based Image Defogging Algorithm

Wenbo Zhang, Xiaorong Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Considering the problem of color distortion caused by the defogging algorithm
based on dark channel prior, an improved algorithm was proposed to calculate
the transmittance of all channels respectively. First, incident light
frequency’s effect on the transmittance of various color channels was analyzed
according to the Beer-Lambert’s Law, from which a proportion among various
channel transmittances was derived; afterwards, images were preprocessed by
down-sampling to refine transmittance, and then the original size was restored
to enhance the operational efficiency of the algorithm; finally, the
transmittance of all color channels was acquired in accordance with the
proportion, and then the corresponding transmittance was used for image
restoration in each channel. The experimental results show that compared with
the existing algorithm, this improved image defogging algorithm could make
image colors more natural, solve the problem of slightly higher color
saturation caused by the existing algorithm, and shorten the operation time by
four to nine times.

Augmented Reality for Depth Cues in Monocular Minimally Invasive Surgery

Long Chen, Wen Tang, Nigel W. John, Tao Ruan Wan, Jian Jun Zhang
Comments: 15 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the major challenges in Minimally Invasive Surgery (MIS) such as
laparoscopy is the lack of depth perception. In recent years, laparoscopic
scene tracking and surface reconstruction has been a focus of investigation to
provide rich additional information to aid the surgical process and compensate
for the depth perception issue. However, robust 3D surface reconstruction and
augmented reality with depth perception on the reconstructed scene are yet to
be reported. This paper presents our work in this area. First, we adopt a
state-of-the-art visual simultaneous localization and mapping (SLAM) framework
– ORB-SLAM – and extend the algorithm for use in MIS scenes for reliable
endoscopic camera tracking and salient point mapping. We then develop a robust
global 3D surface reconstruction frame- work based on the sparse point clouds
extracted from the SLAM framework. Our approach is to combine an outlier
removal filter within a Moving Least Squares smoothing algorithm and then
employ Poisson surface reconstruction to obtain smooth surfaces from the
unstructured sparse point cloud. Our proposed method has been quantitatively
evaluated compared with ground-truth camera trajectories and the organ model
surface we used to render the synthetic simulation videos. In vivo laparoscopic
videos used in the tests have demonstrated the robustness and accuracy of our
proposed framework on both camera tracking and surface reconstruction,
illustrating the potential of our algorithm for depth augmentation and
depth-corrected augmented reality in MIS with monocular endoscopes.

Deep Collaborative Learning for Visual Recognition

Yan Wang, Lingxi Xie, Ya Zhang, Wenjun Zhang, Alan Yuille
Comments: Submitted to CVPR 2017 (10 pages, 5 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks are playing an important role in state-of-the-art visual
recognition. To represent high-level visual concepts, modern networks are
equipped with large convolutional layers, which use a large number of filters
and contribute significantly to model complexity. For example, more than half
of the weights of AlexNet are stored in the first fully-connected layer (4,096
filters).

We formulate the function of a convolutional layer as learning a large visual
vocabulary, and propose an alternative way, namely Deep Collaborative Learning
(DCL), to reduce the computational complexity. We replace a convolutional layer
with a two-stage DCL module, in which we first construct a couple of smaller
convolutional layers individually, and then fuse them at each spatial position
to consider feature co-occurrence. In mathematics, DCL can be explained as an
efficient way of learning compositional visual concepts, in which the
vocabulary size increases exponentially while the model complexity only
increases linearly. We evaluate DCL on a wide range of visual recognition
tasks, including a series of multi-digit number classification datasets, and
some generic image classification datasets such as SVHN, CIFAR and ILSVRC2012.
We apply DCL to several state-of-the-art network structures, improving the
recognition accuracy meanwhile reducing the number of parameters (16.82% fewer
in AlexNet).

Context Aware Query Image Representation for Particular Object Retrieval

Zakaria Laskar, Juho Kannala
Comments: 14 pages, Extended version of a manuscript submitted to SCIA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The current models of image representation based on Convolutional Neural
Networks (CNN) have shown tremendous performance in image retrieval. Such
models are inspired by the information flow along the visual pathway in the
human visual cortex. We propose that in the field of particular object
retrieval, the process of extracting CNN representations from query images with
a given region of interest (ROI) can also be modelled by taking inspiration
from human vision. Particularly, we show that by making the CNN pay attention
on the ROI while extracting query image representation leads to significant
improvement over the baseline methods on challenging Oxford5k and Paris6k
datasets. Furthermore, we propose an extension to a recently introduced
encoding method for CNN representations, regional maximum activations of
convolutions (R-MAC). The proposed extension weights the regional
representations using a novel saliency measure prior to aggregation. This leads
to further improvement in retrieval accuracy.

Denoising Adversarial Autoencoders

Antonia Creswell, Anil Anthony Bharath
Comments: submitted to journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised learning is of growing interest because it unlocks the potential
held in vast amounts of unlabelled data to learn useful representations for
inference. Autoencoders, a form of generative model, may be trained by learning
to reconstruct unlabelled input data from a latent representation space. More
robust representations may be produced by an autoencoder if it learns to
recover clean input samples from corrupted ones. Representations may be further
improved by introducing regularisation during training to shape the
distribution of the encoded data in latent space. We suggest denoising
adversarial autoencoders, which combine denoising and regularisation, shaping
the distribution of latent space using adversarial training. We introduce a
novel analysis that shows how denoising may be incorporated into the training
and sampling of adversarial autoencoders. Experiments are performed to assess
the contributions that denoising makes to the learning of representations for
classification and sample synthesis. Our results suggest that autoencoders
trained using a denoising criterion achieve higher classification performance,
and can synthesise samples that are more consistent with the input data than
those trained without a corruption process.

EmotioNet Challenge: Recognition of facial expressions of emotion in the wild

C. Fabian Benitez-Quiroz, Ramprakash Srinivasan, Qianli Feng, Yan Wang, Aleix M. Martinez
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper details the methodology and results of the EmotioNet challenge.
This challenge is the first to test the ability of computer vision algorithms
in the automatic analysis of a large number of images of facial expressions of
emotion in the wild. The challenge was divided into two tracks. The first track
tested the ability of current computer vision algorithms in the automatic
detection of action units (AUs). Specifically, we tested the detection of 11
AUs. The second track tested the algorithms’ ability to recognize emotion
categories in images of facial expressions. Specifically, we tested the
recognition of 16 basic and compound emotion categories. The results of the
challenge suggest that current computer vision and machine learning algorithms
are unable to reliably solve these two tasks. The limitations of current
algorithms are more apparent when trying to recognize emotion. We also show
that current algorithms are not affected by mild resolution changes, small
occluders, gender or age, but that 3D pose is a major limiting factor on
performance. We provide an in-depth discussion of the points that need special
attention moving forward.

A Survey on Content-Aware Video Analysis for Sports

Huang-Chia Shih
Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.

Deep Learning with Domain Adaptation for Accelerated Projection Reconstruction MR

Yo Seob Han, Jaejun Yoo, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Purpose: A radial k-space trajectory is one of well-established sampling
trajectory in magnetic resonance imaging. However, the radial k-space
trajectory requires a large number of radial lines for high-resolution
reconstruction. Increasing the number of lines causes longer sampling times,
making it more difficult for routine clinical use. If we reduce the radial
lines to reduce the sampling time, streaking artifact patterns are unavoidable.
To solve this problem, we propose a novel deep learning approach to reconstruct
high-resolution MR images from the under-sampled k-space data.

Methods: The proposed deep network estimates the streaking artifacts. Once
the streaking artifacts are estimated, an artifact-free image is then obtained
by subtracting the estimated streaking artifacts from the distorted image. In
the case of the limited number of available radial acquisition data, we apply a
domain adaptation scheme, which first pre-trains the network with a large
number of x-ray computed tomography (CT) data sets and then fine-tunes it with
only a few MR data sets.

Results: The proposed deep learning method shows better performance than the
existing compressed sensing algorithms, such as total variation and PR-FOCUSS.
In addition, the calculation time is several order of magnitude faster than
total variation and PR-FOCUSS methods.

Conclusion: The proposed deep learning method surpasses the image quality as
well as the computation times against the existing compressed sensing
algorithms. In addition, we demonstrate the possibilities of domain-adaptation
approach when a limited number of MR data is available.

Deep artifact learning for compressed sensing and parallel MRI

Dongwook Lee, Jaejun Yoo, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Purpose: Compressed sensing MRI (CS-MRI) from single and parallel coils is
one of the powerful ways to reduce the scan time of MR imaging with performance
guarantee. However, the computational costs are usually expensive. This paper
aims to propose a computationally fast and accurate deep learning algorithm for
the reconstruction of MR images from highly down-sampled k-space data.

Theory: Based on the topological analysis, we show that the data manifold of
the aliasing artifact is easier to learn from a uniform subsampling pattern
with additional low-frequency k-space data. Thus, we develop deep aliasing
artifact learning networks for the magnitude and phase images to estimate and
remove the aliasing artifacts from highly accelerated MR acquisition.

Methods: The aliasing artifacts are directly estimated from the distorted
magnitude and phase images reconstructed from subsampled k-space data so that
we can get an aliasing-free images by subtracting the estimated aliasing
artifact from corrupted inputs. Moreover, to deal with the globally distributed
aliasing artifact, we develop a multi-scale deep neural network with a large
receptive field.

Results: The experimental results confirm that the proposed deep artifact
learning network effectively estimates and removes the aliasing artifacts.
Compared to existing CS methods from single and multi-coli data, the proposed
network shows minimal errors by removing the coherent aliasing artifacts.
Furthermore, the computational time is by order of magnitude faster.

Conclusion: As the proposed deep artifact learning network immediately
generates accurate reconstruction, it has great potential for clinical
applications.

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Jianqi Ma, Weiyuan Shao, Hao Ye, Li Wang, Hong Wang, Yingbin Zheng, Xiangyang Xue
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a novel rotation-based framework for arbitrary-oriented
text detection in natural scene images. We present the Rotation Region Proposal
Networks (RRPN), which is designed to generate inclined proposals with text
orientation angle information. The angle information is then adapted for
bounding box regression to make the proposals more accurately fit into the text
region in orientation. The Rotation Region-of-Interest (RRoI) pooling layer is
proposed to project arbitrary-oriented proposals to the feature map for a text
region classifier. The whole framework is built upon the Faster-RCNN
architecture, which ensures the computational efficiency of the
arbitrary-oriented text detection comparing with previous text detection
systems. We conduct experiments using the rotation-based framework on three
real-world scene text detection datasets, and demonstrate its superiority in
terms of effectiveness and efficiency over previous approaches.

Skin Lesion Classification using Class Activation Map

Xi Jia, Linlin Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We proposed a two stage framework with only one network to analyze skin
lesion images, we firstly trained a convolutional network to classify these
images, and cropped the import regions which the network has the maximum
activation value. In the second stage, we retrained this CNN with the image
regions extracted from stage one and output the final probabilities. The two
stage framework achieved a mean AUC of 0.857 in ISIC-2017 skin lesion
validation set and is 0.04 higher than that of the original inputs, 0.821.

Outlier Cluster Formation in Spectral Clustering

Takuro Ina, Atsushi Hashimoto, Masaaki Iiyama, Hidekazu Kasahara, Mikihiko Mori, Michihiko Minoh
Comments: 10 pages, 2 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Outlier detection and cluster number estimation is an important issue for
clustering real data. This paper focuses on spectral clustering, a time-tested
clustering method, and reveals its important properties related to outliers.
The highlights of this paper are the following two mathematical observations:
first, spectral clustering’s intrinsic property of an outlier cluster
formation, and second, the singularity of an outlier cluster with a valid
cluster number. Based on these observations, we designed a function that
evaluates clustering and outlier detection results. In experiments, we prepared
two scenarios, face clustering in photo album and person re-identification in a
camera network. We confirmed that the proposed method detects outliers and
estimates the number of clusters properly in both problems. Our method
outperforms state-of-the-art methods in both the 128-dimensional sparse space
for face clustering and the 4,096-dimensional non-sparse space for person
re-identification.

A Novel Multi-task Deep Learning Model for Skin Lesion Segmentation and Classification

Xulei Yang, Zeng Zeng, Si Yong Yeo, Colin Tan, Hong Liang Tey, Yi Su
Comments: Submission to support ISIC 2017 challenge results
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this study, a multi-task deep neural network is proposed for skin lesion
analysis. The proposed multi-task learning model solves different tasks (e.g.,
lesion segmentation and two independent binary lesion classifications) at the
same time by exploiting commonalities and differences across tasks. This
results in improved learning efficiency and potential prediction accuracy for
the task-specific models, when compared to training the individual models
separately. The proposed multi-task deep learning model is trained and
evaluated on the dermoscopic image sets from the International Skin Imaging
Collaboration (ISIC) 2017 Challenge – Skin Lesion Analysis towards Melanoma
Detection, which consists of 2000 training samples and 150 evaluation samples.
The experimental results show that the proposed multi-task deep learning model
achieves promising performances on skin lesion segmentation and classification.
The average value of Jaccard index for lesion segmentation is 0.724, while the
average values of area under the receiver operating characteristic curve (AUC)
on two individual lesion classifications are 0.880 and 0.972, respectively.

Adversarial Examples for Semantic Image Segmentation

Machine learning methods in general and Deep Neural Networks in particular
have shown to be vulnerable to adversarial perturbations. So far this
phenomenon has mainly been studied in the context of whole-image
classification. In this contribution, we analyse how adversarial perturbations
can affect the task of semantic segmentation. We show how existing adversarial
attackers can be transferred to this task and that it is possible to create
imperceptible adversarial perturbations that lead a deep network to misclassify
almost all pixels of a chosen class while leaving network prediction nearly
unchanged outside this class.

Large-Scale Evolution of Image Classifiers

Neural networks have proven effective at solving difficult problems but
designing their architectures can be challenging, even for image classification
problems alone. Evolutionary algorithms provide a technique to discover such
networks automatically. Despite significant computational requirements, we show
that evolving models that rival large, hand-designed architectures is possible
today. We employ simple evolutionary techniques at unprecedented scales to
discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial
initial conditions. To do this, we use novel and intuitive mutation operators
that navigate large search spaces. We stress that no human participation is
required once evolution starts and that the output is a fully-trained model.
Throughout this work, we place special emphasis on the repeatability of
results, the variability in the outcomes and the computational requirements.

Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Jangwon Lee, Michael S. Ryoo
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We design a new approach that allows robot learning of new activities from
unlabeled human example videos. Given videos of humans executing the same
activity from a human’s viewpoint (i.e., first-person videos), our objective is
to make the robot learn the temporal structure of the activity as its future
regression network, and learn to transfer such model for its own motor
execution. We present a new deep learning model: We extend the state-of-the-art
convolutional object detection network for the detection of human hands in
training videos based on image information, and newly introduce the concept of
using a fully convolutional network to regress (i.e., predict) the intermediate
scene representation corresponding to the future frame (e.g., 1-2 seconds
later). Combining these allows direct prediction of future locations of human
hands and objects, which enables the robot to infer the motor control plan
using our manipulation network. We experimentally confirm that our approach
makes learning of robot activities from unlabeled human interaction videos
possible, and demonstrate that our robot is able to execute the learned
collaborative activities in real-time directly based on its camera input.

Estimating the resolution of real images

Ryuta Mizutani, Rino Saiga, Susumu Takekoshi, Chie Inomoto, Naoya Nakamura, Makoto Arai, Kenichi Oshima, Masanari Itokawa, Akihisa Takeuchi, Kentaro Uesugi, Yasuko Terada, Yoshio Suzuki
Comments: 4 pages, 2 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

Image resolvability is the primary concern in imaging. This paper reports an
estimation of the full width at half maximum of the point spread function from
a Fourier domain plot of real sample images by neither using test objects, nor
defining a threshold criterion. We suggest that this method can be applied to
any type of image, independently of the imaging modality.

Belief Propagation in Conditional RBMs for Structured Prediction

Wei Ping, Alexander Ihler
Comments: Artificial Intelligence and Statistics (AISTATS) 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Restricted Boltzmann machines~(RBMs) and conditional RBMs~(CRBMs) are popular
models for a wide range of applications. In previous work, learning on such
models has been dominated by contrastive divergence~(CD) and its variants.
Belief propagation~(BP) algorithms are believed to be slow for structured
prediction on conditional RBMs~(e.g., Mnih et al. [2011]), and not as good as
CD when applied in learning~(e.g., Larochelle et al. [2012]). In this work, we
present a matrix-based implementation of belief propagation algorithms on
CRBMs, which is easily scalable to tens of thousands of visible and hidden
units. We demonstrate that, in both maximum likelihood and max-margin learning,
training conditional RBMs with BP as the inference routine can provide
significantly better results than current state-of-the-art CD methods on
structured prediction problems. We also include practical guidelines on
training CRBMs with BP, and some insights on the interaction of learning and
inference algorithms for CRBMs.

A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex

Daniel Moyer, Boris A Gutman, Neda Jahanshad, Paul M. Thompson
Comments: In the Proceedings of Information Processing in Medical Imaging 2017
Subjects: Neurons and Cognition (q-bio.NC); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM); Applications (stat.AP)

One of the primary objectives of human brain mapping is the division of the
cortical surface into functionally distinct regions, i.e. parcellation. While
it is generally agreed that at macro-scale different regions of the cortex have
different functions, the exact number and configuration of these regions is not
known. Methods for the discovery of these regions are thus important,
particularly as the volume of available information grows. Towards this end, we
present a parcellation method based on a Bayesian non-parametric mixture model
of cortical connectivity.

Depth Estimation using Modified Cost Function for Occlusion Handling

Krzysztof Wegner, Olgierd Stankiewicz
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

The paper presents a novel approach to occlusion handling problem in depth
estimation using three views. A solution based on modification of similarity
cost function is proposed. During the depth estimation via optimization
algorithms like Graph Cut similarity metric is constantly updated so that only
non-occluded fragments in side views are considered. At each iteration of the
algorithm non-occluded fragments are detected based on side view virtual depth
maps synthesized from the best currently estimated depth map of the center
view. Then similarity metric is updated for correspondence search only in
non-occluded regions of the side views. The experimental results, conducted on
well-known 3D video test sequences, have proved that the depth maps estimated
with the proposed approach provide about 1.25 dB virtual view quality
improvement in comparison to the virtual view synthesized based on depth maps
generated by the state-of-the-art MPEG Depth Estimation Reference Software.

Artificial Intelligence

Actor-Critic Reinforcement Learning with Simultaneous Human Control and Feedback

Kory W. Mathewson, Patrick M. Pilarski
Comments: 10 pages, 2 pages of references, 8 figures. Under review for the 34th International Conference on Machine Learning,Sydney, Australia, 2017. Copyright 2017 by the authors
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

This paper contributes a first study into how different human users deliver
simultaneous control and feedback signals during human-robot interaction. As
part of this work, we formalize and present a general interactive learning
framework for online cooperation between humans and reinforcement learning
agents. In many human-machine interaction settings, there is a growing gap
between the degrees-of-freedom of complex semi-autonomous systems and the
number of human control channels. Simple human control and feedback mechanisms
are required to close this gap and allow for better collaboration between
humans and machines on complex tasks. To better inform the design of concurrent
control and feedback interfaces, we present experimental results from a
human-robot collaborative domain wherein the human must simultaneously deliver
both control and feedback signals to interactively train an actor-critic
reinforcement learning robot. We compare three experimental conditions: 1)
human delivered control signals, 2) reward-shaping feedback signals, and 3)
simultaneous control and feedback. Our results suggest that subjects provide
less feedback when simultaneously delivering feedback and control signals and
that control signal quality is not significantly diminished. Our data suggest
that subjects may also modify when and how they provide feedback. Through
algorithmic development and tuning informed by this study, we expect
semi-autonomous actions of robotic agents can be better shaped by human
feedback, allowing for seamless collaboration and improved performance in
difficult interactive domains.

FeUdal Networks for Hierarchical Reinforcement Learning

Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu
Subjects: Artificial Intelligence (cs.AI)

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical
reinforcement learning. Our approach is inspired by the feudal reinforcement
learning proposal of Dayan and Hinton, and gains power and efficacy by
decoupling end-to-end learning across multiple levels — allowing it to utilise
different resolutions of time. Our framework employs a Manager module and a
Worker module. The Manager operates at a lower temporal resolution and sets
abstract goals which are conveyed to and enacted by the Worker. The Worker
generates primitive actions at every tick of the environment. The decoupled
structure of FuN conveys several benefits — in addition to facilitating very
long timescale credit assignment it also encourages the emergence of
sub-policies associated with different goals set by the Manager. These
properties allow FuN to dramatically outperform a strong baseline agent on
tasks that involve long-term credit assignment or memorisation. We demonstrate
the performance of our proposed system on a range of tasks from the ATARI suite
and also from a 3D DeepMind Lab environment.

Sequential Plan Recognition

Reuth Mirsky, Roni Stern, Ya'akov (Kobi)Gal, Meir Kalech
Subjects: Artificial Intelligence (cs.AI)

Plan recognition algorithms infer agents’ plans from their observed actions.
Due to imperfect knowledge about the agent’s behavior and the environment, it
is often the case that there are multiple hypotheses about an agent’s plans
that are consistent with the observations, though only one of these hypotheses
is correct. This paper addresses the problem of how to disambiguate between
hypotheses, by querying the acting agent about whether a candidate plan in one
of the hypotheses matches its intentions. This process is performed
sequentially and used to update the set of possible hypotheses during the
recognition process. The paper defines the sequential plan recognition process
(SPRP), which seeks to reduce the number of hypotheses using a minimal number
of queries. We propose a number of policies for the SPRP which use maximum
likelihood and information gain to choose which plan to query. We show this
approach works well in practice on two domains from the literature,
significantly reducing the number of hypotheses using fewer queries than a
baseline approach. Our results can inform the design of future plan recognition
systems that interleave the recognition process with intelligent interventions
of their users.

Unsupervised Basis Function Adaptation for Reinforcement Learning

Edward W. Barker, Charl J. Ras
Comments: Extended abstract submitted (3 March 2017) for 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

When using reinforcement learning (RL) algorithms to evaluate a policy it is
common, given a large state space, to introduce some form of approximation
architecture for the value function (VF). The exact form of this architecture
can have a significant effect on the accuracy of the VF estimate, however, and
determining a suitable approximation architecture can often be a highly complex
task. Consequently there is a large amount of interest in the potential for
allowing RL algorithms to adaptively generate approximation architectures.

We investigate a method of adapting approximation architectures which uses
feedback regarding the frequency with which an agent has visited certain states
to guide which areas of the state space to approximate with greater detail.
This method is “unsupervised” in the sense that it makes no direct reference to
reward or the VF estimate. We introduce an algorithm based upon this idea which
adapts a state aggregation approximation architecture on-line.

A common method of scoring a VF estimate is to weight the squared Bellman
error of each state-action by the probability of that state-action occurring.
Adopting this scoring method, and assuming (S) states, we demonstrate
theoretically that – provided (1) the number of cells (X) in the state
aggregation architecture is of order (sqrt{S}log_2{S}ln{S}) or greater, (2)
the policy and transition function are close to deterministic, and (3) the
prior for the transition function is uniformly distributed – our algorithm,
used in conjunction with a suitable RL algorithm, can guarantee a score which
is arbitrarily close to zero as (S) becomes large. It is able to do this
despite having only (O(X log_2S)) space complexity and negligible time
complexity. The results take advantage of certain properties of the stationary
distributions of Markov chains.

Stochastic Separation Theorems

A.N. Gorban, I.Y. Tyukin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

A set (S) is linearly separable if each (xin S) can be separated from the
rest of (S) by a linear functional. We study random (N)-element sets in
(mathbb{R}^n) for large (n) and demonstrate that for (N<aexp(b{n})) they are
linearly separable with probability (p), (p>1-vartheta), for a given (small)
(vartheta>0). Constants (a,b>0) depend on the probability distribution and the
constant (vartheta). The results are important for machine learning in high
dimension, especially for correction of unavoidable mistakes of legacy
Artificial Intelligence systems.

On the Behavior of Convolutional Nets for Feature Extraction

Convolutional neural networks (CNN) are representation learning techniques
that achieve state-of-the-art performance on almost every image-related,
machine learning task. Applying the representation languages build by these
models to tasks beyond the one they were originally trained for is a field of
interest known as transfer learning for feature extraction. Through this
approach, one can apply the image descriptors learnt by a CNN after processing
millions of images to any dataset, without an expensive training phase.
Contributions to this field have so far focused on extracting CNN features from
layers close to the output (e.g., fully connected layers), particularly because
they work better when used out-of-the-box to feed a classifier. Nevertheless,
the rest of CNN features is known to encode a wide variety of visual
information, which could be potentially exploited on knowledge representation
and reasoning tasks. In this paper we analyze the behavior of each feature
individually, exploring their intra/inter class activations for all classes of
three different datasets. From this study we learn that low and middle level
features behave very differently to high level features, the former being more
descriptive and the latter being more discriminant. We show how low and middle
level features can be used for knowledge representation purposes both by their
presence or by their absence. We also study how much noise these features may
encode, and propose a thresholding approach to discard most of it. Finally, we
discuss the potential implications of these results in the context of knowledge
representation using features extracted from a CNN.

Large-Scale Evolution of Image Classifiers

Neural networks have proven effective at solving difficult problems but
designing their architectures can be challenging, even for image classification
problems alone. Evolutionary algorithms provide a technique to discover such
networks automatically. Despite significant computational requirements, we show
that evolving models that rival large, hand-designed architectures is possible
today. We employ simple evolutionary techniques at unprecedented scales to
discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial
initial conditions. To do this, we use novel and intuitive mutation operators
that navigate large search spaces. We stress that no human participation is
required once evolution starts and that the output is a fully-trained model.
Throughout this work, we place special emphasis on the repeatability of
results, the variability in the outcomes and the computational requirements.

Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Jangwon Lee, Michael S. Ryoo
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We design a new approach that allows robot learning of new activities from
unlabeled human example videos. Given videos of humans executing the same
activity from a human’s viewpoint (i.e., first-person videos), our objective is
to make the robot learn the temporal structure of the activity as its future
regression network, and learn to transfer such model for its own motor
execution. We present a new deep learning model: We extend the state-of-the-art
convolutional object detection network for the detection of human hands in
training videos based on image information, and newly introduce the concept of
using a fully convolutional network to regress (i.e., predict) the intermediate
scene representation corresponding to the future frame (e.g., 1-2 seconds
later). Combining these allows direct prediction of future locations of human
hands and objects, which enables the robot to infer the motor control plan
using our manipulation network. We experimentally confirm that our approach
makes learning of robot activities from unlabeled human interaction videos
possible, and demonstrate that our robot is able to execute the learned
collaborative activities in real-time directly based on its camera input.

End-to-End Task-Completion Neural Dialogue Systems

Xuijun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper presents an end-to-end learning framework for task-completion
neural dialogue systems, which leverages supervised and reinforcement learning
with various deep-learning models. The system is able to interface with a
structured database, and interact with users for assisting them to access
information and complete tasks such as booking movie tickets. Our experiments
in a movie-ticket booking domain show the proposed system outperforms a
modular-based dialogue system and is more robust to noise produced by other
components in the system.

A Laplacian Framework for Option Discovery in Reinforcement Learning

Marlos C. Machado, Marc G. Bellemare, Michael Bowling
Comments: Version submitted to the 34th International Conference on Machine Learning (ICML)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Representation learning and option discovery are two of the biggest
challenges in reinforcement learning (RL). Proto-RL is a well known approach
for representation learning in MDPs. The representations learned with this
framework are called proto-value functions (PVFs). In this paper we address the
option discovery problem by showing how PVFs implicitly define options. We do
it by introducing eigenpurposes, intrinsic reward functions derived from the
learned representations. The options discovered from eigenpurposes traverse the
principal directions of the state space. They are useful for multiple tasks
because they are independent of the agents’ intentions. Moreover, by capturing
the diffusion process of a random walk, different options act at different time
scales, making them helpful for exploration strategies. We demonstrate features
of eigenpurposes in traditional tabular domains as well as in Atari 2600 games.

Controllable Text Generation

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing
Comments: updated discussions and references
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)

Generic generation and manipulation of text is challenging and has limited
success compared to recent deep generative modeling in visual domain. This
paper aims at generating plausible natural language sentences, whose attributes
are dynamically controlled by learning disentangled latent representations with
designated semantics. We propose a new neural generative model which combines
variational auto-encoders and holistic attribute discriminators for effective
imposition of semantic structures. With differentiable approximation to
discrete text samples, explicit constraints on independent attribute controls,
and efficient collaborative learning of generator and discriminators, our model
learns highly interpretable representations from even only word annotations,
and produces realistic sentences with desired attributes. Quantitative
evaluation validates the accuracy of sentence and attribute generation.

DAWT: Densely Annotated Wikipedia Texts across multiple languages

Nemanja Spasojevic, Preeti Bhargava, Guoning Hu
Comments: 8 pages, 3 figures, 7 tables, WWW2017, WWW 2017 Companion proceedings
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

In this work, we open up the DAWT dataset – Densely Annotated Wikipedia Texts
across multiple languages. The annotations include labeled text mentions
mapping to entities (represented by their Freebase machine ids) as well as the
type of the entity. The data set contains total of 13.6M articles, 5.0B tokens,
13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text
to entity links than originally present in the Wikipedia markup. Moreover, it
spans several languages including English, Spanish, Italian, German, French and
Arabic. We also present the methodology used to generate the dataset which
enriches Wikipedia markup in order to increase number of links. In addition to
the main dataset, we open up several derived datasets including mention entity
co-occurrence counts and entity embeddings, as well as mappings between
Freebase ids and Wikidata item ids. We also discuss two applications of these
datasets and hope that opening them up would prove useful for the Natural
Language Processing and Information Retrieval communities, as well as
facilitate multi-lingual research.

Information Retrieval

Employing Spectral Domain Features for Efficient Collaborative Filtering

Doaa M. Shawky
Subjects: Information Retrieval (cs.IR)

Collaborative filtering (CF) is a powerful recommender system that generates
a list of recommended items for an active user based on the ratings of similar
users. This paper presents a novel approach to CF by first finding the set of
users similar to the active user by adopting self-organizing maps (SOM),
followed by k-means clustering. Then, the ratings for each item in the cluster
closest to the active user are mapped to the frequency domain using the
Discrete Fourier Transform (DFT). The power spectra of the mapped ratings are
generated, and a new similarity measure based on the coherence of these power
spectra is calculated. The proposed similarity measure is more time efficient
than current state-of-the-art measures. Moreover, it can capture the global
similarity between the profiles of users. Experimental results show that the
proposed approach overcomes the major problems in existing CF algorithms as
follows: First, it mitigates the scalability problem by creating clusters of
similar users and applying the time-efficient similarity measure. Second, its
frequency-based similarity measure is less sensitive to sparsity problems
because the DFT performs efficiently even with sparse data. Third, it
outperforms standard similarity measures in terms of accuracy.

DAWT: Densely Annotated Wikipedia Texts across multiple languages

In this work, we open up the DAWT dataset – Densely Annotated Wikipedia Texts
across multiple languages. The annotations include labeled text mentions
mapping to entities (represented by their Freebase machine ids) as well as the
type of the entity. The data set contains total of 13.6M articles, 5.0B tokens,
13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text
to entity links than originally present in the Wikipedia markup. Moreover, it
spans several languages including English, Spanish, Italian, German, French and
Arabic. We also present the methodology used to generate the dataset which
enriches Wikipedia markup in order to increase number of links. In addition to
the main dataset, we open up several derived datasets including mention entity
co-occurrence counts and entity embeddings, as well as mappings between
Freebase ids and Wikidata item ids. We also discuss two applications of these
datasets and hope that opening them up would prove useful for the Natural
Language Processing and Information Retrieval communities, as well as
facilitate multi-lingual research.

Deconvolving Feedback Loops in Recommender Systems

Ayan Sinha, David F. Gleich, Karthik Ramani
Comments: Neural Information Processing Systems, 2016
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

Collaborative filtering is a popular technique to infer users’ preferences on
new content based on the collective information of all users preferences.
Recommender systems then use this information to make personalized suggestions
to users. When users accept these recommendations it creates a feedback loop in
the recommender system, and these loops iteratively influence the collaborative
filtering algorithm’s predictions over time. We investigate whether it is
possible to identify items affected by these feedback loops. We state
sufficient assumptions to deconvolve the feedback loops while keeping the
inverse solution tractable. We furthermore develop a metric to unravel the
recommender system’s influence on the entire user-item rating matrix. We use
this metric on synthetic and real-world datasets to (1) identify the extent to
which the recommender system affects the final rating matrix, (2) rank
frequently recommended items, and (3) distinguish whether a user’s rated item
was recommended or an intrinsic preference. Our results indicate that it is
possible to recover the ratings matrix of intrinsic user preferences using a
single snapshot of the ratings matrix without any temporal information.

Computation and Language

Exponential Moving Average Model in Parallel Speech Recognition Training

Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei
Comments: 5 pages
Subjects: Computation and Language (cs.CL)

As training data rapid growth, large-scale parallel training with multi-GPUs
cluster is widely applied in the neural network model learning currently.We
present a new approach that applies exponential moving average method in
large-scale parallel training of neural network model. It is a non-interference
strategy that the exponential moving average model is not broadcasted to
distributed workers to update their local models after model synchronization in
the training process, and it is implemented as the final model of the training
system. Fully-connected feed-forward neural networks (DNNs) and deep
unidirectional Long short-term memory (LSTM) recurrent neural networks (RNNs)
are successfully trained with proposed method for large vocabulary continuous
speech recognition on Shenma voice search data in Mandarin. The character error
rate (CER) of Mandarin speech recognition further degrades than
state-of-the-art approaches of parallel training.

End-to-End Task-Completion Neural Dialogue Systems

Xuijun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper presents an end-to-end learning framework for task-completion
neural dialogue systems, which leverages supervised and reinforcement learning
with various deep-learning models. The system is able to interface with a
structured database, and interact with users for assisting them to access
information and complete tasks such as booking movie tickets. Our experiments
in a movie-ticket booking domain show the proposed system outperforms a
modular-based dialogue system and is more robust to noise produced by other
components in the system.

A Comparative Study of Word Embeddings for Reading Comprehension

Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, William W. Cohen
Subjects: Computation and Language (cs.CL)

The focus of past machine learning research for Reading Comprehension tasks
has been primarily on the design of novel deep learning architectures. Here we
show that seemingly minor choices made on (1) the use of pre-trained word
embeddings, and (2) the representation of out-of-vocabulary tokens at test
time, can turn out to have a larger impact than architectural choices on the
final performance. We systematically explore several options for these choices,
and provide recommendations to researchers working in this area.

Controllable Text Generation

Generic generation and manipulation of text is challenging and has limited
success compared to recent deep generative modeling in visual domain. This
paper aims at generating plausible natural language sentences, whose attributes
are dynamically controlled by learning disentangled latent representations with
designated semantics. We propose a new neural generative model which combines
variational auto-encoders and holistic attribute discriminators for effective
imposition of semantic structures. With differentiable approximation to
discrete text samples, explicit constraints on independent attribute controls,
and efficient collaborative learning of generator and discriminators, our model
learns highly interpretable representations from even only word annotations,
and produces realistic sentences with desired attributes. Quantitative
evaluation validates the accuracy of sentence and attribute generation.

DAWT: Densely Annotated Wikipedia Texts across multiple languages

In this work, we open up the DAWT dataset – Densely Annotated Wikipedia Texts
across multiple languages. The annotations include labeled text mentions
mapping to entities (represented by their Freebase machine ids) as well as the
type of the entity. The data set contains total of 13.6M articles, 5.0B tokens,
13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text
to entity links than originally present in the Wikipedia markup. Moreover, it
spans several languages including English, Spanish, Italian, German, French and
Arabic. We also present the methodology used to generate the dataset which
enriches Wikipedia markup in order to increase number of links. In addition to
the main dataset, we open up several derived datasets including mention entity
co-occurrence counts and entity embeddings, as well as mappings between
Freebase ids and Wikidata item ids. We also discuss two applications of these
datasets and hope that opening them up would prove useful for the Natural
Language Processing and Information Retrieval communities, as well as
facilitate multi-lingual research.

Distributed, Parallel, and Cluster Computing

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Medard
Comments: 22 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Motivated by emerging applications to the edge computing paradigm, we
introduce a two-layer erasure-coded fault-tolerant distributed storage system
offering atomic access for read and write operations. In edge computing,
clients interact with an edge-layer of servers that is geographically near; the
edge-layer in turn interacts with a back-end layer of servers. The edge-layer
provides low latency access and temporary storage for client operations, and
uses the back-end layer for persistent storage. Our algorithm, termed Layered
Data Storage (LDS) algorithm, offers several features suitable for
edge-computing systems, works under asynchronous message-passing environments,
supports multiple readers and writers, and can tolerate (f_1 < n_1/2) and (f_2
< n_2/3) crash failures in the two layers having (n_1) and (n_2) servers,
respectively. We use a class of erasure codes known as regenerating codes for
storage of data in the back-end layer. The choice of regenerating codes,
instead of popular choices like Reed-Solomon codes, not only optimizes the cost
of back-end storage, but also helps in optimizing communication cost of read
operations, when the value needs to be recreated all the way from the back-end.
The two-layer architecture permits a modular implementation of atomicity and
erasure-code protocols; the implementation of erasure-codes is mostly limited
to interaction between the two layers. We prove liveness and atomicity of LDS,
and also compute performance costs associated with read and write operations.
Further, in a multi-object system running (N) independent instances of LDS,
where only a small fraction of the objects undergo concurrent accesses at any
point during the execution, the overall storage cost is dominated by that of
persistent storage in the back-end layer, and is given by (Theta(N)).

Workload Analysis of Blue Waters

Matthew D. Jones, Joseph P. White, Martins Innus, Robert L. DeLeon, Nikolay Simakov, Jeffrey T. Palmer, Steven M. Gallo, Thomas R. Furlani, Michael Showerman, Robert Brunner, Andry Kot, Gregory Bauer, Brett Bode, Jeremy Enos, William Kramer
Comments: 107 pages, >100 figures (figure sizes reduced to save space, contact authors for version with full resolution)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Blue Waters is a Petascale-level supercomputer whose mission is to enable the
national scientific and research community to solve “grand challenge” problems
that are orders of magnitude more complex than can be carried out on other high
performance computing systems. Given the important and unique role that Blue
Waters plays in the U.S. research portfolio, it is important to have a detailed
understanding of its workload in order to guide performance optimization both
at the software and system configuration level as well as inform architectural
balance tradeoffs. Furthermore, understanding the computing requirements of the
Blue Water’s workload (memory access, IO, communication, etc.), which is
comprised of some of the most computationally demanding scientific problems,
will help drive changes in future computing architectures, especially at the
leading edge. With this objective in mind, the project team carried out a
detailed workload analysis of Blue Waters.

Everware toolkit. Supporting reproducible science and challenge-driven education

Andrey Ustyuzhanin, Timothy Daniel Head, Igor Babuschkin, Alexander Tiunov
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

Modern science clearly demands for a higher level of reproducibility and
collaboration. To make research fully reproducible one has to take care of
several aspects: research protocol description, data access, environment
preservation, workflow pipeline, and analysis script preservation. Version
control systems like git help with the workflow and analysis scripts part.
Virtualization techniques like Docker or Vagrant can help deal with
environments. Jupyter notebooks are a powerful platform for conducting research
in a collaborative manner. We present project Everware that seamlessly
integrates git repository management systems such as Github or Gitlab, Docker
and Jupyter helping with a) sharing results of real research and b) boosts
education activities. With the help of Everware one can not only share the
final artifacts of research but all the depth of the research process. This
been shown to be extremely helpful during organization of several data analysis
hackathons and machine learning schools. Using Everware participants could
start from an existing solution instead of starting from scratch. They could
start contributing immediately. Everware allows its users to make use of their
own computational resources to run the workflows they are interested in, which
leads to higher scalability of the toolkit.

Runtime Optimization of Join Location in Parallel Data Management Systems

Bikash Chandra, S. Sudarshan
Comments: 16 pages
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Applications running on parallel systems often need to join a streaming
relation or a stored relation with data indexed in a parallel data storage
system. Some applications also compute UDFs on the joined tuples. The join can
be done at the data storage nodes, corresponding to reduce side joins, or by
fetching data from the storage system, corresponding to map side join. Both may
be suboptimal: reduce side joins may cause skew, while map side joins may lead
to a lot of data being transferred and replicated.

In this paper, we present techniques to make runtime decisions between the
two options on a per key basis, in order to improve the throughput of the join,
accounting for UDF computation if any. Our techniques are based on an extended
ski-rental algorithm and provide worst-case performance guarantees with respect
to the optimal point in the space considered by us. Our techniques use load
balancing taking into account the CPU, network and I/O costs as well as the
load at clients and servers. We have implemented our techniques on Hadoop,
Spark and the Muppet stream processing engine. Our experiments show that our
optimization techniques provide significant improvement in throughput over
existing techniques.

When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

Aneesh Sharma, C. Seshadhri, Ashish Goel
Subjects: Social and Information Networks (cs.SI); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Finding similar user pairs is a fundamental task in social networks, with
numerous applications in ranking and personalization tasks such as link
prediction and tie strength detection. A common manifestation of user
similarity is based upon network structure: each user is represented by a
vector that represents the user’s network connections, where pairwise cosine
similarity among these vectors defines user similarity. The predominant task
for user similarity applications is to discover all similar pairs that have a
pairwise cosine similarity value larger than a given threshold ( au). In
contrast to previous work where ( au) is assumed to be quite close to 1, we
focus on recommendation applications where ( au) is small, but still
meaningful. The all pairs cosine similarity problem is computationally
challenging on networks with billions of edges, and especially so for settings
with small ( au). To the best of our knowledge, there is no practical solution
for computing all user pairs with, say ( au = 0.2) on large social networks,
even using the power of distributed algorithms.

Our work directly addresses this challenge by introducing a new algorithm —
WHIMP — that solves this problem efficiently in the MapReduce model. The key
insight in WHIMP is to combine the “wedge-sampling” approach of Cohen-Lewis for
approximate matrix multiplication with the SimHash random projection techniques
of Charikar. We provide a theoretical analysis of WHIMP, proving that it has
near optimal communication costs while maintaining computation cost comparable
with the state of the art. We also empirically demonstrate WHIMP’s scalability
by computing all highly similar pairs on four massive data sets, and show that
it accurately finds high similarity pairs. In particular, we note that WHIMP
successfully processes the entire Twitter network, which has tens of billions
of edges.

Large-Scale Evolution of Image Classifiers

Neural networks have proven effective at solving difficult problems but
designing their architectures can be challenging, even for image classification
problems alone. Evolutionary algorithms provide a technique to discover such
networks automatically. Despite significant computational requirements, we show
that evolving models that rival large, hand-designed architectures is possible
today. We employ simple evolutionary techniques at unprecedented scales to
discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial
initial conditions. To do this, we use novel and intuitive mutation operators
that navigate large search spaces. We stress that no human participation is
required once evolution starts and that the output is a fully-trained model.
Throughout this work, we place special emphasis on the repeatability of
results, the variability in the outcomes and the computational requirements.

Learning

EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Justin Fu, John D. Co-Reyes, Sergey Levine
Subjects: Learning (cs.LG)

Efficient exploration in high-dimensional environments remains a key
challenge in reinforcement learning (RL). Deep reinforcement learning methods
have demonstrated the ability to learn with highly general policy classes for
complex tasks with high-dimensional inputs, such as raw images. However, many
of the most effective exploration techniques rely on tabular representations,
or on the ability to construct a generative model over states and actions. Both
are exceptionally difficult when these inputs are complex and high dimensional.
On the other hand, it is comparatively easy to build discriminative models on
top of complex states such as images using standard deep neural networks. This
paper introduces a novel approach, EX2, which approximates state visitation
densities by training an ensemble of discriminators, and assigns reward bonuses
to rarely visited states. We demonstrate that EX2 achieves comparable
performance to the state-of-the-art methods on low-dimensional tasks, and its
effectiveness scales into high-dimensional state spaces such as visual domains
without hand-designing features or density models.

Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions

Asish Ghoshal, Jean Honorio
Comments: Accepted to AISTATS 2017, Florida. arXiv admin note: substantial text overlap with arXiv:1607.02959
Subjects: Learning (cs.LG)

In this paper we obtain sufficient and necessary conditions on the number of
samples required for exact recovery of the pure-strategy Nash equilibria (PSNE)
set of a graphical game from noisy observations of joint actions. We consider
sparse linear influence games — a parametric class of graphical games with
linear payoffs, and represented by directed graphs of n nodes (players) and
in-degree of at most k. We show that one can efficiently recover the PSNE set
of a linear influence game with (O(k^2 log n)) samples, under very general
observation models. On the other hand, we show that (Omega(k log n)) samples
are necessary for any procedure to recover the PSNE set from observations of
joint actions.

Stochastic Separation Theorems

A.N. Gorban, I.Y. Tyukin
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

A set (S) is linearly separable if each (xin S) can be separated from the
rest of (S) by a linear functional. We study random (N)-element sets in
(mathbb{R}^n) for large (n) and demonstrate that for (N<aexp(b{n})) they are
linearly separable with probability (p), (p>1-vartheta), for a given (small)
(vartheta>0). Constants (a,b>0) depend on the probability distribution and the
constant (vartheta). The results are important for machine learning in high
dimension, especially for correction of unavoidable mistakes of legacy
Artificial Intelligence systems.

Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity

Asish Ghoshal, Jean Honorio
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Learning the directed acyclic graph (DAG) structure of a Bayesian network
from observational data is a notoriously difficult problem for which many
hardness results are known. In this paper we propose a provably polynomial-time
algorithm for learning sparse Gaussian Bayesian networks with equal noise
variance — a class of Bayesian networks for which the DAG structure can be
uniquely identified from observational data — under high-dimensional
settings. We show that (O(k^4 log p)) number of samples suffices for our
method to recover the true DAG structure with high probability, where (p) is
the number of variables and (k) is the maximum Markov blanket size. We obtain
our theoretical guarantees under a condition called Restricted Strong Adjacency
Faithfulness, which is strictly weaker than strong faithfulness — a condition
that other methods based on conditional independence testing need for their
success. The sample complexity of our method matches the information-theoretic
limits in terms of the dependence on (p). We show that our method out-performs
existing state-of-the-art methods for learning Gaussian Bayesian networks in
terms of recovering the true DAG structure while being comparable in speed to
heuristic methods.

Dynamic State Warping

Zhichen Gong, Huanhuan Chen
Subjects: Learning (cs.LG)

The ubiquity of sequences in many domains enhances significant recent
interest in sequence learning, for which a basic problem is how to measure the
distance between sequences. Dynamic time warping (DTW) aligns two sequences by
nonlinear local warping and returns a distance value. DTW shows superior
ability in many applications, e.g. video, image, etc. However, in DTW, two
points are paired essentially based on point-to-point Euclidean distance (ED)
without considering the autocorrelation of sequences. Thus, points with
different semantic meanings, e.g. peaks and valleys, may be matched providing
their coordinate values are similar. As a result, DTW is sensitive to noise and
poorly interpretable. This paper proposes an efficient and flexible sequence
alignment algorithm, dynamic state warping (DSW). DSW converts each time point
into a latent state, which endows point-wise autocorrelation information.
Alignment is performed by using the state sequences. Thus DSW is able to yield
alignment that is semantically more interpretable than that of DTW. Using one
nearest neighbor classifier, DSW shows significant improvement on
classification accuracy in comparison to ED (70/85 wins) and DTW (74/85 wins).
We also empirically demonstrate that DSW is more robust and scales better to
long sequences than ED and DTW.

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
Comments: 17 pages
Subjects: Learning (cs.LG)

Researchers have demonstrated state-of-the-art performance in sequential
decision making problems (e.g., robotics control, sequential prediction) with
deep neural network models. One often has access to near-optimal oracles that
achieve good performance on the task during training. We demonstrate that
AggreVaTeD — a policy gradient extension of the Imitation Learning (IL)
approach of (Ross & Bagnell, 2014) — can leverage such an oracle to achieve
faster and better solutions with less training data than a less-informed
Reinforcement Learning (RL) technique. Using both feedforward and recurrent
neural network predictors, we present stochastic gradient procedures on a
sequential prediction task, dependency-parsing from raw image data, as well as
on various high dimensional robotics control problems. We also provide a
comprehensive theoretical study of IL that demonstrates we can expect up to
exponentially lower sample complexity for learning with AggreVaTeD than with RL
algorithms, which backs our empirical findings. Our results and theory indicate
that the proposed approach can achieve superior performance with respect to the
oracle when the demonstrator is sub-optimal.

Active Learning for Cost-Sensitive Classification

Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daume III, John Langford
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We design an active learning algorithm for cost-sensitive multiclass
classification: problems where different errors have different costs. Our
algorithm, COAL, makes predictions by regressing on each label’s cost and
predicting the smallest. On a new example, it uses a set of regressors that
perform well on past data to estimate possible costs for each label. It queries
only the labels that could be the best, ignoring the sure losers. We prove COAL
can be efficiently implemented for any regression family that admits squared
loss optimization; it also enjoys strong guarantees with respect to predictive
performance and labeling effort. We empirically compare COAL to passive
learning, showing significant improvements in labeling effort and test cost.

Scalable Deep Traffic Flow Neural Networks for Urban Traffic Congestion Prediction

Mohammadhani Fouladgar, Mostafa Parchami, Ramez Elmasri, Amir Ghaderi
Subjects: Learning (cs.LG)

Tracking congestion throughout the network road is a critical component of
Intelligent transportation network management systems. Understanding how the
traffic flows and short-term prediction of congestion occurrence due to
rush-hour or incidents can be beneficial to such systems to effectively manage
and direct the traffic to the most appropriate detours. Many of the current
traffic flow prediction systems are designed by utilizing a central processing
component where the prediction is carried out through aggregation of the
information gathered from all measuring stations. However, centralized systems
are not scalable and fail provide real-time feedback to the system whereas in a
decentralized scheme, each node is responsible to predict its own short-term
congestion based on the local current measurements in neighboring nodes.

We propose a decentralized deep learning-based method where each node
accurately predicts its own congestion state in real-time based on the
congestion state of the neighboring stations. Moreover, historical data from
the deployment site is not required, which makes the proposed method more
suitable for newly installed stations. In order to achieve higher performance,
we introduce a regularized Euclidean loss function that favors high congestion
samples over low congestion samples to avoid the impact of the unbalanced
training dataset. A novel dataset for this purpose is designed based on the
traffic data obtained from traffic control stations in northern California.
Extensive experiments conducted on the designed benchmark reflect a successful
congestion prediction.

Optimization of distributions differences for classification

Mohammad Reza Bonyadi, Quang M. Tieng, David C. Reutens
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

In this paper we introduce a new classification algorithm called Optimization
of Distributions Differences (ODD). The algorithm aims to find a transformation
from the feature space to a new space where the instances in the same class are
as close as possible to one another while the gravity centers of these classes
are as far as possible from one another. This aim is formulated as a
multiobjective optimization problem that is solved by a hybrid of an
evolutionary strategy and the Quasi-Newton method. The choice of the
transformation function is flexible and could be any continuous space function.
We experiment with a linear and a non-linear transformation in this paper. We
show that the algorithm can outperform 6 other state-of-the-art classification
methods, namely naive Bayes, support vector machines, linear discriminant
analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in
12 standard classification datasets. Our results show that the method is less
sensitive to the imbalanced number of instances comparing to these methods. We
also show that ODD maintains its performance better than other classification
methods in these datasets, hence, offers a better generalization ability.

Belief Propagation in Conditional RBMs for Structured Prediction

Restricted Boltzmann machines~(RBMs) and conditional RBMs~(CRBMs) are popular
models for a wide range of applications. In previous work, learning on such
models has been dominated by contrastive divergence~(CD) and its variants.
Belief propagation~(BP) algorithms are believed to be slow for structured
prediction on conditional RBMs~(e.g., Mnih et al. [2011]), and not as good as
CD when applied in learning~(e.g., Larochelle et al. [2012]). In this work, we
present a matrix-based implementation of belief propagation algorithms on
CRBMs, which is easily scalable to tens of thousands of visible and hidden
units. We demonstrate that, in both maximum likelihood and max-margin learning,
training conditional RBMs with BP as the inference routine can provide
significantly better results than current state-of-the-art CD methods on
structured prediction problems. We also include practical guidelines on
training CRBMs with BP, and some insights on the interaction of learning and
inference algorithms for CRBMs.

A Laplacian Framework for Option Discovery in Reinforcement Learning

Representation learning and option discovery are two of the biggest
challenges in reinforcement learning (RL). Proto-RL is a well known approach
for representation learning in MDPs. The representations learned with this
framework are called proto-value functions (PVFs). In this paper we address the
option discovery problem by showing how PVFs implicitly define options. We do
it by introducing eigenpurposes, intrinsic reward functions derived from the
learned representations. The options discovered from eigenpurposes traverse the
principal directions of the state space. They are useful for multiple tasks
because they are independent of the agents’ intentions. Moreover, by capturing
the diffusion process of a random walk, different options act at different time
scales, making them helpful for exploration strategies. We demonstrate features
of eigenpurposes in traditional tabular domains as well as in Atari 2600 games.

Controllable Text Generation

Generic generation and manipulation of text is challenging and has limited
success compared to recent deep generative modeling in visual domain. This
paper aims at generating plausible natural language sentences, whose attributes
are dynamically controlled by learning disentangled latent representations with
designated semantics. We propose a new neural generative model which combines
variational auto-encoders and holistic attribute discriminators for effective
imposition of semantic structures. With differentiable approximation to
discrete text samples, explicit constraints on independent attribute controls,
and efficient collaborative learning of generator and discriminators, our model
learns highly interpretable representations from even only word annotations,
and produces realistic sentences with desired attributes. Quantitative
evaluation validates the accuracy of sentence and attribute generation.

Machine Learning on Sequential Data Using a Recurrent Weighted Average

Jared Ostmeyer, Lindsay Cowell
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Recurrent Neural Networks (RNN) are a type of statistical model designed to
handle sequential data. The model reads a sequence one symbol at a time. Each
symbol is processed based on information collected from the previous symbols.
With existing RNN architectures, each symbol is processed using only
information from the previous processing step. To overcome this limitation, we
propose a new kind of RNN model that computes a recurrent weighted average
(RWA) over every past processing step. Because the RWA can be computed as a
running average, the computational overhead scales like that of any other RNN.
The approach essentially reformulates the attention mechanism into a
stand-alone model. When assessing a RWA model, it is found to train faster and
generalize better than a standard LSTM model when performing the variable copy
problem, the adding problem, classification of artificial grammar,
classification of sequences by length, and classification of MNIST handwritten
digits (where the pixels are read sequentially one at a time).

Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P. Schoellig, Andreas Krause, Stefan Schaal, Sebastian Trimpe
Comments: 7 pages, 6 figures, to appear in IEEE 2017 International Conference on Robotics and Automation (ICRA)
Subjects: Robotics (cs.RO); Learning (cs.LG); Systems and Control (cs.SY)

In practice, the parameters of control policies are often tuned manually.
This is time-consuming and frustrating. Reinforcement learning is a promising
alternative that aims to automate this process, yet often requires too many
experiments to be practical. In this paper, we propose a solution to this
problem by exploiting prior knowledge from simulations, which are readily
available for most robotic platforms. Specifically, we extend Entropy Search, a
Bayesian optimization algorithm that maximizes information gain from each
experiment, to the case of multiple information sources. The result is a
principled way to automatically combine cheap, but inaccurate information from
simulations with expensive and accurate physical experiments in a
cost-effective manner. We apply the resulting method to a cart-pole system,
which confirms that the algorithm can find good control policies with fewer
experiments than standard Bayesian optimization on the physical system only.

Denoising Adversarial Autoencoders

Antonia Creswell, Anil Anthony Bharath
Comments: submitted to journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised learning is of growing interest because it unlocks the potential
held in vast amounts of unlabelled data to learn useful representations for
inference. Autoencoders, a form of generative model, may be trained by learning
to reconstruct unlabelled input data from a latent representation space. More
robust representations may be produced by an autoencoder if it learns to
recover clean input samples from corrupted ones. Representations may be further
improved by introducing regularisation during training to shape the
distribution of the encoded data in latent space. We suggest denoising
adversarial autoencoders, which combine denoising and regularisation, shaping
the distribution of latent space using adversarial training. We introduce a
novel analysis that shows how denoising may be incorporated into the training
and sampling of adversarial autoencoders. Experiments are performed to assess
the contributions that denoising makes to the learning of representations for
classification and sample synthesis. Our results suggest that autoencoders
trained using a denoising criterion achieve higher classification performance,
and can synthesise samples that are more consistent with the input data than
those trained without a corruption process.

On the Behavior of Convolutional Nets for Feature Extraction

Convolutional neural networks (CNN) are representation learning techniques
that achieve state-of-the-art performance on almost every image-related,
machine learning task. Applying the representation languages build by these
models to tasks beyond the one they were originally trained for is a field of
interest known as transfer learning for feature extraction. Through this
approach, one can apply the image descriptors learnt by a CNN after processing
millions of images to any dataset, without an expensive training phase.
Contributions to this field have so far focused on extracting CNN features from
layers close to the output (e.g., fully connected layers), particularly because
they work better when used out-of-the-box to feed a classifier. Nevertheless,
the rest of CNN features is known to encode a wide variety of visual
information, which could be potentially exploited on knowledge representation
and reasoning tasks. In this paper we analyze the behavior of each feature
individually, exploring their intra/inter class activations for all classes of
three different datasets. From this study we learn that low and middle level
features behave very differently to high level features, the former being more
descriptive and the latter being more discriminant. We show how low and middle
level features can be used for knowledge representation purposes both by their
presence or by their absence. We also study how much noise these features may
encode, and propose a thresholding approach to discard most of it. Finally, we
discuss the potential implications of these results in the context of knowledge
representation using features extracted from a CNN.

Differentially Private Bayesian Learning on Distributed Data

Mikko Heikkilä, Yusuke Okimoto, Samuel Kaski, Kana Shimizu, Antti Honkela
Comments: 12 pages, 11 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG); Computation (stat.CO)

Many applications of machine learning, for example in health care, would
benefit from methods that can guarantee privacy of data subjects. Differential
privacy (DP) has become established as a standard for protecting learning
results, but the proposed algorithms require a single trusted party to have
access to the entire data, which is a clear weakness. We consider DP Bayesian
learning in a distributed setting, where each party only holds a single sample
or a few samples of the data. We propose a novel method for DP learning in this
distributed setting, based on a secure multi-party sum function for aggregating
summaries from the data holders. Each data holder adds their share of Gaussian
noise to make the total computation differentially private using the Gaussian
mechanism. We prove that the system can be made secure against a desired number
of colluding data owners and robust against faulting data owners. The method
builds on an asymptotically optimal and practically efficient DP Bayesian
inference with rapidly diminishing extra cost.

Adversarial Examples for Semantic Image Segmentation

Machine learning methods in general and Deep Neural Networks in particular
have shown to be vulnerable to adversarial perturbations. So far this
phenomenon has mainly been studied in the context of whole-image
classification. In this contribution, we analyse how adversarial perturbations
can affect the task of semantic segmentation. We show how existing adversarial
attackers can be transferred to this task and that it is possible to create
imperceptible adversarial perturbations that lead a deep network to misclassify
almost all pixels of a chosen class while leaving network prediction nearly
unchanged outside this class.

Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Jangwon Lee, Michael S. Ryoo
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

We design a new approach that allows robot learning of new activities from
unlabeled human example videos. Given videos of humans executing the same
activity from a human’s viewpoint (i.e., first-person videos), our objective is
to make the robot learn the temporal structure of the activity as its future
regression network, and learn to transfer such model for its own motor
execution. We present a new deep learning model: We extend the state-of-the-art
convolutional object detection network for the detection of human hands in
training videos based on image information, and newly introduce the concept of
using a fully convolutional network to regress (i.e., predict) the intermediate
scene representation corresponding to the future frame (e.g., 1-2 seconds
later). Combining these allows direct prediction of future locations of human
hands and objects, which enables the robot to infer the motor control plan
using our manipulation network. We experimentally confirm that our approach
makes learning of robot activities from unlabeled human interaction videos
possible, and demonstrate that our robot is able to execute the learned
collaborative activities in real-time directly based on its camera input.

Unsupervised Basis Function Adaptation for Reinforcement Learning

When using reinforcement learning (RL) algorithms to evaluate a policy it is
common, given a large state space, to introduce some form of approximation
architecture for the value function (VF). The exact form of this architecture
can have a significant effect on the accuracy of the VF estimate, however, and
determining a suitable approximation architecture can often be a highly complex
task. Consequently there is a large amount of interest in the potential for
allowing RL algorithms to adaptively generate approximation architectures.

We investigate a method of adapting approximation architectures which uses
feedback regarding the frequency with which an agent has visited certain states
to guide which areas of the state space to approximate with greater detail.
This method is “unsupervised” in the sense that it makes no direct reference to
reward or the VF estimate. We introduce an algorithm based upon this idea which
adapts a state aggregation approximation architecture on-line.

A common method of scoring a VF estimate is to weight the squared Bellman
error of each state-action by the probability of that state-action occurring.
Adopting this scoring method, and assuming (S) states, we demonstrate
theoretically that – provided (1) the number of cells (X) in the state
aggregation architecture is of order (sqrt{S}log_2{S}ln{S}) or greater, (2)
the policy and transition function are close to deterministic, and (3) the
prior for the transition function is uniformly distributed – our algorithm,
used in conjunction with a suitable RL algorithm, can guarantee a score which
is arbitrarily close to zero as (S) becomes large. It is able to do this
despite having only (O(X log_2S)) space complexity and negligible time
complexity. The results take advantage of certain properties of the stationary
distributions of Markov chains.

Co-Clustering for Multitask Learning

Keerthiram Murugesan, Jaime Carbonell, Yiming Yang
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This paper presents a new multitask learning framework that learns a shared
representation among the tasks, incorporating both task and feature clusters.
The jointly-induced clusters yield a shared latent subspace where task
relationships are learned more effectively and more generally than in
state-of-the-art multitask learning methods. The proposed general framework
enables the derivation of more specific or restricted state-of-the-art
multitask methods. The paper also proposes a highly-scalable multitask learning
algorithm, based on the new framework, using conjugate gradient descent and
generalized extit{Sylvester equations}. Experimental results on synthetic and
benchmark datasets show that the proposed method systematically outperforms
several state-of-the-art multitask learning methods.

Compositional Falsification of Cyber-Physical Systems with Machine Learning Components

Tommaso Dreossi, Alexandre Donzé, Sanjit A. Seshia
Subjects: Systems and Control (cs.SY); Learning (cs.LG)

Cyber-physical systems (CPS), such as automotive systems, are starting to
include sophisticated machine learning (ML) components. Their correctness,
therefore, depends on properties of the inner ML modules. While learning
algorithms aim to generalize from examples, they are only as good as the
examples provided, and recent efforts have shown that they can produce
inconsistent output under small adversarial perturbations. This raises the
question: can the output from learning components can lead to a failure of the
entire CPS? In this work, we address this question by formulating it as a
problem of falsifying signal temporal logic (STL) specifications for CPS with
ML components. We propose a compositional falsification framework where a
temporal logic falsifier and a machine learning analyzer cooperate with the aim
of finding falsifying executions of the considered model. The efficacy of the
proposed technique is shown on an automatic emergency braking system model with
a perception component based on deep neural networks.

Self-Paced Multitask Learning with Shared Knowledge

Keerthiram Murugesan, Jaime Carbonell
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This paper introduces self-paced task selection to multitask learning, where
instances from more closely related tasks are selected in a progression of
easier-to-harder tasks, to emulate an effective human education strategy but
applied to multitask machine learning. We develop the mathematical foundation
for the approach based on an iterative selection of the most appropriate task,
learning the task parameters, and updating the shared knowledge, optimizing a
new bi-convex loss function. This proposed method applies quite generally,
including to multitask feature learning, multitask learning with alternating
structure optimization and multitask manifold regularization learning. Results
show that in each of the above formulations self-paced (easier-to-harder) task
selection outperforms the baseline version of these methods in all the
experiments.

Information Theory

On the MISO Channel with Feedback: Can Infinitely Massive Antennas Achieve Infinite Capacity?

Jinyuan Chen
Subjects: Information Theory (cs.IT)

We consider communication over a multiple-input single-output (MISO) block
fading channel in the presence of an independent noiseless feedback link. We
assume that the transmitter and receiver have no prior knowledge of the channel
state realizations, but the transmitter and receiver can acquire the channel
state information (CSIT/CSIR) via downlink training and feedback. For this
channel, we show that increasing the number of transmit antennas to infinity
will not achieve an infinite capacity, for a finite channel coherence and a
finite input constraint on the second or fourth moment. This insight follows
from our new capacity bounds that hold for any linear and nonlinear coding
strategies, and any channel training schemes. In addition to the channel
capacity bounds, we also provide a characterization on the beamforming gain
that is also known as array gain or power gain, at the regime with large number
of antennas.

Downlink Cellular Network Analysis with LOS/NLOS Propagation and Elevated Base Stations

Italo Atzeni, Jesús Arnau, Marios Kountouris
Comments: Submitted to the IEEE for possible publication
Subjects: Information Theory (cs.IT)

In this paper, we investigate the downlink performance of dense cellular
networks with elevated base stations (BSs) using a channel model that
incorporates line-of-sight (LOS)/non-line-of-sight (NLOS) propagation in both
small-scale and large-scale fading. Modeling LOS fading with Nakagami-(m)
fading, we provide a unified framework based on stochastic geometry that
encompasses both closest and strongest BS association. Our study is
particularized to two distance-dependent LOS/NLOS models of practical interest.
Considering the effect of LOS propagation alone, we derive closed-form
expressions for the coverage probability with Nakagami-(m) fading, showing that
the performance for strongest BS association is the same as in the case of
Rayleigh fading, whereas for closest BS association it monotonically increases
with the shape parameter (m). Then, focusing on the effect of elevated BSs, we
show that network densification eventually leads to near-universal outage even
for moderately low BS densities: in particular, the maximum area spectral
efficiency is proportional to the inverse of the squared BS height.

On squares of cyclic codes

Ignacio Cascudo
Subjects: Information Theory (cs.IT)

The square (C^{*2}) of a linear error correcting code (C) is the linear code
spanned by the coordinate-wise products of every pair of (non-necessarily
distinct) words in (C). Squares of codes have gained attention for several
applications mainly in the area of cryptography, where typically one is
concerned about some of the parameters (dimension, minimum distance) of both
(C^{*2}) and (C). In this paper, squares of cyclic codes are considered.
General results on the minimum distance of the squares of cyclic codes are
obtained and constructions of cyclic codes (C) with relatively large dimension
of (C) and minimum distance of the square (C^{*2}) are discussed. In some
cases, the constructions lead to codes (C) such that both (C) and (C^{*2})
simultaneously have the largest possible minimum distances for their length and
dimensions.

Percentile Policies for Tracking of Markovian Random Processes with Asymmetric Cost and Observation

Parisa Mansourifard, Tara Javidi, Bhaskar Krishnamachari
Subjects: Information Theory (cs.IT)

Motivated by wide-ranging applications such as video delivery over networks
using Multiple Description Codes, congestion control, and inventory management,
we study the state-tracking of a Markovian random process with a known
transition matrix and a finite ordered state set. The decision-maker must
select a state as an action at each time step to minimize the total expected
cost. The decision-maker is faced with asymmetries both in cost and
observation: in case the selected state is less than the actual state of the
Markovian process, an under-utilization cost occurs and only partial
observation about the actual state is revealed; otherwise, the decision incurs
an over-utilization cost and reveals full information about the actual state.
We can formulate this problem as a Partially Observable Markov Decision Process
which can be expressed as a dynamic program based on the last full observed
state and the time of full observation. This formulation determines the
sequence of actions to be taken between any two consecutive full observations
of the actual state. However, this DP grows exponentially in the number of
states, with little hope for a computationally feasible solution. We present an
interesting class of computationally tractable policies with a percentile
structure. A generalization of binary search, this class of policies attempt at
any given time to reduce the uncertainty by a given percentage. Among all
percentile policies, we search for the one with the minimum expected cost. The
result of this search is a heuristic policy which we evaluate through numerical
simulations. We show that it outperforms the myopic policies and under some
conditions performs close to the optimal policies. Furthermore, we derive a
lower bound on the cost of the optimal policy which can be computed with low
complexity and give a measure for how close our heuristic policy is to the
optimal policy.

The Global Optimization Geometry of Nonsymmetric Matrix Factorization and Sensing

Zhihui Zhu, Qiuwei Li, Gongguo Tang, Michael B. Wakin
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

In this paper we characterize the optimization geometry of a matrix
factorization problem where we aim to find (n imes r) and (m imes r) matrices
(U) and (V) such that (UV^T) approximates a given matrix (X^star). We show
that the objective function of the matrix factorization problem has no spurious
local minima and obeys the strict saddle property not only for the
exact-parameterization case where (rank(X^star) = r), but also for the
over-parameterization case where (rank(X^star) < r) and under-parameterization
case where (rank(X^star) > r). These geometric properties imply that a number
of iterative optimization algorithms (such as gradient descent) converge to a
global solution with random initialization. For the exact-parameterization
case, we further show that the objective function satisfies the robust strict
saddle property, ensuring global convergence of many local search algorithms in
polynomial time. We extend the geometric analysis to the matrix sensing problem
with the factorization approach and prove that this global optimization
geometry is preserved as long as the measurement operator satisfies the
standard restricted isometry property.

Preserving Confidentiality in The Gaussian Broadcast Channel Using Compute-and-Forward

Parisa Babaheidarian, Somayeh Salimi, Panos Papadimitratos
Comments: 6 pages, accepted to CISS 2017
Subjects: Information Theory (cs.IT)

We study the transmission of confidential messages across a wireless
broadcast channel with K>2 receivers and K helpers. The goal is to transmit all
messages reliably to their intended receivers while keeping them confidential
from the unintended receivers. We design a codebook based on nested lattice
structure, cooperative jamming, lattice alignment, and i.i.d. coding. Moreover,
we exploit the asymmetric compute-and-forward decoding strategy to handle
finite SNR regimes. Unlike previous alignment schemes, our achievable rates are
attainable at any finite SNR value. Also, we show that our scheme achieves the
optimal sum secure degrees of freedom of 1 for the K-receiver Gaussian
broadcast channel with K confidential messages and K helpers.

Multicast Transmissions in Directional mmWave Communications

Alessandro Biason, Michele Zorzi
Comments: 8 pages, 5 figures, submitted to European Wireless (EW), Feb. 2017
Subjects: Information Theory (cs.IT)

Multicast transmissions have been widely analyzed in traditional networks as
a way to improve spectrum efficiency when multiple users are interested in the
same data. However, their application to mmWave communications has been studied
only marginally so far. The goal of this paper is to partially fill this gap by
investigating optimal and suboptimal multicast schemes for mmWave
communications with directional beams. In particular, we propose a Markov setup
to model the retransmission status of the unsuccessfully transmitted packets
and, because of the computational complexity of the optimal solution, we
introduce a suboptimal hierarchical optimization procedure, which is much
easier to derive. Finally, we numerically show that restricting the link to
unicast beams is strongly suboptimal, especially when many packets have to be
transmitted.

Sum-set Inequalities from Aligned Image Sets: Instruments for Robust GDoF Bounds

Arash Gholami Davoodi, Syed A. Jafar
Comments: 22 pages, 1 figure
Subjects: Information Theory (cs.IT)

We present sum-set inequalities specialized to the generalized degrees of
freedom (GDoF) framework. These are information theoretic lower bounds on the
entropy of bounded density linear combinations of discrete, power-limited
dependent random variables in terms of the joint entropies of arbitrary linear
combinations of new random variables that are obtained by power level
partitioning of the original random variables. The bounds are useful
instruments to obtain GDoF characterizations for wireless interference
networks, especially with multiple antenna nodes, subject to arbitrary channel
strength and channel uncertainty levels.

Symmetric Laplacians, Quantum Density Matrices and their Von-Neumann Entropy

David E. Simmons, Justin P. Coon, Animesh Datta
Subjects: Information Theory (cs.IT)

We show that the (normalized) symmetric Laplacian of a simple graph can be
obtained from the partial trace over a pure bipartite quantum state that
resides in a bipartite Hilbert space (one part corresponding to the vertices,
the other corresponding to the edges). This suggests an interpretation of the
symmetric Laplacian’s Von Neumann entropy as a measure of bipartite
entanglement present between the two parts of the state. We then study extreme
values for a connected graph’s generalized R’enyi-(p) entropy. Specifically,
we show that egin{enumerate}item the complete graph achieves maximum
entropy, item the (2)-regular graph:egin{enumerate} item achieves minimum
R’enyi-(2) entropy among all (k)-regular graphs, item is within (log 4/3) of
the minimum R’enyi-(2) entropy and (log4sqrt{2}/3) of the minimum Von
Neumann entropy among all connected graphs, item achieves a Von Neumann
entropy less than the star graph. end{enumerate} end{enumerate} Point ((2))
contrasts sharply with similar work applied to (normalized) combinatorial
Laplacians, where it has been shown that the star graph almost always achieves
minimum Von Neumann entropy. In this work we find that the star graph achieves
maximum entropy in the limit as the number of vertices grows without bound.

smallskip
oindent extbf{Keywords:} Symmetric; Laplacian; Quantum;
Entropy; Bounds; R’enyi.

Zero-Delay Source-Channel Coding with a One-Bit ADC Front End and Correlated Side Information at the Receiver

Morteza Varasteh, Borzoo Rassouli, Ozvaldo Simeone, Deniz Gunduz
Subjects: Information Theory (cs.IT)

Zero-delay transmission of a Gaussian source over an additive white Gaussian
noise (AWGN) channel is considered with a one-bit analog-to-digital converter
(ADC) front end and a correlated side information at the receiver. The design
of the optimal encoder and decoder is studied for two performance criteria,
namely, the mean squared error (MSE) distortion and the distortion outage
probability (DOP), under an average power constraint on the channel input. For
both criteria, necessary optimality conditions for the encoder and the decoder
are derived. Using these conditions, it is observed that the numerically
optimized encoder (NOE) under the MSE distortion criterion is periodic, and its
period increases with the correlation between the source and the receiver side
information. For the DOP, it is instead seen that the NOE mappings periodically
acquire positive and negative values, which decay to zero with increasing
source magnitude, and the interval over which the mapping takes non-zero
values, becomes wider with the correlation between the source and the side
information.

Good cyclic codes and the uncertainty principle

Shai Evra, Emmanuel Kowalski, Alexander Lubotzky
Comments: 19 pages
Subjects: Information Theory (cs.IT); Number Theory (math.NT); Representation Theory (math.RT)

A long standing problem in the area of error correcting codes asks whether
there exist good cyclic codes. Most of the known results point in the direction
of a negative answer.

The uncertainty principle is a classical result of harmonic analysis
asserting that given a non-zero function (f) on some abelian group, either (f)
or its Fourier transform (hat{f}) has large support.

In this note, we observe a connection between these two subjects. We point
out that even a weak version of the uncertainty principle for fields of
positive characteristic would imply that good cyclic codes do exist. We also
provide some heuristic arguments supporting that this is indeed the case.

Full-Duplex Operations in Wireless Powered Communication Networks

Hyungsik Ju, Yuro Lee, Tae-Joong Kim
Comments: 5 pages, 6 figures
Subjects: Information Theory (cs.IT)

In this paper, we study a wireless powered communication network (WPCN) in
which a hybrid access point (H-AP) and user equipments (UEs) all operate in
full-duplex (FD). We first propose a transceiver structure that enables FD
operation of UEs to receive energy in the downlink (DL) and transmit
information in the uplink (UL) simultaneously. We then provide an energy usage
model in the proposed UE transceiver accounting for energy leakage from
transmit chain to receive chain. It is shown that throughput of FD WPCN with
the proposed FD UEs can be maximized by optimally allocating UL transmission
time to UEs by solving a simple convex optimization problem. Simulation results
reveals that the use of the proposed FD UEs efficiently improves the throughput
of WPCN with practical self-interference cancellation capability at the H-AP.

Performance Analysis of a Hybrid Downlink-Uplink Cooperative NOMA Scheme

Zhiqiang Wei, Linglong Dai, Derrick Wing Kwan Ng, Jinhong Yuan
Comments: 7 pages, accepted for presentation at the IEEE VTC 2017 Spring, Sydney, Australia
Subjects: Information Theory (cs.IT)

This paper proposes a novel hybrid downlinkuplink cooperative NOMA
(HDU-CNOMA) scheme to achieve a better tradeoff between spectral efficiency and
signal reception reliability than the conventional cooperative NOMA schemes. In
particular, the proposed scheme enables the strong user to perform a
cooperative transmission and an interference-free uplink transmission
simultaneously during the cooperative phase, at the expense of a slightly
decrease in signal reception reliability at the weak user. We analyze the
outage probability, diversity order, and outage throughput of the proposed
scheme. Simulation results not only confirm the accuracy of the developed
analytical results, but also unveil the spectral efficiency gains achieved by
the proposed scheme over a baseline cooperative NOMA scheme and a
non-cooperative NOMA scheme.

Atomic Norm Minimization for Modal Analysis from Random and Compressed Samples

Shuang Li, Dehui Yang, Gongguo Tang, Michael B. Wakin
Subjects: Information Theory (cs.IT)

Modal analysis is the process of estimating a system’s modal parameters such
as its natural frequencies and mode shapes. One application of modal analysis
is in structural health monitoring (SHM), where a network of sensors may be
used to collect vibration data from a physical structure such as a building or
bridge. There is a growing interest in developing automated techniques for SHM
based on data collected in a wireless sensor network. In order to conserve
power and extend battery life, however, it is desirable to minimize the amount
of data that must be collected and transmitted in such a sensor network. In
this paper, we highlight the fact that modal analysis can be formulated as an
atomic norm minimization (ANM) problem, which can be solved efficiently and in
some cases recover perfectly a structure’s mode shapes and frequencies. We
survey a broad class of sampling and compression strategies that one might
consider in a physical sensor network, and we provide bounds on the sample
complexity of these compressive schemes in order to recover a structure’s mode
shapes and frequencies via ANM. A main contribution of our paper is to
establish a bound on the sample complexity of modal analysis with random
temporal compression, and in this scenario we prove that the samples per sensor
can actually decrease as the number of sensors increases. We also extend an
atomic norm denoising problem to the multiple measurement vector (MMV) setting
in the case of uniform sampling.

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Medard
Comments: 22 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Motivated by emerging applications to the edge computing paradigm, we
introduce a two-layer erasure-coded fault-tolerant distributed storage system
offering atomic access for read and write operations. In edge computing,
clients interact with an edge-layer of servers that is geographically near; the
edge-layer in turn interacts with a back-end layer of servers. The edge-layer
provides low latency access and temporary storage for client operations, and
uses the back-end layer for persistent storage. Our algorithm, termed Layered
Data Storage (LDS) algorithm, offers several features suitable for
edge-computing systems, works under asynchronous message-passing environments,
supports multiple readers and writers, and can tolerate (f_1 < n_1/2) and (f_2
< n_2/3) crash failures in the two layers having (n_1) and (n_2) servers,
respectively. We use a class of erasure codes known as regenerating codes for
storage of data in the back-end layer. The choice of regenerating codes,
instead of popular choices like Reed-Solomon codes, not only optimizes the cost
of back-end storage, but also helps in optimizing communication cost of read
operations, when the value needs to be recreated all the way from the back-end.
The two-layer architecture permits a modular implementation of atomicity and
erasure-code protocols; the implementation of erasure-codes is mostly limited
to interaction between the two layers. We prove liveness and atomicity of LDS,
and also compute performance costs associated with read and write operations.
Further, in a multi-object system running (N) independent instances of LDS,
where only a small fraction of the objects undergo concurrent accesses at any
point during the execution, the overall storage cost is dominated by that of
persistent storage in the back-end layer, and is given by (Theta(N)).