IT博客汇 | arXiv Paper Daily: Wed, 22 Feb 2017

arXiv Paper Daily: Wed, 22 Feb 2017

我爱机器学习(52ml.net)发表于 2017-02-22 00:00:00

Neural and Evolutionary Computing

Online Representation Learning with Multi-layer Hebbian Networks for Image Classification Tasks

Yanis Bahroun, Andrea Soltoggio
Comments: 8 pages, 6 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Unsupervised learning allows algorithms to adapt to different data thanks to
the autonomous discovery of discriminating features during the training. When
these algorithms are reducible to cost-function minimisation, better
interpretations of their learning dynamics are possible. Recently, new
Hebbian-like plasticity, bio-inspired, local and unsupervised learning rules
for neural networks, have been shown to minimise a cost-function while
performing online sparse representation learning. However, it is unclear to
what degree such rules are effective to learn features from images. To
investigate this point, this study introduces a novel multi-layer Hebbian
network trained by a rule derived from a non-negative classical
multidimensional scaling cost-function. The performance is compared to that of
other fully unsupervised learning algorithms.

Predicting non-linear dynamics: a stable local learning scheme for recurrent spiking neural networks

Aditya Gilra, Wulfram Gerstner
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

Brains need to predict how our muscles and body react to motor commands. How
networks of spiking neurons can learn to reproduce these non-linear dynamics,
using local, online and stable learning rules, is an important, open question.
Here, we present a supervised learning scheme for the feedforward and recurrent
connections in a network of heterogeneous spiking neurons. The error in the
output is fed back through fixed random connections with a negative gain,
causing the network to follow the desired dynamics, while an online and local
rule changes the weights; hence we call the scheme FOLLOW (Feedback-based
Online Local Learning Of Weights) The rule is local in the sense that weight
changes depend on the presynaptic activity and the error signal projected onto
the post-synaptic neuron. We provide examples of learning linear, non-linear
and chaotic dynamics, as well as the dynamics of a two-link arm. Using the
Lyapunov method, and under reasonable assumptions and approximations, we show
that FOLLOW learning is uniformly stable, with the error going to zero
asymptotically.

Survey of Reasoning using Neural networks

Amit Sahu
Comments: 12 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Reason and inference require process as well as memory skills by humans.
Neural networks are able to process tasks like image recognition (better than
humans) but in memory aspects are still limited (by attention mechanism, size).
Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
small memory contexts, but as context becomes larger than a threshold, it is
difficult to use them. The Solution is to use large external memory. Still, it
poses many challenges like, how to train neural networks for discrete memory
representation, how to describe long term dependencies in sequential data etc.
Most prominent neural architectures for such tasks are Memory networks:
inference components combined with long term memory and Neural Turing Machines:
neural networks using external memory resources. Also, additional techniques
like attention mechanism, end to end gradient descent on discrete memory
representation are needed to support these solutions. Preliminary results of
above neural architectures on simple algorithms (sorting, copying) and Question
Answering (based on story, dialogs) application are comparable with the state
of the art. In this paper, I explain these architectures (in general), the
additional techniques used and the results of their application.

Computer Vision and Pattern Recognition

VidLoc: 6-DoF Video-Clip Relocalization

Ronald Clark, Sen Wang, Andrew Markham, Niki Trigoni, Hongkai Wen
Comments: Preliminary version
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Machine learning techniques, namely convolutional neural networks (CNN) and
regression forests, have recently shown great promise in performing 6-DoF
localization of monocular images. However, in most cases image-sequences,
rather only single images, are readily available. To this extent, none of the
proposed learning-based approaches exploit the valuable constraint of temporal
smoothness, often leading to situations where the per-frame error is larger
than the camera motion. In this paper we propose a recurrent model for
performing 6-DoF localization of video-clips. We find that, even by considering
only short sequences (20 frames), the pose estimates are smoothed and the
localization error can be drastically reduced. Finally, we consider means of
obtaining probabilistic pose estimates from our model. We evaluate our method
on openly-available real-world autonomous driving and indoor localization
datasets.

PixelNet: Representation of the pixels, by the pixels, and for the pixels

Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta. Deva Ramanan
Comments: Project Page: this http URL arXiv admin note: substantial text overlap with arXiv:1609.06694
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

We explore design principles for general pixel-level prediction problems,
from low-level edge detection to mid-level surface normal estimation to
high-level semantic segmentation. Convolutional predictors, such as the
fully-convolutional network (FCN), have achieved remarkable success by
exploiting the spatial redundancy of neighboring pixels through convolutional
processing. Though computationally efficient, we point out that such approaches
are not statistically efficient during learning precisely because spatial
redundancy limits the information learned from neighboring pixels. We
demonstrate that stratified sampling of pixels allows one to (1) add diversity
during batch updates, speeding up learning; (2) explore complex nonlinear
predictors, improving accuracy; and (3) efficiently train state-of-the-art
models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our
single architecture produces state-of-the-art results for semantic segmentation
on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset,
and edge detection on BSDS.

Crowd Sourcing Image Segmentation with iaSTAPLE

Dmitrij Schlesinger, Florian Jug, Gene Myers, Carsten Rother, Dagmar Kainmüller
Comments: Accepted to ISBI2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a novel label fusion technique as well as a crowdsourcing protocol
to efficiently obtain accurate epithelial cell segmentations from non-expert
crowd workers. Our label fusion technique simultaneously estimates the true
segmentation, the performance levels of individual crowd workers, and an image
segmentation model in the form of a pairwise Markov random field. We term our
approach image-aware STAPLE (iaSTAPLE) since our image segmentation model
seamlessly integrates into the well-known and widely used STAPLE approach. In
an evaluation on a light microscopy dataset containing more than 5000 membrane
labeled epithelial cells of a fly wing, we show that iaSTAPLE outperforms
STAPLE in terms of segmentation accuracy as well as in terms of the accuracy of
estimated crowd worker performance levels, and is able to correctly segment 99%
of all cells when compared to expert segmentations. These results show that
iaSTAPLE is a highly useful tool for crowd sourcing image segmentation.

Traffic Surveillance Camera Calibration by 3D Model Bounding Box Alignment for Accurate Vehicle Speed Measurement

Jakub Sochor, Roman Juránek, Adam Herout
Comments: under review in CVIU
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we focus on fully automatic traffic surveillance camera
calibration which we use for speed measurement of passing vehicles. We improve
over a recent state-of-the-art camera calibration method for traffic
surveillance based on two detected vanishing points. More importantly, we
propose a novel automatic scene scale inference based on matching bounding
boxes of rendered 3D models of vehicles with detected bounding boxes in the
image. The proposed method can be used from an arbitrary viewpoint and it has
no constraints on camera placement. We evaluate our method on recent
comprehensive dataset for speed measurement BrnoCompSpeed. Experiments show
that our automatic camera calibration by detected two vanishing points method
reduces the error by 50% compared to the previous state-of-the-art method. We
also show that our scene scale inference method is much more precise (mean
speed measurement error 1.10km/h) outperforming both state of the art automatic
calibration method (error reduction by 86% — mean error 7.98km/h) and manual
calibration (error reduction by 19% — mean error 1.35km/h). We also present
qualitative results of automatic camera calibration method on video sequences
obtained from real surveillance cameras on various places and under different
lighting conditions (night, dawn, day).

BrnoCompSpeed: Review of Traffic Camera Calibration and Comprehensive Dataset for Monocular Speed Measurement

Jakub Sochor, Roman Juránek, Jakub Špaňhel, Lukáš Maršík, Adam Široký, Adam Herout, Pavel Zemčík
Comments: under review in IEEE T-ITS
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we focus on traffic camera calibration and visual speed
measurement from a single monocular camera, which is an important task of
visual traffic surveillance. Existing methods addressing this problem are hard
to compare due to lack of a common dataset with reliable ground truth.
Therefore, it is not clear how the methods compare in various aspects and what
are the factors affecting their performance. We captured a new dataset of 18
full-HD videos, each around one hour long, captured at 6 different locations.
Vehicles in videos (20,865 instances in total) are annotated with precise speed
measurements from optical gates using LIDAR and verified with several reference
GPS tracks. We provide the videos and metadata (calibration, lengths of
features in image, annotations, etc.) for future comparison and evaluation.
Camera calibration is the most crucial part of the speed measurement;
therefore, we provide a review of the methods and analyze a recently published
method for fully automatic camera calibration and vehicle speed measurement and
report the results on this dataset in detail.

A Discriminative Event Based Model for Alzheimer's Disease Progression Modeling

Vikram Venkatraghavan, Esther Bron, Wiro Niessen, Stefan Klein
Comments: Information Processing in Medical Imaging (IPMI), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

The event-based model (EBM) for data-driven disease progression modeling
estimates the sequence in which biomarkers for a disease become abnormal. This
helps in understanding the dynamics of disease progression and facilitates
early diagnosis by staging patients on a disease progression timeline. Existing
EBM methods are all generative in nature. In this work we propose a novel
discriminative approach to EBM, which is shown to be more accurate as well as
computationally more efficient than existing state-of-the art EBM methods. The
method first estimates for each subject an approximate ordering of events, by
ranking the posterior probabilities of individual biomarkers being abnormal.
Subsequently, the central ordering over all subjects is estimated by fitting a
generalized Mallows model to these approximate subject-specific orderings based
on a novel probabilistic Kendall’s Tau distance. To evaluate the accuracy, we
performed extensive experiments on synthetic data simulating the progression of
Alzheimer’s disease. Subsequently, the method was applied to the Alzheimer’s
Disease Neuroimaging Initiative (ADNI) data to estimate the central event
ordering in the dataset. The experiments benchmark the accuracy of the new
model under various conditions and compare it with existing state-of-the-art
EBM methods. The results indicate that discriminative EBM could be a simple and
elegant approach to disease progression modeling.

Fast Resampling of 3D Point Clouds via Graphs

Siheng Chen, Dong Tian, Chen Feng, Anthony Vetro, Jelena Kovačević
Subjects: Computer Vision and Pattern Recognition (cs.CV)

To reduce cost in storing, processing and visualizing a large-scale point
cloud, we consider a randomized resampling strategy to select a representative
subset of points while preserving application-dependent features. The proposed
strategy is based on graphs, which can represent underlying surfaces and lend
themselves well to efficient computation. We use a general feature-extraction
operator to represent application-dependent features and propose a general
reconstruction error to evaluate the quality of resampling. We obtain a general
form of optimal resampling distribution by minimizing the reconstruction error.
The proposed optimal resampling distribution is guaranteed to be shift,
rotation and scale-invariant in the 3D space. We next specify the
feature-extraction operator to be a graph filter and study specific resampling
strategies based on all-pass, low-pass, high-pass graph filtering and graph
filter banks. We finally apply the proposed methods to three applications:
large-scale visualization, accurate registration and robust shape modeling. The
empirical performance validates the effectiveness and efficiency of the
proposed resampling methods.

Mimicking Ensemble Learning with Deep Branched Networks

Byungju Kim, Youngsoo Kim, Yeakang Lee, Junmo Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a branched residual network for image classification. It
is known that high-level features of deep neural network are more
representative than lower-level features. By sharing the low-level features,
the network can allocate more memory to high-level features. The upper layers
of our proposed network are branched, so that it mimics the ensemble learning.
By mimicking ensemble learning with single network, we have achieved better
performance on ImageNet classification task.

Object Detection in Videos with Tubelet Proposal Networks

Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Object detection in videos has drawn increasing attention recently with the
introduction of the large-scale ImageNet VID dataset. Different from object
detection in static images, temporal information in videos provides vital
information for object detection. To fully utilize temporal information,
state-of-the-art methods are therefore based on spatiotemporal tubelets, which
are essentially sequences of associated bounding boxes across time. However,
the existing methods have major limitations in generating tubelets in terms of
quality and efficiency. Motion-based methods are able to obtain dense tubelets,
but the lengths are generally only several frames, which is not optimal to
incorporate long-term temporal information. Appearance-based methods, usually
involving generic object tracking, could generate long tubelets, but are
usually computational expensive. In this work, we propose a framework for
object detection in videos, which consists of a novel tubelet proposal network
to efficiently generate spatiotemporal proposals, and a Long Short-term Memory
(LSTM) network that incorporates temporal information from tubelet proposals
for achieving high object detection accuracy in videos. The experiments on the
large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed
framework for object detection in videos.

Just DIAL: DomaIn Alignment Layers for Unsupervised Domain Adaptation

Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, Samuel Rota Bulò
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The empirical fact that classifiers, trained on given data collections,
perform poorly when tested on data acquired in different settings is
theoretically explained in domain adaptation through a shift among
distributions of the source and target domains. Alleviating the domain shift
problem, especially in the challenging setting where no labeled data are
available for the target domain, is paramount for having visual recognition
systems working in the wild. As the problem stems from a shift among
distributions, intuitively one should try to align them. In the literature,
this has resulted in a stream of works attempting to align the feature
representations learned from the source and target domains. Here we take a
different route. Rather than introducing regularization terms aiming to promote
the alignment of the two representations, we act at the distribution level
through the introduction of emph{DomaIn Alignment Layers} (DIAL), able to
match the observed source and target data distributions to a reference one.
Thorough experiments on three different public benchmarks we confirm the power
of our approach.

Learning Compact Appearance Representation for Video-based Person Re-Identification

Wei Zhang, Shengnan Hu, Kan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel approach for video-based person re-identification
using multiple Convolutional Neural Networks (CNNs). Unlike previous work, we
intend to extract a compact yet discriminative appearance representation from
several frames rather than the whole sequence. Specifically, given a video, the
representative frames are selected based on the walking profile of consecutive
frames. A multiple CNN architecture incorporated with feature pooling is
proposed to learn and compile the features of the selected representative
frames into a compact description about the pedestrian for identification.
Experiments are conducted on benchmark datasets to demonstrate the superiority
of the proposed method over existing person re-identification approaches.

Visual Tracking by Reinforced Decision Making

Janghoon Choi, Junseok Kwon, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)

One of the major challenges of model-free visual tracking problem has been
the difficulty originating from the unpredictable and drastic changes in the
appearance of objects we target to track. Existing methods tackle this problem
by updating the appearance model on-line in order to adapt to the changes in
the appearance. Despite the success of these methods however, inaccurate and
erroneous updates of the appearance model result in a tracker drift. In this
paper, we introduce a novel visual tracking algorithm based on a template
selection strategy constructed by deep reinforcement learning methods. The
tracking algorithm utilizes this strategy to choose the best template for
tracking a given frame. The template selection strategy is self-learned by
utilizing a simple policy gradient method on numerous training episodes
randomly generated from a tracking benchmark dataset. Our proposed
reinforcement learning framework is generally applicable to other confidence
map based tracking algorithms. The experiment shows that our tracking algorithm
effectively decides the best template for visual tracking.

Weighted Motion Averaging for the Registration of Multi-View Range Scans

Rui Guo, Jihua Zhu, Yaochen Li, Zhongyu Li, Dapeng Chen, Yongqin Zhang
Comments: 9 pages, 6 figures, 2tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-view registration is a fundamental but challenging problem in 3D
reconstruction and robot vision. Although the original motion averaging
algorithm has been introduced as an effective means to solve the multi-view
registration problem, it does not consider the reliability and accuracy of each
relative motion. Accordingly, this paper proposes a novel motion averaging
algorithm for multi-view registration. Firstly, it utilizes the pair-wise
registration algorithm to estimate the relative motion and overlapping
percentage of each scan pair with a certain degree of overlap. With the
overlapping percentage available, it views the overlapping percentage as the
corresponding weight of each scan pair and proposes the weight motion averaging
algorithm, which can pay more attention to reliable and accurate relative
motions. By treating each relative motion distinctively, more accurate
registration can be achieved by applying the weighted motion averaging to
multi-view range scans. Experimental results demonstrate the superiority of our
proposed approach compared with the state-of-the-art methods in terms of
accuracy, robustness and efficiency.

The Power of Sparsity in Convolutional Neural Networks

Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional networks are well-known for their high computational and
memory demands. Given limited resources, how does one design a network that
balances its size, training time, and prediction accuracy? A surprisingly
effective approach to trade accuracy for size and speed is to simply reduce the
number of channels in each convolutional layer by a fixed fraction and retrain
the network. In many cases this leads to significantly smaller networks with
only minimal changes to accuracy. In this paper, we take a step further by
empirically examining a strategy for deactivating connections between filters
in convolutional layers in a way that allows us to harvest savings both in
run-time and memory for many network architectures. More specifically, we
generalize 2D convolution to use a channel-wise sparse connection structure and
show that this leads to significantly better results than the baseline approach
for large networks including VGG and Inception V3.

Learning to Generate Posters of Scientific Papers by Probabilistic Graphical Models

Yu-ting Qiang, Yanwei Fu, Xiao Yu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal
Comments: 10 pages, submission to IEEE TPAMI. arXiv admin note: text overlap with arXiv:1604.01219
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

Researchers often summarize their work in the form of scientific posters.
Posters provide a coherent and efficient way to convey core ideas expressed in
scientific papers. Generating a good scientific poster, however, is a complex
and time consuming cognitive task, since such posters need to be readable,
informative, and visually aesthetic. In this paper, for the first time, we
study the challenging problem of learning to generate posters from scientific
papers. To this end, a data-driven framework, that utilizes graphical models,
is proposed. Specifically, given content to display, the key elements of a good
poster, including attributes of each panel and arrangements of graphical
elements are learned and inferred from data. During the inference stage, an MAP
inference framework is employed to incorporate some design principles. In order
to bridge the gap between panel attributes and the composition within each
panel, we also propose a recursive page splitting algorithm to generate the
panel layout for a poster. To learn and validate our model, we collect and
release a new benchmark dataset, called NJU-Fudan Paper-Poster dataset, which
consists of scientific papers and corresponding posters with exhaustively
labelled panels and attributes. Qualitative and quantitative results indicate
the effectiveness of our approach.

Efficient Dense Labeling of Human Activity Sequences from Wearables using Fully Convolutional Networks

Rui Yao, Guosheng Lin, Qinfeng Shi, Damith Ranasinghe
Comments: 7 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

Recognizing human activities in a sequence is a challenging area of research
in ubiquitous computing. Most approaches use a fixed size sliding window over
consecutive samples to extract features—either handcrafted or learned
features—and predict a single label for all samples in the window. Two key
problems emanate from this approach: i) the samples in one window may not
always share the same label. Consequently, using one label for all samples
within a window inevitably lead to loss of information; ii) the testing phase
is constrained by the window size selected during training while the best
window size is difficult to tune in practice. We propose an efficient algorithm
that can predict the label of each sample, which we call dense labeling, in a
sequence of human activities of arbitrary length using a fully convolutional
network. In particular, our approach overcomes the problems posed by the
sliding window step. Additionally, our algorithm learns both the features and
classifier automatically. We release a new daily activity dataset based on a
wearable sensor with hospitalized patients. We conduct extensive experiments
and demonstrate that our proposed approach is able to outperform the
state-of-the-arts in terms of classification and label misalignment measures on
three challenging datasets: Opportunity, Hand Gesture, and our new dataset.

Forest understory trees revealed using sufficiently dense airborne laser scanning point clouds

Hamid Hamraz, Marco A. Contreras, Jun Zhang
Comments: arXiv admin note: text overlap with arXiv:1701.00169
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)

Airborne laser scanning (lidar) point clouds can be process to extract
tree-level information over large forested landscapes. Existing procedures
typically detect more than 90% of overstory trees, yet they barely detect 60%
of understory trees because of reduced number of lidar points penetrating the
top canopy layer. Although understory trees provide limited financial value,
they offer habitat for numerous wildlife species and are important for stand
development. Here we model tree identification accuracy according to point
cloud density by decomposing lidar point cloud into overstory and multiple
understory canopy layers, estimating the fraction of points representing the
different layers, and inspecting tree identification accuracy as a function of
point density. We show at a density of about 170 pt/m2 understory tree
identification accuracy likely plateaus, which we regard as the required point
density for reasonable identification of understory trees. Given the
advancements of lidar sensor technology, point clouds can feasibly reach the
required density to enable effective identification of individual understory
trees, ultimately making remote quantification of forest resources more
accurate. The layer decomposition methodology can also be adopted for other
similar remote sensing or advanced imaging applications such as geological
subsurface modelling or biomedical tissue analysis.

Developing a comprehensive framework for multimodal feature extraction

Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

Feature extraction is a critical component of many applied data science
workflows. In recent years, rapid advances in artificial intelligence and
machine learning have led to an explosion of feature extraction tools and
services that allow data scientists to cheaply and effectively annotate their
data along a vast array of dimensions—ranging from detecting faces in images
to analyzing the sentiment expressed in coherent text. Unfortunately, the
proliferation of powerful feature extraction services has been mirrored by a
corresponding expansion in the number of distinct interfaces to feature
extraction services. In a world where nearly every new service has its own API,
documentation, and/or client library, data scientists who need to combine
diverse features obtained from multiple sources are often forced to write and
maintain ever more elaborate feature extraction pipelines. To address this
challenge, we introduce a new open-source framework for comprehensive
multimodal feature extraction. Pliers is an open-source Python package that
supports standardized annotation of diverse data types (video, images, audio,
and text), and is expressly with both ease-of-use and extensibility in mind.
Users can apply a wide range of pre-existing feature extraction tools to their
data in just a few lines of Python code, and can also easily add their own
custom extractors by writing modular classes. A graph-based API enables rapid
development of complex feature extraction pipelines that output results in a
single, standardized format. We describe the package’s architecture, detail its
major advantages over previous feature extraction toolboxes, and use a sample
application to a large functional MRI dataset to illustrate how pliers can
significantly reduce the time and effort required to construct sophisticated
feature extraction workflows while increasing code clarity and maintainability.

Derivative Based Focal Plane Array Nonuniformity Correction

G. Ness, A. Oved, I. Kakon
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a fast and robust method for fixed pattern noise
nonuniformity correction of infrared focal plane arrays. The proposed method
requires neither shutter nor elaborate calibrations and therefore enables a
real time correction with no interruptions. Based on derivative estimation of
the fixed pattern noise from pixel sized translations of the focal plane array,
the proposed method has the advantages of being invariant to the noise
magnitude and robust to unknown camera and inter-scene movements while being
virtually transparent to the end-user.

Online Representation Learning with Multi-layer Hebbian Networks for Image Classification Tasks

Yanis Bahroun, Andrea Soltoggio
Comments: 8 pages, 6 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

Unsupervised learning allows algorithms to adapt to different data thanks to
the autonomous discovery of discriminating features during the training. When
these algorithms are reducible to cost-function minimisation, better
interpretations of their learning dynamics are possible. Recently, new
Hebbian-like plasticity, bio-inspired, local and unsupervised learning rules
for neural networks, have been shown to minimise a cost-function while
performing online sparse representation learning. However, it is unclear to
what degree such rules are effective to learn features from images. To
investigate this point, this study introduces a novel multi-layer Hebbian
network trained by a rule derived from a non-negative classical
multidimensional scaling cost-function. The performance is compared to that of
other fully unsupervised learning algorithms.

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

FPGA-based hardware accelerators for convolutional neural networks (CNNs)
have obtained great attentions due to their higher energy efficiency than GPUs.
However, it is challenging for FPGA-based solutions to achieve a higher
throughput than GPU counterparts. In this paper, we demonstrate that FPGA
acceleration can be a superior solution in terms of both throughput and energy
efficiency when a CNN is trained with binary constraints on weights and
activations. Specifically, we propose an optimized accelerator architecture
tailored for bitwise convolution and normalization that features massive
spatial parallelism with deep pipelines stages. Experiment results show that
the proposed architecture is 8.3x faster and 75x more energy-efficient than a
Titan X GPU for processing online individual requests (in small batch size).
For processing static data (in large batch size), the proposed solution is on a
par with a Titan X GPU in terms of throughput while delivering 9.5x higher
energy efficiency.

Deep Geometric Retrieval

Y Qian, E Vazquez, B Sengupta
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)

Comparing images in order to recommend items from an image-inventory is a
subject of continued interest. Added with the scalability of deep-learning
architectures the once `manual’ job of hand-crafting features have been largely
alleviated, and images can be compared according to features generated from a
deep convolutional neural network. In this paper, we compare distance metrics
(and divergences) to rank features generated from a neural network, for
content-based image retrieval. Specifically, after modelling individual images
using approximations of mixture models or sparse covariance estimators we
resort to their information-theoretic and Riemann geometric comparisons. We
show that using approximations of mixture models enable us to to compute a
distance measure based on the Wasserstein metric that requires less effort than
computationally intensive optimal transport plans; finally, an affine invariant
metric is used to compare the optimal transport metric to its Riemann geometric
counterpart — we conclude that although expensive, retrieval metric based on
Wasserstein geometry are more suitable than information theoretic comparison of
images. In short, we combine GPU scalability in learning deep feature vectors
with computationally efficient metrics that we foresee being utilized in a
commercial setting.

Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba
Comments: This is a pre-print of our paper accepted to appear in the Proceedings of 2017 International World Wide Web Conference (WWW’17)
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

Food is an integral part of our life and what and how much we eat crucially
affects our health. Our food choices largely depend on how we perceive certain
characteristics of food, such as whether it is healthy, delicious or if it
qualifies as a salad. But these perceptions differ from person to person and
one person’s “single lettuce leaf” might be another person’s “side salad”.
Studying how food is perceived in relation to what it actually is typically
involves a laboratory setup. Here we propose to use recent advances in image
recognition to tackle this problem. Concretely, we use data for 1.9 million
images from Instagram from the US to look at systematic differences in how a
machine would objectively label an image compared to how a human subjectively
does. We show that this difference, which we call the “perception gap”, relates
to a number of health outcomes observed at the county level. To the best of our
knowledge, this is the first time that image recognition is being used to study
the “misalignment” of how people describe food images vs. what they actually
depict.

Projection based advanced motion model for cubic mapping for 360-degree video

Li Li, Zhu Li, Madhukar Budagavi, Houqiang Li
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel advanced motion model to handle the irregular
motion for the cubic map projection of 360-degree video. Since the irregular
motion is mainly caused by the projection from the sphere to the cube map, we
first try to project the pixels in both the current picture and reference
picture from unfolding cube back to the sphere. Then through utilizing the
characteristic that most of the motions in the sphere are uniform, we can
derive the relationship between the motion vectors of various pixels in the
unfold cube. The proposed advanced motion model is implemented in the High
Efficiency Video Coding reference software. Experimental results demonstrate
that quite obvious performance improvement can be achieved for the sequences
with obvious motions.

Artificial Intelligence

Delving Deeper into MOOC Student Dropout Prediction

Jacob Whitehill, Kiran Mohan, Daniel Seaton, Yigal Rosen, Dustin Tingley
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

In order to obtain reliable accuracy estimates for automatic MOOC dropout
predictors, it is important to train and test them in a manner consistent with
how they will be used in practice. Yet most prior research on MOOC dropout
prediction has measured test accuracy on the same course used for training the
classifier, which can lead to overly optimistic accuracy estimates. In order to
understand better how accuracy is affected by the training+testing regime, we
compared the accuracy of a standard dropout prediction architecture
(clickstream features + logistic regression) across 4 different training
paradigms. Results suggest that (1) training and testing on the same course
(“post-hoc”) can overestimate accuracy by several percentage points; (2)
dropout classifiers trained on proxy labels based on students’ persistence are
surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC
averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance
does not vary significantly with the academic discipline. Finally, we also
research new dropout prediction architectures based on deep, fully-connected,
feed-forward neural networks and find that (4) networks with as many as 5
hidden layers can statistically significantly increase test accuracy over that
of logistic regression.

Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

Angel Martínez-Tenor, Juan Antonio Fernández-Madrigal, Ana Cruz-Martín, Javier González-Jiménez
Comments: 15 pages, 10 figures, 7 tables. To be published in a scientific journal
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

Mobile robots are increasingly being employed for performing complex tasks in
dynamic environments. Reinforcement learning (RL) methods are recognized to be
promising for specifying such tasks in a relatively simple manner. However, the
strong dependency between the learning method and the task to learn is a
well-known problem that restricts practical implementations of RL in robotics,
often requiring major modifications of parameters and adding other techniques
for each particular task. In this paper we present a practical core
implementation of RL which enables the learning process for multiple robotic
tasks with minimal per-task tuning or none. Based on value iteration methods,
this implementation includes a novel approach for action selection, called
Q-biased softmax regression (QBIASSR), which avoids poor performance of the
learning process when the robot reaches new unexplored states. Our approach
takes advantage of the structure of the state space by attending the physical
variables involved (e.g., distances to obstacles, X,Y,{ heta} pose, etc.),
thus experienced sets of states may favor the decision-making process of
unexplored or rarely-explored states. This improvement has a relevant role in
reducing the tuning of the algorithm for particular tasks. Experiments with
real and simulated robots, performed with the software framework also
introduced here, show that our implementation is effectively able to learn
different robotic tasks without tuning the learning method. Results also
suggest that the combination of true online SARSA({lambda}) with QBIASSR can
outperform the existing RL core algorithms in low-dimensional robotic tasks.

Sample Efficient Policy Search for Optimal Stopping Domains

Karan Goel, Christoph Dann, Emma Brunskill
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Arising naturally in many fields, optimal stopping problems consider the
question of deciding when to stop an observation-generating process. We examine
the problem of simultaneously learning and planning in such domains, when data
is collected directly from the environment. We propose GFSE, a simple and
flexible model-free policy search method that reuses data for sample efficiency
by leveraging problem structure. We bound the sample complexity of our approach
to guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on 3 domains taken from diverse fields.

Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning

Vlad Firoiu, William F. Whitney, Joshua B. Tenenbaum
Comments: Submitted to IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)

There has been a recent explosion in the capabilities of game-playing
artificial intelligence. Many classes of RL tasks, from Atari games to motor
control to board games, are now solvable by fairly generic algorithms, based on
deep learning, that learn to play from experience with minimal knowledge of the
specific domain of interest. In this work, we will investigate the performance
of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting
game. The SSBM environment has complex dynamics and partial observability,
making it challenging for human and machine alike. The multi-player aspect
poses an additional challenge, as the vast majority of recent advances in RL
have focused on single-agent environments. Nonetheless, we will show that it is
possible to train agents that are competitive against and even surpass human
professionals, a new result for the multi-player video game setting.

The Dialog State Tracking Challenge with Bayesian Approach

Quan Nguyen
Subjects: Artificial Intelligence (cs.AI)

Generative model has been one of the most common approaches for solving the
Dialog State Tracking Problem with the capabilities to model the dialog
hypotheses in an explicit manner. The most important task in such Bayesian
networks models is constructing the most reliable user models by learning and
reflecting the training data into the probability distribution of user actions
conditional on networks states. This paper provides an overall picture of the
learning process in a Bayesian framework with an emphasize on the
state-of-the-art theoretical analyses of the Expectation Maximization learning
algorithm.

Synthesizing Imperative Programs for Introductory Programming Assignments

Sunbeom So, Hakjoo Oh
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

We present a novel algorithm that synthesizes imperative programs for
introductory programming courses. Given a set of input-output examples and a
partial program, our algorithm generates a complete program that is consistent
with every example. Our key idea is to combine enumerative program synthesis
and static analysis, which aggressively prunes out a large search space while
guaranteeing to find, if any, a correct solution. We have implemented our
algorithm in a tool, called SIMPL, and evaluated it on 30 problems used in
introductory programming courses. The results show that SIMPL is able to solve
the benchmark problems in 6.6 seconds on average.

Survey of Reasoning using Neural networks

Amit Sahu
Comments: 12 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Reason and inference require process as well as memory skills by humans.
Neural networks are able to process tasks like image recognition (better than
humans) but in memory aspects are still limited (by attention mechanism, size).
Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
small memory contexts, but as context becomes larger than a threshold, it is
difficult to use them. The Solution is to use large external memory. Still, it
poses many challenges like, how to train neural networks for discrete memory
representation, how to describe long term dependencies in sequential data etc.
Most prominent neural architectures for such tasks are Memory networks:
inference components combined with long term memory and Neural Turing Machines:
neural networks using external memory resources. Also, additional techniques
like attention mechanism, end to end gradient descent on discrete memory
representation are needed to support these solutions. Preliminary results of
above neural architectures on simple algorithms (sorting, copying) and Question
Answering (based on story, dialogs) application are comparable with the state
of the art. In this paper, I explain these architectures (in general), the
additional techniques used and the results of their application.

Information Retrieval

Algorithmes de classification et d'optimisation: participation du LIA/ADOC á DEFT'14

Luis Adrián Cabrera-Diego, Stéphane Huet, Bassam Jabaian, Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-Bèze, Barthélémy Durette
Comments: 8 pages, 3 tables, Conference paper (in French)
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

This year, the DEFT campaign (D’efi Fouilles de Textes) incorporates a task
which aims at identifying the session in which articles of previous TALN
conferences were presented. We describe the three statistical systems developed
at LIA/ADOC for this task. A fusion of these systems enables us to obtain
interesting results (micro-precision score of 0.76 measured on the test corpus)

Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra
Comments: 8 pages, 6 figures, Conference paper
Journal-ref: Proceedings of the 8th International Joint Conference on Knowledge
Discovery, Knowledge Engineering and Knowledge Management, Vol 1: KDIR,
307-314, 2016, Porto, Portugal
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

In this paper we describe a dynamic normalization process applied to social
network multilingual documents (Facebook and Twitter) to improve the
performance of the Author profiling task for short texts. After the
normalization process, (n)-grams of characters and n-grams of POS tags are
obtained to extract all the possible stylistic information encoded in the
documents (emoticons, character flooding, capital letters, references to other
users, hyperlinks, hashtags, etc.). Experiments with SVM showed up to 90% of
performance.

Deep Geometric Retrieval

Y Qian, E Vazquez, B Sengupta
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV)

Comparing images in order to recommend items from an image-inventory is a
subject of continued interest. Added with the scalability of deep-learning
architectures the once `manual’ job of hand-crafting features have been largely
alleviated, and images can be compared according to features generated from a
deep convolutional neural network. In this paper, we compare distance metrics
(and divergences) to rank features generated from a neural network, for
content-based image retrieval. Specifically, after modelling individual images
using approximations of mixture models or sparse covariance estimators we
resort to their information-theoretic and Riemann geometric comparisons. We
show that using approximations of mixture models enable us to to compute a
distance measure based on the Wasserstein metric that requires less effort than
computationally intensive optimal transport plans; finally, an affine invariant
metric is used to compare the optimal transport metric to its Riemann geometric
counterpart — we conclude that although expensive, retrieval metric based on
Wasserstein geometry are more suitable than information theoretic comparison of
images. In short, we combine GPU scalability in learning deep feature vectors
with computationally efficient metrics that we foresee being utilized in a
commercial setting.

SAR: A Semantic Analysis Approach for Recommendation

Han Xiao, Minlie Huang, Xiaoyan Zhu
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

Recommendation system is a common demand in daily life and matrix completion
is a widely adopted technique for this task. However, most matrix completion
methods lack semantic interpretation and usually result in weak-semantic
recommendations. To this end, this paper proposes a {f S}emantic {f
A}nalysis approach for {f R}ecommendation systems extbf{(SAR)}, which
applies a two-level hierarchical generative process that assigns semantic
properties and categories for user and item. SAR learns semantic
representations of users/items merely from user ratings on items, which offers
a new path to recommendation by semantic matching with the learned
representations. Extensive experiments demonstrate SAR outperforms other
state-of-the-art baselines substantially.

MOLIERE: Automatic Biomedical Hypothesis Generation System

Justin Sybrandt, Michael Shtutman, Ilya Safro
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL); Social and Information Networks (cs.SI); Quantitative Methods (q-bio.QM); Other Statistics (stat.OT)

Hypothesis generation is becoming a crucial time-saving technique which
allows biomedical researchers to quickly discover implicit connections between
important concepts. Typically, these systems operate on domain-specific
fractions of public medical data. MOLIERE, in contrast, utilizes information
from over 24.5 million documents. At the heart of our approach lies a
multi-modal and multi-relational network of biomedical objects extracted from
several heterogeneous datasets from the National Center for Biotechnology
Information (NCBI). These objects include but are not limited to scientific
papers, keywords, genes, proteins, diseases, and diagnoses. We model hypotheses
using Latent Dirichlet Allocation applied on abstracts found near shortest
paths discovered within this network, and demonstrate the effectiveness of
MOLIERE by performing hypothesis generation on historical data. Our network,
implementation, and resulting data are all publicly available for the broad
scientific community.

Systèmes du LIA à DEFT'13

Xavier Bost, Ilaria Brunetti, Luis Adrián Cabrera-Diego, Jean-Valère Cossu, Andréa Linhares, Mohamed Morchid, Juan-Manuel Torres-Moreno, Marc El-Bèze, Richard Dufour
Comments: 12 pages, 3 tables, (Paper in French)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

The 2013 D’efi de Fouille de Textes (DEFT) campaign is interested in two
types of language analysis tasks, the document classification and the
information extraction in the specialized domain of cuisine recipes. We present
the systems that the LIA has used in DEFT 2013. Our systems show interesting
results, even though the complexity of the proposed tasks.

Filtering Tweets for Social Unrest

Alan Mishler, Kevin Wonus, Wendy Chambers, Michael Bloodgood
Comments: 7 pages, 8 figures, 3 tables; to appear in Proceedings of the Eleventh IEEE International Conference on Semantic Computing (ICSC 2017), San Diego, California, 2017
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)

Since the events of the Arab Spring, there has been increased interest in
using social media to anticipate social unrest. While efforts have been made
toward automated unrest prediction, we focus on filtering the vast volume of
tweets to identify tweets relevant to unrest, which can be provided to
downstream users for further analysis. We train a supervised classifier that is
able to label Arabic language tweets as relevant to unrest with high
reliability. We examine the relationship between training data size and
performance and investigate ways to optimize the model building process while
minimizing cost. We also explore how confidence thresholds can be set to
achieve desired levels of performance.

Robust Phase Retrieval via ADMM with Outliers

Xue Jiang, H. C. So, X. Liu
Subjects: Information Theory (cs.IT); Information Retrieval (cs.IR)

An outlier-resistance phase retrieval algorithm based on alternating
direction method of multipliers (ADMM) is devised in this letter. Instead of
the widely used least squares criterion that is only optimal for Gaussian noise
environment, we adopt the least absolute deviation criterion to enhance the
robustness against outliers. Considering both intensity- and amplitude-based
observation models, the framework of ADMM is developed to solve the resulting
non-differentiable optimization problems. It is demonstrated that the core
subproblem of ADMM is the proximity operator of the L1-norm, which can be
computed efficiently by soft-thresholding in each iteration. Simulation results
are provided to validate the accuracy and efficiency of the proposed approach
compared to the existing schemes.

Developing a comprehensive framework for multimodal feature extraction

Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

Feature extraction is a critical component of many applied data science
workflows. In recent years, rapid advances in artificial intelligence and
machine learning have led to an explosion of feature extraction tools and
services that allow data scientists to cheaply and effectively annotate their
data along a vast array of dimensions—ranging from detecting faces in images
to analyzing the sentiment expressed in coherent text. Unfortunately, the
proliferation of powerful feature extraction services has been mirrored by a
corresponding expansion in the number of distinct interfaces to feature
extraction services. In a world where nearly every new service has its own API,
documentation, and/or client library, data scientists who need to combine
diverse features obtained from multiple sources are often forced to write and
maintain ever more elaborate feature extraction pipelines. To address this
challenge, we introduce a new open-source framework for comprehensive
multimodal feature extraction. Pliers is an open-source Python package that
supports standardized annotation of diverse data types (video, images, audio,
and text), and is expressly with both ease-of-use and extensibility in mind.
Users can apply a wide range of pre-existing feature extraction tools to their
data in just a few lines of Python code, and can also easily add their own
custom extractors by writing modular classes. A graph-based API enables rapid
development of complex feature extraction pipelines that output results in a
single, standardized format. We describe the package’s architecture, detail its
major advantages over previous feature extraction toolboxes, and use a sample
application to a large functional MRI dataset to illustrate how pliers can
significantly reduce the time and effort required to construct sophisticated
feature extraction workflows while increasing code clarity and maintainability.

Computation and Language

Systèmes du LIA à DEFT'13

The 2013 D’efi de Fouille de Textes (DEFT) campaign is interested in two
types of language analysis tasks, the document classification and the
information extraction in the specialized domain of cuisine recipes. We present
the systems that the LIA has used in DEFT 2013. Our systems show interesting
results, even though the complexity of the proposed tasks.

Multi-task Learning with CTC and Segmental CRF for Speech Recognition

Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Comments: 5 pages, 2 figures, submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL)

Segmental conditional random fields (SCRFs) and connectionist temporal
classification (CTC) are two sequence labeling objectives used for end-to-end
training of speech recognition models. Both models define the transcription
probability by marginalizing decisions about latent segmentation alternatives
to derive a sequence probability: the former uses a globally normalized joint
model of segment labels and durations, and the latter classifies each frame as
either an output symbol or a “continuation” of the previous label. In this
paper, we train a recognition model by optimizing an interpolation between the
SCRF and CTC losses, where the same recurrent neural network (RNN) encoder used
for feature extraction for both outputs. We find that this multi-task objective
improves recognition accuracy when decoding with either the SCRF or CTC models.
Additionally, we show that CTC can also be used to pretrain the RNN encoder,
which improves the convergence rate when learning the joint model.

Hybrid Dialog State Tracker with ASR Features

Miroslav Vodolán, Rudolf Kadlec, Jan Kleindienst
Comments: Accepted to EACL 2017
Subjects: Computation and Language (cs.CL)

This paper presents a hybrid dialog state tracker enhanced by trainable
Spoken Language Understanding (SLU) for slot-filling dialog systems. Our
architecture is inspired by previously proposed neural-network-based
belief-tracking systems. In addition, we extended some parts of our modular
architecture with differentiable rules to allow end-to-end training. We
hypothesize that these rules allow our tracker to generalize better than pure
machine-learning based systems. For evaluation, we used the Dialog State
Tracking Challenge (DSTC) 2 dataset – a popular belief tracking testbed with
dialogs from restaurant information system. To our knowledge, our hybrid
tracker sets a new state-of-the-art result in three out of four categories
within the DSTC2.

Reinforcement Learning Based Argument Component Detection

Yang Gao, Hao Wang, Chen Zhang, Wei Wang
Subjects: Computation and Language (cs.CL)

Argument component detection (ACD) is an important sub-task in argumentation
mining. ACD aims at detecting and classifying different argument components in
natural language texts. Historical annotations (HAs) are important features the
human annotators consider when they manually perform the ACD task. However, HAs
are largely ignored by existing automatic ACD techniques. Reinforcement
learning (RL) has proven to be an effective method for using HAs in some
natural language processing tasks. In this work, we propose a RL-based ACD
technique, and evaluate its performance on two well-annotated corpora. Results
suggest that, in terms of classification accuracy, HAs-augmented RL outperforms
plain RL by at most 17.85%, and outperforms the state-of-the-art supervised
learning algorithm by at most 11.94%.

Learning to generate one-sentence biographies from Wikidata

Andrew Chisholm, Will Radford, Ben Hachey
Subjects: Computation and Language (cs.CL)

We investigate the generation of one-sentence Wikipedia biographies from
facts derived from Wikidata slot-value pairs. We train a recurrent neural
network sequence-to-sequence model with attention to select facts and generate
textual summaries. Our model incorporates a novel secondary objective that
helps ensure it generates sentences that contain the input facts. The model
achieves a BLEU score of 41, improving significantly upon the vanilla
sequence-to-sequence model and scoring roughly twice that of a simple template
baseline. Human preference evaluation suggests the model is nearly as good as
the Wikipedia reference. Manual analysis explores content selection, suggesting
the model can trade the ability to infer knowledge against the risk of
hallucinating incorrect information.

Filtering Tweets for Social Unrest

Since the events of the Arab Spring, there has been increased interest in
using social media to anticipate social unrest. While efforts have been made
toward automated unrest prediction, we focus on filtering the vast volume of
tweets to identify tweets relevant to unrest, which can be provided to
downstream users for further analysis. We train a supervised classifier that is
able to label Arabic language tweets as relevant to unrest with high
reliability. We examine the relationship between training data size and
performance and investigate ways to optimize the model building process while
minimizing cost. We also explore how confidence thresholds can be set to
achieve desired levels of performance.

Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages

Raj Dabre, Fabien Cromieres, Sadao Kurohashi
Comments: Work in progress. Augmented manuscripts to follow
Subjects: Computation and Language (cs.CL)

In this paper, we propose a novel and elegant solution to “Multi-Source
Neural Machine Translation” (MSNMT) which only relies on preprocessing a N-way
multilingual corpus without modifying the Neural Machine Translation (NMT)
architecture or training procedure. We simply concatenate the source sentences
to form a single long multi-source input sentence while keeping the target side
sentence as it is and train an NMT system using this augmented corpus. We
evaluate our method in a low resource, general domain setting and show its
effectiveness (+2 BLEU using 2 source languages and +6 BLEU using 5 source
languages) along with some insights on how the NMT system leverages
multilingual information in such a scenario by visualizing attention.

Algorithmes de classification et d'optimisation: participation du LIA/ADOC á DEFT'14

This year, the DEFT campaign (D’efi Fouilles de Textes) incorporates a task
which aims at identifying the session in which articles of previous TALN
conferences were presented. We describe the three statistical systems developed
at LIA/ADOC for this task. A fusion of these systems enables us to obtain
interesting results (micro-precision score of 0.76 measured on the test corpus)

Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

In this paper we describe a dynamic normalization process applied to social
network multilingual documents (Facebook and Twitter) to improve the
performance of the Author profiling task for short texts. After the
normalization process, (n)-grams of characters and n-grams of POS tags are
obtained to extract all the possible stylistic information encoded in the
documents (emoticons, character flooding, capital letters, references to other
users, hyperlinks, hashtags, etc.). Experiments with SVM showed up to 90% of
performance.

Distributed, Parallel, and Cluster Computing

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

FPGA-based hardware accelerators for convolutional neural networks (CNNs)
have obtained great attentions due to their higher energy efficiency than GPUs.
However, it is challenging for FPGA-based solutions to achieve a higher
throughput than GPU counterparts. In this paper, we demonstrate that FPGA
acceleration can be a superior solution in terms of both throughput and energy
efficiency when a CNN is trained with binary constraints on weights and
activations. Specifically, we propose an optimized accelerator architecture
tailored for bitwise convolution and normalization that features massive
spatial parallelism with deep pipelines stages. Experiment results show that
the proposed architecture is 8.3x faster and 75x more energy-efficient than a
Titan X GPU for processing online individual requests (in small batch size).
For processing static data (in large batch size), the proposed solution is on a
par with a Titan X GPU in terms of throughput while delivering 9.5x higher
energy efficiency.

Edge-Fog Cloud: A Distributed Cloud for Internet of Things Computations

Nitinder Mohan, Jussi Kangasharju
Comments: Published in IEEE 2nd Conference on Cloudification of Internet of Things (CIoT) – 2016, Paris, France
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Internet of Things typically involves a significant number of smart sensors
sensing information from the environment and sharing it to a cloud service for
processing. Various architectural abstractions, such as Fog and Edge computing,
have been proposed to localize some of the processing near the sensors and away
from the central cloud servers. In this paper, we propose Edge-Fog Cloud which
distributes task processing on the participating cloud resources in the
network. We develop the Least Processing Cost First (LPCF) method for assigning
the processing tasks to nodes which provide the optimal processing time and
near optimal networking costs. We evaluate LPCF in a variety of scenarios and
demonstrate its effectiveness in finding the processing task assignments.

Demystifying Fog Computing: Characterizing Architectures, Applications and Abstractions

Prateeksha Varshney, Yogesh Simmhan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Internet of Things (IoT) has accelerated the deployment of millions of
sensors at the edge of the network, through Smart City infrastructure and
lifestyle devices. Cloud computing platforms are often tasked with handling
these large volumes and fast streams of data from the edge. Recently, Fog
computing has emerged as a concept for low-latency and resource-rich processing
of these observation streams, to complement Edge and Cloud computing. In this
paper, we review various dimensions of system architecture, application
characteristics and platform abstractions that are manifest in this Edge, Fog
and Cloud eco-system. We highlight novel capabilities of the Edge and Fog
layers, such as physical and application mobility, privacy sensitivity, and a
nascent runtime environment. IoT application case studies based on first-hand
experiences across diverse domains drive this categorization. We also highlight
the gap between the potential and the reality of Fog computing, and identify
challenges that need to be overcome for the solution to be sustainable.
Together, our article can help platform and application developers bridge the
gap that remains in making Fog computing viable.

A Rollback in the History of Communication-Induced Checkpointing

Islene C. Garcia, Gustavo M. D. Vieira, Luiz E. Buzato
Comments: First draft
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The literature on communication-induced checkpointing presents a family of
protocols that use logical clocks to control whether forced checkpoints must be
taken. For many years, HMNR, also called Fully Informed (FI), was the most
complex and efficient protocol of this family. The Lazy-FI protocol applies a
lazy strategy that defers the increase of logical clocks, resulting in a
protocol with better perfomance for distributed systems where processes can
take basic checkpoints at different, asymmetric, rates. Recently, the Fully
Informed aNd Efficient (FINE) protocol was proposed using the same control
structures as FI, but with a stronger and, presumably better,
checkpoint-inducing condition. FINE and its lazy version, called Lazy-FINE,
would now be the most efficient checkpointing protocols based on logical
clocks. This paper reviews this family of protocols, proves a theorem on a
condition that must be enforced by all stronger versions of FI, and proves that
both FINE and Lazy-FINE do not guarantee the absence of useless checkpoints. As
a consequence, FI and Lazy-FI can be rolled back to the position of most
efficient protocols of this family of index-based checkpointing protocols.

Computing Influence of a Product through Uncertain Reverse Skyline

Md. Saiful Islam, Wenny Rahayu, Chengfei Liu, Tarique Anwar, Bela Stantic
Comments: 12 pages, 3 tables, 12 figures, submitted to SSDBM 2017
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Understanding the influence of a product is crucially important for making
informed business decisions. This paper introduces a new type of skyline
queries, called uncertain reverse skyline, for measuring the influence of a
probabilistic product in uncertain data settings. More specifically, given a
dataset of probabilistic products P and a set of customers C, an uncertain
reverse skyline of a probabilistic product q retrieves all customers c in C
which include q as one of their preferred products. We present efficient
pruning ideas and techniques for processing the uncertain reverse skyline query
of a probabilistic product using R-Tree data index. We also present an
efficient parallel approach to compute the uncertain reverse skyline and
influence score of a probabilistic product. Our approach significantly
outperforms the baseline approach derived from the existing literature. The
efficiency of our approach is demonstrated by conducting extensive experiments
with both real and synthetic datasets.

Review of Apriori Based Algorithms on MapReduce Framework

Sudhakar Singh, Rakhi Garg, P. K. Mishra
Comments: 12 pages, 3 figures, ICC-2014
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

The Apriori algorithm that mines frequent itemsets is one of the most popular
and widely used data mining algorithms. Now days many algorithms have been
proposed on parallel and distributed platforms to enhance the performance of
Apriori algorithm. They differ from each other on the basis of load balancing
technique, memory system, data decomposition technique and data layout used to
implement them. The problems with most of the distributed framework are
overheads of managing distributed system and lack of high level parallel
programming language. Also with grid computing there is always potential
chances of node failures which cause multiple re-executions of tasks. These
problems can be overcome by the MapReduce framework introduced by Google.
MapReduce is an efficient, scalable and simplified programming model for large
scale distributed data processing on a large cluster of commodity computers and
also used in cloud computing. In this paper, we present the overview of
parallel Apriori algorithm implemented on MapReduce framework. They are
categorized on the basis of Map and Reduce functions used to implement them
e.g. 1-phase vs. k-phase, I/O of Mapper, Combiner and Reducer, using
functionality of Combiner inside Mapper etc. This survey discusses and analyzes
the various implementations of Apriori on MapReduce framework on the basis of
their distinguishing characteristics. Moreover, it also includes the advantages
and limitations of MapReduce framework.

Learning

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe

Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi
Comments: Accepted for presentation at AISTATS 2017, preliminary version
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Two of the most fundamental prototypes of greedy optimization are the
matching pursuit and Frank-Wolfe algorithms. In this paper, we take a unified
view on both classes of methods, leading to the first explicit convergence
rates of matching pursuit methods in an optimization sense, for general sets of
atoms. We derive sublinear ((1/t)) convergence for both classes on general
smooth objectives, and linear convergence on strongly convex objectives, as
well as a clear correspondence of algorithm variants. Our presented algorithms
and rates are affine invariant, and do not need any incoherence or sparsity
assumptions.

Negative-Unlabeled Tensor Factorization for Location Category Inference from Inaccurate Mobility Data

Jinfeng Yi, Qi Lei, Wesley Gifford, Ji Liu
Subjects: Learning (cs.LG)

Identifying significant location categories visited by mobile phone users is
the key to a variety of applications. This is an extremely challenging task due
to the possible deviation between the estimated location coordinate and the
actual location, which could be on the order of kilometers. Using the collected
location coordinate as the center and its associated location error as the
radius, we can draw a location uncertainty circle that may cover multiple
location categories, especially in densely populated areas. To estimate the
actual location category more precisely, we propose a novel tensor
factorization framework, through several key observations including the
intrinsic correlations between users, to infer the most likely location
categories within the location uncertainty circle. In addition, the proposed
algorithm can also predict where users are even when there is no location
update. In order to efficiently solve the proposed framework, we propose a
parameter-free and scalable optimization algorithm by effectively exploring the
sparse and low-rank structure of the tensor. Our empirical studies show that
the proposed algorithm is both efficient and effective: it can solve problems
with millions of users and billions of location updates, and also provides
superior prediction accuracies on real-world location updates and check-in data
sets.

Positive-Unlabeled Demand-Aware Recommendation

Jinfeng Yi, Cho-Jui Hsieh, Kush Varshney, Lijun Zhang, Yao Li
Subjects: Learning (cs.LG)

Recommendation for e-commerce with a mix of durable and nondurable goods has
characteristics that distinguish it from the well-studied media recommendation
problem. The demand for items is a combined effect of form utility and time
utility, i.e., a product must both be intrinsically appealing to a consumer and
the time must be right for purchase. In particular for durable goods, time
utility is a function of inter-purchase duration within product category
because consumers are unlikely to purchase two items in the same category in
close temporal succession. Moreover, purchase data, in contrast to ratings
data, is implicit with non-purchases not necessarily indicating dislike.
Together, these issues give rise to the positive-unlabeled demand-aware
recommendation problem that we pose via joint low-rank tensor completion and
product category inter-purchase duration vector estimation. We further relax
this problem and propose a highly scalable alternating minimization approach
with which we can solve problems with millions of users and items. We also show
superior prediction accuracies on multiple real-world data sets.

Fast rates for online learning in Linearly Solvable Markov Decision Processes

Gergely Neu, Vicenç Gómez
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We study the problem of online learning in a class of Markov decision
processes known as linearly solvable MDPs. In the stationary version of this
problem, a learner interacts with its environment by directly controlling the
state transitions, attempting to balance a fixed state-dependent cost and a
certain smooth cost penalizing extreme control inputs. In the current paper, we
consider an online setting where the state costs may change arbitrarily between
consecutive rounds, and the learner only observes the costs at the end of each
respective round. We are interested in constructing algorithms for the learner
that guarantee small regret against the best stationary control policy chosen
in full knowledge of the cost sequence. Our main result is showing that the
smoothness of the control cost enables the simple algorithm of following the
leader to achieve a regret of order (log^2 T) after (T) rounds, vastly
improving on the best known regret bound of order (T^{3/4}) for this setting.

Convolution Aware Initialization

Armen Aghajanyan
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Initialization of parameters in deep neural networks has been shown to have a
big impact on the performance of the networks (Mishkin & Matas, 2015). The
initialization scheme devised by He et al, allowed convolution activations to
carry a constrained mean which allowed deep networks to be trained effectively
(He et al., 2015a). Orthogonal initializations and more generally orthogonal
matrices in standard recurrent networks have been proved to eradicate the
vanishing and exploding gradient problem (Pascanu et al., 2012). Majority of
current initialization schemes do not take fully into account the intrinsic
structure of the convolution operator. This paper introduces a new type of
initialization built around the duality of the Fourier transform and the
convolution operator. With Convolution Aware Initialization we noticed not only
higher accuracy and lower loss, but faster convergence in general. We achieve
new state of the art on the CIFAR10 dataset, and achieve close to state of the
art on various other tasks.

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
Comments: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis
Subjects: Learning (cs.LG); Sound (cs.SD)

Sound events often occur in unstructured environments where they exhibit wide
variations in their frequency content and temporal structure. Convolutional
neural networks (CNN) are able to extract higher level features that are
invariant to local spectral and temporal variations. Recurrent neural networks
(RNNs) are powerful in learning the longer term temporal context in the audio
signals. CNNs and RNNs as classifiers have recently shown improved performances
over established methods in various sound recognition tasks. We combine these
two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it
on a polyphonic sound event detection task. We compare the performance of the
proposed CRNN method with CNN, RNN, and other established methods, and observe
a considerable improvement for four different datasets consisting of everyday
sound events.

Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

Jialei Wang, Weiran Wang, Nathan Srebro
Subjects: Learning (cs.LG)

We present and analyze statistically optimal, communication and memory
efficient distributed stochastic optimization algorithms with near-linear
speedups (up to (log)-factors). This improves over prior work which includes
methods with near-linear speedups but polynomial communication requirements
(accelerated minibatch SGD) and communication efficient methods which do not
exhibit any runtime speedups over a naive single-machine approach. We first
analyze a distributed SVRG variant as a distributed stochastic optimization
method and show that it can achieve near-linear speedups with logarithmic
rounds of communication, at the cost of high memory requirements. We then
present a novel method, stochastic DANE, which trades off memory for
communication and still allows for optimization with communication which scales
only logarithmically with the desired accuracy while also being memory
efficient. Stochastic DANE is based on a minibatch prox procedure, solving a
non-linearized subproblem on a minibatch at each iteration. We provide a novel
analysis for this procedure which achieves the statistical optimal rate
regardless of minibatch size and smoothness, and thus significantly improving
on prior work.

Exact tensor completion with sum-of-squares

Aaron Potechin, David Steurer
Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

We obtain the first polynomial-time algorithm for exact tensor completion
that improves over the bound implied by reduction to matrix completion. The
algorithm recovers an unknown 3-tensor with (r) incoherent, orthogonal
components in (mathbb R^n) from (rcdot ilde O(n^{1.5})) randomly observed
entries of the tensor. This bound improves over the previous best one of
(rcdot ilde O(n^{2})) by reduction to exact matrix completion. Our bound
also matches the best known results for the easier problem of approximate
tensor completion (Barak & Moitra, 2015).

Our algorithm and analysis extends seminal results for exact matrix
completion (Candes & Recht, 2009) to the tensor setting via the sum-of-squares
method. The main technical challenge is to show that a small number of randomly
chosen monomials are enough to construct a degree-3 polynomial with a precisely
planted orthogonal global optima over the sphere and that this fact can be
certified within the sum-of-squares proof system.

Survey of Reasoning using Neural networks

Amit Sahu
Comments: 12 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Reason and inference require process as well as memory skills by humans.
Neural networks are able to process tasks like image recognition (better than
humans) but in memory aspects are still limited (by attention mechanism, size).
Recurrent Neural Network (RNN) and it’s modified version LSTM are able to solve
small memory contexts, but as context becomes larger than a threshold, it is
difficult to use them. The Solution is to use large external memory. Still, it
poses many challenges like, how to train neural networks for discrete memory
representation, how to describe long term dependencies in sequential data etc.
Most prominent neural architectures for such tasks are Memory networks:
inference components combined with long term memory and Neural Turing Machines:
neural networks using external memory resources. Also, additional techniques
like attention mechanism, end to end gradient descent on discrete memory
representation are needed to support these solutions. Preliminary results of
above neural architectures on simple algorithms (sorting, copying) and Question
Answering (based on story, dialogs) application are comparable with the state
of the art. In this paper, I explain these architectures (in general), the
additional techniques used and the results of their application.

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Mahdi Soltanolkotabi
Subjects: Learning (cs.LG); Information Theory (cs.IT); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

This paper concerns the problem of recovering an unknown but structured
signal (x in R^n) from (m) quadratic measurements of the form
(y_r=|<a_r,x>|^2) for (r=1,2,…,m). We focus on the under-determined setting
where the number of measurements is significantly smaller than the dimension of
the signal ((m<<n)). We formulate the recovery problem as a nonconvex
optimization problem where prior structural information about the signal is
enforced through constrains on the optimization variables. We prove that
projected gradient descent, when initialized in a neighborhood of the desired
signal, converges to the unknown signal at a linear rate. These results hold
for any constraint set (convex or nonconvex) providing convergence guarantees
to the global optimum even when the objective function and constraint set is
nonconvex. Furthermore, these results hold with a number of measurements that
is only a constant factor away from the minimal number of measurements required
to uniquely identify the unknown signal. Our results provide the first provably
tractable algorithm for this data-poor regime, breaking local sample complexity
barriers that have emerged in recent literature. In a companion paper we
demonstrate favorable properties for the optimization problem that may enable
similar results to continue to hold more globally (over the entire ambient
space). Collectively these two papers utilize and develop powerful tools for
uniform convergence of empirical processes that may have broader implications
for rigorous understanding of constrained nonconvex optimization heuristics.
The mathematical results in this paper also pave the way for a new generation
of data-driven phase-less imaging systems that can utilize prior information to
significantly reduce acquisition time and enhance image reconstruction,
enabling nano-scale imaging at unprecedented speeds and resolutions.

On the Consistency of (k)-means++ algorithm

Mieczysław A. Kłopotek
Subjects: Learning (cs.LG)

We prove in this paper that the expected value of the objective function of
the (k)-means++ algorithm for samples converges to population expected value.
As (k)-means++, for samples, provides with constant factor approximation for
(k)-means objectives, such an approximation can be achieved for the population
with increase of the sample size.

This result is of potential practical relevance when one is considering using
subsampling when clustering large data sets (large data bases).

PixelNet: Representation of the pixels, by the pixels, and for the pixels

We explore design principles for general pixel-level prediction problems,
from low-level edge detection to mid-level surface normal estimation to
high-level semantic segmentation. Convolutional predictors, such as the
fully-convolutional network (FCN), have achieved remarkable success by
exploiting the spatial redundancy of neighboring pixels through convolutional
processing. Though computationally efficient, we point out that such approaches
are not statistically efficient during learning precisely because spatial
redundancy limits the information learned from neighboring pixels. We
demonstrate that stratified sampling of pixels allows one to (1) add diversity
during batch updates, speeding up learning; (2) explore complex nonlinear
predictors, improving accuracy; and (3) efficiently train state-of-the-art
models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our
single architecture produces state-of-the-art results for semantic segmentation
on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset,
and edge detection on BSDS.

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks

FPGA-based hardware accelerators for convolutional neural networks (CNNs)
have obtained great attentions due to their higher energy efficiency than GPUs.
However, it is challenging for FPGA-based solutions to achieve a higher
throughput than GPU counterparts. In this paper, we demonstrate that FPGA
acceleration can be a superior solution in terms of both throughput and energy
efficiency when a CNN is trained with binary constraints on weights and
activations. Specifically, we propose an optimized accelerator architecture
tailored for bitwise convolution and normalization that features massive
spatial parallelism with deep pipelines stages. Experiment results show that
the proposed architecture is 8.3x faster and 75x more energy-efficient than a
Titan X GPU for processing online individual requests (in small batch size).
For processing static data (in large batch size), the proposed solution is on a
par with a Titan X GPU in terms of throughput while delivering 9.5x higher
energy efficiency.

Causal Inference on Multivariate Mixed-Type Data by Minimum Description Length

Alexander Marx, Jilles Vreeken
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Given data over the joint distribution of two univariate or multivariate
random variables (X) and (Y) of mixed or single type data, we consider the
problem of inferring the most likely causal direction between (X) and (Y). We
take an information theoretic approach, from which it follows that first
describing the data over cause and then that of effect given cause is shorter
than the reverse direction.

For practical inference, we propose a score for causal models for mixed type
data based on the Minimum Description Length (MDL) principle. In particular, we
model dependencies between (X) and (Y) using classification and regression
trees. Inferring the optimal model is NP-hard, and hence we propose Crack, a
fast greedy algorithm to infer the most likely causal direction directly from
the data.

Empirical evaluation on synthetic, benchmark, and real world data shows that
Crack reliably and with high accuracy infers the correct causal direction on
both univariate and multivariate cause–effect pairs over both single and mixed
type data.

Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation

Makoto Yamada, Song Liu, Samuel Kaski
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We propose an inlier-based outlier detection method capable of both
identifying the outliers and explaining why they are outliers, by identifying
the outlier-specific features. Specifically, we employ an inlier-based outlier
detection criterion, which uses the ratio of inlier and test probability
densities as a measure of plausibility of being an outlier. For estimating the
density ratio function, we propose a localized logistic regression algorithm.
Thanks to the locality of the model, variable selection can be
outlier-specific, and will help interpret why points are outliers in a
high-dimensional space. Through synthetic experiments, we show that the
proposed algorithm can successfully detect the important features for outliers.
Moreover, we show that the proposed algorithm tends to outperform existing
algorithms in benchmark datasets.

Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks

Mobile robots are increasingly being employed for performing complex tasks in
dynamic environments. Reinforcement learning (RL) methods are recognized to be
promising for specifying such tasks in a relatively simple manner. However, the
strong dependency between the learning method and the task to learn is a
well-known problem that restricts practical implementations of RL in robotics,
often requiring major modifications of parameters and adding other techniques
for each particular task. In this paper we present a practical core
implementation of RL which enables the learning process for multiple robotic
tasks with minimal per-task tuning or none. Based on value iteration methods,
this implementation includes a novel approach for action selection, called
Q-biased softmax regression (QBIASSR), which avoids poor performance of the
learning process when the robot reaches new unexplored states. Our approach
takes advantage of the structure of the state space by attending the physical
variables involved (e.g., distances to obstacles, X,Y,{ heta} pose, etc.),
thus experienced sets of states may favor the decision-making process of
unexplored or rarely-explored states. This improvement has a relevant role in
reducing the tuning of the algorithm for particular tasks. Experiments with
real and simulated robots, performed with the software framework also
introduced here, show that our implementation is effectively able to learn
different robotic tasks without tuning the learning method. Results also
suggest that the combination of true online SARSA({lambda}) with QBIASSR can
outperform the existing RL core algorithms in low-dimensional robotic tasks.

On the (Statistical) Detection of Adversarial Examples

Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

Machine Learning (ML) models are applied in a variety of tasks such as
network intrusion detection or malware classification. Yet, these models are
vulnerable to a class of malicious inputs known as adversarial examples. These
are slightly perturbed inputs that are classified incorrectly by the ML model.
The mitigation of these adversarial inputs remains an open problem.

As a step towards a model-agnostic defense against adversarial examples, we
show that they are not drawn from the same distribution than the original data,
and can thus be detected using statistical tests. As the number of malicious
points included in samples presented to the test diminishes, its detection
confidence decreases. Hence, we introduce a complimentary approach to identify
specific inputs that are adversarial among sets of inputs flagged by the
statistical test. Specifically, we augment our ML model with an additional
output, in which the model is trained to classify all adversarial inputs.

We evaluate our approach on multiple adversarial example crafting methods
(including the fast gradient sign and Jacobian-based saliency map methods) with
several datasets. The statistical test flags sample sets containing adversarial
inputs with confidence above 80%. Furthermore, our augmented model either
detects adversarial examples with high accuracy (>80%) or increases the
adversary’s cost—the perturbation added—by more than 150%. In this way, we
show that statistical properties of adversarial examples are essential to their
detection.

SAR: A Semantic Analysis Approach for Recommendation

Han Xiao, Minlie Huang, Xiaoyan Zhu
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

Recommendation system is a common demand in daily life and matrix completion
is a widely adopted technique for this task. However, most matrix completion
methods lack semantic interpretation and usually result in weak-semantic
recommendations. To this end, this paper proposes a {f S}emantic {f
A}nalysis approach for {f R}ecommendation systems extbf{(SAR)}, which
applies a two-level hierarchical generative process that assigns semantic
properties and categories for user and item. SAR learns semantic
representations of users/items merely from user ratings on items, which offers
a new path to recommendation by semantic matching with the learned
representations. Extensive experiments demonstrate SAR outperforms other
state-of-the-art baselines substantially.

Sample Efficient Policy Search for Optimal Stopping Domains

Karan Goel, Christoph Dann, Emma Brunskill
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

Arising naturally in many fields, optimal stopping problems consider the
question of deciding when to stop an observation-generating process. We examine
the problem of simultaneously learning and planning in such domains, when data
is collected directly from the environment. We propose GFSE, a simple and
flexible model-free policy search method that reuses data for sample efficiency
by leveraging problem structure. We bound the sample complexity of our approach
to guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on 3 domains taken from diverse fields.

Determination of hysteresis in finite-state random walks using Bayesian cross validation

Joshua C. Chang
Comments: Submitted
Subjects: Methodology (stat.ME); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM)

Consider the problem of modeling hysteresis for finite-state random walks
using higher-order Markov chains. This Letter introduces a Bayesian framework
to determine, from data, the number of prior states of recent history upon
which a trajectory is statistically dependent. The general recommendation is to
use leave-one-out cross validation, using an easily-computable formula that is
provided in closed form. Importantly, Bayes factors using flat model priors are
biased in favor of too-complex a model (more hysteresis) when a large amount of
data is present and the Akaike information criterion (AIC) is biased in favor
of too-sparse a model (less hysteresis) when few data are present.

Filtering Tweets for Social Unrest

Since the events of the Arab Spring, there has been increased interest in
using social media to anticipate social unrest. While efforts have been made
toward automated unrest prediction, we focus on filtering the vast volume of
tweets to identify tweets relevant to unrest, which can be provided to
downstream users for further analysis. We train a supervised classifier that is
able to label Arabic language tweets as relevant to unrest with high
reliability. We examine the relationship between training data size and
performance and investigate ways to optimize the model building process while
minimizing cost. We also explore how confidence thresholds can be set to
achieve desired levels of performance.

Bayesian Boolean Matrix Factorisation

Tammo Rukat, Chris C. Holmes, Michalis K. Titsias, Christopher Yau
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Methodology (stat.ME)

Boolean matrix factorisation (BooMF) infers interpretable decompositions of a
binary data matrix into a pair of low-rank, binary matrices: One containing
meaningful patterns, the other quantifying how the observations can be
expressed as a combination of these patterns. We introduce the OrMachine, a
probabilistic generative model for BooMF and derive a Metropolised Gibbs
sampler that facilitates very efficient parallel posterior inference. Our
method outperforms all currently existing approaches for Boolean Matrix
factorization and completion, as we show on simulated and real world data. This
is the first method to provide full posterior inference for BooMF which is
relevant in applications, e.g. for controlling false positive rates in
collaborative filtering, and crucially it improves the interpretability of the
inferred patterns. The proposed algorithm scales to large datasets as we
demonstrate by analysing single cell gene expression data in 1.3 million mouse
brain cells across 11,000 genes on commodity hardware.

Developing a comprehensive framework for multimodal feature extraction

Quinten McNamara, Alejandro de la Vega, Tal Yarkoni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Learning (cs.LG); Multimedia (cs.MM)

Feature extraction is a critical component of many applied data science
workflows. In recent years, rapid advances in artificial intelligence and
machine learning have led to an explosion of feature extraction tools and
services that allow data scientists to cheaply and effectively annotate their
data along a vast array of dimensions—ranging from detecting faces in images
to analyzing the sentiment expressed in coherent text. Unfortunately, the
proliferation of powerful feature extraction services has been mirrored by a
corresponding expansion in the number of distinct interfaces to feature
extraction services. In a world where nearly every new service has its own API,
documentation, and/or client library, data scientists who need to combine
diverse features obtained from multiple sources are often forced to write and
maintain ever more elaborate feature extraction pipelines. To address this
challenge, we introduce a new open-source framework for comprehensive
multimodal feature extraction. Pliers is an open-source Python package that
supports standardized annotation of diverse data types (video, images, audio,
and text), and is expressly with both ease-of-use and extensibility in mind.
Users can apply a wide range of pre-existing feature extraction tools to their
data in just a few lines of Python code, and can also easily add their own
custom extractors by writing modular classes. A graph-based API enables rapid
development of complex feature extraction pipelines that output results in a
single, standardized format. We describe the package’s architecture, detail its
major advantages over previous feature extraction toolboxes, and use a sample
application to a large functional MRI dataset to illustrate how pliers can
significantly reduce the time and effort required to construct sophisticated
feature extraction workflows while increasing code clarity and maintainability.

Information Theory

A data-driven basis for direct estimation of functionals of distributions

Alan Wisler, Visar Berisha, Andreas Spanias, Alfred O. Hero
Comments: Under review for IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

A number of fundamental quantities in statistical signal processing and
information theory can be expressed as integral functions of two probability
density functions. Such quantities are called density functionals as they map
density functions onto the real line. For example, information divergence
functions measure the dissimilarity between two probability density functions
and are particularly useful in a number of applications. Typically, estimating
these quantities requires complete knowledge of the underlying distribution
followed by multi-dimensional integration. Existing methods make parametric
assumptions about the data distribution or use non-parametric density
estimation followed by high-dimensional integration. In this paper, we propose
a new alternative. We introduce the concept of “data-driven” basis functions –
functions of distributions whose value we can estimate given only samples from
the underlying distributions without requiring distribution fitting or direct
integration. We derive a new data-driven complete basis that is similar to the
deterministic Bernstein polynomial basis and develop two methods for performing
basis expansions of functionals of two distributions. We also show that the new
basis set allows us to approximate functions of distributions as closely as
desired. Finally, we evaluate the methodology by developing data driven
estimators for the Kullback-Leibler divergences and the Hellinger distance and
by constructing tight data-driven bounds on the Bayes Error Rate.

Timely CSI Acquisition Exploiting Full Duplex

Jesus Arnau, Marios Kountouris
Comments: 6 pages, 4 figures, accepted at IEEE WCNC 2017
Subjects: Information Theory (cs.IT)

In this paper, we propose a method for acquiring accurate and timely channel
state information (CSI) by leveraging full-duplex transmission. Specifically,
we propose a mobile communication system in which base stations continuously
transmit a pilot sequence in the uplink frequency band, while terminals use
self-interference cancellation capabilities to obtain CSI at any time. Our
proposal outperforms its half-duplex counterpart by at least 50% in terms of
throughput while ensuring the same (or even lower) outage probability.
Remarkably, it also outperforms using full duplex for downlink data
transmission for low values of downlink bandwidth and received power.

3-Dimensional Optical Orthogonal Codes with Ideal Autocorrelation-Bounds and Optimal Constructions

Tim L. Alderson
Subjects: Information Theory (cs.IT)

Several new constructions of 3-dimensional optical orthogonal codes are
presented here. In each case the codes have ideal autocorrelation (mathbf{
lambda_a=0} ), and in all but one case a cross correlation of (
mathbf{lambda_c=1} ). All codes produced are optimal with respect to the
applicable Johnson bound either presented or developed here. Thus, on one hand
the codes are as large as possible, and on the other, the bound(s) are shown to
be tight. All codes are constructed by using a particular automorphism (a
Singer cycle) of ( mathbf{ PG(k,q)} ), the finite projective geometry of
dimension ( k ) over the field of order ( mathbf{q} ), or by using an affine
analogue in ( AG(k,q) ).

Several Classes of Permutation Trinomials over (mathbb F_{5^n}) From Niho Exponents

Gaofei Wu, Nian Li
Subjects: Information Theory (cs.IT)

The construction of permutation trinomials over finite fields attracts
people’s interest recently due to their simple form and some additional
properties. Motivated by some results on the construction of permutation
trinomials with Niho exponents, by constructing some new fractional polynomials
that permute the set of the ((q+1))-th roots of unity in (mathbb F_{q^2}), we
present several classes of permutation trinomials with Niho exponents over
(mathbb F_{q^2}), where (q=5^k).

Phaseless Sampling and Reconstruction of Real-Valued Signals in Shift-Invariant Spaces

Cheng Cheng, Junzheng Jiang, Qiyu Sun
Subjects: Information Theory (cs.IT); Numerical Analysis (math.NA)

Sampling in shift-invariant spaces is a realistic model for signals with
smooth spectrum. In this paper, we consider phaseless sampling and
reconstruction of real-valued signals in a shift-invariant space from their
magnitude measurements on the whole Euclidean space and from their phaseless
samples taken on a discrete set with finite sampling density. We introduce an
undirected graph to a signal and use connectivity of the graph to characterize
whether the signal can be determined, up to a sign, from its magnitude
measurements on the whole Euclidean space. Under the local complement property
assumption on a shift-invariant space, we find a discrete set with finite
sampling density such that signals in the shift-invariant space, that are
determined from their magnitude measurements on the whole Euclidean space, can
be reconstructed in a stable way from their phaseless samples taken on that
discrete set. In this paper, we also propose a reconstruction algorithm which
provides a suboptimal approximation to the original signal when its noisy
phaseless samples are available only. Finally, numerical simulations are
performed to demonstrate the robust reconstruction of box spline signals from
their noisy phaseless samples.

Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation

Yue M. Lu, Gen Li
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

We study a spectral initialization method that serves as a key ingredient in
recent work on using efficient iterative algorithms for estimating signals in
nonconvex settings. Unlike previous analysis in the literature, which is
restricted to the phase retrieval setting and which provides only performance
bounds, we consider arbitrary generalized linear sensing models and present a
precise asymptotic characterization of the performance of the spectral method
in the high-dimensional regime. Our analysis reveals a phase transition
phenomenon that depends on the sampling ratio. When the ratio is below a
minimum threshold, the estimates given by the spectral method are no better
than a random guess drawn uniformly from the hypersphere; above a maximum
threshold, however, the estimates become increasingly aligned with the target
signal. The computational complexity of the spectral method is also markedly
different in the two phases. Worked examples and numerical results are provided
to illustrate and verify the analytical predictions. In particular, simulations
show that our asymptotic formulas provide accurate predictions even at moderate
signal dimensions.

Finite Horizon Energy-Efficient Scheduling with Energy Harvesting Transmitters over Fading Channels

Baran Tan Bacinoglu, Elif Uysal-Biyikoglu, Can Emre Koksal
Comments: A version of this paper has been submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)

In this paper, energy-efficient transmission schemes achieving maximal
throughput over a finite time interval are studied in a problem setting
including energy harvests, data arrivals and channel variation. The goal is to
express the offline optimal policy in a way that facilitates a good online
solution. We express any throughput maximizing energy efficient offline
schedule (EE-TM-OFF) explicitly in terms of water levels. This allows per-slot
real-time evaluation of transmit power and rate decisions, using estimates of
the associated offline water levels. To compute the online power level, we
construct a stochastic dynamic program that incorporates the offline optimal
solution as a stochastic process. We introduce the “Immediate Fill” metric
which provides a lower bound on the efficiency of any online policy with
respect to the corresponding optimal offline solution. The online algorithms
obtained this way exhibit performance close to the offline optimal, not only in
the long run but also in short problem horizons, deeming them suitable for
practical implementations.

Compressive Channel Estimation and Multi-user Detection in C-RAN

Qi He, Tony Q.S. Quek, Zhi Chen, Shaoqian Li
Comments: 6 pages, 3 figures
Subjects: Information Theory (cs.IT)

This paper considers the channel estimation (CE) and multi-user detection
(MUD) problems in cloud radio access network (C-RAN). Assuming that active
users are sparse in the network, we solve CE and MUD problems with compressed
sensing (CS) technology to greatly reduce the long identification pilot
overhead. A mixed L{2,1}-regularization functional for extended sparse
group-sparsity recovery is proposed to exploit the inherently sparse property
existing both in user activities and remote radio heads (RRHs) that active
users are attached to. Empirical and theoretical guidelines are provided to
help choosing tuning parameters which have critical effect on the performance
of the penalty functional. To speed up the processing procedure, based on
alternating direction method of multipliers and variable splitting strategy, an
efficient algorithm is formulated which is guaranteed to be convergent.
Numerical results are provided to illustrate the effectiveness of the proposed
functional and efficient algorithm.

Ultra-Reliable Short-Packet Communications with Wireless Energy Transfer

Onel L. Alcaraz López, Hirley Alves, Richard Demo Souza, Evelio Martín García Fernández
Comments: In press – IEEE SPL
Subjects: Information Theory (cs.IT)

We analyze and optimize a wireless system with energy transfer in the
downlink and information transfer in the uplink, under quasi-static Nakagami-m
fading. We consider ultra-reliable communication scenarios representative of
the fifth-generation of wireless systems, with strict error and latency
requirements. The error probability and delay are investigated, and an
approximation for the former is given and validated through simulations. The
numerical results demonstrate that there are optimum numbers of channels uses
for both energy and information transfer for a given message length.

Information-Theoretic Perspectives on Brascamp-Lieb Inequality and Its Reverse

Jingbo Liu, Thomas A. Courtade, Paul Cuff, Sergio Verdu
Comments: 57 pages; 6 figures; presented in part in ISIT 2016; submitted
Subjects: Information Theory (cs.IT)

We introduce an inequality which may be viewed as a generalization of both
the Brascamp-Lieb inequality and its reverse (Barthe’s inequality), and prove
its information-theoretic (i.e. entropic) formulation. This result leads to a
unified approach to functional inequalities such as the variational formula of
R’enyi entropy, hypercontractivity and its reverse, strong data processing
inequalities, and transportation-cost inequalities, whose utility in the proofs
of various coding theorems has gained growing popularity recently. We show that
our information-theoretic setting is convenient for proving properties such as
data processing, tensorization, convexity (Riesz-Thorin interpolation) and
Gaussian optimality. In particular, we elaborate on a “doubling trick” used by
Lieb and Geng-Nair to prove several results on Gaussian optimality. Several
applications are discussed, including a generalization of the Brascamp-Lieb
inequality involving Gaussian random transformations, the determination of
Wyner’s common information of vector Gaussian sources, and the achievable rate
region of certain key generation problems in the case of vector Gaussian
sources.

Noise Models in the Nonlinear Spectral Domain for Optical Fibre Communications

Qun Zhang, Terence H. Chan
Comments: submitted to IEEE Trans. on IT
Subjects: Information Theory (cs.IT); Probability (math.PR)

Existing works on building a soliton transmission system only encode
information using the imaginary part of the eigenvalue, which fails to make
full use of the signal degree-of-freedoms. Motivated by this observation, we
make the first step of encoding information using (discrete) spectral
amplitudes by proposing analytical noise models for the spectral amplitudes of
(N)-solitons ((Ngeq 1)). To our best knowledge, this is the first work in
building an analytical noise model for spectral amplitudes, which leads to many
interesting information theoretic questions, such as channel capacity analysis,
and has a potential of increasing the transmission rate. The noise statistics
of the spectral amplitude of a soliton are also obtained without the Gaussian
approximation.

Multi-Agent Reinforcement Learning for Energy Harvesting Two-Hop Communications with Full Cooperation

Andrea Ortiz, Hussein Al-Shatri, Tobias Weber, Anja Klein
Subjects: Information Theory (cs.IT)

We focus on energy harvesting (EH) two-hop communications since they are the
essential building blocks of more complicated multi-hop networks. The scenario
consists of three nodes, where an EH transmitter wants to send data to a
receiver through an EH relay. The harvested energy is used exclusively for data
transmission and we address the problem of how to efficiently use it. As in
practical scenarios, we assume only causal knowledge at the EH nodes, i.e., in
each time interval, the transmitter and the relay know their own current and
past amounts of incoming energy, battery levels, data buffer levels and channel
coefficients for their own transmit channels. Our goal is to find transmission
policies which aim at maximizing the throughput considering that the EH nodes
fully cooperate with each other to exchange their causal knowledge during a
signaling phase. We model the problem as a Markov game and propose a
multi-agent reinforcement learning algorithm to find the transmission policies.
Furthermore, we show the trade-off between the achievable throughput and the
signaling required, and provide convergence guarantees for the proposed
algorithm. Results show that even when the signaling overhead is taken into
account, the proposed algorithm outperforms other approaches that do not
consider cooperation among the nodes.

Robust Phase Retrieval via ADMM with Outliers

Xue Jiang, H. C. So, X. Liu
Subjects: Information Theory (cs.IT); Information Retrieval (cs.IR)

An outlier-resistance phase retrieval algorithm based on alternating
direction method of multipliers (ADMM) is devised in this letter. Instead of
the widely used least squares criterion that is only optimal for Gaussian noise
environment, we adopt the least absolute deviation criterion to enhance the
robustness against outliers. Considering both intensity- and amplitude-based
observation models, the framework of ADMM is developed to solve the resulting
non-differentiable optimization problems. It is demonstrated that the core
subproblem of ADMM is the proximity operator of the L1-norm, which can be
computed efficiently by soft-thresholding in each iteration. Simulation results
are provided to validate the accuracy and efficiency of the proposed approach
compared to the existing schemes.

Throughput Optimal Beam Alignment in Millimeter Wave Networks

Muddassar Hussain, Nicolo Michelusi
Subjects: Information Theory (cs.IT)

Millimeter wave communications rely on narrow-beam transmissions to cope with
the strong signal attenuation at these frequencies, thus demanding precise beam
alignment between transmitter and receiver. The communication overhead incurred
to achieve beam alignment may become a severe impairment in mobile networks.
This paper addresses the problem of optimizing beam alignment acquisition, with
the goal of maximizing throughput. Specifically, the algorithm jointly
determines the portion of time devoted to beam alignment acquisition, as well
as, within this portion of time, the optimal beam search parameters, using the
framework of Markov decision processes. It is proved that a bisection search
algorithm is optimal, and that it outperforms exhaustive and iterative search
algorithms proposed in the literature. The duration of the beam alignment phase
is optimized so as to maximize the overall throughput. The numerical results
show that the throughput, optimized with respect to the duration of the beam
alignment phase, achievable under the exhaustive algorithm is 88.3% lower than
that achievable under the bisection algorithm. Similarly, the throughput
achievable by the iterative search algorithm for a division factor of 4 and 8
is, respectively, 12.8% and 36.4% lower than that achievable by the bisection
algorithm.

An Inequality for the Correlation of Two Functions Operating on Symmetric Bivariate Normal Variables

Ran Hadad, Uri Erez, Yaming Yu
Subjects: Information Theory (cs.IT)

An inequality is derived for the correlation of two univariate functions
operating on symmetric bivariate normal random variables. The inequality is a
simple consequence of the Cauchy-Schwarz inequality.

Energy-Efficient Wireless Content Delivery with Proactive Caching

Samuel O. Somuyiwa, András György, Deniz Gündüz
Comments: 6 pages, 2 figures. Submitted, under review
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Optimization and Control (math.OC)

We propose an intelligent proactive content caching scheme to reduce the
energy consumption in wireless downlink. We consider an online social network
(OSN) setting where new contents are generated over time, and remain
extit{relevant} to the user for a random lifetime. Contents are downloaded to
the user equipment (UE) through a time-varying wireless channel at an energy
cost that depends on the channel state and the number of contents downloaded.
The user accesses the OSN at random time instants, and consumes all the
relevant contents. To reduce the energy consumption, we propose
extit{proactive caching} of contents under favorable channel conditions to a
finite capacity cache memory. Assuming that the channel quality (or
equivalently, the cost of downloading data) is memoryless over time slots, we
show that the optimal caching policy, which may replace contents in the cache
with shorter remaining lifetime with contents at the server that remain
relevant longer, has certain threshold structure with respect to the channel
quality. Since the optimal policy is computationally demanding in practice, we
introduce a simplified caching scheme and optimize its parameters using policy
search. We also present two lower bounds on the energy consumption. We
demonstrate through numerical simulations that the proposed caching scheme
significantly reduces the energy consumption compared to traditional reactive
caching tools, and achieves close-to-optimal performance for a wide variety of
system parameters.

Exact tensor completion with sum-of-squares

Aaron Potechin, David Steurer
Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

We obtain the first polynomial-time algorithm for exact tensor completion
that improves over the bound implied by reduction to matrix completion. The
algorithm recovers an unknown 3-tensor with (r) incoherent, orthogonal
components in (mathbb R^n) from (rcdot ilde O(n^{1.5})) randomly observed
entries of the tensor. This bound improves over the previous best one of
(rcdot ilde O(n^{2})) by reduction to exact matrix completion. Our bound
also matches the best known results for the easier problem of approximate
tensor completion (Barak & Moitra, 2015).

Our algorithm and analysis extends seminal results for exact matrix
completion (Candes & Recht, 2009) to the tensor setting via the sum-of-squares
method. The main technical challenge is to show that a small number of randomly
chosen monomials are enough to construct a degree-3 polynomial with a precisely
planted orthogonal global optima over the sphere and that this fact can be
certified within the sum-of-squares proof system.

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Mahdi Soltanolkotabi
Subjects: Learning (cs.LG); Information Theory (cs.IT); Functional Analysis (math.FA); Optimization and Control (math.OC); Machine Learning (stat.ML)

This paper concerns the problem of recovering an unknown but structured
signal (x in R^n) from (m) quadratic measurements of the form
(y_r=|<a_r,x>|^2) for (r=1,2,…,m). We focus on the under-determined setting
where the number of measurements is significantly smaller than the dimension of
the signal ((m<<n)). We formulate the recovery problem as a nonconvex
optimization problem where prior structural information about the signal is
enforced through constrains on the optimization variables. We prove that
projected gradient descent, when initialized in a neighborhood of the desired
signal, converges to the unknown signal at a linear rate. These results hold
for any constraint set (convex or nonconvex) providing convergence guarantees
to the global optimum even when the objective function and constraint set is
nonconvex. Furthermore, these results hold with a number of measurements that
is only a constant factor away from the minimal number of measurements required
to uniquely identify the unknown signal. Our results provide the first provably
tractable algorithm for this data-poor regime, breaking local sample complexity
barriers that have emerged in recent literature. In a companion paper we
demonstrate favorable properties for the optimization problem that may enable
similar results to continue to hold more globally (over the entire ambient
space). Collectively these two papers utilize and develop powerful tools for
uniform convergence of empirical processes that may have broader implications
for rigorous understanding of constrained nonconvex optimization heuristics.
The mathematical results in this paper also pave the way for a new generation
of data-driven phase-less imaging systems that can utilize prior information to
significantly reduce acquisition time and enhance image reconstruction,
enabling nano-scale imaging at unprecedented speeds and resolutions.