IT博客汇 | arXiv Paper Daily: Tue, 18 Apr 2017

arXiv Paper Daily: Tue, 18 Apr 2017

我爱机器学习(52ml.net)发表于 2017-04-18 00:00:00

Neural and Evolutionary Computing

Interval Arithmetic and Interval-Aware Operators for Genetic Programming

Grant Dick
Comments: Extended version of: Grant Dick. 2017. Revisiting Interval Arithmetic for Regression Problems in Genetic Programming. In Proceedings of the 2017 Annual Conference on Genetic and Evolutionary Computation. ACM. To appear 8 pages, 10 figures
Subjects: Neural and Evolutionary Computing (cs.NE)

Symbolic regression via genetic programming is a flexible approach to machine
learning that does not require up-front specification of model structure.
However, traditional approaches to symbolic regression require the use of
protected operators, which can lead to perverse model characteristics and poor
generalisation. In this paper, we revisit interval arithmetic as one possible
solution to allow genetic programming to perform regression using unprotected
operators. Using standard benchmarks, we show that using interval arithmetic
within model evaluation does not prevent invalid solutions from entering the
population, meaning that search performance remains compromised. We extend the
basic interval arithmetic concept with `safe’ search operators that integrate
interval information into their process, thereby greatly reducing the number of
invalid solutions produced during search. The resulting algorithms are able to
more effectively identify good models that generalise well to unseen data. We
conclude with an analysis of the sensitivity of interval arithmetic-based
operators with respect to the accuracy of the supplied input feature intervals.

A Sport Tournament Scheduling by Genetic Algorithm with Swapping Method

Tinnaluk Rutjanisarakul, Thiradet Jiarasuksakun
Subjects: Neural and Evolutionary Computing (cs.NE)

A sport tournament problem is considered the Traveling Tournament Problem
(TTP). One interesting type is the mirrored Traveling Tournament Problem
(mTTP). The objective of the problem is to minimize either the total number of
traveling or the total distances of traveling or both. This research aims to
find an optimized solution of the mirrored Traveling Tournament Problem with
minimum total number of traveling. The solutions consisting of traveling and
scheduling tables are solved by using genetic algorithm (GA) with swapping
method. The number of traveling of all teams from obtained solutions are close
to the lower bound theory of number of traveling. Moreover, this algorithm
generates better solutions than known results for most cases.

Self-Adaptive Differential Evolution for Bio-Inspired Neuromorphic Collision Avoidance

Llewyn Salt, David Howard
Comments: 8 page conference paper. Submitted to GECCO2017. Accepted as 2 page poster
Subjects: Neural and Evolutionary Computing (cs.NE)

We present the optimisation of a neuromorphic adaptation of a spiking neural
network model of the locust Lobula Giant Movement Detector (LGMD), which
detects looming objects and can be used to facilitate obstacle avoidance in
robotic applications. Our model is constrained by the parameters of a mixed
signal analogue-digital neuromorphic device and is driven by the output of a
neuromorphic vision sensor DVS. Due to the number of user-defined parameters
and the difficulty to find values that perform well we investigate the use of
Differential Evolution and self-adaptive DE (SADE) to find optimal values. We
demonstrate that these optimisation algorithms are suitable candidates to find
suitable parameters for an obstacle avoidance system on an unmanned aerial
vehicle (UAV).

A Hybrid ACO Algorithm for the Next Release Problem

He Jiang, Jingyuan Zhang, Jifeng Xuan, Zhilei Ren, Yan Hu
Comments: 6 pages, 2 figures, Proceedings of 2nd International Conference on Software Engineering and Data Mining (SEDM 2010), 2010
Subjects: Neural and Evolutionary Computing (cs.NE)

In this paper, we propose a Hybrid Ant Colony Optimization algorithm (HACO)
for Next Release Problem (NRP). NRP, a NP-hard problem in requirement
engineering, is to balance customer requests, resource constraints, and
requirement dependencies by requirement selection. Inspired by the successes of
Ant Colony Optimization algorithms (ACO) for solving NP-hard problems, we
design our HACO to approximately solve NRP. Similar to traditional ACO
algorithms, multiple artificial ants are employed to construct new solutions.
During the solution construction phase, both pheromone trails and neighborhood
information will be taken to determine the choices of every ant. In addition, a
local search (first found hill climbing) is incorporated into HACO to improve
the solution quality. Extensively wide experiments on typical NRP test
instances show that HACO outperforms the existing algorithms (GRASP and
simulated annealing) in terms of both solution uality and running time.

Adversarial and Clean Data Are Not Twins

Zhitao Gong, Wenlu Wang, Wei-Shinn Ku
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Adversarial attack has cast a shadow on the massive success of deep neural
networks. Despite being almost visually identical to the clean data, the
adversarial images can fool deep neural networks into wrong predictions with
very high confidence. In this paper, however, we show that we can build a
simple binary classifier separating the adversarial apart from the clean data
with accuracy over 99%. We also empirically show that the binary classifier is
robust to a second-round adversarial attack. In other words, it is difficult to
disguise adversarial samples to bypass the binary classifier. Further more, we
empirically investigate the generalization limitation which lingers on all
current defensive methods, including the binary classifier approach. And we
hypothesize that this is the result of intrinsic property of adversarial
crafting algorithms.

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, et al. (19 additional authors not shown)
Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017
Subjects: Hardware Architecture (cs.AR); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Many architects believe that major improvements in cost-energy-performance
must now come from domain-specific hardware. This paper evaluates a custom
ASIC—called a Tensor Processing Unit (TPU)—deployed in datacenters since
2015 that accelerates the inference phase of neural networks (NN). The heart of
the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak
throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed
on-chip memory. The TPU’s deterministic execution model is a better match to
the 99th-percentile response-time requirement of our NN applications than are
the time-varying optimizations of CPUs and GPUs (caches, out-of-order
execution, multithreading, multiprocessing, prefetching, …) that help average
throughput more than guaranteed latency. The lack of such features helps
explain why, despite having myriad MACs and a big memory, the TPU is relatively
small and low power. We compare the TPU to a server-class Intel Haswell CPU and
an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Our workload, written in the high-level TensorFlow framework, uses production
NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’
NN inference demand. Despite low utilization for some applications, the TPU is
on average about 15X – 30X faster than its contemporary GPU or CPU, with
TOPS/Watt about 30X – 80X higher. Moreover, using the GPU’s GDDR5 memory in the
TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and
200X the CPU.

A fast ILP-based Heuristic for the robust design of Body Wireless Sensor Networks

Fabio D'Andreagiovanni, Antonella Nardin, Enrico Natalizio
Comments: This is the authors’ final version of the paper published in G. Squillero and K. Sim (Eds.): EvoApplications 2017, Part I, LNCS 10199, pp. 1-17, 2017. DOI: 10.1007/978-3-319-55849-3\_16. The final publication is available at Springer via this http URL
Journal-ref: EvoApplications 2017, Springer LNCS 10199 (2017) 1-17
Subjects: Optimization and Control (math.OC); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)

We consider the problem of optimally designing a body wireless sensor
network, while taking into account the uncertainty of data generation of
biosensors. Since the related min-max robustness Integer Linear Programming
(ILP) problem can be difficult to solve even for state-of-the-art commercial
optimization solvers, we propose an original heuristic for its solution. The
heuristic combines deterministic and probabilistic variable fixing strategies,
guided by the information coming from strengthened linear relaxations of the
ILP robust model, and includes a very large neighborhood search for reparation
and improvement of generated solutions, formulated as an ILP problem solved
exactly. Computational tests on realistic instances show that our heuristic
finds solutions of much higher quality than a state-of-the-art solver and than
an effective benchmark heuristic.

Computer Vision and Pattern Recognition

End-to-end 3D face reconstruction with deep neural networks

Pengfei Dou, Shishir K. Shah, Ioannis A. Kakadiaris
Comments: Accepted to CVPR17
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Monocular 3D facial shape reconstruction from a single 2D facial image has
been an active research area due to its wide applications. Inspired by the
success of deep neural networks (DNN), we propose a DNN-based approach for
End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different
from recent works that reconstruct and refine the 3D face in an iterative
manner using both an RGB image and an initial 3D facial shape rendering, our
DNN model is end-to-end, and thus the complicated 3D rendering process can be
avoided. Moreover, we integrate in the DNN architecture two components, namely
a multi-task loss function and a fusion convolutional neural network (CNN) to
improve facial expression reconstruction. With the multi-task loss function, 3D
face reconstruction is divided into neutral 3D facial shape reconstruction and
expressive 3D facial shape reconstruction. The neutral 3D facial shape is
class-specific. Therefore, higher layer features are useful. In comparison, the
expressive 3D facial shape favors lower or intermediate layer features. With
the fusion-CNN, features from different intermediate layers are fused and
transformed for predicting the 3D expressive facial shape. Through extensive
experiments, we demonstrate the superiority of our end-to-end framework in
improving the accuracy of 3D face reconstruction.

AMTnet: Action-Micro-Tube regression by end-to-end trainable deep architecture

Suman Saha, Gurkirt Singh, Fabio Cuzzolin
Comments: 14 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Dominant approaches to action detection can only provide sub-optimal
solutions to the problem, as they rely on seeking frame-level detections, to
later compose them into action tubes in a post-processing step. With this paper
we radically depart from current practice, and take a first step towards the
design and implementation of a deep network architecture able to classify and
regress whole video subsets, so providing a truly optimal solution of the
action detection problem. In this work, in particular, we propose a novel deep
net framework able to regress and classify 3D region proposals spanning two
consecutive video frames, whose core is an evolution of classical region
proposal networks (RPNs). As such, our 3D-RPN net is able to effectively encode
the temporal aspect of actions by purely exploiting appearance, as opposed to
methods which heavily rely on expensive flow maps computed in a parallel
stream. The proposed model is end-to-end trainable and can be jointly optimised
for action localisation and classification using a single step of optimisation.
At test time the network predicts ‘micro-tubes’ spanning two frames, which are
linked up into complete action tubes via a new algorithm which exploits the
temporal encoding learned by the network and cuts computation time by 50%.
Promising results on the J-HMDB-21 and UCF-101 action detection datasets show
that our model does indeed outperform the state-of-the-art when relying purely
on appearance.

Multi-View Image Generation from a Single-View

Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Jiashi Feng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

This paper addresses a challenging problem — how to generate multi-view
cloth images from only a single view input. To generate realistic-looking
images with different views from the input, we propose a new image generation
model termed VariGANs that combines the strengths of the variational inference
and the Generative Adversarial Networks (GANs). Our proposed VariGANs model
generates the target image in a coarse-to-fine manner instead of a single pass
which suffers from severe artifacts. It first performs variational inference to
model global appearance of the object (e.g., shape and color) and produce a
coarse image with a different view. Conditioned on the generated low resolution
images, it then proceeds to perform adversarial learning to fill details and
generate images of consistent details with the input. Extensive experiments
conducted on two clothing datasets, MVC and DeepFashion, have demonstrated that
images of a novel view generated by our model are more plausible than those
generated by existing approaches, in terms of more consistent global appearance
as well as richer and sharper details.

An improved ellipsoid fitting algorithm using iterative random projections

Amit Reza, Anand S. Sengupta
Comments: Submitted to Applied Mathematics and Computation (Elsevier)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A generalized method for ellipsoid fitting against a minimum set of data
points is described. The proposed method is numerically stable and applies to a
wide range of ellipsoidal shapes, including highly elongated and arbitrarily
oriented ellipsoids. This new method also provides for the retrieval of
rotational angle and length of semi-axes of the fitted ellipsoids accurately.
We demonstrate the efficacy of this algorithm on simulated data sets and also
indicate its potential use in gravitational wave data analysis.

Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking

Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
Comments: 16 pages. 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Traditional generative adversarial networks (GAN) and many of its variants
are trained by minimizing the KL or JS-divergence loss that measures how close
the generated data distribution is from the true data distribution. A recent
advance called the WGAN based on Wasserstein distance can improve on the KL and
JS-divergence based GANs, and alleviate the gradient vanishing, instability,
and mode collapse issues that are common in the GAN training. In this work, we
aim at improving on the WGAN by first generalizing its discriminator loss to a
margin-based one, which leads to a better discriminator, and in turn a better
generator, and then carrying out a progressive training paradigm involving
multiple GANs to contribute to the maximum margin ranking loss so that the GAN
at later stages will improve upon early stages. We call this method Gang of
GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce
the gap between the true data distribution and the generated data distribution
by at least half in an optimally trained WGAN. We have also proposed a new way
of measuring GAN quality which is based on image completion tasks. We have
evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10,
and 50K-SSFF, and have seen both visual and quantitative improvement over
baseline WGAN.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a class of efficient models called MobileNets for mobile and
embedded vision applications. MobileNets are based on a streamlined
architecture that uses depth-wise separable convolutions to build light weight
deep neural networks. We introduce two simple global hyper-parameters that
efficiently trade off between latency and accuracy. These hyper-parameters
allow the model builder to choose the right sized model for their application
based on the constraints of the problem. We present extensive experiments on
resource and accuracy tradeoffs and show strong performance compared to other
popular models on ImageNet classification. We then demonstrate the
effectiveness of MobileNets across a wide range of applications and use cases
including object detection, finegrain classification, face attributes and large
scale geo-localization.

Replicator Equation: Applications Revisited

Tinsae G.Dulecha
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The replicator equation is a simple model of evolution that leads to stable
form of Nash Equilibrium, Evolutionary Stable Strategy (ESS). It has been
studied in connection with Evolutionary Game Theory and was originally
developed for symmetric games. Beyond its first emphasis in biological use,
evolutionary game theory has been expanded well beyond in social studies for
behavioral analysis, in machine learning, computer vision and others. Its
several applications in the fields of machine learning and computer vision has
drawn my attention which is the reason to write this extended abstract

Harvesting Multiple Views for Marker-less 3D Human Pose Annotations

Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis
Comments: CVPR 2017 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances with Convolutional Networks (ConvNets) have shifted the
bottleneck for many computer vision tasks to annotated data collection. In this
paper, we present a geometry-driven approach to automatically collect
annotations for human pose prediction tasks. Starting from a generic ConvNet
for 2D human pose, and assuming a multi-view setup, we describe an automatic
way to collect accurate 3D human pose annotations. We capitalize on constraints
offered by the 3D geometry of the camera setup and the 3D structure of the
human body to probabilistically combine per view 2D ConvNet predictions into a
globally optimal 3D pose. This 3D pose is used as the basis for harvesting
annotations. The benefit of the annotations produced automatically with our
approach is demonstrated in two challenging settings: (i) fine-tuning a generic
ConvNet-based 2D pose predictor to capture the discriminative aspects of a
subject’s appearance (i.e.,”personalization”), and (ii) training a ConvNet from
scratch for single view 3D human pose prediction without leveraging 3D pose
groundtruth. The proposed multi-view pose estimator achieves state-of-the-art
results on standard benchmarks, demonstrating the effectiveness of our method
in exploiting the available multi-view information.

AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

David Novotny, Diane Larlus, Andrea Vedaldi
Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite significant progress of deep learning in recent years,
state-of-the-art semantic matching methods still rely on legacy features such
as SIFT or HoG. We argue that the strong invariance properties that are key to
the success of recent deep architectures on the classification task make them
unfit for dense correspondence tasks, unless a large amount of supervision is
used. In this work, we propose a deep network, termed AnchorNet, that produces
image representations that are well-suited for semantic matching. It relies on
a set of filters whose response is geometrically consistent across different
object instances, even in the presence of strong intra-class, scale, or
viewpoint variations. Trained only with weak image-level labels, the final
representation successfully captures information about the object structure and
improves results of state-of-the-art semantic matching methods such as the
deformable spatial pyramid or the proposal flow methods. We show positive
results on the cross-instance matching task where different instances of the
same object category are matched as well as on a new cross-category semantic
matching task aligning pairs of instances each from a different object class.

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

Amir Mazaheri, Dong Zhang, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Given a video and a description sentence with one missing word (we call it
the “source sentence”), Video-Fill-In-the-Blank (VFIB) problem is to find the
missing word automatically. The contextual information of the sentence, as well
as visual cues from the video, are important to infer the missing word
accurately. Since the source sentence is broken into two fragments: the
sentence’s left fragment (before the blank) and the sentence’s right fragment
(after the blank), traditional Recurrent Neural Networks cannot encode this
structure accurately because of many possible variations of the missing word in
terms of the location and type of the word in the source sentence. For example,
a missing word can be the first word or be in the middle of the sentence and it
can be a verb or an adjective. In this paper, we propose a framework to tackle
the textual encoding: Two separate LSTMs (the LR and RL LSTMs) are employed to
encode the left and right sentence fragments and a novel structure is
introduced to combine each fragment with an “external memory” corresponding the
opposite fragments. For the visual encoding, end-to-end spatial and temporal
attention models are employed to select discriminative visual representations
to find the missing word. In the experiments, we demonstrate the superior
performance of the proposed method on challenging VFIB problem. Furthermore, we
introduce an extended and more generalized version of VFIB, which is not
limited to a single blank. Our experiments indicate the generalization
capability of our method in dealing with such more realistic scenarios.

Temporal Action Localization by Structured Maximal Sums

Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the problem of temporal action localization in videos. We pose
action localization as a structured prediction over arbitrary-length temporal
windows, where each window is scored as the sum of frame-wise classification
scores. Additionally, our model classifies the start, middle, and end of each
action as separate components, allowing our system to explicitly model each
action’s temporal evolution and take advantage of informative temporal
dependencies present in this structure. In this framework, we localize actions
by searching for the structured maximal sum, a problem for which we develop a
novel, provably-efficient algorithmic solution. The frame-wise classification
scores are computed using features from a deep Convolutional Neural Network
(CNN), which are trained end-to-end to directly optimize for a novel structured
objective. We evaluate our system on the THUMOS 14 action detection benchmark
and achieve competitive performance.

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification with Convolutional Neural Networks

Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text in natural images contains plenty of semantics that are often highly
relevant to objects or scene. In this paper, we are concerned with the problem
on fully exploiting scene text for visual understanding. The basic idea is
combining word representations and deep visual features into a globally
trainable deep convolutional neural network. First, the recognized words are
obtained by a scene text reading system. Then, we combine the word embedding of
the recognized words and the deep visual features into a single representation,
which is optimized by a convolutional neural network for fine-grained image
classification. In our framework, the attention mechanism is adopted to reveal
the relevance between each recognized word and the given image, which
reasonably enhances the recognition performance. We have performed experiments
on two datasets: Con-Text dataset and Drink Bottle dataset, that are proposed
for fine-grained classification of business places and drink bottles
respectively. The experimental results consistently demonstrate that the
proposed method combining textual and visual cues significantly outperforms
classification with only visual representations. Moreover, we have shown that
the learned representation improves the retrieval performance on the drink
bottle images by a large margin, which is potentially useful in product search.

Deep Learning for Photoacoustic Tomography from Sparse Data

Stephan Antholzer, Markus Haltmeier, Johannes Schwab
Comments: 13 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The development of fast and accurate image reconstruction algorithms is a
central aspect of computed tomography. In this paper we investigate this issue
for the sparse data problem of photoacoustic tomography (PAT). We develop
direct and highly efficient reconstruction algorithms based on deep-learning.
In this approach image reconstruction is performed with a deep convolutional
neural network (CNN), whose weights are adjusted prior to the actual image
reconstruction based on a set of training data. Our results demonstrate that
the proposed deep learning approach reconstructs images with a quality
komparable to state of the art iterative approaches from sparse data. At the
same time, the numerically complexity of our approach is much smaller and the
image reconstruction is performed in a fraction of the time required by
iterative methods.

Interpretable 3D Human Action Analysis with Temporal Convolutional Networks

Tae Soo Kim, Austin Reiter
Comments: 8 pages, 5 figures, BNMW CVPR 2017 Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The discriminative power of modern deep learning models for 3D human action
recognition is growing ever so potent. In conjunction with the recent
resurgence of 3D human action representation with 3D skeletons, the quality and
the pace of recent progress have been significant. However, the inner workings
of state-of-the-art learning based methods in 3D human action recognition still
remain mostly black-box. In this work, we propose to use a new class of models
known as Temporal Convolutional Neural Networks (TCN) for 3D human action
recognition. Compared to popular LSTM-based Recurrent Neural Network models,
given interpretable input such as 3D skeletons, TCN provides us a way to
explicitly learn readily interpretable spatio-temporal representations for 3D
human action recognition. We provide our strategy in re-designing the TCN with
interpretability in mind and how such characteristics of the model is leveraged
to construct a powerful 3D activity recognition method. Through this work, we
wish to take a step towards a spatio-temporal model that is easier to
understand, explain and interpret. The resulting model, Res-TCN, achieves
state-of-the-art results on the largest 3D human action recognition dataset,
NTU-RGBD.

Recovery of damped exponentials using structured low rank matrix completion

Arvind Balachandrasekaran, Vincent Magnotta, Mathews Jacob
Comments: 14 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a structured low rank matrix completion algorithm to recover a
series of images from their under-sampled measurements, where the signal along
the parameter dimension at every pixel is described by a linear combination of
exponentials. We exploit the exponential behavior of the signal at every pixel,
along with the spatial smoothness of the exponential parameters to derive an
annihilation relation in the Fourier domain. This relation translates to a
low-rank property on a structured matrix constructed from the Fourier samples.
We enforce the low rank property of the structured matrix as a regularization
prior to recover the images. Since the direct use of current low rank matrix
recovery schemes to this problem is associated with high computational
complexity and memory demand, we adopt an iterative re-weighted least squares
(IRLS) algorithm, which facilitates the exploitation of the convolutional
structure of the matrix. Novel approximations involving two dimensional Fast
Fourier Transforms (FFT) are introduced to drastically reduce the memory demand
and computational complexity, which facilitates the extension of structured low
rank methods to large scale three dimensional problems. We demonstrate our
algorithm in the MR parameter mapping setting and show improvement over the
state-of-the-art methods.

Improving Object Detection With One Line of Code

Navaneeth Bodla, Bharat Singh, Rama Chellappa, Larry S. Davis
Comments: ICCV 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Non-maximum suppression is an integral part of the object detection pipeline.
First, it sorts all detection boxes on the basis of their scores. The detection
box M with the maximum score is selected and all other detection boxes with a
significant overlap (using a pre-defined threshold) with M are suppressed. This
process is recursively applied on the remaining boxes. As per the design of the
algorithm, if an object lies within the predefined overlap threshold, it leads
to a miss. To this end, we propose Soft-NMS, an algorithm which decays the
detection scores of all other objects as a continuous function of their overlap
with M. Hence, no object is eliminated in this process. Soft-NMS obtains
consistent improvements for the coco-style mAP metric on standard datasets like
PASCAL VOC 2007 (1.7\% for both R-FCN and Faster-RCNN) and MS-COCO (1.3\% for
R-FCN and 1.1\% for Faster-RCNN) by just changing the NMS algorithm without any
additional hyper-parameters. Further, the computational complexity of Soft-NMS
is the same as traditional NMS and hence it can be efficiently implemented.
Since Soft-NMS does not require any extra training and is simple to implement,
it can be easily integrated into any object detection pipeline. Code for
Soft-NMS is publicly available on GitHub url{this http URL}.

CT Image Reconstruction in a Low Dimensional Manifold

Wenxiang Cong, Ge Wang, Qingsong Yang, Jiang Hsieh, Jia Li, Rongjie Lai
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)

Regularization methods are commonly used in X-ray CT image reconstruction.
Different regularization methods reflect the characterization of different
prior knowledge of images. In a recent work, a new regularization method called
a low-dimensional manifold model (LDMM) is investigated to characterize the
low-dimensional patch manifold structure of natural images, where the manifold
dimensionality characterizes structural information of an image. In this paper,
we propose a CT image reconstruction method based on the prior knowledge of the
low-dimensional manifold of CT image. Using the clinical raw projection data
from GE clinic, we conduct comparisons for the CT image reconstruction among
the proposed method, the simultaneous algebraic reconstruction technique (SART)
with the total variation (TV) regularization, and the filtered back projection
(FBP) method. Results show that the proposed method can successfully recover
structural details of an imaging object, and achieve higher spatial and
contrast resolution of the reconstructed image than counterparts of FBP and
SART with TV.

Trigger for the SoLid Reactor Antineutrino Experiment

Lukas On Arnold, for the SoLid collaboration
Comments: Poster presented at NuPhys2016 (London, 12-14 December 2016). 8 pages, LaTeX, 6 png figures, 1 pdf figure
Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); High Energy Physics – Experiment (hep-ex)

SoLid, located at SCK-CEN in Mol, Belgium, is a reactor antineutrino
experiment at a very short baseline of 5.5 — 10m aiming at the search for
sterile neutrinos and for high precision measurement of the neutrino energy
spectrum of Uranium-235. It uses a novel approach using Lithium-6 sheets and
PVT cubes as scintillators for tagging the Inverse Beta-Decay products (neutron
and positron). Being located overground and close to the BR2 research reactor,
the experiment faces a large amount of backgrounds. Efficient real-time
background and noise rejection is essential in order to increase the
signal-background ratio for precise oscillation measurement and decrease data
production to a rate which can be handled by the online software. Therefore, a
reliable distinction between the neutrons and background signals is crucial.
This can be performed online with a dedicated firmware trigger. A peak counting
algorithm and an algorithm measuring time over threshold have been identified
as performing well both in terms of efficiency and fake rate, and have been
implemented onto FPGA.

Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, Christian Igel
Journal-ref: IEEE Intelligent Systems, vol. 32, no. , pp. 16-22, Mar.-Apr. 2017
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes of entire surveys a decade ago can now be
acquired in a single night and real-time analysis is often desired. Thus,
modern astronomy requires big data know-how, in particular it demands highly
efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.

A learning-based approach for automatic image and video colorization

Raj Kumar Gupta, Alex Yong-Sang Chia, Deepu Rajan, Huang Zhiyong
Comments: Computer Graphics International – 2012
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a color transfer algorithm to colorize a broad
range of gray images without any user intervention. The algorithm uses a
machine learning-based approach to automatically colorize grayscale images. The
algorithm uses the superpixel representation of the reference color images to
learn the relationship between different image features and their corresponding
color values. We use this learned information to predict the color value of
each grayscale image superpixel. As compared to processing individual image
pixels, our use of superpixels helps us to achieve a much higher degree of
spatial consistency as well as speeds up the colorization process. The
predicted color values of the gray-scale image superpixels are used to provide
a ‘micro-scribble’ at the centroid of the superpixels. These color scribbles
are refined by using a voting based approach. To generate the final
colorization result, we use an optimization-based approach to smoothly spread
the color scribble across all pixels within a superpixel. Experimental results
on a broad range of images and the comparison with existing state-of-the-art
colorization methods demonstrate the greater effectiveness of the proposed
algorithm.

ShapeWorld – A new test methodology for multimodal language understanding

Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel framework for evaluating multimodal deep learning models
with respect to their language understanding and generalization abilities. In
this approach, artificial data is automatically generated according to the
experimenter’s specifications. The content of the data, both during training
and evaluation, can be controlled in detail, which enables tasks to be created
that require true generalization abilities, in particular the combination of
previously introduced concepts in novel ways. We demonstrate the potential of
our methodology by evaluating various visual question answering models on four
different tasks, and show how our framework gives us detailed insights into
their capabilities and limitations. By open-sourcing our framework, we hope to
stimulate progress in the field of multimodal language understanding.

Artificial Intelligence

Morpheo: Traceable Machine Learning on Hidden data

Mathieu Galtier, Camille Marini
Comments: whitepaper, 9 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

Morpheo is a transparent and secure machine learning platform collecting and
analysing large datasets. It aims at building state-of-the art prediction
models in various fields where data are sensitive. Indeed, it offers strong
privacy of data and algorithm, by preventing anyone to read the data, apart
from the owner and the chosen algorithms. Computations in Morpheo are
orchestrated by a blockchain infrastructure, thus offering total traceability
of operations. Morpheo aims at building an attractive economic ecosystem around
data prediction by channelling crypto-money from prediction requests to useful
data and algorithms providers. Morpheo is designed to handle multiple data
sources in a transfer learning approach in order to mutualize knowledge
acquired from large datasets for applications with smaller but similar
datasets.

Probabilistic programs for inferring the goals of autonomous agents

Marco F. Cusumano-Towner, Alexey Radul, David Wingate, Vikash K. Mansinghka
Subjects: Artificial Intelligence (cs.AI)

Intelligent systems sometimes need to infer the probable goals of people,
cars, and robots, based on partial observations of their motion. This paper
introduces a class of probabilistic programs for formulating and solving these
problems. The formulation uses randomized path planning algorithms as the basis
for probabilistic models of the process by which autonomous agents plan to
achieve their goals. Because these path planning algorithms do not have
tractable likelihood functions, new inference algorithms are needed. This paper
proposes two Monte Carlo techniques for these “likelihood-free” models, one of
which can use likelihood estimates from neural networks to accelerate
inference. The paper demonstrates efficacy on three simple examples, each using
under 50 lines of probabilistic code.

Pseudorehearsal in actor-critic agents

Marochko Vladimir, Leonard Johard, Manuel Mazzara
Subjects: Artificial Intelligence (cs.AI)

Catastrophic forgetting has a serious impact in reinforcement learning, as
the data distribution is generally sparse and non-stationary over time. The
purpose of this study is to investigate whether pseudorehearsal can increase
performance of an actor-critic agent with neural-network based policy selection
and function approximation in a pole balancing task and compare different
pseudorehearsal approaches. We expect that pseudorehearsal assists learning
even in such very simple problems, given proper initialization of the rehearsal
parameters.

Approximating the Backbone in the Weighted Maximum Satisfiability Problem

He Jiang, Jifeng Xuan, Yan Hu
Comments: 14 pages, 1 figure, Proceedings of Advances in Knowledge Discovery and Data Mining 2008 (PAKDD 2008), 2008
Subjects: Artificial Intelligence (cs.AI)

The weighted Maximum Satisfiability problem (weighted MAX-SAT) is a NP-hard
problem with numerous applications arising in artificial intelligence. As an
efficient tool for heuristic design, the backbone has been applied to
heuristics design for many NP-hard problems. In this paper, we investigated the
computational complexity for retrieving the backbone in weighted MAX-SAT and
developed a new algorithm for solving this problem. We showed that it is
intractable to retrieve the full backbone under the assumption that . Moreover,
it is intractable to retrieve a fixed fraction of the backbone as well. And
then we presented a backbone guided local search (BGLS) with Walksat operator
for weighted MAX-SAT. BGLS consists of two phases: the first phase samples the
backbone information from local optima and the backbone phase conducts local
search under the guideline of backbone. Extensive experimental results on the
benchmark showed that BGLS outperforms the existing heuristics in both solution
quality and runtime.

FML-based Prediction Agent and Its Application to Game of Go

Chang-Shing Lee, Mei-Hui Wang, Chia-Hsiu Kao, Sheng-Chi Yang, Yusuke Nojima, Ryosuke Saga, Nan Shuo, Naoyuki Kubota
Comments: 6 pages, 12 figures, Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS 2017), Otsu, Japan, Jun. 27-30, 2017
Subjects: Artificial Intelligence (cs.AI)

In this paper, we present a robotic prediction agent including a darkforest
Go engine, a fuzzy markup language (FML) assessment engine, an FML-based
decision support engine, and a robot engine for game of Go application. The
knowledge base and rule base of FML assessment engine are constructed by
referring the information from the darkforest Go engine located in NUTN and
OPU, for example, the number of MCTS simulations and winning rate prediction.
The proposed robotic prediction agent first retrieves the database of Go
competition website, and then the FML assessment engine infers the winning
possibility based on the information generated by darkforest Go engine. The
FML-based decision support engine computes the winning possibility based on the
partial game situation inferred by FML assessment engine. Finally, the robot
engine combines with the human-friendly robot partner PALRO, produced by
Fujisoft incorporated, to report the game situation to human Go players.
Experimental results show that the FML-based prediction agent can work
effectively.

Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping

Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura
Comments: Preprint submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Received March 1, 2017
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

In this paper, we propose an online learning algorithm based on a
Rao-Blackwellized particle filter for spatial concept acquisition and mapping.
We have proposed a nonparametric Bayesian spatial concept acquisition model
(SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in
the theoretical framework of the Bayesian generative model. The proposed method
can simultaneously learn place categories and lexicons while incrementally
generating an environmental map. Furthermore, the proposed method has scene
image features and a language model added to SpCoA. In the experiments, we
tested online learning of spatial concepts and environmental maps in a novel
environment of which the robot did not have a map. Then, we evaluated the
results of online learning of spatial concepts and lexical acquisition. The
experimental results demonstrated that the robot was able to more accurately
learn the relationships between words and the place in the environmental map
incrementally by using the proposed method.

The Reactor: A Sample-Efficient Actor-Critic Architecture

Audrunas Gruslys, Mohammad Gheshlaghi Azar, Marc G. Bellemare, Remi Munos
Subjects: Artificial Intelligence (cs.AI)

In this work we present a new reinforcement learning agent, called Reactor
(for Retrace-actor), based on an off-policy multi-step return actor-critic
architecture. The agent uses a deep recurrent neural network for function
approximation. The network outputs a target policy {pi} (the actor), an
action-value Q-function (the critic) evaluating the current policy {pi}, and
an estimated behavioral policy {hat mu} which we use for off-policy
correction. The agent maintains a memory buffer filled with past experiences.
The critic is trained by the multi-step off-policy Retrace algorithm and the
actor is trained by a novel {eta}-leave-one-out policy gradient estimate
(which uses both the off-policy corrected return and the estimated Q-function).
The Reactor is sample-efficient thanks to the use of memory replay, and
numerical efficient since it uses multi-step returns. Also both acting and
learning can be parallelized. We evaluated our algorithm on 57 Atari 2600 games
and demonstrate that it achieves state-of-the-art performance.

Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

Fanhua Shang
Comments: 36 pages, 10 figures. The simple variant of SVRG is much better than the best-known stochastic method, Katyusha
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

In this paper, we propose a simple variant of the original stochastic
variance reduction gradient (SVRG), where hereafter we refer to as the variance
reduced stochastic gradient descent (VR-SGD). Different from the choices of the
snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG,
the two vectors of each epoch in VR-SGD are set to the average and last iterate
of the previous epoch, respectively. This setting allows us to use much larger
learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs 1/(10L) for
SVRG, and also makes our convergence analysis more challenging. In fact, a
larger learning rate enjoyed by VR-SGD means that the variance of its
stochastic gradient estimator asymptotically approaches zero more rapidly.
Unlike common stochastic methods such as SVRG and proximal stochastic methods
such as Prox-SVRG, we design two different update rules for smooth and
non-smooth objective functions, respectively. In other words, VR-SGD can tackle
non-smooth and/or non-strongly convex problems directly without using any
reduction techniques such as quadratic regularizers. Moreover, we analyze the
convergence properties of VR-SGD for strongly convex problems, which show that
VR-SGD attains a linear convergence rate. We also provide the convergence
guarantees of VR-SGD for non-strongly convex problems. Experimental results
show that the performance of VR-SGD is significantly better than its
counterparts, SVRG and Prox-SVRG, and it is also much better than the best
known stochastic method, Katyusha.

Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

Feiyun Zhu, Peng Liao
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Online reinforcement learning (RL) is increasingly popular for the
personalized mobile health (mHealth) intervention. It is able to personalize
the type and dose of interventions according to user’s ongoing statuses and
changing needs. However, at the beginning of online learning, there are usually
too few samples to support the RL updating, which leads to poor performances. A
delay in good performance of the online learning algorithms can be especially
detrimental in the mHealth, where users tend to quickly disengage with apps. To
address this problem, we propose a new online RL methodology that focuses on an
effective warm start. The main idea is to make full use of the data accumulated
and the decision rule achieved in a former study. As a result, we can greatly
enrich the data size at the beginning of online learning in our method. Such
case accelerates the online learning process for new users to achieve good
performances not only at the beginning of online learning but also through the
whole online learning process. Besides, we use the decision rules achieved
previously to initialize the parameter in our online RL model for new users. It
provides a good initialization for the proposed online RL algorithm. Experiment
results show that promising improvements have been achieved by our method
compared with the state-of-the-art method.

A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

Nariman Farsad, David Pan, Andrea Goldsmith
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

This work presents a new multi-chemical experimental platform for molecular
communication where the transmitter can release different chemicals. This
platform is designed to be inexpensive and accessible, and it can be expanded
to simulate different environments including the cardiovascular system and
complex network of pipes in industrial complexes and city infrastructures. To
demonstrate the capabilities of the platform, we implement a time-slotted
binary communication system where a bit-0 is represented by an acid pulse, a
bit-1 by a base pulse, and information is carried via pH signals. The channel
model for this system, which is nonlinear and has long memories, is unknown.
Therefore, we devise novel detection algorithms that use techniques from
machine learning and deep learning to train a maximum-likelihood detector.
Using these algorithms the bit error rate improves by an order of magnitude
relative to the approach used in previous works. Moreover, our system achieves
a data rate that is an order of magnitude higher than any of the previous
molecular communication platforms.

A Security Monitoring Framework For Virtualization Based HEP Infrastructures

A. Gomez Ramirez, M. Martinez Pedreira, C. Grigoras, L. Betev, C. Lara, U. Kebschull for the ALICE Collaboration
Comments: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); High Energy Physics – Experiment (hep-ex)

High Energy Physics (HEP) distributed computing infrastructures require
automatic tools to monitor, analyze and react to potential security incidents.
These tools should collect and inspect data such as resource consumption, logs
and sequence of system calls for detecting anomalies that indicate the presence
of a malicious agent. They should also be able to perform automated reactions
to attacks without administrator intervention. We describe a novel framework
that accomplishes these requirements, with a proof of concept implementation
for the ALICE experiment at CERN. We show how we achieve a fully virtualized
environment that improves the security by isolating services and Jobs without a
significant performance impact. We also describe a collected dataset for
Machine Learning based Intrusion Prevention and Detection Systems on Grid
computing. This dataset is composed of resource consumption measurements (such
as CPU, RAM and network traffic), logfiles from operating system services, and
system call data collected from production Jobs running in an ALICE Grid test
site and a big set of malware. This malware was collected from security
research sites. Based on this dataset, we will proceed to develop Machine
Learning algorithms able to detect malicious Jobs.

Learn-Memorize-Recall-Reduce A Robotic Cloud Computing Paradigm

Shaoshan Liu, Bolin Ding, Jie Tang, Dawei Sun, Zhe Zhang, Grace Tsai, Jean-Luc Gaudiot
Comments: 6 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The rise of robotic applications has led to the generation of a huge volume
of unstructured data, whereas the current cloud infrastructure was designed to
process limited amounts of structured data. To address this problem, we propose
a learn-memorize-recall-reduce paradigm for robotic cloud computing. The
learning stage converts incoming unstructured data into structured data; the
memorization stage provides effective storage for the massive amount of data;
the recall stage provides efficient means to retrieve the raw data; while the
reduction stage provides means to make sense of this massive amount of
unstructured data with limited computing resources.

RACE: Large-scale ReAding Comprehension Dataset From Examinations

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present RACE, a new dataset for benchmark evaluation of methods in the
reading comprehension task. Collected from the English exams for middle and
high school Chinese students in the age range between 12 to 18, RACE consists
of 28,000+ passages and near 100,000 questions generated by human experts
(English instructors), and covers a variety of topics which are carefully
designed for evaluating the students’ ability in understanding and reasoning.
In particular, the proportion of questions that requires reasoning is much
larger in RACE than that in other benchmark datasets for reading comprehension,
and there is a significant gap between the performance of the state-of-the-art
models (43%) and the ceiling human performance (95%). We hope this new dataset
can serve as a valuable resource for research and evaluation in machine
comprehension. The dataset is freely available at
this http URL

ShapeWorld – A new test methodology for multimodal language understanding

Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel framework for evaluating multimodal deep learning models
with respect to their language understanding and generalization abilities. In
this approach, artificial data is automatically generated according to the
experimenter’s specifications. The content of the data, both during training
and evaluation, can be controlled in detail, which enables tasks to be created
that require true generalization abilities, in particular the combination of
previously introduced concepts in novel ways. We demonstrate the potential of
our methodology by evaluating various visual question answering models on four
different tasks, and show how our framework gives us detailed insights into
their capabilities and limitations. By open-sourcing our framework, we hope to
stimulate progress in the field of multimodal language understanding.

Information Retrieval

NEXT: A Neural Network Framework for Next POI Recommendation

Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, Xiangyang Luo
Subjects: Information Retrieval (cs.IR)

The task of next POI recommendation has been studied extensively in recent
years. However, developing an unified recommendation framework to incorporate
multiple factors associated with both POIs and users remains challenging,
because of the heterogeneity nature of these information. Further, effective
mechanisms to handle cold-start and endow the system with interpretability are
also difficult topics. Inspired by the recent success of neural networks in
many areas, in this paper, we present a simple but effective neural network
framework for next POI recommendation, named NEXT. NEXT is an unified framework
to learn the hidden intent regarding user’s next move, by incorporating
different factors in an unified manner. Specifically, in NEXT, we incorporate
meta-data information and two kinds of temporal contexts (i.e., time interval
and visit time). To leverage sequential relations and geographical influence,
we propose to adopt DeepWalk, a network representation learning technique, to
encode such knowledge. We evaluate the effectiveness of NEXT against
state-of-the-art alternatives and neural networks based solutions. Experimental
results over three publicly available datasets demonstrate that NEXT
significantly outperforms baselines in real-time next POI recommendation.
Further experiments demonstrate the superiority of NEXT in handling cold-start.
More importantly, we show that NEXT provides meaningful explanation of the
dimensions in hidden intent space.

Task-Oriented Query Reformulation with Reinforcement Learning

Rodrigo Nogueira, Kyunghyun Cho
Subjects: Information Retrieval (cs.IR)

Search engines play an important role in our everyday lives by assisting us
in finding the information we need. When we input a complex query, however,
results are often far from satisfactory. In this work, we introduce a query
reformulation system based on a neural network that rewrites a query to
maximize the number of relevant documents returned. We train this neural
network with reinforcement learning. The actions correspond to selecting terms
to build a reformulated query, and the reward is the document recall. We
evaluate our approach on three datasets against strong baselines and show a
relative improvement of 5-20% in terms of recall. Furthermore, we present a
simple method to estimate a conservative upper-bound performance of a model in
a particular environment and verify that there is still large room for
improvements.

Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH

Luis Argerich, Natalia Golmar
Subjects: Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

In this paper we propose the creation of generic LSH families for the angular
distance based on Johnson-Lindenstrauss projections. We show that feature
hashing is a valid J-L projection and propose two new LSH families based on
feature hashing. These new LSH families are tested on both synthetic and real
datasets with very good results and a considerable performance improvement over
other LSH families. While the theoretical analysis is done for the angular
distance, these families can also be used in practice for the euclidean
distance with excellent results [2]. Our tests using real datasets show that
the proposed LSH functions work well for the euclidean distance.

Computation and Language

Sparse Communication for Distributed Gradient Descent

Alham Fikri Aji, Kenneth Heafield
Comments: Submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We make distributed stochastic gradient descent faster by exchanging 99%
sparse updates instead of dense updates. In data-parallel training, nodes pull
updated values of the parameters from a sharded server, compute gradients, push
their gradients to the server, and repeat. These push and pull updates strain
the network. However, most updates are near zero, so we map the 99% smallest
updates (by absolute value) to zero then exchange sparse matrices. Even simple
coordinate and value encoding achieves 50x reduction in bandwidth. Our
experiment with a neural machine translation on 4 GPUs achieved a 22% speed
boost without impacting BLEU score.

Deep Joint Entity Disambiguation with Local Neural Attention

Octavian-Eugen Ganea, Thomas Hofmann
Subjects: Computation and Language (cs.CL)

We propose a novel deep learning model for joint document-level entity
disambiguation, which leverages learned neural representations. Key components
are entity embeddings, a neural attention mechanism over local context windows,
and a differentiable joint inference stage for disambiguation. Our approach
thereby combines benefits of deep learning with more traditional approaches
such as graphical models and probabilistic mention-entity maps. Extensive
experiments show that we are able to obtain competitive or state-of-the-art
accuracy at moderate computational costs.

Learning Character-level Compositionality with Visual Features

Frederick Liu, Han Lu, Chieh Lo, Graham Neubig
Comments: Accepted to ACL 2017
Subjects: Computation and Language (cs.CL)

Previous work has modeled the compositionality of words by creating
character-level models of meaning, reducing problems of sparsity for rare
words. However, in many writing systems compositionality has an effect even on
the character-level: the meaning of a character is derived by the sum of its
parts. In this paper, we model this effect by creating embeddings for
characters based on their visual characteristics, creating an image for the
character and running it through a convolutional neural network to produce a
visual character embedding. Experiments on a text classification task
demonstrate that such model allows for better processing of instances with rare
characters in languages such as Chinese, Japanese, and Korean. Additionally,
qualitative analyses demonstrate that our proposed model learns to focus on the
parts of characters that carry semantic content, resulting in embeddings that
are coherent in visual space.

A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes

Pablo Loyola, Edison Marrese-Taylor, Yutaka Matsuo
Comments: Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)

We propose a model to automatically describe changes introduced in the source
code of a program using natural language. Our method receives as input a set of
code commits, which contains both the modifications and message introduced by
an user. These two modalities are used to train an encoder-decoder
architecture. We evaluated our approach on twelve real world open source
projects from four different programming languages. Quantitative and
qualitative results showed that the proposed approach can generate feasible and
semantically sound descriptions not only in standard in-project settings, but
also in a cross-project setting.

Towards String-to-Tree Neural Machine Translation

Roee Aharoni, Yoav Goldberg
Comments: Accepted as a short paper in ACL 2017
Subjects: Computation and Language (cs.CL)

We present a simple method to incorporate syntactic information about the
target language in a neural machine translation system by translating into
linearized, lexicalized constituency trees. An experiment on the WMT16
German-English news translation task resulted in an improved BLEU score when
compared to a syntax-agnostic NMT baseline trained on the same dataset. An
analysis of the translations from the syntax-aware system shows that it
performs more reordering during translation in comparison to the baseline. A
small-scale human evaluation also showed an advantage to the syntax-aware
system.

RACE: Large-scale ReAding Comprehension Dataset From Examinations

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present RACE, a new dataset for benchmark evaluation of methods in the
reading comprehension task. Collected from the English exams for middle and
high school Chinese students in the age range between 12 to 18, RACE consists
of 28,000+ passages and near 100,000 questions generated by human experts
(English instructors), and covers a variety of topics which are carefully
designed for evaluating the students’ ability in understanding and reasoning.
In particular, the proportion of questions that requires reasoning is much
larger in RACE than that in other benchmark datasets for reading comprehension,
and there is a significant gap between the performance of the state-of-the-art
models (43%) and the ceiling human performance (95%). We hope this new dataset
can serve as a valuable resource for research and evaluation in machine
comprehension. The dataset is freely available at
this http URL

Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an
Subjects: Computation and Language (cs.CL)

We present a simple and effective approach to incorporating syntactic
structure into neural attention-based encoder-decoder models for machine
translation. We rely on graph-convolutional networks (GCNs), a recent class of
neural networks developed for modeling graph-structured data. Our GCNs use
predicted syntactic dependency trees of source sentences to produce
representations of words (i.e. hidden states of the encoder) that are sensitive
to their syntactic neighborhoods. GCNs take word representations as input and
produce word representations as output, so they can easily be incorporated as
layers into standard encoders (e.g., on top of bidirectional RNNs or
convolutional neural networks). We evaluate their effectiveness with
English-German and English-Czech translation experiments for different types of
encoders and observe substantial improvements over their syntax-agnostic
versions in all the considered setups.

MUSE: Modularizing Unsupervised Sense Embeddings

Guang-He Lee, Yun-Nung Chen
Subjects: Computation and Language (cs.CL)

This paper proposes to address the word sense ambiguity issue in an
unsupervised manner, where word sense representations are learned along a word
sense selection mechanism given contexts. Prior work about learning multi-sense
embeddings suffered from either ambiguity of different-level embeddings or
inefficient sense selection. The proposed modular framework, MUSE, implements
flexible modules to optimize distinct mechanisms, achieving the first purely
sense-level representation learning system with linear-time sense selection. We
leverage reinforcement learning to enable joint training on the proposed
modules, and introduce various exploration techniques on sense selection for
better robustness. The experiments on benchmark data show that the proposed
approach achieves the state-of-the-art performance on synonym selection as well
as on contextual word similarities in terms of MaxSimC.

Neural Paraphrase Identification of Questions with Noisy Pretraining

Gaurav Singh Tomar, Thyago Duque, Oscar Täckström, Jakob Uszkoreit, Dipanjan Das
Subjects: Computation and Language (cs.CL)

We present a solution to the problem of paraphrase identification of
questions. We focus on a recent dataset of question pairs annotated with binary
paraphrase labels and show that a variant of the decomposable attention model
(Parikh et al., 2016) results in accurate performance on this task, while being
far simpler than many competing neural architectures. Furthermore, when the
model is pretrained on a noisy dataset of automatically collected question
paraphrases, it obtains the best reported performance on the dataset.

Distributional model on a diet: One-shot word learning from text only

Su Wang, Stephen Roller, Katrin Erk
Comments: Keywords: Distributional semantics; Lexical semantics; Bayesian models
Subjects: Computation and Language (cs.CL)

We test whether distributional models can do one-shot learning of
definitional properties from text only. Using Bayesian models, we find that
first learning overarching structure in the known data, regularities in textual
contexts and in properties, helps one-shot learning, and that individual
context items can be highly informative.

Cross-lingual Abstract Meaning Representation Parsing

Marco Damonte, Shay B. Cohen
Subjects: Computation and Language (cs.CL)

Abstract Meaning Representation (AMR) annotation efforts have mostly focused
on English. In order to train parsers on other languages, we propose a method
based on annotation projection, which involves exploiting annotations in a
source language and a parallel corpus of the source language and a target
language. Using English as the source language, we show promising results for
Italian, Spanish, German and Chinese as target languages. Besides evaluating
the target parsers on non-gold datasets, we further propose an evaluation
method that exploits the English gold annotations and does not require access
to gold annotations for the target languages. This is achieved by inverting the
projection process: a new English parser is learned from the target language
parser and evaluated on the existing English gold standard.

Neural Extractive Summarization with Side Information

Shashi Narayan, Nikos Papasarantopoulos, Mirella Lapata, Shay B. Cohen
Comments: 10 pages
Subjects: Computation and Language (cs.CL)

Most extractive summarization methods focus on the main body of the document
from which sentences need to be extracted. The gist of the document often lies
in the side information of the document, such as title and image captions.
These types of side information are often available for newswire articles. We
propose to explore side information in the context of single-document
extractive summarization. We develop a framework for single-document
summarization composed of a hierarchical document encoder and an
attention-based extractor with attention over side information. We evaluate our
models on a large scale news dataset. We show that extractive summarization
with side information consistently outperforms its counterpart (that does not
use any side information), in terms on both informativeness and fluency.

Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation

Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
Comments: WAT 2016
Subjects: Computation and Language (cs.CL)

Neural machine translation (NMT), a new approach to machine translation, has
achieved promising results comparable to those of traditional approaches such
as statistical machine translation (SMT). Despite its recent success, NMT
cannot handle a larger vocabulary because training complexity and decoding
complexity proportionally increase with the number of target words. This
problem becomes even more serious when translating patent documents, which
contain many technical terms that are observed infrequently. In NMTs, words
that are out of vocabulary are represented by a single unknown token. In this
paper, we propose a method that enables NMT to translate patent sentences
comprising a large vocabulary of technical terms. We train an NMT system on
bilingual data wherein technical terms are replaced with technical term tokens;
this allows it to translate most of the source sentences except technical
terms. Further, we use it as a decoder to translate source sentences with
technical term tokens and replace the tokens with technical term translations
using SMT. We also use it to rerank the 1,000-best SMT translations on the
basis of the average of the SMT score and that of the NMT rescoring of the
translated sentences with technical term tokens. Our experiments on
Japanese-Chinese patent sentences show that the proposed NMT system achieves a
substantial improvement of up to 3.1 BLEU points and 2.3 RIBES points over
traditional SMT systems and an improvement of approximately 0.6 BLEU points and
0.8 RIBES points over an equivalent NMT system without our proposed technique.

Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy

Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
Comments: EMNLP 2017, under review
Subjects: Computation and Language (cs.CL)

Neural machine translation (NMT), a new approach to machine translation, has
achieved promising results comparable to those of traditional approaches such
as statistical machine translation (SMT). Despite its recent success, NMT
cannot handle a larger vocabulary because the training complexity and decoding
complexity proportionally increase with the number of target words. This
problem becomes even more serious when translating patent documents, which
contain many technical terms that are observed infrequently. In this paper, we
propose to select phrases that contain out-of-vocabulary words using the
statistical approach of branching entropy. This allows the proposed NMT system
to be applied to a translation task of any language pair without any
language-specific knowledge about technical term identification. The selected
phrases are then replaced with tokens during training and post-translated by
the phrase translation table of SMT. Evaluation on Japanese-to-Chinese and
Chinese-to-Japanese patent sentence translation proved the effectiveness of
phrases selected with branching entropy, where the proposed NMT system achieves
a substantial improvement over a baseline NMT system without our proposed
technique.

ShapeWorld – A new test methodology for multimodal language understanding

Alexander Kuhnle, Ann Copestake
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel framework for evaluating multimodal deep learning models
with respect to their language understanding and generalization abilities. In
this approach, artificial data is automatically generated according to the
experimenter’s specifications. The content of the data, both during training
and evaluation, can be controlled in detail, which enables tasks to be created
that require true generalization abilities, in particular the combination of
previously introduced concepts in novel ways. We demonstrate the potential of
our methodology by evaluating various visual question answering models on four
different tasks, and show how our framework gives us detailed insights into
their capabilities and limitations. By open-sourcing our framework, we hope to
stimulate progress in the field of multimodal language understanding.

Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping

In this paper, we propose an online learning algorithm based on a
Rao-Blackwellized particle filter for spatial concept acquisition and mapping.
We have proposed a nonparametric Bayesian spatial concept acquisition model
(SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in
the theoretical framework of the Bayesian generative model. The proposed method
can simultaneously learn place categories and lexicons while incrementally
generating an environmental map. Furthermore, the proposed method has scene
image features and a language model added to SpCoA. In the experiments, we
tested online learning of spatial concepts and environmental maps in a novel
environment of which the robot did not have a map. Then, we evaluated the
results of online learning of spatial concepts and lexical acquisition. The
experimental results demonstrated that the robot was able to more accurately
learn the relationships between words and the place in the environmental map
incrementally by using the proposed method.

Distributed, Parallel, and Cluster Computing

Space-Optimal Majority in Population Protocols

Dan Alistarh, James Aspnes, Rati Gelashvili
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Population protocols are a model of distributed computing, in which (n)
agents with limited local state interact randomly, and cooperate to
collectively compute global predicates. An extensive series of papers, across
different communities, has examined the computability and complexity
characteristics of this model. Majority, or consensus, is a central task, in
which agents need to collectively reach a decision as to which one of two
states (A) or (B) had a higher initial count. Two complexity metrics are
important: the time that a protocol requires to stabilize to an output
decision, and the state space size that each agent requires.

It is known that majority requires (Omega(log log n)) states per agent to
allow for poly-logarithmic time stabilization, and that (O(log^2 n)) states
are sufficient. Thus, there is an exponential gap between the upper and lower
bounds.

We address this question. We provide a new lower bound of (Omega(log n))
states for any protocol which stabilizes in (O( n^c )) time, for any (c leq 1)
constant. This result is conditional on basic monotonicity and output
assumptions, satisfied by all known protocols. Technically, it represents a
significant departure from previous lower bounds, as it does not rely on the
existence of dense configurations. Instead, we introduce a new generalized
surgery technique to prove the existence of incorrect executions which
contradict the correctness of algorithms that stabilize too fast.

We give an algorithm for majority which uses (O(log n)) states, and
stabilizes in (O(log^2 n)) time. Central to the algorithm is a new leaderless
phase clock, which allows nodes to synchronize in phases of (Theta(n log{n}))
consecutive interactions using (O(log n)) states per node. We also build a new
leader election algorithm with (O(log n )) states, which stabilizes in
(O(log^2 n)) time.

A Security Monitoring Framework For Virtualization Based HEP Infrastructures

High Energy Physics (HEP) distributed computing infrastructures require
automatic tools to monitor, analyze and react to potential security incidents.
These tools should collect and inspect data such as resource consumption, logs
and sequence of system calls for detecting anomalies that indicate the presence
of a malicious agent. They should also be able to perform automated reactions
to attacks without administrator intervention. We describe a novel framework
that accomplishes these requirements, with a proof of concept implementation
for the ALICE experiment at CERN. We show how we achieve a fully virtualized
environment that improves the security by isolating services and Jobs without a
significant performance impact. We also describe a collected dataset for
Machine Learning based Intrusion Prevention and Detection Systems on Grid
computing. This dataset is composed of resource consumption measurements (such
as CPU, RAM and network traffic), logfiles from operating system services, and
system call data collected from production Jobs running in an ALICE Grid test
site and a big set of malware. This malware was collected from security
research sites. Based on this dataset, we will proceed to develop Machine
Learning algorithms able to detect malicious Jobs.

Learn-Memorize-Recall-Reduce A Robotic Cloud Computing Paradigm

The rise of robotic applications has led to the generation of a huge volume
of unstructured data, whereas the current cloud infrastructure was designed to
process limited amounts of structured data. To address this problem, we propose
a learn-memorize-recall-reduce paradigm for robotic cloud computing. The
learning stage converts incoming unstructured data into structured data; the
memorization stage provides effective storage for the massive amount of data;
the recall stage provides efficient means to retrieve the raw data; while the
reduction stage provides means to make sense of this massive amount of
unstructured data with limited computing resources.

A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce

Arkan A. G. Al-Hamodi, Songfeng Lu
Comments: 11 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

Frequent Pattern Mining is a one field of the most significant topics in data
mining. In recent years, many algorithms have been proposed for mining frequent
itemsets. A new algorithm has been presented for mining frequent itemsets based
on N-list data structure called Prepost algorithm. The Prepost algorithm is
enhanced by implementing compact PPC-tree with the general tree. Prepost
algorithm can only find a frequent itemsets with required (pre-order and
post-order) for each node. In this chapter, we improved prepost algorithm based
on Hadoop platform (HPrepost), proposed using the Mapreduce programming model.
The main goals of proposed method are efficient mining frequent itemsets
requiring less running time and memory usage. We have conduct experiments for
the proposed scheme to compare with another algorithms. With dense datasets,
which have a large average length of transactions, HPrepost is more effective
than frequent itemsets algorithms in terms of execution time and memory usage
for all min-sup. Generally, our algorithm outperforms algorithms in terms of
runtime and memory usage with small thresholds and large datasets.

Data aggregation routing protocols in wireless sensor networks: a taxonomy

Saeid Pourroostaei Ardakani
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

Routing in Wireless Sensor Network (WSN) aims to interconnect sensor nodes
via single or multi-hop paths. The routes are established to forward data
packets from sensor nodes to the sink. Establishing a single path to report
each data packet results in increasing energy consumption in WSN, hence, data
aggregation routing is used to combine data packets and consequently reduce the
number of transmissions. This reduces the routing overhead by eliminating
redundant and meaningless data. There are two models for data aggregation
routing in WSN: mobile agent and client/server. This paper describes data
aggregation routing and classifies then the routing protocols according to the
network architecture and routing models. The key issues of the data aggregation
routing models (client/server and mobile agent) are highlighted and discussed.

User-transparent Distributed TensorFlow

Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily
Comments: 9 pages, 8 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Deep Learning (DL) algorithms have become the {em de facto} choice for data
analysis. Several DL implementations — primarily limited to a single compute
node — such as Caffe, TensorFlow, Theano and Torch have become readily
available. Distributed DL implementations capable of execution on large scale
systems are becoming important to address the computational needs of large data
produced by scientific simulations and experiments. Yet, the adoption of
distributed DL implementations faces significant impediments: 1) most
implementations require DL analysts to modify their code significantly — which
is a show-stopper, 2) several distributed DL implementations are geared towards
cloud computing systems — which is inadequate for execution on massively
parallel systems such as supercomputers.

This work addresses each of these problems. We provide a distributed memory
DL implementation by incorporating required changes in the TensorFlow runtime
itself. This dramatically reduces the entry barrier for using a distributed
TensorFlow implementation. We use Message Passing Interface (MPI) — which
provides performance portability, especially since MPI specific changes are
abstracted from users. Lastly — and arguably most importantly — we make our
implementation available for broader use, under the umbrella of Machine
Learning Toolkit for Extreme Scale (MaTEx) at { exttt
this http URL}. We refer to our implementation as MaTEx-TensorFlow.

Sparse Communication for Distributed Gradient Descent

Alham Fikri Aji, Kenneth Heafield
Comments: Submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We make distributed stochastic gradient descent faster by exchanging 99%
sparse updates instead of dense updates. In data-parallel training, nodes pull
updated values of the parameters from a sharded server, compute gradients, push
their gradients to the server, and repeat. These push and pull updates strain
the network. However, most updates are near zero, so we map the 99% smallest
updates (by absolute value) to zero then exchange sparse matrices. Even simple
coordinate and value encoding achieves 50x reduction in bandwidth. Our
experiment with a neural machine translation on 4 GPUs achieved a 22% speed
boost without impacting BLEU score.

Morpheo: Traceable Machine Learning on Hidden data

Morpheo is a transparent and secure machine learning platform collecting and
analysing large datasets. It aims at building state-of-the art prediction
models in various fields where data are sensitive. Indeed, it offers strong
privacy of data and algorithm, by preventing anyone to read the data, apart
from the owner and the chosen algorithms. Computations in Morpheo are
orchestrated by a blockchain infrastructure, thus offering total traceability
of operations. Morpheo aims at building an attractive economic ecosystem around
data prediction by channelling crypto-money from prediction requests to useful
data and algorithms providers. Morpheo is designed to handle multiple data
sources in a transfer learning approach in order to mutualize knowledge
acquired from large datasets for applications with smaller but similar
datasets.

Design Guidelines for the User-Centred Collaborative Citizen Science Platforms

Poonam Yadav, John Darlington
Comments: In Review process
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)

Online Citizen Science platforms are good examples of socio-technical systems
where technology-enabled interactions occur between scientists and the general
public (volunteers). Citizen Science platforms usually host multiple Citizen
Science projects, and allow volunteers to choose the ones to participate in.
Recent work in the area has demonstrated a positive feedback loop between
participation and learning and creativity in Citizen Science projects, which is
one of the motivating factors both for scientists and the volunteers. This
emphasises the importance of creating successful Citizen Science platforms,
which support this feedback process, and enable enhanced learning and
creativity to occur through knowledge sharing and diverse participation. In
this paper, we discuss how scientists’ and volunteers’ motivation and
participation influence the design of Citizen Science platforms. We present our
summary as guidelines for designing these platforms as user-inspired
socio-technical systems. We also present the case-studies on popular Citizen
Science platforms, including our own CitizenGrid platform, developed as part of
the CCL EU project, as well as Zooniverse, World Community Grid, CrowdCrafting
and EpiCollect+ to see how closely these platforms follow our proposed
guidelines and how these may be further improved to incorporate the creativity
enabled by the collective knowledge sharing.

Learning

Fast multi-output relevance vector regression

Youngmin Ha
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

This paper aims to decrease the time complexity of multi-output relevance
vector regression from O(VM^3) to O(V^3+M^3), where V is the number of output
dimensions, M is the number of basis functions, and V<M. The experimental
results demonstrate that the proposed method is more competitive than the
existing method, with regard to computation time. MATLAB codes are available at
this http URL

Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction

In this paper, we propose a simple variant of the original stochastic
variance reduction gradient (SVRG), where hereafter we refer to as the variance
reduced stochastic gradient descent (VR-SGD). Different from the choices of the
snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG,
the two vectors of each epoch in VR-SGD are set to the average and last iterate
of the previous epoch, respectively. This setting allows us to use much larger
learning rates or step sizes than SVRG, e.g., 3/(7L) for VR-SGD vs 1/(10L) for
SVRG, and also makes our convergence analysis more challenging. In fact, a
larger learning rate enjoyed by VR-SGD means that the variance of its
stochastic gradient estimator asymptotically approaches zero more rapidly.
Unlike common stochastic methods such as SVRG and proximal stochastic methods
such as Prox-SVRG, we design two different update rules for smooth and
non-smooth objective functions, respectively. In other words, VR-SGD can tackle
non-smooth and/or non-strongly convex problems directly without using any
reduction techniques such as quadratic regularizers. Moreover, we analyze the
convergence properties of VR-SGD for strongly convex problems, which show that
VR-SGD attains a linear convergence rate. We also provide the convergence
guarantees of VR-SGD for non-strongly convex problems. Experimental results
show that the performance of VR-SGD is significantly better than its
counterparts, SVRG and Prox-SVRG, and it is also much better than the best
known stochastic method, Katyusha.

Adversarial and Clean Data Are Not Twins

Zhitao Gong, Wenlu Wang, Wei-Shinn Ku
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Adversarial attack has cast a shadow on the massive success of deep neural
networks. Despite being almost visually identical to the clean data, the
adversarial images can fool deep neural networks into wrong predictions with
very high confidence. In this paper, however, we show that we can build a
simple binary classifier separating the adversarial apart from the clean data
with accuracy over 99%. We also empirically show that the binary classifier is
robust to a second-round adversarial attack. In other words, it is difficult to
disguise adversarial samples to bypass the binary classifier. Further more, we
empirically investigate the generalization limitation which lingers on all
current defensive methods, including the binary classifier approach. And we
hypothesize that this is the result of intrinsic property of adversarial
crafting algorithms.

Introspection: Accelerating Neural Network Training By Learning Weight Evolution

Abhishek Sinha, Mausoom Sarkar, Aahitagni Mukherjee, Balaji Krishnamurthy
Subjects: Learning (cs.LG)

Neural Networks are function approximators that have achieved
state-of-the-art accuracy in numerous machine learning tasks. In spite of their
great success in terms of accuracy, their large training time makes it
difficult to use them for various tasks. In this paper, we explore the idea of
learning weight evolution pattern from a simple network for accelerating
training of novel neural networks. We use a neural network to learn the
training pattern from MNIST classification and utilize it to accelerate
training of neural networks used for CIFAR-10 and ImageNet classification. Our
method has a low memory footprint and is computationally efficient. This method
can also be used with other optimizers to give faster convergence. The results
indicate a general trend in the weight evolution during training of neural
networks.

Deep Relaxation: partial differential equations for optimizing deep neural networks

Pratik Chaudhari, Adam Oberman, Stanley Osher, Stefano Soatto, Guillame Carlier
Subjects: Learning (cs.LG); Analysis of PDEs (math.AP); Optimization and Control (math.OC)

We establish connections between non-convex optimization methods for training
deep neural networks (DNNs) and the theory of partial differential equations
(PDEs). In particular, we focus on relaxation techniques initially developed in
statistical physics, which we show to be solutions of a nonlinear
Hamilton-Jacobi-Bellman equation. We employ the underlying stochastic control
problem to analyze the geometry of the relaxed energy landscape and its
convergence properties, thereby confirming empirical evidence. This paper opens
non-convex optimization problems arising in deep learning to ideas from the PDE
literature. In particular, we show that the non-viscous Hamilton-Jacobi
equation leads to an elegant algorithm based on the Hopf-Lax formula that
outperforms state-of-the-art methods. Furthermore, we show that these
algorithms scale well in practice and can effectively tackle the high
dimensionality of modern neural networks.

Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

Feiyun Zhu, Peng Liao
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Online reinforcement learning (RL) is increasingly popular for the
personalized mobile health (mHealth) intervention. It is able to personalize
the type and dose of interventions according to user’s ongoing statuses and
changing needs. However, at the beginning of online learning, there are usually
too few samples to support the RL updating, which leads to poor performances. A
delay in good performance of the online learning algorithms can be especially
detrimental in the mHealth, where users tend to quickly disengage with apps. To
address this problem, we propose a new online RL methodology that focuses on an
effective warm start. The main idea is to make full use of the data accumulated
and the decision rule achieved in a former study. As a result, we can greatly
enrich the data size at the beginning of online learning in our method. Such
case accelerates the online learning process for new users to achieve good
performances not only at the beginning of online learning but also through the
whole online learning process. Besides, we use the decision rules achieved
previously to initialize the parameter in our online RL model for new users. It
provides a good initialization for the proposed online RL algorithm. Experiment
results show that promising improvements have been achieved by our method
compared with the state-of-the-art method.

On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation

Max Simchowitz, Ahmed El Alaoui, Benjamin Recht
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Combinatorics (math.CO); Machine Learning (stat.ML)

We prove a emph{query complexity} lower bound on rank-one principal
component analysis (PCA). We consider an oracle model where, given a symmetric
matrix (M in mathbb{R}^{d imes d}), an algorithm is allowed to make (T)
emph{exact} queries of the form (w^{(i)} = Mv^{(i)}) for (i in
{1,dots,T}), where (v^{(i)}) is drawn from a distribution which depends
arbitrarily on the past queries and measurements ({v^{(j)},w^{(j)}}_{1 le j
le i-1}). We show that for a small constant (epsilon), any adaptive,
randomized algorithm which can find a unit vector (widehat{v}) for which
(widehat{v}^{ op}Mwidehat{v} ge (1-epsilon)|M|), with even small
probability, must make (T = Omega(log d)) queries. In addition to settling a
widely-held folk conjecture, this bound demonstrates a fundamental gap between
convex optimization and “strict-saddle” non-convex optimization of which PCA is
a canonical example: in the former, first-order methods can have dimension-free
iteration complexity, whereas in PCA, the iteration complexity of
gradient-based methods must necessarily grow with the dimension. Our argument
proceeds via a reduction to estimating the rank-one spike in a deformed Wigner
model. We establish lower bounds for this model by developing a “truncated”
analogue of the (chi^2) Bayes-risk lower bound of Chen et al.

Hierarchic Kernel Recursive Least-Squares

Hossein Mohamadipanah, Girish Chowdhary
Subjects: Learning (cs.LG)

We present a new hierarchic kernel based modeling technique for modeling
evenly distributed multidimensional datasets that does not rely on input space
sparsification. The presented method reorganizes the typical single-layer
kernel based model in a hierarchical structure, such that the weights of a
kernel model over each dimension are modeled over the adjacent dimension. We
show that the imposition of the hierarchical structure in the kernel based
model leads to significant computational speedup and improved modeling accuracy
(over an order of magnitude in many cases). For instance the presented method
is about five times faster and more accurate than Sparsified Kernel Recursive
Least- Squares in modeling of a two-dimensional real-world data set.

Sparse Communication for Distributed Gradient Descent

Alham Fikri Aji, Kenneth Heafield
Comments: Submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

We make distributed stochastic gradient descent faster by exchanging 99%
sparse updates instead of dense updates. In data-parallel training, nodes pull
updated values of the parameters from a sharded server, compute gradients, push
their gradients to the server, and repeat. These push and pull updates strain
the network. However, most updates are near zero, so we map the 99% smallest
updates (by absolute value) to zero then exchange sparse matrices. Even simple
coordinate and value encoding achieves 50x reduction in bandwidth. Our
experiment with a neural machine translation on 4 GPUs achieved a 22% speed
boost without impacting BLEU score.

Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Professional-grade software applications are powerful but
complicated(-)expert users can achieve impressive results, but novices often
struggle to complete even basic tasks. Photo editing is a prime example: after
loading a photo, the user is confronted with an array of cryptic sliders like
“clarity”, “temp”, and “highlights”. An automatically generated suggestion
could help, but there is no single “correct” edit for a given image(-)different
experts may make very different aesthetic decisions when faced with the same
image, and a single expert may make different choices depending on the intended
use of the image (or on a whim). We therefore want a system that can propose
multiple diverse, high-quality edits while also learning from and adapting to a
user’s aesthetic preferences. In this work, we develop a statistical model that
meets these objectives. Our model builds on recent advances in neural network
generative modeling and scalable inference, and uses hierarchical structure to
learn editing patterns across many diverse users. Empirically, we find that our
model outperforms other approaches on this challenging multimodal prediction
task.

Bayesian Hybrid Matrix Factorisation for Data Integration

Thomas Brouwer, Pietro Lió
Comments: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)
Journal-ref: PMLR 54:557-566, 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for
data integration, based on combining multiple matrix factorisation methods,
that can be used for in- and out-of-matrix prediction of missing values. The
model is very general and can be used to integrate many datasets across
different entity types, including repeated experiments, similarity matrices,
and very sparse datasets. We apply our method on two biological applications,
and extensively compare it to state-of-the-art machine learning and matrix
factorisation models. For in-matrix predictions on drug sensitivity datasets we
obtain consistently better performances than existing methods. This is
especially the case when we increase the sparsity of the datasets. Furthermore,
we perform out-of-matrix predictions on methylation and gene expression
datasets, and obtain the best results on two of the three datasets, especially
when the predictivity of datasets is high.

Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking

Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
Comments: 16 pages. 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

Traditional generative adversarial networks (GAN) and many of its variants
are trained by minimizing the KL or JS-divergence loss that measures how close
the generated data distribution is from the true data distribution. A recent
advance called the WGAN based on Wasserstein distance can improve on the KL and
JS-divergence based GANs, and alleviate the gradient vanishing, instability,
and mode collapse issues that are common in the GAN training. In this work, we
aim at improving on the WGAN by first generalizing its discriminator loss to a
margin-based one, which leads to a better discriminator, and in turn a better
generator, and then carrying out a progressive training paradigm involving
multiple GANs to contribute to the maximum margin ranking loss so that the GAN
at later stages will improve upon early stages. We call this method Gang of
GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce
the gap between the true data distribution and the generated data distribution
by at least half in an optimally trained WGAN. We have also proposed a new way
of measuring GAN quality which is based on image completion tasks. We have
evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10,
and 50K-SSFF, and have seen both visual and quantitative improvement over
baseline WGAN.

A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

Nariman Farsad, David Pan, Andrea Goldsmith
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

This work presents a new multi-chemical experimental platform for molecular
communication where the transmitter can release different chemicals. This
platform is designed to be inexpensive and accessible, and it can be expanded
to simulate different environments including the cardiovascular system and
complex network of pipes in industrial complexes and city infrastructures. To
demonstrate the capabilities of the platform, we implement a time-slotted
binary communication system where a bit-0 is represented by an acid pulse, a
bit-1 by a base pulse, and information is carried via pH signals. The channel
model for this system, which is nonlinear and has long memories, is unknown.
Therefore, we devise novel detection algorithms that use techniques from
machine learning and deep learning to train a maximum-likelihood detector.
Using these algorithms the bit error rate improves by an order of magnitude
relative to the approach used in previous works. Moreover, our system achieves
a data rate that is an order of magnitude higher than any of the previous
molecular communication platforms.

Random Walk Sampling for Big Data over Networks

Saeed Basirian, Alexander Jung
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

It has been shown recently that graph signals with small total variation can
be accurately recovered from only few samples if the sampling set satisfies a
certain condition, referred to as the network nullspace property. Based on this
recovery condition, we propose a sampling strategy for smooth graph signals
based on random walks. Numerical experiments demonstrate the effectiveness of
this approach for graph signals obtained from a synthetic random graph model as
well as a real-world dataset.

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performance
must now come from domain-specific hardware. This paper evaluates a custom
ASIC—called a Tensor Processing Unit (TPU)—deployed in datacenters since
2015 that accelerates the inference phase of neural networks (NN). The heart of
the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak
throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed
on-chip memory. The TPU’s deterministic execution model is a better match to
the 99th-percentile response-time requirement of our NN applications than are
the time-varying optimizations of CPUs and GPUs (caches, out-of-order
execution, multithreading, multiprocessing, prefetching, …) that help average
throughput more than guaranteed latency. The lack of such features helps
explain why, despite having myriad MACs and a big memory, the TPU is relatively
small and low power. We compare the TPU to a server-class Intel Haswell CPU and
an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Our workload, written in the high-level TensorFlow framework, uses production
NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’
NN inference demand. Despite low utilization for some applications, the TPU is
on average about 15X – 30X faster than its contemporary GPU or CPU, with
TOPS/Watt about 30X – 80X higher. Moreover, using the GPU’s GDDR5 memory in the
TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and
200X the CPU.

Deep Learning Based Regression and Multi-class Models for Acute Oral Toxicity Prediction

Youjun Xu, Jianfeng Pei, Luhua Lai
Comments: 36 pages, 4 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

Median lethal death, LD50, is a general indicator of compound acute oral
toxicity (AOT). Various in silico methods were developed for AOT prediction to
reduce costs and time. In this study, a deep learning architecture composed of
multi-layer convolution neural network was used to develop three types of
high-level predictive models: regression model (deepAOT-R),
multi-classification (deepAOT-C) model and multitask model (deepAOT-CR) for AOT
evaluation. These models highly outperformed previously reported models. For
the two external datasets containing 1673 (test set I) and 375 (test set II)
compounds, the R2 and mean absolute error (MAE) of deepAOTR on the test set I
were 0.864 and 0.195, and the prediction accuracy of deepAOT-C was 95.5% and
96.3% on the test set I and II, respectively. The two external prediction
accuracy of deepAOT-CR is 95.0% and 94.1%, while the R2 and MAE are 0.861 and
0.204 for test set I, respectively. We then performed forward and backward
exploration of deepAOT models for deep fingerprints, which could support
shallow machine learning methods more efficiently than traditional fingerprints
or descriptors. We further performed automatic feature learning, a key essence
of deep learning, to map the corresponding activation values into fragment
space and derive AOT-related chemical substructures by reverse mining of the
features. Our deep learning framework for AOT is generally applicable in
predicting and exploring other toxicity or property endpoints of chemical
compounds. The two deepAOT models are freely available at
this http URL

Machine Learning and the Future of Realism

Giles Hooker, Cliff Hooker
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

The preceding three decades have seen the emergence, rise, and proliferation
of machine learning (ML). From half-recognised beginnings in perceptrons,
neural nets, and decision trees, algorithms that extract correlations (that is,
patterns) from a set of data points have broken free from their origin in
computational cognition to embrace all forms of problem solving, from voice
recognition to medical diagnosis to automated scientific research and
driverless cars, and it is now widely opined that the real industrial
revolution lies less in mobile phone and similar than in the maturation and
universal application of ML. Among the consequences just might be the triumph
of anti-realism over realism.

RACE: Large-scale ReAding Comprehension Dataset From Examinations

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

We present RACE, a new dataset for benchmark evaluation of methods in the
reading comprehension task. Collected from the English exams for middle and
high school Chinese students in the age range between 12 to 18, RACE consists
of 28,000+ passages and near 100,000 questions generated by human experts
(English instructors), and covers a variety of topics which are carefully
designed for evaluating the students’ ability in understanding and reasoning.
In particular, the proportion of questions that requires reasoning is much
larger in RACE than that in other benchmark datasets for reading comprehension,
and there is a significant gap between the performance of the state-of-the-art
models (43%) and the ceiling human performance (95%). We hope this new dataset
can serve as a valuable resource for research and evaluation in machine
comprehension. The dataset is freely available at
this http URL

Deep Learning for Photoacoustic Tomography from Sparse Data

Stephan Antholzer, Markus Haltmeier, Johannes Schwab
Comments: 13 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

The development of fast and accurate image reconstruction algorithms is a
central aspect of computed tomography. In this paper we investigate this issue
for the sparse data problem of photoacoustic tomography (PAT). We develop
direct and highly efficient reconstruction algorithms based on deep-learning.
In this approach image reconstruction is performed with a deep convolutional
neural network (CNN), whose weights are adjusted prior to the actual image
reconstruction based on a set of training data. Our results demonstrate that
the proposed deep learning approach reconstructs images with a quality
komparable to state of the art iterative approaches from sparse data. At the
same time, the numerically complexity of our approach is much smaller and the
image reconstruction is performed in a fraction of the time required by
iterative methods.

Asynchronous Parallel Empirical Variance Guided Algorithms for the Thresholding Bandit Problem

Jie Zhong, Yijun Huang, Ji Liu
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

This paper considers the multi-armed thresholding bandit problem —
identifying all arms above a predefined threshold via as few pulls (or rounds)
as possible — proposed by Locatelli et al. [2016] recently. Although the
proposed algorithm in Locatelli et al. [2016] achieves the optimal round
complexity a certain sense, there still remain unsolved issues. This paper
proposes an asynchronous parallel thresholding algorithm and its parameter-free
version to improve the efficiency and the applicability. On one hand, the
proposed two algorithms use the empirical variance to guide which arm to pull
at each round, and improve the round complexity of the “optimal” algorithm when
all arms have bounded high order moments. On the other hand, most bandit
algorithms assume that the reward can be observed immediately after the pull or
the next decision would not be made before all rewards are observed. Our
proposed asynchronous parallel algorithms allow making the choice of the next
pull with unobserved rewards from earlier pulls, which avoids such an
unrealistic assumption and significantly improves the identification process.
Our theoretical analysis justifies the effectiveness and the efficiency of
proposed asynchronous parallel algorithms. The empirical study is also provided
to validate the proposed algorithms.

A Sample Complexity Measure with Applications to Learning Optimal Auctions

Vasilis Syrgkanis
Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG); Statistics Theory (math.ST)

We introduce a new sample complexity measure, which we refer to as
split-sample growth rate. For any hypothesis (H) and for any sample (S) of size
(m), the split-sample growth rate (hat{ au}_H(m)) counts how many different
hypotheses can empirical risk minimization output on any sub-sample of (S) of
size (m/2). We show that the expected generalization error is upper bounded by
(Oleft(sqrt{frac{log(hat{ au}_H(2m))}{m}}
ight)). Our result is enabled
by a strengthening of the Rademacher complexity analysis of the expected
generalization error. We show that this sample complexity measure, greatly
simplifies the analysis of the sample complexity of optimal auction design, for
many auction classes studied in the literature. Their sample complexity can be
derived solely by noticing that in these auction classes, ERM on any sample or
sub-sample will pick parameters that are equal to one of the points in the
sample.

Information Theory

Low Complexity Coefficient Selection Algorithms for Compute-and-Forward

Qinhui Huang, Alister Burr
Comments: 10 pages, 10 figures
Subjects: Information Theory (cs.IT)

Compute-and-Forward (C&F) has been proposed as an efficient strategy to
reduce the backhaul load for the distributed antenna systems. Finding the
optimal coefficients in C&F has commonly been treated as a shortest vector
problem (SVP), which is N-P hard. The point of our work and of Sahraei’s recent
work is that the C&F coefficient problem can be much simpler. Due to the
special structure of C&F, some low polynomial complexity optimal algorithms
have recently been developed. However these methods can be applied to real
valued channels and integer based lattices only. In this paper, we consider the
complex valued channel with complex integer based lattices. For the first time,
we propose a low polynomial complexity algorithm to find the optimal solution
for the complex scenario. Then we propose a simple linear search algorithm
which is conceptually suboptimal, however numerical results show that the
performance degradation is negligible compared to the optimal method. Both
algorithms are suitable for lattices over any algebraic integers, and
significantly outperform the lattice reduction algorithm. The complexity of
both algorithms are investigated both theoretically and numerically. The
results show that our proposed algorithms achieve better performance-complexity
trade-offs compared to the existing algorithms.

Covert Communication in Wireless Relay Networks

Jinsong Hu, Shihao Yan, Xiangyun Zhou, Feng Shu, Jiangzhou Wang
Subjects: Information Theory (cs.IT)

Covert communication aims to shield the very existence of wireless
transmissions in order to guarantee a strong security in wireless networks. In
this work, for the first time we examine the possibility and achievable
performance of covert communication in one-way relay networks. Specifically,
the relay opportunistically transmits its own information to the destination
covertly on top of forwarding the source’s message, while the source tries to
detect this covert transmission to discover the illegitimate usage of the
recourse (e.g., power, spectrum) allocated only for the purpose of forwarding
source’s information. The necessary condition that the relay can transmit
covertly without being detected is identified and the source’s detection limit
is derived in terms of the false alarm and miss detection rates. Our analysis
indicates that boosting the forwarding ability of the relay (e.g., increasing
its maximum transmit power) also increases its capacity to perform the covert
communication in terms of achieving a higher effective covert rate subject to
some specific requirement on the source’s detection performance.

Capacity-Achieving Input Distributions in Nondispersive Optical Fibers

Jihad Fahs, Aslan Tchamkerten, Mansoor I. Yousefi
Comments: 15 pages
Subjects: Information Theory (cs.IT)

This paper investigates the discrete-time per-sample model of the
zero-dispersion optical fiber. It is shown that any capacity-achieving input
distribution has (continuous) uniform phase and discrete amplitude with a
finite number of mass points. This result holds when the input is subject to
general input cost constraints that include peak power constraint and joint
average and peak power constraint.

Caching Policy Optimization for D2D Communications by Learning User Preference

Binqiang Chen, Chenyang Yang
Comments: Accepted by VTC Spring 2017
Subjects: Information Theory (cs.IT)

Cache-enabled device-to-device (D2D) communications can boost network
throughput. By pre-downloading contents to local caches of users, the content
requested by a user can be transmitted via D2D links by other users in
proximity. Prior works optimize the caching policy at users with the knowledge
of content popularity, defined as the probability distribution of request for
every file in a library from by all users. However, content popularity can not
reflect the interest of each individual user and thus popularity-based caching
policy may not fully capture the performance gain introduced by caching. In
this paper, we optimize caching policy for cache-enabled D2D by learning user
preference, defined as the conditional probability distribution of a user’s
request for a file given that the user sends a request. We first formulate an
optimization problem with given user preference to maximize the offloading
probability, which is proved as NP-hard, and then provide a greedy algorithm to
find the solution. In order to predict the preference of each individual user,
we model the user request behavior by probabilistic latent semantic analysis
(pLSA), and then apply expectation maximization (EM) algorithm to estimate the
model parameters. Simulation results show that the user preference can be
learnt quickly. Compared to the popularity-based caching policy, the offloading
gain achieved by the proposed policy can be remarkably improved even with
predicted user preference.

Linear Transceiver Design for Bidirectional Full-Duplex MIMO OFDM Systems

Omid Taghizadeh, Vimal Radhakrishnan, Ali Cagatay Cirik, Rudolf Mathar, Lutz Lampe
Subjects: Information Theory (cs.IT)

In this paper we address the linear precoding and decoding design problem for
a bidirectional orthogonal-frequency-division-multiplexing (OFDM) communication
system, between two multiple-input-multiple-output (MIMO) full-duplex (FD)
nodes. The effects of hardware distortion, leading to the residual
self-interference and inter-carrier leakage, as well as the channel state
information (CSI) error are taken into account. In the first step, the
operation of a FD MIMO OFDM transceiver is modeled where the explicit impact of
hardware inaccuracies on the inter-carrier leakage is observed. An alternating
quadratic convex program (AltQCP) is then provided to obtain a
minimum-mean-squared-error (MMSE) design for the defined system. Moreover,
taking into account the impacts of CSI inaccuracy, an alternating
semi-definite-program (AltSDP) is proposed to obtain a worst-case MMSE design
under a norm-bounded CSI error. The provided designs are also extended to
maximize the system sum rate, applying the weighted-MMSE (WMMSE) method. The
proposed AltQCP and AltSDP algorithms are based on alternating update of the
optimization variables, with a guaranteed convergence, where in each step the
sub-problem is solved to optimality. In order to provide further insights, the
computational complexity of the AltSDP algorithm is analytically obtained in
relation to the problem dimensions. Moreover, a methodology to obtain the least
favorable CSI error matrices is obtained, by transforming the resulting
non-convex quadratic problem into a convex problem. Finally, the proposed
methods are numerically evaluated in terms of the achievable system
performance, computational complexity, as well as the comparison to other
approaches in the literature. A significant gain is observed via the
application of the proposed methods as the hardware inaccuracy, and
consequently inter-carrier leakage, increases.

Wireless Communication using Unmanned Aerial Vehicles (UAVs): Optimal Transport Theory for Hover Time Optimization

Mohammad Mozaffari, Walid Saad, Mehdi Bennis, Merouane Debbah
Subjects: Information Theory (cs.IT)

In this paper, the effective use of flight-time constrained unmanned aerial
vehicles (UAVs) as flying base stations that can provide wireless service to
ground users is investigated. In particular, a novel framework for optimizing
the performance of such UAV-based wireless systems in terms of the average
number of bits (data service) transmitted to users as well as UAVs’ hover
duration (i.e. flight time) is proposed. In the considered model, UAVs hover
over a given geographical area to serve ground users that are distributed
within the area based on an arbitrary spatial distribution function. In this
case, two practical scenarios are considered. In the first scenario, based on
the maximum possible hover times of UAVs, the average data service delivered to
the users under a fair resource allocation scheme is maximized by finding the
optimal cell partitions associated to the UAVs. Using the mathematical
framework of optimal transport theory, a gradient-based algorithm is proposed
for optimally partitioning the geographical area based on the users’
distribution, hover times, and locations of the UAVs. In the second scenario,
given the load requirements of ground users, the minimum average hover time
that the UAVs need for completely servicing their ground users is derived. To
this end, first, an optimal bandwidth allocation scheme for serving the users
is proposed. Then, given this optimal bandwidth allocation, the optimal cell
partitions associated with the UAVs are derived by exploiting the optimal
transport theory. Results show that our proposed cell partitioning approach
leads to a significantly higher fairness among the users compared to the
classical weighted Voronoi diagram. In addition, our results reveal an inherent
tradeoff between the hover time of UAVs and bandwidth efficiency while serving
the ground users.

Network Coding Channel Virtualization Schemes for Satellite Multicast Communications

Samah A. M. Ghanem, Ala Eddine Gharsellaoui, Daniele Tarchi, Alessandro Vanelli-Coralli
Subjects: Information Theory (cs.IT)

In this paper, we propose two novel schemes to solve the problem of finding a
quasi-optimal number of coded packets to multicast to a set of independent
wireless receivers suffering different channel conditions. In particular, we
propose two network channel virtualization schemes that allow for representing
the set of intended receivers in a multicast group to be virtualized as one
receiver. Such approach allows for a transmission scheme not only adapted to
per-receiver channel variation over time, but to the network-virtualized
channel representing all receivers in the multicast group. The first scheme
capitalizes on a maximum erasure criterion introduced via the creation of a
virtual worst per receiver per slot reference channel of the network. The
second scheme capitalizes on a maximum completion time criterion by the use of
the worst performing receiver channel as a virtual reference to the network. We
apply such schemes to a GEO satellite scenario. We demonstrate the benefits of
the proposed schemes comparing them to a per-receiver point-to-point adaptive
strategy.

Energy Efficient Adaptive Network Coding Schemes for Satellite Communications

Ala Eddine Gharsellaoui, Samah A. M. Ghanem, Daniele Tarchi, Alessandro Vanelli Coralli
Comments: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 24 March 2017
Subjects: Information Theory (cs.IT)

In this paper, we propose novel energy efficient adaptive network coding and
modulation schemes for time variant channels. We evaluate such schemes under a
realistic channel model for open area environments and Geostationary Earth
Orbit (GEO) satellites. Compared to non-adaptive network coding and adaptive
rate efficient network-coded schemes for time variant channels, we show that
our proposed schemes, through physical layer awareness can be designed to
transmit only if a target quality of service (QoS) is achieved. As a result,
such schemes can provide remarkable energy savings.

Angle-Domain Doppler Pre-Compensation for High-Mobility OFDM Uplink with a Massive ULA

Wei Guo, Weile Zhang, Pengcheng Mu, Feifei Gao, Bobin Yao
Subjects: Information Theory (cs.IT)

In this paper, we propose a Doppler pre-compensation scheme for high-mobility
orthogonal frequency division multiplexing (OFDM) uplink, where a high-speed
terminal transmits signals to the base station (BS). Considering that the
time-varying multipath channel consists of multiple Doppler frequency offsets
(DFOs) with different angle of departures (AoDs), we propose to perform DFO
pre-compensation at the transmitter with a large-scale uniform linear array
(ULA). The transmitted signal passes through a beamforming network with
high-spatial resolution to produce multiple parallel branches. Each branch
transmits signal towards one direction thus it is affected by one dominant DFO
when passing over the time-varying channel. Therefore, we can compensate the
DFO for each branch at the transmitter previously. Theoretical analysis for the
Doppler spread of the equivalent uplink channel is also conducted. It is found
that when the number of transmit antennas is sufficiently large, the
time-variation of channel can be efficiently suppressed. Therefore, the
performance will not degrade significantly if applying the conventional
time-invariant channel estimation and equalization methods at the receiver.
Simulation results are provided to verify the proposed scheme.

Quantization Design and Channel Estimation for Massive MIMO Systems with One-Bit ADCs

Feiyu Wang, Jun Fang, Hongbin Li, Shaoqian Li
Subjects: Information Theory (cs.IT)

We consider the problem of channel estimation for uplink multiuser massive
MIMO systems, where, in order to significantly reduce the hardware cost and
power consumption, one-bit analog-to-digital converters (ADCs) are used at the
base station (BS) to quantize the received signal. Channel estimation for
one-bit massive MIMO systems is challenging due to the severe distortion caused
by the coarse quantization. It was shown in previous studies that an extremely
long training sequence is required to attain an acceptable performance. In this
paper, we study the problem of optimal one-bit quantization design for channel
estimation in one-bit massive MIMO systems. Our analysis reveals that, if the
quantization thresholds are optimally devised, using one-bit ADCs can achieve
an estimation error close to (with an increase by a factor of (pi/2)) that of
an ideal estimator which has access to the unquantized data. The optimal
quantization thresholds, however, are dependent on the unknown channel
parameters. To cope with this difficulty, we propose an adaptive quantization
(AQ) approach in which the thresholds are adaptively adjusted in a way such
that the thresholds converge to the optimal thresholds, and a random
quantization (RQ) scheme which randomly generate a set of nonidentical
thresholds based on some statistical prior knowledge of the channel. Simulation
results show that, our proposed AQ and RQ schemes, owing to their wisely
devised thresholds, present a significant performance improvement over the
conventional fixed quantization scheme that uses a fixed (typically zero)
threshold, and meanwhile achieve a substantial training overhead reduction for
channel estimation. In particular, even with a moderate number of pilot symbols
(about 5 times the number of users), the AQ scheme can provide an achievable
rate close to that of the perfect channel state information (CSI) case.

Adaptive Network Coding Schemes for Satellite Communications

Ala Eddine Gharsellaoui, Samah A. M. Ghanem, Daniele Tarchi, Alessandro Vanelli-Coralli
Comments: IEEE Advanced Satellite Multimedia Systems Conference and the 14th Signal Processing for Space Communications Workshop (ASMS/SPSC), 2016
Subjects: Information Theory (cs.IT)

In this paper, we propose two novel physical layer aware adaptive network
coding and coded modulation schemes for time variant channels. The proposed
schemes have been applied to different satellite communications scenarios with
different Round Trip Times (RTT). Compared to adaptive network coding, and
classical non-adaptive network coding schemes for time variant channels, as
benchmarks, the proposed schemes demonstrate that adaptation of packet
transmission based on the channel variation and corresponding erasures allows
for significant gains in terms of throughput, delay and energy efficiency. We
shed light on the trade-off between energy efficiency and delay-throughput
gains, demonstrating that conservative adaptive approaches that favors less
transmission under high erasures, might cause higher delay and less throughput
gains in comparison to non-conservative approaches that favor more transmission
to account for high erasures.

Capacity of the Gaussian Two-Pair Two-Way Relay Channel to Within 1/2 Bit

Xiaojun Yuan, Haiyang Xin, Soung-Chang Liew, Yong Li
Comments: 28 pages, 6 figures, journal
Subjects: Information Theory (cs.IT)

This paper studies the transceiver design of the Gaussian two-pair two-way
relay channel (TWRC), where two pairs of users exchange information through a
common relay in a pairwise manner. Our main contribution is to show that the
capacity of the Gaussian two-pair TWRC is achievable to within 1/2 bit for
arbitrary channel conditions. In the proof, we develop a hybrid coding scheme
involving Gaussian random coding, nested lattice coding, superposition coding,
and network-coded decoding. Further, we present a message-reassembling strategy
to decouple the coding design for the uplink and downlink phases, so as to
provide flexibility to fully exploit the channel randomness. Finally,
sophisticated power allocation at the relay is necessary to approach the
channel capacity under various channel conditions.

Advances in Detection and Error Correction for Coherent Optical Communications: Regular, Irregular, and Spatially Coupled LDPC Code Designs

Laurent Schmalen, Stephan ten Brink, Andreas Leven
Comments: “Enabling Technologies for High Spectral-efficiency Coherent Optical Communication Networks” edited by X. Zhou and C. Xie, John Wiley & Sons, Inc., April 2016
Subjects: Information Theory (cs.IT)

In this chapter, we show how the use of differential coding and the presence
of phase slips in the transmission channel affect the total achievable
information rates and capacity of a system. By means of the commonly used QPSK
modulation, we show that the use of differential coding does not decrease the
total amount of reliably conveyable information over the channel. It is a
common misconception that the use of differential coding introduces an
unavoidable differential loss. This perceived differential loss is rather a
consequence of simplified differential detection and decoding at the receiver.
Afterwards, we show how capacity-approaching coding schemes based on LDPC and
spatially coupled LDPC codes can be constructed by combining iterative
demodulation and decoding. For this, we first show how to modify the
differential decoder to account for phase slips and then how to use this
modified differential decoder to construct good LDPC codes. This construction
method can serve as a blueprint to construct good and practical LDPC codes for
other applications with iterative detection, such as higher order modulation
formats with non-square constellations, multi-dimensional optimized modulation
formats, turbo equalization to mitigate ISI (e.g., due to nonlinearities) and
many more. Finally, we introduce the class of spatially coupled (SC)-LDPC
codes, which are a generalization of LDPC codes with some outstanding
properties and which can be decoded with a very simple windowed decoder. We
show that the universal behavior of spatially coupled codes makes them an ideal
candidate for iterative differential demodulation/detection and decoding.

Robust Transceiver Design Based on Interference Alignment for Multi-User Multi-Cell MIMO Networks with Channel Uncertainty

Xianzhong Xie, Helin Yang, Athanasios V. Vasilakos
Comments: 12 pages, 8 figures
Subjects: Information Theory (cs.IT)

In this paper, we firstly exploit the inter-user interference (IUI) and
inter-cell interference (ICI) as useful references to develop a robust
transceiver design based on interference alignment for a downlink multi-user
multi-cell multiple-input multiple-output (MIMO) interference network under
channel estimation error. At transmitters, we propose a two-tier transmit
beamforming strategy, we first achieve the inner beamforming direction and
allocated power by minimizing the interference leakage as well as maximizing
the system energy efficiency, respectively. Then, for the outer beamformer
design, we develop an efficient conjugate gradient Grassmann manifold subspace
tracking algorithm to minimize the distances between the subspace spanned by
interference and the interference subspace in the time varying channel. At
receivers, we propose a practical interference alignment based on fast and
robust fast data projection method (FDPM) subspace tracking algorithm, to
achieve the receive beamformer under channel uncertainty. Numerical results
show that our proposed robust transceiver design achieves better performance
compared with some existing methods in terms of the sum rate and the energy
efficiency.

Massive MU-MIMO-OFDM Downlink with One-Bit DACs and Linear Precoding

Sven Jacobsson, Giuseppe Durisi, Mikael Coldrey, Christoph Studer
Subjects: Information Theory (cs.IT)

Massive multiuser (MU) multiple-input multiple-output (MIMO) is foreseen to
be a key technology in future wireless communication systems. In this paper, we
analyze the downlink performance of an orthogonal frequency division
multiplexing (OFDM)-based massive MU-MIMO system in which the base station (BS)
is equipped with 1-bit digital-to-analog converters (DACs). Using Bussgang’s
theorem, we characterize the performance achievable with linear precoders (such
as maximal-ratio transmission and zero forcing) in terms of bit error rate
(BER). Our analysis accounts for the possibility of oversampling the
time-domain transmit signal before the DACs. We further develop a lower bound
on the information-theoretic sum-rate throughput achievable with Gaussian
inputs.

Our results suggest that the performance achievable with 1-bit DACs in a
massive MU-MIMO-OFDM downlink are satisfactory provided that the number of BS
antennas is sufficiently large.

Energy-Efficient Mobile Cooperative Computing

Changsheng You, Kaibin Huang
Comments: Submitted to IEEE Globecom 2017
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

Scavenging the idling computation resources at the enormous number of mobile
devices will provide a new platform for implementing mobile cloud computing and
thereby alleviate traffic congestion in core networks. The vision can be
realized by mobile cooperative computing, referred to simply as co-computing.
In this paper, we consider a co-computing system where a user offloads
computation of input-data to a helper. To minimize the user’s energy
consumption, the helper controls the offloading process based on a predicted
CPU-idling profile, displaying time-varying spare computation resources. Assume
the helper has a large buffer for storing the offloaded data. The derived
solution for the optimal offloading control has an interesting graphical
interpretation as follows. In the plane of user’s co-computable bits (by
offloading) versus time, based on helper’s CPU-idling profile, a so called
offloading feasibility region can be constructed constraining the range of
offloaded bits at any time instant. Given the region, the optimal offloading is
shown to be achieved by the well-known “string-pulling” strategy, graphically
referring to pulling a string across the region. The solution approach is
further extended to the case where the helper has a small buffer, via
constructing an offloading feasibility tunnel based on the helper’s CPU-idling
profile and buffer size. Furthermore, we prove that the problem of optimal data
partitioning for offloading and local computing at the user is convex,
admitting a simple solution using the sub-gradient method.

Performance of Energy Harvesting Receivers with Power Optimization

Zhengwei Ni, Mehul Motani
Comments: 28 pages
Subjects: Information Theory (cs.IT)

The difficulty of modeling energy consumption in communication systems leads
to challenges in energy harvesting (EH) systems, in which nodes scavenge energy
from their environment. An EH receiver must harvest enough energy for
demodulating and decoding. The energy required depends upon factors, like code
rate and signal-to-noise ratio, which can be adjusted dynamically. We consider
a receiver which harvests energy from ambient sources and the transmitter,
meaning the received signal is used for both EH and information decoding.
Assuming a generalized function for energy consumption, we maximize the total
number of information bits decoded, under both average and peak power
constraints at the transmitter, by carefully optimizing the power used for EH,
power used for information transmission, fraction of time for EH, and code
rate. For transmission over a single block, we find there exist problem
parameters for which either maximizing power for information transmission or
maximizing power for EH is optimal. In the general case, the optimal solution
is a tradeoff of the two. For transmission over multiple blocks, we give an
upper bound on performance and give sufficient and necessary conditions to
achieve this bound. Finally, we give some numerical results to illustrate our
results and analysis.

RaPro: A Novel 5G Rapid Prototyping System Architecture

Xi Yang, Zhichao Huang, Bin Han, Senjie Zhang, Chao-Kai Wen, Feifei Gao, Shi Jin
Comments: accepted by IEEE Wireless Communication Letters
Subjects: Information Theory (cs.IT)

We propose a novel fifth-generation (5G) rapid prototyping (RaPro) system
architecture by combining FPGA-privileged modules from a software defined radio
(or FPGA-coprocessor) and high-level programming language for advanced
algorithms from multi-core general purpose processors. The proposed system
architecture exhibits excellent flexibility and scalability in the development
of a 5G prototyping system. As a proof of concept, a multi-user full-dimension
multiple-input and multiple-output system is established based on the proposed
architecture. Experimental results demonstrate the superiority of the proposed
architecture in large-scale antenna and wideband communication systems.

A Novel Experimental Platform for In-Vessel Multi-Chemical Molecular Communications

Nariman Farsad, David Pan, Andrea Goldsmith
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG)

This work presents a new multi-chemical experimental platform for molecular
communication where the transmitter can release different chemicals. This
platform is designed to be inexpensive and accessible, and it can be expanded
to simulate different environments including the cardiovascular system and
complex network of pipes in industrial complexes and city infrastructures. To
demonstrate the capabilities of the platform, we implement a time-slotted
binary communication system where a bit-0 is represented by an acid pulse, a
bit-1 by a base pulse, and information is carried via pH signals. The channel
model for this system, which is nonlinear and has long memories, is unknown.
Therefore, we devise novel detection algorithms that use techniques from
machine learning and deep learning to train a maximum-likelihood detector.
Using these algorithms the bit error rate improves by an order of magnitude
relative to the approach used in previous works. Moreover, our system achieves
a data rate that is an order of magnitude higher than any of the previous
molecular communication platforms.

Mean Square Error of Neural Spike Train Sequence Matching with Optogenetics

Adam Noel, Dimitrios Makrakis, Andrew W. Eckford
Comments: 6 pages, 5 figures. Submitted to IEEE Global Communications Conference (IEEE GLOBECOM 2017) in April 2017
Subjects: Neurons and Cognition (q-bio.NC); Information Theory (cs.IT); Biological Physics (physics.bio-ph)

Optogenetics is an emerging field of neuroscience where neurons are
genetically modified to express light-sensitive receptors that enable external
control over when the neurons fire. Given the prominence of neuronal signaling
within the brain and throughout the body, optogenetics has significant
potential to improve the understanding of the nervous system and to develop
treatments for neurological diseases. This paper uses a simple optogenetic
model to compare the timing distortion between a randomly-generated target
spike sequence and an externally-stimulated neuron spike sequence. The
distortion is measured by filtering each sequence and finding the mean square
error between the two filter outputs. The expected distortion is derived in
closed form when the target sequence generation rate is sufficiently low.
Derivations are verified via simulations.

Structure and Randomness of Continuous-Time Discrete-Event Processes

S. E. Marzen, J. P. Crutchfield
Comments: 10 pages, 2 figures; this http URL
Subjects: Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Statistics Theory (math.ST); Chaotic Dynamics (nlin.CD)

Loosely speaking, the Shannon entropy rate is used to gauge a stochastic
process’ intrinsic randomness; the statistical complexity gives the cost of
predicting the process. We calculate, for the first time, the entropy rate and
statistical complexity of stochastic processes generated by finite unifilar
hidden semi-Markov models—memoryful, state-dependent versions of renewal
processes. Calculating these quantities requires introducing novel mathematical
objects ({epsilon}-machines of hidden semi-Markov processes) and new
information-theoretic methods to stochastic processes.

Automaton model of protein: dynamics of conformational and functional states

Andrei Khrennikov, Ekaterina Yurova
Subjects: Biomolecules (q-bio.BM); Information Theory (cs.IT); Quantum Physics (quant-ph)

In this conceptual paper we propose to explore the analogy between
ontic/epistemic description of quantum phenomena and interrelation between
dynamics of conformational and functional states of proteins. Another new idea
is to apply theory of automata to model the latter dynamics. In our model
protein’s behavior is modeled with the aid of two dynamical systems, ontic and
epistemic, which describe evolution of conformational and functional states of
proteins, respectively. The epistemic automaton is constructed from the ontic
automaton on the basis of functional (observational) equivalence relation on
the space of ontic states. This reminds a few approaches to emergent quantum
mechanics in which a quantum (epistemic) state is treated as representing a
class of prequantum (ontic) states. This approach does not match to the
standard {it protein structure-function paradigm.} However, it is perfect for
modeling of behavior of intrinsically disordered proteins. Mathematically space
of protein’s ontic states (conformational states) is modeled with the aid of
(p)-adic numbers or more general ultrametric spaces encoding the internal
hierarchical structure of proteins. Connection with theory of (p)-adic
dynamical systems is briefly discussed.

Randomized detection and detection capacity of multidetector networks

Ghurumuruhan Ganesan
Subjects: Probability (math.PR); Information Theory (cs.IT)

In this paper, we study the following detection problem. There are (n)
detectors randomly placed in the unit square (S =
left[-frac{1}{2},frac{1}{2}
ight]^2) assigned to detect the presence of a
source located at the origin. Time is divided into slots of unit length and
(D_i(t) in {0,1}) represents the (random) decision of the (i^{
m th})
detector in time slot (t). The location of the source is unknown to the
detectors and the goal is to design schemes that use the decisions
({D_i(t)}_{i,t}) and detect the presence of the source in as short time as
possible.

We first determine the minimum achievable detection time (T_{cap}) and show
the existence of emph{randomized} detection schemes that have detection times
arbitrarily close to (T_{cap}) for almost all configuration of detectors,
provided the number of detectors (n) is sufficiently large. We call such
schemes as emph{capacity achieving} and completely characterize all capacity
achieving detection schemes.

On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation

Max Simchowitz, Ahmed El Alaoui, Benjamin Recht
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Combinatorics (math.CO); Machine Learning (stat.ML)

We prove a emph{query complexity} lower bound on rank-one principal
component analysis (PCA). We consider an oracle model where, given a symmetric
matrix (M in mathbb{R}^{d imes d}), an algorithm is allowed to make (T)
emph{exact} queries of the form (w^{(i)} = Mv^{(i)}) for (i in
{1,dots,T}), where (v^{(i)}) is drawn from a distribution which depends
arbitrarily on the past queries and measurements ({v^{(j)},w^{(j)}}_{1 le j
le i-1}). We show that for a small constant (epsilon), any adaptive,
randomized algorithm which can find a unit vector (widehat{v}) for which
(widehat{v}^{ op}Mwidehat{v} ge (1-epsilon)|M|), with even small
probability, must make (T = Omega(log d)) queries. In addition to settling a
widely-held folk conjecture, this bound demonstrates a fundamental gap between
convex optimization and “strict-saddle” non-convex optimization of which PCA is
a canonical example: in the former, first-order methods can have dimension-free
iteration complexity, whereas in PCA, the iteration complexity of
gradient-based methods must necessarily grow with the dimension. Our argument
proceeds via a reduction to estimating the rank-one spike in a deformed Wigner
model. We establish lower bounds for this model by developing a “truncated”
analogue of the (chi^2) Bayes-risk lower bound of Chen et al.