Brendan Cody-Kenny, Michael Fenton, Adrian Ronayne, Eoghan Considine, Thomas McGuire, Michael O'Neill
Comments: Submitted to the Search-Based Software Engineering (SBSE) track at the Genetic and Evolutionary Computation Conference (GECCO) 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
The primary aim of automated performance improvement is to reduce the running
time of programs while maintaining (or improving on) functionality. In this
paper, Genetic Programming is used to find performance improvements in regular
expressions for an array of target programs, representing the first application
of automated software improvement for run-time performance in the Regular
Expression language. This particular problem is interesting as there may be
many possible alternative regular expressions which perform the same task while
exhibiting subtle differences in performance. A benchmark suite of candidate
regular expressions is proposed for improvement. We show that the application
of Genetic Programming techniques can result in performance improvements in all
cases.
As we start evolution from a known good regular expression, diversity is
critical in escaping the local optima of the seed expression. In order to
understand diversity during evolution we compare an initial population
consisting of only seed programs with a population initialised using a
combination of a single seed individual with individuals generated using PI
Grow and Ramped-half-and-half initialisation mechanisms.
Mohsen Moradi
Comments: 5 pages, 6 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
In this study we determined neural network weights and biases by Imperialist
Competitive Algorithm (ICA) in order to train network for predicting earthquake
intensity in Richter. For this reason, we used dependent parameters like
earthquake occurrence time, epicenter’s latitude and longitude in degree, focal
depth in kilometer, and the seismological center distance from epicenter and
earthquake focal center in kilometer which has been provided by Berkeley data
base. The studied neural network has two hidden layer: its first layer has 16
neurons and the second layer has 24 neurons. By using ICA algorithm, average
error for testing data is 0.0007 with a variance equal to 0.318. The earthquake
prediction error in Richter by MSE criteria for ICA algorithm is 0.101, but by
using GA, the MSE value is 0.115.
Xiaojing Xu, Srinjoy Das, Ken Kreutz-Delgado
Comments: 8 pages, 7 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
Probabilistic generative neural networks are useful for many applications,
such as image classification, speech recognition and occlusion removal.
However, the power budget for hardware implementations of neural networks can
be extremely tight. To address this challenge we describe a design methodology
for using approximate computing methods to implement Approximate Deep Belief
Networks (ApproxDBNs) by systematically exploring the use of (1) limited
precision of variables; (2) criticality analysis to identify the nodes in the
network which can operate with such limited precision while allowing the
network to maintain target accuracy levels; and (3) a greedy search methodology
with incremental retraining to determine the optimal reduction in precision to
enable maximize power savings under user-specified accuracy constraints.
Experimental results show that significant bit-length reduction can be achieved
by our ApproxDBN with constrained accuracy loss.
Madhavun Candadai Vasu, Eduardo J. Izquierdo
Comments: Submitted to GECCO’17
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
Elucidating principles that underlie computation in neural networks is
currently a major research topic of interest in neuroscience. Transfer Entropy
(TE) is increasingly used as a tool to bridge the gap between network
structure, function, and behavior in fMRI studies. Computational models allow
us to bridge the gap even further by directly associating individual neuron
activity with behavior. However, most computational models that have analyzed
embodied behaviors have employed non-spiking neurons. On the other hand,
computational models that employ spiking neural networks tend to be restricted
to disembodied tasks. We show for the first time the artificial evolution and
TE-analysis of embodied spiking neural networks to perform a
cognitively-interesting behavior. Specifically, we evolved an agent controlled
by an Izhikevich neural network to perform a visual categorization task. The
smallest networks capable of performing the task were found by repeating
evolutionary runs with different network sizes. Informational analysis of the
best solution revealed task-specific TE-network clusters, suggesting that
within-task homogeneity and across-task heterogeneity were key to behavioral
success. Moreover, analysis of the ensemble of solutions revealed that
task-specificity of TE-network clusters correlated with fitness. This provides
an empirically testable hypothesis that links network structure to behavior.
Krishna Kumar Singh, Yong Jae Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose `Hide-and-Seek’, a weakly-supervised framework that aims to
improve object localization in images and action localization in videos. Most
existing weakly-supervised methods localize only the most discriminative parts
of an object rather than all relevant parts, which leads to suboptimal
performance. Our key idea is to hide patches in a training image randomly,
forcing the network to seek other relevant parts when the most discriminative
part is hidden. Our approach only needs to modify the input image and can work
with any network designed for object localization. During testing, we do not
need to hide any patches. Our Hide-and-Seek approach obtains superior
performance compared to previous methods for weakly-supervised object
localization on the ILSVRC dataset. We also demonstrate that our framework can
be easily extended to weakly-supervised action localization.
Xinlei Chen, Abhinav Gupta
Comments: Draft submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Modeling instance-level context and object-object relationships is extremely
challenging. It requires reasoning about bounding boxes of different classes,
locations etc. Above all, instance-level spatial reasoning inherently requires
modeling conditional distributions on previous detections. Unfortunately, our
current object detection systems do not have any {f memory} to remember what
to condition on! The state-of-the-art object detectors still detect all object
in parallel followed by non-maximal suppression (NMS). While memory has been
used for tasks such as captioning, they mostly use image-level memory cells
without capturing the spatial layout. On the other hand, modeling object-object
relationships requires {f spatial} reasoning — not only do we need a memory
to store the spatial layout, but also a effective reasoning module to extract
spatial patterns. This paper presents a conceptually simple yet powerful
solution — Spatial Memory Network (SMN), to model the instance-level context
efficiently and effectively. Our spatial memory essentially assembles object
instances back into a pseudo “image” representation that is easy to be fed into
another ConvNet for object-object context reasoning. This leads to a new
sequential reasoning architecture where image and memory are processed in
parallel to obtain detections which update the memory again. We show our SMN
direction is promising as it provides 2.2\% improvement over baseline Faster
RCNN on the COCO dataset so far.
Yichao Zhang, Silvia L. Pintea, Jan C. van Gemert
Comments: Accepted paper at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The ability to amplify or reduce subtle image changes over time is useful in
contexts such as video editing, medical video analysis, product quality control
and sports. In these contexts there is often large motion present which
severely distorts current video amplification methods that magnify change
linearly. In this work we propose a method to cope with large motions while
still magnifying small changes. We make the following two observations: i)
large motions are linear on the temporal scale of the small changes; ii) small
changes deviate from this linearity. We ignore linear motion and propose to
magnify acceleration. Our method is pure Eulerian and does not require any
optical flow, temporal alignment or region annotations. We link temporal
second-order derivative filtering to spatial acceleration magnification. We
apply our method to moving objects where we show motion magnification and color
magnification. We provide quantitative as well as qualitative evidence for our
method while comparing to the state-of-the-art.
Junyu Dong, Lina Wang, Jun Liu, Xin Sun
Comments: 9 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Procedural textures are normally generated from mathematical models with
parameters carefully selected by experienced users. However, for naive users,
the intuitive way to obtain a desired texture is to provide semantic
descriptions such as “regular,” “lacelike,” and “repetitive” and then a
procedural model with proper parameters will be automatically suggested to
generate the corresponding textures. By contrast, it is less practical for
users to learn mathematical models and tune parameters based on multiple
examinations of large numbers of generated textures. In this study, we propose
a novel framework that generates procedural textures according to user-defined
semantic descriptions, and we establish a mapping between procedural models and
semantic texture descriptions. First, based on a vocabulary of semantic
attributes collected from psychophysical experiments, a multi-label learning
method is employed to annotate a large number of textures with semantic
attributes to form a semantic procedural texture dataset. Then, we derive a low
dimensional semantic space in which the semantic descriptions can be separated
from one other. Finally, given a set of semantic descriptions, the diverse
properties of the samples in the semantic space can lead the framework to find
an appropriate generation model that uses appropriate parameters to produce a
desired texture. The experimental results show that the proposed framework is
effective and that the generated textures closely correlate with the input
semantic descriptions.
Devinder Kumar, Alexander Wong, Graham W. Taylor
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM)
In this work, we propose CLass-Enhanced Attentive Response (CLEAR): an
approach to visualize and understand the decisions made by deep neural networks
(DNNs) given a specific input. CLEAR facilitates the visualization of attentive
regions and levels of interest of DNNs during the decision-making process. It
also enables the visualization of the most dominant classes associated with
these attentive regions of interest. As such, CLEAR can mitigate some of the
shortcomings of heatmap-based methods associated with decision ambiguity, and
allows for better insights into the decision-making process of DNNs.
Quantitative and qualitative experiments across three different datasets
demonstrate the efficacy of CLEAR for gaining a better understanding of the
inner workings of DNNs during the decision-making process.
Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, Dimitris Samaras
Comments: CVPR 2017 oral
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Traditional face editing methods often require a number of sophisticated and
task specific algorithms to be applied one after the other — a process that
is tedious, fragile, and computationally intensive. In this paper, we propose
an end-to-end generative adversarial network that infers a face-specific
disentangled representation of intrinsic face properties, including shape (i.e.
normals), albedo, and lighting, and an alpha matte. We show that this network
can be trained on “in-the-wild” images by incorporating an in-network
physically-based image formation module and appropriate loss functions. Our
disentangling latent representation allows for semantically relevant edits,
where one aspect of facial appearance can be manipulated while keeping
orthogonal properties fixed, and we demonstrate its use for a number of facial
editing applications.
Cristóvão Cruz, Rakesh Mehta, Vladimir Katkovnik, Karen Egiazarian
Comments: Paper under revision on IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Single image super resolution (SISR) is an ill-posed problem aiming at
estimating plausible high resolution (HR) image from a single low resolution
(LR) image. Current state-of-the-art SISR methods are patch-based. They use
either external data or internal self-similarity to learn a prior for a HR
image. External data based methods utilize large number of patches from the
training data, while self-similarity based approaches use a highly relevant
matching patch from the input image as a prior. In this paper, we aim at
combining the ideas from both paradigms, i.e. we learn a prior for a patch
using a large number of patches collected from the input image. We show that
this results in a strong prior. The performance of the proposed algorithm,
which is based on iterative collaborative filtering with back-projection, is
evaluated on a number of benchmark super-resolution image datasets. Without
using any external data, the proposed approach outperforms the current non-CNN
based methods on tested standard datasets for various scaling factors. On
certain datasets a gain is over 1 dB compared to the recent method A+. For high
sampling rates (x4 and higher) the proposed method performs similar to very
recent state-of-the-art deep convolutional network-based approaches.
Alejandro Cartas, Juan Marín, Petia Radeva, Mariella Dimiccoli
Comments: To appear in the Proceedings of IbPRIA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognizing Activities of Daily Living (ADLs) has a large number of health
applications, such as characterize lifestyle for habit improvement, nursing and
rehabilitation services. Wearable cameras can daily gather large amounts of
image data that provide rich visual information about ADLs than using other
wearable sensors. In this paper, we explore the classification of ADLs from
images captured by low temporal resolution wearable camera (2fpm) by using a
Convolutional Neural Networks (CNN) approach. We show that the classification
accuracy of a CNN largely improves when its output is combined, through a
random decision forest, with contextual information from a fully connected
layer. The proposed method was tested on a subset of the NTCIR-12 egocentric
dataset, consisting of 18,674 images and achieved an overall accuracy of 86%
activity recognition on 21 classes.
Rui Huang, Shu Zhang, Tianyu Li, Ran He
Comments: 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, this problem is still challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by
simultaneously perceiving global structures and local details. Four landmark
located patch networks are proposed to attend to local textures in addition to
the commonly used global encoder-decoder network. Except for the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss and identity preserving loss.
The combined loss function leverages both frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Different from previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity preserving image for downstream
tasks like face recognition and attribution estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art results on large pose face recognition.
Prabuddha Chakraborty, Vinay P. Namboodiri
Comments: 11 pages, 8 figures, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we propose a technique for obtaining coarse pose estimation of
humans in an image that does not require any manual supervision. While a
general unsupervised technique would fail to estimate human pose, we suggest
that sufficient information about coarse pose can be obtained by observing
human motion in multiple frames. Specifically, we consider obtaining surrogate
supervision through videos as a means for obtaining motion based grouping cues.
We supplement the method using a basic object detector that detects persons.
With just these components we obtain a rough estimate of the human pose.
With these samples for training, we train a fully convolutional neural
network (FCNN)[20] to obtain accurate dense blob based pose estimation. We show
that the results obtained are close to the ground-truth and to the results
obtained using a fully supervised convolutional pose estimation method [31] as
evaluated on a challenging dataset [15]. This is further validated by
evaluating the obtained poses using a pose based action recognition method [5].
In this setting we outperform the results as obtained using the baseline method
that uses a fully supervised pose estimation algorithm and is competitive with
a new baseline created using convolutional pose estimation with full
supervision.
Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu
Comments: 5 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Discriminant Correlation Filters (DCF) based methods now become a kind of
dominant approach to online object tracking. The features used in these
methods, however, are either based on hand-crafted features like HoGs, or
convolutional features trained independently from other tasks like image
classification. In this work, we present an end-to-end lightweight network
architecture, namely DCFNet, to learn the convolutional features and perform
the correlation tracking process simultaneously. Specifically, we treat DCF as
a special correlation filter layer added in a Siamese network, and carefully
derive the backpropagation through it by defining the network output as the
probability heatmap of object location. Since the derivation is still carried
out in Fourier frequency domain, the efficiency property of DCF is preserved.
This enables our tracker to run at more than 60 FPS during test time, while
achieving a significant accuracy gain compared with KCF using HoGs. Extensive
evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the
proposed DCFNet tracker is competitive with several state-of-the-art trackers,
while being more compact and much faster.
Dino Ienco, Raffaele Gaetano, Claire Dupaquier, Pierre Maurel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Nowadays, modern earth observation programs produce huge volumes of satellite
images time series (SITS) that can be useful to monitor geographical areas
through time. How to efficiently analyze such kind of information is still an
open question in the remote sensing field. Recently, deep learning methods
proved suitable to deal with remote sensing data mainly for scene
classification (i.e. Convolutional Neural Networks – CNNs – on single images)
while only very few studies exist involving temporal deep learning approaches
(i.e Recurrent Neural Networks – RNNs) to deal with remote sensing time series.
In this letter we evaluate the ability of Recurrent Neural Networks, in
particular the Long-Short Term Memory (LSTM) model, to perform land cover
classification considering multi-temporal spatial data derived from a time
series of satellite images. We carried out experiments on two different
datasets considering both pixel-based and object-based classification. The
obtained results show that Recurrent Neural Networks are competitive compared
to state-of-the-art classifiers, and may outperform classical approaches in
presence of low represented and/or highly mixed classes. We also show that
using the alternative feature representation generated by LSTM can improve the
performances of standard classifiers.
Ge Gao, Mikko Lauri, Simone Frintrop
Comments: 7 pages, submitted to IROS2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a new saliency-guided method for generating supervoxels in 3D
space. Rather than using an evenly distributed spatial seeding procedure, our
method uses visual saliency to guide the process of supervoxel generation. This
results in densely distributed, small, and precise supervoxels in salient
regions which often contain objects, and larger supervoxels in less salient
regions that often correspond to background. Our approach largely improves the
quality of the resulting supervoxel segmentation in terms of boundary recall
and under-segmentation error on publicly available benchmarks.
Xin Tao, Chao Zhou, Xiaoyong Shen, Jue Wang, Jiaya Jia
Comments: 9 pages, submitted to conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we study an unconventional but practically meaningful
reversibility problem of commonly used image filters. We broadly define filters
as operations to smooth images or to produce layers via global or local
algorithms. And we raise the intriguingly problem if they are reservable to the
status before filtering. To answer it, we present a novel strategy to
understand general filter via contraction mappings on a metric space. A very
simple yet effective zero-order algorithm is proposed. It is able to
practically reverse most filters with low computational cost. We present quite
a few experiments in the paper and supplementary file to thoroughly verify its
performance. This method can also be generalized to solve other inverse
problems and enables new applications.
Maheen Rashid, Xiuye Gu, Yong Jae Lee
Comments: CVPR 2017 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a method for localizing facial keypoints on animals by
transferring knowledge gained from human faces. Instead of directly finetuning
a network trained to detect keypoints on human faces to animal faces (which is
sub-optimal since human and animal faces can look quite different), we propose
to first adapt the animal images to the pre-trained human detection network by
correcting for the differences in animal and human face shape. We first find
the nearest human neighbors for each animal image using an unsupervised shape
matching method. We use these matches to train a thin plate spline warping
network to warp each animal face to look more human-like. The warping network
is then jointly finetuned with a pre-trained human facial keypoint detection
network using an animal dataset. We demonstrate state-of-the-art results on
both horse and sheep facial keypoint detection, and significant improvement
over simple finetuning, especially when training data is scarce. Additionally,
we present a new dataset with 3717 images with horse face and facial keypoint
annotations.
Ju Yong Chang, Kyoung Mu Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This study considers the 3D human pose estimation problem in a single RGB
image by proposing a conditional random field (CRF) model over 2D poses, in
which the 3D pose is obtained as a byproduct of the inference process. The
unary term of the proposed CRF model is defined based on a powerful heat-map
regression network, which has been proposed for 2D human pose estimation. This
study also presents a regression network for lifting the 2D pose to 3D pose and
proposes the prior term based on the consistency between the estimated 3D pose
and the 2D pose. To obtain the approximate solution of the proposed CRF model,
the N-best strategy is adopted. The proposed inference algorithm can be viewed
as sequential processes of bottom-up generation of 2D and 3D pose proposals
from the input 2D image based on deep networks and top-down verification of
such proposals by checking their consistencies. To evaluate the proposed
method, we use two large-scale datasets: Human3.6M and HumanEva. Experimental
results show that the proposed method achieves the state-of-the-art 3D human
pose estimation performance.
Stephen Tierney, Yi Guo, Junbin Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we present Collaborative Low-Rank Subspace Clustering. Given
multiple observations of a phenomenon we learn a unified representation matrix.
This unified matrix incorporates the features from all the observations, thus
increasing the discriminative power compared with learning the representation
matrix on each observation separately. Experimental evaluation shows that our
method outperforms subspace clustering on separate observations and the state
of the art collaborative learning algorithm.
Stephen Tierney, Junbin Gao, Yi Guo, Zheng Zhang
Comments: arXiv admin note: substantial text overlap with arXiv:1601.00732
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In machine learning it is common to interpret each data point as a vector in
Euclidean space. However the data may actually be functional i.e. each data
point is a function of some variable such as time and the function is
discretely sampled. The naive treatment of functional data as traditional
multivariate data can lead to poor performance since the algorithms are
ignoring the correlation in the curvature of each function. In this paper we
propose a tractable method to cluster functional data or curves by adapting the
Euclidean Low-Rank Representation (LRR) to the curve manifold. Experimental
evaluation on synthetic and real data reveals that this method massively
outperforms prior clustering methods in both speed and accuracy when clustering
functional data.
Stephen Tierney, Yi Guo, Junbin Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sparse Subspace Clustering (SSC) has been used extensively for subspace
identification tasks due to its theoretical guarantees and relative ease of
implementation. However SSC has quadratic computation and memory requirements
with respect to the number of input data points. This burden has prohibited
SSCs use for all but the smallest datasets. To overcome this we propose a new
method, k-SSC, that screens out a large number of data points to both reduce
SSC to linear memory and computational requirements. We provide theoretical
analysis for the bounds of success for k-SSC. Our experiments show that k-SSC
exceeds theoretical expectations and outperforms existing SSC approximations by
maintaining the classification performance of SSC. Furthermore in the spirit of
reproducible research we have publicly released the source code for k-SSC
Giorgos Tolias, Ondřej Chum
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel concept of asymmetric feature maps (AFM), which allows to
evaluate multiple kernels between a query and database entries without
increasing the memory requirements. To demonstrate the advantages of the AFM
method, we derive a short vector image representation that, due to asymmetric
feature maps, supports efficient scale and translation invariant sketch-based
image retrieval. Unlike most of the short-code based retrieval systems, the
proposed method provides the query localization in the retrieved image. The
efficiency of the search is boosted by approximating a 2D translation search
via trigonometric polynomial of scores by 1D projections. The projections are a
special case of AFM. An order of magnitude speed-up is achieved compared to
traditional trigonometric polynomials. The results are boosted by an
image-based average query expansion, exceeding significantly the state of the
art on standard benchmarks.
Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Associating image regions with text queries has been recently explored as a
new way to bridge visual and linguistic representations. A few pioneering
approaches have been proposed based on recurrent neural language models trained
generatively (e.g., generating captions), but achieving somewhat limited
localization accuracy. To better address natural-language-based visual entity
localization, we propose a discriminative approach. We formulate a
discriminative bimodal neural network (DBNet), which can be trained by a
classifier with extensive use of negative samples. Our training objective
encourages better localization on single images, incorporates text phrases in a
broad range, and properly pairs image regions with text phrases into positive
and negative examples. Experiments on the Visual Genome dataset demonstrate the
proposed DBNet significantly outperforms previous state-of-the-art methods both
for localization on single images and for detection on multiple images. We we
also establish an evaluation protocol for natural-language visual detection.
Chong You, Daniel P. Robinson, René Vidal
Comments: 16 pages. CVPR 2017 spotlight oral presentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Many computer vision tasks involve processing large amounts of data
contaminated by outliers, which need to be detected and rejected. While outlier
detection methods based on robust statistics have existed for decades, only
recently have methods based on sparse and low-rank representation been
developed along with guarantees of correct outlier detection when the inliers
lie in one or more low-dimensional subspaces. This paper proposes a new outlier
detection method that combines tools from sparse representation with random
walks on a graph. By exploiting the property that data points can be expressed
as sparse linear combinations of each other, we obtain an asymmetric affinity
matrix among data points, which we use to construct a weighted directed graph.
By defining a suitable Markov Chain from this graph, we establish a connection
between inliers/outliers and essential/inessential states of the Markov chain,
which allows us to detect outliers by using random walks. We provide a
theoretical analysis that justifies the correctness of our method under
geometric and connectivity assumptions. Experimental results on image databases
demonstrate its superiority with respect to state-of-the-art sparse and
low-rank outlier detection methods.
Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
Comments: This work is accepted in CVPR 2017. The code and datasets are available on this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks have recently demonstrated high-quality
reconstruction for single-image super-resolution. In this paper, we propose the
Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively
reconstruct the sub-band residuals of high-resolution images. At each pyramid
level, our model takes coarse-resolution feature maps as input, predicts the
high-frequency residuals, and uses transposed convolutions for upsampling to
the finer level. Our method does not require the bicubic interpolation as the
pre-processing step and thus dramatically reduces the computational complexity.
We train the proposed LapSRN with deep supervision using a robust Charbonnier
loss function and achieve high-quality reconstruction. Furthermore, our network
generates multi-scale predictions in one feed-forward pass through the
progressive reconstruction, thereby facilitates resource-aware applications.
Extensive quantitative and qualitative evaluations on benchmark datasets show
that the proposed algorithm performs favorably against the state-of-the-art
methods in terms of speed and accuracy.
Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Image captioning is a challenging problem owing to the complexity in
understanding the image content and diverse ways of describing it in natural
language. Recent advances in deep neural networks have substantially improved
the performance of this task. Most state-of-the-art approaches follow an
encoder-decoder framework, which generates captions using a sequential
recurrent prediction model. However, in this paper, we introduce a novel
decision-making framework for image captioning. We utilize a “policy network”
and a “value network” to collaboratively generate captions. The policy network
serves as a local guidance by providing the confidence of predicting the next
word according to the current state. Additionally, the value network serves as
a global and lookahead guidance by evaluating all possible extensions of the
current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions.
We train both networks using an actor-critic reinforcement learning model, with
a novel reward defined by visual-semantic embedding. Extensive experiments and
analyses on the Microsoft COCO dataset show that the proposed framework
outperforms state-of-the-art approaches across different evaluation metrics.
Siddha Ganju, Olga Russakovsky, Abhinav Gupta
Comments: CVPR 2017 Spotlight paper and supplementary
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Collecting fully annotated image datasets is challenging and expensive. Many
types of weak supervision have been explored: weak manual annotations, web
search results, temporal continuity, ambient sound and others. We focus on one
particular unexplored mode: visual questions that are asked about images. The
key observation that inspires our work is that the question itself provides
useful information about the image (even without the answer being available).
For instance, the question “what is the breed of the dog?” informs the AI that
the animal in the scene is a dog and that there is only one dog present. We
make three contributions: (1) providing an extensive qualitative and
quantitative analysis of the information contained in human visual questions,
(2) proposing two simple but surprisingly effective modifications to the
standard visual question answering models that allow them to make use of weak
supervision in the form of unanswered questions associated with images and (3)
demonstrating that a simple data augmentation strategy inspired by our insights
results in a 7.1% improvement on the standard VQA benchmark.
Omar A. Elgendy, Stanley H. Chan
Comments: 11 pages main paper, and 8 pages supplementary
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Quanta Image Sensor (QIS) is a binary imaging device envisioned as a
candidate for the next generation image sensor after CCD and CMOS. Equipped
with a massive number of single photon detectors, the sensor has a threshold
(q) above which the number of arriving photons will trigger a binary response
“1”. Existing methods in the device literature typically assume that (q = 1)
for circuit simplicity. We argue that a spatially varying threshold can
significantly improve the signal to noise ratio of the reconstructed image. In
this paper, we present an optimal threshold design method. We make two
contributions. First, we derive a set of oracle threshold results to inform the
maximally achievable performance. We show that the oracle threshold should
match exactly with the underlying pixel intensity. Second, we show that around
the oracle threshold there exists a set of thresholds that give asymptotically
unbiased reconstructions. The asymptotic unbiasedness has a phase transition
behavior which allows us to develop a practical threshold update scheme using a
bisection method. Experimentally, the new threshold design method achieves
better rate of convergence than existing methods.
Sitao Xiang, Hao Li
Comments: 27 pages, 23 figures
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
As in many neural network architectures, the use of Batch Normalization (BN)
has become a common practice for Generative Adversarial Networks (GAN). In this
paper, we propose using Euclidean reconstruction error on a test set for
evaluating the quality of GANs. Under this measure, together with a careful
visual analysis of generated samples, we found that while being able to speed
training during early stages, BN may have negative effects on the quality of
the trained model and the stability of the training process. Furthermore,
Weight Normalization, a more recently proposed technique, is found to improve
the reconstruction, training speed and especially the stability of GANs, and
thus should be used in place of BN in GAN training.
Yurong You, Xinlei Pan, Ziyan Wang, Cewu Lu
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Reinforcement learning is considered as a promising direction for driving
policy learning. However, training autonomous driving vehicle with
reinforcement learning in real environment involves non-affordable
trial-and-error. It is more desirable to first train in a virtual environment
and then transfer to the real environment. In this paper, we propose a novel
realistic translation network to make model trained in virtual environment be
workable in real world. The proposed network can convert non-realistic virtual
image input into a realistic one with similar scene structure. Given realistic
frames as input, driving policy trained by reinforcement learning can nicely
adapt to real world driving. Experiments show that our proposed virtual to real
(VR) reinforcement learning (RL) works pretty well. To our knowledge, this is
the first successful case of driving policy trained by reinforcement learning
that can adapt to real world driving data.
Valentin Flunkert, David Salinas, Jan Gasthaus
Comments: Under review for ICML 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
A key enabler for optimizing business processes is accurately estimating the
probability distribution of a time series future given its past. Such
probabilistic forecasts are crucial for example for reducing excess inventory
in supply chains. In this paper we propose DeepAR, a novel methodology for
producing accurate probabilistic forecasts, based on training an
auto-regressive recurrent network model on a large number of related time
series. We show through extensive empirical evaluation on several real-world
forecasting data sets that our methodology is more accurate than
state-of-the-art models, while requiring minimal feature engineering.
Mieczysław Kłopotek
Comments: 70 pages, an internat intermediate research report, dating back to 1993
Subjects: Artificial Intelligence (cs.AI)
We develop our interpretation of the joint belief distribution and of
evidential updating that matches the following basic requirements:
* there must exist an efficient method for reasoning within this framework
* there must exist a clear correspondence between the contents of the
knowledge base and the real world
* there must be a clear correspondence between the reasoning method and some
real world process
* there must exist a clear correspondence between the results of the
reasoning process and the results of the real world process corresponding to
the reasoning process.
Yurong You, Xinlei Pan, Ziyan Wang, Cewu Lu
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Reinforcement learning is considered as a promising direction for driving
policy learning. However, training autonomous driving vehicle with
reinforcement learning in real environment involves non-affordable
trial-and-error. It is more desirable to first train in a virtual environment
and then transfer to the real environment. In this paper, we propose a novel
realistic translation network to make model trained in virtual environment be
workable in real world. The proposed network can convert non-realistic virtual
image input into a realistic one with similar scene structure. Given realistic
frames as input, driving policy trained by reinforcement learning can nicely
adapt to real world driving. Experiments show that our proposed virtual to real
(VR) reinforcement learning (RL) works pretty well. To our knowledge, this is
the first successful case of driving policy trained by reinforcement learning
that can adapt to real world driving data.
Devinder Kumar, Alexander Wong, Graham W. Taylor
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM)
In this work, we propose CLass-Enhanced Attentive Response (CLEAR): an
approach to visualize and understand the decisions made by deep neural networks
(DNNs) given a specific input. CLEAR facilitates the visualization of attentive
regions and levels of interest of DNNs during the decision-making process. It
also enables the visualization of the most dominant classes associated with
these attentive regions of interest. As such, CLEAR can mitigate some of the
shortcomings of heatmap-based methods associated with decision ambiguity, and
allows for better insights into the decision-making process of DNNs.
Quantitative and qualitative experiments across three different datasets
demonstrate the efficacy of CLEAR for gaining a better understanding of the
inner workings of DNNs during the decision-making process.
Jonas Adler, Ozan Öktem
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Functional Analysis (math.FA); Numerical Analysis (math.NA)
We propose a partially learned approach for the solution of ill posed inverse
problems with not necessarily linear forward operators. The method builds on
ideas from classical regularization theory and recent advances in deep learning
to perform learning while making use of prior information about the inverse
problem encoded in the forward operator, noise model and a regularizing
functional. The method results in a gradient-like iterative scheme, where the
“gradient” component is learned using a convolutional network that includes the
gradients of the data discrepancy and regularizer as input in each iteration.
We present results of such a partially learned gradient scheme on a
non-linear tomographic inversion problem with simulated data from both the
Sheep-Logan phantom as well as a head CT. The outcome is compared against FBP
and TV reconstruction and the proposed method provides a 5.4 dB PSNR
improvement over the TV reconstruction while being significantly faster, giving
reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.
Ying Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)
This paper considers a general data-fitting problem over a networked system,
in which many computing nodes are connected by an undirected graph. This kind
of problem can find many real-world applications and has been studied
extensively in the literature. However, existing solutions either need a
central controller for information sharing or requires slot synchronization
among different nodes, which increases the difficulty of practical
implementations, especially for a very large and heterogeneous system.
As a contrast, in this paper, we treat the data-fitting problem over the
network as a stochastic programming problem with many constraints. By adapting
the results in a recent paper, we design a fully distributed and asynchronized
stochastic gradient descent (SGD) algorithm. We show that our algorithm can
achieve global optimality and consensus asymptotically by only local
computations and communications. Additionally, we provide a sharp lower bound
for the convergence speed in the regular graph case. This result fits the
intuition and provides guidance to design a `good’ network topology to speed up
the convergence. Also, the merit of our design is validated by experiments on
both synthetic and real-world datasets.
Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Multi-armed bandits are a quintessential machine learning problem requiring
the balancing of exploration and exploitation. While there has been progress in
developing algorithms with strong theoretical guarantees, there has been less
focus on practical near-optimal finite-time performance. In this paper, we
propose an algorithm for Bayesian multi-armed bandits that utilizes
value-function-driven online planning techniques. Building on previous work on
UCB and Gittins index, we introduce linearly-separable value functions that
take both the expected return and the benefit of exploration into consideration
to perform n-step lookahead. The algorithm enjoys a sub-linear performance
guarantee and we present simulation results that confirm its strength in
problems with structured priors. The simplicity and generality of our approach
makes it a strong candidate for analyzing more complex multi-armed bandit
problems.
Zhou Ren, Xiaoyu Wang, Ning Zhang, Xutao Lv, Li-Jia Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Image captioning is a challenging problem owing to the complexity in
understanding the image content and diverse ways of describing it in natural
language. Recent advances in deep neural networks have substantially improved
the performance of this task. Most state-of-the-art approaches follow an
encoder-decoder framework, which generates captions using a sequential
recurrent prediction model. However, in this paper, we introduce a novel
decision-making framework for image captioning. We utilize a “policy network”
and a “value network” to collaboratively generate captions. The policy network
serves as a local guidance by providing the confidence of predicting the next
word according to the current state. Additionally, the value network serves as
a global and lookahead guidance by evaluating all possible extensions of the
current state. In essence, it adjusts the goal of predicting the correct words
towards the goal of generating captions similar to the ground truth captions.
We train both networks using an actor-critic reinforcement learning model, with
a novel reward defined by visual-semantic embedding. Extensive experiments and
analyses on the Microsoft COCO dataset show that the proposed framework
outperforms state-of-the-art approaches across different evaluation metrics.
Joel Mackenzie, J. Shane Culpepper, Roi Blanco, Matt Crane, Charles L. A. Clarke, Jimmy Lin
Subjects: Information Retrieval (cs.IR)
Scalable web search systems typically employ multi-stage retrieval
architectures, where an initial stage generates a set of candidate documents
that are then pruned and re-ranked. Since subsequent stages typically exploit a
multitude of features of varying costs using machine-learned models, reducing
the number of documents that are considered at each stage improves latency. In
this work, we propose and validate a unified framework that can be used to
predict a wide range of performance-sensitive parameters which minimize
effectiveness loss, while simultaneously minimizing query latency, across all
stages of a multi-stage search architecture. Furthermore, our framework can be
easily applied in large-scale IR systems, can be trained without explicitly
requiring relevance judgments, and can target a variety of different
efficiency-effectiveness trade-offs, making it well suited to a wide range of
search scenarios. Our results show that we can reliably predict a number of
different parameters on a per-query basis, while simultaneously detecting and
minimizing the likelihood of tail-latency queries that exceed a pre-specified
performance budget. As a proof of concept, we use the prediction framework to
help alleviate the problem of tail-latency queries in early stage retrieval. On
the standard ClueWeb09B collection and 31k queries, we show that our new hybrid
system can reliably achieve a maximum query time of 200 ms with a 99.99%
response time guarantee without a significant loss in overall effectiveness.
The solutions presented are practical, and can easily be used in large-scale
distributed search engine deployments with a small amount of additional
overhead.
Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
Subjects: Information Retrieval (cs.IR)
In order to adopt deep learning for ad-hoc information retrieval, it is
essential to establish suitable representations of query-document pairs and to
design neural architectures that are able to digest such representations. In
particular, they ought to capture all relevant information required to assess
the relevance of a document for a given user query, including term overlap as
well as positional information such as proximity and term dependencies. While
previous work has successfully captured unigram term matches, none has
successfully used position-dependent information on a standard benchmark test
collection. In this work, we address this gap by encoding the relevance
matching in terms of similarity matrices and using a deep model to digest such
matrices. We present a novel model architecture consisting of convolutional
layers to capture term dependencies and proximity among query term occurrences,
followed by a recurrent layer to capture relevance over different query terms.
Extensive experiments on TREC Web Track data confirm that the proposed model
with similarity matrix representations yields improved search results.
Wei-Ning Hsu, Yu Zhang, James Glass
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
An ability to model a generative process and learn a latent representation
for speech in an unsupervised fashion will be crucial to process vast
quantities of unlabelled speech data. Recently, deep probabilistic generative
models such as Variational Autoencoders (VAEs) have achieved tremendous success
in modeling natural images. In this paper, we apply a convolutional VAE to
model the generative process of natural speech. We derive latent space
arithmetic operations to disentangle learned latent representations. We
demonstrate the capability of our model to modify the phonetic content or the
speaker identity for speech segments using the derived operations, without the
need for parallel supervisory data.
Emiel van Miltenburg, Desmond Elliott
Comments: Submitted
Subjects: Computation and Language (cs.CL)
In recent years we have seen rapid and significant progress in automatic
image description but what are the open problems in this area? Most work has
been evaluated using text-based similarity metrics, which only indicate that
there have been improvements, without explaining what has improved. In this
paper, we present a detailed error analysis of the descriptions generated by a
state-of-the-art attention-based model. Our analysis operates on two levels:
first we check the descriptions for accuracy, and then we categorize the types
of errors we observe in the inaccurate descriptions. We find only 20% of the
descriptions are free from errors, and surprisingly that 26% are unrelated to
the image. Finally, we manually correct the most frequently occurring error
types (e.g. gender identification) to estimate the performance reward for
addressing these errors, observing gains of 0.2–1 BLEU point per type.
Holger Schwenk, Ke Tran, Orhan Firat, Matthijs Douze
Comments: 8 pages, 3 figures
Subjects: Computation and Language (cs.CL)
In this paper, we use the framework of neural machine translation to learn
joint sentence representations across different languages. Our hope is that a
representation which is independent of the language a sentence is written in,
is likely to capture the underlying semantics. We search and compare more than
1.4M sentence representations in three different languages and study the
characteristics of close sentences. We provide experimental evidence that
sentences that are close in embedding space are indeed semantically highly
related, but often have quite different structure and syntax. These relations
also hold when comparing sentences in different languages.
Chloé Braud, Ophélie Lacroix, Anders Søgaard
Comments: To appear in Proceedings of ACL 2017
Subjects: Computation and Language (cs.CL)
Discourse segmentation is a crucial step in building end-to-end discourse
parsers. However, discourse segmenters only exist for a few languages and
domains. Typically they only detect intra-sentential segment boundaries,
assuming gold standard sentence and token segmentation, and relying on
high-quality syntactic parses and rich heuristics that are not generally
available across languages and domains. In this paper, we propose statistical
discourse segmenters for five languages and three domains that do not rely on
gold pre-annotations. We also consider the problem of learning discourse
segmenters when no labeled data is available for a language. Our fully
supervised system obtains 89.5% F1 for English newswire, with slight drops in
performance on other domains, and we report supervised and unsupervised
(cross-lingual) results for five languages in total.
Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Subjects: Computation and Language (cs.CL)
We propose a simple yet effective text- based user geolocation model based on
a neural network with one hidden layer, which achieves state of the art
performance over three Twitter benchmark geolocation datasets, in addition to
producing word and phrase embeddings in the hidden layer that we show to be
useful for detecting dialectal terms. As part of our analysis of dialectal
terms, we release DAREDS, a dataset for evaluating dialect term detection
methods.
Tom Ouyang, David Rybach, Françoise Beaufays, Michael Riley
Subjects: Computation and Language (cs.CL)
We propose a finite-state transducer (FST) representation for the models used
to decode keyboard inputs on mobile devices. Drawing from learnings from the
field of speech recognition, we describe a decoding framework that can satisfy
the strict memory and latency constraints of keyboard input. We extend this
framework to support functionalities typically not present in speech
recognition, such as literal decoding, autocorrections, word completions, and
next word predictions.
We describe the general framework of what we call for short the keyboard “FST
decoder” as well as the implementation details that are new compared to a
speech FST decoder. We demonstrate that the FST decoder enables new UX features
such as post-corrections. Finally, we sketch how this decoder can support
advanced features such as personalization and contextualization.
Nobuhiro Kaji, Hayato Kobayashi
Subjects: Computation and Language (cs.CL)
This paper explores an incremental training strategy for the skip-gram model
with negative sampling (SGNS) from both empirical and theoretical perspectives.
Existing methods of neural word embeddings, including SNGS, are multi-pass
algorithms and thus cannot perform incremental model update. To address this
problem, we present a simple incremental extension of SNGS and provide a
thorough theoretical analysis to demonstrate its validity. Empirical
experiments demonstrated the correctness of the theoretical analysis as well as
the practical usefulness of the incremental algorithm.
Vishal Sharma, Kathiravan Srinivasan, Dushantha Nalin K. Jayakody, Omer Rana, Ravinder Kumar
Comments: 7 pages, 4 Figures, International Conference on Communication, Management and Information Technology (ICCMIT 2017), At Warsaw, Poland, 3-5 April 2017, this http URL (Best Paper Award)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Computational resource provisioning that is closer to a user is becoming
increasingly important, with a rise in the number of devices making continuous
service requests and with the significant recent take up of latency-sensitive
applications, such as streaming and real-time data processing. Fog computing
provides a solution to such types of applications by bridging the gap between
the user and public/private cloud infrastructure via the inclusion of a “fog”
layer. Such approach is capable of reducing the overall processing latency, but
the issues of redundancy, cost-effectiveness in utilizing such computing
infrastructure and handling services on the basis of a difference in their
characteristics remain. This difference in characteristics of services because
of variations in the requirement of computational resources and processes is
termed as service heterogeneity. A potential solution to these issues is the
use of Osmotic Computing — a recently introduced paradigm that allows division
of services on the basis of their resource usage, based on parameters such as
energy, load, processing time on a data center vs. a network edge resource.
Service provisioning can then be divided across different layers of a
computational infrastructure, from edge devices, in-transit nodes, and a data
center, and supported through an Osmotic software layer. In this paper, a
fitness-based Osmosis algorithm is proposed to provide support for osmotic
computing by making more effective use of existing Fog server resources. The
proposed approach is capable of efficiently distributing and allocating
services by following the principle of osmosis. The results are presented using
numerical simulations demonstrating gains in terms of lower allocation time and
a higher probability of services being handled with high resource utilization.
Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
Comments: 49 pages
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We develop a novel family of algorithms for the online learning setting with
regret against any data sequence bounded by the empirical Rademacher complexity
of that sequence. To develop a general theory of when this type of adaptive
regret bound is achievable we establish a connection to the theory of
decoupling inequalities for martingales in Banach spaces. When the hypothesis
class is a set of linear functions bounded in some norm, such a regret bound is
achievable if and only if the norm satisfies certain decoupling inequalities
for martingales. Donald Burkholder’s celebrated geometric characterization of
decoupling inequalities (1984) states that such an inequality holds if and only
if there exists a special function called a Burkholder function satisfying
certain restricted concavity properties. Our online learning algorithms are
efficient in terms of queries to this function.
We realize our general theory by giving novel efficient algorithms for
classes including lp norms, Schatten p-norms, group norms, and reproducing
kernel Hilbert spaces. The empirical Rademacher complexity regret bound implies
— when used in the i.i.d. setting — a data-dependent complexity bound for
excess risk after online-to-batch conversion. To showcase the power of the
empirical Rademacher complexity regret bound, we derive improved rates for a
supervised learning generalization of the online learning with low rank experts
task and for the online matrix prediction task.
In addition to obtaining tight data-dependent regret bounds, our algorithms
enjoy improved efficiency over previous techniques based on Rademacher
complexity, automatically work in the infinite horizon setting, and are
scale-free. To obtain such adaptive methods, we introduce novel machinery, and
the resulting algorithms are not based on the standard tools of online convex
optimization.
Ying Zhang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)
This paper considers a general data-fitting problem over a networked system,
in which many computing nodes are connected by an undirected graph. This kind
of problem can find many real-world applications and has been studied
extensively in the literature. However, existing solutions either need a
central controller for information sharing or requires slot synchronization
among different nodes, which increases the difficulty of practical
implementations, especially for a very large and heterogeneous system.
As a contrast, in this paper, we treat the data-fitting problem over the
network as a stochastic programming problem with many constraints. By adapting
the results in a recent paper, we design a fully distributed and asynchronized
stochastic gradient descent (SGD) algorithm. We show that our algorithm can
achieve global optimality and consensus asymptotically by only local
computations and communications. Additionally, we provide a sharp lower bound
for the convergence speed in the regular graph case. This result fits the
intuition and provides guidance to design a `good’ network topology to speed up
the convergence. Also, the merit of our design is validated by experiments on
both synthetic and real-world datasets.
Jian Du, Shaodan Ma, Yik-Chung Wu, Soummya Kar, José M. F. Moura
Comments: arXiv admin note: substantial text overlap with arXiv:1611.02010
Subjects: Learning (cs.LG)
Gaussian belief propagation (BP) has been widely used for distributed
estimation in large-scale networks such as the smart grid, communication
networks, and social networks, where local measurements/observations are
scattered over a wide geographical area. However, the convergence of Gaus- sian
BP is still an open issue. In this paper, we consider the convergence of
Gaussian BP, focusing in particular on the convergence of the information
matrix. We show analytically that the exchanged message information matrix
converges for arbitrary positive semidefinite initial value, and its dis- tance
to the unique positive definite limit matrix decreases exponentially fast.
Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Multi-armed bandits are a quintessential machine learning problem requiring
the balancing of exploration and exploitation. While there has been progress in
developing algorithms with strong theoretical guarantees, there has been less
focus on practical near-optimal finite-time performance. In this paper, we
propose an algorithm for Bayesian multi-armed bandits that utilizes
value-function-driven online planning techniques. Building on previous work on
UCB and Gittins index, we introduce linearly-separable value functions that
take both the expected return and the benefit of exploration into consideration
to perform n-step lookahead. The algorithm enjoys a sub-linear performance
guarantee and we present simulation results that confirm its strength in
problems with structured priors. The simplicity and generality of our approach
makes it a strong candidate for analyzing more complex multi-armed bandit
problems.
Wei-Ning Hsu, Yu Zhang, James Glass
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
An ability to model a generative process and learn a latent representation
for speech in an unsupervised fashion will be crucial to process vast
quantities of unlabelled speech data. Recently, deep probabilistic generative
models such as Variational Autoencoders (VAEs) have achieved tremendous success
in modeling natural images. In this paper, we apply a convolutional VAE to
model the generative process of natural speech. We derive latent space
arithmetic operations to disentangle learned latent representations. We
demonstrate the capability of our model to modify the phonetic content or the
speaker identity for speech segments using the derived operations, without the
need for parallel supervisory data.
Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (math.NA)
Understanding the singular value spectrum of a matrix (A in mathbb{R}^{n
imes n}) is a fundamental task in countless applications. In matrix
multiplication time, it is possible to perform a full SVD and directly compute
the singular values (sigma_1,…,sigma_n) in (n^omega) time. However, little
is known about algorithms that break this runtime barrier.
Using tools from stochastic trace estimation, polynomial approximation, and
fast system solvers, we show how to efficiently isolate different ranges of
(A)’s spectrum and approximate the number of singular values in these ranges.
We thus effectively compute a histogram of the spectrum, which can stand in for
the true singular values in many applications.
We use this primitive to give the first algorithms for approximating a wide
class of symmetric matrix norms in faster than matrix multiplication time. For
example, we give a ((1 + epsilon)) approximation algorithm for the Schatten
(1)-norm (the nuclear norm) running in just ( ilde O((nnz(A)n^{1/3} +
n^2)epsilon^{-3})) time for (A) with uniform row sparsity or ( ilde
O(n^{2.18} epsilon^{-3})) time for dense (A). The runtime scales smoothly for
general Schatten-(p) norms, notably becoming ( ilde O (p cdot nnz(A)
epsilon^{-3})) for any (p ge 2).
At the same time, we show that the complexity of spectrum approximation is
inherently tied to fast matrix multiplication in the small (epsilon) regime.
We prove that achieving milder (epsilon) dependencies in our algorithms would
imply faster than matrix multiplication time triangle detection for general
graphs. This further implies that highly accurate algorithms running in
subcubic time yield subcubic time matrix multiplication. As an application of
our bounds, we show that precisely computing all effective resistances in a
graph in less than matrix multiplication time is likely difficult, barring a
major breakthrough.
Devinder Kumar, Alexander Wong, Graham W. Taylor
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Multimedia (cs.MM)
In this work, we propose CLass-Enhanced Attentive Response (CLEAR): an
approach to visualize and understand the decisions made by deep neural networks
(DNNs) given a specific input. CLEAR facilitates the visualization of attentive
regions and levels of interest of DNNs during the decision-making process. It
also enables the visualization of the most dominant classes associated with
these attentive regions of interest. As such, CLEAR can mitigate some of the
shortcomings of heatmap-based methods associated with decision ambiguity, and
allows for better insights into the decision-making process of DNNs.
Quantitative and qualitative experiments across three different datasets
demonstrate the efficacy of CLEAR for gaining a better understanding of the
inner workings of DNNs during the decision-making process.
Valentin Flunkert, David Salinas, Jan Gasthaus
Comments: Under review for ICML 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
A key enabler for optimizing business processes is accurately estimating the
probability distribution of a time series future given its past. Such
probabilistic forecasts are crucial for example for reducing excess inventory
in supply chains. In this paper we propose DeepAR, a novel methodology for
producing accurate probabilistic forecasts, based on training an
auto-regressive recurrent network model on a large number of related time
series. We show through extensive empirical evaluation on several real-world
forecasting data sets that our methodology is more accurate than
state-of-the-art models, while requiring minimal feature engineering.
Mohsen Moradi
Comments: 5 pages, 6 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
In this study we determined neural network weights and biases by Imperialist
Competitive Algorithm (ICA) in order to train network for predicting earthquake
intensity in Richter. For this reason, we used dependent parameters like
earthquake occurrence time, epicenter’s latitude and longitude in degree, focal
depth in kilometer, and the seismological center distance from epicenter and
earthquake focal center in kilometer which has been provided by Berkeley data
base. The studied neural network has two hidden layer: its first layer has 16
neurons and the second layer has 24 neurons. By using ICA algorithm, average
error for testing data is 0.0007 with a variance equal to 0.318. The earthquake
prediction error in Richter by MSE criteria for ICA algorithm is 0.101, but by
using GA, the MSE value is 0.115.
Dino Ienco, Raffaele Gaetano, Claire Dupaquier, Pierre Maurel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Nowadays, modern earth observation programs produce huge volumes of satellite
images time series (SITS) that can be useful to monitor geographical areas
through time. How to efficiently analyze such kind of information is still an
open question in the remote sensing field. Recently, deep learning methods
proved suitable to deal with remote sensing data mainly for scene
classification (i.e. Convolutional Neural Networks – CNNs – on single images)
while only very few studies exist involving temporal deep learning approaches
(i.e Recurrent Neural Networks – RNNs) to deal with remote sensing time series.
In this letter we evaluate the ability of Recurrent Neural Networks, in
particular the Long-Short Term Memory (LSTM) model, to perform land cover
classification considering multi-temporal spatial data derived from a time
series of satellite images. We carried out experiments on two different
datasets considering both pixel-based and object-based classification. The
obtained results show that Recurrent Neural Networks are competitive compared
to state-of-the-art classifiers, and may outperform classical approaches in
presence of low represented and/or highly mixed classes. We also show that
using the alternative feature representation generated by LSTM can improve the
performances of standard classifiers.
Lin Ma, Caifa Zhou, Xi Liu, Yubin Xu
Comments: 3 figures, from Journal of Harbin Institute of Technology
Journal-ref: Journal of Harbin Institute of Technology, 20(3), pp.119–123
(2013)
Subjects: Methodology (stat.ME); Learning (cs.LG); Machine Learning (stat.ML)
Recently manifold learning algorithm for dimensionality reduction attracts
more and more interests, and various linear and nonlinear, global and local
algorithms are proposed. The key step of manifold learning algorithm is the
neighboring region selection. However, so far for the references we know, few
of which propose a generally accepted algorithm to well select the neighboring
region. So in this paper, we propose an adaptive neighboring selection
algorithm, which successfully applies the LLE and ISOMAP algorithms in the
test. It is an algorithm that can find the optimal K nearest neighbors of the
data points on the manifold. And the theoretical basis of the algorithm is the
approximated curvature of the data point on the manifold. Based on Riemann
Geometry, Jacob matrix is a proper mathematical concept to predict the
approximated curvature. By verifying the proposed algorithm on embedding Swiss
roll from R3 to R2 based on LLE and ISOMAP algorithm, the simulation results
show that the proposed adaptive neighboring selection algorithm is feasible and
able to find the optimal value of K, making the residual variance relatively
small and better visualization of the results. By quantitative analysis, the
embedding quality measured by residual variance is increased 45.45% after using
the proposed algorithm in LLE.
Vladimir Golkov, Marcin J. Skwark, Atanas Mirchev, Georgi Dikov, Alexander R. Geanes, Jeffrey Mendenhall, Jens Meiler, Daniel Cremers
Subjects: Biomolecules (q-bio.BM); Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Predicting the biological function of molecules, be it proteins or drug-like
compounds, from their atomic structure is an important and long-standing
problem. Function is dictated by structure, since it is by spatial interactions
that molecules interact with each other, both in terms of steric
complementarity, as well as intermolecular forces. Thus, the electron density
field and electrostatic potential field of a molecule contain the “raw
fingerprint” of how this molecule can fit to binding partners. In this paper,
we show that deep learning can predict biological function of molecules
directly from their raw 3D approximated electron density and electrostatic
potential fields. Protein function based on EC numbers is predicted from the
approximated electron density field. In another experiment, the activity of
small molecules is predicted with quality comparable to state-of-the-art
descriptor-based methods. We propose several alternative computational models
for the GPU with different memory and runtime requirements for different sizes
of molecules and of databases. We also propose application-specific
multi-channel data representations. With future improvements of training
datasets and neural network settings in combination with complementary
information sources (sequence, genomic context, expression level), deep
learning can be expected to show its generalization power and revolutionize the
field of molecular function prediction.
Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Shin Ishii
Comments: Extended version of the paper published at ICLR2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose a new regularization method based on virtual adversarial loss: a
new measure of local smoothness of the output distribution. Virtual adversarial
loss is defined as the robustness of the model’s posterior distribution against
local perturbation around each input data point. Our method is similar to
adversarial training, but differs from adversarial training in that it
determines the adversarial direction based only on the output distribution and
that it is applicable to a semi-supervised setting. Because the directions in
which we smooth the model are virtually adversarial, we call our method virtual
adversarial training (VAT). The computational cost of VAT is relatively low.
For neural networks, the approximated gradient of virtual adversarial loss can
be computed with no more than two pairs of forward and backpropagations. In our
experiments, we applied VAT to supervised and semi-supervised learning on
multiple benchmark datasets. With additional improvement based on entropy
minimization principle, our VAT achieves the state-of-the-art performance on
SVHN and CIFAR-10 for semi-supervised learning tasks.
Sitao Xiang, Hao Li
Comments: 27 pages, 23 figures
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
As in many neural network architectures, the use of Batch Normalization (BN)
has become a common practice for Generative Adversarial Networks (GAN). In this
paper, we propose using Euclidean reconstruction error on a test set for
evaluating the quality of GANs. Under this measure, together with a careful
visual analysis of generated samples, we found that while being able to speed
training during early stages, BN may have negative effects on the quality of
the trained model and the stability of the training process. Furthermore,
Weight Normalization, a more recently proposed technique, is found to improve
the reconstruction, training speed and especially the stability of GANs, and
thus should be used in place of BN in GAN training.
Clément Godard, Oisin Mac Aodha, Gabriel J. Brostow
Comments: CVPR 2017 oral
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Learning based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Just
recording quality depth data in a range of environments is a challenging
problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state of the art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.
Peter Jung, Felix Krahmer, Dominik Stöger
Comments: 49 pages, 1 figure
Subjects: Information Theory (cs.IT)
We consider simultaneous blind deconvolution of r source signals from its
noisy superposition, a problem also referred to blind demixing and
deconvolution. This signal processing problem occurs in the context of the
Internet of Things where a massive number of sensors sporadically communicate
only short messages over unknown channels. We show that robust recovery of
message and channel vectors can be achieved via convex optimization when random
linear encoding using i.i.d. complex Gaussian matrices is used at the devices
and the number of required measurements at the receiver scales with the degrees
of freedom of the overall estimation problem. Since the scaling is linear in r
our result significantly improves over recent works.
Jean Barbier, Nicolas Macris
Comments: Presented at the International Symposium on Information Theory (ISIT) 2017, Aachen, Germany
Subjects: Information Theory (cs.IT); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Consider random linear estimation with Gaussian measurement matrices and
noise. One can compute infinitesimal variations of the mutual information under
infinitesimal variations of the signal-to-noise ratio or of the measurement
rate. We discuss how each variation is related to the minimum mean-square error
and deduce that the two variations are directly connected through a very simple
identity. The main technical ingredient is a new interpolation method called
“sub-extensive interpolation method”. We use it to provide a new proof of an
I-MMSE relation recently found by Reeves and Pfister [1] when the measurement
rate is varied. Our proof makes it clear that this relation is intimately
related to another I-MMSE relation also recently proved in [2]. One can
directly verify that the identity relating the two types of variation of mutual
information is indeed consistent with the one letter replica symmetric formula
for the mutual information, first derived by Tanaka [3] for binary signals, and
recently proved in more generality in [1,2,4,5] (by independent methods).
However our proof is independent of any knowledge of Tanaka’s formula.
Roy Yates, Elie Najm, Emina Soljanin, Jing Zhong
Subjects: Information Theory (cs.IT)
Using an age of information (AoI) metric, we examine the transmission of
coded updates through a binary erasure channel to a monitor/receiver. We start
by deriving the average status update age of an infinite incremental redundancy
(IIR) system in which the transmission of a k-symbol update continuesuntil k
symbols are received. This system is then compared to a fixed redundancy (FR)
system in which each update is transmitted as an n symbol packet and the packet
is successfully received if and only if at least k symbols are received. If
fewer than k symbols are received, the update is discarded. Unlike the IIR
system, the FR system requires no feedback from the receiver. For a single
monitor system, we show that tuning the redundancy to the symbol erasure rate
enables the FR system to perform as well as the IIR system. As the number of
monitors is increased, the FR system outperforms the IIR system that guarantees
delivery of all updates to all monitors.
Ali H. Abdollahi Bafghi, Mahtab Mirmohseni, Mohammad Reza Aref
Subjects: Information Theory (cs.IT)
We study the problem of joint information and energy transfer in a two-hop
channel with a Radio frequency (RF) energy harvesting relay. We consider a
finite battery size at the relay and deterministic energy loss in transmitting
energy. In other words, to be able to send an energy-contained symbol, the
relay must receive multiple energy-contained symbols. Thus, we face a kind of
channel with memory. We model the energy saved in battery as channel state with
the challenge that the receiver does not know the channel state. First, we
consider the problem without any channel noise and derive an achievable rate.
Next, we extend the results to the case with an independent and identically
distributed noise in the second hop (the relay-receiver link).
Lin Sok, Minjia Shi, Patrick Solé
Comments: This paper was presented in part at the International Conference on Coding, Cryptography and Related Topics April 7-10, 2017, Shandong, China
Subjects: Information Theory (cs.IT)
In this paper, we prove existence of optimal complementary dual codes (LCD
codes) over large finite fields. We also give methods to generate orthogonal
matrices over finite fields and then apply them to construct LCD codes.
Construction methods include random sampling in the orthogonal group, code
extension, matrix product codes and projection over a self-dual basis.
Ragnar Freij-Hollanti, Camilla Hollanti, Thomas Westerbäck
Comments: A Springer book will be published later this year as a final outcome of COST Action IC1104 “Random Network Coding and Designs over GF(q)”. This tutorial paper is one of the book chapters
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
Recent research on distributed storage systems (DSSs) has revealed
interesting connections between matroid theory and locally repairable codes
(LRCs). The goal of this chapter is to introduce the reader to matroids and
polymatroids, and illustrate their relation to distribute storage systems.
While many of the results are rather technical in nature, effort is made to
increase accessibility via simple examples. The chapter embeds all the
essential features of LRCs, namely locality, availability, and hierarchy
alongside with related generalised Singleton bounds.
Elie Najm, Roy Yates, Emina Soljanin
Subjects: Information Theory (cs.IT)
We consider a system where randomly generated updates are to be transmitted
to a monitor, but only a single update can be in the transmission service at a
time. Therefore, the source has to prioritize between the two possible
transmission policies: preempting the current update or discarding the new one.
We consider Poisson arrivals and general service time, and refer to this system
as the M/G/1/1 queue. We start by studying the average status update age and
the optimal update arrival rate for these two schemes under general service
time distribution. We then apply these results on two practical scenarios in
which updates are sent through an erasure channel using (a) an infinite
incremental redundancy (IIR) HARQ system and (b) a fixed redundancy (FR) HARQ
system. We show that in both schemes the best strategy would be not to preempt.
Moreover, we also prove that, from an age point of view, IIR is better than FR.
Wayan Wicke, Arman Ahmadzadeh, Vahid Jamali, Harald Unterweger, Christoph Alexiou, Robert Schober
Comments: 6 pages, 4 figures, 1 table. Submitted to the 4th ACM International Conference on Nanoscale Computing and Communication September 2017 (ACM NANOCOM 2017)
Subjects: Emerging Technologies (cs.ET); Information Theory (cs.IT)
In this paper, we propose to use magnetic nanoparticles as information
carriers for molecular communication. This enables the use of an external
magnetic field to guide information-carrying particles towards the receiver. We
show that the particle movement can be mathematically modeled as diffusion with
drift. Thereby, we reveal that the key parameters determining the magnetic
force are particle size and magnetic field gradient. As an example, we consider
magnetic nanoparticle based communication in a blood vessel. For this bounded
environment, we derive an analytical expression for the channel impulse
response subject to fluid flow and magnetic drift. Numerical results, obtained
by particle-based simulation, validate the accuracy of the derived analytical
expressions. Furthermore, adopting the symbol error rate as performance metric,
we show that using magnetic nanoparticles facilitates reliable communication,
even in the presence of fluid flow.
Branko Dragovich, Andrei Yu. Khrennikov, Nataša Ž. Mišić
Comments: 20 pages. Accepted for publication in Applied Mathematics and Computation
Subjects: Other Quantitative Biology (q-bio.OT); Information Theory (cs.IT); Metric Geometry (math.MG)
Ultrametric approach to the genetic code and the genome is considered and
developed. (p)-Adic degeneracy of the genetic code is pointed out. Ultrametric
tree of the codon space is presented. It is shown that codons and amino acids
can be treated as (p)-adic ultrametric networks. Ultrametric modification of
the Hamming distance is defined and noted how it can be useful. Ultrametric
approach with (p)-adic distance is an attractive and promising trend towards
investigation of bioinformation.