Luís F. Simões, Dario Izzo, Evert Haasdijk, A. E. Eiben
Comments: Code available at this https URL
Subjects: Neural and Evolutionary Computing (cs.NE); Space Physics (physics.space-ph)
The design of spacecraft trajectories for missions visiting multiple
celestial bodies is here framed as a multi-objective bilevel optimization
problem. A comparative study is performed to assess the performance of
different Beam Search algorithms at tackling the combinatorial problem of
finding the ideal sequence of bodies. Special focus is placed on the
development of a new hybridization between Beam Search and the Population-based
Ant Colony Optimization algorithm. An experimental evaluation shows all
algorithms achieving exceptional performance on a hard benchmark problem. It is
found that a properly tuned deterministic Beam Search always outperforms the
remaining variants. Beam P-ACO, however, demonstrates lower parameter
sensitivity, while offering superior worst-case performance. Being an anytime
algorithm, it is then found to be the preferable choice for certain practical
applications.
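As a rough illustration of the combinatorial upper level, the following minimal Python sketch implements deterministic Beam Search over body sequences. The cost function and body names are illustrative stand-ins for the lower-level trajectory optimization, not the paper's actual model.

```python
import numpy as np

def beam_search(bodies, start, beam_width, seq_len, cost):
    """Keep only the `beam_width` cheapest partial sequences at each depth."""
    beam = [([start], 0.0)]
    for _ in range(seq_len):
        candidates = []
        for seq, c in beam:
            for b in bodies:
                if b != seq[-1]:                      # forbid immediate revisits
                    candidates.append((seq + [b], c + cost(seq[-1], b)))
        candidates.sort(key=lambda sc: sc[1])         # deterministic pruning
        beam = candidates[:beam_width]
    return beam

# Toy usage: four "bodies" with an arbitrary stand-in cost table.
rng = np.random.default_rng(1)
bodies = ["Earth", "Venus", "Mercury", "Jupiter"]
table = {(a, b): float(rng.random()) for a in bodies for b in bodies}
print(beam_search(bodies, "Earth", beam_width=3, seq_len=3,
                  cost=lambda a, b: table[(a, b)]))
```

The Beam P-ACO hybrid studied in the paper replaces the deterministic pruning step with probabilistic, pheromone-guided selection.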
H. Sebastian Seung, Jonathan Zung
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Much has been learned about plasticity of biological synapses from empirical
studies. Hebbian plasticity is driven by correlated activity of presynaptic and
postsynaptic neurons. Synapses that converge onto the same neuron often behave
as if they compete for a fixed resource; some survive the competition while
others are eliminated. To provide computational interpretations of these
aspects of synaptic plasticity, we formulate unsupervised learning as a
zero-sum game between Hebbian excitation and anti-Hebbian inhibition in a
neural network model. The game formalizes the intuition that Hebbian excitation
tries to maximize correlations of neurons with their inputs, while anti-Hebbian
inhibition tries to decorrelate neurons from each other. We further include a
model of synaptic competition, which enables a neuron to eliminate all
connections except those from its most strongly correlated inputs. Through
empirical studies, we show that this facilitates the learning of sensory
features that resemble parts of objects.
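The interplay of the two rules can be sketched as a classic Hebbian/anti-Hebbian circuit (a Földiák-style network in the spirit of the abstract, not the paper's exact game formulation): feedforward excitatory weights grow with input-output correlations, while lateral inhibitory weights grow with output-output correlations and thereby decorrelate the neurons.

```python
import numpy as np

def hebb_antihebb_step(x, W, M, lr=1e-2, n_settle=50):
    """One learning step: W are feedforward excitatory weights (Hebbian),
    M are lateral inhibitory weights (anti-Hebbian, decorrelating)."""
    y = np.zeros(W.shape[0])
    for _ in range(n_settle):                  # settle the recurrent dynamics
        y = np.maximum(0.0, W @ x - M @ y)     # excitation minus inhibition
    W = W + lr * (np.outer(y, x) - (y**2)[:, None] * W)  # Hebbian growth + decay
    M = M + lr * np.outer(y, y)                # anti-Hebbian: penalize correlation
    np.fill_diagonal(M, 0.0)                   # no self-inhibition
    return W, np.maximum(M, 0.0), y            # inhibition stays non-negative

rng = np.random.default_rng(0)
W, M = rng.random((4, 16)) * 0.1, np.zeros((4, 4))
for _ in range(200):
    W, M, y = hebb_antihebb_step(rng.random(16), W, M)
```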
Robert A. Murphy
Subjects: Neural and Evolutionary Computing (cs.NE)
As the title suggests, we will describe (and justify through the presentation
of some of the relevant mathematics) prediction methodologies for sensor
measurements. This exposition will mainly be concerned with the mathematics
related to modeling the sensor measurements.
Carsten Witt
Comments: An extended abstract of this report will appear in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017)
Subjects: Neural and Evolutionary Computing (cs.NE)
A runtime analysis of the Univariate Marginal Distribution Algorithm (UMDA)
is presented on the OneMax function for wide ranges of its parameters \(\mu\) and
\(\lambda\). If \(\mu \ge c\log n\) for some constant \(c>0\) and
\(\lambda=(1+\Theta(1))\mu\), a general bound \(O(\mu n)\) on the expected runtime
is obtained. This bound crucially assumes that all marginal probabilities of
the algorithm are confined to the interval \([1/n,1-1/n]\). If \(\mu\ge c'\sqrt{n}\log n\)
for a constant \(c'>0\) and \(\lambda=(1+\Theta(1))\mu\), the
behavior of the algorithm changes and the bound on the expected runtime becomes
\(O(\mu\sqrt{n})\), which typically even holds if the borders on the marginal
probabilities are omitted.
The results supplement the recently derived lower bound
\(\Omega(\mu\sqrt{n}+n\log n)\) by Krejca and Witt (FOGA 2017) and turn out to be
tight for the two very different values \(\mu=c\log n\) and \(\mu=c'\sqrt{n}\log n\).
They also improve the previously best known upper bound \(O(n\log n\log\log n)\)
by Dang and Lehre (GECCO 2015).
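A minimal sketch of the analyzed algorithm, with the borders \([1/n, 1-1/n]\) on the marginals; the concrete values of \(\mu\) and \(\lambda\) below are illustrative choices in the \(\mu \ge c'\sqrt{n}\log n\), \(\lambda=(1+\Theta(1))\mu\) regime.

```python
import numpy as np

def umda_onemax(n, mu, lam, seed=0):
    """UMDA on OneMax: sample lam offspring from independent marginals,
    select the mu fittest, set marginals to their bit frequencies, and
    clamp the marginals to [1/n, 1-1/n]. Returns fitness evaluations used."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)
    evals = 0
    while True:
        pop = rng.random((lam, n)) < p          # lambda offspring
        fit = pop.sum(axis=1)                   # OneMax: count the one-bits
        evals += lam
        if fit.max() == n:
            return evals
        best = pop[np.argsort(fit)[-mu:]]       # mu fittest individuals
        p = np.clip(best.mean(axis=0), 1/n, 1 - 1/n)   # borders

n = 100
mu = int(2 * np.sqrt(n) * np.log(n))
print(umda_onemax(n, mu=mu, lam=2 * mu))
```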
Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
Comments: 10 pages, 7 figures, ICCV 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
General human action recognition requires understanding of various visual
cues. In this paper, we propose a network architecture that computes and
integrates the most important visual cues for action recognition: pose, motion,
and the raw images. For the integration, we introduce a Markov chain model
which adds cues successively. The resulting approach is efficient and
applicable to action classification as well as to spatial and temporal action
localization. The two contributions clearly improve the performance over
respective baselines. The overall approach achieves state-of-the-art action
classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover,
it yields state-of-the-art spatio-temporal action localization results on
UCF101 and J-HMDB.
Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A grand goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Theory of Mind is the ability to attribute mental states (beliefs, intents,
knowledge, perspectives, etc.) to others and recognize that these mental states
may differ from one’s own. Theory of Mind is critical to effective
communication and to teams demonstrating higher collective performance. To
effectively leverage the progress in Artificial Intelligence (AI) to make our
lives more productive, it is important for humans and AI to work well together
in a team. Traditionally, there has been much emphasis on research to make AI
more accurate, and (to a lesser extent) on having it better understand human
intentions, tendencies, beliefs, and contexts. The latter involves making AI
more human-like and having it develop a theory of our minds.
In this work, we argue that for human-AI teams to be effective, humans must
also develop a theory of AI’s mind – get to know its strengths, weaknesses,
beliefs, and quirks. We instantiate these ideas within the domain of Visual
Question Answering (VQA). We find that using just a few examples (50), lay
people can be trained to better predict responses and oncoming failures of a
complex VQA model. Surprisingly, we find that having access to the model’s
internal states – its confidence in its top-k predictions, explicit or implicit
attention maps which highlight regions in the image (and words in the question)
the model is looking at (and listening to) while answering a question about an
image – does not help people better predict its behavior.
Christian Häne, Shubham Tulsiani, Jitendra Malik
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, Convolutional Neural Networks have shown promising results for 3D
geometry prediction. They can make predictions from very little input data, for
example a single color image, a depth map or a partial 3D volume. A major
limitation of such approaches is that they only predict a coarse resolution
voxel grid, which does not capture the surface of the objects well. We propose
a general framework, called hierarchical surface prediction (HSP), which
facilitates prediction of high resolution voxel grids. The main insight is that
it is sufficient to predict high resolution voxels around the predicted
surfaces. The exterior and interior of the objects can be represented with
coarse resolution voxels. This allows us to predict significantly higher
resolution voxel grids around the surface, from which triangle meshes can be
extracted. Our approach is general and not dependent on a specific input type.
In our experiments, we show results for geometry prediction from color images,
depth images and shape completion from partial voxel grids. Our analysis shows
that the network is able to predict the surface more accurately than a low
resolution prediction.
Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alex Sorkine-Hornung, Luc Van Gool
Comments: Challenge website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present the 2017 DAVIS Challenge, a public competition specifically
designed for the task of video object segmentation. Following in the footsteps
of other successful initiatives, such as ILSVRC and PASCAL VOC, which paved
the way for research in the fields of scene classification and semantic
segmentation, the DAVIS Challenge comprises a dataset, an evaluation
methodology, and a public competition with a dedicated workshop co-located with
CVPR 2017. The DAVIS Challenge follows up on the recent publication of DAVIS
(Densely-Annotated VIdeo Segmentation), which has fostered the development of
several novel state-of-the-art video object segmentation techniques. In this
paper we describe the scope of the benchmark, highlight the main
characteristics of the dataset and define the evaluation metrics of the
competition.
Lijie Fan, Yunjie Ke
Comments: 8 pages, 10 figures. arXiv admin note: text overlap with arXiv:1608.00859, arXiv:1503.08909 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Our article presents an audio-visual multi-modal emotion classification
system. Since deep learning approaches to facial analysis have recently
demonstrated high performance, we use convolutional neural networks (CNNs) for
emotion recognition in video, relying on temporal averaging and pooling
operations reminiscent of widely used approaches for the spatial aggregation of
information. To capture the temporal sequence, we extract features from the
audio clips in the video and use an RNN to propagate information. In this work,
we focus our presentation and experimental analysis on a fusion CNN-RNN
architecture for facial expression analysis.
Dimitrios Tzionas, Juergen Gall
Comments: International Conference on Computer Vision (ICCV) 2015, this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances have enabled 3d object reconstruction approaches using a
single off-the-shelf RGB-D camera. Although these approaches are successful for
a wide range of object classes, they rely on stable and distinctive geometric
or texture features. Many objects like mechanical parts, toys, household or
decorative articles, however, are textureless and characterized by minimalistic
shapes that are simple and symmetric. Existing in-hand scanning systems and 3d
reconstruction techniques fail for such symmetric objects in the absence of
highly distinctive features. In this work, we show that extracting 3d hand
motion for in-hand scanning effectively facilitates the reconstruction of even
featureless and highly symmetric objects and we present an approach that fuses
the rich additional information of hands into a 3d reconstruction pipeline,
significantly contributing to the state-of-the-art of in-hand scanning.
Byeongyong Ahn, Nam Ik Cho
Comments: 11 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
There are two main streams in up-to-date image denoising algorithms:
non-local self similarity (NSS) prior based methods and convolutional neural
network (CNN) based methods. The NSS based methods are favorable on images with
regular and repetitive patterns while the CNN based methods perform better on
irregular structures. In this paper, we propose a block-matching convolutional
neural network (BMCNN) method that combines NSS prior and CNN. Initially,
similar local patches in the input image are integrated into a 3D block. To
prevent noise from disrupting the block matching, we first apply an existing
denoising algorithm to the noisy image. The denoised image is employed as a
pilot signal for block matching, and the denoising function for the block is
then learned by a CNN structure. Experimental results show that the proposed
BMCNN algorithm achieves state-of-the-art performance. In particular, BMCNN can
restore both repetitive and irregular structures.
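A minimal sketch of the pilot-guided block matching step (patch size, search radius and block depth are illustrative; the CNN denoiser itself is omitted): distances are computed on the pre-denoised pilot image so that noise does not corrupt the matching, while the stacked block is taken from the noisy input.

```python
import numpy as np

def match_block(pilot, noisy, ref, patch=8, search=16, k=8):
    """Return a k x patch x patch block: the k patches of `noisy` whose
    corresponding *pilot* patches are closest (L2) to the reference patch."""
    ry, rx = ref
    ref_patch = pilot[ry:ry + patch, rx:rx + patch]
    cands = []
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = ry + dy, rx + dx
            if 0 <= y <= pilot.shape[0] - patch and 0 <= x <= pilot.shape[1] - patch:
                d = np.sum((pilot[y:y + patch, x:x + patch] - ref_patch) ** 2)
                cands.append((d, y, x))
    cands.sort(key=lambda t: t[0])              # most similar pilot patches first
    return np.stack([noisy[y:y + patch, x:x + patch] for _, y, x in cands[:k]])

img = np.random.rand(64, 64)
block = match_block(pilot=img, noisy=img + 0.1 * np.random.randn(64, 64), ref=(20, 20))
print(block.shape)   # (8, 8, 8)
```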
Dimitrios Tzionas, Abhilash Srikantha, Pablo Aponte, Juergen Gall
Comments: German Conference on Pattern Recognition (GCPR) 2014, this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Hand motion capture has been an active research topic in recent years,
following the success of full-body pose tracking. Despite similarities, hand
tracking proves to be more challenging, characterized by a higher
dimensionality, severe occlusions and self-similarity between fingers. For this
reason, most approaches rely on strong assumptions, like hands in isolation or
expensive multi-camera systems, that limit the practical use. In this work, we
propose a framework for hand tracking that can capture the motion of two
interacting hands using only a single, inexpensive RGB-D camera. Our approach
combines a generative model with collision detection and discriminatively
learned salient points. We quantitatively evaluate our approach on 14 new
sequences with challenging interactions.
Yan Zhang, Mete Ozay, Shuohao Li, Takayuki Okatani
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A recent study shows that a wide deep network can obtain accuracy comparable
to that of a deeper but narrower network. Compared to narrower and deeper
networks, wide networks use fewer layers and offer important practical
benefits, such as shorter running times on parallel computing devices and less
susceptibility to the vanishing gradient problem. However, the parameter size
of a wide network can be very large due to the large width of each layer. In
order to keep the benefits of wide networks while improving their parameter
size and accuracy trade-off, we propose a binary tree architecture that
truncates wide networks by reducing their width. More precisely, in the
proposed architecture, the width is progressively reduced from lower layers to
higher layers in order to increase the expressive capacity of the network with
a smaller increase in parameter size. Also, to ease the vanishing gradient
problem, features obtained at different layers are concatenated to form the
output of our architecture. By employing the proposed architecture on a
baseline wide network, we can construct and train a new network with the same
depth but considerably fewer parameters. In our experimental analyses, we
observe that the proposed architecture enables us to obtain a better parameter
size and accuracy trade-off compared to baseline networks on various benchmark
image classification datasets. The results show that our model can decrease
the classification error of the baseline from 20.43% to 19.22% on CIFAR-100
using only 28% of the parameters of the baseline. Code is available at
this https URL
Malte Stær Nissen, Oswin Krause, Kristian Almstrup, Søren Kjærulff, Torben Trindkær Nielsen, Mads Nielsen
Comments: Submitted for Scandinavian Conference on Image Analysis 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We compare a set of convolutional neural network (CNN) architectures for the
task of segmenting and detecting human sperm cells in an image taken from a
semen sample. In contrast to previous work, samples are not stained or washed
to allow for full sperm quality analysis, making analysis harder due to
clutter. Our results indicate that training on full images is superior to
training on patches when class skew is properly handled. Full-image training
including up-sampling during training proves to be beneficial in deep CNNs for
pixel-wise accuracy and detection performance. Predicted sperm cells are found
by using connected components on the CNN predictions. We investigate
optimization of a threshold parameter on the size of detected components. Our
best network achieves 93.87% precision and 91.89% recall on our test dataset
after thresholding, outperforming a classical image analysis approach.
Dimitrios Tzionas, Juergen Gall
Comments: German Conference on Pattern Recognition (GCPR) 2013, this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Benchmarking methods for 3d hand tracking is still an open problem due to the
difficulty of acquiring ground truth data. We introduce a new dataset and
benchmarking protocol that is insensitive to the accumulative error of other
protocols. To this end, we create testing frame pairs of increasing difficulty
and measure the pose estimation error separately for each of them. This
approach gives new insights and allows us to accurately study the performance
of each feature or method without employing a full tracking pipeline. Following
this protocol, we evaluate various directional distances in the context of
silhouette-based 3d hand tracking, expressed as special cases of a generalized
Chamfer distance form. An appropriate parameter setup is proposed for each of
them, and a comparative study reveals the best performing method in this
context.
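Such a directional distance can be written as a special case of a generalized Chamfer form: the average, over one point set, of a transformed nearest-neighbour distance into the other set. The sketch below is generic; the transforms psi shown are illustrative special cases, not the paper's tuned setups.

```python
import numpy as np
from scipy.spatial import cKDTree

def directional_chamfer(A, B, psi=lambda d: d):
    """Directional (asymmetric) Chamfer distance from A to B: nearest-neighbour
    distances from each point of A into B, transformed by psi and averaged."""
    d, _ = cKDTree(B).query(A)     # nearest neighbour in B for every point of A
    return float(np.mean(psi(d)))

A = np.random.rand(200, 2)
B = np.random.rand(300, 2)
print(directional_chamfer(A, B))                                    # plain distances
print(directional_chamfer(A, B, psi=lambda d: d ** 2))              # quadratic penalty
print(directional_chamfer(A, B, psi=lambda d: np.minimum(d, .05)))  # truncated
```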
Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael P Recht, Daniel K Sodickson, Thomas Pock, Florian Knoll
Comments: Submitted to Magnetic Resonance in Medicine
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Purpose: To allow fast and high-quality reconstruction of clinical
accelerated multi-coil MR data by learning a variational network that combines
the mathematical structure of variational models with deep learning.
Theory and Methods: Generalized compressed sensing reconstruction formulated
as a variational model is embedded in an unrolled gradient descent scheme. All
parameters of this formulation, including the prior model defined by filter
kernels and activation functions as well as the data term weights, are learned
during an offline training procedure. The learned model can then be applied
online to previously unseen data.
Results: The variational network approach is evaluated on a clinical knee
imaging protocol. The variational network reconstructions outperform standard
reconstruction algorithms in terms of image quality and residual artifacts for
all tested acceleration factors and sampling patterns.
Conclusion: Variational network reconstructions preserve the natural
appearance of MR images as well as pathologies that were not included in the
training data set. Due to its high computational performance, i.e.,
reconstruction time of 193 ms on a single graphics card, and the omission of
parameter tuning once the network is trained, this new approach to image
reconstruction can easily be integrated into clinical workflow.
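The unrolled scheme can be summarized in a few lines. The sketch below is a generic stand-in rather than the paper's trained network: the forward operator is a toy sampling mask instead of a multi-coil MRI model, and the learned prior (filter kernels and activation functions) is replaced by a fixed, hypothetical smoothness penalty.

```python
import numpy as np

def unrolled_recon(y, A, AT, steps, lam, prior_grad):
    """T unrolled gradient steps on 0.5*||Ax - y||^2 + prior; in the paper the
    step sizes, data-term weights and prior are all learned offline."""
    x = AT(y)                                       # zero-filled initialization
    for t in range(len(steps)):
        x = x - steps[t] * (AT(A(x) - y) + lam[t] * prior_grad(x))
    return x

n = 64
mask = np.random.default_rng(0).random(n) > 0.5     # toy undersampling mask
A = lambda x: mask * x
AT = lambda r: mask * r
# gradient of the quadratic smoothness penalty 0.5 * sum_i (x_i - x_{i-1})^2
smooth_grad = lambda x: 2 * x - np.roll(x, 1) - np.roll(x, -1)
x_true = np.sin(np.linspace(0, 6, n))
x_hat = unrolled_recon(A(x_true), A, AT, steps=[0.5] * 10, lam=[0.1] * 10,
                       prior_grad=smooth_grad)
print(np.linalg.norm(x_hat - x_true))
```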
Lin Xiong, Jayashree Karlekar, Jian Zhao, Jiashi Feng, Sugiri Pranata, Shengmei Shen
Comments: 10 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Unconstrained face recognition performance evaluations have traditionally
focused on Labeled Faces in the Wild (LFW) dataset for imagery and the
YouTubeFaces (YTF) dataset for videos in the last couple of years. Spectacular
progress in this field has resulted in a saturation on verification and
identification accuracies for those benchmark datasets. In this paper, we
propose a unified learning framework named transferred deep feature fusion
targeting the new IARPA Janus Benchmark A (IJB-A) face recognition dataset
released by the NIST face challenge. The IJB-A dataset includes real-world
unconstrained faces from 500 subjects with full pose and illumination
variations, which makes it much harder than the LFW and YTF datasets. Inspired
by transfer learning, we train two advanced deep convolutional neural networks
(DCNNs) with two different large datasets in the source domain, respectively.
By exploring the complementarity of the two distinct DCNNs, deep feature fusion
is applied after feature extraction in the target domain. Then,
template-specific linear SVMs are adopted to enhance the discrimination of the
framework. Finally, multiple matching scores corresponding to different
templates are merged as the final results. This simple unified framework
outperforms the state-of-the-art by a wide margin on the IJB-A dataset. Based
on the proposed approach, we have submitted our IJB-A results to the National
Institute of Standards and Technology (NIST) for official evaluation.
Le Hou, Vu Nguyen, Dimitris Samaras, Tahsin M. Kurc, Yi Gao, Tianhao Zhao, Joel H. Saltz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Histopathology images are crucial to the study of complex diseases such as
cancer. The histologic characteristics of nuclei play a key role in disease
diagnosis, prognosis and analysis. In this work, we propose a sparse
Convolutional Autoencoder (CAE) for fully unsupervised, simultaneous nucleus
detection and feature extraction in histopathology tissue images. Our CAE
detects nuclei in image patches of tissue images and encodes them into sparse
feature maps that capture both the location and appearance of nuclei. Our CAE is
the first unsupervised detection network for computer vision applications. The
pretrained nucleus detection and feature extraction modules in our CAE can be
fine-tuned for supervised learning in an end-to-end fashion. We evaluate our
method on four datasets and reduce the errors of state-of-the-art methods up to
42%. We are able to achieve comparable performance with only 5% of the
fully-supervised annotation cost.
Alex Kendall, Roberto Cipolla
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning has been shown to be effective for robust and real-time monocular
image relocalisation. In particular, PoseNet is a deep convolutional neural
network which learns to regress the 6-DOF camera pose from a single image. It
learns to localize using high level features and is robust to difficult
lighting, motion blur and unknown camera intrinsics, where point based SIFT
registration fails. However, it was trained using a naive loss function, with
hyper-parameters which require expensive tuning. In this paper, we give the
problem a more fundamental theoretical treatment. We explore a number of novel
loss functions for learning camera pose which are based on geometry and scene
reprojection error. Additionally we show how to automatically learn an optimal
weighting to simultaneously regress position and orientation. By leveraging
geometry, we demonstrate that our technique significantly improves PoseNet’s
performance across datasets ranging from indoor rooms to a small city.
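One way to learn such a weighting automatically is with homoscedastic-uncertainty-style weights, sketched below in the spirit of the abstract (a hedged approximation, not necessarily the paper's exact formulation): two free scalars s_x and s_q are optimized jointly with the network, trading position against orientation error without hand-tuned hyper-parameters.

```python
import torch

# Free log-variance parameters, optimized jointly with the network weights.
s_x = torch.zeros(1, requires_grad=True)   # position uncertainty
s_q = torch.zeros(1, requires_grad=True)   # orientation uncertainty

def pose_loss(p_hat, p, q_hat, q):
    """Self-weighted pose loss: exp(-s) downweights the noisier term while the
    additive s penalizes setting the uncertainty arbitrarily high."""
    L_x = (p_hat - p).norm(dim=-1).mean()
    L_q = (q_hat - q).norm(dim=-1).mean()
    return L_x * torch.exp(-s_x) + s_x + L_q * torch.exp(-s_q) + s_q

# toy check with random predictions and targets
p_hat, p = torch.randn(4, 3), torch.randn(4, 3)
q_hat, q = torch.randn(4, 4), torch.randn(4, 4)
loss = pose_loss(p_hat, p, q_hat, q)
loss.backward()                            # s_x and s_q receive gradients too
```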
Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
Comments: under review at ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. CNNs are the current state-of-the-art methods
for action recognition in videos. However, the CNN architectures currently
being used have difficulty in capturing these relationships. State-of-the-art
action recognition approaches rely on traditional local optical flow estimation
methods to pre-compute the motion information for CNNs. Such a two-stage
approach is computationally expensive, storage demanding, and not end-to-end
trainable. In this paper, we present a novel CNN architecture that implicitly
captures motion information. Our method is 10x faster than a two-stage
approach, does not need to cache flow information, and is end-to-end trainable.
Experimental results on UCF101 and HMDB51 show that it achieves competitive
accuracy with the two-stage approaches.
Yvain Quéau, Jean Mélou, Jean-Denis Durou, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a variational method for multi-view shape-from-shading under
natural illumination. The key idea is to couple PDE-based solutions for
single-image based shape-from-shading problems across multiple images and
multiple color channels by means of a variational formulation. Rather than
alternately solving the individual SFS problems and optimizing the
consistency across images and channels, which is known to lead to suboptimal
results, we propose an efficient solution of the coupled problem by means of an
ADMM algorithm. In numerous experiments on both simulated and real imagery, we
demonstrate that the proposed fusion of multiple-view reconstruction and
shape-from-shading provides highly accurate dense reconstructions without the
need to compute dense correspondences. With the proposed variational
integration across multiple views, shape-from-shading techniques become
applicable to challenging real-world reconstruction problems, giving rise to
highly detailed geometry even in areas of smooth brightness variation and
little texture.
Yao Shu, Man Zhu, Kun He, John Hopcroft, Pan Zhou
Comments: 7 pages, 12 Figures, submitted to a 2017 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We systematically study the deep representation of random weight CNN
(convolutional neural network) using the DeCNN (deconvolutional neural network)
architecture. We first fix the weights of an untrained CNN, and for each layer
of its feature representation, we train a corresponding DeCNN to reconstruct
the input image. Compared with the pre-trained CNN, the DeCNN trained on a
random-weight CNN can reconstruct images more quickly and accurately,
regardless of the type of random distribution used for the CNN’s weights. This
reveals that every layer of the random CNN retains photographically accurate
information about the image. We then leave the DeCNN untrained as well, i.e.
the overall CNN-DeCNN architecture uses only random weights. Strikingly, we can
reconstruct all position information of the image from low-layer
representations, although the colors change. From high-layer representations,
we can still capture the rough contours of the image. We also vary the number
and shape of the feature maps to gain more insight into the random function of
the CNN-DeCNN structure. Our work reveals that the purely random CNN-DeCNN
architecture substantially contributes to geometric and photometric invariance,
due to its intrinsic symmetry and invertible structure, but discards
colorimetric information due to the random projection.
Fabio Dittrich, Luiz E. S. de Oliveira, Alceu S. Britto Jr., Alessandro L. Koerich
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents two novel approaches for people counting in crowded and
open environments that combine the information gathered by multiple views.
Multiple cameras are used to expand the field of view as well as to mitigate
the problem of occlusion that commonly affects the performance of counting
methods using single cameras. The first approach is a direct one: it attempts
to segment and count each individual in the crowd. To this end, two head
detectors trained on head images are employed: one based on support vector
machines and the other based on an Adaboost perceptron. The second approach, an
indirect one, employs learning algorithms and statistical analysis on the whole
crowd to achieve counting. To this end, corner points are extracted from groups
of people in a foreground image and fed to a learning algorithm that estimates
the number of people in the scene. Both approaches count the number of people
in the scene, not only in a given image or video frame of the scene. The
experimental results obtained on the benchmark PETS2009 video dataset show that
the proposed indirect method surpasses other methods, with improvements of up
to 46.7%, and provides accurate counting results for crowded scenes. On the
other hand, the direct method shows high error rates because it must solve much
harder problems, such as the segmentation of heads.
Kourosh Meshgi, Shigeyuki Oba, Shin Ishii
Comments: CRV’17 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Discriminative trackers employ a classification approach to separate the
target from its background. To cope with variations of the target’s shape and
appearance, the classifier is updated online with different samples of the
target and the background. Sample selection, labeling, and classifier updating
are prone to various sources of error that cause the tracker to drift. We
introduce an efficient version-space shrinking strategy to reduce the labeling
errors and enhance the sampling strategy by measuring the uncertainty of the
tracker about the samples. The proposed tracker utilizes an ensemble of
classifiers that represents different hypotheses about the target, diversifies
them using boosting to provide a larger and more consistent coverage of the
version space, and tunes the classifiers’ weights for voting. The proposed
system adjusts the model update rate by promoting the co-training of the
short-memory ensemble with a long-memory oracle. The proposed tracker
outperformed state-of-the-art trackers on different sequences bearing various
tracking challenges.
Marius Cordts, Timo Rehfeld, Lukas Schneider, David Pfeiffer, Markus Enzweiler, Stefan Roth, Marc Pollefeys, Uwe Franke
Comments: Accepted for publication in Image and Vision Computing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress in advanced driver assistance systems and the race towards
autonomous vehicles is mainly driven by two factors: (1) increasingly
sophisticated algorithms that interpret the environment around the vehicle and
react accordingly, and (2) the continuous improvements of sensor technology
itself. In terms of cameras, these improvements typically include higher
spatial resolution, which as a consequence requires more data to be processed.
The trend to add multiple cameras to cover the entire surroundings of the
vehicle does not help in that regard. At the same time, an increasing number
of special purpose algorithms need access to the sensor input data to correctly
interpret the various complex situations that can occur, particularly in urban
traffic.
By observing those trends, it becomes clear that a key challenge for vision
architectures in intelligent vehicles is to share computational resources. We
believe this challenge should be faced by introducing a representation of the
sensory data that provides compressed and structured access to all relevant
visual content of the scene. The Stixel World discussed in this paper is such a
representation. It is a medium-level model of the environment that is
specifically designed to compress information about obstacles by leveraging the
typical layout of outdoor traffic scenes. It has proven useful for a multitude
of automotive vision applications, including object detection, tracking,
segmentation, and mapping.
In this paper, we summarize the ideas behind the model and generalize it to
take into account multiple dense input streams: the image itself, stereo depth
maps, and semantic class probability maps that can be generated, e.g., by CNNs.
Our generalization is embedded into a novel mathematical formulation for the
Stixel model. We further sketch how the free parameters of the model can be
learned using structured SVMs.
G. Chierchia, D. Cozzolino, G. Poggi, L. Verdoliva
Comments: Accepted at 2017 IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, Texas, July 23-28, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we investigate the use of discriminative model learning through
Convolutional Neural Networks (CNNs) for SAR image despeckling. The network
uses a residual learning strategy, hence it does not recover the filtered
image, but the speckle component, which is then subtracted from the noisy one.
Training is carried out by considering a large multitemporal SAR image properly
despeckled through 3D filtering, in order to approximate a clean image.
Experimental results, on both synthetic and real SAR data, show that the method
achieves performance that improves on state-of-the-art techniques.
Shuang Ma, Jing Liu, Chang Wen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neural networks (CNN) have recently been shown to generate
promising results for aesthetics assessment. However, the performance of these
deep CNN methods is often compromised by the constraint that the neural network
only takes the fixed-size input. To accommodate this requirement, input images
need to be transformed via cropping, warping, or padding, which often alter
image composition, reduce image resolution, or cause image distortion. Thus the
aesthetics of the original images is impaired by the potential loss of
fine-grained details and holistic image layout, both of which are critical for
evaluating an image’s aesthetics. In this paper, we present an Adaptive
Layout-Aware Multi-Patch Convolutional Neural Network (A-Lamp CNN) architecture
for photo aesthetic assessment. This novel scheme accepts arbitrarily sized
images and learns from both fine-grained details and holistic image layout
simultaneously. To enable training on these hybrid inputs, we extend the method
by developing a dedicated double-subnet neural network structure, i.e. a
Multi-Patch subnet and a Layout-Aware subnet. We further construct an
aggregation layer to effectively
combine the hybrid features from these two subnets. Extensive experiments on
the large-scale aesthetics assessment benchmark (AVA) demonstrate significant
performance improvement over the state-of-the-art in photo aesthetic
assessment.
Manoel Horta Ribeiro, Bruno Teixeira, Antônio Otávio Fernandes, Wagner Meira Jr., Erickson R. Nascimento
Comments: Conference paper published at 2016 29th SIBGRAPI, Conference on Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many of the state-of-the-art algorithms for gesture recognition are based on
Conditional Random Fields (CRFs). Successful approaches, such as the
Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
values are mapped to the values of the labels. In this paper we propose a novel
methodology to set the latent values according to the gesture complexity. We
use a heuristic that iterates through the samples associated with each label
value, estimating their complexity. We then use the estimated complexity to
assign the latent values to the label values. We evaluate our method on the
task of recognizing human gestures from video streams. The experiments were
performed on binary datasets, generated by grouping different labels. Our
results demonstrate that our approach outperforms the arbitrary assignment in
many cases, increasing the accuracy by up to 10%.
Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Regression based methods are widely used for 3D and 2D human pose estimation,
but the performance is not satisfactory. One problem is that the structural
information of the pose is not well exploited in the existing methods. In this
work, we propose a structure-aware regression approach. It adopts a
reparameterized pose representation using bones instead of joints. It exploits
the joint connection structure and proposes a compositional loss function that
encodes the long range interactions in the pose. It is simple, effective, and
general. Comprehensive evaluation validates the effectiveness of our approach.
It significantly advances the state-of-the-art on Human3.6M and achieves
state-of-the-art results on MPII, in a unified framework for 3D and 2D pose
regression.
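The bone reparameterization itself is simple to state in code. Below is a minimal sketch; the kinematic tree PARENT is a hypothetical toy skeleton rather than the one used in the paper, and the compositional loss over long-range interactions is omitted.

```python
import numpy as np

# Hypothetical toy kinematic tree: PARENT[j] is the parent joint of j, -1 = root.
PARENT = [-1, 0, 1, 2, 0, 4, 5]

def joints_to_bones(J):
    """Pose as bone vectors: each bone is child joint minus parent joint."""
    return np.stack([J[j] - J[PARENT[j]] for j in range(1, len(PARENT))])

def bones_to_joints(B, root):
    """Recover absolute joints by accumulating bones down the tree."""
    J = [root]
    for j in range(1, len(PARENT)):
        J.append(J[PARENT[j]] + B[j - 1])
    return np.stack(J)

J = np.random.rand(len(PARENT), 3)                   # a random 3D pose
assert np.allclose(bones_to_joints(joints_to_bones(J), J[0]), J)
```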
Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
Comments: Accepted by CVPR 2017, IEEE Conference on Computer Vision and Pattern Recognition 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Weakly supervised object detection has recently become of great importance in
object recognition. Based on deep learning, weakly supervised detectors have
achieved many promising results. However, compared with fully supervised
detection, it is more challenging to train deep network based detectors in a
weakly supervised manner. Here we formulate weakly supervised detection as a
Multiple Instance Learning (MIL) problem, where instance classifiers (object
detectors) are put into the network as hidden nodes. We propose a novel online
instance classifier refinement algorithm to integrate MIL and the instance
classifier refinement procedure into a single deep network, and train the
network end-to-end with only image-level supervision, i.e., without object
location information. More precisely, instance labels inferred from weak
supervision are propagated to their spatially overlapping instances to refine
the instance classifiers online. The iterative instance classifier refinement
procedure is implemented using multiple streams in the deep network, where each
stream supervises its latter stream. Weakly supervised object detection
experiments are carried out on the challenging PASCAL VOC 2007 and 2012
benchmarks. We obtain 47% mAP on VOC 2007 that significantly outperforms the
previous state-of-the-art.
Chenfanfu Jiang, Yixin Zhu, Siyuan Qi, Siyuan Huang, Jenny Lin, Xiongwen Guo, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu
Comments: 12-page paper with 18-page supplementary file, 25 figures, submitted to ICCV 2017. Chenfanfu Jiang, Yixin Zhu, Siyuan Qi and Siyuan Huang contributed equally to this paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose the configurable rendering of massive quantities of photorealistic
images with ground truth for the purposes of training, benchmarking, and
diagnosing computer vision models. In contrast to the conventional
(crowd-sourced) manual labeling of ground truth for a relatively modest number
of RGB-D images captured by Kinect-like sensors, we devise a non-trivial
configurable pipeline of algorithms capable of generating a potentially
infinite variety of indoor scenes using a stochastic grammar, specifically, one
represented by an attributed spatial And-Or graph. We employ physics-based
rendering to synthesize photorealistic RGB images while automatically
synthesizing detailed, per-pixel ground truth data, including visible surface
depth and normal, object identity and material information, as well as
illumination. Our pipeline is configurable inasmuch as it enables the precise
customization and control of important attributes of the generated scenes. We
demonstrate that our generated scenes achieve a performance similar to the NYU
v2 Dataset on pre-trained deep learning models. By modifying pipeline
components in a controllable manner, we furthermore provide diagnostics on
common scene understanding tasks, e.g., depth and surface normal prediction,
semantic segmentation, etc.
Jiajun Lu, Theerasit Issaranon, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We describe a method to produce a network where current methods such as
DeepFool have great difficulty producing adversarial samples. Our construction
suggests some insights into how deep networks work. We provide a reasoned
analysis indicating that our construction is difficult to defeat, and show
experimentally that our method is hard to defeat using several standard
networks and datasets.
We use our method to produce a system that can reliably detect whether an image
is a picture of a real scene or not. Our system applies to images captured with
depth maps (RGBD images) and checks if a pair of image and depth map is
consistent. It relies on the relative difficulty of producing naturalistic
depth maps for images in post processing. We demonstrate that our system is
robust to adversarial examples built from currently known attacking approaches.
Shan Su, Jianbo Shi, Hyun Soo Park
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper studies a problem of inverse visual path planning: creating a
visual scene from a first person action. Our conjecture is that the spatial
arrangement of a first person visual scene is deployed to afford an action, and
therefore, the action can be inversely used to synthesize a new scene such that
the action is feasible. As a proof-of-concept, we focus on linking visual
experiences induced by walking.
A key innovation of this paper is the concept of an ActionTunnel: a 3D virtual
tunnel along the future trajectory encoding what the wearer will visually
experience while moving into the scene. This connects two distinct first person
images through similar walking paths. Our method takes a first person image
with a user defined future trajectory and outputs a new image that can afford
the future motion. The image is created by combining present and future
ActionTunnels in 3D, where the missing pixels in the adjoining area are
computed by a generative adversarial network. Our work provides a way to travel
across different first-person experiences in diverse real-world scenes.
Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we propose a method to infer high dynamic range illumination
from a single, limited field-of-view, low dynamic range photograph of an indoor
scene. Inferring scene illumination from a single photograph is a challenging
problem. The pixel intensities observed in a photograph are a complex function
of scene geometry, reflectance properties, and illumination. We introduce an
end-to-end solution to this problem and propose a deep neural network that
takes the limited field-of-view photo as input and produces an environment map
as a panorama and a light mask prediction over the panorama as the output. Our
technique does not require special image capture or user input. We preprocess
standard low dynamic range panoramas by introducing novel light source
detection and warping methods on the panorama, and use the results with
corresponding limited field-of-view crops as training data. Our method does not
rely on any assumptions on scene appearance, geometry, material properties, or
lighting. This allows us to automatically recover high-quality illumination
estimates that significantly outperform previous state-of-the-art methods.
Consequently, using our illumination estimates for applications like 3D object
insertion leads to photo-realistic results, which we demonstrate over a large
set of examples and via a user study.
Cheng Peng, Volkan Isler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Estimating positions of world points from features observed in images is a
key problem in 3D reconstruction, image mosaicking, simultaneous localization
and mapping, and structure from motion. We consider a special instance in which
there is a dominant ground plane \(\mathcal{G}\) viewed from a parallel viewing
plane \(\mathcal{S}\) above it. Such instances commonly arise, for example, in
aerial photography.
Consider a world point \(g \in \mathcal{G}\) and its worst case reconstruction
uncertainty \(\varepsilon(g,\mathcal{S})\) obtained by merging \emph{all}
possible views of \(g\) chosen from \(\mathcal{S}\). We first show that one can
pick two views \(s_p\) and \(s_q\) such that the uncertainty
\(\varepsilon(g,\{s_p,s_q\})\) obtained using only these two views is almost as
good as (i.e. within a small constant factor of) \(\varepsilon(g,\mathcal{S})\).
Next, we extend the result to the entire ground plane \(\mathcal{G}\) and show
that one can pick a small subset \(\mathcal{S}' \subseteq \mathcal{S}\) (which
grows only linearly with the area of \(\mathcal{G}\)) and still obtain a constant
factor approximation, for every point \(g \in \mathcal{G}\), to the minimum worst
case estimate obtained by merging all views in \(\mathcal{S}\).
Our results provide a view selection mechanism with provable performance
guarantees which can drastically increase the speed of scene reconstruction
algorithms. In addition to theoretical results, we demonstrate their
effectiveness in an application where aerial imagery is used for monitoring
farms and orchards.
Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba, Shin Ishii
Comments: Submitted to IEEE ICSIPA’2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Adaptive tracking-by-detection approaches are popular for tracking arbitrary
objects. They treat the tracking problem as a classification task and use
online learning techniques to update the object model. However, these
approaches are heavily invested in the efficiency and effectiveness of their
detectors. Evaluating a massive number of samples for each frame (e.g.,
obtained by a sliding window) forces the detector to trade the accuracy in
favor of speed. Furthermore, misclassification of borderline samples in the
detector introduces accumulating errors in tracking. In this study, we propose
a co-tracking framework based on the efficient cooperation of two detectors: a
rapid adaptive exemplar-based detector and a more sophisticated but slower
detector with a long-term memory. The sample labeling and co-learning of the
detectors are conducted by an uncertainty sampling unit, which improves the
speed and accuracy of the system. We also introduce a budgeting mechanism which
prevents the unbounded growth in the number of examples in the first detector
to maintain its rapid response. Experiments demonstrate the efficiency and
effectiveness of the proposed tracker against its baselines and its superior
performance against state-of-the-art trackers on various benchmark videos.
Hieu Le, Vu Nguyen, Chen-Ping Yu, Dimitris Samaras
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance in 3D video segmentation benchmarks on two
datasets.
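The feature is simple to compute once the geodesic distances are available. A minimal sketch follows; the shortest-path computation between superpixels via their boundaries is omitted and replaced by a precomputed distance map, and the bin counts are illustrative.

```python
import numpy as np

def geodesic_histogram(intensity, geo_dist, n_int=8, n_geo=8):
    """Joint 2D histogram of intensity and geodesic distance over one region,
    flattened and normalized into a feature vector."""
    h, _, _ = np.histogram2d(
        intensity.ravel(), geo_dist.ravel(),
        bins=[n_int, n_geo],
        range=[[0.0, 1.0], [0.0, float(geo_dist.max()) + 1e-8]])
    return (h / h.sum()).ravel()

region_intensity = np.random.rand(50)      # intensities of a region's pixels
region_geodesic = np.random.rand(50) * 3   # stand-in geodesic distances
print(geodesic_histogram(region_intensity, region_geodesic).shape)  # (64,)
```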
Xu Han, Xiao Yang, Stephen Aylward, Roland Kwitt, Marc Niethammer
Comments: Accepted as a conference paper for ISBI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Registration involving one or more images containing pathologies is
challenging, as standard image similarity measures and spatial transforms
cannot account for common changes due to pathologies. Low-rank/Sparse (LRS)
decomposition removes pathologies prior to registration; however, LRS is
memory-demanding and slow, which limits its use on larger data sets.
Additionally, LRS blurs normal tissue regions, which may degrade registration
performance. This paper proposes an efficient alternative to LRS: (1) normal
tissue appearance is captured by principal component analysis (PCA) and (2)
blurring is avoided by an integrated model for pathology removal and image
reconstruction. Results on synthetic and BRATS 2015 data demonstrate its
utility.
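A minimal sketch of step (1): capture normal tissue appearance with PCA and produce a quasi-normal reconstruction whose residual roughly localizes the pathology. The integrated registration model of step (2) is beyond this sketch, and the atlas below is a random stand-in.

```python
import numpy as np

def pca_quasi_normal(normal_images, image, n_components=10):
    """Project an image onto a PCA basis learned from pathology-free images
    and reconstruct; large residuals indicate likely pathology."""
    X = np.stack([im.ravel() for im in normal_images])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]                      # principal appearance modes
    coeff = basis @ (image.ravel() - mean)
    recon = mean + basis.T @ coeff
    return recon.reshape(image.shape)

atlas = [np.random.rand(32, 32) for _ in range(40)]   # stand-in normal images
img = np.random.rand(32, 32)
residual = np.abs(img - pca_quasi_normal(atlas, img)) # rough pathology map
```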
Xingyu Lin, Hao Wang, Zhihao Li, Yimeng Zhang, Alan Yuille, Tai Sing Lee
Comments: Accepted to ICLR2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We develop a model of perceptual similarity judgment based on re-training a
deep convolutional neural network (DCNN) that learns to associate different
views of each 3D object, so as to capture the notion of object persistence and
continuity in our visual experience. The re-training process effectively
performs distance metric learning under the object persistence constraints, to
modify the
view-manifold of object representations. It reduces the effective distance
between the representations of different views of the same object without
compromising the distance between those of the views of different objects,
resulting in the untangling of the view-manifolds between individual objects
within the same category and across categories. This untangling enables the
model to discriminate and recognize objects within the same category,
independent of viewpoints. We found that this ability is not limited to the
trained objects, but transfers to novel objects in both trained and untrained
categories, as well as to a variety of completely novel artificial synthetic
objects. This transfer in learning suggests the modification of distance
metrics in view-manifolds is more general and abstract, likely at the level
of parts, and independent of the specific objects or categories experienced
during training. Interestingly, the resulting transformation of feature
representation in the deep networks is found to significantly better match
human perceptual similarity judgment than AlexNet, suggesting that object
persistence could be an important constraint in the development of perceptual
similarity judgment in biological neural networks.
Orlando Moreira, Merten Popp, Christian Schulz
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Graphs are widely used to model execution dependencies in applications. In
particular, the NP-complete problem of partitioning a graph under constraints
receives enormous attention by researchers because of its applicability in
multiprocessor scheduling. We identified the additional constraint of acyclic
dependencies between blocks when mapping computer vision and imaging
applications to a heterogeneous embedded multiprocessor. Existing algorithms
and heuristics do not address this requirement and deliver results that are not
applicable for our use-case. In this work, we show that this more constrained
version of the graph partitioning problem is NP-complete and present heuristics
that achieve a close approximation of the optimal solution found by an
exhaustive search for small problem instances and much better scalability for
larger instances. In addition, we can show a positive impact on the schedule of
a real imaging application that improves communication volume and execution
time.
Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool
Comments: Supplementary visual examples available at: this http URL
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this work we present a new approach to learn compressible representations
in deep architectures with an end-to-end training strategy. Our method is based
on a soft relaxation of quantization and entropy, which we anneal to their
discrete counterparts throughout training. We showcase this method for two
challenging applications: Image compression and neural network compression.
While these tasks have typically been approached with different methods, our
soft-to-hard quantization approach gives state-of-the-art results for both.
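The core relaxation can be sketched in a few lines: each value is softly assigned to the quantization centers via a softmax over scaled negative distances, and increasing the scale during training anneals the assignment toward hard nearest-center quantization. The scalar form below is a simplification of the paper's approach, and the annealing schedule is illustrative.

```python
import numpy as np

def soft_quantize(z, centers, sigma):
    """Soft assignment of each entry of z to `centers`; large sigma approaches
    hard (nearest-center) quantization, small sigma keeps gradients smooth."""
    d2 = (z[..., None] - centers) ** 2            # squared distance to each center
    d2 -= d2.min(axis=-1, keepdims=True)          # stabilize the softmax
    w = np.exp(-sigma * d2)
    w /= w.sum(axis=-1, keepdims=True)            # softmax over centers
    return (w * centers).sum(axis=-1)

z = np.random.randn(5)
centers = np.linspace(-2, 2, 5)
for sigma in (1.0, 10.0, 1000.0):                 # anneal soft -> hard
    print(sigma, soft_quantize(z, centers, sigma))
```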
Timothy I. Cannings, Thomas B. Berrett, Richard J. Samworth
Comments: 42 pages
Subjects: Statistics Theory (math.ST); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME)
We derive a new asymptotic expansion for the global excess risk of a local
\(k\)-nearest neighbour classifier, where the choice of \(k\) may depend upon the
test point. This expansion elucidates conditions under which the dominant
contribution to the excess risk comes from the locus of points at which each
class label is equally likely to occur, but we also show that if these
conditions are not satisfied, the dominant contribution may arise from the
tails of the marginal distribution of the features. Moreover, we prove that,
provided the \(d\)-dimensional marginal distribution of the features has a finite
\(\rho\)th moment for some \(\rho > 4\) (as well as other regularity conditions), a
local choice of \(k\) can yield a rate of convergence of the excess risk of
\(O(n^{-4/(d+4)})\), where \(n\) is the sample size, whereas for the standard
\(k\)-nearest neighbour classifier, our theory would require \(d \geq 5\) and
\(\rho > 4d/(d-4)\) finite moments to achieve this rate. Our results motivate a new
\(k\)-nearest neighbour classifier for semi-supervised learning problems, where
the unlabelled data are used to obtain an estimate of the marginal feature
density, and fewer neighbours are used for classification when this density
estimate is small. The potential improvements over the standard \(k\)-nearest
neighbour classifier are illustrated both through our theory and via a
simulation study.
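A hedged sketch of the proposed classifier: estimate the marginal feature density from unlabelled data and shrink \(k\) where the density is small (i.e., in the tails). The monotone density-to-\(k\) rule below is a hypothetical stand-in for the paper's theoretically derived choice.

```python
import numpy as np
from sklearn.neighbors import KernelDensity, KNeighborsClassifier

def local_knn_predict(X_tr, y_tr, X_unlab, X_te, k_max=25):
    """Classify each test point with its own k: fewer neighbours where the
    estimated marginal feature density is small."""
    dens = np.exp(KernelDensity(bandwidth=0.5).fit(X_unlab).score_samples(X_te))
    dens = dens / dens.max()                       # normalize to (0, 1]
    preds = []
    for x, d in zip(X_te, dens):
        k = max(1, int(round(k_max * d)))          # hypothetical density -> k rule
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        preds.append(clf.predict(x.reshape(1, -1))[0])
    return np.array(preds)

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 2)); y_tr = (X_tr[:, 0] > 0).astype(int)
X_unlab = rng.normal(size=(500, 2))                # unlabelled data for the KDE
X_te = rng.normal(size=(10, 2))
print(local_knn_predict(X_tr, y_tr, X_unlab, X_te))
```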
Frank Nielsen, Ke Sun
Comments: 18 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Clustering categorical distributions in the probability simplex is a
fundamental primitive often met in applications dealing with histograms or
mixtures of multinomials. Traditionally, the differential-geometric structure
of the probability simplex has been used either by (i) setting the Riemannian
metric tensor to the Fisher information matrix of the categorical
distributions, or (ii) defining the information-geometric structure induced by
a smooth dissimilarity measure, called a divergence. In this paper, we
introduce a novel computationally-friendly non-Riemannian framework for
modeling the probability simplex: Hilbert simplex geometry. We discuss the pros
and cons of those three statistical modelings, and compare them experimentally
for clustering tasks.
Claudius Zelenka, Reinhard Koch
Comments: To appear in the proceedings of the 23rd International Conference on Pattern Recognition (ICPR 2016)
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)
This contribution deals with image restoration in optical systems with
coherent illumination, which is an important topic in astronomy, coherent
microscopy and radar imaging. Such optical systems suffer from wavefront
distortions, which are caused by imperfect imaging components and conditions.
Known image restoration algorithms work well for incoherent imaging but fail
in the case of coherent images. In this paper a novel wavefront correction
algorithm is presented, which allows image restoration under coherent
conditions. In most coherent imaging systems, especially in astronomy, the
wavefront deformation is known. Using this information, the proposed algorithm
allows a high quality restoration even in case of severe wavefront distortions.
We present two versions of this algorithm, which are an evolution of the
Gerchberg-Saxton and the Hybrid-Input-Output algorithm. The algorithm is
verified on simulated and real microscopic images.
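As background, the classic Gerchberg-Saxton iteration that the presented
algorithms evolve from fits in a few lines of numpy; this generic sketch
alternates magnitude constraints between the object and Fourier planes and
does not include the paper's wavefront-aware corrections.

import numpy as np

def gerchberg_saxton(mag_obj, mag_fourier, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    field = mag_obj * np.exp(1j * rng.uniform(0, 2 * np.pi, mag_obj.shape))
    for _ in range(n_iter):
        F = np.fft.fft2(field)
        F = mag_fourier * np.exp(1j * np.angle(F))      # enforce Fourier magnitude
        field = np.fft.ifft2(F)
        field = mag_obj * np.exp(1j * np.angle(field))  # enforce object magnitude
    return np.angle(field)                              # recovered phase estimate

truth = np.exp(1j * np.outer(np.linspace(0, 1, 64), np.linspace(0, 2, 64)))
phase = gerchberg_saxton(np.abs(truth), np.abs(np.fft.fft2(truth)))
print(phase.shape)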
Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales
Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Machine learning techniques are being increasingly used as flexible
non-linear fitting and prediction tools in the physical sciences. Fitting
functions that exhibit multiple solutions as local minima can be analysed in
terms of the corresponding machine learning landscape. Methods to explore and
visualise molecular potential energy landscapes can be applied to these machine
learning landscapes to gain new insight into the solution space involved in
training and the nature of the corresponding predictions. In particular, we can
define quantities analogous to molecular structure, thermodynamics, and
kinetics, and relate these emergent properties to the structure of the
underlying landscape. This Perspective aims to describe these analogies with
examples from recent applications, and suggest avenues for new
interdisciplinary research.
S. Ali Mirsoleimani, Aske Plaat, Jaap van den Herik, Jos Vermaseren
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI)
In this paper, we present a new algorithm for parallel Monte Carlo tree
search (MCTS). It is based on the pipeline pattern and allows flexible
management of the control flow of the operations in parallel MCTS. The pipeline
pattern provides for the first structured parallel programming approach to
MCTS. Moreover, we propose a new lock-free tree data structure for parallel
MCTS which removes synchronization overhead. The Pipeline Pattern for Parallel
MCTS algorithm (called 3PMCTS) scales very well to higher numbers of cores when
compared to existing methods.
Majid Mohammadi, Amir Ahooye Atashin, Wout Hofman, Yaohua Tan
Subjects: Artificial Intelligence (cs.AI)
Ontology alignment is widely used to find the correspondences between
different ontologies in diverse fields. Once an alignment has been discovered
by a method, several performance scores are available to evaluate it. The
scores require the alignment produced by a method and the reference alignment
containing the underlying actual correspondences of the given ontologies. The
current trend in alignment evaluation is to put forward a new score and to
compare various alignments by juxtaposing their performance scores. However,
selecting one performance score over the others for such a comparison is
contentious. On top of that, the claim that one method performs better than
another cannot be substantiated by comparing the scores alone. In
this paper, we propose statistical procedures which enable us to theoretically
favor one method over another. The McNemar test is considered
as a reliable and suitable means for comparing two ontology alignment methods
over one matching task. The test applies to a 2 x 2 contingency table which can
be constructed in two different ways based on the alignments, each of which
has its own merits and pitfalls. The ways of constructing the contingency
table and
various apposite statistics from the McNemar test are elaborated in minute
detail. When more than two alignment methods are compared, family-wise errors
are expected to occur. Thus, ways of preventing such errors are also
discussed. A directed graph visualizes the outcome of
the McNemar test in the presence of multiple alignment methods. From this
graph, it is readily understood whether one method is better than another or
whether their differences are imperceptible. Our investigation of the methods
that participated in the anatomy track of OAEI 2016 demonstrates that AML and
CroMatcher are the top two methods and DKP-AOM and Alin are the bottom two
ones.
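To illustrate the central tool, the following sketch computes McNemar's test
from the off-diagonal cells b and c of a 2 x 2 contingency table (counts of
correspondences that exactly one of the two methods resolves correctly); the
example counts are made up, and constructing the table from real alignments is
the part the paper elaborates.

from scipy.stats import binom, chi2

def mcnemar(b, c):
    """Return p-values for H0: the two methods have equal error rates."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)      # chi-squared with continuity corr.
    p_chi2 = chi2.sf(stat, df=1)
    p_exact = 2 * binom.cdf(min(b, c), b + c, 0.5)  # exact test for small b + c
    return p_chi2, min(1.0, p_exact)

print(mcnemar(b=25, c=10))   # method 1 uniquely correct 25 times, method 2 only 10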
Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Theory of Mind is the ability to attribute mental states (beliefs, intents,
knowledge, perspectives, etc.) to others and recognize that these mental states
may differ from one’s own. Theory of Mind is critical to effective
communication and to teams demonstrating higher collective performance. To
effectively leverage the progress in Artificial Intelligence (AI) to make our
lives more productive, it is important for humans and AI to work well together
in a team. Traditionally, there has been much emphasis on research to make AI
more accurate, and (to a lesser extent) on having it better understand human
intentions, tendencies, beliefs, and contexts. The latter involves making AI
more human-like and having it develop a theory of our minds.
In this work, we argue that for human-AI teams to be effective, humans must
also develop a theory of AI’s mind – get to know its strengths, weaknesses,
beliefs, and quirks. We instantiate these ideas within the domain of Visual
Question Answering (VQA). We find that using just a few examples (50), lay
people can be trained to better predict responses and oncoming failures of a
complex VQA model. Surprisingly, we find that having access to the model’s
internal states – its confidence in its top-k predictions, explicit or implicit
attention maps which highlight regions in the image (and words in the question)
the model is looking at (and listening to) while answering a question about an
image – does not help people better predict its behavior.
Lars Maaløe, Marco Fraccaro, Ole Winther
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Deep generative models trained with large amounts of unlabelled data have
proven to be powerful within the domain of unsupervised learning. Many
real-life data sets contain a small number of labelled data points, which are
typically disregarded when training generative models. We propose the
Cluster-aware Generative Model, that uses unlabelled information to infer a
latent representation that models the natural clustering of the data, and
additional labelled data points to refine this clustering. The generative
performances of the model significantly improve when labelled information is
exploited, obtaining a log-likelihood of -79.38 nats on permutation invariant
MNIST, while also achieving competitive semi-supervised classification
accuracies. The model can also be trained fully unsupervised, and still improve
the log-likelihood performance with respect to related methods.
Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
Comments: 10 pages, 7 figures, ICCV 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
General human action recognition requires understanding of various visual
cues. In this paper, we propose a network architecture that computes and
integrates the most important visual cues for action recognition: pose, motion,
and the raw images. For the integration, we introduce a Markov chain model
which adds cues successively. The resulting approach is efficient and
applicable to action classification as well as to spatial and temporal action
localization. The two contributions clearly improve the performance over
respective baselines. The overall approach achieves state-of-the-art action
classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover,
it yields state-of-the-art spatio-temporal action localization results on
UCF101 and J-HMDB.
Isabelle Augenstein, Anders Søgaard
Comments: ACL 2017
Journal-ref: ACL 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Keyphrase boundary classification (KBC) is the task of detecting keyphrases
in scientific articles and labelling them with respect to predefined types.
Although important in practice, this task is so far underexplored, partly due
to the lack of labelled data. To overcome this, we explore several auxiliary
tasks, including semantic super-sense tagging and identification of multi-word
expressions, and cast the task as a multi-task learning problem with deep
recurrent neural networks. Our multi-task models perform significantly better
than previous state-of-the-art approaches on two scientific KBC datasets,
particularly for long keyphrases.
Ahmed Hussain Qureshi, Yasar Ayaz
Comments: This paper introduces a novel algorithm called P-RRT*. The work has been published in Springer Autonomous Robots Journal
Journal-ref: Autonomous Robots 40, no. 6 (2016): 1079-1093
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Rapidly-exploring Random Tree Star (RRT*) is a recently proposed extension of
the Rapidly-exploring Random Tree (RRT) algorithm that provides a
collision-free, asymptotically optimal path regardless of the obstacles'
geometry in a given environment. However, one of the limitations of the RRT*
algorithm is its slow convergence to an optimal path solution. As a result, it
consumes both memory and time, owing to the large number of iterations needed
to reach an optimal path solution. To overcome these limitations, we propose
the Potential Function
Based-RRT* (P-RRT*) that incorporates the Artificial Potential Field Algorithm
in RRT*. The proposed algorithm allows a considerable decrease in the number of
iterations and thus leads to more efficient memory utilization and an
accelerated convergence rate. In order to illustrate the usefulness of the
proposed algorithm in terms of memory utilization and convergence rate, this
paper presents rigorous simulation-based comparisons between the proposed
technique
and RRT* under different environmental conditions. Moreover, both algorithms
are also tested and compared under non-holonomic differential constraints.
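The guiding idea can be sketched compactly. The fragment below (the gain, step
size and iteration count are illustrative assumptions, and only the attractive
component of the potential field is shown) nudges a uniformly drawn sample
down the potential gradient toward the goal before it would be handed to the
RRT* steering step.

import numpy as np

def potential_guided_sample(bounds, goal, rng, k_att=1.0, step=0.05, n_steps=10):
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=goal.shape)     # ordinary uniform random sample
    for _ in range(n_steps):
        grad = k_att * (x - goal)                # gradient of 0.5*k*||x - goal||^2
        x = np.clip(x - step * grad, lo, hi)     # descend toward the goal
    return x

rng = np.random.default_rng(1)
goal = np.array([0.9, 0.9])
samples = [potential_guided_sample((0.0, 1.0), goal, rng) for _ in range(5)]
print(np.round(samples, 3))                      # samples biased toward the goal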
Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A grand goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
Comments: To appear in ACL2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Implicit discourse relation classification is highly challenging due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve the recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark.
Leopoldo Bertossi, Mostafa Milani
Comments: Journal submission. Extended version of RuleML’15 paper
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
Data quality assessment and data cleaning are context-dependent activities.
Motivated by this observation, we propose the Ontological Multidimensional Data
Model (OMD model), which can be used to model and represent contexts as
logic-based ontologies. The data under assessment is mapped into the context,
for additional analysis, processing, and quality data extraction. The resulting
contexts allow for the representation of dimensions, and multidimensional data
quality assessment becomes possible. At the core of a multidimensional context
we include a generalized multidimensional data model and a Datalog+/- ontology
with provably good properties in terms of query answering. These main
components are used to represent dimension hierarchies, dimensional
constraints, dimensional rules, and define predicates for quality data
specification. Query answering relies upon and triggers navigation through
dimension hierarchies, and becomes the basic tool for the extraction of quality
data. The OMD model is interesting per se, beyond applications to data quality.
It allows for a logic-based, and computationally tractable representation of
multidimensional data, extending previous multidimensional data models with
additional expressive power and functionalities.
Tegjyot Singh Sethi, Mehmed Kantardzic
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Classifiers deployed in the real world operate in a dynamic environment,
where the data distribution can change over time. These changes, referred to as
concept drift, can cause the predictive performance of the classifier to drop
over time, thereby making it obsolete. To be of any real use, these classifiers
need to detect drifts and be able to adapt to them over time. Detecting drifts
has traditionally been approached as a supervised task, with labeled data
constantly being used for validating the learned model. Although effective in
detecting drifts, these techniques are impractical, as labeling is a difficult,
costly and time consuming activity. On the other hand, unsupervised change
detection techniques are unreliable, as they produce a large number of false
alarms. The inefficacy of the unsupervised techniques stems from the exclusion
of the characteristics of the learned classifier, from the detection process.
In this paper, we propose the Margin Density Drift Detection (MD3) algorithm,
which tracks the number of samples in the uncertainty region of a classifier,
as a metric to detect drift. The MD3 algorithm is a distribution independent,
application independent, model independent, unsupervised and incremental
algorithm for reliably detecting drifts from data streams. Experimental
evaluation on 6 drift induced datasets and 4 additional datasets from the
cybersecurity domain demonstrates that the MD3 approach can reliably detect
drifts, with significantly fewer false alarms compared to unsupervised feature
based drift detectors. The reduction in false alarms enables the signaling of drifts
only when they are most likely to affect classification performance. As such,
the MD3 approach leads to a detection scheme which is credible, label efficient
and general in its applicability.
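A minimal sketch of the margin-density signal at the heart of MD3 follows; the
linear SVM, the drifted stream and the threshold are illustrative assumptions.
The detector tracks the fraction of incoming samples that fall inside the
classifier's margin and flags drift when that fraction deviates from its
reference value, without consulting any labels.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
clf = LinearSVC(C=1.0, dual=False).fit(X, y)

def margin_density(batch):
    """Fraction of samples inside the margin |w.x + b| <= 1."""
    return float(np.mean(np.abs(clf.decision_function(batch)) <= 1.0))

ref = margin_density(X)                          # reference level on training data
stream = rng.normal(size=(500, 5)) + np.array([1.5, 0, 0, 0, 0])  # drifted stream
md = margin_density(stream)
if abs(md - ref) > 0.1:                          # illustrative threshold
    print(f"drift suspected: margin density {md:.2f} vs reference {ref:.2f}")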
Shuai Zhang, Lina Yao, Xiwei Xu
Comments: 5 pages, 2 figures
Subjects: Information Retrieval (cs.IR)
Collaborative filtering (CF) has been successfully used to provide users with
personalized products and services. However, dealing with the increasing
sparseness of user-item matrix still remains a challenge. To tackle such issue,
hybrid CF such as combining with content based filtering and leveraging side
information of users and items has been extensively studied to enhance
performance. However, most of these approaches depend on hand-crafted feature
engineering, which is usually noise-prone and biased by the particular feature
extraction and selection scheme. In this paper, we propose a new hybrid model
by generalizing the contractive auto-encoder paradigm into a matrix
factorization framework with good scalability and computational efficiency,
which jointly models content information as effective and compact
representations, and leverages implicit user feedback to make accurate
recommendations. Extensive experiments conducted on three large-scale real
datasets indicate that the proposed approach outperforms state-of-the-art
methods for item recommendation.
Felix Beierle, Akiko Aizawa, Joeran Beel
Comments: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017)
Subjects: Information Retrieval (cs.IR)
We investigate the problem of choice overload – the difficulty of making a
decision when faced with many options – when displaying related-article
recommendations in digital libraries. So far, research regarding how many
items should be displayed has mostly been done in the fields of media
recommendations and search engines. We analyze the number of recommendations in
current digital libraries. When browsing fullscreen with a laptop or desktop
PC, all display a fixed number of recommendations. 72% display three, four, or
five recommendations; none display more than ten. We provide results from an
empirical evaluation conducted with GESIS’ digital library Sowiport, with
recommendations delivered by recommendations-as-a-service provider Mr. DLib. We
use click-through rate as a measure of recommendation effectiveness based on
3.4 million delivered recommendations. Our results show lower click-through
rates for higher numbers of recommendations and twice as many clicked
recommendations when displaying ten related articles instead of one. Our
results indicate that users might quickly feel overloaded by choice.
Joeran Beel, Siddharth Dinesh
Comments: This article is a long version of the article published in the Proceedings of the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR)
Subjects: Information Retrieval (cs.IR)
Research on recommender systems is a challenging task, as is building and
operating such systems. Major challenges include non-reproducible research
results, dealing with noisy data, and answering many questions such as how many
recommendations to display, how often, and, of course, how to generate
recommendations most effectively. In the past six years, we built three
research-article recommender systems for digital libraries and reference
managers, and conducted research on these systems. In this paper, we share
some of the experience we gained during that time. Among other things, we
discuss the required
skills to build recommender systems, and why the literature provides little
help in identifying promising recommendation approaches. We explain the
challenge in creating a randomization engine to run A/B tests, and how low data
quality impacts the calculation of bibliometrics. We further discuss why
several of our experiments delivered disappointing results, and provide
statistics on how many researchers showed interest in our recommendation
dataset.
Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Despite the increasing use of social media platforms for information and news
gathering, their unmoderated nature often leads to the emergence and spread of
rumours, i.e. pieces of information that are unverified at the time of posting.
At the same time, the openness of social media platforms provides opportunities
to study how users share and discuss rumours, and to explore how natural
language processing and data mining techniques may be used to find ways of
determining their veracity. In this survey we introduce and discuss two types
of rumours that circulate on social media: long-standing rumours that circulate
for long periods of time, and newly-emerging rumours spawned during fast-paced
events such as breaking news, where reports are released piecemeal and often
with an unverified status in their early stages. We provide an overview of
research into social media rumours with the ultimate goal of developing a
rumour classification system that consists of four components: rumour
detection, rumour tracking, rumour stance classification and rumour veracity
classification. We delve into the approaches presented in the scientific
literature for the development of each of these four components. We summarise
the efforts and achievements so far towards the development of rumour
classification systems and conclude with suggestions for avenues for future
research in social media mining for detection and resolution of rumours.
Esra Akbas
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
As the type and the number of venues for expressing opinions online increase,
automated analysis of sentiment on textual resources has become an essential
data mining task. In
this paper, we investigate the problem of mining opinions by extracting aspects
of entities on the collection of informal short texts. Both positive and
negative sentiment strength of texts are detected. We focus on a non-English
language that has few resources for text mining. This approach would help
enhance the sentiment analysis in languages where a list of opinionated words
does not exist. We propose a new method that projects the text into dense and
low-dimensional feature vectors according to the sentiment strength of the
words.
We detect the mixture of positive and negative sentiments on a multi-variant
scale. Empirical evaluation of the proposed framework on Turkish tweets shows
that our approach gets good results for opinion mining.
Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Despite the increasing use of social media platforms for information and news
gathering, their unmoderated nature often leads to the emergence and spread of
rumours, i.e. pieces of information that are unverified at the time of posting.
At the same time, the openness of social media platforms provides opportunities
to study how users share and discuss rumours, and to explore how natural
language processing and data mining techniques may be used to find ways of
determining their veracity. In this survey we introduce and discuss two types
of rumours that circulate on social media: long-standing rumours that circulate
for long periods of time, and newly-emerging rumours spawned during fast-paced
events such as breaking news, where reports are released piecemeal and often
with an unverified status in their early stages. We provide an overview of
research into social media rumours with the ultimate goal of developing a
rumour classification system that consists of four components: rumour
detection, rumour tracking, rumour stance classification and rumour veracity
classification. We delve into the approaches presented in the scientific
literature for the development of each of these four components. We summarise
the efforts and achievements so far towards the development of rumour
classification systems and conclude with suggestions for avenues for future
research in social media mining for detection and resolution of rumours.
Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel
Subjects: Computation and Language (cs.CL)
The input to a neural sequence-to-sequence model is often determined by an
up-stream system, e.g. a word segmenter, part of speech tagger, or speech
recognizer. These up-stream models are potentially error-prone. Representing
inputs through word lattices allows making this uncertainty explicit by
capturing alternative sequences and their posterior probabilities in a compact
form. In this work, we extend the TreeLSTM (Tai et al., 2015) into a
LatticeLSTM that is able to consume word lattices, and can be used as encoder
in an attentional encoder-decoder model. We integrate lattice posterior scores
into this architecture by extending the TreeLSTM’s child-sum and forget gates
and introducing a bias term into the attention mechanism. We experiment with
speech translation lattices and report consistent improvements over baselines
that translate either the 1-best hypothesis or the lattice without posterior
scores.
Daniel Hershcovich, Omri Abend, Ari Rappoport
Comments: 16 pages; Accepted as long paper at ACL2017
Subjects: Computation and Language (cs.CL)
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.
Isabelle Augenstein, Anders Søgaard
Comments: ACL 2017
Journal-ref: ACL 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Keyphrase boundary classification (KBC) is the task of detecting keyphrases
in scientific articles and labelling them with respect to predefined types.
Although important in practice, this task is so far underexplored, partly due
to the lack of labelled data. To overcome this, we explore several auxiliary
tasks, including semantic super-sense tagging and identification of multi-word
expressions, and cast the task as a multi-task learning problem with deep
recurrent neural networks. Our multi-task models perform significantly better
than previous state-of-the-art approaches on two scientific KBC datasets,
particularly for long keyphrases.
Yinfei Yang, Ani Nenkova
Comments: In submission to JAIR
Subjects: Computation and Language (cs.CL)
Content-dense news reports important factual information about an event in a
direct, succinct manner. Information seeking applications such as information
extraction, question answering and summarization normally assume all text they
deal with is content-dense. Here we empirically test this assumption on news
articles from the business, U.S. international relations, sports and science
journalism domains. Our findings clearly indicate that about half of the news
texts in our study are in fact not content-dense and motivate the development
of a supervised content-density detector. We heuristically label a large
training corpus for the task and train a two-layer classifying model based on
lexical and unlexicalized syntactic features. On manually annotated data, we
compare the performance of domain-specific classifiers, trained on data only
from a given news domain and a general classifier in which data from all four
domains is pooled together. Our annotation and prediction experiments
demonstrate that the concept of content density varies depending on the domain
and that naive annotators provide judgement biased toward the stereotypical
domain label. Domain-specific classifiers are more accurate for domains in
which content-dense texts are typically fewer. Domain-independent classifiers
better reproduce naive crowdsourced judgements. Classification accuracy is high
across all conditions, around 80%.
Feng Qian, Lei Sha, Baobao Chang, Lu-chen Liu, Ming Zhang
Subjects: Computation and Language (cs.CL)
Traditional approaches to Semantic Role Labeling (SRL) depend heavily on
manual feature engineering. A recurrent neural network (RNN) with long
short-term memory (LSTM) treats a sentence only as sequence data and cannot
utilize higher-level syntactic information. In this paper, we propose the
Syntax Aware LSTM (SA-LSTM), which gives the RNN-LSTM the ability to utilize
higher-level syntactic information gained from dependency relations. SA-LSTM
also automatically assigns different trainable weights to different types of
dependency relations. Experimental results on the Chinese Proposition Bank
(CPB) show that, even without pre-training or introducing any other extra
semantically annotated resources, our SA-LSTM model still outperforms the
state of the art significantly based on Student’s t-test ((p<0.05)). The
trained weights for the types of dependency relations form a stable and
self-explanatory pattern.
Junki Matsuo, Mamoru Komachi, Katsuhito Sudoh
Comments: 5 pages
Subjects: Computation and Language (cs.CL)
One of the most important problems in machine translation (MT) evaluation is
to evaluate the similarity between translation hypotheses with different
surface forms from the reference, especially at the segment level. We propose
to use word embeddings to perform word alignment for segment-level MT
evaluation. We performed experiments with three types of alignment methods
using word embeddings. We evaluated our proposed methods with various
translation datasets. Experimental results show that our proposed methods
outperform previous word embeddings-based methods.
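The following sketch shows one embedding-based alignment scheme of the general
kind evaluated in the paper (greedy one-directional alignment; the random
vectors below stand in for trained word embeddings): each hypothesis word is
aligned to its most similar reference word, and the segment score is the mean
aligned similarity.

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def alignment_score(hyp_vecs, ref_vecs):
    """Align each hypothesis word to its best reference word, average the sims."""
    sims = [max(cosine(h, r) for r in ref_vecs) for h in hyp_vecs]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in "the cat sat on a mat feline".split()}
hyp = [emb[w] for w in "a feline sat".split()]
ref = [emb[w] for w in "the cat sat".split()]
print(alignment_score(hyp, ref))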
Jaehong Park, Byunggook Na, Sungroh Yoon
Comments: 10 pages, 2 figures, 3 tables
Subjects: Computation and Language (cs.CL)
Recent works have proved that synthetic parallel data generated by existing
translation models can be an effective solution to various neural machine
translation (NMT) issues. In this study, we construct NMT systems using only
synthetic parallel data. As an effective alternative to real parallel data, we
also present a new type of synthetic parallel corpus. The proposed pseudo
parallel data are distinct from previous approaches in that real and synthetic
sentences are mixed on both sides of sentence pairs. Experiments on
Czech-German and French-German translations demonstrate the efficacy of the
proposed pseudo parallel corpus that guarantees not only both balanced and
competitive performance for bidirectional translation but also substantial
improvement with the aid of a real parallel corpus.
Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
Comments: To appear in ACL2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Implicit discourse relation classification is highly challenging due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve the recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark.
Amrita Saha, Mitesh Khapra, Karthik Sankaranarayanan
Subjects: Computation and Language (cs.CL)
Owing to the success of deep learning techniques for tasks such as Q/A and
text-based dialog, there is an increasing demand for AI agents in several
domains such as retail, travel, entertainment, etc. that can carry on
multimodal conversations with humans employing both text and images within a
dialog seamlessly. However, deep learning research in this area has been
limited primarily due to the lack of availability of large-scale, open
conversation datasets. To overcome this bottleneck, in this paper we introduce
the task of multi-modal, domain-aware conversations, and propose the MMD
benchmark dataset towards this task. This dataset was gathered by working in
close coordination with a large number of domain experts in the retail domain
and
consists of over 150K conversation sessions between shoppers and sales agents.
With this dataset, we propose 5 new sub-tasks for multimodal conversations
along with their evaluation methodology. We also propose two novel multi-modal
deep learning models in the encode-attend-decode paradigm and demonstrate
their performance on two of the sub-tasks, namely text response generation and
best image response selection. These experiments serve to establish baseline
performance numbers and open new directions of research for each of these
sub-tasks.
Haixia Liu
Subjects: Computation and Language (cs.CL)
Citation sentiment analysis is an important task in scientific paper
analysis. Existing machine learning techniques for citation sentiment analysis
focus on labor-intensive feature engineering, which requires a large annotated
corpus. As an automatic feature extraction tool, word2vec has been
successfully applied to sentiment analysis of short texts. In this work, I
conducted empirical research with the question: how well does word2vec work on
the sentiment analysis of citations? The proposed method constructed sentence
vectors (sent2vec) by averaging the word embeddings, which were learned from
Anthology Collections (ACL-Embeddings). I also investigated polarity-specific
word embeddings (PS-Embeddings) for classifying positive and negative
citations. The sentence vectors formed a feature space to which each examined
citation sentence was mapped. Those features were input into classifiers
(support vector machines) for supervised classification. Using a 10-fold
cross-validation scheme, evaluation was conducted on a set of annotated
citations. The results showed that word embeddings are effective on classifying
positive and negative citations. However, hand-crafted features performed
better for the overall classification.
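The pipeline is simple enough to sketch end to end; here the random vectors
are stand-ins for the ACL-Embeddings and the toy sentences replace the
annotated citation corpus.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=100) for w in
         "this work is great flawed novel poor improves fails method".split()}

def sent2vec(sentence):
    """Average the word vectors of the in-vocabulary words of a sentence."""
    vecs = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

sents = ["this method is great", "this work fails", "novel method improves",
         "poor flawed work"] * 25
labels = [1, 0, 1, 0] * 25
X = np.vstack([sent2vec(s) for s in sents])
print(cross_val_score(SVC(kernel="linear"), X, labels, cv=10).mean())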
Meysam Alizadeh, Ingmar Weber, Claudio Cioffi-Revilla, Santo Fortunato, Michael Macy
Comments: 11 pages, 3 figures
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Global recruitment into radical Islamic movements has spurred renewed
interest in the appeal of political extremism. Is the appeal a rational
response to material conditions or is it the expression of psychological and
personality disorders associated with aggressive behavior, intolerance,
conspiratorial imagination, and paranoia? Empirical answers using surveys have
been limited by lack of access to extremist groups, while field studies have
lacked psychological measures and failed to compare extremists with contrast
groups. We revisit the debate over the appeal of extremism in the U.S. context
by comparing publicly available Twitter messages written by over 355,000
political extremist followers with messages written by non-extremist U.S.
users. Analysis of text-based psychological indicators supports the moral
foundation theory which identifies emotion as a critical factor in determining
political orientation of individuals. Extremist followers also differ from
others in four of the Big Five personality traits.
Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, Kaheer Suleman
Subjects: Computation and Language (cs.CL)
This paper presents the Frames dataset (Frames is available at
this http URL), a corpus of 1369 human-human dialogues
with an average of 15 turns per dialogue. We developed this dataset to study
the role of memory in goal-oriented dialogue systems. Based on Frames, we
introduce a task called frame tracking, which extends state tracking to a
setting where several states are tracked simultaneously. We propose a baseline
model for this task. We show that Frames can also be used to study memory in
dialogue management and information presentation through natural language
generation.
Katharina Kann, Ryan Cotterell, Hinrich Schütze
Comments: Accepted at ACL 2017
Subjects: Computation and Language (cs.CL)
We present a novel cross-lingual transfer method for paradigm completion, the
task of mapping a lemma to its inflected forms, using a neural encoder-decoder
model, the state of the art for the monolingual task. We use labeled data from
a high-resource language to increase performance on a low-resource language. In
experiments on 21 language pairs from four different language families, we
obtain up to 58% higher accuracy than without transfer and show that even
zero-shot and one-shot learning are possible. We further find that the degree
of language relatedness strongly influences the ability to transfer
morphological knowledge.
Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
Subjects: Computation and Language (cs.CL)
This paper proposes to tackle open-domain question answering using Wikipedia
as the unique knowledge source: the answer to any factoid question is a text
span in a Wikipedia article. This task of machine reading at scale combines the
challenges of document retrieval – finding the relevant articles – with that of
machine comprehension of text – identifying the answer spans from those
articles. Our approach combines a search component based on bigram hashing and
TF-IDF matching with a multi-layer recurrent neural network model trained to
detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA
datasets indicate that (1) both modules are highly competitive with respect to
existing counterparts and (2) multitask learning using distant supervision on
their combination is an effective complete system on this challenging task.
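The retrieval half of the system can be approximated in a few lines of
scikit-learn; the hashing width and the three-document corpus below are
illustrative, and the reader network is not sketched.

from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the eiffel tower is in paris",
        "the great wall of china is visible from low orbit",
        "python is a programming language"]

# hash unigram/bigram counts into a fixed-width space, then weight with TF-IDF
hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2 ** 20,
                           alternate_sign=False)
tfidf = TfidfTransformer()
index = tfidf.fit_transform(hasher.transform(docs))

query = hasher.transform(["where is the eiffel tower"])
scores = cosine_similarity(tfidf.transform(query), index).ravel()
print(docs[scores.argmax()])                     # best-matching article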
Esra Akbas
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
As the type and the number of such venues increase, automated analysis of
sentiment on textual resources has become an essential data mining task. In
this paper, we investigate the problem of mining opinions by extracting aspects
of entities on the collection of informal short texts. Both positive and
negative sentiment strength of texts are detected. We focus on a non-English
language that has few resources for text mining. This approach would help
enhance the sentiment analysis in languages where a list of opinionated words
does not exist. We propose a new method projects the text into dense and low
dimensional feature vectors according to the sentiment strength of the words.
We detect the mixture of positive and negative sentiments on a multi-variant
scale. Empirical evaluation of the proposed framework on Turkish tweets shows
that our approach gets good results for opinion mining.
Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Theory of Mind is the ability to attribute mental states (beliefs, intents,
knowledge, perspectives, etc.) to others and recognize that these mental states
may differ from one’s own. Theory of Mind is critical to effective
communication and to teams demonstrating higher collective performance. To
effectively leverage the progress in Artificial Intelligence (AI) to make our
lives more productive, it is important for humans and AI to work well together
in a team. Traditionally, there has been much emphasis on research to make AI
more accurate, and (to a lesser extent) on having it better understand human
intentions, tendencies, beliefs, and contexts. The latter involves making AI
more human-like and having it develop a theory of our minds.
In this work, we argue that for human-AI teams to be effective, humans must
also develop a theory of AI’s mind – get to know its strengths, weaknesses,
beliefs, and quirks. We instantiate these ideas within the domain of Visual
Question Answering (VQA). We find that using just a few examples(50), lay
people can be trained to better predict responses and oncoming failures of a
complex VQA model. Surprisingly, we find that having access to the model’s
internal states – its confidence in its top-k predictions, explicit or implicit
attention maps which highlight regions in the image (and words in the question)
the model is looking at (and listening to) while answering a question about an
image – do not help people better predict its behavior
Vadim Markovtsev, Eiso Kant
Comments: 11 pages
Subjects: Programming Languages (cs.PL); Computation and Language (cs.CL)
Programming languages themselves have a limited number of reserved keywords
and character-based tokens that define the language specification. However,
programmers make rich use of natural language within their code through
comments, text literals and the naming of entities. The programmer-defined
names that
can be found in source code are a rich source of information to build a high
level understanding of the project. The goal of this paper is to apply topic
modeling to names used in over 13.6 million repositories and perceive the
inferred topics. One of the problems in such a study is the occurrence of
duplicate repositories not officially marked as forks (obscure forks). We show
how to address it using the same identifiers which are extracted for topic
modeling.
We open with a discussion on naming in source code, we then elaborate on our
approach to remove exact duplicate and fuzzy duplicate repositories using
Locality Sensitive Hashing on the bag-of-words model and then discuss our work
on topic modeling; and finally present the results from our data analysis
together with open-access to the source code, tools and datasets.
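To make the fuzzy-duplicate step concrete, here is a self-contained
MinHash/LSH sketch over bags of identifiers (the signature length, band layout
and md5-based hash family are assumptions for illustration; the authors'
pipeline operates at far larger scale).

import hashlib
from collections import defaultdict

def minhash(tokens, n_perm=64):
    """MinHash signature: the minimum of a keyed hash over the token set."""
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens) for i in range(n_perm)]

def lsh_buckets(signatures, bands=16):
    """Group names whose signatures collide in at least one LSH band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    return [v for v in buckets.values() if len(v) > 1]   # candidate duplicates

repos = {"repo_a": {"parse", "lexer", "token", "ast"},
         "repo_a_fork": {"parse", "lexer", "token", "ast", "readme"},
         "repo_b": {"matrix", "tensor", "gradient"}}
sigs = {name: minhash(toks) for name, toks in repos.items()}
print(lsh_buckets(sigs))    # high-Jaccard pairs land in a shared bucket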
Julian Romera
Comments: Initial version, 105 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)
The Breadth First Search (BFS) algorithm is the foundation and building block
of many higher graph-based operations such as spanning trees, shortest paths
and betweenness centrality. The importance of this algorithm increases each
day because it is a key requirement for many data structures that are becoming
popular nowadays. When the BFS algorithm is parallelized by distributing the
graph between several processors the interconnection network limits the
performance. Hence, improvements on this area may benefit the overall
performance of the algorithm.
This work presents an alternative compression scheme for communications in
distributed BFS processing. It focuses on BFS processors using General-Purpose
Graphics Processing Units.
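As a flavour of what frontier compression buys, the following generic Python
sketch delta-encodes a sorted BFS frontier with variable-length integers; it
illustrates the principle only and is not the report's GPU-oriented scheme.

def encode_frontier(vertex_ids):
    """Delta + varint encode a frontier of vertex IDs."""
    out, prev = bytearray(), 0
    for v in sorted(vertex_ids):
        delta, prev = v - prev, v
        while True:                      # varint: 7 payload bits per byte
            byte = delta & 0x7F
            delta >>= 7
            out.append(byte | (0x80 if delta else 0))
            if not delta:
                break
    return bytes(out)

def decode_frontier(data):
    ids, prev, delta, shift = [], 0, 0, 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:              # last byte of this varint
            prev += delta
            ids.append(prev)
            delta, shift = 0, 0
    return ids

frontier = [1048576, 1048577, 1048580, 1050000]
blob = encode_frontier(frontier)
assert decode_frontier(blob) == frontier
print(len(blob), "bytes instead of", 4 * len(frontier))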
Vineet John, Xia Liu
Comments: 8 pages, 10 figures, 1 table
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper surveys the message brokers that are in vogue today for
distributed communication. Their primary goal is to facilitate the construction
of decentralized topologies without single points of failure, enabling fault
tolerance and high availability. These characteristics make them optimal for
usage within distributed architectures. However, there are multiple protocols
built to achieve this, and it would be beneficial to have an empirical
comparison between their features and performance to determine their real-world
applicability.
This paper focuses on two popular protocols (Kafka and AMQP) and explores the
divergence in their features as well as their performance under varied testing
workloads.
Yanhong A. Liu, Saksham Chand, Scott D. Stoller
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper presents simpler high-level specifications of more complex
variants of the Paxos algorithm for distributed consensus. The specifications
are developed using a method and language for expressing complex control flows
and synchronization conditions precisely at a high level. The
resulting specifications are both completely precise and fully executable in a
programming language, and furthermore are formally verified using a proof
system. We show that high-level specifications allow better understanding,
faster error discovery, and easier optimization and extension, including a
general method for merging processes.
Orlando Moreira, Merten Popp, Christian Schulz
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Graphs are widely used to model execution dependencies in applications. In
particular, the NP-complete problem of partitioning a graph under constraints
receives enormous attention by researchers because of its applicability in
multiprocessor scheduling. We identified the additional constraint of acyclic
dependencies between blocks when mapping computer vision and imaging
applications to a heterogeneous embedded multiprocessor. Existing algorithms
and heuristics do not address this requirement and deliver results that are not
applicable for our use-case. In this work, we show that this more constrained
version of the graph partitioning problem is NP-complete and present heuristics
that achieve a close approximation of the optimal solution found by an
exhaustive search for small problem instances and much better scalability for
larger instances. In addition, we can show a positive impact on the schedule of
a real imaging application that improves communication volume and execution
time.
Istvan Z Reguly, Gihan R Mudalige, Mike B Giles
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Stencil computations occur in a multitude of scientific simulations and
therefore have been the subject of many domain-specific languages including the
OPS (Oxford Parallel library for Structured meshes) DSL embedded in
C/C++/Fortran. OPS is currently used in several large partial differential
equations (PDE) applications, and has been used as a vehicle to experiment
with, and deploy performance improving optimisations. The key common bottleneck
in most stencil codes is data movement, and other research has shown that
improving data locality through optimisations that schedule across loops do
particularly well. However, in many large PDE applications it is not possible
to apply such optimisations through a compiler because in larger-scale codes,
there are a huge number of options, execution paths and data per grid point,
many dependent on run-time parameters, and the code is distributed across a
number of different compilation units. In this paper, we adapt the data
locality improving optimisation called iteration space slicing for use in large
OPS apps, relying on run-time analysis and delayed execution. We observe
speedups of 2x on the Cloverleaf 2D/3D proxy applications, which contain 83/141
loops respectively. The approach is generally applicable to any stencil DSL
that provides per loop data access information.
Rong Ge, Chi Jin, Yi Zheng
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this paper we develop a new framework that captures the common landscape
underlying the main non-convex low-rank matrix problems, including matrix
sensing, matrix completion and robust PCA. In particular, we show for all the
above problems (including asymmetric cases): 1) all local minima are also
globally optimal; 2) no high-order saddle points exist. These results explain
why simple algorithms such as stochastic gradient descent converge globally,
and efficiently optimize these non-convex objective functions in practice. Our
framework connects and simplifies the existing analyses on optimization
landscapes for matrix sensing and symmetric matrix completion. The framework
naturally leads to new results for asymmetric matrix completion and robust PCA.
Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool
Comments: Supplementary visual examples available at: this http URL
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this work we present a new approach to learn compressible representations
in deep architectures with an end-to-end training strategy. Our method is based
on a soft relaxation of quantization and entropy, which we anneal to their
discrete counterparts throughout training. We showcase this method for two
challenging applications: Image compression and neural network compression.
While these tasks have typically been approached with different methods, our
soft-to-hard quantization approach gives state-of-the-art results for both.
Frank Nielsen, Ke Sun
Comments: 18 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Clustering categorical distributions in the probability simplex is a
fundamental primitive often met in applications dealing with histograms or
mixtures of multinomials. Traditionally, the differential-geometric structure
of the probability simplex has been used either by (i) setting the Riemannian
metric tensor to the Fisher information matrix of the categorical
distributions, or (ii) defining the information-geometric structure induced by
a smooth dissimilarity measure, called a divergence. In this paper, we
introduce a novel computationally-friendly non-Riemannian framework for
modeling the probability simplex: Hilbert simplex geometry. We discuss the pros
and cons of those three statistical modelings, and compare them experimentally
for clustering tasks.
Sayak Ray Chowdhury, Aditya Gopalan
Subjects: Learning (cs.LG)
We consider the stochastic bandit problem with a continuous set of arms, with
the expected reward function over the arms assumed to be fixed but unknown. We
provide two new Gaussian process-based algorithms for continuous bandit
optimization: Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and
derive corresponding regret bounds. Specifically, the bounds hold when the
expected reward function belongs to the reproducing kernel Hilbert space (RKHS)
that naturally corresponds to a Gaussian process kernel used as input by the
algorithms. Along the way, we derive a new self-normalized concentration
inequality for vector-valued martingales of arbitrary, possibly infinite,
dimension. Finally, experimental evaluation and comparisons to existing
algorithms on synthetic and real-world environments are carried out that
highlight the favorable gains of the proposed strategies in many cases.
U.N. Niranjan, Arun Rajkumar, Theja Tulabandhula
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
The robust PCA problem, wherein, given an input data matrix that is the
superposition of a low-rank matrix and a sparse matrix, we aim to separate out
the low-rank and sparse components, is a well-studied problem in machine
learning. One natural question that arises is: if, as in the inductive
setting, features are provided as input as well, can we hope to do better?
Answering this in the affirmative, the main goal of this paper is to study the
robust PCA problem while incorporating feature information. In contrast to
previous works in which recovery guarantees are based on the convex relaxation
of the problem, we propose a simple iterative algorithm based on
hard-thresholding of appropriate residuals. Under weaker assumptions than
previous works, we prove the global convergence of our iterative procedure;
moreover, it admits a much faster convergence rate and lower computational
complexity per iteration. In practice, through systematic synthetic and real
data simulations, we confirm our theoretical findings regarding improvements
obtained by using feature information.
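For intuition, the feature-free version of such a hard-thresholding iteration
fits in a dozen lines of numpy (the rank, threshold and iteration count below
are illustrative; the paper's algorithm additionally incorporates the feature
matrices and comes with recovery guarantees).

import numpy as np

def altproj_rpca(M, r, thresh, n_iter=50):
    """Alternate a rank-r projection with hard thresholding of the residual."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]                   # rank-r projection
        S = np.where(np.abs(M - L) > thresh, M - L, 0.0)  # hard threshold
    return L, S

rng = np.random.default_rng(0)
L_true = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 50))
S_true = np.zeros((50, 50))
mask = rng.random((50, 50)) < 0.05                        # 5% sparse corruptions
S_true[mask] = 10 * rng.normal(size=mask.sum())
L, S = altproj_rpca(L_true + S_true, r=3, thresh=3.0)
print(np.linalg.norm(L - L_true) / np.linalg.norm(L_true))  # relative error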
Geoffrey I. Webb, Loong Kuan Lee, François Petitjean, Bart Goethals
Subjects: Learning (cs.LG)
Concept drift is a major issue that greatly affects the accuracy and
reliability of many real-world applications of machine learning. We argue that
to tackle concept drift it is important to develop the capacity to describe and
analyze it. We propose tools for this purpose, arguing for the importance of
quantitative descriptions of drift in marginal distributions. We present
quantitative drift analysis techniques along with methods for communicating
their results. We demonstrate their effectiveness by application to three
real-world learning tasks.
Michael Sandbichler, Karin Schnass
Comments: 10 pages, 11 figures
Subjects: Learning (cs.LG); Numerical Analysis (math.NA)
In this paper two sequential algorithms for learning analysis operators are
presented, which are built upon the same optimisation principle underlying both
Analysis K-SVD and Analysis SimCO and use a stochastic gradient descent
approach similar to ASimCO. The sequential analysis operator learning (SAOL)
algorithm is based on projected gradient descent with an appropriately chosen
step size while the implicit SAOL (ISAOL) algorithm avoids choosing a step size
altogether by using a strategy inspired by the implicit Euler scheme for
solving ordinary differential equations. Both algorithms are tested on
synthetic and image data in comparison to Analysis SimCO and found to give
slightly better recovery rates and decay of the objective function,
respectively. In a
final denoising experiment the presented algorithms are again shown to perform
well in comparison to the state of the art algorithm ASimCO.
Ozsel Kilinc, Ismail Uysal
Comments: Submitted to UAI 2017
Subjects: Learning (cs.LG)
We introduce a novel validation framework to measure the true robustness of
learning models for real-world applications by creating source-inclusive and
source-exclusive partitions in a dataset via clustering. We develop a
robustness metric derived from source-aware lower and upper bounds of model
accuracy even when data source labels are not readily available. We clearly
demonstrate that even on a well-explored dataset like MNIST, challenging
training scenarios can be constructed under the proposed assessment framework
for two separate yet equally important applications: i) more rigorous learning
model comparison and ii) dataset adequacy evaluation. In addition, our findings
not only promise a more complete identification of trade-offs between model
complexity, accuracy and robustness but can also help researchers optimize
their efforts in data collection by identifying the less robust and more
challenging class labels.
Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
Subjects: Learning (cs.LG)
Ensembles of neural networks are known to be much more robust and accurate
than individual networks. However, training multiple deep networks for model
averaging is computationally expensive. In this paper, we propose a method to
achieve the seemingly contradictory goal of ensembling multiple neural networks
at no additional training cost. We achieve this goal by training a single
neural network, converging to several local minima along its optimization path
and saving the model parameters. To obtain repeated rapid convergence, we
leverage recent work on cyclic learning rate schedules. The resulting
technique, which we refer to as Snapshot Ensembling, is simple, yet
surprisingly effective. We show in a series of experiments that our approach is
compatible with diverse network architectures and learning tasks. It
consistently yields lower error rates than state-of-the-art single models at no
additional training cost, and compares favorably with traditional network
ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain
error rates of 3.4% and 17.4% respectively.
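The recipe is easy to express in a few lines of PyTorch; the toy model, data
and schedule below are placeholders, but the structure (cosine-annealed
cycles, a snapshot saved at the end of each cycle, averaged softmax outputs at
test time) follows the description above.

import math, copy
import torch, torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

snapshots, cycles, iters_per_cycle, lr0 = [], 5, 100, 0.1
for c in range(cycles):
    for t in range(iters_per_cycle):
        # cosine annealing within the cycle: lr0 down to ~0
        lr = 0.5 * lr0 * (1 + math.cos(math.pi * t / iters_per_cycle))
        for g in opt.param_groups:
            g["lr"] = lr
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    snapshots.append(copy.deepcopy(model).eval())  # snapshot at the cycle minimum

with torch.no_grad():
    probs = torch.stack([m(X).softmax(dim=1) for m in snapshots]).mean(0)
print(probs.shape)                                 # averaged ensemble prediction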
Wang Chi Cheung, David Simchi-Levi
Comments: 16 pages, 2 figures
Subjects: Learning (cs.LG)
Motivated by e-commerce, we study the online assortment optimization problem.
The seller offers an assortment, i.e. a subset of products, to each arriving
customer, who then purchases one or no product from her offered assortment. A
customer’s purchase decision is governed by the underlying MultiNomial Logit
(MNL) choice model. The seller aims to maximize the total revenue in a finite
sales horizon, subject to resource constraints and uncertainty in the MNL
choice model. We first propose an efficient online policy which incurs a
regret of $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the
sales horizon. Then, we propose a UCB policy that achieves a regret of
$\tilde{O}(T^{1/2})$. Both regret bounds are sublinear in the number of
assortments.
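Under the MNL model the seller's trade-off is easy to state in code: each product in the offered assortment is chosen with probability proportional to its attraction parameter, with the no-purchase option absorbing the rest. A small sketch (attraction parameters and prices are illustrative, not from the paper):

```python
import numpy as np

def mnl_choice_probs(v_S):
    """Purchase probabilities under MNL for an offered assortment with
    attraction parameters v_S; index 0 is the no-purchase option."""
    denom = 1.0 + np.sum(v_S)
    return np.concatenate(([1.0 / denom], v_S / denom))

def expected_revenue(v_S, prices_S):
    probs = mnl_choice_probs(v_S)
    return float(np.dot(probs[1:], prices_S))

v = np.array([1.2, 0.5, 0.8])    # assumed attraction parameters
p = np.array([10.0, 14.0, 9.0])  # assumed prices
print(expected_revenue(v, p))
```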
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville
Subjects: Learning (cs.LG)
Generative Adversarial Networks (GANs) are powerful generative models, but
suffer from training instability. The recently proposed Wasserstein GAN (WGAN)
makes significant progress toward stable training of GANs, but can still
generate low-quality samples or fail to converge in some settings. We find that
these training failures are often due to the use of weight clipping in WGAN to
enforce a Lipschitz constraint on the critic, which can lead to pathological
behavior. We propose an alternative method for enforcing the Lipschitz
constraint: instead of clipping weights, penalize the norm of the gradient of
the critic with respect to its input. Our proposed method converges faster and
generates higher-quality samples than WGAN with weight clipping. Finally, our
method enables very stable GAN training: for the first time, we can train a
wide variety of GAN architectures with almost no hyperparameter tuning,
including 101-layer ResNets and language models over discrete data.
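A minimal PyTorch-style sketch of the gradient penalty described above: sample points on lines between real and generated data and penalize deviations of the critic's input-gradient norm from 1. The penalty coefficient of 10 is the value commonly used with this method, and the critic architecture is purely illustrative.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalize (||grad_xhat critic(xhat)||_2 - 1)^2 at random
    interpolates xhat between real and fake samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    xhat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    out = critic(xhat)
    grads = torch.autograd.grad(outputs=out, inputs=xhat,
                                grad_outputs=torch.ones_like(out),
                                create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# toy usage with an illustrative critic
critic = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))
real, fake = torch.randn(16, 8), torch.randn(16, 8)
loss = (-critic(real).mean() + critic(fake).mean()
        + gradient_penalty(critic, real, fake))
```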
Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola
Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Nonparametric models are versatile, albeit computationally expensive, tools
for mixture modeling. In this paper, we introduce spectral methods for
the two most popular nonparametric models: the Indian Buffet Process (IBP) and
the Hierarchical Dirichlet Process (HDP). We show that spectral inference for
nonparametric models is computationally and statistically efficient. In
particular, we derive the lower-order moments of the IBP and the HDP, propose
spectral algorithms for both models, and provide reconstruction guarantees for
the algorithms. For the HDP, we further show that applying hierarchical models
to data with hierarchical structure, which can be solved with the generalized
spectral HDP, yields better likelihood performance than flat models.
Lars Maaløe, Marco Fraccaro, Ole Winther
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Deep generative models trained with large amounts of unlabelled data have
proven to be powerful within the domain of unsupervised learning. Many
real-life data sets contain a small number of labelled data points, which are
typically disregarded when training generative models. We propose the
Cluster-aware Generative Model, which uses unlabelled information to infer a
latent representation that models the natural clustering of the data, and
additional labelled data points to refine this clustering. The generative
performance of the model improves significantly when labelled information is
exploited, obtaining a log-likelihood of -79.38 nats on permutation-invariant
MNIST, while also achieving competitive semi-supervised classification
accuracies. The model can also be trained fully unsupervised, and still improve
the log-likelihood performance with respect to related methods.
Jalal Etesami, Kun Zhang, Negar Kiyavash
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Measuring the dependencies among the variables of a network is of great
interest to many disciplines. This paper studies the limitations of existing
dependence measures, such as their shortcomings in detecting direct influences
and their inability to select groups of variables for effective interventions,
and introduces a new statistical influence measure to overcome them. The
measure is inspired by Dobrushin’s coefficients and is built on the paradigm
that the conditional distribution of the variable of interest given all its
direct causes does not change when intervening on other variables in the
system. We show the advantages of this measure over related measures in the
literature. Moreover, we establish a connection between our measure and the
integral probability metric (IPM) that helps to develop estimators of our
measure with lower complexity than other relevant information-theoretic
measures. Finally, we demonstrate the performance of the measure through a
numerical simulation.
Vraj Shah, Arun Kumar, Xiaojin Zhu
Comments: 10 pages
Subjects: Databases (cs.DB); Learning (cs.LG)
Many datasets have multiple tables connected by key-foreign key dependencies.
Data scientists usually join all tables to bring in extra features from the
so-called dimension tables. Unlike the statistical relational learning setting,
such joins do not cause record duplications, which means regular IID models are
typically used. Recent work demonstrated the possibility of using foreign key
features as representatives for the dimension tables’ features and eliminating
the latter a priori, potentially saving runtime and effort of data scientists.
However, the prior work was restricted to linear models and it established a
dichotomy of when dimension tables are safe to discard due to extra overfitting
caused by the use of foreign key features. In this work, we revisit that
question for two popular high capacity models: decision tree and SVM with RBF
kernel. Our extensive empirical and simulation-based analyses show that these
two classifiers are surprisingly and counter-intuitively more robust to
discarding dimension tables and face much less extra overfitting than linear
models. We provide intuitive explanations for their behavior and identify new
open questions for further ML theoretical research. We also identify and
resolve two key practical bottlenecks in using foreign key features.
Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
Comments: under review at ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. CNNs are the current state-of-the-art methods
for action recognition in videos. However, the CNN architectures currently
being used have difficulty in capturing these relationships. State-of-the-art
action recognition approaches rely on traditional local optical flow estimation
methods to pre-compute the motion information for CNNs. Such a two-stage
approach is computationally expensive, storage demanding, and not end-to-end
trainable. In this paper, we present a novel CNN architecture that implicitly
captures motion information. Our method is 10x faster than a two-stage
approach, does not need to cache flow information, and is end-to-end trainable.
Experimental results on UCF101 and HMDB51 show that it achieves competitive
accuracy with the two-stage approaches.
Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
A grand goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
Comments: To appear in ACL2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Implicit discourse relation classification is highly challenging due to the
lack of connectives as strong linguistic cues, which motivates the use of
annotated implicit connectives to improve the recognition. We propose a feature
imitation framework in which an implicit relation network is driven to learn
from another neural network with access to connectives, and thus encouraged to
extract similarly salient features for accurate classification. We develop an
adversarial model to enable an adaptive imitation scheme through competition
between the implicit network and a rival feature discriminator. Our method
effectively transfers discriminability of connectives to the implicit features,
and achieves state-of-the-art performance on the PDTB benchmark.
Patrick R. Johnstone, Pierre Moulin
Comments: 49 pages
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)
The purpose of this manuscript is to derive new convergence results for
several subgradient methods for minimizing nonsmooth convex functions with
Hölderian growth. The growth condition is satisfied in many applications and
includes functions with quadratic growth and functions with weakly sharp minima
as special cases. To this end there are four main contributions. First, for a
constant and sufficiently small stepsize, we show that the subgradient method
achieves linear convergence up to a certain region including the optimal set
with error of the order of the stepsize. Second, we derive nonergodic
convergence rates for the subgradient method under nonsummable decaying
stepsizes. Third, if appropriate problem parameters are known, we derive a
possibly-summable stepsize which obtains a much faster convergence rate.
Finally we develop a novel “descending stairs” stepsize which obtains this
faster convergence rate but also obtains linear convergence for the special
case of weakly sharp functions. We also develop a variant of the “descending
stairs” stepsize which achieves essentially the same convergence rate without
requiring an error bound constant which is difficult to estimate in practice.
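As a concrete instance of the first contribution, here is a sketch of the constant-stepsize subgradient method on a function with weakly sharp minima, $f(x)=\|x\|_1$: the iterates make steady progress until they reach a region around the minimizer whose radius is on the order of the stepsize. The test function and stepsize are illustrative, not taken from the paper.

```python
import numpy as np

def subgradient_method(x0, subgrad, step, iters):
    """Constant-stepsize subgradient method (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * subgrad(x)
    return x

# f(x) = ||x||_1 has weakly sharp minima at 0; a subgradient is sign(x)
x = subgradient_method(x0=[5.0, -3.0], subgrad=np.sign, step=0.01, iters=1000)
print(np.linalg.norm(x, 1))  # stalls at a region of size ~ stepsize
```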
Tegjyot Singh Sethi, Mehmed Kantardzic
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Classifiers deployed in the real world operate in a dynamic environment,
where the data distribution can change over time. These changes, referred to as
concept drift, can cause the predictive performance of the classifier to drop
over time, thereby making it obsolete. To be of any real use, these classifiers
need to detect drifts and be able to adapt to them, over time. Detecting drifts
has traditionally been approached as a supervised task, with labeled data
constantly being used for validating the learned model. Although effective in
detecting drifts, these techniques are impractical, as labeling is a difficult,
costly and time-consuming activity. On the other hand, unsupervised change
detection techniques are unreliable, as they produce a large number of false
alarms. The inefficacy of the unsupervised techniques stems from the exclusion
of the characteristics of the learned classifier, from the detection process.
In this paper, we propose the Margin Density Drift Detection (MD3) algorithm,
which tracks the number of samples in the uncertainty region of a classifier,
as a metric to detect drift. The MD3 algorithm is a distribution independent,
application independent, model independent, unsupervised and incremental
algorithm for reliably detecting drifts from data streams. Experimental
evaluation on 6 drift induced datasets and 4 additional datasets from the
cybersecurity domain demonstrates that the MD3 approach can reliably detect
drifts, with significantly fewer false alarms compared to unsupervised feature
based drift detectors. The reduction in false alarms enables signaling drifts
only when they are most likely to affect classification performance. As such,
the MD3 approach leads to a detection scheme which is credible, label-efficient
and general in its applicability.
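A minimal sketch of margin-density tracking in the spirit of MD3, using a linear SVM whose margin region is $|w^\top x + b| \le 1$: monitor the fraction of unlabeled stream samples falling inside the margin and flag a drift when it deviates from the reference density by more than a few standard deviations. The classifier, window size, and threshold here are illustrative, not the paper's exact settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

def margin_density(clf, X):
    """Fraction of samples inside the SVM margin |w.x + b| <= 1."""
    return float(np.mean(np.abs(clf.decision_function(X)) <= 1.0))

rng = np.random.default_rng(0)
Xtr = rng.standard_normal((500, 5))
ytr = (Xtr[:, 0] > 0).astype(int)
clf = LinearSVC(dual=False).fit(Xtr, ytr)

ref_md = margin_density(clf, Xtr)             # reference margin density
sigma = np.sqrt(ref_md * (1 - ref_md) / 100)  # binomial std, window of 100

window = rng.standard_normal((100, 5)) + 1.5  # drifted, unlabeled stream window
if abs(margin_density(clf, window) - ref_md) > 3 * sigma:
    print("drift signaled: request labels and retrain")
```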
Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
Comments: 14 pages, 12 figures, submitted to IEEE Transactions on Circuits and Systems – I: Regular Papers. Updated according to the minor revision of the reviewers
Subjects: Hardware Architecture (cs.AR); Learning (cs.LG)
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load – but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V, achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.
Andrew James Ferris, Christoph Hirche, David Poulin
Comments: Subsumes arXiv:1312.4575
Subjects: Information Theory (cs.IT)
Arikan’s Polar codes attracted much attention as the first efficiently
decodable and capacity-achieving codes. Furthermore, Polar codes exhibit an
exponentially decreasing block error probability with an asymptotic error
exponent upper bounded by 1/2. Since their discovery, many attempts have been
made to improve the error exponent and the finite block-length performance,
while keeping the block-structured kernel. Recently, two of us introduced a new
family of efficiently decodable error-correction codes based on a recently
discovered efficiently-contractible tensor network family in quantum many-body
physics, called branching MERA. These codes, called branching MERA codes,
include Polar codes and also extend them in a non-trivial way by substituting
the block-structured kernel with a convolutional structure. Here, we perform an
in-depth study of a particular example that can be thought of as a direct
extension to Arikan’s Polar code, which we therefore name Convolutional Polar
codes. We prove that these codes polarize and exponentially suppress the
channel’s error probability, with an asymptotic error exponent $\log_2(3)/2$,
which is provably better than that of Polar codes under successive cancellation
decoding.
We also perform finite block-size numerical simulations which display improved
error-correcting capability with only a minor impact on decoding complexity.
Vamsi Krishna Gummadi, Ashok Choudhary, Prasad Krishnan
Subjects: Information Theory (cs.IT)
An index coding (IC) problem consisting of a server and multiple receivers
with different side-information and demand sets can be equivalently represented
using a fitting matrix. A scalar linear index code to a given IC problem is a
matrix representing the transmitted linear combinations of the message symbols.
The length of an index code is then the number of transmissions (or
equivalently, the number of rows in the index code). An IC problem
$\mathcal{I}_{ext}$ is called an extension of another IC problem $\mathcal{I}$
if the fitting matrix of $\mathcal{I}$ is a submatrix of the fitting matrix of
$\mathcal{I}_{ext}$. We first present a straightforward $m$-order extension
$\mathcal{I}_{ext}$ of an IC problem $\mathcal{I}$ for which an index code is
obtained by concatenating $m$ copies of an index code of $\mathcal{I}$. The
length of the codes is the same for both $\mathcal{I}$ and $\mathcal{I}_{ext}$,
and if the index code for $\mathcal{I}$ has optimal length then so does the
extended code for $\mathcal{I}_{ext}$. More generally, an extended IC problem
of $\mathcal{I}$ having the same optimal length as $\mathcal{I}$ is said to be
a rank-invariant extension of $\mathcal{I}$. We then focus on $2$-order
rank-invariant extensions of $\mathcal{I}$, and present constructions of such
extensions based on involutory permutation matrices.
P. K. Deekshith, K. R. Sahasranand
Comments: 6 pages, 6 figures
Subjects: Information Theory (cs.IT)
The inherent nature of polar codes being channel specific makes it difficult
to use them in a setting where the communication channel changes with time. In
particular, to be able to use polar codes in a wireless scenario, varying
attenuation due to fading needs to be mitigated. To the best of our knowledge,
there has been no comprehensive work in this direction thus far. In this work,
a practical scheme involving channel inversion with the knowledge of the
channel state at the transmitter, is proposed. An additional practical
constraint on the permissible average and peak power is imposed, which in turn
makes the channel equivalent to an additive white Gaussian noise (AWGN) channel
cascaded with an erasure channel. It is shown that the constructed polar code
could be made to achieve the symmetric capacity of this channel. Further, a
means to compute the optimal design rate of the polar code for a given power
constraint is also discussed.
Wenqian Shen, Linglong Dai, Byonghyo Shim, Zhaocheng Wang, Robert W. Heath Jr
Comments: 30 pages, 9 figures
Subjects: Information Theory (cs.IT)
Channel feedback is essential in frequency division duplexing (FDD) massive
multiple-input multiple-output (MIMO) systems. Unfortunately, previous work on
multiuser MIMO has shown that the codebook size for channel feedback should
scale exponentially with the number of base station (BS) antennas, which is
greatly increased in massive MIMO systems. To reduce the codebook size and
feedback overhead, we propose an angle-of-departure (AoD)-adaptive subspace
codebook for channel feedback in FDD massive MIMO systems. Our key insight is
to leverage the observation that path AoDs vary more slowly than the path
gains. Within the angle coherence time, by utilizing the constant AoD
information, the proposed AoD-adaptive subspace codebook is able to quantize
the channel vector in a more accurate way. We also provide performance analysis
of the proposed codebook in the large-dimensional regime, where we prove that
to limit the capacity degradation within an acceptable level, the required
number of feedback bits only scales linearly with the number of resolvable
(path) AoDs, which is much smaller than the number of BS antennas. Moreover, we
compare quantized channel feedback using the proposed AoD-adaptive subspace
codebook with analog channel feedback. Extensive simulations that verify the
analytical results are provided.
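To make the idea concrete, a toy sketch under common modeling assumptions (uniform linear array, channel lying in the span of a few steering vectors at the resolvable AoDs): within the angle coherence time the codebook only needs to quantize the low-dimensional gain vector in that subspace, for example with a random vector codebook of $2^B$ entries, rather than the full antenna-dimensional channel. All names and sizes here are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, B = 64, 4, 8                # BS antennas, resolvable AoDs, feedback bits

def steering(theta, M):
    return np.exp(1j * np.pi * np.arange(M) * np.sin(theta)) / np.sqrt(M)

aods = rng.uniform(-np.pi / 3, np.pi / 3, P)          # slowly varying AoDs
A = np.stack([steering(t, M) for t in aods], axis=1)  # (M, P) subspace basis
g = (rng.standard_normal(P) + 1j * rng.standard_normal(P)) / np.sqrt(2)
h = A @ g                                             # true channel

# P-dimensional random vector codebook (shared by BS and user)
C = rng.standard_normal((2**B, P)) + 1j * rng.standard_normal((2**B, P))
C /= np.linalg.norm(C, axis=1, keepdims=True)

# user quantizes only the direction of the gain vector, feeding back B bits
idx = np.argmax(np.abs(C.conj() @ g))
h_hat = A @ (C[idx] * np.linalg.norm(g))

corr = abs(h_hat.conj() @ h) / (np.linalg.norm(h_hat) * np.linalg.norm(h))
print(f"quantization accuracy (normalized correlation): {corr:.3f}")
```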
Muhammad Hanif, Masoud Ardakani
Subjects: Information Theory (cs.IT)
This work is on fast encoding and decoding of polar codes. We propose and
detail 8-bit and 16-bit parallel decoders that can be used to reduce the
decoding latency of the successive-cancellation decoder. These decoders are
universal and can decode flexible-rate and flexible-length polar codes. We also
present fast encoders that can be used to increase the throughput of
serially-implemented polar encoders.
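For reference, the core transform that such encoders parallelize is the $F^{\otimes n}$ butterfly below: a textbook sketch in natural bit order, ignoring frozen-bit selection, and not the authors' fast implementation.

```python
def polar_transform(u):
    """Polar (Arikan) transform x = u F^{\otimes n} over GF(2).
    len(u) must be a power of two; bits are 0/1 integers."""
    x = list(u)
    n = len(x)
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]   # butterfly: (a, b) -> (a xor b, b)
        step *= 2
    return x

print(polar_transform([1, 0, 1, 1]))  # -> [1, 1, 0, 1]
```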
Yeqing Hu, Boon Loong Ng, Young-Han Nam, Jin Yuan, Gary Xu, Ji-Yun Seol, Jianzhong (Charlie) Zhang
Subjects: Information Theory (cs.IT)
This paper presents the next evolution of FD-MIMO technology for beyond 5G,
where antennas of the FD-MIMO system are placed in a distributed manner
throughout the cell in a multi-cell deployment scenario. This system, referred
to as Distributed FD-MIMO (D-FD-MIMO) system, is capable of providing higher
cell average throughput as well as more uniform user experience compared to the
conventional FD-MIMO system. System level simulations are performed to evaluate
performance. Our results show that the proposed D-FD-MIMO system achieves 1.4-2
times cell average throughput gain compared to the FD-MIMO system. The insights
of performance gain are provided. Hardware implementation challenges and
potential standards impact are also presented.
Jose Flordelis, Fredrik Rusek, Fredrik Tufvesson, Erik G. Larsson, Ove Edfors
Comments: Submitted to IEEE Transactions on Wireless Communications, 31/Mar/2017
Subjects: Information Theory (cs.IT)
Downlink beamforming in Massive MIMO either relies on uplink pilot
measurements – exploiting reciprocity and TDD operation, or on the use of a
predetermined grid of beams with user equipments reporting their preferred
beams, mostly in FDD operation. Massive MIMO in its originally conceived form
uses the first strategy, with uplink pilots, whereas there is currently
significant commercial interest in the second, grid-of-beams. It has been
analytically shown that in isotropic scattering (independent Rayleigh fading)
the first approach outperforms the second. Nevertheless there remains
controversy regarding their relative performance in practice. In this
contribution, the performances of these two strategies are compared using
measured channel data at 2.6 GHz.
Jaein Kim, Seok-Hwan Park, Osvaldo Simeone, Inkyu Lee, Shlomo Shamai (Shitz)
Subjects: Information Theory (cs.IT)
In millimeter-wave communication systems with large-scale antenna arrays,
conventional digital beamforming may not be cost-effective. A promising
solution is the implementation of hybrid beamforming techniques, which consist
of low-dimensional digital beamforming followed by analog radio frequency (RF)
beamforming. This work studies the optimization of hybrid beamforming in the
context of a cloud radio access network (C-RAN) architecture. In a C-RAN
system, digital baseband signal processing functionalities are migrated from
remote radio heads (RRHs) to a baseband processing unit (BBU) in the “cloud” by
means of finite-capacity fronthaul links. Specifically, this work tackles the
problem of jointly optimizing digital beamforming and fronthaul quantization
strategies at the BBU, as well as RF beamforming at the RRHs, with the goal of
maximizing the weighted downlink sum-rate. Fronthaul capacity and per-RRH power
constraints are enforced along with constant modulus constraints on the RF
beamforming matrices. An iterative algorithm is proposed that is based on
successive convex approximation and on the relaxation of the constant modulus
constraint. The effectiveness of the proposed scheme is validated by numerical
simulation results.
Laxminarayana S Pillutla, Ramesh Annavajjala
Subjects: Information Theory (cs.IT)
In this report we present an analysis of the non-random and the Bayesian
Cramér-Rao lower bound (CRLB) for the joint estimation of angle-of-arrival
(AoA), angle-of-departure (AoD), and the multipath amplitudes, for the
millimeter-wave (mmWave) wireless networks. Our analysis is applicable to
multipath channels with Gaussian noise and independent path parameters.
Numerical results based on uniform AoA and AoD in $[0,\pi)$, and Rician fading
path amplitudes, reveal that the Bayesian CRLB decreases monotonically with an
increase in the Rice factor. Further, the CRLB obtained by using beamforming
and combining code books generated by quantizing directly the domain of AoA and
AoD was found to be lower than those obtained with other types of beamforming
and combining code books.
Zhifang Zhang, Jingke Xu
Subjects: Information Theory (cs.IT)
Suppose there are $N$ distributed databases, each storing a full set of $M$
independent files. A user wants to retrieve $r$ out of the $M$ files without
revealing the identity of the $r$ files. When $r=1$ this is the classic problem
of private information retrieval (PIR). In this paper we study the problem of
private multi-file retrieval (PMFR), which covers the case of general $r$. We
first prove an upper bound on the capacity of PMFR schemes, which indicates the
minimum possible download size per unit of retrieved files. Then we design a
general PMFR scheme which attains the upper bound when $r \geq \frac{M}{2}$,
thus achieving the optimal communication cost. As $r$ decreases, we show that
the trivial approach of executing $r$ independent PIR instances achieves
near-optimal communication cost. Compared with capacity-achieving PIR schemes,
our PMFR scheme reduces the number of subpackages needed for each file from
$N^M$ to $N^2$, implying a great reduction in implementation complexity.
Kwabena Doku-Amponsah
Comments: 6 pages. arXiv admin note: text overlap with arXiv:1609.05252
Subjects: Information Theory (cs.IT)
This article extends the Generalized Asymptotic Equipartition Property of
Networked Data Structures to cover the Wireless Sensor Network modelled as a
coloured geometric random graph (CGRG). The main technique used to prove this
result remains large deviation principles for properly defined empirical
measures on coloured geometric random graphs. Application of the result to a
case study from the field of environmental science is discussed as motivation.
Jinke Ren, Guanding Yu, Yunlong Cai, Yinghui He
Comments: Submitted for Journal Publication
Subjects: Information Theory (cs.IT)
By offloading intensive computation tasks to the edge cloud located at the
cellular base stations, mobile-edge computation offloading (MECO) has been
regarded as a promising means to accomplish the ambitious millisecond-scale
end-to-end latency requirement of the fifth-generation networks. In this paper,
we investigate the latency-minimization problem in a multi-user time-division
multiple access MECO system with joint communication and computation resource
allocation. Three different computation models are studied, i.e., local
compression, edge cloud compression, and partial compression offloading. First,
closed-form expressions of optimal resource allocation and minimum system delay
for both local and edge cloud compression models are derived. Then, for the
partial compression offloading model, we formulate a piecewise optimization
problem and prove that the optimal data segmentation strategy has a piecewise
structure. Based on this result, an optimal joint communication and computation
resource allocation algorithm is developed. To gain more insights, we also
analyze a specific scenario where communication resource is adequate while
computation resource is limited. In this special case, the closed-form solution
of the piecewise optimization problem can be derived. Our proposed algorithms
are finally verified by numerical results, which show that the novel partial
compression offloading model can significantly reduce the end-to-end latency.
Jie Xu, Hang Wu, Lixing Chen, Cong Shen
Subjects: Information Theory (cs.IT)
Mobile Edge Computing (MEC) (a.k.a. fog computing) has recently emerged to
enable low-latency and location-aware data processing at the edge of mobile
networks. Providing grid power supply in support of MEC, however, is costly and
even infeasible, thus mandating on-site renewable energy as a major or even
sole power supply in many scenarios. Nonetheless, the high intermittency and
unpredictability of energy harvesting creates many new challenges of performing
effective MEC. In this paper, we develop an algorithm called GLOBE that
performs joint geographical load balancing (GLB) (for computation workload) and
admission control (for communication data traffic), for optimizing the system
performance of a network of MEC-enabled base stations. By leveraging the
Lyapunov optimization with perturbation technique, GLOBE operates online
without requiring future system information and addresses significant
challenges caused by battery state dynamics and energy causality constraints.
We prove that GLOBE achieves a close-to-optimal system performance compared to
the offline algorithm that knows full future information, and present a
critical tradeoff between battery capacity and system performance. Simulation
results validate our analysis and demonstrate the superior performance of GLOBE
compared to benchmark algorithms.
Emiliano Diaz
Subjects: Applications (stat.AP); Information Theory (cs.IT)
Sparse feature selection is necessary when we fit statistical models and have
access to a large group of features, do not know which are relevant, but assume
that most are not. Alternatively, when the number of features is larger than
the available data, the model becomes overparametrized and the sparse feature
selection task involves selecting the most informative variables for the model.
When the model is a simple location model and the number of relevant features
does not grow with the total number of features, sparse feature selection
corresponds to sparse mean estimation. We deal with a simplified mean
estimation problem consisting of an additive model with Gaussian noise and a
mean that lies in a restricted, finite hypothesis space. This restriction turns
the mean estimation problem into a selection problem of combinatorial nature.
Although the hypothesis space is finite, its size is exponential in the
dimension of the mean. In limited data settings and when the size of the
hypothesis space depends on the amount of data or on the dimension of the data,
choosing an approximation set of hypotheses is a desirable approach. Choosing a
set of hypotheses instead of a single one implies replacing the bias-variance
trade-off with a resolution-stability trade-off. Generalization capacity
provides a resolution selection criterion based on allowing the learning
algorithm to communicate the largest amount of information in the data to the
learner without error. In this work the theory of approximation set coding and
generalization capacity is explored in order to understand this approach. We
then apply the generalization capacity criterion to the simplified sparse mean
estimation problem and detail an importance sampling algorithm which at once
solves the difficulty posed by large hypothesis spaces and the slow convergence
of uniform sampling algorithms.
Amir Ban
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
Proper scoring rules elicit truth-telling when making predictions, or
otherwise revealing information. However, when multiple predictions are made of
the same event, telling the truth is in general no longer optimal, as agents
are motivated to distort early predictions to mislead competitors. We
demonstrate this, and then prove a significant exception: In a multi-agent
prediction setting where all agent signals belong to a jointly multivariate
normal distribution, and signal variances are common knowledge, the (proper)
logarithmic scoring rule will elicit truthful predictions from every agent at
every prediction, regardless of the number, order and timing of predictions.
The result applies in several financial models.
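A quick numerical check of properness for the logarithmic scoring rule, as a standalone illustration rather than the paper's multi-agent model: the expected score $\sum_i p_i \log q_i$ is maximized by the truthful report $q = p$.

```python
import numpy as np

def expected_log_score(p, q):
    """Expected logarithmic score of report q when outcomes follow p."""
    return float(np.sum(p * np.log(q)))

p = np.array([0.6, 0.3, 0.1])          # agent's true belief
best = max(((q1, q2, 1 - q1 - q2)
            for q1 in np.arange(0.01, 0.99, 0.01)
            for q2 in np.arange(0.01, 0.99 - q1, 0.01)),
           key=lambda q: expected_log_score(p, np.array(q)))
print(np.round(best, 2))               # ~ (0.6, 0.3, 0.1): truth-telling wins
```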
Mohamad Assaad, Salah Eddine Hajri, Thomas Bonald, Anthony Ephremides
Comments: conference paper, submitted
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
This paper considers the problem of power control in Massive MIMO systems
taking into account the pilot contamination issue and the arrivals and
departures of users in the network. Contrary to most existing work on MIMO
systems, which focuses on the physical layer with a fixed number of users, we
consider in this work users that arrive dynamically and leave the network
once they are served. We provide a power control strategy with polynomial
complexity, and prove that this policy stabilizes the network whenever
possible. We then provide a distributed implementation of the power control
policy requiring low information exchange between the BSs and show that it
achieves the same stability region as the centralized policy.
Santosh Nannuru, Kay L. Gemba, Peter Gerstoft, William S. Hodgkiss, Christoph Mecklenbräuker
Comments: 12 pages, 8 figures
Subjects: Applications (stat.AP); Information Theory (cs.IT)
Sparse Bayesian learning (SBL) has emerged as a fast and competitive method
to perform sparse processing. The SBL algorithm, which is developed using a
Bayesian framework, approximately solves a non-convex optimization problem
using fixed point updates. It provides comparable performance and is
significantly faster than convex optimization techniques used in sparse
processing. In this paper we propose a signal model which accounts for
dictionary mismatch and the presence of spurious peaks at low signal-to-noise
ratios. As an extension of SBL, a modified fixed-point update equation is
derived which incorporates the statistics of mismatch and spurious peaks. We
also extend the
SBL algorithm to process multi-frequency observations. The derived update
equations are studied quantitatively using beamforming simulations applied to
direction-of-arrival (DoA) estimation. Performance of SBL using single
frequency and multi-frequency observations, and in the presence of aliasing is
evaluated. Data collected from the SwellEx-96 experiment is used to demonstrate
qualitatively the advantages of SBL.
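For orientation, a sketch of the textbook SBL fixed-point loop that such work builds on: standard EM-style updates of the hyperparameters, not the mismatch-aware update derived in the paper. The dictionary, noise level, and sparsity pattern are illustrative.

```python
import numpy as np

def sbl(A, y, noise_var, iters=200):
    """Textbook sparse Bayesian learning via EM fixed-point updates.
    Model: y = A x + n, with x ~ N(0, diag(gamma)), n ~ N(0, noise_var I).
    Large gamma_i marks an active dictionary column."""
    m, n = A.shape
    gamma = np.ones(n)
    for _ in range(iters):
        Sigma_y = noise_var * np.eye(m) + (A * gamma) @ A.T
        K = (A * gamma).T @ np.linalg.inv(Sigma_y)   # = Gamma A^T Sigma_y^{-1}
        mu = K @ y                                   # posterior mean of x
        post_var = gamma - np.sum(K * (A * gamma).T, axis=1)  # diag posterior cov
        gamma = mu**2 + post_var                     # EM fixed-point update
    return mu, gamma

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x = np.zeros(50); x[[5, 17]] = [2.0, -1.5]
mu, gamma = sbl(A, A @ x + 0.01 * rng.standard_normal(20), noise_var=1e-4)
print(np.argsort(gamma)[-2:])   # expected to recover the active indices {5, 17}
```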
Ming Ding, David Lopez Perez, Guoqiang Mao
Comments: conference submission in Mar. 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
We discover a new capacity scaling law in ultra-dense networks (UDNs) under
practical system assumptions, such as a general multi-piece path loss model, a
non-zero base station (BS) to user equipment (UE) antenna height difference,
and a finite UE density. The intuition and implications of this new capacity
scaling law are completely different from those of the law found in 2011. That law
indicated that the increase of the interference power caused by a denser
network would be exactly compensated by the increase of the signal power due to
the reduced distance between transmitters and receivers, and thus network
capacity should grow linearly with network densification. However, we find that
both the signal and interference powers become bounded in practical UDNs, which
leads to a constant capacity scaling law. As a result, network densification
should be stopped at a certain level for a given UE density, because the
network capacity will reach its limit due to (i) the bounded signal and
interference powers, and (ii) a finite frequency reuse factor because of a
finite UE density. Our new discovery on the constant capacity scaling law also
resolves the recent concerns about network capacity collapsing in UDNs, e.g.,
the capacity crash due to a non-zero BS-to-UE antenna height difference, or a
bounded path loss model of the near-field (NF) effect, etc.
U.N. Niranjan, Arun Rajkumar, Theja Tulabandhula
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
The robust PCA problem, wherein, given an input data matrix that is the
superposition of a low-rank matrix and a sparse matrix, we aim to separate out
the low-rank and sparse components, is a well-studied problem in machine
learning. A natural question that arises is: if, as in the inductive
setting, features are provided as input as well, can we hope to do better?
Answering this in the affirmative, the main goal of this paper is to study the
robust PCA problem while incorporating feature information. In contrast to
previous works in which recovery guarantees are based on the convex relaxation
of the problem, we propose a simple iterative algorithm based on
hard-thresholding of appropriate residuals. Under weaker assumptions than
previous works, we prove the global convergence of our iterative procedure;
moreover, it admits a much faster convergence rate and lower computational
complexity per iteration. In practice, through systematic synthetic and real
data simulations, we confirm our theoretical findings regarding improvements
obtained by using feature information.
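A minimal sketch of the hard-thresholding flavor of robust PCA, without the feature-information extension that is the paper's actual contribution: alternately estimate the sparse part by entrywise thresholding of the residual and the low-rank part by a truncated SVD. The rank, fixed threshold, and iteration count are illustrative simplifications.

```python
import numpy as np

def robust_pca_ht(Mmat, rank, thresh, iters=30):
    """Alternating projections: L via truncated SVD, S via hard thresholding."""
    L = np.zeros_like(Mmat)
    for _ in range(iters):
        R = Mmat - L
        S = np.where(np.abs(R) > thresh, R, 0.0)   # sparse: keep large residuals
        U, s, Vt = np.linalg.svd(Mmat - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank: top-r SVD
    return L, S

rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
S0 = np.where(rng.random((50, 50)) < 0.05, 10.0, 0.0)  # 5% gross corruptions
L, S = robust_pca_ht(L0 + S0, rank=5, thresh=5.0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))     # small recovery error
```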
Rostislav Matveev, Jacobus W. Portegies
Subjects: Metric Geometry (math.MG); Information Theory (cs.IT); Probability (math.PR)
The entropy of a finite probability space $X$ measures the observable
cardinality of large independent products $X^{\otimes n}$ of the probability
space. If two probability spaces $X$ and $Y$ have the same entropy, there is an
almost measure-preserving bijection between large parts of $X^{\otimes n}$ and
$Y^{\otimes n}$. In this way, $X$ and $Y$ are asymptotically equivalent.
It turns out to be challenging to generalize this notion of asymptotic
equivalence to configurations of probability spaces, which are collections of
probability spaces with measure-preserving maps between some of them.
In this article we introduce the intrinsic Kolmogorov-Sinai distance on the
space of configurations of probability spaces. Concentrating on the large-scale
geometry we pass to the asymptotic Kolmogorov-Sinai distance. It induces an
asymptotic equivalence relation on sequences of configurations of probability
spaces. We will call the equivalence classes tropical probability spaces.
In this context we prove an Asymptotic Equipartition Property for
configurations. It states that tropical configurations can always be
approximated by homogeneous configurations. In addition, we show that the
solutions to certain Information-Optimization problems are
Lipschitz-continuous with respect to the asymptotic Kolmogorov-Sinai
distance. It follows from these two statements that in order to solve an
Information-Optimization problem, it suffices to consider homogeneous
configurations.
Finally, we show that spaces of trajectories of length $n$ of certain
stochastic processes, in particular stationary Markov chains, have a tropical
limit.
Renbo Zhao, William B. Haskell, Vincent Y. F. Tan
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Machine Learning (stat.ML)
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By
proposing a new framework for analyzing convergence, we theoretically improve
the (linear) convergence rates and computational complexities of the stochastic
L-BFGS algorithms in previous works. In addition, we propose several practical
acceleration strategies to speed up the empirical performance of such
algorithms. We also provide theoretical analyses for most of the strategies.
Experiments on large-scale logistic and ridge regression problems demonstrate
that our proposed strategies yield significant improvements vis-à-vis
competing state-of-the-art algorithms.
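For context, the core of any (stochastic) L-BFGS method is the two-loop recursion below, which applies the inverse-Hessian approximation implied by the last few curvature pairs $(s_i, y_i)$ to a (mini-batch) gradient. This is a textbook sketch, not the accelerated variants proposed in the paper.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """L-BFGS two-loop recursion: returns H_k @ grad, where H_k is the
    implicit inverse-Hessian approximation built from curvature pairs
    s_i = x_{i+1} - x_i and y_i = grad_{i+1} - grad_i (oldest first)."""
    q = grad.copy()
    alphas = []
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # initial scaling H_0 = (s'y / y'y) I using the most recent pair
    s, y = s_list[-1], y_list[-1]
    r = (s @ y) / (y @ y) * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r   # the descent direction is -r

# toy usage on f(x) = 0.5 x' D x with D = diag(1..5)
D = np.diag(np.arange(1.0, 6.0))
xs = [np.ones(5), 0.5 * np.ones(5)]
gs = [D @ x for x in xs]
direction = two_loop_recursion(gs[-1], [xs[1] - xs[0]], [gs[1] - gs[0]])
```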