Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, Koray Kavukcuoglu
Subjects: Neural and Evolutionary Computing (cs.NE)
We introduce a method for automatically selecting the path, or syllabus, that
a neural network follows through a curriculum so as to maximise learning
efficiency. A measure of the amount that the network learns from each data
sample is provided as a reward signal to a nonstationary multi-armed bandit
algorithm, which then determines a stochastic syllabus. We consider a range of
signals derived from two distinct indicators of learning progress: rate of
increase in prediction accuracy, and rate of increase in network complexity.
Experimental results for LSTM networks on three curricula demonstrate that our
approach can significantly accelerate learning, in some cases halving the time
required to attain a satisfactory performance level.
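As a rough illustration of the idea above, the sketch below uses an Exp3-style bandit to choose which curriculum task to train on next, with a learning-progress signal fed back as the reward. The abstract does not name the bandit algorithm or the reward scaling, so the class name `Exp3Syllabus`, the parameters `eta` and `eps`, and the toy reward in the usage example are all assumptions made for illustration.

```python
import math
import random


class Exp3Syllabus:
    """Exp3-style bandit over curriculum tasks (illustrative sketch only)."""

    def __init__(self, n_tasks, eta=0.1, eps=0.05, seed=0):
        self.n = n_tasks
        self.eta = eta  # step size for the weight update (assumed value)
        self.eps = eps  # uniform exploration mixed into the policy (assumed value)
        self.log_w = [0.0] * n_tasks
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights, mixed with a uniform distribution so that
        # every task keeps being sampled even if its reward was low so far.
        m = max(self.log_w)
        exp_w = [math.exp(w - m) for w in self.log_w]
        total = sum(exp_w)
        return [(1 - self.eps) * e / total + self.eps / self.n for e in exp_w]

    def sample_task(self):
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, task, reward):
        # Importance-weighted reward estimate: only the sampled task is updated.
        p = self.probabilities()[task]
        self.log_w[task] += self.eta * reward / p


if __name__ == "__main__":
    bandit = Exp3Syllabus(n_tasks=3)
    for _ in range(200):
        task = bandit.sample_task()
        # Toy stand-in for a learning-progress signal: only task 2 is "useful".
        bandit.update(task, 1.0 if task == 2 else 0.0)
    print(bandit.probabilities())
```

The uniform mixing term is what lets such a policy react when the most useful task changes during training; a full nonstationary variant would also share weight between arms.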
Asit Mishra, Jeffrey J Cook, Eriko Nurvitadhi, Debbie Marr
Comments: Under submission to CVPR Workshop
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
For computer vision applications, prior works have shown the efficacy of
reducing the numeric precision of model parameters (network weights) in deep
neural networks but also that reducing the precision of activations hurts model
accuracy much more than reducing the precision of model parameters. We study
schemes to train networks from scratch using reduced-precision activations
without hurting the model accuracy. We reduce the precision of activation maps
(along with model parameters) using a novel quantization scheme and increase
the number of filter maps in a layer, and find that this scheme matches or
surpasses the accuracy of the baseline full-precision network. As a result, one
can significantly reduce the dynamic memory footprint, memory bandwidth,
computational energy and speed up the training and inference process with
appropriate hardware support. We call our scheme WRPN – wide reduced-precision
networks. We report results using our proposed schemes and show that our
results are better than previously reported accuracies on ILSVRC-12 dataset
while being computationally less expensive compared to previously reported
reduced-precision networks.
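To make the trade-off concrete, here is a minimal sketch of reduced-precision activations combined with wider layers. The abstract does not spell out the quantization scheme, so plain uniform k-bit quantization of activations clipped to [0, 1] is assumed here; the function names and the 32-bit baseline are likewise illustrative.

```python
def quantize_activation(x, k):
    """Clip x to [0, 1] and round it to one of 2**k evenly spaced levels."""
    levels = (1 << k) - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def relative_activation_memory(bits, widen, base_bits=32):
    """Activation footprint relative to a full-precision baseline.

    `widen` is the factor by which the number of filter maps is increased.
    """
    return bits * widen / base_bits


if __name__ == "__main__":
    print(quantize_activation(0.4, 2))
    print(relative_activation_memory(bits=4, widen=2))
```

Under these assumptions, doubling the number of filter maps while storing activations in 4 bits still needs only a quarter of the activation memory of the 32-bit baseline.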
Carlos Florensa, Yan Duan, Pieter Abbeel
Comments: Published as a conference paper at ICLR 2017
Journal-ref: International Conference on Learning Representations 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Deep reinforcement learning has achieved many impressive results in recent
years. However, tasks with sparse rewards or long horizons continue to pose
significant challenges. To tackle these important problems, we propose a
general framework that first learns useful skills in a pre-training
environment, and then leverages the acquired skills for learning faster in
downstream tasks. Our approach brings together some of the strengths of
intrinsic motivation and hierarchical methods: the learning of useful skills is
guided by a single proxy reward, the design of which requires very minimal
domain knowledge about the downstream tasks. Then a high-level policy is
trained on top of these skills, providing a significant improvement of the
exploration and making it possible to tackle sparse rewards in the downstream tasks. To
efficiently pre-train a large span of skills, we use Stochastic Neural Networks
combined with an information-theoretic regularizer. Our experiments show that
this combination is effective in learning a wide span of interpretable skills
in a sample-efficient way, and can significantly boost the learning performance
uniformly across a wide range of downstream tasks.
Yu-Wei Chao, Jimei Yang, Brian Price, Scott Cohen, Jia Deng
Comments: Accepted in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents the first study on forecasting human dynamics from static
images. The problem is to input a single RGB image and generate a sequence of
upcoming human body poses in 3D. To address the problem, we propose the 3D Pose
Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances on
single-image human pose estimation and sequence prediction, and converts the 2D
predictions into 3D space. We train our 3D-PFNet using a three-step training
strategy to leverage a diverse source of training data, including image and
video based human pose datasets and 3D motion capture (MoCap) data. We
demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and
3D pose recovery through quantitative and qualitative results.
Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta
Comments: CVPR 2017 Camera Ready
Subjects: Computer Vision and Pattern Recognition (cs.CV)
How do we learn an object detector that is invariant to occlusions and
deformations? Our current solution is to use a data-driven strategy — collect
large-scale datasets which have object instances under different conditions.
The hope is that the final classifier can use these examples to learn
invariances. But is it really possible to see all the occlusions in a dataset?
We argue that like categories, occlusions and object deformations also follow a
long-tail. Some occlusions and deformations are so rare that they hardly
happen; yet we want to learn a model invariant to such occurrences. In this
paper, we propose an alternative solution. We propose to learn an adversarial
network that generates examples with occlusions and deformations. The goal of
the adversary is to generate examples that are difficult for the object
detector to classify. In our framework both the original detector and adversary
are learned in a joint manner. Our experimental results indicate a 2.3% mAP
boost on VOC07 and a 2.6% mAP boost on VOC2012 object detection challenge
compared to the Fast-RCNN pipeline. We also release the code for this paper.
Pim Moeskops, Jelmer M. Wolterink, Bas H.M. van der Velden, Kenneth G.A. Gilhuijs, Tim Leiner, Max A. Viergever, Ivana Išgum
Journal-ref: Moeskops, P., Wolterink, J.M., van der Velden, B.H.M., Gilhuijs,
K.G.A., Leiner, T., Viergever, M.A., Išgum, I. Deep learning for
multi-task medical image segmentation in multiple modalities. In: MICCAI
2016, pp. 478-486
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation of medical images is an important task for many
clinical applications. In practice, a wide range of anatomical structures are
visualised using different imaging modalities. In this paper, we investigate
whether a single convolutional neural network (CNN) can be trained to perform
different segmentation tasks.
A single CNN is trained to segment six tissues in MR brain images, the
pectoral muscle in MR breast images, and the coronary arteries in cardiac CTA.
The CNN therefore learns to identify the imaging modality, the visualised
anatomical structures, and the tissue classes.
For each of the three tasks (brain MRI, breast MRI and cardiac CTA), this
combined training procedure resulted in a segmentation performance equivalent
to that of a CNN trained specifically for that task, demonstrating the high
capacity of CNN architectures. Hence, a single system could be used in clinical
practice to automatically perform diverse segmentation tasks without
task-specific training.
Mieczysław A. Kłopotek
Journal-ref: Preliminary version of the paper M.A. Kłopotek: Reconstruction
of 3-D rigid smooth curves moving free when two traceable points only are
available. Machine Graphics & Vision 1(1992)1-2, pp. 392-405
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
This paper extends previous research in the sense that, for orthogonal
projections of rigid smooth (true-3D) curves moving totally free, it reduces the
number of required traceable points to only two (the best results known so far
to the author are 3 points for free motion, 2 for motion restricted to
rotation around a fixed direction, and 2 for motion restricted to the influence
of a homogeneous force field). The method exploits information
on tangential projections. The paper also discusses the possibility of simplifying the
reconstruction of flat curves moving free for prospective projections.
Yu Liu, Junjie Yan, Wanli Ouyang
Comments: Accepted at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This paper targets the problem of set-to-set recognition, which learns the
metric between two image sets. Images in each set belong to the same identity.
Since images in a set can be complementary, they hopefully lead to higher
accuracy in practical applications. However, the quality of each sample cannot
be guaranteed, and samples with poor quality will hurt the metric. In this
paper, the quality aware network (QAN) is proposed to confront this problem,
where the quality of each sample can be automatically learned although such
information is not explicitly provided in the training stage. The network has
two branches, where the first branch extracts appearance feature embedding for
each sample and the other branch predicts quality score for each sample.
Features and quality scores of all samples in a set are then aggregated to
generate the final feature embedding. We show that the two branches can be
trained in an end-to-end manner given only the set-level identity annotation.
Analysis on gradient spread of this mechanism indicates that the quality
learned by the network is beneficial to set-to-set recognition and simplifies
the distribution that the network needs to fit. Experiments on both face
verification and person re-identification show advantages of the proposed QAN.
The source code and network structure can be downloaded at
this https URL
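The aggregation step described above can be sketched as a quality-weighted average of per-sample features. The abstract only says that features and quality scores are "aggregated", so the normalised weighted mean used here, and the name `aggregate_set`, are assumptions rather than the authors' exact formulation.

```python
def aggregate_set(features, qualities):
    """Combine per-sample feature vectors into one set-level embedding.

    features:  list of equal-length feature vectors, one per image in the set
    qualities: list of non-negative quality scores, one per image
    """
    total = sum(qualities)
    if total <= 0:
        raise ValueError("at least one sample needs a positive quality score")
    dim = len(features[0])
    return [
        sum(q * f[d] for f, q in zip(features, qualities)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    quality = [0.9, 0.1, 0.0]  # the third sample is treated as unusable
    print(aggregate_set(feats, quality))
```

With this form, a sample whose quality score is zero contributes nothing to the set embedding, which is how low-quality images are kept from hurting the metric.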
Ruth Fong, Andrea Vedaldi
Comments: 9 pages, 10 figures, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
As machine learning algorithms are increasingly applied to high impact yet
high risk tasks, e.g. problems in health, it is critical that researchers can
explain how such algorithms arrived at their predictions. In recent years, a
number of image saliency methods have been developed to summarize where highly
complex neural networks “look” in an image for evidence for their predictions.
However, these techniques are limited by their heuristic nature and
architectural constraints.
In this paper, we make two main contributions: First, we propose a general
framework for learning different kinds of explanations for any black box
algorithm. Second, we introduce a paradigm that learns the minimally salient
part of an image by directly editing it and learning from the corresponding
changes to its output. Unlike previous works, our method is model-agnostic and
testable because it is grounded in replicable image perturbations.
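For intuition about perturbation-based explanations of a black box, the sketch below measures how much a model's output drops when each input element is replaced by a baseline value. This is plain occlusion sensitivity, a simpler technique than the learned perturbations the abstract describes; the function name, the flat-list "image" and the toy model are illustrative assumptions.

```python
def perturbation_saliency(model, image, baseline=0.0):
    """Score each input element by the output drop when it is replaced.

    model: any callable mapping a list of values to a scalar score
    image: flat list of input values (a stand-in for image pixels)
    """
    base_score = model(image)
    saliency = []
    for i in range(len(image)):
        perturbed = list(image)
        perturbed[i] = baseline  # edit one element, leave the rest untouched
        saliency.append(base_score - model(perturbed))
    return saliency


if __name__ == "__main__":
    # Toy black box that only looks at elements 1 and 3.
    toy_model = lambda img: 2.0 * img[1] + 0.5 * img[3]
    print(perturbation_saliency(toy_model, [1.0, 1.0, 1.0, 1.0]))
```

Because the procedure only calls the model, it works for any black box, and each score can be checked by repeating the corresponding edit.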
Pim Moeskops, Max A. Viergever, Adriënne M. Mendrik, Linda S. de Vries, Manon J.N.L. Benders, Ivana Išgum
Journal-ref: IEEE Transactions on Medical Imaging, 35(5), 1252-1261 (2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation in MR brain images is important for quantitative
analysis in large-scale studies with images acquired at all ages.
This paper presents a method for the automatic segmentation of MR brain
images into a number of tissue classes using a convolutional neural network. To
ensure that the method obtains accurate segmentation details as well as spatial
consistency, the network uses multiple patch sizes and multiple convolution
kernel sizes to acquire multi-scale information about each voxel. The method is
not dependent on explicit features, but learns to recognise the information
that is important for the classification based on training data. The method
requires a single anatomical MR image only.
The segmentation method is applied to five different data sets: coronal
T2-weighted images of preterm infants acquired at 30 weeks postmenstrual age
(PMA) and 40 weeks PMA, axial T2-weighted images of preterm infants acquired
at 40 weeks PMA, axial T1-weighted images of ageing adults acquired at an
average age of 70 years, and T1-weighted images of young adults acquired at an
average age of 23 years. The method obtained the following average Dice
coefficients over all segmented tissue classes for each data set, respectively:
0.87, 0.82, 0.84, 0.86 and 0.91.
The results demonstrate that the method obtains accurate segmentations in all
five sets, and hence demonstrates its robustness to differences in age and
acquisition protocol.
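The Dice coefficient reported above is 2|A ∩ B| / (|A| + |B|) for a predicted region A and a reference region B. A minimal sketch of the per-class computation and the average over classes follows; the flat label lists and the function names are illustrative, and returning 1.0 when a class is absent from both segmentations is an assumed convention.

```python
def dice(pred, truth, label):
    """Dice overlap for one tissue class between two label sequences."""
    in_pred = [p == label for p in pred]
    in_truth = [t == label for t in truth]
    intersection = sum(1 for a, b in zip(in_pred, in_truth) if a and b)
    size = sum(in_pred) + sum(in_truth)
    # Assumed convention: a class absent from both segmentations scores 1.0.
    return 2.0 * intersection / size if size else 1.0


def mean_dice(pred, truth, labels):
    """Average Dice over the given tissue classes."""
    return sum(dice(pred, truth, c) for c in labels) / len(labels)


if __name__ == "__main__":
    predicted = [0, 1, 1, 2, 2, 2]
    reference = [0, 1, 2, 2, 2, 2]
    print(mean_dice(predicted, reference, labels=[0, 1, 2]))
```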
Tae Hyun Kim, Kyoung Mu Lee, Bernhard Schölkopf, Michael Hirsch
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art video deblurring methods are capable of removing non-uniform
blur caused by unwanted camera shake and/or object motion in dynamic scenes.
However, most existing methods are based on batch processing and thus need
access to all recorded frames, rendering them computationally demanding and
time consuming and thus limiting their practical use. In contrast, we propose
an online (sequential) video deblurring method based on a spatio-temporal
recurrent network that allows for real-time performance. In particular, we
introduce a novel architecture which extends the receptive field while keeping
the overall size of the network small to enable fast execution. In doing so,
our network is able to remove even large blur caused by strong camera shake
and/or fast moving objects. Furthermore, we propose a novel network layer that
enforces temporal consistency between consecutive frames by dynamic temporal
blending which compares and adaptively (at test time) shares features obtained
at different time steps. We show the superiority of the proposed method in an
extensive experimental evaluation.
Liyuan Pan, Yuchao Dai, Miaomiao Liu, Fatih Porikli
Comments: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Videos of outdoor scenes often show unpleasant blur effects due to the large
relative motion between the camera and the dynamic objects and large depth
variations. Existing works typically focus on monocular video deblurring. In this
paper, we propose a novel approach to deblurring from stereo videos. In
particular, we exploit the piece-wise planar assumption about the scene and
leverage the scene flow information to deblur the image. Unlike the existing
approach [31] which used a pre-computed scene flow, we propose a single
framework to jointly estimate the scene flow and deblur the image, where the
motion cues from scene flow estimation and blur information could reinforce
each other and produce results superior to those of conventional scene flow
estimation or stereo deblurring methods. We evaluate our method extensively on
two available datasets and achieve significant improvement in flow estimation
and removing the blur effect over the state-of-the-art methods.
Kai Zhang, Wangmeng Zuo, Shuhang Gu, Lei Zhang
Comments: Accepted to CVPR 2017. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Model-based optimization methods and discriminative learning methods have
been the two dominant strategies for solving various inverse problems in
low-level vision. Typically, those two kinds of methods have their respective
merits and drawbacks, e.g., model-based optimization methods are flexible for
handling different inverse problems but are usually time-consuming with
sophisticated priors for the purpose of good performance; in the meanwhile,
discriminative learning methods have fast testing speed but their application
range is greatly restricted by the specialized task. Recent works have revealed
that, with the aid of variable splitting techniques, denoiser prior can be
plugged in as a modular part of model-based optimization methods to solve other
inverse problems (e.g., deblurring). Such an integration induces considerable
advantage when the denoiser is obtained via discriminative learning. However,
the study of integration with fast discriminative denoiser prior is still
lacking. To this end, this paper aims to train a set of fast and effective CNN
(convolutional neural network) denoisers and integrate them into model-based
optimization method to solve other inverse problems. Experimental results
demonstrate that the learned set of denoisers not only achieve promising
Gaussian denoising results but also can be used as prior to deliver good
performance for various low-level vision applications.
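The plug-in idea can be sketched on a one-dimensional inpainting problem. Half-quadratic splitting is assumed here as the variable splitting technique, and a 3-tap moving average stands in for the learned CNN denoiser; both choices, along with the function names and the values of `mu` and `iters`, are illustrative assumptions rather than the paper's configuration.

```python
def box_denoise(x):
    """Stand-in denoiser: 3-tap moving average with replicated edges."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def hqs_inpaint(y, mask, mu=0.5, iters=50, denoise=box_denoise):
    """Fill entries where mask == 0 by alternating data and prior steps.

    y:    observed signal (values at masked-out positions are ignored)
    mask: 1 where y is observed, 0 where it is missing
    """
    z = list(y)
    x = list(y)
    for _ in range(iters):
        # Data step: per-entry closed-form minimiser of
        # (m*x - y)^2 + mu*(x - z)^2, valid for m in {0, 1}.
        x = [(m * yi + mu * zi) / (m + mu) for yi, m, zi in zip(y, mask, z)]
        # Prior step: the denoiser plays the role of the image prior.
        z = denoise(x)
    return x


if __name__ == "__main__":
    observed = [1.0, 1.0, 0.0, 1.0, 1.0]
    mask = [1, 1, 0, 1, 1]
    print(hqs_inpaint(observed, mask))
```

Only the data step depends on the inverse problem, so the same denoiser could be reused for a different degradation by swapping that one line.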
Lukas Mosser, Olivier Dubrule, Martin J. Blunt
Comments: 21 pages, 20 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Materials Science (cond-mat.mtrl-sci); Fluid Dynamics (physics.flu-dyn); Geophysics (physics.geo-ph)
To evaluate the variability of multi-phase flow properties of porous media at
the pore scale, it is necessary to acquire a number of representative samples
of the void-solid structure. While modern x-ray computer tomography has made it
possible to extract three-dimensional images of the pore space, assessment of
the variability in the inherent material properties is often experimentally not
feasible. We present a novel method to reconstruct the solid-void structure of
porous media by applying a generative neural network that allows an implicit
description of the probability distribution represented by three-dimensional
image datasets. We show, by using an adversarial learning approach for neural
networks, that this method of unsupervised learning is able to generate
representative samples of porous media that honor their statistics. We
successfully compare measures of pore morphology, such as the Euler
characteristic, two-point statistics and directional single-phase permeability
of synthetic realizations with the calculated properties of a bead pack, Berea
sandstone, and Ketton limestone. Results show that GANs can be used to
reconstruct high-resolution three-dimensional images of porous media at
different scales that are representative of the morphology of the images used
to train the neural network. The fully convolutional nature of the trained
neural network allows the generation of large samples while maintaining
computational efficiency. Compared to classical stochastic methods of image
reconstruction, the implicit representation of the learned data distribution
can be stored and reused to generate multiple realizations of the pore
structure very rapidly.
Yuanwei Li
Comments: This work was finished in August 2016, submitted to IEEE PAMI on August 17, 2016, and, after revision, submitted to IEEE TIP on April 9, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Initializing optical flow field by either sparse descriptor matching or dense
patch matches has been proved to be particularly useful for capturing large
displacements. In this paper, we present a pyramidal gradient matching approach
that can provide dense matches for highly accurate and efficient optical flow
estimation. A novel contribution of our method is that image gradient is used
to describe image patches and proved to be able to produce robust matching.
Therefore, our method is more efficient than methods that adopt special
features (like SIFT) or patch distance metric. Moreover, we find that image
gradient is scalable for optical flow estimation, which means we can use
different levels of gradient feature (for example, full gradients or only
direction information of gradients) to obtain different complexity without
dramatic changes in accuracy. Another contribution is that we uncover the
secrets of limited PatchMatch through a thorough analysis and design a
pyramidal matching framework based on these secrets. Our pyramidal matching
framework is aimed at robust gradient matching and effective to grow inliers
and reject outliers. In this framework, we present some special enhancements
for outlier filtering in gradient matching. By initializing EpicFlow with our
matches, experimental results show that our method is efficient and robust
(ranking 1st on both clean pass and final pass of MPI Sintel dataset among
published methods).
Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu
Comments: Published in CVPR 2017
Journal-ref: Quanshi Zhang, Ruiming Cao, Ying Nian Wu, and Song-Chun Zhu,
“Mining Object Parts from CNNs via Active Question-Answering” in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Given a convolutional neural network (CNN) that is pre-trained for object
classification, this paper proposes to use active question-answering to
semanticize neural patterns in conv-layers of the CNN and mine part concepts.
For each part concept, we mine neural patterns in the pre-trained CNN, which
are related to the target part, and use these patterns to construct an And-Or
graph (AOG) to represent a four-layer semantic hierarchy of the part. As an
interpretable model, the AOG associates different CNN units with different
explicit object parts. We use an active human-computer communication to
incrementally grow such an AOG on the pre-trained CNN as follows. We allow the
computer to actively identify objects, whose neural patterns cannot be
explained by the current AOG. Then, the computer asks a human about the
unexplained objects, and uses the answers to automatically discover certain CNN
patterns corresponding to the missing knowledge. We incrementally grow the AOG
to encode new knowledge discovered during the active-learning process. In
experiments, our method exhibits high learning efficiency. Our method uses
about 1/6-1/3 of the part annotations for training, but achieves similar or
better part-localization performance than fast-RCNN methods.
Vahid Kazemi, Ali Elqursh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a new baseline for the visual question answering task. Given
an image and a question in natural language, our model produces accurate
answers according to the content of the image. Our model, while being
architecturally simple and relatively small in terms of trainable parameters,
sets a new state of the art on both unbalanced and balanced VQA benchmarks. On
VQA 1.0 open ended challenge, our model achieves 64.6% accuracy on the
test-standard set without using additional data, an improvement of 0.4% over
state of the art, and on newly released VQA 2.0, our model scores 59.7% on
validation set outperforming best previously reported results by 4%. The
results presented in this paper are especially interesting because very similar
models have been tried before but significantly lower performance was
reported. In light of the new results we hope to see more meaningful research
on visual question answering in the future.
Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Previous approaches for scene text detection have already achieved promising
performances across various benchmarks. However, they usually fall short when
dealing with challenging scenarios, even when equipped with deep neural network
models, because the overall performance is determined by the interplay of
multiple stages and components in the pipelines. In this work, we propose a
simple yet powerful pipeline that yields fast and accurate text detection in
natural scenes. The pipeline directly predicts words or text lines of arbitrary
orientations and quadrilateral shapes in full images, eliminating unnecessary
intermediate steps (e.g., candidate aggregation and word partitioning), with a
single neural network. The simplicity of our pipeline allows concentrating
efforts on designing loss functions and neural network architecture.
Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500
demonstrate that the proposed algorithm significantly outperforms
state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR
2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps
at 720p resolution.
Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, Edgar A. Bernal, Jiebo Luo
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, Deep Learning has been successfully applied to multimodal
learning problems, with the aim of learning useful joint representations in
data fusion applications. When the available modalities consist of time series
data such as video, audio and sensor signals, it becomes imperative to consider
their temporal structure during the fusion process. In this paper, we propose
the Correlational Recurrent Neural Network (CorrRNN), a novel temporal fusion
model for fusing multiple input modalities that are inherently temporal in
nature. Key features of our proposed model include: (i) simultaneous learning
of the joint representation and temporal dependencies between modalities, (ii)
use of multiple loss terms in the objective function, including a maximum
correlation loss term to enhance learning of cross-modal information, and (iii)
the use of an attention model to dynamically adjust the contribution of
different input modalities to the joint representation. We validate our model
via experimentation on two different tasks: video- and sensor-based activity
classification, and audio-visual speech recognition. We empirically analyze the
contributions of different components of the proposed CorrRNN model, and
demonstrate its robustness, effectiveness and state-of-the-art performance on
multiple datasets.
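One way to picture a correlation-based loss term between two modalities is the negative Pearson correlation of their representations. The abstract does not give the exact form of the loss, so the scalar-sequence setting and the function names below are assumptions chosen to keep the sketch small.

```python
import math


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlation_loss(x, y):
    """Loss that decreases as the two representations become more correlated."""
    return -correlation(x, y)


if __name__ == "__main__":
    audio_repr = [0.1, 0.4, 0.9, 0.3]
    video_repr = [0.2, 0.5, 1.1, 0.2]
    print(correlation_loss(audio_repr, video_repr))
```

Minimising such a term alongside the other losses pushes the representations of the two modalities to vary together over time.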
Chun Pong Lau, Yu Hin Lai, Lok Ming Lui
Comments: 21 pages, 24 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Graphics (cs.GR)
We address the problem of restoring a high-quality image from an observed
image sequence strongly distorted by atmospheric turbulence. A novel algorithm
is proposed in this paper to reduce geometric distortion as well as
space-and-time-varying blur due to strong turbulence. By considering a suitable
energy functional, our algorithm first obtains a sharp reference image and a
subsampled image sequence containing sharp and mildly distorted image frames
with respect to the reference image. The subsampled image sequence is then
stabilized by applying the Robust Principal Component Analysis (RPCA) on the
deformation fields between image frames and warping the image frames by a
quasiconformal map associated with the low-rank part of the deformation matrix.
After image frames are registered to the reference image, the low-rank part of
them are deblurred via a blind deconvolution, and the deblurred frames are then
fused with the enhanced sparse part. Experiments have been carried out on both
synthetic and real turbulence-distorted video. Results demonstrate that our
method is effective in alleviating distortions and blur, restoring image
details and enhancing visual quality.
Yuncheng Li, Yale Song, Jiebo Luo
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Learning to rank has recently emerged as an attractive technique to train
deep convolutional neural networks for various computer vision tasks. Pairwise
ranking, in particular, has been successful in multi-label image
classification, achieving state-of-the-art results on various benchmarks.
However, most existing approaches use the hinge loss to train their models,
which is non-smooth and thus is difficult to optimize especially with deep
networks. Furthermore, they employ simple heuristics, such as top-k or
thresholding, to determine which labels to include in the output from a ranked
list of labels, which limits their use in the real-world setting. In this work,
we propose two techniques to improve pairwise ranking based multi-label image
classification: (1) we propose a novel loss function for pairwise ranking,
which is smooth everywhere and thus is easier to optimize; and (2) we
incorporate a label decision module into the model, estimating the optimal
confidence thresholds for each visual concept. We provide theoretical analyses
of our loss function in the Bayes consistency and risk minimization framework,
and show its benefit over existing pairwise ranking formulations. We
demonstrate the effectiveness of our approach on three large-scale datasets,
VOC2007, NUS-WIDE and MS-COCO, achieving the best reported results in the
Jose Dolz, Ismail Ben Ayed, Christian Desrosiers
Comments: Accepted at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We formulate an Alternating Direction Method of Multipliers (ADMM) that
systematically distributes the computations of any technique for optimizing
pairwise functions, including non-submodular potentials. Such discrete
functions are very useful in segmentation and a breadth of other vision
problems. Our method decomposes the problem into a large set of small
sub-problems, each involving a sub-region of the image domain, which can be
solved in parallel. We achieve consistency between the sub-problems through a
novel constraint that can be used for a large class of pair-wise functions. We
give an iterative numerical solution that alternates between solving the
sub-problems and updating consistency variables, until convergence. We report
comprehensive experiments, which demonstrate the benefit of our general
distributed solution in the case of the popular serial algorithm of Boykov and
Kolmogorov (BK algorithm) and, also, in the context of non-submodular
Bo Dai, Yuqi Zhang, Dahua Lin
Comments: To appear in CVPR 2017 as an oral paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Relationships among objects play a crucial role in image understanding.
Despite the great success of deep learning techniques in recognizing individual
objects, reasoning about the relationships among objects remains a challenging
task. Previous methods often treat this as a classification problem,
considering each type of relationship (e.g. “ride”) or each distinct visual
phrase (e.g. “person-ride-horse”) as a category. Such approaches are faced with
significant difficulties caused by the high diversity of visual appearance for
each kind of relationship or the large number of distinct visual phrases. We
propose an integrated framework to tackle this problem. At the heart of this
framework is the Deep Relational Network, a novel formulation designed
specifically for exploiting the statistical dependencies between objects and
their relationships. On two large datasets, the proposed method achieves
substantial improvement over state-of-the-art.
Dario Prandi, Jean-Paul Gauthier
Comments: 118 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Analysis of PDEs (math.AP); Representation Theory (math.RT)
In his beautiful book [54], Jean Petitot proposes a subriemannian model for
the primary visual cortex of mammals. This model is neurophysiologically
justified. Further developments of this theory lead to efficient algorithms for
image reconstruction, based upon the consideration of an associated
hypoelliptic diffusion. The subriemannian model of Petitot (or certain of its
improvements) is a left-invariant structure over the group SE(2) of
rototranslations of the plane. Here, we propose a semi-discrete version of this
theory, leading to a left-invariant structure over the group SE(2,N),
restricting to a finite number of rotations. This apparently very simple group
is in fact quite atypical: it is maximally almost periodic, which leads to much
simpler harmonic analysis compared to SE(2). Based upon this semi-discrete
model, we improve on the image-reconstruction algorithms and we develop a
pattern-recognition theory that leads also to very efficient algorithms in
Wei Li, Farnaz Abitahi, Zhigang Zhu
Comments: The paper is accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Action Unit (AU) detection has become essential for facial analysis. Many
proposed approaches face challenging problems in dealing with the alignments of
different face regions, in the effective fusion of temporal information, and in
training a model for multiple AU labels. To better address these problems, we
propose a deep learning framework for AU detection with region of interest
(ROI) adaptation, integrated multi-label learning, and optimal LSTM-based
temporal fusing. First, ROI cropping nets (ROI Nets) are designed to make sure
that specific regions of interest on faces are learned independently; each
sub-region has a local convolutional neural network (CNN) – an ROI Net, whose
convolutional filters will only be trained for the corresponding region.
Second, multi-label learning is employed to integrate the outputs of those
individual ROI cropping nets, which learns the inter-relationships of various
AUs and acquires global features across sub-regions for AU detection. Finally,
the optimal selection of multiple LSTM layers to form the best LSTM Net is
carried out to best fuse temporal features, in order to make the AU prediction
the most accurate. The proposed approach is evaluated on two popular AU
detection datasets, BP4D and DISFA, outperforming the state of the art
significantly, with an average improvement of around 13% on BP4D and 25% on
DISFA, respectively.
Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu
Comments: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
This work is about recognizing human activities occurring in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. In comparison with existing architectures of
LSTMs, we make two key contributions that give our approach its name,
Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches.
Samet Hicsonmez, Nermin Samet, Fadime Sener, Pinar Duygulu
Comments: ACM ICMR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper is motivated by a young boy’s capability to recognize an
illustrator’s style in a totally different context. In the book “We are All
Born Free” [1], composed of selected rights from the Universal Declaration of
Human Rights interpreted by different illustrators, the boy was surprised to
see a picture similar to the ones in the “Winnie the Witch” series drawn by
Korky Paul (Figure 1). The style was noticeable in other characters of the same
illustrator in different books as well. The capability of a child to easily
spot the style was shown to be valid for other illustrators such as Axel
Scheffler and Debi Gliori. The boy’s enthusiasm led us to start the journey to
explore the capabilities of machines to recognize the style of illustrators.
We collected pages from children’s books to construct a new illustrations
dataset consisting of about 6500 pages from 24 artists. We exploited deep
networks for categorizing illustrators, and with around 94% classification
performance our method outperformed the traditional methods by more than 10%.
Going beyond categorization we explored transferring style. The classification
performance on the transferred images has shown the ability of our system to
capture the style. Furthermore, we discovered representative illustrations and
discriminative stylistic elements.
Pedro Morgado, Nuno Vasconcelos
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
The role of semantics in zero-shot learning is considered. The effectiveness
of previous approaches is analyzed according to the form of supervision
provided. While some learn semantics independently, others only supervise the
semantic subspace explained by training classes. Thus, the former is able to
constrain the whole space but lacks the ability to model semantic correlations.
The latter addresses this issue but leaves part of the semantic space
unsupervised. This complementarity is exploited in a new convolutional neural
network (CNN) framework, which proposes the use of semantics as constraints for
recognition. Although a CNN trained for classification has no transfer ability,
this can be encouraged by learning a hidden semantic layer together with a
semantic code for classification. Two forms of semantic constraints are then
introduced. The first is a loss-based regularizer that introduces a
generalization constraint on each semantic predictor. The second is a codeword
regularizer that favors semantic-to-class mappings consistent with prior
semantic knowledge while allowing these to be learned from data. Significant
improvements over the state-of-the-art are achieved on several datasets.
Zuxuan Wu, Larry S. Davis, Leonid Sigal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We explore the power of spatial context as a self-supervisory signal for
learning visual representations. In particular, we propose spatial context
networks that learn to predict a representation of one image patch from another
image patch, within the same image, conditioned on their real-valued relative
spatial offset. Unlike auto-encoders, which aim to encode and reconstruct
original image patches, our network aims to encode and reconstruct intermediate
representations of the spatially offset patches. As such, the network learns a
spatially conditioned contextual representation. By testing performance with
various patch selection mechanisms we show that focusing on object-centric
patches is important, and that using object proposal as a patch selection
mechanism leads to the highest improvement in performance. Further, unlike
auto-encoders, context encoders [21], or other forms of unsupervised feature
learning, we illustrate that contextual supervision (with pre-trained model
initialization) can improve on existing pre-trained model performance. We build
our spatial context networks on top of standard VGG_19 and CNN_M architectures
and, among other things, show that we can achieve improvements (with no
additional explicit supervision) over the original ImageNet pre-trained VGG_19
and CNN_M models in object categorization and detection on VOC2007.
Majid Mohammadi, Wout Hofman, Yaohua Tan, S. Hamid Mousavi
Comments: 5 pages
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV)
In this paper, an equivalent smooth minimization for the L1 regularized least
squares problem is proposed. The proposed problem is a convex box-constrained
smooth minimization which allows applying fast optimization methods to find its
solution. Further, it is shown that the property “the dual of the dual is
primal” holds for the L1 regularized least squares problem. A solver for the
smooth problem is proposed, and its affinity to the proximal gradient is shown.
Finally, the experiments on L1 and total variation regularized problems are
performed, and the corresponding results are reported.
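The proximal gradient method that the abstract relates its solver to can be illustrated with a minimal sketch. This is the textbook iterative soft-thresholding scheme for minimizing 0.5*||Ax - b||^2 + lam*||x||_1, not the solver proposed in the paper; the function names, the step size, and the toy data are assumptions made for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, iters=500):
    """Iterative soft-thresholding for 0.5*||Ax - b||^2 + lam*||x||_1.

    A is a list of rows and b a list of targets. `step` should not
    exceed 1 / L, where L is the largest eigenvalue of A^T A.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = Ax - b
        r = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
             for row, b_i in zip(A, b)]
        # Gradient of the smooth part: A^T r
        g = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # Gradient step followed by the L1 proximal step
        x = [soft_threshold(x_j - step * g_j, step * lam)
             for x_j, g_j in zip(x, g)]
    return x


# Toy problem (assumed data): with A = identity the minimizer is
# soft_threshold(b_i, lam) in each coordinate.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_hat = ista(A, b, lam=1.0, step=1.0)
```

In the toy problem the second coordinate is driven exactly to zero, which is the sparsity effect of the L1 term.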
Asit Mishra, Jeffrey J Cook, Eriko Nurvitadhi, Debbie Marr
Comments: Under submission to CVPR Workshop
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
For computer vision applications, prior works have shown the efficacy of
reducing the numeric precision of model parameters (network weights) in deep
neural networks but also that reducing the precision of activations hurts model
accuracy much more than reducing the precision of model parameters. We study
schemes to train networks from scratch using reduced-precision activations
without hurting the model accuracy. We reduce the precision of activation maps
(along with model parameters) using a novel quantization scheme and increase
the number of filter maps in a layer, and find that this scheme compensates or
surpasses the accuracy of the baseline full-precision network. As a result, one
can significantly reduce the dynamic memory footprint, memory bandwidth,
computational energy and speed up the training and inference process with
appropriate hardware support. We call our scheme WRPN – wide reduced-precision
networks. We report results using our proposed schemes and show that our
results are better than previously reported accuracies on ILSVRC-12 dataset
while being computationally less expensive compared to previously reported
reduced-precision networks.
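To make the idea of reduced-precision activations concrete, here is a minimal sketch of a generic uniform k-bit quantizer. It is not the quantization scheme proposed in the paper; the function name and the assumption that activations are clipped to [0, 1] are illustrative choices.

```python
def quantize_activation(a, bits):
    """Clip a to [0, 1] and round it to one of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1          # number of steps between 0 and 1
    a = min(max(a, 0.0), 1.0)         # clip to the assumed activation range
    return round(a * levels) / levels


# A 2-bit activation can only take the values 0, 1/3, 2/3 and 1.
q_low = quantize_activation(0.4, bits=2)
q_clipped = quantize_activation(1.7, bits=4)
```

Storing each activation in `bits` bits instead of 32 is what shrinks the memory footprint; the abstract's point is that widening the layers can recover the accuracy lost to this rounding.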
Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi
Subjects: Artificial Intelligence (cs.AI)
Clustering techniques are very attractive for extracting and identifying
patterns in datasets. However, their application to very large spatial datasets
presents numerous challenges such as high-dimensionality data, heterogeneity,
and high complexity of some algorithms. For instance, some algorithms may have
linear complexity but they require the domain knowledge in order to determine
their input parameters. Distributed clustering techniques constitute a very
good alternative to the big data challenges (e.g., Volume, Variety, Veracity,
and Velocity). Usually these techniques consist of two phases. The first phase
generates local models or patterns and the second one tends to aggregate the
local results to obtain global models. While the first phase can be executed in
parallel on each site and is therefore efficient, the aggregation phase is
complex, time consuming and may produce incorrect and ambiguous global clusters
and therefore incorrect models. In this paper we propose a new distributed
clustering approach to deal efficiently with both phases; generation of local
results and generation of global models by aggregation. For the first phase,
our approach is capable of analysing the datasets located in each site using
different clustering techniques. The aggregation phase is designed in such a
way that the final clusters are compact and accurate while the overall process
is efficient in time and memory allocation. For the evaluation, we use two
well-known clustering algorithms; K-Means and DBSCAN. One of the key outputs of
this distributed clustering technique is that the number of global clusters is
dynamic; it does not need to be fixed in advance. Experimental results show that the
approach is scalable and produces high quality results.
Hwiyeol Jo, Soo-Min Kim, Jeong Ryu
Comments: Rejected Paper in CogSci2017. I’m sure there is no place for integrated research. arXiv admin note: text overlap with arXiv:1607.03707
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
As a first step toward modeling the emotional state of a person, we build sentiment
analysis models with existing deep neural network algorithms and compare the
models with psychological measurements to shed light on the relationship. In the
experiments, we first examined the psychological state of 64 participants and asked
them to summarize the story of a book, Chronicle of a Death Foretold (Marquez,
1981). Secondly, we trained models using 365,802 crawled movie reviews;
then we evaluated participants’ summaries using the pretrained model as a
form of transfer learning. Given that emotion affects
memory, we investigated the relationship between the evaluation score of the
summaries from computational models and the examined psychological
measurements. The result shows that although CNN performed the best among other
deep neural network algorithms (LSTM, GRU), its results are not related to the
psychological state. Rather, GRU shows more explainable results depending on
the psychological state. The contributions of this paper can be summarized as
follows: (1) we shed light on the relationship between computational models and
psychological measurements; (2) we suggest this framework as an objective method
for evaluating emotion: the real sentiment analysis of a person.
Quoc Duy Vo, Jaya Thomas, Shinyoung Cho, Pradipta De, Bong Jun Choi, Lee Sael
Comments: 11 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI)
Business Intelligence and Analytics (BI&A) is the process of extracting and
predicting business-critical insights from data. Traditional BI focused on data
collection, extraction, and organization to enable efficient query processing
for deriving insights from historical data. With the rise of big data and cloud
computing, there are many challenges and opportunities for BI. Especially
with the growing number of data sources, traditional BI&A is evolving to
provide intelligence at different scales and perspectives – operational BI,
situational BI, self-service BI. In this survey, we review the evolution of
business intelligence systems at full scale, from back-end architecture to
front-end applications. We focus on the changes in the back-end architecture
that deals with the collection and organization of the data. We also review the
changes in the front-end applications, where analytic services and
visualization are the core components. Using a use case from BI in healthcare,
which is one of the most complex enterprises, we show how BI&A will play an
important role beyond the traditional usage. The survey provides a holistic
view of Business Intelligence and Analytics for anyone interested in getting a
complete picture of the different pieces in the emerging next generation BI&A
Shahab Ebrahimi
Comments: 13 pages
Journal-ref: International Journal of Artificial Intelligence and Applications
(IJAIA), Vol.8, No.2, March 2017
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
The AGM model is the most remarkable framework for modeling belief revision.
However, it is not perfect in all aspects. Paraconsistent belief revision,
multi-agent belief revision and non-prioritized belief revision are three
different extensions to AGM to address three important criticisms applied to
it. In this article, we propose a framework based on AGM that takes a position
in each of these categories. Also, we discuss some features of our framework
and study the satisfiability of AGM postulates in this new context.
Mieczysław A. Kłopotek
Comments: Draft for the conference M.A. Kłopotek: Beliefs and Probability in Bacchus’ l.p. Logic: A 3-Valued Logic Solution to Apparent Counter-intuition. [in:] R. Trappl Ed,: Cybernetics and Systems Research. Proc. 11 European Meeting on Cybernetics and System Research EMCSR’92, Wien, Osterreich, 20. April 1992. World Scientific Singapore, New Jersey, London, HongKong Vol. 1, pp. 519-526
Subjects: Artificial Intelligence (cs.AI)
A fundamental discrepancy between first-order logic and statistical inference
(global versus local properties of the universe) is shown to be the obstacle to
the integration of logic and probability in the L.p. logic of Bacchus. To overcome the
counterintuitiveness of L.p. behaviour, a 3-valued logic is proposed.
Ralf Mikut, Andreas Bartschat, Wolfgang Doneit, Jorge Ángel González Ordiano, Benjamin Schott, Johannes Stegmaier, Simon Waczowicz, Markus Reischl
Subjects: Artificial Intelligence (cs.AI)
The Matlab toolbox SciXMiner is designed for the visualization and analysis
of time series and features with a special focus on classification problems. It
was developed at the Institute of Applied Computer Science of the Karlsruhe
Institute of Technology (KIT), a member of the Helmholtz Association of German
Research Centres in Germany. The aim was to provide an open platform for the
development and improvement of data mining methods and their applications to
various medical and technical problems. SciXMiner is based on Matlab (tested with
version 2017a). Many functions do not require additional standard toolboxes,
but some parts of the Signal, Statistics and Wavelet toolboxes are used for special
cases. The decision for a Matlab-based solution was made in order to use the wide
mathematical functionality of this package provided by The Mathworks Inc.
SciXMiner is controlled by a graphical user interface (GUI) with menu items and
control elements like popup lists, checkboxes and edit elements. This makes it
easier for inexperienced users to work with SciXMiner. Furthermore, the
automation and batch standardization of analyses is possible using macros.
The standard Matlab style using the command line is also available. SciXMiner
is open source software. The download page is
this http URL It is licensed under the conditions
of the GNU General Public License (GNU-GPL) of The Free Software Foundation.
Luigi Troiano, Irene Díaz, Ciro Gaglione
Comments: FUZZ-IEEE 2017. 6 pages, 3 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI)
The media industry is increasingly personalizing its content offering in
an attempt to better target the audience. This requires analyzing the
relationships established between users and the content they enjoy,
looking on one side at the content characteristics and on the other at the user
profile, in order to find the best match between the two. In this paper we
suggest building that relationship using the Dempster-Shafer Theory of
Evidence, proposing a reference model and illustrating its properties by means
of a toy example. Finally we suggest possible applications of the model for
tasks that are common in the modern media industry.
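As a brief illustration of the evidence-combination step at the core of Dempster-Shafer theory, the sketch below implements Dempster's rule of combination for two mass functions. It is not the reference model from the paper; the genre frame, the two evidence sources, and all mass values are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2  # mass falling on contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical toy frame: which genre a user prefers.
DRAMA, COMEDY = frozenset({"drama"}), frozenset({"comedy"})
EITHER = DRAMA | COMEDY
m_history = {DRAMA: 0.6, EITHER: 0.4}   # evidence from viewing history (assumed)
m_ratings = {COMEDY: 0.5, EITHER: 0.5}  # evidence from explicit ratings (assumed)
m = combine(m_history, m_ratings)
```

Mass left on `EITHER` represents ignorance rather than a split preference, which is the property that distinguishes this formalism from a plain probability distribution over genres.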
Carlos Florensa, Yan Duan, Pieter Abbeel
Comments: Published as a conference paper at ICLR 2017
Journal-ref: International Conference on Learning Representations 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Deep reinforcement learning has achieved many impressive results in recent
years. However, tasks with sparse rewards or long horizons continue to pose
significant challenges. To tackle these important problems, we propose a
general framework that first learns useful skills in a pre-training
environment, and then leverages the acquired skills for learning faster in
downstream tasks. Our approach brings together some of the strengths of
intrinsic motivation and hierarchical methods: the learning of useful skills is
guided by a single proxy reward, the design of which requires minimal
domain knowledge about the downstream tasks. Then a high-level policy is
trained on top of these skills, providing a significant improvement of the
exploration and making it possible to tackle sparse rewards in the downstream tasks. To
efficiently pre-train a large span of skills, we use Stochastic Neural Networks
combined with an information-theoretic regularizer. Our experiments show that
this combination is effective in learning a wide span of interpretable skills
in a sample-efficient way, and can significantly boost the learning performance
uniformly across a wide range of downstream tasks.
Yu Liu, Junjie Yan, Wanli Ouyang
Comments: Accepted at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This paper targets the problem of set-to-set recognition, which learns the
metric between two image sets. Images in each set belong to the same identity.
Since images in a set can be complementary, they hopefully lead to higher
accuracy in practical applications. However, the quality of each sample cannot
be guaranteed, and samples with poor quality will hurt the metric. In this
paper, the quality aware network (QAN) is proposed to confront this problem,
where the quality of each sample can be automatically learned although such
information is not explicitly provided in the training stage. The network has
two branches, where the first branch extracts an appearance feature embedding for
each sample and the other branch predicts a quality score for each sample.
Features and quality scores of all samples in a set are then aggregated to
generate the final feature embedding. We show that the two branches can be
trained in an end-to-end manner given only the set-level identity annotation.
Analysis on gradient spread of this mechanism indicates that the quality
learned by the network is beneficial to set-to-set recognition and simplifies
the distribution that the network needs to fit. Experiments on both face
verification and person re-identification show advantages of the proposed QAN.
The source code and network structure can be downloaded at
this https URL
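The aggregation step described above can be sketched as a quality-weighted average of per-sample features. The normalisation by the sum of scores is an assumption made for illustration, as are the function name and the toy values; the abstract does not specify the exact form used in QAN.

```python
def aggregate_set(features, qualities):
    """Combine per-sample feature vectors into one set-level embedding.

    Each feature vector is weighted by its quality score, so that
    low-quality samples contribute less to the final embedding.
    """
    total = sum(qualities)
    if total <= 0:
        raise ValueError("quality scores must sum to a positive value")
    dim = len(features[0])
    return [
        sum(q * f[d] for f, q in zip(features, qualities)) / total
        for d in range(dim)
    ]


# Two samples of the same identity; the first is judged three times
# more reliable than the second (hypothetical scores).
set_embedding = aggregate_set([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```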
Ruth Fong, Andrea Vedaldi
Comments: 9 pages, 10 figures, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
As machine learning algorithms are increasingly applied to high impact yet
high risk tasks, e.g. problems in health, it is critical that researchers can
explain how such algorithms arrived at their predictions. In recent years, a
number of image saliency methods have been developed to summarize where highly
complex neural networks “look” in an image for evidence for their predictions.
However, these techniques are limited by their heuristic nature and
architectural constraints.
In this paper, we make two main contributions: First, we propose a general
framework for learning different kinds of explanations for any black box
algorithm. Second, we introduce a paradigm that learns the minimally salient
part of an image by directly editing it and learning from the corresponding
changes to its output. Unlike previous works, our method is model-agnostic and
testable because it is grounded in replicable image perturbations.
Daniyar Itegulov, John Slaney, Bruno Woltzenlogel Paleo
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
This paper introduces Scavenger, the first theorem prover for pure
first-order logic without equality based on the new conflict resolution
calculus. Conflict resolution has a restricted resolution inference rule that
resembles (a first-order generalization of) unit propagation as well as a rule
for assuming decision literals and a rule for deriving new clauses by (a
first-order generalization of) conflict-driven clause learning.
Benoit Desrochers (DGA-TN), Luc Jaulin (Ensta Bretagne, Lab-Sticc)
Comments: In Proceedings SNR 2017, arXiv:1704.02421
Journal-ref: EPTCS 247, 2017, pp. 34-45
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Systems and Control (cs.SY)
This paper shows that using separators, each of which is a pair of complementary
contractors, we can easily and efficiently solve the localization problem of a
robot with sonar measurements in an unstructured environment. We introduce
separators associated with the Minkowski sum and the Minkowski difference in
order to facilitate the resolution. A test-case is given in order to illustrate
the principle of the approach.
Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong
Comments: 13 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
In a composite-domain task-completion dialogue system, a conversation agent
often switches among multiple sub-domains before it successfully completes the
task. Given such a scenario, a standard deep reinforcement learning based
dialogue agent may struggle to find a good policy due to issues such as
increased state and action spaces, high sample complexity demands, sparse
rewards and long horizons. In this paper, we propose to use a hierarchical
deep reinforcement learning approach which can operate at different temporal
scales and is intrinsically motivated to attack these problems. Our
hierarchical network consists of two levels: the top-level meta-controller for
subgoal selection and the low-level controller for dialogue policy learning.
Subgoals selected by the meta-controller and intrinsic rewards can guide the
controller to effectively explore in the state-action space and mitigate the
sparse reward and long horizon problems. Experiments on both simulations and
human evaluation show that our model significantly outperforms flat deep
reinforcement learning agents in terms of success rate, rewards and user
Nikhil Mishra, Pieter Abbeel, Igor Mordatch
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
We introduce a method for learning the dynamics of complex nonlinear systems
based on deep generative models over temporal segments of states and actions.
Unlike dynamics models that operate over individual discrete timesteps, we
learn the distribution over future state trajectories conditioned on past
state, past action, and planned future action trajectories, as well as a latent
prior over action trajectories. Our approach is based on convolutional
autoregressive models and variational autoencoders. It makes stable and
accurate predictions over long horizons for complex, stochastic systems,
effectively expressing uncertainty and modeling the effects of collisions,
sensory noise, and action delays. The learned dynamics model and action prior
can be used for end-to-end, fully differentiable trajectory optimization and
model-based policy optimization, which we use to evaluate the performance and
sample-efficiency of our method.
Etienne Papegnies (LIA), Vincent Labatut (LIA), Richard Dufour (LIA), Georges Linares (LIA)
Journal-ref: International Conference on Computational Linguistics and
Intelligent Text Processing, Apr 2017, Budapest, Hungary. International
Conference on Computational Linguistics and Intelligent Text Processing, 18
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Online communities have gained considerable importance in recent years due to
the increasing number of people connected to the Internet. Moderating user
content in online communities is mainly performed manually, and reducing the
workload through automatic methods is of great financial interest for community
maintainers. Often, the industry uses basic approaches such as bad-word
filtering and regular expression matching to assist the moderators. In this
article, we consider the task of automatically determining if a message is
abusive. This task is complex since messages are written in a non-standardized
way, including spelling errors, abbreviations, community-specific codes…
First, we evaluate the system that we propose using standard features of online
messages. Then, we evaluate the impact of the addition of pre-processing
strategies, as well as original specific features developed for the community
of an online in-browser strategy game. We finally propose to analyze the
usefulness of this wide range of features using feature selection. This work
can lead to two possible applications: 1) automatically flag potentially
abusive messages to draw the moderator’s attention to a narrow subset of
messages; and 2) fully automate the moderation process by deciding whether a
message is abusive without any human intervention.
Felix Stahlberg, Bill Byrne
Comments: Submitted to EMNLP 2017
Subjects: Computation and Language (cs.CL)
Ensembling is a well-known technique in neural machine translation (NMT).
Instead of a single neural net, multiple neural nets with the same topology are
trained separately, and the decoder generates predictions by averaging over the
individual models. Ensembling often improves the quality of the generated
translations drastically. However, it is not suitable for production systems
because it is cumbersome and slow. This work aims to reduce the runtime to be
on par with a single system without compromising the translation quality.
First, we show that the ensemble can be unfolded into a single large neural
network which imitates the output of the ensemble system. We show that
unfolding can already improve the runtime in practice since more work can be
done on the GPU. We proceed by describing a set of techniques to shrink the
unfolded network by reducing the dimensionality of layers. On Japanese-English
we report that the resulting network has the size and decoding speed of a
single NMT network but performs on the level of a 3-ensemble system.
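The ensembling baseline described at the start of this abstract, in which the decoder averages the predictions of separately trained models, can be sketched as follows. This shows plain prediction averaging only, not the unfolding or shrinking techniques of the paper; the three toy distributions over a three-token vocabulary are hypothetical.

```python
def ensemble_average(distributions):
    """Average next-token probability distributions from several models."""
    n = len(distributions)
    vocab = len(distributions[0])
    return [sum(d[i] for d in distributions) / n for i in range(vocab)]


# Hypothetical outputs of three models for the same decoding step.
p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.6, 0.3]
p3 = [0.2, 0.5, 0.3]
avg = ensemble_average([p1, p2, p3])

# The ensemble picks the token with the highest averaged probability.
best_token = max(range(len(avg)), key=avg.__getitem__)
```

Each decoding step needs one forward pass per model, which is why the abstract describes ensembling as slow for production use.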
Santosh Kumar Bharti, Korra Sathya Babu
Comments: 12 pages, 4 figures
Subjects: Computation and Language (cs.CL)
In recent times, data has been growing rapidly in every domain, such as news, social
media, banking, and education. Because of this excess of data, there is a
need for an automatic summarizer capable of summarizing data,
especially the textual data in an original document, without losing any critical
content. Text summarization has emerged as an important research area in the recent
past. In this regard, a review of existing work on the text summarization process is
useful for carrying out further research. In this paper, recent literature on
automatic keyword extraction and text summarization is presented, since the text
summarization process is highly dependent on keyword extraction. This literature
review includes a discussion of the different methodologies used for keyword extraction
and text summarization. It also discusses the different databases used for
text summarization in several domains, along with evaluation metrics. Finally,
it briefly discusses the issues and research challenges faced by researchers,
along with future directions.
Zahra Mousavi, Heshaam Faili
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents an automated supervised method for Persian wordnet
construction. Using a Persian corpus and a bi-lingual dictionary, the initial
links between Persian words and Princeton WordNet synsets have been generated.
These links are then classified as correct or incorrect by employing
seven features in a trained classification system. The whole method is just a
classification system, which has been trained on a training set containing
FarsNet as a set of correct instances. State-of-the-art results are achieved on
the automatically derived Persian wordnet. The resulting wordnet, with a
precision of 91.18%, includes more than 16,000 words and 22,000 synsets.
Raphael Shu, Hideki Nakayama
Subjects: Computation and Language (cs.CL)
Sequence generation models have long relied on the beam search
algorithm to generate output sequences. However, the correctness of beam search
degrades when a model is over-confident about a suboptimal prediction. In
this paper, we propose to perform minimum Bayes-risk (MBR) decoding for some
extra steps at a later stage. In order to speed up MBR decoding, we compute the
Bayes risks on GPU in batch mode. In our experiments, we found that MBR
reranking works with a large beam size. Later-stage MBR decoding is shown to
outperform simple MBR reranking in machine translation tasks.
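The simple MBR reranking baseline mentioned above can be sketched as follows: each candidate in an n-best list is scored by its expected loss against all candidates, weighted by model probability, and the lowest-risk candidate is chosen. This is a minimal sketch under assumptions: the loss (one minus a unigram F1) and the toy n-best list are illustrative choices, and the later-stage and batched GPU variants from the abstract are not shown.

```python
def unigram_f1(a, b):
    """Unigram F1 overlap between two whitespace-tokenized sentences."""
    a_tokens, b_tokens = a.split(), b.split()
    common = sum(min(a_tokens.count(t), b_tokens.count(t)) for t in set(a_tokens))
    if common == 0:
        return 0.0
    precision = common / len(a_tokens)
    recall = common / len(b_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_rerank(candidates):
    """Pick the hypothesis with minimum expected loss under the candidates.

    candidates: list of (sentence, model_probability) pairs.
    """
    total = sum(p for _, p in candidates)

    def risk(hyp):
        # Expected loss of hyp, with probabilities renormalized over the list.
        return sum((p / total) * (1.0 - unigram_f1(hyp, other))
                   for other, p in candidates)

    return min((h for h, _ in candidates), key=risk)


# Hypothetical 3-best list with model probabilities.
nbest = [
    ("the cat sat on a mat", 0.40),
    ("the cat sat on the mat", 0.35),
    ("the cat sits on the mat", 0.25),
]
choice = mbr_rerank(nbest)
```

In this toy list the second candidate is selected even though the first has the highest model probability, because it is closest on average to the other hypotheses.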
Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong
Comments: 13 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
In a composite-domain task-completion dialogue system, a conversation agent
often switches among multiple sub-domains before it successfully completes the
task. Given such a scenario, a standard deep reinforcement learning based
dialogue agent may struggle to find a good policy due to issues such as
increased state and action spaces, high sample complexity demands, sparse
rewards and long horizons. In this paper, we propose to use a hierarchical
deep reinforcement learning approach which can operate at different temporal
scales and is intrinsically motivated to attack these problems. Our
hierarchical network consists of two levels: the top-level meta-controller for
subgoal selection and the low-level controller for dialogue policy learning.
Subgoals selected by the meta-controller and intrinsic rewards can guide the
controller to effectively explore in the state-action space and mitigate the
sparse reward and long horizon problems. Experiments on both simulations and
human evaluation show that our model significantly outperforms flat deep
reinforcement learning agents in terms of success rate, rewards and user
Nathan Siegle Hartmann, Magali Sanches Duran, Sandra Maria Aluísio
Comments: PROPOR International Conference on the Computational Processing of Portuguese, 2016, 8 pages
Journal-ref: PROPOR 2016. Springer. Lecture Notes in Computer Science volume
9727 (2016) pgs. 202-212
Subjects: Computation and Language (cs.CL)
Semantic Role Labeling (SRL) is a Natural Language Processing task that
enables the detection of events described in sentences and the participants of
these events. For Brazilian Portuguese (BP), two recently concluded studies
perform SRL in journalistic texts. [1] obtained F1-measure
scores of 79.6 using the PropBank.Br corpus, which has manually revised
syntactic trees; [8], without using a treebank for training, obtained
F1-measure scores of 68.0 for the same corpus. However, the use of manually
revised syntactic trees for this task does not represent a real scenario of
application. The goal of this paper is to evaluate the performance of SRL on
revised and non-revised syntactic trees using a larger and balanced corpus of
BP journalistic texts. First, we have shown that [1]’s system also performs
better than [8]’s system on the larger corpus. Second, the SRL system trained
on non-revised syntactic trees performs better over non-revised trees than a
system trained on gold-standard data.
Nathan Siegle Hartmann, Livia Cucatto, Danielle Brants, Sandra Aluísio
Comments: PROPOR International Conference on the Computational Processing of Portuguese, 2016, 9 pages
Journal-ref: Hartmann N., Cucatto L., Brants D., Aluísio S. (2016) Automatic
Classification of the Complexity of Nonfiction Texts in Portuguese for Early
School Years. In: Computational Processing of the Portuguese Language. PROPOR
2016. Springer
Subjects: Computation and Language (cs.CL)
Recent research shows that most Brazilian students have serious problems
regarding their reading skills. The full development of this skill is key for
the academic and professional future of every citizen. Tools for classifying
the complexity of reading materials for children aim to improve the quality of
the model of teaching reading and text comprehension. For English, Feng’s work
[11] is considered the state of the art in grade level prediction and achieved
74% accuracy in automatically classifying 4 levels of textual complexity for
close school grades. There are no classifiers for nonfiction texts for close
grades in Portuguese. In this article, we propose a scheme for manual
annotation of texts in 5 grade levels, which will be used for customized
reading to avoid the lack of interest by students who are more advanced in
reading and the blocking of those that still need to make further progress. We
obtained 52% accuracy in classifying texts into 5 levels and 74% in 3
levels. The results prove to be promising when compared to the state of the art
Hwiyeol Jo, Soo-Min Kim, Jeong Ryu
Comments: Rejected Paper in CogSci2017. I’m sure there is no place for integrated research. arXiv admin note: text overlap with arXiv:1607.03707
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
As a first step toward modeling the emotional state of a person, we build
sentiment analysis models with existing deep neural network algorithms and
compare the models with psychological measurements to shed light on the
relationship. In the experiments, we first examined the psychological state of
64 participants and asked them to summarize the story of a book, Chronicle of a
Death Foretold (Marquez, 1981). Secondly, we trained models using 365,802
crawled movie reviews; then we evaluated participants’ summaries using the
pretrained model, in the spirit of transfer learning. Given that emotion
affects memory, we investigated the relationship between the evaluation score of the
summaries from computational models and the examined psychological
measurements. The results show that although CNN performed best among the
deep neural network algorithms compared (CNN, LSTM, GRU), its results are not
related to the psychological state. Rather, GRU shows more explainable results
depending on the psychological state. The contribution of this paper can be
summarized as follows: (1) we shed light on the relationship between
computational models and psychological measurements; (2) we suggest this
framework as an objective method for evaluating emotion, that is, the real
sentiment analysis of a person.
Lucas Benedicic, Felipe A. Cruz, Alberto Madonna, Kean Mariotti
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Building and deploying software on high-end computing systems is a
challenging task. High performance applications have to reliably run across
multiple platforms and environments, and make use of site-specific resources
while resolving complicated software-stack dependencies. Containers are a type
of lightweight virtualization technology that attempts to solve this problem by
packaging applications and their environments into standard units of software
that are portable, easy to build and deploy, and have a small footprint and low
runtime overhead. In this work we present an extension to the container runtime
of Shifter that provides containerized applications with a mechanism to access
GPU accelerators and specialized networking from the host system, effectively
enabling performance portability of containers across HPC resources. The
presented extension makes it possible to rapidly deploy high-performance
software on supercomputers from containerized applications that have been
developed, built, and tested on non-HPC commodity hardware, e.g. the laptop or
workstation of a researcher.
William R. Saunders, James Grant, Eike H. Müller
Comments: 20 pages, 12 figures, 6 tables, submitted to Computer Physics Communications
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE); Computational Physics (physics.comp-ph)
Developers of Molecular Dynamics (MD) codes face significant challenges when
adapting existing simulation packages to new hardware. In a continuously
diversifying hardware landscape it becomes increasingly difficult for
scientists to be experts both in their own domain (physics/chemistry/biology)
and specialists in the low level parallelisation and optimisation of their
codes. To address this challenge, we describe a “Separation of Concerns”
approach for the development of parallel and optimised MD codes: the science
specialist writes code at a high abstraction level in a domain specific
language (DSL), which is then translated into efficient computer code by a
scientific programmer. In a related context, an abstraction for the solution of
partial differential equations with grid based methods has recently been
implemented in the (Py)OP2 library. Inspired by this approach, we develop a
Python code generation system for molecular dynamics simulations on different
parallel architectures, including massively parallel distributed memory systems
and GPUs. We demonstrate the efficiency of the auto-generated code by studying
its performance and scalability on different hardware and compare it to other
state-of-the-art simulation packages. With growing data volumes the extraction
of physically meaningful information from the simulation becomes increasingly
challenging and requires equally efficient implementations. A particular
advantage of our approach is the easy expression of such analysis algorithms.
We consider two popular methods for deducing the crystalline structure of a
material from the local environment of each atom, show how they can be
expressed in our abstraction and implement them in the code generation
Balaji Arun, Sebastiano Peluso, Roberto Palmieri, Giuliano Losa, Binoy Ravindran
Comments: To be published in the 47th IEEE/IFIP International Conference on Dependable Systems and Networks
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper proposes CAESAR, a novel multi-leader Generalized Consensus
protocol for geographically replicated sites. The main goal of CAESAR is to
overcome one of the major limitations of existing approaches, which is the
significant performance degradation when the application workload produces
conflicting requests. CAESAR does that by changing the way a fast decision is
taken: its ordering protocol does not reject a fast decision for a client
request if a quorum of nodes reply with different dependency sets for that
request. The effectiveness of CAESAR is demonstrated through an evaluation
study performed on Amazon’s EC2 infrastructure using 5 geo-replicated sites.
CAESAR outperforms other multi-leader (e.g., EPaxos) competitors by as much as
1.7x in the presence of 30% conflicting requests, and single-leader (e.g.,
Multi-Paxos) by up to 3.5x.
Zafar Takhirov, Joseph Wang, Marcia S. Louis, Venkatesh Saligrama, Ajay Joshi
Comments: Submitted as Work in Progress to DAC’17
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Machine Learning (ML) algorithms, like Convolutional Neural Networks (CNN),
Support Vector Machines (SVM), etc. have become widespread and can achieve high
statistical performance. However, their accuracy decreases significantly in
the energy-constrained mobile and embedded systems space, where all computations
need to be completed under a tight energy budget. In this work, we present a
field of groves (FoG) implementation of random forests (RF) that achieves an
accuracy comparable to CNNs and SVMs under tight energy budgets. Evaluation of
the FoG shows that at comparable accuracy it consumes ~1.48x, ~24x, ~2.5x, and
~34.7x lower energy per classification compared to conventional RF, SVM_RBF,
MLP, and CNN, respectively. FoG is ~6.5x less energy efficient than SVM_LR, but
achieves 18% higher accuracy on average across all considered datasets.
Duarte Patrício, José Simão, Luís Veiga
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)
Many cloud applications rely on fast and non-relational storage to aid in the
processing of large amounts of data. Managed runtimes are now widely used to
support the execution of several storage solutions of the NoSQL movement,
particularly when dealing with big data key-value store-driven applications.
The benefits of these runtimes can however be limited by modern parallel
throughput-oriented GC algorithms, where related objects have the potential to
be dispersed in memory, either in the same or different generations. In the
long run this causes more page faults and degradation of locality on
system-level memory caches.
We propose Gang-CG, an extension to modern heap layouts and to a parallel GC
algorithm to promote locality between groups of related objects. This is done
without extensive profiling of the applications and in a way that is
transparent to the programmer, without the need to use specialized data
structures. The heap layout and algorithmic extensions were implemented over
the Parallel Scavenge garbage collector of the HotSpot JVM.
Using microbenchmarks that capture the architecture of several key-value
store databases, we show negligible overhead in frequent operations such as
the allocation of new objects and improvements to the access speed of data,
supported by fewer misses in system-level memory caches. Overall, we show a 6%
improvement in the average time of read and update operations and an average
decrease of 12.4% in page faults.
K C Santosh, Suman Kalyan Maity, Arjun Mukherjee
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)
Social media are increasing their influence through the vast amount of public
information they carry, leading to their active use for marketing by companies
and organizations. Such marketing promotions are difficult to identify, unlike
those in traditional media such as TV and newspapers. It is therefore important
to identify promoters in social media. Although there is active ongoing
research, existing approaches are far from solving the problem. To identify
such imposters, it is important to understand their strategies of social
circle creation and the dynamics of their content posting. Are there any
specific spammer types? How successful is each type? We analyze these questions
in the light of social relationships on Twitter. Our analyses discover two
types of spammers and their relationships with the dynamics of content posts.
Our results discover novel dynamics of spamming which are intuitive and
arguable. We propose ENWalk, a framework to detect spammers by learning feature
representations of the users in social media. We learn the feature
representations using random walks biased on the spam dynamics.
Experimental results on a large-scale Twitter network and the corresponding
tweets show the effectiveness of our approach that outperforms the existing
Kimin Lee, Jaehyung Kim, Song Chong, Jinwoo Shin
Comments: 22 pages, 6 figures
Subjects: Learning (cs.LG)
It has been believed that stochastic feedforward neural networks (SFNNs) have
several advantages beyond deterministic deep neural networks (DNNs): they have
more expressive power allowing multi-modal mappings and regularize better due
to their stochastic nature. However, training large-scale SFNNs is notoriously
harder. In this paper, we aim at developing efficient training methods for
SFNNs, in particular using known architectures and pre-trained parameters of
DNNs. To this end, we propose a new intermediate stochastic model, called
Simplified-SFNN, which can be built upon any baseline DNN and approximates
a certain SFNN by simplifying its upper latent units above stochastic ones. The
main novelty of our approach is in establishing the connection between three
models, i.e., DNN->Simplified-SFNN->SFNN, which naturally leads to an efficient
training procedure of the stochastic models utilizing pre-trained parameters of
DNN. Using several popular DNNs, we show how they can be effectively
transferred to the corresponding stochastic models for both multi-modal and
classification tasks on MNIST, TFD, CASIA, CIFAR-10, CIFAR-100 and SVHN
datasets. In particular, we train a stochastic model of 28 layers and 36
million parameters, where training such a large-scale stochastic network is
significantly challenging without using Simplified-SFNN.
Yejin Kim, Jimeng Sun, Hwanjo Yu, Xiaoqian Jiang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Tensor factorization models offer an effective approach to convert massive
electronic health records into meaningful clinical concepts (phenotypes) for
data analysis. These models need a large amount of diverse samples to avoid
population bias. An open challenge is how to derive phenotypes jointly across
multiple hospitals, in which direct patient-level data sharing is not possible
(e.g., due to institutional policies). In this paper, we developed a novel
solution to enable federated tensor factorization for computational phenotyping
without sharing patient-level data. We developed secure data harmonization and
federated computation procedures based on alternating direction method of
multipliers (ADMM). Using this method, the multiple hospitals iteratively
update tensors and transfer secure summarized information to a central server,
and the server aggregates the information to generate phenotypes. We
demonstrated with real medical datasets that our method resembles the
centralized training model (based on combined datasets) in terms of accuracy
and phenotype discovery while respecting privacy.
Asit Mishra, Jeffrey J Cook, Eriko Nurvitadhi, Debbie Marr
Comments: Under submission to CVPR Workshop
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
For computer vision applications, prior works have shown the efficacy of
reducing the numeric precision of model parameters (network weights) in deep
neural networks but also that reducing the precision of activations hurts model
accuracy much more than reducing the precision of model parameters. We study
schemes to train networks from scratch using reduced-precision activations
without hurting the model accuracy. We reduce the precision of activation maps
(along with model parameters) using a novel quantization scheme and increase
the number of filter maps in a layer, and find that this scheme compensates or
surpasses the accuracy of the baseline full-precision network. As a result, one
can significantly reduce the dynamic memory footprint, memory bandwidth,
computational energy and speed up the training and inference process with
appropriate hardware support. We call our scheme WRPN – wide reduced-precision
networks. We report results using our proposed schemes and show that our
results are better than previously reported accuracies on the ILSVRC-12 dataset
while being computationally less expensive compared to previously reported
reduced-precision networks.
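To make the memory argument concrete, the sketch below shows a generic uniform k-bit quantizer for values in [0, 1] and the resulting storage arithmetic. It is a minimal sketch and not the quantization scheme of the paper, which the abstract does not specify; the function names and the [0, 1] input range are assumptions made for illustration.

```python
def quantize_unit(x, bits):
    """Uniformly quantize x to 2**bits levels, assuming x should lie in [0, 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(x, 0.0), 1.0)  # values outside [0, 1] are clipped
    return round(clipped * levels) / levels


def activation_bytes(num_values, bits):
    """Storage needed for num_values activations at the given bit width."""
    return num_values * bits / 8


# Hypothetical layer with 1000 activations: 32-bit versus 4-bit storage.
full = activation_bytes(1000, 32)
reduced = activation_bytes(1000, 4)
ratio = full / reduced
```

Under these assumptions a 4-bit activation map takes one eighth of the 32-bit storage, so a layer with twice as many filter maps still needs only a quarter of the original activation memory.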
Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, Martin Riedmiller
Comments: 12 pages, 5 Figures
Subjects: Learning (cs.LG); Robotics (cs.RO)
Deep learning and reinforcement learning methods have recently been used to
solve a variety of problems in continuous control domains. An obvious
application of these techniques is dexterous manipulation tasks in robotics
which are difficult to solve using traditional control theory or
hand-engineered approaches. One example of such a task is to grasp an object
and precisely stack it on another. Solving this difficult and practically
relevant problem in the real world is an important long-term goal for the field
of robotics. Here we take a step towards this goal by examining the problem in
simulation and providing models and techniques aimed at solving it. We
introduce two extensions to the Deep Deterministic Policy Gradient algorithm
(DDPG), a model-free Q-learning based method, which make it significantly more
data-efficient and scalable. Our results show that by making extensive use of
off-policy data and replay, it is possible to find control policies that
robustly grasp objects and stack them. Further, our results hint that it may
soon be feasible to train successful stacking policies by collecting
interactions on real robots.
Chun-Ta Lu, Lifang He, Hao Ding, Philip S. Yu
Comments: 9 pages
Subjects: Learning (cs.LG)
Real-world relations among entities can often be observed and determined by
different perspectives/views. For example, the decision made by a user on
whether to adopt an item relies on multiple aspects such as the contextual
information of the decision, the item’s attributes, the user’s profile and the
reviews given by other users. Different views may exhibit multi-way
interactions among entities and provide complementary information. In this
paper, we introduce a multi-tensor-based approach that can preserve the
underlying structure of multi-view data in a generic predictive model.
Specifically, we propose structural factorization machines (SFMs) that learn
the common latent spaces shared by multi-view tensors and automatically adjust
the importance of each view in the predictive model. Furthermore, the
complexity of SFMs is linear in the number of parameters, which makes SFMs
suitable for large-scale problems. Extensive experiments on real-world datasets
demonstrate that the proposed SFMs outperform several state-of-the-art methods
in terms of prediction accuracy and computational cost.
Yao Qin, Dongjin Song, Haifeng Cheng, Wei Cheng, Guofei Jiang, Garrison Cottrell
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The Nonlinear autoregressive exogenous (NARX) model, which predicts the
current value of a time series based upon its previous values as well as the
current and past values of multiple driving (exogenous) series, has been
studied for decades. Despite the fact that various NARX models have been
developed, few of them can capture the long-term temporal dependencies
appropriately and select the relevant driving series to make predictions. In
this paper, we propose a dual-stage attention based recurrent neural network
(DA-RNN) to address these two issues. In the first stage, we introduce an input
attention mechanism to adaptively extract relevant driving series (a.k.a.,
input features) at each timestamp by referring to the previous encoder hidden
state. In the second stage, we use a temporal attention mechanism to select
relevant encoder hidden states across all the timestamps. With this dual-stage
attention scheme, our model can not only make predictions effectively, but can
also be easily interpreted. Thorough empirical studies based upon the SML 2010
dataset and the NASDAQ 100 Stock dataset demonstrate that DA-RNN can outperform
state-of-the-art methods for time series prediction.
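The second-stage idea of weighting encoder hidden states can be illustrated with a generic soft-attention step: relevance scores are normalized with a softmax and used to form a weighted sum of the hidden states. This is a minimal sketch, not the DA-RNN architecture; how the scores are computed is left out, and the names and toy values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, scores):
    """Return attention weights and the weighted sum of hidden states.

    hidden_states: one vector (list of floats) per time step.
    scores: one unnormalized relevance score per time step.
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, hidden_states))
               for j in range(dim)]
    return weights, context


# Hypothetical encoder states for two time steps with equal scores.
weights, context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The attention weights are what makes such a model inspectable: they show which time steps contributed most to a prediction.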
Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
Comments: 16 pages, 7 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)
Adversarial examples are maliciously perturbed inputs designed to mislead
machine learning (ML) models at test-time. Adversarial examples are known to
transfer across models: the same perturbed input is often misclassified by
different models despite being generated to mislead a specific architecture.
This phenomenon enables simple yet powerful black-box attacks against deployed
ML systems.
In this work, we propose novel methods for estimating the previously unknown
dimensionality of the space of adversarial inputs. We find that adversarial
examples span a contiguous subspace of large dimensionality and that a
significant fraction of this space is shared between different models, thus
enabling transferability.
The dimensionality of the transferred adversarial subspace implies that the
decision boundaries learned by different models are eerily close in the input
domain, when moving away from data points in adversarial directions. A first
quantitative analysis of the similarity of different models’ decision
boundaries reveals that these boundaries are actually close in arbitrary
directions, whether adversarial or benign.
We conclude with a formal study of the limits of transferability. We show (1)
sufficient conditions on the data distribution that imply transferability for
simple model classes and (2) examples of tasks for which transferability fails
to hold. This suggests the existence of defenses making models robust to
transferability attacks—even when the model is not robust to its own
adversarial examples.
Cameron Musco, David P. Woodruff
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (math.NA)
We show how to compute a relative-error low-rank approximation to any
positive semidefinite (PSD) matrix in sublinear time, i.e., for any
\(n \times n\) PSD matrix \(A\), in \(\tilde O(n \cdot \mathrm{poly}(k/\epsilon))\)
time we output a rank-\(k\) matrix \(B\), in factored form, for which
\(\|A-B\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2\), where \(A_k\) is the best
rank-\(k\) approximation to \(A\). When \(k\) and \(1/\epsilon\) are not too
large compared to the sparsity of \(A\), our algorithm does not need to read
all entries of the matrix. Hence, we significantly improve upon previous
\(\mathrm{nnz}(A)\) time algorithms based on oblivious subspace embeddings, and
bypass an \(\mathrm{nnz}(A)\) time lower bound for general matrices (where
\(\mathrm{nnz}(A)\) denotes the number of non-zero entries in the matrix).
We prove time lower bounds for low-rank approximation of PSD matrices, showing
that our algorithm is close to optimal. Finally, we extend our techniques to
give sublinear time algorithms for low-rank approximation of \(A\) in the (often
stronger) spectral norm metric \(\|A-B\|_2^2\) and for ridge regression on PSD
Ruth Fong, Andrea Vedaldi
Comments: 9 pages, 10 figures, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
As machine learning algorithms are increasingly applied to high impact yet
high risk tasks, e.g. problems in health, it is critical that researchers can
explain how such algorithms arrived at their predictions. In recent years, a
number of image saliency methods have been developed to summarize where highly
complex neural networks “look” in an image for evidence for their predictions.
However, these techniques are limited by their heuristic nature and
architectural constraints.
In this paper, we make two main contributions: First, we propose a general
framework for learning different kinds of explanations for any black box
algorithm. Second, we introduce a paradigm that learns the minimally salient
part of an image by directly editing it and learning from the corresponding
changes to its output. Unlike previous works, our method is model-agnostic and
testable because it is grounded in replicable image perturbations.
Luigi Troiano, Elena Mejuto, Pravesh Kriplani
Comments: 6 pages, 6 figures, 5 tables
Subjects: Trading and Market Microstructure (q-fin.TR); Learning (cs.LG)
One of the major advantages of using Deep Learning for Finance is the ability
to embed a large collection of information into investment decisions. One way
to do that is by means of compression, which leads us to consider a smaller
feature space. Several studies are showing that non-linear feature reduction
performed by Deep Learning tools is effective in price trend prediction. The
focus has been put mainly on Restricted Boltzmann Machines (RBM) and on the
output obtained by them. Little attention has been paid to Auto-Encoders (AE)
as an alternative means to perform feature reduction. In this paper we
investigate the application of
both RBM and AE in more general terms, attempting to outline how architectural
and input space characteristics can affect the quality of prediction.
Daniel R. Figueiredo, Leonardo F. R. Ribeiro, Pedro H. P. Saverese
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
Structural identity is a concept of symmetry in which network nodes are
identified according to the network structure and their relationship to other
nodes. Structural identity has been studied in theory and practice over the
past decades, but has only recently been addressed with techniques from
representational learning. This work presents struc2vec, a novel and flexible
framework for learning latent representations of the structural identity of nodes.
struc2vec assesses structural similarity without using node or edge attributes,
uses a hierarchy to measure similarity at different scales, and constructs a
multilayer graph to encode the structural similarities and generate structural
context for nodes. Numerical experiments indicate that state-of-the-art
techniques for learning node representations fail to capture stronger notions
of structural identity, while struc2vec exhibits far superior performance in
this task, as it overcomes limitations of prior techniques.
Maziar Raissi
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This work introduces the concept of parametric Gaussian processes (PGPs),
which is built upon the seemingly self-contradictory idea of making Gaussian
processes parametric. Parametric Gaussian processes, by construction, are
designed to operate in “big data” regimes where one is interested in
quantifying the uncertainty associated with noisy data. The proposed
methodology circumvents the well-established need for stochastic variational
inference, a scalable algorithm for approximating posterior distributions. The
effectiveness of the proposed approach is demonstrated using an illustrative
example with simulated data and a benchmark dataset in the airline industry
with approximately 6 million records.
Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong
Comments: 13 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
In a composite-domain task-completion dialogue system, a conversation agent
often switches among multiple sub-domains before it successfully completes the
task. Given such a scenario, a standard deep reinforcement learning based
dialogue agent may struggle to find a good policy due to issues such as
increased state and action spaces, high sample complexity demands, sparse
rewards and long horizons. In this paper, we propose a hierarchical
deep reinforcement learning approach which can operate at different temporal
scales and is intrinsically motivated to attack these problems. Our
hierarchical network consists of two levels: the top-level meta-controller for
subgoal selection and the low-level controller for dialogue policy learning.
Subgoals selected by the meta-controller and intrinsic rewards can guide the
controller to explore the state-action space effectively and mitigate the
sparse reward and long horizon problems. Experiments on both simulations and
human evaluation show that our model significantly outperforms flat deep
reinforcement learning agents in terms of success rate, rewards and user
Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu
Comments: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
This work is about recognizing human activities occurring in videos at
distinct semantic levels, including individual actions, interactions, and group
activities. The recognition is realized using a two-level hierarchy of Long
Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture,
which can be trained end-to-end. In comparison with existing architectures of
LSTMs, we make two key contributions, which give our approach its name:
Confidence-Energy Recurrent Network (CERN). First, instead of using the common
softmax layer for prediction, we specify a novel energy layer (EL) for
estimating the energy of our predictions. Second, rather than finding the
common minimum-energy class assignment, which may be numerically unstable under
uncertainty, we specify that the EL additionally computes the p-values of the
solutions, and in this way estimates the most confident energy minimum. The
evaluation on the Collective Activity and Volleyball datasets demonstrates: (i)
advantages of our two contributions relative to the common softmax and
energy-minimization formulations and (ii) a superior performance relative to
the state-of-the-art approaches.
Pedro Morgado, Nuno Vasconcelos
Comments: Accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
The role of semantics in zero-shot learning is considered. The effectiveness
of previous approaches is analyzed according to the form of supervision
provided. While some learn semantics independently, others only supervise the
semantic subspace explained by training classes. Thus, the former is able to
constrain the whole space but lacks the ability to model semantic correlations.
The latter addresses this issue but leaves part of the semantic space
unsupervised. This complementarity is exploited in a new convolutional neural
network (CNN) framework, which proposes the use of semantics as constraints for
recognition. Although a CNN trained for classification has no transfer ability,
this can be encouraged by learning a hidden semantic layer together with a
semantic code for classification. Two forms of semantic constraints are then
introduced. The first is a loss-based regularizer that introduces a
generalization constraint on each semantic predictor. The second is a codeword
regularizer that favors semantic-to-class mappings consistent with prior
semantic knowledge while allowing these to be learned from data. Significant
improvements over the state-of-the-art are achieved on several datasets.
Maria Bauza, Alberto Rodriguez
Comments: 8 pages, 11 figures
Journal-ref: ICRA 2017
Subjects: Robotics (cs.RO); Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a data-driven approach to model planar pushing
interaction to predict both the most likely outcome of a push and its expected
variability. The learned models rely on a variation of Gaussian processes with
input-dependent noise called Variational Heteroscedastic Gaussian processes
(VHGP) that capture the mean and variance of a stochastic function. We show
that we can learn accurate models that outperform analytical models after fewer
than 100 samples and saturate in performance with fewer than 1000 samples. We
validate the results against a collected dataset of repeated trajectories, and
use the learned models to study questions such as the nature of the variability
in pushing, and the validity of the quasi-static assumption.
Carlos Florensa, Yan Duan, Pieter Abbeel
Comments: Published as a conference paper at ICLR 2017
Journal-ref: International Conference on Learning Representations 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Deep reinforcement learning has achieved many impressive results in recent
years. However, tasks with sparse rewards or long horizons continue to pose
significant challenges. To tackle these important problems, we propose a
general framework that first learns useful skills in a pre-training
environment, and then leverages the acquired skills for learning faster in
downstream tasks. Our approach brings together some of the strengths of
intrinsic motivation and hierarchical methods: the learning of useful skills is
guided by a single proxy reward, the design of which requires minimal
domain knowledge about the downstream tasks. Then a high-level policy is
trained on top of these skills, significantly improving exploration and
making it possible to tackle sparse rewards in the downstream tasks. To
efficiently pre-train a large span of skills, we use Stochastic Neural Networks
combined with an information-theoretic regularizer. Our experiments show that
this combination is effective in learning a wide span of interpretable skills
in a sample-efficient way, and can significantly boost the learning performance
uniformly across a wide range of downstream tasks.
Xuhong Chen, Jiaxun Lu, Tao Li, Pingyi Fan, Khaled Ben Letaief
Comments: This paper has been accepted for future publication in IEEE ACCESS. arXiv admin note: substantial text overlap with arXiv:1702.02121
Subjects: Information Theory (cs.IT)
High-mobility adaptation and massive multiple-input multiple-output (MIMO)
application are two primary evolving objectives for the next generation high
speed train (HST) wireless communication system. In this paper, we consider how
to design a location-aware beamforming for the massive MIMO system in the high
traffic density HST network. We first analyze the tradeoff between beam
directivity and beamwidth, based on which we present the sensitivity analysis
of positioning accuracy. Then, in order to guarantee highly efficient
transmission, we formulate an optimization problem to maximize the beam directivity
under the restriction of diverse positioning accuracies. After that, we present
a low-complexity beamforming design by utilizing location information, which
requires neither eigen-decomposing (ED) the uplink channel covariance matrix
(CCM) nor ED the downlink CCM (DCCM). Finally, we study the beamforming scheme
in a future high traffic density HST network, with emphasis on the scenario in
which two HSTs encounter each other. By utilizing the real-time location information, we
propose an optimal adaptive beamforming scheme to maximize the achievable rate
region under limited channel source constraint. Numerical simulation indicates
that a massive MIMO system whose positioning error stays below a certain level
can guarantee the required performance with satisfactory transmission efficiency
in the high traffic density HST scenario, and that the achievable rate region
when two HSTs encounter each other is greatly improved as well.
Ali Rahmanpour, Vahid T. Vakili, S. Mohammad Razavizadeh
Comments: Accepted for publication in Wireless Personal Communications (Springer)
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR)
In this paper, we study the use of Destination Artificial Noise (DAN) in addition to
Source Artificial Noise (SAN) to enhance physical layer secrecy with an outage
probability based approach. It is assumed that all nodes in the network (i.e.
source, destination and eavesdropper) are equipped with multiple antennas. In
addition, the eavesdropper is passive and its channel state and location are
unknown at the source and destination. In our proposed scheme, by optimized
allocation of power to the SAN, DAN and data signal, a minimum value for the
outage probability is guaranteed at the eavesdropper, and at the same time a
certain level of signal to noise ratio (SNR) at the destination is ensured. Our
simulation results show that using DAN along with SAN brings a significant
improvement in power consumption compared to methods that merely adopt SAN to
achieve the same outage probability at the eavesdropper.
Hojjat Mostafanasab, Esra Sengelen Sevim
Comments: 6 pages
Subjects: Information Theory (cs.IT)
Symbol-pair codes, introduced by Cassuto and Blaum [1], were proposed for
symbol-pair read channels. This new idea is motivated by the limitation of the
reading process in high-density data storage technologies. Yaakobi et al. [8]
introduced codes for \(b\)-symbol read channels, where the read operation is
performed as a consecutive sequence of \(b>2\) symbols. In this paper, we present
a method to compute the \(b\)-symbol-pair distance of two \(n\)-tuples, where
\(n\) is a positive integer. We also study the \(b\)-symbol-pair distances of
certain cyclic codes of length \(p^e\) over \(\mathbb{F}_{p^m}\).
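As an illustration of the quantity discussed in this abstract, the following is a minimal brute-force sketch, not the method of the paper. It assumes the usual convention for \(b\)-symbol read channels: each read returns a window of \(b\) cyclically consecutive symbols, and the \(b\)-symbol distance of two \(n\)-tuples is the Hamming distance between their sequences of windows. The function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def b_symbol_read(x: Sequence[Hashable], b: int) -> List[Tuple[Hashable, ...]]:
    """Return the n cyclic windows of length b read from the n-tuple x."""
    n = len(x)
    if not 1 <= b <= n:
        raise ValueError("b must satisfy 1 <= b <= len(x)")
    # Window i covers positions i, i+1, ..., i+b-1, taken modulo n.
    return [tuple(x[(i + j) % n] for j in range(b)) for i in range(n)]


def b_symbol_distance(x: Sequence[Hashable], y: Sequence[Hashable], b: int) -> int:
    """Count the window positions at which the b-symbol reads of x and y differ."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(u != v for u, v in zip(b_symbol_read(x, b), b_symbol_read(y, b)))
```

With b = 1 this reduces to the ordinary Hamming distance, and with b = 2 to the symbol-pair distance of Cassuto and Blaum. The cost is O(nb) per pair of tuples, which is adequate for checking small examples by hand.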
L. D. Nguyen, T. Q. Duong, H. Q. Ngo, K. Tourki
Comments: Accepted for publication in IEEE Communications Letters
Journal-ref: IEEE Communications Letters, 2017
Subjects: Information Theory (cs.IT)
We consider the downlink of a cell-free massive multiple-input
multiple-output (MIMO) network where numerous distributed access points (APs)
serve a smaller number of users under time division duplex operation. An
important issue in deploying cell-free networks is high power consumption,
which is proportional to the number of APs. This issue has raised the question
as to their suitability for green communications in terms of the total energy
efficiency (bits/Joule). To tackle this, we develop a novel low-complexity
power control technique with zero-forcing precoding design to maximize the
energy efficiency of cell-free massive MIMO taking into account the backhaul
power consumption and the imperfect channel state information.
Song-Nam Hong, Seonho Kim, Namyoon Lee
Comments: Submitted to IEEE TWC
Subjects: Information Theory (cs.IT)
This paper considers an uplink multiuser massive
multiple-input-multiple-output (MIMO) system with low-resolution
analog-to-digital converters (ADCs), in which K single-antenna users
communicate with one base station (BS) with Nr antennas. In this system, we
present a novel multiuser MIMO detection framework that is inspired by coding
theory. The key idea of the proposed framework is to create a code C of length
2Nr over a spatial domain. This code is constructed by a so-called
auto-encoding function that is not designable but is completely described by a
channel transformation followed by a quantization function of the ADCs. From
this point of view, we convert a multiuser MIMO detection problem into an
equivalent channel coding problem, in which a codeword of C corresponding to
users’ messages is sent over 2Nr parallel channels, each with different channel
reliability. To the resulting problem, we propose a novel weighted minimum
distance decoding (wMDD) that appropriately exploits the unequal channel
reliabilities. It is shown that the proposed wMDD yields a non-trivial gain
over the conventional minimum distance decoding (MDD). From a coding-theoretic
viewpoint, we show that the bit-error rate (BER) decreases exponentially with
the minimum distance of the code C, which plays a role similar to that of the
condition number in conventional MIMO systems. Furthermore, we develop a communication
method that uses the wMDD for practical scenarios where the BS has no knowledge
of channel state information. Finally, numerical results are provided to verify
the superiority of the proposed method.
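The difference between MDD and a weighted variant can be sketched as follows. This is only a generic weighted-Hamming decoder under stated assumptions: the codebook is small enough to enumerate, and the per-position weights are supplied by the caller. The abstract does not specify how the wMDD weights are derived from the channel reliabilities, so the weights and function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Codeword = Tuple[int, ...]


def weighted_distance(r: Sequence[int], c: Sequence[int], w: Sequence[float]) -> float:
    """Sum the weights of the positions where r and c disagree."""
    return sum(wi for ri, ci, wi in zip(r, c, w) if ri != ci)


def wmdd_decode(r: Sequence[int], codebook: List[Codeword], w: Sequence[float]) -> Codeword:
    """Return the codeword with the smallest weighted distance to r."""
    return min(codebook, key=lambda c: weighted_distance(r, c, w))


def mdd_decode(r: Sequence[int], codebook: List[Codeword]) -> Codeword:
    """Plain minimum distance decoding: every position has weight 1."""
    return wmdd_decode(r, codebook, [1.0] * len(r))
```

For example, with codebook [(0, 0, 0), (1, 1, 1)] and received word (1, 1, 0), plain MDD picks (1, 1, 1). If the third position is treated as far more reliable than the first two (hypothetical weights 0.1, 0.1, 5.0), the weighted decoder picks (0, 0, 0) instead.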
Ziyang Yuan, Qi Wang, Hongxia Wang
Subjects: Information Theory (cs.IT)
The phase retrieval (PR) problem is an ill-conditioned inverse problem that
arises in a variety of applications. Exploiting a sparsity prior, an
algorithm called SWF (Sparse Wirtinger Flow) is proposed in this paper to deal
with the sparse PR problem based on the Wirtinger flow method. SWF first recovers
the support of the signal and then updates the estimate by a hard thresholding
method with an elaborate initialization. Theoretical analyses show that SWF has
geometric convergence for any \(k\)-sparse signal of length \(n\) with sampling
complexity \(\mathcal{O}(k^2 \log n)\). To get \(\varepsilon\) accuracy, the
computational complexity of SWF is
Numerical tests also demonstrate that SWF performs better than
state-of-the-art methods, especially when we have no a priori knowledge about
the sparsity \(k\). Moreover, SWF is also robust to noise.
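The hard thresholding step mentioned in this abstract can be illustrated with the standard operator that keeps the k largest-magnitude entries of a vector and zeroes the rest. This is a generic sketch of that operator only, not the SWF algorithm itself; the function name is hypothetical and ties are broken in favour of the earlier index.

```python
from typing import List, Sequence, Union

Number = Union[float, complex]


def hard_threshold(x: Sequence[Number], k: int) -> List[Number]:
    """Keep the k entries of x with the largest magnitude and zero the others."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices sorted by decreasing magnitude; sorted() is stable, so ties keep index order.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]
```

Applied to [0.5, -3.0, 2.0, 0.1] with k = 2, the operator keeps -3.0 and 2.0 and zeroes the other two entries. Because `abs` is used for the magnitude, the same function also accepts complex-valued entries.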
Zohair Abu-Shaban, Xiangyun Zhou, Thushara Abhayapala, Gonzalo Seco-Granados, Henk Wymeersch
Comments: This work has been submitted to the IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
Location-aware communication systems are expected to play a pivotal part in
the next generation of mobile communication networks. Therefore, there is a
need to understand the localization limits in these networks, particularly,
using millimeter-wave technology (mmWave). Towards that, we address the uplink
and downlink localization limits in terms of 3D position and orientation error
bounds for mmWave multipath channels. We also carry out a detailed analysis of
the dependence of the bounds on different system parameters. Our key findings
indicate that the uplink and downlink behave differently in two distinct ways.
First of all, the error bounds have different scaling factors with respect to
the number of antennas in the uplink and downlink. Secondly, uplink
localization is sensitive to the orientation angle of the user equipment (UE),
whereas downlink is not. Moreover, in the considered outdoor scenarios, the
non-line-of-sight paths generally improve localization when a line-of-sight
path exists. Finally, our numerical results show that mmWave systems are
capable of localizing a UE with sub-meter position error, and sub-degree
orientation error.
Sudharsan Parthasarathy, Suman Kumar, Radha Krishna Ganti, Sheetal Kalyani, K. Giridhar
Subjects: Information Theory (cs.IT)
In this paper, we derive the data-aided Error Vector Magnitude (EVM) in an
interference-limited system when both the desired signal and interferers
experience independent and non-identically distributed \(\kappa\)-\(\mu\) shadowed
fading. Then it is analytically shown that the EVM is equal to the square root
of the number of interferers when the desired signal and interferers do not
experience fading. Further, EVM is derived in the presence of interference and
noise, when the desired signal experiences \(\kappa\)-\(\mu\) shadowed fading and
the interferers experience independent and identical Nakagami fading. Moreover,
using the properties of the special functions, the derived EVM expressions are
also simplified for various special cases.
Jinseok Choi, Brian L. Evans, Alan Gatherer
Comments: Submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
Hybrid analog-digital beamforming architectures with low-resolution
analog-to-digital converters (ADCs) reduce hardware cost and power consumption
in multiple-input multiple-output (MIMO) millimeter wave (mmWave) communication
systems. In this paper, we propose a hybrid architecture with
resolution-adaptive ADCs for mmWave receivers with large antenna arrays. We
adopt array response vectors for the analog combiners and derive ADC bit
allocation (BA) algorithms. The two proposed BA algorithms minimize the mean
square quantization error of received analog signals under a total ADC power
constraint. It is beneficial to assign more bits to the ADC with a larger
channel gain on the corresponding radio frequency (RF) chain, and the optimal
number of ADC bits is logarithmically proportional to the RF chain’s
signal-to-noise ratio raised to the 1/3 power. Contributions of this paper
include 1) an ADC bit allocation algorithm to improve communication performance
of a hybrid MIMO receiver, 2) a revised ADC bit allocation algorithm that is
robust to additive noise, and 3) a worst-case analysis of the ergodic rate of
the proposed MIMO receiver that quantifies system tradeoffs and serves as the
lower bound. Simulation results validate the ergodic rate formula and
demonstrate that the proposed BA algorithms outperform a fixed-ADC approach in
both spectral and energy efficiency. For a power constraint equivalent to that
of fixed 4-bit ADCs, the revised BA algorithm makes the quantization error
negligible while achieving 22% better energy efficiency. Having negligible
quantization error allows existing state-of-the-art digital beamforming
techniques to be readily applied to the proposed system.
Henrique F. de Arruda, Filipi N. Silva, Cesar H. Comin, Diego R. Amancio, Luciano da F. Costa
Subjects: Information Theory (cs.IT); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph)
A framework integrating information theory and network science is proposed,
giving rise to a potentially new area of network information science. By
incorporating and integrating concepts such as complexity, coding, topological
projections and network dynamics, the proposed network-based framework paves
the way not only to extending traditional information science, but also to
modeling, characterizing and analyzing a broad class of real-world problems,
from language communication to DNA coding. Basically, an original network is
supposed to be transmitted, with or without compaction, through a time-series
obtained by sampling its topology by some network dynamics, such as random
walks. We show that the degree of compression is ultimately related to the
ability to predict the frequency of symbols based on the topology of the
original network and the adopted dynamics. The potential of the proposed
approach is illustrated with respect to the efficiency of transmitting
topologies of several network models by using a variety of random walks.
Several interesting results are obtained, including the behavior of the BA
model oscillating between high and low performance depending on the considered
dynamics, and the distinct performances obtained for two analogous geographical
Amir Adler, Mati Wax
Subjects: Information Theory (cs.IT)
We present novel convex-optimization-based solutions to the problem of blind
beamforming of constant modulus signals, and to the related problem of linearly
constrained blind beamforming of constant modulus signals. These solutions
ensure global optimality and are parameter free; that is, they contain no
tunable parameters and require no a priori parameter settings. The
performance of these solutions, as demonstrated by simulated data, is superior
to existing methods.
Flavio P. Calmon, Dennis Wei, Karthikeyan Natesan Ramamurthy, Kush R. Varshney
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Information Theory (cs.IT)
Non-discrimination is a recognized objective in algorithmic decision making.
In this paper, we introduce a novel probabilistic formulation of data
pre-processing for reducing discrimination. We propose a convex optimization
for learning a data transformation with three goals: controlling
discrimination, limiting distortion in individual data samples, and preserving
utility. We characterize the impact of limited sample size in accomplishing
this objective, and apply two instances of the proposed optimization to
datasets, including one on real-world criminal recidivism. The results
demonstrate that all three criteria can be simultaneously achieved and also
reveal interesting patterns of bias in American society.
Jose Luis Rosales, Vicente Martin
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
The recently introduced equivalent formulation of the integer factorization
problem for \(N=xy\), obtaining a function of the primes \(E(x)\) within the
factorization ensemble, is reviewed. Here we demonstrate that this formulation
can be readily translated to the physics of a quantum device in which the
quantities \(E_k\) are the eigenvalues of a bounded Hamiltonian. The spectrum is
solved for \(x_k=o(\sqrt{N})\), leading to a highly efficient probabilistic quantum
factoring algorithm which only requires \(o((\log \sqrt{N})^3)\) input
calculations. The state of the quantum simulator can be identified as that of a
two-electron \(\mathbf{P}\) wave in a Penning trap. We consider the possibility
of building the simulator experimentally in order to obtain, from the measured
magnetron trap frequencies, the probabilistic quantum sieve for the possible
factors of \(N\). This approach is suited for large \(N\) and \(x=o(\sqrt{N})\).