Alexander Hagg
Subjects: Neural and Evolutionary Computing (cs.NE)
Evolutionary illumination is a recent technique that produces many
diverse, optimal solutions in a map of manually defined features. To cope with
the large number of objective function evaluations required, surrogate model
assistance was recently introduced. Illumination models need to represent many
more diverse optimal regions than classical surrogate models. In this PhD
thesis, we propose to decompose the sample set, decreasing model complexity, by
hierarchically segmenting the training set according to the samples'
coordinates in feature space. An ensemble of diverse models can then be trained
to serve as a surrogate to illumination.
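A minimal sketch of the proposed decomposition (the median splits of a 2D feature space and the per-region scikit-learn Gaussian processes are illustrative assumptions, not the thesis' implementation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def build_ensemble(X_feat, X_geno, y, depth=3):
    """Recursive median splits in feature space; one GP surrogate per leaf."""
    axis = depth % 2                                   # alternate split axis
    cut = np.median(X_feat[:, axis])
    left = X_feat[:, axis] <= cut
    if depth == 0 or left.sum() < 5 or (~left).sum() < 5:
        return ("leaf", GaussianProcessRegressor().fit(X_geno, y))
    return ("node", axis, cut,
            build_ensemble(X_feat[left], X_geno[left], y[left], depth - 1),
            build_ensemble(X_feat[~left], X_geno[~left], y[~left], depth - 1))

def predict(tree, feat, geno):
    # route the query to the surrogate of the feature-space region it lies in
    if tree[0] == "leaf":
        return tree[1].predict(geno.reshape(1, -1))[0]
    _, axis, cut, lo, hi = tree
    return predict(lo if feat[axis] <= cut else hi, feat, geno)

rng = np.random.default_rng(0)
X_geno = rng.uniform(-1, 1, (200, 4))                  # candidate solutions
X_feat = X_geno[:, :2]                                 # toy feature map
y = np.sin(3 * X_geno).sum(axis=1)                     # toy fitness
tree = build_ensemble(X_feat, X_geno, y)
print(predict(tree, X_feat[0], X_geno[0]), "vs true", y[0])
```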
Shengcai Liu, Ke Tang, Xin Yao
Subjects: Neural and Evolutionary Computing (cs.NE)
This paper studies improving solvers based on their past solving experiences,
and focuses on improving solvers by offline training. Specifically, the key
issues of offline training methods are discussed, and research belonging to
this category but from different areas is reviewed in a unified framework.
Existing training methods generally adopt a two-stage strategy in which
selecting the training instances and training the solver on them are treated as
two independent phases. This paper proposes a new training method, dubbed
LiangYi, which addresses these two issues simultaneously. LiangYi includes a
training module for a population-based solver and an instance sampling module
for updating the training instances. The idea behind LiangYi is to promote the
population-based solver by training it (with the training module) to improve
its performance on those instances (discovered by the sampling module) on which
it performs badly, while keeping the good performance obtained on previous
instances. An instantiation of LiangYi on the Travelling Salesman Problem is
also proposed. Empirical results on a huge testing set containing 10000
instances showed that LiangYi could train solvers that perform significantly
better than solvers trained by other state-of-the-art training methods.
Moreover, empirical investigation of the behaviours of LiangYi confirmed that
it was able to continuously improve the solver through training.
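A self-contained toy of LiangYi's alternation, not the paper's TSP instantiation: the "solver" is a point, an "instance" is a target point, performance is negative distance, and the two modules alternate as described above. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
perf = lambda solver, inst: -np.linalg.norm(solver - inst)   # higher is better

def train_step(solver, instances, lr=0.3):
    # training module: improve on the instance handled worst
    worst = min(instances, key=lambda i: perf(solver, i))
    return solver + lr * (worst - solver)

def sample_hard_instances(solver, k=3):
    # sampling module: propose new instances the solver handles badly
    cands = rng.normal(0.0, 2.0, (50, 2))
    return sorted(cands, key=lambda i: perf(solver, i))[:k]

solver = np.zeros(2)
instances = [rng.normal(0.0, 1.0, 2) for _ in range(5)]
for _ in range(20):
    solver = train_step(solver, instances)
    instances = sample_hard_instances(solver) + instances[:5]  # keep old ones
print("mean performance:", np.mean([perf(solver, i) for i in instances]))
```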
Yagmur G. Cinar, Hamid Mirisaee, Parantapa Goswami, Eric Gaussier, Ali Ait-Bachir, Vadim Strijov
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we study the use of recurrent neural networks (RNNs) for
modeling and forecasting time series. We first illustrate the fact that
standard sequence-to-sequence RNNs neither capture periods in time series well
nor handle missing values well, even though many real-life time series are
periodic and contain missing values. We then propose an extended attention
mechanism that can be deployed on top of any RNN and that is designed to
capture periods and make the RNN more robust to missing values. We show the
effectiveness of this novel model through extensive experiments with multiple
univariate and multivariate datasets.
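One plausible instantiation of such a mechanism (an assumption, not necessarily the paper's exact design): attend over the last K encoder states, so the model can reach back a full period, and mask attention at missing time steps.

```python
import torch
import torch.nn as nn

class PeriodAttention(nn.Module):
    def __init__(self, hidden, K):
        super().__init__()
        self.score = nn.Linear(2 * hidden, 1)
        self.K = K

    def forward(self, states, query, missing_mask):
        # states: (B, K, H) last K RNN states; query: (B, H); mask: (B, K) 1=missing
        q = query.unsqueeze(1).expand_as(states)
        e = self.score(torch.cat([states, q], dim=-1)).squeeze(-1)  # (B, K)
        e = e.masked_fill(missing_mask.bool(), float("-inf"))       # skip gaps
        a = torch.softmax(e, dim=1)
        return (a.unsqueeze(-1) * states).sum(dim=1)                # context (B, H)

attn = PeriodAttention(hidden=16, K=24)
ctx = attn(torch.randn(2, 24, 16), torch.randn(2, 16), torch.zeros(2, 24))
print(ctx.shape)  # torch.Size([2, 16])
```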
Sofia Ira Ktena, Salim Arslan, Sarah Parisot, Daniel Rueckert
Comments: accepted at ISBI 2017: International Symposium on Biomedical Imaging, Apr 2017, Melbourne, Australia
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
Data-driven brain parcellations aim to provide a more accurate representation
of an individual’s functional connectivity, since they are able to capture
individual variability that arises due to development or disease. This renders
comparisons between the emerging brain connectivity networks more challenging,
since correspondences between their elements are not preserved. Unveiling these
correspondences is of major importance to keep track of local functional
connectivity changes. We propose a novel method based on graph edit distance
for the comparison of brain graphs directly in their domain, which can
accurately reflect similarities between individual networks while providing the
network element correspondences. This method is validated on a dataset of 116
twin subjects provided by the Human Connectome Project.
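For illustration, a hedged sketch of comparing two toy connectivity graphs via graph edit distance with NetworkX; the paper's cost functions, graph sizes, and matching algorithm will differ.

```python
import networkx as nx

G1 = nx.Graph([(0, 1), (1, 2), (2, 0)])
G2 = nx.Graph([(0, 1), (1, 2), (2, 3)])
for G in (G1, G2):
    for n in G:
        G.nodes[n]["strength"] = float(n)   # toy per-parcel feature

def node_cost(a, b):
    # substitution cost between parcels, e.g. distance between their features
    return abs(a["strength"] - b["strength"])

print(nx.graph_edit_distance(G1, G2, node_subst_cost=node_cost))
```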
Albert Gatt, Emiel Krahmer
Comments: 111 pages, 8 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Tomaso Poggio, Qianli Liao
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Previous theoretical work on deep learning and neural network optimization
tends to focus on avoiding saddle points and local minima. However, the
practical observation is that, at least for the most successful Deep
Convolutional Neural Networks (DCNNs) for visual processing, practitioners can
always increase the network size to fit the training data (an extreme example
would be [1]). The most successful DCNNs such as VGG and ResNets are best used
with a small degree of “overparametrization”. In this work, we characterize,
with a mix of theory and experiments, the landscape of the empirical risk of
overparametrized DCNNs. We first prove the existence of a large number of
degenerate global minimizers with zero empirical error (modulo inconsistent
equations). The zero-minimizers (in the case of classification) have a
non-zero margin. The same minimizers are degenerate and thus very likely to be
found by SGD, which will furthermore select with higher probability the
zero-minimizer with larger margin, as discussed in Theory III (to be released).
We further experimentally explore and visualize the landscape of the empirical
risk of a DCNN on CIFAR-10 during the entire training process, and especially
the global minima. Finally, based on our theoretical and experimental results,
we propose an intuitive model of the landscape of the DCNN’s empirical loss
surface, which might not be as complicated as people commonly believe.
Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present variational generative adversarial networks, a general learning
framework that combines a variational auto-encoder with a generative
adversarial network, for synthesizing images of fine-grained categories, such
as faces of a specific person or objects in a category. Our approach models an
image as a composition of label and latent attributes in a probabilistic model.
By varying the fine-grained category label fed to the resulting generative
model, we can generate images in a specific category from randomly drawn values
of a latent attribute vector. The novelty of our approach comes from two
aspects. Firstly, we propose to adopt a cross-entropy loss for the
discriminator and classifier networks, but a mean discrepancy objective for the
generative network. This kind of asymmetric loss function makes the training of
the GAN more stable. Secondly, we adopt an encoder network to learn the
relationship between the latent space and the real image space, and use
pairwise feature matching to keep the structure of generated images. We
experiment with natural images of faces, flowers, and birds, and demonstrate
that the proposed models are capable of generating realistic and diverse
samples with fine-grained category labels. We further show that our models can
be applied to other tasks, such as image inpainting, super-resolution, and data
augmentation for training better face recognition models.
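A sketch of the asymmetric losses described above (shapes and the choice of discriminator features are assumptions): cross entropy trains the discriminator/classifier, while the generator matches mean discriminator features of real versus generated samples.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits_real, logits_fake):
    # standard cross-entropy objective for the critic
    ones = torch.ones_like(logits_real)
    zeros = torch.zeros_like(logits_fake)
    return (F.binary_cross_entropy_with_logits(logits_real, ones) +
            F.binary_cross_entropy_with_logits(logits_fake, zeros))

def generator_loss(feat_real, feat_fake):
    # mean feature discrepancy instead of fooling the cross-entropy critic
    return (feat_real.mean(dim=0) - feat_fake.mean(dim=0)).pow(2).sum()

d_loss = discriminator_loss(torch.randn(8, 1), torch.randn(8, 1))
g_loss = generator_loss(torch.randn(8, 64), torch.randn(8, 64))
print(d_loss.item(), g_loss.item())
```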
Matan Sela, Elad Richardson, Ron Kimmel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
It has been recently shown that neural networks can recover the geometric
structure of a face from a single given image. A common denominator of most
existing face geometry reconstruction methods is the restriction of the
solution space to some low-dimensional subspace. While such a model
significantly simplifies the reconstruction problem, it is inherently limited
in its expressiveness. As an alternative, we propose an Image-to-Image
translation network that maps the input image to a depth image and a facial
correspondence map. This explicit pixel-based mapping can then be utilized to
provide high quality reconstructions of diverse faces under extreme
expressions. In the spirit of recent approaches, the network is trained only
with synthetic data, and is then evaluated on “in-the-wild” facial images. Both
qualitative and quantitative analyses demonstrate the accuracy and the
robustness of our approach. As an additional analysis of the proposed network,
we show that it can be used as a geometric constraint for facial image
translation tasks.
Mo Shan, Fei Wang, Feng Lin, Zhi Gao, Ya Z. Tang, Ben M. Chen
Comments: Published in ROBIO 2015, Zhuhai, China. Fixed a typo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a framework for Google Map aided UAV navigation in GPS-denied
environments. Geo-referenced navigation provides drift-free localization and
does not require loop closures. The UAV position is initialized via
correlation, which is simple and efficient. We then use optical flow to predict
its position in subsequent frames. During pose tracking, we obtain inter-frame
translation either by motion field or homography decomposition, and we use HOG
features for registration on Google Map. We employ a particle filter to conduct
a coarse-to-fine search to localize the UAV. Offline tests using aerial images
collected by our quadrotor platform show promising results: our approach
eliminates the drift in dead-reckoning, and the small localization error
indicates that our approach is well suited as a supplement to GPS.
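A condensed sketch of the registration step (window sizes and HOG parameters are assumptions): each particle is weighted by the similarity between the HOG descriptor of the current aerial view and that of the map patch at the particle's position.

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)
geo_map = rng.random((512, 512))
uav_img = geo_map[200:264, 300:364]              # toy "aerial view"

particles = rng.integers(0, 448, size=(100, 2))  # candidate top-left corners
h_uav = hog(uav_img, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

weights = np.empty(len(particles))
for i, (r, c) in enumerate(particles):
    patch = geo_map[r:r + 64, c:c + 64]
    h_map = hog(patch, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    weights[i] = -np.linalg.norm(h_uav - h_map)  # higher means more similar
weights = np.exp(weights - weights.max())
weights /= weights.sum()
print("best particle:", particles[weights.argmax()])
```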
Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a method for lossy image compression based on recurrent,
convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000,
and JPEG as measured by MS-SSIM. We introduce three improvements over previous
research that lead to this state-of-the-art result. First, we show that
training with a pixel-wise loss weighted by SSIM increases reconstruction
quality according to several metrics. Second, we modify the recurrent
architecture to improve spatial diffusion, which allows the network to more
effectively capture and propagate image information through the network’s
hidden state. Finally, in addition to lossless entropy coding, we use a
spatially adaptive bit allocation algorithm to more efficiently use the limited
number of bits to encode visually complex image regions. We evaluate our method
on the Kodak and Tecnick image sets and compare against standard codecs as well
as recently published methods based on deep neural networks.
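A simplified sketch of an SSIM-weighted pixel loss (window size and constants are assumptions, not the paper's settings): pixels that are already structurally well reconstructed are down-weighted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(x, y, C1=0.01 ** 2, C2=0.03 ** 2, w=7):
    # local SSIM computed with uniform box filters
    mx, my = uniform_filter(x, w), uniform_filter(y, w)
    vx = uniform_filter(x * x, w) - mx * mx
    vy = uniform_filter(y * y, w) - my * my
    cxy = uniform_filter(x * y, w) - mx * my
    return ((2 * mx * my + C1) * (2 * cxy + C2) /
            ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))

def weighted_l1(pred, target):
    w = 1.0 - ssim_map(pred, target)     # large where structure is wrong
    return np.mean(w * np.abs(pred - target))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(weighted_l1(img + 0.05 * rng.standard_normal(img.shape), img))
```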
Fabien Baradel, Christian Wolf, Julien Mille
Comments: 10 pages, project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We address human action recognition from multi-modal video data involving
articulated pose and RGB frames and propose a two-stream approach. The pose
stream is processed with a convolutional model taking as input a 3D tensor
holding data from a sub-sequence. A specific joint ordering, which respects the
topology of the human body, ensures that different convolutional layers
correspond to meaningful levels of abstraction. The raw RGB stream is handled
by a spatio-temporal soft-attention mechanism conditioned on features from the
pose network. An LSTM network receives input from a set of image locations at
each instant. A trainable glimpse sensor extracts features on a set of
predefined locations specified by the pose stream, namely the 4 hands of the
two people involved in the activity. Appearance features give important cues on
hand motion and on objects held in each hand. We show that it is of high
interest to shift the attention to different hands at different time steps
depending on the activity itself. Finally, a temporal attention mechanism
learns how to fuse LSTM features over time. We evaluate the method on 3
datasets.
State-of-the-art results are achieved on the largest dataset for human activity
recognition, namely NTU-RGB+D, as well as on the SBU Kinect Interaction
dataset. Performance close to state-of-the-art is achieved on the smaller MSR
Daily Activity 3D dataset.
Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Extending state-of-the-art object detectors from image to video is
challenging. Detection accuracy suffers from deteriorated object appearances
in videos, e.g., motion blur, video defocus, and rare poses. Existing work
attempts to exploit temporal information at the box level, but such methods
are not trained end-to-end. We present flow-guided feature aggregation, an
accurate and end-to-end learning framework for video object detection. It
instead leverages temporal coherence at the feature level: it improves the
per-frame features by aggregating nearby features along the motion paths, and
thus improves video recognition accuracy. Our method significantly improves
upon strong single-frame baselines on ImageNet VID, especially for the more
challenging fast-moving objects. Our framework is principled, and on par with
the best engineered systems that won the ImageNet VID challenge 2016, without
additional bells and whistles. The code will be released.
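A sketch of feature-level aggregation along motion paths (the weighting scheme and shapes are assumptions): warp the neighboring frames' feature maps to the reference frame with optical flow, then average them with per-pixel similarity weights.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    # feat: (B, C, H, W); flow: (B, 2, H, W) in pixels, (dx, dy)
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()           # (H, W, 2)
    grid = grid + flow.permute(0, 2, 3, 1)                 # follow the motion
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1          # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(feat, grid, align_corners=True)

def aggregate(ref_feat, neigh_feats, flows):
    warped = [warp(f, fl) for f, fl in zip(neigh_feats, flows)]
    sims = [F.cosine_similarity(ref_feat, w, dim=1).unsqueeze(1)
            for w in warped]
    w = torch.softmax(torch.stack(sims), dim=0)            # per-pixel weights
    return (w * torch.stack(warped)).sum(dim=0)

B, C, H, W = 1, 8, 16, 16
feats = [torch.randn(B, C, H, W) for _ in range(3)]
flows = [torch.zeros(B, 2, H, W) for _ in range(2)]
print(aggregate(feats[0], feats[1:], flows).shape)         # (1, 8, 16, 16)
```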
Zhiqiang Shen, Yu-Gang Jiang, Dequan Wang, Xiangyang Xue
Comments: To appear in ICME 2017 as an oral paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The aim of fine-grained recognition is to identify sub-ordinate categories in
images like different species of birds. Existing works have confirmed that, in
order to capture the subtle differences across the categories, automatic
localization of objects and parts is critical. Most approaches to object and
part localization rely on a bottom-up pipeline, where thousands of region
proposals are generated and then filtered by pre-trained object/part models.
This is computationally expensive and does not scale once the number of
objects/parts becomes large. In this paper, we propose a nonparametric
data-driven method for object and part localization. Given an unlabeled test
image, our approach transfers annotations from a few similar images retrieved
from the training set. In particular, we propose an iterative transfer strategy
that gradually refines the predicted bounding boxes. Based on the located
objects and parts, deep convolutional features are extracted for recognition.
We evaluate our approach on the widely-used CUB200-2011 dataset and a new and
large dataset called Birdsnap. On both datasets, we achieve better results than
many state-of-the-art approaches, including a few using oracle (manually
annotated) bounding boxes in the test images.
Alexis Arnaudon, Darryl D. Holm, Stefan Sommer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
We introduce a stochastic model of diffeomorphisms, whose action on a variety
of data types descends to stochastic models of shapes, images and landmarks.
The stochasticity is introduced in the vector field which transports the data
in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework for
shape analysis and image registration. The stochasticity thereby models errors
or uncertainties of the flow in following the prescribed deformation velocity.
The approach is illustrated in the example of finite dimensional landmark
manifolds, whose stochastic evolution is studied both via the Fokker-Planck
equation and by numerical simulations. We derive two approaches for inferring
parameters of the stochastic model from landmark configurations observed at
discrete time points. The first of the two approaches matches moments of the
Fokker-Planck equation to sample moments of the data, while the second approach
employs an Expectation-Maximisation based algorithm using a Monte Carlo bridge
sampling scheme to optimise the data likelihood. We derive and numerically test
the ability of the two approaches to infer the spatial correlation length of
the underlying noise.
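One standard way to write such stochastically perturbed Hamiltonian landmark dynamics (a sketch consistent with the description above; the kernel K and the noise fields sigma_l are model assumptions):

```latex
\mathrm{d}q_t = \frac{\partial H}{\partial p}\,\mathrm{d}t
  + \sum_{l} \sigma_l(q_t) \circ \mathrm{d}W_t^{l}, \qquad
\mathrm{d}p_t = -\frac{\partial H}{\partial q}\,\mathrm{d}t
  - \sum_{l} \nabla_q \bigl(p_t \cdot \sigma_l(q_t)\bigr) \circ \mathrm{d}W_t^{l},
\qquad
H(q,p) = \tfrac{1}{2}\sum_{i,j} p_i^{\top} K(q_i, q_j)\, p_j .
```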
Siavash Arjomand Bigdeli, Matthias Zwicker
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
We propose to leverage denoising autoencoder networks as priors to address
image restoration problems. We build on the key observation that the output of
an optimal denoising autoencoder is a local mean of the true data density, and
the autoencoder error (the difference between the output and input of the
trained autoencoder) is a mean shift vector. We use the magnitude of this mean
shift vector, that is, the distance to the local mean, as the negative log
likelihood of our natural image prior. For image restoration, we maximize the
likelihood using gradient descent by backpropagating the autoencoder error. A
key advantage of our approach is that we do not need to train separate networks
for different image restoration tasks, such as non-blind deconvolution with
different kernels, or super-resolution at different magnification factors. We
demonstrate state-of-the-art results for non-blind deconvolution and
super-resolution using the same autoencoding prior.
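A sketch of restoration with an autoencoding prior (the DAE below is an untrained stand-in; in the paper it is a trained denoising autoencoder): descend on a data term plus the squared autoencoder error, backpropagating through the autoencoder.

```python
import torch
import torch.nn as nn

dae = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 1, 3, padding=1))      # placeholder DAE

def restore(y, A, steps=100, lr=0.1, lam=0.5):
    x = y.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        data = ((A(x) - y) ** 2).sum()                  # degradation model A
        prior = ((x - dae(x)) ** 2).sum()               # mean-shift magnitude
        (data + lam * prior).backward()
        opt.step()
    return x.detach()

y = torch.rand(1, 1, 32, 32)
x_hat = restore(y, A=lambda x: x)                       # toy: identity degradation
print(x_hat.shape)
```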
Estefania Talavera, Nicola Strisciuglio, Nicolai Petkov, Petia Radeva
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Lifelogging is a process of collecting a rich source of information about the
daily life of people. In this paper, we introduce the problem of sentiment
analysis in egocentric events, focusing on whether the moments captured in the
images recall positive, neutral or negative feelings to the observer. We
propose a method for the classification of the sentiments in egocentric
pictures based on global and semantic image features extracted by
Convolutional Neural Networks. We carried out experiments on an egocentric
dataset, which we organized into 3 classes on the basis of the sentiment
recalled to the user (positive, negative or neutral).
Qiong Zeng, Wenzheng Chen, Zhuo Han, Mingyi Shi, Yanir Kleiman, Daniel Cohen-Or, Baoquan Chen, Yangyan Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Understanding semantic similarity among images is the core of a wide range of
computer vision applications. An important step towards this goal is to collect
and learn human perceptions. Interestingly, the semantic context of images is
often ambiguous, as images can be perceived with emphasis on different aspects,
which may contradict each other.
In this paper, we present a method for learning the semantic similarity among
images, inferring their latent aspects and embedding them into multi-spaces
corresponding to their semantic aspects.
We consider the multi-embedding problem as an optimization function that
evaluates the embedded distances with respect to the qualitative clustering
queries. The key idea of our approach is to collect and embed qualitative
measures that share the same aspects in bundles. To ensure similarity aspect
sharing among multiple measures, image classification queries are presented to,
and solved by users. The collected image clusters are then converted into
bundles of tuples, which are fed into our bundle optimization algorithm that
jointly infers the aspect similarity and multi-aspect embedding. Extensive
experimental results show that our approach significantly outperforms
state-of-the-art multi-embedding approaches on various datasets, and scales
well for large multi-aspect similarity measures.
Zhengtao Wang, Ce Zhu, Zhiqiang Xia, Qi Guo, Yipeng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep network pruning is an effective method to reduce the storage and
computation cost of deep neural networks when applying them to resource-limited
devices. Among the many pruning granularities, neuron-level pruning removes
redundant neurons and filters from the model, resulting in thinner networks. In
this paper, we propose a gradually global pruning scheme for neuron-level
pruning. In each pruning step, a small percentage of neurons is selected and
dropped across all layers of the model. We also propose a simple method to
eliminate the biases in evaluating the importance of neurons, making the scheme
feasible. Compared with layer-wise pruning schemes, our scheme avoids the
difficulty of determining the redundancy in each layer and is more effective
for deep networks. Our scheme automatically finds a thinner sub-network within
the original network that meets a given performance target.
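A sketch of one gradually global pruning step (the importance measure and its normalization are assumptions): score every neuron across all layers on a common scale, then drop the lowest-scoring few percent wherever they sit.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)),      # input -> 64 neurons
          rng.standard_normal((128, 64)),     # 64 -> 128 neurons
          rng.standard_normal((10, 128))]     # 128 -> 10 outputs

def prune_step(layers, drop_frac=0.05):
    # score hidden neuron j of layer l by mean |outgoing weight|, normalized
    # per layer to reduce the cross-layer scale bias mentioned above
    scores = []
    for l in range(len(layers) - 1):
        s = np.abs(layers[l + 1]).mean(axis=0)
        scores += [(v / s.mean(), l, j) for j, v in enumerate(s)]
    scores.sort(key=lambda t: t[0])
    drop = scores[:max(1, int(drop_frac * len(scores)))]
    keep = [np.ones(W.shape[0], bool) for W in layers[:-1]]
    for _, l, j in drop:
        keep[l][j] = False
    pruned = []
    for l, W in enumerate(layers):
        if l < len(keep):
            W = W[keep[l]]            # drop the neuron's row ...
        if l > 0:
            W = W[:, keep[l - 1]]     # ... and its inputs in the next layer
        pruned.append(W)
    return pruned

print([W.shape for W in prune_step(layers)])
```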
Hazel Doughty, Dima Damen, Walterio Mayol-Cuevas
Comments: 10 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a method for assessing skill of performance from video,
for a variety of tasks, ranging from drawing to surgery and rolling dough. We
formulate the problem as pairwise and overall ranking of video collections, and
propose a supervised deep ranking model to learn discriminative features
between pairs of videos exhibiting different amounts of skill. We utilise a
two-stream Temporal Segment Network to capture both the type and quality of
motions and the evolving task state. Results demonstrate our method is
applicable to a variety of tasks, with the percentage of correctly ordered
pairs of videos ranging from 70% to 82% for four datasets. We demonstrate the
robustness of our approach via sensitivity analysis of its parameters.
We see this work as an effort toward the automated and objective organisation
of how-to videos and, more generally, toward generic skill determination in
video.
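A sketch of the pairwise ranking objective (a standard margin ranking loss, assumed here rather than taken from the paper): features of the more-skilled video should score higher than those of the less-skilled one.

```python
import torch
import torch.nn as nn

scorer = nn.Linear(256, 1)                    # maps video features to a skill score
loss_fn = nn.MarginRankingLoss(margin=1.0)

feat_better = torch.randn(8, 256)             # higher-skill videos
feat_worse = torch.randn(8, 256)
s1 = scorer(feat_better).squeeze(1)
s2 = scorer(feat_worse).squeeze(1)
loss = loss_fn(s1, s2, torch.ones(8))         # target=1: s1 should exceed s2
loss.backward()
print(loss.item())
```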
J. H. Rick Chang, Chun-Liang Li, Barnabas Poczos, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While deep learning methods have achieved state-of-the-art performance in
many challenging inverse problems like image inpainting and super-resolution,
they invariably involve problem-specific training of the networks. Under this
approach, different problems require different networks. In scenarios where we
need to solve a wide variety of problems, e.g., on a mobile camera, it is
inefficient and costly to use these specially-trained networks. On the other
hand, traditional methods using signal priors can be used in all linear inverse
problems but often have worse performance on challenging tasks. In this work,
we provide a middle ground between the two kinds of methods: we propose a
general framework to train a single deep neural network that solves arbitrary
linear inverse problems. The proposed network acts as a proximal operator for
an optimization algorithm and projects non-image signals onto the set of
natural images defined by the decision boundary of a classifier. In our
experiments, the proposed framework demonstrates superior performance over
traditional methods using a wavelet sparsity prior and achieves comparable
performance of specially-trained networks on tasks including compressive
sensing and pixel-wise inpainting.
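A sketch of the resulting inference loop (step size and operators are assumptions): alternate a gradient step on the data term with a projection by the trained network onto the set of natural images.

```python
import torch
import torch.nn as nn

proj = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 1, 3, padding=1))   # stand-in projector

def solve(y, A, At, steps=50, eta=0.5):
    x = At(y)
    for _ in range(steps):
        x = x - eta * At(A(x) - y)     # data-consistency gradient step
        x = proj(x)                    # proximal step: project toward images
    return x

mask = (torch.rand(1, 1, 32, 32) > 0.5).float()       # pixel-wise inpainting
A = lambda x: mask * x
y = A(torch.rand(1, 1, 32, 32))
print(solve(y, A, At=A).shape)         # masking is its own adjoint
```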
Shiyu Chen, Shangfei Wang, Tanfang Chen, Xiaoxiao Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel approach for learning multi-label
classifiers with the help of privileged information. Specifically, we use
similarity constraints to capture the relationship between available
information and privileged information, and use ranking constraints to capture
the dependencies among multiple labels. By integrating similarity constraints
and ranking constraints into the learning process of classifiers, the
privileged information and the dependencies among multiple labels are exploited
to construct better classifiers during training. A maximum margin classifier is
adopted, and an efficient learning algorithm for the proposed method is also
developed. We evaluate the proposed method on two applications: multiple object
recognition from images with the help of implicit information about object
importance conveyed by the list of manually annotated image tags; and multiple
facial action unit detection from low-resolution images augmented by
high-resolution images. Experimental results demonstrate that the proposed
method can effectively take full advantage of privileged information and
dependencies among multiple labels for better object recognition and better
facial action unit detection.
Hexiang Hu, Zhiwei Deng, Guang-Tong Zhou, Fei Sha, Greg Mori
Comments: Pre-prints
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Semantic segmentation requires a detailed labeling of image pixels by object
category. Information derived from local image patches is necessary to describe
the detailed shape of individual objects. However, this information is
ambiguous and can result in noisy labels. Global inference of image content can
instead capture the general semantic concepts present. We advocate that
holistic inference of image concepts provides valuable information for detailed
pixel labeling. We propose a generic framework to leverage holistic information
in the form of a LabelBank for pixel-level segmentation.
We show the ability of our framework to improve semantic segmentation
performance in a variety of settings. We learn models for extracting a holistic
LabelBank from visual cues, attributes, and/or textual descriptions. We
demonstrate improvements in semantic segmentation accuracy on standard datasets
across a range of state-of-the-art segmentation architectures and holistic
inference approaches.
Arvind Balachandrasekaran, Mathews Jacob
Comments: 4 pages, 3 figures, accepted at ISBI 2017, Melbourne, Australia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a structured low-rank matrix completion algorithm to recover a
time series of images, consisting of a linear combination of exponentials at
every pixel, from under-sampled Fourier measurements. The spatial smoothness
of the exponential parameters is exploited, along with the exponential
structure of the time series at every pixel, to derive an annihilation
relation in the (k-t) domain. This annihilation relation translates into a
structured low-rank matrix formed from the (k-t) samples. We demonstrate the
algorithm in the parameter mapping setting and show significant improvement
over state-of-the-art methods.
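To make the annihilation idea concrete (a hedged sketch with assumed notation): if the time series at pixel r is a sum of R exponentials, a finite filter h annihilates it,

```latex
\rho(\mathbf{r}, t) \;=\; \sum_{k=1}^{R} c_k(\mathbf{r})\, e^{-\gamma_k(\mathbf{r})\, t}
\quad\Longrightarrow\quad
\sum_{n=0}^{R} h[n]\,\rho(\mathbf{r}, t - n) \;=\; 0 \quad \text{for all } t,
```

and, with spatially smooth parameters, the Fourier-domain counterpart of this relation is what yields the structured low-rank matrix of (k-t) samples.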
Ryan Szeto, Jason J. Corso
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We motivate and address a human-in-the-loop variant of the monocular
viewpoint estimation task in which the location and class of one semantic
object keypoint is available at test time. In order to leverage the keypoint
information, we devise a Convolutional Neural Network called Click-Here CNN
(CH-CNN) that integrates the keypoint information with activations from the
layers that process the image. It transforms the keypoint information into a 2D
map that can be used to weight features from certain parts of the image more
heavily. The weighted sum of these spatial features is combined with global
image features to provide relevant information to the prediction layers. To
train our network, we collect a novel dataset of 3D keypoint annotations on
thousands of CAD models, and synthetically render millions of images with 2D
keypoint information. On test instances from PASCAL 3D+, our model achieves a
mean class accuracy of 90.7%, whereas the state-of-the-art baseline only
obtains 85.7% accuracy, justifying our argument for human-in-the-loop
inference.
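A sketch of turning the clicked keypoint into a 2D weighting map (the Gaussian form and the weighted pooling are assumptions):

```python
import torch

def keypoint_map(h, w, kp_y, kp_x, sigma=3.0):
    # 2D Gaussian bump centered on the clicked keypoint
    ys = torch.arange(h).float().unsqueeze(1)
    xs = torch.arange(w).float().unsqueeze(0)
    return torch.exp(-((ys - kp_y) ** 2 + (xs - kp_x) ** 2) / (2 * sigma ** 2))

feat = torch.randn(1, 64, 14, 14)        # conv activations
m = keypoint_map(14, 14, kp_y=5, kp_x=9)
m = m / m.sum()
pooled = (feat * m).sum(dim=(2, 3))      # keypoint-weighted feature vector
print(pooled.shape)                      # (1, 64)
```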
Joseph Antony, Kevin McGuinness, Kieran Moran, Noel E O'Connor
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a new approach to automatically quantify the severity
of knee osteoarthritis (OA) from X-ray images. Automatically quantifying knee
OA severity involves two steps: first, automatically localizing the knee
joints; next, classifying the localized knee joint images. We introduce a new
approach to automatically detect the knee joints using a fully convolutional
neural network (FCN). We then train convolutional neural networks (CNNs) from
scratch to automatically quantify knee OA severity, optimizing a weighted
ratio of two loss functions: categorical cross-entropy and mean-squared loss.
This joint training further improves the overall quantification of knee OA
severity, with the added benefit of naturally producing simultaneous
multi-class classification and regression outputs. Two public datasets are
used to evaluate our approach, the Osteoarthritis Initiative (OAI) and the
Multicenter Osteoarthritis Study (MOST), with extremely promising results that
outperform existing approaches.
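A sketch of the weighted joint objective (the weight value is an assumption): the severity grade is treated both as a class and as a continuous score.

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, reg_out, grade, w=0.5):
    ce = F.cross_entropy(logits, grade)                   # classification head
    mse = F.mse_loss(reg_out.squeeze(1), grade.float())   # regression head
    return w * ce + (1 - w) * mse

logits, reg_out = torch.randn(8, 5), torch.randn(8, 1)
grade = torch.randint(0, 5, (8,))
print(joint_loss(logits, reg_out, grade).item())
```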
Hossein Hosseini, Baicen Xiao, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Despite the rapid progress of techniques for image classification, video
annotation has remained a challenging task. Automated video annotation would be
a breakthrough technology, enabling users to search within videos. Recently,
Google introduced the Cloud Video Intelligence API for video analysis. As per
the website, the system “separates signal from noise, by retrieving relevant
information at the video, shot or per frame.” A demonstration website has also
been launched, which allows anyone to select a video for annotation. The API
then detects the video labels (objects within the video) as well as shot labels
(descriptions of the video events over time). In this paper, we examine the
usability of Google’s Cloud Video Intelligence API in adversarial environments.
In particular, we investigate whether an adversary can manipulate a video in
such a way that the API returns only the adversary-desired labels. For this, we
select an image that is different from the content of the video and insert it,
periodically and at a very low rate, into the video. We found that if we insert
one image every two seconds, the API is deceived into annotating the entire
video as if it only contained the inserted image. Note that the modification to
the video is hardly noticeable: for instance, at a typical frame rate of 25
fps, we insert only one image per 50 video frames. We also found that, by
inserting one image per second, all the shot labels returned by the API are
related to the inserted image. We perform the experiments on the sample videos
provided by the API demonstration website and show that our attack succeeds
with different videos and images.
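A sketch of the insertion attack described above: overwrite one frame every `period` frames (50 frames at 25 fps is one image every two seconds) with the adversary's image.

```python
import numpy as np

def insert_image(frames, adv_img, period=50):
    out = frames.copy()
    out[::period] = adv_img             # replace every period-th frame
    return out

rng = np.random.default_rng(0)
video = rng.integers(0, 256, (250, 32, 32, 3), dtype=np.uint8)
adv = np.zeros((32, 32, 3), dtype=np.uint8)
tampered = insert_image(video, adv)
print((tampered[::50] == adv).all())    # True: 5 of 250 frames replaced
```

At the array level this is the whole manipulation; in practice the tampered frames would simply be re-encoded into a video file before submission.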
Luowei Zhou, Chenliang Xu, Jason J. Corso
Comments: 15 pages including Appendix
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a temporal segmentation and procedure learning model for long
untrimmed and unconstrained videos, e.g., videos from YouTube. The proposed
model partitions a video into segments that constitute a procedure and learns the
underlying temporal dependency among the procedure segments. The output
procedure segments can be applied for other tasks, such as video description
generation or activity recognition. Two aspects distinguish our work from the
existing literature. First, we introduce the problem of learning long-range
temporal structure for procedure segments within a video, in contrast to the
majority of efforts that focus on understanding short-range temporal structure.
Second, the proposed model segments an unseen video with only visual evidence
and can automatically determine the number of segments to predict. For
evaluation, there is no large-scale dataset with annotated procedure steps
available. Hence, we collect a new cooking video dataset, named YouCookII, with
the procedure steps localized and described. Our ProcNets model achieves
state-of-the-art performance in procedure segmentation.
Yanhai Gan, Huifang Chi, Ying Gao, Jun Liu, Guoqiang Zhong, Junyu Dong
Comments: 7 pages, 4 figures, icme2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper investigates a novel task of generating texture images from
perceptual descriptions. Previous work on texture generation focused on either
synthesis from examples or generation from procedural models; generating
textures from perceptual attributes has not been well studied yet. Meanwhile,
perceptual attributes, such as directionality, regularity and roughness, are
important factors for human observers to describe a texture. In this paper, we
propose a joint deep network model that combines adversarial training and
perceptual feature regression for texture generation, requiring only random
noise and user-defined perceptual attributes as input. In this model, a
pre-trained convolutional neural network is integrated with the adversarial
framework, which can drive the generated textures to possess given perceptual
attributes. An important aspect of the proposed model is that, if we change one
of the input perceptual features, the corresponding appearance of the generated
textures changes as well. We design several experiments to validate the
effectiveness of the proposed method. The results show that the proposed method
can produce high-quality texture images with the desired perceptual properties.
Rui Zhao, Haider Ali, Patrick van der Smagt
Comments: 8 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The recognition of actions from video sequences has many applications in
health monitoring, assisted living, surveillance, and smart homes. Despite
advances in sensing, in particular related to 3D video, the methodologies to
process the data are still subject to research. We demonstrate superior results
by a system which combines recurrent neural networks with convolutional neural
networks in a voting approach. The gated-recurrent-unit-based neural networks
are particularly well-suited to distinguish actions based on long-term
information from optical tracking data; the 3D-CNNs focus more on detailed,
recent information from video data. The resulting features are merged in an SVM
which then classifies the movement. With this architecture, our method improves
on the recognition rates of state-of-the-art methods by 14% on standard data
sets.
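A sketch of the fusion step (classifier settings are assumptions): concatenate GRU features from optical tracking data with 3D-CNN features from video and let an SVM classify the action.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
gru_feats = rng.standard_normal((100, 128))   # long-term motion features
cnn_feats = rng.standard_normal((100, 256))   # recent appearance features
labels = rng.integers(0, 5, 100)

X = np.hstack([gru_feats, cnn_feats])         # merged representation
clf = SVC(kernel="linear").fit(X[:80], labels[:80])
print("accuracy:", clf.score(X[80:], labels[80:]))
```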
Kamel Abdelouahab, Cedric Bourrasset, Maxime Pelcat, François Berry, Jean-Charles Quinton, Jocelyn Serot
Comments: 8 pages, 6 figures
Journal-ref: Proceedings of the 10th International Conference on Distributed
Smart Camera (ICDSC) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep Neural Networks are becoming the de-facto standard models for image
understanding, and more generally for computer vision tasks. As they involve
highly parallelizable computations, CNNs are well suited to current fine-grain
programmable logic devices. Thus, multiple CNN accelerators have been
successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic
elements or DSP units remain limited. This work presents a holistic method
relying on approximate computing and design space exploration to optimize the
DSP block utilization of a CNN implementation on an FPGA. This method was
tested by implementing a reconfigurable OCR convolutional neural network on
an Altera Stratix V device and varying both data representation and CNN
topology in order to find the best combination in terms of DSP block
utilization and classification accuracy. This exploration generated dataflow
architectures for 76 CNN topologies with 5 different fixed-point
representations. The most efficient implementation performs 883
classifications/sec at 256 x 256 resolution using 8% of the available DSP
blocks.
Caglar Aytekin, Jarno Nikkanen, Moncef Gabbouj
Comments: Download Link for the Dataset: this https URL Submission Info: Submitted to IEEE TIP
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we provide a novel dataset designed for camera invariant color
constancy research. Camera invariance corresponds to the robustness of an
algorithm’s performance when run on images of the same scene taken by different
cameras. Accordingly, images in the database correspond to several lab and
field scenes, each of which is captured by three different cameras with minimal
registration errors. The lab scenes are also captured under five different
illuminations. The spectral responses of cameras and the spectral power
distributions of the lab light sources are also provided, as they may prove
beneficial for training future algorithms to achieve color constancy. For a
fair evaluation of future methods, we provide guidelines for supervised methods
with indicated training, validation and testing partitions. Accordingly, we
evaluate a recently proposed convolutional neural network based color constancy
algorithm as a baseline for future research. As a side contribution, this
dataset also includes images taken by a mobile camera with color shading
corrected and uncorrected results. This allows research on the effect of color
shading as well.
Mathieu Garon, Jean-François Lalonde
Comments: 8 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a temporal 6-DOF tracking method which leverages deep learning to
achieve state-of-the-art performance on challenging datasets of real-world
capture. Our method is both more accurate and more robust to occlusions than
the existing best performing approaches while maintaining real-time
performance. To assess its efficacy, we evaluate our approach on several
challenging RGBD sequences of real objects in a variety of conditions. Notably,
we systematically evaluate robustness to occlusions through a series of
sequences where the object to be tracked is increasingly occluded. Finally, our
approach is purely data-driven and does not require any hand-designed features:
robust tracking is automatically learned from data.
Wei Wen, Cong Xu, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Very large-scale Deep Neural Networks (DNNs) have achieved remarkable
successes in a large variety of computer vision tasks. However, the high
computation intensity of DNNs makes it challenging to deploy these models on
resource-limited systems. Some studies used low-rank approaches that
approximate the filters by a low-rank basis to accelerate testing. Those
works directly decomposed pre-trained DNNs by Low-Rank Approximation (LRA).
How to train DNNs toward a lower-rank space for more efficient DNNs, however,
remains an open problem. To address this issue, we propose Force
Regularization, which uses attractive forces on filters to coordinate more
weight information into a lower-rank space. We mathematically and empirically
show that, after applying our technique, standard LRA methods can reconstruct
filters using a much lower-rank basis and thus yield faster DNNs. The
effectiveness of our approach is comprehensively evaluated on ResNets, AlexNet,
and GoogLeNet. On AlexNet, for example, Force Regularization gains a 2x speedup
on a modern GPU without accuracy loss and a 4.05x speedup on a CPU at the cost
of a small accuracy degradation. Moreover, Force Regularization better
initializes the low-rank DNNs such that fine-tuning converges faster toward
higher accuracy. The obtained lower-rank DNNs can be further sparsified,
showing that Force Regularization can be integrated with state-of-the-art
sparsity-based acceleration methods.
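A toy sketch of the idea (the exact force definition in the paper may differ): add an attractive gradient that pulls each normalized filter toward the others' mean direction, so the filter set drifts toward a lower-rank span that LRA can then exploit.

```python
import numpy as np

def force_gradient(W, lam=0.05):
    # W: (n_filters, fan_in); attract unit-normalized filters to their mean
    U = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    return lam * (U - U.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 27))
before = np.linalg.svd(W, compute_uv=False)
for _ in range(500):
    W -= force_gradient(W)             # applied alongside the task gradient
after = np.linalg.svd(W, compute_uv=False)
print(before[5] / before[0], after[5] / after[0])   # spectrum tail shrinks
```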
Shun Yang, Wenshuo Wang, Chang Liu, Kevin Deng, J. Karl Hedrick
Comments: 6 pages, 11 figures, 3 tables, accepted by 2017 IEEE Intelligent Vehicles Symposium
Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (cs.SY)
Deep learning-based approaches have been widely used for training controllers
for autonomous vehicles due to their powerful ability to approximate nonlinear
functions or policies. However, the training process usually requires large
labeled data sets and takes a lot of time. In this paper, we analyze the
influence of features on the performance of controllers trained using
convolutional neural networks (CNNs), which provides a guideline for feature
selection to reduce computation cost. We collect a large set of data using The
Open Racing Car Simulator (TORCS) and classify the image features into three
categories (sky-related, roadside-related, and road-related features). We then
design two experimental frameworks to investigate the importance of each single
feature for training a CNN controller. The first framework uses training data
with all three features included to train a controller, which is then tested
with data that has one feature removed, to evaluate the feature’s effects. The
second framework is trained with data that has one feature excluded, while all
three features are included in the test data. Different driving scenarios are
selected to test and analyze the trained controllers using the two experimental
frameworks. The experimental results show that (1) the road-related features
are indispensable for training the controller, (2) the roadside-related
features are useful for improving the generalizability of the controller to
scenarios with complicated roadside information, and (3) the sky-related
features have limited contribution to training an end-to-end autonomous vehicle
controller.
Tavi Halperin, Michael Werman
Comments: Submitted to ICCV
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We exploit the following observation to directly find epipolar lines: for a
pixel p in image A, all pixels corresponding to p in image B lie on the same
epipolar line; equivalently, the image of the line spanning A’s center and p
is an epipolar line in B. Computing the epipolar geometry from feature points
between cameras with very different viewpoints is often error-prone, as an
object’s appearance can vary greatly between images. This paper extends earlier
work based on the dynamics of the scene, which was successful in these cases.
The algorithms introduced here for finding corresponding epipolar lines
accelerate and robustify previous methods for computing the epipolar geometry
in dynamic scenes.
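For reference, the standard two-view relation behind this observation, with F the fundamental matrix between images A and B:

```latex
\ell' = F\,p, \qquad p'^{\top} F\, p = 0 \quad \text{for every correspondence } (p, p').
```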
Tshilidzi Marwala
Subjects: Artificial Intelligence (cs.AI)
The theory of rational choice assumes that when people make decisions they do
so in order to maximize their utility. In order to achieve this goal, they
ought to use all the information available and consider all the choices
available so as to choose an optimal one. This paper investigates what happens
when decisions are made by artificially intelligent machines in the market
rather than by human beings. Firstly, the expectations of the future are more
consistent if they are formed by an artificially intelligent machine, and the
decisions are more rational; thus the marketplace becomes more rational.
Peng Peng (1), Quan Yuan (1), Ying Wen (2), Yaodong Yang (2), Zhenkun Tang (1), Haitao Long (1), Jun Wang (2) ((1) Alibaba Group, (2) University College London)
Comments: 13 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Real-world artificial intelligence (AI) applications often require multiple
agents to work in a collaborative effort. Efficient learning for intra-agent
communication and coordination is an indispensable step towards general AI. In
this paper, we take the StarCraft combat game as the test scenario, where the
task is to coordinate multiple agents as a team to defeat their enemies. To
maintain a scalable yet effective communication protocol, we introduce a
multiagent bidirectionally-coordinated network (BiCNet [‘bIknet]) with a
vectorised extension of the actor-critic formulation. We show that BiCNet can
handle different types of combat under diverse terrains with arbitrary numbers
of AI agents on both sides. Our analysis demonstrates that, without any
supervision such as human demonstrations or labelled data, BiCNet can learn
various types of coordination strategies similar to those of experienced game
players. Moreover, BiCNet is easily adaptable to tasks with heterogeneous
agents. In our experiments, we evaluate our approach against multiple baselines
under different scenarios; it shows state-of-the-art performance, and possesses
potential value for large-scale real-world applications.
Mitra Baratchi, Geert Heijenk, Maarten van Steen
Subjects: Artificial Intelligence (cs.AI)
In this paper, we address the problem of how automated situation-awareness
can be achieved by learning real-world situations from ubiquitously generated
mobility data. Without semantic input about the time and space where situations
take place, this turns out to be a fundamentally challenging problem.
Uncertainties also introduce technical challenges when data is generated at
irregular time intervals and is mixed with noise and errors. Relying purely on
temporal patterns observable in mobility data, we propose Spaceprint, a fully
automated algorithm for finding repetitive patterns of similar situations in
spaces. We evaluate this technique by showing how the latent variables
describing the category and the actual identity of a space can be discovered
from the extracted situation patterns. In doing so, we use different real-world
mobility datasets with data about the presence of mobile entities in a variety
of spaces. We also evaluate the performance of this technique by showing its
robustness against uncertainties.
Zilu Ma, Shiqi Liu, Deyu Meng
Comments: 9 pages, 0 figures
Subjects: Artificial Intelligence (cs.AI)
Self-paced learning (SPL) is a new methodology that simulates the learning
principle of humans/animals: start with the easier aspects of a learning task,
and then gradually take more complex examples into training. This new learning
regime has been empirically substantiated to be effective in various computer
vision and pattern recognition tasks. Recently, it has been observed that the
SPL regime is closely related to an implicit self-paced objective function.
While this implicit objective could provide helpful interpretations of the
effectiveness, and especially the robustness, of the SPL paradigm, no
theoretical results have yet strictly verified this relationship. To address
this issue, in this paper, we provide some convergence results on this implicit
objective of SPL. Specifically, we prove that the learning process of SPL
always converges to critical points of this implicit objective under some mild
conditions. This result verifies the intrinsic relationship between SPL and the
implicit objective, and makes the previous robustness analysis of SPL complete
and theoretically sound.
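For context, the implicit objective referred to in this line of work typically takes the following form (a sketch with assumed notation, where ell_i(w) is the loss of sample i and f is the self-paced regularizer):

```latex
F_\lambda(w) \;=\; \sum_i \int_0^{\ell_i(w)} v_\lambda^{*}(\tau)\,\mathrm{d}\tau,
\qquad
v_\lambda^{*}(\ell) \;=\; \arg\min_{v \in [0,1]} \; v\,\ell + f(v;\lambda).
```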
Patrick Glauner, Manxing Du, Victor Paraschiv, Andrey Boytsov, Isabel Lopez Andrade, Jorge Meira, Petko Valtchev, Radu State
Journal-ref: Proceedings of the 25th European Symposium on Artificial Neural
Networks, Computational Intelligence and Machine Learning (ESANN 2017)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Which topics of machine learning are most commonly addressed in research?
This question was initially answered in 2007 by doing a qualitative survey
among distinguished researchers. In our study, we revisit this question from a
quantitative perspective. Concretely, we collect 54K abstracts of papers
published between 2007 and 2016 in leading machine learning journals and
conferences. We then use machine learning in order to determine the top 10
topics in machine learning. We not only include models, but provide a holistic
view across optimization, data, features, etc. This quantitative approach
makes it possible to reduce the bias of surveys. It reveals new and up-to-date
insights
into what the 10 most prolific topics in machine learning research are. This
allows researchers to identify popular topics as well as new and rising topics
for their research.
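A hedged sketch in the spirit of the study (the paper's exact pipeline may differ): extract topics from a small corpus of abstracts with LDA.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = ["deep neural networks for image classification",
             "support vector machines and kernel methods",
             "convolutional networks improve image recognition",
             "kernels for structured support vector learning"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in comp.argsort()[-3:]])
```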
Krishnaram Kenthapadi, Stuart Ambler, Liang Zhang, Deepak Agarwal
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The recently launched LinkedIn Salary product has been designed to realize
the vision of helping the world’s professionals optimize their earning
potential through salary transparency. We describe the overall design and
architecture of the salary modeling system underlying this product. We focus on
the unique data mining challenges in designing and implementing the system, and
describe the modeling components such as outlier detection and Bayesian
hierarchical smoothing that help to compute and present robust compensation
insights to users. We report on extensive evaluation with nearly one year of
anonymized compensation data collected from over one million LinkedIn users,
thereby demonstrating the efficacy of the statistical models. We also highlight
the lessons learned through the deployment of our system at LinkedIn.
Martin Plajner
Comments: Study for Dissertation Thesis. Supervisor: Jiří Vomlel
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
In this paper we follow our previous research in the area of Computerized
Adaptive Testing (CAT). We present three different methods for CAT. One of
them, item response theory, is a well-established method, while the other
two, Bayesian and neural networks, are new in the area of educational testing.
In the first part of this paper, we present the concept of CAT and its
advantages and disadvantages. We collected data from paper tests performed with
grammar school students; in the second part, we provide a summary of the data
used for our experiments. Next, we present three different model types
for CAT. They are based on the item response theory, Bayesian networks, and
neural networks. The general theory associated with each type is briefly
explained and the utilization of these models for CAT is analyzed. Future
research is outlined in the concluding part of the paper. It shows many
interesting research paths that are important not only for CAT but also for
other areas of artificial intelligence.
Antti Kangasrääsiö, Samuel Kaski
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Inverse reinforcement learning (IRL) aims to explain observed complex
behavior by fitting reinforcement learning models to behavioral data. However,
traditional IRL methods are only applicable when the observations are in the
form of state-action paths. This is a problem in many real-world modelling
settings, where only more limited observations are easily available. To address
this issue, we extend the traditional IRL problem formulation. We call this new
formulation the inverse reinforcement learning from summary data (IRL-SD)
problem, where instead of state-action paths, only summaries of the paths are
observed. We propose exact and approximate methods for both maximum likelihood
and full posterior estimation for IRL-SD problems. Through case studies we
compare these methods, demonstrating that the approximate methods can be used
to solve moderate-sized IRL-SD problems in reasonable time.
Shiri Dori-Hacohen, Myungha Jang, James Allan
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
A growing body of research focuses on computationally detecting controversial
topics and understanding the stances people hold on them. Yet gaps remain in
our theoretical and practical understanding of how to define controversy, how
it manifests, and how to measure it. In this paper, we introduce a novel
measure we call “contention”, defined with respect to a topic and a population.
We model contention from a mathematical standpoint. We validate our model by
examining a diverse set of sources: real-world polling data sets, actual voter
data, and Twitter coverage on several topics. In our publicly-released Twitter
data set of nearly 100M tweets, we examine several topics such as Brexit, the
2016 U.S. Elections, and “The Dress”, and cross-reference them with other
sources. We demonstrate that the contention measure holds explanatory power for
a wide variety of observed phenomena, such as controversies over climate change
and other topics that are well within scientific consensus. Finally, we
re-examine the notion of controversy, and present a theoretical framework that
defines it in terms of population. We present preliminary evidence suggesting
that contention is one dimension of controversy, along with others, such as
“importance”. Our new contention measure, along with the hypothesized model of
controversy, suggest several avenues for future work in this emerging
interdisciplinary research area.
Krishnaram Kenthapadi, Stuart Ambler, Liang Zhang, Deepak Agarwal
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
The recently launched LinkedIn Salary product has been designed to realize
the vision of helping the world’s professionals optimize their earning
potential through salary transparency. We describe the overall design and
architecture of the salary modeling system underlying this product. We focus on
the unique data mining challenges in designing and implementing the system, and
describe the modeling components such as outlier detection and Bayesian
hierarchical smoothing that help to compute and present robust compensation
insights to users. We report on extensive evaluation with nearly one year of
anonymized compensation data collected from over one million LinkedIn users,
thereby demonstrating the efficacy of the statistical models. We also highlight
the lessons learned through the deployment of our system at LinkedIn.
Haixia Liu
Comments: 13 pages; 5 tables
Subjects: Computation and Language (cs.CL)
In contrast to document summarization of articles from social media and
newswire, argumentative zoning (AZ) is an important task in scientific paper
analysis. The traditional methodology for this task relies on feature
engineering at different levels. In this paper, three models for generating
sentence vectors for the task of sentence classification were explored and
compared. The proposed approach builds sentence representations from learned
embeddings based on neural networks. The learned word embeddings form a
feature space to which each examined sentence is mapped. These features are
then input to classifiers for supervised classification. Using a 10-fold
cross-validation scheme, evaluation was conducted on Argumentative-Zoning (AZ)
annotated articles. The results show that simply averaging the word vectors
in a sentence works better than the paragraph-to-vector algorithm, and that
integrating specific cue words into the loss function of the neural network
improves classification performance. Compared with hand-crafted features, the
word2vec method wins on most categories, although hand-crafted features remain
stronger on some.
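As a minimal illustration of the best-performing setup (averaged word vectors
feeding a supervised classifier), consider the sketch below; the embedding
table, tokens and labels are toy placeholders, not the paper's data or AZ
categories.

```python
# Minimal sketch: sentence vectors by averaging word embeddings, then a
# supervised classifier. The embedding table is a stand-in for real
# word2vec vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab = {"the": 0, "method": 1, "results": 2, "we": 3, "propose": 4}
emb = rng.normal(size=(len(vocab), 50))   # placeholder for learned embeddings

def sentence_vector(tokens):
    """Average the embeddings of in-vocabulary tokens."""
    idx = [vocab[t] for t in tokens if t in vocab]
    if not idx:
        return np.zeros(emb.shape[1])
    return emb[idx].mean(axis=0)

sentences = [["we", "propose", "the", "method"], ["the", "results"]]
labels = [0, 1]                           # stand-ins for AZ categories
X = np.stack([sentence_vector(s) for s in sentences])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```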
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Comments: Submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given <text, audio> pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it’s
substantially faster than sample-level autoregressive methods.
Simon Šuster, Stéphan Tulkens, Walter Daelemans
Comments: First Workshop on Ethics in Natural Language Processing (EACL’17)
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Clinical NLP has an immense potential in contributing to how clinical
practice will be revolutionized by the advent of large scale processing of
clinical records. However, this potential has remained largely untapped due to
slow progress primarily caused by strict data access policies for researchers.
In this paper, we discuss the concern for privacy and the measures it entails.
We also suggest sources of less sensitive data. Finally, we draw attention to
biases that can compromise the validity of empirical research and lead to
socially harmful applications.
Soumia Bougrine, Hadda Cherroun, Djelloul Ziadi
Comments: 33 pages, 7 figures
Subjects: Computation and Language (cs.CL)
In daily communications, Arabs use local dialects which are hard to identify
automatically using conventional classification methods. The dialect
identification task becomes even more challenging when dealing with
under-resourced dialects belonging to the same country or region. In this
paper, we start by statistically analyzing Algerian dialects in order to
capture their prosodic specificities, which are extracted at the utterance
level after a coarse-grained consonant/vowel segmentation. Guided by these
findings, we propose a Hierarchical classification approach for spoken Arabic
algerian Dialect IDentification (HADID). It takes advantage of the fact that
dialects are naturally structured into a hierarchy. Within HADID, a top-down
hierarchical classification is applied, in which Deep Neural Networks (DNNs)
are used to build a local classifier for every parent node in the dialect
hierarchy, which itself is deduced from historical and linguistic knowledge.
Our framework is implemented and evaluated on a corpus of Algerian Arabic
dialects. The results reveal that within HADID, DNNs outperform Support
Vector Machines as local classifiers. In addition, compared with a baseline
flat classification system, HADID yields an improvement of 63.5% in terms of
precision. Furthermore, the overall results evidence the suitability of our
prosody-based HADID for speaker-independent dialect identification while
requiring test utterances of less than 6 s.
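The top-down scheme amounts to one local classifier per parent node. The
sketch below uses a hypothetical two-level hierarchy and a small MLP as a
stand-in for the paper's DNNs; the prosodic feature extraction is abstracted
into placeholder vectors.

```python
# Sketch of top-down hierarchical classification in the spirit of HADID.
import numpy as np
from sklearn.neural_network import MLPClassifier

hierarchy = {"root": ["west", "east"], "west": ["d1", "d2"], "east": ["d3", "d4"]}

def train_local_classifiers(X, paths):
    """paths[i] is the root-to-leaf label path of sample i, e.g. ['west', 'd1']."""
    clfs = {}
    for parent in hierarchy:
        depth = 0 if parent == "root" else 1
        rows = [i for i, p in enumerate(paths) if depth < len(p)
                and (parent == "root" or p[depth - 1] == parent)]
        y = [paths[i][depth] for i in rows]
        if len(set(y)) > 1:
            clfs[parent] = MLPClassifier(hidden_layer_sizes=(32,),
                                         max_iter=500).fit(X[rows], y)
    return clfs

def predict(clfs, x):
    node = "root"
    while node in clfs:               # descend until a leaf is reached
        node = clfs[node].predict(x.reshape(1, -1))[0]
    return node

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))          # placeholder prosodic feature vectors
paths = [["west", "d1"], ["west", "d2"], ["east", "d3"], ["east", "d4"]] * 10
clfs = train_local_classifiers(X, paths)
print(predict(clfs, X[0]))
```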
Albert Gatt, Emiel Krahmer
Comments: 111 pages, 8 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Haonan Yu, Haichao Zhang, Wei Xu
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We tackle a task where an agent learns to navigate in a 2D maze-like
environment called XWORLD. In each session, the agent perceives a sequence of
raw-pixel frames, a natural language command issued by a teacher, and a set of
rewards. The agent learns the teacher’s language from scratch in a grounded and
compositional manner, such that after training it is able to correctly execute
zero-shot commands: 1) the combination of words in the command never appeared
before, and/or 2) the command contains new object concepts that are learned
from another task but never learned from navigation. Our deep framework for the
agent is trained end to end: it learns simultaneously the visual
representations of the environment, the syntax and semantics of the language,
and the action module that outputs actions. The zero-shot learning capability
of our framework results from its compositionality and modularity with
parameter tying. We visualize the intermediate outputs of the framework,
demonstrating that the agent truly understands how to solve the problem. We
believe that our results provide some preliminary insights on how to train an
agent with similar abilities in a 3D environment.
Areej Alhothali, Jesse Hoey
Subjects: Computation and Language (cs.CL)
In this paper, we propose an extension to graph-based sentiment lexicon
induction methods by incorporating distributed and semantic word
representations in building the similarity graph to expand a three-dimensional
sentiment lexicon. We also implemented and evaluated the label propagation
using four different word representations and similarity metrics. Our
comprehensive evaluation of the four approaches was performed on a single data
set, demonstrating that all four methods can generate a significant number of
new sentiment assignments with high accuracy. The best results, obtained by
combining the semantic and the distributional features (tau = 0.51, mean
absolute error < 1.1%), outperformed the distributional-based and
semantic-based label-propagation models and approached a supervised algorithm.
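A minimal sketch of the underlying propagation step: build a cosine-similarity
graph over word vectors, clamp the seed words, and iterate. The vectors and
seeds are placeholders, and a single score is used where the paper's lexicon
is three-dimensional.

```python
# Sketch of label propagation over a word-similarity graph to expand a
# sentiment lexicon.
import numpy as np

rng = np.random.default_rng(0)
words = ["good", "great", "bad", "awful", "table"]
vecs = rng.normal(size=(len(words), 16))          # stand-in embeddings
seeds = {"good": 1.0, "bad": -1.0}                # known sentiment scores

# Cosine-similarity graph, clipped at zero so edges are non-negative weights.
norm = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
W = np.clip(norm @ norm.T, 0.0, None)
np.fill_diagonal(W, 0.0)

scores = np.array([seeds.get(w, 0.0) for w in words])
clamp = np.array([w in seeds for w in words])
for _ in range(50):                               # propagate until stable
    scores = W @ scores / np.maximum(W.sum(axis=1), 1e-12)
    scores[clamp] = [seeds[w] for w in np.array(words)[clamp]]
print(dict(zip(words, scores.round(2))))
```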
Einat Naaman, Yossi Adi, Joseph Keshet
Subjects: Computation and Language (cs.CL)
A significant source of errors in Automatic Speech Recognition (ASR) systems
is due to pronunciation variations which occur in spontaneous and
conversational speech. Usually ASR systems use a finite lexicon that provides
one or more pronunciations for each word. In this paper, we focus on learning a
similarity function between two pronunciations. The pronunciation can be the
canonical and the surface pronunciations of the same word or it can be two
surface pronunciations of different words. This task generalizes problems such
as lexical access (the problem of learning the mapping between words and their
possible pronunciations), and defining word neighborhoods. It can also be used
to dynamically increase the size of the pronunciation lexicon, or in predicting
ASR errors. We propose two methods, which are based on recurrent neural
networks, to learn the similarity function. The first is based on binary
classification, and the second is based on learning the ranking of the
pronunciations. We demonstrate the efficiency of our approach on the task of
lexical access using a subset from the Switchboard conversational speech
corpus. Results suggest that our method is superior to previous methods which
are based on graphical Bayesian methods.
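For concreteness, a small sketch of the ranking variant under assumed details:
a shared GRU embeds phone sequences, and a margin ranking loss prefers the
matching surface form over a mismatched one. The phone inventory, dimensions
and data below are toy stand-ins, not the Switchboard setup.

```python
# Sketch: shared GRU encoder + margin ranking loss for pronunciation similarity.
import torch
import torch.nn as nn

PHONES = 40
class PronEncoder(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.emb = nn.Embedding(PHONES, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
    def forward(self, seq):              # seq: (batch, time) phone ids
        _, h = self.gru(self.emb(seq))
        return h[-1]                     # final hidden state as embedding

enc = PronEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
loss_fn = nn.MarginRankingLoss(margin=0.2)

canon = torch.randint(0, PHONES, (8, 6))   # canonical pronunciations
pos = torch.randint(0, PHONES, (8, 6))     # surface forms of the same word
neg = torch.randint(0, PHONES, (8, 6))     # surface forms of other words

a, p, n = enc(canon), enc(pos), enc(neg)
sim_pos = nn.functional.cosine_similarity(a, p)
sim_neg = nn.functional.cosine_similarity(a, n)
# target = 1 asks sim_pos to exceed sim_neg by the margin
loss = loss_fn(sim_pos, sim_neg, torch.ones(8))
loss.backward(); opt.step()
```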
Kouakou Ive Arsene Koffi, Konan Marcellin Brou, Souleymane Oumtanaga
Comments: in French
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Databases (cs.DB)
The large amount of information and the increasing complexity of applications
constrain developers to rely on stand-alone, reusable components from
libraries and component markets. Our approach consists in developing methods
to evaluate the quality of the software components in these libraries, and
moreover to optimize the financial cost and adaptation time of the selected
components. Our objective function defines a metric that maximizes software
component quality while minimizing financial cost and maintenance time. This
model should make it possible to rank the components in order to choose the
most suitable ones.
KEYWORDS: method development, reuse, software components, component quality.
Kashish Ara Shakil, Ari Ora, Mansaf Alam, Shabih Shakeel
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud computing is a cost-effective way for start-up life sciences
laboratories to store and manage their data. However, in many instances the
data stored over the cloud could be redundant which makes cloud-based data
management inefficient and costly because one has to pay for every byte of data
stored over the cloud. Here, we tested efficient management of data generated
by an electron cryo-microscopy (cryoEM) lab in a cloud-based environment. The
test data was obtained from cryoEM repository EMPIAR. All the images were
subjected to an in-house parallelized version of principal component analysis.
An efficient cloud-based MapReduce modality was used for parallelization. We
showed that large data in order of terabytes could be efficiently reduced to
its minimal essential self in a cost-effective, scalable manner. Furthermore,
using spot instances on Amazon EC2 was shown to reduce costs by a margin of
about 27 percent. This approach can scale to data of any volume and type.
Nhien-An Le-Khac, M-Tahar Kechadi, Joe Carthy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, we present the ADMIRE architecture; a new framework for
developing novel and innovative data mining techniques to deal with very large
and distributed heterogeneous datasets in both commercial and academic
applications. The main ADMIRE components are detailed, as well as the
interfaces allowing users to efficiently develop and implement their data
mining techniques on a Grid platform such as the Globus Toolkit, DGET, etc.
Bin Chen, Ronald Kantowski, Xinyu Dai, Eddie Baron, Paul Van der Mark
Comments: 18 pages, 3 figures, accepted by the Astronomy & Computing
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Astrophysical Phenomena (astro-ph.HE); Distributed, Parallel, and Cluster Computing (cs.DC)
Recently Graphics Processing Units (GPUs) have been used to speed up very
CPU-intensive gravitational microlensing simulations. In this work, we use the
Xeon Phi coprocessor to accelerate such simulations and compare its performance
on a microlensing code with that of NVIDIA’s GPUs. For the selected set of
parameters evaluated in our experiment, we find that the speedup by Intel’s
Knights Corner coprocessor is comparable to that by NVIDIA’s Fermi family of
GPUs with compute capability 2.0, but less significant than GPUs with higher
compute capabilities such as the Kepler. However, the very recently released
second generation Xeon Phi, Knights Landing, is about 5.8 times faster than the
Knights Corner, and about 2.9 times faster than the Kepler GPU used in our
simulations. We conclude that the Xeon Phi is a very promising alternative to
GPUs for modern high performance microlensing simulations.
Patrick Glauner, Manxing Du, Victor Paraschiv, Andrey Boytsov, Isabel Lopez Andrade, Jorge Meira, Petko Valtchev, Radu State
Journal-ref: Proceedings of the 25th European Symposium on Artificial Neural
Networks, Computational Intelligence and Machine Learning (ESANN 2017)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Which topics of machine learning are most commonly addressed in research?
This question was initially answered in 2007 by doing a qualitative survey
among distinguished researchers. In our study, we revisit this question from a
quantitative perspective. Concretely, we collect 54K abstracts of papers
published between 2007 and 2016 in leading machine learning journals and
conferences. We then use machine learning in order to determine the top 10
topics in machine learning. We not only include models, but provide a holistic
view across optimization, data, features, etc. This quantitative approach
reduces the bias of surveys and reveals new and up-to-date insights
into what the 10 most prolific topics in machine learning research are. This
allows researchers to identify popular topics as well as new and rising topics
for their research.
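The abstract does not name the topic-extraction algorithm; as one plausible
realization of "using machine learning to determine the topics", here is a
sketch with LDA over a toy corpus standing in for the 54K abstracts.

```python
# Sketch: vectorize abstracts, then extract topics with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "deep neural networks for image classification",
    "support vector machines and kernel methods",
    "reinforcement learning for sequential decision making",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}: {' '.join(top)}")
```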
Junyu Luo
Comments: 9 pages, 6 figures
Subjects: Learning (cs.LG)
Generative Adversarial Networks (GANs) have shown great ability in generating
samples, and the inverse mapping of the generator is also of great value. Some
works have been developed to construct the inverse function of the generator,
but the existing ways of training the inverse model of a GAN have many
shortcomings. In this paper, we propose a new approach to training the inverse
model of the generator by regarding a pre-trained generator as the decoder
part of an autoencoder network. This model does not directly minimize the
difference between the original input and the inverse output, but instead
minimizes the difference between the data generated from the original input
and from the inverse output. This strategy overcomes the difficulty of
training an inverse model of a non-one-to-one function, and the learned
inverse mapping can be directly used in image searching and processing.
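A compact sketch of the described training scheme, with toy MLPs standing in
for the real generator and encoder: the pre-trained generator is frozen as
the decoder, and the reconstruction loss is taken in data space rather than
latent space.

```python
# Sketch: train an encoder E so that G(E(G(z))) matches G(z), with G frozen.
import torch
import torch.nn as nn

Z, IMG = 16, 64
G = nn.Sequential(nn.Linear(Z, 128), nn.ReLU(), nn.Linear(128, IMG))
for p in G.parameters():
    p.requires_grad_(False)                 # the generator stays fixed

E = nn.Sequential(nn.Linear(IMG, 128), nn.ReLU(), nn.Linear(128, Z))
opt = torch.optim.Adam(E.parameters(), lr=1e-3)

for step in range(200):
    z = torch.randn(32, Z)
    x = G(z)                                # data generated from the input z
    z_hat = E(x)                            # inverse mapping
    loss = nn.functional.mse_loss(G(z_hat), x)   # compare in data space
    opt.zero_grad(); loss.backward(); opt.step()
```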
Yagmur G. Cinar, Hamid Mirisaee, Parantapa Goswami, Eric Gaussier, Ali Ait-Bachir, Vadim Strijov
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we study the use of recurrent neural networks (RNNs) for
modeling and forecasting time series. We first illustrate the fact that
standard sequence-to-sequence RNNs neither capture well periods in time series
nor handle well missing values, even though many real life times series are
periodic and contain missing values. We then propose an extended attention
mechanism that can be deployed on top of any RNN and that is designed to
capture periods and make the RNN more robust to missing values. We show the
effectiveness of this novel model through extensive experiments with multiple
univariate and multivariate datasets.
Feiyun Zhu, Peng Liao, Xinliang Zhu, Yaowen Yao, Junzhou Huang
Subjects: Learning (cs.LG)
In the wake of the vast population of smart device users worldwide, mobile
health (mHealth) technologies are expected to have a positive and broad
influence on people's health, as they can provide flexible, affordable and
portable health guides to device users. Current online decision-making methods
for mHealth assume that users are completely heterogeneous: they share no
information among users and learn a separate policy for each user. However,
the data for each user is too limited in size to support such separate online
learning, leading to unstable policies with large variance. Moreover, we
observe that a user may be similar to some, but not all, users, and that
connected users tend to have similar behaviors. In this paper, we propose a
network cohesion constrained (actor-critic) Reinforcement Learning (RL) method
for mHealth. The goal is to explore how to share information among similar
users to better convert the limited user information into sharper learned
policies. To the best of our knowledge, this is the first online actor-critic
RL for mHealth and the first network cohesion constrained (actor-critic) RL
method in any application. The network cohesion is important for deriving
effective policies. We come up with a novel method to learn the network by
using the warm start trajectory, which directly reflects the users'
properties. The optimization of our model is difficult and very different from
general supervised learning due to the indirect observation of values. As a
contribution, we propose two algorithms for the proposed online RLs. Apart
from mHealth, the proposed methods can easily be applied or adapted to other
health-related tasks. Extensive experimental results on the HeartSteps dataset
demonstrate that, in a variety of parameter settings, the proposed two methods
obtain clear improvements over the state-of-the-art methods.
Maren Mahsereci, Philipp Hennig
Comments: Extended version of the NIPS ’15 conference paper, includes detailed pseudo-code, 51 pages, 30 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In deterministic optimization, line searches are a standard tool ensuring
stability and efficiency. Where only stochastic gradients are available, no
direct equivalent has so far been formulated, because uncertain gradients do
not allow for a strict sequence of decisions collapsing the search space. We
construct a probabilistic line search by combining the structure of existing
deterministic methods with notions from Bayesian optimization. Our method
retains a Gaussian process surrogate of the univariate optimization objective,
and uses a probabilistic belief over the Wolfe conditions to monitor the
descent. The algorithm has very low computational cost, and no user-controlled
parameters. Experiments show that it effectively removes the need to define a
learning rate for stochastic gradient descent.
Jiaqi Zhang, Kai Zheng, Wenlong Mou, Liwei Wang
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
In this paper, we consider efficient differentially private empirical risk
minimization from the viewpoint of optimization algorithms. For strongly convex
and smooth objectives, we prove that gradient descent with output perturbation
not only achieves nearly optimal utility, but also significantly improves the
running time of previous state-of-the-art private optimization algorithms, for
both (epsilon)-DP and ((epsilon, delta))-DP. For non-convex but smooth
objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient
Descent) algorithm, which provably converges to a stationary point with privacy
guarantee. Besides the expected utility bounds, we also provide guarantees in
high probability form. Experiments demonstrate that our algorithm consistently
outperforms existing methods in both utility and running time.
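For the strongly convex case, output perturbation is simple to sketch: run
ordinary gradient descent, then add noise scaled to the minimizer's
sensitivity. The sensitivity bound and Laplace calibration below are the
textbook ones for Lipschitz, strongly convex ERM, not the paper's exact
constants.

```python
# Sketch of output perturbation for eps-DP empirical risk minimization.
import numpy as np

def private_logreg(X, y, lam=0.1, L=1.0, eps=1.0, steps=500, lr=0.1):
    """l2-regularized logistic regression (labels y in {-1,+1}), trained by
    plain gradient descent, released with output perturbation."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        m = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(m)))[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    sens = 2 * L / (n * lam)        # classic l2-sensitivity of the minimizer
    # Laplace noise calibrated to the (looser) l1 sensitivity sqrt(d) * sens
    w += np.random.laplace(scale=sens * np.sqrt(d) / eps, size=d)
    return w

# toy usage
rng = np.random.default_rng(0)
Xd = rng.normal(size=(500, 5))
yd = np.sign(Xd[:, 0] + 0.1 * rng.normal(size=500))
w_priv = private_logreg(Xd, yd)
```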
Subin Yi, Janghoon Ju, Man-Ki Yoon, Jaesik Choi
Subjects: Learning (cs.LG)
Analyzing multivariate time series data is important for many applications
such as automated control, fault diagnosis and anomaly detection. One of the
key challenges is to learn latent features automatically from dynamically
changing multivariate input. In visual recognition tasks, convolutional neural
networks (CNNs) have been successful to learn generalized feature extractors
with shared parameters over the spatial domain. However, when high-dimensional
multivariate time series is given, designing an appropriate CNN model structure
becomes challenging because the kernels may need to be extended through the
full dimension of the input volume. To address this issue, we present two
structure learning algorithms for deep CNN models. Our algorithms exploit the
covariance structure over multiple time series to partition input volume into
groups. The first algorithm learns the group CNN structures explicitly by
clustering individual input sequences. The second algorithm learns the group
CNN structures implicitly from the error backpropagation. In experiments with
two real-world datasets, we demonstrate that our group CNNs outperform existing
CNN based regression methods.
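The explicit (clustering-based) variant can be sketched in its grouping step:
estimate the correlation structure across series and cluster them, so that
each resulting group gets its own convolutional branch. The synthetic series
and cluster count below are illustrative.

```python
# Sketch: group input series by correlation structure before building
# per-group CNN branches. Only the grouping step is shown.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
T = 1000
base = rng.normal(size=(T, 3))              # three latent driving signals
series = np.concatenate([base + 0.1 * rng.normal(size=(T, 3))
                         for _ in range(4)], axis=1)   # 12 correlated series

corr = np.corrcoef(series.T)                # covariance structure over series
dist = 1.0 - np.abs(corr)                   # strong (anti-)correlation -> close
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=3, criterion="maxclust")
print(groups)                               # each group gets its own CNN branch
```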
Mohamed Abuella, Badrul Chowdhury
Comments: This works has been presented in the American Society for Engineering Management, International Annual Conference, 2016
Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Balancing generation and load is required in the economic scheduling of
generating units in the smart grid. Variable energy generation, particularly
from wind and solar resources, is growing rapidly, and it is anticipated that
at a certain level of penetration it can become a noteworthy source of
uncertainty. As in the case of load demand, energy forecasting can also be
used to mitigate some of the challenges that arise from the uncertainty in
the resource. While wind energy forecasting research is considered mature,
solar energy forecasting is attracting steadily growing
attention from the research community. This paper presents a support vector
regression model to produce solar power forecasts on a rolling basis for 24
hours ahead over an entire year, to mimic the practical business of energy
forecasting. Twelve weather variables are considered from a high-quality
benchmark dataset and new variables are extracted. The added value of the heat
index and wind speed as additional variables to the model is studied across
different seasons. The support vector regression model performance is compared
with artificial neural networks and multiple linear regression models for
energy forecasting.
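A minimal sketch of the rolling protocol with scikit-learn's SVR, using
synthetic stand-ins for the twelve weather variables: refit on a sliding
window, predict the next 24 hours, advance one day.

```python
# Sketch: rolling 24-hour-ahead solar forecasts with support vector regression.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
H = 24 * 120                                # hourly data, ~4 months
weather = rng.normal(size=(H, 12))          # stand-ins for the 12 variables
solar = np.clip(weather[:, 0] + 0.1 * rng.normal(size=H), 0, None)

window = 24 * 30                            # ~30 days of training history
preds = []
for t in range(window, H - 24 + 1, 24):     # one refit per day, 24 h ahead
    model = SVR(kernel="rbf", C=1.0)
    model.fit(weather[t - window:t], solar[t - window:t])
    preds.append(model.predict(weather[t:t + 24]))
forecast = np.concatenate(preds)
```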
Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, Kilian Q. Weinberger
Subjects: Learning (cs.LG)
This paper studies convolutional networks that require limited computational
resources at test time. We develop a new network architecture that performs on
par with state-of-the-art convolutional networks, whilst facilitating
prediction in two settings: (1) an anytime-prediction setting in which the
network’s prediction for one example is progressively updated, facilitating the
output of a prediction at any time; and (2) a batch computational budget
setting in which a fixed amount of computation is available to classify a set
of examples that can be spent unevenly across ‘easier’ and ‘harder’ examples.
Our network architecture uses multi-scale convolutions and progressively
growing feature representations, which allows for the training of multiple
classifiers at intermediate layers of the network. Experiments on three
image-classification datasets demonstrate the efficacy of our architecture, in
particular, when measured in terms of classification accuracy as a function of
the amount of compute available.
Lillian J. Ratliff, Eric Mazumdar
Subjects: Learning (cs.LG)
We address the problem of inverse reinforcement learning in Markov decision
processes where the agent is risk-sensitive. In particular, we model
risk-sensitivity in a reinforcement learning framework by making use of models
of human decision-making having their origins in behavioral psychology,
behavioral economics, and neuroscience. We propose a gradient-based inverse
reinforcement learning algorithm that minimizes a loss function defined on the
observed behavior. We demonstrate the performance of the proposed technique on
two examples, the first of which is the canonical Grid World example and the
second of which is a Markov decision process modeling passengers decisions
regarding ride-sharing. In the latter, we use pricing and travel time data from
a ride-sharing company to construct the transition probabilities and rewards of
the Markov decision process.
Tomaso Poggio, Qianli Liao
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Previous theoretical work on deep learning and neural network optimization
tend to focus on avoiding saddle points and local minima. However, the
practical observation is that, at least for the most successful Deep
Convolutional Neural Networks (DCNNs) for visual processing, practitioners can
always increase the network size to fit the training data (an extreme example
would be [1]). The most successful DCNNs such as VGG and ResNets are best used
with a small degree of “overparametrization”. In this work, we characterize
with a mix of theory and experiments, the landscape of the empirical risk of
overparametrized DCNNs. We first prove the existence of a large number of
degenerate global minimizers with zero empirical error (modulo inconsistent
equations). The zero-minimizers — in the case of classification — have a
non-zero margin. The same minimizers are degenerate and thus very likely to be
found by SGD that will furthermore select with higher probability the
zero-minimizer with larger margin, as discussed in Theory III (to be released).
We further experimentally explored and visualized the landscape of empirical
risk of a DCNN on CIFAR-10 during the entire training process and especially
the global minima. Finally, based on our theoretical and experimental results,
we propose an intuitive model of the landscape of DCNN’s empirical loss
surface, which might not be as complicated as people commonly believe.
Iman Niazazari, Hanif Livani
Comments: 5 pages, 5 figures, conference
Subjects: Learning (cs.LG); Systems and Control (cs.SY)
Proliferation of advanced metering devices with high sampling rates in
distribution grids, e.g., micro-phasor measurement units ({mu}PMU), provides
unprecedented potentials for wide-area monitoring and diagnostic applications,
e.g., situational awareness, health monitoring of distribution assets.
Unexpected disruptive events interrupting the normal operation of assets in
distribution grids can eventually lead to permanent failure with expensive
replacement cost over time. Therefore, disruptive event classification provides
useful information for preventive maintenance of the assets in distribution
networks. Preventive maintenance provides a wide range of benefits in terms of
time, avoiding unexpected outages, maintenance crew utilization, and equipment
replacement cost. In this paper, a PMU-data-driven framework is proposed for
classification of disruptive events in distribution networks. The two
disruptive events, i.e., malfunctioned capacitor bank switching and
malfunctioned regulator on-load tap changer (OLTC) switching are considered and
distinguished from the normal abrupt load change in distribution grids. The
performance of the proposed framework is verified using the simulation of the
events in the IEEE 13-bus distribution network. The event classification is
formulated using two different algorithms: i) principal component analysis
(PCA) together with a multi-class support vector machine (SVM), and ii) an
autoencoder along with a softmax classifier. The results demonstrate the
effectiveness of the proposed algorithms and satisfactory classification
accuracies.
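Variant (i) maps directly onto a standard pipeline; the sketch below uses
random vectors in place of the simulated PMU event windows.

```python
# Sketch of variant (i): PCA features feeding a multi-class SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))     # placeholder event windows
y = rng.integers(0, 3, size=300)    # capacitor / OLTC / normal load change
clf = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X[:200], y[:200])
print("accuracy:", clf.score(X[200:], y[200:]))
```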
Loic Bontemps, Van Loi Cao, James McDermott, Nhien-An Le-Khac
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)
Intrusion detection for computer network systems has become one of the most
critical tasks for network administrators today. It plays an important role
for organizations, governments and our society, given the valuable resources
hosted on computer networks. Traditional misuse detection strategies are
unable to detect new and unknown intrusions. Anomaly detection in network
security, in turn, aims to distinguish between illegal or malicious events
and the normal behavior of network systems. Anomaly detection can be
considered a classification problem in which models of normal network
behavior are built and used to detect new patterns that significantly deviate
from the model. Most current research on anomaly detection is based on
learning normal and anomalous behaviors, and does not take recent preceding
events into account when assessing a new incoming one. In this paper, we
propose a real-time collective anomaly detection model based on neural
network learning and feature operations. Normally, a Long Short-Term Memory
Recurrent Neural Network (LSTM
RNN) is trained only on normal data and it is capable of predicting several
time steps ahead of an input. In our approach, a LSTM RNN is trained with
normal time series data before performing a live prediction for each time step.
Instead of considering each time step separately, the observation of prediction
errors from a certain number of time steps is now proposed as a new idea for
detecting collective anomalies. The prediction errors from a number of the
latest time steps above a threshold will indicate a collective anomaly. The
model is built on a time series version of the KDD 1999 dataset. The
experiments demonstrate that it is possible to offer reliable and efficient
collective anomaly detection.
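The detection rule itself is easy to state in code: given the per-step
prediction errors of an LSTM trained on normal traffic (abstracted away
here), flag a collective anomaly only when the errors stay high over a
window. The window length and threshold below are illustrative.

```python
# Sketch of the windowed-error rule for collective anomaly detection.
import numpy as np

def collective_anomalies(errors, window=10, threshold=0.5):
    """Return indices where the mean error over the last `window` steps
    exceeds `threshold` -- individual spikes alone do not trigger."""
    flags = []
    for t in range(window, len(errors)):
        if np.mean(errors[t - window:t]) > threshold:
            flags.append(t)
    return flags

errors = np.abs(np.random.default_rng(0).normal(0.1, 0.05, size=500))
errors[200:260] += 0.8                      # a sustained deviation
print(collective_anomalies(errors)[:5])
```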
Antti Kangasrääsiö, Samuel Kaski
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Inverse reinforcement learning (IRL) aims to explain observed complex
behavior by fitting reinforcement learning models to behavioral data. However,
traditional IRL methods are only applicable when the observations are in the
form of state-action paths. This is a problem in many real-world modelling
settings, where only more limited observations are easily available. To address
this issue, we extend the traditional IRL problem formulation. We call this new
formulation the inverse reinforcement learning from summary data (IRL-SD)
problem, where instead of state-action paths, only summaries of the paths are
observed. We propose exact and approximate methods for both maximum likelihood
and full posterior estimation for IRL-SD problems. Through case studies we
compare these methods, demonstrating that the approximate methods can be used
to solve moderate-sized IRL-SD problems in reasonable time.
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Comments: Submitted to Interspeech 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given <text, audio> pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it’s
substantially faster than sample-level autoregressive methods.
Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
We develop differentially private hypothesis testing methods for the small
sample regime. Given a sample (cal D) from a categorical distribution (p) over
some domain (Sigma), an explicitly described distribution (q) over (Sigma),
some privacy parameter (varepsilon), accuracy parameter (alpha), and
requirements (eta_{rm I}) and (eta_{rm II}) for the type I and type II errors
of our test, the goal is to distinguish between (p=q) and
(d_{rm TV}(p,q) geq alpha).
We provide theoretical bounds for the sample size (|{cal D}|) so that our
method both satisfies ((varepsilon,0))-differential privacy, and guarantees
type I and type II errors (eta_{rm I}) and (eta_{rm II}). We show that
differential privacy may come for free in some regimes of parameters, and we
always beat the sample complexity resulting from running the (chi^2)-test with
noisy counts, or standard approaches such as repetition for endowing
non-private (chi^2)-style statistics with differential privacy guarantees. We
experimentally compare the sample complexity of our method to that of recently
proposed methods for private hypothesis testing.
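For orientation, here is a sketch of the simplest baseline mentioned above, a
chi-squared test on noisy counts: release a Laplace-perturbed histogram and
test it against the hypothesized distribution. The noise scale is the usual
one for histogram sensitivity; this is the baseline the paper's method is
claimed to beat, not the method itself, and the renormalization step is an
illustrative simplification.

```python
# Sketch of the noisy-counts chi-squared baseline for private testing.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
q = np.array([0.25, 0.25, 0.25, 0.25])      # hypothesized distribution
sample = rng.choice(4, size=2000, p=[0.3, 0.2, 0.25, 0.25])
counts = np.bincount(sample, minlength=4).astype(float)

eps = 1.0
noisy = counts + rng.laplace(scale=2.0 / eps, size=4)   # private release
noisy = np.clip(noisy, 1e-9, None)
noisy *= counts.sum() / noisy.sum()          # renormalize to the sample size
stat, pval = chisquare(noisy, f_exp=q * counts.sum())
print(stat, pval)
```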
Peng Peng (1), Quan Yuan (1), Ying Wen (2), Yaodong Yang (2), Zhenkun Tang (1), Haitao Long (1), Jun Wang (2) ((1) Alibaba Group, (2) University College London)
Comments: 13 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Real-world artificial intelligence (AI) applications often require multiple
agents to work in a collaborative effort. Efficient learning for intra-agent
communication and coordination is an indispensable step towards general AI. In
this paper, we take StarCraft combat game as the test scenario, where the task
is to coordinate multiple agents as a team to defeat their enemies. To maintain
a scalable yet effective communication protocol, we introduce a multiagent
bidirectionally-coordinated network (BiCNet [‘bIknet]) with a vectorised
extension of actor-critic formulation. We show that BiCNet can handle different
types of combats under diverse terrains with arbitrary numbers of AI agents for
both sides. Our analysis demonstrates that without any supervision such as
human demonstrations or labelled data, BiCNet can learn various types of
coordination strategies similar to those of experienced game players.
Moreover, BiCNet is easily adaptable to tasks with heterogeneous agents. In
our experiments, we evaluate our approach against multiple baselines under
different scenarios; it shows state-of-the-art performance, and possesses
potential values for large-scale real-world applications.
Hexiang Hu, Zhiwei Deng, Guang-Tong Zhou, Fei Sha, Greg Mori
Comments: Pre-prints
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Semantic segmentation requires a detailed labeling of image pixels by object
category. Information derived from local image patches is necessary to describe
the detailed shape of individual objects. However, this information is
ambiguous and can result in noisy labels. Global inference of image content can
instead capture the general semantic concepts present. We advocate that
holistic inference of image concepts provides valuable information for detailed
pixel labeling. We propose a generic framework to leverage holistic information
in the form of a LabelBank for pixel-level segmentation.
We show the ability of our framework to improve semantic segmentation
performance in a variety of settings. We learn models for extracting a holistic
LabelBank from visual cues, attributes, and/or textual descriptions. We
demonstrate improvements in semantic segmentation accuracy on standard datasets
across a range of state-of-the-art segmentation architectures and holistic
inference approaches.
Haonan Yu, Haichao Zhang, Wei Xu
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We tackle a task where an agent learns to navigate in a 2D maze-like
environment called XWORLD. In each session, the agent perceives a sequence of
raw-pixel frames, a natural language command issued by a teacher, and a set of
rewards. The agent learns the teacher’s language from scratch in a grounded and
compositional manner, such that after training it is able to correctly execute
zero-shot commands: 1) the combination of words in the command never appeared
before, and/or 2) the command contains new object concepts that are learned
from another task but never learned from navigation. Our deep framework for the
agent is trained end to end: it learns simultaneously the visual
representations of the environment, the syntax and semantics of the language,
and the action module that outputs actions. The zero-shot learning capability
of our framework results from its compositionality and modularity with
parameter tying. We visualize the intermediate outputs of the framework,
demonstrating that the agent truly understands how to solve the problem. We
believe that our results provide some preliminary insights on how to train an
agent with similar abilities in a 3D environment.
Hossein Hosseini, Baicen Xiao, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Despite the rapid progress of the techniques for image classification, video
annotation has remained a challenging task. Automated video annotation would be
a breakthrough technology, enabling users to search within the videos.
Recently, Google introduced the Cloud Video Intelligence API for video
analysis. As per the website, the system “separates signal from noise, by
retrieving relevant information at the video, shot or per frame.” A
demonstration website has been also launched, which allows anyone to select a
video for annotation. The API then detects the video labels (objects within the
video) as well as shot labels (description of the video events over time). In
this paper, we examine the usability of Google’s Cloud Video Intelligence
API in adversarial environments. In particular, we investigate whether an
adversary can manipulate a video in such a way that the API will return only
the adversary-desired labels. To this end, we select an image that is different
from the content of the video and insert it, periodically and at a very low
rate, into the video. We found that if we insert one image every two seconds,
the API is deceived into annotating the entire video as if it only contains the
inserted image. Note that the modification to the video is hardly noticeable
as, for instance, for a typical frame rate of 25, we insert only one image per
50 video frames. We also found that, by inserting one image per second, all the
shot labels returned by the API are related to the inserted image. We perform
the experiments on the sample videos provided by the API demonstration website
and show that our attack is successful with different videos and images.
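The attack is mechanically simple; below is a sketch with OpenCV (paths
hypothetical) that overwrites one frame in every fifty with the adversary's
image, matching the one-image-per-two-seconds rate described above at 25 fps.

```python
# Sketch: periodically overwrite frames of a video with an adversarial image.
import cv2

def inject(video_path, image_path, out_path, period=50):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    img = cv2.resize(cv2.imread(image_path), (w, h))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(img if i % period == 0 else frame)  # one image per 50 frames
        i += 1
    cap.release(); out.release()
```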
Yanhai Gan, Huifang Chi, Ying Gao, Jun Liu, Guoqiang Zhong, Junyu Dong
Comments: 7 pages, 4 figures, icme2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper investigates a novel task of generating texture images from
perceptual descriptions. Previous work on texture generation focused on either
synthesis from examples or generation from procedural models; generating
textures from perceptual attributes has not been well studied yet. Meanwhile,
perceptual attributes such as directionality, regularity and roughness are
important factors for human observers when describing a texture. In this
paper, we propose a joint deep network model that combines adversarial
training and perceptual feature regression for texture generation, requiring
only random noise and user-defined perceptual attributes as input. In this
model, a pre-trained convolutional neural network is integrated with the
adversarial framework, which can drive the generated textures to possess the
given perceptual attributes. An important aspect of the proposed model is that,
if we change one of the input perceptual features, the corresponding appearance
of the generated textures will also be changed. We design several experiments
to validate the effectiveness of the proposed method. The results show that the
proposed method can produce high quality texture images with desired perceptual
properties.
Rui Zhao, Haider Ali, Patrick van der Smagt
Comments: 8 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
The recognition of actions from video sequences has many applications in
health monitoring, assisted living, surveillance, and smart homes. Despite
advances in sensing, in particular related to 3D video, the methodologies to
process the data are still subject to research. We demonstrate superior results
by a system which combines recurrent neural networks with convolutional neural
networks in a voting approach. The gated-recurrent-unit-based neural networks
are particularly well-suited to distinguish actions based on long-term
information from optical tracking data; the 3D-CNNs focus more on detailed,
recent information from video data. The resulting features are merged in an SVM
which then classifies the movement. With this architecture, our method
improves on the recognition rates of state-of-the-art methods by 14% on
standard data sets.
Reza Mosayebi, Vahid Jamali, Nafiseh Ghoroghchian, Robert Schober, Masoumeh Nasiri-Kenari, Mahdieh Mehrabi
Comments: 30 pages, 9 figures
Subjects: Information Theory (cs.IT)
In this paper, we consider abnormality detection via diffusive molecular
communications (MCs) for a network consisting of several sensors and a fusion
center (FC). If a sensor detects an abnormality, it injects into the medium a
number of molecules which is proportional to the sensed value. Two transmission
schemes for releasing molecules into the medium are considered. In the first
scheme, referred to as DTM, each sensor releases a different type of molecule,
whereas in the second scheme, referred to as STM, all sensors release the same
type of molecule. The molecules released by the sensors propagate through the
MC channel and some may reach the FC where the final decision regarding whether
or not an abnormality has occurred is made. We derive the optimal decision
rules for both DTM and STM. However, the optimal detectors entail high
computational complexity as log-likelihood ratios (LLRs) have to be computed.
To overcome this issue, we show that the optimal decision rule for STM can be
transformed into an equivalent low-complexity decision rule. Since a similar
transformation is not possible for DTM, we propose simple low-complexity
sub-optimal detectors based on different approximations of the LLR. The
proposed low-complexity detectors are more suitable for practical MC systems
than the original complex optimal decision rule, particularly when the FC is a
nano-machine with limited computational capabilities. Furthermore, we analyze
the performance of the proposed detectors in terms of their false alarm and
missed detection probabilities. Simulation results verify our analytical
derivations and reveal interesting insights regarding the trade-off between
complexity and performance of the proposed detectors and the considered DTM and
STM schemes.
Albrecht Wolf, Philipp Schulz, David Öhmann, Meik Dörpinghaus, Gerhard Fettweis
Subjects: Information Theory (cs.IT)
Multi-connectivity is considered to be key for enabling reliable
transmissions and enhancing data rates in future wireless networks. In this
work, we quantify the communication performance by the outage probability and
the system throughput. We establish a remarkably simple, yet accurate
analytical framework based on distributed source coding to describe the outage
probability and the system throughput as functions of the number of links, the
modulation scheme, the code rate, the bandwidth, and the received
signal-to-noise ratio (SNR). It is known that a tradeoff exists between the
outage probability and the system throughput. To investigate this tradeoff we
define two modes to either achieve low outage probabilities or high system
throughput which we refer to as the diversity and the multiplexing mode,
respectively. For the diversity mode, we compare three signal processing
schemes and show the SNR gain of joint decoding in comparison to maximum
selection combining and maximum ratio combining while achieving the same outage
probability. We then establish a diversity-multiplexing tradeoff analysis based
on time sharing between both modes. Additionally, we apply our analytical
framework to real field channel measurements and thereby illustrate the
potential of multi-connectivity in real cellular networks to achieve high
reliability or high throughput.
Thomas Strohmer, Ke Wei
Subjects: Information Theory (cs.IT)
Assume we are given a sum of linear measurements of (s) different rank-(r)
matrices of the form (y = sum_{k=1}^{s} mathcal{A}_k ({X}_k)). When and under
which conditions is it possible to extract (demix) the individual matrices
({X}_k) from the single measurement vector ({y})? And can we do the demixing
numerically efficiently? We present two computationally efficient algorithms
based on hard thresholding to solve this low rank demixing problem. We prove
that under suitable conditions these algorithms are guaranteed to converge to
the correct solution at a linear rate. We discuss applications in connection
with quantum tomography and the Internet-of-Things. Numerical simulations
demonstrate empirically the performance of the proposed algorithms.
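A sketch of the hard-thresholding idea under assumed details (random Gaussian
operators, illustrative step size and iteration count): alternate a
residual-fitting gradient step on each unknown with a truncated-SVD
projection back onto the rank-r set. Convergence for a given instance depends
on the conditions established in the paper.

```python
# Sketch: iterative hard thresholding for low-rank demixing of
# y = sum_k A_k(X_k).
import numpy as np

rng = np.random.default_rng(0)
n, r, s, m = 20, 2, 2, 1200       # matrix size, rank, #matrices, #measurements
A = [rng.normal(size=(m, n * n)) / np.sqrt(m) for _ in range(s)]
X_true = [rng.normal(size=(n, r)) @ rng.normal(size=(r, n)) for _ in range(s)]
y = sum(A[k] @ X_true[k].ravel() for k in range(s))

def hard_threshold(M, r):
    """Project onto the set of rank-r matrices via a truncated SVD."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r]

X = [np.zeros((n, n)) for _ in range(s)]
for _ in range(300):
    resid = y - sum(A[k] @ X[k].ravel() for k in range(s))
    for k in range(s):
        step = (A[k].T @ resid).reshape(n, n)   # negative gradient direction
        X[k] = hard_threshold(X[k] + 0.3 * step, r)
print([round(np.linalg.norm(X[k] - X_true[k]), 3) for k in range(s)])
```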
Emmanuel Abbe
Subjects: Probability (math.PR); Computational Complexity (cs.CC); Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and provides generally a fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest of
achieving the limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, classical and
nonbacktracking spectral methods. A few open problems are also discussed.
Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
We develop differentially private hypothesis testing methods for the small
sample regime. Given a sample (cal D) from a categorical distribution (p) over
some domain (Sigma), an explicitly described distribution (q) over (Sigma),
some privacy parameter (varepsilon), accuracy parameter (alpha), and
requirements (eta_{rm I}) and (eta_{rm II}) for the type I and type II errors
of our test, the goal is to distinguish between (p=q) and
(d_{rm TV}(p,q) geq alpha).
We provide theoretical bounds for the sample size (|{cal D}|) so that our
method both satisfies ((varepsilon,0))-differential privacy, and guarantees
type I and type II errors (eta_{rm I}) and (eta_{rm II}). We show that
differential privacy may come for free in some regimes of parameters, and we
always beat the sample complexity resulting from running the (chi^2)-test with
noisy counts, or standard approaches such as repetition for endowing
non-private (chi^2)-style statistics with differential privacy guarantees. We
experimentally compare the sample complexity of our method to that of recently
proposed methods for private hypothesis testing.
Anurag Anshu, Rahul Jain, Naqueeb Ahmad Warsi
Comments: version 1, 1 figure, 18 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
In this work we consider a quantum generalization of the task considered by
Slepian and Wolf [1973] regarding distributed source compression. In our task
Alice, Bob, Charlie and Referee share a joint pure state. Alice and Bob wish to
send a part of their respective systems to Charlie without collaborating with
each other. We give achievability bounds for this task in the one-shot setting
and provide asymptotic analysis in the case when there is no side information
with Charlie.
Our result implies the result of Abeyesinghe, Devetak, Hayden and Winter
[2009] who studied a special case of this problem. As another special case
wherein Bob holds trivial registers, we recover the result of Devetak and Yard
[2008] regarding quantum state redistribution.