Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel
Comments: Under review as a conference paper at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
To enhance developer productivity, all modern integrated development
environments (IDEs) include code suggestion functionality that proposes likely
next tokens at the cursor. While current IDEs work well for statically-typed
languages, their reliance on type annotations means that they do not provide
the same level of support for dynamic programming languages as for
statically-typed languages. Moreover, suggestion engines in modern IDEs do not
propose expressions or multi-statement idiomatic code. Recent work has shown
that language models can improve code suggestion systems by learning from
software repositories. This paper introduces a neural language model with a
sparse pointer network aimed at capturing very long-range dependencies. We
release a large-scale code suggestion corpus of 41M lines of Python code
crawled from GitHub. On this corpus, we found standard neural language models
to perform well at suggesting local phenomena, but struggle to refer to
identifiers that are introduced many tokens in the past. By augmenting a neural
language model with a pointer network specialized in referring to predefined
classes of identifiers, we obtain a much lower perplexity and a 5 percentage
points increase in accuracy for code suggestion compared to an LSTM baseline.
In fact, this increase in code suggestion accuracy is due to a 13 times more
accurate prediction of identifiers. Furthermore, a qualitative analysis shows
this model indeed captures interesting long-range dependencies, like referring
to a class member defined over 60 tokens in the past.
Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We survey results on neural network expressivity described in “On the
Expressive Power of Deep Neural Networks”. The paper motivates and develops
three natural measures of expressiveness, which all display an exponential
dependence on the depth of the network. In fact, all of these measures are
related to a fourth quantity, trajectory length. This quantity grows
exponentially in the depth of the network, and is responsible for the depth
sensitivity observed. These results translate to consequences for networks
during and after training. They suggest that parameters earlier in a network
have greater influence on its expressive power — in particular, given a layer,
its influence on expressivity is determined by the remaining depth of the
network after that layer. This is verified with experiments on MNIST and
CIFAR-10. We also explore the effect of training on the input-output map, and
find that it trades off between the stability and expressivity.
Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park
Comments: Presented at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN). Continuation of arXiv:1608.08021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In object detection, reducing computational cost is as important as improving
accuracy for most practical usages. This paper proposes a novel network
structure, which is an order of magnitude lighter than other state-of-the-art
networks while maintaining the accuracy. Based on the basic principle of more
layers with less channels, this new deep neural network minimizes its
redundancy by adopting recent innovations including C.ReLU and Inception
structure. We also show that this network can be trained efficiently to achieve
solid results on well-known object detection benchmarks: 84.9% and 84.2% mAP on
VOC2007 and VOC2012 while the required compute is less than 10% of the recent
ResNet-101.
Ari Seff, Jianxiong Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Today’s autonomous vehicles rely extensively on high-definition 3D maps to
navigate the environment. While this approach works well when these maps are
completely up-to-date, safe autonomous vehicles must be able to corroborate the
map’s information via a real time sensor-based system. Our goal in this work is
to develop a model for road layout inference given imagery from on-board
cameras, without any reliance on high-definition maps. However, no sufficient
dataset for training such a model exists. Here, we leverage the availability of
standard navigation maps and corresponding street view images to construct an
automatically labeled, large-scale dataset for this complex scene understanding
problem. By matching road vectors and metadata from navigation maps with Google
Street View images, we can assign ground truth road layout attributes (e.g.,
distance to an intersection, one-way vs. two-way street) to the images. We then
train deep convolutional networks to predict these road layout attributes given
a single monocular RGB image. Experimental evaluation demonstrates that our
model learns to correctly infer the road attributes using only panoramas
captured by car-mounted cameras as input. Additionally, our results indicate
that this method may be suitable to the novel application of recommending
safety improvements to infrastructure (e.g., suggesting an alternative speed
limit for a street).
Gurkirt Singh, Suman Saha, Fabio Cuzzolin
Comments: Submitted to CVPR 2017 for review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a method for multiple spatiotemporal (S/T) action localisation,
classification and early prediction based on a single deep learning platform,
able to perform in an online fashion and in real time. Our online action
detection pipeline comprises of three main steps. In step one, two end-to-end
trainable SSD (Single Shot MultiBox Detector) convolutional neural networks are
employed to regress and classify detection boxes in each video frame
potentially containing an action of interest, one from optical flow and one
from RBG images. In step two, appearance and motion cues are combined by
merging the detection boxes and classification scores generated by the two
networks. In step three, sequences of detection boxes most likely to be
associated with a single action instance, called `action tubes’, are
constructed incrementally and efficiently in an online fashion, allowing the
system to perform early action class prediction and spatial localisation in
real time. Empirical results on the challenging UCF101 and J-HMDB-21 datasets
demonstrate new state-of-the-art results in S/T action localisation on UCF101,
even when compared to offline competitors, and superior results on early action
label prediction across the board. To the best of our knowledge, ours is the
first platform able to perform online spatial and temporal action localisation.
The system works in real time (30fps) when computing optical flow
incrementally, perform better than real time (52fps) when processing only RBG
images.
Eric Heim, Alexander Seitel, Christian Stock, Fabian Isensee, Lena Maier-Hein
Comments: 13 pages, 12 figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
With the rapidly increasing interest in machine learning based solutions for
automatic image annotation, the availability of reference image annotations for
algorithm training is one of the major bottlenecks in the field. Crowdsourcing
has evolved as a valuable option for low-cost and large-scale data annotation;
however, quality control remains a major issue which needs to be addressed. To
our knowledge, we are the first to investigate the analysis of clickstreams to
improve crowd-based image segmentation. Our method involves training a
regressor to estimate a worker’s segmentation performance from the clickstream
data, where the performance is represented by the DICE similarity coefficient
(DSC) comparing the worker’s annotation and the reference segmentation. The DSC
prediction can be used to identify spammers and weight individual workers by
their (estimated) skill when merging multiple segmentations of one image. Using
a total of 20,000 crowd annotations performed on publicly available data of
cars and cats, we show that (1) our method is highly accurate in predicting the
DSC based on clickstream data and thus outperforms state-of-the-art methods for
merging multiple annotations and that (2) the regressor does not need to be
trained on the object class (e.g. cars) that it is applied to (e.g. cats). In
contrast to previously proposed methods, our approach does not rely on
additional sanity tasks and can thus be regarded as a low-cost option for spam
control and confidence analysis in the context of crowd-based image annotation.
Xiaolong Ma, Xiatian Zhu, Shaogang Gong, Xudong Xie, Jianming Hu, Kin-Man Lam, Yisheng Zhong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most existing person re-identification (ReID) methods rely only on the
spatial appearance information from either one or multiple person images,
whilst ignore the space-time cues readily available in video or image-sequence
data. Moreover, they often assume the availability of exhaustively labelled
cross-view pairwise data for every camera pair, making them non-scalable to
ReID applications in real-world large scale camera networks. In this work, we
introduce a novel video based person ReID method capable of accurately matching
people across views from arbitrary unaligned image-sequences without any
labelled pairwise data. Specifically, we introduce a new space-time person
representation by encoding multiple granularities of spatio-temporal dynamics
in form of time series. Moreover, a Time Shift Dynamic Time Warping (TS-DTW)
model is derived for performing automatically alignment whilst achieving data
selection and matching between inherently inaccurate and incomplete sequences
in a unified way. We further extend the TS-DTW model for accommodating multiple
feature-sequences of an image-sequence in order to fuse information from
different descriptions. Crucially, this model does not require pairwise
labelled training data (i.e. unsupervised) therefore readily scalable to large
scale camera networks of arbitrary camera pairs without the need for exhaustive
data annotation for every camera pair. We show the effectiveness and advantages
of the proposed method by extensive comparisons with related state-of-the-art
approaches using two benchmarking ReID datasets, PRID2011 and iLIDS-VID.
Vardan Papyan, Ronen Talmon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Consider a set of multiple, multimodal sensors capturing a complex system or
a physical phenomenon of interest. Our primary goal is to distinguish the
underlying sources of variability manifested in the measured data. The first
step in our analysis is to find the common source of variability present in all
sensor measurements. We base our work on a recent paper, which tackles this
problem with alternating diffusion (AD). In this work, we suggest to further
the analysis by extracting the sensor-specific variables in addition to the
common source. We propose an algorithm, which we analyze theoretically, and
then demonstrate on three different applications: a synthetic example, a toy
problem, and the task of fetal ECG extraction.
Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas, Matej Kristan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Short-term tracking is an open and challenging problem for which
discriminative correlation filters (DCF) have shown excellent performance. We
introduce the channel and spatial reliability concepts to DCF tracking and
provide a novel learning algorithm for its efficient and seamless integration
in the filter update and the tracking process. The spatial reliability map
adjusts the filter support to the part of the object suitable for tracking.
This both allows to enlarge the search region and improves tracking of
non-rectangular objects. Reliability scores reflect channel-wise quality of the
learned filters and are used as feature weighting coefficients in localization.
Experimentally, with only two simple standard features, HoGs and Colornames,
the novel CSR-DCF method — DCF with Channel and Spatial Reliability —
achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF
runs in real-time on a CPU.
Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek
Journal-ref: NIPS Workshop on Adversarial Training, Dec 2016, Barcelona, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Adversarial training has been shown to produce state of the art results for
generative image modeling. In this paper we propose an adversarial training
approach to train semantic segmentation models. We train a convolutional
semantic segmentation network along with an adversarial network that
discriminates segmentation maps coming either from the ground truth or from the
segmentation network. The motivation for our approach is that it can detect and
correct higher-order inconsistencies between ground truth segmentation maps and
the ones produced by the segmentation net. Our experiments show that our
adversarial training approach leads to improved accuracy on the Stanford
Background and PASCAL VOC 2012 datasets.
Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Michael M. Bronstein
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning has achieved a remarkable performance breakthrough in several
fields, most notably in speech recognition, natural language processing, and
computer vision. In particular, convolutional neural network (CNN)
architectures currently produce state-of-the-art performance on a variety of
image analysis tasks such as object detection and recognition. Most of deep
learning research has so far focused on dealing with 1D, 2D, or 3D
Euclidean-structured data such as acoustic signals, images, or videos.
Recently, there has been an increasing interest in geometric deep learning,
attempting to generalize deep learning methods to non-Euclidean structured data
such as graphs and manifolds, with a variety of applications from the domains
of network analysis, computational social science, or computer graphics. In
this paper, we propose a unified framework allowing to generalize CNN
architectures to non-Euclidean domains (graphs and manifolds) and learn local,
stationary, and compositional task-specific features. We show that various
non-Euclidean CNN methods previously proposed in the literature can be
considered as particular instances of our framework. We test the proposed
method on standard tasks from the realms of image-, graph- and 3D shape
analysis and show that it consistently outperforms previous approaches.
Huan Lei, Guang Jiang, Long Quan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Information about the illuminant color is well contained in both achromatic
regions and the specular components of highlight regions. In this paper, we
propose a novel way to achieve color constancy by exploiting such clues. The
key to our approach lies in the use of suitably extracted derivative colors,
which are able to compute the illuminant color robustly with kernel density
estimation. While extracting derivative colors from achromatic regions to
approximate the illuminant color well is basically straightforward, the success
of our extraction in highlight regions is attributed to the different rates of
variation of the diffuse and specular magnitudes in the dichromatic reflection
model. The proposed approach requires no training phase and is simple to
implement. More significantly, it performs quite satisfactorily under
inter-database parameter settings. Our experiments on three standard databases
demonstrate its effectiveness and fine performance in comparison to
state-of-the-art methods.
Shuochen Su, Mauricio Delbracio, Jue Wang, Guillermo Sapiro, Wolfgang Heidrich, Oliver Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Motion blur from camera shake is a major problem in videos captured by
hand-held devices. Unlike single-image deblurring, video-based approaches can
take advantage of the abundant information that exists across neighboring
frames. As a result the best performing methods rely on aligning nearby frames.
However, aligning images is a computationally expensive and fragile procedure,
and methods that aggregate information must therefore be able to identify which
regions have been accurately aligned and which have not, a task which requires
high level scene understanding. In this work, we introduce a deep learning
solution to video deblurring, where a CNN is trained end-to-end to learn how to
accumulate information across frames. To train this network, we collected a
dataset of real videos recorded with a high framerate camera, which we use to
generate synthetic motion blur for supervision. We show that the features
learned from this dataset extend to deblurring motion blur that arises due to
camera shake in a wide range of videos, and compare the quality of results to a
number of other baselines.
Samitha Herath, Mehrtash Harandi, Fatih Porikli
Comments: 23 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper introduces a learning scheme to construct a Hilbert space (i.e., a
vector space along its inner product) to address both unsupervised and
semi-supervised domain adaptation problems. This is achieved by learning
projections from each domain to a latent space along the Mahalanobis metric of
the latent space to simultaneously minimizing a notion of domain variance while
maximizing a measure of discriminatory power. In particular, we make use of the
Riemannian optimization techniques to match statistical properties (e.g., first
and second order statistics) between samples projected into the latent space
from different domains. Upon availability of class labels, we further deem
samples sharing the same label to form more compact clusters while pulling away
samples coming from different classes.We extensively evaluate and contrast our
proposal against state-of-the-art methods for the task of visual domain
adaptation using both handcrafted and deep-net features. Our experiments show
that even with a simple nearest neighbor classifier, the proposed method can
outperform several state-of-the-art methods benefitting from more involved
classification schemes.
Tobias Pohlen, Alexander Hermans, Markus Mathias, Bastian Leibe
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic image segmentation is an essential component of modern autonomous
driving systems, as an accurate understanding of the surrounding scene is
crucial to navigation and action planning. Current state-of-the-art approaches
in semantic image segmentation rely on pre-trained networks that were initially
developed for classifying images as a whole. While these networks exhibit
outstanding recognition performance (i.e., what is visible?), they lack
localization accuracy (i.e., where precisely is something located?). Therefore,
additional processing steps have to be performed in order to obtain
pixel-accurate segmentation masks at the full image resolution. To alleviate
this problem we propose a novel ResNet-like architecture that exhibits strong
localization and recognition performance. We combine multi-scale context with
pixel-level accuracy by using two processing streams within our network: One
stream carries information at the full image resolution, enabling precise
adherence to segment boundaries. The other stream undergoes a sequence of
pooling operations to obtain robust features for recognition. The two streams
are coupled at the full image resolution using residuals. Without additional
processing steps and without pre-training, our approach achieves an
intersection-over-union score of 71.8% on the Cityscapes dataset.
Min Bai, Raquel Urtasun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most contemporary approaches to instance segmentation use complex pipelines
involving conditional random fields, recurrent neural networks, object
proposals, or template matching schemes. In our paper, we present a simple yet
powerful end-to-end convolutional neural network to tackle this task. Our
approach combines intuitions from the classical watershed transform and modern
deep learning to produce an energy map of the image where object instances are
unambiguously represented as basins in the energy map. We then perform a cut at
a single energy level to directly yield connected components corresponding to
object instances. Our model achieves an increase of 75% in performance on the
challenging Cityscapes Instance Level Segmentation task over the
state-of-the-art.
Alexander Kirillov, Evgeny Levinkov, Bjoern Andres, Bogdan Savchynskyy, Carsten Rother
Comments: The code would be released at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work addresses the task of instance-aware semantic segmentation. Our key
motivation is to design a simple method with a new modelling-paradigm, which
therefore has a different trade-off between advantages and disadvantages
compared to known approaches. Our approach, we term InstanceCut, represents the
problem by two output modalities: (i) an instance-agnostic semantic
segmentation and (ii) all instance-boundaries. The former is computed from a
standard convolutional neural network for semantic segmentation, and the latter
is derived from a new instance-aware edge detection model. To reason globally
about the optimal partitioning of an image into instances, we combine these two
modalities into a novel MultiCut formulation. We evaluate our approach on the
challenging CityScapes dataset. Despite the conceptual simplicity of our
approach, we achieve the best result among all published methods, and perform
particularly well for rare object classes.
Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Object detection is a challenging task in visual understanding domain, and
even more so if the supervision is to be weak. Recently, few efforts to handle
the task without expensive human annotations is established by promising deep
neural network. A new architecture of cascaded networks is proposed to learn a
convolutional neural network (CNN) under such conditions. We introduce two such
architectures, with either two cascade stages or three which are trained in an
end-to-end pipeline. The first stage of both architectures extracts best
candidate of class specific region proposals by training a fully convolutional
network. In the case of the three stage architecture, the middle stage provides
object segmentation, using the output of the activation maps of first stage.
The final stage of both architectures is a part of a convolutional neural
network that performs multiple instance learning on proposals extracted in the
previous stage(s). Our experiments on the PASCAL VOC 2007, 2010, 2012 and large
scale object datasets, ILSVRC 2013, 2014 datasets show improvements in the
areas of weakly-supervised object detection, classification and localization.
Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma
Comments: In submission at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel method for temporally pooling frames in a video for the
task of human action recognition. The method is motivated by the observation
that there are only a small number of frames which, together, contain
sufficient information to discriminate an action class present in a video, from
the rest. The proposed method learns to pool such discriminative and
informative frames, while discarding a majority of the non-informative frames
in a single temporal scan of the video. Our algorithm does so by continuously
predicting the discriminative importance of each video frame and subsequently
pooling them in a deep learning framework. We show the effectiveness of our
proposed pooling method on standard benchmarks where it consistently improves
on baseline pooling methods, with both RGB and optical flow based Convolutional
networks. Further, in combination with complementary video representations, we
show results that are competitive with respect to the state-of-the-art results
on two challenging and publicly available benchmark datasets.
Andrea Palazzi, Francesco Solera, Simone Calderara, Stefano Alletto, Rita Cucchiara
Comments: submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Despite the advent of autonomous cars, it’s likely – at least in the near
future – that human attention will still maintain a central role as a guarantee
in terms of legal responsibility during the driving task. In this paper we
study the dynamics of the driver’s gaze and use it as a proxy to understand
related attentional mechanisms. First, we build our analysis upon two
questions: where and what the driver is looking at? Second, we model the
driver’s gaze by training a coarse-to-fine convolutional network on short
sequences extracted from the DR(eye)VE dataset. Experimental comparison against
different baselines reveal that the driver’s gaze can indeed be learnt to some
extent, despite i) being highly subjective and ii) having only one driver’s
gaze available for each sequence due to the irreproducibility of the scene.
Eventually, we advocate for a new assisted driving paradigm which suggests to
the driver, with no intervention, where she should focus her attention.
Nikolay Jetchev, Urs Bergmann, Roland Vollgraf
Comments: submitted at NIPS 2016 adversarial learning workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Generative adversarial networks (GANs) are a recent approach to train
generative models of data, which have been shown to work particularly well on
image data. In the current paper we introduce a new model for texture synthesis
based on GAN learning. By extending the input noise distribution space from a
single vector to a whole spatial tensor, we create an architecture with
properties well suited to the task of texture synthesis, which we call spatial
GAN (SGAN). To our knowledge, this is the first successful completely
data-driven texture synthesis method based on GANs.
Our method has the following features which make it a state of the art
algorithm for texture synthesis: high image quality of the generated textures,
very high scalability w.r.t. the output texture size, fast real-time forward
generation, the ability to fuse multiple diverse source images in complex
textures. To illustrate these capabilities we present multiple experiments with
different classes of texture images and use cases. We also discuss some
limitations of our method with respect to the types of texture images it can
synthesize, and compare it to other neural techniques for texture generation.
Piotr Koniusz, Yusuf Tas, Fatih Porikli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose an approach to the domain adaptation, dubbed
Second- or Higher-order Transfer of Knowledge (So-HoT), based on the mixture of
alignments of second- or higher-order scatter statistics between the source and
target domains. The human ability to learn from few labeled samples is a
recurring motivation in the literature for domain adaptation. Towards this end,
we investigate the supervised target scenario for which few labeled target
training samples per category exist. Specifically, we utilize two CNN streams:
the source and target networks fused at the classifier level. Features from the
fully connected layers fc7 of each network are used to compute second- or even
higher-order scatter tensors; one per network stream per class. As the source
and target distributions are somewhat different despite being related, we align
the scatters of the two network streams of the same class (within-class
scatters) to a desired degree with our bespoke loss while maintaining good
separation of the between-class scatters. We train the entire network in
end-to-end fashion. We provide evaluations on the standard Office benchmark
(visual domains), RGB-D combined with Caltech256 (depth-to-rgb transfer) and
Pascal VOC2007 combined with the TU Berlin dataset (image-to-sketch transfer).
We attain state-of-the-art results.
Naila Murray, Hervé Jégou, Florent Perronnin, Andrew Zisserman
Comments: Accepted as regular paper in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider the design of an image representation that embeds and aggregates
a set of local descriptors into a single vector. Popular representations of
this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When
two such image representations are compared with the dot-product, the
image-to-image similarity can be interpreted as a match kernel. In match
kernels, one has to deal with interference, i.e. with the fact that even if two
descriptors are unrelated, their matching score may contribute to the overall
similarity.
We formalise this problem and propose two related solutions, both aimed at
equalising the individual contributions of the local descriptors in the final
representation. These methods modify the aggregation stage by including a set
of per-descriptor weights. They differ by the objective function that is
optimised to compute those weights. The first is a “democratisation” strategy
that aims at equalising the relative importance of each descriptor in the set
comparison metric. The second one involves equalising the match of a single
descriptor to the aggregated vector.
These concurrent methods give a substantial performance boost over the state
of the art in image search with short or mid-size vectors, as demonstrated by
our experiments on standard public image retrieval benchmarks.
Pedro A. Marín-Reyes, Javier Lorenzo-Navarro, Modesto Castrillón-Santana
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Color based re-identification methods usually rely on a distance function to
measure the similarity between individuals. In this paper we study the behavior
of several histogram distance measures in different color spaces. We wonder
whether there is a particular histogram distance measure better than others,
likewise also, if there is a color space that present better discrimination
features. Several experiments are designed and evaluated in several images to
obtain measures against various color spaces. We test in several image
databases. A measure ranking is generated to calculate the area under the CMC,
this area is the indicator used to evaluate which distance measure and color
space present the best performance for the considered databases. Also, other
parameters such as the image division in horizontal stripes and number of
histogram bins, have been studied.
Raghavendra Selvan, Jens Petersen, Jesper H. Pedersen, Marleen de Bruijne
Comments: 12 pages. Presented at the MICCAI Pulmonary Image Analysis Workshop, Athens, Greece, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Knowledge of airway tree morphology has important clinical applications in
diagnosis of chronic obstructive pulmonary disease. We present an automatic
tree extraction method based on multiple hypothesis tracking and template
matching for this purpose and evaluate its performance on chest CT images. The
method is adapted from a semi-automatic method devised for vessel segmentation.
Idealized tubular templates are constructed that match airway probability
obtained from a trained classifier and ranked based on their relative
significance. Several such regularly spaced templates form the local hypotheses
used in constructing a multiple hypothesis tree, which is then traversed to
reach decisions. The proposed modifications remove the need for local
thresholding of hypotheses as decisions are made entirely based on statistical
comparisons involving the hypothesis tree. The results show improvements in
performance when compared to the original method and region growing on
intensity images. We also compare the method with region growing on the
probability images, where the presented method does not show substantial
improvement, but we expect it to be less sensitive to local anomalies in the
data.
Shengyong Ding, Junyu Wu, Wei Xu, Hongyang Chao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Training data are critical in face recognition systems. However, labeling a
large scale face data for a particular domain is very tedious. In this paper,
we propose a method to automatically and incrementally construct datasets from
massive weakly labeled data of the target domain which are readily available on
the Internet under the help of a pretrained face model. More specifically,
given a large scale weakly labeled dataset in which each face image is
associated with a label, i.e. the name of an identity, we create a graph for
each identity with edges linking matched faces verified by the existing model
under a tight threshold. Then we use the maximal subgraph as the cleaned data
for that identity. With the cleaned dataset, we update the existing face model
and use the new model to filter the original dataset to get a larger cleaned
dataset. We collect a large weakly labeled dataset containing 530,560 Asian
face images of 7,962 identities from the Internet, which will be published for
the study of face recognition. By running the filtering process, we obtain a
cleaned datasets (99.7+% purity) of size 223,767 (recall 70.9%). On our testing
dataset of Asian faces, the model trained by the cleaned dataset achieves
recognition rate 93.1%, which obviously outperforms the model trained by the
public dataset CASIA whose recognition rate is 85.9%.
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many signal processing problems involve data whose underlying structure is
non-Euclidean, but may be modeled as a manifold or (combinatorial) graph. For
instance, in social networks, the characteristics of users can be modeled as
signals on the vertices of the social graph. Sensor networks are graph models
of distributed interconnected sensors, whose readings are modelled as
time-dependent signals on the vertices. In genetics, gene expression data are
modeled as signals defined on the regulatory network. In neuroscience, graph
models are used to represent anatomical and functional structures of the brain.
Modeling data given as points in a high-dimensional Euclidean space using
nearest neighbor graphs is an increasingly popular trend in data science,
allowing practitioners access to the intrinsic structure of the data. In
computer graphics and vision, 3D objects are modeled as Riemannian manifolds
(surfaces) endowed with properties such as color texture. Even more complex
examples include networks of operators, e.g., functional correspondences or
difference operators in a collection of 3D shapes, or orientations of
overlapping cameras in multi-view vision (“structure from motion”) problems.
The complexity of geometric data and the availability of very large datasets
(in the case of social networks, on the scale of billions) suggest the use of
machine learning techniques. In particular, deep learning has recently proven
to be a powerful tool for problems with large datasets with underlying
Euclidean structure. The purpose of this paper is to overview the problems
arising in relation to geometric deep learning and present solutions existing
today for this class of problems, as well as key difficulties and future
research directions.
Junyu Wu, Shengyong Ding, Wei Xu, Hongyang Chao
Comments: 10 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep models have achieved impressive performance for face hallucination
tasks. However, we observe that directly feeding the hallucinated facial images
into recog- nition models can even degrade the recognition performance despite
the much better visualization quality. In this paper, we address this problem
by jointly learning a deep model for two tasks, i.e. face hallucination and
recognition. In particular, we design an end-to-end deep convolution network
with hallucination sub-network cascaded by recognition sub-network. The
recognition sub- network are responsible for producing discriminative feature
representations using the hallucinated images as inputs generated by
hallucination sub-network. During training, we feed LR facial images into the
network and optimize the parameters by minimizing two loss items, i.e. 1) face
hallucination loss measured by the pixel wise difference between the ground
truth HR images and network-generated images; and 2) verification loss which is
measured by the classification error and intra-class distance. We extensively
evaluate our method on LFW and YTF datasets. The experimental results show that
our method can achieve recognition accuracy 97.95% on 4x down-sampled LFW
testing set, outperforming the accuracy 96.35% of conventional face recognition
model. And on the more challenging YTF dataset, we achieve recognition accuracy
90.65%, a margin over the recognition accuracy 89.45% obtained by conventional
face recognition model on the 4x down-sampled version.
Bo Li
Comments: Submitted to ICRA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
2D fully convolutional network has been recently successfully applied to
object detection from images. In this paper, we extend the fully convolutional
network based detection techniques to 3D and apply it to point cloud data. The
proposed approach is verified on the task of vehicle detection from lidar point
cloud for autonomous driving. Experiments on the KITTI dataset shows a
significant performance improvement over the previous point cloud based
detection approaches.
Hexiang Hu, Zhiwei Deng, Guang-tong Zhou, Fei Sha, Greg Mori
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation requires a detailed labeling of image pixels by object
category. Information derived from local image patches is necessary to describe
the detailed shape of individual objects. However, this information is
ambiguous and can result in noisy labels. Global inference of image content can
instead capture the general semantic concepts present. We advocate that
high-recall holistic inference of image concepts provides valuable information
for detailed pixel labeling. We build a two-stream neural network architecture
that facilitates information flow from holistic information to local pixels,
while keeping common image features shared among the low-level layers of both
the holistic analysis and segmentation branches. We empirically evaluate our
network on four standard semantic segmentation datasets. Our network obtains
state-of-the-art performance on PASCAL-Context and NYUDv2, and ablation studies
verify its effectiveness on ADE20K and SIFT-Flow.
Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh
Comments: Submitted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a realtime approach for multi-person 2D pose estimation that
predicts vector fields, which we refer to as Part Affinity Fields (PAFs), that
directly expose the association between anatomical parts in an image. The
architecture is designed to jointly learn part locations and their association,
via two branches of the same sequential prediction process. The sequential
prediction enables the part confidence maps and the association fields to
encode global context, while allowing an efficient bottom-up parsing step that
maintains tractable runtime complexity. Our method has set the state-of-the-art
performance on the inaugural MSCOCO 2016 keypoints challenge, and significantly
exceeds the previous state-of-the-art result on the MPII Multi-Person
benchmark, both in performance and efficiency.
Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)
A Semantic Compositional Network (SCN) is developed for image captioning, in
which semantic concepts (i.e., tags) are detected from the image, and the
probability of each tag is used to compose the parameters in a long short-term
memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an
ensemble of tag-dependent weight matrices. The degree to which each member of
the ensemble is used to generate an image caption is tied to the
image-dependent probability of the corresponding tag. In addition to captioning
images, we also extend the SCN to generate captions for video clips. We
qualitatively analyze semantic composition in SCNs, and quantitatively evaluate
the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text.
Experimental results show that the proposed method significantly outperforms
prior state-of-the-art approaches, across multiple evaluation metrics.
Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Comments: 23 pages
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We introduce GuessWhat?!, a two-player guessing game as a testbed for
research on the interplay of computer vision and dialogue systems. The goal of
the game is to locate an unknown object in a rich image scene by asking a
sequence of questions. Higher-level image understanding, like spatial reasoning
and language grounding, is required to solve the proposed task. Our key
contribution is the collection of a large-scale dataset consisting of 150K
human-played games with a total of 800K visual question-answer pairs on 66K
images. We explain our design decisions in collecting the dataset and introduce
the oracle and questioner tasks that are associated with the two players of the
game. We prototyped deep learning models to establish initial baselines of the
introduced tasks.
Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille
Comments: Appears in NIPS 2016. The datasets introduced in this work will be gradually released on the project page
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we focus on training and evaluating effective word embeddings
with both text and visual information. More specifically, we introduce a
large-scale dataset with 300 million sentences describing over 40 million
images crawled and downloaded from publicly available Pins (i.e. an image with
sentence descriptions uploaded by users) on Pinterest. This dataset is more
than 200 times larger than MS COCO, the standard large-scale image dataset with
sentence descriptions. In addition, we construct an evaluation dataset to
directly assess the effectiveness of word embeddings in terms of finding
semantically similar or related words and phrases. The word/phrase pairs in
this evaluation dataset are collected from the click data with millions of
users in an image search system, thus contain rich semantic relationships.
Based on these datasets, we propose and compare several Recurrent Neural
Networks (RNNs) based multimodal (text and image) models. Experiments show that
our model benefits from incorporating the visual information into the word
embeddings, and a weight sharing strategy is crucial for learning such
multimodal embeddings. The project page is:
this http URL
Xin Li, Alex Belianinov, Stephen Jesse, Chiwoo Park
Comments: 26 Pages, 12 Figures, 2 Tables
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)
This paper presents a regularized regression model with two-level structural
sparsity penalties and applies it for locating individual atoms in a noisy
electron microscope image. For crystalline materials, the locations of atoms
have spatial symmetries, forming a few regular lattice groups. Therefore, by
simply estimating the underlying lattice groups seen in the image, one can
locate most atoms in the image accurately. Identifying the few underlying
lattice groups is formulated as a sparse group selection problem. On the other
hand, some positions on the lattice groups can be vacant due to atomic defects,
so simply finding the lattice groups may result in many false detections on the
vacant positions. To minimize such false detections, the proposed model
includes an individual sparsity regularization in addition to the group
sparsity for a within-group selection, which results in a regularization
regression model with two-level sparsities. We propose a modification of the
group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to
solve the problem. The convergence analysis and statistical analysis of the
proposed algorithm are presented. The proposed algorithm is also evaluated
through numerical experiments with two simulated images and three real images.
Sulabh Kumra, Christopher Kanan
Comments: Submitted to CVPR 2017
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Deep learning has significantly advanced computer vision and natural language
processing. While there have been some successes in robotics using deep
learning, deep learning has not been widely adopted. In this paper, we present
a novel robotic grasp detection system that predicts the best grasping pose of
a parallel-plate robotic gripper for novel objects using the RGB-D image of the
scene. The proposed model uses a deep convolutional neural network to extract
features from the scene and then uses a shallow convolutional neural network to
predict the graspability of the object of interest for a specific position and
orientation. Our multi-modal model achieved an accuracy of 88.96% and runs at
real-time speeds. This redefines the state-ofthe- art for robotic grasp
detection.
Till Mossakowski, Fabian Neuhaus
Subjects: Artificial Intelligence (cs.AI)
This paper discusses the semantics of weighted argumentation graphs that are
biplor, i.e. contain both attacks and support graphs. The work builds on
previous work by Amgoud, Ben-Naim et. al., which presents and compares several
semantics for argumentation graphs that contain only supports or only attacks
relationships, respectively.
Florentin Smarandache, Surapati Pramanik (Editors)
Comments: 424 pages
Journal-ref: Pons asbl, Brussels, 2016
Subjects: Artificial Intelligence (cs.AI)
Neutrosophic theory and applications have been expanding in all directions at
an astonishing rate especially after the introduction the journal entitled
Neutrosophic Sets and Systems. New theories, techniques, algorithms have been
rapidly developed. One of the most striking trends in the neutrosophic theory
is the hybridization of neutrosophic set with other potential sets such as
rough set, bipolar set, soft set, hesitant fuzzy set, etc. The different hybrid
structure such as rough neutrosophic set, single valued neutrosophic rough set,
bipolar neutrosophic set, single valued neutrosophic hesitant fuzzy set, etc.
are proposed in the literature in a short period of time. Neutrosophic set has
been a very important tool in all various areas of data mining, decision
making, e-learning, engineering, medicine, social science, and some more. The
book New Trends in Neutrosophic Theories and Applications focuses on theories,
methods, algorithms for decision making and also applications involving
neutrosophic information. Some topics deal with data mining, decision making,
e-learning, graph theory, medical diagnosis, probability theory, topology, and
some more.
Nhien Pham Hoang Bao, Hiroyuki Iida
Comments: 10 pages
Subjects: Artificial Intelligence (cs.AI)
This paper explores a novel way for analyzing the tournament structures to
find a best suitable one for the tournament under consideration. It concerns
about three aspects such as tournament conducting cost, competitiveness
development and ranking precision. It then proposes a new method using progress
tree to detect potential throwaway matches. The analysis performed using the
proposed method reveals the strengths and weaknesses of tournament structures.
As a conclusion, single elimination is best if we want to qualify one winner
only, all matches conducted are exciting in term of competitiveness. Double
elimination with proper seeding system is a better choice if we want to qualify
more winners. A reasonable number of extra matches need to be conducted in
exchange of being able to qualify top four winners. Round-robin gives reliable
ranking precision for all participants. However, its conduction cost is very
high, and it fails to maintain competitiveness development.
Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Comments: 23 pages
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We introduce GuessWhat?!, a two-player guessing game as a testbed for
research on the interplay of computer vision and dialogue systems. The goal of
the game is to locate an unknown object in a rich image scene by asking a
sequence of questions. Higher-level image understanding, like spatial reasoning
and language grounding, is required to solve the proposed task. Our key
contribution is the collection of a large-scale dataset consisting of 150K
human-played games with a total of 800K visual question-answer pairs on 66K
images. We explain our design decisions in collecting the dataset and introduce
the oracle and questioner tasks that are associated with the two players of the
game. We prototyped deep learning models to establish initial baselines of the
introduced tasks.
Bjørn Magnus Mathisen, Peter Haro, Bård Hanssen, Sara Björk, Ståle Walderhaug
Subjects: Artificial Intelligence (cs.AI)
Decision support systems help decision makers make better decisions in the
face of complex decision problems (e.g. investment or policy decisions).
Fisheries and Aquaculture is a domain where decision makers face such decisions
since they involve factors from many different scientific fields. No systematic
overview of literature describing decision support systems and their
application in fisheries and aquaculture has been conducted. This paper
summarizes scientific literature that describes decision support systems
applied to the domain of Fisheries and Aquaculture. We use an established
systematic mapping survey method to conduct our literature mapping. Our
research questions are: What decision support systems for fisheries and
aquaculture exists? What are the most investigated fishery and aquaculture
decision support systems topics and how have these changed over time? Do any
current DSS for fisheries provide real- time analytics? Do DSSes in Fisheries
and Aquaculture build their models using machine learning done on captured and
grounded data? The paper then detail how we employ the systematic mapping
method in answering these questions. This results in 27 papers being identified
as relevant and gives an exposition on the primary methods concluded in the
study for designing a decision support system. We provide an analysis of the
research done in the studies collected. We discovered that most literature does
not consider multiple aspects for multiple stakeholders in their work. In
addition we observed that little or no work has been done with real-time
analysis in these decision support systems.
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell
Subjects: Artificial Intelligence (cs.AI)
It is clear that one of the primary tools we can use to mitigate the
potential risk from a misbehaving AI system is the ability to turn the system
off. As the capabilities of AI systems improve, it is important to ensure that
such systems do not adopt subgoals that prevent a human from switching them
off. This is a challenge because many formulations of rational agents create
strong incentives for self-preservation. This is not caused by a built-in
instinct, but because a rational agent will maximize expected utility and
cannot achieve whatever objective it has been given if it is dead. Our goal is
to study the incentives an agent has to allow itself to be switched off. We
analyze a simple game between a human H and a robot R, where H can press R’s
off switch but R can disable the off switch. A traditional agent takes its
reward function for granted: we show that such agents have an incentive to
disable the off switch, except in the special case where H is perfectly
rational. Our key insight is that for R to want to preserve its off switch, it
needs to be uncertain about the utility associated with the outcome, and to
treat H’s actions as important observations about that utility. (R also has no
incentive to switch itself off in this setting.) We conclude that giving
machines an appropriate level of uncertainty about their objectives leads to
safer designs, and we argue that this setting is a useful generalization of the
classical AI paradigm of rational agents.
Jiani Zhang, Xingjian Shi, Irwin King, Dit-Yan Yeung
Comments: Part of the paper is under review in WWW 2017
Subjects: Artificial Intelligence (cs.AI)
The goal of knowledge tracing is to model students’ mastering levels of
underlying knowledge concepts, termed knowledge state, based on students’
exercise performance data. However, existing methods, such as Bayesian
Knowledge Tracing (BKT) or Deep Knowledge Tracing (DKT), either require costly
human-labeled concept annotations or fail to exactly pinpoint which concepts a
student is good at or unfamiliar with. To solve these problems, in this paper
we introduce a new model called Dynamic Key-Value Memory Network (DKVMN) that
can learn representations using nonlinear transformations and directly output a
student’s mastering level of each concept. Unlike standard Memory-Augmented
Neural Networks (MANNs) that facilitate a single memory matrix or two static
memory matrices, our model has one static matrix called key that stores the
knowledge concepts and the other dynamic matrix called value that stores and
updates corresponding concepts’ mastery levels. Experiments show that our DKVMN
model, which is trained end-to-end, consistently outperforms the
state-of-the-art model on a range of knowledge tracing data-sets. We also
illustrate that the learned DKVMN can automatically discover underlying
concepts of the exercises which are typically performed by human annotations,
and depict a student’s changing knowledge state.
Guangming Lang
Comments: It enriches the fuzzy covering rough set theory
Subjects: Artificial Intelligence (cs.AI)
In digital-based information boom, the fuzzy covering rough set model is an
important mathematical tool for artificial intelligence, and how to build the
bridge between the fuzzy covering rough set theory and Pawlak’s model is
becoming a hot research topic. In this paper, we first present the
(gamma-)fuzzy covering based probabilistic and grade approximation operators
and double-quantitative approximation operators. We also study the
relationships among the three types of (gamma-)fuzzy covering based
approximation operators. Second, we propose the (gamma^{ast}-)fuzzy coverings
based multi-granulation probabilistic and grade lower and upper approximation
operators and multi-granulation double-quantitative lower and upper
approximation operators. We also investigate the relationships among these
types of (gamma-)fuzzy coverings based approximation operators. Finally, we
employ several examples to illustrate how to construct the lower and upper
approximations of fuzzy sets with the absolute and relative quantitative
information.
Zhibei Ma, Kai Yin, Lantao Liu, Gaurav S. Sukhatme
Subjects: Artificial Intelligence (cs.AI)
We consider an orienteering problem (OP) where an agent needs to visit a
series (possibly a subset) of depots, from which the maximal accumulated
profits are desired within given limited time budget. Different from most
existing works where the profits are assumed to be static, in this work we
investigate a variant that has time-dependent profits. Specifically, the
profits to be collected change over time and they follow different (e.g.,
independent) time-varying functions. The problem is essentially NP-hard. To
tackle the challenge, we present a simple and effective framework that
incorporates time-variations into the fundamental planning process.
Specifically, we propose a deterministic spatio-temporal representation where
both spatial description and temporal logic are unified into one routing
topology. By employing existing basic sorting and searching algorithms, the
routing solutions can be computed in an extremely efficient way. The proposed
method is easy to implement and extensive numerical results show that our
approach is time efficient and generates near-optimal solutions.
Muhammad Yousefnezhad, Daoqiang Zhang
Comments: Published in the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Multivariate Pattern (MVP) classification can map different cognitive states
to the brain tasks. One of the main challenges in MVP analysis is validating
the generated results across subjects. However, analyzing multi-subject fMRI
data requires accurate functional alignments between neuronal activities of
different subjects, which can rapidly increase the performance and robustness
of the final results. Hyperalignment (HA) is one of the most effective
functional alignment methods, which can be mathematically formulated by the
Canonical Correlation Analysis (CCA) methods. Since HA mostly uses the
unsupervised CCA techniques, its solution may not be optimized for MVP
analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into
CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel
supervised HA method, which can provide better functional alignment for MVP
analysis. Indeed, the locality is defined based on the stimuli categories in
the train-set, where the correlation between all stimuli in the same category
will be maximized and the correlation between distinct categories of stimuli
approaches to near zero. Experimental studies on multi-subject MVP analysis
confirm that the LDHA method achieves superior performance to other
state-of-the-art HA algorithms.
Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel
Comments: Under review as a conference paper at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
To enhance developer productivity, all modern integrated development
environments (IDEs) include code suggestion functionality that proposes likely
next tokens at the cursor. While current IDEs work well for statically-typed
languages, their reliance on type annotations means that they do not provide
the same level of support for dynamic programming languages as for
statically-typed languages. Moreover, suggestion engines in modern IDEs do not
propose expressions or multi-statement idiomatic code. Recent work has shown
that language models can improve code suggestion systems by learning from
software repositories. This paper introduces a neural language model with a
sparse pointer network aimed at capturing very long-range dependencies. We
release a large-scale code suggestion corpus of 41M lines of Python code
crawled from GitHub. On this corpus, we found standard neural language models
to perform well at suggesting local phenomena, but struggle to refer to
identifiers that are introduced many tokens in the past. By augmenting a neural
language model with a pointer network specialized in referring to predefined
classes of identifiers, we obtain a much lower perplexity and a 5 percentage
points increase in accuracy for code suggestion compared to an LSTM baseline.
In fact, this increase in code suggestion accuracy is due to a 13 times more
accurate prediction of identifiers. Furthermore, a qualitative analysis shows
this model indeed captures interesting long-range dependencies, like referring
to a class member defined over 60 tokens in the past.
Peter Wittek, Christian Gogolin
Comments: 8 pages, 1 figure
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
Markov logic networks (MLNs) reconcile two opposing schools in machine
learning and artificial intelligence: causal networks, which account for
uncertainty extremely well, and first-order logic, which allows for formal
deduction. An MLN is essentially a first-order logic template to generate
Markov networks. Inference in MLNs is probabilistic and it is often performed
by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling.
An MLN has many regular, symmetric structures that can be exploited at both
first-order level and in the generated Markov network. We analyze the graph
structures that are produced by various lifting methods and investigate the
extent to which quantum protocols can be used to speed up Gibbs sampling with
state preparation and measurement schemes. We review different such approaches,
discuss their advantages, theoretical limitations, and their appeal to
implementations. We find that a straightforward application of a recent result
yields exponential speedup compared to classical heuristics in approximate
probabilistic inference, thereby demonstrating another example where advanced
quantum resources can potentially prove useful in machine learning.
Jung-Su Ha, Han-Lim Choi
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
This work presents a multiscale framework to solve an inverse reinforcement
learning (IRL) problem for continuous-time/state stochastic systems. We take
advantage of a diffusion wavelet representation of the associated Markov chain
to abstract the state space. This not only allows for effectively handling the
large (and geometrically complex) decision space but also provides more
interpretable representations of the demonstrated state trajectories and also
of the resulting policy of IRL. In the proposed framework, the problem is
divided into the global and local IRL, where the global approximation of the
optimal value functions are obtained using coarse features and the local
details are quantified using fine local features. An illustrative numerical
example on robot path control in a complex environment is presented to verify
the proposed method.
Zheqian Chen, Chi Zhang, Zhou Zhao, Deng Cai
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Community based question answering platforms have attracted substantial users
to share knowledge and learn from each other. As the rapid enlargement of CQA
platforms, quantities of overlapped questions emerge, which makes users
confounded to select a proper reference. It is urgent for us to take effective
automated algorithms to reuse historical questions with corresponding answers.
In this paper we focus on the problem with question retrieval, which aims to
match historical questions that are relevant or semantically equivalent to
resolve one s query directly. The challenges in this task are the lexical gaps
between questions for the word ambiguity and word mismatch problem.
Furthermore, limited words in queried sentences cause sparsity of word
features. To alleviate these challenges, we propose a novel framework named
HNIL which encodes not only the question contents but also the askers social
interactions to enhance the question embedding performance. More specifically,
we apply random walk based learning method with recurrent neural network to
match the similarities between askers question and historical questions
proposed by other users. Extensive experiments on a large scale dataset from a
real world CQA site show that employing the heterogeneous social network
information outperforms the other state of the art solutions in this task.
Zheqian Chen, Ben Gao, Huimin Zhang, Zhou Zhao, Deng Cai
Comments: draft for www
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Community based question answering services have arisen as a popular
knowledge sharing pattern for netizens. With abundant interactions among users,
individuals are capable of obtaining satisfactory information. However, it is
not effective for users to attain answers within minutes. Users have to check
the progress over time until the satisfying answers submitted. We address this
problem as a user personalized satisfaction prediction task. Existing methods
usually exploit manual feature selection. It is not desirable as it requires
careful design and is labor intensive. In this paper, we settle this issue by
developing a new multiple instance deep learning framework. Specifically, in
our settings, each question follows a weakly supervised learning multiple
instance learning assumption, where its obtained answers can be regarded as
instance sets and we define the question resolved with at least one
satisfactory answer. We thus design an efficient framework exploiting multiple
instance learning property with deep learning to model the question answer
pairs. Extensive experiments on large scale datasets from Stack Exchange
demonstrate the feasibility of our proposed framework in predicting askers
personalized satisfaction. Our framework can be extended to numerous
applications such as UI satisfaction Prediction, multi armed bandit problem,
expert finding and so on.
Jiwei Li, Will Monroe, Dan Jurafsky
Subjects: Computation and Language (cs.CL)
In this paper, we propose a simple, fast decoding algorithm that fosters
diversity in neural generation. The algorithm modifies the standard beam search
algorithm by adding an inter-sibling ranking penalty, favoring choosing
hypotheses from diverse parents. We evaluate the proposed model on the tasks of
dialogue response generation, abstractive summarization and machine
translation. We find that diverse decoding helps across all tasks, especially
those for which reranking is needed.
We further propose a variation that is capable of automatically adjusting its
diversity decoding rates for different inputs using reinforcement learning
(RL). We observe a further performance boost from this RL technique. This paper
includes material from the unpublished script “Mutual Information and Diverse
Decoding Improve Neural Machine Translation” (Li and Jurafsky, 2016).
Joji Toyama, Masanori Misono, Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo
Subjects: Computation and Language (cs.CL)
Although attention-based Neural Machine Translation have achieved great
success, attention-mechanism cannot capture the entire meaning of the source
sentence because the attention mechanism generates a target word depending
heavily on the relevant parts of the source sentence. The report of earlier
studies has introduced a latent variable to capture the entire meaning of
sentence and achieved improvement on attention-based Neural Machine
Translation. We follow this approach and we believe that the capturing meaning
of sentence benefits from image information because human beings understand the
meaning of language not only from textual information but also from perceptual
information such as that gained from vision. As described herein, we propose a
neural machine translation model that introduces a continuous latent variable
containing an underlying semantic extracted from texts and images. Our model,
which can be trained end-to-end, requires image information only when training.
Experiments conducted with an English–German translation task show that our
model outperforms over the baseline.
A N Akshatha, Chandana G Upadhyaya, Rajashekara S Murthy
Comments: 7 pages, 10 figures
Subjects: Computation and Language (cs.CL)
Spelling errors are introduced in text either during typing, or when the user
does not know the correct phoneme or grapheme. If a language contains complex
words like sandhi where two or more morphemes join based on some rules, spell
checking becomes very tedious. In such situations, having a spell checker with
sandhi splitter which alerts the user by flagging the errors and providing
suggestions is very useful. A novel algorithm of sandhi splitting is proposed
in this paper. The sandhi splitter can split about 7000 most common sandhi
words in Kannada language used as test samples. The sandhi splitter was
integrated with a Kannada spell checker and a mechanism for generating
suggestions was added. A comprehensive, platform independent, standalone spell
checker with sandhi splitter application software was thus developed and tested
extensively for its efficiency and correctness. A comparative analysis of this
spell checker with sandhi splitter was made and results concluded that the
Kannada spell checker with sandhi splitter has an improved performance. It is
twice as fast, 200 times more space efficient, and it is 90% accurate in case
of complex nouns and 50% accurate for complex verbs. Such a spell checker with
sandhi splitter will be of foremost significance in machine translation
systems, voice processing, etc. This is the first sandhi splitter in Kannada
and the advantage of the novel algorithm is that, it can be extended to all
Indian languages.
Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks (RNNs) have shown promising performance for
language modeling. However, traditional training of RNNs using back-propagation
through time often suffers from overfitting. One reason for this is that
stochastic optimization (used for large training sets) does not provide good
estimates of model uncertainty. This paper leverages recent advances in
stochastic gradient Markov Chain Monte Carlo (also appropriate for large
training sets) to learn weight uncertainty in RNNs. It yields a principled
Bayesian learning algorithm, adding gradient noise during training (enhancing
exploration of the model-parameter space) and model averaging when testing.
Extensive experiments on various RNN models and across a broad range of
applications demonstrate the superiority of the proposed approach over
stochastic optimization.
Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Comments: This paper “Bidirectional LSTM-CRF for Clinical Concept Extraction” is accepted for short paper presentation at Clinical Natural Language Processing Workshop at COLING 2016 Osaka, Japan. December 11, 2016
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
Automated extraction of concepts from patient clinical records is an
essential facilitator of clinical research. For this reason, the 2010 i2b2/VA
Natural Language Processing Challenges for Clinical Records introduced a
concept extraction task aimed at identifying and classifying concepts into
predefined categories (i.e., treatments, tests and problems). State-of-the-art
concept extraction approaches heavily rely on handcrafted features and
domain-specific resources which are hard to collect and define. For this
reason, this paper proposes an alternative, streamlined approach: a recurrent
neural network (the bidirectional LSTM with CRF decoding) initialized with
general-purpose, off-the-shelf word embeddings. The experimental results
achieved on the 2010 i2b2/VA reference corpora using the proposed framework
outperform all recent methods and ranks closely to the best submission from the
original 2010 i2b2/VA challenge.
Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille
Comments: Appears in NIPS 2016. The datasets introduced in this work will be gradually released on the project page
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we focus on training and evaluating effective word embeddings
with both text and visual information. More specifically, we introduce a
large-scale dataset with 300 million sentences describing over 40 million
images crawled and downloaded from publicly available Pins (i.e. an image with
sentence descriptions uploaded by users) on Pinterest. This dataset is more
than 200 times larger than MS COCO, the standard large-scale image dataset with
sentence descriptions. In addition, we construct an evaluation dataset to
directly assess the effectiveness of word embeddings in terms of finding
semantically similar or related words and phrases. The word/phrase pairs in
this evaluation dataset are collected from the click data with millions of
users in an image search system, thus contain rich semantic relationships.
Based on these datasets, we propose and compare several Recurrent Neural
Networks (RNNs) based multimodal (text and image) models. Experiments show that
our model benefits from incorporating the visual information into the word
embeddings, and a weight sharing strategy is crucial for learning such
multimodal embeddings. The project page is:
this http URL
Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel
Comments: Under review as a conference paper at ICLR 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
To enhance developer productivity, all modern integrated development
environments (IDEs) include code suggestion functionality that proposes likely
next tokens at the cursor. While current IDEs work well for statically-typed
languages, their reliance on type annotations means that they do not provide
the same level of support for dynamic programming languages as for
statically-typed languages. Moreover, suggestion engines in modern IDEs do not
propose expressions or multi-statement idiomatic code. Recent work has shown
that language models can improve code suggestion systems by learning from
software repositories. This paper introduces a neural language model with a
sparse pointer network aimed at capturing very long-range dependencies. We
release a large-scale code suggestion corpus of 41M lines of Python code
crawled from GitHub. On this corpus, we found standard neural language models
to perform well at suggesting local phenomena, but struggle to refer to
identifiers that are introduced many tokens in the past. By augmenting a neural
language model with a pointer network specialized in referring to predefined
classes of identifiers, we obtain a much lower perplexity and a 5 percentage
points increase in accuracy for code suggestion compared to an LSTM baseline.
In fact, this increase in code suggestion accuracy is due to a 13 times more
accurate prediction of identifiers. Furthermore, a qualitative analysis shows
this model indeed captures interesting long-range dependencies, like referring
to a class member defined over 60 tokens in the past.
Zheqian Chen, Chi Zhang, Zhou Zhao, Deng Cai
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Community based question answering platforms have attracted substantial users
to share knowledge and learn from each other. As the rapid enlargement of CQA
platforms, quantities of overlapped questions emerge, which makes users
confounded to select a proper reference. It is urgent for us to take effective
automated algorithms to reuse historical questions with corresponding answers.
In this paper we focus on the problem with question retrieval, which aims to
match historical questions that are relevant or semantically equivalent to
resolve one s query directly. The challenges in this task are the lexical gaps
between questions for the word ambiguity and word mismatch problem.
Furthermore, limited words in queried sentences cause sparsity of word
features. To alleviate these challenges, we propose a novel framework named
HNIL which encodes not only the question contents but also the askers social
interactions to enhance the question embedding performance. More specifically,
we apply random walk based learning method with recurrent neural network to
match the similarities between askers question and historical questions
proposed by other users. Extensive experiments on a large scale dataset from a
real world CQA site show that employing the heterogeneous social network
information outperforms the other state of the art solutions in this task.
Zheqian Chen, Ben Gao, Huimin Zhang, Zhou Zhao, Deng Cai
Comments: draft for www
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Community based question answering services have arisen as a popular
knowledge sharing pattern for netizens. With abundant interactions among users,
individuals are capable of obtaining satisfactory information. However, it is
not effective for users to attain answers within minutes. Users have to check
the progress over time until the satisfying answers submitted. We address this
problem as a user personalized satisfaction prediction task. Existing methods
usually exploit manual feature selection. It is not desirable as it requires
careful design and is labor intensive. In this paper, we settle this issue by
developing a new multiple instance deep learning framework. Specifically, in
our settings, each question follows a weakly supervised learning multiple
instance learning assumption, where its obtained answers can be regarded as
instance sets and we define the question resolved with at least one
satisfactory answer. We thus design an efficient framework exploiting multiple
instance learning property with deep learning to model the question answer
pairs. Extensive experiments on large scale datasets from Stack Exchange
demonstrate the feasibility of our proposed framework in predicting askers
personalized satisfaction. Our framework can be extended to numerous
applications such as UI satisfaction Prediction, multi armed bandit problem,
expert finding and so on.
Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)
A Semantic Compositional Network (SCN) is developed for image captioning, in
which semantic concepts (i.e., tags) are detected from the image, and the
probability of each tag is used to compose the parameters in a long short-term
memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an
ensemble of tag-dependent weight matrices. The degree to which each member of
the ensemble is used to generate an image caption is tied to the
image-dependent probability of the corresponding tag. In addition to captioning
images, we also extend the SCN to generate captions for video clips. We
qualitatively analyze semantic composition in SCNs, and quantitatively evaluate
the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text.
Experimental results show that the proposed method significantly outperforms
prior state-of-the-art approaches, across multiple evaluation metrics.
Dhanya R Krishnan
Comments: this http URL
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Most data analytics systems that require low-latency execution and efficient
utilization of computing resources, increasingly adopt two computational
paradigms, namely, incremental and approximate computing. Incremental
computation updates the output incrementally instead of re-computing everything
from scratch for successive runs of a job with input changes. Approximate
computation returns an approximate output for a job instead of the exact
output.
Both paradigms rely on computing over a subset of data items instead of
computing over the entire dataset, but they differ in their means for skipping
parts of the computation. Incremental computing relies on the memoization of
intermediate results of sub-computations, and reusing these memoized results
across jobs for sub-computations that are unaffected by the changed input.
Approximate computing relies on representative sampling of the entire dataset
to compute over a subset of data items.
In this thesis, we make the observation that these two computing paradigms
are complementary, and can be married together! The high level idea is to:
design a sampling algorithm that biases the sample selection to the memoized
data items from previous runs. To concretize this idea, we designed an online
stratified sampling algorithm that uses self-adjusting computation to produce
an incrementally updated approximate output with bounded error. We implemented
our algorithm in a data analytics system called IncAppox based on Apache Spark
Streaming. Our evaluation of the system shows that IncApprox achieves the
benefits of both incremental and approximate computing.
Nadine Boudargham, Jacques Bou Abdo, Jacques Demerjian, Christophe Guyeux, Abdallah Makhoul
Comments: Accepted to AICCSA 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
The rapid development of medical sensors has increased the interest in
Wireless Body Area Network (WBAN) applications where physiological data from
the human body and its environment is gathered, monitored, and analyzed to take
the proper measures. In WBANs, it is essential to design MAC protocols that
ensure adequate Quality of Service (QoS) such as low delay and high
scalability. This paper investigates Medium Access Control (MAC) protocols used
in WBAN, and compares their performance in a high traffic environment. Such
scenario can be induced in case of emergency for example, where physiological
data collected from all sensors on human body should be sent simultaneously to
take appropriate action. This study can also be extended to cover collaborative
WBAN systems where information from different bodies is sent simultaneously
leading to high traffic. OPNET simulations are performed to compare the delay
and scalability performance of the different MAC protocols under the same
experimental conditions and to draw conclusions about the best protocol to be
used in a high traffic environment.
Fabian Reiter
Comments: 9 pages
Subjects: Formal Languages and Automata Theory (cs.FL); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)
We establish the equivalence on finite directed graphs between a class of
asynchronous distributed automata and a small fragment of least fixpoint logic.
More specifically, the logic we consider is (a variant of) the fragment of
modal (mu)-calculus that allows least fixpoints but forbids greatest
fixpoints. The corresponding automaton model uses a network of identical
finite-state machines that communicate in an asynchronous manner and whose
state diagram must be acyclic except for self-loops. Exploiting the connection
with logic, we also prove that the expressive power of these machines is
independent of whether or not messages can be lost.
Jurek Czyzowicz, Konstantinos Georgiou, Evangelos Kranakis, Danny Krizanc, Lata Narayanan, Jaroslav Opatrny, Sunil Shende
Comments: 14 pages
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider the problem of fault-tolerant parallel search on an infinite line
by (n) robots. Starting from the origin, the robots are required to find a
target at an unknown location. The robots can move with maximum speed (1) and
can communicate in wireless mode among themselves. However, among the (n)
robots, there are (f) robots that exhibit {em byzantine faults}. A faulty
robot can fail to report the target even after reaching it, or it can make
malicious claims about having found the target when in fact it has not. Given
the presence of such faulty robots, the search for the target can only be
concluded when the non-faulty robots have sufficient verification that the
target has been found. We aim to design algorithms that minimize the value of
(S_d(n,f)), the time to find a target at a distance (d) from the origin by (n)
robots among which (f) are faulty. We give several different algorithms whose
running time depends on the ratio (f/n), the density of faulty robots, and also
prove lower bounds. Our algorithms are optimal for some densities of faulty
robots.
Guoqiang Zhong, Li-Na Wang, Junyu Dong
Comments: About 20 pages. Submitted to Journal of Finance and Data Science as an invited paper
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Since about 100 years ago, to learn the intrinsic structure of data, many
representation learning approaches have been proposed, including both linear
ones and nonlinear ones, supervised ones and unsupervised ones. Particularly,
deep architectures are widely applied for representation learning in recent
years, and have delivered top results in many tasks, such as image
classification, object detection and speech recognition. In this paper, we
review the development of data representation learning methods. Specifically,
we investigate both traditional feature learning algorithms and
state-of-the-art deep learning models. The history of data representation
learning is introduced, while available resources (e.g. online course, tutorial
and book information) and toolboxes are provided. Finally, we conclude this
paper with remarks and some interesting research directions on data
representation learning.
Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille
Comments: Appears in NIPS 2016. The datasets introduced in this work will be gradually released on the project page
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we focus on training and evaluating effective word embeddings
with both text and visual information. More specifically, we introduce a
large-scale dataset with 300 million sentences describing over 40 million
images crawled and downloaded from publicly available Pins (i.e. an image with
sentence descriptions uploaded by users) on Pinterest. This dataset is more
than 200 times larger than MS COCO, the standard large-scale image dataset with
sentence descriptions. In addition, we construct an evaluation dataset to
directly assess the effectiveness of word embeddings in terms of finding
semantically similar or related words and phrases. The word/phrase pairs in
this evaluation dataset are collected from the click data with millions of
users in an image search system, thus contain rich semantic relationships.
Based on these datasets, we propose and compare several Recurrent Neural
Networks (RNNs) based multimodal (text and image) models. Experiments show that
our model benefits from incorporating the visual information into the word
embeddings, and a weight sharing strategy is crucial for learning such
multimodal embeddings. The project page is:
this http URL
Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
Comments: 11 pages, Thirty-First AAAI conference on Artificial Intelligence
Subjects: Learning (cs.LG)
We study the problem of troubleshooting machine learning systems that rely on
analytical pipelines of distinct components. Understanding and fixing errors
that arise in such integrative systems is difficult as failures can occur at
multiple points in the execution workflow. Moreover, errors can propagate,
become amplified or be suppressed, making blame assignment difficult. We
propose a human-in-the-loop methodology which leverages human intellect for
troubleshooting system failures. The approach simulates potential component
fixes through human computation tasks and measures the expected improvements in
the holistic behavior of the system. The method provides guidance to designers
about how they can best improve the system. We demonstrate the effectiveness of
the approach on an automated image captioning system that has been pressed into
real-world use.
Cristian Rusu, John Thompson
Subjects: Learning (cs.LG)
Given a dataset, the task of learning a transform that allows sparse
representations of the data bears the name of dictionary learning. In many
applications, these learned dictionaries represent the data much better than
the static well-known transforms (Fourier, Hadamard etc.). The main downside of
learned transforms is that they lack structure and therefore they are not
computationally efficient, unlike their classical counterparts. This posses
several difficulties especially when using power limited hardware such as
mobile devices, therefore discouraging the application of sparsity techniques
in such scenarios. In this paper we construct orthonormal and non-orthonormal
dictionaries that are factorized as a product of a few basic transformations.
In the orthonormal case, we solve exactly the dictionary update problem for one
basic transformation, which can be viewed as a generalized Givens rotation, and
then propose to construct orthonormal dictionaries that are a product of these
transformations, guaranteeing their fast manipulation. We also propose a method
to construct fast square but non-orthonormal dictionaries that are factorized
as a product of few transforms that can be viewed as a further generalization
of Givens rotations to the non-orthonormal setting. We show how the proposed
transforms can balance very well data representation performance and
computational complexity. We also compare with classical fast and learned
general and orthonormal transforms.
Cristian Rusu, Nuria Gonzalez-Prelcic, Robert Heath
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Dictionary learning is the task of determining a data-dependent transform
that yields a sparse representation of some observed data. The dictionary
learning problem is non-convex, and usually solved via computationally complex
iterative algorithms. Furthermore, the resulting transforms obtained generally
lack structure that permits their fast application to data. To address this
issue, this paper develops a framework for learning orthonormal dictionaries
which are built from products of a few Householder reflectors. Two algorithms
are proposed to learn the reflector coefficients: one that considers a
sequential update of the reflectors and one with a simultaneous update of all
reflectors that imposes an additional internal orthogonal constraint. The
proposed methods have low computational complexity and are shown to converge to
local minimum points which can be described in terms of the spectral properties
of the matrices involved. The resulting dictionaries balance between the
computational complexity and the quality of the sparse representations by
controlling the number of Householder reflectors in their product. Simulations
of the proposed algorithms are shown in the image processing setting where
well-known fast transforms are available for comparisons. The proposed
algorithms have favorable reconstruction error and the advantage of a fast
implementation relative to the classical, unstructured, dictionaries.
Jung-Su Ha, Han-Lim Choi
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
This work presents a multiscale framework to solve an inverse reinforcement
learning (IRL) problem for continuous-time/state stochastic systems. We take
advantage of a diffusion wavelet representation of the associated Markov chain
to abstract the state space. This not only allows for effectively handling the
large (and geometrically complex) decision space but also provides more
interpretable representations of the demonstrated state trajectories and also
of the resulting policy of IRL. In the proposed framework, the problem is
divided into the global and local IRL, where the global approximation of the
optimal value functions are obtained using coarse features and the local
details are quantified using fine local features. An illustrative numerical
example on robot path control in a complex environment is presented to verify
the proposed method.
Vernon J. Lawhern, Amelia J. Solon, Nicholas R. Waytowich, Stephen M. Gordon, Chou P. Hung, Brent J. Lance
Comments: Under review at IEEE Transactions on Biomedical Engineering
Subjects: Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Objective: Brain-Computer Interface technologies (BCI) enable the direct
communication between humans and computers by analyzing brain measurements,
such as electroencephalography (EEG). These technologies have been applied to a
variety of domains, including neuroprosthetic control and the monitoring of
epileptic seizures. Existing BCI systems primarily use a priori knowledge of
EEG features of interest to build machine learning models. Recently,
convolutional networks have been used for automatic feature extraction of large
image databases, where they have obtained state-of-the-art results. In this
work we introduce EEGNet, a compact fully convolutional network for EEG-based
BCIs developed using Deep Learning approaches. Methods: EEGNet is a 4-layer
convolutional network that uses filter factorization for learning a compact
representation of EEG time series. EEGNet is one of the smallest convolutional
networks to date, having less than 2200 parameters for a binary classification.
Results: We show state-of-the-art classification performance across four
different BCI paradigms: P300 event-related potential, error-related
negativity, movement-related cortical potential, and sensory motor rhythm, with
as few as 500 EEG trials. We also show that adding more trials reduces the
error variance of prediction rather than improving classification performance.
Conclusion: We provide preliminary evidence suggesting that our model can be
used with small EEG databases while improving upon the state-of-the-art
performance across several tasks and across subjects. Significance: The EEGNet
neural network architecture provides state-of-the-art performance across
several tasks and across subjects, challenging the notion that large datasets
are required to obtain optimal performance.
Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose a neural network framework for high-dimensional conditional
density estimation. The Bottleneck Conditional Density Estimator (BCDE) is a
variant of the conditional variational autoencoder (CVAE) that employs layer(s)
of stochastic variables as the bottleneck between the input x and target y,
where both are high-dimensional. The key to effectively train BCDEs is the
hybrid blending of the conditional generative model with a joint generative
model that leverages unlabeled data to regularize the conditional model. We
show that the BCDE significantly outperforms the CVAE in MNIST quadrant
prediction benchmarks in the fully supervised case and establishes new
benchmarks in the semi-supervised case.
Arnak S. Dalalyan, Edwin Grappin, Quentin Paris
Comments: 30 pages, 2 figures
Subjects: Statistics Theory (math.ST); Learning (cs.LG)
In this paper, we study the statistical behaviour of the Exponentially
Weighted Aggregate (EWA) in the problem of high-dimensional regression with
fixed design. Under the assumption that the underlying regression vector is
sparse, it is reasonable to use the Laplace distribution as a prior. The
resulting estimator and, specifically, a particular instance of it referred to
as the Bayesian lasso, was already used in the statistical literature because
of its computational convenience, even though no thorough mathematical analysis
of its statistical properties was carried out. The present work fills this gap
by establishing sharp oracle inequalities for the EWA with the Laplace prior.
These inequalities show that if the temperature parameter is small, the EWA
with the Laplace prior satisfies the same type of oracle inequality as the
lasso estimator does, as long as the quality of estimation is measured by the
prediction loss. Extensions of the proposed methodology to the problem of
prediction with low-rank matrices are considered.
Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Training of one-vs.-rest SVMs can be parallelized over the number of classes
in a straight forward way. Given enough computational resources, one-vs.-rest
SVMs can thus be trained on data involving a large number of classes. The same
cannot be stated, however, for the so-called all-in-one SVMs, which require
solving a quadratic program of size quadratically in the number of classes. We
develop distributed algorithms for two all-in-one SVM formulations (Lee et al.
and Weston and Watkins) that parallelize the computation evenly over the number
of classes. This allows us to compare these models to one-vs.-rest SVMs on
unprecedented scale. The results indicate superior accuracy on text
classification data.
Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Comments: This paper “Bidirectional LSTM-CRF for Clinical Concept Extraction” is accepted for short paper presentation at Clinical Natural Language Processing Workshop at COLING 2016 Osaka, Japan. December 11, 2016
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
Automated extraction of concepts from patient clinical records is an
essential facilitator of clinical research. For this reason, the 2010 i2b2/VA
Natural Language Processing Challenges for Clinical Records introduced a
concept extraction task aimed at identifying and classifying concepts into
predefined categories (i.e., treatments, tests and problems). State-of-the-art
concept extraction approaches heavily rely on handcrafted features and
domain-specific resources which are hard to collect and define. For this
reason, this paper proposes an alternative, streamlined approach: a recurrent
neural network (the bidirectional LSTM with CRF decoding) initialized with
general-purpose, off-the-shelf word embeddings. The experimental results
achieved on the 2010 i2b2/VA reference corpora using the proposed framework
outperform all recent methods and ranks closely to the best submission from the
original 2010 i2b2/VA challenge.
Chen Xu, Zhouchen Lin, Hongbin Zha
Comments: The paper is accepted by AAAI-17. We show that multi-factor matrix factorization enjoys superiority over the traditional two-factor case
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
The Schatten-(p) norm ((0<p<1)) has been widely used to replace the nuclear
norm for better approximating the rank function. However, existing methods are
either 1) not scalable for large scale problems due to relying on singular
value decomposition (SVD) in every iteration, or 2) specific to some (p)
values, e.g., (1/2), and (2/3). In this paper, we show that for any (p), (p_1),
and (p_2 >0) satisfying (1/p=1/p_1+1/p_2), there is an equivalence between the
Schatten-(p) norm of one matrix and the Schatten-(p_1) and the Schatten-(p_2)
norms of its two factor matrices. We further extend the equivalence to multiple
factor matrices and show that all the factor norms can be convex and smooth for
any (p>0). In contrast, the original Schatten-(p) norm for (0<p<1) is
non-convex and non-smooth. As an example we conduct experiments on matrix
completion. To utilize the convexity of the factor matrix norms, we adopt the
accelerated proximal alternating linearized minimization algorithm and
establish its sequence convergence. Experiments on both synthetic and real
datasets exhibit its superior performance over the state-of-the-art methods.
Its speed is also highly competitive.
Muhammad Yousefnezhad, Daoqiang Zhang
Comments: Published in the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
Multivariate Pattern (MVP) classification can map different cognitive states
to the brain tasks. One of the main challenges in MVP analysis is validating
the generated results across subjects. However, analyzing multi-subject fMRI
data requires accurate functional alignments between neuronal activities of
different subjects, which can rapidly increase the performance and robustness
of the final results. Hyperalignment (HA) is one of the most effective
functional alignment methods, which can be mathematically formulated by the
Canonical Correlation Analysis (CCA) methods. Since HA mostly uses the
unsupervised CCA techniques, its solution may not be optimized for MVP
analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into
CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel
supervised HA method, which can provide better functional alignment for MVP
analysis. Indeed, the locality is defined based on the stimuli categories in
the train-set, where the correlation between all stimuli in the same category
will be maximized and the correlation between distinct categories of stimuli
approaches to near zero. Experimental studies on multi-subject MVP analysis
confirm that the LDHA method achieves superior performance to other
state-of-the-art HA algorithms.
Zhe Zhang, Daniel B. Neill
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present a novel subset scan method to detect if a probabilistic binary
classifier has statistically significant bias — over or under predicting the
risk — for some subgroup, and identify the characteristics of this subgroup.
This form of model checking and goodness-of-fit test provides a way to
interpretably detect the presence of classifier bias and poor classifier fit,
not just in one or two dimensions of features of a priori interest, but in the
space of all possible feature subgroups. We use subset scan and parametric
bootstrap methods to efficiently address the difficulty of assessing the
exponentially many possible subgroups. We also suggest several useful
extensions of this method for increasing interpretability of predictive models
and prediction performance.
Wojciech Samek, Grégoire Montavon, Alexander Binder, Sebastian Lapuschkin, Klaus-Robert Müller
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Complex nonlinear models such as deep neural network (DNNs) have become an
important tool for image classification, speech recognition, natural language
processing, and many other fields of application. These models however lack
transparency due to their complex nonlinear structure and to the complex data
distributions to which they typically apply. As a result, it is difficult to
fully characterize what makes these models reach a particular decision for a
given input. This lack of transparency can be a drawback, especially in the
context of sensitive applications such as medical analysis or security. In this
short paper, we summarize a recent technique introduced by Bach et al. [1] that
explains predictions by decomposing the classification decision of DNN models
in terms of input variables.
Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We survey results on neural network expressivity described in “On the
Expressive Power of Deep Neural Networks”. The paper motivates and develops
three natural measures of expressiveness, which all display an exponential
dependence on the depth of the network. In fact, all of these measures are
related to a fourth quantity, trajectory length. This quantity grows
exponentially in the depth of the network, and is responsible for the depth
sensitivity observed. These results translate to consequences for networks
during and after training. They suggest that parameters earlier in a network
have greater influence on its expressive power — in particular, given a layer,
its influence on expressivity is determined by the remaining depth of the
network after that layer. This is verified with experiments on MNIST and
CIFAR-10. We also explore the effect of training on the input-output map, and
find that it trades off between the stability and expressivity.
Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recurrent neural networks (RNNs) have shown promising performance for
language modeling. However, traditional training of RNNs using back-propagation
through time often suffers from overfitting. One reason for this is that
stochastic optimization (used for large training sets) does not provide good
estimates of model uncertainty. This paper leverages recent advances in
stochastic gradient Markov Chain Monte Carlo (also appropriate for large
training sets) to learn weight uncertainty in RNNs. It yields a principled
Bayesian learning algorithm, adding gradient noise during training (enhancing
exploration of the model-parameter space) and model averaging when testing.
Extensive experiments on various RNN models and across a broad range of
applications demonstrate the superiority of the proposed approach over
stochastic optimization.
Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng Gao, Lawrence Carin, Li Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)
A Semantic Compositional Network (SCN) is developed for image captioning, in
which semantic concepts (i.e., tags) are detected from the image, and the
probability of each tag is used to compose the parameters in a long short-term
memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an
ensemble of tag-dependent weight matrices. The degree to which each member of
the ensemble is used to generate an image caption is tied to the
image-dependent probability of the corresponding tag. In addition to captioning
images, we also extend the SCN to generate captions for video clips. We
qualitatively analyze semantic composition in SCNs, and quantitatively evaluate
the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text.
Experimental results show that the proposed method significantly outperforms
prior state-of-the-art approaches, across multiple evaluation metrics.
S.B.Balaji, G.R.Kini, P. Vijay Kumar
Comments: To be submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
In this paper, we derive an upper bound on rate of a code with locality with
sequential recovery from multiple erasures for any (r geq 3) and any (t>0) .
We also give a construction of codes achieving our rate bound for any (r) and
(t in 2mathbb{Z_{+}}).
Martin Haenggi
Comments: 8 pages, 4 figures
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Networking and Internet Architecture (cs.NI); Probability (math.PR)
The point process of concurrent users is critical for the analysis of
cellular networks, in particular for the uplink and for full-duplex
communication. We analyze the properties of two popular models. For the first
one, we provide an accurate characterization of the pair correlation functions
from the user and the base station point of view, which are applied to
approximate the user process by Poisson and Ginibre point processes. For the
second model, which includes the first model asymptotically, we study the cell
vacancy probability, the mean area of vacant and occupied cells, the user-base
station distance, and the pair correlation function in lightly and heavily
loaded regimes.
Vaneet Aggarwal
Subjects: Information Theory (cs.IT)
In this paper, we find the Laplace Stieltjes transform of the probability of
data loss for the k-out-of-n distributed storage system with deterministic
repair times. We consider two repair models, namely the serial and parallel
repair. We show that for failure rate much lower than the repair rate, mean
time of data loss for the two models is the same unlike the case for
exponential repair models.
Vahid Aref, Son T. Le, Henning Buelow
Comments: The paper was presented in European Conference on Optical Communication (ECOC) 2016, Sept. 2016
Subjects: Information Theory (cs.IT)
We report a 3 dB increase in the nonlinear threshold of a 64*0.5Gbaud 16-QAM
continuous-nonlinear-spectrum modulated signal by nonlinear multiplexing with
QPSK modulated multi-solitons, showing the first ever fully nonlinear-spectrum
modulated system in the highly nonlinear regime.
Ni Ding, Chung Chan, Qiaoqiao Zhou, Rodney A. Kennedy, Parastoo Sadeghi
Comments: 26 pages, 14 figures
Subjects: Information Theory (cs.IT)
This paper considers the communication for omniscience (CO) problem: A set of
users observe a discrete memoryless multiple source and want to recover the
entire multiple source via noise-free broadcast communications. We study the
problem of how to attain omniscience with the minimum sum-rate, the total
number of communications, and determine a corresponding optimal rate vector.
The results cover both asymptotic and non-asymptotic models where the
transmission rates are real and integral, respectively. Based on the concepts
of submodularity and Dilworth truncation, we formulate a maximization problem.
The maximum is the highest Slepian-Wolf constraint over all multi-way cuts of
the user set, which determines the minimum sum-rate. For solving this
maximization problem and searching for an optimal rate vector, we propose a
modified decomposition algorithm (MDA) and a sum-rate increment algorithm (SIA)
for asymptotic and non-asymptotic models, respectively, both of which complete
in polynomial time. For solving the Dilworth truncation problem as the
subroutine in both algorithms, we propose a fusion method to implement the
existing coordinate saturation capacity (CoordSatCap) algorithm, where the
submodular function minimization (SFM) is done over a merged user set. We show
by experimental results that this fusion method contributes to a reduction in
computation complexity as compared to the original CoordSatCap algorithm.
Tan Tai Do, Hien Quoc Ngo, Trung Q. Duong, Tobias J. Oechtering, Mikael Skoglund
Comments: Accepted for publication in the IEEE Wireless Communications Letters
Subjects: Information Theory (cs.IT)
This letter proposes anti-jamming strategies based on pilot retransmission
for a single user uplink massive MIMO under jamming attack. A jammer is assumed
to attack the system both in the training and data transmission phases. We
first derive an achievable rate which enables us to analyze the effect of
jamming attacks on the system performance. Counter-attack strategies are then
proposed to mitigate this effect under two different scenarios: random and
deterministic jamming attacks. Numerical results illustrate our analysis and
benefit of the proposed schemes.
Vahid Aref, Narayanan Rengaswamy, Laurent Schmalen
Subjects: Information Theory (cs.IT)
Spatially-Coupled LDPC (SC-LDPC) ensembles have gained significant interest
since they were shown to universally achieve the capacity of binary memoryless
channels under low-complexity belief-propagation decoding. In this work, we
focus on the performance of these ensembles over binary erasure channels
affected additionally by bursts. We assume that the burst can erase either a
complete spatial position, modeling node failures in distributed storage, or
can span across multiple spatial positions. We study the expected performance
of random regular SC-LDPC ensembles on single-burst-erasure channels and
provide tight lower bounds for the block erasure probability ((P_B)) at finite
block length and bounds on the coupling parameter for being asymptotically able
to recover the burst. Subsequently, we show the effect of introducing
additional random erasures to these channels. Finally, we show that expurgation
can bring substantial improvements in the block error rate by analyzing the
minimal stopping sets of some expurgated code ensembles. Our study shows that
for a fixed asymptotic code rate, the combination of increasing the variable
node degree and expurgating the ensemble can improve the block error
probability by several orders of magnitude. All the results are verified using
Monte-Carlo simulations.
Shun Watanabe
Comments: 30 pages, 8 figures
Subjects: Information Theory (cs.IT)
The problem of zero-rate multiterminal hypothesis testing is revisited. A
Neyman-Pearson-like test is proposed and its non-asymptotic performance is
clarified; for short blocklength, it is numerically examined that the proposed
test is superior to a previously known Hoeffding-like test proposed by
Han-Kobayashi. For the large deviation regime, it is shown that our proposed
test achieves the optimal trade-off between the type I and type II exponents
shown by Han-Kobayashi. Among the class of symmetric (type based) testing
schemes, when the type I error probability is non-vanishing, the proposed test
is optimal up to the second-order term of the type II error exponent; the
second-order term is characterized in terms of the variance of the projected
relative entropy density. The information geometry method plays an important
role in the analysis as well as the construction of the test.
Mehmet E. Koroglu, Ibrahim Ozbek, Irfan Siap
Subjects: Information Theory (cs.IT)
In this work, we study cyclic codes that have generators as Fibonacci
polynomials over finite fields. We show that these cyclic codes in most cases
produce families of maximum distance separable and optimal codes with
interesting properties. We explore these relations and present some examples.
Also, we present applications of these codes to secret sharing schemes.
Yi Zhong, Tony Q.S. Quek, Xiaohu Ge
Comments: 30 pages, 8 figures
Subjects: Information Theory (cs.IT)
Emergence of new types of services has led to various traffic and diverse
delay requirements in fifth generation (5G) wireless networks. Meeting diverse
delay requirements is one of the most critical goals for the design of 5G
wireless networks. Though the delay of point-to-point communications has been
well investigated, the delay of multi-point to multi-point communications has
not been thoroughly studied since it is a complicated function of all links in
the network. In this work, we propose a novel tractable approach to analyze the
delay in the heterogenous cellular networks with spatio-temporal random arrival
of traffic. Specifically, we propose the notion of delay outage and evaluated
the effect of different scheduling policies on the delay performance. Our
numerical analysis reveals that offloading policy based on cell range expansion
greatly reduces the macrocell traffic while bringing a small amount of growth
for the picocell traffic. Our results also show that the delay performance of
round-robin scheduling outperforms first in first out scheduling for heavy
traffic, and it is reversed for light traffic. In summary, this analytical
framework provides an understanding and a rule-of-thumb for the practical
deployment of 5G systems where delay requirement is increasingly becoming a key
concern.
Masoumeh Sadeghi, Homa Nikbakht, Amir Masoud Rabiei, Vahid Shah-Mansouri
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
A cooperative medium access control (CoopMAC) network with randomly
distributed helpers is considered. We introduce a new helper selection scheme
which maximizes the average throughput while maintaining a low power
consumption profile in the network. To this end, all transmissions are assumed
to be performed using power control. We assume that each node can estimate the
channel between itself and a receiving node. Then, it evaluates the minimum
transmission power needed to achieve the desired signal to noise ratio (SNR).
If the required transmission power is less than the maximum transmission power
of the node, the communication is regarded as successful. Otherwise, the
transmission is canceled. %Also, we classify the helpers into six groups
according to their transmission rates IEEE 802.11b Standard. In order to
increase the average throughput, we assume that the cooperative link with the
highest transmission rate is chosen from those that can successfully forward
the source signal to destination. Also, when there are several links with the
same rates, the one with minimum required power is given the highest priority.
Assuming that the helpers are distributed as a Poisson point process with fixed
intensity, we derive exact expressions for the average throughput and the power
consumption of the network. Simulation results show that our scheme is able to
significantly increase the throughput in comparison to the conventional CoopMAC
network. It is also able to reduce the power consumption compared to a network
with no power control approach.
Nadine Boudargham, Jacques Bou Abdo, Jacques Demerjian, Christophe Guyeux, Abdallah Makhoul
Comments: Accepted to AICCSA 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
The rapid development of medical sensors has increased the interest in
Wireless Body Area Network (WBAN) applications where physiological data from
the human body and its environment is gathered, monitored, and analyzed to take
the proper measures. In WBANs, it is essential to design MAC protocols that
ensure adequate Quality of Service (QoS) such as low delay and high
scalability. This paper investigates Medium Access Control (MAC) protocols used
in WBAN, and compares their performance in a high traffic environment. Such
scenario can be induced in case of emergency for example, where physiological
data collected from all sensors on human body should be sent simultaneously to
take appropriate action. This study can also be extended to cover collaborative
WBAN systems where information from different bodies is sent simultaneously
leading to high traffic. OPNET simulations are performed to compare the delay
and scalability performance of the different MAC protocols under the same
experimental conditions and to draw conclusions about the best protocol to be
used in a high traffic environment.
Yasser Fadlallah (SOCRATE), Paul Ferrand, Leonardo Cardoso (SOCRATE), Jean-Marie Gorce (SOCRATE)
Journal-ref: IEEE 17th International Workshop onSignal Processing Advances in
Wireless Communications (SPAWC), Jul 2016, Edinburgh, United Kingdom. pp.1 –
5, 2016
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Interference Alignment (IA) is technique that, in a large sense, makes use of
the increasing signal dimensions available in the system through MIMO and OFDM
technologies in order to globally reduce the interference suffered by users in
a network. In this paper, we address the problem of downlink cellular networks,
the so-called interfering broadcast channels, where mobile users at cell edges
may suffer from high interference and thus, poor performance. Starting from the
downlink IA scheme proposed by Suh et al., a new approach is proposed where
each user feeds back multiple selected received signal directions with high
signal-to-interference gain. A exhaustive search based scheduler selects a
subset of users to be served simultaneously, balancing between sum-rate
performance and fairness, but becomes untractable in dense network scenarios
where many users send simultaneous requests. Therefore, we develop a
sub-optimal scheduler that greatly decreases the complexity while preserving a
near-optimal data rate gain. More interestingly, our simulations show that the
IA scheme becomes valuable only in correlated channels, whereas the matched
filtering based scheme performs the best in the uncorrelated scenarios.