Oswald Berthold, Verena Hafner
Comments: Paper submitted to the ICDL-Epirob 2017 conference. Currently under review
Subjects: Neural and Evolutionary Computing (cs.NE)
We propose sensorimotor tappings, a new graphical technique that explicitly
represents relations between the time steps of an agent’s sensorimotor loop and
a single training step of an adaptive model that the agent is using internally.
In the simplest case this is a relation linking two time steps. In realistic
cases these relations can extend over several time steps and over different
sensory channels. The aim is to capture the footprint of information intake
relative to the agent’s current time step. We argue that this view allows us to
make prior considerations explicit and then use them in implementations without
modification once they are established. In the paper we introduce the problem
domain, explain the basic idea, provide example tappings for standard
configurations used in developmental models, and show how tappings can be
applied to problems in related fields.
Justin Lazarow, Long Jin, Zhuowen Tu
Comments: 10 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We study unsupervised learning by developing introspective generative
modeling (IGM) that attains a generator using progressively learned deep
convolutional neural networks. The generator is itself a discriminator, capable
of introspection: being able to self-evaluate the difference between its
generated samples and the given training data. When followed by repeated
discriminative learning, desirable properties of modern discriminative
classifiers are directly inherited by the generator. IGM learns a cascade of
CNN classifiers using a synthesis-by-classification algorithm. In the
experiments, we observe encouraging results on a number of applications
including texture modeling, artistic style transferring, face modeling, and
semi-supervised learning.
Long Jin, Justin Lazarow, Zhuowen Tu
Comments: 11 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper we propose introspective classifier learning (ICL) that
emphasizes the importance of having a discriminative classifier empowered with
generative capabilities. We develop a reclassification-by-synthesis algorithm
to perform training using a formulation stemmed from the Bayes theory. Our
classifier is able to iteratively: (1) synthesize pseudo-negative samples in
the synthesis step; and (2) enhance itself by improving the classification in
the reclassification step. The single classifier learned is at the same time
generative — being able to directly synthesize new samples within its own
discriminative model. We conduct experiments on standard benchmark datasets
including MNIST, CIFAR, and SVHN using state-of-the-art CNN architectures, and
observe improved classification results.
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
Comments: Accepted at IJCAI 2017 (The 26th International Joint Conference on Artificial Intelligence)
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Computer programs written in one language are often required to be ported to
other languages to support multiple devices and environments. When programs use
language specific APIs (Application Programming Interfaces), it is very
challenging to migrate these APIs to the corresponding APIs written in other
languages. Existing approaches mine API mappings from projects that have
corresponding versions in two languages. They rely on the sparse availability
of bilingual projects, thus producing a limited number of API mappings. In this
paper, we propose an intelligent system called DeepAM for automatically mining
API mappings from a large-scale code corpus without bilingual projects. The key
component of DeepAM is based on the multimodal sequence-to-sequence learning
architecture that aims to learn joint semantic representations of bilingual API
sequences from big source code data. Experimental results indicate that DeepAM
significantly increases the accuracy of API mappings as well as the number of
API mappings, when compared with the state-of-the-art approaches.
Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe
Comments: Accepted to CVPR 2017. Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an unsupervised learning framework for the task of monocular depth
and camera motion estimation from unstructured video sequences. We achieve this
by simultaneously training depth and camera pose estimation networks using the
task of view synthesis as the supervisory signal. The networks are thus coupled
via the view synthesis objective during training, but can be applied
independently at test time. Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our approach: 1) monocular depth performing
comparably with supervised methods that use either ground-truth pose or depth
for training, and 2) pose estimation performing favorably with established SLAM
systems under comparable input settings.
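To make the view-synthesis supervision concrete, the sketch below warps a source frame into the target view through a predicted depth map and relative pose and measures the photometric error. It is only an illustration of the training signal, not the authors' implementation: the function names are hypothetical, nearest-neighbour sampling stands in for the differentiable bilinear sampler, and single-channel images are assumed.

```python
import numpy as np

def view_synthesis_loss(I_target, I_source, depth, T_target_to_source, K):
    """Photometric loss from warping I_source into the target view.

    I_target, I_source : (H, W) grayscale images (floats)
    depth              : (H, W) predicted depth for the target view
    T_target_to_source : (4, 4) predicted relative camera pose
    K                  : (3, 3) camera intrinsics
    All names are assumptions of this sketch; nearest-neighbour sampling
    stands in for the differentiable bilinear sampler used in training.
    """
    H, W = I_target.shape
    K_inv = np.linalg.inv(K)

    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])

    # Back-project target pixels to 3D and move them into the source frame.
    cam_points = (K_inv @ pix) * depth.ravel()            # (3, H*W)
    cam_points_h = np.vstack([cam_points, np.ones(H * W)])
    src_points = (T_target_to_source @ cam_points_h)[:3]

    # Project into the source image plane.
    proj = K @ src_points
    src_u = proj[0] / np.clip(proj[2], 1e-6, None)
    src_v = proj[1] / np.clip(proj[2], 1e-6, None)

    # Nearest-neighbour sampling; pixels projecting outside are ignored.
    src_ui = np.round(src_u).astype(int)
    src_vi = np.round(src_v).astype(int)
    valid = (src_ui >= 0) & (src_ui < W) & (src_vi >= 0) & (src_vi < H)

    warped = np.zeros(H * W)
    warped[valid] = I_source[src_vi[valid], src_ui[valid]]

    # Mean absolute photometric error over valid pixels.
    return np.mean(np.abs(warped[valid] - I_target.ravel()[valid]))
```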
Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an approach that uses a multi-camera system to train fine-grained
detectors for keypoints that are prone to occlusion, such as the joints of a
hand. We call this procedure multiview bootstrapping: first, an initial
keypoint detector is used to produce noisy labels in multiple views of the
hand. The noisy detections are then triangulated in 3D using multiview geometry
or marked as outliers. Finally, the reprojected triangulations are used as new
labeled training data to improve the detector. We repeat this process,
generating more labeled data in each iteration. We derive a result analytically
relating the minimum number of views to achieve target true and false positive
rates for a given detector. The method is used to train a hand keypoint
detector for single images. The resulting keypoint detector runs in realtime on
RGB images and has accuracy comparable to methods that use depth sensors. The
single view detector, triangulated over multiple views, enables 3D markerless
hand motion capture with complex object interactions.
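A minimal sketch of one bootstrapping step for a single keypoint might look like the following: noisy 2D detections from several calibrated views are triangulated by linear (DLT) least squares, and views whose reprojection error stays below a threshold contribute new labels. The threshold value and function names are assumptions of this illustration, not the paper's exact procedure.

```python
import numpy as np

def triangulate_dlt(P_list, points_2d):
    """Linear (DLT) triangulation of one keypoint from several views.
    P_list    : list of (3, 4) camera projection matrices
    points_2d : (V, 2) noisy 2D detections, one per view
    Returns the 3D point (non-homogeneous)."""
    A = []
    for P, (x, y) in zip(P_list, points_2d):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

def bootstrap_labels(P_list, points_2d, reproj_thresh=5.0):
    """One multiview-bootstrapping step for a single keypoint:
    triangulate the noisy detections, then keep reprojections whose
    error is below a threshold as new training labels (the threshold
    is an assumption of this sketch)."""
    X = triangulate_dlt(P_list, points_2d)
    X_h = np.append(X, 1.0)
    new_labels, inlier_views = [], []
    for i, (P, uv) in enumerate(zip(P_list, points_2d)):
        proj = P @ X_h
        reproj = proj[:2] / proj[2]
        if np.linalg.norm(reproj - uv) < reproj_thresh:
            inlier_views.append(i)
            new_labels.append(reproj)   # reprojected point becomes a label
    return X, inlier_views, np.asarray(new_labels)
```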
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose SfM-Net, a geometry-aware neural network for motion estimation in
videos that decomposes frame-to-frame pixel motion in terms of scene and object
depth, camera motion and 3D object rotations and translations. Given a sequence
of frames, SfM-Net predicts depth, segmentation, camera and rigid object
motions, converts those into a dense frame-to-frame motion field (optical
flow), differentiably warps frames in time to match pixels and back-propagates.
The model can be trained with various degrees of supervision: 1)
self-supervised by the re-projection photometric error (completely
unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by
depth (e.g., as provided by RGBD sensors). SfM-Net extracts meaningful depth
estimates and successfully estimates frame-to-frame camera rotations and
translations. It often successfully segments the moving objects in the scene,
even though such supervision is never provided.
José Ignacio Orlando, Hugo Luis Manterola, Enzo Ferrante, Federico Ariel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arabidopsis thaliana is a plant species widely utilized by scientists to
estimate the impact of genetic differences in root morphological features. For
this purpose, images of this plant after genetic modifications are taken to
study differences in the root architecture. This task requires manual
segmentation of radicular structures, which is a particularly tedious and
time-consuming labor. In this work, we present an unsupervised method for
Arabidopsis thaliana root segmentation based on morphological operations and
fully-connected Conditional Random Fields. Although other approaches have been
proposed for this purpose, all of them are based on more complex and expensive
imaging modalities. Our results show that our method can be easily applied to
images taken with conventional scanners, with minor user intervention. A first
dataset, our results, and a fully open-source implementation are available
online.
Kuan-Lun Tseng, Yen-Liang Lin, Winston Hsu, Chung-Yang Huang
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning models such as convolutional neural networks have been widely
used in 3D biomedical segmentation and achieve state-of-the-art performance.
However, most of them often adopt a single modality or stack multiple
modalities as different input channels. To better leverage the
multi-modalities, we propose a deep encoder-decoder structure with
cross-modality convolution layers to incorporate different modalities of MRI
data. In addition, we exploit convolutional LSTM to model a sequence of 2D
slices, and jointly learn the multi-modalities and convolutional LSTM in an
end-to-end manner. To avoid converging to certain labels, we adopt a
re-weighting scheme and two-phase training to handle the label imbalance.
Experimental results on BRATS-2015 show that our method outperforms
state-of-the-art biomedical segmentation approaches.
Shaohuai Shi, Xiaowen Chu
Comments: 7 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Rectifier neuron units (ReLUs) have been widely used in deep convolutional
networks. A ReLU converts negative values to zeros, and does not change
positive values, which leads to a high sparsity of neurons. In this work, we
first examine the sparsity of the outputs of ReLUs in some popular deep
convolutional architectures. We then use the sparsity property of ReLUs to
accelerate the calculation of convolution by skipping calculations of
zero-valued neurons. The proposed sparse convolution algorithm achieves
significant speedup on CPUs compared to the traditional matrix-matrix
multiplication algorithm for convolution.
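The idea of exploiting ReLU-induced sparsity can be illustrated with a direct convolution that visits only the non-zero input activations, as in the sketch below. This is a plain NumPy illustration of the principle, not the authors' optimised CPU implementation, and it handles a single channel with stride 1.

```python
import numpy as np

def sparse_conv2d(x, w):
    """Direct 2D convolution (single channel, 'valid' padding, stride 1)
    that iterates only over the non-zero entries of the input feature
    map, e.g. the output of a ReLU layer.

    x : (H, W) input feature map (mostly zeros after ReLU)
    w : (kH, kW) convolution kernel
    """
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))

    # Each non-zero input pixel scatters its contribution to the output
    # positions whose receptive field contains it; zero pixels are skipped.
    ys, xs = np.nonzero(x)
    for i, j in zip(ys, xs):
        v = x[i, j]
        for di in range(kH):
            oi = i - di
            if oi < 0 or oi >= out.shape[0]:
                continue
            for dj in range(kW):
                oj = j - dj
                if 0 <= oj < out.shape[1]:
                    out[oi, oj] += v * w[di, dj]
    return out
```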
Shervin Minaee, Yao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Signal decomposition is a classical problem in signal processing, which aims
to separate an observed signal into two or more components each with its own
property. Usually each component is described by its own subspace or
dictionary. Extensive research has been done for the case where the components
are additive, but in real world applications, the components are often
non-additive. For example, an image may consist of a foreground object overlaid
on a background, where each pixel either belongs to the foreground or the
background. In such a situation, to separate signal components, we need to find
a binary mask which shows the location of each component. This requires solving
a binary optimization problem. Since most binary optimization problems are
intractable, we relax this problem to an approximate continuous problem and
solve it with an alternating optimization technique, which solves for the mask
and the subspace projection of each component alternately and iteratively. We
demonstrate the proposed algorithm on three applications: separation of text
from background in images, separation of moving objects from a background
undergoing global camera motion in videos, and separation of sinusoidal and
spike components in one-dimensional signals. We
demonstrate in each case that considering the non-additive nature of the
problem can lead to significant improvement.
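A minimal sketch of the relaxed alternating scheme is given below: with the mask fixed, the model is linear in the component coefficients and is solved by least squares; with the coefficients fixed, the relaxed mask has a closed-form per-sample update. The notation and the fixed iteration count are assumptions of this illustration, not the paper's exact formulation.

```python
import numpy as np

def decompose(y, A1, A2, n_iters=50):
    """Alternating relaxation for non-additive two-component decomposition.

    Model (a sketch, with hypothetical notation):
        y  ~  m * (A1 c1) + (1 - m) * (A2 c2),   m in [0, 1]^n
    where A1, A2 are the subspaces/dictionaries of the two components and
    m is the relaxed (continuous) version of the binary mask.

    y      : (n,) observed signal
    A1, A2 : (n, k1), (n, k2) component dictionaries
    """
    n = y.shape[0]
    m = np.full(n, 0.5)                      # start with an uncommitted mask
    for _ in range(n_iters):
        # 1) Mask fixed: the model is linear in [c1; c2] -> least squares.
        B = np.hstack([m[:, None] * A1, (1.0 - m)[:, None] * A2])
        c, *_ = np.linalg.lstsq(B, y, rcond=None)
        c1, c2 = c[:A1.shape[1]], c[A1.shape[1]:]
        s1, s2 = A1 @ c1, A2 @ c2

        # 2) Coefficients fixed: the optimal relaxed mask is closed-form
        #    per sample, then clipped back into [0, 1].
        diff = s1 - s2
        safe = np.abs(diff) > 1e-8
        m = np.where(safe, (y - s2) / np.where(safe, diff, 1.0), m)
        m = np.clip(m, 0.0, 1.0)

    return (m > 0.5).astype(int), s1, s2     # hard mask and the two components
```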
Md Zahangir Alom, Mahmudul Hasan, Chris Yakopcic, Tarek M. Taha
Comments: 11 pages, 10 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep convolutional neural networks (DCNNs) are an influential tool for
solving various problems in the machine learning and computer vision fields. In
this paper, we introduce a new deep learning model called an
Inception-Recurrent Convolutional Neural Network (IRCNN), which utilizes the power of an
inception network combined with recurrent layers in DCNN architecture. We have
empirically evaluated the recognition performance of the proposed IRCNN model
using different benchmark datasets such as MNIST, CIFAR-10, CIFAR-100, and
SVHN. Experimental results show similar or higher recognition accuracy when
compared to most of the popular DCNNs including the RCNN. Furthermore, we have
investigated IRCNN performance against equivalent Inception Networks and
Inception-Residual Networks using the CIFAR-100 dataset. We report about 3.5%,
3.47% and 2.54% improvement in classification accuracy when compared to the
RCNN, equivalent Inception Networks, and Inception-Residual Networks on the
augmented CIFAR-100 dataset, respectively.
Lucia Ballerini, Ruggiero Lovreglio, Maria del C. Valdes-Hernandez, Joel Ramirez, Bradley J. MacIntosh, Sandra E. Black, Joanna M. Wardlaw
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Perivascular Spaces (PVS) are a recently recognised feature of Small Vessel
Disease (SVD), also indicating neuroinflammation, and are an important part of
the brain’s circulation and glymphatic drainage system. Quantitative analysis
of PVS on Magnetic Resonance Images (MRI) is important for understanding their
relationship with neurological diseases. In this work, we propose a
segmentation technique based on the 3D Frangi filtering for extraction of PVS
from MRI. Based on prior knowledge from neuroradiological ratings of PVS, we
used ordered logit models to optimise Frangi filter parameters in response to
the variability in the scanner’s parameters and study protocols. We optimized
and validated our proposed models on two independent cohorts, a dementia sample
(N=20) and patients who previously had mild to moderate stroke (N=48). Results
demonstrate the robustness and generalisability of our segmentation method.
Segmentation-based PVS burden estimates correlated with neuroradiological
assessments (Spearman’s \(\rho = 0.74\), \(p < 0.001\)), suggesting the great
potential of our proposed method.
Jeong-Kyun Lee, Jae-Won Yea, Min-Gyu Park, Kuk-Jin Yoon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel method to jointly solve scene layout
estimation and global registration problems for accurate indoor 3D
reconstruction. Given a sequence of range data, we first build a set of scene
fragments using KinectFusion and register them through pose graph optimization.
Afterwards, we alternate between layout estimation and layout-based global
registration processes in an iterative fashion so that they complement each
other. We extract the scene layout through hierarchical agglomerative
clustering and energy-based multi-model fitting in consideration of noisy
measurements. With the estimated scene layout in hand, we register all the
range data through
the global iterative closest point algorithm where the positions of 3D points
that belong to the layout such as walls and a ceiling are constrained to be
close to the layout. We experimentally verify the proposed method with the
publicly available synthetic and real-world datasets in both quantitative and
qualitative ways.
Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu
Comments: ICMEW 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current state-of-the-art approaches to skeleton-based action recognition are
mostly based on recurrent neural networks (RNN). In this paper, we propose a
novel convolutional neural network (CNN)-based framework for both action
classification and detection. Raw skeleton coordinates as well as skeleton
motion are fed directly into the CNN for label prediction. A novel skeleton
transformer module is designed to rearrange and select important skeleton
joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy
on the validation set of the NTU RGB+D dataset. For action detection in untrimmed
videos, we develop a window proposal network to extract temporal segment
proposals, which are further classified within the same network. On the recent
PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large
margin.
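One way to read the skeleton transformer module is as a learnable linear recombination of the input joints applied before the convolutional stages; a hypothetical PyTorch sketch of that reading follows. The tensor layout, joint counts, and module name are assumptions of this illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SkeletonTransformer(nn.Module):
    """Learned linear recombination of skeleton joints: a sketch of a
    module that 'rearranges and selects' joints automatically before the
    convolutional stages. Shapes and the number of output joints are
    assumptions of this illustration."""

    def __init__(self, in_joints=25, out_joints=30):
        super().__init__()
        # One learnable mixing matrix applied along the joint axis.
        self.mix = nn.Linear(in_joints, out_joints, bias=False)

    def forward(self, x):
        # x: (batch, frames, in_joints, channels), e.g. 3D joint coordinates
        x = x.permute(0, 1, 3, 2)            # (b, t, c, in_joints)
        x = self.mix(x)                      # (b, t, c, out_joints)
        return x.permute(0, 1, 3, 2)         # back to (b, t, out_joints, c)

# Usage sketch: coordinates and per-frame motion would each pass through
# such a module, then be fed to an ordinary 2D CNN over (frames, joints).
if __name__ == "__main__":
    clip = torch.randn(8, 32, 25, 3)          # batch of 8 clips, 32 frames
    out = SkeletonTransformer()(clip)
    print(out.shape)                           # torch.Size([8, 32, 30, 3])
```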
Vamsi Kiran Adhikarla, Marek Vinkler, Denis Sumin, Rafał K. Mantiuk
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Light fields have become a popular representation of three dimensional scenes, and
there is interest in their processing, resampling, and compression. As those
operations often result in loss of quality, there is a need to quantify it. In
this work, we collect a new dataset of dense reference and distorted light
fields as well as the corresponding quality scores which are scaled in
perceptual units. The scores were acquired in a subjective experiment using an
interactive light-field viewing setup. The dataset contains typical artifacts
that occur in the light-field processing chain due to light-field reconstruction,
multi-view compression, and limitations of automultiscopic displays. We test a
number of existing objective quality metrics to determine how well they can
predict the quality of light fields. We find that the existing image quality
metrics provide good measures of light-field quality, but require dense
reference light fields for optimal performance. For more complex tasks of
comparing two distorted light fields, their performance drops significantly,
which reveals the need for new, light-field-specific metrics.
Yongliang Chen
Comments: 10 pages, 8 figures, 2 tables, forbidden work
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Segmenting blood vessels in fundus imaging plays an important role in medical
diagnosis. Many algorithms have been proposed. While deep neural networks have
attracted enormous attention from the computer vision community in recent
years, and several novel works have applied them to retinal blood vessel
segmentation, most of these methods are based on supervised learning, which
requires a large amount of labeled data that is both scarce and expensive to
obtain. In this work, we leverage the power of deep convolutional neural
networks (DCNNs) in feature learning to achieve this goal. The highly efficient
feature learning of DCNNs inspires our novel approach of training the networks
with automatically generated samples to achieve desirable performance on
real-world fundus images. To this end, we design a set of rules abstracted from
domain-specific prior knowledge to generate these samples. We argue that, with
the high efficiency of DCNNs in feature learning, one can achieve this goal by
constructing the training dataset from prior knowledge, with no manual labeling
needed. This approach allows us to take advantage of supervised learning
without labeling. We also build a naive DCNN model to test it. The results on
standard fundus imaging benchmarks show that it is competitive with
state-of-the-art methods, which suggests a potential way to leverage the power
of DCNNs in feature learning.
Miguel Costa, Beatriz Quintino Ferreira, Manuel Marques
Comments: Submitted to ITSC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Aiming to reduce pollutant emissions, bicycles are regaining popularity,
especially in urban areas. However, the number of cyclists’ fatalities is not
showing the same decreasing trend as the other traffic groups. Hence,
monitoring cyclists’ data appears as a keystone to foster urban cyclists’
safety by helping urban planners to design safer cyclist routes. In this work,
we propose a fully image-based framework to assess the route risk from the
cyclist perspective. From smartphone sequences of images, this generic
framework is able to automatically identify events considering different risk
criteria based on the cyclist’s motion and object detection. Moreover, since it
is entirely based on images, our method provides context on the situation and
is independent from the expertise level of the cyclist. Additionally, we build
on an existing platform and introduce several improvements on its mobile app to
acquire smartphone sensor data, including video. From the inertial sensor data,
we automatically detect the route segments performed by bicycle, applying
behavior analysis techniques. We test our methods on real data, attaining very
promising results in terms of risk classification, according to two different
criteria, and behavior analysis accuracy.
Hamed R. Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
To bridge the gap between humans and machines in image understanding and
describing, we need further insight into how people describe a perceived scene.
In this paper, we study the agreement between bottom-up saliency-based visual
attention and object referrals in scene description constructs. We investigate
the properties of human-written descriptions and machine-generated ones. We
then propose a saliency-boosted image captioning model in order to investigate
benefits from low-level cues in language models. We learn that (1) humans
mention more salient objects earlier than less salient ones in their
descriptions, (2) the better a captioning model performs, the better attention
agreement it has with human descriptions, (3) the proposed saliency-boosted
model, compared to its baseline form, does not improve significantly on the MS
COCO database, indicating explicit bottom-up boosting does not help when the
task is well learnt and tuned on the data, (4) a better generalization ability
is, however, observed for the saliency-boosted model on unseen data.
Hamed R. Tavakoli, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This manuscript introduces the problem of prominent object detection and
recognition. The problem deals with finding the most important region of
interest, segmenting the relevant item/object in that area, and assigning it an
object class label. In other words, we are solving the three problems of
saliency modeling, saliency detection, and object recognition under one
umbrella. The motivation behind such a problem formulation is 1) the benefits
to the knowledge representation-based vision pipelines, and 2) the potential
improvements in emulating bio-inspired vision systems by solving these three
problems together. We propose a method as a baseline for further research. The
proposed model predicts the most important area in the image, segments the
associated objects, and labels them. The proposed problem and method are
evaluated against human fixations, annotated segmentation masks, and object
class categories. We define a chance level for each of the evaluation criteria
against which to compare the proposed algorithm. Despite the good performance of the
proposed baseline, the overall evaluations indicate that the problem of
prominent object detection and recognition is a challenging task that is still
worth investigating further.
Changde Du, Changying Du, Huiguang He
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Decoding human brain activities via functional magnetic resonance imaging
(fMRI) has gained increasing attention in recent years. While encouraging
results have been reported in brain states classification tasks, reconstructing
the details of human visual experience still remains difficult. Two main
challenges that hinder the development of effective models are the perplexing
fMRI measurement noise and the high dimensionality of limited data instances.
Existing methods generally suffer from one or both of these issues and yield
dissatisfactory results. In this paper, we tackle this problem by casting the
reconstruction of visual stimulus as the Bayesian inference of missing view in
a multiview latent variable model. Sharing a common latent representation, our
joint generative model of external stimulus and brain response is not only
“deep” in extracting nonlinear features from visual images, but also powerful
in capturing correlations among voxel activities of fMRI recordings. The
nonlinearity and deep structure endow our model with strong representation
ability, while the correlations of voxel activities are critical for
suppressing noise and improving prediction. We devise an efficient variational
Bayesian method to infer the latent variables and the model parameters. To
further improve the reconstruction accuracy, the latent representations of
testing instances are enforced to be close to those of their neighbours from the
training set via posterior regularization. Experiments on three fMRI recording
datasets demonstrate that our approach can more accurately reconstruct visual
stimuli.
Ramakanth Pasunuru, Mohit Bansal
Comments: Accepted at ACL 2017 (13 pages w/ supplementary)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Video captioning, the task of describing the content of a video, has seen
some promising improvements in recent years with sequence-to-sequence models,
but accurately learning the temporal and logical dynamics involved in the task
still remains a challenge, especially given the lack of sufficient annotated
data. We improve video captioning by sharing knowledge with two related
directed-generation tasks: a temporally-directed unsupervised video prediction
task to learn richer context-aware video encoder representations, and a
logically-directed language entailment generation task to learn better
video-entailing caption decoder representations. For this, we present a
many-to-many multi-task learning model that shares parameters across the
encoders and decoders of the three tasks. We achieve significant improvements
and the new state-of-the-art on several standard video captioning datasets
using diverse automatic and human evaluations. We also show mutual multi-task
improvements on the entailment generation task.
Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu
Comments: Patent disclosure, UCSD Docket No. SD2014-313, filed on May 22, 2014
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes
classification error while making the learning process of hidden layers direct
and transparent. We make an attempt to boost the classification performance by
studying a new formulation in deep networks. We examine three aspects of
convolutional neural network (CNN)-style architectures: (1) transparency of the
intermediate layers to the overall classification; (2) discriminativeness and
robustness of learned features, especially in the early layers; and (3)
effectiveness in training in the presence of exploding and vanishing gradients.
We introduce a “companion objective” for the individual hidden layers, in
addition to the overall objective at the output layer (a strategy different
from layer-wise pre-training). We extend techniques from
stochastic gradient methods to analyze our algorithm. The advantage of our
method is evident and our experimental result on benchmark datasets shows
significant performance gain over existing methods (e.g. all state-of-the-art
results on MNIST, CIFAR-10, CIFAR-100, and SVHN).
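The companion-objective idea can be sketched as attaching an auxiliary classifier to each hidden block and adding its loss, with a weight, to the output-layer loss. The toy PyTorch model below illustrates this; the architecture, layer sizes, and companion weight are assumptions of the sketch rather than the DSN paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSNLikeNet(nn.Module):
    """Toy network with 'companion objectives': each hidden block gets its
    own auxiliary classifier whose loss is added to the final loss.
    Sizes assume 28x28 single-channel inputs (an assumption of this sketch)."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(32 * 7 * 7, n_classes)          # final objective
        self.companion1 = nn.Linear(16 * 14 * 14, n_classes)  # hidden-layer objectives
        self.companion2 = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, x):
        h1 = self.block1(x)                       # (B, 16, 14, 14)
        h2 = self.block2(h1)                      # (B, 32, 7, 7)
        logits = self.head(h2.flatten(1))
        side1 = self.companion1(h1.flatten(1))
        side2 = self.companion2(h2.flatten(1))
        return logits, (side1, side2)

def dsn_loss(logits, side_logits, targets, companion_weight=0.3):
    """Overall objective = output loss + weighted sum of companion losses."""
    loss = F.cross_entropy(logits, targets)
    for side in side_logits:
        loss = loss + companion_weight * F.cross_entropy(side, targets)
    return loss
```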
Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike most previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Furthermore, we show that our approach is robust to the presence of noise in
the input vocabulary.
Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen
Subjects: Artificial Intelligence (cs.AI)
This work introduces a method to tune a sequence-based generative model for
molecular de novo design that through augmented episodic likelihood can learn
to generate structures with certain specified desirable properties. We
demonstrate how this model can execute a range of tasks such as generating
analogues to a query structure and generating compounds predicted to be active
against a biological target. As a proof of principle, the model is first
trained to generate molecules that do not contain sulphur. As a second example,
the model is trained to generate analogues to the drug Celecoxib, a technique
that could be used for scaffold hopping or library expansion starting from a
single molecule. Finally, when tuning the model towards generating compounds
predicted to be active against the dopamine receptor D2, the model generates
structures of which more than 95% are predicted to be active, including
experimentally confirmed actives that have not been included in either the
generative model or the activity prediction model.
Changde Du, Changying Du, Jinpeng Li, Wei-long Zheng, Bao-liang Lu, Huiguang He
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
In emotion recognition, it is difficult to recognize human emotional states
using just a single modality. Besides, the annotation of physiological
emotional data is particularly expensive. These two aspects make building an
effective emotion recognition model challenging. In this paper, we first
build a multi-view deep generative model to simulate the generative process of
multi-modality emotional data. By imposing a mixture of Gaussians assumption on
the posterior approximation of the latent variables, our model can learn the
shared deep representation from multiple modalities. To solve the
labeled-data-scarcity problem, we further extend our multi-view model to the
semi-supervised learning scenario by casting the semi-supervised classification
problem as a specialized missing data imputation task. Our semi-supervised
multi-view deep generative framework can leverage both labeled and unlabeled
data from multiple modalities, where the weight factor for each modality can be
learned automatically. Compared with previous emotion recognition methods, our
method is more robust and flexible. The experiments conducted on two real
multi-modal emotion datasets have demonstrated the superiority of our framework
over a number of competitors.
Wolfgang Hönig, T. K. Satish Kumar, Liron Cohen, Hang Ma, Sven Koenig, Nora Ayanian
Comments: Published in Southern California Robotics Symposium 2016
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
Path planning for multiple robots is well studied in the AI and robotics
communities. For a given discretized environment, robots need to find
collision-free paths to a set of specified goal locations. Robots can be fully
anonymous, non-anonymous, or organized in groups. Although powerful solvers for
this abstract problem exist, they make simplifying assumptions by ignoring
kinematic constraints, making it difficult to use the resulting plans on actual
robots. In this paper, we present a solution which takes kinematic constraints,
such as maximum velocities, into account, while guaranteeing a user-specified
minimum safety distance between robots. We demonstrate our approach in
simulation and on real robots in 2D and 3D environments.
Cheng-Hao Cai, Dengfeng Ke, Yanyan Xu, Kaile Su
Comments: 8 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
There is a wide gap between symbolic reasoning and deep learning. In this
research, we explore the possibility of using deep learning to improve symbolic
reasoning. Briefly, in a reasoning system, a deep feedforward neural network is
used to guide rewriting processes after learning from algebraic reasoning
examples produced by humans. To enable the neural network to recognise patterns
of algebraic expressions with non-deterministic sizes, reduced partial trees
are used to represent the expressions. Also, to represent both top-down and
bottom-up information of the expressions, a centralisation technique is used to
improve the reduced partial trees. In addition, symbolic association vectors and
rule application records are used to improve the rewriting processes.
Experimental results reveal that the algebraic reasoning examples can be
accurately learnt only if the feedforward neural network has enough hidden
layers. Also, the centralisation technique, the symbolic association vectors
and the rule application records can reduce error rates of reasoning. In
particular, the above approaches have led to a 4.6% error rate of reasoning on a
dataset of linear equations, differentials and integrals.
Amin Morid, Olivia R. Liu Sheng, Samir Abdelrahman
Comments: 10 pages, Proc. of 7th Annual Workshop on Health IT and Economics (WHITE) (Eds.). Washington, DC., 2016
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Patient time series classification faces challenges in high degrees of
dimensionality and missingness. In light of patient similarity theory, this
study explores effective temporal feature engineering and reduction, missing
value imputation, and change point detection methods that can afford
similarity-based classification models with desirable accuracy enhancement. We
select a piecewise aggregation approximation method to extract fine-grain
temporal features and propose a minimalist method to impute missing values in
temporal features. For dimensionality reduction, we adopt a gradient descent
search method for feature weight assignment. We propose new patient status and
directional change definitions based on medical knowledge or clinical
guidelines about the value ranges for different patient status levels, and
develop a method to detect change points indicating positive or negative
patient status changes. We evaluate the effectiveness of the proposed methods
in the context of early Intensive Care Unit mortality prediction. The
evaluation results show that the k-Nearest Neighbor algorithm that incorporates
methods we select and propose significantly outperforms the relevant benchmarks
for early ICU mortality prediction. This study makes contributions to time
series classification and early ICU mortality prediction via identifying and
enhancing temporal feature engineering and reduction methods for
similarity-based time series classification.
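A minimal sketch of piecewise-aggregate temporal features feeding a similarity-based (k-nearest-neighbour) classifier is shown below. The segment count, the zero fallback for empty windows, and the synthetic data are assumptions of this illustration, not the study's clinical pipeline.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def paa(series, n_segments):
    """Piecewise aggregate approximation: summarise a time series by the
    mean of each of n_segments equal-width windows; NaNs (missing values)
    are ignored inside each window, a minimalist imputation choice."""
    series = np.asarray(series, dtype=float)
    segments = np.array_split(series, n_segments)
    feats = []
    for seg in segments:
        valid = seg[~np.isnan(seg)]
        feats.append(valid.mean() if valid.size else 0.0)  # empty window -> 0.0
    return np.array(feats)

def featurise(patients, n_segments=12):
    """PAA features per channel, concatenated into one vector per patient."""
    return np.vstack([np.concatenate([paa(ch, n_segments) for ch in p])
                      for p in patients])

# Usage sketch on synthetic data: 20 patients, 3 channels of 48 time points.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [[rng.normal(size=48) for _ in range(3)] for _ in range(20)]
    labels = rng.integers(0, 2, size=20)
    knn = KNeighborsClassifier(n_neighbors=3).fit(featurise(train), labels)
    print(knn.predict(featurise(train[:2])))
```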
Freddy Lecue, Jiaoyan Chen, Jeff Pan, Huajun Chen
Subjects: Artificial Intelligence (cs.AI)
Data stream learning has been largely studied for extracting knowledge
structures from continuous and rapid data records. In the semantic Web, data is
interpreted in ontologies and its ordered sequence is represented as an
ontology stream. Our work exploits the semantics of such streams to tackle the
problem of concept drift, i.e., unexpected changes in data distribution that
cause most models to become less accurate as time passes. To this end, we revisited (i)
semantic inference in the context of supervised stream learning, and (ii)
models with semantic embeddings. The experiments show accurate prediction with
data from Dublin and Beijing.
Maxim Rabinovich, Dan Klein
Comments: ACL 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
As entity type systems become richer and more fine-grained, we expect the
number of types assigned to a given entity to increase. However, most
fine-grained typing work has focused on datasets that exhibit a low degree of
type multiplicity. In this paper, we consider the high-multiplicity regime
inherent in data sources such as Wikipedia that have semi-open type systems. We
introduce a set-prediction approach to this problem and show that our model
outperforms unstructured baselines on a new Wikipedia-based fine-grained typing
corpus.
Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
We propose a simple, yet effective, approach towards inducing multilingual
taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
the interlanguage links of Wikipedia followed by character-level classifiers to
induce high-precision, high-coverage taxonomies in other languages. Through
experiments, we demonstrate that our approach significantly outperforms the
state-of-the-art, heuristics-heavy approaches for six languages. As a
consequence of our work, we release presumably the largest and the most
accurate multilingual taxonomic resource spanning over 280 languages.
Maxim Rabinovich, Mitchell Stern, Dan Klein
Comments: ACL 2017. MR and MS contributed equally
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Tasks like code generation and semantic parsing require mapping unstructured
(or partially structured) inputs to well-formed, executable outputs. We
introduce abstract syntax networks, a modeling framework for these problems.
The outputs are represented as abstract syntax trees (ASTs) and constructed by
a decoder with a dynamically-determined modular structure paralleling the
structure of the output tree. On the benchmark Hearthstone dataset for code
generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy,
compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we
perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with
no task-specific engineering.
Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, Yanjun Qi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
String Kernel (SK) techniques, especially those using gapped \(k\)-mers as
features (gk), have obtained great success in classifying sequences like DNA,
protein, and text. However, the state-of-the-art gk-SK runs extremely slowly
when we increase the dictionary size (\(\Sigma\)) or allow more mismatches
(\(M\)). This is because current gk-SK uses a trie-based algorithm to calculate
co-occurrence of mismatched substrings, resulting in a time cost proportional
to \(O(\Sigma^{M})\). We propose a fast algorithm for calculating the Gapped
\(k\)-mer Kernel using Counting (GaKCo). GaKCo uses associative arrays to
calculate the co-occurrence of substrings using cumulative counting. This
algorithm is fast, scalable to larger \(\Sigma\) and \(M\), and naturally
parallelizable. We provide a rigorous asymptotic analysis that compares GaKCo
with the state-of-the-art gk-SK. Theoretically, the time cost of GaKCo is
independent of the \(\Sigma^{M}\) term that slows down the trie-based approach.
Experimentally, we observe that GaKCo achieves the same accuracy as the
state-of-the-art and outperforms its speed by factors of 2, 100, and 4 on
classifying sequences of DNA (5 datasets), protein (12 datasets), and
character-based English text (2 datasets), respectively. GaKCo is shared as an
open source tool at this https URL.
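The counting idea behind a gapped k-mer kernel can be sketched with an associative array: every length-g window contributes one feature per choice of k kept positions (the rest are wildcards), and the kernel is the dot product of the two count vectors. The sketch below illustrates this; the parameters g and k are arbitrary choices, and it omits the mismatch handling and parallelism of the actual GaKCo algorithm.

```python
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, g=6, k=4):
    """Count gapped k-mers with an associative array (a Counter): every
    length-g window contributes one feature for each choice of k kept
    positions; the remaining g-k positions become wildcards ('_').
    The values of g and k are illustrative only."""
    counts = Counter()
    keep_patterns = list(combinations(range(g), k))
    for i in range(len(seq) - g + 1):
        window = seq[i:i + g]
        for keep in keep_patterns:
            feat = "".join(window[j] if j in keep else "_" for j in range(g))
            counts[feat] += 1
    return counts

def gapped_kmer_kernel(seq_a, seq_b, g=6, k=4):
    """Kernel value = dot product of the two gapped k-mer count vectors."""
    ca, cb = gapped_kmer_counts(seq_a, g, k), gapped_kmer_counts(seq_b, g, k)
    smaller, larger = (ca, cb) if len(ca) < len(cb) else (cb, ca)
    return sum(v * larger.get(f, 0) for f, v in smaller.items())

if __name__ == "__main__":
    print(gapped_kmer_kernel("ACGTACGTGGTA", "ACGTTGCAGGTA"))
```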
Harshita Sahijwani, Sourish Dasgupta
Comments: Work in progress. arXiv admin note: text overlap with arXiv:1611.04822
Subjects: Information Retrieval (cs.IR)
We design a recommender system for research papers based on topic modeling.
The user’s feedback on the results is used to make the results more relevant
the next time they issue a query. The user’s needs are understood by observing
the changes in the themes that the user shows a preference for over time.
Liner Yang, Meishan Zhang, Yang Liu, Nan Yu, Maosong Sun, Guohong Fu
Subjects: Computation and Language (cs.CL)
While part-of-speech (POS) tagging and dependency parsing are observed to be
closely related, existing work on joint modeling with manually crafted feature
templates suffers from the feature sparsity and incompleteness problems. In
this paper, we propose an approach to joint POS tagging and dependency parsing
using transition-based neural networks. Three neural network based classifiers
are designed to resolve shift/reduce, tagging, and labeling conflicts.
Experiments show that our approach significantly outperforms previous methods
for joint POS tagging and dependency parsing across a variety of natural
languages.
Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang
Subjects: Computation and Language (cs.CL)
Different linguistic perspectives cause many diverse segmentation criteria for
Chinese word segmentation (CWS). Most existing methods focus on improving the
performance for each single criterion. However, it is interesting to exploit
these different criteria and mine their common underlying knowledge. In this
paper, we propose adversarial multi-criteria learning for CWS by integrating
shared knowledge from multiple heterogeneous segmentation criteria. Experiments
on eight corpora with heterogeneous segmentation criteria show that the
performance of each corpus obtains a significant improvement, compared to
single-criterion learning. Source codes of this paper are available on Github.
Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall
Comments: 16 pages
Subjects: Computation and Language (cs.CL)
We develop a streaming (one-pass, bounded-memory) word embedding algorithm
based on the canonical skip-gram with negative sampling algorithm implemented
in word2vec. We compare our streaming algorithm to word2vec empirically by
measuring the cosine similarity between word pairs under each algorithm and by
applying each algorithm in the downstream task of hashtag prediction on a
two-month interval of the Twitter sample stream. We then discuss the results of
these experiments, concluding they provide partial validation of our approach
as a streaming replacement for word2vec. Finally, we discuss potential failure
modes and suggest directions for future work.
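A minimal sketch of a one-pass, bounded-memory skip-gram with negative sampling is given below. Memory is bounded here by hashing words into a fixed number of buckets, which is an assumption of this illustration (the paper's own vocabulary-management strategy differs), as are the hyperparameters and the uniform negative sampling.

```python
import numpy as np

class StreamingSGNS:
    """One-pass skip-gram with negative sampling over a token stream.
    Memory is bounded by hashing words into a fixed number of buckets;
    this hashing trick, the hyperparameters, and the uniform negative
    sampling are assumptions of this sketch, not the paper's algorithm."""

    def __init__(self, n_buckets=2**18, dim=50, lr=0.025, window=2,
                 negatives=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = (rng.random((n_buckets, dim)) - 0.5) / dim   # target vectors
        self.W_out = np.zeros((n_buckets, dim))                  # context vectors
        self.n_buckets, self.lr = n_buckets, lr
        self.window, self.negatives = window, negatives
        self.rng = rng

    def _bucket(self, word):
        return hash(word) % self.n_buckets

    def _update_pair(self, t, c, label):
        # Logistic (SGNS) gradient step for one (target, context, label) pair.
        score = 1.0 / (1.0 + np.exp(-self.W_in[t] @ self.W_out[c]))
        grad = self.lr * (label - score)
        w_in_t = self.W_in[t].copy()
        self.W_in[t] += grad * self.W_out[c]
        self.W_out[c] += grad * w_in_t

    def consume(self, tokens):
        """Process one chunk of the stream (a list of tokens)."""
        ids = [self._bucket(w) for w in tokens]
        for i, t in enumerate(ids):
            lo, hi = max(0, i - self.window), min(len(ids), i + self.window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                self._update_pair(t, ids[j], 1.0)                # positive pair
                for _ in range(self.negatives):                  # negative pairs
                    self._update_pair(t, int(self.rng.integers(self.n_buckets)), 0.0)

    def vector(self, word):
        return self.W_in[self._bucket(word)]
```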
Yanqing Chen, Rami Al-Rfou', Yejin Choi
Comments: 9 figures, 5 tables, 9 pages
Subjects: Computation and Language (cs.CL)
This paper presents the first attempt, to our knowledge, to classify English
writing styles at this scale, with the challenge of classifying day-to-day
language written by writers with different backgrounds covering various topic
areas. The paper proposes simple machine learning algorithms and
simple-to-generate features to solve hard problems, relying on the scale of
the data available from large sources of knowledge like Wikipedia. We believe
such sources of data are crucial for generating robust solutions for the web
that achieve high accuracy and are easy to deploy in practice. The paper
achieves 74% accuracy in classifying native versus non-native speakers’
writing styles.
Moreover, the paper presents some interesting observations on the similarity
between different languages, as measured by the similarity of their users’
English writing styles. This technique could be used to demonstrate some
well-known facts about languages, such as grouping them into families, which
our experiments support.
Pierre Isabelle, Colin Cherry, George Foster
Comments: 26 pages, including appendix
Subjects: Computation and Language (cs.CL)
Neural machine translation represents an exciting leap forward in translation
quality. But what longstanding weaknesses does it resolve, and which remain? We
address these questions with a challenge set approach to translation evaluation
and error analysis. A challenge set consists of a small set of sentences, each
hand-designed to probe a system’s capacity to bridge a particular structural
divergence between languages. To exemplify this approach, we present an
English-French challenge set, and use it to analyze phrase-based and neural
systems. The resulting analysis provides not only a more fine-grained picture
of the strengths of neural systems, but also insight into which linguistic
phenomena remain out of reach.
Yanqing Chen, Steven Skiena
Comments: 9 pages, 6 tables, 5 figures
Subjects: Computation and Language (cs.CL)
Wikipedia is a useful knowledge source that benefits many applications in
language processing and knowledge representation. An important feature of
Wikipedia is that of categories. Wikipedia pages are assigned different
categories according to their contents as human-annotated labels which can be
used in information retrieval, ad hoc search improvements, entity ranking and
tag recommendations. However, important pages are usually assigned too many
categories, which makes it difficult to recognize the most important ones that
give the best descriptions.
In this paper, we propose an approach to recognizing the most descriptive
Wikipedia categories. We observe that the historical figures in a precise
category are presumably mutually similar, and that such categorical coherence
can be evaluated via the texts or Wikipedia links of the corresponding members
of the category. We rank the descriptive level of Wikipedia categories
according to their coherence, and our ranking yields an overall agreement of
88.27% compared with human wisdom.
Yichen Gong, Samuel R. Bowman
Comments: 10 pages, 6 figures
Subjects: Computation and Language (cs.CL)
To answer questions in the machine comprehension (MC) task, models need to
establish the interaction between the question and the context. To tackle the
problem that a single-pass model cannot reflect on and correct its answer, we
present the Ruminating Reader. The Ruminating Reader adds a second pass of
attention and a novel information fusion component to the Bi-Directional
Attention Flow model (BiDAF). We propose novel layer structures that construct
a query-aware context vector representation and fuse the encoding
representation with the intermediate representation on top of the BiDAF model.
We show that a multi-hop attention mechanism can be applied to a bi-directional
attention structure. In experiments on SQuAD, we find that the Ruminating
Reader outperforms the BiDAF baseline by a substantial margin, and matches or
surpasses the performance of all other published systems.
Yevgeni Berzak, Chie Nakamura, Suzanne Flynn, Boris Katz
Comments: ACL 2017
Subjects: Computation and Language (cs.CL)
A fundamental question in language learning concerns the role of a speaker’s
first language in second language acquisition. We present a novel methodology
for studying this question: analysis of eye-movement patterns in second
language reading of free-form text. Using this methodology, we demonstrate for
the first time that the native language of English learners can be predicted
from their gaze fixations when reading English. We provide analysis of
classifier uncertainty and learned features, which indicates that differences
in English reading are likely to be rooted in linguistic divergences across
native languages. The presented framework complements production studies and
offers new ground for advancing research on multilingualism.
Emeric Bernard-Jones, Jeremiah Onaolapo, Gianluca Stringhini
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
We set out to understand the effects of differing language on the ability of
cybercriminals to navigate webmail accounts and locate sensitive information in
them. To this end, we configured thirty Gmail honeypot accounts with English,
Romanian, and Greek language settings. We populated the accounts with email
messages in those languages by subscribing them to selected online newsletters.
We hid email messages about fake bank accounts in fifteen of the accounts to
mimic real-world webmail users that sometimes store sensitive information in
their accounts. We then leaked credentials to the honey accounts via paste
sites on the Surface Web and the Dark Web, and collected data for fifteen days.
Our statistical analyses on the data show that cybercriminals are more likely
to discover sensitive information (bank account information) in the Greek
accounts than the remaining accounts, contrary to the expectation that Greek
ought to constitute a barrier to the understanding of non-Greek visitors to the
Greek accounts. We also extracted the important words among the emails that
cybercriminals accessed (as an approximation of the keywords that they searched
for within the honey accounts), and found that financial terms featured among
the top words. In summary, we show that language plays a significant role in
the ability of cybercriminals to access sensitive information hidden in
compromised webmail accounts.
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim
Comments: Accepted at IJCAI 2017 (The 26th International Joint Conference on Artificial Intelligence)
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Computer programs written in one language are often required to be ported to
other languages to support multiple devices and environments. When programs use
language specific APIs (Application Programming Interfaces), it is very
challenging to migrate these APIs to the corresponding APIs written in other
languages. Existing approaches mine API mappings from projects that have
corresponding versions in two languages. They rely on the sparse availability
of bilingual projects, thus producing a limited number of API mappings. In this
paper, we propose an intelligent system called DeepAM for automatically mining
API mappings from a large-scale code corpus without bilingual projects. The key
component of DeepAM is based on the multimodal sequence to sequence learning
architecture that aims to learn joint semantic representations of bilingual API
sequences from big source code data. Experimental results indicate that DeepAM
significantly increases the accuracy of API mappings as well as the number of
API mappings, when compared with the state-of-the-art approaches.
Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike most previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Furthermore, we show that our approach is robust to the presence of noise in
the input vocabulary.
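The abstract above casts taxonomy induction as a minimum-cost flow problem but does not spell out the graph construction. Purely as an illustration of that reduction, here is a toy Python/networkx sketch with hypothetical seed terms, hypernym-subsequence edges, and made-up costs; it is not the authors' formulation.

    import networkx as nx

    # Toy directed graph: a supply node feeds seed terms; edges follow hypothetical
    # hypernym subsequences (term -> candidate ancestor); costs are made up.
    G = nx.DiGraph()
    G.add_node("source", demand=-2)          # two units of flow to route
    G.add_node("root", demand=2)             # both units must reach the taxonomy root
    for term in ("apple", "banana"):
        G.add_edge("source", term, weight=0, capacity=1)
    G.add_edge("apple", "fruit", weight=1, capacity=1)
    G.add_edge("banana", "fruit", weight=1, capacity=1)
    G.add_edge("apple", "food", weight=3, capacity=1)   # costlier direct attachment
    G.add_edge("fruit", "food", weight=1, capacity=2)
    G.add_edge("food", "root", weight=0, capacity=2)

    flow = nx.min_cost_flow(G)  # dict of dicts: flow[u][v] = units routed on (u, v)
    chosen = [(u, v) for u, nbrs in flow.items() for v, f in nbrs.items() if f > 0]
    print(chosen)               # edges carrying flow suggest the induced taxonomy edges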
Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, Yanjun Qi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
String Kernel (SK) techniques, especially those using gapped $k$-mers as
features (gk), have obtained great success in classifying sequences like DNA,
protein, and text. However, the state-of-the-art gk-SK runs extremely slowly
when we increase the dictionary size ($\Sigma$) or allow more mismatches ($M$).
This is because current gk-SK uses a trie-based algorithm to calculate the
co-occurrence of mismatched substrings, resulting in a time cost proportional
to $O(\Sigma^{M})$. We propose a fast algorithm for calculating the Gapped
$k$-mer Kernel using Counting (GaKCo). GaKCo uses associative arrays to
calculate the co-occurrence of substrings using cumulative counting. This
algorithm is fast, scalable to larger $\Sigma$ and $M$, and naturally
parallelizable. We provide a rigorous asymptotic analysis that compares GaKCo
with the state-of-the-art gk-SK. Theoretically, the time cost of GaKCo is
independent of the $\Sigma^{M}$ term that slows down the trie-based approach.
Experimentally, we observe that GaKCo achieves the same accuracy as the
state-of-the-art and outperforms its speed by factors of 2, 100, and 4 on
classifying sequences of DNA (5 datasets), protein (12 datasets), and
character-based English text (2 datasets), respectively.
GaKCo is shared as an open-source tool at this https URL.
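To make the counting idea concrete, here is a brute-force Python sketch of a gapped k-mer kernel computed via an associative array (a Counter); it ignores mismatch handling, cumulative counting, and all of GaKCo's optimizations, and the parameter choices are purely illustrative.

    from collections import Counter
    from itertools import combinations

    def gapped_kmer_counts(seq, l=5, k=3):
        """Count gapped k-mers: within every length-l window, every choice of
        k informative positions (the remaining l-k positions act as gaps)."""
        counts = Counter()
        for start in range(len(seq) - l + 1):
            window = seq[start:start + l]
            for idx in combinations(range(l), k):
                # key = which positions are informative + the letters found there
                counts[(idx, tuple(window[i] for i in idx))] += 1
        return counts

    def gapped_kmer_kernel(seq_a, seq_b, l=5, k=3):
        """Kernel value = dot product of the two gapped k-mer count vectors."""
        ca, cb = gapped_kmer_counts(seq_a, l, k), gapped_kmer_counts(seq_b, l, k)
        return sum(v * cb[key] for key, v in ca.items())

    print(gapped_kmer_kernel("ACGTACGTAC", "ACGTTTGTAC"))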
Leszek Gasieniec, Grzegorz Stachowiak
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The model of population protocols refers to a theoretical framework, growing in
popularity, for studying pairwise interactions within a large collection of
simple indistinguishable entities, frequently called agents. In this paper the
emphasis is on the space complexity of fast leader election via population
protocols governed by the random scheduler, which selects pairwise interactions
within the population of n agents uniformly at random.
The main result of this paper is a new fast and space-optimal leader election
protocol. The new protocol uses O(log^2 n) parallel time (equivalent to
O(n log^2 n) sequential pairwise interactions), and each agent operates on
O(log log n) states. This double-logarithmic space usage matches asymptotically
the lower bound of 1/2 log log n on the minimal number of states required by
agents in any leader election algorithm with running time o(n/polylog n).
Our solution takes advantage of the concept of phase clocks, a fundamental
synchronisation and coordination tool in distributed computing. We propose a
new fast and robust population protocol for the initialisation of phase clocks
that can be run simultaneously in multiple modes and intertwined with the
leader election process. We also provide formal arguments showing that our
solution is always correct, and fast with high probability.
Zhi Li, Wei Shi, Ming Yan
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
This paper considers the problem of decentralized optimization with a
composite objective containing smooth and non-smooth terms. To solve the
problem, a proximal-gradient scheme is studied. Specifically, the smooth and
nonsmooth terms are dealt with by gradient update and proximal update,
respectively. The studied algorithm is closely related to a previous
decentralized optimization algorithm, PG-EXTRA [37], but has a few advantages.
First, in our new scheme, agents use uncoordinated step-sizes, and the stable
upper bounds on the step-sizes are independent of the network topology. The
step-sizes depend on the local objective functions, and they can be as large as
those of gradient descent. Second, for the special case without non-smooth
terms, linear convergence can be achieved under the strong convexity
assumption. The dependence of the convergence rate on the objective functions
and on the network is separated, and the convergence rate of our new scheme is
as good as one of the two convergence rates that match the typical rates for
general gradient descent and consensus averaging. We also provide some
numerical experiments to demonstrate the efficacy of the introduced algorithms
and validate our theoretical discoveries.
Haoyi Xiong, Wei Cheng, Wenqing Hu, Jiang Bian, Zhishan Guo
Subjects: Learning (cs.LG)
Linear Discriminant Analysis (LDA) on Electronic Health Records (EHR) data is
widely-used for early detection of diseases. Classical LDA for EHR data
classification, however, suffers from two handicaps: the ill-posed estimation
of LDA parameters (e.g., covariance matrix), and the “linear inseparability” of
EHR data. To handle these two issues, in this paper, we propose a novel
classifier FWDA — Fast Wishart Discriminant Analysis, that makes predictions
in an ensemble way. Specifically, FWDA first surrogates the distribution of
inverse covariance matrices using a Wishart distribution estimated from the
training data, then “weighted-averages” the classification results of multiple
LDA classifiers parameterized by the sampled inverse covariance matrices via a
Bayesian Voting scheme. The weights for voting are optimally updated to adapt
to each new input, so as to enable nonlinear classification. Theoretical
analysis indicates that FWDA possesses a fast convergence rate and robust
performance on high-dimensional data. Extensive experiments on a large-scale
EHR dataset show that our approach outperforms state-of-the-art algorithms by a
large margin.
Jordan Hochenbaum, Owen S. Vallis, Arun Kejariwal
Comments: 13 pages, 12 figures
Subjects: Learning (cs.LG)
Performance and high availability have become increasingly important drivers,
among others, of user retention in the context of web services such as social
networks and web search. Exogenous and/or endogenous factors often
give rise to anomalies, making it very challenging to maintain high
availability, while also delivering high performance. Given that
service-oriented architectures (SOA) typically have a large number of services,
with each service having a large set of metrics, automatic detection of
anomalies is non-trivial.
Although there exists a large body of prior research in anomaly detection,
existing techniques are not applicable in the context of social network data,
owing to the inherent seasonal and trend components in the time series data.
To this end, we developed two novel statistical techniques for automatically
detecting anomalies in cloud infrastructure data. Specifically, the techniques
employ statistical learning to detect anomalies in both application, and system
metrics. Seasonal decomposition is employed to filter the trend and seasonal
components of the time series, followed by the use of robust statistical
metrics — median and median absolute deviation (MAD) — to accurately detect
anomalies, even in the presence of seasonal spikes.
We demonstrate the efficacy of the proposed techniques from three different
perspectives, viz., capacity planning, user behavior, and supervised learning.
In particular, we used production data for evaluation, and we report Precision,
Recall, and F-measure in each case.
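As a bare-bones illustration of the seasonal-decomposition-plus-robust-statistics idea described above (not the published algorithms): the sketch below removes a per-phase median seasonal component and flags points whose residual exceeds a multiple of the scaled median absolute deviation.

    import numpy as np

    def mad_anomalies(series, period, threshold=3.0):
        """Flag points whose de-seasonalized residual is far from the median,
        measured in (scaled) median absolute deviations."""
        x = np.asarray(series, dtype=float)
        phases = np.arange(len(x)) % period
        seasonal = np.array([np.median(x[phases == p]) for p in range(period)])[phases]
        resid = x - seasonal - np.median(x - seasonal)      # crude trend removal
        mad = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        return np.abs(resid - np.median(resid)) > threshold * mad

    rng = np.random.default_rng(1)
    t = np.arange(7 * 24)                                   # one week of hourly data
    metric = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)
    metric[50] += 8                                         # inject one spike
    print(np.flatnonzero(mad_anomalies(metric, period=24)))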
Arit Kumar Bishwas, Ashish Mani, Vasile Palade
Subjects: Learning (cs.LG); Quantum Physics (quant-ph)
In this paper we discuss a quantum approach for the all-pair multiclass
classification problem. We show that the multiclass support vector machine for
big data classification with a quantum all-pair approach can be implemented
with logarithmic time complexity on a quantum computer. In the all-pair
approach, there is one binary classification problem for each pair of classes,
so there are k(k-1)/2 classifiers for a k-class problem. Compared to the
classical multiclass support vector machine, which can be implemented with
polynomial time complexity, our approach exhibits an exponential speed-up in
the quantum version. The quantum all-pair algorithm can be used with other
classification algorithms, and a speed-up can be achieved compared to their
classical counterparts.
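The quantum construction is beyond a short snippet, but the classical all-pair (one-vs-one) structure it builds on is easy to sketch: k(k-1)/2 binary SVMs, one per class pair, combined by majority vote. The data below is random and purely illustrative.

    from itertools import combinations
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 5))
    y = rng.integers(0, 4, size=300)          # k = 4 classes -> 6 pairwise classifiers

    pairwise = {}
    for a, b in combinations(np.unique(y), 2):            # k(k-1)/2 class pairs
        mask = np.isin(y, [a, b])
        pairwise[(a, b)] = SVC(kernel="linear").fit(X[mask], y[mask])

    def predict(x):
        votes = [clf.predict(x.reshape(1, -1))[0] for clf in pairwise.values()]
        return np.bincount(votes).argmax()                # majority vote over pairs

    print(len(pairwise), predict(X[0]))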
Dmitry Ignatov, Andrey Ignatov
Subjects: Learning (cs.LG)
Various modifications of decision trees have been extensively used during the
past years due to their high efficiency and interpretability. Selection of
relevant features for splitting the tree nodes is a key property of their
architecture, and at the same time their major shortcoming: the recursive
partitioning of nodes leads to a geometric reduction of the amount of data in
the leaf nodes, which causes excessive model complexity and data overfitting.
In this paper, we present a novel architecture, the Decision Stream, aimed at
overcoming this problem. Instead of building an acyclic tree structure during
the training process, we propose merging nodes from different branches based on
their similarity, which is estimated with two-sample test statistics. To
evaluate the proposed solution, we test it on several common machine learning
problems: credit scoring, Twitter sentiment analysis, aircraft flight control,
MNIST and CIFAR image classification, and synthetic data classification and
regression. Our experimental results reveal that the proposed approach
significantly outperforms the standard decision tree method on both regression
and classification tasks, yielding a prediction error decrease of up to 35%.
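A minimal sketch of the merge criterion described above, using a two-sample Kolmogorov-Smirnov test from scipy as the similarity statistic (the authors may use a different test); the leaf target values here are synthetic.

    import numpy as np
    from scipy.stats import ks_2samp

    def should_merge(targets_a, targets_b, alpha=0.05):
        """Merge two leaf nodes if a two-sample test cannot distinguish the
        distributions of their target values."""
        return ks_2samp(targets_a, targets_b).pvalue > alpha

    rng = np.random.default_rng(0)
    leaf_1 = rng.normal(0.0, 1.0, 80)      # targets routed to one branch
    leaf_2 = rng.normal(0.1, 1.0, 60)      # statistically similar branch
    leaf_3 = rng.normal(2.0, 1.0, 70)      # clearly different branch
    print(should_merge(leaf_1, leaf_2), should_merge(leaf_1, leaf_3))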
Shin Ando, Chun Yuan Huang
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Class imbalance is a challenging issue in practical classification problems
for deep learning models as well as traditional models. Traditionally
successful countermeasures such as synthetic over-sampling have had limited
success with complex, structured data handled by deep learning models. In this
paper, we propose Deep Over-sampling (DOS), a framework for extending the
synthetic over-sampling method to exploit the deep feature space acquired by a
convolutional neural network (CNN). Its key feature is an explicit, supervised
representation learning, for which the training data pairs each raw input
sample with a synthetic embedding target in the deep feature space, which is
sampled from the linear subspace of its in-class neighbors. We implement an
iterative process of training the CNN and updating the targets, which induces
smaller in-class variance among the embeddings, to increase the discriminative
power of the deep representation. We present an empirical study using public
benchmarks, which shows that the DOS framework not only counteracts class
imbalance better than the existing method, but also improves the performance of
the CNN in the standard, balanced settings.
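A rough sketch of the key step as described: for each sample, draw a synthetic embedding target from a convex combination (the linear subspace) of its nearest in-class neighbors in the deep feature space. The CNN, the iterative retraining, and the per-class handling are omitted, and the feature matrix below is a random stand-in for CNN embeddings.

    import numpy as np

    def synthetic_targets(deep_features, k=3, rng=None):
        """For each embedding, sample a target as a random convex combination of
        its k nearest neighbors (here all rows are assumed to share one class)."""
        rng = rng or np.random.default_rng(0)
        z = np.asarray(deep_features, dtype=float)
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        targets = np.empty_like(z)
        for i in range(len(z)):
            nbrs = np.argsort(d[i])[:k]
            w = rng.dirichlet(np.ones(k))          # random point in the neighbor simplex
            targets[i] = w @ z[nbrs]
        return targets

    feats = np.random.default_rng(2).standard_normal((10, 4))   # stand-in embeddings
    print(synthetic_targets(feats).shape)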
Ga Wu, Buser Say, Scott Sanner
Comments: 8 pages
Subjects: Learning (cs.LG)
Given recent deep learning results that demonstrate the ability to
effectively optimize high-dimensional non-convex functions with gradient
descent optimization on GPUs, we ask in this paper whether symbolic gradient
optimization tools such as Tensorflow can be effective for planning in hybrid
(mixed discrete and continuous) nonlinear domains with high-dimensional state
and action spaces. To this end, we demonstrate that hybrid planning with
Tensorflow and RMSProp gradient descent is competitive with mixed integer
linear program (MILP) based optimization on piecewise linear planning domains
(where we can compute optimal solutions) and substantially outperforms
state-of-the-art interior point methods for nonlinear planning domains.
Furthermore, we remark that Tensorflow is highly scalable, converging to a
strong policy on a large-scale concurrent domain with a total of 576,000
continuous actions over a horizon of 96 time steps in only 4 minutes. We
provide a number of insights that clarify such strong performance including
observations that despite long horizons, RMSProp avoids both the vanishing and
exploding gradients problem. Together these results suggest a new frontier for
highly scalable planning in nonlinear hybrid domains by leveraging GPUs and the
power of recent advances in gradient descent with highly optimized toolkits like
Tensorflow.
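A tiny illustration of the underlying idea, planning by gradient descent over the whole action sequence, on a hand-rolled scalar linear-dynamics problem with an adjoint (reverse-mode) gradient; the paper itself uses RMSProp in Tensorflow on far richer hybrid nonlinear domains.

    import numpy as np

    # Toy deterministic planning problem: scalar state, s_{t+1} = s_t + 0.1 * a_t,
    # cost = sum_{t=1..T} (s_t - goal)^2 + lam * sum_t a_t^2, minimized by plain
    # gradient descent over the whole action sequence.
    T, goal, lam, lr = 20, 1.0, 0.01, 0.2

    def rollout(actions):
        states = np.empty(T + 1)
        states[0] = 0.0
        for t in range(T):
            states[t + 1] = states[t] + 0.1 * actions[t]
        return states

    def grad(actions):                     # hand-written adjoint (reverse) pass
        states, g = rollout(actions), np.empty(T)
        p = 2.0 * (states[T] - goal)       # dJ/ds_T
        for t in reversed(range(T)):
            g[t] = 0.1 * p + 2.0 * lam * actions[t]
            p = p + 2.0 * (states[t] - goal)   # dJ/ds_t, since ds_{t+1}/ds_t = 1
        return g

    actions = np.zeros(T)
    for _ in range(3000):
        actions -= lr * grad(actions)
    print(rollout(actions)[-1])            # final state ends up close to the goal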
Eugenio Tacchini, Gabriele Ballarin, Marco L. Della Vedova, Stefano Moret, Luca de Alfaro
Subjects: Learning (cs.LG); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI)
In recent years, the reliability of information on the Internet has emerged
as a crucial issue of modern society. Social network sites (SNSs) have
revolutionized the way in which information is spread by allowing users to
freely share content. As a consequence, SNSs are also increasingly used as
vectors for the diffusion of misinformation and hoaxes. The amount of
disseminated information and the rapidity of its diffusion make it practically
impossible to assess reliability in a timely manner, highlighting the need for
automatic hoax detection systems.
As a contribution towards this objective, we show that Facebook posts can be
classified with high accuracy as hoaxes or non-hoaxes on the basis of the users
who “liked” them. We present two classification techniques, one based on
logistic regression, the other on a novel adaptation of boolean crowdsourcing
algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users,
we obtain classification accuracies exceeding 99% even when the training set
contains less than 1% of the posts. We further show that our techniques are
robust: they work even when we restrict our attention to the users who like
both hoax and non-hoax posts. These results suggest that mapping the diffusion
pattern of information can be a useful component of automatic hoax detection
systems.
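A minimal sketch of the logistic-regression variant on a synthetic stand-in for the "who liked what" matrix; the real dataset and the boolean-crowdsourcing classifier are not reproduced here.

    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical stand-in data: hoax posts tend to be liked by one half of the
    # user population, non-hoax posts by the other half.
    rng = np.random.default_rng(0)
    n_posts, n_users = 400, 1000
    labels = rng.integers(0, 2, n_posts)                  # 1 = hoax, 0 = non-hoax
    rows, cols = [], []
    for i, y in enumerate(labels):
        likers = rng.integers(0, n_users // 2, 20) + (n_users // 2) * y
        rows.extend([i] * len(likers))
        cols.extend(likers)
    X = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n_posts, n_users))

    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"test accuracy: {clf.score(X_te, y_te):.3f}")  # near-perfect on this toy data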
Mohammad Amin Morid, Olivia R. Liu Sheng, Samir Abdelrahman
Comments: 10 pages, Healthcare Analytics and Medical Decision Making, INFORMS Workshop. Nashville, Tennessee, 2016
Subjects: Learning (cs.LG)
To date, developing a good model for early intensive care unit (ICU)
mortality prediction is still challenging. This paper presents a patient based
predictive modeling framework (PPMF) to improve the performance of ICU
mortality prediction using data collected during the first 48 hours of ICU
admission. PPMF consists of three main components verifying three related
research hypotheses. The first component captures dynamic changes in patients'
status in the ICU using their time series data (e.g., vital signs and
laboratory tests). The second component is a local approximation algorithm that
classifies patients based on their similarities. The third component is a
Gradient Descent wrapper that updates feature weights according to the
classification feedback. Experiments using data from MIMIC-III show that PPMF
significantly outperforms: (1) the severity score systems, namely SAPS III,
APACHE IV, and MPM0-III; (2) aggregation-based classifiers that utilize
summarized time series; and (3) baseline feature selection methods.
Jonathan T. Barron
Subjects: Learning (cs.LG)
Exponential Linear Units (ELUs) are a useful rectifier for constructing deep
learning architectures, as they may speed up and otherwise improve learning by
virtue of not having vanishing gradients and by having mean activations near
zero. However, the ELU activation as parametrized in [1] is not continuously
differentiable with respect to its input when the shape parameter alpha is not
equal to 1. We present an alternative parametrization which is C1 continuous
for all values of alpha, making the rectifier easier to reason about and making
alpha easier to tune. This alternative parametrization has several other useful
properties that the original parametrization of ELU does not: 1) its derivative
with respect to x is bounded, 2) it contains both the linear transfer function
and ReLU as special cases, and 3) it is scale-similar with respect to alpha.
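For reference, a sketch of the standard ELU next to one C1-continuous reparametrization (scaling the exponent by 1/alpha) that is consistent with the properties listed above; the exact parametrization used in the paper may differ.

    import numpy as np

    def elu(x, alpha=1.0):
        """Standard ELU: not C1 at x = 0 unless alpha == 1."""
        return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

    def c1_elu(x, alpha=1.0):
        """A C1-continuous variant (an assumption, not necessarily the paper's):
        scaling the exponent by 1/alpha makes the left and right derivatives at 0
        both equal to 1 for every alpha > 0, and keeps the derivative bounded."""
        return np.where(x >= 0, x, alpha * (np.exp(x / alpha) - 1.0))

    x = np.array([-2.0, -1e-6, 0.0, 1e-6, 2.0])
    print(elu(x, alpha=2.0))
    print(c1_elu(x, alpha=2.0))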
Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, Yanjun Qi
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
String Kernel (SK) techniques, especially those using gapped $k$-mers as
features (gk), have obtained great success in classifying sequences like DNA,
protein, and text. However, the state-of-the-art gk-SK runs extremely slowly
when we increase the dictionary size ($\Sigma$) or allow more mismatches ($M$).
This is because current gk-SK uses a trie-based algorithm to calculate the
co-occurrence of mismatched substrings, resulting in a time cost proportional
to $O(\Sigma^{M})$. We propose a fast algorithm for calculating the Gapped
$k$-mer Kernel using Counting (GaKCo). GaKCo uses associative arrays to
calculate the co-occurrence of substrings using cumulative counting. This
algorithm is fast, scalable to larger $\Sigma$ and $M$, and naturally
parallelizable. We provide a rigorous asymptotic analysis that compares GaKCo
with the state-of-the-art gk-SK. Theoretically, the time cost of GaKCo is
independent of the $\Sigma^{M}$ term that slows down the trie-based approach.
Experimentally, we observe that GaKCo achieves the same accuracy as the
state-of-the-art and outperforms its speed by factors of 2, 100, and 4 on
classifying sequences of DNA (5 datasets), protein (12 datasets), and
character-based English text (2 datasets), respectively.
GaKCo is shared as an open-source tool at this https URL.
Justin Lazarow, Long Jin, Zhuowen Tu
Comments: 10 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We study unsupervised learning by developing introspective generative
modeling (IGM) that attains a generator using progressively learned deep
convolutional neural networks. The generator is itself a discriminator, capable
of introspection: being able to self-evaluate the difference between its
generated samples and the given training data. When followed by repeated
discriminative learning, desirable properties of modern discriminative
classifiers are directly inherited by the generator. IGM learns a cascade of
CNN classifiers using a synthesis-by-classification algorithm. In the
experiments, we observe encouraging results on a number of applications
including texture modeling, artistic style transferring, face modeling, and
semi-supervised learning.
Long Jin, Justin Lazarow, Zhuowen Tu
Comments: 11 pages, 6 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper we propose introspective classifier learning (ICL) that
emphasizes the importance of having a discriminative classifier empowered with
generative capabilities. We develop a reclassification-by-synthesis algorithm
to perform training using a formulation stemmed from the Bayes theory. Our
classifier is able to iteratively: (1) synthesize pseudo-negative samples in
the synthesis step; and (2) enhance itself by improving the classification in
the reclassification step. The single classifier learned is at the same time
generative — being able to directly synthesize new samples within its own
discriminative model. We conduct experiments on standard benchmark datasets
including MNIST, CIFAR, and SVHN using state-of-the-art CNN architectures, and
observe improved classification results.
Zhi Li, Wei Shi, Ming Yan
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
This paper considers the problem of decentralized optimization with a
composite objective containing smooth and non-smooth terms. To solve the
problem, a proximal-gradient scheme is studied. Specifically, the smooth and
nonsmooth terms are dealt with by gradient update and proximal update,
respectively. The studied algorithm is closely related to a previous
decentralized optimization algorithm, PG-EXTRA [37], but has a few advantages.
First, in our new scheme, agents use uncoordinated step-sizes, and the stable
upper bounds on the step-sizes are independent of the network topology. The
step-sizes depend on the local objective functions, and they can be as large as
those of gradient descent. Second, for the special case without non-smooth
terms, linear convergence can be achieved under the strong convexity
assumption. The dependence of the convergence rate on the objective functions
and on the network is separated, and the convergence rate of our new scheme is
as good as one of the two convergence rates that match the typical rates for
general gradient descent and consensus averaging. We also provide some
numerical experiments to demonstrate the efficacy of the introduced algorithms
and validate our theoretical discoveries.
Maxim Rabinovich, Dan Klein
Comments: ACL 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
As entity type systems become richer and more fine-grained, we expect the
number of types assigned to a given entity to increase. However, most
fine-grained typing work has focused on datasets that exhibit a low degree of
type multiplicity. In this paper, we consider the high-multiplicity regime
inherent in data sources such as Wikipedia that have semi-open type systems. We
introduce a set-prediction approach to this problem and show that our model
outperforms unstructured baselines on a new Wikipedia-based fine-grained typing
corpus.
Wenjian Yu, Yu Gu, Jian Li, Shenghua Liu, Yaohang Li
Comments: IJCAI 2017, 16 pages, 6 figures
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (math.NA)
Principal component analysis (PCA) is a fundamental dimension reduction tool
in statistics and machine learning. For large and high-dimensional data,
computing the PCA (i.e., the singular vectors corresponding to a number of
dominant singular values of the data matrix) becomes a challenging task. In
this work, a single-pass randomized algorithm is proposed to compute PCA with
only one pass over the data. It is suitable for processing extremely large and
high-dimensional data stored in slow memory (hard disk) or the data generated
in a streaming fashion. Experiments with synthetic and real data validate the
algorithm’s accuracy, which has orders of magnitude smaller error than an
existing single-pass algorithm. For a set of high-dimensional data stored as a
150 GB file, the proposed algorithm is able to compute the first 50 principal
components in just 24 minutes on a typical 24-core computer, with less than 1
GB memory cost.
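The paper's algorithm is not reproduced here; as a generic illustration of single-pass randomized sketching for PCA, the following accumulates Y = (A^T A) Omega block by block in one pass over the rows and recovers approximate principal directions from a Nystrom-style approximation of A^T A.

    import numpy as np

    def single_pass_pca(row_blocks, n_features, k, oversample=10, seed=0):
        """Approximate top-k principal directions of A (rows arrive in blocks):
        accumulate Y = (A^T A) @ Omega block by block, then eigendecompose a
        Nystrom approximation of A^T A built from the small sketches."""
        rng = np.random.default_rng(seed)
        omega = rng.standard_normal((n_features, k + oversample))
        y = np.zeros((n_features, k + oversample))
        for block in row_blocks:                       # the single pass over the data
            y += block.T @ (block @ omega)
        c = omega.T @ y                                # small core matrix
        q, r = np.linalg.qr(y)
        m = r @ np.linalg.pinv((c + c.T) / 2) @ r.T
        evals, evecs = np.linalg.eigh((m + m.T) / 2)
        order = np.argsort(evals)[::-1][:k]
        return q @ evecs[:, order], np.sqrt(np.maximum(evals[order], 0))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 300)) / np.arange(1, 301)   # decaying spectrum
    pcs, svals = single_pass_pca(np.array_split(A, 20), n_features=300, k=5)
    ref = np.linalg.svd(A, full_matrices=False)[1][:5]
    print(np.round(svals, 1), np.round(ref, 1))        # sketch vs. exact singular values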
Tushar Vaidya, Carlos Murguia, Georgios Piliouras
Comments: 12 pages, 7 figures
Subjects: Mathematical Finance (q-fin.MF); Learning (cs.LG); Multiagent Systems (cs.MA)
Black-Scholes (BS) is the standard mathematical model for option pricing in
financial markets. Option prices are calculated using an analytical formula
whose main inputs are strike (at which price to exercise) and volatility. The
BS framework assumes that volatility remains constant across all strikes; in
practice, however, it varies. How do traders come to learn these parameters?
We introduce natural models of learning agents, in which they update their
beliefs about the true implied volatility based on the opinions of other
traders. We prove convergence of these opinion dynamics using techniques from
control theory and leader-follower models, thus providing a resolution between
theory and market practices. We allow for two different models, one with
feedback and one with an unknown leader and no feedback. Both scalar and
multidimensional cases are analyzed.
Changde Du, Changying Du, Jinpeng Li, Wei-long Zheng, Bao-liang Lu, Huiguang He
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
In emotion recognition, it is difficult to recognize human emotional states
using just a single modality. Moreover, the annotation of physiological
emotional data is particularly expensive. These two aspects make building an
effective emotion recognition model challenging. In this paper, we first
build a multi-view deep generative model to simulate the generative process of
multi-modality emotional data. By imposing a mixture of Gaussians assumption on
the posterior approximation of the latent variables, our model can learn the
shared deep representation from multiple modalities. To solve the
labeled-data-scarcity problem, we further extend our multi-view model to
semi-supervised learning scenario by casting the semi-supervised classification
problem as a specialized missing data imputation task. Our semi-supervised
multi-view deep generative framework can leverage both labeled and unlabeled
data from multiple modalities, where the weight factor for each modality can be
learned automatically. Compared with previous emotion recognition methods, our
method is more robust and flexible. The experiments conducted on two real
multi-modal emotion datasets have demonstrated the superiority of our framework
over a number of competitors.
Maxim Rabinovich, Mitchell Stern, Dan Klein
Comments: ACL 2017. MR and MS contributed equally
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Tasks like code generation and semantic parsing require mapping unstructured
(or partially structured) inputs to well-formed, executable outputs. We
introduce abstract syntax networks, a modeling framework for these problems.
The outputs are represented as abstract syntax trees (ASTs) and constructed by
a decoder with a dynamically-determined modular structure paralleling the
structure of the output tree. On the benchmark Hearthstone dataset for code
generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy,
compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we
perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with
no task-specific engineering.
Feng Nan, Venkatesh Saligrama
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present a dynamic model selection approach for resource-constrained
prediction. Given an input instance at test-time, a gating function identifies
a prediction model for the input among a collection of models. Our objective is
to minimize overall average cost without sacrificing accuracy. We learn gating
and prediction models on fully labeled training data by means of a bottom-up
strategy. Our novel bottom-up method is a recursive scheme whereby a
high-accuracy complex model is first trained. Then, low-complexity gating and
prediction models are subsequently learned to adaptively approximate the
high-accuracy model in regions where low-cost models are capable of making
highly accurate predictions. We pose an empirical loss minimization problem
with cost constraints to jointly train the gating and prediction models. On a
number of benchmark datasets our method outperforms the state-of-the-art,
achieving higher accuracy for the same cost.
Cheng-Hao Cai, Dengfeng Ke, Yanyan Xu, Kaile Su
Comments: 8 pages, 7 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Logic in Computer Science (cs.LO)
There is a wide gap between symbolic reasoning and deep learning. In this
research, we explore the possibility of using deep learning to improve symbolic
reasoning. Briefly, in a reasoning system, a deep feedforward neural network is
used to guide rewriting processes after learning from algebraic reasoning
examples produced by humans. To enable the neural network to recognise patterns
of algebraic expressions with non-deterministic sizes, reduced partial trees
are used to represent the expressions. Also, to represent both top-down and
bottom-up information of the expressions, a centralisation technique is used to
improve the reduced partial trees. Besides, symbolic association vectors and
rule application records are used to improve the rewriting processes.
Experimental results reveal that the algebraic reasoning examples can be
accurately learnt only if the feedforward neural network has enough hidden
layers. Also, the centralisation technique, the symbolic association vectors
and the rule application records can reduce error rates of reasoning. In
particular, the above approaches have led to a 4.6% error rate of reasoning on
a dataset of linear equations, differentials and integrals.
Amin Morid, Olivia R. Liu Sheng, Samir Abdelrahman
Comments: 10 pages, Proc. of 7th Annual Workshop on Health IT and Economics (WHITE) (Eds.). Washington, DC., 2016
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Patient time series classification faces challenges in high degrees of
dimensionality and missingness. In light of patient similarity theory, this
study explores effective temporal feature engineering and reduction, missing
value imputation, and change point detection methods that can afford
similarity-based classification models with desirable accuracy enhancement. We
select a piecewise aggregation approximation method to extract fine-grain
temporal features and propose a minimalist method to impute missing values in
temporal features. For dimensionality reduction, we adopt a gradient descent
search method for feature weight assignment. We propose new patient status and
directional change definitions based on medical knowledge or clinical
guidelines about the value ranges for different patient status levels, and
develop a method to detect change points indicating positive or negative
patient status changes. We evaluate the effectiveness of the proposed methods
in the context of early Intensive Care Unit mortality prediction. The
evaluation results show that the k-Nearest Neighbor algorithm that incorporates
the methods we select and propose significantly outperforms the relevant
benchmarks for early ICU mortality prediction. This study makes contributions to time
series classification and early ICU mortality prediction via identifying and
enhancing temporal feature engineering and reduction methods for
similarity-based time series classification.
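Piecewise aggregate approximation itself is a standard transform; a minimal version with a crude carry-forward rule for all-missing segments (the paper's "minimalist" imputation may differ) looks like this.

    import numpy as np

    def paa(series, n_segments):
        """Piecewise Aggregate Approximation: split the series into equal-length
        segments and represent each by its mean. NaNs (missing measurements) are
        ignored inside a segment; an all-missing segment reuses the previous value."""
        x = np.asarray(series, dtype=float)
        features = np.full(n_segments, np.nan)
        for i, seg in enumerate(np.array_split(x, n_segments)):
            if np.any(~np.isnan(seg)):
                features[i] = np.nanmean(seg)
            elif i > 0:                      # minimalist imputation: carry forward
                features[i] = features[i - 1]
        return features

    heart_rate = np.array([80, 82, np.nan, 85, 90, np.nan,
                           np.nan, 70, 72, 75, np.nan, 74])
    print(paa(heart_rate, n_segments=4))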
Rushil Anirudh, Jayaraman J. Thiagarajan
Comments: Under review
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Using predictive models to identify patterns that can act as biomarkers for
different neuropathological conditions is becoming highly prevalent. In this
paper, we consider the problem of Autism Spectrum Disorder (ASD)
classification. While non-invasive imaging measurements, such as resting-state
fMRI, are typically used in this problem, it can be beneficial to incorporate a
wide variety of non-imaging features, including personal and socio-cultural
traits, into predictive modeling. We propose to employ a graph-based approach
for combining both types of features, where a contextual graph encodes the
traits of a larger population while the brain activity patterns are defined as
a multivariate function at the nodes of the graph. Since the underlying graph
dictates the performance of the resulting predictive models, we explore the use
of different graph construction strategies. Furthermore, we develop a
bootstrapped version of graph convolutional neural networks (G-CNNs) that
utilizes an ensemble of weakly trained G-CNNs to avoid overfitting and also
reduce the sensitivity of the models to the choice of graph construction. We
demonstrate its effectiveness on the Autism Brain Imaging Data Exchange (ABIDE)
dataset and show that the proposed approach outperforms state-of-the-art
approaches for this problem.
Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
This paper addresses the limitations of previous training methods that
emphasize either easy examples like self-paced learning or difficult examples
like hard example mining. Inspired by active learning, we propose two
alternatives to re-weight training samples based on lightweight estimates of
sample uncertainty in stochastic gradient descent (SGD): the variance in
predicted probability of the correct class across iterations of mini-batch SGD,
and the proximity of the correct class probability to the decision threshold
(or threshold closeness). Extensive experimental results on multiple datasets
show that our methods reliably improve accuracy in various network
architectures, including providing additional gains on top of other popular
training tools, such as ADAM, dropout, and distillation.
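A minimal sketch of the two weighting signals as described, computed from a stored history of correct-class probabilities across SGD iterations; the exact weighting formulas and how the weights enter the loss are simplified here.

    import numpy as np

    def sample_weights(prob_history, threshold=0.5, eps=1e-3):
        """prob_history: array (n_iterations, n_samples) of the predicted probability
        of the correct class for each training sample over recent SGD iterations.
        Returns two alternative weightings, each normalized to mean 1."""
        variance_w = prob_history.std(axis=0) + eps                        # prediction variance
        closeness_w = 1.0 / (np.abs(prob_history[-1] - threshold) + eps)   # threshold closeness
        return variance_w / variance_w.mean(), closeness_w / closeness_w.mean()

    rng = np.random.default_rng(0)
    history = np.clip(rng.normal(0.7, 0.1, size=(10, 5)), 0, 1)
    history[:, 0] = 0.99            # an easy, confidently-correct sample -> low weight
    w_var, w_close = sample_weights(history)
    print(np.round(w_var, 2), np.round(w_close, 2))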
Arnaud Marsiglietti, Victoria Kostina
Comments: 31 pages, 2 figures
Subjects: Information Theory (cs.IT)
We derive a lower bound on the differential entropy of a log-concave random
variable $X$ in terms of the $p$-th absolute moment of $X$. The new bound leads
to a reverse entropy power inequality with an explicit constant, and to new
bounds on the rate-distortion function and the channel capacity.
Specifically, we study the rate-distortion function for log-concave sources and
distortion measure $|x - \hat{x}|^r$, and we establish that the difference
between the rate-distortion function and the Shannon lower bound is at most
$\log(2\sqrt{\pi e}) \approx 2.5$ bits, independently of $r$ and the target
distortion $d$. For mean-square error distortion, the difference is at most
$\log(\sqrt{2\pi e}) \approx 2$ bits, regardless of $d$. The bounds can be
further strengthened if the source, in addition to being log-concave, is
symmetric. In particular, we establish that for mean-square error distortion,
the difference is at most $\log(\sqrt{\pi e}) \approx 1.5$ bits, regardless of
$d$.
We also provide bounds on the capacity of memoryless additive noise channels
when the noise is log-concave. We show that the difference between the capacity
of such channels and the capacity of the Gaussian channel with the same noise
power is at most $\log(\sqrt{2\pi e}) \approx 2$ bits, and at most
$\log(\sqrt{\pi e}) \approx 1.5$ bits if the noise is symmetric log-concave.
Our results generalize to the case of a vector $X$ with possibly dependent
coordinates, and to $\gamma$-concave random variables. Our proof technique
leverages tools from convex geometry.
Mostafa Medra, Timothy N. Davidson
Subjects: Information Theory (cs.IT)
This paper considers the design of the beamformers for a multiple-input
single-output (MISO) downlink system that seeks to mitigate the impact of the
imperfections in the channel state information (CSI) that is available at the
base station (BS). The goal of the design is to minimize the outage probability
of specified signal-to-interference-and-noise ratio (SINR) targets, while
satisfying per-antenna power constraints (PAPCs), and to do so at a low
computational cost. Based on insights from the offset maximization technique
for robust beamforming, and observations regarding the structure of the
optimality conditions, low-complexity iterative algorithms that involve the
evaluation of closed-form expressions are developed. To further reduce the
computational cost, algorithms are developed for per-antenna power-constrained
variants of the zero-forcing (ZF) and maximum ratio transmission (MRT)
beamforming directions. In the MRT case, our low-complexity version for systems
with a large number of antennas may be of independent interest. The proposed
algorithms are extended to systems with both PAPCs and a total power
constraint. Simulation results show that the proposed robust designs can
provide substantial gains in the outage probability while satisfying the PAPCs.
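The robust, per-antenna-constrained algorithms themselves are not reproduced here; the sketch below only sets up the two classical beamforming directions (MRT and ZF) that the constrained variants start from, for a random MISO downlink channel.

    import numpy as np

    rng = np.random.default_rng(0)
    n_ant, n_users = 8, 4
    H = (rng.standard_normal((n_users, n_ant))
         + 1j * rng.standard_normal((n_users, n_ant))) / np.sqrt(2)

    # Maximum ratio transmission: beamform straight along each user's channel.
    W_mrt = H.conj().T / np.linalg.norm(H, axis=1)          # columns = unit-norm beams

    # Zero forcing: pseudo-inverse nulls inter-user interference, then normalize.
    W_zf = np.linalg.pinv(H)
    W_zf = W_zf / np.linalg.norm(W_zf, axis=0)

    print(np.round(np.abs(H @ W_zf), 3))    # ~diagonal: no inter-user interference
    print(np.round(np.abs(H @ W_mrt), 3))   # off-diagonal leakage remains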
Yongpeng Wu, Jun-Bo Wang, Jue Wang, Robert Schober, Chengshan Xiao
Comments: Accepted by IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:1612.08328
Subjects: Information Theory (cs.IT)
In this paper, we investigate secure transmission over the large-scale
multiple-antenna wiretap channel with finite alphabet inputs. First, we
investigate the case where instantaneous channel state information (CSI) of the
eavesdropper is known at the transmitter. We show analytically that a
generalized singular value decomposition (GSVD) based design, which is optimal
for Gaussian inputs, may exhibit a severe performance loss for finite alphabet
inputs in the high signal-to-noise ratio (SNR) regime. In light of this, we
propose a novel Per-Group-GSVD (PG-GSVD) design which can effectively
compensate the performance loss caused by the GSVD design. More importantly,
the computational complexity of the PG-GSVD design is orders of magnitude lower
than that of the existing design for finite alphabet inputs in [1], while
the resulting performance loss is minimal. Then, we extend the PG-GSVD design
to the case where only statistical CSI of the eavesdropper is available at the
transmitter. Numerical results indicate that the proposed PG-GSVD design can be
efficiently implemented in large-scale multiple-antenna systems and achieves
significant performance gains compared to the GSVD design.
Amitalok J. Budkuley, Bikash Kumar Dey, Vinod M. Prabhakaran
Comments: 11 pages
Subjects: Information Theory (cs.IT)
We study a lossy source coding problem for a memoryless remote source. The
source data is broadcast over an arbitrarily varying channel (AVC) controlled
by an adversary. One output of the AVC is received as input at the encoder, and
another output is received as side information at the decoder. The adversary is
assumed to know the source data non-causally, and can employ randomized jamming
strategies arbitrarily correlated to the source data. The decoder reconstructs
the source data from the encoded message and the side information. We prove
upper and lower bounds on the adversarial rate distortion function for the
source under randomized coding. Furthermore, we present some interesting
special cases of our general setup where the above bounds coincide, and thus,
provide their complete rate distortion function characterization.
Sundeep Prabhakar Chepuri, Geert Leus
Comments: Under peer review for Jour. of Sel. Topics in Signal Proc. (special issue on graph signal processing), Nov. 2016
Subjects: Information Theory (cs.IT)
In this paper the focus is on subsampling as well as reconstructing the
second-order statistics of signals residing on nodes of arbitrary undirected
graphs. Second-order stationary graph signals may be obtained by graph
filtering zero-mean white noise and they admit a well-defined power spectrum
whose shape is determined by the frequency response of the graph filter.
Estimating the graph power spectrum forms an important component of stationary
graph signal processing and related inference tasks such as Wiener prediction
or inpainting on graphs. The central result of this paper is that by sampling a
significantly smaller subset of vertices and using simple least squares, we can
reconstruct the second-order statistics of the graph signal from the subsampled
observations, and more importantly, without any spectral priors. To this end,
both a nonparametric approach as well as parametric approaches including moving
average and autoregressive models for the graph power spectrum are considered.
The results specialize for undirected circulant graphs in that the graph nodes
leading to the best compression rates are given by the so-called minimal sparse
rulers. A near-optimal greedy algorithm is developed to design the subsampling
scheme for the non-parametric and the moving average models, whereas a
particular subsampling scheme that allows linear estimation for the
autoregressive model is proposed. Numerical experiments on synthetic as well as
real datasets related to climatology and processing handwritten digits are
provided to demonstrate the developed theory.
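A toy illustration of the nonparametric least-squares variant: on a small graph with a known Laplacian eigenbasis, the power spectrum is regressed from the empirical covariance of a randomly chosen node subset. The subset design (sparse rulers, greedy selection) and the parametric models are not addressed, and the graph and spectrum below are made up.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_sub, n_signals = 30, 12, 20000

    # Toy undirected graph: ring plus a few chords; graph frequencies = Laplacian eigenpairs.
    Adj = np.zeros((n, n))
    for i in range(n):
        Adj[i, (i + 1) % n] = Adj[(i + 1) % n, i] = 1
    for i, j in [(0, 10), (5, 20), (7, 25)]:
        Adj[i, j] = Adj[j, i] = 1
    L = np.diag(Adj.sum(1)) - Adj
    eigvals, U = np.linalg.eigh(L)

    p_true = np.exp(-eigvals)                       # ground-truth graph power spectrum
    X = U @ (np.sqrt(p_true)[:, None] * rng.standard_normal((n, n_signals)))

    S = rng.choice(n, size=n_sub, replace=False)    # observe only a subset of nodes
    C_sub = (X[S] @ X[S].T) / n_signals             # empirical covariance of the subset

    # Least squares: vec(C_sub) ~ sum_k p_k * vec(u_k[S] u_k[S]^T)
    M = np.stack([np.outer(U[S, k], U[S, k]).ravel() for k in range(n)], axis=1)
    p_hat, *_ = np.linalg.lstsq(M, C_sub.ravel(), rcond=None)
    print(np.round(p_true[:5], 3), np.round(p_hat[:5], 3))   # true vs. estimated spectrum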
Ganggang Ma, Jie Xu, Lingjie Duan, Rui Zhang
Comments: Submitted for conference publication
Subjects: Information Theory (cs.IT)
Wireless surveillance is becoming increasingly important to protect the
public security by legitimately eavesdropping suspicious wireless
communications. This paper studies the wireless surveillance of a two-hop
suspicious communication link by a half-duplex legitimate monitor. By exploring
the suspicious link’s two-hop nature, the monitor can adaptively choose among
the following three eavesdropping modes to improve the eavesdropping
performance: (I) passive eavesdropping to intercept both hops to decode the
message collectively, (II) proactive eavesdropping via noise jamming over the
first hop, and (III) proactive eavesdropping via hybrid jamming over the second
hop. In both proactive eavesdropping
modes, the (noise/hybrid) jamming over one hop is for the purpose of reducing
the end-to-end communication rate of the suspicious link and accordingly making
interception over the other hop easier. Under this setup, we maximize
the eavesdropping rate at the monitor by jointly optimizing the eavesdropping
mode selection as well as the transmit power for noise and hybrid jamming.
Numerical results show that the eavesdropping mode selection significantly
improves the eavesdropping rate as compared to each individual eavesdropping
mode.
Hanaa Marshoud, Sami Muhaidat, Paschalis C. Sofotasios, Sajjad Hussain, Muhammad Ali Imran, Bayan S. Sharif
Subjects: Information Theory (cs.IT)
The proliferation of mobile Internet and connected devices, offering a
variety of services at different levels of performance, represents a major
challenge for the fifth generation wireless networks and beyond. This requires
a paradigm shift towards the development of key enabling techniques for the
next generation wireless networks. In this respect, visible light communication
(VLC) has recently emerged as a new communication paradigm that is capable of
providing ubiquitous connectivity by complementing radio frequency
communications. One of the main challenges of VLC systems, however, is the low
modulation bandwidth of the light-emitting diodes, which is in the megahertz
range. This article presents a promising technology, referred to as “optical
non-orthogonal multiple access (O-NOMA)”, which is envisioned to address the
key challenges in the next generation of wireless networks. We provide a
detailed overview and analysis of the state-of-the-art integration of O-NOMA in
VLC networks. Furthermore, we provide insights on the potential opportunities
and challenges as well as some open research problems that are envisioned to
pave the way for the future design and implementation of O-NOMA in VLC systems.
Giulio Giaconi, Deniz Gunduz, H. Vincent Poor
Comments: 6 pages, 7 figures
Subjects: Information Theory (cs.IT)
The smart meter (SM) privacy problem is addressed together with the cost of
energy for the consumer. It is assumed that a storage device, e.g., an
electrical battery, is available to the consumer, which can be utilized both to
achieve privacy and to reduce the energy cost by modifying the energy
consumption profile. Privacy is measured via the squared-error distortion
between the SM readings, which are reported to the utility provider (UP), and a
target profile, whereas time-of-use pricing is considered for energy cost
calculation. Extensive numerical results are presented to evaluate the
performance of the proposed strategy. The trade-off between the achievable
privacy and the energy cost is studied by taking into account the limited
capacity of the battery as well as the capability to sell energy to the UP.
Andreas Lenz, Manuel S. Stein, A. Lee Swindlehurst
Comments: 13 pages, 14 figures
Subjects: Information Theory (cs.IT)
In this article a framework is presented for the joint optimization of the
analog transmit and receive filters with respect to a channel estimation
problem. At the receiver, conventional signal processing systems restrict the
bandwidth of the analog pre-filter $B$ to the rate of the analog-to-digital
converter $f_s$ in order to comply with the well-known Nyquist sampling
theorem. In contrast, here we consider a transceiver that by design violates
the common paradigm $B \leq f_s$. To this end, at the receiver we allow for a
higher pre-filter bandwidth $B > f_s$ and study the achievable channel
estimation accuracy under a fixed sampling rate when the transmit and receive
filters are jointly optimized with respect to the Bayesian Cramér-Rao lower
bound. For the case of a channel with unknown delay-Doppler shift we show how
to approximate the required Fisher information matrix and solve the transceiver
design problem by an alternating optimization algorithm. The presented approach
allows us to explore the Pareto-optimal region spanned by transmit and receive
filters which are favorable under a weighted mean squared error criterion. We
discuss the complexity of the obtained transceiver design by visualizing the
resulting ambiguity function. Finally, we verify the achievable performance of
the proposed designs by Monte-Carlo simulations of a likelihood-based channel
estimator.
Yongpeng Wu, Chengshan Xiao, Zhi Ding, Xiqi Gao, Shi Jin
Comments: 110 pages, 512 references, submit to Proceedings of the IEEE
Subjects: Information Theory (cs.IT)
Multiple antennas have been exploited for spatial multiplexing and diversity
transmission in a wide range of communication applications. However, most of
the advances in the design of high-speed wireless multiple-input
multiple-output (MIMO) systems are based on information-theoretic principles that
demonstrate how to efficiently transmit signals conforming to Gaussian
distribution. Although the Gaussian signal is capacity-achieving, signals
conforming to discrete constellations are transmitted in practical
communication systems. As a result, this paper is motivated to provide a
comprehensive overview on MIMO transmission design with discrete input signals.
We first summarize the existing fundamental results for MIMO systems with
discrete input signals. Then, focusing on the basic point-to-point MIMO
systems, we examine transmission schemes based on three most important criteria
for communication systems: the mutual information driven designs, the mean
square error driven designs, and the diversity driven designs. In particular, a
unified framework for designing low-complexity transmission schemes applicable
to massive MIMO systems in upcoming 5G wireless networks is provided for the
first time. Moreover, adaptive transmission designs which switch among these
criteria based on the channel conditions to formulate the best transmission
strategy are discussed. Then, we provide a survey of the transmission designs
with discrete input signals for multiuser MIMO scenarios, including MIMO uplink
transmission, MIMO downlink transmission, MIMO interference channel, and MIMO
wiretap channel. Additionally, we discuss the transmission designs with
discrete input signals for other systems using MIMO technology. Finally,
technical challenges which remain unresolved at the time of writing are
summarized and the future trends of transmission designs with discrete input
signals are addressed.
Cristina Perfecto, Javier Del Ser, Mehdi Bennis, Miren Nekane Bilbao
Comments: 6 pages, 4 figures
Subjects: Information Theory (cs.IT)
In vehicular scenarios context awareness is a key enabler for road safety.
However, the amount of contextual information that can be collected by a
vehicle is stringently limited by the sensor technology itself (e.g.,
line-of-sight, coverage, weather robustness) and by the low bandwidths offered
by current wireless vehicular technologies such as DSRC/802.11p. Motivated by
the upsurge of research around millimeter-wave vehicle-to-anything (V2X)
communications, in this work we propose a distributed vehicle-to-vehicle (V2V)
association scheme that considers a quantitative measure of the potential value
of the shared contextual information in the pairing process. First, we properly
define the utility function of every vehicle balancing classical channel state
and queuing state information (CSI/QSI) with context information, i.e., sensing
content resolution, timeliness, and enhanced sensing range. Next, we solve
the problem via a distributed many-to-one matching game in a junction scenario
with realistic vehicular mobility traces. It is shown that when receivers are
able to leverage information from different sources, the average volume of
collected extended sensed information under our proposed scheme is up to 71%
more than that of distance and minimum delay-based matching baselines.
Ehab Salahat, Ahmed Kulaib, Nazar Ali, Raed Shubair
Comments: Accepted in IEEE International Symposium on Antennas and Propagation (APS17), San Diego, California, 9-14 Jul. 2017. arXiv admin note: substantial text overlap with arXiv:1704.06874
Subjects: Information Theory (cs.IT)
Asymmetry is unquestionably an important characteristic of the wireless
propagation channel, which needs to be accurately modeled for wireless and
mobile communications, 5G networks, and associated applications such as
indoor/outdoor localization. This paper reports on the potential causes of
propagation asymmetry. Practical channel measurements on the Khalifa University
premises showed that wireless channels are asymmetric in realistic scenarios.
Some important conclusions and recommendations are also summarized.
Michael Newman, Yaoyun Shi
Comments: 22 pages, 2 figures, comments welcome!
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Transversality is a simple and effective method for implementing quantum
computation fault-tolerantly. However, no quantum error-correcting code (QECC)
can transversally implement a quantum universal gate set (Eastin and Knill,
Phys. Rev. Lett., 102, 110502). Since reversible classical computation is often
a dominating part of useful quantum computation, whether or not it can be
implemented transversally is an important open problem. We show that, other
than a small set of non-additive codes that we cannot rule out, no binary QECC
can transversally implement a classical reversible universal gate set. In
particular, no such QECC can implement the Toffoli gate transversally.
We prove our result by constructing an information theoretically secure (but
inefficient) quantum homomorphic encryption (ITS-QHE) scheme inspired by Ouyang
et al. (arXiv:1508.00938). Homomorphic encryption allows the implementation of
certain functions directly on encrypted data, i.e. homomorphically. Our scheme
builds on almost any QECC, and implements that code’s transversal gate set
homomorphically. We observe a restriction imposed by Nayak’s bound (FOCS 1999)
on ITS-QHE, implying that any ITS quantum fully homomorphic scheme (ITS-QFHE)
implementing the full set of classical reversible functions must be highly
inefficient. Although our scheme itself incurs exponential overhead, any QECC
implementing the Toffoli gate transversally would, when combined with our
construction, still violate this lower bound.
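For readers unfamiliar with the terminology (standard definitions, not specific
to this paper): a logical gate is transversal when it acts independently on
corresponding physical qubits of the code block(s), so errors cannot spread
within a block, and the Toffoli gate is the controlled-controlled-NOT that,
together with NOT, generates all reversible classical circuits:
\[
U_L \;=\; \bigotimes_{i=1}^{n} U_i
\quad\text{(transversal gate on an } n\text{-qubit block)},
\qquad
\mathrm{TOFFOLI}\,|a,b,c\rangle \;=\; |a,\,b,\,c\oplus ab\rangle .
\]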
Maksim Butsenko, Johan Swärd, Andreas Jakobsson
Subjects: Methodology (stat.ME); Information Theory (cs.IT)
In this paper, we introduce a wideband dictionary framework for estimating
sparse signals. By formulating integrated dictionary elements spanning bands of
the considered parameter space, one may efficiently find and discard large
parts of the parameter space not active in the signal. After each iteration,
the zero-valued parts of the dictionary may be discarded to allow a refined
dictionary to be formed around the active elements, resulting in a zoomed
dictionary to be used in the following iterations. Implementing this scheme
allows for more accurate estimates, at a much lower computational cost, as
compared to directly forming a larger dictionary spanning the whole parameter
space or performing a zooming procedure using standard dictionary elements.
Different from traditional dictionaries, the wideband dictionary allows for the
use of dictionaries with fewer elements than the number of available samples
without loss of resolution. The technique may be used on both one- and
multi-dimensional signals, and may be exploited to refine several traditional
sparse estimators, here illustrated with the LASSO and the SPICE estimators.
Numerical examples illustrate the improved performance.
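The discard-and-refine idea can be sketched in a few lines of Python (a minimal
two-stage illustration, not the authors' implementation). Each coarse atom below
is a numerical average of narrowband cosine atoms over a frequency band, a
stand-in for the integrated dictionary elements; the signal, grid sizes, and
regularization level are illustrative assumptions and would need tuning.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N = 256
t = np.arange(N)
true_freqs = [0.12, 0.31]                      # normalized frequencies (assumed)
y = sum(np.cos(2 * np.pi * f * t) for f in true_freqs)
y = y + 0.05 * rng.standard_normal(N)

def unit(a):
    return a / np.linalg.norm(a)

def band_atom(f_lo, f_hi, n_sub=16):
    """Average of narrowband cosine atoms over the band [f_lo, f_hi)."""
    fs = np.linspace(f_lo, f_hi, n_sub, endpoint=False)
    return unit(np.mean([np.cos(2 * np.pi * f * t) for f in fs], axis=0))

# Stage 1: coarse dictionary of 20 wide bands covering (0, 0.5).
edges = np.linspace(0.0, 0.5, 21)
A_coarse = np.column_stack([band_atom(lo, hi)
                            for lo, hi in zip(edges[:-1], edges[1:])])
coarse = Lasso(alpha=0.01, max_iter=20000).fit(A_coarse, y)
active = np.flatnonzero(np.abs(coarse.coef_) > 1e-6)   # keep only active bands

# Stage 2: zoomed, narrowband dictionary built only inside the surviving bands.
fine_freqs = np.concatenate([np.linspace(edges[i], edges[i + 1], 50)
                             for i in active])
A_fine = np.column_stack([unit(np.cos(2 * np.pi * f * t)) for f in fine_freqs])
fine = Lasso(alpha=0.01, max_iter=20000).fit(A_fine, y)
print("candidate frequencies:", fine_freqs[np.abs(fine.coef_) > 0.1])

The actual framework forms the integrated band atoms in closed form and repeats
the zoom over several iterations (and also supports SPICE); the sketch shows
only one discard-and-refine pass with the LASSO.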
Neelakantan Nurani Krishnan, Ratnesh Kumbhkar, Narayan B. Mandayam, Ivan Seskar, Sastry Kompella
Comments: This paper has been submitted to IEEE Globecom 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Increasing data traffic demands over wireless spectrum have necessitated
spectrum sharing and coexistence between heterogeneous systems such as radar
and cellular communications systems. In this context, we specifically
investigate the co-channel coexistence between an air traffic control (ATC)
radar and a wide area cellular communication (comms) system. We present a
comprehensive characterization and analysis of the interference imposed on the
ATC radar by the comms system with respect to multiple parameters such as radar
range, protection radius around the radar, and radar antenna elevation angle.
The analysis suggests that maintaining a protection radius of 50 km around the
radar ensures the required interference-to-noise ratio (INR) protection
criterion of -10 dB at the radar receiver with probability of approximately
0.9, even when the radar beam points along the same horizon as the comms base
station (BS). Detailed evaluations of the radar target detection performance
provide a framework to choose appropriate protection radii around the radar to
meet specific performance requirements.
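For context, the INR criterion compares the aggregate comms interference
received by the radar with the radar receiver's thermal noise floor. In a
generic link-budget form (not the paper's specific propagation or antenna
model),
\[
I \;=\; \sum_{k}\frac{P_{t,k}\,G_{t,k}\,G_{r,k}}{L(d_k)},
\qquad
N \;=\; k_B T B F,
\qquad
\mathrm{INR}_{\mathrm{dB}} \;=\; 10\log_{10}\frac{I}{N} \;\le\; -10,
\]
where the sum runs over the active comms transmitters outside the protection
radius, $L(d_k)$ is the path loss at distance $d_k$, and the noise power in dB
is approximately $-174\ \mathrm{dBm/Hz} + 10\log_{10}B + \mathrm{NF}$. Enlarging
the protection radius increases every $d_k$ and hence lowers the INR, which is
the trade-off quantified in the paper.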
Ashwin Pananjady, Martin J. Wainwright, Thomas A. Courtade
Comments: To appear in part at ISIT 2017, Aachen
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Statistics Theory (math.ST)
The multivariate linear regression model with shuffled data and additive
Gaussian noise arises in various correspondence estimation and matching
problems. Focusing on the denoising aspect of this problem, we provide a
characterization of the minimax error rate that is sharp up to logarithmic
factors. We also analyze the performance of two versions of a computationally
efficient estimator, and establish their consistency for a large range of input
parameters. Finally, we provide an exact algorithm for the noiseless problem
and demonstrate its performance on an image point-cloud matching task. Our
analysis also extends to datasets with outliers.
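In the usual shuffled-regression notation (kept generic here, since the paper's
exact symbols may differ), the observation model is
\[
\mathbf{Y} \;=\; \Pi^{*}\mathbf{A}\mathbf{X}^{*} + \mathbf{W},
\]
with $\mathbf{A}\in\mathbb{R}^{n\times d}$ a known design matrix,
$\mathbf{X}^{*}\in\mathbb{R}^{d\times m}$ unknown coefficients, $\Pi^{*}$ an
unknown $n\times n$ permutation matrix, and $\mathbf{W}$ additive Gaussian
noise; the denoising task is to estimate the noiseless matrix
$\Pi^{*}\mathbf{A}\mathbf{X}^{*}$ rather than to recover $\Pi^{*}$ or
$\mathbf{X}^{*}$ individually.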