Chen Huang, Chen Change Loy, Xiaoou Tang
Comments: 9 pages, 4 figures, 2 tables. Accepted to NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Existing deep embedding methods in vision tasks are capable of learning a
compact Euclidean space from images, where Euclidean distances correspond to a
similarity metric. To make learning more effective and efficient, hard sample
mining is usually employed, with samples identified through computing the
Euclidean feature distance. However, the global Euclidean distance cannot
faithfully characterize the true feature similarity in a complex visual feature
space, where the intraclass distance in a high-density region may be larger
than the interclass distance in low-density regions. In this paper, we
introduce a Position-Dependent Deep Metric (PDDM) unit, which is capable of
learning a similarity metric adaptive to local feature structure. The metric
can be used to select genuinely hard samples in a local neighborhood to guide
the deep embedding learning in an online and robust manner. The new layer is
appealing in that it is pluggable into any convolutional network and is trained
end-to-end. Our local similarity-aware feature embedding not only demonstrates
faster convergence and boosted performance on two complex image retrieval
datasets; its large-margin nature also leads to superior generalization results
under the large and open-set scenarios of transfer learning and zero-shot
learning on the ImageNet 2010 and ImageNet-10K datasets.
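The PDDM unit itself is a learned module trained jointly with the embedding; as a rough illustration of why a position-dependent similarity differs from the global Euclidean distance, the toy Python sketch below (all names hypothetical, not the authors' method) rescales pairwise distances by local density before mining a hard positive pair:

    import numpy as np

    def density_scaled_distances(X, k=5):
        # Rescale global Euclidean distances by local density (mean distance
        # to the k nearest neighbours), so the same Euclidean gap looks
        # "harder" in a dense region than in a sparse one.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)
        rho = np.sort(D, axis=1)[:, :k].mean(axis=1)   # local density proxy
        np.fill_diagonal(D, 0.0)
        return D / np.sqrt(np.outer(rho, rho))

    X = np.random.randn(50, 8)               # embeddings
    y = np.random.randint(0, 5, size=50)     # class labels
    S = density_scaled_distances(X)
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    i, j = np.unravel_index(np.where(same, S, -np.inf).argmax(), S.shape)
    # (i, j): hardest positive pair under the locally scaled distance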
Nicholas Westlake, Hongping Cai, Peter Hall
Comments: 14 pages, plus 3 pages of references; 7 figures in ECCV 2016 Workshops
Subjects: Computer Vision and Pattern Recognition (cs.CV)
CNNs have massively improved performance in object detection in photographs.
However, research into object detection in artwork remains limited. We show
state-of-the-art performance on a challenging dataset, People-Art, which
contains people from photos, cartoons and 41 different artwork movements. We
achieve this high performance by fine-tuning a CNN for this task, thus also
demonstrating that training CNNs on photos results in overfitting for photos:
only the first three or four layers transfer from photos to artwork. Although
the CNN’s performance is the highest yet, it remains less than 60% AP,
suggesting further work is needed for the cross-depiction problem. The final
publication is available at Springer via
this http URL
Manish Sahu, Anirban Mukhopadhyay, Angelika Szengel, Stefan Zachow
Comments: MICCAI M2CAI 2016 Surgical tool & phase detection challenge report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A transfer learning method for generating features suitable for surgical
tool and phase recognition from the ImageNet classification features [1] is
proposed here. In addition, methods are developed for generating contextual
features and combining them with time series analysis for final classification
using multi-class random forest. The proposed pipeline is tested over the
training and testing datasets of M2CAI16 challenges: tool and phase detection.
Encouraging results are obtained by leave-one-out cross-validation on the
training dataset.
Andru P. Twinanda, Didier Mutter, Jacques Marescaux, Michel de Mathelin, Nicolas Padoy
Comments: The dataset is available at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The tool presence detection challenge at M2CAI 2016 consists of identifying
the presence/absence of seven surgical tools in the images of cholecystectomy
videos. Here, we propose to use deep architectures that are based on our
previous work where we presented several architectures to perform multiple
recognition tasks on laparoscopic videos. In this technical report, we present
the tool presence detection results using two architectures: (1) a single-task
architecture designed to perform solely the tool presence detection task and
(2) a multi-task architecture designed to perform jointly phase recognition and
tool presence detection. The results show that the multi-task network only
slightly improves the tool presence detection results. In contrast, a
significant improvement is obtained when more data are available to train
the networks. This significant improvement can be regarded as a call for action
for other institutions to start working toward publishing more datasets into
the community, so that better models could be generated to perform the task.
Andru P. Twinanda, Didier Mutter, Jacques Marescaux, Michel de Mathelin, Nicolas Padoy
Comments: The dataset is available at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The surgical workflow challenge at M2CAI 2016 consists of identifying 8
surgical phases in cholecystectomy procedures. Here, we propose to use deep
architectures that are based on our previous work where we presented several
architectures to perform multiple recognition tasks on laparoscopic videos. In
this technical report, we present the phase recognition results using two
architectures: (1) a single-task architecture designed to perform solely the
surgical phase recognition task and (2) a multi-task architecture designed to
perform jointly phase recognition and tool presence detection. On top of these
architectures we propose to use two different approaches to enforce the
temporal constraints of the surgical workflow: (1) HMM-based and (2) LSTM-based
pipelines. The results show that the LSTM-based approach is able to outperform
the HMM-based approach and also to properly enforce the temporal constraints
into the recognition process.
Madhura Katageri, Manisha Mandal, Mansi Gandhi, Navin Koregaonkar, Prof. Sharmila Sengupta
Journal-ref: International Journal of Computer Science and Information
Technology (IJCSIT) December 2015, volume 7, number 6
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Potholes, though seemingly inconsequential, may cause accidents resulting in
the loss of human life. In this paper, we present an automated system to efficiently
manage the potholes in a ward by deploying geotagging and image processing
techniques that overcomes the drawbacks associated with the existing
survey-oriented systems. Image processing is used for identification of target
pothole regions in the 2D images using edge detection and morphological image
processing operations. A method is developed to accurately estimate the
dimensions of the potholes from their images, analyze their area and depth,
estimate the quantity of filling material required, thereby enabling potholes
to be attended to on a priority basis. This will further provide government
officials with a fully automated system for effectively managing pothole-related
disasters.
Abhinav Kumar, Animesh Kumar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST)
The estimation of grayscale images from their single-bit, zero-mean Gaussian
noise-affected pixels is presented in this paper. The images are assumed to be
bandlimited in the Fourier Cosine Transform (FCT) domain and are oversampled
above their Nyquist rate in the FCT domain. We propose a non-recursive
approach, based on a first-order approximation of the Cumulative Distribution
Function (CDF) and on Banach's contraction theorem, to estimate the image from
its single-bit pixels. The decay rate of the mean squared error of estimating
such images is found to be independent of the precision of the quantizer and
varies as $O(1/N)$, where $N$ is the “effective” oversampling ratio with
respect to the Nyquist rate in the FCT domain.
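The per-pixel principle behind such single-bit estimation (a simplified stand-in for the paper's FCT-domain method) can be sketched in a few lines: for a sign quantizer under zero-mean Gaussian noise, $P(\text{bit}=1) = \Phi(f/\sigma)$, so averaging many bits and inverting the Gaussian CDF recovers the intensity:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    sigma, N = 1.0, 10_000        # noise std, number of 1-bit observations
    f_true = 0.7                  # unknown pixel intensity

    bits = (f_true + sigma * rng.standard_normal(N)) > 0  # sign quantizer
    p_hat = bits.mean()                  # estimates Phi(f_true / sigma)
    f_hat = sigma * norm.ppf(p_hat)      # invert the Gaussian CDF
    # f_hat -> f_true as N grows; the paper instead works in the FCT domain,
    # where oversampling by a factor N drives the MSE down as O(1/N).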
Peixin Hou, Hao Deng, Jiguang Yue, Shuguang Liu
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we take a new look at the possibilistic c-means (PCM) and
adaptive PCM (APCM) clustering algorithms from the perspective of uncertainty.
This new perspective offers us insights into the clustering process and also
provides us with a greater degree of flexibility. We analyze the clustering
behavior of PCM-based algorithms and introduce parameters $\sigma_v$ and
$\alpha$ to characterize the uncertainty of the estimated bandwidth and the
noise level of the dataset, respectively. The uncertainty (fuzziness) of
membership values caused by uncertainty in the estimated bandwidth parameter
is then modeled by a conditional fuzzy set, which is a new formulation of the
type-2 fuzzy set. Experiments show that the parameters $\sigma_v$ and $\alpha$
make the clustering process easier to control, and that the main features of
PCM and APCM are unified in this new clustering framework (UPCM). More
specifically, UPCM reduces to PCM when we set a small $\alpha$ or a large
$\sigma_v$, and reduces to APCM when clusters are confined to their physical
clusters and possible cluster elimination is ensured. Finally, we outline
directions for future research.
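For reference, here is a minimal numpy sketch of the classical PCM updates that UPCM generalizes (following Krishnapuram and Keller; $\eta_i$ is the per-cluster bandwidth whose uncertainty UPCM models through $\sigma_v$):

    import numpy as np

    def pcm(X, centers, eta, m=2.0, iters=50):
        # Possibilistic c-means: typicality
        # u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))),
        # with centers updated as typicality-weighted means.
        C = centers.copy()
        for _ in range(iters):
            d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(-1)   # (c, n)
            U = 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
            W = U ** m
            C = (W @ X) / W.sum(axis=1, keepdims=True)
        return C, U

    X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5])
    C, U = pcm(X, centers=X[[0, 150]].copy(), eta=np.ones(2))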
Jianjia Zhang, Lei Wang, Luping Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The past few years have witnessed increasing research interest in
covariance-based feature representation. A variety of methods have been
proposed to boost its efficacy, with some recent ones resorting to nonlinear
kernel techniques. Noting that the essence of this feature representation is to
characterise the underlying structure of visual features, this paper argues
that an equally, if not more, important way to boost its efficacy is
to improve the quality of this characterisation. Following this idea, we
propose to exploit the structure sparsity of visual features in skeletal human
action recognition, and compute sparse inverse covariance estimate (SICE) as
feature representation. We discuss the advantages of this new representation in
dealing with small sample size and high dimensionality, and its modelling capability.
Furthermore, utilising the monotonicity property of SICE, we efficiently
generate a hierarchy of SICE matrices to characterise the structure of visual
features at different sparsity levels, and two discriminative learning
algorithms are then developed to adaptively integrate them to perform
recognition. As demonstrated by extensive experiments, the proposed
representation leads to significantly improved recognition performance over
comparable state-of-the-art methods. In particular, as a method fully based on
linear techniques, it is comparable to or even better than those employing
nonlinear kernel techniques. This result well demonstrates the value of
exploiting structure sparsity for covariance-based feature representation.
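A sparse inverse covariance estimate at several sparsity levels can be obtained with any graphical lasso solver; below is a hedged sketch using scikit-learn (the paper's own solver and feature pipeline may differ):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    X = np.random.randn(200, 15)   # stand-in for frame-wise skeletal features

    # Hierarchy of SICE matrices: larger alpha -> sparser precision matrix,
    # i.e. a coarser description of the feature structure.
    hierarchy = []
    for alpha in (0.01, 0.05, 0.1, 0.5):
        model = GraphicalLasso(alpha=alpha, max_iter=200).fit(X)
        hierarchy.append(model.precision_)   # SICE at this sparsity level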
Hua Lan, Shuai Sun, Zengfu Wang, Quan Pan, Zhishan Zhang
Comments: 21 pages, 15 figures, has been submitted to IEEE Transactions on Signal Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Different from traditional point target tracking systems assuming that a
target generates at most one single measurement per scan, there exists a class
of multipath target tracking systems where each measurement may originate from
the target of interest via one of multiple propagation paths or from clutter,
while the correspondence among targets, measurements, and propagation paths is
unknown. The performance of multipath target tracking systems can be improved
if multiple measurements from the same target are effectively utilized, but
doing so faces two major challenges. The first is multipath detection, which
detects appearing and disappearing targets automatically, while one target may
produce $s$ tracks for $s$ propagation paths. The second is multipath tracking,
which calculates the target-to-measurement-to-path assignment matrices to
estimate target states and is computationally intractable due to the
combinatorial explosion. Based on the variational Bayesian framework, this paper
introduces a novel probabilistic joint detection and tracking algorithm
(JDT-VB) that incorporates data association, path association, state estimation
and automatic track management. The posterior probabilities of these latent
variables are derived in a closed-form iterative manner, which is effective for
dealing with the coupling issue of multipath data association identification
risk and state estimation error. Loopy belief propagation (LBP) is exploited to
approximate the multipath data association, which significantly reduces the
computational cost. The proposed JDT-VB algorithm can simultaneously deal with
the track initiation, maintenance, and termination for multiple multipath
target tracking with time-varying number of targets, and its performance is
verified by a numerical simulation of over-the-horizon radar.
Tiep Vu, Vishal Monga
Comments: First submission to TIP
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the fact that different objects possess distinct class-specific
features, they also usually share common patterns. This observation has been
partially exploited in a recently proposed dictionary learning framework that
separates the particularity and the commonality (COPAR). Inspired by this, we
propose a novel method to explicitly and simultaneously learn a set of common
patterns as well as class-specific features for classification with more
intuitive constraints. Our dictionary learning framework is hence characterized
by both a shared dictionary and particular (class-specific) dictionaries. For
the shared dictionary, we enforce a low-rank constraint, i.e., we require that
its spanning subspace have low dimension and that the coefficients corresponding
to this dictionary be similar. For the particular dictionaries, we
impose on them the well-known constraints stated in the Fisher discrimination
dictionary learning (FDDL). Further, we develop new fast and accurate
algorithms to solve the subproblems in the learning step, accelerating its
convergence. These algorithms can also be applied to FDDL and its extensions.
Their efficiency is verified both theoretically and experimentally by comparing
their complexity and running time with those of other well-known dictionary
learning methods. Experimental results on
widely used image datasets establish the advantages of our method over
state-of-the-art dictionary learning methods.
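Low-rank constraints of this kind are commonly enforced with a nuclear-norm proximal step, i.e. singular value soft-thresholding; here is a sketch of that step (one plausible ingredient of such a solver, not necessarily the authors' exact algorithm):

    import numpy as np

    def svt(D, tau):
        # Singular value thresholding: proximal operator of tau * ||D||_*,
        # shrinking the shared dictionary toward low rank.
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    D_shared = np.random.randn(64, 20)       # shared dictionary, atoms as columns
    D_lowrank = svt(D_shared, tau=5.0)       # one proximal step
    print(np.linalg.matrix_rank(D_lowrank))  # typically drops below min(64, 20)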
Abhishek Kumar Dubey, Alexandros-Stavros Iliopoulos, Xiaobai Sun, Fang-Fang Yin, Lei Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Purpose: The inverse of a deformation vector field (DVF) is often needed in
deformable registration, 4D image reconstruction, and adaptive radiation
therapy. This study aims at improving both the inverse-consistency accuracy and
the efficiency of numerical DVF inversion by developing a fixed-point iteration
method with feedback control.
Method: We introduce an iterative method with active feedback control for DVF
inversion. The method is built upon a previous fixed-point iteration method,
which is represented as a particular passive instance in the new method. At
each iteration step, we measure the inconsistency, namely the residual, between
the iterative inverse estimate and the input DVF. The residual is modulated by
a feedback control mechanism before being incorporated into the next iterate.
The feedback control design is based on analysis of error propagation in the
iteration process. The design goal is to make the convergence region as large
as possible, and to make estimate errors vanish faster. The feedback control
design is assessed with two data sets: an analytic DVF pair, and a DVF between
two phases of a 4D XCAT phantom.
Results: The single-parameter feedback control improved both the convergence
region and convergence rate of the iterative algorithm, for both datasets. With
the analytic data, the iteration becomes convergent over the entire image
domain, whereas the precursor method diverges as the deformation becomes
larger. With the XCAT DVF data, feedback control reduced the 95th percentile of
residuals from 1 mm to 1E-6 mm. Additionally, convergence rate was accelerated
by at least a factor of 2 for both datasets.
Conclusion: The introduced method shows the previously unexplored possibility
in exercising active feedback control in DVF inversion, and the unexploited
potential in improving both numerical accuracy and computational efficiency.
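A 1D toy version of such a controlled fixed-point iteration is shown below; the paper derives its control law from error-propagation analysis, whereas here a constant gain mu stands in for it, so this is only a sketch:

    import numpy as np

    def invert_dvf_1d(u, x, mu=0.8, iters=50):
        # The inverse v of the forward DVF u satisfies v(y) = -u(y + v(y)).
        # The residual r_k(y) = v_k(y) + u(y + v_k(y)) measures inverse
        # inconsistency; mu = 1 recovers the passive precursor iteration
        # v_{k+1}(y) = -u(y + v_k(y)), while mu < 1 damps it.
        v = np.zeros_like(u)
        for _ in range(iters):
            r = v + np.interp(x + v, x, u)
            v = v - mu * r
        return v

    x = np.linspace(0.0, 1.0, 200)
    u = 0.1 * np.sin(2 * np.pi * x)      # forward deformation vector field
    v = invert_dvf_1d(u, x)
    # inverse-consistency check: x + u(x) + v(x + u(x)) should return to x
    err = np.abs(u + np.interp(x + u, x, v))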
Haoyu Li, Changliang Guo, Inbarasan Muniraj, Bryce C. Schroeder, John T. Sheridan, Shu Jia
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Biological Physics (physics.bio-ph); Optics (physics.optics)
We report a light-field based method that allows the optical encryption of
three-dimensional (3D) volumetric information at the microscopic scale in a
single 2D light-field image. The system consists of a microlens array and an
array of random phase/amplitude masks. The method utilizes a wave optics model
to account for the dominant diffraction effect at this new scale, and the
system point-spread function (PSF) serves as the key for encryption and
decryption. We successfully developed and demonstrated a deconvolution
algorithm to retrieve spatially multiplexed discrete and continuous volumetric
data from 2D light-field images. Showing that the method is practical for data
transmission and storage, we obtained a faithful reconstruction of the 3D
volumetric information from a digital copy of the encrypted light-field image.
The method represents a new level of optical encryption, paving the way for
broad industrial and biomedical applications in processing and securing 3D data
at the microscopic scale.
Ahmed M. Alaa, Jinsung Yoon, Scott Hu, Mihaela van der Schaar
Subjects: Artificial Intelligence (cs.AI)
Objective: In this paper, we develop a personalized real-time risk scoring
algorithm that provides timely and granular assessments for the clinical acuity
of ward patients based on their (temporal) lab tests and vital signs; the
proposed risk scoring system ensures timely intensive care unit (ICU)
admissions for clinically deteriorating patients. Methods: The risk scoring
system learns a set of latent patient subtypes from the offline electronic
health record data, and trains a mixture of Gaussian Process (GP) experts,
where each expert models the physiological data streams associated with a
specific patient subtype. Transfer learning techniques are used to learn the
relationship between a patient’s latent subtype and her static admission
information (e.g. age, gender, transfer status, ICD-9 codes, etc). Results:
Experiments conducted on data from a heterogeneous cohort of 6,321 patients
admitted to Ronald Reagan UCLA medical center show that our risk score
significantly and consistently outperforms the currently deployed risk scores,
such as the Rothman index, MEWS, APACHE and SOFA scores, in terms of
timeliness, true positive rate (TPR), and positive predictive value (PPV).
Conclusion: Our results reflect the importance of adopting the concepts of
personalized medicine in critical care settings; significant accuracy and
timeliness gains can be achieved by accounting for the patients’ heterogeneity.
Significance: The proposed risk scoring methodology can confer huge clinical
and social benefits on the more than 200,000 critically ill inpatients who
exhibit cardiac arrests in the US every year.
Marti Luis (TAO, LRI), Fansi-Tchango Arsene (TRT), Navarro Laurent (TRT), Marc Schoenauer (TAO, LRI)
Journal-ref: Parallel Problem Solving from Nature — PPSN XIV, Sep 2016,
Edinburgh, United Kingdom. Springer Verlag, 9921, pp.697-706, 2016, LNCS
Subjects: Artificial Intelligence (cs.AI)
This paper presents the Voronoi diagram-based evolutionary algorithm
(VorEAl). VorEAl partitions the input space into abnormal/normal subsets using
Voronoi diagrams. The diagrams are evolved using a multi-objective bio-inspired
approach in order to jointly optimize classification metrics while also being
able to represent areas of the data space that are not present in the
training dataset. As part of the paper, VorEAl is experimentally validated and
contrasted with similar approaches.
Iuliia Kotseruba, Oscar J. Avella Gonzalez, John K. Tsotsos
Comments: 37 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI)
In this paper we present a broad overview of the last 40 years of research on
cognitive architectures. Although the number of existing architectures is
nearing several hundred, most of the existing surveys do not reflect this
growth and focus on a handful of well-established architectures. While their
contributions are undeniable, they represent only a part of the research in the
field. Thus, in this survey we wanted to shift the focus towards a more
inclusive and high-level overview of the research in cognitive architectures.
Our final set of 86 architectures includes 55 that are still actively
developed; these borrow from a diverse set of disciplines, spanning areas from
psychoanalysis to neuroscience. To keep the length of this paper within
reasonable limits we discuss only the core cognitive abilities, such as
perception, attention mechanisms, learning and memory structure. To assess the
breadth of practical applications of cognitive architectures we gathered
information on over 700 practical projects implemented using the cognitive
architectures in our list. We use various visualization techniques to highlight
overall trends in the development of the field. Our analysis of practical
applications shows that most architectures are very narrowly focused on a
particular application domain. Furthermore, there is an apparent gap between
general research in robotics and computer vision and research in these areas
within the cognitive architectures field. It is very clear that biologically
inspired models do not have the same range and efficiency as systems based on
engineering principles and heuristics. Another observation relates to a general
lack of collaboration. Several factors hinder communication, such as the closed
nature of individual projects (only one-third of the architectures reviewed
here are open-source) and terminological differences.
Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing
Comments: 29 pages, 7 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many applications in speech, robotics, finance, and biology deal with
sequential data, where ordering matters and recurrent structures are common.
However, this structure cannot be easily captured by standard kernel functions.
To model such structure, we propose expressive closed-form kernel functions for
Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the
inductive biases of long short-term memory (LSTM) recurrent networks, while
retaining the non-parametric probabilistic advantages of Gaussian processes. We
learn the properties of the proposed kernels by optimizing the Gaussian process
marginal likelihood using a new provably convergent semi-stochastic procedure
and exploit the structure of these kernels for fast and scalable training and
prediction. We demonstrate state-of-the-art performance on several benchmarks,
and thoroughly investigate a consequential autonomous driving application,
where the predictive uncertainties provided by GP-LSTM are uniquely valuable.
Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek
Subjects: Computation (stat.CO); Artificial Intelligence (cs.AI); Combinatorics (math.CO); Probability (math.PR)
We propose the Hit-and-Run algorithm for planning and sampling problems in
non-convex spaces. For sampling, we give the first analysis of the Hit-and-Run
algorithm in non-convex spaces and show that it mixes fast as long as certain
smoothness conditions are satisfied. In particular, our analysis reveals an
intriguing connection between fast mixing and the existence of smooth
measure-preserving mappings from a convex space to the non-convex space. For
planning, we show advantages of Hit-and-Run compared to state-of-the-art
planning methods such as Rapidly-Exploring Random Trees.
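For intuition, here is a crude membership-oracle version of Hit-and-Run on a non-convex set (the exact algorithm samples uniformly from the intersection of a random line with the set; the grid below is a rough stand-in):

    import numpy as np

    def hit_and_run(inside, x0, n_steps=1000, t_max=5.0, grid=512, seed=0):
        rng = np.random.default_rng(seed)
        x, ts = x0.copy(), np.linspace(-t_max, t_max, grid)
        chain = []
        for _ in range(n_steps):
            d = rng.standard_normal(x.size)
            d /= np.linalg.norm(d)                 # uniform random direction
            pts = x[None, :] + ts[:, None] * d[None, :]
            ok = np.array([inside(p) for p in pts])
            if ok.any():                           # next point ~ uniform over
                x = pts[rng.choice(np.flatnonzero(ok))]  # the line-set overlap
            chain.append(x.copy())
        return np.array(chain)

    # Non-convex target: the union of two disjoint balls.
    c1, c2 = np.array([2.0, 0.0]), np.array([-2.0, 0.0])
    inside = lambda p: min(np.linalg.norm(p - c1), np.linalg.norm(p - c2)) < 1.0
    samples = hit_and_run(inside, x0=c1.copy())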
Nils Jansen, Murat Cubuktepe, Ufuk Topcu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
We formalize synthesis of shared control protocols with correctness
guarantees for temporal logic specifications. More specifically, we introduce a
modeling formalism in which both a human and an autonomy protocol can issue
commands to a robot towards performing a certain task. These commands are
blended into a joint input to the robot. The autonomy protocol is synthesized
using an abstraction of possible human commands accounting for randomness in
decisions caused by factors such as fatigue or incomprehensibility of the
problem at hand. The synthesis is designed to ensure that the resulting robot
behavior satisfies given safety and performance specifications, e.g., in
temporal logic. Our solution is based on nonlinear programming and we address
the inherent scalability issue by presenting alternative methods. We assess the
feasibility and the scalability of the approach by an experimental evaluation.
Thomas Brouwer, Jes Frellsen, Pietro Liò
Comments: NIPS 2016 Workshop on Advances in Approximate Bayesian Inference
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
We present a fast variational Bayesian algorithm for performing non-negative
matrix factorisation and tri-factorisation. We show that our approach achieves
faster convergence per iteration and timestep (wall-clock) than Gibbs sampling
and non-probabilistic approaches, and does not require additional samples to
estimate the posterior. We show that convergence is difficult for matrix
tri-factorisation in particular, but our variational Bayesian approach offers a fast
solution, allowing the tri-factorisation approach to be used more effectively.
Sanjaya Wijeratne, Lakshika Balasuriya, Derek Doran, Amit Sheth
Comments: 7 pages, 1 figure, 2 tables, Published at IJCAI Workshop on Semantic Machine Learning (SML 2016)
Journal-ref: IJCAI Workshop on Semantic Machine Learning (SML 2016). pp. 18-24.
CEUR-WS, New York City, NY (07 2016)
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR)
Gang affiliates have joined the masses who use social media to share thoughts
and actions publicly. Interestingly, they use this public medium to express
recent illegal actions, to intimidate others, and to share outrageous images
and statements. Agencies able to unearth these profiles may thus be able to
anticipate, stop, or hasten the investigation of gang-related crimes. This
paper investigates the use of word embeddings to help identify gang members on
Twitter. Building on our previous work, we generate word embeddings that
translate what Twitter users post in their profile descriptions, tweets,
profile images, and linked YouTube content to a real vector format amenable for
machine learning classification. Our experimental results show that pre-trained
word embeddings can boost the accuracy of supervised learning algorithms
trained over gang members' social media posts.
Ellery Wulczyn, Nithum Thain, Lucas Dixon
Subjects: Computation and Language (cs.CL)
The damage personal attacks cause to online discourse motivates many platforms
to try to curb the phenomenon. However, understanding the prevalence and impact
of personal attacks in online platforms at scale remains surprisingly
difficult. The contribution of this paper is to develop and illustrate a method
that combines crowdsourcing and machine learning to analyze personal attacks at
scale. We present an evaluation method for a classifier in terms of the
aggregated number of crowd-workers it can approximate. We apply our methodology
to English
Wikipedia, generating a corpus of over 100k high quality human-labeled comments
and 63M machine-labeled ones from a classifier that is as good as the aggregate
of 3 crowd-workers. Using the corpus of machine-labeled scores, our methodology
allows us to explore some of the open questions about the nature of online
personal attacks. This reveals that the majority of personal attacks on
Wikipedia are not the result of a few malicious users, nor primarily the
consequence of allowing anonymous contributions.
Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Prateek Vij
Comments: 12 pages, 4 figures in COLING 2016
Subjects: Computation and Language (cs.CL)
Sarcasm detection is a key task for many natural language processing applications.
In sentiment analysis, for example, sarcasm can flip the polarity of an
“apparently positive” sentence and, hence, negatively affect polarity detection
performance. To date, most approaches to sarcasm detection have treated the
task primarily as a text categorization problem. Sarcasm, however, can be
expressed in very subtle ways and requires a deeper understanding of natural
language that standard text categorization techniques cannot grasp. In this
work, we develop models based on a pre-trained convolutional neural network for
extracting sentiment, emotion and personality features for sarcasm detection.
Such features, along with the network’s baseline features, allow the proposed
models to outperform the state of the art on benchmark datasets. We also
address the often-ignored generalizability issue of classifying data that the
models have not seen during the learning phase.
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han
Comments: submitted to WWW 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Extracting entities and relations for types of interest from text is
important for understanding massive text corpora. Traditionally, systems of
entity relation extraction have relied on human-annotated corpora for training
and adopted an incremental pipeline. Such systems require additional human
expertise to be ported to a new domain, and are vulnerable to errors cascading
down the pipeline. In this paper, we investigate joint extraction of typed
entities and relations with labeled data heuristically obtained from knowledge
bases (i.e., distant supervision). As our algorithm for type labeling via
distant supervision is context-agnostic, noisy training data poses unique
challenges for the task. We propose a novel domain-independent framework,
called CoType, that runs a data-driven text segmentation algorithm to extract
entity mentions, and jointly embeds entity mentions, relation mentions, text
features and type labels into two low-dimensional spaces (for entity and
relation mentions respectively), where, in each space, objects whose types are
close will also have similar representations. CoType then uses these learned
embeddings to estimate the types of test (unlinkable) mentions. We formulate a
joint optimization problem to learn embeddings from text corpora and knowledge
bases, adopting a novel partial-label loss function for noisy labeled data and
introducing an object “translation” function to capture the cross-constraints
of entities and relations on each other. Experiments on three public datasets
demonstrate the effectiveness of CoType across different domains (e.g., news,
biomedical), with an average of 25% improvement in F1 score compared to the
next best method.
Vered Shwartz, Ido Dagan
Comments: 5 pages, accepted to the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V), in COLING 2016
Subjects: Computation and Language (cs.CL)
We present a submission to the CogALex 2016 shared task on the corpus-based
identification of semantic relations, using LexNET (Shwartz and Dagan, 2016),
an integrated path-based and distributional method for semantic relation
classification. The reported results in the shared task bring this submission
to the third place on subtask 1 (word relatedness), and the first place on
subtask 2 (semantic relation classification), demonstrating the utility of
integrating the complementary path-based and distributional information sources
in recognizing semantic relatedness. Combined with a common similarity measure,
LexNET performs fairly well on the word relatedness task (subtask 1). The
relatively lower performance of LexNET and the various other systems on subtask
2, however, confirms the difficulty of the semantic relation classification
task, and stresses the need to develop additional methods for this task.
A.K.M. Sabbir, Antonio Jimeno Yepes, Ramakanth Kavuluru
Subjects: Computation and Language (cs.CL)
Biomedical word sense disambiguation (WSD) is an important intermediate task
in many natural language processing applications such as named entity
recognition, syntactic parsing, and relation extraction. In this paper, we
employ knowledge-based approaches that also exploit recent advances in neural
word/concept embeddings to improve over the state-of-the-art in biomedical WSD
using the MSH WSD dataset as the test set. Our methods involve distant
supervision – we do not use any hand-labeled examples for WSD to build our
prediction models; however, we employ an existing well known named entity
recognition and concept mapping program, MetaMap, to obtain our concept
vectors. Over the MSH WSD dataset, our linear-time (in the number of senses and
words in the test instance) method achieves an accuracy of 92.24%, which is an
absolute 3% improvement over the best known results obtained via
unsupervised or knowledge-based means. A more expensive approach that we
developed relies on a nearest neighbor framework and achieves an accuracy of
94.34%. Employing dense vector representations learned from unlabeled free text
has been shown to benefit many language processing tasks recently and our
efforts show that biomedical WSD is no exception to this trend. For a complex
and rapidly evolving domain such as biomedicine, building labeled datasets for
larger sets of ambiguous terms may be impractical. Here we demonstrate that
distant supervision that leverages recent advances in representation learning
can rival supervised approaches in biomedical WSD.
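The linear-time method can be pictured as a nearest-concept-vector rule; the hedged numpy sketch below assumes the concept vectors are given (the actual ones come from MetaMap-tagged text):

    import numpy as np

    def disambiguate(context_vecs, sense_vecs):
        # Represent the ambiguous mention by the mean embedding of its context
        # words and pick the sense whose concept vector is most cosine-similar.
        ctx = context_vecs.mean(axis=0)
        ctx /= np.linalg.norm(ctx)
        S = sense_vecs / np.linalg.norm(sense_vecs, axis=1, keepdims=True)
        return int(np.argmax(S @ ctx))

    context = np.random.randn(12, 100)  # embeddings of surrounding words
    senses = np.random.randn(2, 100)    # one concept vector per candidate sense
    best_sense = disambiguate(context, senses)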
Łukasz Kaiser, Samy Bengio
Subjects: Learning (cs.LG); Computation and Language (cs.CL)
Several mechanisms to focus the attention of a neural network on selected parts
of its input or memory have been used successfully in deep learning models in
recent years. Attention has improved image classification, image captioning,
speech recognition, generative models, and learning algorithmic tasks, but it
had probably the largest impact on neural machine translation.
Recently, similar improvements have been obtained using alternative
mechanisms that do not focus on a single part of a memory but operate on all of
it in parallel, in a uniform way. Such a mechanism, which we call active
memory, improved over attention in algorithmic tasks, image processing, and
generative modelling.
So far, however, active memory has not improved over attention for most
natural language processing tasks, in particular for machine translation. We
analyze this shortcoming in this paper and propose an extended model of active
memory that matches existing attention models on neural machine translation and
generalizes better to longer sentences. We investigate this model and explain
why previous active memory models did not succeed. Finally, we discuss when
active memory brings most benefits and where attention can be a better choice.
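For contrast, the standard soft-attention read that active memory replaces can be written in a few lines (a generic sketch, not the paper's exact model):

    import numpy as np

    def attention_read(q, K, V):
        # Soft attention: a softmax over key scores focuses the read on a few
        # memory slots; an active-memory mechanism would instead transform all
        # slots of V in parallel (e.g. with a convolution) rather than
        # collapsing them into one weighted summary.
        scores = K @ q
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V

    K = np.random.randn(10, 4)   # memory keys
    V = np.random.randn(10, 8)   # memory values
    q = np.random.randn(4)       # query
    read = attention_read(q, K, V)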
Sanjaya Wijeratne, Lakshika Balasuriya, Derek Doran, Amit Sheth
Comments: 7 pages, 1 figure, 2 tables, Published at IJCAI Workshop on Semantic Machine Learning (SML 2016)
Journal-ref: IJCAI Workshop on Semantic Machine Learning (SML 2016). pp. 18-24.
CEUR-WS, New York City, NY (07 2016)
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR)
Gang affiliates have joined the masses who use social media to share thoughts
and actions publicly. Interestingly, they use this public medium to express
recent illegal actions, to intimidate others, and to share outrageous images
and statements. Agencies able to unearth these profiles may thus be able to
anticipate, stop, or hasten the investigation of gang-related crimes. This
paper investigates the use of word embeddings to help identify gang members on
Twitter. Building on our previous work, we generate word embeddings that
translate what Twitter users post in their profile descriptions, tweets,
profile images, and linked YouTube content to a real vector format amenable for
machine learning classification. Our experimental results show that pre-trained
word embeddings can boost the accuracy of supervised learning algorithms
trained over gang members' social media posts.
Anshu Dubey, Ann Almgren, John Bell, Martin Berzins, Steve Brandt, Greg Bryan, Phillip Colella, Daniel Graves, Michael Lijewski, Frank Löffler, Brian O'Shea, Erik Schnetter, Brian Van Straalen, Klaus Weide
Journal-ref: Journal of Parallel and Distributed Computing, Volume 74, Issue
12, December 2014, Pages 3217-3227
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Astrophysical Phenomena (astro-ph.HE); General Relativity and Quantum Cosmology (gr-qc)
Over the last decade, block-structured adaptive mesh refinement (SAMR) has
found increasing use in large, publicly available codes and frameworks. SAMR
frameworks have evolved along different paths. Some have stayed focused on
specific domain areas, others have pursued a more general functionality,
providing the building blocks for a larger variety of applications. In this
survey paper we examine a representative set of SAMR packages and SAMR-based
codes that have been in existence for half a decade or more, have a reasonably
sized and active user base outside of their home institutions, and are publicly
available. The set consists of a mix of SAMR packages and application codes
that cover a broad range of scientific domains. We look at their high-level
frameworks, and their approach to dealing with the advent of radical changes in
hardware architecture. The codes included in this survey are BoxLib, Cactus,
Chombo, Enzo, FLASH, and Uintah.
Karine Altisen, Pierre Corbineau, Stephane Devismes
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)
We propose a general framework to build certified proofs of distributed
self-stabilizing algorithms with the proof assistant Coq. We first define in
Coq the locally shared memory model with composite atomicity, the most commonly
used model in the self-stabilizing area. We then validate our framework by
certifying a non-trivial part of an existing silent self-stabilizing algorithm
which builds a k-clustering of the network. We also certified a quantitative
property related to the output of this algorithm: precisely, we show that the
computed k-clustering contains at most $\lfloor (n-1)/(k+1) \rfloor + 1$
clusterheads, where $n$ is the number of nodes in the network. To obtain these
results, we also
developed a library which contains general tools related to potential functions
and cardinality of sets.
Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu, Eric P. Xing
Comments: 29 pages, 7 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Many applications in speech, robotics, finance, and biology deal with
sequential data, where ordering matters and recurrent structures are common.
However, this structure cannot be easily captured by standard kernel functions.
To model such structure, we propose expressive closed-form kernel functions for
Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the
inductive biases of long short-term memory (LSTM) recurrent networks, while
retaining the non-parametric probabilistic advantages of Gaussian processes. We
learn the properties of the proposed kernels by optimizing the Gaussian process
marginal likelihood using a new provably convergent semi-stochastic procedure
and exploit the structure of these kernels for fast and scalable training and
prediction. We demonstrate state-of-the-art performance on several benchmarks,
and thoroughly investigate a consequential autonomous driving application,
where the predictive uncertainties provided by GP-LSTM are uniquely valuable.
Nicolas Keriven (UR1, PANAMA), Nicolas Tremblay (PANAMA), Yann Traonmilin (PANAMA), Rémi Gribonval (PANAMA)
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The Lloyd-Max algorithm is a classical approach to perform K-means
clustering. Unfortunately, its cost becomes prohibitive as the training dataset
grows large. We propose a compressive version of K-means (CKM), which estimates
cluster centers from a sketch, i.e. from a drastically compressed
representation of the training dataset. We demonstrate empirically that CKM
performs similarly to Lloyd-Max, for a sketch size proportional to the number
of centroids times the ambient dimension, and independent of the size of the
original dataset. Given the sketch, the computational complexity of CKM is also
independent of the size of the dataset. Unlike Lloyd-Max which requires several
replicates, we further demonstrate that CKM is almost insensitive to
initialization. For a large dataset of $10^7$ data points, we show that CKM can
run two orders of magnitude faster than five replicates of Lloyd-Max, with
similar clustering performance on artificial data. Finally, CKM achieves lower
classification errors on handwritten digits classification.
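The sketching step itself is simple: the sketch is the empirical average of complex random Fourier features of the data. A hedged numpy sketch follows (the frequency scale is an assumed bandwidth choice, and the CLOMP-type center-recovery heuristic is omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100_000, 10))   # large training set
    d, m = X.shape[1], 500                   # ambient dimension, sketch size

    # Sketch z = (1/n) * sum_i exp(i * Omega^T x_i): a fixed-size complex
    # summary of the whole dataset. The frequency scale acts like a kernel
    # bandwidth and would be tuned by a heuristic in practice.
    Omega = rng.standard_normal((d, m))
    z = np.exp(1j * (X @ Omega)).mean(axis=0)

    # Cluster centers c_1..c_K are then recovered by greedily minimizing
    # || z - sum_k alpha_k * exp(i * Omega^T c_k) || (omitted here).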
Łukasz Kaiser, Samy Bengio
Subjects: Learning (cs.LG); Computation and Language (cs.CL)
Several mechanisms to focus attention of a neural network on selected parts
of its input or memory have been used successfully in deep learning models in
recent years. Attention has improved image classification, image captioning,
speech recognition, generative models, and learning algorithmic tasks, but it
had probably the largest impact on neural machine translation.
Recently, similar improvements have been obtained using alternative
mechanisms that do not focus on a single part of a memory but operate on all of
it in parallel, in a uniform way. Such a mechanism, which we call active
memory, improved over attention in algorithmic tasks, image processing, and
generative modelling.
So far, however, active memory has not improved over attention for most
natural language processing tasks, in particular for machine translation. We
analyze this shortcoming in this paper and propose an extended model of active
memory that matches existing attention models on neural machine translation and
generalizes better to longer sentences. We investigate this model and explain
why previous active memory models did not succeed. Finally, we discuss when
active memory brings most benefits and where attention can be a better choice.
Chen Huang, Chen Change Loy, Xiaoou Tang
Comments: 9 pages, 4 figures, 2 tables. Accepted to NIPS 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Existing deep embedding methods in vision tasks are capable of learning a
compact Euclidean space from images, where Euclidean distances correspond to a
similarity metric. To make learning more effective and efficient, hard sample
mining is usually employed, with samples identified through computing the
Euclidean feature distance. However, the global Euclidean distance cannot
faithfully characterize the true feature similarity in a complex visual feature
space, where the intraclass distance in a high-density region may be larger
than the interclass distance in low-density regions. In this paper, we
introduce a Position-Dependent Deep Metric (PDDM) unit, which is capable of
learning a similarity metric adaptive to local feature structure. The metric
can be used to select genuinely hard samples in a local neighborhood to guide
the deep embedding learning in an online and robust manner. The new layer is
appealing in that it is pluggable into any convolutional network and is trained
end-to-end. Our local similarity-aware feature embedding not only demonstrates
faster convergence and boosted performance on two complex image retrieval
datasets; its large-margin nature also leads to superior generalization results
under the large and open-set scenarios of transfer learning and zero-shot
learning on the ImageNet 2010 and ImageNet-10K datasets.
Jie Chen, Dehua Cheng, Yan Liu
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Positive-definite kernel functions are fundamental elements of kernel methods
and Gaussian processes. A well-known construction of such functions comes from
Bochner’s characterization, which connects a positive-definite function with a
probability distribution. Another construction, which appears to have attracted
less attention, is Polya’s criterion that characterizes a subset of these
functions. In this paper, we study the latter characterization and derive a
number of novel kernels that were little known previously.
In the context of large-scale kernel machines, Rahimi and Recht (2007)
proposed a random feature map (random Fourier) that approximates a kernel
function, through independent sampling of the probability distribution in
Bochner’s characterization. The authors also suggested another feature map
(random binning), which, although not explicitly stated, comes from Polya’s
characterization. We show that with the same number of random samples, the
random binning map results in a Euclidean inner product closer to the kernel
than does the random Fourier map. The superiority of the random binning map is
confirmed empirically through regressions and classifications in the
reproducing kernel Hilbert space.
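The random Fourier map of Rahimi and Recht, against which the random binning map is compared, is easy to reproduce; a short sketch for the Gaussian kernel:

    import numpy as np

    rng = np.random.default_rng(0)
    d, D, sigma = 5, 2000, 1.0    # input dim, number of features, bandwidth

    # z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, sigma^-2 I) and
    # b ~ U[0, 2*pi]; then z(x).z(y) approximates exp(-||x-y||^2 / (2 sigma^2)).
    W = rng.standard_normal((D, d)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    z = lambda x: np.sqrt(2.0 / D) * np.cos(W @ x + b)

    x, y = rng.standard_normal(d), rng.standard_normal(d)
    approx = z(x) @ z(y)
    exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))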
Anthony O. Smith, Anand Rangarajan
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Supervised dimensionality reduction has emerged as an important theme in the
last decade. Despite the plethora of models and formulations, there is a lack
of a simple model which aims to project the set of patterns into a space
defined by the classes (or categories). To this end, we set up a model in which
each class is represented as a 1D subspace of the vector space formed by the
features. Assuming the set of classes does not exceed the cardinality of the
features, the model results in multi-class supervised learning in which the
features of each class are projected into the class subspace. Class
discrimination is automatically guaranteed via the imposition of orthogonality
of the 1D class subspaces. The resulting optimization problem, formulated as
the minimization of a sum of quadratic functions on a Stiefel manifold, is
non-convex (due to the constraints) but nevertheless has a structure for
which we can identify when we have reached a global minimum. After formulating
a version with standard inner products, we extend the formulation to
reproducing kernel Hilbert spaces in a straightforward manner. The optimization
approach also extends in a similar fashion to the kernel version. Results and
comparisons with the multi-class Fisher linear (and kernel) discriminants and
principal component analysis (linear and kernel) showcase the relative merits
of this approach to dimensionality reduction.
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han
Comments: submitted to WWW 2017
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Extracting entities and relations for types of interest from text is
important for understanding massive text corpora. Traditionally, systems of
entity relation extraction have relied on human-annotated corpora for training
and adopted an incremental pipeline. Such systems require additional human
expertise to be ported to a new domain, and are vulnerable to errors cascading
down the pipeline. In this paper, we investigate joint extraction of typed
entities and relations with labeled data heuristically obtained from knowledge
bases (i.e., distant supervision). As our algorithm for type labeling via
distant supervision is context-agnostic, noisy training data poses unique
challenges for the task. We propose a novel domain-independent framework,
called CoType, that runs a data-driven text segmentation algorithm to extract
entity mentions, and jointly embeds entity mentions, relation mentions, text
features and type labels into two low-dimensional spaces (for entity and
relation mentions respectively), where, in each space, objects whose types are
close will also have similar representations. CoType then uses these learned
embeddings to estimate the types of test (unlinkable) mentions. We formulate a
joint optimization problem to learn embeddings from text corpora and knowledge
bases, adopting a novel partial-label loss function for noisy labeled data and
introducing an object “translation” function to capture the cross-constraints
of entities and relations on each other. Experiments on three public datasets
demonstrate the effectiveness of CoType across different domains (e.g., news,
biomedical), with an average of 25% improvement in F1 score compared to the
next best method.
Joonas Jälkö, Onur Dikmen, Antti Honkela
Comments: 8 pages, 7 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG); Methodology (stat.ME)
As collecting huge amounts of personal data from individuals has become
standard practice nowadays, it is important to use these data in a
conscientious way. For example, when performing inference using these data,
one has to make sure individuals’ identities or the privacy of the data are not
compromised. Differential privacy is a powerful framework that introduces
stochasticity into the computation to guarantee that it is difficult to breach
the privacy using the output of the computation. Differentially private
versions of many important machine learning methods have been proposed, but
there is still a long way to go towards an efficient unified approach
applicable to many models. In this paper, we propose a differentially
private variational inference method with a very wide applicability. The
variational inference is based on stochastic gradient ascent and can handle
non-conjugate models as well as conjugate ones. Differential privacy is
achieved by perturbing the gradients. We explore ways to make the algorithm
more efficient through privacy amplification from subsampling and through
clipping the gradients to limit the amount of information they leak. We explore
the effect of different parameter combinations in logistic regression problems
where the method can reach an accuracy close to non-private level under
reasonably strong privacy guarantees.
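Gradient perturbation of the kind described reduces to two operations per minibatch: per-example clipping and Gaussian noise addition. A minimal sketch (the clipping norm and noise multiplier below are hypothetical choices; the paper applies this to the stochastic gradients of a variational objective):

    import numpy as np

    def dp_gradient(per_example_grads, clip=1.0, sigma=1.0, rng=None):
        # Clip each per-example gradient to L2 norm `clip`, sum, and add
        # Gaussian noise with standard deviation sigma * clip. This is the
        # standard gradient-perturbation recipe for differential privacy.
        rng = rng or np.random.default_rng()
        norms = np.maximum(
            np.linalg.norm(per_example_grads, axis=1, keepdims=True), 1e-12)
        clipped = per_example_grads * np.minimum(1.0, clip / norms)
        noise = sigma * clip * rng.standard_normal(per_example_grads.shape[1])
        return clipped.sum(axis=0) + noise

    grads = np.random.randn(128, 10)   # one gradient per example in the batch
    g_private = dp_gradient(grads, clip=1.0, sigma=1.2)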
Wataru Kumagai
Comments: This paper was accepted at NIPS 2016 as a poster presentation
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We consider a transfer-learning problem by using the parameter transfer
approach, where a suitable parameter of feature mapping is learned through one
task and applied to another objective task. Then, we introduce the notion of
the local stability of parametric feature mapping and parameter transfer
learnability, and thereby derive a learning bound for parameter transfer
algorithms. As an application of parameter transfer learning, we discuss the
performance of sparse coding in self-taught learning. Although self-taught
learning algorithms with plentiful unlabeled data often show excellent
empirical performance, a theoretical analysis of them has been lacking. In this
paper, we also provide the first theoretical learning bound for self-taught
learning.
Luigi Leonardo Palese
Comments: 18 pages, 6 figures, 2 tables
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG)
Principal component analysis (PCA) is a widespread technique for data
analysis that relies on the covariance-correlation matrix of the analyzed data.
However, to work properly with high-dimensional data, PCA poses severe
mathematical constraints on the minimum number of different replicates or
samples that must be included in the analysis. Here we show that a modified
algorithm works not only on well-dimensioned datasets, but also on degenerate
ones.
Pierre Alquier, The Tien Mai, Massimiliano Pontil
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We consider the problem of transfer learning in an online setting. Different
tasks are presented sequentially and processed by a within-task algorithm. We
propose a lifelong learning strategy which refines the underlying data
representation used by the within-task algorithm, thereby transferring
information from one task to the next. We show that when the within-task
algorithm comes with some regret bound, our strategy inherits this good
property. Our bounds are in expectation for a general loss function, and
uniform for a convex loss. We discuss applications to dictionary learning and
finite sets of predictors. In the latter case, we improve previous
$O(1/\sqrt{m})$ bounds to $O(1/m)$, where $m$ is the per-task sample size.
Yango He, Zhi Geng
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this paper, we discuss structure learning of causal networks from multiple
data sets obtained by external intervention experiments where we do not know
what variables are manipulated. For example, the conditions in these
experiments are changed by changing temperature or using drugs, but we do not
know what target variables are manipulated by the external interventions.
Structure learning from such data sets becomes more difficult. For this case,
we first discuss the identifiability of causal structures. Next, we present a
graph-merging method for learning causal networks for the case that the sample
sizes are large for these interventions. Then for the case that the sample
sizes of these interventions are relatively small, we propose a data-pooling
method for learning causal networks in which we pool all data sets of these
interventions together for the learning. Furthermore, we propose a re-sampling
approach to evaluate the edges of the causal network learned by the
data-pooling method. Finally, we illustrate the proposed learning methods by
simulations.
Nils Jansen, Murat Cubuktepe, Ufuk Topcu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
We formalize synthesis of shared control protocols with correctness
guarantees for temporal logic specifications. More specifically, we introduce a
modeling formalism in which both a human and an autonomy protocol can issue
commands to a robot towards performing a certain task. These commands are
blended into a joint input to the robot. The autonomy protocol is synthesized
using an abstraction of possible human commands accounting for randomness in
decisions caused by factors such as fatigue or incomprehensibility of the
problem at hand. The synthesis is designed to ensure that the resulting robot
behavior satisfies given safety and performance specifications, e.g., in
temporal logic. Our solution is based on nonlinear programming and we address
the inherent scalability issue by presenting alternative methods. We assess the
feasibility and the scalability of the approach by an experimental evaluation.
Mohsen Mohammadkhani Razlighi, Nikola Zlatanov
Comments: 22 pages and 2 figures, prepared to be an extended version of a conference paper (IEEE ICC 2017, 21-25 May 2017, Paris, France)
Subjects: Information Theory (cs.IT)
In this paper, we investigate the two-hop full-duplex (FD) relay channel with
self-interference and fading, which is comprised of a source, an FD relay, and
a destination, where a direct source-destination link does not exist and the FD
relay is impaired by self-interference. For this channel, we propose three
buffer-aided relaying schemes with adaptive reception-transmission at the FD
relay for the cases when the source and the relay both perform adaptive-power
allocation, fixed-power allocation, and fixed-rate transmission, respectively.
The proposed buffer-aided relaying schemes significantly improve the achievable
rate and the throughput of the considered relay channel by enabling the FD
relay to adaptively select to either receive, transmit, or simultaneously
receive and transmit in a given time slot based on the qualities of the
receiving, transmitting, and self-interference channels. Our numerical results
show that significant performance gains are achieved using the proposed
buffer-aided relaying schemes compared to conventional FD relaying, where the
FD relay is forced to always simultaneously receive and transmit, and to
buffer-aided half-duplex relaying, where the half-duplex relay cannot
simultaneously receive and transmit.
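A toy slot-by-slot simulation of the adaptive reception-transmission idea follows; it is a greedy heuristic stand-in for the paper's derived schemes, with Rayleigh-faded links, a fixed residual self-interference gain, and the mode-selection rule all being assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    T, P, g_si = 10_000, 1.0, 0.1   # slots, transmit power, self-interference gain
    queue, served = 0.0, 0.0        # relay buffer (bits/s/Hz) and delivered rate

    for _ in range(T):
        h_sr = rng.exponential(1.0)                  # source-relay channel power
        h_rd = rng.exponential(1.0)                  # relay-destination channel power
        r_rx = np.log2(1 + P * h_sr)                 # receive-only rate
        r_tx = min(np.log2(1 + P * h_rd), queue)     # transmit rate, buffer-limited
        r_rx_fd = np.log2(1 + P * h_sr / (1 + P * g_si))  # rx under self-interference
        mode = int(np.argmax([r_rx, r_tx, r_rx_fd + r_tx]))  # receive/transmit/both
        if mode == 0:
            queue += r_rx
        elif mode == 1:
            queue -= r_tx; served += r_tx
        else:
            queue += r_rx_fd - r_tx; served += r_tx

    print("throughput (bits/s/Hz):", served / T)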
Abdelaziz Soulimani, Mustapha Benjillali, Hatim Chergui, Daniel B. da Costa
Comments: Draft, V1
Subjects: Information Theory (cs.IT)
The paper presents a comprehensive closed-form performance analysis framework
for multihop cooperative communications, a promising scheme for next-generation
mmWave systems. As the fading channel model, we adopt the Weibull channel,
advocated for its flexibility in covering different channel conditions and
supported by many recent measurement campaigns in various emerging 5G
scenarios. The analyzed scheme consists of multiple “Detect-and-Forward” relays
with generalized high-order M-QAM transmissions. The end-to-end performance of
the multihop communication is evaluated in terms of outage probability, ergodic
capacity, bit error rate, and symbol error rate. For all these metrics, we
present exact closed-form expressions and their asymptotic behavior, some in
terms of generalized hypergeometric functions. The exactness of our analysis is
confirmed numerically, and Monte-Carlo simulations assess the accuracy of the
obtained results for different system and channel parameters. Moreover, since
the Fox H and bivariate Fox H functions are not yet implemented in Matlab, we
provide new code for computing these functions in general contexts.
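A Monte-Carlo check of this kind is straightforward to reproduce; the sketch below estimates the end-to-end outage probability of a detect-and-forward multihop link over Weibull fading (the per-hop parameters are illustrative assumptions):

    import numpy as np
    from scipy.special import gamma

    rng = np.random.default_rng(0)
    hops, beta = 3, 2.5             # number of hops, Weibull shape parameter
    snr_bar, snr_th = 10.0, 2.0     # average per-hop SNR, outage threshold
    N = 1_000_000

    # Per-hop amplitude ~ Weibull(beta); SNR is proportional to amplitude^2,
    # normalized so that E[SNR] = snr_bar (E[W^2] = Gamma(1 + 2/beta) for a
    # standard Weibull). With detect-and-forward relaying, the end-to-end
    # link is in outage as soon as any single hop drops below the threshold.
    amp = rng.weibull(beta, size=(N, hops))
    snr = snr_bar * amp ** 2 / gamma(1 + 2 / beta)
    p_out = np.mean((snr < snr_th).any(axis=1))
    print("Monte-Carlo outage probability:", p_out)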
M.E. Shirokov
Comments: 34 pages, radically extended version of arXiv:1602.00568 (the finite energy case is added), any comments are welcome
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
We obtain continuity bounds for basic information characteristics of quantum
channels depending on their input dimension (when it is finite) and on the
maximal level of input energy (when the input dimension is infinite).
First, we prove continuity bounds for the output conditional mutual
information for a single channel and for $n$ copies of a channel.
Then we obtain estimates for the variation of the output Holevo quantity with
respect to simultaneous variations of a channel and of an input ensemble.
As a result, tight and close-to-tight continuity bounds for basic capacities
of quantum channels depending on the input dimension are obtained. They
complement the Leung-Smith continuity bounds depending on the output dimension.
Finally, continuity bounds for basic capacities of infinite-dimensional
channels with the input energy constraints are proved. They show uniform
continuity of these capacities on the set of all channels with respect to the
norm of complete boundedness.