Yilun Cao, Hyungtae Lee, Heesung Kwon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce a novel fusion method that can enhance object
detection performance by fusing decisions from two different types of computer
vision tasks: object detection and image classification. In the proposed work,
the class label of an image obtained from image classification is viewed as
prior knowledge about existence or non-existence of certain objects. The prior
knowledge is then fused with the decisions of object detection to improve
detection accuracy by mitigating false positives of an object detector that are
strongly contradicted with the prior knowledge. A recently introduced novel
fusion approach called dynamic belief fusion (DBF) is used to fuse the detector
output with the classification prior. Experimental results show that the
detection performance of all the detection algorithms used in the proposed work
is improved on benchmark datasets via the proposed fusion framework.
Soo Min Kang, Richard P. Wildes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In computer vision, action recognition refers to the act of classifying an
action that is present in a given video and action detection involves locating
actions of interest in space and/or time. Videos, which contain photometric
information (e.g. RGB, intensity values) in a lattice structure, contain
information that can assist in identifying the action that has been imaged. The
process of action recognition and detection often begins with extracting useful
features and encoding them to ensure that the features are specific to serve
the task of action recognition and detection. Encoded features are then
processed through a classifier to identify the action class and their spatial
and/or temporal locations. In this report, a thorough review of various action
recognition and detection algorithms in computer vision is provided by
analyzing the two-step process of a typical action recognition and detection
algorithm: (i) extraction and encoding of features, and (ii) classifying
features into action classes. In efforts to ensure that computer vision-based
algorithms reach the capabilities that humans have of identifying actions
irrespective of various nuisance variables that may be present within the field
of view, the state-of-the-art methods are reviewed and some remaining problems
are addressed in the final chapter.
Joel Levis, Hyungtae Lee, Heesung Kwon, James Michaelis, Michael Kolodny, Sungmin Eum
Subjects: Computer Vision and Pattern Recognition (cs.CV)
General image classification approaches differentiate classes using strong
distinguishing features but some classes cannot be easily separated because
they contain very similar visual features. To deal with this problem, we can
use keywords relevant to a particular class. To implement this concept we have
newly constructed a malicious crowd dataset which contains crowd images with
two events, benign and malicious, which look similar yet involve opposite
semantic events. We also created a set of five malicious event-relevant
keywords such as police and fire. In the evaluation, integrating malicious
event classification with recognition output of these keywords enhances the
overall performance on the malicious crowd dataset.
Feng Li, Guangfan Zhang, Wei Wang, Roger Xu, Tom Schnell, Jonathan Wen, Frederic McKenzie, Jiang Li
Comments: 9 pages, 8 figures and 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Task engagement is defined as loadings on energetic arousal (affect), task
motivation, and concentration (cognition). It is usually challenging and
expensive to label cognitive state data, and traditional computational models
trained with limited label information for engagement assessment do not perform
well because of overfitting. In this paper, we proposed two deep models (i.e.,
a deep classifier and a deep autoencoder) for engagement assessment with scarce
label information. We recruited 15 pilots to conduct a 4-h flight simulation
from Seattle to Chicago and recorded their electroencephalograph (EEG) signals
during the simulation. Experts carefully examined the EEG signals and labeled
20 min of the EEG data for each pilot. The EEG signals were preprocessed and
power spectral features were extracted. The deep models were pretrained by the
unlabeled data and were fine-tuned by a different proportion of the labeled
data (top 1%, 3%, 5%, 10%, 15%, and 20%) to learn new representations for
engagement assessment. The models were then tested on the remaining labeled
data. We compared performances of the new data representations with the
original EEG features for engagement assessment. Experimental results show that
the representations learned by the deep models yielded better accuracies for
the six scenarios (77.09%, 80.45%, 83.32%, 85.74%, 85.78%, and 86.52%), based
on different proportions of the labeled data for training, as compared with the
corresponding accuracies (62.73%, 67.19%, 73.38%, 79.18%, 81.47%, and 84.92%)
achieved by the original EEG features. Deep models are effective for engagement
assessment especially when less label information was used for training.
Erik Rodner, Marcel Simon, Robert B. Fisher, Joachim Denzler
Comments: BMVC 2016 Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we study the sensitivity of CNN outputs with respect to image
transformations and noise in the area of fine-grained recognition. In
particular, we answer the following questions (1) how sensitive are CNNs with
respect to image transformations encountered during wild image capture?; (2)
how can we predict CNN sensitivity?; and (3) can we increase the robustness of
CNNs with respect to image degradations? To answer the first question, we
provide an extensive empirical sensitivity analysis of commonly used CNN
architectures (AlexNet, VGG19, GoogleNet) across various types of image
degradations. This allows for predicting CNN performance for new domains
comprised by images of lower quality or captured from a different viewpoint. We
also show how the sensitivity of CNN outputs can be predicted for single
images. Furthermore, we demonstrate that input layer dropout or pre-filtering
during test time only reduces CNN sensitivity for high levels of degradation.
Experiments for fine-grained recognition tasks reveal that VGG19 is more
robust to severe image degradations than AlexNet and GoogleNet. However, small
intensity noise can lead to dramatic changes in CNN performance even for VGG19.
Nadia Robertini, Dan Casas, Helge Rhodin, Hans-Peter Seidel, Christian Theobalt
Comments: 3DV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a new model-based method to accurately reconstruct human
performances captured outdoors in a multi-camera setup. Starting from a
template of the actor model, we introduce a new unified implicit representation
for both, articulated skeleton tracking and nonrigid surface shape refinement.
Our method fits the template to unsegmented video frames in two stages – first,
the coarse skeletal pose is estimated, and subsequently non-rigid surface shape
and body pose are jointly refined. Particularly for surface shape refinement we
propose a new combination of 3D Gaussians designed to align the projected model
with likely silhouette contours without explicit segmentation or edge
detection. We obtain reconstructions of much higher quality in outdoor settings
than existing methods, and show that we are on par with state-of-the-art
methods on indoor scenes for which they were designed
Ahmed Ben Said, Rachid Hadjidj, Kamel Eddine Melkemi, Sebti Foufou
Comments: 30 pages, 17 figures, journal paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Nowadays, many applications rely on images of high quality to ensure good
performance in conducting their tasks. However, noise goes against this
objective as it is an unavoidable issue in most applications. Therefore, it is
essential to develop techniques to attenuate the impact of noise, while
maintaining the integrity of relevant information in images. We propose in this
work to extend the application of the Non-Local Means filter (NLM) to the
vector case and apply it for denoising multispectral images. The objective is
to benefit from the additional information brought by multispectral imaging
systems. The NLM filter exploits the redundancy of information in an image to
remove noise. A restored pixel is a weighted average of all pixels in the
image. In our contribution, we propose an optimization framework where we
dynamically fine tune the NLM filter parameters and attenuate its computational
complexity by considering only pixels which are most similar to each other in
computing a restored pixel. Filter parameters are optimized using Stein’s
Unbiased Risk Estimator (SURE) rather than using ad hoc means. Experiments have
been conducted on multispectral images corrupted with additive white Gaussian
noise and PSNR and similarity comparison with other approaches are provided to
illustrate the efficiency of our approach in terms of both denoising
performance and computation complexity.
Dewei Li, Yingjie Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
It is critical and meaningful to make image classification since it can help
human in image retrieval and recognition, object detection, etc. In this paper,
three-sides efforts are made to accomplish the task. First, visual features
with bag-of-words representation, not single vector, are extracted to
characterize the image. To improve the performance, the idea of multi-view
learning is implemented and three kinds of features are provided, each one
corresponds to a single view. The information from three views is complementary
to each other, which can be unified together. Then a new distance function is
designed for bags by computing the weighted sum of the distances between
instances. The technique of metric learning is explored to construct a
data-dependent distance metric to measure the relationships between instances,
meanwhile between bags and images, more accurately. Last, a novel approach,
called MVML, is proposed, which optimizes the joint probability that every
image is similar with its nearest image. MVML learns multiple distance metrics,
each one models a single view, to unifies the information from multiple views.
The method can be solved by alternate optimization iteratively. Gradient ascent
and positive semi-definite projection are utilized in the iterations. Distance
comparisons verified that the new bag distance function is prior to previous
functions. In model evaluation, numerical experiments show that MVML with
multiple views performs better than single view condition, which demonstrates
that our model can assemble the complementary information efficiently and
measure the distance between images more precisely. Experiments on influence of
parameters and instance number validate the consistency of the method.
Chris Mattmann, Madhav Sharan
Comments: 7 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We contribute a scalable implementation of Ryoo et al’s Pooled Time Series
algorithm from CVPR 2015. The updated algorithm has been evaluated on a large
and diverse dataset of approximately 6800 videos collected from a crawl of the
deep web related to human trafficking on DARPA’s MEMEX effort. We describe the
properties of Pooled Time Series and the motivation for using it to relate
videos collected from the deep web. We highlight issues that we found while
running Pooled Time Series on larger datasets and discuss solutions for those
issues. Our solution centers are re-imagining Pooled Time Series as a
Hadoop-based algorithm in which we compute portions of the eventual solution in
parallel on large commodity clusters. We demonstrate that our new Hadoop-based
algorithm works well on the 6800 video dataset and shares all of the properties
described in the CVPR 2015 paper. We suggest avenues of future work in the
project.
Soumyabrata Dev, Shilpa Manandhar, Yee Hui Lee, Stefan Winkler
Comments: Accepted in Proc. TENCON 2016 – 2016 IEEE Region 10 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Ground-based sky cameras (popularly known as Whole Sky Imagers) are
increasingly used now-a-days for continuous monitoring of the atmosphere. These
imagers have higher temporal and spatial resolutions compared to conventional
satellite images. In this paper, we use ground-based sky cameras to detect the
onset of rainfall. These images contain additional information about cloud
coverage and movement and are therefore useful for accurate rainfall nowcast.
We validate our results using rain gauge measurement recordings and achieve an
accuracy of 89% for correct detection of rainfall onset.
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
Comments: Accepted in Proc. TENCON 2016 – 2016 IEEE Region 10 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-scale short-term cloud motion prediction is needed for several
applications, including solar energy generation and satellite communications.
In tropical regions such as Singapore, clouds are mostly formed by convection;
they are very localized, and evolve quickly. We capture hemispherical images of
the sky at regular intervals of time using ground-based cameras. They provide a
high resolution and localized cloud images. We use two successive frames to
compute optical flow and predict the future location of clouds. We achieve good
prediction accuracy for a lead time of up to 5 minutes.
Fangyi Zhang, Jürgen Leitner, Ben Upcroft, Peter Corke
Comments: Under review for the IEEE Robotics and Automation Letters (RA-L) with the option for the IEEE International Conference on Robotics and Automation (ICRA) 2017 (submitted on 10 Sep, 2016)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Systems and Control (cs.SY)
In this paper we describe a deep network architecture that maps visual input
to control actions for a robotic planar reaching task with 100% reliability in
real-world trials. Our network is trained in simulation and fine-tuned with a
limited number of real-world images. The policy search is guided by a
kinematics-based controller (K-GPS), which works more effectively and
efficiently than (varepsilon)-Greedy. A critical insight in our system is the
need to introduce a bottleneck in the network between the perception and
control networks, and to initially train these networks independently.
Omid Bakhshandeh, Trung Bui, Zeh Lin, Walter Chang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Answering open-ended questions is an essential capability for any intelligent
agent. One of the most interesting recent open-ended question answering
challenges is Visual Question Answering (VQA) which attempts to evaluate a
system’s visual understanding through its answers to natural language questions
about images. There exist many approaches to VQA, the majority of which do not
exhibit deeper semantic understanding of the candidate answers they produce. We
study the importance of generating plausible answers to a given question by
introducing the novel task of `Answer Proposal’: for a given open-ended
question, a system should generate a ranked list of candidate answers informed
by the semantics of the question. We experiment with various models including a
neural generative model as well as a semantic graph matching one. We provide
both intrinsic and extrinsic evaluations for the task of Answer Proposal,
showing that our best model learns to propose plausible answers with a high
recall and performs competitively with some other solutions to VQA.
Prakhar Ojha, Partha Talukdar
Subjects: Artificial Intelligence (cs.AI)
Automatic construction of large knowledge graphs (KG) by mining web-scale
text datasets has received considerable attention over the last few years,
resulting in the construction of several KGs, such as NELL, Google Knowledge
Vault, etc. These KGs consist of thousands of predicate-relations (e.g.,
isPerson, isMayorOf ) and millions of their instances (e.g., (Bill de Blasio,
isMayorOf, New York City)). Estimating accuracy of such automatically
constructed KGs is a challenging problem due to their size and diversity. Even
though crowdsourcing is an obvious choice for such evaluation, the standard
single-task crowdsourcing, where each predicate in the KG is evaluated
independently, is very expensive and especially problematic if the budget
available is limited. We show that such approaches are sub-optimal as they
ignore dependencies among various predicates and their instances. To overcome
this challenge, we propose Relational Crowdsourcing (RelCrowd), where the tasks
are created while taking dependencies among predicates and instances into
account. We apply this framework in the context of evaluation of large-scale
KGs and demonstrate its effectiveness through extensive experiments on
real-world datasets.
Khudran Alzhrani, Ethan M. Rudd, Terrance E. Boult, C. Edward Chow
Comments: Pre-print of Best Paper Award IEEE Intelligence and Security Informatics (ISI) 2016 Manuscript
Journal-ref: 2016 IEEE International Conference on Intelligence and Security
Informatics (ISI)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
In recent years, traditional cybersecurity safeguards have proven ineffective
against insider threats. Famous cases of sensitive information leaks caused by
insiders, including the WikiLeaks release of diplomatic cables and the Edward
Snowden incident, have greatly harmed the U.S. government’s relationship with
other governments and with its own citizens. Data Leak Prevention (DLP) is a
solution for detecting and preventing information leaks from within an
organization’s network. However, state-of-art DLP detection models are only
able to detect very limited types of sensitive information, and research in the
field has been hindered due to the lack of available sensitive texts. Many
researchers have focused on document-based detection with artificially labeled
“confidential documents” for which security labels are assigned to the entire
document, when in reality only a portion of the document is sensitive. This
type of whole-document based security labeling increases the chances of
preventing authorized users from accessing non-sensitive information within
sensitive documents. In this paper, we introduce Automated Classification
Enabled by Security Similarity (ACESS), a new and innovative detection model
that penetrates the complexity of big text security classification/detection.
To analyze the ACESS system, we constructed a novel dataset, containing
formerly classified paragraphs from diplomatic cables made public by the
WikiLeaks organization. To our knowledge this paper is the first to analyze a
dataset that contains actual formerly sensitive information annotated at
paragraph granularity.
Fangyi Zhang, Jürgen Leitner, Ben Upcroft, Peter Corke
Comments: Under review for the IEEE Robotics and Automation Letters (RA-L) with the option for the IEEE International Conference on Robotics and Automation (ICRA) 2017 (submitted on 10 Sep, 2016)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Systems and Control (cs.SY)
In this paper we describe a deep network architecture that maps visual input
to control actions for a robotic planar reaching task with 100% reliability in
real-world trials. Our network is trained in simulation and fine-tuned with a
limited number of real-world images. The policy search is guided by a
kinematics-based controller (K-GPS), which works more effectively and
efficiently than (varepsilon)-Greedy. A critical insight in our system is the
need to introduce a bottleneck in the network between the perception and
control networks, and to initially train these networks independently.
Omid Bakhshandeh, Trung Bui, Zeh Lin, Walter Chang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Answering open-ended questions is an essential capability for any intelligent
agent. One of the most interesting recent open-ended question answering
challenges is Visual Question Answering (VQA) which attempts to evaluate a
system’s visual understanding through its answers to natural language questions
about images. There exist many approaches to VQA, the majority of which do not
exhibit deeper semantic understanding of the candidate answers they produce. We
study the importance of generating plausible answers to a given question by
introducing the novel task of `Answer Proposal’: for a given open-ended
question, a system should generate a ranked list of candidate answers informed
by the semantics of the question. We experiment with various models including a
neural generative model as well as a semantic graph matching one. We provide
both intrinsic and extrinsic evaluations for the task of Answer Proposal,
showing that our best model learns to propose plausible answers with a high
recall and performs competitively with some other solutions to VQA.
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Recent work on discriminative segmental models has shown that they can
achieve competitive speech recognition performance, using features based on
deep neural frame classifiers. However, segmental models can be more
challenging to train than standard frame-based approaches. While some segmental
models have been successfully trained end to end, there is a lack of
understanding of their training under different settings and with different
losses.
We investigate a model class based on recent successful approaches,
consisting of a linear model that combines segmental features based on an LSTM
frame classifier. Similarly to hybrid HMM-neural network models, segmental
models of this class can be trained in two stages (frame classifier training
followed by linear segmental model weight training), end to end (joint training
of both frame classifier and linear weights), or with end-to-end fine-tuning
after two-stage training.
We study segmental models trained end to end with hinge loss, log loss,
latent hinge loss, and marginal log loss. We consider several losses for the
case where training alignments are available as well as where they are not.
We find that in general, marginal log loss provides the most consistent
strong performance without requiring ground-truth alignments. We also find that
training with dropout is very important in obtaining good performance with
end-to-end training. Finally, the best results are typically obtained by a
combination of two-stage training and fine-tuning.
Omid Bakhshandeh, Trung Bui, Zeh Lin, Walter Chang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Answering open-ended questions is an essential capability for any intelligent
agent. One of the most interesting recent open-ended question answering
challenges is Visual Question Answering (VQA) which attempts to evaluate a
system’s visual understanding through its answers to natural language questions
about images. There exist many approaches to VQA, the majority of which do not
exhibit deeper semantic understanding of the candidate answers they produce. We
study the importance of generating plausible answers to a given question by
introducing the novel task of `Answer Proposal’: for a given open-ended
question, a system should generate a ranked list of candidate answers informed
by the semantics of the question. We experiment with various models including a
neural generative model as well as a semantic graph matching one. We provide
both intrinsic and extrinsic evaluations for the task of Answer Proposal,
showing that our best model learns to propose plausible answers with a high
recall and performs competitively with some other solutions to VQA.
Roman Novak, Michael Auli, David Grangier
Subjects: Computation and Language (cs.CL)
Existing machine translation decoding algorithms generate translations in a
strictly monotonic fashion and never revisit previous decisions. As a result,
earlier mistakes cannot be corrected at a later stage. In this paper, we
present a translation scheme that starts from an initial guess and then makes
iterative improvements that may revisit previous decisions. We parameterize our
model as a convolutional neural network that predicts discrete substitutions to
an existing translation based on an attention mechanism over both the source
sentence as well as the current translation output. By making less than one
modification per sentence, we improve the output of a phrase-based translation
system by up to 0.4 BLEU on WMT15 German-English translation.
Alok Ranjan Pal, Anupam Munshi, Diganta Saha
Comments: 13 pages in International Journal of Instrumentation and Control Systems (IJICS) Vol.3, No.4, October 2013
Subjects: Computation and Language (cs.CL)
In this paper, we are going to focus on speed up of the Word Sense
Disambiguation procedure by filtering the relevant senses of an ambiguous word
through Part-of-Speech Tagging. First, this proposed approach performs the
Part-of-Speech Tagging operation before the disambiguation procedure using
Bigram approximation. As a result, the exact Part-of-Speech of the ambiguous
word at a particular text instance is derived. In the next stage, only those
dictionary definitions (glosses) are retrieved from an online dictionary, which
are associated with that particular Part-of-Speech to disambiguate the exact
sense of the ambiguous word. In the training phase, we have used Brown Corpus
for Part-of-Speech Tagging and WordNet as an online dictionary. The proposed
approach reduces the execution time upto half (approximately) of the normal
execution time for a text, containing around 200 sentences. Not only that, we
have found several instances, where the correct sense of an ambiguous word is
found for using the Part-of-Speech Tagging before the Disambiguation procedure.
Khudran Alzhrani, Ethan M. Rudd, Terrance E. Boult, C. Edward Chow
Comments: Pre-print of Best Paper Award IEEE Intelligence and Security Informatics (ISI) 2016 Manuscript
Journal-ref: 2016 IEEE International Conference on Intelligence and Security
Informatics (ISI)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
In recent years, traditional cybersecurity safeguards have proven ineffective
against insider threats. Famous cases of sensitive information leaks caused by
insiders, including the WikiLeaks release of diplomatic cables and the Edward
Snowden incident, have greatly harmed the U.S. government’s relationship with
other governments and with its own citizens. Data Leak Prevention (DLP) is a
solution for detecting and preventing information leaks from within an
organization’s network. However, state-of-art DLP detection models are only
able to detect very limited types of sensitive information, and research in the
field has been hindered due to the lack of available sensitive texts. Many
researchers have focused on document-based detection with artificially labeled
“confidential documents” for which security labels are assigned to the entire
document, when in reality only a portion of the document is sensitive. This
type of whole-document based security labeling increases the chances of
preventing authorized users from accessing non-sensitive information within
sensitive documents. In this paper, we introduce Automated Classification
Enabled by Security Similarity (ACESS), a new and innovative detection model
that penetrates the complexity of big text security classification/detection.
To analyze the ACESS system, we constructed a novel dataset, containing
formerly classified paragraphs from diplomatic cables made public by the
WikiLeaks organization. To our knowledge this paper is the first to analyze a
dataset that contains actual formerly sensitive information annotated at
paragraph granularity.
Leonid Barenboim, Michael Elkin, Tzalik Maimon
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We consider coloring problems in the distributed message-passing setting. The
previously-known deterministic algorithms for edge-coloring employed at least
(2Delta – 1) colors, even though any graph admits an edge-coloring with Delta +
1 colors [V64]. Moreover, the previously-known deterministic algorithms that
employed at most O(Delta) colors required superlogarithmic time
[B15,BE10,BE11,FHK15]. In the current paper we devise deterministic
edge-coloring algorithms that employ only Delta + o(Delta) colors, for a very
wide family of graphs. Specifically, as long as the arboricity is a =
O(Delta^{1 – epsilon}), for a constant epsilon > 0, our algorithm computes
such a coloring within {polylogarithmic} deterministic time. We also devise
significantly improved deterministic edge-coloring algorithms for {general
graphs} for a very wide range of parameters. Specifically, for any value (chi)
in the range [4Delta, 2^{o(log Delta)} cdot Delta], our chi-edge-coloring
algorithm has smaller running time than the best previously-known
chi-edge-coloring algorithms. Our algorithms are actually much more general,
since edge-coloring is equivalent to {vertex-coloring of line graphs.} Our
method is applicable to vertex-coloring of the family of graphs with {bounded
diversity} that contains line graphs, line graphs of hypergraphs, and many
other graphs.
Our results are obtained using a novel technique that connects vertices or
edges in a certain way that reduces clique size. The resulting structures,
which we call {connectors}, can be colored more efficiently than the original
graph. Moreover, the color classes constitute simpler subgraphs that can be
colored even more efficiently using appropriate connectors. Hence, we recurse
until we obtain sufficiently simple structures that are colored directly. We
introduce several types of connectors that are useful for various scenarios.
Haoyu Chen, Daniel Seita, Xinlei Pan, John Canny
Comments: Under review
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We present a novel Metropolis-Hastings method for large datasets that uses
small expected-size minibatches of data. Previous work on reducing the cost of
Metropolis-Hastings tests yield variable data consumed per sample, with only
constant factor reductions versus using the full dataset for each sample. Here
we present a method that can be tuned to provide arbitrarily small batch sizes,
by adjusting either proposal step size or temperature. Our test uses the
noise-tolerant Barker acceptance test with a novel additive correction
variable. The resulting test has similar cost to a normal SGD update. Our
experiments demonstrate several order-of-magnitude speedups over previous work.
Carlos M. Alaíz, Michaël Fanuel, Johan A. K. Suykens
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper, Kernel PCA is reinterpreted as the solution to a convex
optimization problem. Actually, there is a constrained convex problem for each
principal component, so that the constraints guarantee that the principal
component is indeed a solution, and not a mere saddle point. Although these
insights do not imply any algorithmic improvement, they can be used to further
understand the method, formulate possible extensions and properly address them.
As an example, a new convex optimization problem for semi-supervised
classification is proposed, which seems particularly well-suited whenever the
number of known labels is small. Our formulation resembles a Least Squares SVM
problem with a regularization parameter multiplied by a negative sign, combined
with a variational principle for Kernel PCA. Our primal optimization principle
for semi-supervised learning is solved in terms of the Lagrange multipliers.
Numerical experiments in several classification tasks illustrate the
performance of the proposed model in problems with only a few labeled data.
Tianpei Xie, Nasser. M. Narabadi, Alfred O. Hero
Comments: 13 pages; Accepted in Transaction on Signal Processing, 2016. arXiv admin note: text overlap with arXiv:1507.04540
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we propose a general framework to learn a robust large-margin
binary classifier when corrupt measurements, called anomalies, caused by sensor
failure might be present in the training set. The goal is to minimize the
generalization error of the classifier on non-corrupted measurements while
controlling the false alarm rate associated with anomalous samples. By
incorporating a non-parametric regularizer based on an empirical entropy
estimator, we propose a Geometric-Entropy-Minimization regularized Maximum
Entropy Discrimination (GEM-MED) method to learn to classify and detect
anomalies in a joint manner. We demonstrate using simulated data and a real
multimodal data set. Our GEM-MED method can yield improved performance over
previous robust classification methods in terms of both classification accuracy
and anomaly detection rate.
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu
Comments: to appear in NIPS 2016
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
In this paper, we study the stochastic combinatorial multi-armed bandit
(CMAB) framework that allows a general nonlinear reward function, whose
expected value may not depend only on the means of the input random variables
but possibly on the entire distributions of these variables. Our framework
enables a much larger class of reward functions such as the (max()) function
and nonlinear utility functions. Existing techniques relying on accurate
estimations of the means of random variables, such as the upper confidence
bound (UCB) technique, do not work directly on these functions. We propose a
new algorithm called stochastically dominant confidence bound (SDCB), which
estimates the distributions of underlying random variables and their
stochastically dominant confidence bounds. We prove that SDCB can achieve
(O(log T)) distribution-dependent regret and ( ilde{O}(sqrt{T}))
distribution-independent regret, where (T) is the time horizon. We apply our
results to the (K)-MAX problem and expected utility maximization problems. In
particular, for (K)-MAX, we provide the first polynomial-time approximation
scheme (PTAS) for its offline problem, and give the first ( ilde{O}(sqrt T))
bound on the ((1-epsilon))-approximation regret of its online problem, for any
(epsilon>0).
Martín Abadi, David G. Andersen (Google Brain)
Comments: 15 pages
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
We ask whether neural networks can learn to use secret keys to protect
information from other neural networks. Specifically, we focus on ensuring
confidentiality properties in a multiagent system, and we specify those
properties in terms of an adversary. Thus, a system may consist of neural
networks named Alice and Bob, and we aim to limit what a third neural network
named Eve learns from eavesdropping on the communication between Alice and Bob.
We do not prescribe specific cryptographic algorithms to these neural networks;
instead, we train end-to-end, adversarially. We demonstrate that the neural
networks can learn how to perform forms of encryption and decryption, and also
how to apply these operations selectively in order to meet confidentiality
goals.
Fangyi Zhang, Jürgen Leitner, Ben Upcroft, Peter Corke
Comments: Under review for the IEEE Robotics and Automation Letters (RA-L) with the option for the IEEE International Conference on Robotics and Automation (ICRA) 2017 (submitted on 10 Sep, 2016)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Systems and Control (cs.SY)
In this paper we describe a deep network architecture that maps visual input
to control actions for a robotic planar reaching task with 100% reliability in
real-world trials. Our network is trained in simulation and fine-tuned with a
limited number of real-world images. The policy search is guided by a
kinematics-based controller (K-GPS), which works more effectively and
efficiently than (varepsilon)-Greedy. A critical insight in our system is the
need to introduce a bottleneck in the network between the perception and
control networks, and to initially train these networks independently.
Erik Rodner, Björn Barz, Yanira Guanche, Milan Flach, Miguel Mahecha, Paul Bodesheim, Markus Reichstein, Joachim Denzler
Comments: ICML Workshop on Anomaly Detection
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We present new methods for batch anomaly detection in multivariate time
series. Our methods are based on maximizing the Kullback-Leibler divergence
between the data distribution within and outside an interval of the time
series. An empirical analysis shows the benefits of our algorithms compared to
methods that treat each time step independently from each other without
optimizing with respect to all possible intervals.
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Recent work on discriminative segmental models has shown that they can
achieve competitive speech recognition performance, using features based on
deep neural frame classifiers. However, segmental models can be more
challenging to train than standard frame-based approaches. While some segmental
models have been successfully trained end to end, there is a lack of
understanding of their training under different settings and with different
losses.
We investigate a model class based on recent successful approaches,
consisting of a linear model that combines segmental features based on an LSTM
frame classifier. Similarly to hybrid HMM-neural network models, segmental
models of this class can be trained in two stages (frame classifier training
followed by linear segmental model weight training), end to end (joint training
of both frame classifier and linear weights), or with end-to-end fine-tuning
after two-stage training.
We study segmental models trained end to end with hinge loss, log loss,
latent hinge loss, and marginal log loss. We consider several losses for the
case where training alignments are available as well as where they are not.
We find that in general, marginal log loss provides the most consistent
strong performance without requiring ground-truth alignments. We also find that
training with dropout is very important in obtaining good performance with
end-to-end training. Finally, the best results are typically obtained by a
combination of two-stage training and fine-tuning.
Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin
Comments: NIPS2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Stochastic gradient MCMC (SG-MCMC) has played an important role in
large-scale Bayesian learning, with well-developed theoretical convergence
properties. In such applications of SG-MCMC, it is becoming increasingly
popular to employ distributed systems, where stochastic gradients are computed
based on some outdated parameters, yielding what are termed stale gradients.
While stale gradients could be directly used in SG-MCMC, their impact on
convergence properties has not been well studied. In this paper we develop
theory to show that while the bias and MSE of an SG-MCMC algorithm depend on
the staleness of stochastic gradients, its estimation variance (relative to the
expected estimate, based on a prescribed number of samples) is independent of
it. In a simple Bayesian distributed system with SG-MCMC, where stale gradients
are computed asynchronously by a set of workers, our theory indicates a linear
speedup on the decrease of estimation variance w.r.t. the number of workers.
Experiments on synthetic data and deep neural networks validate our theory,
demonstrating the effectiveness and scalability of SG-MCMC with stale
gradients.
Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis
Comments: 24 pages, 4 figures, NIPS 2016
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)
In this paper we present a new algorithm for computing a low rank
approximation of the product (A^TB) by taking only a single pass of the two
matrices (A) and (B). The straightforward way to do this is to (a) first sketch
(A) and (B) individually, and then (b) find the top components using PCA on the
sketch. Our algorithm in contrast retains additional summary information about
(A,B) (e.g. row and column norms etc.) and uses this additional information to
obtain an improved approximation from the sketches. Our main analytical result
establishes a comparable spectral norm guarantee to existing two-pass methods;
in addition we also provide results from an Apache Spark implementation that
shows better computational and statistical performance on real-world and
synthetic evaluation datasets.
Arun Kumar, Paul Schrater
Subjects: Human-Computer Interaction (cs.HC); Learning (cs.LG)
The vast majority of recommender systems model preferences as static or
slowly changing due to observable user experience. However, spontaneous changes
in user preferences are ubiquitous in many domains like media consumption and
key factors that drive changes in preferences are not directly observable.
These latent sources of preference change pose new challenges. When systems do
not track and adapt to users’ tastes, users lose confidence and trust,
increasing the risk of user churn. We meet these challenges by developing a
model of novelty preferences that learns and tracks latent user tastes. We
combine three innovations: a new measure of item similarity based on patterns
of consumption co-occurrence; model for {em spontaneous} changes in
preferences; and a learning agent that tracks each user’s dynamic preferences
and learns individualized policies for variety. The resulting framework
adaptively provides users with novelty tailored to their preferences for change
per se.
Chun-Kit Lai, Shidong Li, Daniel Mondo
Comments: 12 pages, 2 figures
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA)
Solving compressed sensing problems relies on the properties of sparse
signals. It is commonly assumed that the sparsity s needs to be less than one
half of the spark of the sensing matrix A, and then the unique sparsest
solution exists, and recoverable by (ell_1)-minimization or related
procedures. We discover, however, a measure theoretical uniqueness exists for
nearly spark-level sparsity from compressed measurements Ax = b. Specifically,
suppose A is of full spark with m rows, and suppose (frac{m}{2}) < s < m. Then
the solution to Ax = b is unique for x with (|x|_0 leq s) up to a set of
measure 0 in every s-sparse plane. This phenomenon is observed and confirmed by
an (ell_1)-tail minimization procedure, which recovers sparse signals uniquely
with s > (frac{m}{2}) in thousands and thousands of random tests. We further
show instead that the mere (ell_1)-minimization would actually fail if s >
(frac{m}{2}) even from the same measure theoretical point of view.
Arman Shojaeifard, Kai-Kit Wong, Khairi Ashour Hamdi, Emad Alsusa, Daniel K. C. So, Jie Tang
Subjects: Information Theory (cs.IT)
Dense cellular networks (DenseNets) are fast becoming a reality with the
rapid deployment of base stations (BSs) aimed at meeting the explosive data
traffic demand. In legacy systems however this comes with the penalties of
higher network interference and energy consumption. In order to support network
densification in a sustainable manner, the system behavior should be made
‘load-proportional’ thus allowing certain portions of the network to activate
on-demand. In this work, we develop an analytical framework using tools from
stochastic geometry theory for the performance analysis of DenseNets where
load-awareness is explicitly embedded in the design. The model leverages on a
flexible cellular network architecture where there is a complete separation of
the data and signaling communication functionalities. Using the proposed model,
we identify the most energy- efficient deployment solution for meeting certain
minimum service criteria and analyze the corresponding power savings through
dynamic sleep modes. Based on state-of-the-art system parameters, a homogeneous
pico deployment for the data plane with a separate layer of signaling
macro-cells is revealed to be the most energy-efficient solution in future
dense urban environments.
Linqi Song, Christina Fragouli
Subjects: Information Theory (cs.IT)
In pliable index coding, we consider a server with (m) messages and (n)
clients where each client has as side information a subset of the messages. We
seek to minimize the number of broadcast transmissions, so that each client can
recover any one unknown message she does not already have. Previous work has
shown that the pliable index coding problem is NP-hard and requires at most
(mathcal{O}(log^2(n))) broadcast transmissions, which indicates exponential
savings over the conventional index coding that requires in the worst case
(mathcal{O}(n)) transmissions. In this work, building on a decoding criterion
that we propose, we first design a deterministic polynomial-time algorithm that
can realize the exponential benefits, by achieving, in the worst case, a
performance upper bounded by (mathcal{O}(log^2(n))) broadcast transmissions.
We extend our algorithm to the (t)-requests case, where each client requires
(t) unknown messages that she does not have, and show that our algorithm
requires at most (mathcal{O}(tlog(n)+log^2(n))) broadcast transmissions. We
construct lower bound instances that require at least (Omega(log(n)))
transmissions for linear pliable index coding and at least (Omega(t+log(n)))
transmissions for the (t)-requests case, indicating that our upper bounds are
almost tight. Finally, we provide a probabilistic analysis and show that the
required number of transmissions is almost surely (Theta(log(n))), as
compared to (Theta(n/log(n))) for index coding. Our numerical experiments
show that our algorithm outperforms existing algorithms for pliable index
coding by up to (50\%) less transmissions.
Orod Raeesi, Ahmet Gokceoglu, Yaning Zou, Mikko Valkama
Subjects: Information Theory (cs.IT)
This paper analyzes the performance of linearly precoded time division duplex
based multi-user massive MIMO downlink system under joint impacts of channel
non-reciprocity (NRC) and imperfect channel state information (CSI). We
consider a practical NRC model which accounts for transceiver
frequency-response mismatches at both user equipment (UE) and base station (BS)
sides as well as mutual coupling mismatches at BS. The analysis covers two most
prominent forms of linear precoding schemes, namely, zero-forcing (ZF) and
maximum-ratio transmission (MRT), and assumes that the statistical channel
properties are used in the user side to decode the received signal. Closed-form
analytical expressions are derived for the effective signal to interference and
noise ratios (SINRs) and the corresponding capacity lower bounds, stemming from
the developed signal and system models. The derived analytical expressions show
that, in moderate to high SNR region, the additional interference caused by
practical NRC levels degrades the performance of both precoders significantly.
Moreover, the ZF is shown to be more sensitive to NRC with a much more severe
performance loss compared to MRT. Numerical evaluations with practical NRC
levels indicate that this performance loss in the received SINR can be as high
as 80% for ZF, whereas it is typically less than 20% for MRT. The derived
analytical expressions provide useful tools, e.g., in calculating the NRC
calibration requirements in BSs and UEs for given specific performance targets
in terms of the system capacity lower bound or effective SINR.
Ma Xiao, Liu Jia, Zhao Shancheng
Comments: arXiv admin note: text overlap with arXiv:1202.0592
Subjects: Information Theory (cs.IT)
In this paper, the sphere bound (SB) is revisited within a general bounding
framework based on nested Gallager regions. The equivalence is revealed between
the SB proposed by Herzberg and Poltyrev and the SB proposed by Kasami et al.,
whereas the latter was rarely cited in the literatures. Interestingly and
importantly, the derivation of the SB based on nested Gallager regions suggests
us a new simulation approach to performance evaluation of binary linear codes
over additive white Gaussian noise (AWGN) channels. In order for the
performance evaluation, the proposed approach decouples the geometrical
structure of the code from the noise statistics. The former specifies the
conditional error probabilities, which are independent of signal-to-noise
ratios (SNRs) and can be simulated and estimated efficiently, while the latter
determines the probabilities of those conditions, which involve SNRs and can be
calculated numerically. Numerical results show that the proposed simulation
approach matches well with the traditional simulation approach in the high
error rate region but is able to evaluate efficiently the performance in the
extremely low error rate region.
Amodh Kant Saxena, Inbar Fijalkow, A. Lee Swindlehurst
Subjects: Information Theory (cs.IT)
We present a mathematical analysis of linear precoders for downlink massive
MIMO multiuser systems that employ one-bit digital-to-analog converters at the
basestation in order to reduce complexity and mitigate power usage. The
analysis is based on the Bussgang theorem, and applies generally to any linear
precoding scheme. We examine in detail the special case of the quantized
zero-forcing (ZF) precoder, and derive a simple asymptotic expression for the
resulting symbol error rate at each terminal. Our analysis illustrates that the
performance of the quantized ZF precoder depends primarily on the ratio of the
number of antennas to the number of users, and our simulations show that it can
outperform the much more complicated maximum likelihood encoder for
low-to-moderate signal to noise ratios, where massive MIMO systems are presumed
to operate. We also use the Bussgang theorem to derive a new linear precoder
optimized for the case of one-bit quantization, and illustrate its improved
performance.
Jinming Wen, Lanping Li, Xiaohu Tang, Wai Ho Mow, Chintha Tellambura
Subjects: Information Theory (cs.IT)
Although multiple-input and multiple-output (MIMO) wireless systems achieve
significant data rates (e.g., multiplexing) and reliability (e.g., diversity)
and are main enablers for high-rate 5G wireless systems, MIMO receivers require
high implementation complexity. A solution is the use of integer-forcing (IF)
linear receivers which first create an effective integer channel matrix. In
this paper, we propose an efficient algorithm to find the optimal integer
coefficient matrix which maximizes the achievable rate of the IF linear
receiver. The algorithm initializes with a suboptimal matrix, which is updated
by a novel and efficient algorithm during the process of an improved version of
sphere decoding until the optimal matrix is obtained. We theoretically show
that the proposed algorithm indeed finds the optimal coefficient matrix.
Furthermore, theoretically complexity analysis indicates that the complexity of
the new algorithm in big-O notation is an order of magnitude smaller, with
respect to the the dimension of the model matrix, than that of the so far most
efficient algorithm. Simulation results are also presented to confirm the
efficiency of our novel algorithm.
Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis
Comments: 24 pages, 4 figures, NIPS 2016
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)
In this paper we present a new algorithm for computing a low rank
approximation of the product (A^TB) by taking only a single pass of the two
matrices (A) and (B). The straightforward way to do this is to (a) first sketch
(A) and (B) individually, and then (b) find the top components using PCA on the
sketch. Our algorithm in contrast retains additional summary information about
(A,B) (e.g. row and column norms etc.) and uses this additional information to
obtain an improved approximation from the sketches. Our main analytical result
establishes a comparable spectral norm guarantee to existing two-pass methods;
in addition we also provide results from an Apache Spark implementation that
shows better computational and statistical performance on real-world and
synthetic evaluation datasets.