Tamas Madl
Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
Despite of the pain and limited accuracy of blood tests for early recognition
of cardiovascular disease, they dominate risk screening and triage. On the
other hand, heart rate variability is non-invasive and cheap, but not
considered accurate enough for clinical practice. Here, we tackle heart beat
interval based classification with deep learning. We introduce an end to end
differentiable hybrid architecture, consisting of a layer of biological neuron
models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
layers of a standard feed-forward neural network. The proposed model is
evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
and 1172 chest pain patients of an emergency department. We show that it can
significantly outperform models based on traditional heart rate variability
predictors, as well as approaching or in some cases outperforming clinical
blood tests, based only on 60 seconds of inter-beat intervals.
Fathi M. Salem
Comments: 8 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We present a model of a basic recurrent neural network (or bRNN) that
includes a separate linear term with a slightly “stable” fixed matrix to
guarantee bounded solutions and fast dynamic response. We formulate a state
space viewpoint and adapt the constrained optimization Lagrange Multiplier
(CLM) technique and the vector Calculus of Variations (CoV) to derive the
(stochastic) gradient descent. In this process, one avoids the commonly used
re-application of the circular chain-rule and identifies the error
back-propagation with the co-state backward dynamic equations. We assert that
this bRNN can successfully perform regression tracking of time-series.
Moreover, the “vanishing and exploding” gradients are explicitly quantified and
explained through the co-state dynamics and the update laws. The adapted CoV
framework, in addition, can correctly and principally integrate new loss
functions in the network on any variable and for varied goals, e.g., for
supervised learning on the outputs and unsupervised learning on the internal
(hidden) states.
S. E. Russek, C. A. Donnelly, M. L. Schneider, B. Baek, M. R. Pufall, W. H. Rippard, P. F. Hopkins, P. D. Dresselhaus, S. P. Benz
Comments: 2016 IEEE International Conference on Rebooting Computing (ICRC)
Subjects: Superconductivity (cond-mat.supr-con); Neural and Evolutionary Computing (cs.NE)
Single flux quantum (SFQ) circuits form a natural neuromorphic technology
with SFQ pulses and superconducting transmission lines simulating action
potentials and axons, respectively. Here we present a new component, magnetic
Josephson junctions, that have a tunablility and re-configurability that was
lacking from previous SFQ neuromorphic circuits. The nanoscale magnetic
structure acts as a tunable synaptic constituent that modifies the junction
critical current. These circuits can operate near the thermal limit where
stochastic firing of the neurons is an essential component of the technology.
This technology has the ability to create complex neural systems with greater
than 10^21 neural firings per second with approximately 1 W dissipation.
Ahmed Mateen, Marriam Nazir, Salman Afsar Awan
Comments: 9 pages
Journal-ref: International Journal of Computer Applications Foundation of
Computer Science (FCS), NY, USA Volume 151 – Number 7 Year of Publication:
2016
Subjects: Software Engineering (cs.SE); Neural and Evolutionary Computing (cs.NE)
Testing provides means pertaining to assuring software performance. The total
aim of software industry is actually to make a certain start associated with
high quality software for the end user. However, associated with software
testing has quite a few underlying concerns, which are very important and need
to pay attention on these issues. These issues are effectively generating,
prioritization of test cases, etc. These issues can be overcome by paying
attention and focus. Solitary of the greatest Problems in the software testing
area is usually how to acquire a great proper set associated with cases to
confirm software. Some other strategies and also methodologies are proposed
pertaining to shipping care of most of these issues. Genetic Algorithm (GA)
belongs to evolutionary algorithms. Evolutionary algorithms have a significant
role in the automatic test generation and many researchers are focusing on it.
In this study explored software testing related issues by using the GA
approach. In addition to right after applying some analysis, better solution
produced, that is feasible and reliable. The particular research presents the
implementation of GAs because of its generation of optimized test cases. Along
these lines, this paper gives proficient system for the optimization of test
case generation using genetic algorithm.
David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
One of the key challenges of artificial intelligence is to learn models that
are effective in the context of planning. In this document we introduce the
predictron architecture. The predictron consists of a fully abstract model,
represented by a Markov reward process, that can be rolled forward multiple
“imagined” planning steps. Each forward pass of the predictron accumulates
internal rewards and values over multiple planning depths. The predictron is
trained end-to-end so as to make these accumulated values accurately
approximate the true value function. We applied the predictron to procedurally
generated random mazes and a simulator for the game of pool. The predictron
yielded significantly more accurate predictions than conventional deep neural
network architectures.
Ang Li, Allan Jabri, Armand Joulin, Laurens van der Maaten
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Real-world image recognition systems need to recognize tens of thousands of
classes that constitute a plethora of visual concepts. The traditional approach
of annotating thousands of images per class for training is infeasible in such
a scenario, prompting the use of webly supervised data. This paper explores the
training of image-recognition systems on large numbers of images and associated
user comments. In particular, we develop visual n-gram models that can predict
arbitrary phrases that are relevant to the content of an image. Our visual
n-gram models are feed-forward convolutional networks trained using new loss
functions that are inspired by n-gram models commonly used in language
modeling. We demonstrate the merits of our models in phrase prediction,
phrase-based image retrieval, relating images and captions, and zero-shot
transfer.
Antonio M. Lopez, Jiaolong Xu, Jose L. Gomez, David Vazquez, German Ros
Comments: Invited book chapter to appear in “Domain Adaptation in Computer Vision Applications”, Springer Series: Advances in Computer Vision and Pattern Recognition, Edited by Gabriela Csurka
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Supervised learning tends to produce more accurate classifiers than
unsupervised learning in general. This implies that training data is preferred
with annotations. When addressing visual perception challenges, such as
localizing certain object classes within an image, the learning of the involved
classifiers turns out to be a practical bottleneck. The reason is that, at
least, we have to frame object examples with bounding boxes in thousands of
images. A priori, the more complex the model is regarding its number of
parameters, the more annotated examples are required. This annotation task is
performed by human oracles, which ends up in inaccuracies and errors in the
annotations (aka ground truth) since the task is inherently very cumbersome and
sometimes ambiguous. As an alternative we have pioneered the use of virtual
worlds for collecting such annotations automatically and with high precision.
However, since the models learned with virtual data must operate in the real
world, we still need to perform domain adaptation (DA). In this chapter we
revisit the DA of a deformable part-based model (DPM) as an exemplifying case
of virtual- to-real-world DA. As a use case, we address the challenge of
vehicle detection for driver assistance, using different publicly available
virtual-world data. While doing so, we investigate questions such as: how does
the domain gap behave due to virtual-vs-real data with respect to dominant
object appearance per domain, as well as the role of photo-realism in the
virtual world.
Chao Chen, Alina Zare, Huy Trinh, Gbeng Omotara, J. Tory Cobb, Timotius Lagaunne
Comments: Version 1, Sent for Review. arXiv admin note: substantial text overlap with arXiv:1511.02821
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Topic models (e.g., pLSA, LDA, sLDA) have been widely used for segmenting
imagery. However, these models are confined to crisp segmentation, forcing a
visual word (i.e., an image patch) to belong to one and only one topic. Yet,
there are many images in which some regions cannot be assigned a crisp
categorical label (e.g., transition regions between a foggy sky and the ground
or between sand and water at a beach). In these cases, a visual word is best
represented with partial memberships across multiple topics. To address this,
we present a partial membership latent Dirichlet allocation (PM-LDA) model and
an associated parameter estimation algorithm. This model can be useful for
imagery where a visual word may be a mixture of multiple topics. Experimental
results on visual and sonar imagery show that PM-LDA can produce both crisp and
soft semantic image segmentations; a capability previous topic modeling methods
do not have.
Asad Khan, Luo Jiang, Wei Li, Ligang Liu
Comments: arXiv admin note: text overlap with arXiv:1610.04861
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Color transfer between images uses the statistics information of image
effectively. We present a novel approach of local color transfer between images
based on the simple statistics and locally linear embedding. A sketching
interface is proposed for quickly and easily specifying the color
correspondences between target and source image. The user can specify the
correspondences of local region using scribes, which more accurately transfers
the target color to the source image while smoothly preserving the boundaries,
and exhibits more natural output results. Our algorithm is not restricted to
one-to-one image color transfer and can make use of more than one target images
to transfer the color in different regions in the source image. Moreover, our
algorithm does not require to choose the same color style and image size
between source and target images. We propose the sub-sampling to reduce the
computational load. Comparing with other approaches, our algorithm is much
better in color blending in the input data. Our approach preserves the other
color details in the source image. Various experimental results show that our
approach specifies the correspondences of local color region in source and
target images. And it expresses the intention of users and generates more
actual and natural results of visual effect.
Konstantinos Kamnitsas, Christian Baumgartner, Christian Ledig, Virginia F.J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Aditya Nori, Antonio Criminisi, Daniel Rueckert, Ben Glocker
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Significant advances have been made towards building accurate automatic
segmentation systems for a variety of biomedical applications using machine
learning. However, the performance of these systems often degrades when they
are applied on new data that differ from the training data, for example, due to
variations in imaging protocols. Manually annotating new data for each test
domain is not a feasible solution. In this work we investigate unsupervised
domain adaptation using adversarial neural networks to train a segmentation
method which is more invariant to differences in the input data, and which does
not require any annotations on the test domain. Specifically, we learn
domain-invariant features by learning to counter an adversarial network, which
attempts to classify the domain of the input data by observing the activations
of the segmentation network. Furthermore, we propose a multi-connected domain
discriminator for improved adversarial training. Our system is evaluated using
two MR databases of subjects with traumatic brain injuries, acquired using
different scanners and imaging protocols. Using our unsupervised approach, we
obtain segmentation accuracies which are close to the upper bound of supervised
domain adaptation.
DaoYu Lin
Comments: arXiv admin note: text overlap with arXiv:1511.06434, arXiv:1606.03498, arXiv:1406.2661, arXiv:1508.00092 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Scene classification plays a key role in interpreting the remotely sensed
high-resolution images. With the development of deep learning, supervised
learning in classification of Remote Sensing with convolutional networks (CNNs)
has been frequently adopted. However, researchers paid less attention to
unsupervised learning in remote sensing with CNNs. In order to filling the gap,
this paper proposes a set of CNNs called extbf{M}ultiple
l extbf{A}ye extbf{R} fea extbf{T}ure m extbf{A}tching(MARTA) generative
adversarial networks (GANs) to learn representation using only unlabeled data.
There will be two models of MARTA GANs involved: (1) a generative model (G)
that captures the data distribution and provides more training data; (2) a
discriminative model (D) that estimates the possibility that a sample came from
the training data rather than (G) and in this way a well-formed representation
of dataset can be learned. Therefore, MARTA GANs obtain the state-of-the-art
results which outperform the results got from UC-Merced Land-use dataset and
Brazilian Coffee Scenes dataset.
David Nilsson, Cristian Sminchisescu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic video segmentation is challenging due to the sheer amount of data
that needs to be processed and labeled in order to construct accurate models.
In this paper we present a deep, end-to-end trainable methodology to video
segmentation that is capable of leveraging information present in unlabeled
data in order to improve semantic estimates. Our model combines a convolutional
architecture and a spatial transformer recurrent layer that are able to
temporally propagate labeling information by means of optical flow, adaptively
gated based on its locally estimated uncertainty. The flow, the recogition and
the gated propagation modules can be trained jointly, end-to-end. The gated
recurrent flow propagation component of our model can be plugged-in any static
semantic segmentation architecture and turn it into a weakly supervised video
processing one. Our extensive experiments in the challenging CityScapes dataset
indicate that the resulting model can leverage unlabeled temporal frames next
to a labeled one in order to improve both the video segmentation accuracy and
the consistency of its temporal labeling, at no additional annotation cost.
Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
Comments: Our implementation is available on this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Objects appear to scale differently in natural images. This fact requires
methods dealing with object-centric tasks e.g. object proposal to have robust
performance over scale variances of objects. In the paper we present a novel
segment proposal framework, namely FastMask, which takes advantage of the
hierarchical structure in deep convolutional neural network to segment
multi-scale objects in one shot. Innovatively, we generalize segment proposal
network into three different functional components (body, neck and head). We
further propose a weight-shared residual neck module as well as a
scale-tolerant attentional head module for multi-scale training and efficient
one-shot inference. On MS COCO benchmark, the proposed FastMask outperforms all
state-of-the-art segment proposal methods in average recall while keeping 2~5
times faster. More impressively, with a slight trade-off in accuracy, FastMask
can segment objects in near real time (~13 fps) at 800( imes)600 resolution
images, highlighting its potential in practical applications. Our
implementation is available on this https URL
Alexander Amini, Berthold Horn, Alan Edelman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Performance (cs.PF)
Convolutions have long been regarded as fundamental to applied mathematics,
physics and engineering. Their mathematical elegance allows for common tasks
such as numerical differentiation to be computed efficiently on large data
sets. Efficient computation of convolutions is critical to artificial
intelligence in real-time applications, like machine vision, where convolutions
must be continuously and efficiently computed on tens to hundreds of kilobytes
per second. In this paper, we explore how convolutions are used in fundamental
machine vision applications. We present an accelerated n-dimensional
convolution package in the high performance computing language, Julia, and
demonstrate its efficacy in solving the time to contact problem for machine
vision. Results are measured against synthetically generated videos and
quantitatively assessed according to their mean squared error from the ground
truth. We achieve over an order of magnitude decrease in compute time and
allocated memory for comparable machine vision applications. All code is
packaged and integrated into the official Julia Package Manager to be used in
various other scenarios.
Xiahai Zhuang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a method for simultaneous segmentation of multi-source
images, using the multivariate mixture model (MvMM) and maximum of
log-likelihood (LL) framework. The segmentation is a procedure of texture
classification, and the MvMM is used to model the joint intensity distribution
of the images. Specifically, the method is applied to the myocardial
segmentation combining the complementary texture information from
multi-sequence (MS) cardiac magnetic resonance (CMR) images. Furthermore, there
exist inter-image mis-registration and intra-image misalignment of slices in
the MS CMR images. Hence, the MvMM is formulated with transformations, which
are embedded into the LL framework and optimized simultaneously with the
segmentation parameters. The proposed method is able to correct the inter- and
intra-image misalignment by registering each slice of the MS CMR to a virtual
common space, as well as to delineate the indistinguishable boundaries of
myocardium consisting of pathologies. Results have shown statistically
significant improvement in the segmentation performance of the proposed method
with respect to the conventional approaches which can solely segment each image
separately. The proposed method has also demonstrated better robustness in the
incongruent data, where some images may not fully cover the region of interest
and the full coverage can only be reconstructed combining the images from
multiple sources.
D. S. Guru, N. Vinay Kumar
Comments: 15 pages, 6 figures, 6 tables, Proceedings of International Conference on Computer Vision and Image Processing (CVIP 2016). arXiv admin note: text overlap with arXiv:1609.01414
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a model for classification of logos based on symbolic
representation of features is presented. The proposed model makes use of global
features of logo images such as color, texture, and shape features for
classification. The logo images are broadly classified into three different
classes, viz., logo image containing only text, an image with only symbol, and
an image with both text and a symbol. In each class, the similar looking logo
images are clustered using K-means clustering algorithm. The intra-cluster
variations present in each cluster corresponding to each class are then
preserved using symbolic interval data. Thus referenced logo images are
represented in the form of interval data. A sample logo image is then
classified using suitable symbolic classifier. For experimentation purpose,
relatively large amount of color logo images is created consisting of 5044 logo
images. The classification results are validated with the help of accuracy,
precision, recall, F-measure, and time. To check the efficacy of the proposed
model, the comparative analyses are given against the other models. The results
show that the proposed model outperforms the other models with respect to time
and F-measure.
Zhihua Ban, Jianguo Liu, Li Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Superpixel segmentation is used to partition an image into perceptually
coherence atomic regions. As a preprocessing step of computer vision
applications, it can enormously reduce the number of entries of subsequent
algorithms. With each superpixel associated with a Gaussian distribution, we
assume that a pixel is generated by first randomly choosing one of the
superpixels, and then the pixel is drawn from the corresponding Gaussian
density. Unlike most applications of Gaussian mixture model in clustering, data
points in our model are assumed to be non-identically distributed. Given an
image, a log-likelihood function is constructed for maximizing. Based on a
solution derived from the expectation-maximization method, a well designed
algorithm is proposed. Our method is of linear complexity with respect to the
number of pixels, and it can be implemented using parallel techniques. To the
best of our knowledge, our algorithm outperforms the state-of-the-art in
accuracy and presents a competitive performance in computational efficiency.
Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor
Comments: IEEE Conf on ICASSP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Classification of human behavior is key to developing closed-loop Deep Brain
Stimulation (DBS) systems, which may be able to decrease the power consumption
and side effects of the existing systems. Recent studies have shown that the
Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the
brain can be used to recognize human behavior. Since the DBS leads implanted in
each STN can collect three bipolar signals, the selection of a suitable pair of
LFPs that achieves optimal recognition performance is still an open problem to
address. Considering the presence of synchronized aggregate activity in the
basal ganglia, this paper presents an FFT-based synchronization approach to
automatically select a relevant pair of LFPs and use the pair together with an
SVM-based MKL classifier for behavior recognition purposes. Our experiments on
five subjects show the superiority of the proposed approach compared to other
methods used for behavior classification.
Mahajabin Rahman, Davi Geiger
Comments: 16 pages, 13 figures
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
The mixture of Gaussian distributions, a soft version of k-means , is
considered a state-of-the-art clustering algorithm. It is widely used in
computer vision for selecting classes, e.g., color, texture, and shapes. In
this algorithm, each class is described by a Gaussian distribution, defined by
its mean and covariance. The data is described by a weighted sum of these
Gaussian distributions. We propose a new method, inspired by quantum
interference in physics. Instead of modeling each class distribution directly,
we model a class wave function such that its magnitude square is the class
Gaussian distribution. We then mix the class wave functions to create the
mixture wave function. The final mixture distribution is then the magnitude
square of the mixture wave function. As a result, we observe the quantum class
interference phenomena, not present in the Gaussian mixture model. We show that
the quantum method outperforms the Gaussian mixture method in every aspect of
the estimations. It provides more accurate estimations of all distribution
parameters, with much less fluctuations, and it is also more robust to data
deformations from the Gaussian assumptions. We illustrate our method for color
segmentation as an example application.
Vikas K. Garg, Adam Tauman Kalai
Comments: 22 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We introduce a new paradigm to investigate unsupervised learning, reducing
unsupervised learning to supervised learning. Specifically, we mitigate the
subjectivity in unsupervised decision-making by leveraging knowledge acquired
from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
the versatility of our framework via comprehensive expositions and detailed
experiments on several unsupervised problems such as (a) clustering, (b)
outlier detection, and (c) similarity prediction under a common umbrella of
meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
establish the theoretical foundations of our framework, and show that our
framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
clustering.
Rouven Bauer
Subjects: Artificial Intelligence (cs.AI)
In this work we present an algorithm for composing monophonic melodies
similar in style to those of a given, phrase annotated, sample of melodies. For
implementation, a hybrid approach incorporating parametric Markov models of
higher order and a contour concept of phrases is used. This work is based on
the master thesis of Thayabaran Kathiresan (2015). An online listening test
conducted shows that enhancing a pure Markov model with musically relevant
context, like count and planed melody contour, improves the result
significantly.
Nicolas Le Roux
Comments: 12 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
We tackle the issue of finding a good policy when the number of policy
updates is limited. This is done by approximating the expected policy reward as
a sequence of concave lower bounds which can be efficiently maximized,
drastically reducing the number of policy updates required to achieve good
performance. We also extend existing methods to negative rewards, enabling the
use of control variates.
Joshua S. Friedman
Subjects: Artificial Intelligence (cs.AI)
We formulate an integer program to solve a highly constrained academic
timetabling problem at the United States Merchant Marine Academy. The IP
instance that results from our real case study has approximately both 170,000
rows and columns and solves to near optimality in 12 hours, using a commercial
solver. Our model is applicable to both high schools and small colleges who
wish to deviate from group scheduling. We also solve a necessary preprocessing
student subgrouping problem, which breaks up big groups of students into small
groups so they can optimally fit into small capacity classes.
Eugenia Ternovska
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Databases (cs.DB)
We propose a new formalism for specifying and reasoning about problems that
involve heterogeneous “pieces of information” — large collections of data,
decision procedures of any kind and complexity and connections between them.
The essence of our proposal is to lift Codd’s relational algebra from
operations on relational tables to operations on classes of structures (with
recursion), and to add a direction of information propagation. We observe the
presence of information propagation in several formalisms for efficient
reasoning and use it to express unary negation and operations used in graph
databases. We carefully analyze several reasoning tasks and establish a precise
connection between a generalized query evaluation and temporal logic model
checking. Our development allows us to reveal a general correspondence between
classical and modal logics and may shed a new light on the good computational
properties of modal logics and related formalisms.
Tamas Madl
Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
Despite of the pain and limited accuracy of blood tests for early recognition
of cardiovascular disease, they dominate risk screening and triage. On the
other hand, heart rate variability is non-invasive and cheap, but not
considered accurate enough for clinical practice. Here, we tackle heart beat
interval based classification with deep learning. We introduce an end to end
differentiable hybrid architecture, consisting of a layer of biological neuron
models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
layers of a standard feed-forward neural network. The proposed model is
evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
and 1172 chest pain patients of an emergency department. We show that it can
significantly outperform models based on traditional heart rate variability
predictors, as well as approaching or in some cases outperforming clinical
blood tests, based only on 60 seconds of inter-beat intervals.
Antonio M. Lopez, Jiaolong Xu, Jose L. Gomez, David Vazquez, German Ros
Comments: Invited book chapter to appear in “Domain Adaptation in Computer Vision Applications”, Springer Series: Advances in Computer Vision and Pattern Recognition, Edited by Gabriela Csurka
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Supervised learning tends to produce more accurate classifiers than
unsupervised learning in general. This implies that training data is preferred
with annotations. When addressing visual perception challenges, such as
localizing certain object classes within an image, the learning of the involved
classifiers turns out to be a practical bottleneck. The reason is that, at
least, we have to frame object examples with bounding boxes in thousands of
images. A priori, the more complex the model is regarding its number of
parameters, the more annotated examples are required. This annotation task is
performed by human oracles, which ends up in inaccuracies and errors in the
annotations (aka ground truth) since the task is inherently very cumbersome and
sometimes ambiguous. As an alternative we have pioneered the use of virtual
worlds for collecting such annotations automatically and with high precision.
However, since the models learned with virtual data must operate in the real
world, we still need to perform domain adaptation (DA). In this chapter we
revisit the DA of a deformable part-based model (DPM) as an exemplifying case
of virtual- to-real-world DA. As a use case, we address the challenge of
vehicle detection for driver assistance, using different publicly available
virtual-world data. While doing so, we investigate questions such as: how does
the domain gap behave due to virtual-vs-real data with respect to dominant
object appearance per domain, as well as the role of photo-realism in the
virtual world.
Vikas K. Garg, Adam Tauman Kalai
Comments: 22 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We introduce a new paradigm to investigate unsupervised learning, reducing
unsupervised learning to supervised learning. Specifically, we mitigate the
subjectivity in unsupervised decision-making by leveraging knowledge acquired
from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
the versatility of our framework via comprehensive expositions and detailed
experiments on several unsupervised problems such as (a) clustering, (b)
outlier detection, and (c) similarity prediction under a common umbrella of
meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
establish the theoretical foundations of our framework, and show that our
framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
clustering.
Toni Heidenreich
Comments: Literature review prepared as a student at King’s College London
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Defining various dishonest notions in a formal way is a key step to enable
intelligent agents to act in untrustworthy environments. This review evaluates
the literature for this topic by looking at formal definitions based on modal
logic as well as other formal approaches. Criteria from philosophical
groundwork is used to assess the definitions for correctness and completeness.
The key contribution of this review is to show that only a few definitions
fully comply with this gold standard and to point out the missing steps towards
a successful application of these definitions in an actual agent environment.
Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
Comments: Our implementation is available on this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Objects appear to scale differently in natural images. This fact requires
methods dealing with object-centric tasks e.g. object proposal to have robust
performance over scale variances of objects. In the paper we present a novel
segment proposal framework, namely FastMask, which takes advantage of the
hierarchical structure in deep convolutional neural network to segment
multi-scale objects in one shot. Innovatively, we generalize segment proposal
network into three different functional components (body, neck and head). We
further propose a weight-shared residual neck module as well as a
scale-tolerant attentional head module for multi-scale training and efficient
one-shot inference. On MS COCO benchmark, the proposed FastMask outperforms all
state-of-the-art segment proposal methods in average recall while keeping 2~5
times faster. More impressively, with a slight trade-off in accuracy, FastMask
can segment objects in near real time (~13 fps) at 800( imes)600 resolution
images, highlighting its potential in practical applications. Our
implementation is available on this https URL
Alexander Amini, Berthold Horn, Alan Edelman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Performance (cs.PF)
Convolutions have long been regarded as fundamental to applied mathematics,
physics and engineering. Their mathematical elegance allows for common tasks
such as numerical differentiation to be computed efficiently on large data
sets. Efficient computation of convolutions is critical to artificial
intelligence in real-time applications, like machine vision, where convolutions
must be continuously and efficiently computed on tens to hundreds of kilobytes
per second. In this paper, we explore how convolutions are used in fundamental
machine vision applications. We present an accelerated n-dimensional
convolution package in the high performance computing language, Julia, and
demonstrate its efficacy in solving the time to contact problem for machine
vision. Results are measured against synthetically generated videos and
quantitatively assessed according to their mean squared error from the ground
truth. We achieve over an order of magnitude decrease in compute time and
allocated memory for comparable machine vision applications. All code is
packaged and integrated into the official Julia Package Manager to be used in
various other scenarios.
David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
One of the key challenges of artificial intelligence is to learn models that
are effective in the context of planning. In this document we introduce the
predictron architecture. The predictron consists of a fully abstract model,
represented by a Markov reward process, that can be rolled forward multiple
“imagined” planning steps. Each forward pass of the predictron accumulates
internal rewards and values over multiple planning depths. The predictron is
trained end-to-end so as to make these accumulated values accurately
approximate the true value function. We applied the predictron to procedurally
generated random mazes and a simulator for the game of pool. The predictron
yielded significantly more accurate predictions than conventional deep neural
network architectures.
Chao-Hsuan Ke, Tsung-Lu Michael Lee, Jung-Hsien Chiang
Subjects: Information Retrieval (cs.IR)
Summary: Abstracts in biomedical articles can provide a quick overview of the
articles but detailed information cannot be obtained without reading full-text
contents. Full-text articles certainly generate more information and contents;
however, accessing full-text documents is usually time consuming. Condensedly
is a web-based application, which provides readers an easy and efficient way to
access full-text paragraphs using sentences in abstracts as fishing bait to
retrieve the big fish reside in full-text. Condensedly is based on the
paragraph ranking algorithm, which evaluates and ranks full-text paragraphs
based on their association scores with sentences in abstracts.
Availability: this http URL
Vladimir V. Bochkarev, Eduard Yu.Lerner, Anna V. Shevlyakova
Comments: 8 pages, 6 figures
Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
This article is devoted to the verification of the empirical Heaps law in
European languages using Google Books Ngram corpus data. The connection between
word distribution frequency and expected dependence of individual word number
on text size is analysed in terms of a simple probability model of text
generation. It is shown that the Heaps exponent varies significantly within
characteristic time intervals of 60-100 years.
Jonathan Godwin, Pontus Stenetorp, Sebastian Riedel
Subjects: Computation and Language (cs.CL)
In this paper we present a novel Neural Network algorithm for conducting
semi-supervised learning for sequence labeling tasks arranged in a
linguistically motivated hierarchy. This relationship is exploited to
regularise the representations of supervised tasks by backpropagating the error
of the unsupervised task through the supervised tasks. We introduce a neural
network where lower layers are supervised by junior downstream tasks and the
final layer task is an auxiliary unsupervised task. The architecture shows
improvements of up to two percentage points F1 for Chunking compared to a
plausible baseline.
Peter Potash, Alexey Romanov, Anna Rumshisky
Comments: 10 pages; under review for ICLR
Subjects: Computation and Language (cs.CL)
One of the major goals in automated argumentation mining is to uncover the
argument structure present in argumentative text. In order to determine this
structure, one must understand how different individual components of the
overall argument are linked. General consensus in this field dictates that the
argument components form a hierarchy of persuasion, which manifests itself in a
tree structure. This work provides the first neural network-based approach to
argumentation mining, focusing on the two tasks of extracting links between
argument components, and classifying types of argument components. In order to
solve this problem, we propose to use a joint model that is based on a Pointer
Network architecture. A Pointer Network is appealing for this task for the
following reasons: 1) It takes into account the sequential nature of argument
components; 2) By construction, it enforces certain properties of the tree
structure present in argument relations; 3) The hidden representations can be
applied to auxiliary tasks. In order to extend the contribution of the original
Pointer Network model, we construct a joint model that simultaneously attempts
to learn the type of argument component, as well as continuing to predict links
between argument components. The proposed joint model achieves state-of-the-art
results on two separate evaluation corpora, achieving far superior performance
than a regular Pointer Network model. Our results show that optimizing for both
tasks, and adding a fully-connected layer prior to recurrent neural network
input, is crucial for high performance.
Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel
Comments: Slightly expanded version of Coling LT4DH workshop paper
Subjects: Computation and Language (cs.CL)
Arabic is a widely-spoken language with a rich and long history spanning more
than fourteen centuries. Yet existing Arabic corpora largely focus on the
modern period or lack sufficient diachronic information. We develop a
large-scale, historical corpus of Arabic of about 1 billion words from diverse
periods of time. We clean this corpus, process it with a morphological
analyzer, and enhance it by detecting parallel passages and automatically
dating undated texts. We demonstrate its utility with selected case-studies in
which we show its application to the digital humanities.
Natalia Bezerra Mota, Sylvia Pinheiro, Mariano Sigman, Diego Fernandez Slezak, Guillermo Cecchi, Mauro Copelli, Sidarta Ribeiro
Comments: Natalia Bezerra Mota and Sylvia Pinheiro: Equal contribution Sidarta Ribeiro and Mauro Copelli: Corresponding authors
Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Physics and Society (physics.soc-ph)
Discourse varies with age, education, psychiatric state and historical epoch,
but the ontogenetic and cultural dynamics of discourse structure remain to be
quantitatively characterized. To this end we investigated word graphs obtained
from verbal reports of 200 subjects ages 2-58, and 676 literary texts spanning
~5,000 years. In healthy subjects, lexical diversity, graph size, and
long-range recurrence departed from initial near-random levels through a
monotonic asymptotic increase across ages, while short-range recurrence showed
a corresponding decrease. These changes were explained by education and suggest
a hierarchical development of discourse structure: short-range recurrence and
lexical diversity stabilize after elementary school, but graph size and
long-range recurrence only stabilize after high school. This gradual maturation
was blurred in psychotic subjects, who maintained in adulthood a near-random
structure. In literature, monotonic asymptotic changes over time were
remarkable: While lexical diversity, long-range recurrence and graph size
increased away from near-randomness, short-range recurrence declined, from
above to below random levels. Bronze Age texts are structurally similar to
childish or psychotic discourses, but subsequent texts converge abruptly to the
healthy adult pattern around the onset of the Axial Age (800-200 BC), a period
of pivotal cultural change. Thus, individually as well as historically,
discourse maturation increases the range of word recurrence away from
randomness.
Pei Xie, Keyou You, Roberto Tempo, Shiji Song, Cheng Wu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
This paper considers a distributed convex optimization problem with
inequality constraints over time-varying unbalanced digraphs, where the cost
function is a sum of local objectives, and each node of the graph only knows
its local objective and inequality constraints. Although there is a vast
literature on distributed optimization, most of them require the graph to be
balanced, which is quite restrictive and not necessary. Very recently, the
unbalanced problem has been resolved only for either time-invariant graphs or
unconstrained optimization. This work addresses the unbalancedness by focusing
on an epigraph form of the constrained optimization. A striking feature is that
this novel idea can be easily used to study time-varying unbalanced digraphs.
Under local communications, a simple iterative algorithm is then designed for
each node. We prove that if the graph is uniformly jointly strongly connected,
each node asymptotically converges to some common optimal solution.
Kaipeng Li, Amanullah Ghazi, Chance Tarver, Jani Boutellier, Mahmoud Abdelaziz, Lauri Anttila, Markku Juntti, Mikko Valkama, Joseph R. Cavallaro
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Digital predistortion (DPD) is a widely adopted baseband processing technique
in current radio transmitters. While DPD can effectively suppress unwanted
spurious spectrum emissions stemming from imperfections of analog RF and
baseband electronics, it also introduces extra processing complexity and poses
challenges on efficient and flexible implementations, especially for mobile
cellular transmitters, considering their limited computing power compared to
basestations. In this paper, we present high data rate implementations of
broadband DPD on modern embedded processors, such as mobile GPU and multicore
CPU, by taking advantage of emerging parallel computing techniques for
exploiting their computing resources. We further verify the suppression effect
of DPD experimentally on real radio hardware platforms. Performance evaluation
results of our DPD design demonstrate the high efficacy of modern general
purpose mobile processors on accelerating DPD processing for a mobile
transmitter.
Muhammad Farooq-i-Azam, Muhammad Naeem Ayyaz, Saleem Akhtar
Comments: arXiv admin note: text overlap with arXiv:1611.03420
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
We propose a localization algorithm for wireless sensor networks, which is
simple in design, does not involve significant overhead and yet provides
acceptable position estimates of sensor nodes. The algorithm uses settled nodes
as beacon nodes so as to increase the number of beacon nodes. The algorithm is
range free and does not need any additional piece of hardware for ranging. It
also does not involve any significant communication overhead for localization.
The simulation and results show that good localization accuracy is achieved for
outdoor environments.
Adrian Kosowski, Przemysław Uznański
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider a dynamical process in a network which distributes all tokens
located at a node among its neighbors, in a round-robin manner.
We show that in the recurrent state of this dynamics, the number of particles
located on a given edge, averaged over an interval of time, is tightly
concentrated around the average particle density in the system. Formally, for a
system of (k) particles in a graph of (m) edges, during any interval of length
(T), this time-averaged value is (k/m pm widetilde{O}(1/T)), whenever
(gcd(m,k) = widetilde{O}(1)) (and so, e.g., whenever (m) is a prime number).
To achieve these bounds, we link the behavior of the studied dynamics to
ergodic properties of traversals based on Eulerian circuits on a symmetric
directed graph. These results are proved through sum set methods and are likely
to be of independent interest.
As a corollary, we also obtain bounds on the emph{idleness} of the studied
dynamics, i.e., on the longest possible time between two consecutive
appearances of a token on an edge, taken over all edges. Designing trajectories
for (k) tokens in a way which minimizes refresh time is fundamental to the
study of patrolling in networks. Our results immediately imply a bound of
(widetilde{O}(m/k)) on the idleness of the studied process, thus showing that
it is a distributed (widetilde{O}(1))-competitive solution to the patrolling
task, for all of the covered cases.
Amani S. Ibrahim, James Hamlyn-Harris, John Grundy
Comments: 6 pages Published in APSEC 2010 Workshop on Cloud Computing
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
The cloud computing model is rapidly transforming the IT landscape. Cloud
computing is a new computing paradigm that delivers computing resources as a
set of reliable and scalable internet-based services allowing customers to
remotely run and manage these services. Infrastructure-as-a-service (IaaS) is
one of the popular cloud computing services. IaaS allows customers to increase
their computing resources on the fly without investing in new hardware. IaaS
adapts virtualization to enable on-demand access to a pool of virtual computing
resources. Although there are great benefits to be gained from cloud computing,
cloud computing also enables new categories of threats to be introduced. These
threats are a result of the cloud virtual infrastructure complexity created by
the adoption of the virtualization technology.
Breaching the security of any component in the cloud virtual infrastructure
significantly impacts on the security of other components and consequently
affects the overall system security. This paper explores the security problem
of the cloud platform virtual infrastructure identifying the existing security
threats and the complexities of this virtual infrastructure. The paper also
discusses the existing security approaches to secure the cloud virtual
infrastructure and their drawbacks. Finally, we propose and explore some key
research challenges of implementing new virtualization-aware security solutions
that can provide the pre-emptive protection for complex and ever- dynamic cloud
virtual infrastructure.
Xingguo Li, Zhaoran Wang, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Tuo Zhao
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We propose a general theory for studying the geometry of nonconvex objective
functions with underlying symmetric structures. In specific, we characterize
the locations of stationary points and the null space of the associated Hessian
matrices via the lens of invariant groups. As a major motivating example, we
apply the proposed general theory to characterize the global geometry of the
low-rank matrix factorization problem. In particular, we illustrate how the
rotational symmetry group gives rise to infinitely many non-isolated strict
saddle points and equivalent global minima of the objective function. By
explicitly identifying all stationary points, we divide the entire parameter
space into three regions: ((cR_1)) the region containing the neighborhoods of
all strict saddle points, where the objective has negative curvatures;
((cR_2)) the region containing neighborhoods of all global minima, where the
objective enjoys strong convexity along certain directions; and ((cR_3)) the
complement of the above regions, where the gradient has sufficiently large
magnitudes. We further extend our result to the matrix sensing problem. This
allows us to establish strong global convergence guarantees for popular
iterative algorithms with arbitrary initial solutions.
Ofer Dekel
Subjects: Learning (cs.LG)
Linear predictors are especially useful when the data is high-dimensional and
sparse. One of the standard techniques used to train a linear predictor is the
Averaged Stochastic Gradient Descent (ASGD) algorithm. We present an efficient
implementation of ASGD that avoids dense vector operations. We also describe a
translation invariant extension called Centered Averaged Stochastic Gradient
Descent (CASGD).
John Glover
Subjects: Learning (cs.LG)
This paper describes a method for using Generative Adversarial Networks to
learn distributed representations of natural language documents. We propose a
model that is based on the recently proposed Energy-Based GAN, but instead uses
a Denoising Autoencoder as the discriminator network. Document representations
are extracted from the hidden layer of the discriminator and evaluated both
quantitatively and qualitatively.
Yunlong Liu, Hexing Zhu
Comments: 9 papges, 4 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Predictive State Representations (PSRs) are powerful techniques for modelling
dynamical systems, which represent a state as a vector of predictions about
future observable events (tests). In PSRs, one of the fundamental problems is
the learning of the PSR model of the underlying system. Recently, spectral
methods have been successfully used to address this issue by treating the
learning problem as the task of computing an singular value decomposition (SVD)
over a submatrix of a special type of matrix called the Hankel matrix. Under
the assumptions that the rows and columns of the submatrix of the Hankel Matrix
are sufficient~(which usually means a very large number of rows and columns,
and almost fails in practice) and the entries of the matrix can be estimated
accurately, it has been proven that the spectral approach for learning PSRs is
statistically consistent and the learned parameters can converge to the true
parameters. However, in practice, due to the limit of the computation ability,
only a finite set of rows or columns can be chosen to be used for the spectral
learning. While different sets of columns usually lead to variant accuracy of
the learned model, in this paper, we propose an approach for selecting the set
of columns, namely basis selection, by adopting a concept of model entropy to
measure the accuracy of the learned model. Experimental results are shown to
demonstrate the effectiveness of the proposed approach.
Elchanan Mossel
Subjects: Learning (cs.LG)
In this paper we propose a new prism for studying deep learning motivated by
connections between deep learning and evolution. Our main contributions are: 1,
We introduce of a sequence of increasingly complex hierarchal generative models
which interpolate between standard Markov models on trees (phylogenetic models)
and deep learning models. 2. Formal definitions of classes of algorithms that
are not deep. 3. Rigorous proofs showing that such classes are information
theoretically much weaker than optimal “deep” learning algorithms. In our
models, deep learning is performed efficiently and proven to classify correctly
with high probability. All of the models and results are in the semi-supervised
setting. Many open problems and future directions are presented.
Vikas K. Garg, Adam Tauman Kalai
Comments: 22 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We introduce a new paradigm to investigate unsupervised learning, reducing
unsupervised learning to supervised learning. Specifically, we mitigate the
subjectivity in unsupervised decision-making by leveraging knowledge acquired
from prior, possibly heterogeneous, supervised learning tasks. We demonstrate
the versatility of our framework via comprehensive expositions and detailed
experiments on several unsupervised problems such as (a) clustering, (b)
outlier detection, and (c) similarity prediction under a common umbrella of
meta-unsupervised-learning. We also provide rigorous PAC-agnostic bounds to
establish the theoretical foundations of our framework, and show that our
framing of meta-clustering circumvents Kleinberg’s impossibility theorem for
clustering.
David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
One of the key challenges of artificial intelligence is to learn models that
are effective in the context of planning. In this document we introduce the
predictron architecture. The predictron consists of a fully abstract model,
represented by a Markov reward process, that can be rolled forward multiple
“imagined” planning steps. Each forward pass of the predictron accumulates
internal rewards and values over multiple planning depths. The predictron is
trained end-to-end so as to make these accumulated values accurately
approximate the true value function. We applied the predictron to procedurally
generated random mazes and a simulator for the game of pool. The predictron
yielded significantly more accurate predictions than conventional deep neural
network architectures.
Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Many machine learning applications use latent variable models to explain
structure in data, whereby visible variables (= coordinates of the given
datapoint) are explained as a probabilistic function of some hidden variables.
Finding parameters with the maximum likelihood is NP-hard even in very simple
settings. In recent years, provably efficient algorithms were nevertheless
developed for models with linear structures: topic models, mixture models,
hidden markov models, etc. These algorithms use matrix or tensor decomposition,
and make some reasonable assumptions about the parameters of the underlying
model.
But matrix or tensor decomposition seems of little use when the latent
variable model has nonlinearities. The current paper shows how to make
progress: tensor decomposition is applied for learning the single-layer {em
noisy or} network, which is a textbook example of a Bayes net, and used for
example in the classic QMR-DT software for diagnosing which disease(s) a
patient may have by observing the symptoms he/she exhibits.
The technical novelty here, which should be useful in other settings in
future, is analysis of tensor decomposition in presence of systematic error
(i.e., where the noise/error is correlated with the signal, and doesn’t
decrease as number of samples goes to infinity). This requires rethinking all
steps of tensor decomposition methods from the ground up.
For simplicity our analysis is stated assuming that the network parameters
were chosen from a probability distribution but the method seems more generally
applicable.
Manuel Martin Salvador, Marcin Budka, Bogdan Gabrys
Subjects: Learning (cs.LG)
Composition and parametrisation of multicomponent predictive systems (MCPSs)
consisting of chains of data transformation steps is a challenging task. This
paper is concerned with theoretical considerations and extensive experimental
analysis for automating the task of building such predictive systems. In the
theoretical part of the paper, we first propose to adopt the Well-handled and
Acyclic Workflow (WA-WF) Petri net as a formal representation of MCPSs. We then
define the optimisation problem in which the search space consists of suitably
parametrised directed acyclic graphs (i.e. WA-WFs) forming the sought MCPS
solutions. In the experimental analysis we focus on examining the impact of
considerably extending the search space resulting from incorporating multiple
sequential data cleaning and preprocessing steps in the process of composing
optimised MCPSs, and the quality of the solutions found. In a range of
extensive experiments three different optimisation strategies are used to
automatically compose MCPSs for 21 publicly available datasets and 7 datasets
from real chemical processes. The diversity of the composed MCPSs found is an
indication that fully and automatically exploiting different combinations of
data cleaning and preprocessing techniques is possible and highly beneficial
for different predictive models. Our findings can have a major impact on
development of high quality predictive models as well as their maintenance and
scalability aspects needed in modern applications and deployment scenarios.
Ping Li
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Following the very recent line of work on the “generalized min-max” (GMM)
kernel, this study proposes the “generalized intersection” (GInt) kernel and
the related “normalized generalized min-max” (NGMM) kernel. In computer
vision, the (histogram) intersection kernel has been popular, and the GInt
kernel generalizes it to data which can have both negative and positive
entries. Through an extensive empirical classification study on 40 datasets
from the UCI repository, we are able to show that this (tuning-free) GInt
kernel performs fairly well.
The empirical results also demonstrate that the NGMM kernel typically
outperforms the GInt kernel. Interestingly, the NGMM kernel has another
interpretation — it is the “asymmetrically transformed” version of the GInt
kernel, based on the idea of “asymmetric hashing”. Just like the GMM kernel,
the NGMM kernel can be efficiently linearized through (e.g.,) generalized
consistent weighted sampling (GCWS), as empirically validated in our study.
Owing to the discrete nature of hashed values, it also provides a scheme for
approximate near neighbor search.
Tamas Madl
Comments: 6 pages in NIPS 2016 Workshop on Machine Learning for Health (ML4HC)
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG)
Despite of the pain and limited accuracy of blood tests for early recognition
of cardiovascular disease, they dominate risk screening and triage. On the
other hand, heart rate variability is non-invasive and cheap, but not
considered accurate enough for clinical practice. Here, we tackle heart beat
interval based classification with deep learning. We introduce an end to end
differentiable hybrid architecture, consisting of a layer of biological neuron
models of cardiac dynamics (modified FitzHugh Nagumo neurons) and several
layers of a standard feed-forward neural network. The proposed model is
evaluated on ECGs from 474 stable at-risk (coronary artery disease) patients,
and 1172 chest pain patients of an emergency department. We show that it can
significantly outperform models based on traditional heart rate variability
predictors, as well as approaching or in some cases outperforming clinical
blood tests, based only on 60 seconds of inter-beat intervals.
Gianluigi Pillonetto
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)
Learning from examples is one of the key problems in science and engineering.
It deals with function reconstruction from a finite set of direct and noisy
samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely
used to solve this task and includes powerful estimators such as regularization
networks. Recent achievements include the proof of the statistical consistency
of these kernel- based approaches. Parallel to this, many different system
identification techniques have been developed but the interaction with machine
learning does not appear so strong yet. One reason is that the RKHSs usually
employed in machine learning do not embed the information available on dynamic
systems, e.g. BIBO stability. In addition, in system identification the
independent data assumptions routinely adopted in machine learning are never
satisfied in practice. This paper provides new results which strengthen the
connection between system identification and machine learning. Our starting
point is the introduction of RKHSs of dynamic systems. They contain functionals
over spaces defined by system inputs and allow to interpret system
identification as learning from examples. In both linear and nonlinear
settings, it is shown that this perspective permits to derive in a relatively
simple way conditions on RKHS stability (i.e. the property of containing only
BIBO stable systems or predictors), also facilitating the design of new kernels
for system identification. Furthermore, we prove the convergence of the
regularized estimator to the optimal predictor under conditions typical of
dynamic systems.
Chaoyun Zhang, Mingjun Zhong, Zongzuo Wang, Nigel Goddard, Charles Sutton
Comments: 15 pages, 7 figures
Subjects: Applications (stat.AP); Learning (cs.LG)
Energy disaggregation (a.k.a nonintrusive load monitoring, NILM), a
single-channel blind source separation problem, aims to decompose the mains
which records the whole electricity consumption into appliance-wise readings.
This problem is difficult because it is inherently unidentifiable. Recent
approaches have shown that the identifiability problem could be reduced by
introducing domain knowledge into the model. Deep neural networks have been
shown to be promising to tackle this problem in literatures. However, it is not
clear why and how the neural networks could work for this problem. In this
paper, we propose sequence-to-point learning for NILM, where the input is a
window of the mains and the output is a single point of the target appliance.
We use convolutional neural networks to train the model. Interestingly, we
systematically show that the convolutional neural networks can inherently learn
the signatures of the target appliances, which are automatically added into the
model to reduce the identifiability problem. We applied the proposed neural
network approaches to a real-world household energy data, and show that the
methods achieve the state-of-the-art performance.
Shixiang Chen, Shiqian Ma
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we extend the geometric descent method recently proposed by
Bubeck, Lee and Singh to solving nonsmooth and strongly convex composite
problems. We prove that the resulting algorithm, GeoPG, converges with a linear
rate ((1-1/sqrt{kappa})), thus achieves the optimal rate among first-order
methods, where (kappa) is the condition number of the problem. Numerical
results on linear regression and logistic regression with elastic net
regularization show that GeoPG compares favorably with Nesterov’s accelerated
proximal gradient method.
Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Andreas Spanias
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Kernel fusion is a popular and effective approach for combining multiple
features that characterize different aspects of data. Traditional approaches
for Multiple Kernel Learning (MKL) attempt to learn the parameters for
combining the kernels through sophisticated optimization procedures. In this
paper, we propose an alternative approach that creates dense embeddings for
data using the kernel similarities and adopts a deep neural network
architecture for fusing the embeddings. In order to improve the effectiveness
of this network, we introduce the kernel dropout regularization strategy
coupled with the use of an expanded set of composition kernels. Experiment
results on a real-world activity recognition dataset show that the proposed
architecture is effective in fusing kernels and achieves state-of-the-art
performance.
Nicolas Le Roux
Comments: 12 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
We tackle the issue of finding a good policy when the number of policy
updates is limited. This is done by approximating the expected policy reward as
a sequence of concave lower bounds which can be efficiently maximized,
drastically reducing the number of policy updates required to achieve good
performance. We also extend existing methods to negative rewards, enabling the
use of control variates.
Jesse H. Krijthe, Marco Loog
Comments: 16 pages, 1 figure, 1 table
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We show that for linear classifiers defined by convex margin-based surrogate
losses that are monotonically decreasing, it is impossible to construct any
semi-supervised approach that is able to guarantee an improvement over the
supervised classifier measured by this surrogate loss. For non-monotonically
decreasing loss functions, we demonstrate safe improvements are possible.
Galen Reeves
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
This paper addresses the question of when projections of a high-dimensional
random vector are approximately Gaussian. This problem has been studied
previously in the context of high-dimensional data analysis, where the focus is
on low-dimensional projections of high-dimensional point clouds. The focus of
this paper is on the typical behavior when the projections are generated by an
i.i.d. Gaussian projection matrix. The main results are bounds on the deviation
between the conditional distribution of the projections and a Gaussian
approximation, where the conditioning is on the projection matrix. The bounds
are given in terms of the quadratic Wasserstein distance and relative entropy
and are stated explicitly as a function of the number of projections and
certain key properties of the random vector. The proof uses Talagrand’s
transportation inequality and a general integral-moment inequality for mutual
information. Applications to random linear estimation and compressed sensing
are discussed.
Sergey V. Zhidkov
Subjects: Information Theory (cs.IT)
Nonlinear distortion in power amplifiers (PA) can significantly degrade
performance of orthogonal frequency division multiplexed (OFDM) communication
systems. This paper presents a joint maximum-likelihood channel frequency
response and nonlinear PA model estimator for OFDM signals. Derivation of the
estimator is based on Taylor-series representation of power amplifier
nonlinearity and is suitable for wide range of memoryless PA models. A
sub-optimal decision-aided algorithm for adaptive compensation of nonlinear
distortion effects at the receiver-side is also presented. It is shown that the
proposed algorithms can be used in IEEE 802.11a/g/p/ac compliant wireless LAN
receivers without any modifications at the transmitter side. The performance of
the proposed algorithms is studied by means of computer simulation.
Yuri Suhov, Izabella Stuhl
Subjects: Information Theory (cs.IT); Probability (math.PR)
The weighted entropy (H^{
m w}_phi (X)=H^{
m w}_phi (f)) of a random
variable (X) with values (x) and a probability-mass/density function (f) is
defined as the mean value ({mathbb E} I^{
m w}_phi(X)) of the weighted
information (I^{
m w}_phi (x)=-phi (x)log,f(x)). Here (xmapstophi
(x)in{mathbb R}) is a given weight function (WF) indicating a ‘value’ of
outcome (x). For an (n)-component random vector
({mathbf{X}}_0^{n-1}=(X_0,ldots ,X_{n-1})) produced by a random process
({mathbf{X}}=(X_i,iin{mathbb Z})), the weighted information (I^{
m
w}_{phi_n}({mathbf x}_0^{n-1})) and weighted entropy (H^{
m
w}_{phi_n}({mathbf{X}}_0^{n-1})) are defined similarly, with an WF
(phi_n({mathbf x}_0^{n-1})). Two types of WFs (phi_n) are considered, based
on additive and a multiplicative forms ((phi_n({mathbf
x}_0^{n-1})=sumlimits_{i=0}^{n-1}{varphi} (x_i)) and (phi_n({mathbf
x}_0^{n-1})=prodlimits_{i=0}^{n-1}{varphi} (x_i)), respectively). The focus
is upon ({it rates}) of the weighted entropy and information, regarded as
parameters related to ({mathbf{X}}). We show that, in the context of
ergodicity, a natural scale for an asymptotically additive/multiplicative WF is
(frac{1}{n^2}H^{
m w}_{phi_n}({mathbf{X}}_0^{n-1})) and
(frac{1}{n}log;H^{
m w}_{phi_n}({mathbf{X}}_0^{n-1})), respectively. This
gives rise to ({it primary}) ({it rates}). The next-order terms can also be
identified, leading to ({it secondary}) ({it rates}). We also consider
emerging generalisations of the Shannon-McMillan-Breiman theorem.
Jesús Gómez-Vilardebó
Subjects: Information Theory (cs.IT)
We consider a cache network in which a single server is connected to multiple
users via a shared error free link. The server has access do a database with
(N) files of equal length (F), and serves (K) users each with a cache memory of
(MF) bits. A novel centralized coded caching scheme is proposed for scenarios
with more users than files (Nleq K) and cache capacities satisfying
(frac{1}{K}leq Mleqfrac{N}{K}). It is shown, that the proposed scheme
outperforms the best delivery rate known in the literature.
Chao Wang, Qing Zhao, Kobi Cohen
Subjects: Information Theory (cs.IT)
The problem of detecting a few anomalous processes among a large number of
processes is considered. At each time, aggregated observations can be taken
from a chosen subset of processes, where the chosen subset must conform to a
given binary tree structure. The random observations are i.i.d. over time with
a distribution that depends on the size of the chosen subset and the number of
anomalous processes in the subset. The objective is a sequential search
strategy that minimizes the sample complexity (i.e., expected detection time)
subject to a reliability constraint. A sequential test that results in a biased
random walk on the tree is developed and is shown to be asymptotically optimal
in the reliability constraint and order optimal in the number of processes
under certain conditions on the hierarchical observation model.
Ziyang Yuan, Hongxia Wang
Comments: 21 pages, 11 figures
Subjects: Information Theory (cs.IT)
Phase retrieval(PR) problem is a kind of ill-condition inverse problem which
is arising in various of applications. Based on the Wirtinger flow(WF) method,
a reweighted Wirtinger flow(RWF) method is proposed to deal with PR problem.
RWF finds the global optimum by solving a series of sub-PR problems with
changing weights. Theoretical analyses illustrate that the RWF has a geometric
convergence from a deliberate initialization when the weights are bounded by 1
and (frac{10}{9}). Numerical testing shows RWF has a lower sampling complexity
compared with WF. As an essentially adaptive truncated Wirtinger flow(TWF)
method, RWF performs better than TWF especially when the ratio between sampling
number (m) and length of signal (n) is small.
Biao He, Shihao Yan, Xiangyun Zhou, Vincent K. N. Lau
Comments: to appear in IEEE Communications Letters
Subjects: Information Theory (cs.IT)
Prior studies on covert communication with noise uncertainty adopted a
worst-case approach from the warden’s perspective. That is, the worst-case
detection performance of the warden is used to assess covertness, which is
overly optimistic. Instead of simply considering the worst limit, in this work,
we take the distribution of noise uncertainty into account to evaluate the
overall covertness in a statistical sense. Specifically, we define new metrics
for measuring the covertness, which are then adopted to analyze the maximum
achievable rate for a given covertness requirement under both bounded and
unbounded noise uncertainty models.
Mladen Kovačević, Vincent Y. F. Tan
Comments: 5 pages (double-column)
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)
This paper is motivated by the error-control problem in communication
channels in which the transmitted sequences are subjected to random
permutations, in addition to being impaired with insertions, deletions,
substitutions, and erasures of symbols. Bounds on the size of optimal codes in
this setting are derived, and their asymptotic behavior examined in the
fixed-minimum-distance regime. A family of codes correcting these types of
errors is described and is shown to be asymptotically optimal for some sets of
parameters. The corresponding error-detection problem is also analyzed.
Applications to data transmission over packet networks based on multipath
routing are discussed.
Nian Li, Tor Helleseth
Comments: Cryptography and Communications
Subjects: Information Theory (cs.IT)
Motivated by recent results on the constructions of permutation polynomials
with few terms over the finite field (mathbb{F}_{2^n}), where (n) is a
positive even integer, we focus on the construction of permutation trinomials
over (mathbb{F}_{2^n}) from Niho exponents. As a consequence, several new
classes of permutation trinomials over (mathbb{F}_{2^n}) are constructed from
Niho exponents based on some subtle manipulation of solving equations with low
degrees over finite fields.
Hirofumi Tsuda, Ken Umeno
Comments: 11 pages 20 figures
Subjects: Information Theory (cs.IT)
Signal to Noise Ratio (SNR) is an important index for wireless
communications. In CDMA systems, spreading sequences are utilized. This series
of papers show the method to derive spreading sequences as the solutions of
non-linear programming: maximize SNR. In this paper, we derive the optimization
problems with the expression SNR derived in Part I and the necessary conditions
for the global solutions. We numerically solve the problems and evaluate the
solutions with two factors, mean-square correlations and maximum mean-square
correlations.
Swapan Rana, Preeti Parashar, Andreas Winter, Maciej Lewenstein
Comments: 5+4 pages, 2 figures, comments welcome
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
We show that the distillable coherence–which is equal to the relative
entropy of coherence, is, at most up to a constant, always bounded by the
(l_1)-norm based measure of coherence (which equals to the sum of absolute
values of off-diagonals). Thus, the latter plays a similar role as logarithmic
negativity plays in entanglement theory, and this is the best operational
interpretation from resource-theoretic viewpoint. Consequently the two measures
are intimately connected to another operational measure, the robustness of
coherence. We also find the tightest possible bound between the two, namely,
given (l_1)-coherence we find states having minimum (and maximum for pure
state) distillable coherence. For any given robustness, the same state also
minimizes the distillable coherence.