Olalekan Ogunmolu, Xuejun Gu, Steve Jiang, Nicholas Gans
Comments: American Control Conference, 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
Neural networks are known to be effective function approximators. Recently,
deep neural networks have proven to be very effective in pattern recognition,
classification tasks and human-level control to model highly nonlinear
real-world systems. This paper investigates the effectiveness of deep neural
networks in the modeling of dynamical systems with complex behavior. Three deep
neural network structures are trained on sequential data, and we investigate
the effectiveness of these networks in modeling associated characteristics of
the underlying dynamical systems. We carry out similar evaluations on select
publicly available system identification datasets. We demonstrate that deep
neural networks are effective model estimators from input-output data.
Roberto Paredes, José-Miguel Benedí
Subjects: Neural and Evolutionary Computing (cs.NE)
Layers is an open source neural network toolkit aimed at providing an easy way
to implement modern neural networks. The main target users are students, and to
this end Layers provides an easy scripting language that can be adopted quickly.
The user has to focus only on design details such as network topology and
parameter tuning.
Anthony Caterini, Dong Eui Chang
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Deep Neural Networks (DNNs) have become very popular for prediction in many
areas. Their strength is in representation with a high number of parameters
that are commonly learned via gradient descent or similar optimization methods.
However, the representation is non-standardized, and the gradient calculation
methods are often performed using component-based approaches that break
parameters down into scalar units, instead of considering the parameters as
whole entities. In this work, these problems are addressed. Standard notation
is used to represent DNNs in a compact framework. Gradients of DNN loss
functions are calculated directly over the inner product space on which the
parameters are defined. This framework is general and is applied to two common
network types: the Multilayer Perceptron and the Deep Autoencoder.
Konstantinos Chatzilygeroudis (LORIA, LARSEN), Antoine Cully, Jean-Baptiste Mouret (LORIA, LARSEN)
Comments: Workshop on AI for Long-Term Autonomy at the IEEE International Conference on Robotics and Automation (ICRA), May 2016, Stockholm, Sweden. 2016
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
The recently introduced Intelligent Trial and Error algorithm (IT&E) enables
robots to creatively adapt to damage in a matter of minutes by combining an
off-line evolutionary algorithm and an on-line learning algorithm based on
Bayesian Optimization. We extend the IT&E algorithm to allow for robots to
learn to compensate for damages while executing their task(s). This leads to a
semi-episodic learning scheme that increases the robot’s lifetime autonomy and
adaptivity. Preliminary experiments on a toy simulation and a 6-legged robot
locomotion task show promising results.
Dmitry Yarotsky
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We study how approximation errors of neural networks with ReLU activation
functions depend on the depth of the network. We establish rigorous error
bounds showing that deep ReLU networks are significantly more expressive than
shallow ones as long as approximations of smooth functions are concerned. At
the same time, we show that on a set of functions constrained only by their
degree of smoothness, a ReLU network architecture cannot in general achieve
approximation accuracy with better than a power law dependence on the network
size, regardless of its depth.
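As a small illustration of how depth can help when approximating a smooth function, the sketch below approximates x^2 on [0, 1] with a ReLU network built from repeated compositions of a tent map. This sawtooth construction is an assumption chosen for illustration; the abstract does not describe any specific construction, and the names `tooth` and `square_approx` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tooth(x):
    # Tent map on [0, 1] written with three ReLU units:
    # 2x for x < 0.5 and 2(1 - x) for x >= 0.5.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def square_approx(x, m):
    # f_m(x) = x - sum_{s=1..m} g_s(x) / 4**s, where g_s is the s-fold
    # composition of the tent map. Each extra term adds one more layer.
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x
    for s in range(1, m + 1):
        g = tooth(g)
        out = out - g / 4.0**s
    return out

xs = np.linspace(0.0, 1.0, 10001)
# Maximum absolute error on the grid for a few depths m.
errors = {m: float(np.max(np.abs(square_approx(xs, m) - xs**2)))
          for m in (1, 2, 4, 6)}
```

In this sketch f_m is the piecewise-linear interpolant of x^2 on 2^m equal intervals, so the error is at most 2^(-2m-2) while the number of units grows only linearly in m.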
Krzysztof Cpalka, Marcin Zalasinski, Leszek Rutkowski
Comments: 34 pages, 7 figures
Journal-ref: Applied Soft Computing, vol. 43, pp. 47-56, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Identity verification based on authenticity assessment of a handwritten
signature is an important issue in biometrics. There are many effective methods
for signature verification taking into account dynamics of a signing process.
Methods based on partitioning take a very important place among them. In this
paper we propose a new approach to signature partitioning. Its most important
feature is the possibility of selecting and processing of hybrid partitions in
order to increase a precision of the test signature analysis. Partitions are
formed by a combination of vertical and horizontal sections of the signature.
Vertical sections correspond to the initial, middle, and final time moments of
the signing process. In turn, horizontal sections correspond to the signature
areas associated with high and low pen velocity and high and low pen pressure
on the surface of a graphics tablet. Our previous research on vertical and
horizontal sections of the dynamic signature (created independently) led us to
develop the algorithm presented in this paper. Selection of sections, among
others, allows us to define the stability of the signing process in the
partitions, promoting signature areas of greater stability (and vice versa). In
the test of the proposed method two databases were used: public MCYT-100 and
paid BioSecure.
Matthias Kümmerer, Thomas S. A. Wallis, Matthias Bethge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Applications (stat.AP)
Here we present DeepGaze II, a model that predicts where people look in
images. The model uses the features from the VGG-19 deep neural network trained
to identify objects in images. Contrary to other saliency models that use deep
features, here we use the VGG features for saliency prediction with no
additional fine-tuning (rather, a few readout layers are trained on top of the
VGG features to predict saliency). The model is therefore a strong test of
transfer learning. After conservative cross-validation, DeepGaze II explains
about 87% of the explainable information gain in the patterns of fixations and
achieves top performance in area under the curve metrics on the MIT300 hold-out
benchmark. These results corroborate the finding from DeepGaze I (which
explained 56% of the explainable information gain), that deep features trained
on object recognition provide a versatile feature space for performing related
visual tasks. We explore the factors that contribute to this success and
present several informative image examples. A web service is available to
compute model predictions at this http URL
Nina Miolane (ASCLEPIOS), Susan Holmes, Xavier Pennec (ASCLEPIOS)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Differential Geometry (math.DG)
We use tools from geometric statistics to analyze the usual estimation
procedure of a template shape. This applies to shapes from landmarks, curves,
surfaces, images etc. We demonstrate the asymptotic bias of the template shape
estimation using the stratified geometry of the shape space. We give a Taylor
expansion of the bias with respect to a parameter $\sigma$ describing the
measurement error on the data. We propose two bootstrap procedures that
quantify the bias and correct it, if needed. They are applicable for any type
of shape data. We give a rule of thumb to provide intuition on whether the bias
has to be corrected. This exhibits the parameters that control the bias’
magnitude. We illustrate our results on simulated and real shape data.
Kushal Kafle, Christopher Kanan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Visual Question Answering (VQA) is a recent problem in computer vision and
natural language processing that has garnered a large amount of interest from
the deep learning, computer vision, and natural language processing
communities. In VQA, an algorithm needs to answer text-based questions about
images. Since the release of the first VQA dataset in 2014, several additional
datasets have been released and many algorithms have been proposed. In this
review, we critically examine the current state of VQA in terms of problem
formulation, existing datasets, evaluation metrics, and algorithms. In
particular, we discuss the limitations of current datasets with regard to their
ability to properly train and assess VQA algorithms. We then exhaustively
review existing algorithms for VQA. Finally, we discuss possible future
directions for VQA and image understanding research.
Nicolas Papadakis, Julien Rabin
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
We investigate in this work a versatile convex framework for multiple image
segmentation, relying on the regularized optimal mass transport theory. In this
setting, several transport cost functions are considered and used to match
statistical distributions of features. In practice, global multidimensional
histograms are estimated from the segmented image regions, and are compared to
referring models that are either fixed histograms given a priori, or directly
inferred in the non-supervised case. The different convex problems studied are
solved efficiently using primal-dual algorithms. The proposed approach is
generic and enables multi-phase segmentation as well as co-segmentation of
multiple images.
Shaofei Wang, Charless C. Fowlkes
Comments: arXiv admin note: text overlap with arXiv:1412.2066
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We describe an end-to-end framework for learning the parameters of a min-cost
flow multi-target tracking problem with quadratic trajectory interactions, including
suppression of overlapping tracks and contextual cues about cooccurrence of
different objects. Our approach utilizes structured prediction with a
tracking-specific loss function to learn the complete set of model parameters.
In this learning framework, we evaluate two different approaches to finding an
optimal set of tracks under a quadratic model objective, one based on an LP
relaxation and the other based on novel greedy variants of dynamic programming
that handle pairwise interactions. We find the greedy algorithms achieve almost
equivalent accuracy to the LP relaxation while being up to 10x faster than a
commercial LP solver. We evaluate trained models on three challenging
benchmarks. Surprisingly, we find that with proper parameter learning, our
simple data association model without explicit appearance/motion reasoning is
able to achieve comparable or better accuracy than many state-of-the-art
methods that use far more complex motion features or appearance affinity metric
learning.
Marie-Charlotte Desseroit (CHU Poitiers – Département de médecine nucléaire), Florent Tixier (CHU Poitiers – Département de médecine nucléaire), Wolfgang Weber, Barry A Siegel, Catherine Cheze Le Rest (CHU Poitiers – Département de médecine nucléaire), Dimitris Visvikis (LaTIM), Mathieu Hatt (LaTIM)
Comments: Journal of Nuclear Medicine, Society of Nuclear Medicine, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Purpose: The main purpose of this study was to assess the reliability of
shape and heterogeneity features in both Positron Emission Tomography (PET) and
low-dose Computed Tomography (CT) components of PET/CT. A secondary objective
was to investigate the impact of image quantization. Material and methods: A
Health Insurance Portability and Accountability Act-compliant secondary
analysis of deidentified prospectively acquired PET/CT test-retest datasets of
74 patients from multi-center Merck and ACRIN trials was performed.
Metabolically active volumes were automatically delineated on PET with Fuzzy
Locally Adaptive Bayesian algorithm. 3DSlicerTM was used to semi-automatically
delineate the anatomical volumes on low-dose CT components. Two quantization
methods were considered: a quantization into a set number of bins
(quantizationB) and an alternative quantization with bins of fixed width
(quantizationW). Four shape descriptors, ten first-order metrics and 26
textural features were computed. Bland-Altman analysis was used to quantify
repeatability. Features were subsequently categorized as very reliable,
reliable, moderately reliable and poorly reliable with respect to the
corresponding volume variability. Results: Repeatability was highly variable
amongst features. Numerous metrics were identified as poorly or moderately
reliable. Others were (very) reliable in both modalities, and in all categories
(shape, 1st-, 2nd- and 3rd-order metrics). Image quantization played a major
role in the features' repeatability. Features were more reliable in PET with
quantizationB, whereas quantizationW showed better results in CT. Conclusion:
The test-retest repeatability of shape and heterogeneity features in PET and
low-dose CT varied greatly amongst metrics. The level of repeatability also
depended strongly on the quantization step, with different optimal choices for
each modality. The repeatability of PET and low-dose CT features should be
carefully taken into account when selecting metrics to build multiparametric
models.
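The two quantization schemes named in the abstract (a set number of bins versus bins of fixed width) can be sketched as follows. This is a generic illustration, not the study's implementation: the function names, the `origin` parameter, and the sample intensities are all assumptions.

```python
import numpy as np

def quantize_fixed_bins(values, n_bins):
    # "quantizationB"-style: rescale the observed range into n_bins levels.
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.zeros(v.shape, dtype=int)
    idx = np.floor((v - lo) / (hi - lo) * n_bins).astype(int)
    return np.minimum(idx, n_bins - 1)  # the maximum falls in the last bin

def quantize_fixed_width(values, bin_width, origin=0.0):
    # "quantizationW"-style: every bin spans the same intensity width,
    # so the number of levels depends on the range of the data.
    v = np.asarray(values, dtype=float)
    return np.floor((v - origin) / bin_width).astype(int)

roi = np.array([0.0, 1.0, 2.5, 5.0, 10.0])  # made-up voxel intensities
by_count = quantize_fixed_bins(roi, 4)
by_width = quantize_fixed_width(roi, 2.5)
```

With a fixed bin count the levels depend on each region's own intensity range, whereas with a fixed width the same intensity always maps to the same level, which is one plausible reason the two schemes behave differently across modalities.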
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel approach for temporal and semantic segmentation
of edited videos into meaningful segments, from the point of view of the
storytelling structure. The objective is to decompose a long video into more
manageable sequences, which can in turn be used to retrieve the most
significant parts of it given a textual query and to provide an effective
summarization. Previous video decomposition methods mainly employed perceptual
cues, tackling the problem either as a story change detection, or as a
similarity grouping task, and the lack of semantics limited their ability to
identify story boundaries. Our proposal connects together perceptual, audio and
semantic cues in a specialized deep network architecture designed with a
combination of CNNs which generate an appropriate embedding, and clusters shots
into connected sequences of semantic scenes, i.e. stories. A retrieval
presentation strategy is also proposed, by selecting the semantically and
aesthetically “most valuable” thumbnails to present, considering the query in
order to improve the storytelling presentation. Finally, the subjective nature
of the task is considered, by conducting experiments with different annotators
and by proposing an algorithm to maximize the agreement between automatic
results and human annotators.
Samik Banerjee, Sukhendu Das
Comments: This is an extended version of the paper accepted in CVPR Biometric Workshop, 2016. arXiv admin note: text overlap with arXiv:1610.00660
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)
Face recognition (FR) is the most preferred mode for biometric-based
surveillance, due to its passive nature of detecting subjects, amongst all
different types of biometric traits. FR under surveillance scenario does not
give satisfactory performance due to low contrast, noise and poor illumination
conditions on probes, as compared to the training samples. A state-of-the-art
technology, Deep Learning, even fails to perform well in these scenarios. We
propose a novel soft-margin based learning method for multiple feature-kernel
combinations, followed by feature transformed using Domain Adaptation, which
outperforms many recent state-of-the-art techniques, when tested using three
real-world surveillance face datasets.
Marc-André Carbonneau, Eric Granger, Yazid Attabi, Ghyslain Gagnon
Comments: 12 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand crafted speech processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted in the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features, and difficulties linked to
the feature extraction process are greatly reduced as only one type of
descriptors is used, for which the 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to be tuned.
Davide Alinovi, Gianluigi Ferrari, Francesco Pisani, Riccardo Raheli
Comments: submitted for publication; 19 pages, 9 figures, 4 tables
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)
The lack of large video databases obtained from real patients with
respiratory disorders makes the design and optimization of video-based
monitoring systems quite critical. The purpose of this study is the development
of suitable models and simulators of breathing behaviors and disorders, such as
respiratory pauses and apneas, in order to allow efficient design and test of
video-based monitoring systems. More precisely, a novel Continuous-Time Markov
Chain (CTMC) statistical model of breathing patterns is presented. The
Respiratory Rate (RR) pattern, estimated by measured vital signs of
hospital-monitored patients, is approximated as a CTMC, whose states and
parameters are selected through an appropriate statistical analysis. Then, two
simulators, software- and hardware-based, are proposed. After validation of the
CTMC model, the proposed simulators are tested with previously developed
video-based algorithms for the estimation of the RR and the detection of apnea
events. Examples of application to assess the performance of systems for
video-based RR estimation and apnea detection are presented. The results, in
terms of Kullback-Leibler divergence, show that realistic breathing patterns,
including specific respiratory disorders, can be accurately described by the
proposed model; moreover, the simulators are able to reproduce practical
breathing patterns for video analysis. The presented CTMC statistical model can
be strategic to describe realistic breathing patterns and devise simulators
useful to develop and test novel and effective video processing-based
monitoring systems.
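To make the idea of a CTMC-based breathing simulator concrete, the following minimal sketch samples a state trajectory from a generator matrix using exponential holding times. The three states and all rates are hypothetical placeholders; in the paper the states and parameters are selected through statistical analysis of patient data, which is not reproduced here.

```python
import numpy as np

def simulate_ctmc(Q, start, t_end, rng):
    # Q is a generator matrix: off-diagonal entries are transition rates,
    # and each diagonal entry is minus the sum of the rest of its row.
    Q = np.asarray(Q, dtype=float)
    t, state = 0.0, start
    times, states = [0.0], [start]
    while True:
        rate = -Q[state, state]
        if rate <= 0:  # absorbing state: no further jumps
            break
        t += rng.exponential(1.0 / rate)  # holding time in the current state
        if t >= t_end:
            break
        probs = Q[state].copy()
        probs[state] = 0.0
        probs /= rate  # jump probabilities to the other states
        state = int(rng.choice(len(probs), p=probs))
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)

# Hypothetical states: 0 = normal breathing, 1 = slow breathing, 2 = apnea.
# Rates are in events per second and are invented for illustration.
Q = np.array([[-0.10, 0.08, 0.02],
              [0.20, -0.25, 0.05],
              [0.50, 0.00, -0.50]])
rng = np.random.default_rng(0)
times, states = simulate_ctmc(Q, start=0, t_end=600.0, rng=rng)
```

Each entry of `times` is the instant at which the process enters the matching entry of `states`; a simulator could then map every state to a respiratory-rate value or a pause.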
Nicolò Genesio, Tariq Abuhashim, Fabio Solari, Manuela Chessa, Lorenzo Natale
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
In recent years, the numbers of life-size humanoids as well as their mobile
capabilities have steadily grown. Stable walking motion and control for
humanoid robots are active fields of research. In this scenario an open
question is how to model and analyse the scene so that a motion planning
algorithm can generate an appropriate walking pattern. This paper presents the
current work towards scene modelling and understanding, using an RGBD sensor.
The main objective is to provide the humanoid robot iCub with capabilities to
navigate safely and interact with various parts of the environment. In this
sense we address the problem of traversability analysis of the scene, focusing
on classification of point clouds as a function of mobility, and hence walking
safety.
Tuan Do, Nikhil Krishnaswamy, James Pustejovsky
Comments: 4 pages, 4 figures, ISA workshop 2015
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
This paper introduces the Event Capture Annotation Tool (ECAT), a
user-friendly, open-source interface tool for annotating events and their
participants in video, capable of extracting the 3D positions and orientations
of objects in video captured by Microsoft’s Kinect(R) hardware. The modeling
language VoxML (Pustejovsky and Krishnaswamy, 2016) underlies ECAT’s object,
program, and attribute representations, although ECAT uses its own spec for
explicit labeling of motion instances. The demonstration will show the tool’s
workflow and the options available for capturing event-participant relations
and browsing visual data. Mapping ECAT’s output to VoxML will also be
addressed.
Dan Barnes, Will Maddern, Ingmar Posner
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017. Video summary: this http URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a weakly-supervised approach to segmenting proposed drivable paths
in images with the goal of autonomous driving in complex urban environments.
Using recorded routes from a data collection vehicle, our proposed method
generates vast quantities of labelled images containing proposed paths and
obstacles without requiring manual annotation, which we then use to train a
deep semantic segmentation network. With the trained network we can segment
proposed paths and obstacles at run-time using a vehicle equipped with only a
monocular camera without relying on explicit modelling of road or lane
markings. We evaluate our method on the large-scale KITTI and Oxford RobotCar
datasets and demonstrate reliable path proposal and obstacle segmentation in a
wide variety of environments under a range of lighting, weather and traffic
conditions. We illustrate how the method can generalise to multiple path
proposals at intersections and outline plans to incorporate the system into a
framework for autonomous urban driving.
Yingming Li, Ming Yang, Zhongfei Zhang
Comments: 27 pages, 10 figures. arXiv admin note: text overlap with arXiv:1206.5538, arXiv:1304.5634 by other authors
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Recently, multi-view representation learning has become a rapidly growing
direction in machine learning and data mining areas. This paper first reviews
the root methods and theories on multi-view representation learning, especially
on canonical correlation analysis (CCA) and its several extensions. And then we
investigate the advancement of multi-view representation learning that ranges
from shallow methods including multi-modal topic learning, multi-view sparse
coding, and multi-view latent space Markov networks, to deep methods including
multi-modal restricted Boltzmann machines, multi-modal autoencoders, and
multi-modal recurrent neural networks. Further, we also provide an important
perspective from manifold alignment for multi-view representation learning.
Overall, this survey aims to provide an insightful overview of theoretical
basis and current developments in the field of multi-view representation
learning and to help researchers find the most appropriate tools for particular
applications.
Udi Apsel
Subjects: Artificial Intelligence (cs.AI)
We introduce the lifted Generalized Belief Propagation (GBP) message passing
algorithm, for the computation of sum-product queries in Probabilistic
Relational Models (e.g. Markov logic network). The algorithm forms a compact
region graph and establishes a modified version of message passing, which
mimics the GBP behavior in a corresponding ground model. The compact graph is
obtained by exploiting a graphical representation of clusters, which reduces
cluster symmetry detection to isomorphism tests on small local graphs. The
framework is thus capable of handling complex models, while remaining
domain-size independent.
Dominik Meyer, Hao Shen, Klaus Diepold
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
In this paper, we study the Temporal Difference (TD) learning with linear
value function approximation. It is well known that most TD learning algorithms
are unstable with linear function approximation and off-policy learning. Recent
development of Gradient TD (GTD) algorithms has addressed this problem
successfully. However, the success of GTD algorithms requires a set of well
chosen features, which are not always available. When the number of features is
huge, the GTD algorithms might face the problem of overfitting and being
computationally expensive. To cope with this difficulty, regularization
techniques, in particular $\ell_1$ regularization, have attracted significant
attention in developing TD learning algorithms. The present work combines the
GTD algorithms with $\ell_1$ regularization. We propose a family of $\ell_1$
regularized GTD algorithms, which employ the well known soft thresholding
operator. We investigate convergence properties of the proposed algorithms, and
depict their performance with several numerical experiments.
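The soft thresholding operator mentioned above is the proximal map of the $\ell_1$ norm. The sketch below shows the operator and a generic proximal-gradient step; it is not the paper's GTD update, and `l1_proximal_step`, `step`, and `lam` are hypothetical names introduced for illustration.

```python
import numpy as np

def soft_threshold(w, tau):
    # Shrink every coordinate toward zero by tau; coordinates with
    # magnitude at most tau become exactly zero.
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def l1_proximal_step(w, grad, step, lam):
    # Generic proximal-gradient step for (loss + lam * ||w||_1):
    # a gradient step on the loss followed by soft thresholding.
    return soft_threshold(np.asarray(w, dtype=float) - step * np.asarray(grad), step * lam)

w = np.array([0.5, -0.05, 0.0, -2.0])
shrunk = soft_threshold(w, 0.1)
```

Because small weights are set exactly to zero, the operator yields sparse weight vectors, which is what makes it attractive when the number of features is large.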
Alasdair Thomason, Nathan Griffiths, Victor Sanchez
Subjects: Artificial Intelligence (cs.AI)
With a large proportion of people carrying location-aware smartphones, we
have an unprecedented platform from which to understand individuals and predict
their future actions. This work builds upon the Context Tree data structure
that summarises the historical contexts of individuals from augmented
geospatial trajectories, and constructs a predictive model for their likely
future contexts. The Predictive Context Tree (PCT) is constructed as a
hierarchical classifier, capable of predicting both the future locations that a
user will visit and the contexts that a user will be immersed within. The PCT
is evaluated over real-world geospatial trajectories, and compared against
existing location extraction and prediction techniques, as well as a proposed
hybrid approach that uses identified land usage elements in combination with
machine learning to predict future interactions. Our results demonstrate that
higher predictive accuracies can be achieved using this hybrid approach over
traditional extracted location datasets, and the PCT itself matches the
performance of the hybrid approach at predicting future interactions, while
adding utility in the form of context predictions. Such a prediction system is
capable of understanding not only where a user will visit, but also their
context, in terms of what they are likely to be doing.
Aravind Rajeswaran, Sarvjeet Ghotra, Sergey Levine, Balaraman Ravindran
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Sample complexity and safety are major challenges when learning policies with
reinforcement learning for real-world tasks — especially when the policies are
represented using rich function approximators like deep neural networks.
Model-based methods where the real-world target domain is approximated using a
simulated source domain provide an avenue to tackle the above challenges by
augmenting real data with simulated data. However, discrepancies between the
simulated source domain and the target domain pose a challenge for simulated
training. We introduce the EPOpt algorithm, which uses an ensemble of simulated
source domains and a form of adversarial training to learn policies that are
robust and generalize to a broad range of possible target domains, including to
unmodeled effects. Further, the probability distribution over source domains in
the ensemble can be adapted using data from the target domain and approximate
Bayesian methods, to progressively make it a better approximation. Thus,
learning on a model ensemble, along with source domain adaptation, provides the
benefit of both robustness and learning/adaptation.
Dan Barnes, Will Maddern, Ingmar Posner
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017. Video summary: this http URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a weakly-supervised approach to segmenting proposed drivable paths
in images with the goal of autonomous driving in complex urban environments.
Using recorded routes from a data collection vehicle, our proposed method
generates vast quantities of labelled images containing proposed paths and
obstacles without requiring manual annotation, which we then use to train a
deep semantic segmentation network. With the trained network we can segment
proposed paths and obstacles at run-time using a vehicle equipped with only a
monocular camera without relying on explicit modelling of road or lane
markings. We evaluate our method on the large-scale KITTI and Oxford RobotCar
datasets and demonstrate reliable path proposal and obstacle segmentation in a
wide variety of environments under a range of lighting, weather and traffic
conditions. We illustrate how the method can generalise to multiple path
proposals at intersections and outline plans to incorporate the system into a
framework for autonomous urban driving.
Kyriakos Sideris, Reza Nejabati, Dimitra Simeonidou
Comments: 8 pages, 6 figures, Big data, data analytics, data mining, knowledge centric networking (KCN), software defined networking (SDN), Seer, 2016 15th International Conference on Ubiquitous Computing and Communications and 2016 International Symposium on Cyberspace and Security (IUCC-CSS 2016)
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Network complexity is increasing, making network control and orchestration a
challenging task. The proliferation of network information and tools for data
analytics can provide an important insight into resource provisioning and
optimisation. The network knowledge incorporated in software defined networking
can facilitate knowledge-driven control, leveraging network
programmability. We present Seer: a flexible, highly configurable data
analytics platform for network intelligence based on software defined
networking and big data principles. Seer combines a computational engine with a
distributed messaging system to provide a scalable, fault tolerant and
real-time platform for knowledge extraction. Our first prototype uses Apache
Spark for streaming analytics and the Open Network Operating System (ONOS)
controller to program a network in real time. The first application we
developed aims to predict the mobility pattern of mobile devices inside a smart
city environment.
Giambattista Amati, Simone Angelini, Marco Bianchi, Luca Costantini, Giuseppe Marcone
Subjects: Information Retrieval (cs.IR)
We estimate sentiment category proportions for retrieval within large
retrieval sets. In general, estimates are produced by counting the
classification outcomes and then adjusting the category sizes to take the
misclassification error matrix into account. However, both the accuracy of the
classifier and the precision of the retrieval produce a large number of
errors, which makes it difficult to apply an aggregative approach to sentiment
analysis as a reliable and efficient estimator of sentiment category
proportions.
The challenge for real time analytics during retrieval is thus to overcome
misclassification errors, and more importantly, to apply sentiment
classification or any other similar post-processing analytics at retrieval
time. We present a non-aggregative approach that can be applied to very large
retrieval sets of queries.
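The aggregative "count, then adjust" step described above can be sketched for
the two-class case as follows. This is only an illustration of the standard
adjusted-count correction, not the authors' non-aggregative method; the
function name and the example rates are assumptions.

```python
def adjusted_proportion(predicted_positive, total, tpr, fpr):
    """Correct a raw classify-and-count estimate for known error rates.

    tpr and fpr are the classifier's true-positive and false-positive rates,
    i.e. the entries of a 2x2 misclassification matrix (assumed known here).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if tpr == fpr:
        raise ValueError("classifier is uninformative: tpr equals fpr")
    observed = predicted_positive / total
    # observed = p * tpr + (1 - p) * fpr, solved for the true proportion p
    p = (observed - fpr) / (tpr - fpr)
    # Clip to [0, 1]; sampling noise can push the estimate slightly outside
    return min(1.0, max(0.0, p))


# Hypothetical numbers: 400 of 1000 retrieved documents classified positive
estimate = adjusted_proportion(predicted_positive=400, total=1000, tpr=0.8, fpr=0.1)
```

With these made-up rates the raw proportion of 0.40 is corrected upward to
about 0.43, since the classifier misses more positives than it falsely adds.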
Christina Lioma, Birger Larsen, Wei Lu, Yong Huang
Subjects: Information Retrieval (cs.IR)
Much of the information processed by Information Retrieval (IR) systems is
unreliable, biased, and generally untrustworthy [1], [2], [3]. Yet, factuality
& objectivity detection is not a standard component of IR systems, even though
it has been possible in Natural Language Processing (NLP) in the last decade.
Motivated by this, we ask if and how factuality & objectivity detection may
benefit IR. We answer this in two parts. First, we use state-of-the-art NLP to
compute the probability of document factuality & objectivity in two TREC
collections, and analyse its relation to document relevance. We find that
factuality is strongly and positively correlated to document relevance, but
objectivity is not. Second, we study the impact of factuality & objectivity on
retrieval effectiveness by treating them as query independent features that we
combine with a competitive language modelling baseline. Experiments with 450
TREC queries show that factuality improves precision >10% over strong
baselines, especially for uncurated data used in web search; objectivity gives
mixed results. An overall clear trend is that document factuality & objectivity
is much more beneficial to IR when searching uncurated (e.g. web) documents vs.
curated (e.g. state documentation and newswire articles). To our knowledge,
this is the first study of factuality & objectivity for back-end IR,
contributing novel findings about the relation between relevance and
factuality/objectivity, and statistically significant gains to retrieval
effectiveness in the competitive web search task.
Edgar Altszyler, Mariano Sigman, Diego Fernández Slezak
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Word embeddings have been extensively studied in large text datasets.
However, only a few studies analyze semantic representations of small corpora,
which are particularly relevant in single-person text production studies. In
the present paper, we compare Skip-gram and LSA capabilities in this scenario,
and we test both techniques for extracting relevant semantic patterns in
single-series dream reports. LSA showed better performance than Skip-gram on
small training corpora in two semantic tests. As a case study, we show that
LSA can capture relevant word associations in dream report series, even in
cases with a small number of dreams or low-frequency words. We propose that
LSA can be used to explore word associations in dream reports, which could
bring new insight into this classic research area of psychology.
Yingming Li, Ming Yang, Zhongfei Zhang
Comments: 27 pages, 10 figures. arXiv admin note: text overlap with arXiv:1206.5538, arXiv:1304.5634 by other authors
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Recently, multi-view representation learning has become a rapidly growing
direction in machine learning and data mining areas. This paper first reviews
the root methods and theories on multi-view representation learning, especially
on canonical correlation analysis (CCA) and its several extensions. We then
investigate the advancement of multi-view representation learning that ranges
from shallow methods including multi-modal topic learning, multi-view sparse
coding, and multi-view latent space Markov networks, to deep methods including
multi-modal restricted Boltzmann machines, multi-modal autoencoders, and
multi-modal recurrent neural networks. Further, we also provide an important
perspective from manifold alignment for multi-view representation learning.
Overall, this survey aims to provide an insightful overview of the theoretical
basis and current developments in the field of multi-view representation
learning and to help researchers find the most appropriate tools for particular
applications.
Yftah Ziser, Roi Reichart
Subjects: Computation and Language (cs.CL)
Domain adaptation, adapting models from domains rich in labeled training data
to domains poor in such data, is a fundamental NLP challenge. We introduce a
neural network model that marries together ideas from two prominent strands of
research on domain adaptation through representation learning: structural
correspondence learning (SCL, (Blitzer et al., 2006)) and autoencoder neural
networks. Particularly, our model is a three-layer neural network that learns
to encode the nonpivot features of an input example into a low-dimensional
representation, so that the existence of pivot features (features that are
prominent in both domains and convey useful information for the NLP task) in
the example can be decoded from that representation. The low-dimensional
representation is then employed in a learning algorithm for the task. Moreover,
we show how to inject pre-trained word embeddings into our model in order to
improve generalization across examples with similar pivot features. On the task
of cross-domain product sentiment classification (Blitzer et al., 2007),
consisting of 12 domain pairs, our model outperforms both the SCL and the
marginalized stacked denoising autoencoder (MSDA, (Chen et al., 2012)) methods
by 3.77% and 2.17% respectively, on average across domain pairs.
Yueming Sun, Yi Zhang, Yunfei Chen, Roger Jin
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
We will demonstrate a conversational product recommendation agent. This
system shows how we combine research in personalized recommendation systems
with research in dialogue systems to build a virtual sales agent. Based on new
deep learning technologies we developed, the virtual agent is capable of
learning how to interact with users, how to answer user questions, what
question to ask next, and what to recommend when chatting with a human user.
Normally, a decent conversational agent for a particular domain requires tens
of thousands of hand-labeled conversational data points or hand-written rules.
This is a major barrier when launching a conversational agent for a new
domain. We will explore and demonstrate the effectiveness of the learning
solution even when there are no hand-written rules or hand-labeled training
data.
James Pustejovsky, Nikhil Krishnaswamy
Comments: 8 pages, 9 figures, proceedings of LREC 2016
Subjects: Computation and Language (cs.CL)
We present the specification for a modeling language, VoxML, which encodes
semantic knowledge of real-world objects represented as three-dimensional
models, and of events and attributes related to and enacted over these objects.
VoxML is intended to overcome the limitations of existing 3D visual markup
languages by allowing for the encoding of a broad range of semantic knowledge
that can be exploited by a variety of systems and platforms, leading to
multimodal simulations of real-world scenarios using conceptual objects that
represent their semantic values.
Tiago Tresoldi
Comments: draft, 1 table, 1 figure
Subjects: Computation and Language (cs.CL)
This work proposes a tentative model for the calculation of dimensionless
distances between phonemes; sounds are described with binary distinctive
features and distances show linear consistency in terms of such features. The
model can be used as a scoring function for local and global pairwise alignment
of phoneme sequences, and the distances can be used as prior probabilities for
Bayesian analyses on the phylogenetic relationship between languages,
particularly for cognate identification in cases where no empirical prior
probability is available.
Mahdi Khademian, Mohammad Mehdi Homayounpour
Subjects: Computation and Language (cs.CL); Sound (cs.SD)
A Pascal challenge entitled monaural multi-talker speech recognition was
developed, targeting the problem of robust automatic speech recognition against
speech-like noises, which significantly degrade the performance of automatic
speech recognition systems. In this challenge, two competing speakers say a
simple command simultaneously and the objective is to recognize the speech of
the target speaker. Surprisingly, during the challenge a team from IBM
Research achieved performance better than human listeners on this task. The
IBM team's method consists of an intermediate speech separation step followed
by single-talker speech recognition. This paper reconsiders the task of this
challenge based on gain-adapted factorial speech processing models. It
develops a joint-token passing algorithm for direct utterance decoding of both
target and masker speakers simultaneously. In contrast to the challenge
winner, it uses maximum uncertainty during decoding, which cannot be used in
the earlier two-phase method. It provides a detailed derivation of inference on
these models based on general inference procedures for probabilistic graphical
models. As another improvement, it uses deep neural networks for joint-speaker
identification and gain estimation, which makes these two steps easier than
before while producing competitive results for them. The proposed method
outperforms past super-human results and even the results recently achieved
by Microsoft Research using deep neural networks. It achieves a 5.5% absolute
task performance improvement over the first super-human system and a 2.7%
absolute task performance improvement over its recent competitor.
Christophe Servan, Alexandre Berard, Zied Elloumi, Hervé Blanchon, Laurent Besacier
Comments: accepted to COLING 2016 conference
Subjects: Computation and Language (cs.CL)
This paper presents an approach combining lexico-semantic resources and
distributed representations of words applied to the evaluation in machine
translation (MT). This study is made through the enrichment of a well-known MT
evaluation metric: METEOR. This metric enables an approximate match (synonymy
or morphological similarity) between an automatic and a reference translation.
Our experiments are made in the framework of the Metrics task of WMT 2014. We
show that distributed representations are a good alternative to lexico-semantic
resources for MT evaluation and they can even bring interesting additional
information. The augmented versions of METEOR, using vector representations,
are made available on our GitHub page.
Tuan Do, Nikhil Krishnaswamy, James Pustejovsky
Comments: 4 pages, 4 figures, ISA workshop 2015
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
This paper introduces the Event Capture Annotation Tool (ECAT), a
user-friendly, open-source interface tool for annotating events and their
participants in video, capable of extracting the 3D positions and orientations
of objects in video captured by Microsoft’s Kinect(R) hardware. The modeling
language VoxML (Pustejovsky and Krishnaswamy, 2016) underlies ECAT’s object,
program, and attribute representations, although ECAT uses its own spec for
explicit labeling of motion instances. The demonstration will show the tool’s
workflow and the options available for capturing event-participant relations
and browsing visual data. Mapping ECAT’s output to VoxML will also be
addressed.
Koustav Rudra, Siddhartha Banerjee, Niloy Ganguly, Pawan Goyal, Muhammad Imran, Prasenjit Mitra
Comments: 7 pages, 9 figures, Accepted at the 4th International Workshop on Social Web for Disaster Management (SWDM’16), co-located with CIKM 2016
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
The use of microblogging platforms such as Twitter during crises has become
widespread. More importantly, information disseminated by affected people
contains useful information like reports of missing and found people, requests
for urgent needs etc. For rapid crisis response, humanitarian organizations
look for situational awareness information to understand and assess the
severity of the crisis. In this paper, we present a novel framework (i) to
generate abstractive summaries useful for situational awareness, and (ii) to
capture sub-topics and present a short informative summary for each of these
topics. A summary is generated using a two stage framework that first extracts
a set of important tweets from the whole set of information through an
Integer-linear programming (ILP) based optimization technique and then follows
a word graph and concept event based abstractive summarization technique to
produce the final summary. High accuracies obtained for all the tasks show the
effectiveness of the proposed framework.
Abdul Malik Badshah, Jamil Ahmad, Mi Young Lee, Sung Wook Baik
Comments: 8 pages, conference paper, The 2nd International Integrated Conference & Concert on Convergence (2016)
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
Besides spoken words, speech signals also carry information about speaker
gender, age, and emotional state which can be used in a variety of speech
analysis applications. In this paper, a divide and conquer strategy for
ensemble classification has been proposed to recognize emotions in speech.
Intrinsic hierarchy in emotions has been utilized to construct an emotions
tree, which assisted in breaking down the emotion recognition task into smaller
sub tasks. The proposed framework generates predictions in three phases.
Firstly, emotions are detected in the input speech signal by classifying it as
neutral or emotional. If the speech is classified as emotional, then in the
second phase, it is further classified into positive and negative classes.
Finally, individual positive or negative emotions are identified based on the
outcomes of the previous stages. Several experiments have been performed on a
widely used benchmark dataset. The proposed method was able to achieve improved
recognition rates as compared to several other approaches.
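The three-phase decision flow described above can be sketched as a small
cascade. The abstract does not specify the classifiers or the emotion labels,
so every callable and label below is a hypothetical placeholder.

```python
def classify_emotion(signal, is_emotional, is_positive,
                     label_positive, label_negative):
    """Three-phase cascade: neutral/emotional, then polarity, then emotion.

    All four callables are hypothetical stand-ins for trained classifiers.
    """
    # Phase 1: neutral versus emotional
    if not is_emotional(signal):
        return "neutral"
    # Phase 2: positive versus negative
    if is_positive(signal):
        # Phase 3a: pick an individual positive emotion
        return label_positive(signal)
    # Phase 3b: pick an individual negative emotion
    return label_negative(signal)


# Toy usage with stub classifiers; the labels are made up for illustration
result = classify_emotion(
    "clip-001",
    is_emotional=lambda s: True,
    is_positive=lambda s: False,
    label_positive=lambda s: "happy",
    label_negative=lambda s: "angry",
)
```

Each phase only sees inputs that passed the previous one, which is what lets
the later classifiers specialize on a smaller sub-task.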
Karl Fürlinger, Tobias Fuchs, Roger Kowalewski
Comments: Accepted for publication at HPCC 2016, 12-14 December 2016, Sydney, Australia
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present DASH, a C++ template library that offers distributed data
structures and parallel algorithms and implements a compiler-free PGAS
(partitioned global address space) approach. DASH offers many productivity and
performance features such as global-view data structures, efficient support for
the owner-computes model, flexible multidimensional data distribution schemes
and inter-operability with STL (standard template library) algorithms. DASH
also features a flexible representation of the parallel target machine and
allows the exploitation of several hierarchically organized levels of locality
through a concept of Teams. We evaluate DASH on a number of benchmark
applications and we port a scientific proxy application using the MPI two-sided
model to DASH. We find that DASH offers excellent productivity and performance
and demonstrate scalability up to 9800 cores.
Eli Gafni, Yuan He, Petr Kuznetsov, Thibault Rieutord
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The wait-free read-write memory model has been characterized as an iterated
\emph{Immediate Snapshot} (IS) task. The IS task is \emph{affine}: it can be
defined as a (sub)set of simplices of the standard chromatic subdivision. It is
known that the task of \emph{Weak Symmetry Breaking} (WSB) cannot be
represented as an affine task. In this paper, we highlight the phenomenon of a
“natural” model that can be captured by an iterated affine task and, thus, by a
subset of runs of the iterated immediate snapshot model. We show that the
read-write memory model in which, additionally, $k$-set-consensus objects can
be used is, unlike WSB, “natural” by presenting the corresponding simple affine
task captured by a subset of $2$-round IS runs. Our results imply the first
combinatorial characterization of models equipped with abstractions other than
read-write memory that applies to generic tasks.
Gabriele D'Angelo
Comments: To appear in “Simulation Modelling Practice and Theory, Elsevier”
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Performance (cs.PF)
This paper is about partitioning in parallel and distributed simulation, that
is, decomposing the simulation model into a number of components and
properly allocating them on the execution units. An adaptive solution based on
self-clustering, that considers both communication reduction and computational
load-balancing, is proposed. The implementation of the proposed mechanism is
tested using a simulation model that is challenging both in terms of structure
and dynamicity. Various configurations of the simulation model and the
execution environment have been considered. The obtained performance results
are analyzed using a reference cost model. The results demonstrate that the
proposed approach is promising and that it can reduce the simulation execution
time in both parallel and distributed architectures.
Dariusz Dereniowski, Dorota Urbańska
Subjects: Discrete Mathematics (cs.DM); Distributed, Parallel, and Cluster Computing (cs.DC); Combinatorics (math.CO)
We consider the following distributed pursuit-evasion problem. A team of
mobile agents called searchers starts at an arbitrary node of an unknown
$n$-node network. Their goal is to execute a search strategy that guarantees
capturing a fast and invisible intruder regardless of its movements using as
few agents as possible. We restrict our attention to networks that are embedded
into partial grids: nodes are placed on the plane at integer coordinates and
only nodes at distance one can be adjacent. We give a distributed algorithm for
the searchers that allows them to compute a connected and monotone strategy that
guarantees searching any unknown partial grid with the use of $O(\sqrt{n})$
searchers. As for a lower bound, not only do there exist partial grids that
require $\Omega(\sqrt{n})$ searchers, but we prove that for each distributed
searching algorithm there is a partial grid that forces the algorithm to use
$\Omega(\sqrt{n})$ searchers while $O(\log n)$ searchers are sufficient in the
offline scenario. This gives a lower bound of $\Omega(\sqrt{n}/\log n)$ in
terms of achievable competitive ratio of any distributed algorithm.
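The competitive-ratio bound follows by dividing the two quantities above. As a
sketch, write $A(G)$ for the number of searchers a distributed algorithm uses
on the hard partial grid $G$ and $\mathrm{OPT}(G)$ for the offline optimum;
these two symbols and the constants $c, c' > 0$ are notation introduced here
for illustration, not taken from the paper.

```latex
\[
A(G) \ge c\,\sqrt{n}, \qquad \mathrm{OPT}(G) \le c' \log n
\quad\Longrightarrow\quad
\frac{A(G)}{\mathrm{OPT}(G)} \;\ge\; \frac{c}{c'} \cdot \frac{\sqrt{n}}{\log n}
\;=\; \Omega\!\left(\frac{\sqrt{n}}{\log n}\right).
\]
```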
Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Xinyu Xing, C. Lee Giles, Xue Liu
Subjects: Learning (cs.LG)
Deep neural networks (DNN) have been proven to be quite effective in many
applications such as image recognition and using software to process security
or traffic camera footage, for example to measure traffic flows or spot
suspicious activities. Despite the superior performance of DNN in these
applications, it has recently been shown that a DNN is susceptible to a
particular type of attack that exploits a fundamental flaw in its design.
Specifically, an attacker can craft a particular synthetic example, referred to
as an adversarial sample, causing the DNN to produce an output behavior chosen
by attackers, such as misclassification. Addressing this flaw is critical if a
DNN is to be used in critical applications such as those in cybersecurity.
Previous work provided various defence mechanisms by either increasing the
model nonlinearity or enhancing model complexity. However, after a thorough
analysis of the fundamental flaw in the DNN, we discover that the effectiveness
of such methods is limited. As such, we propose a new adversary resistant
technique that obstructs attackers from constructing impactful adversarial
samples by randomly nullifying features within samples. Using the MNIST
dataset, we evaluate our proposed technique and empirically show our technique
significantly boosts DNN’s robustness against adversarial samples while
maintaining high accuracy in classification.
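Random feature nullification can be pictured with a few lines of code. The
abstract does not say how the nullification rate is chosen or where in the
network it is applied, so the fixed per-feature rate and the function name
below are assumptions made for this sketch.

```python
import random


def nullify_features(sample, rate, rng=None):
    """Return a copy of `sample` with each feature zeroed with probability `rate`.

    `rate` is a hypothetical fixed nullification probability in [0, 1].
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so rate=0 keeps everything
    # and rate=1 zeroes everything
    return [0.0 if rng.random() < rate else x for x in sample]


# Toy usage on a made-up four-feature sample with a seeded generator
masked = nullify_features([0.2, 0.9, 0.5, 0.7], rate=0.5, rng=random.Random(0))
```

Because the mask is drawn afresh for every input, an attacker cannot know in
advance which features will reach the model, which is the intuition behind
the defence.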
Peter Goldsborough
Subjects: Learning (cs.LG)
Deep learning is a branch of artificial intelligence employing deep neural
network architectures that has significantly advanced the state-of-the-art in
computer vision, speech recognition, natural language processing and other
domains. In November 2015, Google released $\textit{TensorFlow}$, an open
source deep learning software library for defining, training and deploying
machine learning models. In this paper, we review TensorFlow and put it in
context of modern deep learning concepts and software. We discuss its basic
computational paradigms and distributed execution model, its programming
interface as well as accompanying visualization toolkits. We then compare
TensorFlow to alternative libraries such as Theano, Torch or Caffe on a
qualitative as well as quantitative basis and finally comment on observed
use-cases of TensorFlow in academia and industry.
Dmitry Yarotsky
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We study how approximation errors of neural networks with ReLU activation
functions depend on the depth of the network. We establish rigorous error
bounds showing that deep ReLU networks are significantly more expressive than
shallow ones as far as approximations of smooth functions are concerned. At
the same time, we show that on a set of functions constrained only by their
degree of smoothness, a ReLU network architecture cannot in general achieve
approximation accuracy with better than a power law dependence on the network
size, regardless of its depth.
Dominik Meyer, Hao Shen, Klaus Diepold
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
In this paper, we study Temporal Difference (TD) learning with linear
value function approximation. It is well known that most TD learning algorithms
are unstable with linear function approximation and off-policy learning. Recent
development of Gradient TD (GTD) algorithms has addressed this problem
successfully. However, the success of GTD algorithms requires a set of well
chosen features, which are not always available. When the number of features is
huge, the GTD algorithms might face the problem of overfitting and being
computationally expensive. To cope with this difficulty, regularization
techniques, in particular $\ell_1$ regularization, have attracted significant
attention in developing TD learning algorithms. The present work combines the
GTD algorithms with $\ell_1$ regularization. We propose a family of $\ell_1$
regularized GTD algorithms, which employ the well known soft thresholding
operator. We investigate convergence properties of the proposed algorithms, and
depict their performance with several numerical experiments.
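As a rough illustration of the soft thresholding operator mentioned above, the
sketch below applies it inside a generic $\ell_1$ proximal gradient step. This
is not the update rule from the paper: the function names, the step size, and
the placeholder gradient are all assumptions made for the example.

```python
def soft_threshold(x, tau):
    """Soft thresholding: sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def proximal_gradient_step(theta, grad, step, lam):
    """One generic l1 proximal step (hypothetical helper, not the paper's update).

    Takes a plain gradient step, then shrinks every weight toward zero by
    step * lam, setting small weights exactly to zero.
    """
    return [soft_threshold(t - step * g, step * lam) for t, g in zip(theta, grad)]


# Placeholder weights and gradient, chosen only to show the shrinkage effect.
theta = [1.0, -2.0, 0.1]
grad = [0.0, 0.0, 0.0]
print(proximal_gradient_step(theta, grad, step=0.5, lam=1.0))
```

With a zero gradient the step reduces to pure shrinkage: large weights move
toward zero by the threshold and the small weight is set exactly to zero, which
is how this operator produces sparse solutions.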
Igor Colin, Christophe Dupuy
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Privacy preserving networks can be modelled as decentralized networks (e.g.,
sensors, connected objects, smartphones), where communication between nodes of
the network is not controlled by an all-knowing, central node. For this type of
network, the main issue is to gather/learn global information on the network
(e.g., by optimizing a global cost function) while keeping the (sensitive)
information at each node. In this work, we focus on text information that
agents do not want to share (e.g., text messages, emails, confidential
reports). We use recent advances on decentralized optimization and topic models
to infer topics from a graph with limited communication. We propose a method to
adapt the latent Dirichlet allocation (LDA) model to decentralized optimization
and show on synthetic data that we still recover parameters and performance at
each node similar to those of stochastic methods that access all the
information in the graph.
Samik Banerjee, Sukhendu Das
Comments: This is an extended version of the paper accepted in CVPR Biometric Workshop, 2016. arXiv admin note: text overlap with arXiv:1610.00660
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)
Face recognition (FR) is the most preferred mode for biometric-based
surveillance, due to its passive nature of detecting subjects, amongst all
different types of biometric traits. FR under surveillance scenarios does not
give satisfactory performance due to low contrast, noise and poor illumination
conditions on probes, as compared to the training samples. Even a
state-of-the-art technology such as Deep Learning fails to perform well in
these scenarios. We propose a novel soft-margin based learning method for
multiple feature-kernel combinations, followed by feature transformation using
Domain Adaptation, which
outperforms many recent state-of-the-art techniques, when tested using three
real-world surveillance face datasets.
Dan Barnes, Will Maddern, Ingmar Posner
Comments: Submitted to the IEEE International Conference on Robotics and Automation 2017. Video summary: this http URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a weakly-supervised approach to segmenting proposed drivable paths
in images with the goal of autonomous driving in complex urban environments.
Using recorded routes from a data collection vehicle, our proposed method
generates vast quantities of labelled images containing proposed paths and
obstacles without requiring manual annotation, which we then use to train a
deep semantic segmentation network. With the trained network we can segment
proposed paths and obstacles at run-time using a vehicle equipped with only a
monocular camera without relying on explicit modelling of road or lane
markings. We evaluate our method on the large-scale KITTI and Oxford RobotCar
datasets and demonstrate reliable path proposal and obstacle segmentation in a
wide variety of environments under a range of lighting, weather and traffic
conditions. We illustrate how the method can generalise to multiple path
proposals at intersections and outline plans to incorporate the system into a
framework for autonomous urban driving.
Eric Bax, Farshad Kooti
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
If classifiers are selected from a hypothesis class to form an ensemble,
bounds on average error rate over the selected classifiers include a component
for selectivity, which grows as the fraction of hypothesis classifiers selected
for the ensemble shrinks, and a component for variety, which grows with the
size of the hypothesis class or in-sample data set. We show that the component
for selectivity asymptotically dominates the component for variety, meaning
that variety is essentially free.
Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman
Comments: ECCV 2016 (oral). The first two authors contributed equally to this work
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Understanding 3D object structure from a single image is an important but
difficult task in computer vision, mostly due to the lack of 3D object
annotations in real images. Previous work tackles this problem by either
solving an optimization task given 2D keypoint positions, or training on
synthetic data with ground truth 3D information. In this work, we propose 3D
INterpreter Network (3D-INN), an end-to-end framework which sequentially
estimates 2D keypoint heatmaps and 3D object structure, trained on both real
2D-annotated images and synthetic 3D data. This is made possible mainly by two
technical innovations. First, we propose a Projection Layer, which projects
estimated 3D structure to 2D space, so that 3D-INN can be trained to predict 3D
structural parameters supervised by 2D annotations on real images. Second,
heatmaps of keypoints serve as an intermediate representation connecting real
and synthetic data, enabling 3D-INN to benefit from the variation and abundance
of synthetic 3D objects, without suffering from the difference between the
statistics of real and synthesized images due to imperfect rendering. The
network achieves state-of-the-art performance on both 2D keypoint estimation
and 3D structure recovery. We also show that the recovered 3D information can
be used in other vision applications, such as 3D rendering and image retrieval.
Taehyeun Park, Nof Abuzainab, Walid Saad
Comments: 10 pages, 4 figures, 1 table
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT)
For a seamless deployment of the Internet of Things (IoT), there is a need
for self-organizing solutions to overcome key IoT challenges that include data
processing, resource management, coexistence with existing wireless networks,
and improved IoT-wide event detection. One of the most promising solutions to
address these challenges is via the use of innovative learning frameworks that
will enable the IoT devices to operate autonomously in a dynamic environment.
However, developing learning mechanisms for the IoT requires coping with unique
IoT properties in terms of resource constraints, heterogeneity, and strict
quality-of-service requirements. In this paper, a number of emerging learning
frameworks suitable for IoT applications are presented. In particular, the
advantages, limitations, IoT applications, and key results pertaining to
machine learning, sequential learning, and reinforcement learning are studied.
For each type of learning, the computational complexity, required information,
and learning performance are discussed. Then, to handle the heterogeneity of
the IoT, a new framework based on the powerful tools of cognitive hierarchy
theory is introduced. This framework is shown to efficiently capture the
different IoT device types and varying levels of available resources among the
IoT devices. In particular, the different resource capabilities of IoT devices
are mapped to different levels of rationality in cognitive hierarchy theory,
thus enabling the IoT devices to use different learning frameworks depending on
their available resources. Finally, key results on the use of cognitive
hierarchy theory in the IoT are presented.
Mingzhe Chen, Mohammad Mozaffari, Walid Saad, Changchuan Yin, Mérouane Debbah, Choong-Seon Hong
Subjects: Information Theory (cs.IT)
In this paper, the problem of proactive deployment of cache-enabled unmanned
aerial vehicles (UAVs) for optimizing the quality-of-experience (QoE) of
wireless devices in a cloud radio access network (CRAN) is studied. In the
considered model, the network can leverage human-centric information such as
users’ visited locations, requested contents, gender, job, and device type to
predict the content request distribution and mobility pattern of each user.
Then, given these behavior predictions, the proposed approach seeks to find the
user-UAV associations, the optimal UAVs’ locations, and the contents to cache
at UAVs. This problem is formulated as an optimization problem whose goal is to
maximize the users’ QoE while minimizing the transmit power used by the UAVs.
To solve this problem, a novel algorithm based on the machine learning
framework of conceptor-based echo state networks (ESNs) is proposed. Using
ESNs, the network can effectively predict each user’s content request
distribution and its mobility pattern when limited information on the states of
users and the network is available. Based on the predictions of the users’
content request distribution and their mobility patterns, we derive the optimal
user-UAV association, optimal locations of the UAVs as well as the content to
cache at UAVs. Simulation results using real pedestrian mobility patterns from
BUPT and actual content transmission data from Youku show that the proposed
algorithm can yield 40% and 61% gains, respectively, in terms of the average
transmit power and the percentage of the users with satisfied QoE compared to a
benchmark algorithm without caching and a benchmark solution without UAVs.
Cao Yuan, Yonglin Cao
Subjects: Information Theory (cs.IT)
Let $\mathbb{F}_{p^m}$ be a finite field of cardinality $p^m$, where $p$ is a
prime, and $k, N$ be any positive integers. We denote
$R_k=F_{p^m}[u]/\langle u^k\rangle =F_{p^m}+uF_{p^m}+\ldots+u^{k-1}F_{p^m}$
($u^k=0$) and $\lambda=a_0+a_1u+\ldots+a_{k-1}u^{k-1}$ where
$a_0, a_1,\ldots, a_{k-1}\in F_{p^m}$ satisfying $a_0\neq 0$ and $a_1=1$. Let
$r$ be a positive integer satisfying $p^{r-1}+1\leq k\leq p^r$. We first define
a Gray map from $R_k$ to $F_{p^m}^{p^r}$, then prove that the Gray image of any
linear $\lambda$-constacyclic code over $R_k$ of length $N$ is a distance
invariant linear $a_0^{p^r}$-constacyclic code over $F_{p^m}$ of length $p^rN$.
Furthermore, the generator polynomials for each linear $\lambda$-constacyclic
code over $R_k$ of length $N$ and its Gray image are given respectively.
Finally, some optimal constacyclic codes over $F_{3}$ and $F_{5}$ are
constructed.
Zhongju Wang, Prabhu Babu, Daniel P. Palomar
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)
Phase noise correction is crucial to take full advantage of orthogonal
frequency division multiplexing (OFDM) in modern high-data-rate communications.
OFDM channel estimation with simultaneous phase noise compensation has
therefore drawn much attention and stimulated continuing efforts. Existing
methods, however, either have not taken into account the fundamental properties
of phase noise or are only able to provide estimates of limited applicability
owing to considerable computational complexity. In this paper, we have
reformulated the joint estimation problem in the time domain as opposed to
existing frequency-domain approaches, which enables us to develop much more
efficient algorithms using the majorization-minimization technique. In
addition, we propose a method based on dimensionality reduction and the
Bayesian Information Criterion (BIC) that can adapt to various phase noise
levels and accomplish much lower mean squared error than the benchmarks without
incurring much additional computational cost. Several numerical examples with
phase noise generated by free-running oscillators or phase-locked loops
demonstrate that our proposed algorithms outperform existing methods with
respect to both computational efficiency and mean squared error within a large
range of signal-to-noise ratios.
Nikolaos I. Miridakis, Theodoros A. Tsiftsis
Subjects: Information Theory (cs.IT)
In this paper, a spatial multiplexing multiple-input multiple-output (MIMO)
system in which hardware and RF imperfections occur during the communication
setup is analytically investigated. More specifically, the scenario of hardware
impairments at the transceiver and imperfect channel state information (CSI) at
the receiver is considered, when successive interference cancellation (SIC) is
implemented. Two popular linear detection schemes are analyzed, namely, zero
forcing SIC (ZF-SIC) and minimum mean-square error SIC (MMSE-SIC). New
analytical expressions for the outage probability of each SIC stage are
provided, when independent and identically distributed Rayleigh fading channels
are considered. In addition, the well-known error propagation effect between
consecutive SIC stages is analyzed, while closed-form expressions are derived
for some special cases of interest. Finally, useful engineering insights are
provided, such as the achievable diversity order, the performance difference
between ZF- and MMSE-SIC, and the impact of imperfect CSI and/or the presence
of hardware impairments on the overall system performance.
B. N. Bharath, Vaishali P
Comments: A part of this work is submitted to WCNC-2017
Subjects: Information Theory (cs.IT)
This paper considers a distributed stochastic optimization problem where the
goal is to minimize the time average of a cost function subject to a set of
constraints on the time averages of related stochastic processes called
penalties. We assume that delayed information about an event in the system is
available as common information at every user, and the state of the system is
evolving in an independent and non-stationary fashion. We show that an
approximate Drift-plus-penalty (DPP) algorithm that we propose achieves a time
average cost that is within some positive constant epsilon of the optimal cost
with high probability. Further, we provide a condition on the waiting time for
this result to hold. The condition is shown to be a function of the mixing
coefficient, the number of samples (w) used to compute an estimate of the
distribution of the state, and the delay. Unlike the existing work, the method
used in the paper can be adapted to prove high probability results when the
state is evolving in a non-i.i.d and non-stationary fashion. Under mild
conditions, we show that the dependency of the error bound on w is exponential,
which is a significant improvement compared to the existing work.
Bikash Kumar Dey, Sidharth Jaggi, Michael Langberg
Comments: 18 pages
Subjects: Information Theory (cs.IT)
In this work we consider a communication problem in which a sender, Alice,
wishes to communicate with a receiver, Bob, over a channel controlled by an
adversarial jammer, James, who is myopic. Roughly speaking, for
blocklength $n$, the codeword $X^n$ transmitted by Alice is corrupted by James
who must base his adversarial decisions (of which locations of $X^n$ to corrupt
and how to corrupt them) not on the codeword $X^n$ but on $Z^n$, an image of
$X^n$ through a noisy memoryless channel. More specifically, our communication
model may be described by two channels. A memoryless channel $p(z|x)$ from
Alice to James, and an Arbitrarily Varying Channel from Alice to Bob,
$p(y|x,s)$ governed by a state $S^n$ determined by James. In standard
adversarial channels the states $S^n$ may depend on the codeword $X^n$, but in
our setting $S^n$ depends only on James’s view $Z^n$.
The myopic channel captures a broad range of channels and bridges between the
standard models of memoryless and adversarial (zero-error) channels. In this
work we present upper and lower bounds on the capacity of myopic channels. For
a number of special cases of interest we show that our bounds are tight. We
extend our results to the setting of secure communication in which we
require that the transmitted message remain secret from James. For example, we
show that if (i) James may flip at most a $p$ fraction of the bits communicated
between Alice and Bob, and (ii) James views $X^n$ through a binary symmetric
channel with parameter $q$, then once James is “sufficiently myopic” (in this
case, when $q>p$), the optimal communication rate is that of an adversary
who is “blind” (that is, an adversary that does not see $X^n$ at all), which is
$1-H(p)$ for standard communication, and $H(q)-H(p)$ for secure communication.
A similar phenomenon exists for our general model of communication.
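The two rates quoted above for the bit-flip example can be evaluated
numerically. The sketch below is only a calculator for the stated formulas
$1-H(p)$ and $H(q)-H(p)$, with $H$ the binary entropy in bits. The function
names are hypothetical, and the range check $0 \le p < q \le 1/2$ is an
assumption added here so that both rates are meaningful.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def sufficiently_myopic_rates(p, q):
    """Return (standard_rate, secure_rate) for the bit-flip example.

    p: fraction of bits James may flip; q: parameter of the binary symmetric
    channel through which James views the codeword. The abstract states the
    rates for q > p; the upper limit q <= 1/2 is an assumption of this sketch.
    """
    if not (0.0 <= p < q <= 0.5):
        raise ValueError("this sketch assumes 0 <= p < q <= 1/2")
    standard_rate = 1.0 - binary_entropy(p)
    secure_rate = binary_entropy(q) - binary_entropy(p)
    return standard_rate, secure_rate


standard, secure = sufficiently_myopic_rates(p=0.1, q=0.3)
print(f"standard: {standard:.4f} bits/use, secure: {secure:.4f} bits/use")
```

Since $H(q) \le 1$, the secure rate never exceeds the standard rate, and the two
coincide when $q = 1/2$, that is, when James's view carries no information
about the codeword.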
Mladen Kovačević, Vincent Y. F. Tan
Comments: 7 pages, 2 figures
Subjects: Combinatorics (math.CO); Computational Geometry (cs.CG); Information Theory (cs.IT); Group Theory (math.GR); Number Theory (math.NT)
A $ B_h $ set (or Sidon set of order $ h $) in an Abelian group $ G $ is any
subset $ \{b_0, b_1, \ldots, b_{n}\} \subset G $ with the property that all the
sums $ b_{i_1} + \cdots + b_{i_h} $ are different up to the order of the
summands. Let $ \phi(h,n) $ denote the order of the smallest Abelian group
containing a $ B_h $ set of cardinality $ n + 1 $. It is shown that, as
$ h \to \infty $ and $ n $ is kept fixed,
\[ \phi(h,n) \sim \frac{1}{n! \, \delta_{L}(\triangle^n)} h^n , \]
where $ \delta_{L}(\triangle^n) $ is the lattice-packing density of an
$ n $-simplex in the Euclidean space. This determines the asymptotics exactly
in cases where this density is known ($ n \leq 3 $), and gives an improved
upper bound on $ \phi(h,n) $ in the remaining
cases. Covering analogs of Sidon sets are also introduced and their
characterization in terms of lattice-coverings by simplices is given.
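To make the $ B_h $ definition above concrete, the following sketch checks the
property by brute force in a cyclic group $ \mathbb{Z}_m $. Restricting to
cyclic groups is a simplification made here (the abstract considers arbitrary
Abelian groups), and the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def is_bh_set(elements, h, modulus):
    """Check whether `elements` is a B_h set in the cyclic group Z_modulus.

    All sums of h elements (repetition allowed, order ignored) must be
    pairwise distinct modulo `modulus`.
    """
    residues = [e % modulus for e in elements]
    if len(set(residues)) != len(residues):
        return False  # the elements themselves must be distinct
    seen = set()
    for combo in combinations_with_replacement(residues, h):
        total = sum(combo) % modulus
        if total in seen:
            return False
        seen.add(total)
    return True


# {0, 1, 3} has n + 1 = 3 elements, so with h = 2 there are C(n + h, h) = 6 sums.
print(comb(2 + 2, 2))
print(is_bh_set([0, 1, 3], h=2, modulus=7))  # sums 0, 1, 3, 2, 4, 6 are distinct
print(is_bh_set([0, 1, 3], h=2, modulus=6))  # 3 + 3 = 6 = 0 collides with 0 + 0
```

The count of sums, $ \binom{n+h}{h} $, is a lower bound on the order of any
group containing such a set, which is where the $ h^n / n! $ growth in the
asymptotic formula comes from.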
Mattia Rebato, Federico Boccardi, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
Comments: Submitted for publication in IEEE Transactions on Cognitive Communications and Networking
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
While spectrum at millimeter wave (mmWave) frequencies is less scarce than at
traditional frequencies below 6 GHz, it is still not unlimited, in particular
if we consider the requirements from other services using the same band and the
need to license mmWave bands to multiple mobile operators. Therefore, an
efficient spectrum access scheme is critical to harvest the maximum benefit
from emerging mmWave technologies. In this paper, we introduce a new hybrid
spectrum access scheme for mmWave networks, where data is aggregated through
two mmWave carriers with different characteristics. In particular, we consider
the case of a hybrid spectrum scheme between a mmWave band with exclusive
access and a mmWave band where spectrum is pooled between multiple operators.
To the best of our knowledge, this is the first study proposing hybrid spectrum
access for mmWave networks and providing a quantitative assessment of its
benefits. Our results show that this approach provides major advantages with
respect to traditional fully licensed or fully unlicensed spectrum access
schemes, though further work is needed to achieve a more complete understanding
of both technical and non-technical implications.
Anubhav Chaturvedi, Marcin Pawlowski, Karol Horodecki
Comments: 17 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
It is known that a PR-BOX (PR), a non-local resource, and a $(2\rightarrow 1)$
random access code (RAC), a functionality (wherein Alice encodes 2 bits into a
1-bit message and Bob learns one of Alice’s inputs, chosen at random), are
equivalent under the no-signaling condition. In this work we introduce
generalizations of PR and $(2\rightarrow 1)$ RAC and study their
inter-convertibility. We introduce generalizations based on the number of
inputs provided to Alice, $B_n$-BOX and $(n\rightarrow 1)$ RAC. We show that a
$B_n$-BOX is equivalent to a no-signaling $(n\rightarrow 1)$ RACBOX (RB).
Further, we introduce a signaling $(n\rightarrow 1)$ RB which cannot simulate a
$B_n$-BOX. Finally, to quantify the same, we provide a resource inequality
between $(n\rightarrow 1)$ RB and $B_n$-BOX, and show that it is saturated. As
an application we prove that one requires at least $(n-1)$ PRs supplemented
with a bit of communication to win a $(n\rightarrow 1)$ RAC. We further
introduce generalizations based on the dimension of inputs provided to Alice
and the message she sends, $B_n^d(+)$-BOX, $B_n^d(-)$-BOX and
$(n\rightarrow 1,d)$ RAC ($d>2$). We show that the no-signaling condition is
not enough to enforce strict equivalence in the case of $d>2$. We introduce
classes of no-signaling $(n\rightarrow 1,d)$ RB, one which can simulate
$B_n^d(+)$-BOX, a second which can simulate $B_n^d(-)$-BOX, and a third which
cannot simulate either. Finally, to quantify the same, we provide a resource
inequality between $(n\rightarrow 1,d)$ RB and $B_n^d(+)$-BOX, and show that it
is saturated.
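The known base case mentioned at the start of the abstract, that one PR box
plus one bit of communication wins a $(2\rightarrow 1)$ RAC, can be checked
exhaustively with the standard textbook protocol. The sketch below is a
classical simulation, not code from the paper: the function names are
hypothetical, and the simulated box computes Bob's output from both inputs in
one place, which a physical no-signaling box would achieve without
communication.

```python
from itertools import product


def pr_box(x, y, a):
    """Simulated PR box: outputs (a, b) with a XOR b == x AND y.

    `a` stands for Alice's locally uniform output bit; it is passed in
    explicitly so that every case can be enumerated deterministically.
    """
    return a, a ^ (x & y)


def rac_guess(a0, a1, wanted, box_bit):
    """Bob's guess for Alice's bit number `wanted` (0 or 1).

    Alice feeds x = a0 XOR a1 into the box and sends the single bit
    a0 XOR A; Bob feeds in `wanted` and XORs the message with his output B.
    """
    alice_out, bob_out = pr_box(a0 ^ a1, wanted, box_bit)
    message = a0 ^ alice_out  # the one bit of communication
    return message ^ bob_out


def protocol_always_succeeds():
    """Enumerate all inputs and both values of the box's random bit."""
    return all(
        rac_guess(a0, a1, wanted, box_bit) == (a0, a1)[wanted]
        for a0, a1, wanted, box_bit in product((0, 1), repeat=4)
    )


print(protocol_always_succeeds())
```

Bob's guess equals a0 XOR ((a0 XOR a1) AND wanted), which is a0 when he asks
for the first bit and a1 when he asks for the second, independently of the
box's random bit.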