Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Humans have rich understanding of liquid containers and their contents; for
example, we can effortlessly pour water from a pitcher to a cup. Doing so
requires estimating the volume of the cup, approximating the amount of water in
the pitcher, and predicting the behavior of water when we tilt the pitcher.
Liquids and their containers have received very little attention in computer
vision. In this paper, we study liquid containers and their contents, and
propose methods to estimate the volume of containers, approximate the amount of
liquid in them, and perform comparative volume estimations all from a single
RGB image. Furthermore, we show that our proposed model can predict the
behavior of liquids inside containers when one tilts the containers. We also
introduce a new dataset of Containers Of liQuid contEnt (COQE) that contains
5,000 images of 10,000 liquid containers in context, labelled with volume,
amount of content, bounding box annotation, and corresponding similar 3D CAD
models.
Drew Linsley, Sven Eberhardt, Tarun Sharma, Pankaj Gupta, Thomas Serre
Comments: 12 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Understanding what visual features and representations contribute to human
object recognition may provide scaffolding for more effective artificial vision
systems. While recent advances in Deep Convolutional Networks (DCNs) have led
to systems approaching human accuracy, it is unclear if they leverage the same
visual features as humans for object recognition.
We introduce Clicktionary, a competitive web-based game for discovering
features that humans use for object recognition: One participant from a pair
sequentially reveals parts of an object in an image until the other correctly
identifies its category. Scoring image regions according to their proximity to
correct recognition yields maps of visual feature importance for individual
images. We find that these “realization” maps exhibit only weak correlation
with relevance maps derived from DCNs or image salience algorithms. Cueing DCNs
to attend to features emphasized by these maps improves their object
recognition accuracy. Our results thus suggest that realization maps identify
visual features that humans deem important for object recognition but are not
adequately captured by DCNs. To rectify this shortcoming, we propose a novel
web-based application for acquiring realization maps at scale, with the aim of
improving the state-of-the-art in object recognition.
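As an illustration of the scoring rule described above, here is a minimal
Python sketch of one plausible reading of “proximity to correct recognition”:
reveals closer to the moment of recognition are weighted more heavily. The
function name and array layout are our own assumptions, not the authors’ code.

    import numpy as np

    def realization_map(reveal_masks, recognized_at):
        # reveal_masks: (T, H, W) boolean array, one mask per revealed region;
        # recognized_at: index of the reveal at which the observer answered
        # correctly. Regions revealed closer to recognition score higher.
        score = np.zeros(reveal_masks.shape[1:], dtype=float)
        for t in range(recognized_at + 1):
            score += reveal_masks[t] * (t + 1) / (recognized_at + 1)
        return score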
Hao Dong, Paarth Neekhara, Chao Wu, Yike Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
It is useful to automatically transform an image from its original form to
some synthetic form (style, partial contents, etc.) while keeping the original
structure or semantics. We define this requirement as the “image-to-image
translation” problem, and propose a general approach to achieve it based on
deep convolutional and conditional generative adversarial networks (GANs),
which since 2014 have achieved phenomenal success in learning to map images
from noise input. In this work, we develop a two-step (unsupervised) learning
method to translate images between different domains using unlabeled images,
without specifying any correspondence between them, thereby avoiding the cost
of acquiring labeled data. Compared with prior work, we demonstrate the
generality of our model: a variety of translations can be conducted by a
single type of model. Such capability is desirable in applications like
bidirectional translation.
Sergio Escalera, Xavier Baró, Hugo Jair Escalante, Isabelle Guyon
Comments: Associated website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper reviews the history of ChaLearn Looking at People (LAP) events.
We started in 2011 (with the release of the first Kinect device) to run
challenges related to human action/activity and gesture recognition. Since then
we have regularly organized events in a series of competitions covering all
aspects of visual analysis of humans. So far we have organized more than 10
international challenges and events in this field. This paper reviews
associated events, and introduces the ChaLearn LAP platform where public
resources (including code, data and preprints of papers) related to the
organized events are available. We also provide a discussion on perspectives of
ChaLearn LAP activities.
Cristian González García, Daniel Meana-Llorián, B. Cristina Pelayo G-Bustelo, Juan Manuel Cueva Lovelle, Néstor Garcia-Fernandez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Could we use Computer Vision in the Internet of Things to use pictures as
sensors? This is the principal hypothesis that we want to test. Currently, to
keep areas, cities, or homes safe, people use IP cameras. Nevertheless, this
system requires people to watch the camera images live, to review the
recordings after something has occurred, or to respond when the camera
notifies them of any movement; these are its disadvantages. Furthermore, there
are many Smart Cities and Smart Homes around the world, which is why we
thought of using the idea of the Internet of Things to automate the use of IP
cameras. In our case, we propose analysing pictures through Computer Vision to
detect people in them. With this analysis, we can determine whether the
pictures contain people and handle the pictures as if they were sensors with
two possible states. Notwithstanding, Computer Vision is a very complicated
field, which led us to a second hypothesis: could we work with Computer Vision
in the Internet of Things with good enough accuracy to automate or
semi-automate these kinds of events? Demonstrating these hypotheses required
testing our Computer Vision module to check whether it could be used in a
possible real environment with good accuracy. Our proposal, as a possible
solution, is the analysis of entire sequences instead of isolated pictures
when using pictures as sensors in the Internet of Things.
Simone Bianco, Marco Buzzelli, Davide Mazzini, Raimondo Schettini
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we propose a method for logo recognition using deep learning.
Our recognition pipeline is composed of a logo region proposal followed by a
Convolutional Neural Network (CNN) specifically trained for logo
classification, even when the logos are not precisely localized. Experiments
are carried out on the FlickrLogos-32 database, and we evaluate the effect of
synthetic versus real data augmentation, as well as image pre-processing, on
recognition performance. Moreover, we systematically investigate the benefits
of different training choices such as class-balancing, sample-weighting and
explicit modeling of the background class (i.e., no-logo regions).
Experimental results confirm the feasibility of the proposed method, which
outperforms state-of-the-art methods.
Ender Konukoglu, Ben Glocker
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current methods for statistical analysis of neuroimaging data identify
condition related structural alterations in the human brain by detecting group
differences. They construct detailed maps showing population-wide changes due
to a condition of interest. Although extremely useful, these methods do not
provide information on subject-specific structural alterations and they have
limited diagnostic value because group assignments for each subject are
required for the analysis. In this article, we propose SubCMap, a novel method
to detect subject and condition specific structural alterations. SubCMap is
designed to work without the group assignment information in order to provide
diagnostic value. Unlike outlier detection methods, SubCMap detections are
condition-specific and can be used to study the effects of various conditions
or for diagnosing diseases. The method combines techniques from
classification, generalization error estimation and image restoration to
identify the
condition-related alterations. Experimental evaluation is performed on
synthetically generated data as well as data from the Alzheimer’s Disease
Neuroimaging Initiative (ADNI) database. Results on synthetic data demonstrate
the advantages of SubCMap compared to population-wide techniques and higher
detection accuracy compared to outlier detection. Analysis of the ADNI dataset
shows that SubCMap detections on cortical thickness data correlate well with
non-imaging markers of Alzheimer’s Disease (AD), the Mini Mental State
Examination score and cerebrospinal fluid amyloid-β levels, suggesting that
the proposed method captures the inter-subject variation of AD effects well.
Syed Afaq Ali Shah, Uzair Nadeem, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a novel image set classification technique using linear regression
models. Downsampled gallery image sets are interpreted as subspaces of a high
dimensional space to avoid the computationally expensive training step. We
estimate regression models for each test image using the class specific gallery
subspaces. Images of the test set are then reconstructed using the regression
models. Based on the minimum reconstruction error between the reconstructed and
the original images, a weighted voting strategy is used to classify the test
set. We performed extensive evaluation on the benchmark UCSD/Honda, CMU Mobo
and YouTube Celebrity datasets for face classification, and on the ETH-80
dataset for
object classification. The results demonstrate that by using only a small
amount of training data, our technique achieved competitive classification
accuracy and superior computational speed compared with the state-of-the-art
methods.
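A minimal sketch of the pipeline described above: each class’s downsampled
gallery images are stacked as the columns of a matrix, each test image is
regressed onto every class subspace, and votes are weighted by reconstruction
confidence. The exact vote weighting is an assumption; the paper’s precise
formulation may differ.

    import numpy as np

    def classify_image_set(test_images, gallery_subspaces):
        # test_images: (n, d) array of vectorized, downsampled test-set images.
        # gallery_subspaces: list of (d, m_c) matrices whose columns are the
        # downsampled gallery images of one class (no training step needed).
        votes = np.zeros(len(gallery_subspaces))
        for y in test_images:
            errors = []
            for G in gallery_subspaces:
                beta, *_ = np.linalg.lstsq(G, y, rcond=None)  # regression fit
                errors.append(np.linalg.norm(y - G @ beta))   # reconstruction error
            errors = np.asarray(errors)
            best = int(errors.argmin())
            votes[best] += 1.0 / (errors[best] + 1e-12)  # weight vote by confidence
        return int(votes.argmax())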
Dr. Manuela Hirschmugl, DI Heinz Gallaun, Dr. Matthias Dees, Dr. Pawan Datta, Mag. Janik Deutscher, Dr. Nikos Koutsias, Prof. Dr. Mathias Schardt
Comments: This is the Authors’ accepted version only! The final version of this paper can be located at Springer.com as part of the Current Forestry Reports!
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a review of the current state of the art in remote
sensing based monitoring of forest disturbances and forest degradation from
optical Earth Observation data. Part one comprises an overview and tabular
description of currently available optical remote sensing sensors, which can be
used for forest disturbance and degradation mapping. A section is devoted to
currently existing mapping approaches, including both operational methods and
recent developments. Part two reviews the two main categories of existing
mapping approaches: first, classical image-to-image change detection and
second, time series analysis. With the launch of the Sentinel-2a satellite and
available Landsat imagery, time series analysis has become the most promising
but also most demanding category of degradation mapping approaches. Hence, an
emphasis is put on methods of time series analysis, among which four different
classification methods are distinguished. The methods are explained and their
benefits and drawbacks are discussed. A separate chapter presents a number of
recent forest degradation mapping studies for two different ecosystems: The
first ecosystem comprises the temperate forests with a geographical focus on
Europe. The second ecosystem consists of the tropical forests with a
geographical focus on Africa. Mapping examples from both ecosystems help to
better illustrate the current state of the art.
Christoph Lassner, Javier Romero, Martin Kiefel, Federica Bogo, Michael J. Black, Peter V. Gehler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D models provide the common ground for different representations of human
bodies. In turn, robust 2D estimation has proven to be a powerful tool to
obtain 3D fits “in-the-wild”. However, depending on the level of detail, it
can be hard or even impossible to obtain labeled representations at large
scale.
We propose a hybrid approach to this problem: with an extended version of the
recently introduced SMPLify method, we obtain high quality 3D body model fits
to the core human pose datasets. Human annotators solely sort good and bad
fits. This enables us to efficiently build a large dataset with a rich
representation.
In a comprehensive set of experiments, we show how we can make use of this
data to push the limits of discriminative models. With segmentation into 31
body parts and keypoint detection with 91 landmarks, we present compelling
results for human analysis at an unprecedented level of detail.
Using our dense landmark set, we present state-of-the-art results for 3D
human pose and shape estimation, while having used an order of magnitude less
training data and making no assumptions about gender or pose in the fitting
procedure. We show that the initial dataset can be enhanced with these improved
fits to grow in quantity and quality, which makes the system deployable on
large scale.
Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Understanding a visual scene goes beyond recognizing individual objects in
isolation. Relationships between objects also constitute rich semantic
information about the scene. In this work, we explicitly model the objects and
their relationships using scene graphs, a visually-grounded graphical structure
of an image. We propose a novel end-to-end model that generates such structured
scene representation from an input image. The model solves the scene graph
inference problem using standard RNNs and learns to iteratively improve its
predictions via message passing. Our joint inference model can take advantage
of contextual cues to make better predictions on objects and their
relationships. The experiments show that our model significantly outperforms
previous methods on generating scene graphs using the Visual Genome dataset
and on inferring support relations with the NYU Depth v2 dataset.
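To make the alternating update concrete, here is a toy sketch of node/edge
message passing on a scene graph. The paper uses learned GRU units and pooled
messages; this sketch substitutes plain averaging and should be read only as
the structure of the iteration, not the authors’ model.

    import numpy as np

    def refine_scene_graph(node_h, edge_h, edges, n_iters=3, mix=0.5):
        # node_h: (n_nodes, d) node states; edge_h: (n_edges, d) edge states;
        # edges[k] = (i, j): edge k connects subject node i to object node j.
        for _ in range(n_iters):
            node_msg = np.zeros_like(node_h)
            node_cnt = np.zeros(len(node_h))
            for k, (i, j) in enumerate(edges):
                node_msg[i] += edge_h[k]; node_cnt[i] += 1
                node_msg[j] += edge_h[k]; node_cnt[j] += 1
            edge_msg = np.array([(node_h[i] + node_h[j]) / 2 for i, j in edges])
            # mix each state with the mean message from its graph neighbors
            node_h = (1 - mix) * node_h \
                     + mix * node_msg / np.maximum(node_cnt, 1)[:, None]
            edge_h = (1 - mix) * edge_h + mix * edge_msg
        return node_h, edge_h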
Brian Chu, Daylen Yang, Ravi Tadinada
Comments: UC Berkeley CS 280 final project report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Residual networks are the current state of the art on ImageNet. Similar work
in the direction of utilizing shortcut connections has been done extremely
recently with derivatives of residual networks and with highway networks. This
work potentially challenges our understanding that CNNs learn layers of local
features that are followed by increasingly global features. Through qualitative
visualization and empirical analysis, we explore the purpose that residual skip
connections serve. Our assessments show that the residual shortcut connections
force layers to refine features, as expected. We also provide alternate
visualizations that confirm that residual networks learn what is already
intuitively known about CNNs in general.
Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein
Journal-ref: ICASSP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Style transfer is an important task in which the style of a source image is
mapped onto that of a target image. The method is useful for synthesizing
derivative works of a particular artist or specific painting. This work
considers targeted style transfer, in which the style of a template image is
used to alter only part of a target image. For example, an artist may wish to
alter the style of only one particular object in a target image without
altering the object’s general morphology or surroundings. This is useful, for
example, in augmented reality applications (such as the recently released
Pokemon GO), where one wants to alter the appearance of a single real-world
object in an image frame to make it appear as a cartoon. Most notably, the
rendering of real-world objects into cartoon characters has been used in a
number of films and television shows, such as the upcoming series Son of Zorn.
We present a method for targeted style transfer that simultaneously segments
and stylizes single objects selected by the user. The method uses a Markov
random field model to smooth and anti-alias outlier pixels near object
boundaries, so that stylized objects naturally blend into their surroundings.
Xiaowei Zhou, Menglong Zhu, Georgios Pavlakos, Spyridon Leonardos, Kostantinos G. Derpanis, Kostas Daniilidis
Comments: The extended version of the following paper: Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video. X Zhou, M Zhu, S Leonardos, K Derpanis, K Daniilidis. CVPR 2016. arXiv admin note: substantial text overlap with arXiv:1511.09439
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recovering 3D full-body human pose is a challenging problem with many
applications. It has been successfully addressed by motion capture systems with
body worn markers and multiple cameras. In this paper, we address the more
challenging case of not only using a single camera but also not leveraging
markers: going directly from 2D appearance to 3D geometry. Deep learning
approaches have shown remarkable abilities to discriminatively learn 2D
appearance features. The missing piece is how to integrate 2D, 3D and temporal
information to recover 3D geometry and account for the uncertainties arising
from the discriminative model. We introduce a novel approach that treats 2D
joint locations as latent variables, whose uncertainty distributions are given
by a deep fully convolutional network. The unknown 3D poses are modeled by a
sparse representation and the 3D parameter estimates are realized via an
Expectation-Maximization algorithm, where it is shown that the 2D joint
location uncertainties can be conveniently marginalized out during inference.
Extensive evaluation on benchmark datasets shows that the proposed approach
achieves greater accuracy than state-of-the-art baselines. Notably, the
proposed approach does not require synchronized 2D-3D data for training and is
applicable to “in-the-wild” images, which is demonstrated with the MPII
dataset.
Ehsan Jahangiri, Erdem Yoruk, Rene Vidal, Laurent Younes, Donald Geman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Despite enormous progress in object detection and classification, the problem
of incorporating expected contextual relationships among object instances into
modern recognition systems remains a key challenge. In this work we propose
Information Pursuit, a Bayesian framework for scene parsing that combines prior
models for the geometry of the scene and the spatial arrangement of object
instances with a data model for the output of high-level image classifiers
trained to answer specific questions about the scene. In the proposed
framework, the scene interpretation is progressively refined as evidence
accumulates from the answers to a sequence of questions. At each step, we
choose the question to maximize the mutual information between the new answer
and the full interpretation given the current evidence obtained from previous
inquiries. We also propose a method for learning the parameters of the model
from synthesized, annotated scenes obtained by top-down sampling from an
easy-to-learn generative scene model. Finally, we introduce a database of
annotated indoor scenes of dining room tables, which we use to evaluate the
proposed approach.
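A sketch of the greedy question-selection step: when answers are deterministic
functions of the scene, the mutual information between an answer and the
interpretation equals the answer’s entropy under the current posterior, so the
step reduces to entropy maximization over candidate questions. Function and
variable names are illustrative assumptions.

    import numpy as np

    def next_question(scene_samples, answer_fns):
        # scene_samples: samples from the current posterior over interpretations;
        # answer_fns: dict mapping each candidate question to a function that
        # returns the binary answer the question would receive on a scene.
        best_q, best_h = None, -1.0
        for q, f in answer_fns.items():
            p = np.mean([float(f(s)) for s in scene_samples])  # P(answer = yes)
            h = 0.0 if p in (0.0, 1.0) else -(p * np.log2(p)
                                              + (1 - p) * np.log2(1 - p))
            if h > best_h:
                best_q, best_h = q, h
        return best_q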
Abhinav Thanda, Shankar M Venkatesan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-task learning (MTL) involves the simultaneous training of two or more
related tasks over shared representations. In this work, we apply MTL to
audio-visual automatic speech recognition (AV-ASR). Our primary task is to
learn a mapping between audio-visual fused features and frame labels obtained
from an acoustic GMM/HMM model. This is combined with an auxiliary task which
maps visual features to frame labels obtained from a separate visual GMM/HMM
model. The MTL model is tested at various levels of babble noise and the
results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results
indicate that MTL is especially useful at higher levels of noise. Compared to
the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
Daniel Meana-Llorián, Cristian González García, B. Cristina Pelayo G-Bustelo, Juan Manuel Cueva Lovelle, Nestor Garcia-Fernandez
Subjects: Artificial Intelligence (cs.AI)
The Internet of Things is arriving in our homes and cities through fields
already known as Smart Homes, Smart Cities, or Smart Towns. Monitoring the
environmental conditions of cities can help adapt their indoor locations to be
more comfortable for the people who stay there. One way to improve indoor
conditions is efficient temperature control; however, it depends on many
factors, such as the different combinations of outdoor temperature and
humidity. Therefore, adjusting the indoor temperature is not simply a matter
of setting one value according to another; there are many more factors to take
into consideration, so traditional logic based on binary states cannot be
used. Many problems cannot be solved with a set of binary solutions, and we
need a new way of development. Fuzzy logic is able to interpret many states,
not just two, giving computers the capacity to react in a way similar to
people. In this paper we propose a new approach to controlling the temperature
using the Internet of Things together with its platforms and fuzzy logic,
taking into account not only the indoor temperature but also the outdoor
temperature and humidity, in order to save energy and to set a more
comfortable environment for users. Finally, we conclude that the fuzzy
approach allows us to achieve an energy saving of around 40% and thus to save
money.
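A toy Mamdani-style controller in the spirit of the approach: the three inputs
are fuzzified, two rules fire to a degree, and a weighted-average
defuzzification yields a graded heating power instead of a binary on/off
state. The membership breakpoints and rules are illustrative assumptions, not
the paper’s.

    def triangular(x, a, b, c):
        # triangular membership function peaking at b
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def heating_power(indoor, outdoor, humidity):
        cold_in = triangular(indoor, 10.0, 15.0, 21.0)    # "indoors is cold"
        warm_in = triangular(indoor, 19.0, 24.0, 30.0)    # "indoors is warm"
        harsh_out = triangular(outdoor, -15.0, 0.0, 12.0) * min(1.0, humidity / 100.0)
        w_high = max(cold_in, harsh_out)  # rule 1: cold inside, or cold+humid outside
        w_low = warm_in                   # rule 2: warm inside -> low power
        # weighted-average defuzzification -> graded value in [0, 1]
        return (w_high * 1.0 + w_low * 0.0) / (w_high + w_low + 1e-9)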
Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, Xiuwen Yi, Tianrui Li
Comments: 21 pages, 16 figures. arXiv admin note: substantial text overlap with arXiv:1610.00081
Subjects: Artificial Intelligence (cs.AI)
Forecasting the flow of crowds is of great importance to traffic management
and public safety, and very challenging as it is affected by many complex
factors, including spatial dependencies (nearby and distant), temporal
dependencies (closeness, period, trend), and external conditions (e.g., weather
and events). We propose a deep-learning-based approach, called ST-ResNet, to
collectively forecast two types of crowd flows (i.e. inflow and outflow) in
each and every region of a city. We design an end-to-end structure of ST-ResNet
based on unique properties of spatio-temporal data. More specifically, we
employ the residual neural network framework to model the temporal closeness,
period, and trend properties of crowd traffic. For each property, we design a
branch of residual convolutional units, each of which models the spatial
properties of crowd traffic. ST-ResNet learns to dynamically aggregate the
output of the three residual neural networks based on data, assigning different
weights to different branches and regions. The aggregation is further combined
with external factors, such as weather and day of the week, to predict the
final traffic of crowds in each and every region. We have developed a real-time
system based on Microsoft Azure Cloud, called UrbanFlow, providing crowd flow
monitoring and forecasting for Guiyang City, China. In addition, we
present an extensive experimental evaluation using two types of crowd flows in
Beijing and New York City (NYC), where ST-ResNet outperforms nine well-known
baselines.
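The fusion step described above (per-region weighting of the three branch
outputs followed by a tanh squashing) can be sketched as follows; shapes are
illustrative, not the paper’s exact configuration.

    import numpy as np

    def fuse_branches(x_close, x_period, x_trend, W_c, W_p, W_t, ext):
        # x_*: (2, H, W) branch outputs (inflow/outflow grids over the city);
        # W_*: learned weights of the same shape (parametric element-wise fusion);
        # ext: external-factor term (weather, day of week) broadcast to the grid.
        fused = W_c * x_close + W_p * x_period + W_t * x_trend
        return np.tanh(fused + ext)  # tanh maps to the normalized flow range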
Gabriel Murray
Comments: Submitted to Canadian A.I. 2017 conference
Subjects: Artificial Intelligence (cs.AI)
We present a position paper advocating the notion that Stoic philosophy and
ethics can inform the development of ethical A.I. systems. This is in sharp
contrast to most work on building ethical A.I., which has focused on
Utilitarian or Deontological ethical theories. We relate ethical A.I. to
several core Stoic notions, including the dichotomy of control, the four
cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on
emotion or affect. More generally, we put forward an ethical view of A.I. that
focuses more on internal states of the artificial agent rather than on external
actions of the agent. We provide examples relating to near-term A.I. systems as
well as hypothetical superintelligent agents.
Diego Marcheggiani, Anton Frolov, Ivan Titov
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We introduce a simple and accurate neural model for dependency-based semantic
role labeling. Our model predicts predicate-argument dependencies relying on
states of a bidirectional LSTM encoder. The semantic role labeler achieves
respectable performance on English even without any kind of syntactic
information and only using local inference. However, when automatically
predicted part-of-speech tags are provided as input, it substantially
outperforms all previous local models and approaches the best reported results
on the CoNLL-2009 dataset. Syntactic parsers are unreliable on out-of-domain
data, so standard (i.e. syntactically-informed) SRL models are hindered when
tested in this setting. Our syntax-agnostic model appears more robust,
resulting in the best reported results on the standard out-of-domain test set.
Chris Heunen, Ohad Kammar, Sam Staton, Hongseok Yang
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Category Theory (math.CT); Probability (math.PR)
Higher-order probabilistic programming languages allow programmers to write
sophisticated models in machine learning and statistics in a succinct and
structured way, but step outside the standard measure-theoretic formalization
of probability theory. Programs may use both higher-order functions and
continuous distributions, or even define a probability distribution on
functions. But standard probability theory cannot support higher-order
functions, that is, the category of measurable spaces is not cartesian closed.
Here we introduce quasi-Borel spaces. We show that these spaces: form a new
formalization of probability theory replacing measurable spaces; form a
cartesian closed category and so support higher-order functions; form an
extensional category and so support good proof principles for equational
reasoning; and support continuous probability distributions. We demonstrate the
use of quasi-Borel spaces for higher-order functions and probability by:
showing that a well-known construction of probability theory involving random
functions gains a cleaner expression; and generalizing de Finetti’s theorem, a
crucial theorem in probability theory, to quasi-Borel spaces.
Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo
Comments: WSDM 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
The majority of online display ads are served through real-time bidding (RTB)
— each ad display impression is auctioned off in real-time when it is just
being generated from a user visit. To place an ad automatically and optimally,
it is critical for advertisers to devise a learning algorithm to cleverly bid
an ad impression in real-time. Most previous works consider the bid decision as
a static optimization problem of either treating the value of each impression
independently or setting a bid price to each segment of ad volume. However, the
bidding for a given ad campaign would repeatedly happen during its life span
before the budget runs out. As such, each bid is strategically correlated with
the others through the constrained budget and the overall effectiveness of the
campaign (e.g., the
rewards from generated clicks), which is only observed after the campaign has
completed. Thus, it is of great interest to devise an optimal bidding strategy
sequentially so that the campaign budget can be dynamically allocated across
all the available impressions on the basis of both the immediate and future
rewards. In this paper, we formulate the bid decision process as a
reinforcement learning problem, where the state space is represented by the
auction information and the campaign’s real-time parameters, while an action is
the bid price to set. By modeling the state transition via auction competition,
we build a Markov Decision Process framework for learning the optimal bidding
policy to optimize the advertising performance in the dynamic real-time bidding
environment. Furthermore, the scalability problem from the large real-world
auction volume and campaign budget is well handled by state value approximation
using neural networks.
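A toy dynamic program in the spirit of the MDP above, under strong simplifying
assumptions (integer prices, every impression worth one unit of reward, known
market-price distribution): it computes the value of each (auctions left,
budget) state. The paper’s contribution is precisely scaling such a recursion
up with neural state-value approximation.

    import numpy as np

    def bidding_value(price_probs, horizon, budget):
        # price_probs[p] = probability that the market price equals p.
        # V[t][m] = expected reward with t auctions left and budget m.
        V = np.zeros((horizon + 1, budget + 1))
        for t in range(1, horizon + 1):
            for m in range(budget + 1):
                best = 0.0
                for bid in range(m + 1):  # candidate bid prices
                    v = 0.0
                    for price, prob in enumerate(price_probs):
                        if price <= bid:  # win: pay the market price
                            v += prob * (1.0 + V[t - 1][m - price])
                        else:             # lose: keep the budget
                            v += prob * V[t - 1][m]
                    best = max(best, v)
                V[t][m] = best
        return V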
Abhinav Thanda, Shankar M Venkatesan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-task learning (MTL) involves the simultaneous training of two or more
related tasks over shared representations. In this work, we apply MTL to
audio-visual automatic speech recognition (AV-ASR). Our primary task is to
learn a mapping between audio-visual fused features and frame labels obtained
from an acoustic GMM/HMM model. This is combined with an auxiliary task which
maps visual features to frame labels obtained from a separate visual GMM/HMM
model. The MTL model is tested at various levels of babble noise and the
results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results
indicate that MTL is especially useful at higher levels of noise. Compared to
the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
Tanmay Shankar, Santosha K. Dwivedy, Prithwijit Guha
Comments: Accepted at the International Conference on Pattern Recognition, ICPR 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep Reinforcement Learning has enabled the learning of policies for complex
tasks in partially observable environments, without explicitly learning the
underlying model of the tasks. While such model-free methods achieve
considerable performance, they often ignore the structure of the task. We
present a natural representation of Reinforcement Learning (RL) problems using
Recurrent Convolutional Neural Networks (RCNNs), to better exploit this
inherent structure. We define 3 such RCNNs, whose forward passes execute an
efficient Value Iteration, propagate beliefs of state in partially observable
environments, and choose optimal actions respectively. Backpropagating
gradients through these RCNNs allows the system to explicitly learn the
Transition Model and Reward Function associated with the underlying MDP,
serving as an elegant alternative to classical model-based RL. We evaluate the
proposed algorithms in simulation, considering a robot planning problem. We
demonstrate the capability of our framework to reduce the cost of replanning,
learn accurate MDP models, and finally re-plan with learnt models to achieve
near-optimal policies.
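The core observation, that a Bellman backup over a grid MDP is a convolution,
can be sketched as below. In the paper the kernels (the transition model) and
rewards are learned by backpropagating through this loop; here they are fixed
inputs.

    import numpy as np
    from scipy.signal import convolve2d

    def value_iteration_conv(reward, kernels, gamma=0.95, n_iters=100):
        # reward: (H, W) grid of rewards; kernels: one small 2-D
        # transition-probability kernel per action.
        V = np.zeros_like(reward, dtype=float)
        for _ in range(n_iters):
            Q = np.stack([reward + gamma * convolve2d(V, k, mode="same")
                          for k in kernels])  # one Q-value channel per action
            V = Q.max(axis=0)                 # max-pool over the action channel
        return V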
Kory W. Mathewson, Patrick M. Pilarski
Comments: 4 pages, 2 figures, Accepted at the 2017 AAAI Spring Symposium on Interactive Multi-Sensory Object Perception for Embodied Agents
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
This paper extends recent work in interactive machine learning (IML) focused
on effectively incorporating human feedback. We show how control and feedback
signals complement each other in systems which model human reward. We
demonstrate that simultaneously incorporating human control and feedback
signals can improve interactive robotic systems’ performance on a self-mirrored
movement control task where an RL-agent controlled right arm attempts to match
the preprogrammed movement pattern of the left arm. We illustrate the impact of
varying human feedback parameters on task performance by investigating the
probability of giving feedback on each time step and the likelihood of given
feedback being correct. We further illustrate that varying the temporal decay
with which the agent incorporates human feedback has a significant impact on
task performance. We found that ‘smearing’ human feedback over time steps
improves performance, and we show that varying the probability of feedback at
each time step and increasing the likelihood of that feedback being ‘correct’
can both impact agent performance. We conclude that understanding latent
variables in
human feedback is crucial for learning algorithms acting in human-machine
interaction domains.
Markus Viljanen, Antti Airola, Jukka Heikkonen, Tapio Pahikkala
Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Maximizing product use is a central goal of many businesses, which makes
retention and monetization two central analytics metrics in games. Player
retention may refer to various duration variables quantifying product use:
total playtime or session playtime are popular research targets, and active
playtime is well-suited for subscription games. Such research often has the
goal of increasing player retention or conversely decreasing player churn.
Survival analysis is a framework of powerful tools well suited for retention
type data. This paper contributes new methods to game analytics on how playtime
can be analyzed using survival analysis without covariates. Survival and hazard
estimates provide both a visual and an analytic interpretation of the playtime
phenomena as a funnel type nonparametric estimate. Metrics based on the
survival curve can be used to aggregate this playtime information into a single
statistic. Comparison of survival curves between cohorts provides a scientific
A/B test. All these methods work on censored data and enable computation of
confidence intervals. This is especially important in time and sample limited
data which occurs during game development. Throughout this paper, we illustrate
the application of these methods to real world game development problems on the
Hipster Sheep mobile game.
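The covariate-free workhorse here is the Kaplan-Meier product-limit estimator;
a minimal sketch for right-censored playtime data follows (confidence
intervals, e.g. via Greenwood’s formula, are omitted).

    import numpy as np

    def kaplan_meier(playtimes, churn_observed):
        # playtimes: array of playtime durations; churn_observed[i] is True if
        # the player actually quit (event), False if the playtime is censored
        # (the player was still active when the data was collected).
        t = np.asarray(playtimes, dtype=float)
        d = np.asarray(churn_observed, dtype=bool)
        s, curve = 1.0, []
        for ti in np.unique(t[d]):              # distinct event times
            at_risk = np.sum(t >= ti)           # still playing just before ti
            events = np.sum((t == ti) & d)      # players quitting at ti
            s *= 1.0 - events / at_risk         # product-limit update
            curve.append((ti, s))
        return curve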
Ehsan Jahangiri, Erdem Yoruk, Rene Vidal, Laurent Younes, Donald Geman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Despite enormous progress in object detection and classification, the problem
of incorporating expected contextual relationships among object instances into
modern recognition systems remains a key challenge. In this work we propose
Information Pursuit, a Bayesian framework for scene parsing that combines prior
models for the geometry of the scene and the spatial arrangement of object
instances with a data model for the output of high-level image classifiers
trained to answer specific questions about the scene. In the proposed
framework, the scene interpretation is progressively refined as evidence
accumulates from the answers to a sequence of questions. At each step, we
choose the question to maximize the mutual information between the new answer
and the full interpretation given the current evidence obtained from previous
inquiries. We also propose a method for learning the parameters of the model
from synthesized, annotated scenes obtained by top-down sampling from an
easy-to-learn generative scene model. Finally, we introduce a database of
annotated indoor scenes of dining room tables, which we use to evaluate the
proposed approach.
Anasua Mitra, Amit Awekar
Comments: 2 pages, submitted to ACM WWW Conference 2017
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)
The number of published scholarly articles is growing exponentially. To tackle
this information overload, researchers increasingly depend on niche academic
search engines. Recent work has shown that the two major general web search
engines, Google and Bing, have a high level of agreement in their top search
results. In contrast, we show that various academic search engines have a low
degree of agreement among themselves. We performed experiments using 2500
queries over four academic search engines. We observe that the overlap in the
search result sets of any pair of academic search engines is significantly
low, and in most cases the search result sets are mutually exclusive. We also
discuss the implications of this low overlap.
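The abstract does not name its exact agreement metric; one common choice for
comparing engines is Jaccard overlap on truncated result lists, sketched
below.

    def topk_overlap(results_a, results_b, k=10):
        # Fraction of shared items in two engines' top-k result sets:
        # 0.0 means mutually exclusive, 1.0 means complete agreement.
        a, b = set(results_a[:k]), set(results_b[:k])
        return len(a & b) / len(a | b) if (a | b) else 0.0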
Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Convolutional Neural Networks (CNNs) are effective models for reducing
spectral variations and modeling spectral correlations in acoustic features for
automatic speech recognition (ASR). Hybrid speech recognition systems
incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models
(HMMs/GMMs) have achieved the state-of-the-art in various benchmarks.
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural
Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it
feasible to train an end-to-end speech recognition system instead of hybrid
settings. However, RNNs are computationally expensive and sometimes difficult
to train. In this paper, inspired by the advantages of both CNNs and the CTC
approach, we propose an end-to-end speech framework for sequence labeling, by
combining hierarchical CNNs with CTC directly without recurrent connections. By
evaluating the approach on the TIMIT phoneme recognition task, we show that the
proposed model is not only computationally efficient, but also competitive with
the existing baseline systems. Moreover, we argue that CNNs have the capability
to model temporal correlations with appropriate context information.
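A minimal PyTorch sketch of the recurrent-free recipe: a stack of 1-D
convolutions over the acoustic features with a CTC loss on the frame-wise
log-probabilities. Layer sizes and the 61-phone-plus-blank output are
illustrative assumptions, not the paper’s exact configuration.

    import torch
    import torch.nn as nn

    class ConvCTC(nn.Module):
        def __init__(self, n_feats=40, n_classes=62):  # 61 TIMIT phones + blank
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv1d(n_feats, 128, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            )
            self.head = nn.Conv1d(128, n_classes, kernel_size=1)

        def forward(self, x):                       # x: (batch, n_feats, time)
            y = self.head(self.body(x))             # (batch, n_classes, time)
            return y.permute(2, 0, 1).log_softmax(dim=-1)  # (time, batch, classes)

    loss_fn = nn.CTCLoss(blank=0)  # CTC expects log-probs in (T, N, C) layout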
Diego Marcheggiani, Anton Frolov, Ivan Titov
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We introduce a simple and accurate neural model for dependency-based semantic
role labeling. Our model predicts predicate-argument dependencies relying on
states of a bidirectional LSTM encoder. The semantic role labeler achieves
respectable performance on English even without any kind of syntactic
information and only using local inference. However, when automatically
predicted part-of-speech tags are provided as input, it substantially
outperforms all previous local models and approaches the best reported results
on the CoNLL-2009 dataset. Syntactic parsers are unreliable on out-of-domain
data, so standard (i.e. syntactically-informed) SRL models are hindered when
tested in this setting. Our syntax-agnostic model appears more robust,
resulting in the best reported results on the standard out-of-domain test set.
Yang Xu, Jiawei Liu
Comments: 7 pages, 7 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
In this paper, we implicitly incorporate morpheme information into word
embedding. Based on the strategy by which we utilize the morpheme information,
three models are proposed. To test the performance of our models, we conduct
word similarity and syntactic analogy tasks. The results demonstrate the
effectiveness of our methods: our models beat the comparative baselines on
both tasks by a large margin. On the gold-standard Wordsim-353 and RG-65
datasets, our models outperform CBOW by approximately 5 and 7 percent,
respectively. In addition, our models also achieve a 7 percent advantage on
the syntactic analogy task. According to the parameter analysis, our models
can increase the semantic information in the corpus, and their performance on
the smallest corpus is similar to that of CBOW on a corpus five times larger.
This property of our methods may have positive effects on NLP research on
corpus-limited languages.
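One additive way morpheme information can enter an embedding, sketched below;
the paper proposes three variants, and this mirrors only the general idea. The
segmentation dictionary and mixing weight are assumptions.

    import numpy as np

    def compose_word_vector(word, segmentation, word_emb, morph_emb, alpha=0.5):
        # segmentation: dict word -> list of its morphemes (precomputed);
        # the representation mixes the word's own vector with the mean of
        # its morpheme vectors.
        m = np.mean([morph_emb[mo] for mo in segmentation[word]], axis=0)
        return alpha * word_emb[word] + (1.0 - alpha) * m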
Abhinav Thanda, Shankar M Venkatesan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-task learning (MTL) involves the simultaneous training of two or more
related tasks over shared representations. In this work, we apply MTL to
audio-visual automatic speech recognition (AV-ASR). Our primary task is to
learn a mapping between audio-visual fused features and frame labels obtained
from an acoustic GMM/HMM model. This is combined with an auxiliary task which
maps visual features to frame labels obtained from a separate visual GMM/HMM
model. The MTL model is tested at various levels of babble noise and the
results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results
indicate that MTL is especially useful at higher levels of noise. Compared to
the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
Mustafa Kemal Taş, Kamer Kaya, Erik Saule
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
In parallel computing, a valid graph coloring yields a lock-free processing
of the colored tasks, data points, etc., without expensive synchronization
mechanisms. However, coloring is not free and the overhead can be significant.
In particular, for the bipartite-graph partial coloring (BGPC) and distance-2
graph coloring (D2GC) problems, which have various use-cases within the
scientific computing and numerical optimization domains, the coloring overhead
can be on the order of minutes with a single thread for many real-life graphs.
In this work, we propose parallel algorithms for bipartite-graph partial
coloring on shared-memory architectures. Compared to the existing shared-memory
BGPC algorithms, the proposed ones employ greedier and more optimistic
techniques that yield a better parallel coloring performance. In particular, on
16 cores, the proposed algorithms perform more than 4x faster than their
counterparts in the ColPack library which is, to the best of our knowledge, the
only publicly-available coloring library for multicore architectures. In
addition to BGPC, the proposed techniques are employed to devise parallel
distance-2 graph coloring algorithms and similar performance improvements have
been observed. Finally, we propose two costless balancing heuristics for BGPC
that can reduce the skewness and imbalance on the cardinality of color sets
(almost) for free. The heuristics can also be used for the D2GC problem and in
general, they will probably yield better color-based parallelization
performance, especially on many-core architectures.
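For reference, the sequential baseline that the parallel algorithms speed up
looks like the sketch below: two left-side vertices conflict when they share a
right-side neighbor (a distance-2 relation), and each vertex greedily takes
the smallest color absent from its distance-2 neighborhood. The paper’s
contribution, the speculative parallel version and the balancing heuristics,
is not shown.

    def bgpc_greedy(adj_left, n_right):
        # adj_left[u] = iterable of right-side neighbors of left vertex u.
        right_owners = [[] for _ in range(n_right)]
        colors = [-1] * len(adj_left)
        for u, nbrs in enumerate(adj_left):
            # colors already used at distance exactly two from u
            forbidden = {colors[w] for v in nbrs for w in right_owners[v]}
            c = 0
            while c in forbidden:
                c += 1   # smallest available color
            colors[u] = c
            for v in nbrs:
                right_owners[v].append(u)
        return colors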
Rabindra K. Barik, Harishchandra Dubey, Arun B. Samaddar, Rajan D. Gupta, Prakash K. Ray
Comments: 6 pages, 4 figures, 1 table, 3rd IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (09-11 December, 2016) Indian Institute of Technology (Banaras Hindu University) Varanasi, India
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cloud Geographic Information Systems (GIS) have emerged as a tool for the
analysis, processing and transmission of geospatial data. Fog computing is a
paradigm in which Fog devices help to increase throughput and reduce latency
at the edge of the network, close to the client. This paper develops a
Fog-based framework named FogGIS for mining analytics from geospatial data. We
built a prototype using Intel Edison, an embedded microprocessor, and
validated FogGIS by preliminary analysis, including compression and overlay
analysis. Results showed that Fog computing holds great promise for the
analysis of geospatial data. We used several open source compression
techniques for reducing the transmission to the cloud.
Ofer Feinerman, Amos Korman (IRIF, GANG)
Comments: arXiv admin note: text overlap with arXiv:1205.4545
Journal-ref: Distributed Computing, Springer Verlag, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We introduce the Ants Nearby Treasure Search (ANTS) problem, which models
natural cooperative foraging behavior such as that performed by ants around
their nest. In this problem, k probabilistic agents, initially placed at a
central location, collectively search for a treasure on the two-dimensional
grid. The treasure is placed at a target location by an adversary and the
agents’ goal is to find it as fast as possible as a function of both k and D,
where D is the (unknown) distance between the central location and the target.
We concentrate on the case in which agents cannot communicate while searching.
It is straightforward to see that the time until at least one agent finds the
target is at least Ω(D + D^2/k), even for very sophisticated agents with
unrestricted memory. Our algorithmic analysis aims at establishing
connections between the time complexity and the initial knowledge held by
agents (e.g., regarding their total number k), as they commence the search. We
provide a range of both upper and lower bounds for the initial knowledge
required for obtaining fast running time. For example, we prove that
log log k + Θ(1) bits of initial information are both necessary and sufficient
to obtain asymptotically optimal running time, i.e., O(D + D^2/k). We also
prove that for every 0 < ε < 1, running in time O(log^{1-ε} k · (D + D^2/k))
requires that agents have the capacity for storing Ω(log^ε k) different states
as they leave the nest to start the search. To
the best of our knowledge, the lower bounds presented in this paper provide the
first non-trivial lower bounds on the memory complexity of probabilistic agents
in the context of search problems. We view this paper as a “proof of concept”
for a new type of interdisciplinary methodology. To fully demonstrate this
methodology, the theoretical tradeoff presented here (or a similar one) should
be combined with measurements of the time performance of searching ants.
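For intuition behind the Ω(D + D^2/k) bound quoted above (a sketch of the
standard counting argument, not the paper’s proof): an agent needs at least D
steps just to reach a cell at distance D, and in T steps the k agents can
jointly visit at most kT cells, while Θ(D^2) cells lie within distance D of
the nest. Hence

    T \ge D \quad\text{and}\quad kT = \Omega(D^2)
    \;\Longrightarrow\; T = \Omega\!\Big(D + \frac{D^2}{k}\Big).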
Yuqing Zhu, Philip S. Yu, Yungang Bao, Guolei Yi, Wenlong Ma, Mengying Guo, Jianxun Liu
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Highly-available datastores are widely deployed for online applications.
However, many online applications are not content with the simple data access
interface currently provided by highly-available datastores. Distributed
transaction support is demanded by applications such as the large-scale online
payment services of Alipay or Paypal. Current solutions to distributed
transactions can spend more than half of the whole transaction processing time
in distributed commit. An efficient atomic commit protocol is highly
desirable.
This paper presents the HACommit protocol, a logless one-phase commit protocol
for highly-available systems. HACommit has transaction participants vote for a
commit before the client decides to commit or abort the transaction; in
comparison, the state-of-the-art practice for distributed commit is to have the
client decide before participants vote. The change enables the removal of both
the participant logging and the coordinator logging steps in the distributed
commit process; it also makes it possible that, after the client initiates the
transaction commit, the transaction data is visible to other transactions
within one communication roundtrip time (i.e., one phase). In the evaluation
with extensive experiments, HACommit outperforms recent atomic commit solutions
for highly-available datastores under different workloads. In the best case,
HACommit can commit in one fifth of the time 2PC does.
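A heavily simplified sketch of the ordering HACommit changes, with
hypothetical vote/apply methods standing in for the real protocol messages; it
shows only why the decision fits in one phase, not the failure handling that
makes the protocol logless and highly available.

    def hacommit_commit(participants, txn):
        # Participants vote *before* the client decides (piggybacked on the
        # transaction's last operation in the real protocol).
        votes = [p.vote(txn) for p in participants]       # hypothetical API
        decision = "commit" if all(votes) else "abort"
        # One phase: the client broadcasts its decision. No coordinator or
        # participant logging is needed, because the vote state is already
        # replicated by the highly-available datastore.
        for p in participants:
            p.apply(txn, decision)                        # data becomes visible
        return decision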
Chenhan D. Yu, William B. March, George Biros
Comments: in proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA)
Kernel matrices appear in machine learning and non-parametric statistics.
Given N points in d dimensions and a kernel function that requires O(d) work
to evaluate, we present an O(dN log N)-work algorithm for the approximate
factorization of a regularized kernel matrix, a common computational
bottleneck in the training phase of a learning task. With this factorization,
solving a linear system with a kernel matrix can be done with O(N log N) work.
Our algorithm only requires kernel
evaluations and does not require that the kernel matrix admits an efficient
global low rank approximation. Instead our factorization only assumes low-rank
properties for the off-diagonal blocks under an appropriate row and column
ordering. We also present a hybrid method that, when the factorization is
prohibitively expensive, combines a partial factorization with iterative
methods. As a highlight, we are able to approximately factorize a dense
11M × 11M kernel matrix in 2 minutes on 3,072 x86 “Haswell” cores and a
4.5M × 4.5M matrix in 1 minute using 4,352 “Knights Landing” cores.
Vladimir Savic, Elad M. Schiller, Marina Papatriantafilou
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Vehicle-to-vehicle (V2V) communication is a crucial component of future
autonomous driving systems, since it enables improved awareness of the
surrounding environment, even without extensive processing of sensory
information. However, V2V communication is prone to failures and delays, so a
distributed fault-tolerant approach is required for safe and efficient
transportation. In this paper, we focus on the intersection crossing (IC)
problem with autonomous vehicles that cooperate via V2V communications, and
propose a novel distributed IC algorithm that can handle an unknown number of
communication failures. Our analysis shows that both safety and liveness
requirements are satisfied in all realistic situations. We also found, based on
a real data set, that the crossing delay is only slightly increased even in the
presence of highly correlated failures.
Feng Liu, Guanquan Zhang, Haiyan Lu, Jie Lu
Comments: 48 Pages, 4 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Transfer learning addresses the problem of how to leverage previously
acquired knowledge (a source domain) to improve the efficiency of learning in a
new domain (the target domain). Although transfer learning has been widely
researched in the last decade, existing research still has two restrictions: 1)
the feature spaces of the domains must be homogeneous; and 2) the target domain
must have at least a few labelled instances. These restrictions significantly
limit transfer learning models when transferring knowledge across domains,
especially in the big data era. To completely break through both of these
bottlenecks, a theorem for reliable unsupervised knowledge transfer is proposed
to avoid negative transfers, and a Grassmann manifold is applied to measure the
distance between heterogeneous feature spaces. Based on this theorem and the
Grassmann manifold, this study proposes two heterogeneous unsupervised
knowledge transfer (HeUKT) models, known as RLG and GLG. The RLG uses a linear
monotonic map (LMM) to reliably project two heterogeneous feature spaces onto a
latent feature space and applies a geodesic flow kernel (GFK) model to
transfer knowledge between the two projected domains. The GLG optimizes the
LMM to
achieve the highest possible accuracy and guarantees that the geometric
properties of the domains remain unchanged during the transfer process. To test
the overall effectiveness of two models, this paper reorganizes five public
datasets into ten heterogeneous cross-domain tasks across three application
fields: credit assessment, text classification, and cancer detection. Extensive
experiments demonstrate that the proposed models deliver superior performance
over current benchmarks, and that these HeUKT models are a promising way to
give computers the associative ability to judge unknown things using related
known knowledge.
Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo
Comments: WSDM 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
The majority of online display ads are served through real-time bidding (RTB)
— each ad display impression is auctioned off in real-time when it is just
being generated from a user visit. To place an ad automatically and optimally,
it is critical for advertisers to devise a learning algorithm to cleverly bid
an ad impression in real-time. Most previous works consider the bid decision as
a static optimization problem of either treating the value of each impression
independently or setting a bid price to each segment of ad volume. However, the
bidding for a given ad campaign would repeatedly happen during its life span
before the budget runs out. As such, each bid is strategically correlated with
the others through the constrained budget and the overall effectiveness of the
campaign (e.g., the
rewards from generated clicks), which is only observed after the campaign has
completed. Thus, it is of great interest to devise an optimal bidding strategy
sequentially so that the campaign budget can be dynamically allocated across
all the available impressions on the basis of both the immediate and future
rewards. In this paper, we formulate the bid decision process as a
reinforcement learning problem, where the state space is represented by the
auction information and the campaign’s real-time parameters, while an action is
the bid price to set. By modeling the state transition via auction competition,
we build a Markov Decision Process framework for learning the optimal bidding
policy to optimize the advertising performance in the dynamic real-time bidding
environment. Furthermore, the scalability problem from the large real-world
auction volume and campaign budget is well handled by state value approximation
using neural networks.
Maziar Raissi, George Em. Karniadakis
Subjects: Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
This work leverages recent advances in probabilistic machine learning to
discover conservation laws expressed by parametric linear equations. Such
equations involve, but are not limited to, ordinary and partial differential,
integro-differential, and fractional order operators. Here, Gaussian process
priors are modified according to the particular form of such operators and are
employed to infer parameters of the linear equations from scarce and possibly
noisy observations. Such observations may come from experiments or “black-box”
computer simulations.
Tanmay Shankar, Santosha K. Dwivedy, Prithwijit Guha
Comments: Accepted at the International Conference on Pattern Recognition, ICPR 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep Reinforcement Learning has enabled the learning of policies for complex
tasks in partially observable environments, without explicitly learning the
underlying model of the tasks. While such model-free methods achieve
considerable performance, they often ignore the structure of the task. We
present a natural representation of Reinforcement Learning (RL) problems using
Recurrent Convolutional Neural Networks (RCNNs), to better exploit this
inherent structure. We define 3 such RCNNs, whose forward passes execute an
efficient Value Iteration, propagate beliefs of state in partially observable
environments, and choose optimal actions respectively. Backpropagating
gradients through these RCNNs allows the system to explicitly learn the
Transition Model and Reward Function associated with the underlying MDP,
serving as an elegant alternative to classical model-based RL. We evaluate the
proposed algorithms in simulation, considering a robot planning problem. We
demonstrate the capability of our framework to reduce the cost of replanning,
learn accurate MDP models, and finally re-plan with learnt models to achieve
near-optimal policies.
Marco Gori, Marco Maggini, Alessandro Rossi
Subjects: Learning (cs.LG)
In this document we show a first implementation and some preliminary results
of a new theory that frames Machine Learning problems within the frameworks of
Classical Mechanics and Variational Calculus. We give a general formulation of
the problem and then study the basic behavior of the model in simple practical
implementations.
Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
Convolutional Neural Networks (CNNs) are effective models for reducing
spectral variations and modeling spectral correlations in acoustic features for
automatic speech recognition (ASR). Hybrid speech recognition systems
incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models
(HMMs/GMMs) have achieved the state-of-the-art in various benchmarks.
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural
Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it
feasible to train an end-to-end speech recognition system instead of hybrid
settings. However, RNNs are computationally expensive and sometimes difficult
to train. In this paper, inspired by the advantages of both CNNs and the CTC
approach, we propose an end-to-end speech recognition framework for sequence
labeling, combining hierarchical CNNs with CTC directly, without recurrent
connections. By
evaluating the approach on the TIMIT phoneme recognition task, we show that the
proposed model is not only computationally efficient, but also competitive with
the existing baseline systems. Moreover, we argue that CNNs have the capability
to model temporal correlations with appropriate context information.
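A recurrence-free CNN + CTC acoustic model of the kind described can be
assembled in a few lines; the sketch below uses hypothetical layer sizes and
random tensors purely to show the wiring, not the authors' architecture.

    import torch
    import torch.nn as nn

    num_feats, num_classes = 40, 62   # e.g. filterbank dims, phone set + blank

    model = nn.Sequential(            # 1-D convs over time, no recurrence
        nn.Conv1d(num_feats, 128, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(128, num_classes, kernel_size=1),
    )
    ctc = nn.CTCLoss(blank=0)

    x = torch.randn(4, num_feats, 200)       # 4 utterances, 200 frames each
    log_probs = model(x).log_softmax(dim=1)  # (N, C, T)
    log_probs = log_probs.permute(2, 0, 1)   # CTC expects (T, N, C)
    targets = torch.randint(1, num_classes, (4, 30))
    loss = ctc(log_probs, targets,
               input_lengths=torch.full((4,), 200, dtype=torch.long),
               target_lengths=torch.full((4,), 30, dtype=torch.long))
    loss.backward()
    print(float(loss))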
Yang Xu, Jiawei Liu
Comments: 7 pages, 7 figures
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
In this paper, we implicitly incorporate morpheme information into word
embeddings. We propose three models, which differ in the strategy used to
incorporate the morpheme information. To test the performance of our models,
we conduct word similarity and syntactic analogy experiments. The results
demonstrate the effectiveness of our methods: our models outperform the
comparative baselines on both tasks by a large margin. On the gold-standard
WordSim-353 and RG-65 datasets, our models outperform CBOW by approximately 5
and 7 percent, respectively. A 7 percent advantage is also achieved by our
models on the syntactic analogy task. Parameter analysis shows that our models
extract more semantic information from the corpus, and their performance on the
smallest corpus is similar to that of CBOW on a corpus five times larger. This
property of our methods may benefit NLP research on corpus-limited languages.
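For illustration only (the paper proposes three specific models, none of which
is reproduced here), one simple way to fold morpheme information into word
embeddings is additive composition, so that morphologically related words share
representation mass.

    import numpy as np

    dim = 50
    rng = np.random.default_rng(0)
    morph_vecs = {m: rng.standard_normal(dim) for m in ["un", "break", "able"]}
    word_vecs = {w: rng.standard_normal(dim)
                 for w in ["unbreakable", "breakable"]}

    def embed(word, morphs):
        # Word vector plus the mean of its morpheme vectors (toy scheme).
        return word_vecs[word] + np.mean([morph_vecs[m] for m in morphs], axis=0)

    u = embed("unbreakable", ["un", "break", "able"])
    b = embed("breakable", ["break", "able"])
    cos = u @ b / (np.linalg.norm(u) * np.linalg.norm(b))
    print("cosine(unbreakable, breakable) =", round(float(cos), 3))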
Abhinav Thanda, Shankar M Venkatesan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-task learning (MTL) involves the simultaneous training of two or more
related tasks over shared representations. In this work, we apply MTL to
audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn
a mapping between audio-visual fused features and frame labels obtained from
acoustic GMM/HMM model. This is combined with an auxiliary task which maps
visual features to frame labels obtained from a separate visual GMM/HMM model.
The MTL model is tested at various levels of babble noise and the results are
compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate
that MTL is especially useful at higher levels of noise. Compared to the
baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
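The shared-representation wiring can be sketched in a few lines; the
dimensions, sharing scheme, and auxiliary loss weight below are assumptions,
not the authors' exact architecture.

    import torch
    import torch.nn as nn

    # Shared visual encoder serves both the primary audio-visual task and the
    # auxiliary visual-only task; losses are summed with an assumed weight.
    a_dim, v_dim, n_senones = 120, 60, 1000
    v_enc = nn.Sequential(nn.Linear(v_dim, 128), nn.ReLU())   # shared layers
    a_enc = nn.Sequential(nn.Linear(a_dim, 128), nn.ReLU())
    primary = nn.Linear(256, n_senones)    # frame labels from the AV GMM/HMM
    auxiliary = nn.Linear(128, n_senones)  # frame labels from the visual GMM/HMM

    audio, vis = torch.randn(32, a_dim), torch.randn(32, v_dim)
    y_av = torch.randint(0, n_senones, (32,))   # stand-in alignment targets
    y_v = torch.randint(0, n_senones, (32,))
    ce = nn.CrossEntropyLoss()
    hv, ha = v_enc(vis), a_enc(audio)
    loss = ce(primary(torch.cat([ha, hv], dim=1)), y_av) \
           + 0.3 * ce(auxiliary(hv), y_v)       # auxiliary weight assumed
    loss.backward()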
Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an
effective method for training generative models of complex data such as natural
images. However, they are notoriously hard to train and can suffer from the
problem of missing modes where the model is not able to produce examples in
certain regions of the space. We propose an iterative procedure, called AdaGAN,
where at every step we add a new component into a mixture model by running a
GAN algorithm on a reweighted sample. This is inspired by boosting algorithms,
where many potentially weak individual predictors are greedily aggregated to
form a strong composite predictor. We prove that such an incremental procedure
leads to convergence to the true distribution in a finite number of steps if
each step is optimal, and convergence at an exponential rate otherwise. We also
illustrate experimentally that this procedure addresses the problem of missing
modes.
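Structurally, the procedure is a boosting-style meta-loop around any GAN
trainer; the sketch below shows only that structure, with crude stand-ins for
the GAN and for the discriminator-based reweighting (toy 1-D two-mode data;
the paper derives the optimal weights and mixing schedule, which are not
reproduced here).

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-4, 0.5, 500), rng.normal(4, 0.5, 500)])

    def train_component(data, w):
        # Stand-in for "run a GAN on a reweighted sample": resample by w and
        # return a sampler that bootstraps from that sample.
        boot = data[rng.choice(len(data), len(data), p=w)]
        return lambda n: boot[rng.choice(len(boot), n)]

    components = []
    w = np.full(len(data), 1.0 / len(data))
    for t in range(3):
        components.append(train_component(data, w))          # next component
        fake = np.concatenate([c(400) for c in components])  # uniform mixing
        # Reweight: upweight training points the current mixture misses
        # (stand-in for the paper's discriminator-derived weights).
        dist = np.abs(data[:, None] - fake[None, :]).min(axis=1)
        w = dist / dist.sum() if dist.sum() > 0 else \
            np.full(len(data), 1.0 / len(data))
    print("trained", len(components), "mixture components")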
Subhro Das, José M. F. Moura
Comments: Accepted for publication at ICASSP 2017
Subjects: Information Theory (cs.IT)
This work presents distributed algorithms for estimation of time-varying
random fields over multi-agent/sensor networks. A network of sensors makes
sparse and noisy local measurements of the dynamic field. Each sensor aims to
obtain unbiased distributed estimates of the entire field with bounded
mean-squared error (MSE) based on its own local observations and its neighbors’
estimates. This work develops three novel distributed estimators:
Pseudo-Innovations Kalman Filter (PIKF), Distributed Information Kalman Filter
(DIKF) and Consensus+Innovations Kalman Filter (CIKF). We design the gain
matrices such that the estimators achieve unbiased estimates with bounded MSE
under minimal assumptions on the local observation and network communication
models. This work establishes trade-offs between these three distributed
estimators and demonstrates how they outperform existing solutions. We validate
our results through extensive numerical evaluations.
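For orientation, estimators of this consensus+innovations type generally share
the template below (a generic form for intuition, not the exact recursions of
the PIKF, DIKF, or CIKF): each sensor i updates

    \hat{x}_i(t+1) = A\,\hat{x}_i(t) + K_i\big(y_i(t) - H_i\,\hat{x}_i(t)\big)
                     + B_i \sum_{j \in \mathcal{N}_i}\big(\hat{x}_j(t) - \hat{x}_i(t)\big),

combining a prediction through the field dynamics A, an innovation term driven
by the local measurement y_i(t), and a consensus term that pulls the estimate
toward those of its neighbors N_i; the design problem the paper addresses is
choosing the gains K_i and B_i so that the estimates stay unbiased with bounded
MSE.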
Juerg Kohlas
Subjects: Information Theory (cs.IT)
The basic idea behind information algebras is that information comes in
pieces, each referring to a certain question, that these pieces can be combined
or aggregated and that the part relating to a given question can be extracted.
This algebraic structure can be given different forms. Questions were
originally represented by subsets of variables. Pieces of information were then
represented by valuations associated with the domains of variables. This leads
to an algebraic structure called valuation algebras. The basic axiomatics of
this algebraic structure was in essence proposed by Shenoy and Shafer. Here a
much more general view of systems of questions is proposed and pieces of
information are related to the elements of this system of questions. This leads
to a new and extended system of axioms for information algebras. Classical
valuation algebras are essentially a special case of this new system. A full
discussion of the algebraic theory of these new information algebras is given,
including local computation, duality between labeled and domain-free versions
of the algebras, order of information, finiteness of information and
approximation, and compact and continuous information algebras. Finally, a
rather complete discussion of uncertain information, based on random maps into
information algebras, is presented. This is shown to represent a generalisation
of classical Dempster-Shafer theory.
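The two operations in this description, combination and extraction, can be
phrased as a tiny interface; the instance below (consistent partial variable
assignments, a deliberately simple valuation algebra, not taken from the paper)
exercises both.

    from abc import ABC, abstractmethod

    class Valuation(ABC):
        @abstractmethod
        def combine(self, other):
            """Aggregate two pieces of information."""
        @abstractmethod
        def marginalize(self, question):
            """Extract the part relating to a given question (set of variables)."""

    class Assignment(Valuation):
        # Pieces of information = consistent partial variable assignments.
        def __init__(self, kv): self.kv = dict(kv)
        def combine(self, other):
            assert all(self.kv[k] == other.kv[k]
                       for k in self.kv.keys() & other.kv.keys()), "contradiction"
            return Assignment({**self.kv, **other.kv})
        def marginalize(self, question):
            return Assignment({k: v for k, v in self.kv.items() if k in question})

    a, b = Assignment({"x": 1}), Assignment({"y": 2})
    print(a.combine(b).marginalize({"y"}).kv)   # {'y': 2}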
Antonia Wachter-Zeh
Comments: Short version submitted to ISIT 2017
Subjects: Information Theory (cs.IT)
List decoding of insertions and deletions in the Levenshtein metric is
considered. The Levenshtein distance between two sequences is the minimum
number of insertions and deletions needed to turn one of the sequences into the
other. In this paper, a Johnson-like upper bound on the list size in the
Levenshtein metric is derived. This bound depends only on the length and
minimum Levenshtein distance of the code, the length of the received word, and
the alphabet size. It shows that polynomial-time list decoding beyond half the
Levenshtein distance is possible for many parameters. For example, list
decoding of two insertions/deletions with the well-known Varshamov-Tenengolts
(VT) codes is possible. Further, we show a lower bound on list decoding VT
codes and an efficient list decoding algorithm for certain subcodes of VT
codes.
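Since the distance in question counts only insertions and deletions, it can be
computed with the textbook dynamic program below (a generic illustration, not
part of the paper); it equals len(a) + len(b) - 2*LCS(a, b).

    def indel_distance(a, b):
        m, n = len(a), len(b)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1): D[i][0] = i          # delete all of a[:i]
        for j in range(n + 1): D[0][j] = j          # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if a[i - 1] == b[j - 1]:
                    D[i][j] = D[i - 1][j - 1]       # symbols match, no edit
                else:
                    D[i][j] = 1 + min(D[i - 1][j],  # delete a[i-1]
                                      D[i][j - 1])  # insert b[j-1]
        return D[m][n]

    print(indel_distance("10101", "11011"))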
Yanting Ma, Yue M. Lu, Dror Baron
Comments: Accepted for publication in ICASSP 2017
Subjects: Information Theory (cs.IT)
Solving a large-scale regularized linear inverse problem using multiple
processors is important in various real-world applications due to the
limitations of individual processors and constraints on data sharing policies.
This paper focuses on the setting where the matrix is partitioned column-wise.
We extend the algorithmic framework and the theoretical analysis of approximate
message passing (AMP), an iterative algorithm for solving linear inverse
problems, whose asymptotic dynamics are characterized by state evolution (SE).
In particular, we show that column-wise multiprocessor AMP (C-MP-AMP) obeys an
SE under the same assumptions under which the SE for AMP holds. The SE results imply
that (i) the SE of C-MP-AMP converges to a state that is no worse than that of
AMP and (ii) the asymptotic dynamics of C-MP-AMP and AMP can be identical.
Moreover, for a setting that is not covered by SE, numerical results show that
damping can improve the convergence performance of C-MP-AMP.
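For readers unfamiliar with AMP, a minimal single-processor iteration for a
sparse linear inverse problem looks roughly as follows (a generic sketch with a
soft-thresholding denoiser and invented problem sizes; the paper's C-MP-AMP
additionally partitions A column-wise across processors).

    import numpy as np

    rng = np.random.default_rng(0)
    n, N, k = 250, 500, 25
    A = rng.standard_normal((n, N)) / np.sqrt(n)
    x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    y = A @ x0 + 0.01 * rng.standard_normal(n)

    def soft(u, t):                       # soft-thresholding denoiser
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    x, z = np.zeros(N), y.copy()
    for _ in range(30):
        tau = np.sqrt(np.mean(z ** 2))    # effective noise level estimate
        x_new = soft(x + A.T @ z, tau)    # denoise the pseudo-data
        # Onsager correction: residual times (number of nonzeros)/n.
        z = y - A @ x_new + (z / n) * np.count_nonzero(x_new)
        x = x_new
    print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))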
B. N. Bharath, P. Vaishali
Comments: 16 pages + 5 figures
Subjects: Information Theory (cs.IT)
In this paper, we consider a distributed stochastic optimization problem
where the goal is to minimize the time average of a cost function subject to a
set of constraints on the time averages of related stochastic processes called
penalties. We assume that the state of the system is evolving in an independent
and non-stationary fashion and the “common information” available at each node
is distributed and delayed. Such stochastic optimization is an integral part of
many important problems in wireless networks such as scheduling, routing,
resource allocation and crowd sensing. We propose an approximate distributed
Drift-Plus-Penalty (DPP) algorithm, and show that it achieves a time average
cost (and penalties) that is within epsilon > 0 of the optimal cost (and
constraints) with high probability. Also, we provide a condition on the
convergence time t for this result to hold. In particular, for any delay D >= 0
in the common information, we use a coupling argument to prove that the
proposed algorithm converges almost surely to the optimal solution. We use an
application from wireless sensor networks to corroborate our theoretical
findings through simulation results.
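The drift-plus-penalty template itself is compact; the centralized, delay-free
skeleton below (with a toy cost, penalty, and action set, unlike the paper's
distributed and delayed setting) shows the two moving parts: a per-slot
minimization of V*cost + Q*penalty and a virtual-queue update.

    import numpy as np

    rng = np.random.default_rng(0)
    V = 50.0                              # cost/penalty trade-off parameter
    actions = [0, 1, 2]                   # assumed finite action set
    cost = lambda a, s: (a - s) ** 2      # toy cost
    penalty = lambda a, s: a * s - 1.0    # toy constraint: time-avg a*s <= 1

    Q, avg_cost = 0.0, 0.0
    for t in range(1, 2001):
        s = rng.uniform(0, 2)             # random state (non-stationary in paper)
        a = min(actions, key=lambda a: V * cost(a, s) + Q * penalty(a, s))
        Q = max(Q + penalty(a, s), 0.0)   # virtual-queue (drift) update
        avg_cost += (cost(a, s) - avg_cost) / t
    print("time-average cost:", round(avg_cost, 3), "final queue:", round(Q, 3))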
Céline Aubel, Helmut Bölcskei
Comments: 47 pages, 2 figures, submitted to Applied and Computational Harmonic Analysis
Subjects: Information Theory (cs.IT); Functional Analysis (math.FA); Numerical Analysis (math.NA); Number Theory (math.NT)
We derive bounds on the extremal singular values and the condition number of
NxK, with N>=K, Vandermonde matrices with nodes in the unit disk. The
mathematical techniques we develop to prove our main results are inspired by
the link—first established by Selberg [1] and later extended by Moitra
[2]—between the extremal singular values of Vandermonde matrices with nodes
on the unit circle and large sieve inequalities. Our main conceptual
contribution lies in establishing a connection between the extremal singular
values of Vandermonde matrices with nodes in the unit disk and a novel large
sieve inequality involving polynomials in z in C with |z|<=1. This is
accomplished by first recognizing that the Selberg-Moitra connection can
alternatively be obtained based on the Montgomery-Vaughan proof technique for
the large sieve, and then extending this alternative connection from the unit
circle to the unit disk. Compared to Bazán's upper bound on the condition
number [3], which, to the best of our knowledge, constitutes the only
analytical result—available in the literature—on the condition number of
Vandermonde matrices with nodes in the unit disk, our bound not only takes a
much simpler form, but is also sharper for certain node configurations.
Moreover, the bound we report can be consistently evaluated in a numerically
stable fashion, whereas the evaluation of Bazán's bound requires the solution
of a linear system of equations which has the same condition number as the
Vandermonde matrix under consideration and can therefore lead to numerical
instability in practice. As a byproduct, our result—when particularized to
the case of nodes on the unit circle—slightly improves upon the
Selberg-Moitra bound. This improved bound also applies to the square case, N=K,
not covered by the Selberg-Moitra result.
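As a quick empirical companion (not the paper's analysis), the object of study
is easy to instantiate numerically; the node radii and matrix sizes below are
arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    # N x K Vandermonde matrix V[n, k] = z_k^n with nodes z_k in the unit disk.
    N, K = 64, 8
    radii = rng.uniform(0.8, 1.0, K)                 # node magnitudes <= 1
    phases = np.exp(2j * np.pi * rng.uniform(0, 1, K))
    z = radii * phases
    V = z[None, :] ** np.arange(N)[:, None]          # rows n = 0, ..., N-1
    s = np.linalg.svd(V, compute_uv=False)
    print("sigma_max, sigma_min, cond:", s[0], s[-1], s[0] / s[-1])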
Burhan Gulbahar
Comments: 7 pages, 2 figures, 2 tables. To be published in Proc. of The IEEE GLOBECOM 2016 First International Workshop on the Internet of Everything (IoE), Washington D.C., USA, December 2016, Copyright ©2016 IEEE
Subjects: Information Theory (cs.IT)
Internet-of-things (IoT) architectures connecting a massive number of
heterogeneous devices need energy efficient, low hardware complexity, low cost,
simple and secure mechanisms to realize communication among devices. One of the
emerging schemes is to realize simultaneous wireless information and power
transfer (SWIPT) in an energy harvesting network. Radio frequency (RF)
solutions require special hardware and modulation methods for RF to direct
current (DC) conversion and optimized operation to achieve SWIPT, which are
currently in an immature phase. On the other hand, magneto-inductive (MI)
communication transceivers are intrinsically energy harvesting with potential
for SWIPT in an efficient manner. In this article, novel modulation and
demodulation mechanisms are presented in a combined framework with
multiple-access channel (MAC) communication and wireless power transmission.
As a novel method to transmit information, the network topology of
power-transmitting active coils in a transceiver composed of a grid of coils is
varied. Practical demodulation schemes are formulated and numerically simulated
for a two-user MAC topology of small coils. The transceivers are suitable for
attachment to everyday objects to realize reliable local area network (LAN)
communication with ranges of tens of meters. The
designed scheme is promising for future IoT applications requiring SWIPT with
energy efficient, low cost, low power and low hardware complexity solutions.
Swanand Kadhe, Robert Calderbank
Comments: Longer version of the ISIT 2017 submission
Subjects: Information Theory (cs.IT)
A locally repairable code with availability has the property that every code
symbol can be recovered from multiple, disjoint subsets of other symbols of
small size. In particular, a code symbol is said to have (r,t)-availability
if it can be recovered from t disjoint subsets, each of size at most r. A
code with availability is said to be ‘rate-optimal’ if its rate is maximum
among the class of codes with given locality, availability, and alphabet size.
This paper focuses on rate-optimal binary, linear codes with small
availability, and makes three contributions. First, it establishes tight upper
bounds on the rate of binary linear codes with (r,2)- and (2,3)-availability.
Second, it establishes a uniqueness result for binary rate-optimal codes,
showing that for certain classes of binary linear codes with (r,2)- and
(2,3)-availability, any rate-optimal code must be a direct
sum of shorter rate-optimal codes. Finally, it derives properties of locally
repairable linear codes associated with convex polyhedra, focusing on the codes
associated with the Platonic solids. It demonstrates that these codes are
locally repairable with t = 2, and that the codes associated with (geometric)
dual polyhedra are (coding theoretic) duals of each other.
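As a concrete (if non-optimal) illustration of availability, consider a binary
2-D product code: each data bit belongs to one row parity group and one column
parity group, and these groups are disjoint, giving (r,2)-availability. The
sketch below, with toy sizes, verifies the two repair paths.

    import numpy as np

    rng = np.random.default_rng(0)
    r = 3                                     # repair-group size
    data = rng.integers(0, 2, (r, r))
    row_par = data.sum(axis=1) % 2            # one parity bit per row
    col_par = data.sum(axis=0) % 2            # one parity bit per column

    i, j = 1, 2                               # pretend data[i, j] is erased
    from_row = (data[i, :].sum() - data[i, j] + row_par[i]) % 2
    from_col = (data[:, j].sum() - data[i, j] + col_par[j]) % 2
    assert from_row == from_col == data[i, j]  # two disjoint repair groups
    print("bit recovered from two disjoint groups of size", r)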
Rajshekhar Vishweshwar Bhat, Mehul Motani, Teng Joon Lim
Comments: 30 single column pages
Subjects: Information Theory (cs.IT)
Modern systems will increasingly rely on energy harvested from their
environment. Such systems utilize batteries to smooth out the random
fluctuations in harvested energy. These fluctuations induce highly variable
battery charge and discharge rates, which affect the efficiencies of practical
batteries that typically have non-zero internal resistances. In this paper, we
study an energy harvesting communication system using a finite battery with
non-zero internal resistance. We adopt a dual-path architecture, in which
harvested energy can be directly used, or stored and then used. In a frame,
both time and power can be split between energy storage and data transmission.
For a single frame, we derive an analytical expression for the rate optimal
time and power splitting ratios between harvesting energy and transmitting
data. We then optimize the time and power splitting ratios for a group of
frames, assuming non-causal knowledge of harvested power and fading channel
gains, by giving an approximate solution. When only the statistics of the
energy arrivals and channel gains are known, we derive a dynamic programming
based policy and propose three sub-optimal policies, which are shown to
perform competitively. In summary, our study suggests that battery internal
resistance significantly impacts the design and performance of energy
harvesting communication systems and must be taken into account.
Sachini Jayasooriya, Mahyar Shirvanimoghaddam, Lawrence Ong, Sarah J. Johnson
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT)
The focus of this paper is on the analysis and design of Raptor codes using a
multi-edge framework. In this regard, we first represent the Raptor code as a
multi-edge type low-density parity-check (METLDPC) code. This MET
representation gives a general framework to analyze and design Raptor codes
over a binary input additive white Gaussian noise channel using MET density
evolution (MET-DE). We consider a joint decoding scheme based on the belief
propagation (BP) decoding for Raptor codes in the multi-edge framework, and
analyze the convergence behavior of the BP decoder using MET-DE. In joint
decoding of Raptor codes, the component codes corresponding to the inner code
and the precode are decoded in parallel and provide information to each other.
We also
derive an exact expression for the stability of Raptor codes with joint
decoding. We then propose an efficient Raptor code design method using the
multi-edge framework, where we simultaneously optimize the inner code and the
precode. Finally, we consider performance-complexity trade-offs of Raptor codes
using the multi-edge framework. Through density evolution analysis we show that
the Raptor codes designed using the multi-edge framework outperform the
existing Raptor codes in the literature in terms of the realized rate.
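For readers unfamiliar with density evolution, its simplest instance, standard
DE for an LDPC ensemble over the binary erasure channel rather than the paper's
MET-DE over the BI-AWGN channel, already shows the mechanics; the sketch below
bisects for the decoding threshold of the regular (3,6) ensemble.

    def de_threshold(lam, rho, tol=1e-4):
        # lam, rho: edge-perspective degree distributions as coefficient lists
        # (lam[i] is the fraction of edges attached to degree-(i+1) nodes).
        lam_f = lambda x: sum(c * x ** i for i, c in enumerate(lam))
        rho_f = lambda x: sum(c * x ** i for i, c in enumerate(rho))
        lo, hi = 0.0, 1.0
        while hi - lo > tol:             # bisect on the channel erasure rate
            eps = (lo + hi) / 2
            x = eps
            for _ in range(3000):        # DE recursion: x <- eps*lam(1-rho(1-x))
                x = eps * lam_f(1.0 - rho_f(1.0 - x))
                if x < 1e-9:
                    break
            lo, hi = (eps, hi) if x < 1e-6 else (lo, eps)
        return (lo + hi) / 2

    # Regular (3,6) ensemble: lambda(x) = x^2, rho(x) = x^5 (threshold ~ 0.43).
    print("BEC threshold:", de_threshold([0, 0, 1], [0, 0, 0, 0, 0, 1]))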
Cheng Chen, Randall A. Berry, Michael L. Honig, Vijay G. Subramanian
Comments: 9 pages, 4 figures, accepted and to appear at 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN)
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT); Networking and Internet Architecture (cs.NI)
Small-cell deployment in licensed and unlicensed spectrum is considered to be
one of the key approaches to cope with the ongoing wireless data demand
explosion. Compared to traditional cellular base stations with large
transmission power, small-cells typically have relatively low transmission
power, which makes them attractive for some spectrum bands that have strict
power regulations, for example, the 3.5GHz band [1]. In this paper we consider
a heterogeneous wireless network consisting of one or more service providers
(SPs). Each SP operates in both macro-cells and small-cells, and provides
service to two types of users: mobile and fixed. Mobile users can only
associate with macro-cells whereas fixed users can connect to either macro- or
small-cells. The SP charges a price per unit rate for each type of service.
Each SP is given a fixed amount of bandwidth and splits it between macro- and
small-cells. Motivated by bandwidth regulations, such as those for the 3.5GHz
band, we assume a minimum amount of bandwidth has to be set aside for
small-cells. We study the optimal pricing and bandwidth allocation strategies
in both monopoly and competitive scenarios. In the monopoly scenario the
strategy is unique. In the competitive scenario there exists a unique Nash
equilibrium, which depends on the regulatory constraints. We also analyze the
social welfare achieved, and compare it to that without the small-cell
bandwidth constraints. Finally, we discuss implications of our results on the
effectiveness of the minimum bandwidth constraint on influencing small-cell
deployments.
Ali Dehghan, Amir H. Banihashemi
Subjects: Information Theory (cs.IT)
Random bipartite graphs, random lifts of bipartite protographs, and random
cyclic lifts of bipartite protographs are used to represent random low-density
parity-check (LDPC) codes, randomly constructed protograph-based LDPC codes,
and random quasi-cyclic (QC) LDPC codes, respectively. In this paper, we study
the distribution of cycles of different length in all these three categories of
graphs. We prove that for a random bipartite graph, with a given degree
distribution, the distributions of cycles of different length tend to
independent Poisson distributions, as the size of the graph tends to infinity.
It is well-known that for a random lift of a protograph, the distributions of
cycles of different length c tend to independent Poisson distributions with
expected value equal to the number of tailless backtrackless closed (tbc) walks
of length c in the protograph, as the size of the graph (lifting degree)
tends to infinity. Here, we find the number of tbc walks in a bi-regular
protograph, and demonstrate that random bi-regular LDPC codes have essentially
the same cycle distribution as random protograph-based LDPC codes, as long as
the degree distributions are identical. For random QC-LDPC codes, however, we
show that the cycle distribution can be quite different from the other two
categories. While for the former categories, the expected number of cycles of
different length is Theta(1) with respect to the size of the graph, for the
case of QC-LDPC codes, depending on the protograph and the value of c, it can
be either Theta(N) or Theta(1), where N is the lifting degree (code
length). In addition, we provide numerical results that match our theoretical
derivations. Our results also provide a theoretical foundation for empirical
results that were reported in the literature but were not well-justified.
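The tbc walk counts that govern the Poisson limits above can be computed
mechanically from the non-backtracking (directed-edge) matrix B of the
protograph: B has one row per directed edge copy, B[(u->v),(v->w)] = 1 unless
the second step reverses the same edge copy, and trace(B^c) counts the tbc
closed walks of length c, each rooted at a directed edge. The sketch below does
this for an assumed 2x2 base matrix.

    import numpy as np

    base = np.array([[2, 1],       # assumed base matrix: checks x variables
                     [1, 2]])
    nc, nv = base.shape
    A = np.zeros((nc + nv, nc + nv), dtype=int)
    A[:nc, nc:] = base; A[nc:, :nc] = base.T       # bipartite adjacency

    edges = [(u, v, k) for u in range(len(A)) for v in range(len(A))
             for k in range(A[u, v])]              # directed parallel-edge copies
    B = np.zeros((len(edges), len(edges)))
    for i, (u, v, ki) in enumerate(edges):
        for j, (v2, w, kj) in enumerate(edges):
            # follow-on step from v that does not reverse the same edge copy
            if v2 == v and (w, kj) != (u, ki):
                B[i, j] = 1.0
    for c in (2, 4, 6):
        print("tbc walks of length", c, "=",
              int(round(np.trace(np.linalg.matrix_power(B, c)))))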
Lele Wang, Young-Han Kim, Chiao-Yi Chen, Hosung Park, Eren Sasoglu
Comments: Submitted to the IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
A low-complexity coding scheme is developed to achieve the rate region of
maximum likelihood decoding for interference channels. As in the classical
rate-splitting multiple access scheme by Grant, Rimoldi, Urbanke, and Whiting,
the proposed coding scheme uses superposition of multiple codewords with
successive cancellation decoding, which can be implemented using standard
point-to-point encoders and decoders. Unlike rate-splitting multiple access,
which is not rate-optimal for multiple receivers, the proposed coding scheme
transmits codewords over multiple blocks in a staggered manner and recovers
them successively over sliding decoding windows, achieving the single-stream
optimal rate region as well as the more general Han–Kobayashi inner bound for
the two-user interference channel. The feasibility of this scheme in practice
is verified by implementing it using commercial channel codes over the two-user
Gaussian interference channel.