Akhilesh Jaiswal, Sourjya Roy, Gopalakrishnan Srinivasan, Kaushik Roy
Subjects: Neural and Evolutionary Computing (cs.NE)
The efficiency of the human brain in performing classification tasks has
attracted considerable research interest in brain-inspired neuromorphic
computing. Hardware implementations of neuromorphic systems aim to mimic the
computations in the brain through the interconnection of neurons and synaptic
weights. A leaky-integrate-fire (LIF) spiking model is widely used to emulate
the dynamics of neuronal action potentials. In this work, we propose a spin
based LIF spiking neuron using the magneto-electric (ME) switching of
ferro-magnets. The voltage across the ME oxide exhibits a typical
leaky-integrate behavior, which in turn switches an underlying ferro-magnet.
Due to the effect of thermal noise, the ferro-magnet exhibits probabilistic
switching dynamics, which is reminiscent of the stochasticity exhibited by
biological neurons. The energy efficiency of the ME switching mechanism,
coupled with the intrinsic non-volatility of ferro-magnets, results in lower
energy consumption than a CMOS LIF neuron. A device-to-system-level simulation
framework has been developed to investigate the feasibility of the proposed
LIF neuron for a hand-written digit recognition problem.
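A minimal numpy sketch of leaky-integrate-and-fire dynamics with a noisy
(probabilistic) switching threshold; the time constant, threshold, and noise
level are illustrative assumptions, not device parameters from the paper:

    import numpy as np

    def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                   v_th=1.0, noise_std=0.05, rng=None):
        # Leaky integration of the membrane potential; noise on the
        # threshold makes firing stochastic (constants are illustrative).
        rng = rng or np.random.default_rng(0)
        v, spikes = v_rest, []
        for i_t in input_current:
            v += dt * (-(v - v_rest) / tau + i_t)  # dv/dt = -(v-v_rest)/tau + i(t)
            if v >= v_th + noise_std * rng.standard_normal():
                spikes.append(1)
                v = v_rest                         # reset after the magnet switches
            else:
                spikes.append(0)
        return np.array(spikes)

    # constant drive produces irregularly timed spikes due to threshold noise
    print(lif_neuron(np.full(100, 60.0)).sum())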
Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
We present a novel semi-supervised approach for sequence transduction and
apply it to semantic parsing. The unsupervised component is based on a
generative model in which latent sentences generate the unpaired logical forms.
We apply this method to a number of semantic parsing tasks focusing on domains
with limited access to labelled training data and extend those datasets with
synthetically generated logical forms.
Yu Ding
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Self-organizing maps (SOMs) have been widely applied in clustering; this paper
focuses on the centroids of clusters and what they reveal. When the input
vectors consist of time, latitude, and longitude, the map can be strongly
linked to the physical world, providing valuable information. Beyond basic
clustering, a novel approach to address the temporal element is developed,
enabling a 3D SOM to track behaviors across multiple periods concurrently.
Combined with adaptations targeted at processing heterogeneous data relating
to distribution in time and space, the paper offers a fresh scope for business
and services based on temporal-spatial patterns.
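A compact numpy sketch of training a 3D SOM on (time, latitude, longitude)
vectors; the grid size and learning-rate/neighbourhood schedules are
illustrative defaults, not the paper's settings:

    import numpy as np

    def train_som(data, grid=(8, 8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
        rng = np.random.default_rng(seed)
        gx, gy, gz = grid
        coords = np.stack(np.meshgrid(np.arange(gx), np.arange(gy),
                                      np.arange(gz), indexing="ij"),
                          axis=-1).reshape(-1, 3)
        w = rng.random((len(coords), data.shape[1]))  # one weight vector per unit
        step, n_steps = 0, epochs * len(data)
        for _ in range(epochs):
            for x in rng.permutation(data):
                frac = step / n_steps
                lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
                bmu = np.argmin(((w - x) ** 2).sum(1))     # best matching unit
                d2 = ((coords - coords[bmu]) ** 2).sum(1)  # grid distance to BMU
                w += lr * np.exp(-d2 / (2 * sigma ** 2))[:, None] * (x - w)
                step += 1
        return w, coords

    # toy (time, latitude, longitude) inputs scaled to [0, 1]
    weights, _ = train_som(np.random.default_rng(1).random((200, 3)))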
Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez, Jianxiong Xiao
Comments: Under review at the International Conference on Robotics and Automation (ICRA) 2017. Project webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)
Robot warehouse automation has attracted significant interest in recent
years, perhaps most visibly in the Amazon Picking Challenge (APC). A fully
autonomous warehouse pick-and-place system requires robust vision that reliably
recognizes and locates objects amid cluttered environments, self-occlusions,
sensor noise, and a large variety of objects. In this paper we present an
approach that leverages multi-view RGB-D data and self-supervised, data-driven
learning to overcome those difficulties. The approach was part of the
MIT-Princeton Team system that took 3rd and 4th place in the stowing and
picking tasks, respectively, at APC 2016. In the proposed approach, we segment
and label multiple views of a scene with a fully convolutional neural network,
and then fit pre-scanned 3D object models to the resulting segmentation to get
the 6D object pose. Training a deep neural network for segmentation typically
requires a large amount of training data. We propose a self-supervised method
to generate a large labeled dataset without tedious manual segmentation. We
demonstrate that our system can reliably estimate the 6D pose of objects under
a variety of scenarios. All code, data, and benchmarks are available at
this http URL
J. Krishna Murthy, G.V. Sai Krishna, Falak Chhaya, K. Madhava Krishna
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We present an approach for reconstructing vehicles from a single (RGB) image,
in the context of autonomous driving. Though the problem appears to be
ill-posed, we demonstrate that prior knowledge about how 3D shapes of vehicles
project to an image can be used to reason about the reverse process, i.e., how
shapes (back-)project from 2D to 3D. We encode this knowledge in shape
priors, which are learnt over a small keypoint-annotated dataset. We then
formulate a shape-aware adjustment problem that uses the learnt shape priors to
recover the 3D pose and shape of a query object from an image. For shape
representation and inference, we leverage recent successes of Convolutional
Neural Networks (CNNs) for the task of object and keypoint localization, and
train a novel cascaded fully-convolutional architecture to localize vehicle
keypoints in images. The shape-aware adjustment then robustly recovers
shape (3D locations of the detected keypoints) while simultaneously filling in
occluded keypoints. To tackle estimation errors incurred due to erroneously
detected keypoints, we use an Iteratively Re-weighted Least Squares (IRLS)
scheme for robust optimization, and as a by-product characterize noise models
for each predicted keypoint. We evaluate our approach on autonomous driving
benchmarks, and present superior results to existing monocular, as well as
stereo approaches.
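A minimal numpy sketch of the IRLS idea used here: residuals are reweighted at
each iteration so that badly detected keypoints are down-weighted. The
L1-style weight function and constants are illustrative, not the paper's exact
noise model:

    import numpy as np

    def irls(A, b, iters=50, delta=1e-6):
        x = np.linalg.lstsq(A, b, rcond=None)[0]    # ordinary LS start
        for _ in range(iters):
            r = A @ x - b
            w = 1.0 / np.maximum(np.abs(r), delta)  # ~L1 (robust) reweighting
            W = A * w[:, None]                      # rows scaled by weights
            x = np.linalg.solve(A.T @ W, W.T @ b)   # weighted normal equations
        return x

    # an outlier in b barely moves the robust estimate
    A = np.vander(np.linspace(0, 1, 20), 2)
    b = A @ np.array([2.0, -1.0]); b[3] += 10.0
    print(irls(A, b))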
Anguelos Nicolaou, Liwicki Marcus
Comments: Short paper presented at the 12th IEEE workshop on Document Analysis Systems (DAS)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although binarization is considered passé, it still remains a highly popular
research topic. In this paper we propose a rethinking of what binarization is.
We introduce the notion of the visual archetype as the ideal form of any one
document. Binarization can be defined as the restoration of the visual
archetype for a class of images. This definition broadens the scope of what
binarization means but also suggests that ground truth should focus on the
foreground.
Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding, predicting, and generating object motions and transformations
is a core problem in artificial intelligence. Modeling sequences of evolving
images may provide better representations and models of motion and may
ultimately be used for forecasting, simulation, or video generation.
Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in
complex patterns and one needs to infer the underlying pattern sequence and
generate the next image in the sequence. For this, we develop a novel
Contextual Generative Adversarial Network based on Recurrent Neural Networks
(Context-RNN-GANs), where both the generator and the discriminator modules are
based on contextual history (modeled as RNNs) and the adversarial discriminator
guides the generator to produce realistic images for the particular time step
in the image sequence. We evaluate the Context-RNN-GAN model (and its variants)
on a novel dataset of Diagrammatic Abstract Reasoning, where it performs
competitively with 10th-grade human performance but still leaves room for
improvement relative to college-grade human performance. We also evaluate our
model on a standard video next-frame prediction task, achieving improved
performance over comparable state-of-the-art methods.
Julie Dequaire, Dushyant Rao, Peter Ondruska, Dominic Wang, Ingmar Posner
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
This paper presents an end-to-end approach for tracking static and dynamic
objects for an autonomous vehicle driving through crowded urban environments.
Unlike traditional approaches to tracking, this method is learned end-to-end,
and is able to directly predict a full unoccluded occupancy grid map from raw
laser input data. Inspired by the recently presented DeepTracking approach
[Ondruska, 2016], we employ a recurrent neural network (RNN) to capture the
temporal evolution of the state of the environment, and propose to use Spatial
Transformer modules to exploit estimates of the egomotion of the vehicle. Our
results demonstrate the ability to track a range of objects, including cars,
buses, pedestrians, and cyclists through occlusion, from both moving and
stationary platforms, using a single learned model. Experimental results
demonstrate that the model can also predict the future states of objects from
current inputs, with greater accuracy than previous work.
R. Tapiador, A. Rios-Navarro, A. Linares-Barranco, Minkyu Kim, Deepak Kadetotad, Jae-sun Seo
Comments: 6 pages, 6 figures. Robotic and Technology of Computers Lab report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Deep learning has significantly advanced the state of the art in artificial
intelligence, gaining wide popularity from both industry and academia. Special
interest is around Convolutional Neural Networks (CNN), which take inspiration
from the hierarchical structure of the visual cortex, to form deep layers of
convolutional operations, along with fully connected classifiers. Hardware
implementations of these deep CNN architectures are challenged by memory
bottlenecks: the many convolutional and fully-connected layers demand a large
amount of communication for parallel computation. Multi-core CPU based
solutions have demonstrated their inadequacy for this problem due to the memory
wall and low parallelism. Many-core GPU architectures show superior performance
but they consume high power and also have memory constraints due to
inconsistencies between cache and main memory. FPGA design solutions are also
actively being explored, which allow implementing the memory hierarchy using
embedded BlockRAM. This boosts the parallel use of shared memory elements
between multiple processing units, avoiding data replicability and
inconsistencies. This makes FPGAs potentially powerful solutions for real-time
classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design
framework from GPU for FPGA designs as a pseudo-automatic development solution.
In this paper, a comprehensive evaluation and comparison of Altera and Xilinx
OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources,
temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx
demonstrates faster synthesis, better FPGA resource utilization and more
compact boards. Altera provides multi-platform tools, a mature design
community, and better execution times.
Jiu Xu, Bjorn Stenger, Tommi Kerola, Tony Tung
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a method of estimating the geometry of a room and the 3D
pose of objects from a single 360-degree panorama image. Assuming Manhattan
World geometry, we formulate the task as a Bayesian inference problem in which
we estimate positions and orientations of walls and objects. The method
combines surface normal estimation, 2D object detection and 3D object pose
estimation. Quantitative results are presented on a dataset of synthetically
generated 3D rooms containing objects, as well as on a subset of hand-labeled
images from the public SUN360 dataset.
Hà Quang Minh, Marco San Biagio, Loris Bazzani, Vittorio Murino
Comments: 18 double-column pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel framework for visual object recognition using
infinite-dimensional covariance operators of input features in the paradigm of
kernel methods on infinite-dimensional Riemannian manifolds. Our formulation
provides in particular a rich representation of image features by exploiting
their non-linear correlations. Theoretically, we provide a finite-dimensional
approximation of the Log-Hilbert-Schmidt (Log-HS) distance between covariance
operators that is scalable to large datasets, while maintaining an effective
discriminating capability. This allows us to efficiently approximate any
continuous shift-invariant kernel defined using the Log-HS distance. At the
same time, we prove that the Log-HS inner product between covariance operators
is only approximable by its finite-dimensional counterpart in a very limited
scenario. Consequently, kernels defined using the Log-HS inner product, such as
polynomial kernels, are not scalable in the same way as shift-invariant
kernels. Computationally, we apply the approximate Log-HS distance formulation
to covariance operators of both handcrafted and convolutional features,
exploiting both the expressiveness of these features and the power of the
covariance representation. Empirically, we tested our framework on the task of
image classification on twelve challenging datasets. In almost all cases, the
results obtained outperform other state-of-the-art methods, demonstrating the
competitiveness and potential of our framework.
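For intuition, a finite-dimensional stand-in for distances between covariance
representations: the log-Euclidean distance between regularized feature
covariance matrices. This sketches the flavour of the Log-HS construction, not
the paper's exact formulation:

    import numpy as np

    def logm_spd(C, eps=1e-6):
        # matrix log of a symmetric PSD matrix via eigendecomposition;
        # eps*I regularization keeps the log well-defined
        vals, vecs = np.linalg.eigh(C + eps * np.eye(len(C)))
        return (vecs * np.log(vals)) @ vecs.T

    def log_euclidean_dist(X, Y, eps=1e-6):
        Cx = np.cov(X, rowvar=False)          # covariance of features in X
        Cy = np.cov(Y, rowvar=False)
        return np.linalg.norm(logm_spd(Cx, eps) - logm_spd(Cy, eps), "fro")

    rng = np.random.default_rng(0)
    print(log_euclidean_dist(rng.random((100, 5)), rng.random((100, 5))))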
Gabriel Moyà-Alcover, Ahmed Elgammal, Antoni Jaume-i-Capó, Javier Varona
Comments: Accepted in Pattern Recognition Letters. Will update the info
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The problem of detecting changes in a scene and segmenting the foreground
from background is still challenging, despite previous work. Moreover, new RGBD
capturing devices include depth cues, which could be incorporated to improve
foreground segmentation. In this work, we present a new nonparametric approach
where a unified model mixes the device's multiple information cues. In order to
unify all the device channel cues, a new probabilistic depth data model is also
proposed, where we show how to handle inaccurate data to improve foreground
segmentation. A new RGBD video dataset is presented to introduce a new standard
for comparing this kind of algorithm. Results show that
the proposed approach can handle several practical situations and obtain good
results in all cases.
Manali Naik, V. Srinivasa Chakravarthy
Comments: 22 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present Bharati, a simple, novel script that can represent the characters
of a majority of contemporary Indian scripts. The shapes/motifs of Bharati
characters are drawn from some of the simplest characters of existing Indian
scripts. Bharati characters are designed such that they strictly reflect the
underlying phonetic organization, thereby lending the script qualities of
simplicity, familiarity, and ease of acquisition and use. Thus, employing
Bharati script as a common script for a majority of Indian languages can
ameliorate several existing communication bottlenecks in India. We perform a
complexity analysis of handwritten Bharati script and compare its complexity
with that of 9 major Indian scripts. The measures of complexity are derived
from a theory of handwritten characters based on Catastrophe theory. Bharati
script is shown to be simpler than the 9 major Indian scripts in most measures
of complexity.
Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Mohammad Rastegari, Carlo Regazzoni
Comments: ICIP 2016 Best Paper / Student Paper Finalist
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we introduce a novel method for general semantic segmentation
that can benefit from the general semantics of Convolutional Neural Networks
(CNNs). Our method proposes visually and semantically coherent image segments.
We use binary encoding of CNN features to overcome the difficulty of
clustering in the high-dimensional CNN feature space. These binary codes are
very robust against noise and non-semantic changes in the image, and the
encoding can be embedded into the CNN as an extra layer at the end of the
network, resulting in real-time segmentation. To the best of our knowledge,
our method is the first attempt at general semantic image segmentation using
CNNs; previous work was limited to a small number of image categories (e.g.,
PASCAL VOC). Experiments show that our segmentation algorithm outperforms
state-of-the-art non-semantic segmentation methods by a large margin.
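A sketch of binarizing high-dimensional features via random hyperplane hashing,
with Hamming distance for clustering the codes; the paper learns its encoding
as a CNN layer, so the random projection here is only an illustrative stand-in:

    import numpy as np

    def binary_codes(features, n_bits=64, seed=0):
        # sign of random projections -> compact binary codes
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((features.shape[1], n_bits))
        return (features @ R > 0).astype(np.uint8)

    def hamming(a, b):
        return np.count_nonzero(a != b)   # distance for clustering the codes

    codes = binary_codes(np.random.default_rng(1).random((10, 4096)))
    print(hamming(codes[0], codes[1]))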
Minyoung Kim, Stefano Alletto, Luca Rigazio
Comments: accepted as a poster presentation for WiML 2016, colocated with NIPS 2016, Barcelona, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-object tracking has recently become an important area of computer
vision, especially for Advanced Driver Assistance Systems (ADAS). Despite
growing attention, achieving high performance tracking is still challenging,
with state-of-the-art systems exhibiting high complexity and a large number
of hyperparameters. In this paper, we focus on reducing overall system
complexity and the number of hyperparameters that need to be tuned to a specific
environment. We introduce a novel tracking system based on similarity mapping
by Enhanced Siamese Neural Network (ESNN), which accounts for both appearance
and geometric information, and is trainable end-to-end. Our system achieves
competitive performance in both speed and accuracy on MOT16 challenge, compared
to known state-of-the-art methods.
Hejia Zhang, Po-Hsuan Chen, Janice Chen, Xia Zhu, Javier S. Turek, Theodore L. Willke, Uri Hasson, Peter J. Ramadge
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
There is a growing interest in joint multi-subject fMRI analysis. The
challenge of such analysis comes from inherent anatomical and functional
variability across subjects. One approach to resolving this is a shared
response factor model. This assumes a shared and time-synchronized stimulus
across subjects. Such a model can often identify shared information, but it may
not be able to pinpoint with high resolution the spatial location of this
information. In this work, we examine a searchlight based shared response model
to identify shared information in small contiguous regions (searchlights)
across the whole brain. Validation using classification tasks demonstrates that
we can pinpoint informative local regions.
Jianwen Xie, Yang Lu, Song-Chun Zhu, Ying Nian Wu
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
This paper studies the cooperative training of two probabilistic models of
signals such as images. Both models are parametrized by convolutional neural
networks (ConvNets). The first network is a descriptor network, which is an
exponential family model or an energy-based model, whose feature statistics or
energy function are defined by a bottom-up ConvNet, which maps the observed
signal to the feature statistics. The second network is a generator network,
which is a non-linear version of factor analysis. It is defined by a top-down
ConvNet, which maps the latent factors to the observed signal. The maximum
likelihood training algorithms of both the descriptor net and the generator net
are in the form of alternating back-propagation, and both algorithms involve
Langevin sampling. In the training of the descriptor net, the Langevin
sampling is used to sample synthesized examples from the model. In the training
of the generator net, the Langevin sampling is used to sample the latent
factors from the posterior distribution. The Langevin sampling in both
algorithms can be time consuming. We observe that the two training algorithms
can cooperate with each other by jumpstarting each other’s Langevin sampling,
and they can be naturally and seamlessly interwoven into a CoopNets algorithm
that can train both nets simultaneously.
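A minimal numpy sketch of the unadjusted Langevin sampler that both training
algorithms rely on, applied to a toy quadratic energy (the actual energies are
defined by ConvNets):

    import numpy as np

    def langevin(grad_energy, x0, step=1e-2, n_steps=100, seed=0):
        # x <- x - (step/2) * dE/dx + sqrt(step) * noise
        rng = np.random.default_rng(seed)
        x = x0.copy()
        for _ in range(n_steps):
            x += -0.5 * step * grad_energy(x) \
                 + np.sqrt(step) * rng.standard_normal(x.shape)
        return x

    # E(x) = |x|^2 / 2  =>  samples approach a standard Gaussian
    samples = langevin(lambda x: x, np.zeros(1000), n_steps=500)
    print(samples.std())   # ~1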
Gheorghii Postica, Andrea Romanoni, Matteo Matteucci
Comments: 6 pages, to appear in IROS 2016
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Detecting moving objects in dynamic scenes from sequences of lidar scans is
an important task in object tracking, mapping, localization, and navigation.
Many works focus on change detection in previously observed scenes, while a
very limited amount of literature addresses moving object detection. The
state-of-the-art method exploits Dempster-Shafer Theory to evaluate the
occupancy of a lidar scan and to discriminate points belonging to the static
scene from moving ones. In this paper we improve both speed and accuracy of
this method by discretizing the occupancy representation, and by removing false
positives through visual cues. Many false positives lying on the ground plane
are also removed thanks to a novel ground plane removal algorithm. Efficiency
is improved through an octree indexing strategy. Experimental evaluation
against the KITTI public dataset shows the effectiveness of our approach, both
qualitatively and quantitatively with respect to the state-of-the-art.
Yael Yankelevsky, Michael Elad
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a supervised dictionary learning algorithm that
aims to preserve the local geometry in both dimensions of the data. A
graph-based regularization explicitly takes into account the local manifold
structure of the data points. A second graph regularization gives similar
treatment to the feature domain and helps in learning a more robust dictionary.
Both graphs can be constructed from the training data or learned and adapted
along the dictionary learning process. The combination of these two terms
promotes the discriminative power of the learned sparse representations and
leads to improved classification accuracy. The proposed method was evaluated on
several different datasets, representing both single-label and multi-label
classification problems, and demonstrated better performance compared with
other dictionary based approaches.
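A numpy sketch of the flavour of such an objective, with one Laplacian
regularizer over data points and one over features; the penalty forms and
weights are illustrative assumptions, not the paper's exact formulation:

    import numpy as np

    def laplacian(W):
        return np.diag(W.sum(1)) - W              # graph Laplacian L = D - W

    def dual_graph_objective(Y, D, X, L_s, L_f, alpha, beta, gamma):
        fit = np.linalg.norm(Y - D @ X, "fro") ** 2
        code_smooth = np.trace(X @ L_s @ X.T)     # nearby samples -> similar codes
        atom_smooth = np.trace(D.T @ L_f @ D)     # related features -> similar atoms
        return fit + alpha * code_smooth + beta * atom_smooth + gamma * np.abs(X).sum()

    rng = np.random.default_rng(0)
    Y, D, X = rng.random((10, 30)), rng.random((10, 5)), rng.random((5, 30))
    W_s = rng.random((30, 30)); W_s = (W_s + W_s.T) / 2   # sample affinity graph
    W_f = rng.random((10, 10)); W_f = (W_f + W_f.T) / 2   # feature affinity graph
    print(dual_graph_objective(Y, D, X, laplacian(W_s), laplacian(W_f),
                               0.1, 0.1, 0.01))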
Wenbin Li, Yang Gao, Lei Wang, Luping Zhou, Jing Huo, Yinghuan Shi
Comments: 12 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
To achieve a low computational cost when performing online metric learning
for large-scale data, we present a one-pass closed-form solution, namely OPML,
in this paper. Specifically, the proposed OPML first adopts a one-pass triplet
construction strategy, which aims to use only a very small number of triplets
to approximate the representation ability of the whole set of original triplets
obtained by batch methods. Then, OPML employs a closed-form solution to update
the metric for each newly arriving sample, which leads to a low space (i.e.,
$O(d)$) and time (i.e., $O(d^2)$) complexity, where $d$ is the feature
dimensionality. In addition, an extension of OPML (namely COPML) is further
proposed to enhance the robustness when, in practice, the first several samples
come from the same class (i.e., the cold start problem). In the experiments, we
systematically evaluated our methods (OPML and COPML) on three typical tasks,
including UCI data classification, face verification, and abnormal event
detection in videos, to fully evaluate the proposed methods on different sample
sizes, feature dimensionalities, and feature extraction methods (i.e.,
hand-crafted and deeply-learned). The results show that OPML and COPML achieve
promising performance with a very low computational cost. Also, the
effectiveness of COPML under the cold start setting is experimentally verified.
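A sketch of a per-triplet metric update with $O(d^2)$ cost, illustrating the
flavour of one-pass updates; this plain hinge-loss gradient step is a stand-in,
not the paper's actual closed-form solution:

    import numpy as np

    def triplet_update(M, anchor, pos, neg, eta=0.1):
        dp, dn = anchor - pos, anchor - neg
        if dp @ M @ dp - dn @ M @ dn + 1.0 > 0:   # triplet margin violated
            M = M - eta * (np.outer(dp, dp) - np.outer(dn, dn))
            # naive PSD projection (O(d^3)); OPML's closed form avoids this cost
            vals, vecs = np.linalg.eigh(M)
            M = (vecs * np.clip(vals, 0, None)) @ vecs.T
        return M

    d = 5
    rng = np.random.default_rng(0)
    M = triplet_update(np.eye(d), rng.random(d), rng.random(d), rng.random(d))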
Petros-Pavlos Ypsilantis, Giovanni Montana
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV)
Computed tomography (CT) generates a stack of cross-sectional images covering
a region of the body. The visual assessment of these images for the
identification of potential abnormalities is a challenging and time consuming
task due to the large amount of information that needs to be processed. In this
article we propose a deep artificial neural network architecture, ReCTnet, for
the fully-automated detection of pulmonary nodules in CT scans. The
architecture learns to distinguish nodules and normal structures at the pixel
level and generates three-dimensional probability maps highlighting areas that
are likely to harbour the objects of interest. Convolutional and recurrent
layers are combined to learn expressive image representations exploiting the
spatial dependencies across axial slices. We demonstrate that leveraging
intra-slice dependencies substantially increases the sensitivity to detect
pulmonary nodules without inflating the false positive rate. On the publicly
available LIDC/IDRI dataset consisting of 1,018 annotated CT scans, ReCTnet
reaches a detection sensitivity of 90.5% with an average of 4.5 false positives
per scan. Comparisons with a competing multi-channel convolutional neural
network for multi-slice segmentation and other published methodologies using
the same dataset provide evidence that ReCTnet offers significant performance
gains.
Ivan S. Grechikhin
Subjects: Artificial Intelligence (cs.AI)
The Vehicle Routing Problem is a well-known problem in logistics and
transportation, and the variety of such problems is explained by the fact that
they occur in many real-life situations. It is an NP-hard combinatorial
optimization problem, and finding an exact optimal solution is practically
impossible. In this work, the Site-Dependent Truck and Trailer Routing Problem
with hard and soft Time Windows and Split Deliveries (SDTTRPTWSD) is
considered, and we develop a heuristic with elements of Tabu Search for
solving it. The heuristic uses the concept of neighborhoods and visits
infeasible solutions during the search. A greedy heuristic is applied to
construct an initial solution.
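A generic skeleton of the kind of Tabu Search loop described: explore a
neighborhood, allow worsening (possibly infeasible) moves, and forbid recently
visited solutions; the neighborhood and cost functions for SDTTRPTWSD are
problem-specific and not shown:

    def tabu_search(initial, neighbours, cost, iters=500, tenure=20):
        current = best = initial
        tabu = []                                  # recently visited solutions
        for _ in range(iters):
            cands = [s for s in neighbours(current) if s not in tabu]
            if not cands:
                break
            current = min(cands, key=cost)         # may be worse than `best`
            tabu.append(current)
            if len(tabu) > tenure:
                tabu.pop(0)
            if cost(current) < cost(best):
                best = current
        return best

    # toy 1-D example: minimize (x - 3)^2 over the integers
    print(tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2))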
Yonatan Bisk, Siva Reddy, John Blitzer, Julia Hockenmaier, Mark Steedman
Comments: 6 pages, short paper, EMNLP 2016
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We compare the effectiveness of four different syntactic CCG parsers for a
semantic slot-filling task to explore how much syntactic supervision is
required for downstream semantic analysis. This extrinsic, task-based
evaluation also provides a unique window into the semantics captured (or
missed) by unsupervised grammar induction systems.
Qiang Liu, Shu Wu, Feng Yu, Liang Wang, Tieniu Tan
Comments: IEEE Transactions on Information Forensics and Security (TIFS), under review
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
With the rapid growth of social media, rumors also spread widely on these
platforms and bring harm to people’s daily lives. Nowadays, information
credibility evaluation has drawn attention from the academic and industrial
communities. Current methods mainly focus on feature engineering and achieve
some success. However, feature engineering based methods require a lot of labor
and cannot fully reveal the underlying relations among the data. In our view,
the key elements of user behavior for evaluating credibility can be summarized
as “who”, “what”, “when”, and “how”. Existing methods cannot model the
correlation among these key elements during the spreading of microblogs. In
this paper, we propose a novel representation learning method, Information
Credibility Evaluation (ICE), to learn representations of information
credibility on social media. In ICE, latent representations are learnt for
modeling user credibility, behavior types, temporal properties, and comment
attitudes. The aggregation of these factors in the microblog spreading process
yields the representation of a user’s behavior, and the aggregation of these
dynamic representations generates the credibility representation of an event
spreading on social media. Moreover, a pairwise learning method is applied to
maximize the credibility difference between rumors and non-rumors. To evaluate
the performance of ICE, we conduct experiments on a Sina Weibo data set, and
the experimental results show that our ICE model outperforms the
state-of-the-art methods.
Ruining He, Julian McAuley
Comments: 10 pages, 8 figures
Subjects: Information Retrieval (cs.IR)
Predicting personalized sequential behavior is a key task for recommender
systems. In order to predict user actions such as the next product to purchase,
movie to watch, or place to visit, it is essential to take into account both
long-term user preferences and sequential patterns (i.e., short-term dynamics).
Matrix Factorization and Markov Chain methods have emerged as two separate but
powerful paradigms for modeling the two respectively. Combining these ideas has
led to unified methods that accommodate long- and short-term dynamics
simultaneously by modeling pairwise user-item and item-item interactions.
In spite of the success of such methods for tackling dense data, they are
challenged by sparsity issues, which are prevalent in real-world datasets. In
recent years, similarity-based methods have been proposed for
(sequentially-unaware) item recommendation with promising results on sparse
datasets. In this paper, we propose to fuse such methods with Markov Chains to
make personalized sequential recommendations. We evaluate our method, Fossil,
on a variety of large, real-world datasets. We show quantitatively that Fossil
outperforms alternative algorithms, especially on sparse datasets, and
qualitatively that it captures personalized dynamics and is able to make
meaningful recommendations.
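A toy numpy sketch of the fused scoring idea: long-term preference from
similarity to all of a user's past items, plus short-term dynamics from the
most recent item. The factorization and mixing weight below are simplified
assumptions, not Fossil's exact model:

    import numpy as np

    def fossil_style_score(user_items, prev_item, candidate, P, Q, eta):
        long_term = P[user_items].mean(0) @ Q[candidate]  # FISM-style preference
        short_term = P[prev_item] @ Q[candidate]          # Markov transition term
        return long_term + eta * short_term               # eta: personalized mix

    rng = np.random.default_rng(0)
    P, Q = rng.random((100, 16)), rng.random((100, 16))   # item embeddings
    print(fossil_style_score([3, 7, 42], 42, 9, P, Q, eta=0.5))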
Leonard K.M. Poon, Nevin L. Zhang
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Academic researchers often face a large collection of research
papers in the literature. This problem may be even worse for postgraduate
students who are new to a field and may not know where to start. To address
this problem, we have developed an online catalog of research papers where the
papers have been automatically categorized by a topic model. The catalog
contains 7719 papers from the proceedings of two artificial intelligence
conferences from 2000 to 2015. Rather than the commonly used Latent Dirichlet
Allocation, we use a recently proposed method called hierarchical latent tree
analysis for topic modeling. The resulting topic model contains a hierarchy of
topics so that users can browse the topics from the top level to the bottom
level. The topic model contains a manageable number of general topics at the
top level and allows thousands of fine-grained topics at the bottom level. It
can also detect topics that have emerged recently.
Othman Zennaki, Nasredine Semmar, Laurent Besacier
Comments: accepted to COLING 2016
Subjects: Computation and Language (cs.CL)
This work focuses on the rapid development of linguistic annotation tools for
resource-poor languages. We experiment with several cross-lingual annotation
projection methods using Recurrent Neural Network (RNN) models. The
distinctive feature of our approach is that our multilingual word
representation requires only a parallel corpus between the source and target
language. More precisely, our method has the following characteristics: (a) it
does not use word alignment information, (b) it does not assume any knowledge
about foreign languages, which makes it applicable to a wide range of
resource-poor languages, (c) it provides truly multilingual taggers. We
investigate both uni- and bi-directional RNN models and propose a method to
include external information (for instance low level information from POS) in
the RNN to train higher level taggers (for instance, super sense taggers). We
demonstrate the validity and genericity of our model by using parallel corpora
(obtained by manual or automatic translation). Our experiments are conducted to
induce cross-lingual POS and super sense taggers.
Zhenghua Li, Yue Zhang, Jiayuan Chao, Min Zhang
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recently, there has been a surge of interest in obtaining partially
annotated data for model supervision. However, there is still no systematic
study on how to train statistical models with partial annotation (PA). Taking
dependency parsing as our case study, this paper describes and compares two
straightforward approaches for three mainstream dependency parsers. The first
approach was previously proposed to directly train a log-linear graph-based
parser (LLGPar) with PA based on a forest-based objective. This work for the
first time proposes the second approach, which directly trains a linear
graph-based parser (LGPar) and a linear transition-based parser (LTPar) with PA
based on the idea of constrained decoding. We conduct extensive experiments on
Penn Treebank under three different settings for simulating PA, i.e., random
dependencies, most uncertain dependencies, and dependencies with divergent
outputs from the three parsers. The results show that LLGPar is most effective
in learning from PA and that LTPar lags behind the graph-based counterparts by
a large margin. Moreover, LGPar and LTPar achieve their best performance by
using LLGPar to complete PA into full annotation (FA).
Shaonan Wang, Jiajun Zhang, Chengqing Zong
Comments: submitted to AAAI 2017
Subjects: Computation and Language (cs.CL)
Most existing sentence representation models treat each word in a sentence
equally. However, extensive studies have shown that humans read sentences by
making a sequence of fixations and saccades (Rayner 1998), which is extremely
efficient. In this paper, we propose two novel approaches that use
significant predictors of human reading time, e.g., surprisal and word classes,
implemented as attention models, to improve the representational capability of
sentence embeddings. One approach utilizes surprisal directly as the attention
weight over baseline models. The other builds an attention model with the help
of POS tag and CCG supertag vectors which are trained together with word
embeddings in the process of sentence representation learning. In experiments,
we have evaluated our models on 24 textual semantic similarity datasets and the
results demonstrate that the proposed models significantly outperform the
state-of-the-art sentence representation models.
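A numpy sketch of the first (direct-surprisal) approach: a sentence embedding
computed as a surprisal-weighted average of word vectors. The softmax weighting
is an assumption, not necessarily the paper's exact form:

    import numpy as np

    def surprisal_weighted_embedding(word_vecs, surprisals):
        # words with higher reading-time surprisal get more attention
        w = np.exp(surprisals - surprisals.max())   # softmax over surprisal
        w /= w.sum()
        return w @ word_vecs

    rng = np.random.default_rng(0)
    vecs = rng.random((6, 50))                      # 6 words, 50-dim embeddings
    print(surprisal_weighted_embedding(vecs,
                                       np.array([1., 4., 2., 7., 3., 2.])).shape)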
Lei Shen, Junlin Zhang
Subjects: Computation and Language (cs.CL)
Recurrent Neural Networks have achieved state-of-the-art results for many
problems in NLP, and the two most popular RNN architectures are the Tail Model
and the Pooling Model. In this paper, a hybrid architecture is proposed, and we
present the first empirical study using LSTMs to compare the performance of the
three RNN structures on sentence classification tasks. Experimental results
show that the Tail Model and the Hybrid Model consistently outperform the
Pooling Model, and the Hybrid Model is comparable with the Tail Model.
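A numpy sketch of the three sentence representations compared: the tail (last
hidden state), pooling over time, and a hybrid of both. The mean-pooling and
concatenation choices are assumptions about the specific variants:

    import numpy as np

    def sentence_repr(hidden_states, mode="hybrid"):
        # hidden_states: (T, d) per-token RNN outputs
        tail = hidden_states[-1]       # last hidden state
        pool = hidden_states.mean(0)   # pooled over time (max is also common)
        if mode == "tail":
            return tail
        if mode == "pool":
            return pool
        return np.concatenate([tail, pool])   # hybrid of both

    h = np.random.default_rng(0).random((12, 64))   # 12 timesteps, 64-dim LSTM
    print(sentence_repr(h).shape)                   # (128,)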
Huan Zhou, Jose Gracia
Comments: 7 pages, accepted for publication in International Workshop on Legacy HPC Application Migration (LHAM16) held in conjunction with CANDAR16
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
MPI is the most widely used data transfer and communication model in High
Performance Computing. The latest version of the standard, MPI-3, allows
skilled programmers to exploit all hardware capabilities of the latest and
future supercomputing systems. The revised asynchronous remote-memory-access
model in combination with the shared-memory window extension, in particular,
allow writing code that hides communication latencies and optimizes
communication paths according to the locality of data origin and destination.
The latter is particularly important for today’s multi- and many-core systems.
However, writing such efficient code is highly complex and error-prone. In this
paper we evaluate a recent remote-memory-access model, namely DART-MPI. This
model claims to hide the aforementioned complexities from the programmer, but
deliver locality-aware remote-memory-access semantics that outperform MPI-3
one-sided communication primitives on multi-core systems. Conceptually, the
DART-MPI interface is simple; at the same time it takes care of the
complexities of the underlying MPI-3 and system topology. This makes DART-MPI
an interesting candidate for porting legacy applications. We evaluate these
claims using a realistic scientific application, specifically a
finite-difference stencil code which solves the heat diffusion equation, on a
large-scale Cray XC40 installation.
Pankaj Khanchandani, Christoph Lenzen
Comments: 35 pages, 3 figures, full version of the paper in proceedings of SSS 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We revisit the approach to Byzantine fault-tolerant clock synchronization
based on approximate agreement introduced by Lynch and Welch. Our contribution
is threefold:
(1) We provide a slightly refined variant of the algorithm yielding improved
bounds on the skew that can be achieved and the sustainable frequency offsets.
(2) We show how to extend the technique to also synchronize clock rates. This
permits less frequent communication without significant loss of precision,
provided that clock rates change sufficiently slowly.
(3) We present a coupling scheme that makes these algorithms self-stabilizing
while preserving their high precision. The scheme utilizes a
low-precision, but self-stabilizing algorithm for the purpose of recovery.
Chenhao Qu, Rodrigo N. Calheiros, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Web application providers have been migrating their applications to cloud
data centers, attracted by the emerging cloud computing paradigm. One of the
appealing features of cloud is elasticity. It allows cloud users to acquire or
release computing resources on demand, which enables web application providers
to auto-scale the resources provisioned to their applications under dynamic
workload in order to minimize resource cost while satisfying Quality of Service
(QoS) requirements. In this paper, we comprehensively analyze the challenges
that remain in auto-scaling web applications in clouds and review the developments
in this field. We present a taxonomy of auto-scaling systems according to the
identified challenges and key properties. We analyze the surveyed works and map
them to the taxonomy to identify the weaknesses in this field. Moreover, based on
the analysis, we propose new future directions.
Rohit Verma, Abhishek Srivastava
Comments: Preprint Submitted to Arxiv
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Advancements in technology have transformed mobile devices from being mere
communication widgets to versatile computing devices. Proliferation of these
hand held devices has made them a common means to access and process digital
information. Most web based applications are today available in a form that can
conveniently be accessed over mobile devices. However, web services
(applications meant for consumption by other applications rather than humans)
are not as commonly provided and consumed over mobile devices. Facilitating
this, and in effect realizing a service-oriented system over mobile devices,
has the potential to further enhance the utility of mobile devices. One of the
major challenges in this integration is the lack of an efficient service
registry system that caters to issues associated with the dynamic and volatile
mobile environments. Existing service registry technologies designed for
traditional systems fall short of accommodating such issues. In this paper, we
propose a novel approach to manage service registry systems provided ‘solely’
over mobile devices, and thus realising an SOA without the need for high-end
computing systems. The approach manages a dynamic service registry system in
the form of light weight and distributed registries. We assess the feasibility
of our approach by engineering and deploying a working prototype of the
proposed registry system over actual mobile devices. A comparative study of the
proposed approach and the traditional UDDI (Universal Description, Discovery,
and Integration) registry is also included. The evaluation of our framework has
shown propitious results in terms of battery cost, scalability, and hindrance
to native applications.
Ramakrishnan Kannan, Grey Ballard, Haesun Park
Comments: arXiv admin note: text overlap with arXiv:1509.09313
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
Non-negative matrix factorization (NMF) is the problem of determining two
non-negative low rank factors $W$ and $H$, for the given input matrix $A$, such
that $A \approx W H$. NMF is a useful tool for many applications in different
domains such as topic modeling in text mining, background separation in video
analysis, and community detection in social networks. Despite its popularity in
the data mining community, there is a lack of efficient parallel algorithms to
solve the problem for big data sets.
The main contribution of this work is a new, high-performance parallel
computational framework for a broad class of NMF algorithms that iteratively
solves alternating non-negative least squares (NLS) subproblems for $W$ and
$H$. It maintains the data and factor matrices in memory (distributed across
processors), uses MPI for interprocessor communication, and, in the dense case,
provably minimizes communication costs (under mild assumptions). The framework
is flexible and able to leverage a variety of NMF and NLS algorithms, including
Multiplicative Update, Hierarchical Alternating Least Squares, and Block
Principal Pivoting. Our implementation allows us to benchmark and compare
different algorithms on massive dense and sparse data matrices whose sizes
span from a few hundred million to billions. We demonstrate the scalability
of our algorithm and compare it with baseline implementations, showing
significant performance improvements. The code and the datasets used for
conducting the experiments are available online.
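For reference, a single-node numpy sketch of one NMF algorithm the framework
supports, Multiplicative Update, alternating between the two NLS subproblems;
the paper's contribution, the distributed communication-optimal MPI version,
is not reproduced here:

    import numpy as np

    def nmf_multiplicative(A, k, iters=200, seed=0, eps=1e-9):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W, H = rng.random((m, k)), rng.random((k, n))
        for _ in range(iters):
            H *= (W.T @ A) / (W.T @ W @ H + eps)    # NLS step for H
            W *= (A @ H.T) / (W @ (H @ H.T) + eps)  # NLS step for W
        return W, H

    A = np.abs(np.random.default_rng(1).random((50, 40)))
    W, H = nmf_multiplicative(A, k=5)
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))  # relative fit error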
Pengfei Xuan, Feng Luo, Rong Ge, Pradip K Srimani
Comments: 5 pages, 8 figures, short paper
Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC)
In order to boost the performance of data-intensive computing on HPC systems,
in-memory computing frameworks, such as Apache Spark and Flink, use local DRAM
for data storage. Optimizing the memory allocation to data storage is critical
to delivering performance to traditional HPC compute jobs and throughput to
data-intensive applications sharing the HPC resources. Current practices that
statically configure in-memory storage may leave inadequate space for compute
jobs or lose the opportunity to utilize more available space for data-intensive
applications. In this paper, we explore techniques to dynamically adjust
in-memory storage and make the right amount of space for compute jobs. We have
developed a dynamic memory controller, DynIMS, which infers memory demands of
compute tasks online and employs a feedback-based control model to adapt the
capacity of in-memory storage. We test DynIMS using mixed HPCC and Spark
workloads on an HPC cluster. Experimental results show that DynIMS can achieve
up to 5X performance improvement compared to systems with static memory
allocations.
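A toy sketch of the feedback-control idea: shrink the in-memory store when
compute tasks run short of memory and grow it when memory sits idle. The
proportional gain and bounds are illustrative, not DynIMS's actual control
model:

    def adjust_storage(capacity, free_mem, target_free, gain=0.5,
                       min_cap=1.0, max_cap=64.0):
        # one proportional control step over the in-memory store size (GB)
        error = free_mem - target_free          # >0: slack, <0: memory pressure
        capacity = capacity + gain * error
        return min(max(capacity, min_cap), max_cap)

    cap = 32.0
    for free in [10.0, 4.0, 2.0, 8.0]:          # observed free memory (GB)
        cap = adjust_storage(cap, free, target_free=6.0)
        print(round(cap, 1))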
Zhirong Qiu, Lihua Xie, Yiguang Hong
Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC)
Distributed consensus with data rate constraint is an important research
topic of multi-agent systems. Some results have been obtained for consensus of
multi-agent systems with integrator dynamics, but it remains challenging for
general high-order systems, especially in the presence of unmeasurable states.
In this paper, we study the quantized consensus problem for a special kind of
high-order systems and investigate the corresponding data rate required for
achieving consensus. The state matrix of each agent is a $2m$-th order real
Jordan block admitting $m$ identical pairs of conjugate poles on the unit
circle; each agent has a single input, and only the first state variable can be
measured. The case of harmonic oscillators corresponding to $m=1$ is first
investigated under a directed communication topology which contains a spanning
tree, while the general case of $m \geq 2$ is considered for a connected and
undirected network. In both cases it is concluded that the sufficient number of
communication bits to guarantee the consensus at an exponential convergence
rate is an integer between $m$ and $2m$, depending on the location of the
poles.
Lovedeep Gondara
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Newly proposed models are often compared to the state-of-the-art using
statistical significance testing. The literature on classifier comparison
using metrics other than accuracy is scarce. We present a survey of statistical
methods that can be used for classifier comparison using precision, accounting
for inter-precision correlation arising from the use of the same dataset.
Comparisons are made using per-class precision, and methods are presented to
test the global null hypothesis of an overall model comparison. Comparisons
are extended to multiple multi-class classifiers and to models using cross
validation or its variants. A partial Bayesian update to precision is
introduced when the population prevalence of a class is known. Applications to
compare deep architectures are studied.
Di Chen, Yexiang Xue, Shuo Chen, Daniel Fink, Carla Gomes
Comments: 7pages, aaai2017
Subjects: Learning (cs.LG); Populations and Evolution (q-bio.PE); Machine Learning (stat.ML)
Understanding how species are distributed across landscapes over time is a
fundamental question in biodiversity research. Unfortunately, most species
distribution models only target a single species at a time, despite the fact
that there is strong evidence that species are not independently distributed.
We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors
corresponding to multiple species as well as vectors representing environmental
covariates into a common high dimensional feature space via a deep neural
network. Applied to eBird bird watching data, our single-species DMSE
model outperforms commonly used random forest models in terms of accuracy. Our
multi-species DMSE model further improves the single species version. Through
this model, we are able to confirm quantitatively many species-species
interactions, which are only understood qualitatively among ecologists. As an
additional contribution, we provide a graphical embedding of hundreds of bird
species in the Northeast US.
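A toy numpy forward pass of the DMSE idea: a small network embeds environmental
covariates, and a dot product with a species embedding yields an occurrence
probability. Layer sizes and the sigmoid link are assumptions:

    import numpy as np

    def dmse_score(species_vec, env_features, hidden_W, hidden_b, env_W):
        h = np.tanh(env_features @ hidden_W + hidden_b)  # embed covariates
        logit = h @ env_W @ species_vec                  # affinity to species
        return 1.0 / (1.0 + np.exp(-logit))              # occurrence probability

    rng = np.random.default_rng(0)
    print(dmse_score(rng.random(8), rng.random(12),
                     rng.random((12, 16)), rng.random(16), rng.random((16, 8))))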
Priyanka H U, Vivek R
Subjects: Learning (cs.LG); Computers and Society (cs.CY)
Developing predictive modelling solutions for risk estimation is extremely
challenging in health-care informatics. Risk estimation involves the
integration of heterogeneous clinical sources with different representations
from different health-care providers, making the task increasingly complex.
Such sources are typically voluminous and diverse, and change significantly
over time. Therefore, distributed and parallel computing tools, collectively
termed big data tools, are needed to synthesize this data and assist the
physician in making the right clinical decisions. In this work we propose a
multi-model predictive architecture, a novel approach for combining the
predictive ability of multiple models for better prediction accuracy. We
demonstrate the effectiveness and efficiency of the proposed work on data from
the Framingham Heart Study. Results show that the proposed multi-model
predictive architecture is able to provide better accuracy than the best-model
approach. By modelling the error of the predictive models, we are able to
choose a subset of models which yields accurate results. More information was
modelled into the system by multi-level mining, which has resulted in enhanced
predictive accuracy.
Wenbin Li, Yang Gao, Lei Wang, Luping Zhou, Jing Huo, Yinghuan Shi
Comments: 12 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
To achieve a low computational cost when performing online metric learning
for large-scale data, we present a one-pass closed-form solution, namely OPML,
in this paper. The proposed OPML first adopts a one-pass triplet construction
strategy, which aims to use only a very small number of triplets to
approximate the representation ability of the whole set of triplets obtained
by batch-manner methods. Then, OPML employs a closed-form solution to update
the metric for newly arriving samples, which leads to low space (i.e., $O(d)$)
and time (i.e., $O(d^2)$) complexity, where $d$ is the feature dimensionality.
In addition, an extension of OPML (namely COPML) is further proposed to
enhance robustness when, as often happens in practice, the first several
samples come from the same class (i.e., the cold start problem). In the
experiments, we systematically evaluated our methods (OPML and COPML) on three
typical tasks, including UCI data classification, face verification, and
abnormal event detection in videos, so as to fully evaluate the proposed
methods under different sample sizes, feature dimensionalities, and feature
extraction approaches (i.e., hand-crafted and deeply-learned). The results
show that OPML and COPML obtain promising performance at a very low
computational cost, and the effectiveness of COPML under the cold start
setting is experimentally verified.
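
The paper derives an exact closed-form update; the sketch below only
illustrates the one-pass, $O(d^2)$-per-triplet flavor of such an update, with
an ad-hoc rank-one rule of my own (note that it omits the projection that
would keep $M$ positive semi-definite, which a proper closed form handles):

import numpy as np

def triplet_update(M, anchor, pos, neg, eta=0.1, margin=1.0):
    # One O(d^2) update on the Mahalanobis matrix M from a single triplet.
    # Illustrative rule only, not the paper's exact closed-form solution.
    vp, vn = anchor - pos, anchor - neg
    if vp @ M @ vp + margin > vn @ M @ vn:  # triplet is violated
        M = M + eta * (np.outer(vn, vn) - np.outer(vp, vp))
    return M

d = 8
M = np.eye(d)
rng = np.random.default_rng(0)
for _ in range(100):  # one pass over a stream of toy triplets
    a = rng.normal(size=d)
    M = triplet_update(M, a, a + 0.1 * rng.normal(size=d),
                       rng.normal(size=d))
print(M.shape)

Only $M$ (of size $d \times d$) and the current triplet are held in memory,
which is the source of the $O(d)$ sample storage and $O(d^2)$ time claims.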
Sauptik Dhar, Naveen Ramakrishnan, Vladimir Cherkassky, Mohak Shah
Comments: 14 pages, 12 figures
Subjects: Learning (cs.LG)
We introduce Universum learning for multiclass problems and propose a novel
formulation for multiclass universum SVM (MU-SVM). We also propose a span bound
for MU-SVM that can be used for model selection thereby avoiding resampling.
Empirical results demonstrate the effectiveness of MU-SVM and the proposed
bound.
David Ha, Andrew Dai, Quoc V. Le
Subjects: Learning (cs.LG)
This work explores hypernetworks: an approach of using a small network, also
known as a hypernetwork, to generate the weights for a larger network.
Hypernetworks provide an abstraction that is similar to what is found in
nature: the relationship between a genotype – the hypernetwork – and a
phenotype – the main network. Though they are also reminiscent of HyperNEAT in
evolution, our hypernetworks are trained end-to-end with backpropagation and
thus are usually faster. The focus of this work is to make hypernetworks useful
for deep convolutional networks and long recurrent networks, where
hypernetworks can be viewed as a relaxed form of weight-sharing across layers.
Our main result is that hypernetworks can generate non-shared weights for LSTMs
and achieve state-of-the-art results on a variety of language modeling tasks with
Character-Level Penn Treebank and Hutter Prize Wikipedia datasets, challenging
the weight-sharing paradigm for recurrent networks. Our results also show that
hypernetworks applied to convolutional networks still achieve respectable
results for image recognition tasks compared to state-of-the-art baseline
models while requiring fewer learnable parameters.
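
A minimal sketch of the mechanism for one fully connected layer. The sizes are
illustrative; in practice the weight generator is shared across many layers or
weight chunks, which is what reduces the total parameter count:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    # A small hypernetwork emits the weight matrix of a larger "main" layer
    # from a learned per-layer embedding z (dimensions are illustrative).
    def __init__(self, in_f=256, out_f=256, z_dim=8):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim))    # layer embedding
        self.hyper = nn.Linear(z_dim, in_f * out_f)  # weight generator
        self.in_f, self.out_f = in_f, out_f

    def forward(self, x):
        W = self.hyper(self.z).view(self.out_f, self.in_f)
        return F.linear(x, W)

layer = HyperLinear()
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
# With the generator shared across layers, each extra layer costs only its
# small embedding z rather than a full in_f x out_f weight matrix.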
Vu Dinh, Lam Si Tung Ho, Duy Nguyen, Binh T. Nguyen
Comments: Advances in Neural Information Processing Systems (NIPS 2016): 11 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We study fast learning rates when the losses are not necessarily bounded and
may have a distribution with heavy tails. To enable such analyses, we introduce
two new conditions: (i) the envelope function $\sup_{f \in \mathcal{F}} |\ell
\circ f|$, where $\ell$ is the loss function and $\mathcal{F}$ is the
hypothesis class, exists and is $L^r$-integrable, and (ii) $\ell$ satisfies the
multi-scale Bernstein's condition on $\mathcal{F}$. Under these assumptions, we
prove that learning rates faster than $O(n^{-1/2})$ can be obtained and,
depending on $r$ and the multi-scale Bernstein's powers, can be arbitrarily
close to $O(n^{-1})$. We then verify these assumptions and derive fast learning
rates for the problem of vector quantization by $k$-means clustering with
heavy-tailed distributions. The analyses enable us to obtain novel learning
rates that extend and complement existing results in the literature from both
theoretical and practical viewpoints.
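
Schematically, the claim can be written as follows (the exact exponent,
determined by $r$ and the multi-scale Bernstein's powers, is specified in the
paper; this display only mirrors what the abstract states):

$$R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) = O\!\left(n^{-\alpha}\right),
\qquad \tfrac{1}{2} < \alpha < 1,$$

with $\alpha$ approaching $1$ in the most favorable regimes of the envelope
integrability and the Bernstein powers.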
Arnab Ghosh, Viveka Kulharia, Amitabha Mukerjee, Vinay Namboodiri, Mohit Bansal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
Understanding, predicting, and generating object motions and transformations
is a core problem in artificial intelligence. Modeling sequences of evolving
images may provide better representations and models of motion and may
ultimately be used for forecasting, simulation, or video generation.
Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in
complex patterns and one needs to infer the underlying pattern sequence and
generate the next image in the sequence. For this, we develop a novel
Contextual Generative Adversarial Network based on Recurrent Neural Networks
(Context-RNN-GANs), where both the generator and the discriminator modules are
based on contextual history (modeled as RNNs) and the adversarial discriminator
guides the generator to produce realistic images for the particular time step
in the image sequence. We evaluate the Context-RNN-GAN model (and its variants)
on a novel dataset of Diagrammatic Abstract Reasoning, where it performs
competitively with 10th-grade human performance but there is still scope for
interesting improvements as compared to college-grade human performance. We
also evaluate our model on a standard video next-frame prediction task, where
it achieves improved performance over comparable state-of-the-art methods.
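
A compact sketch of the idea, with both modules conditioned on an RNN summary
of the image history. The GRU cells, flattened 32x32 frames, and all sizes are
my simplifications of the architecture, not the paper's exact design:

import torch
import torch.nn as nn

class ContextRNNGAN(nn.Module):
    # Both G and D summarize the frame history with an RNN; G proposes the
    # next frame and D scores it against that contextual history.
    def __init__(self, frame=32 * 32, hidden=256):
        super().__init__()
        self.g_rnn = nn.GRU(frame, hidden, batch_first=True)
        self.g_out = nn.Linear(hidden, frame)
        self.d_rnn = nn.GRU(frame, hidden, batch_first=True)
        self.d_out = nn.Linear(hidden + frame, 1)

    def generate(self, history):              # history: (B, T, frame)
        _, h = self.g_rnn(history)
        return torch.tanh(self.g_out(h[-1]))  # next frame: (B, frame)

    def discriminate(self, history, frame):
        _, h = self.d_rnn(history)
        return self.d_out(torch.cat([h[-1], frame], dim=-1))  # realism logit

gan = ContextRNNGAN()
hist = torch.randn(2, 5, 32 * 32)  # two sequences of five frames
fake = gan.generate(hist)
print(gan.discriminate(hist, fake).shape)  # torch.Size([2, 1])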
Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
Convolutional Neural Networks (CNNs) have proven very effective in image
classification and have shown promise for audio classification. We apply
various CNN architectures to audio and investigate their ability to classify
videos with a very large data set of 70M training videos (5.24 million hours)
with 30,871 labels. We examine fully connected Deep Neural Networks (DNNs),
AlexNet, VGG, Inception, and ResNet. We explore the effects of training with
different-sized subsets of the training videos. Additionally, we report the
effect of training with different subsets of the labels. While our dataset
contains video-level labels, we are also interested in Acoustic Event Detection
(AED) and train a classifier on embeddings learned from the video-level task on
Audio Set [5]. We find that derivatives of image classification networks do
well on our audio classification task, that increasing the number of labels we
train on provides some improved performance over subsets of labels, that
performance of models improves as we increase training set size, and that a
model using embeddings learned from the video-level task does much better than
a baseline on the Audio Set classification task.
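
To make the setup concrete, here is a minimal sketch of feeding a log-mel
spectrogram patch to an image-style CNN with multi-label (sigmoid) outputs.
The 64x96 patch size, the tiny architecture, and the 527-class head are
illustrative assumptions, not the paper's configuration:

import torch
import torch.nn as nn

# Treat a log-mel spectrogram patch as a 1-channel image, as image
# classifiers do (64 mel bands x 96 frames here, chosen arbitrarily).
cnn = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 527),  # multi-label logits
)
logmel = torch.randn(8, 1, 64, 96)  # batch of spectrogram patches
logits = cnn(logmel)
# Video-level, multi-label training uses an independent sigmoid per class:
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8, 527)).float())
print(logits.shape, loss.item())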
Julie Dequaire, Dushyant Rao, Peter Ondruska, Dominic Wang, Ingmar Posner
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)
This paper presents an end-to-end approach for tracking static and dynamic
objects for an autonomous vehicle driving through crowded urban environments.
Unlike traditional approaches to tracking, this method is learned end-to-end,
and is able to directly predict a full unoccluded occupancy grid map from raw
laser input data. Inspired by the recently presented DeepTracking approach
[Ondruska, 2016], we employ a recurrent neural network (RNN) to capture the
temporal evolution of the state of the environment, and propose to use Spatial
Transformer modules to exploit estimates of the egomotion of the vehicle. Our
results demonstrate the ability to track a range of objects, including cars,
buses, pedestrians, and cyclists through occlusion, from both moving and
stationary platforms, using a single learned model. Experimental results
demonstrate that the model can also predict the future states of objects from
current inputs, with greater accuracy than previous work.
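
A sketch of how a Spatial Transformer module can re-register the recurrent
state map for egomotion, assuming PyTorch's affine_grid/grid_sample and
translations in normalized image coordinates (the abstract does not specify
the exact interface, so this is only one plausible realization):

import torch
import torch.nn.functional as F

def warp_by_egomotion(grid_map, dx, dy, yaw):
    # Re-register a (B, C, H, W) occupancy/state map into the vehicle's new
    # frame with an affine warp. Translations here are in normalized [-1, 1]
    # coordinates; a real system would convert metres to cells first.
    cos, sin = torch.cos(yaw), torch.sin(yaw)
    theta = torch.stack([
        torch.stack([cos, -sin, dx], dim=-1),
        torch.stack([sin,  cos, dy], dim=-1),
    ], dim=1)                                  # (B, 2, 3) affine matrices
    grid = F.affine_grid(theta, grid_map.shape, align_corners=False)
    return F.grid_sample(grid_map, grid, align_corners=False)

state = torch.rand(1, 8, 64, 64)  # recurrent state map
moved = warp_by_egomotion(state, torch.tensor([0.1]),
                          torch.tensor([0.0]), torch.tensor([0.05]))
print(moved.shape)  # torch.Size([1, 8, 64, 64])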
Giuseppe De Nittis, Francesco Trovò
Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG)
This survey presents the machine learning techniques currently employed in
security-game domains. Specifically, we focus on papers and works developed by
the Teamcore group at the University of Southern California, which has pursued
several directions in this field. After a brief introduction to Stackelberg
Security Games (SSGs) and the poaching setting, the rest of the work presents
how to model a boundedly rational attacker by taking her human behavior into
account, then describes how to deal with attacker payoffs that are not defined
and how to estimate them, and, finally, presents how online learning
techniques have been exploited to learn a model of the attacker.
Zhenghua Li, Yue Zhang, Jiayuan Chao, Min Zhang
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recently, there has been a surge of studies on how to obtain partially
annotated data for model supervision. However, a systematic study of how to
train statistical models with partial annotation (PA) is still lacking. Taking
dependency parsing as our case study, this paper describes and compares two
straightforward approaches for three mainstream dependency parsers. The first
approach was previously proposed to directly train a log-linear graph-based
parser (LLGPar) with PA based on a forest-based objective. This work for the
first time proposes the second approach, which directly trains a linear
graph-based parser (LGPar) and a linear transition-based parser (LTPar) with PA
based on the idea of constrained decoding. We conduct extensive experiments on
Penn Treebank under three different settings for simulating PA, i.e., random
dependencies, most uncertain dependencies, and dependencies with divergent
outputs from the three parsers. The results show that LLGPar is the most
effective in learning from PA and that LTPar lags behind the graph-based
counterparts by a large margin. Moreover, LGPar and LTPar achieve their best
performance when LLGPar is used to complete PA into full annotation (FA).
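
The constrained-decoding idea can be sketched in a few lines: annotated
dependencies are forced, and the parser is free only on the rest. This toy
version picks heads greedily and ignores tree constraints (a real parser
would use, e.g., Eisner's algorithm); all names are mine:

import numpy as np

def constrained_heads(scores, partial):
    # scores[i, j]: score of word j taking word i as its head.
    # partial[j]: annotated head of j, or -1 if this dependency is
    # unannotated. Annotated arcs are forced during decoding.
    n = scores.shape[1]
    heads = np.empty(n, dtype=int)
    for j in range(n):
        heads[j] = partial[j] if partial[j] >= 0 else np.argmax(scores[:, j])
    return heads

rng = np.random.default_rng(0)
S = rng.normal(size=(6, 6))            # toy arc scores, 6-word sentence
pa = np.array([-1, 0, -1, 2, -1, -1])  # two dependencies are annotated
print(constrained_heads(S, pa))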
Davis W. Blalock, John V. Guttag
Comments: To appear in IEEE International Conference on Data Mining 2016
Subjects: Machine Learning (stat.ML); Databases (cs.DB); Learning (cs.LG)
Thanks to the rise of wearable and connected devices, sensor-generated time
series comprise a large and growing fraction of the world’s data.
Unfortunately, extracting value from this data can be challenging, since
sensors report low-level signals (e.g., acceleration), not the high-level
events that are typically of interest (e.g., gestures). We introduce a
technique to bridge this gap by automatically extracting examples of real-world
events in low-level data, given only a rough estimate of when these events have
taken place.
By identifying sets of features that repeat in the same temporal arrangement,
we isolate examples of such diverse events as human actions, power consumption
patterns, and spoken words with up to 96% precision and recall. Our method is
fast enough to run in real time and assumes only minimal knowledge of which
variables are relevant or the lengths of events. Our evaluation uses numerous
publicly available datasets and over 1 million samples of manually labeled
sensor data.
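
A toy stand-in for the core idea: matching a repeated temporal pattern against
a low-level signal by normalized correlation. The actual method discovers
which features repeat rather than being handed a template; the threshold and
all names below are mine:

import numpy as np

def find_repeats(signal, template, threshold=0.9):
    # Slide a template over a 1-D signal and return offsets whose windows
    # correlate strongly with it.
    m = len(template)
    t = (template - template.mean()) / (template.std() + 1e-9)
    hits = []
    for i in range(len(signal) - m + 1):
        w = signal[i:i + m]
        w = (w - w.mean()) / (w.std() + 1e-9)
        if np.dot(w, t) / m > threshold:
            hits.append(i)
    return hits

rng = np.random.default_rng(0)
x = rng.normal(scale=0.2, size=500)
gesture = np.sin(np.linspace(0, 3 * np.pi, 40))  # a repeating "event"
for start in (100, 300):
    x[start:start + 40] += gesture
print(find_repeats(x, gesture))  # offsets near 100 and 300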
Leonard K.M. Poon, Nevin L. Zhang
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Learning (cs.LG)
Academic researchers often have to navigate a large collection of research
papers in the literature. This problem may be even worse for postgraduate
students who are new to a field and may not know where to start. To address
this problem, we have developed an online catalog of research papers where the
papers have been automatically categorized by a topic model. The catalog
contains 7719 papers from the proceedings of two artificial intelligence
conferences from 2000 to 2015. Rather than the commonly used Latent Dirichlet
Allocation, we use a recently proposed method called hierarchical latent tree
analysis for topic modeling. The resulting topic model contains a hierarchy of
topics so that users can browse the topics from the top level to the bottom
level. The topic model contains a manageable number of general topics at the
top level and allows thousands of fine-grained topics at the bottom level. It
also can detect topics that have emerged recently.
Minyoung Kim, Stefano Alletto, Luca Rigazio
Comments: accepted as a poster presentation for WiML 2016, colocated with NIPS 2016, Barcelona, Spain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Multi-object tracking has recently become an important area of computer
vision, especially for Advanced Driver Assistance Systems (ADAS). Despite
growing attention, achieving high-performance tracking is still challenging,
with state-of-the-art systems resulting in high complexity with a large number
of hyperparameters. In this paper, we focus on reducing overall system
complexity and the number of hyperparameters that need to be tuned to a specific
environment. We introduce a novel tracking system based on similarity mapping
by Enhanced Siamese Neural Network (ESNN), which accounts for both appearance
and geometric information, and is trainable end-to-end. Our system achieves
competitive performance in both speed and accuracy on MOT16 challenge, compared
to known state-of-the-art methods.
Eric Graves, Paul Yu, Predrag Spasojevic
Comments: Pre-print. Paper presented at ITW 2016 Cambridge
Subjects: Information Theory (cs.IT)
If Alice must communicate with Bob over a channel shared with the adversarial
Eve, then Bob must be able to validate the authenticity of the message. In
particular we consider the model where Alice and Eve share a discrete
memoryless multiple access channel with Bob, thus allowing simultaneous
transmissions from Alice and Eve. By traditional random coding arguments, we
demonstrate an inner bound on the rate at which Alice may transmit, while still
granting Bob the ability to authenticate. Furthermore this is accomplished in
spite of Alice and Bob lacking a pre-shared key, as well as allowing Eve prior
knowledge of both the codebook Alice and Bob share and the messages Alice
transmits.
Imène Trigui, Sofiène Affes, Ben Liang
Subjects: Information Theory (cs.IT)
Statistical characterization of the signal-to-interference-plus-noise ratio
(SINR) via its cumulative distribution function (CDF) is ubiquitous in a vast
majority of technical contributions in the area of cellular networks since it
boils down to averaging the Laplace transform of the aggregate interference, a
benefit accorded at the expense of confinement to the simplistic Rayleigh
fading. In this work, to capture diverse fading channels that appear in
realistic outdoor/indoor wireless communication scenarios, we tackle the
problem differently. By exploiting the moment generating function (MGF) of the
SINR, we succeed in analytically assessing cellular network performance over
the shadowed $\kappa$-$\mu$, $\kappa$-$\mu$, and $\eta$-$\mu$ fading models. The
latter offer high flexibility by capturing diverse fading channels including
Rayleigh, Nakagami-m, Rician, and Rician shadow fading distributions. These
channel models have been recently praised for their capability to accurately
model dense urban environments, future femtocells and device-to-device (D2D)
shadowed channels. In addition to unifying the analysis for different channel
models, this work integrates, for the first time, the coverage, the achievable
rate, and the bit error probability (BEP), which are largely treated separately
in the literature. The developed model and analysis are validated over a broad
range of simulation setups and parameters.
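
For contrast, here is the classical Rayleigh-fading step that confines most of
the literature: with unit-mean exponential desired-signal power $h$, the SINR
tail collapses to the Laplace transform $\mathcal{L}_I$ of the aggregate
interference (generic notation, not the paper's),

$$\Pr(\mathrm{SINR} > \theta)
  = \Pr\!\big(h > \theta r^{\alpha}(I + \sigma^2)/P\big)
  = e^{-\theta r^{\alpha}\sigma^2 / P}\,
    \mathcal{L}_I\!\left(\theta r^{\alpha}/P\right),
  \qquad h \sim \exp(1).$$

The MGF-based approach in this work replaces this exponential shortcut so that
the richer $\kappa$-$\mu$ and $\eta$-$\mu$ envelopes can be averaged as well.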
Sebastian Cammerer, Benedikt Leible, Matthias Stahl, Jakob Hoydis, Stephan ten Brink
Comments: submitted to ICASSP’17
Subjects: Information Theory (cs.IT)
The decoding performance of polar codes, as well as the decoder throughput and
latency, strongly depends on the decoding algorithm used. In this work, we
implement the powerful successive
cancellation list (SCL) decoder on a GPU and identify the bottlenecks of this
algorithm with respect to parallel computing and its difficulties. The inherent
serial decoding property of the SCL algorithm naturally limits the achievable
speed-up gains on GPUs when compared to CPU implementations. In order to
increase the decoding throughput, we use a hybrid decoding scheme based on the
belief propagation (BP) decoder, which can be intra- and inter-frame
parallelized. The proposed scheme combines excellent decoding performance and
high throughput within the signal-to-noise ratio (SNR) region of interest.
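
One common way such a BP/SCL hybrid can be organized is sketched below; the
paper's exact scheduling may differ, and the decoders here are toy stand-ins,
not real polar decoders:

import numpy as np

def bp_decode(llrs, max_iters=50):
    # Stand-in for a belief-propagation polar decoder (hard decision here;
    # iterative message passing in reality). BP is the part that
    # parallelizes well within and across frames on a GPU.
    return (llrs < 0).astype(int)

def scl_decode(llrs, list_size=8):
    # Stand-in for successive cancellation list decoding: inherently
    # sequential over bit indices, hence the GPU bottleneck.
    return (llrs < 0).astype(int)

def crc_ok(bits):
    return bits.sum() % 2 == 0  # toy parity stand-in for a real CRC

def hybrid_decode(llrs):
    # Try the throughput-friendly BP decoder first and fall back to SCL
    # only for the frames BP cannot resolve.
    est = bp_decode(llrs)
    return est if crc_ok(est) else scl_decode(llrs)

frame = np.random.default_rng(1).normal(loc=2.0, size=128)  # noisy LLRs
print(hybrid_decode(frame)[:8])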
Nastja Cepak, Pascale Charpin, Enes Pasalic
Subjects: Information Theory (cs.IT)
We show that many infinite classes of permutations over finite fields can be
constructed via translators with a large choice of parameters. We first
characterize some functions having linear translators, based on which several
families of permutations are then derived. Extending the results of [10], we
give in several cases the compositional inverse of these permutations. The
connection with complete permutations is also utilized to provide further
infinite classes of permutations. Moreover, we propose new tools to study
permutations of the form $x \mapsto x + (x^{p^m} - x + \lambda)^s$, and a few
infinite classes of permutations of this form are proposed.
Baokun Ding, Tao Zhang, Gennian Ge
Subjects: Information Theory (cs.IT)
Recently, Yaakobi et al. introduced codes for $b$-symbol read channels, where
the read operation is performed as a consecutive sequence of $b \ge 2$ symbols. In
this paper, we establish a Singleton-type bound on $b$-symbol codes. Codes
meeting the Singleton-type bound are called maximum distance separable (MDS)
codes, and they are optimal in the sense that they attain the maximal minimum
$b$-distance. Based on projective geometry and constacyclic codes, we construct
new families of linear MDS $b$-symbol codes over finite fields, and, in a
certain sense, we completely determine the existence of linear MDS $b$-symbol
codes over finite fields for certain parameters.
Yiwei Zhang, Xin Wang, Hengjia Wei, Gennian Ge
Subjects: Information Theory (cs.IT)
Given a database, the private information retrieval (PIR) protocol allows a
user to make queries to several servers and retrieve a certain item of the
database via the feedbacks, without revealing the privacy of the specific item
to any single server. Classical models of PIR protocols require that each
server stores a whole copy of the database. Recently new PIR models are
proposed with coding techniques arising from distributed storage systems. In
these new models each server only stores a fraction $1/s$ of the whole
database, where $s>1$ is a given rational number. PIR array codes are recently
proposed by Fazeli, Vardy and Yaakobi to characterize the new models. Consider
a PIR array code with $m$ servers and the $k$-PIR property (which indicates
that these $m$ servers may emulate any efficient $k$-PIR protocol). The central
problem is to design PIR array codes with optimal rate $k/m$. Our contribution
to this problem is three-fold. First, for the case $1 < s \le 2$, although PIR
array codes with optimal rate have been constructed recently by Blackburn and
Etzion, the number of servers in their construction is impractically large. We
determine the minimum number of servers admitting the existence of a PIR array
code with optimal rate for a certain range of parameters. Second, for the case
$s>2$, we derive a new upper bound on the rate of a PIR array code. Finally,
for the case $s>2$, we analyze a new construction by Blackburn and Etzion and
show that its rate is better than all the other existing constructions.
Yifan Gu, He Chen, Yonghui Li, Branka Vucetic
Comments: An invited paper to appear in WPMC 2016
Subjects: Information Theory (cs.IT)
This paper investigates a three-node multiple-input multiple-output relay
system suffering from co-channel interference (CCI) at the multi-antenna relay.
Contrary to conventional relay networks, we consider the scenario in which the
relay is an energy harvesting (EH) node with no embedded energy supply.
Instead, it is equipped with a rechargeable battery such that it can harvest
and accumulate energy from the RF signals sent by the source and
co-channel interferers to support its operation. Leveraging the inherent
feature of the considered system, we develop a novel accumulate-then-forward
(ATF) protocol to eliminate the harmful effect of CCI. In the proposed ATF
scheme, at the beginning of each transmission block, the relay can choose
either EH operation to harvest energy from source and CCI or information
decoding (ID) operation to decode and forward the source's information while
suffering from CCI. Specifically, ID operation is activated only when the
accumulated energy at the relay can support an outage-free transmission in the
second hop. Otherwise, EH operation is invoked at the relay to harvest and
accumulate energy. By modeling the finite-capacity battery of relay as a
finite-state Markov Chain (MC), we derive a closed-form expression for the
system throughput of the proposed ATF scheme over mixed Nakagami-m and Rayleigh
fading channels. Numerical results validate our theoretical analysis, and show
that the proposed ATF scheme with energy accumulation significantly outperforms
the existing one without energy accumulation.
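
A toy simulation of the block-by-block EH/ID decision and the battery state
may clarify the protocol. The harvest distribution, threshold, and capacity
below are arbitrary; the paper instead models the battery as a finite-state
Markov chain and derives the throughput in closed form:

import numpy as np

rng = np.random.default_rng(0)
capacity, threshold = 10.0, 4.0  # battery capacity, and the energy an
                                 # outage-free second-hop transmission needs
battery, delivered = 0.0, 0
for block in range(10_000):
    if battery >= threshold:     # ID mode: decode and forward
        battery -= threshold
        delivered += 1           # count a successfully relayed block
    else:                        # EH mode: harvest from source + CCI signals
        battery = min(capacity, battery + rng.exponential(1.5))
print("throughput ~", delivered / 10_000, "blocks/block")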
Z. Zh. Zhanabaev, S.N. Akhtanov, E.T. Kozhagulov, B.A Karibayev
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Information Theory (cs.IT)
In this paper we suggest a new algorithm for determining the signal-to-noise
ratio (SNR). SNR is a quantitative measure widely used in science and
engineering. Generally, methods for determining SNR rely on an experimentally
defined noise power level, or on some conditional noise criterion specified for
the signal processing at hand. In the present work we describe a method for
determining the SNR of chaotic and stochastic signals when the power levels of
the signal and the noise are unknown. For this aim we use information, defined
as the difference between unconditional and conditional entropy. Our
theoretical results are confirmed by the analysis of signals that can be
described by nonlinear maps and of signals formed as superpositions of harmonic
and stochastic components.
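
A rough numerical illustration of the entropy-difference idea. The histogram
entropy estimator and the Gaussian-channel inversion $\mathrm{SNR} = 2^{2I} - 1$
are my assumptions for the sketch, not the authors' algorithm:

import numpy as np

def entropy(counts):
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 60 * np.pi, 20000))   # "signal"
y = x + rng.normal(scale=0.5, size=x.size)      # signal + noise

bins = 32
h_y = entropy(np.histogram(y, bins)[0])         # unconditional entropy H(Y)
joint, _, _ = np.histogram2d(x, y, bins)
h_x = entropy(np.histogram(x, bins)[0])
h_y_given_x = entropy(joint.ravel()) - h_x      # H(Y|X) = H(X,Y) - H(X)
info = h_y - h_y_given_x                        # information I(X;Y)
snr = 2 ** (2 * info) - 1  # Gaussian-channel inversion (an assumption)
print(f"I ~ {info:.2f} bits, implied SNR ~ {snr:.1f}")

For this toy example the true SNR is var(x)/0.25 = 2, and the estimate lands
in that neighborhood; for chaotic signals the inversion step would only be
indicative, which is where the paper's treatment goes further.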