Sebastián Basterrech
Comments: 23 pages, 14 figures, accepted paper for the IEEE IJCNN, 2017
Subjects: Neural and Evolutionary Computing (cs.NE)
The Echo State Network (ESN) is a specific recurrent network that has
gained popularity in recent years. The model contains a recurrent network,
called the reservoir, that is fixed during the learning process. The reservoir
is used to project the input space into a larger space. A fundamental property
that impacts the model's accuracy is the Echo State Property (ESP).
There are two main theoretical results related to the ESP. First, a sufficient
condition for the existence of the ESP that involves the singular values of the
reservoir matrix. Second, a necessary condition for the ESP, which can be
violated depending on the spectral radius of the reservoir matrix. There
is a theoretical gap between these necessary and sufficient conditions. This
article presents an empirical analysis of the accuracy and the projections of
reservoirs that lie in this theoretical gap, and it offers some insights into
the generation of the reservoir matrix. From previous works, it is already
known that the optimal accuracy is obtained near the border of stability of
the dynamics. Our empirical results suggest that this border lies closer to
the sufficient condition than to the necessary condition of the ESP.
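As a minimal numerical illustration of the two conditions (a sketch, not the paper's code), one can check where a given reservoir matrix falls, assuming the standard formulations: a largest singular value below one is sufficient for the ESP, while a spectral radius below one is necessary.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(100, 100))   # hypothetical reservoir matrix

    sigma_max = np.linalg.svd(W, compute_uv=False)[0]   # largest singular value
    rho = np.max(np.abs(np.linalg.eigvals(W)))          # spectral radius

    # Sufficient condition: sigma_max < 1 guarantees the ESP.
    # Necessary condition: rho < 1 (the ESP is violated when rho >= 1).
    if sigma_max < 1:
        print("ESP guaranteed (sufficient condition holds)")
    elif rho < 1:
        print("theoretical gap: necessary condition holds, sufficient one fails")
    else:
        print("ESP violated (necessary condition fails)")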
Johannes Thiele, Peter Diehl, Matthew Cook
Comments: Presented at the NIPS 2016 workshop “Computing with Spikes”
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
We investigate a recently proposed model for cortical computation which
performs relational inference. It consists of several interconnected,
structurally equivalent populations of leaky integrate-and-fire (LIF) neurons,
which are trained in a self-organized fashion with spike-timing dependent
plasticity (STDP). Despite its robust learning dynamics, the model is
susceptible to a problem typical of recurrent networks that use a
correlation-based (Hebbian) learning rule: if trained with high learning rates, the
recurrent connections can cause strong feedback loops in the network dynamics,
which lead to the emergence of attractor states. This causes a strong reduction
in the number of representable patterns and a decay in the inference ability of
the network. As a solution, we introduce a conceptually very simple
“wake-sleep” algorithm: during the wake phase, training is executed normally,
while during the sleep phase, the network “dreams” samples from its generative
model, which are induced by random input. This process allows us to activate
the attractor states in the network, which can then be unlearned effectively by
an anti-Hebbian mechanism. The algorithm allows us to increase learning rates
up to a factor of ten while avoiding clustering, which allows the network to
learn several times faster. Even for low learning rates, where clustering is
not an issue, it improves convergence speed and reduces the final inference
error.
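A toy sketch of the wake-sleep idea, deliberately simplified: rate-based tanh units stand in for the paper's LIF/STDP model, and all constants are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n, eta = 50, 0.1                      # network size, (high) learning rate
    W = np.zeros((n, n))                  # recurrent weights

    def activity(x, W):
        return np.tanh(x + W @ np.tanh(x))    # crude stand-in for the dynamics

    for step in range(100):
        # Wake phase: Hebbian update driven by a data pattern.
        r = activity(rng.normal(size=n), W)
        W += eta * np.outer(r, r) / n
        # Sleep phase: random input makes the network "dream" from its own
        # dynamics, activating attractor states, which are then unlearned.
        r_dream = activity(rng.normal(size=n), W)
        W -= eta * np.outer(r_dream, r_dream) / n   # anti-Hebbian unlearning
        np.fill_diagonal(W, 0.0)                    # no self-connections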
Zhi-Zhong Liu, Yong Wang, Shengxiang Yang, Ke Tang
Comments: This paper provides a new point of view toward how to describe an evolutionary operator in the original coordinate system, and also offers a convenient transformation from an evolutionary operator in the original coordinate system to the corresponding evolutionary operator in the Eigen coordinate system
Subjects: Neural and Evolutionary Computing (cs.NE)
In the evolutionary computation research community, the performance of most
evolutionary algorithms (EAs) depends strongly on their implemented coordinate
system. However, the commonly used coordinate system is fixed and not
necessarily well suited for different function landscapes; as a result, EAs
might not search efficiently. To overcome this shortcoming, in this paper we propose a
framework, named ACoS, to adaptively tune the coordinate systems in EAs. In
ACoS, an Eigen coordinate system is established by making use of the cumulative
population distribution information, which can be obtained based on a
covariance matrix adaptation strategy and an additional archiving mechanism.
Since the population distribution information can reflect the features of the
function landscape to some extent, EAs in the Eigen coordinate system have the
capability to identify the modality of the function landscape. In addition, the
Eigen coordinate system is coupled with the original coordinate system, and
they are selected according to a probability vector. The probability vector
aims to determine the selection ratio of each coordinate system for each
individual, and is adaptively updated based on the collected information from
the offspring. ACoS has been applied to two of the most popular EA paradigms,
i.e., particle swarm optimization (PSO) and differential evolution (DE), for
solving the 30 test functions with 30 and 50 dimensions from the 2014 IEEE
Congress on Evolutionary Computation (CEC2014) benchmark set. The experimental
studies demonstrate its effectiveness.
effectiveness.
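A minimal sketch of the Eigen coordinate transformation described above, assuming a plain eigendecomposition of the population covariance (ACoS's archive, CMA-style update, and per-individual probability vector are omitted).

    import numpy as np

    def to_eigen(population):
        """Rotate individuals into the eigenbasis of the population covariance."""
        mean = population.mean(axis=0)
        C = np.cov(population, rowvar=False)
        _, B = np.linalg.eigh(C)              # columns of B are eigenvectors
        return (population - mean) @ B, B, mean

    rng = np.random.default_rng(2)
    pop = rng.normal(size=(40, 10))           # hypothetical 10-D population
    pop_eig, B, mean = to_eigen(pop)

    # An operator (e.g., DE mutation) acts in the Eigen system; offspring are
    # mapped back to the original coordinates with the inverse rotation:
    offspring = (pop_eig + 0.1 * rng.normal(size=pop_eig.shape)) @ B.T + mean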
Pietro Morerio, Jacopo Cavazza, Riccardo Volpi, Rene Vidal, Vittorio Murino
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Dropout is a very effective way of regularizing neural networks.
Stochastically “dropping out” units with a certain probability discourages
over-specific co-adaptations of feature detectors, preventing overfitting and
improving network generalization. In addition, Dropout can be interpreted as an
approximate model aggregation technique, where an exponential number of smaller
networks are averaged in order to get a more powerful ensemble. In this paper,
we show that using a fixed dropout probability during training is a suboptimal
choice. We thus propose a time scheduling for the probability of retaining
neurons in the network. This induces an adaptive regularization scheme that
smoothly increases the difficulty of the optimization problem. This idea of
“starting easy” and adaptively increasing the difficulty of the learning
problem has its roots in curriculum learning and allows one to train better
models. Indeed, we prove that our optimization strategy implements a very
general curriculum scheme, by gradually adding noise to both the input and
intermediate feature representations within the network architecture.
Experiments on seven image classification datasets and different network
architectures show that our method, named Curriculum Dropout, frequently yields
better generalization and, at worst, performs just as well as the standard
Dropout method.
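One plausible "starting easy" retain-probability schedule, sketched as an assumption rather than the paper's exact curve: training begins with no dropout and the retain probability decays smoothly toward its final value.

    import numpy as np

    def retain_probability(t, p_final=0.5, gamma=1e-3):
        """Starts at 1 (no dropout) and decays toward p_final over training."""
        return (1.0 - p_final) * np.exp(-gamma * t) + p_final

    for t in (0, 1000, 10_000):
        print(t, retain_probability(t))       # 1.0 -> ~0.68 -> ~0.5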
Nadav Cohen, Ronen Tamari, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Expressive efficiency is a concept that allows formally reasoning about the
representational capacity of deep network architectures. A network architecture
is expressively efficient with respect to an alternative architecture if the
latter must grow super-linearly in order to represent functions realized by the
former. A well-known example is the exponential expressive efficiency of depth,
namely, that in many cases shallow networks must grow exponentially large in
order to represent functions realized by deep networks. In this paper we study
the expressive efficiency brought forth by the architectural feature of
connectivity, motivated by the observation that nearly all state-of-the-art
networks these days employ elaborate connection schemes, running layers in
parallel while splitting and merging them in various ways. A formal treatment
of this question would shed light on the effectiveness of modern connectivity
schemes, and in addition, could provide new tools for network design. We focus
on dilated convolutional networks, a family of deep models gaining increased
attention, underlying state-of-the-art architectures like Google’s WaveNet and
ByteNet. By introducing and studying the concept of mixed tensor
decompositions, we prove that interconnecting dilated convolutional networks
can lead to expressive efficiency. In particular, we show that a single
connection between intermediate layers can already lead to an almost quadratic
gap, which in large-scale settings typically makes the difference between a
model that is practical and one that is not.
Peter Karkus, David Hsu, Wee Sun Lee
Comments: 9 pages, 5 figures, 1 table
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper introduces QMDP-net, a neural network architecture for planning
under partial observability. The QMDP-net combines the strengths of model-free
learning and model-based planning. It is a recurrent policy network, but it
represents a policy by connecting a model with a planning algorithm that solves
the model, thus embedding the solution structure of planning in the network
architecture. The QMDP-net is fully differentiable and allows end-to-end
training. We train a QMDP-net over a set of different environments so that it
can generalize over new ones. In preliminary experiments, QMDP-net showed
strong performance on several robotic tasks in simulation. Interestingly, it
also sometimes outperformed the QMDP algorithm, which generated the data for
learning, because of QMDP-net’s robustness resulting from end-to-end learning.
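For reference, a NumPy sketch of the classic QMDP approximation that the network embeds; the paper's differentiable network replaces these fixed tensors with learned modules.

    import numpy as np

    def qmdp_action(b, T, R, gamma=0.95, iters=50):
        """b: belief over states (S,); T: transitions (A, S, S); R: rewards (S, A)."""
        S, A = R.shape
        Q = np.zeros((S, A))
        for _ in range(iters):                        # value iteration on the MDP
            V = Q.max(axis=1)
            Q = R + gamma * np.einsum('ast,t->sa', T, V)
        return int(np.argmax(b @ Q))                  # Q(b,a) = sum_s b(s) Q(s,a)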
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Access to electronic health records (EHR) data has motivated computational
advances in medical research. However, various concerns, particularly over
privacy, can limit access to and collaborative use of EHR data. Sharing
synthetic EHR data could mitigate risk. In this paper, we propose a new
approach, medical Generative Adversarial Network (medGAN), to generate
realistic synthetic EHRs. Based on an input EHR dataset, medGAN can generate
high-dimensional discrete variables (e.g., binary and count features) via a
combination of an autoencoder and generative adversarial networks. We also
propose minibatch averaging to efficiently avoid mode collapse, and increase
the learning efficiency with batch normalization and shortcut connections. To
demonstrate feasibility, we show that medGAN generates synthetic EHR datasets
that achieve performance comparable to real data in many experiments, including
distribution statistics, predictive modeling tasks and medical expert review.
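A hedged sketch of the minibatch-averaging idea, read here as letting the discriminator see each sample together with the minibatch mean, so a collapsed generator producing all-similar fakes becomes easy to detect; the exact formulation in the paper may differ.

    import numpy as np

    def with_minibatch_average(batch):
        mean = batch.mean(axis=0, keepdims=True)                    # (1, D)
        return np.concatenate(
            [batch, np.repeat(mean, len(batch), axis=0)], axis=1)   # (N, 2D)

    fake = np.random.default_rng(3).random((8, 16))     # hypothetical batch
    print(with_minibatch_average(fake).shape)           # (8, 32)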
Ramin M. Hasani, Guodong Wang, Radu Grosu
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper presents an intelligent technique for health monitoring
and prognostics of common rotary machine components, particularly bearings.
During a run-to-failure experiment, rich unsupervised features from vibration
sensory data are extracted by a trained sparse auto-encoder. Then, the
correlation of the extracted attributes of the initial samples (presumably
healthy at the beginning of the test) with the succeeding samples is calculated
and passed through a moving-average filter. The normalized output is named the
auto-encoder correlation-based (AEC) rate, an informative attribute of the
system that depicts its health status and precisely identifies the degradation
starting point. We show that the AEC technique generalizes well across several
run-to-failure tests, collecting rich unsupervised features from the vibration
data fully autonomously. We demonstrate the superiority of AEC over many other
state-of-the-art approaches for the health monitoring and prognostics of
machine bearings.
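A minimal sketch of the AEC rate with assumed window sizes and a simple Pearson correlation (details the abstract leaves open); `features` would be the sparse auto-encoder codes of consecutive vibration samples.

    import numpy as np

    def aec_rate(features, n_healthy=10, window=5):
        """features: (T, D) auto-encoder codes for T consecutive samples."""
        baseline = features[:n_healthy].mean(axis=0)   # presumed-healthy profile
        corr = np.array([np.corrcoef(f, baseline)[0, 1] for f in features])
        smoothed = np.convolve(corr, np.ones(window) / window, mode='same')
        return smoothed / smoothed[:n_healthy].mean()  # normalize to early health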
Ramin M. Hasani, Victoria Beneder, Magdalena Fuchs, David Lung, Radu Grosu
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
We introduce SIM-CE, an advanced, user-friendly modeling and simulation
environment in Simulink for performing multi-scale behavioral analysis of the
nervous system of Caenorhabditis elegans (C. elegans). SIM-CE contains an
implementation of the mathematical models of C. elegans’s neurons and synapses,
in Simulink, which can be easily extended and particularized by the user. The
Simulink model is able to capture both complex dynamics of ion channels and
additional biophysical detail such as intracellular calcium concentration. We
demonstrate the performance of SIM-CE by carrying out neuronal, synaptic and
neural-circuit-level behavioral simulations. Such an environment enables the user
to capture unknown properties of the neural circuits, test hypotheses and
determine the origin of many behavioral plasticities exhibited by the worm.
Ramin M. Hasani, Magdalena Fuchs, Victoria Beneder, Radu Grosu
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE); Quantitative Methods (q-bio.QM)
Caenorhabditis elegans (C. elegans) exhibits remarkable behavioral
plasticity, including complex non-associative and associative learning.
Understanding the principles of such mechanisms can presumably provide
constructive inspiration for the design of efficient learning
algorithms. In the present study, we propose a novel approach to modeling
single neurons and synapses to study the mechanisms underlying learning in the
C. elegans nervous system. In this regard, we construct a precise mathematical
model of sensory neurons where we include multi-scale details from genes, ion
channels and ion pumps, together with a dynamic model of synapses comprised of
neurotransmitter and receptor kinetics. We recapitulate the mechanosensory
habituation mechanism, a non-associative learning process in which elements of
the neural network tune their parameters as a result of repeated input stimuli.
Accordingly, we quantitatively demonstrate the roots of such plasticity in the
neuronal and synaptic-level representations. Our findings can potentially give
rise to the development of new bio-inspired learning algorithms.
Mason McGill, Pietro Perona
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose and systematically evaluate three strategies for training
dynamically-routed artificial neural networks: graphs of learned
transformations through which different input signals may take different paths.
Though some approaches have advantages over others, the resulting networks are
often qualitatively similar. We find that, in dynamically-routed networks
trained to classify images, layers and branches become specialized to process
distinct categories of images. Additionally, given a fixed computational
budget, dynamically-routed networks tend to perform better than comparable
statically-routed networks.
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a conceptually simple, flexible, and general framework for object
instance segmentation. Our approach efficiently detects objects in an image
while simultaneously generating a high-quality segmentation mask for each
instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a
branch for predicting an object mask in parallel with the existing branch for
bounding box recognition. Mask R-CNN is simple to train and adds only a small
overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to
generalize to other tasks, e.g., allowing us to estimate human poses in the
same framework. We show top results in all three tracks of the COCO suite of
challenges, including instance segmentation, bounding-box object detection, and
person keypoint detection. Without tricks, Mask R-CNN outperforms all existing,
single-model entries on every task, including the COCO 2016 challenge winners.
We hope our simple and effective approach will serve as a solid baseline and
help ease future research in instance-level recognition. Code will be made
available.
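A hedged PyTorch sketch of the key architectural idea, a mask branch predicting per-class masks in parallel with the box-recognition branch on shared per-RoI features; channel counts and spatial sizes are illustrative, not the paper's exact head design.

    import torch
    import torch.nn as nn

    class TwoBranchHead(nn.Module):
        def __init__(self, in_ch=256, num_classes=80):
            super().__init__()
            self.box = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_ch * 7 * 7, 1024), nn.ReLU(),
                nn.Linear(1024, num_classes * 4))        # box deltas per class
            self.mask = nn.Sequential(
                nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
                nn.Conv2d(256, num_classes, 1))          # one mask per class

        def forward(self, roi_feats):                    # (N, 256, 7, 7)
            return self.box(roi_feats), self.mask(roi_feats)

    head = TwoBranchHead()
    boxes, masks = head(torch.zeros(2, 256, 7, 7))   # (2, 320), (2, 80, 14, 14)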
Xun Huang, Serge Belongie
Comments: An extended abstract of this paper was accepted to ICLR workshop. Code is available: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Gatys et al. recently introduced a neural algorithm that renders a content
image in the style of another image, achieving so-called style transfer.
However, their framework requires a slow iterative optimization process, which
limits its practical application. Fast approximations with feed-forward neural
networks have been proposed to speed up neural style transfer. Unfortunately,
the speed improvement comes at a cost: the network is usually tied to a fixed
set of styles and cannot adapt to arbitrary new styles. In this paper, we
present a simple yet effective approach that for the first time enables
arbitrary style transfer in real-time. At the heart of our method is a novel
adaptive instance normalization (AdaIN) layer that aligns the mean and variance
of the content features with those of the style features. Our method achieves
speed comparable to the fastest existing approach, without the restriction to a
pre-defined set of styles. In addition, our approach allows flexible user
controls such as content-style trade-off, style interpolation, color & spatial
controls, all using a single feed-forward neural network.
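The AdaIN layer is compact enough to state directly; a NumPy sketch of the mean/variance alignment described above:

    import numpy as np

    def adain(content, style, eps=1e-5):
        """content, style: (C, H, W) feature maps; eps added for stability."""
        c_mean = content.mean(axis=(1, 2), keepdims=True)
        c_std = content.std(axis=(1, 2), keepdims=True) + eps
        s_mean = style.mean(axis=(1, 2), keepdims=True)
        s_std = style.std(axis=(1, 2), keepdims=True) + eps
        return s_std * (content - c_mean) / c_std + s_mean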
Hossein Hosseini, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Deep Neural Networks (DNNs) have achieved remarkable performance on a variety
of pattern-recognition tasks, particularly visual classification problems,
where new algorithms are reported to achieve or even surpass human performance.
In this paper, we test the state-of-the-art DNNs with negative images and show
that the accuracy drops to the level of random classification. This leads us to
the conjecture that the DNNs, which are merely trained on raw data, do not
recognize the semantics of the objects, but rather memorize the inputs. We
suggest that negative images can be thought of as “semantic adversarial examples”,
which we define as transformed inputs that semantically represent the same
objects but that the model does not classify correctly.
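A negative image in this sense is simply the complement of the pixel intensities; for 8-bit images the transform is a single subtraction.

    import numpy as np

    def negative(image_uint8):
        return 255 - image_uint8

    img = np.random.default_rng(4).integers(0, 256, (32, 32, 3), dtype=np.uint8)
    neg = negative(img)      # same object semantics, inverted intensities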
Kaicheng Yu, Mathieu Salzmann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (CNNs) have been successfully applied to many
computer vision tasks, such as image classification. By performing linear
combinations and element-wise nonlinear operations, these networks can be
thought of as extracting solely first-order information from an input image. In
the past, however, second-order statistics computed from handcrafted features,
e.g., covariances, have proven highly effective in diverse recognition tasks.
In this paper, we introduce a novel class of CNNs that exploit second-order
statistics. To this end, we design a series of new layers that (i) extract a
covariance matrix from convolutional activations, (ii) compute a parametric,
second-order transformation of a matrix, and (iii) perform a parametric
vectorization of a matrix. These operations can be assembled to form a
Covariance Descriptor Unit (CDU), which replaces the fully-connected layers of
standard CNNs. Our experiments demonstrate the benefits of our new
architecture, which outperforms first-order CNNs while relying on up to 90%
fewer parameters.
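A NumPy sketch of step (i): treat each spatial location of a convolutional activation map as an observation and compute the channel covariance (steps (ii) and (iii) are learned layers and are omitted here).

    import numpy as np

    def activation_covariance(feats):
        """feats: (C, H, W) activations -> (C, C) covariance matrix."""
        C = feats.shape[0]
        X = feats.reshape(C, -1)                    # C channels x (H*W) samples
        X = X - X.mean(axis=1, keepdims=True)
        return X @ X.T / (X.shape[1] - 1)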
Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Translating information between text and image is a fundamental problem in
artificial intelligence that connects natural language processing and computer
vision. In the past few years, performance in image caption generation has seen
significant improvement through the adoption of recurrent neural networks
(RNN). Meanwhile, text-to-image generation has begun to produce plausible images
using datasets of specific categories like birds and flowers. We’ve even seen
image generation from multi-category datasets such as the Microsoft Common
Objects in Context (MSCOCO) through the use of generative adversarial networks
(GANs). Synthesizing objects with a complex shape, however, is still
challenging. For example, animals and humans have many degrees of freedom,
which means that they can take on many complex shapes. We propose a new
training method called Image-Text-Image (I2T2I) which integrates text-to-image
and image-to-text (image captioning) synthesis to improve the performance of
text-to-image synthesis. We demonstrate that I2T2I can generate better
multi-category images using MSCOCO than the state-of-the-art. We also
demonstrate that I2T2I can achieve transfer learning by using a pre-trained
image captioning module to generate human images on the MPII Human Pose
dataset.
Yuting Hu, Liang Zheng, Yi Yang, Yongfeng Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper contributes a new large-scale dataset for weakly supervised
cross-media retrieval, named Twitter100k. Current datasets, such as Wikipedia,
NUS Wide and Flickr30k, have two major limitations. First, these datasets are
lacking in content diversity, i.e., only some pre-defined classes are covered.
Second, texts in these datasets are written in well-organized language, leading
to inconsistency with realistic applications. To overcome these drawbacks, the
proposed Twitter100k dataset is characterized by two aspects: 1) it has 100,000
image-text pairs randomly crawled from Twitter and thus has no constraint in
the image categories; 2) text in Twitter100k is written in informal language by
the users.
Since strongly supervised methods leverage the class labels that may be
missing in practice, this paper focuses on weakly supervised learning for
cross-media retrieval, in which only text-image pairs are exploited during
training. We extensively benchmark the performance of four subspace learning
methods and three variants of the Correspondence AutoEncoder, along with
various text features on Wikipedia, Flickr30k and Twitter100k. Novel insights
are provided. As a minor contribution, inspired by the characteristic of
Twitter100k, we propose an OCR-based cross-media retrieval method. In
experiments, we show that the proposed OCR-based method improves the baseline
performance.
Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
We introduce the first goal-driven training for visual question answering and
dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
between two agents — Qbot and Abot — who communicate in natural language
dialog so that Qbot can select an unseen image from a lineup of images. We use
deep reinforcement learning (RL) to learn the policies of these agents
end-to-end — from pixels to multi-agent multi-round dialog to game reward.
We demonstrate two experimental results.
First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
results on a synthetic world, where the agents communicate in ungrounded
vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
that two bots invent their own communication protocol and start using certain
symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
we demonstrate the emergence of grounded language and communication among
‘visual’ dialog agents with no human supervision at all.
Second, we conduct large-scale real-image experiments on the VisDial dataset,
where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
ask questions that Abot is good at, ultimately resulting in more informative
dialog and a better team.
Ravi Kiran Sarvadevabhatla, Sudharshan Suresh, R. Venkatesh Babu
Comments: Accepted for publication in Transactions on Image Processing (this http URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The study of eye gaze fixations on photographic images is an active research
area. In contrast, the image subcategory of freehand sketches has not received
as much attention for such studies. In this paper, we analyze the results of a
free-viewing gaze fixation study conducted on 3904 freehand sketches
distributed across 160 object categories. Our analysis shows that fixation
sequences exhibit marked consistency within a sketch, across sketches of a
category and even across suitably grouped sets of categories. This multi-level
consistency is remarkable given the variability in depiction and extreme image
content sparsity that characterizes hand-drawn object sketches. In our paper,
we show that the multi-level consistency in the fixation data can be exploited
to (a) predict a test sketch’s category given only its fixation sequence and
(b) build a computational model which predicts part-labels underlying fixations
on objects. We hope that our findings motivate the community to deem
sketch-like representations worthy of gaze-based studies vis-a-vis photographic
images.
Yuanwei Wu, Yao Sui, Guanghui Wang
Comments: 8 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The paper focuses on the problem of vision-based obstacle detection and
tracking for unmanned aerial vehicle navigation. A real-time object
localization and tracking strategy from monocular image sequences is developed
by effectively integrating the object detection and tracking into a dynamic
Kalman model. At the detection stage, the object of interest is automatically
detected and localized from a saliency map computed via the image background
connectivity cue at each frame; at the tracking stage, a Kalman filter is
employed to provide a coarse prediction of the object state, which is further
refined via a local detector incorporating the saliency map and the temporal
information between two consecutive frames. Compared to existing methods, the
proposed approach does not require any manual initialization for tracking, runs
much faster than the state-of-the-art trackers of its kind, and achieves
competitive tracking performance on a large number of image sequences.
Extensive experiments demonstrate the effectiveness and superior performance of
the proposed approach.
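For illustration, a generic constant-velocity Kalman predict/update step of the kind used in the coarse-prediction stage; the state layout and noise levels are assumptions, not the paper's settings.

    import numpy as np

    F = np.array([[1., 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])          # state: [x, y, vx, vy]
    H = np.array([[1., 0, 0, 0],
                  [0, 1, 0, 0]])          # the detector observes position only
    Q, R = np.eye(4) * 1e-2, np.eye(2)

    def kalman_step(x, P, z):
        x, P = F @ x, F @ P @ F.T + Q                  # predict object state
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (z - H @ x)                        # refine with detection z
        P = (np.eye(4) - K @ H) @ P
        return x, P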
Baoguang Shi, Xiang Bai, Serge Belongie
Comments: Accepted by CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most state-of-the-art text detection methods are specific to horizontal text
in Latin scripts and are not fast enough for real-time applications. We
introduce Segment Linking (SegLink), an oriented text detection method. The
main idea is to decompose text into two locally detectable elements, namely
segments and links. A segment is an oriented bounding box that covers a part of
a word or text line; a link connects two adjacent segments, indicating that
they belong to the same word or line. Both elements are detected densely at
multiple scales by an end-to-end trained, fully-convolutional neural network.
Final detections are the combinations of segments that are connected by links.
Compared with previous methods, our method improves along the dimensions of
accuracy, speed and ease of training. It achieves an f-measure of 75.0% on the
standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the
previous best by a large margin. It runs at over 20 FPS on 512×512 input
images. In addition, our method is able to detect non-Latin text in long lines.
Jia-Hong Huang, Modar Alfadly, Bernard Ghanem
Comments: Submitted ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Given an image and a natural language question about it, our method outputs a
text-based answer to the question; this task is known as Visual Question
Answering (VQA). There are two main modules in our algorithm. The first module
takes the main (given) question as input and outputs a set of basic questions
derived from it. The second module takes the main question, the image and
these basic questions as input and outputs the text-based answer to the main
question. We formulate the basic question generation problem as a LASSO
optimization problem, and also propose a criterion for how to exploit these
basic questions to help answer the main question. Our method is evaluated on
the challenging VQA dataset and yields competitive performance compared to the
state-of-the-art.
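A hedged sketch of the LASSO formulation: represent the main question's embedding as a sparse combination of candidate basic-question embeddings, so the few nonzero weights select the basic questions. All names and dimensions here are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(5)
    basic_embeddings = rng.normal(size=(200, 64))     # 200 candidates, 64-D
    main_embedding = rng.normal(size=64)

    lasso = Lasso(alpha=0.05)
    lasso.fit(basic_embeddings.T, main_embedding)     # columns = candidates
    selected = np.nonzero(lasso.coef_)[0]             # indices of chosen questions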
Ronald Kemker, Christopher Kanan
Comments: 9 pages, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
A semantic segmentation algorithm must assign a label to every pixel in an
image. Recently, semantic segmentation of RGB imagery has advanced
significantly due to deep learning. Because creating datasets for semantic
segmentation is laborious, these datasets tend to be significantly smaller than
object recognition datasets. This makes it difficult to directly train a deep
neural network for semantic segmentation, because it will be prone to
overfitting. To cope with this, deep learning models typically use
convolutional neural networks pre-trained on large-scale image classification
datasets, which are then fine-tuned for semantic segmentation. For non-RGB
imagery, this is currently not possible because large-scale labeled non-RGB
datasets do not exist. In this paper, we developed two deep neural networks for
semantic segmentation of multispectral remote sensing imagery. Prior to
training on the target dataset, we initialize the networks with large amounts
of synthetic multispectral imagery. We show that this significantly improves
results on real-world remote sensing imagery, and we establish a new
state-of-the-art result on the challenging Hamlin Beach State Park Dataset.
Assaf Hoogi, John W. Lambert, Yefeng Zheng, Dorin Comaniciu, Daniel L. Rubin
Comments: Workshop on Machine Learning in Healthcare, Neural Information Processing Systems (NIPS). Barcelona, Spain, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a fully-automated method for accurate and robust detection and
segmentation of potentially cancerous lesions found in the liver and in lymph
nodes. The process is performed in three steps, including organ detection,
lesion detection and lesion segmentation. Our method applies machine learning
techniques such as marginal space learning and convolutional neural networks,
as well as active contour models. The method proves to be robust in its
handling of extremely high lesion diversity. We tested our method on volumetric
computed tomography (CT) images, including 42 volumes containing liver lesions
and 86 volumes containing 595 pathological lymph nodes. Preliminary results
under 10-fold cross-validation show that for both the liver lesions and the
lymph nodes, a total detection sensitivity of 0.53 and an average Dice score of
0.71 ± 0.15 for segmentation were obtained.
Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Muhammad Zeshan Afzal, Marcus Liwicki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we present the Text Conditioned Auxiliary Classifier Generative
Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network
(GAN) for synthesizing images from their text descriptions. Former approaches
have tried to condition the generative process on the textual data, but
combining this with the use of class information, known to diversify the
generated samples and improve their structural coherence, has not been
explored. We trained the
presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the
discriminability of the generated images with Inception-Score, as well as their
diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our
approach outperforms the state-of-the-art models, i.e., its inception score is
3.45, corresponding to a relative increase of 7.8% compared to the recently
introduced StackGAN. A comparison of the mean MS-SSIM scores of the training
and generated samples per class shows that our approach is able to generate
highly diverse images with an average MS-SSIM of 0.14 over all generated
classes.
Andreas Kölsch, Muhammad Zeshan Afzal, Marcus Liwicki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we propose the combined usage of low- and high-level blocks of
convolutional neural networks (CNNs) for improving object recognition. While
recent research focused on either propagating the context from all layers, e.g.
ResNet, (including the very low-level layers) or having multiple loss layers
(e.g. GoogLeNet), the importance of the features close to the higher layers is
ignored. This paper postulates that the use of context closer to the high-level
layers provides the scale and translation invariance and works better than
using the top layer only. In particular, we extend AlexNet and GoogLeNet by
additional connections in the top n layers. In order to demonstrate the
effectiveness of the proposed approach, we evaluated it on the standard
ImageNet task. The relative reduction of the classification error is around
1-2% without affecting the computational cost. Furthermore, we show that this
approach is orthogonal to typical test data augmentation techniques, as
recently introduced by Szegedy et al. (leading to a runtime reduction of 144
during test time).
Jiang Lu, Jin Li, Ziang Yan, Changshui Zhang
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Zero-shot learning (ZSL) is a challenging task aiming at recognizing novel
classes without any training instances. In this paper we present a simple but
high-performance ZSL approach by generating pseudo feature representations
(GPFR). Given the dataset of seen classes and side information of unseen
classes (e.g. attributes), we synthesize feature-level pseudo representations
for novel concepts, which gives us access to the formulation of an unseen-class
predictor. First, we design a Joint Attribute Feature Extractor (JAFE) to
acquire understandings about attributes, then construct a cognitive repository
of attributes filtered by confidence margins, and finally generate pseudo
feature representations using a probability based sampling strategy to
facilitate subsequent training process of class predictor. We demonstrate the
effectiveness in ZSL settings and the extensibility in supervised recognition
scenario of our method on a synthetic colored MNIST dataset (C-MNIST). For
several popular ZSL benchmark datasets, our approach also shows compelling
results on zero-shot recognition task, especially leading to tremendous
improvement to state-of-the-art mAP on zero-shot retrieval task.
Shichao Yang, Sebastian Scherer
Comments: ICRA 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Most visual odometry algorithms for a monocular camera focus on points,
either by feature matching or by direct alignment of pixel intensity, while
ignoring a common but important geometric entity: edges. In this paper, we
propose an odometry algorithm that combines points and edges to benefit from
the advantages of both direct and feature-based methods. It works better in
texture-less environments and is also more robust to lighting changes and fast
motion, owing to an increased convergence basin. We maintain a depth map for
the keyframe; in the tracking part, the camera pose is recovered by minimizing
both the photometric error and the geometric error to the matched edge in a
probabilistic framework. In the mapping part, edges are used to speed up
stereo matching and increase its accuracy. On various public datasets, our
algorithm achieves better or comparable performance than state-of-the-art
monocular odometry methods. In some challenging texture-less environments, it
reduces the state estimation error by over 50%.
Ehab Salahat, Murad Qasaimeh
Comments: Annual IEEE Industrial Electronics Society's 18th International Conf. on Industrial Technology (ICIT), 22-25 March, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Computer vision is one of the most active research fields in information
technology today. Giving machines and robots the ability to see and comprehend
the surrounding world at the speed of sight creates endless potential
applications and opportunities. Feature detection and description algorithms
can be indeed considered as the retina of the eyes of such machines and robots.
However, these algorithms are typically computationally intensive, which
prevents them from achieving real-time, speed-of-sight performance. In
addition, they differ in their capabilities and some may favor and work better
given a specific type of input compared to others. As such, it is essential to
compactly report their pros and cons as well as their performances and recent
advances. This paper provides a comprehensive overview of the
state-of-the-art and recent advances in feature detection and description
algorithms. Specifically, it starts by overviewing fundamental concepts. It
then compares, reports and discusses their performance and capabilities. The
Maximally Stable Extremal Regions algorithm and the Scale Invariant Feature
Transform algorithms, being two of the best of their type, are selected to
report their recent algorithmic derivatives.
Li Sun, Cheng Zhao, Rustam Stolkin
Comments: 8 pages, 5 figures, submitted to conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the problem of RGBD object recognition in real-world
applications, where large amounts of annotated training data are typically
unavailable. To overcome this problem, we propose a novel, weakly-supervised
learning architecture (DCNN-GPC) which combines parametric models (a pair of
Deep Convolutional Neural Networks (DCNN) for RGB and D modalities) with
non-parametric models (Gaussian Process Classification). Our system is
initially trained using a small amount of labeled data, and then automatically
propagates labels to large-scale unlabeled data. We first run 3D-based
objectness detection on RGBD videos to acquire many unlabeled object proposals,
and then employ DCNN-GPC to label them. As a result, our multi-modal DCNN can
be trained end-to-end using only a small amount of human annotation. Finally,
our 3D-based objectness detection and multi-modal DCNN are integrated into a
real-time detection and recognition pipeline. In our approach, bounding-box
annotations are not required and boundary-aware detection is achieved. We also
propose a novel way to pretrain a DCNN for the depth modality, by training on
virtual depth images projected from CAD models. We pretrain our multi-modal
DCNN on public 3D datasets, achieving performance comparable to
state-of-the-art methods on the Washington RGBD Dataset. We then finetune the
network by further training on a small amount of annotated data from our novel
dataset of industrial objects (nuclear waste simulants). Our weakly supervised
approach has been demonstrated to be highly effective in solving a novel RGBD
object recognition application that lacks human annotations.
Hongzhi Li, Joseph G. Ellis, Lei Zhang, Shih-Fu Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual patterns represent the discernible regularity in the visual world.
They capture the essential nature of visual objects or scenes. Understanding
and modeling visual patterns is a fundamental problem in visual recognition
that has wide ranging applications. In this paper, we study the problem of
visual pattern mining and propose a novel deep neural network architecture
called PatternNet for discovering these patterns that are both discriminative
and representative. The proposed PatternNet leverages the filters in the last
convolution layer of a convolutional neural network to find locally consistent
visual patches, and by combining these filters we can effectively discover
unique visual patterns. In addition, PatternNet can discover visual patterns
efficiently without performing expensive image patch sampling, and this
advantage provides an order of magnitude speedup compared to most other
approaches. We evaluate the proposed PatternNet subjectively by showing
randomly selected visual patterns which are discovered by our method and
quantitatively by performing image classification with the identified visual
patterns and comparing our performance with the current state-of-the-art. We
also directly evaluate the quality of the discovered visual patterns by
leveraging the identified patterns as proposed objects in an image and compare
with other relevant methods. Our proposed network and procedure, PatternNet, is
able to outperform competing methods for the tasks described.
Shiyu Huang, Deva Ramanan
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
As autonomous vehicles become an every-day reality, high-accuracy pedestrian
detection is of paramount practical importance. Pedestrian detection is a
highly researched topic with mature methods, but most datasets focus on common
scenes of people engaged in typical walking poses on sidewalks. But performance
is most crucial for dangerous scenarios, such as children playing in the street
or people using bicycles/skateboards in unexpected ways. Such “in-the-tail”
data is notoriously hard to observe, making both training and testing
difficult. To analyze this problem, we have collected a novel annotated dataset
of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
dedicated collection effort, it is relatively small by contemporary standards
(around 1000 images). To explore large-scale data-driven learning, we explore
the use of synthetic data generated by a game engine. A significant challenge
is selecting the right “priors” or parameters for synthesis: we would like
realistic data with realistic poses and object configurations. Inspired by
Generative Adversarial Networks, we generate a massive amount of synthetic data
and train a discriminative classifier to select a realistic subset, which we
deem Synthetic Imposters. We demonstrate that this pipeline allows one to
generate realistic training data by making use of rendering/animation engines.
Interestingly, we also demonstrate that such data can be used to rank
algorithms, suggesting that Synthetic Imposters can also be used for
“in-the-tail” validation at test-time, a notoriously difficult challenge for
real-world deployment.
Qi Yang, Yanzhu Zhang, Tiebiao Zhao, YangQuan Chen
Comments: 24 pages, 13 figures, it is to appear in ISA Transactions
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image super-resolution using self-optimizing mask via fractional-order
gradient interpolation and reconstruction aims to recover detailed information
from low-resolution images and reconstruct them into high-resolution images.
Due to the limited amount of data and information retrieved from low-resolution
images, it is difficult to restore clear, artifact-free images, while still
preserving enough structure of the image such as the texture. This paper
presents a new single image super-resolution method which is based on adaptive
fractional-order gradient interpolation and reconstruction. The interpolated
image gradient via optimal fractional-order gradient is first constructed
according to the image similarity and afterwards the minimum energy function is
employed to reconstruct the final high-resolution image. Fractional-order
gradient-based interpolation methods provide an additional degree of freedom,
which helps optimize the implementation quality thanks to the extra free
parameter, the fractional order α. The proposed method is able to
produce a rich texture detail while still being able to maintain structural
similarity even under large zoom conditions. Experimental results show that the
proposed method performs better than current single image super-resolution
techniques.
Chunde Huang, Jiaxiang Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The histogram of oriented gradients (HOG) is a widely used feature descriptor
in computer vision for the purpose of object detection. In this paper, a
modified HOG descriptor is described that uses a lookup table and the
integral-image method to speed up detection performance by a factor of 5-10. By
exploiting the special hardware features of a given platform (e.g., a digital
signal processor), further improvement can be made to the HOG descriptor in
order to have real-time object detection and tracking.
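The integral-image method behind this speedup reduces any rectangular block sum to four lookups after a single cumulative-sum pass; a minimal sketch:

    import numpy as np

    def integral_image(img):
        return img.cumsum(axis=0).cumsum(axis=1)

    def block_sum(ii, r0, c0, r1, c1):
        """Sum of img[r0:r1, c0:c1] from integral image ii (exclusive ends)."""
        total = ii[r1 - 1, c1 - 1]
        if r0 > 0: total -= ii[r0 - 1, c1 - 1]
        if c0 > 0: total -= ii[r1 - 1, c0 - 1]
        if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
        return total

    img = np.arange(16.0).reshape(4, 4)
    ii = integral_image(img)
    assert block_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()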
Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recognizing how objects interact with each other is a crucial task in visual
recognition. If we define the context of the interaction to be the objects
involved, then most current methods can be categorized as either: (i) training
a single classifier on the combination of the interaction and its context; or
(ii) aiming to recognize the interaction independently of its explicit context.
Both methods suffer limitations: the former scales poorly with the number of
combinations and fails to generalize to unseen combinations, while the latter
often leads to poor interaction recognition performance due to the difficulty
of designing a context-independent interaction classifier. To mitigate those
drawbacks, this paper proposes an alternative, context-aware interaction
recognition framework. The key to our method is to explicitly construct an
interaction classifier which combines the context and the interaction.
context is encoded via word2vec into a semantic space, and is used to derive a
classification result for the interaction.
The proposed method still builds one classifier for one interaction (as per
type (ii) above), but the classifier built is adaptive to context via weights
which are context dependent. The benefit of using the semantic space is that it
naturally leads to zero-shot generalizations in which semantically similar
contexts (subject-object pairs) can be recognized as suitable contexts for an
interaction, even if they were not observed in the training set.
Chen-Yu Lee, Vijay Badrinarayanan, Tomasz Malisiewicz, Andrew Rabinovich
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper focuses on the task of room layout estimation from a monocular RGB
image. Prior works break the problem into two sub-tasks: semantic segmentation
of floor, walls, ceiling to produce layout hypotheses, followed by an iterative
optimization step to rank these hypotheses. In contrast, we adopt a more direct
formulation of this problem as one of estimating an ordered set of room layout
keypoints. The room layout and the corresponding segmentation is completely
specified given the locations of these ordered keypoints. We predict the
locations of the room layout keypoints using RoomNet, an end-to-end trainable
encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN,
we achieve state-of-the-art performance along with 200x to 600x speedup
compared to the most recent work. Additionally, we present optional extensions
to the RoomNet architecture such as including recurrent computations and memory
units to refine the keypoint locations under the same parametric capacity.
Arun Mallya, Svetlana Lazebnik
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work proposes Recurrent Neural Network (RNN) models to predict
structured image situations – actions and noun entities fulfilling semantic
roles related to the action. We transform the task of noun entity prediction to
that of sequential prediction and use RNNs in contrast to prior work that uses
Conditional Random Fields (CRFs). By using an action prediction network and a
separate network with an RNN for noun prediction, we obtain state-of-the-art
accuracy on the challenging recent imSitu dataset, beating CRF-based models,
including ones trained with additional data. Further, we show that specialized
features learned from situation prediction can be transferred to the task of
image captioning to more accurately describe human-object interactions.
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional neural networks (CNNs) are inherently limited in modeling
geometric transformations due to the fixed geometric structures in their building
modules. In this work, we introduce two new modules to enhance the
transformation modeling capacity of CNNs, namely, deformable convolution and
deformable RoI pooling. Both are based on the idea of augmenting the spatial
sampling locations in the modules with additional offsets and learning the
offsets from target tasks, without additional supervision. The new modules can
readily replace their plain counterparts in existing CNNs and can be easily
trained end-to-end by standard back-propagation, giving rise to deformable
convolutional networks. Extensive experiments validate the effectiveness of our
approach on sophisticated vision tasks of object detection and semantic
segmentation. The code will be released.
Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia
Comments: 8 Pages , 7 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Temporal Action Proposal (TAP) generation is an important problem, as fast
and accurate extraction of semantically important (e.g. human actions) segments
from untrimmed videos is an important step for large-scale video analysis. We
propose a novel Temporal Unit Regression Network (TURN) model. There are two
salient aspects of TURN: (1) TURN jointly predicts action proposals and refines
the temporal boundaries by temporal coordinate regression; (2) Fast computation
is enabled by unit feature reuse: a long untrimmed video is decomposed into
video units, which are reused as basic building blocks of temporal proposals.
TURN outperforms the state-of-the-art methods under average recall (AR) by a
large margin on THUMOS-14 and ActivityNet datasets, and runs at over 880 frames
per second (FPS) on a TITAN X GPU. When we further apply TURN as a proposal
generation stage for existing temporal action localization pipelines, it
outperforms the state-of-the-art on THUMOS-14 and ActivityNet.
Sheng Zou, Hao Sun, Alina Zare
Subjects: Computer Vision and Pattern Recognition (cs.CV)
A semi-supervised Partial Membership Latent Dirichlet Allocation approach is
developed for hyperspectral unmixing and endmember estimation while accounting
for spectral variability and spatial information. Partial Membership Latent
Dirichlet Allocation is an effective approach for spectral unmixing while
representing spectral variability and leveraging spatial information. In this
work, we extend Partial Membership Latent Dirichlet Allocation to incorporate
any available (imprecise) label information to help guide unmixing.
Experimental results on two hyperspectral datasets show that the proposed
semi-supervised PM-LDA can yield improved hyperspectral unmixing and endmember
estimation results.
Mason McGill, Pietro Perona
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose and systematically evaluate three strategies for training
dynamically-routed artificial neural networks: graphs of learned
transformations through which different input signals may take different paths.
Though some approaches have advantages over others, the resulting networks are
often qualitatively similar. We find that, in dynamically-routed networks
trained to classify images, layers and branches become specialized to process
distinct categories of images. Additionally, given a fixed computational
budget, dynamically-routed networks tend to perform better than comparable
statically-routed networks.
Fabio Aurelio D'Asaro, Antonis Bikakis, Luke Dickens, Rob Miller
Comments: Technical report
Subjects: Artificial Intelligence (cs.AI)
We present PEC, an Event Calculus (EC) style action language for reasoning
about probabilistic causal and narrative information. It has an action language
style syntax similar to that of the EC variant Modular-E. Its semantics is
given in terms of possible worlds which constitute possible evolutions of the
domain, and builds on that of EFEC, an epistemic extension of EC. We also
describe an ASP implementation of PEC and show the sense in which this is sound
and complete.
Peter Karkus, David Hsu, Wee Sun Lee
Comments: 9 pages, 5 figures, 1 table
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper introduces QMDP-net, a neural network architecture for planning
under partial observability. The QMDP-net combines the strengths of model-free
learning and model-based planning. It is a recurrent policy network, but it
represents a policy by connecting a model with a planning algorithm that solves
the model, thus embedding the solution structure of planning in the network
architecture. The QMDP-net is fully differentiable and allows end-to-end
training. We train a QMDP-net over a set of different environments so that it
can generalize over new ones. In preliminary experiments, QMDP-net showed
strong performance on several robotic tasks in simulation. Interestingly, it
also sometimes outperformed the QMDP algorithm, which generated the data for
learning, because of QMDP-net’s robustness resulting from end-to-end learning.
Diederik Aerts, Jonito Aerts Arguelles, Lester Beltran, Lyneth Beltran, Isaac Distrito, Massimiliano Sassoli de Bianchi, Sandro Sozzo, Tomas Veloz
Comments: 20 pages, no figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Quantum Physics (quant-ph)
We elaborate a quantum model for corpora of written documents, like the pages
forming the World Wide Web. To that end, we are guided by how physicists
constructed quantum theory for microscopic entities, which unlike classical
objects cannot be fully represented in our spatial theater. We suggest that a
similar construction needs to be carried out by linguists and computational
scientists, to capture the full meaning content of collections of documental
entities. More precisely, we show how to associate a quantum-like ‘entity of
meaning’ to a ‘language entity formed by printed documents’, considering the
latter as the collection of traces that are left by the former, in specific
results of search actions that we describe as measurements. In other words, we
offer a perspective where a collection of documents, like the Web, is described
as the space of manifestation of a more complex entity – the QWeb – which is
the object of our modeling, drawing its inspiration from previous studies on
operational-realistic approaches to quantum physics and quantum modeling of
human cognition and decision-making. We emphasize that a consistent QWeb model
needs to account for the observed correlations between words appearing in
printed documents, e.g., co-occurrences, as the latter would depend on the
‘meaning connections’ existing between the concepts that are associated with
these words. In that respect, we show that both ‘context and interference
(quantum) effects’ are required to explain the probabilities calculated by
counting the relative number of documents containing certain words and
co-occurrences of words.
Tshilidzi Marwala, Evan Hurwitz
Comments: Marwala, T. and Hurwitz, E. (2017) Artificial Intelligence and Economic Theory: Skynet in the Market. Springer. (Accepted)
Subjects: Artificial Intelligence (cs.AI)
The advent of artificial intelligence has changed many disciplines such as
engineering, social science and economics. Artificial intelligence is a
computational technique which is inspired by natural intelligence such as the
swarming of birds, the working of the brain and the pathfinding of the ants.
These techniques have an impact on economic theories. This book studies the impact
of artificial intelligence on economic theories, a subject that has not been
extensively studied. The theories that are considered are: demand and supply,
asymmetrical information, pricing, rational choice, rational expectation, game
theory, efficient market hypotheses, mechanism design, prospect, bounded
rationality, portfolio theory, rational counterfactual and causality. The
benefit of this book is that it evaluates existing economic theories and
updates them based on developments in the field of artificial intelligence.
Thanuka Wickramarathne
Subjects: Artificial Intelligence (cs.AI)
Conditioning is the primary method for belief revision in data fusion systems
employing probabilistic inferencing. However, big-data environments, where soft
(i.e., human or human-based) sources are commonly utilized in addition to hard
(i.e., physics-based) sensors, pose several challenges to traditional
conditioning tasks primarily due to the numerous data/source imperfections that
are characteristic of such data. The objective of this paper is to investigate
the most natural extension of Bayes conditioning based evidence updates in the
presence of such large-scale data uncertainties and source/sensor
imperfections. By viewing the evidence updating process as a thought
experiment, we devise an elegant strategy for robust evidence updating in the
presence of extreme uncertainties characteristic of big-data environments. In
particular, we look at the formulation of a belief theoretic evidence updating
mechanism that is derived as a natural extension of Bayes conditional approach
when the incoming evidence takes the form of a general belief function.
The proposed method generalizes the belief theoretic Fagin-Halpern conditional
notion, and provides a novel evidence updating strategy that is derived as a
natural extension of Bayes conditional applied in a highly uncertain and
complex fusion scenario that is characteristic of big-data environments. The
presented extension differs fundamentally from the previously published work on
Conditional Update Equation (CUE) as well as the authors' own work. An overview of
this development is provided via illustrative examples. Furthermore, insights
into parameter selection under various fusion contexts are also provided.
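For concreteness, the Fagin-Halpern conditional that this work generalizes has
a compact closed form, Bel(A|B) = Bel(A∩B) / (Bel(A∩B) + Pl(B∩A^c)). A small
sketch over a toy mass function (numbers invented; the paper's generalized
update itself is not reproduced here):

```python
def bel(m, A):
    # Belief: mass fully committed to subsets of A.
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    # Plausibility: mass consistent with A.
    return sum(v for B, v in m.items() if B & A)

def fh_bel(m, A, B):
    # Fagin-Halpern conditional belief Bel(A|B); note B - A = B ∩ A^c.
    num = bel(m, A & B)
    return num / (num + pl(m, B - A))

# Toy mass function on the frame {a, b, c} (invented numbers).
m = {frozenset('a'): 0.5, frozenset('bc'): 0.3, frozenset('abc'): 0.2}
print(fh_bel(m, frozenset('a'), frozenset('ab')))   # -> 0.5
```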
Peeyush Kumar, Doina Precup
Subjects: Artificial Intelligence (cs.AI)
Deliberating over large or continuous state spaces has been a long-standing
challenge in reinforcement learning. Temporal abstraction has made this
somewhat possible, but planning efficiently with temporally abstract actions
still remains an issue. Moreover, using spatial abstractions to learn policies
for various situations at once while using temporal abstraction models is an
open problem. We propose an efficient algorithm which is convergent under
linear function approximation while planning using temporally abstract
actions. We show how this algorithm can be used along with randomly generated
option models over multiple time scales to plan for agents which need to act
in real time. Using these randomly generated option models over multiple time
scales is shown to reduce the number of decision epochs required to solve the
given task, hence effectively reducing the time needed for deliberation.
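Planning with temporally abstract actions follows the familiar value-iteration
pattern, with multi-step option models replacing one-step dynamics. A minimal
numpy sketch under assumed option models (the F and R below are invented; the
paper's contribution is the convergent function-approximation version of this
loop, not this tabular form):

```python
import numpy as np

def smdp_value_iteration(F, R, iters=200):
    # F: (O, S, S) discounted multi-step option transition models
    #    (discounting is folded into F, so rows sum to less than 1).
    # R: (S, O) expected cumulative discounted option rewards.
    O, S, _ = F.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + np.einsum('ost,t->so', F, V)
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)        # values and greedy option per state

# Two invented options over three states.
F = np.array([[[0.0, 0.9, 0.0], [0.0, 0.0, 0.9], [0.0, 0.0, 0.9]],
              [[0.9, 0.0, 0.0], [0.9, 0.0, 0.0], [0.0, 0.9, 0.0]]])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
print(smdp_value_iteration(F, R)[1])
```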
Mark Muraven
Subjects: Artificial Intelligence (cs.AI)
Research on human self-regulation has shown that people hold many goals
simultaneously and have complex self-regulation mechanisms to deal with this
goal conflict. Artificial autonomous systems may also need to find ways to cope
with conflicting goals. Indeed, the intricate interplay among different goals
may be critical to the design as well as long-term safety and stability of
artificial autonomous systems. I discuss some of the critical features of the
human self-regulation system and how it might be applied to an artificial
system. Furthermore, the implications of goal conflict for the reliability and
stability of artificial autonomous systems and ensuring their alignment with
human goals and ethics are examined.
Jiří Vomlel, Václav Kratochvíl
Comments: 10 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)
Influence diagrams are a decision-theoretic extension of probabilistic
graphical models. In this paper we show how they can be used to solve the
Goddard problem. We present results of numerical experiments with this problem
and compare the solutions provided by influence diagrams with the optimal
solution.
Jialin Liu, Julian Togelius, Diego Perez-Liebana, Simon M. Lucas
Comments: 9 pages, 17 figures, CEC2017
Subjects: Artificial Intelligence (cs.AI)
Most games have, or can be generalised to have, a number of parameters that
may be varied in order to provide instances of games that lead to very
different player experiences. The space of possible parameter settings can be
seen as a search space, and we can therefore use a Random Mutation Hill
Climbing algorithm or other search methods to find the parameter settings that
induce the best games. One of the hardest parts of this approach is defining a
suitable fitness function. In this paper we explore the possibility of using
one of a growing set of General Video Game AI agents to perform automatic
play-testing. This enables a very general approach to game evaluation based on
estimating the skill-depth of a game. Agent-based play-testing is
computationally expensive, so we compare two simple but efficient optimisation
algorithms: the Random Mutation Hill-Climber and the Multi-Armed Bandit Random
Mutation Hill-Climber. For the test game we use a space-battle game in order to
provide a suitable balance between simulation speed and potential skill-depth.
Results show that both algorithms are able to rapidly evolve game versions with
significant skill-depth, but that choosing a suitable resampling number is
essential in order to combat the effects of noise.
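The optimisation loop itself is tiny; the subtlety the abstract points to is
the resampling number. A sketch of a Random Mutation Hill-Climber that
averages each candidate's fitness over several noisy evaluations (the fitness
function below is an invented stand-in for agent-based play-testing):

```python
import random

def rmhc(x0, mutate, noisy_fitness, iters=1000, resamples=5):
    # Average over 'resamples' evaluations to combat fitness noise.
    avg = lambda x: sum(noisy_fitness(x) for _ in range(resamples)) / resamples
    best, best_f = x0, avg(x0)
    for _ in range(iters):
        cand = mutate(best)
        cand_f = avg(cand)
        if cand_f >= best_f:            # accept ties to keep exploring
            best, best_f = cand, cand_f
    return best

# Invented stand-in: maximise a noisy quadratic over game parameters.
noisy_fitness = lambda x: -sum(v * v for v in x) + random.gauss(0, 0.5)

def mutate(x):
    y = list(x)
    y[random.randrange(len(y))] += random.gauss(0, 0.1)
    return y

print(rmhc([1.0, -2.0, 0.5], mutate, noisy_fitness))
```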
Jacob W. Crandall, Mayada Oudah, Tennom, Fatimah Ishowo-Oloko, Sherief Abdallah, Jean-François Bonnefon, Manuel Cebrian, Azim Shariff, Michael A. Goodrich, Iyad Rahwan
Subjects: Artificial Intelligence (cs.AI)
Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major
driving force behind technical progress has been competition with human
cognition. Historical milestones have been frequently associated with computers
matching or outperforming humans in difficult cognitive tasks (e.g. face
recognition [2], personality classification [3], driving cars [4], or playing
video games [5]), or defeating humans in strategic zero-sum encounters (e.g.
Chess [6], Checkers [7], Jeopardy! [8], Poker [9], or Go [10]). In contrast,
less attention has been given to developing autonomous machines that establish
mutually cooperative relationships with people who may not share the machine’s
preferences. A main challenge has been that human cooperation does not require
sheer computational power, but rather relies on intuition [11], cultural norms
[12], emotions and signals [13, 14, 15, 16], and pre-evolved dispositions
toward cooperation [17], common-sense mechanisms that are difficult to encode
in machines for arbitrary contexts. Here, we combine a state-of-the-art
machine-learning algorithm with novel mechanisms for generating and acting on
signals to produce a new learning algorithm that cooperates with people and
other machines at levels that rival human cooperation in a variety of
two-player repeated stochastic games. This is the first general-purpose
algorithm that is capable, given a description of a previously unseen game
environment, of learning to cooperate with people within short timescales in
scenarios previously unanticipated by algorithm designers. This is achieved
without complex opponent modeling or higher-order theories of mind, thus
showing that flexible, fast, and general human-machine cooperation is
computationally achievable using a non-trivial, but ultimately simple, set of
algorithmic mechanisms.
Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
We introduce the first goal-driven training for visual question answering and
dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
between two agents — Qbot and Abot — who communicate in natural language
dialog so that Qbot can select an unseen image from a lineup of images. We use
deep reinforcement learning (RL) to learn the policies of these agents
end-to-end — from pixels to multi-agent multi-round dialog to game reward.
We demonstrate two experimental results.
First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
results on a synthetic world, where the agents communicate in ungrounded
vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
that two bots invent their own communication protocol and start using certain
symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
we demonstrate the emergence of grounded language and communication among
‘visual’ dialog agents with no human supervision at all.
Second, we conduct large-scale real-image experiments on the VisDial dataset,
where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
ask questions that Abot is good at, ultimately resulting in more informative
dialog and a better team.
Ronald Kemker, Christopher Kanan
Comments: 9 pages, submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
A semantic segmentation algorithm must assign a label to every pixel in an
image. Recently, semantic segmentation of RGB imagery has advanced
significantly due to deep learning. Because creating datasets for semantic
segmentation is laborious, these datasets tend to be significantly smaller than
object recognition datasets. This makes it difficult to directly train a deep
neural network for semantic segmentation, because it will be prone to
overfitting. To cope with this, deep learning models typically use
convolutional neural networks pre-trained on large-scale image classification
datasets, which are then fine-tuned for semantic segmentation. For non-RGB
imagery, this is currently not possible because large-scale labeled non-RGB
datasets do not exist. In this paper, we developed two deep neural networks for
semantic segmentation of multispectral remote sensing imagery. Prior to
training on the target dataset, we initialize the networks with large amounts
of synthetic multispectral imagery. We show that this significantly improves
results on real-world remote sensing imagery, and we establish a new
state-of-the-art result on the challenging Hamlin Beach State Park Dataset.
Shiyu Huang, Deva Ramanan
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
As autonomous vehicles become an every-day reality, high-accuracy pedestrian
detection is of paramount practical importance. Pedestrian detection is a
highly researched topic with mature methods, but most datasets focus on common
scenes of people engaged in typical walking poses on sidewalks. But performance
is most crucial for dangerous scenarios, such as children playing in the street
or people using bicycles/skateboards in unexpected ways. Such “in-the-tail”
data is notoriously hard to observe, making both training and testing
difficult. To analyze this problem, we have collected a novel annotated dataset
of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
dedicated collection effort, it is relatively small by contemporary standards
(around 1000 images). To enable large-scale data-driven learning, we explore
the use of synthetic data generated by a game engine. A significant challenge
is selecting the right “priors” or parameters for synthesis: we would like
realistic data with realistic poses and object configurations. Inspired by
Generative Adversarial Networks, we generate a massive amount of synthetic data
and train a discriminative classifier to select a realistic subset, which we
deem Synthetic Imposters. We demonstrate that this pipeline allows one to
generate realistic training data by making use of rendering/animation engines.
Interestingly, we also demonstrate that such data can be used to rank
algorithms, suggesting that Synthetic Imposters can also be used for
“in-the-tail” validation at test-time, a notoriously difficult challenge for
real-world deployment.
Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many real-world tasks involve multiple agents with partial observability and
limited communication. Learning is challenging in these settings due to local
viewpoints of agents, which perceive the world as non-stationary due to
concurrently-exploring teammates. Approaches that learn specialized policies
for individual tasks face major problems when applied to the real world: not
only do agents have to learn and store a distinct policy for each task, but in
practice the identity of the task is often non-observable, making these
algorithms inapplicable. This paper formalizes and addresses the problem of
multi-task multi-agent reinforcement learning under partial observability. We
introduce a decentralized single-task learning approach that is robust to
concurrent interactions of teammates, and present an approach for distilling
single-task policies into a unified policy that performs well across multiple
related tasks, without explicit provision of task identity.
Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ramírez-Rodríguez, Georges Linarès
Comments: 16 pages, 4 tables, 8 figures
Journal-ref: International Journal of Computational Linguistics and
Applications, 7(2):87-109, 2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
The number of documents available on the Internet grows every day. For this
reason, processing this amount of information effectively and efficiently
becomes a major concern for companies and scientists. Methods that represent a
textual document by a topic representation are widely used in Information
Retrieval (IR) to process big data such as Wikipedia articles. One of the main
difficulties in using topic models on huge data collections is the material
resources (CPU time and memory) required for model estimation. To deal with
this issue, we propose to build topic spaces from summarized documents. In
this paper, we present a study of topic space representation in the context of
big data. The topic space representation behavior is analyzed on different
languages. Experiments show that topic spaces estimated from text summaries
are as relevant as those estimated from the complete documents. The real
advantage of such an approach is the gain in processing time: we show that
processing time can be drastically reduced using summarized documents (by more
than 60% in general). This study finally points out the differences between
thematic representations of documents depending on the targeted languages,
such as English or the Latin languages.
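The experimental pipeline is straightforward to reproduce in outline: estimate
a topic model on summaries instead of full documents and compare the resulting
document-topic vectors. A minimal sklearn sketch (corpus variables are
placeholders; the paper does not prescribe this particular implementation):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def topic_space(docs, n_topics=50):
    # Fit LDA and return document-topic vectors.
    X = CountVectorizer(max_df=0.95, min_df=2).fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(X)

# theta_full = topic_space(full_documents)   # slow on big collections
# theta_summ = topic_space(summaries)        # much faster, similar relevance
```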
Han Tian, Hankz Hankui Zhuo
Subjects: Information Retrieval (cs.IR)
Due to the availability of references of research papers and the rich
information contained in papers, various citation analysis approaches have
been proposed to identify similar documents for scholar recommendation.
Despite the success of previous approaches, they are, however, based on
co-occurrence of items. Once there are no co-occurring items available in
documents, they do not work well. Inspired by distributed representations of
words in the natural language processing literature, we propose a novel
approach to measuring the similarity of papers based on distributed
representations learned from the citation context of papers. We view the set
of papers as the vocabulary, define the weighted citation context of papers,
and convert it to a weight matrix similar to the word-word co-occurrence
matrix in natural language processing. After that, we explore a variant of the
matrix factorization approach to train distributed representations of papers
on the matrix, and leverage the distributed representations to measure
similarities between papers. In our experiments, we show that our approach
outperforms state-of-the-art citation-based approaches by 25%, and performs
better than other distributed-representation-based methods.
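In outline, the approach amounts to factorizing a weighted paper-paper matrix
and comparing the resulting vectors. A sketch using truncated SVD as the
factorization (the paper uses its own matrix-factorization variant, and the
matrix construction from citation contexts is assumed done elsewhere):

```python
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def paper_embeddings(rows, cols, weights, n_papers, dim=100):
    # Weighted citation-context matrix -> low-dimensional paper vectors.
    M = csr_matrix((weights, (rows, cols)), shape=(n_papers, n_papers))
    return TruncatedSVD(n_components=dim, random_state=0).fit_transform(M)

# E = paper_embeddings(rows, cols, weights, n_papers)
# cosine_similarity(E[i:i+1], E)   # similarity of paper i to all papers
```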
B Sengupta, E Vasquez, Y Qian
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Learning an encoding of feature vectors in terms of an over-complete
dictionary or a probabilistic information-geometric construct (Fisher vectors)
is widespread in statistical signal processing and computer vision. In
content-based information retrieval using deep-learning classifiers, such
encodings are learnt on the flattened last layer, without adherence to the
multi-linear structure of the underlying feature tensor. We illustrate a
variety of feature encodings, including sparse dictionary coding and Fisher
vectors, and propose that a structured tensor factorization scheme enables
retrieval that is on par, in terms of average precision, with Fisher-vector
encoded image signatures. In short, we illustrate how structural constraints
increase retrieval fidelity.
Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
Accurately evaluating new policies (e.g. ad-placement models, ranking
functions, recommendation functions) is one of the key prerequisites for
improving interactive systems. While the conventional approach to evaluation
relies on online A/B tests, recent work has shown that counterfactual
estimators can provide an inexpensive and fast alternative, since they can be
applied offline using log data that was collected from a different policy
fielded in the past. In this paper, we address the question of how to estimate
the performance of a new policy when we have log data from multiple historic
policies. This question is of great relevance in practice, since policies get
updated frequently in most online systems. We show that naively combining data
from multiple logging policies can be highly suboptimal. In particular, we find
that the standard Inverse Propensity Score (IPS) estimator suffers especially
when logging and evaluation policies diverge — to a point where throwing away
data improves the variance of the estimator. We therefore propose two
alternative estimators which we characterize theoretically and compare
experimentally. We find that the new estimators can provide substantially
improved estimation accuracy.
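The failure mode is easy to see in code. Below is a sketch contrasting naive
pooled IPS with a mixture-propensity variant that divides by the
data-share-weighted average of the loggers' propensities (one natural
alternative; the paper's proposed estimators and analysis may differ in
detail, and the toy data is synthetic):

```python
import numpy as np

def ips_naive(r, p_eval, p_own_logger):
    # Pool all logs; weight each sample by its own logger's propensity.
    return np.mean(r * p_eval / p_own_logger)

def ips_mixture(r, p_eval, p_all, shares):
    # Weight by the propensity under the *mixture* of logging policies:
    # p_mix(a|x) = sum_j shares[j] * p_j(a|x).
    return np.mean(r * p_eval / (p_all @ shares))

# Synthetic check: two loggers with equal data shares.
rng = np.random.default_rng(0)
n = 10000
p_eval = rng.uniform(0.05, 1.0, n)
p_all = rng.uniform(0.05, 1.0, (n, 2))     # propensities under each logger
owner = rng.integers(0, 2, n)              # which logger produced the sample
r = rng.binomial(1, 0.5, n).astype(float)
print(ips_naive(r, p_eval, p_all[np.arange(n), owner]),
      ips_mixture(r, p_eval, p_all, np.array([0.5, 0.5])))
```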
Shervin Malmasi, Mark Dras
Subjects: Computation and Language (cs.CL)
Ensemble methods using multiple classifiers have proven to be the most
successful approach for the task of Native Language Identification (NLI),
achieving the current state of the art. However, a systematic examination of
ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble
architectures such as classifier stacking have not been closely evaluated. We
present a set of experiments using three ensemble-based models, testing each
with multiple configurations and algorithms. This includes a rigorous
application of meta-classification models for NLI, achieving state-of-the-art
results on three datasets from different languages. We also present the first
use of statistical significance testing for comparing NLI systems, showing that
our results are significantly better than the previous state of the art. We
make available a collection of test set predictions to facilitate future
statistical tests.
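As a reference point for what classifier stacking means here: base classifiers
over different feature views feed a meta-classifier trained on their outputs.
A compact sklearn sketch (the feature views and models are illustrative, not
the paper's exact configuration):

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

base = [
    ('char', make_pipeline(TfidfVectorizer(analyzer='char',
                                           ngram_range=(2, 4)), LinearSVC())),
    ('word', make_pipeline(TfidfVectorizer(analyzer='word',
                                           ngram_range=(1, 2)), LinearSVC())),
]
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))
# stack.fit(train_texts, train_labels); stack.score(test_texts, test_labels)
```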
Elvys Linhares Pontes, Thiago Gouveia da Silva, Andréa Carneiro Linhares, Juan-Manuel Torres-Moreno, Stéphane Huet
Comments: 12 pages, 1 figure, 3 tables (paper in Portuguese), Preprint of XLVIII Simp’osio Brasileiro de Pesquisa Operacional, 2016, Vit’oria, ES, (Brazil)
Subjects: Computation and Language (cs.CL)
The Internet has led to a dramatic increase in the amount of available
information. In this context, reading and understanding this flow of
information have become costly tasks. In recent years, to help people
understand textual data, various Natural Language Processing (NLP)
applications based on Combinatorial Optimization have been devised. However,
for Multi-Sentence Compression (MSC), a method which reduces sentence length
without removing core information, the use of optimization methods requires
further study to improve MSC performance. This article describes a method for
MSC that uses Combinatorial Optimization and Graph Theory to generate more
informative sentences while maintaining their grammaticality. An experiment
conducted on a corpus of 40 clusters of sentences shows that our system
achieves very good quality and outperforms the state-of-the-art.
Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
Comments: Accepted as a conference paper at ICLR 2017. This is an extended version of the original paper (this https URL). The original paper proposes a new architecture, while this version focuses on transfer learning for a general model class
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recent papers have shown that neural networks obtain state-of-the-art
performance on several different sequence tagging tasks. One appealing property
of such systems is their generality, as excellent performance can be achieved
with a unified architecture and without task-specific feature engineering.
However, it is unclear if such systems can be used for tasks without large
amounts of training data. In this paper we explore the problem of transfer
learning for neural sequence taggers, where a source task with plentiful
annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
on a target task with fewer available annotations (e.g., POS tagging for
microblogs). We examine the effects of transfer learning for deep hierarchical
recurrent networks across domains, applications, and languages, and show that
significant improvement can often be obtained. These gains yield improvements
over the current state-of-the-art on several well-studied tasks.
Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Translating information between text and image is a fundamental problem in
artificial intelligence that connects natural language processing and computer
vision. In the past few years, performance in image caption generation has seen
significant improvement through the adoption of recurrent neural networks
(RNN). Meanwhile, text-to-image generation has begun to produce plausible images
using datasets of specific categories like birds and flowers. We’ve even seen
image generation from multi-category datasets such as the Microsoft Common
Objects in Context (MSCOCO) through the use of generative adversarial networks
(GANs). Synthesizing objects with a complex shape, however, is still
challenging. For example, animals and humans have many degrees of freedom,
which means that they can take on many complex shapes. We propose a new
training method called Image-Text-Image (I2T2I) which integrates text-to-image
and image-to-text (image captioning) synthesis to improve the performance of
text-to-image synthesis. We demonstrate that I2T2I can generate better
multi-category images on MSCOCO than the state-of-the-art. We also
demonstrate that I2T2I can achieve transfer learning by using a pre-trained
image captioning module to generate human images on the MPII Human Pose
dataset.
Jia-Hong Huang, Modar Alfadly, Bernard Ghanem
Comments: Submitted ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Given an image and a natural language question about that image as input, our
method outputs a text-based answer to the question; this task is known as
Visual Question Answering (VQA). There are two main modules in our algorithm.
The first module takes the main (given) question as input and outputs a set of
basic questions related to it. The second module takes the main question, the
image, and these basic questions as input and outputs the text-based answer to
the main question. We formulate the basic question generation problem as a
LASSO optimization problem, and also propose a criterion for how to exploit
these basic questions to help answer the main question. Our method is
evaluated on the challenging VQA dataset and yields competitive performance
compared to the state-of-the-art.
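The LASSO formulation can be illustrated compactly: encode the main question
and the candidate basic questions as vectors, then solve a sparse regression
whose nonzero coefficients select the helper questions. A toy sketch
(synthetic encodings; the paper's actual features and criterion are not
reproduced):

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_basic_questions(A, b, lam=0.05, top_k=3):
    # Columns of A: candidate basic-question encodings; b: main question.
    model = Lasso(alpha=lam, positive=True, fit_intercept=False)
    model.fit(A, b)
    return np.argsort(-model.coef_)[:top_k]

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 20))               # 20 synthetic candidates
b = 0.6 * A[:, 3] + 0.4 * A[:, 7]           # main question resembles #3, #7
print(select_basic_questions(A, b))         # should rank 3 and 7 first
```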
Moreno Marzolla, Gabriele D'Angelo
Comments: Submitted to IEEE/ACM DS-RT 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA)
In this paper we consider the problem of identifying intersections between
two sets of d-dimensional axis-parallel rectangles. This is a common problem
that arises in many agent-based simulation studies, and is of central
importance in the context of High Level Architecture (HLA), where it is at the
core of the Data Distribution Management (DDM) service. Several realizations of
the DDM service have been proposed; however, many of them are either
inefficient or inherently sequential. These are serious limitations since
multicore processors are now ubiquitous, and DDM algorithms — being
CPU-intensive — could benefit from additional computing power. We propose a
parallel version of the Sort-Based Matching algorithm for shared-memory
multiprocessors. Sort-Based Matching is one of the most efficient serial
algorithms for the DDM problem, but is quite difficult to parallelize due to
data dependencies. We describe the algorithm and compute its asymptotic running
time; we complete the analysis by assessing its performance and scalability
through extensive experiments on two commodity multicore systems based on a
dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.
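For intuition, here is a serial sketch of Sort-Based Matching in one
dimension: sort all interval endpoints, then sweep while maintaining the sets
of currently open subscription and update intervals (the d-dimensional and
parallel versions studied in the paper build on this pattern):

```python
def sort_based_matching(subs, upds):
    # Report all overlapping pairs between subscription and update
    # intervals (a, b) by sweeping the sorted endpoints.
    events = []
    for i, (a, b) in enumerate(subs):
        events += [(a, 0, 'S', i), (b, 1, 'S', i)]
    for j, (a, b) in enumerate(upds):
        events += [(a, 0, 'U', j), (b, 1, 'U', j)]
    events.sort()
    open_s, open_u, matches = set(), set(), []
    for _, closing, side, idx in events:
        if not closing:                     # interval opens
            if side == 'S':
                matches += [(idx, j) for j in open_u]
                open_s.add(idx)
            else:
                matches += [(i, idx) for i in open_s]
                open_u.add(idx)
        else:                               # interval closes
            (open_s if side == 'S' else open_u).discard(idx)
    return matches

print(sort_based_matching([(0, 5), (4, 9)], [(3, 6), (8, 12)]))
# -> [(0, 0), (1, 0), (1, 1)]
```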
Shuo Chen, Rongxing Lu, Jie Zhang
Comments: 23 pages, 4 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
The singular value decomposition (SVD) is a widely used matrix factorization
tool which underlies plenty of useful applications, e.g. recommendation system,
abnormal detection and data compression. Under the environment of emerging
Internet of Things (IoT), there is an increasing demand for data analysis
to improve people’s lives and create new points of economic growth. Moreover,
due to the large scope of IoT, most of the data analysis work should be done
at the network edge, i.e. handled by fog computing. However, the devices which
provide fog computing may not be trustworthy, while data privacy is often a
significant concern for IoT application users. Thus, when performing SVD for
data analysis purpose, the privacy of user data should be preserved. Based on
the above reasons, in this paper, we propose a privacy-preserving fog computing
framework for SVD computation. The security and performance analysis shows the
practicability of the proposed framework. Furthermore, since the SVD
computation is the tool for data analysis rather than the ultimate goal, three
different applications are introduced to show how the framework could flexibly
achieve the purposes of different applications, which indicates the flexibility
of the design.
Farzad Khodadadi, Amir Vahid Dastjerdi, Rajkumar Buyya
Comments: Keywords: Internet of Things; IoT; Web of Things; Cloud of Things
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
As technology advances and the number of smart devices continues to grow
substantially, the need for ubiquitous, context-aware platforms that support
interconnected, heterogeneous, and distributed networks of devices has given
rise to what is referred to today as the Internet of Things. However, paving
the path for achieving the aforementioned objectives and making the IoT
paradigm more tangible requires the integration and convergence of different
knowledge and research domains, covering aspects from identification and
communication to resource discovery and service integration. In this chapter,
we aim to highlight research on topics including proposed architectures,
security and privacy, and network communication means and protocols, and we
conclude by providing future directions and open challenges facing IoT
development.
Jan Kofroň (Charles University), Jana Tumova (KTH Royal Institute of Technology)
Journal-ref: EPTCS 245, 2017
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
These are the proceedings of the 14th International Workshop on Formal
Engineering approaches to Software Components and Architectures (FESCA). The
workshop was held on April 22, 2017 in Uppsala (Sweden) as a satellite event to
the European Joint Conference on Theory and Practice of Software (ETAPS’17).
The aim of the FESCA workshop is to bring together junior researchers from
formal methods, software engineering, and industry interested in the
development and application of formal modelling approaches as well as
associated analysis and reasoning techniques with practical benefits for
software engineering.
In recent years, the growing importance of functional correctness and the
increased relevance of system quality properties (e.g. performance,
reliability, security) have stimulated the emergence of analytical and
modelling techniques for the design and development of software systems. With
the increasing complexity and utilization of today’s software systems, FESCA
aims at addressing two research questions: (1) what role is played by the
software design phase in the systematic addressing of the analytical and
modelling challenges, and (2) how can formal and semi-formal techniques be
effectively applied to make the issues easier to address automatically, with
less human intervention.
Hubert Garavel, Wendelin Serwe
Comments: In Proceedings MARS 2017, arXiv:1703.05812
Journal-ref: EPTCS 244, 2017, pp. 230-270
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC)
The multiway rendezvous introduced in Theoretical CSP is a powerful paradigm
to achieve synchronization and communication among a group of (possibly more
than two) processes. We illustrate the advantages of this paradigm on the
production cell benchmark, a model of a real metal processing plant, for which
we propose a compositional software controller, which is written in LNT and
LOTOS, and makes intensive use of the multiway rendezvous.
Tam Nguyen, Takeshi Hatanaka, Mamoru Doi, Emanuele Garone, Masayuki Fujita
Comments: 8 pages, International Federation of Automatic Conference 2017, 8 figures
Subjects: Multiagent Systems (cs.MA); Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO); Systems and Control (cs.SY)
This paper focuses on a passivity-based distributed reference governor (RG)
applied to a pre-stabilized mobile robotic network. The novelty of this paper
lies in the method used to solve the RG problem, where a passivity-based
distributed optimization scheme is proposed. In particular, the gradient
descent method minimizes the global objective function while the dual ascent
method maximizes the Hamiltonian. To make the agents converge to the agreed
optimal solution, a proportional-integral consensus estimator is used. This
paper proves the convergence of the state estimates of the RG to the optimal
solution through passivity arguments, assuming the physical system is static.
Then, the effectiveness of the scheme considering the dynamics of the physical
system is demonstrated through simulations and experiments.
Hongwei Xi, Hanwen Wu
Subjects: Logic (math.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
We identify multirole logic as a new form of logic in which
conjunction/disjunction is interpreted as an ultrafilter on the power set of
some underlying set (of roles) and the notion of negation is generalized to
endomorphisms on this underlying set. We formalize both multirole logic (MRL)
and linear multirole logic (LMRL) as natural generalizations of classical logic
(CL) and classical linear logic (CLL), respectively, and also present a
filter-based interpretation for intuitionism in multirole logic. Among various
meta-properties established for MRL and LMRL, we obtain one named multiparty
cut-elimination stating that every cut involving one or more sequents (as a
generalization of a (binary) cut involving exactly two sequents) can be
eliminated, thus extending the celebrated result of cut-elimination by Gentzen.
Ehab Salahat, Murad Qasaimeh
Comments: Annual IEEE Industrial Electronics Societys 18th International Conf. on Industrial Technology (ICIT), 22-25 March, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Computer vision is one of the most active research fields in information
technology today. Giving machines and robots the ability to see and comprehend
the surrounding world at the speed of sight creates endless potential
applications and opportunities. Feature detection and description algorithms
can be indeed considered as the retina of the eyes of such machines and robots.
However, these algorithms are typically computationally intensive, which
prevents them from achieving real-time, speed-of-sight performance. In
addition, they differ in their capabilities, and some may work better than
others on specific types of input. As such, it is essential to
compactly report their pros and cons as well as their performances and recent
advances. This paper is dedicated to providing a comprehensive overview of the
state-of-the-art and recent advances in feature detection and description
algorithms. Specifically, it starts by overviewing fundamental concepts. It
then compares, reports and discusses their performance and capabilities. The
Maximally Stable Extremal Regions algorithm and the Scale Invariant Feature
Transform algorithms, being two of the best of their type, are selected to
report their recent algorithmic derivatives.
Nadav Cohen, Ronen Tamari, Amnon Shashua
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Expressive efficiency is a concept that allows formally reasoning about the
representational capacity of deep network architectures. A network architecture
is expressively efficient with respect to an alternative architecture if the
latter must grow super-linearly in order to represent functions realized by the
former. A well-known example is the exponential expressive efficiency of depth,
namely, that in many cases shallow networks must grow exponentially large in
order to represent functions realized by deep networks. In this paper we study
the expressive efficiency brought forth by the architectural feature of
connectivity, motivated by the observation that nearly all state of the art
networks these days employ elaborate connection schemes, running layers in
parallel while splitting and merging them in various ways. A formal treatment
of this question would shed light on the effectiveness of modern connectivity
schemes, and in addition, could provide new tools for network design. We focus
on dilated convolutional networks, a family of deep models gaining increased
attention, underlying state of the art architectures like Google’s WaveNet and
ByteNet. By introducing and studying the concept of mixed tensor
decompositions, we prove that interconnecting dilated convolutional networks
can lead to expressive efficiency. In particular, we show that a single
connection between intermediate layers can already lead to an almost quadratic
gap, which in large-scale settings typically makes the difference between a
model that is practical and one that is not.
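The architectural feature being analyzed is concrete: stacks of dilated 1-D
convolutions with an occasional connection merging intermediate layers across
stacks. A small PyTorch sketch of such a pair of stacks (invented sizes;
purely illustrative of the connectivity, not the paper's formal construction):

```python
import torch
import torch.nn as nn

class DilatedStack(nn.Module):
    # WaveNet-style stack of 1-D dilated convolutions (dilations 1, 2, 4...).
    def __init__(self, ch, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(ch, ch, kernel_size=2, dilation=2 ** i, padding=2 ** i)
            for i in range(layers))

    def forward(self, x, mix_from=None):
        # 'mix_from' mimics a single connection merging an intermediate
        # layer of another stack, the kind of interconnection analyzed.
        for i, conv in enumerate(self.convs):
            x = torch.relu(conv(x))[..., :x.shape[-1]]   # keep length fixed
            if mix_from is not None and i == len(self.convs) // 2:
                x = x + mix_from
        return x

stack_a, stack_b = DilatedStack(8), DilatedStack(8)
x = torch.randn(1, 8, 32)
y = stack_b(x, mix_from=stack_a(x))   # second stack merges the first mid-way
```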
Fanhua Shang, Yuanyuan Liu, James Cheng, Kelvin Kai Wing Ng, Yuichi Yoshida
Comments: 22 pages, 9 figures
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
The sufficient decrease technique has been widely used in deterministic
optimization, even for non-convex optimization problems, such as line-search
techniques. Motivated by those successes, we propose a novel sufficient
decrease framework for a class of variance reduced stochastic gradient descent
(VR-SGD) methods such as SVRG and SAGA. To achieve sufficient decrease in
stochastic optimization, we design a new sufficient decrease criterion. We
then introduce a coefficient (θ) to satisfy the sufficient decrease property,
which decides whether to shrink, expand, or move in the opposite direction
(i.e., (θ x) for the variable (x)), and give two specific update rules for
Lasso and ridge regression.
properties of our algorithms for strongly convex problems, which show that both
of our algorithms attain linear convergence rates. We also provide the
convergence guarantees of both of our algorithms for non-strongly convex
problems. Our experimental results further verify that our algorithms achieve
better performance than their counterparts.
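To make the setting concrete, here is plain SVRG on a ridge-regression toy
with a naive epoch-level sufficient-decrease check bolted on (the paper's
θ-based shrink/expand rule and its analysis are not reproduced; this only
illustrates where such a criterion plugs into the algorithm):

```python
import numpy as np

def svrg(grad_i, full_grad, loss, x0, n, lr=0.02, epochs=20, c=1e-4):
    # Keep an epoch's result only if the loss dropped by at least
    # c * ||snapshot gradient||^2 (a naive sufficient-decrease stand-in).
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        snap, g_snap = x.copy(), full_grad(x)
        for _ in range(2 * n):
            i = rng.integers(n)
            x = x - lr * (grad_i(x, i) - grad_i(snap, i) + g_snap)
        if loss(x) > loss(snap) - c * g_snap @ g_snap:
            x = snap                      # insufficient decrease: reject
    return x

# Toy ridge regression (synthetic data).
rng = np.random.default_rng(1)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
loss = lambda x: 0.5 * np.mean((A @ x - b) ** 2) + 0.05 * x @ x
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i]) + 0.1 * x
full_grad = lambda x: A.T @ (A @ x - b) / len(b) + 0.1 * x
print(loss(svrg(grad_i, full_grad, loss, np.zeros(10), n=200)))
```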
Anthony Bagnall, Gavin C. Cawley
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We demonstrate that, for a range of state-of-the-art machine learning
algorithms, the differences in generalisation performance obtained using
default parameter settings and using parameters tuned via cross-validation can
be similar in magnitude to the differences in performance observed between
state-of-the-art and uncompetitive learning systems. This means that fair and
rigorous evaluation of new learning algorithms requires performance comparison
against benchmark methods with best-practice model selection procedures, rather
than using default parameter settings. We investigate the sensitivity of three
key machine learning algorithms (support vector machine, random forest and
rotation forest) to their default parameter settings, and provide guidance on
determining sensible default parameter values for implementations of these
algorithms. We also conduct an experimental comparison of these three
algorithms on 121 classification problems and find that, perhaps surprisingly,
rotation forest is significantly more accurate on average than both random
forest and a support vector machine.
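The recommended protocol is easy to follow with standard tooling: tune
hyperparameters inside each evaluation fold (nested cross-validation) and
compare that to default settings. An sklearn illustration (dataset and grid
chosen arbitrarily, not taken from the paper):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

default = cross_val_score(RandomForestClassifier(random_state=0), X, y).mean()

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {'max_features': [0.1, 0.25, 0.5, 1.0], 'n_estimators': [100, 500]},
    cv=5)
tuned = cross_val_score(grid, X, y).mean()   # nested CV: tune in each fold

print(f'default: {default:.3f}  tuned: {tuned:.3f}')
```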
Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We introduce two tactics to attack agents trained by deep reinforcement
learning algorithms using adversarial examples, namely the strategically-timed
attack and the enchanting attack. In the strategically-timed attack, the
adversary aims at minimizing the agent’s reward by only attacking the agent at
a small subset of time steps in an episode. Limiting the attack activity to
this subset helps prevent detection of the attack by the agent. We propose a
novel method to determine when an adversarial example should be crafted and
applied. In the enchanting attack, the adversary aims at luring the agent to a
designated target state. This is achieved by combining a generative model and a
planning algorithm: while the generative model predicts the future states, the
planning algorithm generates a preferred sequence of actions for luring the
agent. A sequence of adversarial examples is then crafted to lure the agent to
take the preferred sequence of actions. We apply the two tactics to the agents
trained by state-of-the-art deep reinforcement learning algorithms including
DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much
reward as the uniform attack (i.e., attacking at every time step) does by
attacking the agent 4 times less often. Our enchanting attack lures the agent
toward designated target states with a more than 70% success rate. Videos are
available at this http URL
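A stripped-down sketch of the strategically-timed idea: craft an FGSM-style
perturbation only at steps where the policy's action preference is strong, and
leave other observations untouched (the preference measure, threshold, and
attack below are simplified stand-ins, not the paper's exact method):

```python
import torch

def maybe_attack(policy, obs, eps=0.01, thresh=0.8):
    # policy: any torch module mapping an observation to action logits.
    obs = obs.clone().requires_grad_(True)
    probs = torch.softmax(policy(obs), dim=-1)
    if (probs.max() - probs.min()) < thresh:
        return obs.detach()                   # weak preference: skip attack
    torch.log(probs.max()).backward()         # gradient of preferred action
    return (obs - eps * obs.grad.sign()).detach()   # push that action down
```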
Gary Bécigneul
Subjects: Learning (cs.LG)
In machine learning and neuroscience, certain computational structures and
algorithms are known to yield disentangled representations without us
understanding why, the most striking examples being perhaps convolutional
neural networks and the ventral stream of the visual cortex in humans and
primates. As for the latter, it was conjectured that representations may be
disentangled by being flattened progressively and at a local scale. An attempt
at a formalization of the role of invariance in learning representations was
made recently, being referred to as I-theory. In this framework and using the
language of differential geometry, we show that pooling over a group of
transformations of the input contracts the metric and reduces its curvature,
and provide quantitative bounds, with the aim of moving towards a theoretical
understanding of how to disentangle representations.
Daniil Ryabko
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
The independence clustering problem is considered in the following
formulation: given a set (S) of random variables, it is required to find the
finest partitioning ({U_1, …, U_k}) of (S) into clusters such that the
clusters (U_1, …, U_k) are mutually independent. Since mutual independence is
the target, pairwise similarity measurements are of no use, and thus
traditional clustering algorithms are inapplicable. The distribution of the
random variables in (S) is, in general, unknown, but a sample is available.
Thus, the problem is cast in terms of time series. Two forms of sampling are
considered: i.i.d. and stationary time series, with the main emphasis being on
the latter, more general, case. A consistent, computationally tractable
algorithm for each of the settings is proposed, and a number of open directions
for further research are outlined.
Shuo Wang, Leandro L. Minku, Xin Yao
Subjects: Learning (cs.LG)
As an emerging research topic, online class imbalance learning often combines
the challenges of both class imbalance and concept drift. It deals with data
streams having very skewed class distributions, where concept drift may occur.
It has recently received increased research attention; however, very little
work addresses the combined problem where both class imbalance and concept
drift coexist. As the first systematic study of handling concept drift in
class-imbalanced data streams, this paper first provides a comprehensive review
of current research progress in this field, including current research focuses
and open challenges. Then, an in-depth experimental study is performed, with
the goal of understanding how to best overcome concept drift in online learning
with class imbalance. Based on the analysis, a general guideline is proposed
for the development of an effective algorithm.
Roei Gelbhart, Ran El-Yaniv
Subjects: Learning (cs.LG)
A selective classifier (f,g) comprises a classification function f and a
binary selection function g, which determines if the classifier abstains from
prediction, or uses f to predict. The classifier is called
pointwise-competitive if it classifies each point identically to the best
classifier in hindsight (from the same class), whenever it does not abstain.
The quality of such a classifier is quantified by its rejection mass, defined
to be the probability mass of the points it rejects. A “fast” rejection rate is
achieved if the rejection mass is bounded from above by O(1/m) where m is the
number of labeled examples used to train the classifier (and O hides
logarithmic factors). Pointwise-competitive selective (PCS) classifiers are
intimately related to disagreement-based active learning and it is known that
in the realizable case, a fast rejection rate of a known PCS algorithm (called
Consistent Selective Strategy) is equivalent to an exponential speedup of the
well-known CAL active algorithm.
We focus on the agnostic setting, for which there is a known algorithm called
LESS that learns a PCS classifier and achieves a fast rejection rate (depending
on Hanneke’s disagreement coefficient) under strong assumptions. We present an
improved PCS learning algorithm called ILESS for which we show a fast rate
(depending on Hanneke’s disagreement coefficient) without any assumptions. Our
rejection bound smoothly interpolates the realizable and agnostic settings. The
main result of this paper is an equivalence between the following three
entities: (i) the existence of a fast rejection rate for any PCS learning
algorithm (such as ILESS); (ii) a poly-logarithmic bound for Hanneke’s
disagreement coefficient; and (iii) an exponential speedup for a new
disagreement-based active learner called ActiveiLESS.
Shuangfei Fan, Bert Huang
Subjects: Learning (cs.LG)
We propose a new method for training iterative collective classifiers for
labeling nodes in network data. The iterative classification algorithm (ICA) is
a canonical method for incorporating relational information into
classification. Yet, existing methods for training ICA models rely on the
assumption that relational features reflect the true labels of the nodes. This
unrealistic assumption introduces a bias that is inconsistent with the actual
prediction algorithm. In this paper, we introduce recurrent collective
classification (RCC), a variant of ICA analogous to recurrent neural network
prediction. RCC accommodates any differentiable local classifier and relational
feature functions. We provide gradient-based strategies for optimizing over
model parameters to more directly minimize the loss function. In our
experiments, this direct loss minimization translates to improved accuracy and
robustness on real network data. We demonstrate the robustness of RCC in
settings where local classification is very noisy, settings that are
particularly challenging for ICA.
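For contrast with RCC, the basic ICA prediction loop looks like this:
repeatedly re-classify each node from its own features concatenated with
aggregated neighbor predictions (sketch with assumed inputs; RCC's
contribution is training through these iterations rather than on true-label
relational features):

```python
import numpy as np

def ica_predict(clf, X, adj, n_classes, iters=10):
    # clf: trained on [node features, mean neighbor label distribution].
    n = X.shape[0]
    probs = np.full((n, n_classes), 1.0 / n_classes)
    for _ in range(iters):
        deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
        rel = adj @ probs / deg               # relational features
        probs = clf.predict_proba(np.hstack([X, rel]))
    return probs.argmax(axis=1)
```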
Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The probability that a user will click a search result depends both on its
relevance and its position on the results page. The position based model
explains this behavior by ascribing to every item an attraction probability,
and to every position an examination probability. To be clicked, a result must
be both attractive and examined. The probabilities of an item-position pair
being clicked thus form the entries of a rank-one matrix. We propose the
learning problem of a Bernoulli rank-one bandit where at each step, the
learning agent chooses a pair of row and column arms, and receives the product
of their Bernoulli-distributed values as a reward. This is a special case of
the stochastic rank-one bandit problem considered in recent work that proposed
an elimination-based algorithm, Rank1Elim, and showed that Rank1Elim’s regret
scales linearly with the number of rows and columns on “benign” instances.
These are the instances where the minimum of the average row and column rewards
(mu) is bounded away from zero. The issue with Rank1Elim is that it fails to
be competitive with straightforward bandit strategies as (mu → 0).
In this paper we propose Rank1ElimKL which simply replaces the (crude)
confidence intervals of Rank1Elim with confidence intervals based on
Kullback-Leibler (KL) divergences, and with the help of a novel result
concerning the scaling of KL divergences we prove that with this change, our
algorithm will be competitive no matter the value of (mu). Experiments with
synthetic data confirm that on benign instances the performance of Rank1ElimKL
is significantly better than that of even Rank1Elim, while experiments with
models derived from real data confirm that the improvements are significant
across the board, regardless of whether the data is benign or not.
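The substitution at the heart of Rank1ElimKL is standard in the bandit
literature: replace Hoeffding-style intervals with KL-based ones, whose upper
limit is the largest mean still statistically compatible with the empirical
average. A bisection sketch for Bernoulli arms (generic KL-UCB machinery, not
the paper's full algorithm):

```python
import numpy as np

def kl_bernoulli(p, q):
    p = min(max(p, 1e-12), 1 - 1e-12)
    q = min(max(q, 1e-12), 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb(p_hat, n_pulls, t):
    # Largest q with n_pulls * KL(p_hat, q) <= log(t), via bisection.
    target = np.log(t) / n_pulls
    lo, hi = p_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb(p_hat=0.3, n_pulls=100, t=1000))
```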
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Access to electronic health records (EHR) data has motivated computational
advances in medical research. However, various concerns, particularly over
privacy, can limit access to and collaborative use of EHR data. Sharing
synthetic EHR data could mitigate risk. In this paper, we propose a new
approach, medical Generative Adversarial Network (medGAN), to generate
realistic synthetic EHRs. Based on an input EHR dataset, medGAN can generate
high-dimensional discrete variables (e.g., binary and count features) via a
combination of an autoencoder and generative adversarial networks. We also
propose minibatch averaging to efficiently avoid mode collapse, and increase
the learning efficiency with batch normalization and shortcut connections. To
demonstrate feasibility, we showed that medGAN generates synthetic EHR datasets
that achieve comparable performance to real data on many experiments including
distribution statistics, predictive modeling tasks and medical expert review.
Peeyush Kumar, Wolf Kohn, Zelda B. Zabinsky
Subjects: Learning (cs.LG)
Many applications require solving non-linear control problems that are
classically not well behaved. This paper develops a simple and efficient
chattering algorithm that learns near optimal decision policies through an
open-loop feedback strategy. The optimal control problem reduces to a series of
linear optimization programs that can be easily solved to recover a relaxed
optimal trajectory. This algorithm is implemented on a real-time enterprise
scheduling and control process.
Nir Rosenfeld, Amir Globerson
Subjects: Learning (cs.LG)
The goal of semi supervised learning methods is to effectively combine
labeled and unlabeled data to arrive at a better model. Many such methods rely
on graph-based approaches, where continuous labels are propagated through a
graph on the input points. Here we argue that it is more effective to consider
infection processes on these graphs, whereby at any point in time nodes can
infect other nodes with their labels. Since the dynamics of these processes is
stochastic, we develop algorithms for efficiently estimating the expected
labels over time. We show that our approach addresses many of the limitations
of graph-based learning, and is also empirically effective.
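The quantity being estimated can be defined by brute force: simulate the
stochastic infection process many times and average the node labels. A
Monte-Carlo sketch for a single-label process (invented dynamics parameters;
the paper develops far more efficient estimators and handles competing
labels):

```python
import numpy as np

def expected_labels(adj, seeds, p=0.3, steps=10, n_sims=500, seed=0):
    # At each step, every labeled node infects each unlabeled neighbor
    # with probability p; average the final labels over many runs.
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    counts = np.zeros(n)
    for _ in range(n_sims):
        lab = np.array([s in seeds for s in range(n)])
        for _ in range(steps):
            pressure = adj @ lab              # number of infected neighbors
            flip = rng.random(n) < 1 - (1 - p) ** pressure
            lab = lab | (flip & ~lab)
        counts += lab
    return counts / n_sims

# Path graph on four nodes, node 0 labeled.
adj = np.eye(4, k=1) + np.eye(4, k=-1)
print(expected_labels(adj, seeds={0}))
```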
Ramin M. Hasani, Guodong Wang, Radu Grosu
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper studies an intelligent technique for the health monitoring
and prognostics of common rotary machine components, particularly bearings.
During a run-to-failure experiment, rich unsupervised features from vibration
sensory data are extracted by a trained sparse auto-encoder. Then, the
correlation of the extracted attributes of the initial samples (presumably
healthy at the beginning of the test) with the succeeding samples is calculated
and passed through a moving-average filter. The normalized output is named
auto-encoder correlation-based (AEC) rate which stands for an informative
attribute of the system depicting its health status and precisely identifying
the degradation starting point. We show that the AEC technique generalizes
well across several run-to-failure tests. AEC collects rich unsupervised
features from the vibration data fully autonomously. We demonstrate the
superiority of AEC over many other state-of-the-art approaches for the health
monitoring and prognostics of machine bearings.
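The AEC health index itself is a short computation once encoded features are
available: correlate each sample's feature vector with a baseline built from
the initial (assumed healthy) samples, then smooth. A numpy sketch (feature
extraction by the sparse auto-encoder is assumed done upstream):

```python
import numpy as np

def aec_rate(features, n_healthy=20, window=5):
    # features: (num_samples, code_dim) encodings ordered in time.
    baseline = features[:n_healthy].mean(axis=0)
    corr = np.array([np.corrcoef(f, baseline)[0, 1] for f in features])
    smooth = np.convolve(corr, np.ones(window) / window, mode='same')
    return smooth / smooth[:n_healthy].mean()   # normalized health index
```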
Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many real-world tasks involve multiple agents with partial observability and
limited communication. Learning is challenging in these settings due to local
viewpoints of agents, which perceive the world as non-stationary due to
concurrently-exploring teammates. Approaches that learn specialized policies
for individual tasks face major problems when applied to the real world: not
only do agents have to learn and store a distinct policy for each task, but in
practice the identity of the task is often non-observable, making these
algorithms inapplicable. This paper formalizes and addresses the problem of
multi-task multi-agent reinforcement learning under partial observability. We
introduce a decentralized single-task learning approach that is robust to
concurrent interactions of teammates, and present an approach for distilling
single-task policies into a unified policy that performs well across multiple
related tasks, without explicit provision of task identity.
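
One way to realize the distillation step is a standard policy-distillation loss that matches the student's action distribution to each single-task teacher's; the sketch below is a generic version under that assumption, not necessarily the authors' exact objective.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over actions, averaged over a batch of
    observations collected from the single-task teachers."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    log_s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_s, t, reduction="batchmean")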
Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims
Subjects: Learning (cs.LG); Information Retrieval (cs.IR)
Accurately evaluating new policies (e.g. ad-placement models, ranking
functions, recommendation functions) is one of the key prerequisites for
improving interactive systems. While the conventional approach to evaluation
relies on online A/B tests, recent work has shown that counterfactual
estimators can provide an inexpensive and fast alternative, since they can be
applied offline using log data that was collected from a different policy
fielded in the past. In this paper, we address the question of how to estimate
the performance of a new policy when we have log data from multiple historic
policies. This question is of great relevance in practice, since policies get
updated frequently in most online systems. We show that naively combining data
from multiple logging policies can be highly suboptimal. In particular, we find
that the standard Inverse Propensity Score (IPS) estimator suffers especially
when logging and evaluation policies diverge — to a point where throwing away
data improves the variance of the estimator. We therefore propose two
alternative estimators which we characterize theoretically and compare
experimentally. We find that the new estimators can provide substantially
improved estimation accuracy.
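
The failure mode and one remedy can be sketched in a few lines: naive IPS divides each logged reward by the propensity of whichever policy logged it, while a balanced variant divides by the mixture propensity over all logging policies. The code below is our hedged reading of that idea, with hypothetical array layouts.

import numpy as np

def naive_ips(rewards, p_target, p_logging):
    """One entry per logged (context, action) pair; p_logging[i] is the
    propensity under the policy that logged item i."""
    return np.mean(rewards * p_target / p_logging)

def balanced_ips(rewards, p_target, p_all_logging, n_per_policy):
    """p_all_logging: (n_samples, n_policies) propensities of every logged
    action under every logging policy."""
    weights = np.asarray(n_per_policy) / np.sum(n_per_policy)
    p_mix = p_all_logging @ weights  # mixture logging propensity
    return np.mean(rewards * p_target / p_mix)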
Hossein Hosseini, Radha Poovendran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Deep Neural Networks (DNNs) have achieved remarkable performance on a variety
of pattern-recognition tasks, particularly visual classification problems,
where new algorithms are reported to achieve or even surpass human performance.
In this paper, we test state-of-the-art DNNs with negative images and show
that the accuracy drops to the level of random classification. This leads us to
the conjecture that DNNs, which are merely trained on raw data, do not
recognize the semantics of the objects, but rather memorize the inputs. We
suggest that negative images can be thought of as “semantic adversarial
examples”, which we define as transformed inputs that semantically represent
the same objects but that the model does not classify correctly.
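
The test itself is a one-liner on top of a standard evaluation loop; the sketch below (model and data loader are placeholders) inverts pixel intensities and re-measures accuracy.

import torch

def accuracy_on_negatives(model, loader, max_value=1.0):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, targets in loader:
            negatives = max_value - images        # photographic negative
            preds = model(negatives).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.numel()
    return correct / total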
Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Learning (cs.LG)
Machine learning has matured to the point where it is now being considered
to automate decisions in loan lending, employee hiring, and predictive
policing. In many of these scenarios, however, previous decisions have been
made that are unfairly biased against certain subpopulations (e.g., those of a
particular race, gender, or sexual orientation). Because this past data is
often biased, machine learning predictors must account for this to avoid
perpetuating discriminatory practices (or inadvertently creating new ones). In
this paper, we develop a framework for modeling fairness in any dataset using
tools from counterfactual inference. We propose a definition called
counterfactual fairness that captures the intuition that a decision is fair
towards an individual if it gives the same predictions in (a) the observed
world and (b) a world where the individual had always belonged to a different
demographic group, other background causes of the outcome being equal. We
demonstrate our framework on two real-world problems: fair prediction of law
school success, and fair modeling of an individual’s criminality in policing
data.
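
In symbols, the definition reads: a predictor \hat{Y} is counterfactually fair if, for every outcome y, every context (X = x, A = a), and every counterfactual attribute value a',

\[
P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big)
  = P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big),
\]

where U denotes the background (exogenous) variables of the underlying causal model.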
Nick Pawlowski, Miguel Jaques, Ben Glocker
Comments: Under review for Workshop track – ICLR 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this work we perform outlier detection using ensembles of neural networks
obtained by variational approximation of the posterior in a Bayesian neural
network setting. The variational parameters are obtained by sampling from the
true posterior by gradient descent. We show our outlier detection results are
better than those obtained using other efficient ensembling methods.
Peter Karkus, David Hsu, Wee Sun Lee
Comments: 9 pages, 5 figures, 1 table
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper introduces QMDP-net, a neural network architecture for planning
under partial observability. The QMDP-net combines the strengths of model-free
learning and model-based planning. It is a recurrent policy network, but it
represents a policy by connecting a model with a planning algorithm that solves
the model, thus embedding the solution structure of planning in the network
architecture. The QMDP-net is fully differentiable and allows end-to-end
training. We train a QMDP-net over a set of different environments so that it
can generalize to new ones. In preliminary experiments, QMDP-net showed
strong performance on several robotic tasks in simulation. Interestingly, it
also sometimes outperformed the QMDP algorithm, which generated the data for
learning, because of QMDP-net’s robustness resulting from end-to-end learning.
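
For reference, the classic QMDP approximation that the network embeds can be written in a few lines of numpy (this is the plain algorithm, not the differentiable, learned version in the paper):

import numpy as np

def qmdp_action(T, R, belief, gamma=0.95, n_iters=100):
    """T: (A, S, S) transition probabilities; R: (A, S) rewards;
    belief: (S,) probability vector over states."""
    V = np.zeros(T.shape[1])
    for _ in range(n_iters):
        Q = R + gamma * T @ V          # (A, S) state-action values
        V = Q.max(axis=0)              # value-iteration backup
    Q = R + gamma * T @ V
    return int(np.argmax(Q @ belief))  # best action under expected Q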
Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Comments: 11 pages, 4 figures, 2 tables, webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)
We introduce the first goal-driven training for visual question answering and
dialog agents. Specifically, we pose a cooperative ‘image guessing’ game
between two agents — Qbot and Abot — who communicate in natural language
dialog so that Qbot can select an unseen image from a lineup of images. We use
deep reinforcement learning (RL) to learn the policies of these agents
end-to-end — from pixels to multi-agent multi-round dialog to game reward.
We demonstrate two experimental results.
First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show
results on a synthetic world, where the agents communicate in ungrounded
vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
that two bots invent their own communication protocol and start using certain
symbols to ask/answer about certain visual attributes (shape/color/size). Thus,
we demonstrate the emergence of grounded language and communication among
‘visual’ dialog agents with no human supervision at all.
Second, we conduct large-scale real-image experiments on the VisDial dataset,
where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’
agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
ask questions that Abot is good at, ultimately resulting in more informative
dialog and a better team.
Annie Liang, Xiaosheng Mu, Vasilis Syrgkanis
Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG); Statistics Theory (math.ST)
Decision-makers often learn by acquiring information from distinct sources
that possibly provide complementary information. We consider a decision-maker
who sequentially samples from a finite set of Gaussian signals, and wants to
predict a persistent multi-dimensional state at an unknown final period. What
signal should he choose to observe in each period? Related problems about
optimal experimentation and dynamic learning tend to have solutions that can
only be approximated or implicitly characterized. In contrast, we find that in
our problem, the dynamically optimal path of signal acquisitions generically:
(1) eventually coincides at every period with the myopic path of signal
acquisitions, and (2) eventually achieves “total optimality,” so that at every
large period, the decision-maker will not want to revise his previous signal
acquisitions, even if given this opportunity. In special classes of
environments that we describe, these properties hold not only eventually, but
from period 1. Finally, we characterize the asymptotic frequency with which
each signal is chosen, and how this depends on primitives of the informational
environment.
Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
Comments: Accepted as a conference paper at ICLR 2017. This is an extended version of the original paper (this https URL). The original paper proposes a new architecture, while this version focuses on transfer learning for a general model class
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Recent papers have shown that neural networks obtain state-of-the-art
performance on several different sequence tagging tasks. One appealing property
of such systems is their generality, as excellent performance can be achieved
with a unified architecture and without task-specific feature engineering.
However, it is unclear if such systems can be used for tasks without large
amounts of training data. In this paper we explore the problem of transfer
learning for neural sequence taggers, where a source task with plentiful
annotations (e.g., POS tagging on Penn Treebank) is used to improve performance
on a target task with fewer available annotations (e.g., POS tagging for
microblogs). We examine the effects of transfer learning for deep hierarchical
recurrent networks across domains, applications, and languages, and show that
significant improvement can often be obtained, leading to gains over the
current state-of-the-art on several well-studied tasks.
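
A minimal sketch of one such transfer layout, assuming the common pattern of shared lower layers with task-specific output heads (the paper studies several sharing schemes; the names here are ours):

import torch.nn as nn

class SharedTagger(nn.Module):
    def __init__(self, vocab_size, n_src_tags, n_tgt_tags,
                 emb_dim=100, hidden=200):
        super().__init__()
        # Shared parameters: embeddings and the recurrent encoder.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Task-specific classification heads.
        self.src_head = nn.Linear(2 * hidden, n_src_tags)
        self.tgt_head = nn.Linear(2 * hidden, n_tgt_tags)

    def forward(self, tokens, task="tgt"):
        h, _ = self.encoder(self.embed(tokens))
        head = self.tgt_head if task == "tgt" else self.src_head
        return head(h)  # per-token tag logits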
Ashish Khetan, Sewoong Oh
Comments: 52 pages; 15 figures
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (cs.NA)
Singular values of data in matrix form provide insight into the structure of
the data, its effective dimensionality, and the choice of hyper-parameters
for higher-level data analysis tools. However, in many practical applications
such as collaborative filtering and network analysis, we only get a partial
observation. Under such scenarios, we consider the fundamental problem of
recovering spectral properties of the underlying matrix from a sampling of its
entries. We are particularly interested in directly recovering the spectrum,
which is the set of singular values, and also in sample-efficient approaches
for recovering a spectral sum function, which is an aggregate sum of the same
function applied to each of the singular values. We propose first estimating
the Schatten k-norms of a matrix, and then applying Chebyshev approximation
to the spectral sum function or applying moment matching in Wasserstein
distance to recover the singular values. The main technical challenge is in
accurately estimating the Schatten norms from a sampling of a matrix. We
introduce a novel unbiased estimator based on counting small structures in a
graph and provide guarantees that match its empirical performance. Our
theoretical analysis shows that Schatten norms can be recovered accurately from
a strictly smaller number of samples than what is needed to recover the
underlying low-rank matrix. Numerical experiments suggest that we significantly
improve upon a competing approach of using matrix completion methods.
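
For reference, the quantities involved are (with \sigma_i(M) the singular values of M; the notation F_f is ours):

\[
\|M\|_k = \Big( \sum_i \sigma_i(M)^k \Big)^{1/k},
\qquad
F_f(M) = \sum_i f\big(\sigma_i(M)\big),
\]

the Schatten k-norm and the spectral sum function for a given scalar function f.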
B Sengupta, E Vasquez, Y Qian
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
Learning an encoding of feature vectors in terms of an over-complete
dictionary or a probabilistic information-geometric construct (Fisher vectors)
is widespread in statistical signal processing and computer vision. In
content-based information retrieval using deep-learning classifiers, such
encodings are learnt on the flattened last layer, without adherence to the
multi-linear structure of the underlying feature tensor. We illustrate a
variety of feature encodings, including sparse dictionary coding and Fisher
vectors, and propose a structured tensor factorization scheme that enables us
to perform retrieval that is on par, in terms of average precision, with
Fisher-vector-encoded image signatures. In short, we illustrate how structural
constraints increase retrieval fidelity.
Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
Subjects: Sound (cs.SD); Learning (cs.LG)
Despite the significant progress made in recent years in transcribing
single-talker speech, progress in speaker-independent multi-talker mixed-speech
separation and tracing, often referred to as the cocktail-party problem, has
been less impressive. In this paper we propose a novel technique
for attacking this problem. The core of our technique is permutation invariant
training (PIT), which aims at minimizing the source stream reconstruction error
no matter how labels are ordered. This is achieved by aligning labels to the
output streams automatically during training. This strategy
effectively solves the label permutation problem observed in deep learning
based techniques for speech separation. More interestingly, our approach can
integrate speaker tracing in the PIT framework so that separation and tracing
can be carried out in one step and trained end-to-end. This is achieved using
recurrent neural networks (RNNs) by forcing separated frames belonging to the
same speaker to be aligned to the same output layer during training.
Furthermore, the computational cost introduced by PIT is very small compared to
the RNN computation during training and is zero during separation. We evaluated
PIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks
and found that it compares favorably to non-negative matrix factorization
(NMF), computational auditory scene analysis (CASA), deep clustering (DPCL) and
deep attractor network (DANet), and generalizes well over unseen speakers and
languages.
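
The core PIT objective is compact enough to sketch directly: compute the reconstruction error under every assignment of output streams to sources and train on the minimum (the utterance-level pairing below is a simplification of the paper's frame-level machinery).

from itertools import permutations
import torch

def pit_mse(estimates, targets):
    """estimates, targets: (batch, n_sources, n_frames) magnitudes."""
    n_src = estimates.shape[1]
    losses = []
    for perm in permutations(range(n_src)):
        # Error when output stream perm[j] is assigned to source j.
        err = ((estimates[:, list(perm)] - targets) ** 2).mean(dim=(1, 2))
        losses.append(err)
    # Train on the best assignment for each utterance in the batch.
    return torch.stack(losses, dim=1).min(dim=1).values.mean()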
Pietro Morerio, Jacopo Cavazza, Riccardo Volpi, Rene Vidal, Vittorio Murino
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Dropout is a very effective way of regularizing neural networks.
Stochastically “dropping out” units with a certain probability discourages
over-specific co-adaptations of feature detectors, preventing overfitting and
improving network generalization. Besides, Dropout can be interpreted as an
approximate model aggregation technique, where an exponential number of smaller
networks are averaged in order to get a more powerful ensemble. In this paper,
we show that using a fixed dropout probability during training is a suboptimal
choice. We thus propose a time scheduling for the probability of retaining
neurons in the network. This induces an adaptive regularization scheme that
smoothly increases the difficulty of the optimization problem. This idea of
“starting easy” and adaptively increasing the difficulty of the learning
problem has its roots in curriculum learning and allows one to train better
models. Indeed, we prove that our optimization strategy implements a very
general curriculum scheme, by gradually adding noise to both the input and
intermediate feature representations within the network architecture.
Experiments on seven image classification datasets and different network
architectures show that our method, named Curriculum Dropout, frequently yields
better generalization and, at worst, performs just as well as the standard
Dropout method.
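
One schedule consistent with the “starting easy” idea begins with retain probability 1 (no dropout) and decays it toward the usual target value; the exponential form below is an illustrative choice, not necessarily the paper's exact schedule.

import math

def retain_probability(step, target=0.5, gamma=1e-4):
    # Starts at 1.0 when step == 0 and decays smoothly to `target`,
    # gradually increasing the amount of injected noise.
    return (1.0 - target) * math.exp(-gamma * step) + target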
Mason McGill, Pietro Perona
Comments: Submitted to ICML 2017
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We propose and systematically evaluate three strategies for training
dynamically-routed artificial neural networks: graphs of learned
transformations through which different input signals may take different paths.
Though some approaches have advantages over others, the resulting networks are
often qualitatively similar. We find that, in dynamically-routed networks
trained to classify images, layers and branches become specialized to process
distinct categories of images. Additionally, given a fixed computational
budget, dynamically-routed networks tend to perform better than comparable
statically-routed networks.
Andreas Lenz, Manuel Stein, Lee Swindlehurst
Comments: 5 pages, 4 figures
Subjects: Information Theory (cs.IT)
In this work, the optimization of the analog transmit waveform for joint
delay-Doppler estimation under sub-Nyquist conditions is considered. In
particular, we derive an estimation theoretic design rule for the Fourier
coefficients of the analog transmit signal when violating the sampling theorem
at the receiver by using a wide analog pre-filtering bandwidth. For a wireless
delay-Doppler channel, we derive an optimization problem based on the Bayesian
Cramér-Rao lower bound (BCRLB) which allows us to solve the transmitter
design problem using an eigenvalue decomposition. Our approach enables one to
explore the Pareto-optimal design region spanned by the optimized waveforms.
Furthermore, we demonstrate how our framework can be used to reduce the
sampling rate at the receiver while maintaining high estimation accuracy.
Finally, we verify the practical impact by Monte-Carlo simulations of a channel
estimation algorithm.
Hai Cheng, Xiaojun Yuan, Yihua Tan
Comments: 27 pages, 11 figures, submitted to the IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
Compute-and-forward (CF) harnesses interference in wireless communications by
exploiting structured coding. The key idea of CF is to compute integer
combinations of codewords from multiple source nodes, rather than to decode
individual codewords by treating others as noise. Compute-compress-and-forward
(CCF) can further enhance the network performance by introducing compression
operations at receivers. In this paper, we develop a more general compression
framework, termed generalized compute-compress-and-forward (GCCF), where the
compression function involves multiple quantization-and-modulo lattice
operations. We show that GCCF achieves a broader compression rate region than
CCF. We also compare our compression rate region with the fundamental
Slepian-Wolf (SW) region. We show that GCCF is optimal in the sense of
achieving the minimum total compression rate. We also establish the criteria
under which GCCF achieves the SW region. In addition, we consider a two-hop
relay network employing the GCCF scheme. We formulate a sum-rate maximization
problem and develop an approximate algorithm to solve the problem. Numerical
results are presented to demonstrate the performance superiority of GCCF over
CCF and other schemes.
Panos N. Alevizos, Efthymios A. Vlachos, Aggelos Bletsas
Subjects: Information Theory (cs.IT)
Interference-aware resource allocation of time slots and frequency channels
in single-antenna, half-duplex radio wireless sensor networks (WSN) is
challenging. Devising distributed algorithms for such a task further
complicates the problem. This work studies WSN joint time and frequency channel
allocation
for a given routing tree, such that: a) allocation is performed in a fully
distributed way, i.e., information exchange is only performed among neighboring
WSN terminals, within communication up to two hops, and b) detection of
potential interfering terminals is simplified and can be practically realized.
The algorithm imprints space, time, frequency and radio hardware constraints
into a loopy factor graph and performs iterative message passing / loopy belief
propagation (BP) with randomized initial priors. Sufficient conditions for
convergence to a valid solution are offered, for the first time in the
literature, exploiting the structure of the proposed factor graph. Based on
theoretical findings, modifications of BP are devised that i) accelerate
convergence to a valid solution and ii) reduce computation cost. Simulations
reveal promising throughput results of the proposed distributed algorithm, even
though it utilizes simplified detection of the interfering-terminal set. Future
work
could modify the constraints such that other disruptive wireless technologies
(e.g., full-duplex radios or network coding) could be accommodated within the
same inference framework.
Xiaohui Zhou, Jing Guo, Salman Durrani, Marco Di Renzo
Comments: submitted for possible journal publication
Subjects: Information Theory (cs.IT)
Deployment of low cost power beacons (PBs) is a promising solution for
dedicated wireless power transfer (WPT) in future wireless networks. In this
paper, we present a tractable model for PB-assisted millimeter wave (mmWave)
wireless ad hoc networks, where each transmitter (TX) harvests energy from all
PBs and then uses the harvested energy to transmit information to its desired
receiver. Our model accounts for realistic aspects of WPT and mmWave
transmissions, such as power circuit activation threshold, allowed maximum
harvested power, maximum transmit power, beamforming and blockage. Using
stochastic geometry, we derive the Laplace transform of the aggregate received
power at the TX to calculate the power coverage probability. We approximate and
discretize the transmit power of each TX into a finite number of discrete power
levels in log scale to compute the channel and total coverage probability. We
compare our analytical predictions to simulations and observe good accuracy.
The proposed model gives insight into the effect of system parameters, such as
transmit power of PBs, main lobe beam-width and power circuit activation
threshold on the overall coverage probability. The results confirm that it is
feasible and safe to power TXs in a mmWave ad hoc network using PBs.
Rui Zhang, He Chen, Phee Lep Yeoh, Yonghui Li, Branka Vucetic
Comments: 6 pages, 3 figures, conference
Subjects: Information Theory (cs.IT)
This paper proposes and analyzes a new full-duplex (FD) cooperative cognitive
radio network with wireless energy harvesting (EH). We consider that the
secondary receiver is equipped with an FD radio and acts as an FD hybrid access
point (HAP), which aims to collect information from its associated EH secondary
transmitter (ST) and relay the signals. The ST is assumed to be equipped with
an EH unit and a rechargeable battery such that it can harvest and accumulate
energy from radio frequency (RF) signals transmitted by the primary transmitter
(PT) and the HAP. We develop a novel cooperative spectrum sharing (CSS)
protocol for the considered system. In the proposed protocol, thanks to its FD
capability, the HAP can receive the PT’s signals and transmit energy-bearing
signals to charge the ST simultaneously, or forward the PT’s signals and
receive the ST’s signals at the same time. We derive analytical expressions for
the achievable throughput of both primary and secondary links by characterizing
the dynamic charging/discharging behaviors of the ST battery as a finite-state
Markov chain. We present numerical results to validate our theoretical analysis
and demonstrate the merits of the proposed protocol over its non-cooperative
counterpart.
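
The battery analysis reduces to standard Markov-chain machinery; for orientation, the stationary distribution of a finite-state chain can be computed as below (the transition matrix P over battery levels is hypothetical and would come from the paper's charging/discharging model, assuming an ergodic chain).

import numpy as np

def stationary_distribution(P):
    """P: (n, n) row-stochastic transition matrix over battery states."""
    evals, evecs = np.linalg.eig(P.T)
    # Left eigenvector associated with eigenvalue 1, normalized to sum to 1.
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()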
Peter Trifonov, Vera Miloslavskaya, Ruslan Morozov
Subjects: Information Theory (cs.IT)
An extension of the stack decoding algorithm for polar codes is presented.
The paper introduces a new score function, which enables one to accurately
compare paths of different length. This results in significant complexity
reduction with respect to the original stack algorithm at the expense of
negligible performance loss.
Muhammad R. A. Khandaker, Kai-Kit Wong, Gan Zheng
Comments: Accepted in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
This paper introduces the novel idea of paying the utility to winning agents
in the form of a physical quantity in cooperative communications. Our setting
is
a secret two-way communication channel where two transmitters exchange
information in the presence of an eavesdropper. The relays are selected from a
set of interested parties such that the secrecy sum rate is maximized. In
return, the selected relay nodes’ energy harvesting requirements will be
fulfilled up to a certain threshold through their own payoff so that they have
the natural incentive to be selected and involved in the communication.
However, relays may exaggerate their private information in order to improve
their chance to be selected. Our objective is to develop a mechanism for relay
selection that enforces them to reveal the truth since otherwise they may be
penalized. We also propose a joint cooperative relay beamforming and transmit
power optimization scheme based on an alternating optimization approach. Note
that the problem is highly non-convex since the objective function appears as a
product of three correlated Rayleigh quotients. While a common practice in the
existing literature is to optimize the relay beamforming vector for given
transmit power via rank relaxation, we propose a second-order cone programming
(SOCP)-based approach in this paper which incurs a significantly lower
computational cost. The performance of the incentive control mechanism and the
optimization algorithm has been evaluated through numerical simulations.
Khac-Hoang Ngo, Sheng Yang, Mari Kobayashi
Comments: 30 pages, 5 figures, submitted to the IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
We consider the content delivery problem in a fading multi-input
single-output channel with cache-aided users. We are interested in the
scalability of the content delivery rate when the number of users, K, is large.
Analytical results show that, using coded caching and wireless multicasting,
without channel state information at the transmitter (CSIT), linear scaling of
the content delivery rate with respect to K can be achieved in three different
ways. First, with quasi-static fading, it can be achieved when the number of
transmit antennas grows logarithmically with K. Second, even with a fixed
number of antennas, we can achieve the linear scaling with a threshold-based
user selection requiring only one-bit feedback from the users. Third, if the
multicast transmission can span over multiple independent coherence blocks
(with block fading), we show that linear scaling can be obtained when the
product of the number of coherence blocks and the number of transmit antennas
scales logarithmically with K. When CSIT is available, we propose a mixed
strategy that combines spatial multiplexing and multicasting. Numerical results
show that, by optimizing the power split between spatial multiplexing and
multicasting, we can achieve a significant gain of the content delivery rate
with moderate cache size.
Waqas Aman, Guftaar Ahmad Sardar Sidhu, Tayyaba Jabeen, Feifei Gao, Shi Jin
Subjects: Information Theory (cs.IT)
In this paper, we consider the Physical Layer Security (PLS) problem in an
orthogonal frequency division multiple access (OFDMA) based dual-hop system
which consists of multiple users, multiple amplify-and-forward relays, and an
eavesdropper. The aim is to enhance the PLS of the entire system by maximizing
the sum secrecy rate of secret users through optimal resource allocation under
various
practical constraints. Specifically, the sub-carrier allocation to different
users, the relay assignments, and the power loading over different sub-carriers
at transmitting nodes are optimized. The joint optimization problem is modeled
as a mixed binary integer programming problem subject to exclusive sub-carrier
allocation and separate power budget constraints at each node. A joint
optimization solution is obtained through Lagrangian dual decomposition where
KKT conditions are exploited to find the optimal power allocation at the base
station. Further, to reduce the complexity, a sub-optimal scheme is presented
where the optimal power allocation is derived under a fixed sub-carrier-relay
assignment. Simulation results are also provided to validate the performance of
the proposed schemes.
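
For orientation, the secrecy rate of one link on one sub-carrier takes the textbook form

\[
R_{\mathrm{sec}} = \Big[ \log_2\!\big(1+\gamma_{\mathrm{user}}\big)
  - \log_2\!\big(1+\gamma_{\mathrm{eve}}\big) \Big]^{+},
\]

where \gamma denotes the received SINR at the legitimate user and at the eavesdropper, respectively, and [x]^+ = \max(x, 0); the paper maximizes the sum of such terms over sub-carrier, relay, and power choices.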
Te Sun Han, Ryo Nomura
Comments: 23 pages
Subjects: Information Theory (cs.IT)
The first- and second-order optimum achievable exponents in the simple
hypothesis testing problem are investigated. The optimum achievable exponent
for type II error probability, under the constraint that the type I error
probability is allowed asymptotically up to epsilon, is called the
epsilon-optimum exponent. In this paper, we first give the second-order
epsilon-exponent in the case where the null hypothesis and the alternative
hypothesis are a mixed memoryless source and a stationary memoryless source,
respectively. We next generalize this setting to the case where the alternative
hypothesis is also a mixed memoryless source. We address the first-order
epsilon-optimum exponent in this setting. In addition, an extension of our
results to more general settings, such as hypothesis testing with a mixed
general source, and the relationship with the general compound hypothesis
testing problem are also discussed.
Jianping He, Lin Cai, Xinping Guan
Comments: 32 pages, 2 figures
Subjects: Information Theory (cs.IT)
A networked system often relies on distributed algorithms to achieve a global
computation goal through iterative local information exchanges between
neighboring nodes. To preserve data privacy, a node may add a random noise to
its original data for information exchange at each iteration. Nevertheless, a
neighboring node can estimate others’ original data based on the information it
receives. The estimation accuracy and data privacy can be measured in terms of
(epsilon, delta)-data-privacy, defined as the requirement that the probability
of an epsilon-accurate estimation (the difference between an estimate and the
original data is within epsilon) is no larger than delta (the disclosure
probability). How to optimize the estimation and analyze data privacy is a
critical and open issue.
In this paper, a theoretical framework is developed to investigate how to
optimize the estimation of neighbor’s original data using the local information
received, named optimal distributed estimation. Then, we study the disclosure
probability under the optimal estimation for data privacy analysis. We further
apply the developed framework to analyze the data privacy of the
privacy-preserving average consensus algorithm and identify the optimal noises
for the algorithm.
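
Written out, the definition says that a neighbor's estimate \hat{x} of a node's original datum x satisfies (epsilon, delta)-data-privacy when

\[
\Pr\big( |\hat{x} - x| \le \epsilon \big) \;\le\; \delta,
\]

with delta the disclosure probability.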
Yuting Wei, Martin J. Wainwright, Adityanand Guntuboyina
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
We consider a compound testing problem within the Gaussian sequence model in
which the null and alternative are specified by a pair of closed, convex cones.
Such cone testing problems arise in various applications, including detection of
treatment effects, trend detection in econometrics, signal detection in radar
processing, and shape-constrained inference in non-parametric statistics. We
provide a sharp characterization of the GLRT testing radius up to a universal
multiplicative constant in terms of the geometric structure of the underlying
convex cones. When applied to concrete examples, this result reveals some
interesting phenomena that do not arise in the analogous problems of estimation
under convex constraints. In particular, in contrast to estimation error, the
testing error no longer depends purely on the problem complexity via a
volume-based measure (such as metric entropy or Gaussian complexity); other
geometric properties of the cones also play an important role. To address the
issue of optimality, we prove information-theoretic lower bounds for minimax
testing radius again in terms of geometric quantities. Our general theorems are
illustrated by examples including the cases of monotone and orthant cones, and
involve some results of independent interest.
Daniil Ryabko
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
The independence clustering problem is considered in the following
formulation: given a set S of random variables, it is required to find the
finest partitioning {U_1, ..., U_k} of S into clusters such that the
clusters U_1, ..., U_k are mutually independent. Since mutual independence is
the target, pairwise similarity measurements are of no use, and thus
traditional clustering algorithms are inapplicable. The distribution of the
random variables in S is, in general, unknown, but a sample is available.
Thus, the problem is cast in terms of time series. Two forms of sampling are
considered: i.i.d. and stationary time series, with the main emphasis being on
the latter, more general, case. A consistent, computationally tractable
algorithm for each of the settings is proposed, and a number of open directions
for further research are outlined.
Reza Ghanaatian, Alexios Balatsoukas-Stimming, Christoph Muller, Michael Meidlinger, Gerald Matz, Adam Teman, Andreas Burg
Comments: 11 pages, submitted to T-VLSI
Subjects: Hardware Architecture (cs.AR); Information Theory (cs.IT)
An ultra-high throughput low-density parity check (LDPC) decoder with an
unrolled full-parallel architecture is proposed, which achieves the highest
decoding throughput compared to previously reported LDPC decoders in the
literature. The decoder benefits from a serial message-transfer approach
between the decoding stages to alleviate the well-known routing congestion
problem in parallel LDPC decoders. Furthermore, a finite-alphabet message
passing algorithm is employed to replace the variable node update rule of
standard min-sum decoders with optimized look-up tables, designed in a way that
maximizes the mutual information between decoding messages. The proposed
algorithm results in an architecture with reduced bit-width messages, leading
to a significantly higher decoding throughput and to a lower area as compared
to a min-sum decoder when serial message-transfer is used. The architecture is
placed and routed for the standard min-sum reference decoder and for the
proposed finite-alphabet decoder using a custom pseudo-hierarchical backend
design strategy to further alleviate routing congestion and to handle the
large design. Post-layout results show that the finite-alphabet decoder with
the serial message-transfer architecture achieves a throughput as large as 594
Gbps with an area of 16.2 mm^2 and dissipates an average of 22.7 pJ per
decoded bit in a 28 nm FD-SOI library. Compared to the reference min-sum
decoder, this corresponds to 3.3 times smaller area and 2 times better energy
efficiency.
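
For orientation, the two min-sum update rules in question can be sketched in plain Python; the variable-node rule below is the step the paper replaces with mutual-information-optimized look-up tables (node degrees and message layouts are illustrative).

import numpy as np

def check_node_update(incoming):
    """Min-sum check node: for each edge, sign product and minimum
    magnitude over all *other* incoming messages."""
    incoming = np.asarray(incoming, dtype=float)
    out = np.empty_like(incoming)
    for i in range(len(incoming)):
        rest = np.delete(incoming, i)
        out[i] = np.prod(np.sign(rest)) * np.min(np.abs(rest))
    return out

def variable_node_update(channel_llr, incoming):
    """Variable node: channel LLR plus all other incoming messages
    (the rule the paper maps to optimized look-up tables)."""
    incoming = np.asarray(incoming, dtype=float)
    return channel_llr + incoming.sum() - incoming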