Kieran Greer
Subjects: Neural and Evolutionary Computing (cs.NE)
This paper considers a process for the creation and subsequent firing of
sequences of neuronal patterns, as might be found in the human brain. The scale
is one of larger patterns emerging from an ensemble mass, possibly through some
type of energy equation and a reduction procedure. The links between the
patterns can be formed naturally, as a residual effect of the pattern creation
itself. If the process is valid, then the pattern creation can be relatively
simplistic and automatic, where the neuron does not have to do anything
particularly intelligent. The pattern interfaces become slightly abstract
without firm boundaries and exact structure is determined more by averages or
ratios. This paper follows on closely from the earlier research, including two
earlier papers in the series, and uses the ideas of entropy and cohesion. With a
small addition, it is possible to show how the inter-pattern links can be
determined. A new compact grid form of an earlier Counting Mechanism is also
demonstrated.
Pedro Tabacof, Julia Tavares, Eduardo Valle
Comments: Workshop on Adversarial Training, NIPS 2016, Barcelona, Spain
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We investigate adversarial attacks for autoencoders. We propose a procedure
that distorts the input image to mislead the autoencoder in reconstructing a
completely different target image. We attack the internal latent
representations, attempting to make the adversarial input produce an internal
representation as similar as possible to the target’s. We find that
autoencoders are much more robust to the attack than classifiers: while some
examples have tolerably small input distortion, and reasonable similarity to
the target image, there is a quasi-linear trade-off between those aims. We
report results on the MNIST and SVHN datasets, and also test regular deterministic
autoencoders, reaching similar conclusions in all cases. Finally, we show that
the usual adversarial attack for classifiers, while being much easier, also
presents a direct proportion between distortion on the input and misdirection
on the output. That proportionality however is hidden by the normalization of
the output, which maps a linear layer into non-linear probabilities.
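As a minimal sketch of the latent-space attack, assume a linear encoder z = Wx standing in for the autoencoder's encoder, and a squared-Euclidean latent distance in place of whatever divergence the authors actually use. Gradient descent on the distortion d then minimises the distance between the distorted input's latent code and the target's, plus c times the squared norm of d. The names `matvec`, `latent_attack` and the weight `c` are illustrative assumptions.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def latent_attack(W, x, z_target, c=0.1, lr=0.05, steps=200):
    """Find a distortion d so that the linear encoding W(x + d) approaches
    z_target, by gradient descent on ||W(x + d) - z_target||^2 + c * ||d||^2."""
    d = [0.0] * len(x)
    for _ in range(steps):
        z = matvec(W, [xi + di for xi, di in zip(x, d)])
        r = [zi - ti for zi, ti in zip(z, z_target)]  # latent residual
        # gradient of the objective: 2 * W^T r + 2 * c * d
        grad = [2 * sum(W[i][j] * r[i] for i in range(len(W))) + 2 * c * d[j]
                for j in range(len(x))]
        d = [dj - lr * gj for dj, gj in zip(d, grad)]
    return d

W = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]]   # hypothetical encoder weights
x, x_target = [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]
z_target = matvec(W, x_target)
for c in (0.01, 1.0):
    d = latent_attack(W, x, z_target, c=c)
    z = matvec(W, [a + b for a, b in zip(x, d)])
    gap = norm([a - b for a, b in zip(z, z_target)])
    print(f"c={c}: distortion={norm(d):.3f}, latent gap={gap:.3f}")
```

Raising c buys a smaller distortion at the price of a larger latent gap, which is the kind of trade-off the abstract describes; this linear toy only illustrates the objective and says nothing about the reported quasi-linear behaviour on real autoencoders.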
Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville
Comments: 18 pages, 2 figures, 4 tables; under review at ICLR 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent advances in neural variational inference have facilitated efficient
training of powerful directed graphical models with continuous latent
variables, such as variational autoencoders. However, these models usually
assume simple, uni-modal priors – such as the multivariate Gaussian
distribution – yet many real-world data distributions are highly complex and
multi-modal. Examples of complex and multi-modal distributions range from
topics in newswire text to conversational dialogue responses. When such latent
variable models are applied to these domains, the restriction of the simple,
uni-modal prior hinders the overall expressivity of the learned model as it
cannot possibly capture more complex aspects of the data distribution. To
overcome this critical restriction, we propose a flexible, simple prior
distribution which can be learned efficiently and potentially capture an
exponential number of modes of a target distribution. We develop the
multi-modal variational encoder-decoder framework and investigate the
effectiveness of the proposed prior in several natural language processing
modeling tasks, including document modeling and dialogue modeling.
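The abstract does not name the prior family. One simple, learnable family with many modes is a piecewise constant density on [0, 1]; the sketch below assumes that family and draws samples by inverting its CDF. With D independent latent dimensions that each have m separated high-density pieces, the joint density has up to m^D modes, which is one way to read the "exponential number of modes" claim. The function name and the weights are hypothetical.

```python
import random

def sample_piecewise_constant(weights, u):
    """Inverse-CDF sample from a piecewise constant density on [0, 1] made of
    len(weights) equal-width pieces with non-negative (unnormalised) weights."""
    n = len(weights)
    total = float(sum(weights))
    cum = 0.0
    for i, w in enumerate(weights):
        p = w / total  # probability mass of piece i
        if p > 0 and u <= cum + p:
            return (i + (u - cum) / p) / n  # uniform position inside piece i
        cum += p
    # floating-point round-off can leave u just above cum: use the last non-empty piece
    last = max(i for i, w in enumerate(weights) if w > 0)
    return (last + 1) / n

rng = random.Random(0)
# Two latent dimensions, each with two separated high-density pieces,
# so this joint prior has four modes.
prior = [[3, 0, 0, 3], [3, 0, 0, 3]]
z = [sample_piecewise_constant(w, rng.random()) for w in prior]
print(z)
```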
Shaofei Wang, Chong Zhang, Miguel A. Gonzalez-Ballester, Julian Yarkony
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We study the problems of multi-person pose segmentation in natural images and
instance segmentation in biological images with crowded cells. We formulate
these distinct tasks as integer programs where variables correspond to
poses/cells. To optimize, we propose a generic relaxation scheme for solving
these combinatorial problems using a column generation formulation where the
program for generating a column is solved via exact optimization of very small
scale integer programs. This results in efficient exploration of the spaces of
poses and cells.
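The abstract leaves the formulation details out. As a minimal sketch, assume the integer program is relaxed to a set-packing LP over columns (candidate poses or cells) in which each detection may be used at most once, with non-negative dual values coming from the restricted master LP. Pricing then searches for a column with negative reduced cost; here exhaustive enumeration over tiny subsets stands in for the paper's exact optimization of very small integer programs. The costs, duals and names are hypothetical, and the master LP solve itself is omitted.

```python
from itertools import combinations

def price_column(detections, cost, duals, max_size=3):
    """Pricing step of column generation for the LP
    min sum_c cost(c) * gamma_c  s.t. each detection is used at most once.
    Enumerates every subset up to max_size exactly and returns the one with the
    most negative reduced cost cost(c) + sum of its detections' duals."""
    best, best_rc = None, 0.0
    for k in range(1, max_size + 1):
        for subset in combinations(detections, k):
            rc = cost(subset) + sum(duals[d] for d in subset)
            if rc < best_rc:
                best, best_rc = subset, rc
    return best, best_rc  # (None, 0.0) means no improving column exists

# Hypothetical unary and pairwise costs (negative = favourable grouping).
unary = {"a": -1.0, "b": -1.0, "c": 0.5}
pairwise = {("a", "b"): -0.5}

def cost(subset):
    total = sum(unary[d] for d in subset)
    total += sum(pairwise.get(pair, 0.0) for pair in combinations(subset, 2))
    return total

duals = {"a": 0.2, "b": 0.2, "c": 0.0}  # would come from the restricted master LP
print(price_column(["a", "b", "c"], cost, duals))
```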
Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we introduce the TorontoCity benchmark, which covers the full
greater Toronto area (GTA) with 712.5 km² of land, 8439 km of road and
around 400,000 buildings. Our benchmark provides different perspectives of the
world captured from airplanes, drones and cars driving around the city.
Manually labeling such a large scale dataset is infeasible. Instead, we propose
to utilize different sources of high-precision maps to create our ground truth.
Towards this goal, we develop algorithms that allow us to align all data
sources with the maps while requiring minimal human supervision. We have
designed a wide variety of tasks including building height estimation
(reconstruction), road centerline and curb extraction, building instance
segmentation, building contour extraction (reorganization), semantic labeling
and scene type classification (recognition). Our pilot study shows that most of
these tasks are still difficult for modern convolutional neural networks.
Imon Banerjee, Lewis Hahn, Geoffrey Sonn, Richard Fan, Daniel L. Rubin
Comments: NIPS 2016 Workshop on Machine Learning for Health (NIPS ML4HC)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose an automated method for detecting aggressive prostate cancer (CaP)
(Gleason score ≥ 7) based on a comprehensive analysis of the lesion and the
surrounding normal prostate tissue which has been simultaneously captured in
T2-weighted MR images, diffusion-weighted images (DWI) and apparent diffusion
coefficient maps (ADC). The proposed methodology was tested on a dataset of 79
patients (40 aggressive, 39 non-aggressive). We evaluated the performance of a
wide range of popular quantitative imaging features on the characterization of
aggressive versus non-aggressive CaP. We found that a group of 44
discriminative predictors among 1464 quantitative imaging features can be used
to produce an area under the ROC curve of 0.73.
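For reference, the area under the ROC curve equals the probability that a randomly chosen positive case (here, aggressive CaP) receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch using that identity; the scores below are made up and have no relation to the study's data.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as 0.5. labels: 1 = positive, 0 = negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # → 0.75 (3 of 4 pairs ordered correctly)
```

The pairwise loop is quadratic, which is harmless at the scale of a 79-patient cohort.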
Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, Jitendra Malik
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a learning framework for abstracting complex shapes by learning to
assemble objects using 3D volumetric primitives. In addition to generating
simple and geometrically interpretable explanations of 3D objects, our
framework also allows us to automatically discover and exploit consistent
structure in the data. We demonstrate that our method predicts shape
representations that can be leveraged to obtain a consistent parsing
across the instances of a shape collection and to construct an interpretable
shape similarity measure. We also examine applications for image-based
prediction as well as shape manipulation.
Jefferson Ryan Medel, Andreas Savakis
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automating the detection of anomalous events within long video sequences is
challenging due to the ambiguity of how such events are defined. We approach
the problem by learning generative models that can identify anomalies in videos
using limited supervision. We propose end-to-end trainable composite
Convolutional Long Short-Term Memory (Conv-LSTM) networks that are able to
predict the evolution of a video sequence from a small number of input frames.
Regularity scores are derived from the reconstruction errors of a set of
predictions, with abnormal video sequences yielding lower regularity scores as
they diverge further from the actual sequence over time. The models utilize a
composite structure, and we examine the effects of conditioning on learning more
meaningful representations. The best model is chosen based on the
reconstruction and prediction accuracy. The Conv-LSTM models are evaluated both
qualitatively and quantitatively, demonstrating competitive results on anomaly
detection datasets. Conv-LSTM units are shown to be an effective tool for
modeling and predicting video sequences.
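The abstract does not give the normalisation. A common choice in this line of work, assumed here, rescales the per-frame reconstruction errors e(t) of a sequence to [0, 1] and sets s(t) = 1 - (e(t) - min e) / (max e - min e), so the worst-predicted frame scores 0. `frame_error` and `regularity_scores` are illustrative names, and frames are represented as flattened pixel lists.

```python
def frame_error(predicted, actual):
    """Sum of squared pixel differences between two flattened frames."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

def regularity_scores(errors):
    """Min-max normalise per-frame errors and invert them: 1 = most regular."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [1.0] * len(errors)  # constant error: treat every frame as regular
    return [1.0 - (e - lo) / (hi - lo) for e in errors]

predicted = [[0, 0], [1, 1], [2, 2]]
actual = [[0, 1], [1, 2], [4, 4]]  # the last frame diverges most from its prediction
errors = [frame_error(p, a) for p, a in zip(predicted, actual)]
print(regularity_scores(errors))  # → [1.0, 1.0, 0.0]
```

Frames scoring below some threshold would then be flagged as anomalous; the threshold is a deployment choice that the abstract does not specify.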
Wenjie Pei, Tadas Baltrušaitis, David M.J. Tax, Louis-Philippe Morency
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Typical techniques for sequence classification are designed for
well-segmented sequences that have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot easily be applied to noisy sequences
which are expected in real-world applications. We present the Temporal
Attention-Gated Model (TAGM) which is able to deal with noisy sequences. Our
model assimilates ideas from attention models and gated recurrent networks.
Specifically, we employ an attention model to measure the relevance of each
time step of a sequence to the final decision. We then use the relevant
segments based on their attention scores in a novel gated recurrent network to
learn the hidden representation for the classification. More importantly, our
attention weights provide a physically meaningful interpretation for the
salience of each time step in the sequence. We demonstrate the merits of our
model in both interpretability and classification performance on a variety of
tasks, including speech recognition, textual sentiment analysis and event
recognition.
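One way to realise an attention-gated recurrent update, assumed here rather than taken from the abstract, is to let the scalar attention score a_t interpolate between keeping the previous state and adopting a new candidate: h_t = (1 - a_t) h_{t-1} + a_t ReLU(W h_{t-1} + U x_t + b). The sketch takes the attention scores as given (the attention model that produces them is omitted), and all names and weights are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gated_step(h_prev, x_t, a_t, W, U, b):
    """Attention-gated update: a_t near 1 lets step t rewrite the state,
    a_t near 0 leaves the previous state almost untouched."""
    pre = [wh + ux + bi for wh, ux, bi in zip(matvec(W, h_prev), matvec(U, x_t), b)]
    cand = [max(0.0, p) for p in pre]  # ReLU candidate state
    return [(1 - a_t) * hp + a_t * c for hp, c in zip(h_prev, cand)]

def encode(xs, attention, W, U, b):
    """Run the gated recurrence over a sequence; the final state would feed a classifier."""
    h = [0.0] * len(b)
    for x_t, a_t in zip(xs, attention):
        h = gated_step(h, x_t, a_t, W, U, b)
    return h

W, U, b = [[0.5]], [[1.0]], [0.0]
clean = encode([[1.0]], [1.0], W, U, b)
noisy = encode([[1.0], [100.0]], [1.0, 0.0], W, U, b)  # noisy step gets a_t = 0
print(clean, noisy)
```

Because the noisy step receives zero attention, the final state is identical to the clean one, which is the behaviour that makes such a gate useful for unsegmented sequences.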
Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy
Comments: Under review at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we propose a novel training procedure for image captioning
models based on policy gradient methods. This allows us to directly optimize
for the metrics of interest, rather than just maximizing the likelihood of
human-generated captions. We show that by optimizing for standard metrics such as
BLEU, CIDEr, METEOR and ROUGE, we can develop a system that improves on the
metrics and ranks first on the MSCOCO image captioning leaderboard, even
though our CNN-RNN model is much simpler than state-of-the-art models. We
further show that by also optimizing for the recently introduced SPICE metric,
which measures semantic quality of captions, we can produce a system that
significantly outperforms other methods as measured by human evaluation.
Finally, we show how we can leverage extra sources of information, such as
pre-trained image tagging models, to further improve quality.
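A minimal sketch of the policy-gradient idea, assuming a toy "captioner" that picks one of three candidate captions from a softmax over logits, with a fixed, made-up metric score per candidate standing in for BLEU, CIDEr or SPICE. The REINFORCE update moves the logits by (reward - baseline) times the gradient of the log-probability, which for a softmax is one-hot(choice) - probs. The paper's CNN-RNN model, rollout scheme and baseline are not reproduced here.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, rewards, baseline, lr, rng):
    """Sample one candidate, then take a REINFORCE step on its log-probability."""
    probs = softmax(logits)
    k = rng.choices(range(len(logits)), weights=probs)[0]
    advantage = rewards[k] - baseline
    # d log p_k / d logit_j = 1[j == k] - p_j
    new_logits = [l + lr * advantage * ((1.0 if j == k else 0.0) - p)
                  for j, (l, p) in enumerate(zip(logits, probs))]
    return new_logits, rewards[k]

def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))

rng = random.Random(0)
rewards = [0.1, 0.9, 0.3]  # hypothetical metric score of each candidate caption
logits, baseline = [0.0, 0.0, 0.0], 0.0
for _ in range(300):
    logits, r = reinforce_step(logits, rewards, baseline, lr=0.5, rng=rng)
    baseline = 0.9 * baseline + 0.1 * r  # running-average baseline reduces variance
print(round(expected_reward(logits, rewards), 3))
```

The metric is only ever evaluated on sampled outputs, never differentiated, which is why non-differentiable scores such as CIDEr or SPICE can be optimized this way.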
Kwame S. Kutten, Nicolas Charon, Michael I. Miller, J. Tilak Ratnanather, Karl Deisseroth, Li Ye, Joshua T. Vogelstein
Comments: Awaiting Peer Review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Large Deformation Diffeomorphic Metric Mapping (LDDMM) is a widely used
deformable registration algorithm for computing smooth invertible maps between
various types of anatomical shapes such as landmarks, curves, surfaces or
images. In this work, we specifically focus on the case of images and adopt an
optimal control point of view so as to extend the original LDDMM with its Sum of
Squared Differences (SSD) matching term to a framework more robust to intensity
variations, which is critical for cross-modality registration. We implement a
mutual information based LDDMM (MI-LDDMM) algorithm and demonstrate its
superiority to SSD-LDDMM in aligning 2D phantoms with differing intensity
profiles. This algorithm is then used to register CLARITY mouse brain images to
a standard mouse atlas despite their differences in grayscale values. We
complement the approach by showing how a cascaded multi-scale method improves
the optimization while reducing the run time of the algorithm.
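To see why a mutual-information matching term tolerates intensity differences where SSD does not, consider a minimal sketch that estimates MI from a joint intensity histogram of two flattened images. The bin count and the 8-bit intensity range are assumptions, and the sketch covers only the similarity measure, not the LDDMM optimization.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8, vmax=256):
    """Mutual information (nats) of two equal-length 8-bit intensity lists,
    estimated from a joint histogram with `bins` bins per axis."""
    qa = [v * bins // vmax for v in a]
    qb = [v * bins // vmax for v in b]
    n = len(a)
    joint, pa, pb = Counter(zip(qa, qb)), Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        pij = count / n
        mi += pij * math.log(pij / ((pa[i] / n) * (pb[j] / n)))
    return mi

def ssd(a, b):
    """Sum of squared intensity differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

image = [0, 32, 64, 96, 128, 160, 192, 224]
inverted = [255 - v for v in image]  # same structure, reversed contrast
print(mutual_information(image, image), mutual_information(image, inverted))
print(ssd(image, image), ssd(image, inverted))
```

MI is unchanged by the contrast inversion because each intensity bin in one image still maps to exactly one bin in the other, while SSD jumps from zero to a large value.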
Zohreh Kohan, Reza Azmi, Behrouz Gholizadeh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most temporal lobe epilepsy detection approaches are based on hippocampus
deformation and use complicated features, so detection requires complicated
feature extraction and pre-processing. In this paper, a new detection method
based on shape-based features and spherical harmonics is proposed, which can
analyze hippocampal shape anomalies and detect asymmetry. The method consists
of two main parts: (1) shape feature extraction, and (2) image classification.
For evaluation, the publicly available HFH database is used. Nine geometric
features and 256 spherical harmonic features are introduced, from which
eighteen that significantly detect hippocampal asymmetry in a randomly selected
subset of the dataset are selected. A support vector machine (SVM) classifier
is then employed to classify the remaining images of the dataset as normal or
epileptic using the selected features. Of the 25 images in the dataset, 12 were
used for feature extraction and the remaining 13 for classification. The results
show that the proposed method achieves an accuracy, specificity and sensitivity
of 84%, 100% and 80%, respectively. The proposed approach therefore shows
acceptable results and is straightforward, omitting the complicated
pre-processing steps required by other methods.
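The three reported figures follow from the confusion matrix of the classified images. A short sketch of the definitions, treating "epileptic" as the positive class (an assumption about the paper's convention); the counts in the example are invented for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, specificity, sensitivity) from confusion-matrix counts,
    treating 'epileptic' as the positive class."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    specificity = tn / (tn + fp)  # normal images correctly labelled normal
    sensitivity = tp / (tp + fn)  # epileptic images correctly detected
    return accuracy, specificity, sensitivity

# Invented counts, for illustration only.
print(diagnostic_metrics(tp=8, fn=2, tn=3, fp=0))
```

A specificity of 100% simply means no normal image was labelled epileptic; with very few normal test images, that figure rests on a small number of cases.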
Xiang Long, Chuang Gan, Gerard de Melo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, video captioning has been attracting an increasing amount of
interest, due to its potential for improving accessibility and information
retrieval. While existing methods rely on different kinds of visual features
and model structures, they do not fully exploit relevant semantic information.
We present an extensible approach to jointly leverage several sorts of visual
features and semantic attributes. Our novel architecture builds on LSTMs for
sentence generation, with several attention layers and two multimodal layers.
The attention mechanism learns to automatically select the most salient visual
features or semantic attributes, and the multimodal layer yields overall
representations for the input and outputs of the sentence generation component.
Experimental results on the challenging MSVD and MSR-VTT datasets show that our
framework outperforms the state-of-the-art approaches, while ground-truth-based
semantic attributes are able to further elevate the output quality to a
near-human level.
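The attention step can be pictured as a softmax-weighted average over candidate feature vectors (visual features or semantic attributes). A minimal sketch, assuming dot-product scoring against the decoder's current state; the paper's attention layers may score candidates differently, and `attend` is a hypothetical name.

```python
import math

def attend(query, features):
    """Dot-product soft attention: returns (context vector, attention weights)."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * feat[k] for w, feat in zip(weights, features))
               for k in range(len(features[0]))]
    return context, weights

state = [1.0, 0.0]                   # hypothetical decoder state
features = [[1.0, 0.0], [0.0, 1.0]]  # e.g. one visual feature, one attribute embedding
context, weights = attend(state, features)
print(weights, context)
```

The feature best aligned with the current state receives the largest weight, so the context vector passed on to the sentence generator leans towards the most salient input at each step.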
Mark Marsden, Kevin McGuiness, Suzanne Little, Noel E. O'Connor
Comments: 7 pages, VISAPP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we advance the state-of-the-art for crowd counting in high
density scenes by further exploring the idea of a fully convolutional crowd
counting model introduced by Zhang et al. (2016). Producing an accurate and
robust crowd count estimator using computer vision techniques has attracted
significant research interest in recent years. Applications for crowd counting
systems exist in many diverse areas including city planning, retail, and of
course general public safety. Developing a highly generalised counting model
that can be deployed in any surveillance scenario with any camera perspective
is the key objective for research in this area. Techniques developed in the
past have generally performed poorly in highly congested scenes with several
thousands of people in frame (Rodriguez et al., 2011). Our approach, influenced
by the work of Zhang et al. (2016), consists of the following contributions:
(1) A training set augmentation scheme that minimises redundancy among training
samples to improve model generalisation and overall counting performance; (2) a
deep, single column, fully convolutional network (FCN) architecture; (3) a
multi-scale averaging step during inference. The developed technique can
analyse images of any resolution or aspect ratio and achieves state-of-the-art
counting performance on the ShanghaiTech Part B and UCF_CC_50 datasets as well
as competitive performance on ShanghaiTech Part A.
Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic image synthesis research has been rapidly growing with deep
networks getting more and more expressive. In the last couple of years, we have
observed images of digits, indoor scenes, birds, chairs, etc. being
automatically generated. The expressive power of image generators has also
been enhanced by introducing several forms of conditioning variables such as
object names, sentences, bounding box and key-point locations. In this work, we
propose a novel deep conditional generative adversarial network architecture
that takes its strength from the semantic layout and scene attributes
integrated as conditioning variables. We show that our architecture is able to
generate realistic outdoor scene images under different conditions, e.g.
day-night, sunny-foggy, with clear object boundaries.
He Wen, Shuchang Zhou, Zhe Liang, Yuxiang Zhang, Dieqiao Feng, Xinyu Zhou, Cong Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Fully convolutional neural networks give accurate, per-pixel prediction for
input images and have applications like semantic segmentation. However, a
typical FCN usually requires a large amount of floating-point computation and large
run-time memory, which effectively limits its usability. We propose a method to
train a Bit Fully Convolutional Network (BFCN), a fully convolutional neural
network that has low bit-width weights and activations. Because most of its
computation-intensive convolutions are accomplished between low bit-width
numbers, a BFCN can be accelerated by an efficient bit-convolution
implementation. On CPU, the dot product operation between two bit vectors can
be reduced to bitwise operations and popcounts, which can offer much higher
throughput than 32-bit multiplications and additions.
To validate the effectiveness of BFCN, we conduct experiments on the PASCAL
VOC 2012 semantic segmentation task and Cityscapes. Our BFCN with 1-bit weights
and 2-bit activations, which runs 7.8x faster on CPU or requires less than 1%
of the resources on FPGA, can achieve performance comparable to its 32-bit
counterpart.
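The bitwise trick can be sketched as follows, assuming 1-bit weights encoded as 0/1 and unsigned 2-bit activations (0 to 3). Each activation bit plane is packed into an integer word, ANDed with the packed weights and popcounted, and plane k contributes 2^k times its popcount. Binarised networks often use ±1 weights with XNOR instead, and the paper's exact encoding may differ; Python integers stand in for machine words here.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer (element i -> bit i)."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word

def popcount(word):
    return bin(word).count("1")

def bit_dot(weights, activations, act_bits=2):
    """Dot product of 0/1 weights with unsigned act_bits-bit activations
    using only AND, shifts and popcounts."""
    w = pack_bits(weights)
    total = 0
    for k in range(act_bits):
        plane = pack_bits([(a >> k) & 1 for a in activations])  # bit k of every activation
        total += popcount(w & plane) << k  # this plane is worth 2^k
    return total

weights = [1, 0, 1, 1]
activations = [3, 2, 1, 0]
print(bit_dot(weights, activations), sum(w * a for w, a in zip(weights, activations)))
```

On hardware, one AND plus one popcount instruction processes a whole word of elements at once, which is where the throughput gain over 32-bit multiply-adds comes from.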
Christian Rupprecht, Iro Laina, Maximilian Baust, Federico Tombari, Gregory D. Hager, Nassir Navab
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many prediction tasks contain uncertainty. In the case of next-frame or
future prediction the uncertainty is inherent in the task itself, as it is
impossible to foretell what exactly is going to happen in the future. Another
source of uncertainty or ambiguity is the way data is labeled. Sometimes not
all objects of interest are annotated in a given image or the annotation is
ambiguous, e.g. in the form of occluded joints in human pose estimation. We
present a method that is able to handle these problems by predicting not a
single output but multiple hypotheses. More precisely, we propose a framework
for re-formulating existing single prediction models as multiple hypothesis
prediction (MHP) problems as well as a meta loss and an optimization procedure
to train the resulting MHP model. We consider three entirely different
applications, namely future prediction, image classification and human pose
estimation, and demonstrate how existing single hypothesis predictors (SHPs)
can be turned into MHPs. The performed experiments show that the resulting MHP
outperforms the existing SHP and yields additional insights regarding the
variation and ambiguity of the predictions.
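A minimal sketch of a relaxed winner-takes-all meta loss in this spirit, assuming scalar predictions and a squared-error base loss. The hypothesis closest to the target gets weight 1 - ε and the remaining M - 1 hypotheses share ε, so none of them stops receiving gradient. The weighting scheme and the name `mhp_meta_loss` are assumptions, not details stated in the abstract.

```python
def mhp_meta_loss(target, hypotheses, base_loss, eps=0.05):
    """Relaxed winner-takes-all loss: the best hypothesis gets weight 1 - eps,
    each of the other M - 1 hypotheses gets eps / (M - 1)."""
    losses = [base_loss(target, h) for h in hypotheses]
    m = len(losses)
    if m == 1:
        return losses[0]  # a single hypothesis reduces to the plain loss
    best = min(range(m), key=lambda i: losses[i])
    return sum(((1 - eps) if i == best else eps / (m - 1)) * l
               for i, l in enumerate(losses))

def squared_error(y, h):
    return (y - h) ** 2

# An ambiguous target: y is sometimes -1 and sometimes +1.
print([mhp_meta_loss(y, [-1.0, 1.0], squared_error) for y in (-1.0, 1.0)])
print([mhp_meta_loss(y, [0.0], squared_error) for y in (-1.0, 1.0)])
```

With two hypotheses at -1 and +1, each outcome is matched by one of them, whereas a single squared-error predictor settles on the mean 0, which matches neither outcome.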
Artem Rozantsev, Sudipta N. Sinha, Debadeepta Dey, Pascal Fua
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose a new method to estimate the 6-DoF trajectory of a flying object
such as a quadrotor UAV within a 3D airspace monitored using multiple fixed
ground cameras. It is based on a new structure from motion formulation for the
3D reconstruction of a single moving point with known motion dynamics. Our main
contribution is a new bundle adjustment procedure which in addition to
optimizing the camera poses, regularizes the point trajectory using a prior
based on motion dynamics (or specifically flight dynamics). Furthermore, we can
infer the underlying control input sent to the UAV’s autopilot that determined
its flight trajectory.
Our method requires neither perfect single-view tracking nor appearance
matching across views. For robustness, we allow the tracker to generate
multiple detections per frame in each video. The true detections and the data
association across videos are estimated using robust multi-view triangulation
and subsequently refined during our bundle adjustment procedure. Quantitative
evaluation on simulated data and experiments on real videos from indoor and
outdoor scenes demonstrate the effectiveness of our method.
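As a simplified illustration of regularising a reconstructed trajectory with a motion prior, the sketch below adds a penalty on second differences (discrete accelerations) of sampled 3D positions to a reprojection term. This constant-velocity smoothness prior is a stand-in assumption, much cruder than the flight-dynamics model the paper describes; the residuals and the weight are hypothetical inputs.

```python
def acceleration_penalty(trajectory):
    """Sum of squared second differences of a list of (x, y, z) positions."""
    cost = 0.0
    for t in range(1, len(trajectory) - 1):
        prev, cur, nxt = trajectory[t - 1], trajectory[t], trajectory[t + 1]
        cost += sum((n - 2 * c + p) ** 2 for p, c, n in zip(prev, cur, nxt))
    return cost

def regularised_cost(reprojection_residuals, trajectory, weight=1.0):
    """Bundle-adjustment-style objective: image residuals plus the motion prior."""
    data_term = sum(r * r for r in reprojection_residuals)
    return data_term + weight * acceleration_penalty(trajectory)

straight = [(0.0, 0.0, 1.0), (1.0, 0.5, 1.0), (2.0, 1.0, 1.0)]
jerky = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0), (2.0, 1.0, 1.0)]
print(acceleration_penalty(straight), acceleration_penalty(jerky))
```

Minimising such an objective over both camera poses and trajectory points pulls noisy triangulated positions towards physically plausible motion.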
Michael Miller, Jan Van lent
Comments: 15 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
This paper focuses on a similarity measure, known as the Wasserstein
distance, with which to compare images. The Wasserstein distance results from a
partial differential equation (PDE) formulation of Monge’s optimal transport
problem. We present an efficient numerical solution method for solving Monge’s
problem. To demonstrate the measure’s discriminatory power when comparing
images, we use it within the architecture of the k-Nearest Neighbour (k-NN)
machine learning algorithm to illustrate the measure’s potential benefits over
other more traditional distance metrics and also the state-of-the-art Tangent
Space distance on the well-known MNIST dataset. To our knowledge, the PDE
formulation of the Wasserstein metric has not been presented for dealing with
image comparison, nor has the Wasserstein distance been used within the
k-nearest neighbour architecture.
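The paper solves the two-dimensional Monge problem through a PDE. For one-dimensional histograms, the Wasserstein-1 distance has a closed form (the L1 distance between the two cumulative distributions), which gives a compact illustration of plugging a transport distance into k-NN. The sketch below swaps in that 1D case and is not the paper's method; names and data are hypothetical.

```python
from collections import Counter

def wasserstein_1d(p, q):
    """W1 distance between two histograms over the same unit-spaced bins,
    each normalised to total mass 1: sum of |CDF_p - CDF_q| over bins."""
    total_p, total_q = float(sum(p)), float(sum(q))
    cdf_p = cdf_q = dist = 0.0
    for a, b in zip(p, q):
        cdf_p += a / total_p
        cdf_q += b / total_q
        dist += abs(cdf_p - cdf_q)
    return dist

def knn_predict(train, query, k=3, dist=wasserstein_1d):
    """train: list of (histogram, label) pairs; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Unlike a bin-wise L1 distance, W1 grows with how far the mass has to move.
print(wasserstein_1d([1, 0, 0], [0, 1, 0]), wasserstein_1d([1, 0, 0], [0, 0, 1]))

train = [([1, 0, 0, 0], "left"), ([0, 1, 0, 0], "left"),
         ([0, 0, 1, 0], "right"), ([0, 0, 0, 1], "right")]
print(knn_predict(train, [0, 0, 0.5, 0.5]))
```

A bin-wise L1 distance would rate both pairs in the first print as equally far apart (2.0), which is the kind of insensitivity to spatial shifts that motivates a transport-based measure for comparing images.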
Anirban Santara, Kaustubh Mani, Pranoot Hatwar, Ankit Singh, Ankur Garg, Kirti Padia, Pabitra Mitra
Comments: 8 pages, 10 figures, Submitted to IEEE TGRS, Code available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep-learning-based landcover classification algorithms have recently been
proposed in the literature. In hyperspectral images (HSI) they face the challenges
of large dimensionality, spatial variability of spectral signatures and
scarcity of labeled data. In this article we propose an end-to-end deep
learning architecture that extracts band specific spectral-spatial features and
performs landcover classification. The architecture has fewer independent
connection weights and thus requires fewer training samples. The method
is found to outperform the highest reported accuracies on popular hyperspectral
image data sets.
Andras Rozsa, Manuel Gunther, Terrance E. Boult
Comments: Under review as a conference paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Machine learning models, including state-of-the-art deep neural networks, are
vulnerable to small perturbations that cause unexpected classification errors.
This unexpected lack of robustness raises fundamental questions about their
generalization properties and poses a serious concern for practical
deployments. Because such perturbations can remain imperceptible, the resulting
inputs (commonly called adversarial examples) demonstrate an inherent inconsistency
between vulnerable machine learning models and human perception, and some prior
work casts this problem as a security issue as well. Despite the significance of the
discovered instabilities and ensuing research, their cause is not well
understood, and no effective method has been developed to address the problem
highlighted by adversarial examples. In this paper, we present a novel theory
to explain why this unpleasant phenomenon exists in deep neural networks. Based
on that theory, we introduce a simple, efficient and effective training
approach, Batch Adjusted Network Gradients (BANG), which significantly improves
the robustness of machine learning models. While the BANG technique does not
rely on any form of data augmentation or the application of adversarial images
for training, the resultant classifiers are more resistant to adversarial
perturbations while maintaining or even enhancing classification performance
overall.
Haoshu Fang, Shuqin Xie, Cewu Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Multi-person pose estimation in the wild is a challenging problem, where
human detectors inevitably suffer from errors in both localization and
recognition. These errors would ultimately cause the failure of
most CNN-based single-person pose estimators. In this paper, a novel regional
multi-person pose estimation (RMPE) framework is proposed to facilitate
single-person pose estimation in the presence of inaccurate human detectors. In
particular, our framework consists of three novel techniques, namely, symmetric
spatial transformer network (SSTN), deep proposals generator (DPG) and
parametric pose non-maximum suppression (NMS). Extensive experimental results
have demonstrated the validity and effectiveness of the proposed approach. In
comparison to the state-of-the-art approach, the proposed approach
achieves a significant 16% relative increase in mAP on the MPII (multi-person)
dataset [1]. Our model and source code are publicly available.
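The parametric pose NMS is not specified in the abstract. As a simplified sketch, assume greedy suppression with a plain mean keypoint distance and a fixed threshold (the paper's parametric criterion is learned and also weighs keypoint confidences). Under that assumption, redundant poses produced from overlapping detection boxes can be removed as follows; names and values are hypothetical.

```python
import math

def pose_distance(p, q):
    """Mean Euclidean distance between corresponding keypoints of two poses."""
    return sum(math.dist(a, b) for a, b in zip(p, q)) / len(p)

def pose_nms(poses, scores, threshold):
    """Greedy NMS: keep the best-scoring pose, discard poses closer than
    `threshold` to it, and repeat on what remains. Returns kept indices."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if pose_distance(poses[best], poses[i]) >= threshold]
    return keep

poses = [[(0.0, 0.0), (1.0, 1.0)],   # person A
         [(0.1, 0.0), (1.0, 1.1)],   # near-duplicate of A from a second box
         [(5.0, 5.0), (6.0, 6.0)]]   # person B
print(pose_nms(poses, scores=[0.9, 0.8, 0.7], threshold=0.5))  # → [0, 2]
```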
Jiajun Lu, Aditya Deshpande, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Problems such as predicting an optical flow field (Y) for an image (X) are
ambiguous: many very distinct solutions are good. Representing this ambiguity
requires building a conditional model P(Y|X) of the prediction, conditioned on
the image. This is hard because training data usually does not contain many
different flow fields for the same image. As a result, we need different images
to share data to produce good models. We demonstrate an improved method for
building conditional models, the Co-Embedding Deep Variational Auto Encoder.
Our CDVAE exploits multiple encoding and decoding layers for both X and Y.
These are tied during training to produce a model of the joint distribution
P(X, Y), which provides the necessary smoothing. Our tying procedure is
designed to yield a conditional model that is easy to use at test time. We demonstrate our
model on three example tasks using real data: image saturation adjustment,
image relighting, and motion prediction. We describe quantitative evaluation
metrics to evaluate ambiguous generation results. Our results quantitatively
and qualitatively advance the state of the art.
Lei Shi, Rui Guo, Yuchen Ma
Comments: 5 pages, 5 figures; accepted at ICCES 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image pattern recognition is an important area in digital image processing.
An efficient pattern recognition algorithm should be able to provide correct
recognition at a reduced computational time. Of late, among the machine
learning pattern recognition algorithms, the artificial fish swarm algorithm
(AFSA) is one of the swarm intelligence optimization algorithms that works based
on population and stochastic search. To achieve acceptable results, many
parameters need to be adjusted in AFSA. Among these parameters, visual and step
are very significant because artificial fish basically move based on these
parameters. In standard AFSA, these two parameters remain constant until the
algorithm terminates. Large values of these parameters increase the algorithm's
capability for global search, while small values improve its local search
ability. In this paper, we empirically study the performance of AFSA and test
different approaches to balancing local and global exploration, based on the
adaptive modification of visual and step during algorithm execution. The
proposed approaches have been evaluated on four well-known benchmark
functions. Experimental results show a considerable positive impact on the
performance of AFSA. Convex optimization has been integrated into the
proposed work to obtain an ideal segmentation of the input image, which is an
MR brain image.
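A heavily simplified sketch of adaptive visual and step, assuming only the "prey" behaviour (the swarm and follow behaviours of full AFSA are omitted) and a linear decay of both parameters from large to small values, so that the search is global early and local late. The decay schedule is one illustrative choice, not necessarily one of the approaches tested in the paper, and the sphere function is used as one common benchmark. All names and defaults are hypothetical.

```python
import random

def sphere(x):
    """Benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def afsa_prey(f, dim=2, n_fish=10, iters=100, try_number=5,
              visual=(2.0, 0.1), step=(1.0, 0.05), seed=0):
    """Minimise f with a prey-only AFSA whose visual and step shrink linearly
    from their first to their second value over the run."""
    rng = random.Random(seed)
    fish = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_fish)]
    best = min(fish, key=f)
    for t in range(iters):
        frac = t / max(1, iters - 1)
        vis = visual[0] + (visual[1] - visual[0]) * frac  # wide view early, narrow late
        stp = step[0] + (step[1] - step[0]) * frac        # long moves early, short late
        for i, x in enumerate(fish):
            for _ in range(try_number):
                cand = [v + vis * rng.uniform(-1.0, 1.0) for v in x]  # point within view
                if f(cand) < f(x):
                    # move a random fraction of the step towards the better point
                    dist = sum((c - v) ** 2 for c, v in zip(cand, x)) ** 0.5
                    r = rng.random()
                    fish[i] = [v + stp * r * (c - v) / dist for c, v in zip(cand, x)]
                    break
            else:
                fish[i] = [v + stp * rng.uniform(-1.0, 1.0) for v in x]  # random move
        best = min(fish + [best], key=f)  # keep the best position seen so far
    return best

print(sphere(afsa_prey(sphere)))
```

Swapping the linear schedule for another decay rule only changes the two lines that compute `vis` and `stp`, which makes it easy to compare adaptive strategies on the same benchmarks.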
Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan
Comments: 15 pages, 7 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we address the challenging video scene parsing problem by
developing effective representation learning methods given limited parsing
annotations. In particular, we contribute two novel methods that constitute a
unified parsing framework. (1) Predictive feature learning from
nearly unlimited unlabeled video data. Different from existing methods learning
features from single frame parsing, we learn spatiotemporal discriminative
features by enforcing a parsing network to predict future frames and their
parsing maps (if available) given only historical frames. In this way, the
network can effectively learn to capture video dynamics and temporal context,
which are critical clues for video scene parsing, without requiring extra
manual annotations. (2) Prediction steering parsing architecture that
effectively adapts the learned spatiotemporal features to scene parsing tasks
and provides strong guidance for any off-the-shelf parsing model to achieve
better video scene parsing performance. Extensive experiments over two
challenging datasets, Cityscapes and Camvid, have demonstrated the
effectiveness of our methods by showing significant improvement over
well-established baselines.
Angela Dai, Charles Ruizhongtai Qi, Matthias Nießner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution but
complete output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution in our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
Luka Čehovin, Alan Lukežič, Aleš Leonardis, Matej Kristan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Benchmarks have played an important role in advancing the field of visual
object tracking. Due to weakly defined and sometimes biased attribute
specification, existing benchmarks do not allow fine-grained tracker analysis
with respect to specific attributes. Apart from illumination changes and
occlusions, the tracking performance is most strongly affected by the object
motion. In this paper, we propose a novel approach for tracker evaluation with
respect to the motion-related attributes. Our approach utilizes 360-degree
videos to generate realistic annotated short-term tracking scenarios with exact
specification of the object motion type and extent. A fully annotated dataset
of 360-degree videos was constructed, and the fine-grained performance of 17
state-of-the-art trackers is reported. The proposed approach offers unique
tracking insights, is complementary to existing benchmarks, and will be made
publicly available. The evaluation system was implemented within a
state-of-the-art performance evaluation toolkit and supports straightforward
extension with third-party trackers.
Il Jun Ahn (1), Woo Hyun Nam (1) ((1) Digital Media & Communications R&D Center, Samsung Electronics, Seoul, Korea)
Comments: Il Jun Ahn and Woo Hyun Nam contributed equally to this work. Submitted to IEEE Transactions on Consumer Electronics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, various deep-neural-network (DNN)-based approaches have been
proposed for single-image super-resolution (SISR). Despite their promising
results on major structure regions such as edges and lines, they still suffer
from limited performance on texture regions that consist of very complex and
fine patterns. This is because, during the acquisition of a low-resolution (LR)
image via down-sampling, these regions lose most of the high frequency
information necessary to represent the texture details. In this paper, we
present a novel texture enhancement framework for SISR to effectively improve
the spatial resolution in texture regions as well as along edges and lines. We
call our method the high-resolution (HR) style transfer algorithm. Our framework
consists of three steps: (i) generate an initial HR image from an interpolated
LR image via an SISR algorithm, (ii) generate an HR style image from the
initial HR image via down-scaling and tiling, and (iii) combine the HR style
image with the initial HR image via a customized style transfer algorithm.
Here, the HR style image is obtained by down-scaling the initial HR image and
then repetitively tiling it into an image of the same size as the HR image.
This down-scaling and tiling process comes from the idea that texture regions
are often composed of small regions that are similar in appearance, albeit sometimes
different in scale. This process creates an HR style image that is rich in
details, which can be used to restore high-frequency texture details back into
the initial HR image via the style transfer algorithm. Experimental results on
a number of texture datasets show that our proposed HR style transfer algorithm
provides more visually pleasing results compared with competitive methods.
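The down-scaling and tiling step (ii) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: the image is a grayscale list of rows, down-scaling is box averaging by a hypothetical integer `factor`, and the helper names `downscale_box`, `tile_to_size` and `make_hr_style_image` are hypothetical. The abstract does not specify the down-scaling kernel or the scale factor.

```python
def downscale_box(img, factor):
    """Down-scale a grayscale image (list of rows) by an integer factor.

    Assumption: box averaging over factor x factor blocks; the paper does not
    specify the down-scaling kernel. Height and width must be divisible by factor.
    """
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    small = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def tile_to_size(small, height, width):
    """Repeat the small image periodically until it covers height x width pixels."""
    sh, sw = len(small), len(small[0])
    return [[small[i % sh][j % sw] for j in range(width)] for i in range(height)]


def make_hr_style_image(initial_hr, factor=2):
    """Hypothetical helper: down-scale the initial HR image, then tile it back to full size."""
    small = downscale_box(initial_hr, factor)
    return tile_to_size(small, len(initial_hr), len(initial_hr[0]))
```

For a 4×4 image and `factor=2`, the result repeats the 2×2 down-scaled image twice in each direction, so the content reappears at half its original scale across the whole frame.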
Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, Jeff Clune
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generating high-resolution, photo-realistic images has been a long-standing
goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting
way to synthesize novel images by performing gradient ascent in the latent
space of a generator network to maximize the activations of one or multiple
neurons in a separate classifier network. In this paper we extend this method
by introducing an additional prior on the latent code, improving both sample
quality and sample diversity, leading to a state-of-the-art generative model
that produces high quality images at higher resolutions (227×227) than previous
generative models, and does so for all 1000 ImageNet categories. In addition,
we provide a unified probabilistic interpretation of related activation
maximization methods and call the general class of models “Plug and Play
Generative Networks”. PPGNs are composed of 1) a generator network G that is
capable of drawing a wide range of image types and 2) a replaceable “condition”
network C that tells the generator what to draw. We demonstrate the generation
of images conditioned on a class (when C is an ImageNet or MIT Places
classification network) and also conditioned on a caption (when C is an image
captioning network). Our method also improves the state of the art of
Multifaceted Feature Visualization, which generates the set of synthetic inputs
that activate a neuron in order to better understand how deep neural networks
operate. Finally, we show that our model performs reasonably well at the task
of image inpainting. While image models are used in this paper, the approach is
modality-agnostic and can be applied to many types of data.
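The core idea of activation maximization through a generator (gradient ascent on the latent code z so that a classifier neuron fires strongly on G(z)) can be shown with a toy sketch. Assumptions: both the generator and the target neuron are linear (G(z) = W z, a(x) = v · x), so the gradient is analytic, and a simple Gaussian (L2) prior on z stands in for the paper's prior on the latent code. All names are hypothetical.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def activation_maximization(W, v, z0, lr=0.1, lam=1.0, steps=200):
    """Gradient ascent on J(z) = v . (W z) - (lam / 2) * ||z||^2.

    W: toy linear generator (image_dim x latent_dim), v: weights of the target
    neuron over image pixels, z0: starting latent code.
    """
    # For a linear generator and a linear neuron, d a(G(z)) / dz = W^T v (constant).
    grad_activation = matvec(transpose(W), v)
    z = list(z0)
    for _ in range(steps):
        # The L2 prior contributes -lam * z, pulling the code toward the origin.
        grad = [g - lam * zi for g, zi in zip(grad_activation, z)]
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z, matvec(W, z)  # optimized code and the image it generates
```

Without the prior term (`lam = 0`) the linear activation would grow without bound; with it, the iterates converge to W^T v / lam, illustrating how a prior on the latent code keeps the search in a bounded region.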
Shehroze Bhatti, Alban Desmaison, Ondrej Miksik, Nantas Nardelli, N. Siddharth, Philip H. S. Torr
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
A number of recent approaches to policy learning in 2D game domains have been
successful, going directly from raw input images to actions. However, when
employed in complex 3D environments, they typically suffer from challenges
related to partial observability, combinatorial exploration spaces, path
planning, and a scarcity of rewarding scenarios. Inspired by prior work in
human cognition that indicates how humans employ a variety of semantic concepts
and abstractions (object categories, localisation, etc.) to reason about the
world, we build an agent-model that incorporates such abstractions into its
policy-learning framework. We augment the raw image input to a Deep Q-Learning
Network (DQN), by adding details of objects and structural elements
encountered, along with the agent’s localisation. The different components are
automatically extracted and composed into a topological representation using
on-the-fly object detection and 3D-scene reconstruction. We evaluate the
efficacy of our approach in Doom, a 3D first-person combat game that exhibits a
number of the challenges discussed, and show that our augmented framework
consistently learns better, more effective policies.
Da Chen, Jean-Marie Mirebeau, Laurent D. Cohen
Comments: in International Journal of Computer Vision, 2016
Subjects: Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a novel curvature-penalized minimal path model via
an orientation-lifted Finsler metric and the Euler elastica curve. The original
minimal path model computes the globally minimal geodesic by solving an Eikonal
partial differential equation (PDE). Essentially, this first-order model is
unable to penalize curvature which is related to the path rigidity property in
the classical active contour models. To solve this problem, we present an
Eikonal PDE-based Finsler elastica minimal path approach to address the
curvature-penalized geodesic energy minimization problem. In this way, we add
curvature penalization to the classical geodesic energy. The basic
idea of this work is to interpret the Euler elastica bending energy via a novel
Finsler elastica metric that embeds a curvature penalty. This metric is
non-Riemannian, anisotropic and asymmetric, and is defined over an
orientation-lifted space by adding to the image domain the orientation as an
extra space dimension. Based on this orientation lifting, the proposed minimal
path model can benefit from both the curvature and orientation of the paths.
Thanks to the fast marching method, the global minimum of the
curvature-penalized geodesic energy can be computed efficiently. We introduce
two anisotropic image data-driven speed functions that are computed by
steerable filters. Based on these orientation-dependent speed functions, we
apply the proposed Finsler elastica minimal path model to closed contour
detection, perceptual grouping and tubular structure extraction. Numerical
experiments on both synthetic and real images show that the proposed model
obtains promising results in these applications.
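For orientation, the Euler elastica bending energy of a planar curve is commonly written as below, for a curve $\gamma$ parametrized by arc length $s$, with length $L$, tangent angle $\theta(s)$ and curvature $\kappa(s)$. The weight $\alpha > 0$ is a generic parameter; the paper's exact energy, which also involves image-dependent speed terms, may differ.

```latex
\mathcal{E}(\gamma) = \int_0^{L} \bigl(1 + \alpha\,\kappa(s)^2\bigr)\,ds,
\qquad
\gamma'(s) = \bigl(\cos\theta(s),\ \sin\theta(s)\bigr),
\qquad
\kappa(s) = \theta'(s).
```

Lifting the curve to $\hat\gamma(s) = (\gamma(s), \theta(s))$ turns the curvature into the derivative of the added orientation coordinate, which is why a metric on the orientation-lifted space can encode a curvature penalty while paths are still computed from a first-order Eikonal equation.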
Jean-Paul Gauthier, Dario Prandi
Comments: 15 pages, 2 figures
Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Group Theory (math.GR)
We consider functions $f$ of two real variables, given as trigonometric
functions over a finite set $F$ of frequencies. This set is assumed to be
closed under rotations in the frequency plane of angle $\frac{2k\pi}{M}$ for
some integer $M$. Firstly, we address the problem of evaluating these functions
over a similar finite set $E$ in the space plane and, secondly, we address the
problems of interpolating or approximating a function $g$ of two variables by
such an $f$ over the grid $E$. In particular, for this aim, we establish an
abstract factorization theorem for the evaluation function, which is a key
point for an efficient numerical solution to these problems. This result is
based on the very special structure of the group $SE(2,N)$, a subgroup of the
group $SE(2)$ of motions of the plane corresponding to discrete rotations,
which is a maximally almost periodic group.
Although the motivation of this paper comes from our previous works on
biomimetic image reconstruction and pattern recognition, where these questions
appear naturally, this topic is related to several classical problems: the
FFT in polar coordinates, the Non Uniform FFT, the evaluation of general
trigonometric polynomials, and so on.
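The two ingredients of this setting, a frequency set closed under rotations by $\frac{2k\pi}{M}$ and the evaluation of $f$ on a finite set of points, can be illustrated with a naive sketch. Assumptions: $f$ is written with complex exponentials, $f(x) = \sum_{\omega \in F} c_\omega e^{i\,\omega\cdot x}$, and the helper names are hypothetical. This direct evaluation costs O(|F|·|E|) operations and does not implement the paper's factorization, which is what makes the evaluation efficient.

```python
import cmath
import math


def rotate(p, angle):
    """Rotate the 2D point p = (x, y) by `angle` radians about the origin."""
    x, y = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def rotation_closed_set(seeds, M):
    """Close a set of seed frequencies under rotations by 2*k*pi/M, k = 0..M-1."""
    return [rotate(w, 2 * math.pi * k / M) for w in seeds for k in range(M)]


def evaluate(coeffs, freqs, points):
    """Directly evaluate f(x) = sum_w c_w * exp(i <w, x>) at every point x."""
    return [
        sum(c * cmath.exp(1j * (w[0] * x[0] + w[1] * x[1])) for c, w in zip(coeffs, freqs))
        for x in points
    ]
```

Closure matters because, for every such rotation $R$, $f(Rx) = \sum_{\omega} c_\omega e^{i\,(R^{\top}\omega)\cdot x}$ is again a trigonometric function over the same set $F$; the group structure behind this is what the factorization builds on.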
Michael B. Chang, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum
Comments: Under review as a conference paper for ICLR 2017. 11 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
We present the Neural Physics Engine (NPE), an object-based neural network
architecture for learning predictive models of intuitive physics. We propose a
factorization of a physical scene into composable object-based representations
and also the NPE architecture whose compositional structure factorizes object
dynamics into pairwise interactions. Our approach draws on the strengths of
both symbolic and neural approaches: like a symbolic physics engine, the NPE is
endowed with generic notions of objects and their interactions, but as a neural
network it can also be trained via stochastic gradient descent to adapt to
specific object properties and dynamics of different worlds. We evaluate the
efficacy of our approach on simple rigid body dynamics in two-dimensional
worlds. By comparing to less structured architectures, we show that our model’s
compositional representation of the structure in physical interactions improves
its ability to predict movement, generalize to different numbers of objects,
and infer latent properties of objects such as mass.
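The compositional factorization, in which an object's predicted dynamics are a sum of pairwise interactions with its neighbors, can be sketched as below. In the NPE the pairwise encoder and the decoder are learned networks; here they are replaced by a hypothetical hand-written repulsive interaction and a simple Euler velocity update, and the neighborhood `radius`, `pairwise_effect` and `predict_velocity` are assumptions for illustration.

```python
import math


def pairwise_effect(focus, context):
    """Hypothetical stand-in for the learned pairwise encoder.

    Returns a repulsive force on `focus` from `context` that scales with the
    context object's mass and decays with distance (assumes distinct positions).
    """
    dx = focus["pos"][0] - context["pos"][0]
    dy = focus["pos"][1] - context["pos"][1]
    scale = context["mass"] / (dx * dx + dy * dy)
    return (scale * dx, scale * dy)


def predict_velocity(objects, i, radius, dt=0.1):
    """Predict the next velocity of object i from pairwise effects of its neighbors."""
    focus = objects[i]
    total_x, total_y = 0.0, 0.0
    for j, other in enumerate(objects):
        # Only objects within `radius` of the focus object contribute.
        if j != i and math.dist(focus["pos"], other["pos"]) <= radius:
            ex, ey = pairwise_effect(focus, other)
            total_x += ex
            total_y += ey
    # Decoder stand-in: explicit Euler update using the summed effect as a force.
    vx, vy = focus["vel"]
    return (vx + dt * total_x / focus["mass"], vy + dt * total_y / focus["mass"])
```

Because the prediction is a sum over pairs, the same functions apply unchanged to scenes with any number of objects, which is the structural property behind generalizing to different numbers of objects.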
Kleanthi Georgala, Micheal Hoffmann, Axel-Cyrille Ngonga Ngomo
Subjects: Artificial Intelligence (cs.AI)
Time-efficient link discovery is of central importance to implementing the
vision of the Semantic Web. Some of the most rapid Link Discovery approaches
rely internally on planning to execute link specifications. In newer works,
linear models have been used to estimate the runtime of the fastest planners.
However, no other category of models has been studied for this purpose so far.
In this paper, we study non-linear functions for runtime
estimation. In particular, we study exponential and mixed models for the
estimation of the runtimes of planners. To this end, we evaluate three
different models for runtime on six datasets using 400 link specifications. We
show that exponential and mixed models achieve better fits when trained but are
only to be preferred in some cases. Our evaluation also shows that the use of
better runtime approximation models has a positive impact on the overall
execution of link specifications.
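Fitting linear and exponential runtime models can be sketched with ordinary least squares. Assumptions: runtime is modelled as a function of a single hypothetical feature x of a link specification, and the exponential model y = a·exp(b·x) is fitted by linear regression on log(y), which minimizes error in log space rather than raw squared error. The paper's mixed models are not shown, and all helper names are hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on log(y); requires every y > 0."""
    log_a, b = fit_linear(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


def sse(predictions, ys):
    """Sum of squared errors, used to compare how well each model fits."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys))
```

Comparing `sse` on held-out specifications, rather than on the training data, would be the fairer test of whether the better training fit of a non-linear model actually pays off.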
Stefano Borgo, Loris Bozzato, Alessio Palmero Aprosio, Marco Rospocher, Luciano Serafini
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Systems for automatic extraction of semantic information about events from
large textual resources are now available: these tools are capable of generating
RDF datasets about text-extracted events, and this knowledge can be used to
reason over the recognized events. On the other hand, text-based tasks for
event recognition, such as event coreference (i.e. recognizing whether
two textual descriptions refer to the same event), do not take into account
ontological information of the extracted events in their process. In this
paper, we propose a method to derive event coreference on text-extracted event
data using semantics-based rule reasoning. We demonstrate our method considering
a limited (yet representative) set of event types: we introduce a formal
analysis of their ontological properties and, on the basis of this, we define a
set of coreference criteria. We then implement these criteria as RDF-based
reasoning rules to be applied to text-extracted event data. We evaluate the
effectiveness of our approach over a standard coreference benchmark dataset.
Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Rezende, Koray Kavukcuoglu
Comments: Published in NIPS 2016
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Reasoning about objects, relations, and physics is central to human
intelligence, and a key goal of artificial intelligence. Here we introduce the
interaction network, a model which can reason about how objects in complex
systems interact, supporting dynamical predictions, as well as inferences about
the abstract properties of the system. Our model takes graphs as input,
performs object- and relation-centric reasoning in a way that is analogous to a
simulation, and is implemented using deep neural networks. We evaluate its
ability to reason about several challenging physical domains: n-body problems,
rigid-body collision, and non-rigid dynamics. Our results show it can be
trained to accurately simulate the physical trajectories of dozens of objects
over thousands of time steps, estimate abstract quantities such as energy, and
generalize automatically to systems with different numbers and configurations
of objects and relations. Our interaction network implementation is the first
general-purpose, learnable physics engine, and a powerful general framework for
reasoning about objects and relations in a wide variety of complex real-world
domains.
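The relation-centric and object-centric reasoning over a graph can be sketched as one simulation step. In the interaction network both models are learned neural networks; here `spring_relation` and `euler_object` are hypothetical hand-written stand-ins (a 1D spring force and an explicit Euler update on unit masses), used only to show how effects flow from edges to the objects that receive them.

```python
def interaction_network_step(objects, relations, relation_model, object_model):
    """One step over a graph.

    objects: list of per-object state vectors.
    relations: list of (sender, receiver, attribute) triples.
    """
    # Relation-centric stage: compute one effect vector per edge.
    effects = [relation_model(objects[s], objects[r], attr) for s, r, attr in relations]
    # Aggregate effects at each receiving object by summation.
    dim = len(effects[0]) if effects else 0
    aggregated = [[0.0] * dim for _ in objects]
    for (_, r, _), effect in zip(relations, effects):
        for k in range(dim):
            aggregated[r][k] += effect[k]
    # Object-centric stage: update every object from its state and aggregated effect.
    return [object_model(obj, agg) for obj, agg in zip(objects, aggregated)]


def spring_relation(sender, receiver, stiffness):
    """Hypothetical 1D spring: force on the receiver; states are [position, velocity]."""
    return [stiffness * (sender[0] - receiver[0])]


def euler_object(obj, agg, dt=0.1):
    """Hypothetical unit-mass Euler update; objects with no incoming edges get zero force."""
    force = agg[0] if agg else 0.0
    vel = obj[1] + dt * force
    return [obj[0] + dt * vel, vel]
```

Because the same relation model is applied to every edge, a graph with more objects or different connections needs no change to either model.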
Xiaojian Wu, Akshat Kumar, Daniel Sheldon, Shlomo Zilberstein
Comments: AAAI 2017
Subjects: Artificial Intelligence (cs.AI)
Stochastic network design is a general framework for optimizing network
connectivity. It has several applications in computational sustainability
including spatial conservation planning, pre-disaster network preparation, and
river network optimization. A common assumption in previous work has been
that network parameters (e.g., probability of species colonization) are
precisely known, which is unrealistic in real-world settings. We therefore
address the robust river network design problem where the goal is to optimize
river connectivity for fish movement by removing barriers. We assume that fish
passability probabilities are known only imprecisely, but are within some
interval bounds. We then develop a planning approach that computes the policies
with either high robust ratio or low regret. Empirically, our approach scales
well to large river networks. We also provide insights into the solutions
generated by our robust approach, which have a significantly higher robust ratio
than the baseline solution computed with mean parameter estimates.
Hugo Gilbert, Paul Weng, Yan Xu
Comments: Long version of AAAI 2017 paper. arXiv admin note: text overlap with arXiv:1611.00862
Subjects: Artificial Intelligence (cs.AI)
In the Markov decision process model, policies are usually evaluated by
expected cumulative rewards. As this decision criterion is not always suitable,
we propose in this paper an algorithm for computing a policy that is optimal for
the quantile criterion. Both finite and infinite horizons are considered.
Finally, we experimentally evaluate our approach on random MDPs and on a data
center control problem.
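The difference between the expected-reward criterion and the quantile criterion can be seen on two hypothetical return distributions. This sketch only evaluates the two criteria on given discrete distributions; it is not the paper's policy-optimization algorithm, and the distributions and helper names are invented for illustration.

```python
def expectation(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())


def lower_quantile(dist, tau):
    """Lower tau-quantile: the smallest value v with P(R <= v) >= tau."""
    cumulative = 0.0
    for v in sorted(dist):
        cumulative += dist[v]
        if cumulative >= tau - 1e-12:  # tolerance for floating-point sums
            return v
    return max(dist)


# Hypothetical cumulative returns of two policies: A is risky, B is safe.
policy_a = {0.0: 0.6, 20.0: 0.4}
policy_b = {5.0: 1.0}
```

Policy A has the higher expected return (8 versus 5), but B has the higher median (tau = 0.5 gives 5 versus 0), so the two criteria rank the policies in opposite order.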
Christian Walder, Dongwoo Kim
Subjects: Artificial Intelligence (cs.AI)
Sequence modeling with neural networks has led to powerful models of
symbolic music data. We address the problem of exploiting these models to reach
creative musical goals. To this end we generalise previous work, which sampled
Markovian sequence models under the constraint that the sequence belong to the
language of a given finite state machine. We consider more expressive
non-Markov models, thereby requiring approximate sampling which we provide in
the form of an efficient sequential Monte Carlo method. In addition we provide
and compare with a beam search strategy for conditional probability
maximisation. Our algorithms are capable of convincingly re-harmonising famous
musical works. To demonstrate this we provide visualisations, quantitative
experiments, a human listening test and illustrative audio examples. We find
both the sampling and optimisation procedures to be effective, yet
complementary in character. For the case of highly permissive constraint sets,
we find that sampling is to be preferred due to the overly regular nature of
the optimisation-based results.
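A beam search for maximising sequence probability under hard constraints can be sketched with a model-agnostic interface. Assumptions: the sequence model is any function returning log P(token | prefix), which also covers non-Markov models; constraints are expressed as a hypothetical per-position predicate `allowed(i, token)`, which is simpler than the finite-state-machine constraints in the paper; and the toy bigram table is invented.

```python
import math


def beam_search(log_prob, vocab, length, allowed, beam_width=3):
    """Return up to beam_width (sequence, log-probability) pairs, best first."""
    beams = [((), 0.0)]
    for i in range(length):
        candidates = [
            (prefix + (tok,), score + log_prob(prefix, tok))
            for prefix, score in beams
            for tok in vocab
            if allowed(i, tok)
        ]
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Hypothetical toy model: a bigram table over three pitches.
TRANSITIONS = {
    None: {"C": 0.5, "D": 0.3, "E": 0.2},
    "C": {"C": 0.1, "D": 0.6, "E": 0.3},
    "D": {"C": 0.7, "D": 0.1, "E": 0.2},
    "E": {"C": 0.4, "D": 0.4, "E": 0.2},
}


def bigram_log_prob(prefix, tok):
    previous = prefix[-1] if prefix else None
    return math.log(TRANSITIONS[previous][tok])


def keep_first_and_last_c(i, tok):
    """Hypothetical constraint: positions 0 and 2 must be 'C'."""
    return tok == "C" if i in (0, 2) else True
```

Like any beam search, it returns high-probability (and therefore often very regular) sequences and can miss the global optimum when the beam is narrow.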
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Deep reinforcement learning (RL) can acquire complex behaviors from low-level
inputs, such as images. However, real-world applications of such methods
require generalizing to the vast variability of the real world. Deep networks
are known to achieve remarkable generalization when provided with massive
amounts of labeled data, but can we provide this breadth of experience to an RL
agent, such as a robot? The robot might continuously learn as it explores the
world around it, even while deployed. However, this learning requires access to
a reward function, which is often hard to measure in real-world domains, where
the reward could depend on, for example, unknown positions of objects or the
emotional state of the user. Conversely, it is often quite practical to provide
the agent with reward functions in a limited set of situations, such as when a
human supervisor is present or in a controlled setting. Can we make use of this
limited supervision, and still benefit from the breadth of experience an agent
might collect on its own? In this paper, we formalize this problem as
semisupervised reinforcement learning, where the reward function can only be
evaluated in a set of “labeled” MDPs, and the agent must generalize its
behavior to the wide range of states it might encounter in a set of “unlabeled”
MDPs, by using experience from both settings. Our proposed method infers the
task objective in the unlabeled MDPs through an algorithm that resembles
inverse RL, using the agent’s own prior experience in the labeled MDPs as a
kind of demonstration of optimal behavior. We evaluate our method on
challenging tasks that require control directly from images, and show that our
approach can improve the generalization of a learned deep neural network policy
by using experience for which no reward function is available. We also show
that our method outperforms direct supervised learning of the reward.
Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville
Comments: 18 pages, 2 figures, 4 tables; under review at ICLR 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent advances in neural variational inference have facilitated efficient
training of powerful directed graphical models with continuous latent
variables, such as variational autoencoders. However, these models usually
assume simple, uni-modal priors – such as the multivariate Gaussian
distribution – yet many real-world data distributions are highly complex and
multi-modal. Examples of complex and multi-modal distributions range from
topics in newswire text to conversational dialogue responses. When such latent
variable models are applied to these domains, the restriction of the simple,
uni-modal prior hinders the overall expressivity of the learned model as it
cannot possibly capture more complex aspects of the data distribution. To
overcome this critical restriction, we propose a flexible, simple prior
distribution which can be learned efficiently and potentially capture an
exponential number of modes of a target distribution. We develop the
multi-modal variational encoder-decoder framework and investigate the
effectiveness of the proposed prior in several natural language processing
modeling tasks, including document modeling and dialogue modeling.
Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke
Comments: 10 pages, What If workshop NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The ability to perform effective off-policy learning would revolutionize the
process of building better interactive systems, such as search engines and
recommendation systems for e-commerce, computational advertising and news.
Recent approaches for off-policy evaluation and learning in these settings
appear promising. With this paper, we provide real-world data and a
standardized test-bed to systematically investigate these algorithms using data
from display advertising. In particular, we consider the problem of filling a
banner ad with an aggregate of multiple products the user may want to purchase.
This paper presents our test-bed, the sanity checks we ran to ensure its
validity, and shows results comparing state-of-the-art off-policy learning
methods like doubly robust optimization, POEM, and reductions to supervised
learning using regression baselines. Our results show experimental evidence
that recent off-policy learning methods can improve upon state-of-the-art
supervised learning techniques on a large-scale real-world data set.
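Off-policy evaluation from logged data is commonly based on the inverse propensity scoring (IPS) estimator, which doubly robust estimators combine with a regression model. The sketch below shows plain IPS; the log format (context, action, reward, logging propensity) and the function name are assumptions, not the test-bed's actual data schema.

```python
def ips_estimate(logs, target_prob):
    """Inverse propensity scoring estimate of a target policy's expected reward.

    logs: list of (context, action, reward, logging_propensity) tuples, where
          logging_propensity is the probability the logging policy gave the action.
    target_prob(context, action): probability the target policy takes the action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy was.
        total += reward * target_prob(context, action) / propensity
    return total / len(logs)
```

The estimate is unbiased when the logged propensities are correct and nonzero wherever the target policy acts, but its variance grows as propensities shrink, which motivates the variance-reducing methods compared on such test-beds.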
Dimitrios Kalatzis, Arash Eshghi, Oliver Lemon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
We present a method for inducing new dialogue systems from very small amounts
of unannotated dialogue data, showing how word-level exploration using
Reinforcement Learning (RL), combined with an incremental and semantic grammar
– Dynamic Syntax (DS) – allows systems to discover, generate, and understand
many new dialogue variants. The method avoids the use of expensive and
time-consuming dialogue act annotations, and supports more natural
(incremental) dialogues than turn-based systems. Here, language generation and
dialogue management are treated as a joint decision/optimisation problem, and
the MDP model for RL is constructed automatically. With an implemented system,
we show that this method enables a wide range of dialogue variations to be
automatically captured, even when the system is trained from only a single
dialogue. The variants include question-answer pairs, over- and
under-answering, self- and other-corrections, clarification interaction,
split-utterances, and ellipsis. This generalisation property results from the
structural knowledge and constraints present within the DS grammar, and
highlights some limitations of recent systems built using machine learning
techniques only.
Darko Brodić, Alessia Amelio
Comments: 13 pages, 2 figures, 6 tables, 5th International Workshop on Symbiotic Interaction (Symbiotic), Padua, Italy, 29-30 September 2016
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
The paper analyzes the interaction between humans and computers in terms of
response time in solving image-based CAPTCHAs. In particular, the analysis
focuses on the ability of different Internet users to easily solve four
different types of image-based CAPTCHAs, which include facial expressions such
as animated character, old woman, surprised face and worried face. To pursue
this goal, an experiment is carried out in which 100 Internet users,
differentiated by age, Internet experience and education level, solve the four
types of CAPTCHAs. The response time is collected for each user. Then,
association rules are extracted from the user data to evaluate, by statistical
analysis, how the response time in solving the CAPTCHA depends on age,
education level and Internet experience. The results implicitly capture the
users’ psychological states, showing in which states the users are more
sensitive. This constitutes a novel and meaningful analysis with respect to the
state of the art.
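The support and confidence measures that association rule mining relies on can be sketched in a few lines. The records below are hypothetical users described by discretized attributes (labels such as `age:young` and `time:fast` are invented), not the paper's data.

```python
def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    s = support(records, antecedent)
    return support(records, antecedent | consequent) / s if s else 0.0


# Hypothetical users: discretized age group and CAPTCHA response time.
users = [
    {"age:young", "time:fast"},
    {"age:young", "time:fast"},
    {"age:young", "time:slow"},
    {"age:old", "time:slow"},
]
```

Here the rule {age:young} → {time:fast} has support 0.5 and confidence 2/3; mining keeps only rules whose support and confidence exceed chosen thresholds.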
Jiajun Lu, Aditya Deshpande, David Forsyth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Problems such as predicting an optical flow field (Y) for an image (X) are
ambiguous: many very distinct solutions are good. Representing this ambiguity
requires building a conditional model P(Y|X) of the prediction, conditioned on
the image. This is hard because training data usually does not contain many
different flow fields for the same image. As a result, we need different images
to share data to produce good models. We demonstrate an improved method for
building conditional models, the Co-Embedding Deep Variational Auto Encoder
(CDVAE). Our CDVAE exploits multiple encoding and decoding layers for both X and Y.
These are tied during training to produce a model of the joint distribution
P(X, Y), which provides the necessary smoothing. Our tying procedure is
designed to yield a conditional model that is easy to use at test time. We demonstrate our
model on three example tasks using real data: image saturation adjustment,
image relighting, and motion prediction. We describe quantitative evaluation
metrics to evaluate ambiguous generation results. Our results quantitatively
and qualitatively advance the state of the art.
Zizhan Zheng, Ness B. Shroff, Prasant Mohapatra
Comments: 9 pages, 2 figures; accepted by the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, Feb. 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cybersecurity is increasingly threatened by advanced and persistent attacks.
As these attacks are often designed to disable a system (or a critical
resource, e.g., a user account) repeatedly, it is crucial for the defender to
keep updating its security measures to strike a balance between the risk of
being compromised and the cost of security updates. Moreover, these decisions
often need to be made with limited and delayed feedback due to the stealthy
nature of advanced attacks. In addition to targeted attacks, such an optimal
timing policy under incomplete information has broad applications in
cybersecurity. Examples include key rotation, password change, application of
patches, and virtual machine refreshing. However, rigorous studies of optimal
timing are rare. Further, existing solutions typically rely on a pre-defined
attack model that is known to the defender, which is often not the case in
practice. In this work, we make an initial effort towards achieving optimal
timing of security updates in the face of unknown stealthy attacks. We consider
a variant of the influential FlipIt game model with asymmetric feedback and
an unknown attack time distribution, which provides a general model for consecutive
security updates. The defender’s problem is then modeled as a time associative
bandit problem with dependent arms. We derive upper confidence bound based
learning policies that achieve low regret compared with optimal periodic
defense strategies that can only be derived when attack time distributions are
known.
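An upper-confidence-bound learning policy can be sketched with the classic UCB1 rule over a finite set of candidate update periods. This is a simplification: the time associative bandit in the paper has dependent arms, which plain UCB1 ignores, and the reward function and arms here are hypothetical (for example, each arm could be a candidate period and its reward the security benefit minus the update cost).

```python
import math


def ucb1(reward_fn, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; returns per-arm pull counts and reward sums."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise its estimate
        else:
            # Empirical mean plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums
```

Regret here is measured against always playing the best fixed arm, which mirrors, in a much simpler form, the comparison with the optimal periodic defense strategy.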
Roman Zapatrin
Comments: 21 pp
Journal-ref: In: Beyond Peaceful Coexistence: The Emergence of Space, Time and
Quantum; Ignazio Licata, ed., World Scientific (2016), pp. 201-220
Subjects: Information Retrieval (cs.IR)
The ubiquitous nature of modern Information Retrieval and Virtual Worlds gives
rise to new realities. To what extent are these “realities” real? Which
“physics” should be applied to quantitatively describe them? In this essay I
dwell on a few examples. The first is Adaptive neural networks, which are not
networks and not neural, but still provide service similar to classical ANNs in
extended fashion. The second is the emergence of objects looking like
Einsteinian spacetime, which describe the behavior of an Internet surfer like
geodesic motion. The third is the demonstration of nonclassical and even
stronger-than-quantum probabilities in Information Retrieval, and their use.
Immense operable datasets provide new operationalistic environments, which
increasingly become “realities”. In this essay, I consider the
overall Information Retrieval process as an objective physical process,
representing it, following Melucci’s metaphor, in terms of physical-like
experiments. Various semantic environments are treated as analogs of various
realities. The readers’ attention is drawn to the topos approach to physical
theories, which provides a natural conceptual and technical framework to cope
with the new emerging realities.
Vivek Kulkarni, Yashar Mehdad, Troy Chevalier
Comments: 12 pages, 3 figures, 8 tables arxiv preprint
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Content on the Internet is heterogeneous and arises from various domains like
News, Entertainment, Finance and Technology. Understanding such content
requires identifying named entities (persons, places and organizations) as one
of the key steps. Traditionally, Named Entity Recognition (NER) systems have
been built using available annotated datasets (like CoNLL, MUC) and demonstrate
excellent performance. However, these models fail to generalize to other
domains like Sports and Finance, where conventions and language use can differ
significantly. Furthermore, several domains do not have large amounts of
annotated data for training robust Named Entity Recognition models. A
key step towards this challenge is to adapt models learned on domains where
large amounts of annotated training data are available to domains with scarce
annotated data.
In this paper, we propose methods to effectively adapt models learned on one
domain to other domains using distributed word representations. First, we
analyze the linguistic variation present across domains to identify key
linguistic insights that can boost performance across domains. We propose
methods to capture domain-specific semantics of word usage in addition to
global semantics. We then demonstrate how to effectively use such domain-specific
knowledge to learn NER models that outperform previous baselines in
the domain adaptation setting.
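The abstract says only that domain-specific word semantics are captured alongside global ones. A minimal sketch of one way to expose both to a tagger follows. It assumes, hypothetically, that each word has a global embedding plus an optional per-domain embedding, and that a token's feature vector is their concatenation. The names `token_features`, `global_emb` and `domain_emb` are illustrative and are not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def token_features(word: str,
                   domain: str,
                   global_emb: Dict[str, Vector],
                   domain_emb: Dict[str, Dict[str, Vector]],
                   dim: int) -> Vector:
    """Concatenate a word's global vector with its domain-specific vector.

    Words unknown globally or within the domain fall back to zero vectors,
    so every token gets a feature vector of length 2 * dim.
    """
    zeros = [0.0] * dim
    global_vec = global_emb.get(word, zeros)
    domain_vec = domain_emb.get(domain, {}).get(word, zeros)
    return list(global_vec) + list(domain_vec)


if __name__ == "__main__":
    global_emb = {"bank": [0.1, 0.2]}
    domain_emb = {"finance": {"bank": [0.9, 0.0]}}
    print(token_features("bank", "finance", global_emb, domain_emb, dim=2))
    print(token_features("bank", "sports", global_emb, domain_emb, dim=2))
```

With this layout, the same word receives different features in different domains, and a downstream NER model can learn which half to rely on.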
Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey
Comments: To appear in AAAI Conference 2017
Subjects: Computation and Language (cs.CL)
Distributed representations of words have been shown to capture lexical
semantics, as demonstrated by their effectiveness in word similarity and
analogical relation tasks. However, these tasks only evaluate lexical semantics
indirectly. In this paper, we study whether it is possible to utilize
distributed representations to generate dictionary definitions of words, as a
more direct and transparent representation of the embeddings’ semantics. We
introduce definition modeling, the task of generating a definition for a given
word and its embedding. We present several definition model architectures based
on recurrent neural networks, and experiment with the models over multiple data
sets. Our results show that a model that controls dependencies between the word
being defined and the definition words performs significantly better, and that
a character-level convolution layer designed to leverage morphology can
complement word-level embeddings. Finally, an error analysis suggests that the
errors made by a definition model may provide insight into the shortcomings of
word embeddings.
Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville
Comments: 18 pages, 2 figures, 4 tables; under review at ICLR 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent advances in neural variational inference have facilitated efficient
training of powerful directed graphical models with continuous latent
variables, such as variational autoencoders. However, these models usually
assume simple, uni-modal priors – such as the multivariate Gaussian
distribution – yet many real-world data distributions are highly complex and
multi-modal. Examples of complex and multi-modal distributions range from
topics in newswire text to conversational dialogue responses. When such latent
variable models are applied to these domains, the restriction of the simple,
uni-modal prior hinders the overall expressivity of the learned model as it
cannot possibly capture more complex aspects of the data distribution. To
overcome this critical restriction, we propose a flexible, simple prior
distribution which can be learned efficiently and potentially capture an
exponential number of modes of a target distribution. We develop the
multi-modal variational encoder-decoder framework and investigate the
effectiveness of the proposed prior in several natural language processing
modeling tasks, including document modeling and dialogue modeling.
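The abstract does not say what form the learned prior takes. One family that fits the description is a prior that is piecewise constant on [0, 1] in each latent dimension. With k high-density pieces per dimension and d independent dimensions, the joint density has k^d modes. The sketch below assumes that form and draws one dimension by inverse-CDF sampling. `sample_piecewise` and `sample_latent` are hypothetical helpers, not the authors' code.

```python
import bisect
import random
from typing import List


def sample_piecewise(weights: List[float], u: float) -> float:
    """Inverse-CDF sample on [0, 1] from a piecewise-constant density.

    The interval is split into len(weights) equal-width pieces; piece i
    receives probability mass weights[i] / sum(weights). `u` is a uniform
    draw from [0, 1).
    """
    n = len(weights)
    total = float(sum(weights))
    cdf: List[float] = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point round-off
    k = bisect.bisect_left(cdf, u)  # first piece whose cumulative mass reaches u
    lower = cdf[k - 1] if k > 0 else 0.0
    mass = cdf[k] - lower
    frac = (u - lower) / mass if mass > 0 else 0.0  # position inside piece k
    return (k + frac) / n


def sample_latent(weights_per_dim: List[List[float]], rng: random.Random) -> List[float]:
    """Sample each latent dimension independently from its own piecewise prior."""
    return [sample_piecewise(w, rng.random()) for w in weights_per_dim]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two dimensions, each bimodal (mass only in the outer thirds): 2 * 2 = 4 modes.
    print(sample_latent([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]], rng))
```

In a learned model, the per-piece weights would be parameters trained with the rest of the network. Here they are fixed by hand.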
Dimitrios Kalatzis, Arash Eshghi, Oliver Lemon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
We present a method for inducing new dialogue systems from very small amounts
of unannotated dialogue data, showing how word-level exploration using
Reinforcement Learning (RL), combined with an incremental and semantic grammar
– Dynamic Syntax (DS) – allows systems to discover, generate, and understand
many new dialogue variants. The method avoids the use of expensive and
time-consuming dialogue act annotations, and supports more natural
(incremental) dialogues than turn-based systems. Here, language generation and
dialogue management are treated as a joint decision/optimisation problem, and
the MDP model for RL is constructed automatically. With an implemented system,
we show that this method enables a wide range of dialogue variations to be
automatically captured, even when the system is trained from only a single
dialogue. The variants include question-answer pairs, over- and
under-answering, self- and other-corrections, clarification interaction,
split-utterances, and ellipsis. This generalisation property results from the
structural knowledge and constraints present within the DS grammar, and
highlights some limitations of recent systems built using machine learning
techniques only.
Lahari Poddar
Subjects: Computation and Language (cs.CL)
The project aims to provide a semi-supervised approach to identify Multiword
Expressions in a multilingual context consisting of English and most of the
major Indian languages. A multiword expression is a group of words that refers
to a conventional or regional way of saying something. If it is literally
translated from one language to another, the expression loses its inherent
meaning.
To automatically extract multiword expressions from a corpus, an extraction
pipeline has been constructed, which consists of a combination of rule-based and
statistical approaches. There are several types of multiword expressions, which
differ widely from each other in construction. We employ different methods to
detect different types of multiword expressions. Given a POS-tagged corpus in
English or any Indian language, the system initially applies regular
expression filters to narrow down the search space to certain patterns (such as
reduplication, partial reduplication, compound nouns, compound verbs and conjunct
verbs). The word sequences matching the required pattern are subjected to
a series of linguistic tests, which include verb filtering, named entity
filtering and hyphenation filtering, to exclude false positives. The
candidates are then checked for semantic relationships among themselves (using
WordNet). In order to detect partial reduplication, we make use of WordNet as a
lexical database as well as a tool for lemmatising. We detect complex
predicates by investigating the features of the constituent words. Statistical
methods are applied to detect collocations. Finally, lexicographers examine the
list of automatically extracted candidates to validate whether they are true
multiword expressions or not and add them to the multiword dictionary
accordingly.
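The first pipeline stage, a regular-expression filter over the POS-tagged corpus, can be sketched for one pattern: full reduplication. The sketch assumes a hypothetical `word/TAG` token format and treats two adjacent tokens with the same word and the same tag as a candidate. The later linguistic, WordNet and statistical tests are not shown, and `reduplication_candidates` is an illustrative name.

```python
import re
from typing import List

# A "word/TAG" token immediately followed by the same word with the same tag.
# Look-arounds enforce token boundaries so that partial words do not match.
REDUPLICATION = re.compile(r"(?<!\S)([^\s/]+)/([^\s/]+) \1/\2(?!\S)")


def reduplication_candidates(tagged_sentence: str) -> List[str]:
    """Return candidate reduplicated expressions, e.g. 'dheere dheere'."""
    return [f"{m.group(1)} {m.group(1)}"
            for m in REDUPLICATION.finditer(tagged_sentence)]


if __name__ == "__main__":
    # Hindi 'dheere dheere' ('slowly'), tagged with a hypothetical tagset.
    print(reduplication_candidates("vah/PRP dheere/RB dheere/RB chala/VB"))
```

Candidates found this way would then pass through the filtering tests described above before reaching the lexicographers.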
Wenjie Pei, Tadas Baltrušaitis, David M.J. Tax, Louis-Philippe Morency
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to noisy sequences,
which are expected in real-world applications. We present the Temporal
Attention-Gated Model (TAGM) which is able to deal with noisy sequences. Our
model assimilates ideas from attention models and gated recurrent networks.
Specifically, we employ an attention model to measure the relevance of each
time step of a sequence to the final decision. We then use the relevant
segments based on their attention scores in a novel gated recurrent network to
learn the hidden representation for the classification. More importantly, our
attention weights provide a physically meaningful interpretation for the
salience of each time step in the sequence. We demonstrate the merits of our
model in both interpretability and classification performance on a variety of
tasks, including speech recognition, textual sentiment analysis and event
recognition.
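The interplay of attention and gating can be illustrated with a minimal sketch. It assumes an update of the form h_t = (1 - a_t) * h_{t-1} + a_t * h'_t, where a_t in [0, 1] is the attention weight of step t and h'_t is a candidate state. Here the attention weights are given directly, and `tanh_step` is a hypothetical stand-in for the learned recurrent cell. In the model itself, both the attention and the cell are learned.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def attention_gated_rnn(xs: Sequence[float],
                        attention: Sequence[float],
                        step: Callable[[Vector, float], Vector],
                        h0: Vector) -> Vector:
    """Run the gated recurrence and return the final hidden state.

    A time step with attention 0 leaves the state untouched (irrelevant or
    noisy input is skipped); attention 1 fully replaces it with the candidate.
    """
    h = list(h0)
    for x, a in zip(xs, attention):
        candidate = step(h, x)
        h = [(1.0 - a) * h_prev + a * h_new for h_prev, h_new in zip(h, candidate)]
    return h


def tanh_step(h: Vector, x: float) -> Vector:
    """Toy candidate cell: elementwise tanh(h + x)."""
    return [math.tanh(hi + x) for hi in h]


if __name__ == "__main__":
    # The noisy middle step (x = 9.0) gets zero attention and is ignored.
    print(attention_gated_rnn([0.5, 9.0, -0.2], [1.0, 0.0, 0.7], tanh_step, [0.0, 0.0]))
```

Because each a_t is a scalar per time step, the attention weights can be read directly as the salience of each step, as the abstract describes.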
Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy
Comments: Under review at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we propose a novel training procedure for image captioning
models based on policy gradient methods. This allows us to directly optimize
for the metrics of interest, rather than just maximizing likelihood of human
generated captions. We show that by optimizing for standard metrics such as
BLEU, CIDEr, METEOR and ROUGE, we can develop a system that improves on the
metrics and ranks first on the MSCOCO image captioning leaderboard, even
though our CNN-RNN model is much simpler than state-of-the-art models. We
further show that by also optimizing for the recently introduced SPICE metric,
which measures semantic quality of captions, we can produce a system that
significantly outperforms other methods as measured by human evaluation.
Finally, we show how we can leverage extra sources of information, such as
pre-trained image tagging models, to further improve quality.
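The core of policy-gradient training can be sketched with the generic REINFORCE estimator with a baseline, shown here for a single token drawn from a softmax over the vocabulary. The reward would be a caption-level score such as CIDEr or SPICE. The abstract does not say how the baseline is estimated, so `baseline` is simply an input. This is an illustration of the estimator, not the authors' training procedure.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits: List[float], action: int,
                   reward: float, baseline: float) -> List[float]:
    """Gradient of (reward - baseline) * log p(action) with respect to logits.

    For a softmax policy, d log p(a) / d z_k = 1[k == a] - p_k.
    Ascending this gradient makes actions with above-baseline reward more likely.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * ((1.0 if k == action else 0.0) - p)
            for k, p in enumerate(probs)]


if __name__ == "__main__":
    print(reinforce_grad([0.0, 0.0], action=0, reward=1.0, baseline=0.0))  # [0.5, -0.5]
```

Subtracting a baseline leaves the expected gradient unchanged but can reduce its variance, which matters when the reward is a noisy sentence-level metric.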
Stefano Borgo, Loris Bozzato, Alessio Palmero Aprosio, Marco Rospocher, Luciano Serafini
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Systems for automatic extraction of semantic information about events from
large textual resources are now available: these tools are capable of generating
RDF datasets about text-extracted events, and this knowledge can be used to
reason over the recognized events. On the other hand, text based tasks for
event recognition, as for example event coreference (i.e. recognizing whether
two textual descriptions refer to the same event), do not take into account
ontological information of the extracted events in their process. In this
paper, we propose a method to derive event coreference on text-extracted event
data using semantic-based rule reasoning. We demonstrate our method considering
a limited (yet representative) set of event types: we introduce a formal
analysis of their ontological properties and, on the basis of this, we define a
set of coreference criteria. We then implement these criteria as RDF-based
reasoning rules to be applied to text-extracted event data. We evaluate the
effectiveness of our approach over a standard coreference benchmark dataset.
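To give a flavor of what such a rule might look like, here is a plain-Python predicate for one hypothetical criterion. Two extracted events corefer if they have the same type, compatible times and places (unknown values do not block a match), and at least one shared participant. Both the criterion and the `Event` structure are invented for illustration. The paper derives its criteria per event type from an ontological analysis and encodes them as RDF-based rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Event:
    event_type: str
    time: Optional[str]    # e.g. an ISO date; None if not extracted
    place: Optional[str]   # None if not extracted
    participants: FrozenSet[str]


def _compatible(a: Optional[str], b: Optional[str]) -> bool:
    """Missing values are treated as compatible with anything."""
    return a is None or b is None or a == b


def corefer(e1: Event, e2: Event) -> bool:
    """Hypothetical coreference criterion (illustrative only)."""
    return (e1.event_type == e2.event_type
            and _compatible(e1.time, e2.time)
            and _compatible(e1.place, e2.place)
            and bool(e1.participants & e2.participants))


if __name__ == "__main__":
    a = Event("Acquisition", "2016-05-01", None, frozenset({"ACME", "Globex"}))
    b = Event("Acquisition", "2016-05-01", "Berlin", frozenset({"ACME"}))
    c = Event("Acquisition", "2016-06-01", "Berlin", frozenset({"ACME"}))
    print(corefer(a, b), corefer(b, c))
```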
Tianyu Wu, Kun Yuan, Qing Ling, Wotao Yin, Ali H. Sayed
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
We propose an asynchronous, decentralized algorithm for consensus
optimization. The algorithm runs over a network in which the agents communicate
with their neighbors and perform local computation. In the proposed algorithm,
each agent can compute and communicate independently at different times, for
different durations, with the information it has even if the latest information
from its neighbors is not yet available. Such an asynchronous algorithm reduces
the time that agents would otherwise waste idle because of communication delays
or because their neighbors are slower. It also eliminates the need for a global
clock for synchronization. Mathematically, the algorithm involves both primal
and dual variables, uses fixed step-size parameters, and provably converges to
the exact solution under a bounded delay assumption and a random agent
assumption. When running synchronously, the algorithm performs just as well as
existing competitive synchronous algorithms such as PG-EXTRA, which diverges
without synchronization. Numerical experiments confirm the theoretical findings
and illustrate the performance of the proposed algorithm.
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Deep reinforcement learning (RL) can acquire complex behaviors from low-level
inputs, such as images. However, real-world applications of such methods
require generalizing to the vast variability of the real world. Deep networks
are known to achieve remarkable generalization when provided with massive
amounts of labeled data, but can we provide this breadth of experience to an RL
agent, such as a robot? The robot might continuously learn as it explores the
world around it, even while deployed. However, this learning requires access to
a reward function, which is often hard to measure in real-world domains, where
the reward could depend on, for example, unknown positions of objects or the
emotional state of the user. Conversely, it is often quite practical to provide
the agent with reward functions in a limited set of situations, such as when a
human supervisor is present or in a controlled setting. Can we make use of this
limited supervision, and still benefit from the breadth of experience an agent
might collect on its own? In this paper, we formalize this problem as
semisupervised reinforcement learning, where the reward function can only be
evaluated in a set of “labeled” MDPs, and the agent must generalize its
behavior to the wide range of states it might encounter in a set of “unlabeled”
MDPs, by using experience from both settings. Our proposed method infers the
task objective in the unlabeled MDPs through an algorithm that resembles
inverse RL, using the agent’s own prior experience in the labeled MDPs as a
kind of demonstration of optimal behavior. We evaluate our method on
challenging tasks that require control directly from images, and show that our
approach can improve the generalization of a learned deep neural network policy
by using experience for which no reward function is available. We also show
that our method outperforms direct supervised learning of the reward.
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
Comments: 13 pages, 4 figures, ICLR2017 Submission
Subjects: Learning (cs.LG); Information Theory (cs.IT)
We present a variational approximation to the information bottleneck of
Tishby et al. (1999). This variational approach allows us to parameterize the
information bottleneck model using a neural network and leverage the
reparameterization trick for efficient training. We call this method “Deep
Variational Information Bottleneck”, or Deep VIB. We show that models trained
with the VIB objective outperform those that are trained with other forms of
regularization, in terms of generalization performance and robustness to
adversarial attack.
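The objective can be sketched for one example under stated assumptions. These are a Gaussian encoder q(z|x) = N(mu, diag(exp(log_var))), a fixed standard-normal marginal r(z) = N(0, I), and a decoder that outputs class log-probabilities for a sampled z. The loss is the cross-entropy plus beta times the closed-form KL divergence. The function names are illustrative, and the decoder output in the example is a hand-picked stand-in.

```python
import math
import random
from typing import List


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vib_loss(class_log_probs: List[float], label: int,
             mu: List[float], log_var: List[float], beta: float) -> float:
    """Cross-entropy of the decoder's prediction plus beta times the KL term."""
    return -class_log_probs[label] + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, -0.5], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)         # would be fed to the decoder
    log_probs = [math.log(0.7), math.log(0.3)]   # stand-in decoder output for z
    print(z, vib_loss(log_probs, label=0, mu=mu, log_var=log_var, beta=1e-3))
```

Because z is a deterministic function of (mu, log_var) and independent noise, gradients can flow through the sample to the encoder, which is what makes training efficient.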
Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke
Comments: 10 pages, What If workshop NIPS 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The ability to perform effective off-policy learning would revolutionize the
process of building better interactive systems, such as search engines and
recommendation systems for e-commerce, computational advertising and news.
Recent approaches for off-policy evaluation and learning in these settings
appear promising. With this paper, we provide real-world data and a
standardized test-bed to systematically investigate these algorithms using data
from display advertising. In particular, we consider the problem of filling a
banner ad with an aggregate of multiple products the user may want to purchase.
This paper presents our test-bed, the sanity checks we ran to ensure its
validity, and shows results comparing state-of-the-art off-policy learning
methods like doubly robust optimization, POEM, and reductions to supervised
learning using regression baselines. Our results show experimental evidence
that recent off-policy learning methods can improve upon state-of-the-art
supervised learning techniques on a large-scale real-world data set.
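As background for the estimators being compared, the simplest off-policy estimator is inverse propensity scoring (IPS). It reweights each logged reward by the ratio between the new policy's probability of the logged action and the logging policy's propensity. Doubly robust methods and POEM build on this idea. The minimal sketch below assumes logs of (context, action, reward, logging propensity) tuples and treats each action atomically, ignoring the slate structure of the banner-filling task.

```python
from typing import Callable, Hashable, List, Tuple

LogEntry = Tuple[Hashable, Hashable, float, float]  # (context, action, reward, propensity)


def ips_estimate(logs: List[LogEntry],
                 target_prob: Callable[[Hashable, Hashable], float]) -> float:
    """Estimate the target policy's mean reward from logged interactions.

    Unbiased when propensities are correct and positive wherever the target
    policy puts mass.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, propensity in logs:
        total += reward * target_prob(context, action) / propensity
    return total / len(logs)


def always_a(context: Hashable, action: Hashable) -> float:
    """Deterministic target policy that always shows ad 'a'."""
    return 1.0 if action == "a" else 0.0


if __name__ == "__main__":
    # Uniform logging over two ads; only ad "a" was clicked.
    logs = [("user1", "a", 1.0, 0.5), ("user1", "b", 0.0, 0.5)]
    print(ips_estimate(logs, always_a))
```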
Beilun Wang, Ji Gao, Yanjun Qi
Comments: 20 pages, submitting to ICLR 2017
Subjects: Learning (cs.LG)
Recent literature has pointed out that machine learning classifiers,
including deep neural networks (DNN), are vulnerable to adversarial samples:
maliciously created inputs that force a machine learning classifier to
produce wrong output labels. Multiple studies have tried to analyze and thus
harden machine learning classifiers under such adversarial noise (AN). However,
these studies are mostly empirical and provide little understanding of the
underlying principles that enable evaluation of the robustness of a classifier
against AN. This paper proposes a unified framework using two metric spaces to
evaluate classifiers’ robustness against AN and provides general guidance for
hardening such classifiers. The central idea of our work is that for a certain
classification task, the robustness of a classifier $f_1$ against AN is decided
by both $f_1$ and its oracle $f_2$ (like a human annotator of that specific
task). In particular: (1) By adding the oracle $f_2$ into the framework, we
provide a general definition of the adversarial sample problem. (2) We
theoretically formulate a definition that decides whether a classifier is always
robust against AN (strong-robustness). (3) Using two metric spaces
$(X_1, d_1)$ and $(X_2, d_2)$ defined by $f_1$ and $f_2$ respectively, we prove
that the topological equivalence between $(X_1, d_1)$ and $(X_2, d_2)$ is
sufficient in deciding whether $f_1$ is strong-robust at test time, or not.
(4) By training a DNN classifier using the Siamese architecture, we propose a new
defense strategy, “Siamese training”, to intuitively approach topological
equivalence between $(X_1, d_1)$ and $(X_2, d_2)$. Experimental results show that
Siamese training helps multiple DNN models achieve better accuracy compared to
previous defense strategies in an adversarial setting. DNN models after Siamese
training exhibit better robustness than the state-of-the-art baselines.
Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey
Subjects: Learning (cs.LG)
Recurrent Neural Networks (RNNs) have been successfully used in many
applications. However, the problem of learning long-term dependencies in
sequences using these networks is still a major challenge. Recent methods have
been suggested to solve this problem by constraining the transition matrix to
be unitary during training, which ensures that its norm is exactly equal to
one. These methods either have limited expressiveness or scale poorly with the
size of the network when compared to the simple RNN case, especially in an
online learning setting. Our contributions are as follows. We first show that
constraining the transition matrix to be unitary is a special case of an
orthogonal constraint. Therefore, it may not be necessary to work with complex-valued
matrices. Then we present a new parametrisation of the transition matrix
which allows efficient training of an RNN while ensuring that the transition
matrix is always orthogonal. Using our approach, one online gradient step can,
in the worst case, be performed in time complexity $\mathcal{O}(T n^2)$, where
$T$ and $n$ are the length of the input sequence and the size of the hidden
layer respectively. This time complexity is the same as the simple RNN case.
Finally, we test our new parametrisation on problems with long-term
dependencies. Our results suggest that the orthogonal constraint on the
transition matrix has similar benefits to the unitary constraint.
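One standard way to keep a real matrix exactly orthogonal throughout training is to parametrise it as a product of Householder reflections H(u) = I - 2 u u^T / (u^T u). We assume this construction here because the abstract does not spell the parametrisation out. Each reflection can be applied to a hidden vector in O(n) time without forming the matrix, so n reflections cost O(n^2) per time step, consistent with the complexity stated above. A pure-Python sketch:

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def reflect(u: Vector, h: Vector) -> Vector:
    """Apply H(u) = I - 2 u u^T / (u^T u) to h in O(n) time."""
    scale = 2.0 * sum(ui * hi for ui, hi in zip(u, h)) / sum(ui * ui for ui in u)
    return [hi - scale * ui for ui, hi in zip(u, h)]


def apply_transition(reflections: List[Vector], h: Vector) -> Vector:
    """Compute W h for W = H(u_1) H(u_2) ... H(u_m): the rightmost factor acts first."""
    for u in reversed(reflections):
        h = reflect(u, h)
    return h


def dense_transition(reflections: List[Vector]) -> Matrix:
    """Materialise W column by column (for checking only)."""
    n = len(reflections[0])
    columns = [apply_transition(reflections, [1.0 if i == j else 0.0 for i in range(n)])
               for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    us = [[1.0, 2.0, 0.5], [0.3, -1.0, 2.0], [0.0, 1.0, 1.0]]
    w = dense_transition(us)
    wtw = [[sum(w[k][i] * w[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    print(wtw)  # approximately the identity matrix
```

Gradient updates to the vectors u_i can never break orthogonality, because every product of reflections is orthogonal by construction.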
Singh Vijendra, Hemjyotsana Parashar, Nisha Vasudeva
Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
The decision tree is an important method for both induction research and data
mining, and is mainly used for classification and prediction. The ID3
algorithm is the most widely used decision tree algorithm so far. In
this paper, ID3’s tendency to choose attributes with many values is
discussed, and a new decision tree algorithm, an improved version of ID3, is
proposed. In our proposed algorithm, attributes are divided into groups
and then we apply the selection measure 5 to these groups. If the information
gain is not good, the attribute values are again divided into groups. These
steps are repeated until we get a good classification/misclassification ratio.
The proposed algorithm classifies the data sets more accurately and efficiently.
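The bias that motivates the paper is easy to reproduce. The sketch below computes ID3's information gain and shows that an attribute with a distinct value for every row, such as a record ID, achieves the maximum possible gain even though it is useless for prediction. The attribute-grouping remedy proposed in the paper is not implemented here, and the toy data are made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, Hashable]], attr: str,
                     labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    rows = [{"id": 1, "windy": False}, {"id": 2, "windy": True},
            {"id": 3, "windy": False}, {"id": 4, "windy": True}]
    labels = ["yes", "no", "yes", "yes"]
    print(information_gain(rows, "id", labels))     # about 0.811
    print(information_gain(rows, "windy", labels))  # about 0.311
```

Each "id" value isolates a single row, so every branch is pure and the gain equals the full label entropy. This is the many-valued preference the grouping step is meant to counter.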
Zizhan Zheng, Ness B. Shroff, Prasant Mohapatra
Comments: 9 pages, 2 figures; accepted by the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA, Feb. 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cybersecurity is increasingly threatened by advanced and persistent attacks.
As these attacks are often designed to disable a system (or a critical
resource, e.g., a user account) repeatedly, it is crucial for the defender to
keep updating its security measures to strike a balance between the risk of
being compromised and the cost of security updates. Moreover, these decisions
often need to be made with limited and delayed feedback due to the stealthy
nature of advanced attacks. In addition to targeted attacks, such an optimal
timing policy under incomplete information has broad applications in
cybersecurity. Examples include key rotation, password change, application of
patches, and virtual machine refreshing. However, rigorous studies of optimal
timing are rare. Further, existing solutions typically rely on a pre-defined
attack model that is known to the defender, which is often not the case in
practice. In this work, we make an initial effort towards achieving optimal
timing of security updates in the face of unknown stealthy attacks. We consider
a variant of the influential FlipIt game model with asymmetric feedback and
unknown attack time distribution, which provides a general model for consecutive
security updates. The defender’s problem is then modeled as a time associative
bandit problem with dependent arms. We derive upper confidence bound based
learning policies that achieve low regret compared with optimal periodic
defense strategies that can only be derived when attack time distributions are
known.
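For readers unfamiliar with upper-confidence-bound policies, the basic UCB1 rule picks the arm with the highest empirical mean plus an exploration bonus that shrinks as the arm is played more. In the defender's setting, an arm might hypothetically be a candidate interval between security updates, with reward equal to negative cost. This is only the textbook rule. The paper's policies additionally handle dependent arms and delayed, asymmetric feedback.

```python
import math
from typing import List


def ucb1_choose(counts: List[int], means: List[float], t: int) -> int:
    """Return the arm maximising mean + sqrt(2 ln t / n); unplayed arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def ucb1_update(counts: List[int], means: List[float], arm: int, reward: float) -> None:
    """Incrementally update the chosen arm's play count and mean reward."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


if __name__ == "__main__":
    counts, means = [0, 0, 0], [0.0, 0.0, 0.0]
    true_reward = [-0.8, -0.3, -0.5]  # hypothetical negative costs per update interval
    for t in range(1, 501):
        arm = ucb1_choose(counts, means, t)
        ucb1_update(counts, means, arm, true_reward[arm])
    print(counts)  # most plays go to arm 1, the cheapest interval
```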
Maria-Florina Balcan, Hongyang Zhang
Comments: 24 pages, 5 figures in NIPS 2016
Subjects: Learning (cs.LG)
We study the problem of recovering an incomplete $m \times n$ matrix of rank
$r$ with columns arriving online over time. This is known as the problem of
life-long matrix completion, and is widely applied to recommendation systems,
computer vision, system identification, etc. The challenge is to design
provable algorithms tolerant to a large amount of noise, with small sample
complexity. In this work, we give algorithms achieving strong guarantees under
two realistic noise models. In bounded deterministic noise, an adversary can
add any bounded yet unstructured noise to each column. For this problem, we
present an algorithm that returns a matrix of small error, with sample
complexity almost as small as the best prior results in the noiseless case. For
sparse random noise, where the corrupted columns are sparse and drawn randomly,
we give an algorithm that exactly recovers a $\mu_0$-incoherent matrix with
probability at least $1-\delta$ with sample complexity as small as
$O\left(\mu_0 r n \log(r/\delta)\right)$. This result advances the
state-of-the-art work and matches the lower bound in a worst case. We also
study the scenario where the hidden matrix lies on a mixture of subspaces and
show that the sample complexity can be even smaller. Our proposed algorithms
perform well experimentally in both synthetic and real-world datasets.
Ehsan Amid, Aristides Gionis, Antti Ukkonen
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of metric learning subject to a set of constraints on
relative-distance comparisons between the data items. Such constraints are
meant to reflect side-information that is not expressed directly in the feature
vectors of the data items. The relative-distance constraints used in this work
are particularly effective in expressing structures at a finer level of detail
than must-link (ML) and cannot-link (CL) constraints, which are most commonly
used for semi-supervised clustering. Relative-distance constraints are thus
useful in settings where providing an ML or a CL constraint is difficult
because the granularity of the true clustering is unknown.
Our main contribution is an efficient algorithm for learning a kernel matrix
using the log determinant divergence (a variant of the Bregman divergence),
subject to a set of relative-distance constraints. The learned kernel
matrix can then be employed by many different kernel methods in a wide range of
applications. In our experimental evaluations, we consider a semi-supervised
clustering setting and show empirically that kernels found by our algorithm
yield clusterings of higher quality than existing approaches that either use
ML/CL constraints or a different means to implement the supervision using
relative comparisons.
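For reference, the LogDet divergence between positive-definite n-by-n matrices K and K0 is D(K, K0) = tr(K K0^{-1}) - log det(K K0^{-1}) - n. It is zero exactly when K = K0, and it is not symmetric. The dependency-free sketch below evaluates the divergence. It does not implement the constrained learning algorithm itself.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def inverse_and_det(a: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan elimination with partial pivoting; returns (A^{-1}, det A)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the determinant's sign
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m], det


def logdet_divergence(k: Matrix, k0: Matrix) -> float:
    """D(K, K0) = tr(K K0^{-1}) - log det(K K0^{-1}) - n for positive-definite K, K0."""
    n = len(k)
    k0_inv, det_k0 = inverse_and_det(k0)
    _, det_k = inverse_and_det(k)
    trace = sum(k[i][j] * k0_inv[j][i] for i in range(n) for j in range(n))
    return trace - math.log(det_k / det_k0) - n


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    k = [[2.0, 1.0], [1.0, 2.0]]
    print(logdet_divergence(k, k))         # approximately 0
    print(logdet_divergence(k, identity))  # 4 - log 3 - 2, about 0.901
```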
Joachim van der Herten, Ivo Couckuyt, Tom Dhaene
Comments: 5 pages, 3 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Student-$t$ processes have recently been proposed as an appealing alternative
non-parametric function prior. They feature enhanced flexibility and
predictive variance. In this work, the use of Student-$t$ processes is explored
for multi-objective Bayesian optimization. In particular, an analytical
expression for the hypervolume-based probability of improvement is developed
for independent Student-$t$ process priors of the objectives. Its effectiveness
is shown on a multi-objective optimization problem which is known to be
difficult with traditional Gaussian processes.
Wesley Tansey, Edward W. Lowe Jr., James G. Scott
Comments: Accepted to the NIPS 2016 Workshop on Machine Learning for Health
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)
Smartphone apps that enable users to easily track their diets have become
widespread in the last decade. This has created an opportunity to discover new
insights into obesity and weight loss by analyzing the eating habits of the
users of such apps. In this paper, we present diet2vec: an approach to modeling
latent structure in a massive database of electronic diet journals. Through an
iterative contract-and-expand process, our model learns real-valued embeddings
of users’ diets, as well as embeddings for individual foods and meals. We
demonstrate the effectiveness of our approach on a real dataset of 55K users of
the popular diet-tracking app LoseIt (this http URL). To the
best of our knowledge, this is the largest fine-grained diet tracking study in
the history of nutrition and obesity research. Our results suggest that
diet2vec finds interpretable results at all levels, discovering intuitive
representations of foods, meals, and diets.
Michael B. Chang, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum
Comments: Under review as a conference paper for ICLR 2017. 11 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
We present the Neural Physics Engine (NPE), an object-based neural network
architecture for learning predictive models of intuitive physics. We propose a
factorization of a physical scene into composable object-based representations
and also the NPE architecture whose compositional structure factorizes object
dynamics into pairwise interactions. Our approach draws on the strengths of
both symbolic and neural approaches: like a symbolic physics engine, the NPE is
endowed with generic notions of objects and their interactions, but as a neural
network it can also be trained via stochastic gradient descent to adapt to
specific object properties and dynamics of different worlds. We evaluate the
efficacy of our approach on simple rigid body dynamics in two-dimensional
worlds. By comparing to less structured architectures, we show that our model’s
compositional representation of the structure in physical interactions improves
its ability to predict movement, generalize to different numbers of objects,
and infer latent properties of objects such as mass.
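The pairwise factorization is what lets one model handle any number of objects: each object's update is a sum of effects from its neighbors. The minimal sketch below uses a hand-written `toy_repulsion` as a stand-in for the learned network and a fixed `radius` to define the neighborhood. Both are assumptions for illustration, not the NPE's actual components.

```python
import math
from typing import Callable, List, Tuple

Obj = Tuple[float, float, float, float]  # (x, y, vx, vy)
Effect = Tuple[float, float]


def predict_velocities(objects: List[Obj],
                       pair_effect: Callable[[Obj, Obj], Effect],
                       radius: float) -> List[Effect]:
    """Next velocity of each object = its velocity + summed pairwise effects
    from every other object within `radius`."""
    predictions = []
    for i, focus in enumerate(objects):
        ex, ey = 0.0, 0.0
        for j, context in enumerate(objects):
            if i == j or math.dist(focus[:2], context[:2]) > radius:
                continue
            dx, dy = pair_effect(focus, context)
            ex, ey = ex + dx, ey + dy
        predictions.append((focus[2] + ex, focus[3] + ey))
    return predictions


def toy_repulsion(focus: Obj, context: Obj) -> Effect:
    """Stand-in for a learned pairwise function: push apart, proportional to offset."""
    return 0.1 * (focus[0] - context[0]), 0.1 * (focus[1] - context[1])


if __name__ == "__main__":
    objects = [(0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (5.0, 5.0, 1.0, 0.0)]
    print(predict_velocities(objects, toy_repulsion, radius=2.0))
```

Adding or removing objects changes only the list passed in, not the model, which is the property behind generalizing to different numbers of objects.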
Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Rezende, Koray Kavukcuoglu
Comments: Published in NIPS 2016
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Reasoning about objects, relations, and physics is central to human
intelligence, and a key goal of artificial intelligence. Here we introduce the
interaction network, a model which can reason about how objects in complex
systems interact, supporting dynamical predictions, as well as inferences about
the abstract properties of the system. Our model takes graphs as input,
performs object- and relation-centric reasoning in a way that is analogous to a
simulation, and is implemented using deep neural networks. We evaluate its
ability to reason about several challenging physical domains: n-body problems,
rigid-body collision, and non-rigid dynamics. Our results show it can be
trained to accurately simulate the physical trajectories of dozens of objects
over thousands of time steps, estimate abstract quantities such as energy, and
generalize automatically to systems with different numbers and configurations
of objects and relations. Our interaction network implementation is the first
general-purpose, learnable physics engine, and a powerful general framework for
reasoning about objects and relations in a wide variety of complex real-world
domains.
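The object- and relation-centric computation can be sketched as one step of message passing over the input graph. This is a simplified illustration, not the authors' implementation: relations are assumed to be (sender, receiver, attribute) triples, external effects are omitted, and `spring_relation` and `euler_object` are hypothetical fixed functions standing in for the learned relation and object models. The toy models treat each relation as a 1-D spring and take one semi-implicit Euler step with unit mass.

```python
def interaction_step(objects, relations, relation_model, object_model, effect_dim):
    # 1. Relation-centric: compute one effect per relation (edge).
    # 2. Sum the effects arriving at each receiver.
    aggregated = [[0.0] * effect_dim for _ in objects]
    for sender, receiver, attr in relations:
        effect = relation_model(objects[sender], objects[receiver], attr)
        aggregated[receiver] = [a + e for a, e in zip(aggregated[receiver], effect)]
    # 3. Object-centric: update each object from its own state and aggregated effect.
    return [object_model(obj, agg) for obj, agg in zip(objects, aggregated)]

DT = 0.1  # hypothetical time step

def spring_relation(sender, receiver, stiffness):
    # Force on the receiver pulling it toward the sender (rest length 0).
    return [stiffness * (sender[0] - receiver[0])]

def euler_object(obj, force):
    # Objects are (position, velocity) with unit mass.
    x, v = obj
    v_new = v + DT * force[0]
    return (x + DT * v_new, v_new)

objects = [(0.0, 0.0), (1.0, 0.0)]
relations = [(0, 1, 1.0), (1, 0, 1.0)]
next_objects = interaction_step(objects, relations, spring_relation, euler_object, effect_dim=1)
```

Adding objects or relations only lengthens the lists; the relation and object models are shared across all edges and nodes, which is the property behind generalization to different numbers and configurations of objects.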
Sven Banisch, Eckehard Olbrich
Comments: Accepted for publication in the Journal of Artificial Societies and Social Simulation (JASSS)
Subjects: Economics (q-fin.EC); Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)
In this paper, we develop an agent-based version of the Diamond search
equilibrium model, also called the Coconut Model. In this model, agents are faced
with production decisions that have to be evaluated based on their expectations
about the future utility of the produced entity, which in turn depends on the
global production level via a trading mechanism. While the original dynamical
systems formulation assumes an infinite number of homogeneously adapting agents
obeying strong rationality conditions, the agent-based setting allows us to
discuss the effects of heterogeneous and adaptive expectations and enables the
analysis of non-equilibrium trajectories. Starting from a baseline
implementation that matches the asymptotic behavior of the original model, we
show how agent heterogeneity can be accounted for in the aggregate dynamical
equations. We then show that when agents adapt their strategies by a simple
temporal difference learning scheme, the system converges to one of the fixed
points of the original system. Systematic simulations reveal that this is the
only stable equilibrium solution.
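The abstract does not spell out the learning rule, so the following is only a generic tabular TD(0) update of the kind a "simple temporal difference learning scheme" usually refers to. The two state labels, the learning rate `alpha` and the discount `gamma` are hypothetical choices, not the paper's parameters.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error

# Hypothetical two-state agent: currently holding a produced good or not.
values = {"holding": 0.0, "not_holding": 0.0}
td0_update(values, "holding", reward=1.0, next_state="not_holding")
```

Repeated updates move each value estimate toward the observed reward plus the discounted value of the successor state; in an agent-based setting every agent would keep its own `values` table, which is one way heterogeneous expectations can arise.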
He Wen, Shuchang Zhou, Zhe Liang, Yuxiang Zhang, Dieqiao Feng, Xinyu Zhou, Cong Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Fully convolutional neural networks (FCNs) give accurate, per-pixel prediction for
input images and have applications such as semantic segmentation. However, a
typical FCN requires a large amount of floating-point computation and large
run-time memory, which limits its usability. We propose a method to
train a Bit Fully Convolution Network (BFCN), a fully convolutional neural
network that has low bit-width weights and activations. Because most of its
computation-intensive convolutions are accomplished between low bit-width
numbers, a BFCN can be accelerated by an efficient bit-convolution
implementation. On CPU, the dot product operation between two bit vectors can
be reduced to bitwise operations and popcounts, which can offer much higher
throughput than 32-bit multiplications and additions.
To validate the effectiveness of BFCN, we conduct experiments on the PASCAL
VOC 2012 semantic segmentation task and on Cityscapes. Our BFCN with 1-bit weights
and 2-bit activations, which runs 7.8x faster on CPU or requires less than 1%
of the resources on FPGA, achieves performance comparable to its 32-bit
counterpart.
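The reduction of low bit-width dot products to bitwise operations and popcounts can be sketched in plain Python. This assumes unsigned encodings (weights in {0, 1}, activations as small unsigned integers) and splits multi-bit values into bit planes; it illustrates the general bit-convolution idea, not the BFCN kernel, and real implementations pack bits into machine words and use hardware popcount instructions. `pack_bits` and `bitplane_dot` are hypothetical helper names.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer (element k -> bit k)."""
    word = 0
    for k, bit in enumerate(bits):
        if bit:
            word |= 1 << k
    return word

def popcount(word):
    """Number of set bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(word).count("1")

def bitplane_dot(a, b, a_width, b_width):
    """Dot product of two unsigned low bit-width vectors using only AND and popcount.

    Each element is a sum of its bits weighted by powers of two, so
    a . b = sum over (i, j) of 2^(i + j) * popcount(plane_i(a) AND plane_j(b)).
    """
    planes_a = [pack_bits([(x >> i) & 1 for x in a]) for i in range(a_width)]
    planes_b = [pack_bits([(y >> j) & 1 for y in b]) for j in range(b_width)]
    total = 0
    for i, pa in enumerate(planes_a):
        for j, pb in enumerate(planes_b):
            total += popcount(pa & pb) << (i + j)
    return total

weights = [1, 0, 1, 1]      # 1-bit weights (assumed unsigned {0, 1} encoding)
activations = [3, 2, 1, 0]  # 2-bit unsigned activations
result = bitplane_dot(weights, activations, a_width=1, b_width=2)
```

With 1-bit weights and 2-bit activations, each pair of packed words needs only two AND-plus-popcount passes, instead of one multiply-add per element.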
G. Ferré, T. Haut, K. Barros
Subjects: Computational Physics (physics.comp-ph); Learning (cs.LG); Machine Learning (stat.ML)
Recent machine learning methods make it possible to model the potential energy of
atomic configurations with chemical-level accuracy (as obtained from
ab initio calculations) and at speeds suitable for molecular dynamics
simulation. The best performance is achieved when the known physical constraints
are encoded in the machine learning models. For example, the atomic energy is
invariant under global translations and rotations; it is also invariant to
permutations of same-species atoms. Although simple to state, these symmetries
are complicated to encode into machine learning algorithms. In this paper, we
present a machine learning approach based on graph theory that naturally
incorporates translation, rotation, and permutation symmetries. Specifically,
we use a random walk graph kernel to measure the similarity of two adjacency
matrices, each of which represents a local atomic environment. We show on a
standard benchmark that our Graph Approximated Energy (GRAPE) method is
competitive with state-of-the-art kernel methods. Furthermore, the GRAPE
framework is flexible and admits many possible extensions.
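A random walk graph kernel compares two graphs by counting walks they have in common, which can be done on their direct (Kronecker) product graph. The sketch below uses the standard geometric random walk kernel, truncated after `max_len` steps: k(A, B) = sum over k of lam^k * 1^T (A kron B)^k 1. The decay `lam`, the truncation length and the toy adjacency matrices are assumptions, and GRAPE's exact construction of local atomic environments may differ.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(size)] for r in range(size)]

def random_walk_kernel(A, B, lam=0.1, max_len=10):
    """Truncated geometric random walk kernel: sum_k lam^k * 1^T (A kron B)^k 1."""
    W = kron(A, B)
    size = len(W)
    walks = [1.0] * size  # W^0 applied to the all-ones vector
    total = float(size)   # k = 0 term
    for k in range(1, max_len + 1):
        walks = [sum(W[r][c] * walks[c] for c in range(size)) for r in range(size)]
        total += lam ** k * sum(walks)  # common walks of length k, weighted by lam^k
    return total

path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph, centre labelled 1
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path graph, centre labelled 0
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Relabeling atoms permutes the rows and columns of an adjacency matrix but leaves the walk counts unchanged, so `path_a` and `path_b` give the same kernel value against any third graph. Truncating the series avoids the convergence condition the infinite sum would need (lam below the reciprocal of the product graph's largest eigenvalue).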
Jonathan Scarlett, Alfonso Martinez, Albert Guillén i Fàbregas
Comments: Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
This paper studies channel coding for the discrete memoryless multiple-access
channel (MAC) with a given (possibly suboptimal) decoding rule. A multi-letter
successive decoding rule depending on an arbitrary non-negative decoding metric
is considered, and achievable rate regions and error exponents are derived both
for the standard MAC (independent codebooks) and for the cognitive MAC (one
user knows both messages) with superposition coding. In the cognitive case, the
rate region and error exponent are shown to be tight with respect to the
ensemble average. The rate regions are compared with those of the
maximum-metric decoder, and numerical examples are given for which successive
decoding yields a strictly higher sum rate for a given pair of input
distributions.
Amine Mezghani, A. Lee Swindlehurst
Comments: 5 pages, 1 figure, submitted to the International ITG Workshop on Smart Antennas (WSA 2017)
Subjects: Information Theory (cs.IT)
We provide a maximum likelihood formulation for the blind estimation of
massive mmWave MIMO channels while taking into account their underlying sparse
structure. The main advantage of this approach is that the overhead
due to pilot sequences can be reduced dramatically, especially when operating at
low SNR per antenna. The sparsity in the angular domain is exploited
as a key property to enable unambiguous blind separation of the users’
channels. Moreover, since only sparsity is assumed, the proposed
method is robust with respect to the statistical properties of the channel and
data, allows estimation in rapidly time-varying scenarios, and potentially enables
the separation of interfering users from adjacent base stations. Additionally,
a performance limit is derived based on the clairvoyant Cramér-Rao lower
bound. Simulation results demonstrate that this maximum likelihood formulation
yields superior estimation accuracy with reasonable computational complexity
and limited model assumptions.
Mengfan Zheng, Meixia Tao, Wen Chen, Cong Ling
Comments: 29 pages, 7 figures, submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
This paper studies polar coding for secure communications over the general
two-way wiretap channel, where two legitimate users communicate with each other
simultaneously while a passive eavesdropper overhears a combination of their
exchanged signals. The legitimate users wish to design a coding scheme such
that the interference between their codewords can be leveraged to jam the
eavesdropper. This security method is called coded cooperative jamming. In this
model, the eavesdropper observes a two-user multiple access channel (MAC).
Inspired by recent studies on polar coding for asymmetric channels,
Slepian-Wolf coding, MACs and general wiretap channels, we design a polar
code-based cooperative jamming code that achieves the whole secrecy rate region
of the general two-way wiretap channel under the strong secrecy criterion. To
align the polar indices properly, a multi-block strategy is used. For the
special case when the eavesdropper channel is degraded with respect to both
legitimate channels, a simplified scheme is proposed which can simultaneously
ensure reliability and weak secrecy within a single transmission block. An
example of the binary erasure channel case is given to demonstrate the
performance of our scheme.
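For the binary erasure channel, polarization of the synthetic channels can be tracked exactly: a BEC with erasure probability eps splits into a worse channel with erasure probability 2*eps - eps^2 and a better one with eps^2. The sketch below applies this recursion and then keeps the indices that are nearly noiseless for the legitimate receiver and nearly useless for the eavesdropper, the classical index partition for a degraded wiretap channel. It illustrates that building block only, under an assumed threshold `delta`; it is not the two-way cooperative jamming code of the paper, and the function names are hypothetical.

```python
def bec_polarize(eps, n):
    """Erasure probabilities of the 2**n synthetic channels obtained from BEC(eps)."""
    z = [eps]
    for _ in range(n):
        # Each channel splits into a (worse, better) pair.
        z = [w for e in z for w in (2 * e - e * e, e * e)]
    return z

def secrecy_indices(eps_bob, eps_eve, n, delta=0.05):
    """Indices good for the legitimate receiver and bad for the eavesdropper."""
    bob = bec_polarize(eps_bob, n)
    eve = bec_polarize(eps_eve, n)
    return [i for i in range(2 ** n) if bob[i] < delta and eve[i] > 1 - delta]

secret_positions = secrecy_indices(eps_bob=0.1, eps_eve=0.6, n=10)
```

The split preserves the average erasure probability, since (2*eps - eps^2) + eps^2 = 2*eps, and both maps are increasing in eps, so a degraded eavesdropper channel stays worse than the legitimate one at every index.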
Minjia Shi, Yan Liu, Patrick Solé
Comments: 12 pages, submitted on 27 October, 2016
Subjects: Information Theory (cs.IT)
We construct an infinite family of two-Lee-weight and three-Lee-weight codes
over the chain ring \(\mathbb{F}_p+u\mathbb{F}_p\). They have the algebraic
structure of abelian codes. Their Lee weight distribution is computed by using
Gauss sums. Then, using a linear Gray map, we obtain an infinite family of
abelian codes with few weights over \(\mathbb{F}_p\). In particular, we obtain an
infinite family of two-weight codes that meet the Griesmer bound with
equality. Finally, an application to secret sharing schemes is given.
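To see how a Gray map turns Lee weights over \(\mathbb{F}_p+u\mathbb{F}_p\) into Hamming weights over \(\mathbb{F}_p\), here is a sketch using one map common in this literature, \(\phi(a+ub)=(b, a+b, 2a+b, \dots, (p-1)a+b)\). Whether the paper uses exactly this map is an assumption. Ring elements are represented as pairs (a, b) standing for a + ub, and the helper names are hypothetical.

```python
def gray_map(a, b, p):
    """Image of a + u*b under phi(a + ub) = (b, a + b, 2a + b, ..., (p-1)a + b)."""
    return [(k * a + b) % p for k in range(p)]

def gray_image(vector, p):
    """Concatenate the Gray images of a vector of ring elements given as (a, b) pairs."""
    return [x for a, b in vector for x in gray_map(a, b, p)]

def lee_weight(vector, p):
    """Lee weight, defined here as the Hamming weight of the Gray image."""
    return sum(1 for x in gray_image(vector, p) if x != 0)
```

For p = 3, an element with a nonzero has Lee weight p - 1 = 2, because b + ka takes every value of \(\mathbb{F}_p\) exactly once as k varies, while a nonzero multiple of u has Lee weight p = 3. Since the map is \(\mathbb{F}_p\)-linear, the image of a linear code over the ring is a linear code over \(\mathbb{F}_p\) with the same weight distribution.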
Yan Liu, Minjia Shi, Patrick Solé
Comments: 15 pages, submitted on 21 November, 2016
Subjects: Information Theory (cs.IT)
We construct a class of three-Lee-weight and two infinite families of
five-Lee-weight codes over the ring
\(R=\mathbb{F}_2+v\mathbb{F}_2+v^2\mathbb{F}_2+v^3\mathbb{F}_2+v^4\mathbb{F}_2\), where \(v^5=1\). The same
ring occurs in the quintic construction of binary quasi-cyclic codes. The
length of these codes depends on the degree \(m\) of the ring extension. They have
the algebraic structure of abelian codes. Their Lee weight distribution is
computed by using character sums. Using a linear Gray map, we obtain three
families of binary abelian codes with few weights. In particular, we obtain a
class of three-weight codes which are optimal. Finally, an application to
secret sharing schemes is given.
Minjia Shi, Hongwei Zhu, Patrick Solé
Comments: 12 pages, submitted on 28 November, 2016
Subjects: Information Theory (cs.IT)
In this paper, we construct an infinite family of three-weight binary codes
from linear codes over the ring \(R=\mathbb{F}_2+v\mathbb{F}_2+v^2\mathbb{F}_2\),
where \(v^3=1\). These codes are defined as trace codes. They have the algebraic
structure of abelian codes. Their Lee weight distributions are computed by
employing character sums. The three-weight binary linear codes that we
construct are shown to be optimal when \(m\) is odd and \(m>1\). They are cubic,
that is to say quasi-cyclic of co-index three. An application to secret sharing
schemes is given.
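These papers build trace codes over rings and then apply a Gray map; the underlying idea is easiest to see over a field. The sketch below is a field analogue, not the ring construction of the paper: it takes \(\mathbb{F}_8\) defined by the primitive polynomial \(x^3+x+1\) (an assumed choice) and forms the codewords \((\mathrm{Tr}(ax))_{x\in\mathbb{F}_8^*}\). The result is the binary simplex code, in which every nonzero codeword has weight 4, so it is a one-weight code.

```python
M = 3             # extension degree: F_8 = F_2[x] / (x^3 + x + 1)
MODULUS = 0b1011  # x^3 + x + 1, a primitive polynomial over F_2 (assumed choice)

def gf_mul(a, b):
    """Multiply two elements of F_8 represented as 3-bit integers (polynomial basis)."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & (1 << M):  # reduce modulo x^3 + x + 1 when the degree reaches 3
            a ^= MODULUS
    return result

def trace(x):
    """Absolute trace Tr(x) = x + x^2 + x^4, which always lies in F_2 = {0, 1}."""
    total, power = 0, x
    for _ in range(M):
        total ^= power
        power = gf_mul(power, power)
    return total

def trace_codeword(a):
    """Codeword (Tr(a*x)) indexed by the nonzero elements x of F_8."""
    return [trace(gf_mul(a, x)) for x in range(1, 2 ** M)]

codebook = [trace_codeword(a) for a in range(2 ** M)]
```

The weight is constant because the trace is a balanced linear map: for a nonzero a, the product ax runs over all nonzero field elements, and exactly 2^(m-1) = 4 of them have trace 1.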
Yan Liu, Minjia Shi, Patrick Solé
Comments: 11 pages, submitted on 29 November, 2016
Subjects: Information Theory (cs.IT)
We construct an infinite family of two-Lee-weight and three-Lee-weight codes
over the non-chain ring
\(\mathbb{F}_p+u\mathbb{F}_p+v\mathbb{F}_p+uv\mathbb{F}_p\), where
\(u^2=0\), \(v^2=0\) and \(uv=vu\). These codes are defined as trace codes. They have the
algebraic structure of abelian codes. Their Lee weight distribution is computed
by using Gauss sums. With a linear Gray map, we obtain a class of abelian
three-weight codes and two-weight codes over \(\mathbb{F}_p\). In particular, the
two-weight codes we describe are shown to be optimal by application of the
Griesmer bound. We also discuss their dual Lee distance. Finally, an
application to secret sharing schemes is given.
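The Griesmer bound states that a linear \([n,k,d]_q\) code satisfies \(n \ge \sum_{i=0}^{k-1} \lceil d/q^i \rceil\); codes attaining equality are optimal in length for their dimension and minimum distance. A small checker, with hypothetical helper names:

```python
def griesmer_length(k, d, q):
    """Minimum length n allowed by the Griesmer bound for a linear [n, k, d]_q code."""
    return sum(-(-d // q ** i) for i in range(k))  # -(-d // m) is ceil(d / m) for integers

def meets_griesmer(n, k, d, q):
    """True when an [n, k, d]_q code attains the Griesmer bound with equality."""
    return n == griesmer_length(k, d, q)
```

For example, the binary simplex code [7, 3, 4] and the binary Hamming code [7, 4, 3] both meet the bound with equality (4 + 2 + 1 = 7 and 3 + 2 + 1 + 1 = 7).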
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
Comments: 13 pages, 4 figures, ICLR2017 Submission
Subjects: Learning (cs.LG); Information Theory (cs.IT)
We present a variational approximation to the information bottleneck of
Tishby et al. (1999). This variational approach allows us to parameterize the
information bottleneck model using a neural network and leverage the
reparameterization trick for efficient training. We call this method “Deep
Variational Information Bottleneck”, or Deep VIB. We show that models trained
with the VIB objective outperform those that are trained with other forms of
regularization, in terms of generalization performance and robustness to
adversarial attack.
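The VIB objective combines a prediction loss with a beta-weighted KL term between the stochastic encoder q(z|x) and a fixed marginal r(z). Assuming a diagonal Gaussian encoder and a standard normal r(z), the KL term has a closed form and sampling z uses the reparameterization trick, as in the helpers below. The classifier that produces `class_nll` is omitted, the encoder outputs are hand-picked, and the default `beta` is only illustrative.

```python
import math
import random

def reparameterize(mu, log_var, eps):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I) supplied by the caller."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def vib_loss(class_nll, mu, log_var, beta=1e-3):
    """Per-example objective: prediction loss plus beta times the KL (compression) term."""
    return class_nll + beta * kl_to_standard_normal(mu, log_var)

# Hypothetical encoder output for one example.
mu, log_var = [0.5, -0.2], [0.0, -1.0]
z = reparameterize(mu, log_var, [random.gauss(0.0, 1.0) for _ in mu])
loss = vib_loss(class_nll=0.7, mu=mu, log_var=log_var)
```

Because the noise enters only through `eps`, gradients of the loss flow through `mu` and `log_var`, which is what makes stochastic-gradient training of the encoder possible.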
Theo van Uem
Comments: 29 pages
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Information Theory (cs.IT)
The Hat Game (Ebert’s Hat Problem) received much attention at the beginning of
this century, not least because of its connections to coding theory and
computer science. There were publications in The New York Times, Die Zeit and
abcNews. Exact solutions with two colors are only known in the symmetric case
(equal probabilities for the two colors) when N=2^k-1 (using Hamming codes),
N=2^k (extended Hamming codes) and up to N=9 (using bounds on covering codes of
radius 1), where N is the number of players. How the probabilities and
strategies behave when the two colors are not equally likely (asymmetric case)
is an open problem. Since the symmetric case is hard, both mathematically and
from the point of view of computational complexity, we may expect the
asymmetric case to be harder and perhaps beyond the capabilities of
contemporary mathematics and computer science. However, there is a surprising
answer to the open problem: elementary mathematics in combination with adequate
computer programs suffices to do the job, and the new approach also gives new
insights into the classical symmetric case. Whereas the standard theory in the
symmetric case works with Hamming codes and covering sets of radius 1, the new
approach deals with adequate sets of radius N-1. Our main results in this paper
are a simple and effective way to analyze N-person two-color hat problems
and a dramatic decrease in computational complexity.
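The classical symmetric Hamming-code strategy for N = 2^k - 1 can be checked by brute force. The sketch below, for N = 7, treats a hat configuration as a Hamming codeword when its syndrome (XOR of the 1-based positions holding a 1) is zero. Each player looks at both possible completions of the configuration with their own hat: if one completion is a codeword, the player guesses the other color; otherwise the player passes. This is the standard strategy, not the paper's new radius N-1 approach, and the helper names are hypothetical.

```python
from itertools import product

N = 7  # N = 2**3 - 1 players, matching the [7, 4] Hamming code

def syndrome(config):
    """XOR of the 1-based positions holding a 1; zero exactly for Hamming codewords."""
    s = 0
    for pos, bit in enumerate(config, start=1):
        if bit:
            s ^= pos
    return s

def is_codeword(config):
    return syndrome(config) == 0

def guess(config, i):
    """Player i ignores their own hat; returns 0, 1, or None (pass)."""
    with_zero = list(config)
    with_zero[i] = 0
    with_one = list(config)
    with_one[i] = 1
    if is_codeword(with_zero):
        return 1  # avoid completing a codeword
    if is_codeword(with_one):
        return 0
    return None

def win_probability():
    """Fraction of the 2**N equally likely colorings on which the team wins."""
    wins = 0
    for config in product((0, 1), repeat=N):
        guesses = [(guess(config, i), config[i]) for i in range(N)]
        made = [(g, actual) for g, actual in guesses if g is not None]
        if made and all(g == actual for g, actual in made):
            wins += 1
    return wins / 2 ** N
```

The team loses exactly on the 16 codewords, where every player guesses wrong, and wins on the other 112 configurations, where only the player at the syndrome position speaks and is right; this gives N/(N+1) = 7/8.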