Mansoureh Aghabeig, Andrzej Jaszkiewicz
Subjects: Neural and Evolutionary Computing (cs.NE)
In this paper we systematically study the importance, i.e., the influence on
performance, of the main design elements that differentiate scalarizing
functions-based multiobjective evolutionary algorithms (MOEAs). This class of
MOEAs includes Multiobjective Genetic Local Search (MOGLS) and Multiobjective
Evolutionary Algorithm Based on Decomposition (MOEA/D), and has proved very
successful in multiple computational experiments and practical applications.
The two algorithms share a common structure and differ in only two main
aspects. Using three different multiobjective combinatorial optimization
problems, i.e., the multiobjective symmetric traveling salesperson problem, the
traveling salesperson problem with profits, and the multiobjective set covering
problem, we show that the main differentiating design element is the mechanism
for parent selection, while the choice of weight vectors, either random or
uniformly distributed, has practically negligible influence if the number of
uniform weight vectors is sufficiently large.
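The two ways of choosing weight vectors mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simplex-lattice construction of the uniform vectors, and the two scalarizing functions shown (weighted sum and weighted Chebyshev) are assumptions made for illustration, and the abstract does not state which scalarizing functions the experiments used.

```python
import itertools
import random


def uniform_weight_vectors(num_objectives, divisions):
    """Enumerate evenly spaced weight vectors on a simplex lattice.

    Every component is a multiple of 1/divisions and each vector sums to 1.
    (Hypothetical helper; the lattice construction is an assumption.)
    """
    vectors = []
    for combo in itertools.product(range(divisions + 1), repeat=num_objectives):
        if sum(combo) == divisions:
            vectors.append(tuple(c / divisions for c in combo))
    return vectors


def random_weight_vector(num_objectives, rng):
    """Draw a random weight vector by normalizing positive random samples.

    This is one simple choice; it is not uniform over the simplex.
    """
    raw = [1.0 - rng.random() for _ in range(num_objectives)]  # values in (0, 1]
    total = sum(raw)
    return tuple(r / total for r in raw)


def weighted_sum(objectives, weights):
    """Weighted-sum scalarizing function (to be minimized)."""
    return sum(w * f for w, f in zip(weights, objectives))


def chebyshev(objectives, weights, reference):
    """Weighted Chebyshev scalarizing function relative to a reference point."""
    return max(w * abs(f - z) for w, f, z in zip(weights, objectives, reference))
```

With two objectives and `divisions=4`, the lattice contains five vectors, from `(0.0, 1.0)` to `(1.0, 0.0)`; increasing `divisions` yields the "sufficiently large" uniform set the abstract refers to.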
Shumeet Baluja, Ian Fischer
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Multiple approaches for generating adversarial examples have been
proposed to attack deep neural networks. These approaches involve either
directly computing gradients with respect to the image pixels, or directly
solving an optimization on the image pixels. In this work, we present a
fundamentally new method for generating adversarial examples that is fast to
execute and provides exceptional diversity of output. We efficiently train
feed-forward neural networks in a self-supervised manner to generate
adversarial examples against a target network or set of networks. We call such
a network an Adversarial Transformation Network (ATN). ATNs are trained to
generate adversarial examples that minimally modify the classifier’s outputs
given the original input, while constraining the new classification to match an
adversarial target class. We present methods to train ATNs and analyze their
effectiveness targeting a variety of MNIST classifiers as well as the latest
state-of-the-art ImageNet classifier Inception ResNet v2.
Santiago Pascual, Antonio Bonafonte, Joan Serrà
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
Current speech enhancement techniques operate in the spectral domain and/or
exploit some higher-level feature. The majority of them tackle a limited number
of noise conditions and rely on first-order statistics. To circumvent these
issues, deep networks are being increasingly used, thanks to their ability to
learn complex functions from large example sets. In this work, we propose the
use of generative adversarial networks for speech enhancement. In contrast to
current techniques, we operate at the waveform level, training the model
end-to-end, and incorporate 28 speakers and 40 different noise conditions into
the same model, such that model parameters are shared across them. We evaluate
the proposed model using an independent, unseen test set with two speakers and
20 alternative noise conditions. The enhanced samples confirm the viability of
the proposed model, and both objective and subjective evaluations confirm its
effectiveness. With that, we open the exploration of generative
architectures for speech enhancement, which may progressively incorporate
further speech-centric design choices to improve their performance.
Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We describe a prototype dialogue response generation model for the customer
service domain at Amazon. The model, which is trained in a weakly supervised
fashion, measures the similarity between customer questions and agent answers
using a dual encoder network, a Siamese-like neural network architecture.
Answer templates are extracted from embeddings derived from past agent answers,
without turn-by-turn annotations. Responses to customer inquiries are generated
by selecting the best template from the final set of templates. We show that,
in a closed domain like customer service, the selected templates cover >70%
of past customer inquiries. Furthermore, the relevance of the model-selected
templates is significantly higher than templates selected by a standard tf-idf
baseline.
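For readers unfamiliar with the comparison point, the following is a minimal sketch of what a tf-idf template-selection baseline could look like. The abstract does not describe the baseline's details, so the tokenization, the smoothed idf formula, and the function names here are all assumptions rather than the authors' setup.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build sparse tf-idf vectors (dicts) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf; the exact formula is an illustrative choice.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_template(question, templates):
    """Return the template whose tf-idf vector is closest to the question.

    Question terms that never occur in any template are ignored. If nothing
    overlaps, every score is 0 and the first template is returned.
    """
    vectors, idf = tfidf_vectors(templates)
    counts = Counter(question.lower().split())
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(templates)), key=scores.__getitem__)
    return templates[best]
```

A dual encoder replaces the sparse tf-idf vectors with learned dense embeddings of questions and answers, while the final step (pick the highest-scoring template) stays the same.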
Nasim Souly, Concetto Spampinato, Mubarak Shah
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation has been a long-standing challenge in computer
vision. It aims to assign a label to each image pixel and needs a significant
amount of pixel-level annotated data, which is often unavailable. To address
this lack, we leverage, on the one hand, the massive amount of available
unlabeled or weakly labeled data and, on the other hand, non-real images
created through Generative Adversarial Networks (GANs). In particular, we
propose a semi-supervised framework based on GANs, which consists of a
generator network that provides extra training examples to a multi-class
classifier. The classifier acts as the discriminator in the GAN framework and
assigns each sample a label y from the K possible classes or marks it as a fake
sample (extra class). The underlying idea is that adding large amounts of fake
visual data forces real samples to be close in the feature space, enabling a
bottom-up clustering process, which, in turn, improves multiclass pixel
classification. To ensure higher quality of the generated images, and
consequently improved pixel classification, we extend the above framework by
adding weakly annotated data, i.e., we provide class-level information to the
generator. We tested our approaches on several challenging benchmark visual
datasets, i.e., PASCAL, SiftFlow, Stanford and CamVid, achieving performance
competitive with state-of-the-art semantic segmentation methods.
Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sparse coding (SC) is an automatic feature extraction and selection technique
that is widely used in unsupervised learning. However, conventional SC
vectorizes the input images, which breaks apart the local proximity of pixels
and destroys the elementary object structures of images. In this paper, we
propose a novel two-dimensional sparse coding (2DSC) scheme that represents the
input images as tensor-linear combinations under a novel algebraic
framework. 2DSC learns much more concise dictionaries because it uses the
circular convolution operator, so that the shifted versions of atoms learned by
conventional SC are treated as the same atom. We apply 2DSC to natural images
and demonstrate that 2DSC returns meaningful dictionaries for large patches.
Moreover, for multi-spectral image denoising, the proposed 2DSC reduces
computational costs with competitive performance in comparison with the
state-of-the-art algorithms.
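The claim that circular convolution lets shifted atoms be treated as one can be checked on a toy one-dimensional example. The sketch below is only an illustration of that property, not the 2DSC algorithm; the function names are hypothetical and the signal and atom are assumed to have equal length.

```python
def circular_convolve(signal, atom):
    """Circular convolution of two equal-length sequences."""
    n = len(signal)
    if len(atom) != n:
        raise ValueError("signal and atom must have the same length")
    return [
        sum(signal[(i - j) % n] * atom[j] for j in range(n))
        for i in range(n)
    ]


def circular_shift(values, k):
    """Cyclically shift a sequence k positions to the right."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]


coefficients = [1, 2, 3, 4]
atom = [1, 0, -1, 0]

# Convolving with a shifted atom gives the same result as shifting the
# coefficients instead, so one stored atom covers all of its shifts.
with_shifted_atom = circular_convolve(coefficients, circular_shift(atom, 1))
with_shifted_coeffs = circular_convolve(circular_shift(coefficients, 1), atom)
```

Because both results are equal, a dictionary built on circular convolution does not need a separate entry for each translated copy of a pattern.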
Kushal Kafle, Christopher Kanan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In visual question answering (VQA), an algorithm must answer text-based
questions about images. While multiple datasets for VQA have been created since
late 2014, they all have flaws in both their content and the way algorithms are
evaluated on them. As a result, evaluation scores are inflated and
predominantly determined by answering easier questions, making it difficult to
compare different methods. In this paper, we analyze existing VQA algorithms
using a new dataset. It contains over 1.6 million questions organized into 12
different categories. We also introduce questions that are meaningless for a
given image to force a VQA system to reason about image content. We propose new
evaluation schemes that compensate for over-represented question-types and make
it easier to study the strengths and weaknesses of algorithms. We analyze the
performance of both baseline and state-of-the-art VQA models, including
multi-modal compact bilinear pooling (MCB), neural module networks, and
recurrent answering units. Our experiments establish how attention helps
certain categories more than others, determine which models work better than
others, and explain how simple models (e.g. MLP) can surpass more complex
models (MCB) by simply learning to answer large, easy question categories.
Zhiyuan Shi, Tae-Kyun Kim
Comments: Conference paper, CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Existing RNN-based approaches for action recognition from depth sequences
require either skeleton joints or hand-crafted depth features as inputs. An
end-to-end model, mapping from raw depth maps to action classes, is
non-trivial to design because 1) a single-channel depth map lacks texture,
which weakens its discriminative power, and 2) the available set of depth
training data is relatively small. To address these challenges, we propose to
learn an RNN driven by privileged information (PI) in three steps. First, an
encoder is pre-trained to learn a joint embedding of depth appearance and PI
(i.e., skeleton joints). The
learned embedding layers are then tuned in the learning step, aiming to
optimize the network by exploiting PI in a form of multi-task loss. However,
exploiting PI as a secondary task provides little help to improve the
performance of a primary task (i.e. classification) due to the gap between
them. Finally, a bridging matrix is defined to connect two tasks by discovering
latent PI in the refining step. Our PI-based classification loss maintains a
consistency between latent PI and predicted distribution. The latent PI and
network are iteratively estimated and updated in an expectation-maximization
procedure. The proposed learning process provides greater discriminative power
to model subtle depth differences, while helping avoid overfitting the scarce
training data. Our experiments show significant performance gains over
state-of-the-art methods on three public benchmark datasets and our newly
collected Blanket dataset.
Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional networks reach top quality in pixel-level object tracking but
require a large amount of training data (1k ~ 10k) to deliver such results. We
propose a new training strategy which achieves state-of-the-art results across
three evaluation datasets while using 20x ~ 100x less annotated data than
competing methods. Instead of using large training sets hoping to generalize
across domains, we generate in-domain training data using the provided
annotation on the first frame of each video to synthesize (“lucid dream”)
plausible future video frames. In-domain per-video training data allows us to
train high quality appearance- and motion-based models, as well as tune the
post-processing stage. This approach makes it possible to reach competitive results even
when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the tracking task a smaller training set
that is closer to the target domain is more effective. This changes the mindset
regarding how many training samples and general “objectness” knowledge are
required for the object tracking task.
Maxim Romanov, Matthew Thomas Miller, Sarah Bowen Savant, Benjamin Kiessling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
The OpenITI team has achieved Optical Character Recognition (OCR) accuracy
rates for classical Arabic-script texts in the high nineties. These numbers are
based on our tests of seven different Arabic-script texts of varying quality
and typefaces, totaling over 7,000 lines. These accuracy rates not only
represent a distinct improvement over the actual accuracy rates of the various
proprietary OCR options for classical Arabic-script texts, but, equally
important, they are produced using open-source OCR software, thus enabling
us to make this Arabic-script OCR technology freely available to the broader
Islamic, Persian, and Arabic Studies communities.
Abel Gonzalez-Garcia, Davide Modolo, Vittorio Ferrari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a semantic part detection approach that effectively leverages
object information. We use the object appearance and its class as indicators of
what parts to expect. We also model the expected relative location of parts
inside the objects based on their appearance. We achieve this with a new
network module, called OffsetNet, that efficiently predicts a variable number
of part locations within a given object. Our model incorporates all these cues
to detect parts in the context of their objects. This leads to significantly
higher performance for the challenging task of part detection compared to using
part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to
other part detection methods on both PASCAL-Part and CUB200-2011 datasets.
Rajeev Ranjan, Carlos D. Castillo, Rama Chellappa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, the performance of face verification systems has
significantly improved using deep convolutional neural networks (DCNNs). A
typical pipeline for face verification includes training a deep network for
subject classification with softmax loss, using the penultimate layer output as
the feature descriptor, and generating a cosine similarity score given a pair
of face images. The softmax loss function does not optimize the features to
have a higher similarity score for positive pairs and a lower similarity score for
negative pairs, which leads to a performance gap. In this paper, we add an
L2-constraint to the feature descriptors which restricts them to lie on a
hypersphere of a fixed radius. This module can be easily implemented using
existing deep learning frameworks. We show that integrating this simple step in
the training pipeline significantly boosts the performance of face
verification. Specifically, we achieve state-of-the-art results on the
challenging IJB-A dataset, achieving True Accept Rates of 0.863 and 0.910 at
False Accept Rates 0.0001 and 0.001 respectively on the face verification
protocol.
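The L2-constraint described above amounts to rescaling each feature descriptor so that its Euclidean norm equals a fixed radius. The sketch below shows that step on plain Python lists; the parameter name `alpha` for the radius and the function names are assumptions for illustration, and in a real network this would be a differentiable layer placed before the softmax loss.

```python
import math


def l2_constrain(feature, alpha):
    """Rescale a feature vector so it lies on a hypersphere of radius alpha."""
    norm = math.sqrt(sum(x * x for x in feature))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [alpha * x / norm for x in feature]


def cosine_similarity(a, b):
    """Cosine similarity score used to compare a pair of face descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Since every constrained descriptor has the same norm, the cosine similarity of a pair depends only on the angle between the descriptors, which is the quantity the verification score measures.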
Yangyang Li, Ruqian Lu
Comments: 31 pages, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA)
Symmetric Positive Definite (SPD) matrices have been widely used as feature
descriptors in image recognition. However, the dimension of an SPD matrix built
by image feature descriptors is usually high, so dimensionality reduction
techniques oriented to SPD matrices are needed. Existing manifold learning
algorithms apply only to high-dimensional vector-form data. For
high-dimensional SPD matrices, these algorithms cannot be applied directly to
the matrix-form data; instead, the matrix must first be transformed into a
long vector, whose dimension is then reduced. This, however, breaks the
spatial structure of the SPD matrix space. To overcome this limitation, we
propose a new dimension reduction algorithm on SPD matrix space that
transforms high-dimensional SPD matrices into lower-dimensional SPD matrices.
Our work is based on the fact that the set of all SPD matrices of the same
size is known to have a Lie group structure, and we aim to transfer the
manifold learning algorithm to the SPD matrix Lie group. We use the basic idea
of the manifold learning algorithm LPP (locality preserving projection) to
construct the corresponding Laplacian matrix on the SPD matrix Lie group. We
therefore call our approach Lie-LPP to emphasize its Lie group character.
Finally, our method yields a lower-dimensional and more discriminable SPD
matrix Lie group. We also show by experiments that our approach achieves
effective results on human action recognition and human face recognition.
Ancong Wu, Wei-Shi Zheng, Jianhuang Lai
Comments: IEEE Transactions on Image Processing Early Access
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person re-identification (re-id) aims to match people across non-overlapping
camera views. So far, RGB-based appearance has been widely used in most existing
works. However, when people appear under extreme illumination or change
clothes, RGB appearance-based re-id methods tend to fail. To overcome
this problem, we propose to exploit depth information to provide more invariant
body shape and skeleton information regardless of illumination and color
change. More specifically, we exploit a depth voxel covariance descriptor and
further propose a locally rotation invariant depth shape descriptor called
the Eigen-depth feature to describe pedestrian body shape. We prove that the
distance between any two covariance matrices on the Riemannian manifold is
equivalent to the Euclidean distance between the corresponding Eigen-depth
features. Furthermore, we propose a kernelized implicit feature transfer scheme
to estimate the Eigen-depth feature implicitly from RGB images when depth
information is not available. We find that combining the estimated depth
features with RGB-based appearance features can sometimes help to better reduce
visual ambiguities of appearance features caused by illumination and similar
clothes. The effectiveness of our models was validated on publicly available
depth pedestrian datasets as compared to related methods for person
re-identification.
Seong Joon Oh, Mario Fritz, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)
Users like sharing personal photos with others through social media. At the
same time, they might want to make automatic identification in such photos
difficult or even impossible. Classic obfuscation methods such as blurring are
not only unpleasant but also not as effective as one would expect. Recent
studies on adversarial image perturbations (AIP) suggest that it is possible to
confuse recognition systems effectively without unpleasant artifacts. However,
in the presence of counter measures against AIPs, it is unclear how effective
AIPs would be, in particular when the choice of counter measure is unknown. Game
theory provides tools for studying the interaction between agents with
uncertainties in the strategies. We introduce a general game theoretical
framework for the user-recogniser dynamics, and present a case study that
involves current state of the art AIP and person recognition techniques. We
derive the optimal strategy for the user that assures an upper bound on the
recognition rate independent of the recogniser’s counter measure.
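A strategy that bounds the recognition rate regardless of the counter measure can be illustrated with a small minimax computation. This is a simplified sketch restricted to pure strategies over a hypothetical payoff table; the abstract does not say whether the derived optimal strategy is pure or mixed, and the numbers below are made up for the example.

```python
def minimax_pure_strategy(recognition_rates):
    """Pick the user strategy with the lowest worst-case recognition rate.

    recognition_rates[i][j] is the recognition rate when the user plays
    strategy i and the recogniser plays counter measure j.
    Returns (index of chosen strategy, guaranteed upper bound).
    """
    worst_case = [max(row) for row in recognition_rates]
    best = min(range(len(worst_case)), key=worst_case.__getitem__)
    return best, worst_case[best]


# Hypothetical table: 3 user strategies (rows) x 2 counter measures (columns).
rates = [
    [0.2, 0.9],
    [0.4, 0.5],
    [0.6, 0.3],
]
choice, bound = minimax_pure_strategy(rates)
```

Here the second strategy is chosen because its recognition rate never exceeds 0.5, whichever counter measure the recogniser applies.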
Silvano Galliani, Charis Lanaras, Dimitrios Marmanis, Emmanuel Baltsavias, Konrad Schindler
Comments: Submitted to ICCV 2017 (10 pages, 8 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We describe a novel method for blind, single-image spectral super-resolution.
While conventional super-resolution aims to increase the spatial resolution of
an input image, our goal is to spectrally enhance the input, i.e., generate an
image with the same spatial resolution, but a greatly increased number of
narrow (hyper-spectral) wavelength bands. Just as the spatial statistics of
natural images have rich structure, which one can exploit as a prior to predict
high-frequency content from a low-resolution image, the same is true in
the spectral domain: the materials and lighting conditions of the observed
world induce structure in the spectrum of wavelengths observed at a given
pixel. Surprisingly, very little work exists that attempts to use this
observation to achieve blind spectral super-resolution from single images. We
start from the conjecture that, just like in the spatial domain, we can learn
the statistics of natural image spectra, and with their help generate finely
resolved hyper-spectral images from RGB input. Technically, we follow the
current best practice and implement a convolutional neural network (CNN), which
is trained to carry out the end-to-end mapping from an entire RGB image to the
corresponding hyperspectral image of equal size. We demonstrate spectral
super-resolution both for conventional RGB images and for multi-spectral
satellite data, outperforming the state-of-the-art.
Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a deep convolutional decoder architecture that can generate
volumetric 3D outputs in a compute- and memory-efficient manner by using an
octree representation. The network learns to predict both the structure of the
octree, and the occupancy values of individual cells. This makes it a
particularly valuable technique for generating 3D shapes. In contrast to
standard decoders acting on regular voxel grids, the architecture does not have
cubic complexity. This allows representing much higher resolution outputs with
a limited memory budget. We demonstrate this in several application domains,
including 3D convolutional autoencoders, generation of objects and whole scenes
from high-level representations, and shape from a single image.
Rodrigo M. Ferreira, Ricardo M. Marcacini
Comments: in Portuguese
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The task of counting eucalyptus trees from aerial images collected by
unmanned aerial vehicles (UAVs) has been frequently explored by techniques of
estimation of the basal area, i.e., by determining the expected number of trees
based on sampling techniques. An alternative is the use of machine learning to
identify patterns that represent a tree unit, and then search for the
occurrence of these patterns throughout the image. This strategy depends on a
supervised image segmentation step to define predefined interest regions. Thus,
it is possible to automate the counting of eucalyptus trees in these images,
thereby increasing the efficiency of the eucalyptus forest inventory
management. In this paper, we evaluated 20 different classifiers for the image
segmentation task. A real sample was used to analyze the tree counting task
in a practical setting. The results show that it is possible to
automate this task with a 0.7% counting error, in particular by using strategies
based on a combination of classifiers. Moreover, we present some performance
considerations about each classifier that can be useful as a basis for
decision-making in future tasks.
Shohei Kumagai, Kazuhiro Hotta, Takio Kurita
Comments: 8 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a crowd counting method. Crowd counting is difficult
because of the large appearance changes of a target, which are caused by
density and scale changes. Conventional crowd counting methods generally use a
single predictor (e.g., regression or a multi-class classifier). However, a
single predictor cannot count targets with large appearance changes well. In
this paper, we propose to predict the number of targets using multiple CNNs,
each specialized to a specific appearance, and those CNNs are adaptively
selected according to the appearance of a test image. By integrating the
selected CNNs, the proposed method is robust to large appearance changes. In
experiments, we confirm that the proposed method can count crowds with lower
counting error than a single CNN and than an integration of CNNs with fixed
weights. Moreover, we confirm that each predictor is automatically specialized
to a specific appearance.
Wei Liu, Xiaogang Chen, Chunhua Shen, Jingyi Yu, Qiang Wu, Jie Yang
Comments: This paper is an extension of our previous work at arXiv:1512.08103 and arXiv:1506.05187
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The process of using one image to guide the filtering process of another one
is called Guided Image Filtering (GIF). The main challenge of GIF is the
structure inconsistency between the guidance image and the target image.
Besides, noise in the target image is also a challenging issue especially when
it is heavy. In this paper, we propose a general framework for Robust Guided
Image Filtering (RGIF), which contains a data term and a smoothness term, to
solve the two issues mentioned above. The data term lets our model
simultaneously denoise the target image and perform GIF, which makes it robust
against heavy noise. The smoothness term makes use of the properties of
both the guidance image and the target image, which makes it robust against
structure inconsistency. While the resulting model is highly non-convex, it can
be solved through the proposed Iteratively Re-weighted Least Squares (IRLS) in
an efficient manner. For challenging applications such as guided depth map
upsampling, we further develop a data-driven parameter optimization scheme to
properly determine the parameter in our model. This optimization scheme can
help to preserve small structures and sharp depth edges even for a large
upsampling factor (8x for example). Moreover, the specially designed structure
of the data term and the smoothness term makes our model perform well in
edge-preserving smoothing for single-image tasks (i.e., the guidance image is
the target image itself). This paper is an extension of our previous work [1],
[2].
Fei Jiang, Xiao-Yang Liu, Hongtao Lu, Ruimin Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sparse coding (SC) is an unsupervised learning scheme that has received
increasing interest in recent years. However, conventional SC
vectorizes the input images, which destroys the intrinsic spatial structures
of the images. In this paper, we propose a novel graph regularized tensor
sparse coding (GTSC) scheme for image representation. GTSC preserves the local
proximity of elementary structures in the image by adopting the newly proposed
tubal-tensor representation. Simultaneously, it considers the intrinsic
geometric properties by imposing graph regularization, which has been
successfully applied to uncover the geometric distribution of image data.
Moreover, the sparse representations returned by GTSC have better physical
explanations, as the key operation (i.e., circular convolution) in the
tubal-tensor model preserves the shifting invariance property. Experimental
results on image clustering demonstrate the effectiveness of the proposed
scheme.
Jiří Hladůvka, Bui Thi Mai Phuong, Richard Ljuhar, Davul Ljuhar, Ana M Rodrigues, Jaime C Branco, Helena Canhão
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The relationship between knee osteoarthritis progression and changes in
tibial bone structure has long been recognized and various texture descriptors
have been proposed to detect early osteoarthritis (OA) from radiographs. This
work aims to investigate (1) femoral textures as an OA indicator and (2) the
potential of entropy as a computationally efficient alternative to established
texture descriptors.
We design a robust semi-automatically placed layout for regions of interest
(ROI), compute the Hurst coefficient and the entropy in each ROI, and employ
statistical and machine learning methods to evaluate feature combinations.
Based on 153 high-resolution radiographs, our results identify medial femur
as an effective univariate descriptor, with significance comparable to medial
tibia. Entropy is shown to contribute to classification performance. A linear
five-feature classifier combining femur, entropic and standard texture
descriptors achieves an AUC of 0.85, outperforming the state of the art by
roughly 0.1.
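To indicate why entropy is computationally cheap as a texture feature, here is a minimal sketch that computes the Shannon entropy of the gray-level histogram of a region of interest. The abstract does not specify which entropy definition or binning the authors use, so treating the ROI as a flat list of integer gray levels and using base-2 Shannon entropy are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram of an ROI.

    pixels is a flat sequence of gray levels; one pass builds the histogram
    and one pass over the distinct levels sums the entropy terms.
    """
    if not pixels:
        raise ValueError("ROI must contain at least one pixel")
    total = len(pixels)
    counts = Counter(pixels)
    return sum(
        (c / total) * math.log2(total / c)
        for c in counts.values()
    )
```

A perfectly uniform ROI has entropy 0, while an ROI split evenly between two gray levels has entropy 1 bit, so higher values indicate a more varied texture.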
Lei Xiao, Felix Heide, Wolfgang Heidrich, Bernhard Schölkopf, Michael Hirsch
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, several discriminative learning approaches have been proposed for
effective image restoration, achieving a convincing trade-off between image
quality and computational efficiency. However, these methods require separate
training for each restoration task (e.g., denoising, deblurring, demosaicing)
and problem condition (e.g., noise level of input images). This makes it
time-consuming and difficult to encompass all tasks and conditions during
training. In this paper, we propose a discriminative transfer learning method
that incorporates formal proximal optimization and discriminative learning for
general image restoration. The method requires single-pass training and
allows for reuse across various problems and conditions while achieving an
efficiency comparable to previous discriminative approaches. Furthermore, after
being trained, our model can be easily transferred to new likelihood terms to
solve untrained tasks, or be combined with existing priors to further improve
image restoration quality.
Yu Guan, Thomas Ploetz
Comments: accepted for publication in ACM IMWUT (Ubicomp) 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Recently, deep learning (DL) methods have been introduced very successfully
into human activity recognition (HAR) scenarios in ubiquitous and wearable
computing. In particular, the prospect of overcoming the need for manual feature
design, combined with superior classification capabilities, renders deep neural
networks very attractive for real-life HAR applications. Even though DL-based
approaches now outperform the state of the art in a number of recognition
tasks in the field, substantial challenges remain. Most prominently, issues
with real-life datasets, typically including imbalanced datasets and
problematic data quality, still limit the effectiveness of activity recognition
using wearables. In this paper we tackle such challenges through Ensembles of
deep Long Short Term Memory (LSTM) networks. We have developed modified
training procedures for LSTM networks and combine sets of diverse LSTM learners
into classifier collectives. We demonstrate, both formally and empirically,
that Ensembles of deep LSTM learners outperform the individual LSTM networks.
Through an extensive experimental evaluation on three standard benchmarks
(Opportunity, PAMAP2, Skoda) we demonstrate the excellent recognition
capabilities of our approach and its potential for real-life applications of
human activity recognition.
Christoph Benzmüller
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI)
Classical higher-order logic, when utilized as a meta-logic in which various
other (classical and non-classical) logics can be shallowly embedded, is well
suited for realising a universal logic reasoning approach. Universal logic
reasoning in turn, as envisioned already by Leibniz, may support the rigorous
formalisation and deep logical analysis of rational arguments within machines.
A respective universal logic reasoning framework is described and a range of
exemplary applications are discussed. In the future, universal logic reasoning
in combination with appropriate, controlled forms of rational argumentation may
serve as a communication layer between humans and intelligent machines.
Aleksey Buzmakov, Sergei O. Kuznetsov, Amedeo Napoli
Subjects: Artificial Intelligence (cs.AI)
The exponential explosion of the set of patterns is one of the main
challenges in pattern mining. This challenge is approached by introducing a
constraint for pattern selection. One of the first constraints proposed in
pattern mining is support (frequency) of a pattern in a dataset. Frequency is
an anti-monotonic function, i.e., given an infrequent pattern, all its
superpatterns are not frequent. However, many other constraints for pattern
selection are neither monotonic nor anti-monotonic, which makes it difficult to
generate patterns satisfying these constraints.
In order to deal with nonmonotonic constraints we introduce the notion of
“projection antimonotonicity” and the SOFIA algorithm, which allow generating the best
patterns for a class of nonmonotonic constraints. Cosine interest, robustness,
stability of closed itemsets, and the associated delta-measure are among these
constraints. SOFIA starts from light descriptions of transactions in a dataset (a
small set of items in the case of itemset description) and then iteratively
adds more information to these descriptions (more items with indication of
tidsets they describe).
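The anti-monotonicity of support mentioned above can be illustrated with a small sketch. This is not the SOFIA algorithm; it is a plain level-wise frequent-itemset search on a hypothetical toy dataset, and the names `support` and `frequent_itemsets` are illustrative assumptions. It only shows why an infrequent pattern never needs to be extended.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search that extends only patterns that are still frequent.

    Because support is anti-monotonic, every superpattern of an infrequent
    pattern is also infrequent, so discarded patterns are never extended.
    """
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = list(level)
    while level:
        # Candidate superpatterns: add one new item to a frequent pattern.
        candidates = {p | {i} for p in level for i in items if i not in p}
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
        result.extend(level)
    return result


# Hypothetical toy dataset (not taken from the paper).
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a"},
)]

for itemset in frequent_itemsets(transactions, min_support=0.3):
    print(sorted(itemset), support(itemset, transactions))
```

Constraints that are neither monotonic nor anti-monotonic do not permit this kind of pruning, which is the difficulty the abstract refers to.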
Anthony Bagnall, Aaron Bostrom, James Large, Jason Lines
Subjects: Artificial Intelligence (cs.AI)
There is now a broad range of time series classification (TSC) algorithms
designed to exploit different representations of the data. These have been
evaluated on a range of problems hosted at the UCR-UEA TSC Archive
(www.timeseriesclassification.com), and there have been extensive comparative
studies. However, our understanding of why one algorithm outperforms another is
still anecdotal at best. This series of experiments is meant to help provide
insights into what sort of discriminatory features in the data lead one set of
algorithms that exploit a particular representation to be better than other
algorithms. We categorise five different feature spaces exploited by TSC
algorithms then design data simulators to generate randomised data from each
representation. We describe what results we expected from each class of
algorithm and data representation, then observe whether these prior beliefs are
supported by the experimental evidence. We provide an open source
implementation of all the simulators to allow for the controlled testing of
hypotheses relating to classifier performance on different data
representations. We identify many surprising results that confounded our
expectations, and use these results to highlight how an oversimplified view of
classifier structure can often lead to erroneous prior beliefs. We believe
ensembling can often overcome prior bias, and our results support the belief by
showing that the ensemble approach adopted by the Hierarchical Collective of
Transform based Ensembles (HIVE-COTE) is significantly better than the
alternatives when the data representation is unknown, and is significantly
better than, or not significantly worse than, the best other approach on three
out of five of the individual simulators.
Jingchi Jiang, Chao Zhao, Yi Guan, Qiubin Yu
Comments: 32 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI)
Based on a weighted knowledge graph to represent first-order knowledge and
combining it with a probabilistic model, we propose a methodology for the
creation of a medical knowledge network (MKN) in medical diagnosis. When a set
of symptoms is activated for a specific patient, we can generate a ground
medical knowledge network composed of symptom nodes and potential disease
nodes. By incorporating a Boltzmann machine into the potential function of a
Markov network, we investigated the joint probability distribution of the MKN.
In order to deal with numerical symptoms, a multivariate inference model is
presented that uses conditional probability. In addition, the weights for the
knowledge graph were efficiently learned from manually annotated Chinese
Electronic Medical Records (CEMRs). In our experiments, we found numerically
that the optimum choice of the quality of disease node and the expression of
symptom variable can improve the effectiveness of medical diagnosis. Our
experimental results comparing a Markov logic network and the logistic
regression algorithm on an actual CEMR database indicate that our method holds
promise and that MKN can facilitate studies of intelligent diagnosis.
Kushal Kafle, Christopher Kanan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In visual question answering (VQA), an algorithm must answer text-based
questions about images. While multiple datasets for VQA have been created since
late 2014, they all have flaws in both their content and the way algorithms are
evaluated on them. As a result, evaluation scores are inflated and
predominantly determined by answering easier questions, making it difficult to
compare different methods. In this paper, we analyze existing VQA algorithms
using a new dataset. It contains over 1.6 million questions organized into 12
different categories. We also introduce questions that are meaningless for a
given image to force a VQA system to reason about image content. We propose new
evaluation schemes that compensate for over-represented question-types and make
it easier to study the strengths and weaknesses of algorithms. We analyze the
performance of both baseline and state-of-the-art VQA models, including
multi-modal compact bilinear pooling (MCB), neural module networks, and
recurrent answering units. Our experiments establish how attention helps
certain categories more than others, determine which models work better than
others, and explain how simple models (e.g. MLP) can surpass more complex
models (MCB) by simply learning to answer large, easy question categories.
Santiago Castro, Matías Cubero, Diego Garat, Guillermo Moncecchi
Comments: Preprint version, without referral
Journal-ref: Presented in Iberamia 2016. The final publication is available at
link.springer.com:
https://link.springer.com/chapter/10.1007%2F978-3-319-47955-2_12
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
While humor has been historically studied from a psychological, cognitive and
linguistic standpoint, its study from a computational perspective is an area
yet to be explored in Computational Linguistics. There exist some previous
works, but a characterization of humor that allows its automatic recognition
and generation is far from being specified. In this work we build a
crowdsourced corpus of labeled tweets, annotated according to their humor value,
letting the annotators subjectively decide which are humorous. A humor
classifier for Spanish tweets is assembled based on supervised learning,
reaching a precision of 84% and a recall of 69%.
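For readers who want a single summary figure, the reported precision and recall can be combined into an F1 score (their harmonic mean). The abstract does not report F1; the value below is derived arithmetic only, and `f1_score` is an illustrative helper, not code from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.84, 0.69)
print(f"F1 = {f1:.3f}")
```

With the reported figures this works out to roughly 0.76.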
Brett W. Israelsen, Nisar Ahmed, Kenneth Center, Roderick Green, Winston Bennett Jr
Comments: submitted to JAIS for review
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
This work studies how an AI-controlled dog-fighting agent with tunable
decision-making parameters can learn to optimize performance against an
intelligent adversary, as measured by a stochastic objective function evaluated
on simulated combat engagements. Gaussian process Bayesian optimization (GPBO)
techniques are developed to automatically learn global Gaussian Process (GP)
surrogate models, which provide statistical performance predictions in both
explored and unexplored areas of the parameter space. This allows a learning
engine to sample full-combat simulations at parameter values that are most
likely to optimize performance and also provide highly informative data points
for improving future predictions. However, standard GPBO methods do not provide
a reliable surrogate model for the highly volatile objective functions found in
aerial combat, and thus do not reliably identify global maxima. These issues
are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point
Sampling (HRMS) techniques. Simulation studies show that HRMS improves the
accuracy of GP surrogate models, allowing AI decision-makers to more accurately
predict performance and efficiently tune parameters.
Dorna Bandari, Shuo Xiang, Jure Leskovec
Comments: Submitted to KDD ’17
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Different users can use a given Internet application in many different ways.
The ability to record detailed event logs of user in-application activity
allows us to discover ways in which the application is being used. This enables
personalization and also leads to important insights with actionable business
and product outcomes.
Here we study the problem of user session categorization, where the goal is
to automatically discover categories/classes of user in-session behavior using
event logs, and then consistently categorize each user session into the
discovered classes. We develop a three-stage approach which uses clustering to
discover categories of sessions, then builds classifiers to classify new
sessions into the discovered categories, and finally performs daily
classification in a distributed pipeline. An important innovation of our
approach is selecting a set of events as long-tail features, and replacing them
with a new feature that is less sensitive to product experimentation and
logging changes. This allows for robust and stable identification of session
types even though the underlying application is constantly changing. We deploy
the approach to Pinterest and demonstrate its effectiveness. We discover
insights that have consequences for product monetization, growth, and design.
Our solution classifies millions of user sessions daily and leads to actionable
insights.
Taylor Arnold
Comments: 17 pages; 4 figures
Subjects: Computation and Language (cs.CL); Computation (stat.CO)
The package cleanNLP provides a set of fast tools for converting a textual
corpus into a set of normalized tables. The underlying natural language
processing pipeline utilizes Stanford’s CoreNLP library, exposing a number of
annotation tasks for text written in English, French, German, and Spanish.
Annotators include tokenization, part of speech tagging, named entity
recognition, entity linking, sentiment analysis, dependency parsing,
coreference resolution, and information extraction.
Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
We describe a prototype dialogue response generation model for the customer
service domain at Amazon. The model, which is trained in a weakly supervised
fashion, measures the similarity between customer questions and agent answers
using a dual encoder network, a Siamese-like neural network architecture.
Answer templates are extracted from embeddings derived from past agent answers,
without turn-by-turn annotations. Responses to customer inquiries are generated
by selecting the best template from the final set of templates. We show that,
in a closed domain like customer service, the selected templates cover more than
70% of past customer inquiries. Furthermore, the relevance of the model-selected
templates is significantly higher than that of templates selected by a standard
tf-idf baseline.
Md Main Uddin Rony, Naeemul Hassan, Mohammad Yousuf
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
The use of alluring headlines (clickbait) to tempt readers has become a
growing practice. To survive in the highly competitive media industry, most
online media, including mainstream outlets, have started following this
practice. Although the widespread practice of clickbait makes readers’ reliance
on the media vulnerable, a large-scale analysis to reveal this fact is still
absent. In this paper, we analyze 1.67 million Facebook posts created by 153
media organizations to understand the extent of clickbait practice, its impact,
and user engagement, using a clickbait detection model that we developed.
The model uses distributed sub-word embeddings
learned from a large corpus. The accuracy of the model is 98.3%. Powered with
this model, we further study the distribution of topics in clickbait and
non-clickbait contents.
Benjamin D. Horne, Sibel Adali
Comments: Published at The 2nd International Workshop on News and Public Opinion at ICWSM
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
The problem of fake news has gained a lot of attention, as it is claimed to
have had a significant impact on the 2016 US presidential election. Fake news is
not a new problem and its spread in social networks is well-studied. Often an
underlying assumption in fake news discussion is that it is written to look
like real news, fooling the reader who does not check for reliability of the
sources or the arguments in its content. Through a unique study of three data
sets and features that capture the style and the language of articles, we show
that this assumption is not true. Fake news in most cases is more similar to
satire than to real news, leading us to conclude that persuasion in fake news
is achieved through heuristics rather than the strength of arguments. We show
that overall title structure and the use of proper nouns in titles are very
significant in differentiating fake from real news. This leads us to conclude
that fake news is targeted at audiences who are not likely to read beyond titles
and is aimed at creating mental associations between entities and claims.
Yongzhe Zhang, Hsiang-Shang Ko, Zhenjiang Hu
Comments: 12 pages, 11 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Pregel is a popular parallel computing model for dealing with large-scale
graphs. However, it can be tricky to implement graph algorithms correctly and
efficiently in Pregel’s vertex-centric model, as programmers need to carefully
restructure an algorithm in terms of supersteps and message passing, which are
low-level and detached from the algorithm descriptions. Some domain-specific
languages (DSLs) have been proposed to provide more intuitive ways to implement
graph algorithms, but none of them can flexibly describe remote access (reading
or writing attributes of other vertices through references), which leaves a
wide range of algorithms hard to implement.
To address this problem, we design and implement Palgol, a more declarative
and powerful DSL which supports remote access. In particular, programmers can
use a more declarative syntax called global field access to directly read data
on remote vertices. By structuring supersteps in a high-level vertex-centric
computation model and analyzing the logic patterns of global field access, we
provide a novel algorithm for compiling Palgol programs to efficient Pregel
code. We demonstrate the power of Palgol by using it to implement a set of
practical Pregel algorithms and compare them with hand-written code. The
evaluation results show that the efficiency of Palgol is comparable with that
of hand-written code.
Vidhya Tekken Valapil, Sandeep S. Kulkarni
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper, we present an algorithm that transforms a stabilizing program
that uses unbounded variables into a stabilizing program that uses bounded
variables and (practically bounded) physical time. While non-stabilizing
programs can deal with unbounded variables by assigning large enough but
bounded space, stabilizing programs that need to deal with arbitrary transient
faults cannot do the same since a transient fault may corrupt the variable to
its maximum value. Our transformation is based on two key concepts: free
counters and dependent counters. The former represents variables that can be
freely increased without affecting the correctness of the underlying program
and the latter represents temporary variables that become irrelevant after some
duration of time. We show that our transformation algorithm is applicable to
several problems including logical clocks, vector clocks, mutual exclusion,
leader election, diffusing computations, Paxos based consensus, and so on.
Moreover, our approach can also be used to bound counters used in earlier work
by Katz and Perry for adding stabilization. With our approach, it would be
possible to provide stabilization for a rich class of problems, by assigning
large enough but bounded space for variables.
Zahra Khatami, Hartmut Kaiser, J. Ramanujam
Comments: 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Maximizing the level of parallelism in applications can be achieved by minimizing
overheads due to load imbalances and waiting time due to memory latencies.
Compiler optimization is one of the most effective solutions to tackle this
problem. The compiler is able to detect the data dependencies in an application
and is able to analyze the specific sections of code for parallelization
potential. However, all of these techniques provided with a compiler are
usually applied at compile time, so they rely on static analysis, which is
insufficient for achieving maximum parallelism and producing desired
application scalability. One solution to address this challenge is the use of
runtime methods. This strategy can be implemented by deferring a certain amount
of code analysis to runtime. In this research, we improve the performance of
parallel applications generated by the OP2 compiler by leveraging HPX, a C++
runtime system, to provide runtime optimizations. These optimizations include
asynchronous tasking, loop interleaving, dynamic chunk sizing, and data
prefetching. The results of the research were evaluated using an Airfoil
application which showed a 40-50% improvement in parallel performance.
Ruonan Wang, Christopher Harris, Andreas Wicenec
Comments: 20 pages, journal article, 2016
Journal-ref: Astronomy and Computing, Volume 16, July 2016, Pages 146-154, ISSN
2213-1337
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM)
In this paper, we investigate the Casacore Table Data System (CTDS) used in
the casacore and CASA libraries, and methods to parallelize it. CTDS provides a
storage manager plugin mechanism for third-party developers to design and
implement their own CTDS storage managers. With this in mind, we looked
into various storage backend techniques that can possibly enable parallel I/O
for CTDS by implementing new storage managers. After carrying out benchmarks
showing the excellent parallel I/O throughput of the Adaptive IO System
(ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then
applied the CASA MSTransform frequency split task to verify the ADIOS Storage
Manager. We also ran a series of performance tests to examine the I/O
throughput in a massively parallel scenario.
Katarzyna Mazur, Bogdan Ksiezopolski
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Information management is one of the most significant issues in today’s data
centers. Selecting appropriate software and security mechanisms, managing energy
consumption effectively, and caring for the environment together require
a thorough analysis of the considered system. Besides these factors, financial
analysis of data center maintenance is another important aspect that needs to
be considered. Data centers are mission-critical components of all large
enterprises and frequently cost hundreds of millions of dollars to build, yet
few high-level executives understand the true cost of operating such
facilities. Costs are typically spread across IT, networking, and
facilities, which makes management of these costs and assessment of
alternatives difficult. This paper presents research on multilevel analysis
of data center management and presents an approach to estimate the true total
costs of operating data center physical facilities, taking into account the
proper management of the information flow.
Divya Shyam Singha, G.B.L. Chowdarya, D Roy Mahapatraa
Comments: 6 pages, 6 figures, ISSS conference
Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
This paper presents a real-time, vibration-based identification technique using
measured frequency response functions (FRFs) under random vibration loading.
Artificial Neural Networks (ANNs) are trained to map damage fingerprints to
damage characteristic parameters. Principal component analysis (PCA)
is used to tackle the problem of high dimensionality and high noise
of data, which is common for industrial structures. The present study considers
cracks, rivet hole expansion, and redundant uniform mass as damages on the
structure. Frequency response function data, after being reduced in size using
PCA, is fed to individual neural networks to localize and predict the severity
of damage on the structure. The system of ANNs is trained with both numerical and
experimental model data to make the system reliable and robust. The methodology
is applied to a numerical model of a stiffened panel structure, where damages are
confined close to the stiffener. The results showed that, in all the cases
considered, it is possible to localize and predict severity of the damage
occurrence with very good accuracy and reliability.
Rundong Du, Barry Drake, Haesun Park
Comments: 9 pages, Submitted to a conference, Feb. 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We present a hybrid method for latent information discovery on data sets
containing both text content and connection structure based on constrained low
rank approximation. The new method jointly optimizes the Nonnegative Matrix
Factorization (NMF) objective function for text clustering and the Symmetric
NMF (SymNMF) objective function for graph clustering. We propose an effective
algorithm for the joint NMF objective function, based on a block coordinate
descent (BCD) framework. The proposed hybrid method discovers content
associations via latent connections found using SymNMF. The method can also be
applied with a natural conversion of the problem when a hypergraph formulation
is used or the content is associated with hypergraph edges.
Experimental results show that by simultaneously utilizing both content and
connection structure, our hybrid method produces higher quality clustering
results compared to the other NMF clustering methods that use content alone
(standard NMF) or connection structure alone (SymNMF). We also present some
interesting applications to several types of real world data such as citation
recommendations of papers. The hybrid method proposed in this paper can also be
applied to general data expressed with both feature space vectors and pairwise
similarities and can be extended to the case with multiple feature spaces or
multiple similarity measures.
Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig
Comments: 9 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Early stopping is a widely used technique to prevent poor generalization
performance when training an over-expressive model by means of gradient-based
optimization. To find a good point to halt the optimizer, a common practice is
to split the dataset into a training and a smaller validation set to obtain an
ongoing estimate of the generalization performance. In this paper we propose a
novel early stopping criterion which is based on fast-to-compute, local
statistics of the computed gradients and entirely removes the need for a
held-out validation set. Our experiments show that this is a viable approach in
the setting of least-squares and logistic regression as well as neural
networks.
Santiago Pascual, Antonio Bonafonte, Joan Serrà
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
Current speech enhancement techniques operate on the spectral domain and/or
exploit some higher-level feature. The majority of them tackle a limited number
of noise conditions and rely on first-order statistics. To circumvent these
issues, deep networks are being increasingly used, thanks to their ability to
learn complex functions from large example sets. In this work, we propose the
use of generative adversarial networks for speech enhancement. In contrast to
current techniques, we operate at the waveform level, training the model
end-to-end, and incorporate 28 speakers and 40 different noise conditions into
the same model, such that model parameters are shared across them. We evaluate
the proposed model using an independent, unseen test set with two speakers and
20 alternative noise conditions. The enhanced samples confirm the viability of
the proposed model, and both objective and subjective evaluations confirm its
effectiveness. With that, we open the exploration of generative
architectures for speech enhancement, which may progressively incorporate
further speech-centric design choices to improve their performance.
Sean McGregor, Rachel Houtman, Claire Montgomery, Ronald Metoyer, Thomas G. Dietterich
Subjects: Learning (cs.LG)
Managers of US National Forests must decide what policy to apply for dealing
with lightning-caused wildfires. Conflicts among stakeholders (e.g., timber
companies, home owners, and wildlife biologists) have often led to spirited
political debates and even violent eco-terrorism. One way to transform these
conflicts into multi-stakeholder negotiations is to provide a high-fidelity
simulation environment in which stakeholders can explore the space of
alternative policies and understand the tradeoffs therein. Such an environment
needs to support fast optimization of MDP policies so that users can adjust
reward functions and analyze the resulting optimal policies. This paper
assesses the suitability of SMAC—a black-box empirical function optimization
algorithm—for rapid optimization of MDP policies. The paper describes five
reward function components and four stakeholder constituencies. It then
introduces a parameterized class of policies that can be easily understood by
the stakeholders. SMAC is applied to find the optimal policy in this class for
the reward functions of each of the stakeholder constituencies. The results
confirm that SMAC is able to rapidly find good policies that make sense from
the domain perspective. Because the full-fidelity forest fire simulator is far
too expensive to support interactive optimization, SMAC is applied to a
surrogate model constructed from a modest number of runs of the full-fidelity
simulator. To check the quality of the SMAC-optimized policies, the policies
are evaluated on the full-fidelity simulator. The results confirm that the
surrogate value estimates are valid. This is the first successful optimization
of wildfire management policies using a full-fidelity simulation. The same
methodology should be applicable to other contentious natural resource
management problems where high-fidelity simulation is extremely expensive.
Sean McGregor, Rachel Houtman, Claire Montgomery, Ronald Metoyer, Thomas G. Dietterich
Subjects: Learning (cs.LG)
Policy analysts wish to visualize a range of policies for large
simulator-defined Markov Decision Processes (MDPs). One visualization approach
is to invoke the simulator to generate on-policy trajectories and then
visualize those trajectories. When the simulator is expensive, this is not
practical, and some method is required for generating trajectories for new
policies without invoking the simulator. The method of Model-Free Monte Carlo
(MFMC) can do this by stitching together state transitions for a new policy
based on previously-sampled trajectories from other policies. This “off-policy
Monte Carlo simulation” method works well when the state space has low
dimension but fails as the dimension grows. This paper describes a method for
factoring out some of the state and action variables so that MFMC can work in
high-dimensional MDPs. The new method, MFMCi, is evaluated on a very
challenging wildfire management MDP.
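
As a rough illustration of the trajectory-stitching idea behind plain MFMC (not the factored MFMCi variant), the sketch below builds a synthetic trajectory for a new policy by repeatedly picking the closest unused stored transition. The function name, the scalar states and actions, and the absolute-difference distance are assumptions made for illustration only.

```python
def mfmc_trajectory(start_state, policy, transitions, horizon):
    """Stitch a synthetic trajectory from stored transitions.

    transitions: list of (state, action, reward, next_state) tuples with
    scalar states and actions (an illustrative simplification).
    Each stored transition is used at most once.
    """
    unused = list(range(len(transitions)))
    state = start_state
    trajectory = []
    for _ in range(horizon):
        if not unused:
            break  # database exhausted before reaching the horizon
        action = policy(state)
        # Closest stored (state, action) pair under an assumed L1 distance.
        best = min(
            unused,
            key=lambda i: abs(transitions[i][0] - state)
            + abs(transitions[i][1] - action),
        )
        unused.remove(best)
        _, _, reward, next_state = transitions[best]
        trajectory.append((state, action, reward, next_state))
        # Continue from the stored successor state, not from the simulator.
        state = next_state
    return trajectory
```

With a nearest-neighbour rule like this one, matches become poor as the state dimension grows, which is the failure mode the abstract describes.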
Yu Guan, Thomas Ploetz
Comments: accepted for publication in ACM IMWUT (Ubicomp) 2017
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Recently, deep learning (DL) methods have been introduced very successfully
into human activity recognition (HAR) scenarios in ubiquitous and wearable
computing. In particular, the prospect of overcoming the need for manual feature
design, combined with superior classification capabilities, renders deep neural
networks very attractive for real-life HAR applications. Even though DL-based
approaches now outperform the state-of-the-art in a number of recognition
tasks in the field, substantial challenges remain. Most prominently, issues
with real-life datasets, typically including imbalanced datasets and
problematic data quality, still limit the effectiveness of activity recognition
using wearables. In this paper we tackle such challenges through Ensembles of
deep Long Short Term Memory (LSTM) networks. We have developed modified
training procedures for LSTM networks and combine sets of diverse LSTM learners
into classifier collectives. We demonstrate, both formally and empirically,
that Ensembles of deep LSTM learners outperform the individual LSTM networks.
Through an extensive experimental evaluation on three standard benchmarks
(Opportunity, PAMAP2, Skoda) we demonstrate the excellent recognition
capabilities of our approach and its potential for real-life applications of
human activity recognition.
Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, Ken Goldberg
Subjects: Learning (cs.LG)
In Imitation Learning, a supervisor’s policy is observed and the intended
behavior is learned. A known problem with this approach is covariate shift,
which occurs because the agent visits different states than the supervisor.
Rolling out the current agent’s policy, an on-policy method, allows for
collecting data along a distribution similar to the updated agent’s policy.
However this approach can become less effective as the demonstrations are
collected in very large batch sizes, which reduces the relevance of data
collected in previous iterations. In this paper, we propose to alleviate the
covariate shift via the injection of artificial noise into the supervisor’s
policy. We prove an improved bound on the loss due to the covariate shift, and
introduce an algorithm that leverages our analysis to estimate the level of
$\epsilon$-greedy noise to inject. In a driving simulator domain where an agent
learns an image-to-action deep network policy, our algorithm Dart achieves
better performance than DAgger with 75% fewer demonstrations.
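
A minimal sketch of what injecting epsilon-greedy noise into a supervisor's policy could look like during data collection. The function name, the discrete action set, and the fixed noise level are assumptions for illustration; the abstract describes estimating the noise level from the analysis, which this sketch does not attempt. It also assumes the recorded label is the supervisor's intended action rather than the executed one.

```python
import random

def noisy_supervisor_action(supervisor_action, actions, epsilon, rng):
    """Return (executed_action, label) for one data-collection step.

    With probability epsilon a uniformly random action is executed
    instead of the supervisor's choice. The label is always the
    supervisor's intended action (an assumption of this sketch).
    """
    if rng.random() < epsilon:
        executed = rng.choice(actions)
    else:
        executed = supervisor_action
    return executed, supervisor_action
```

Executing the perturbed action pushes the supervisor into states it would not otherwise visit, so the collected data covers more of the states the learned agent may reach.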
Brett W. Israelsen, Nisar Ahmed, Kenneth Center, Roderick Green, Winston Bennett Jr
Comments: submitted to JAIS for review
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
This work studies how an AI-controlled dog-fighting agent with tunable
decision-making parameters can learn to optimize performance against an
intelligent adversary, as measured by a stochastic objective function evaluated
on simulated combat engagements. Gaussian process Bayesian optimization (GPBO)
techniques are developed to automatically learn global Gaussian Process (GP)
surrogate models, which provide statistical performance predictions in both
explored and unexplored areas of the parameter space. This allows a learning
engine to sample full-combat simulations at parameter values that are most
likely to optimize performance and also provide highly informative data points
for improving future predictions. However, standard GPBO methods do not provide
a reliable surrogate model for the highly volatile objective functions found in
aerial combat, and thus do not reliably identify global maxima. These issues
are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point
Sampling (HRMS) techniques. Simulation studies show that HRMS improves the
accuracy of GP surrogate models, allowing AI decision-makers to more accurately
predict performance and efficiently tune parameters.
Silvano Galliani, Charis Lanaras, Dimitrios Marmanis, Emmanuel Baltsavias, Konrad Schindler
Comments: Submitted to ICCV 2017 (10 pages, 8 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We describe a novel method for blind, single-image spectral super-resolution.
While conventional super-resolution aims to increase the spatial resolution of
an input image, our goal is to spectrally enhance the input, i.e., generate an
image with the same spatial resolution, but a greatly increased number of
narrow (hyper-spectral) wavelength bands. Just as the spatial statistics of
natural images have rich structure, which one can exploit as a prior to predict
high-frequency content from a low-resolution image, the same is also true in
the spectral domain: the materials and lighting conditions of the observed
world induce structure in the spectrum of wavelengths observed at a given
pixel. Surprisingly, very little work exists that attempts to use this
observation to achieve blind spectral super-resolution from single images. We
start from the conjecture that, just like in the spatial domain, we can learn
the statistics of natural image spectra, and with their help generate finely
resolved hyper-spectral images from RGB input. Technically, we follow the
current best practice and implement a convolutional neural network (CNN), which
is trained to carry out the end-to-end mapping from an entire RGB image to the
corresponding hyperspectral image of equal size. We demonstrate spectral
super-resolution both for conventional RGB images and for multi-spectral
satellite data, outperforming the state-of-the-art.
Muneki Yasuda, Shun Kataoka
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)
In this paper, we address the inverse problem, or the statistical machine
learning problem, in Markov random fields with a non-parametric pair-wise
energy function with continuous variables. The inverse problem is formulated by
maximum likelihood estimation. The exact treatment of maximum likelihood
estimation is intractable because of two problems: (1) it includes the
evaluation of the partition function and (2) it is formulated in the form of
functional optimization. We avoid Problem (1) by using the Bethe approximation,
an approximation technique equivalent to loopy belief propagation.
Problem (2) can be solved by using orthonormal function
expansion. Orthonormal function expansion can reduce a functional optimization
problem to a function optimization problem. Our method can provide an analytic
form of the solution of the inverse problem within the framework of Bethe
approximation.
Somil Bansal, Roberto Calandra, Ted Xiao, Sergey Levine, Claire J. Tomlin
Subjects: Systems and Control (cs.SY); Learning (cs.LG)
Real-world robots are becoming increasingly complex and commonly act in
poorly understood environments where it is extremely challenging to model or
learn their true dynamics. Therefore, it might be desirable to take a
task-specific approach, wherein the focus is on explicitly learning the
dynamics model which achieves the best control performance for the task at
hand, rather than learning the true dynamics. In this work, we use Bayesian
optimization in an active learning framework where a locally linear dynamics
model is learned with the intent of maximizing the control performance, and
used in conjunction with optimal control schemes to efficiently design a
controller for a given task. This model is updated directly based on the
performance observed in experiments on the physical system in an iterative
manner until a desired performance is achieved. We demonstrate the efficacy of
the proposed approach through simulations and real experiments on a quadrotor
testbed.
Liang Dong
Comments: 30 pages, 13 figures. Submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT)
A passive radio-frequency (RF) energy harvester collects the radiated energy
from nearby wireless information transmitters instead of using a dedicated
wireless power source. The energy harvester needs multiple transmitters to
concentrate their RF radiation on it because typical electric field strengths
are weak. For multi-user transmissions over MIMO interference channels, each
user designs the transmit covariance matrix to maximize its information rate.
When RF energy harvesters are in the network, the multi-user transmissions in
interference channels are constrained by both the transmit power limits and the
energy harvesting requirements. In this paper, strategic games are proposed for
the multi-user transmissions. First, in a non-cooperative game, each
transmitter has a best-response strategy for the transmit covariance matrix
that follows a multi-level water-filling solution. A pure-strategy Nash
equilibrium exists, but the best-response dynamics may cycle and fail to
converge. Secondly, in a cooperative game, there is no need to estimate the
proportion of the harvested energy from each transmitter. Rather, the
transmitters bargain over the unit-reward of the energy contribution. An
approximation of the information rate is used in constructing the individual
utility such that the problem of network utility maximization can be decomposed
and the bargaining process can be implemented distributively. The bargaining
solution gives a point of rates that is superior to the Nash equilibria and
close to the Pareto front. Simulation results verify the algorithms that
provide good communication performance while satisfying the RF
energy-harvesting requirements.
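
The multi-level water-filling solution itself is not spelled out in this abstract. As background only, the sketch below shows the classic single-level water-filling allocation over parallel channels, which maximizes the sum of log(1 + g_i p_i) under a total power budget. It is not the paper's multi-level variant; the function name and the scalar channel gains are assumptions for illustration.

```python
def water_filling(gains, total_power):
    """Classic water-filling over parallel channels with positive gains.

    Returns per-channel powers p_i = max(level - 1/g_i, 0) whose sum
    equals total_power (assumed positive).
    """
    # Consider channels from strongest to weakest.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    active = len(order)
    level = 0.0
    while active > 0:
        inv_sum = sum(1.0 / gains[order[k]] for k in range(active))
        level = (total_power + inv_sum) / active
        # Stop once the weakest active channel receives non-negative power.
        if level - 1.0 / gains[order[active - 1]] >= 0.0:
            break
        active -= 1
    powers = [0.0] * len(gains)
    for k in range(active):
        i = order[k]
        powers[i] = level - 1.0 / gains[i]
    return powers
```

For gains [2.0, 1.0, 0.1] and a unit budget, the weakest channel is switched off and the remaining power is split 0.75 / 0.25 between the two stronger channels.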
Changyang She, Chenyang Yang, Tony Q.S. Quek
Comments: The manuscript has been submitted to IEEE Transactions on Wireless Communications. It is still in revision. Copyright and all rights therein are retained by authors
Subjects: Information Theory (cs.IT)
In this paper, we propose a framework for cross-layer optimization to ensure
ultra-high reliability and ultra-low latency in radio access networks, where
both transmission delay and queueing delay are considered. With short
transmission time, the blocklength of channel codes is finite, and the Shannon
capacity cannot be used to characterize the maximal achievable rate for a given
transmission error probability. With randomly arriving packets, some packets may
violate the queueing delay requirement. Moreover, since the required queueing delay is shorter than
the channel coherence time in typical scenarios, the required transmit power to
guarantee the queueing delay and transmission error probability will become
unbounded even with spatial diversity. To ensure the required
quality-of-service (QoS) with finite transmit power, a proactive packet
dropping mechanism is introduced. Then, the overall packet loss probability
includes transmission error probability, queueing delay violation probability,
and packet dropping probability. We optimize the packet dropping policy, power
allocation policy, and bandwidth allocation policy to minimize the transmit
power under the QoS constraint. The optimal solution is obtained, which depends
on both channel and queue state information. Simulation and numerical results
validate our analysis, and show that setting packet loss probabilities equal is
a near optimal solution.
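
To illustrate why the Shannon capacity overstates the achievable rate at short blocklengths, the sketch below evaluates the widely used normal approximation for a complex AWGN channel, R ≈ C - sqrt(V/n) Q^{-1}(eps), with higher-order terms dropped. This formula is a standard finite-blocklength result and is not taken from the abstract; the function name and the parameter values in the usage note are assumptions.

```python
import math
from statistics import NormalDist

def normal_approx_rate(snr, blocklength, error_prob):
    """Approximate maximal rate in bits per channel use.

    snr is linear (not dB); the channel is assumed to be complex AWGN.
    """
    capacity = math.log2(1.0 + snr)
    # Channel dispersion of the complex AWGN channel, in bits^2.
    dispersion = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2
    # Q^{-1}(eps) equals the standard normal quantile at 1 - eps.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    return capacity - math.sqrt(dispersion / blocklength) * q_inv
```

At a linear SNR of 10 and an error probability of 1e-5, the back-off from capacity shrinks by a factor of ten when the blocklength grows from 100 to 10000 symbols, since the penalty scales as 1/sqrt(n).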
Andrea Pizzo, Luca Sanguinetti
Comments: 8 pages, 4 figures, 1 table, presented at 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt2017), Paris, France, May 2017
Subjects: Information Theory (cs.IT)
This work analyzes a mmWave single-cell network, which comprises a macro base
station (BS) and an overlaid tier of small-cell BSs using a wireless backhaul
for data traffic. We look for the optimal number of antennas at both the macro BS and
the small-cell BSs that maximizes the energy efficiency (EE) of the system when a
hybrid transceiver architecture is employed. Closed-form expressions for the
EE-optimal values of the number of antennas are derived that provide valuable
insights into the interplay between the optimization variables and hardware
characteristics. Numerical and analytical results show that the maximal EE is
achieved by a ‘close-to’ fully-digital system wherein the number of BS antennas
is approximately equal to the number of served small cells.
Hamid Ghourchian, Arash Amini, Amin Gohari
Comments: 37 pages
Subjects: Information Theory (cs.IT)
The sparsity and compressibility of finite-dimensional signals are of great
interest in fields such as compressed sensing. The notion of compressibility is
also extended to infinite sequences of i.i.d. or ergodic random variables based
on the observed error in their nonlinear k-term approximation. In this work, we
use the entropy measure to study the compressibility of continuous-domain
innovation processes (alternatively known as white noise). Specifically, we
define such a measure as the entropy limit of the doubly quantized (time and
amplitude) process by identifying divergent terms and extracting the convergent
part. While the converging part determines the introduced entropy, the
diverging terms provide a tool to compare the compressibility of various
innovation processes. In particular, we study stable and impulsive Poisson
innovation processes representing various types of distributions. Our results
identify Poisson innovations as the most compressible, with an entropy
measure far below that of stable innovations. While this result departs from
the previous knowledge regarding the compressibility of fat-tailed
distributions, our entropy measure ranks stable innovations according to their
tail decay.
Hamid Ghourchian, Amin Gohari, Arash Amini
Comments: 4 pages
Subjects: Information Theory (cs.IT)
In this paper, we identify a class of absolutely continuous probability
distributions, and show that the differential entropy is uniformly convergent
over this space under the metric of total variation distance. One of the
advantages of this class is that the requirements could be readily verified for
a given distribution.
Jose Armando Oviedo, Hamid R. Sadjadpour
Comments: This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology; 11 pages, 6 figures
Subjects: Information Theory (cs.IT)
A non-orthogonal multiple access (NOMA) approach that always outperforms
orthogonal multiple access (OMA) called Fair-NOMA is introduced. In Fair-NOMA,
each mobile user is allocated its share of the transmit power such that its
capacity is always greater than or equal to the capacity that can be achieved
using OMA. For any slow-fading channel gains of the two users, the set of
possible power allocation coefficients is derived. For the infimum and
supremum of this set, the individual capacity gains and the sum-rate capacity
gain are derived. It is shown that the ergodic sum-rate capacity gain
approaches 1 b/s/Hz when the transmit power increases for the case when pairing
two random users with i.i.d. channel gains. The outage probability of this
approach is derived and shown to be better than OMA.
The Fair-NOMA approach is applied to the case of pairing a near base-station
user and a cell-edge user and the ergodic capacity gap is derived as a function
of total number of users in the cell at high SNR. This is then compared to the
conventional case of fixed-power NOMA with user-pairing. Finally, Fair-NOMA is
extended to $K$ users, and it is proved that the capacity can always be improved for
each user, while using less than the total transmit power required to achieve
OMA capacities per user.
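
A small numerical sketch of the two-user fairness condition, under modeling assumptions that the abstract does not state: normalized gains g = SNR × |h|^2 with g_weak <= g_strong, OMA with an equal resource split giving 0.5 log2(1 + g) per user, and NOMA with successive interference cancellation at the strong user. Requiring each NOMA capacity to be at least the OMA capacity under this model gives a closed-form interval for the strong user's power fraction a. All function names are illustrative.

```python
import math

def oma_capacity(g):
    """OMA capacity with an equal resource split (assumed model)."""
    return 0.5 * math.log2(1.0 + g)

def noma_capacities(a, g_weak, g_strong):
    """NOMA capacities when a fraction a of power goes to the strong user."""
    # Weak user treats the strong user's signal as interference.
    c_weak = math.log2(1.0 + (1.0 - a) * g_weak / (a * g_weak + 1.0))
    # Strong user removes the weak user's signal before decoding.
    c_strong = math.log2(1.0 + a * g_strong)
    return c_weak, c_strong

def fair_power_interval(g_weak, g_strong):
    """Interval of a for which both users do at least as well as OMA."""
    a_inf = 1.0 / (1.0 + math.sqrt(1.0 + g_strong))
    a_sup = 1.0 / (1.0 + math.sqrt(1.0 + g_weak))
    return a_inf, a_sup
```

At the lower end of the interval the strong user exactly matches its OMA capacity and the weak user gains the most; at the upper end the roles are reversed.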
Jae-Won Kim, Jong-Seon No
Subjects: Information Theory (cs.IT)
In this paper, new index coding problems are studied, where each receiver has
erroneous side information. Although side information is a crucial part of
index coding, the existence of erroneous side information has not yet been
considered. We study an index code with receivers that have erroneous side
information symbols in the error-free broadcast channel, which is called an
index code with side information errors (ICSIE). The encoding and decoding
procedures of the ICSIE are proposed, based on syndrome decoding. Then, we
derive the bounds on the optimal codelength of the proposed index code with
erroneous side information. Furthermore, we introduce a special graph for the
proposed index coding problem, called a $\delta_s$-cycle whose properties are
similar to those of the cycle in the conventional index coding problem.
Properties of the ICSIE are also discussed in the $\delta_s$-cycle and clique.
Finally, the proposed ICSIE is generalized to an index code for the scenario
having both additive channel errors and side information errors, called a
generalized error correcting index code (GECIC).
Behrooz Makki, Tommy Svensson, Maite Brandt-Pearce, Mohamed-Slim Alouini
Comments: Submitted to IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
This paper studies the performance of multi-hop and mesh networks composed of
millimeter wave (MMW)-based radio frequency (RF) and free-space optical (FSO)
links. The results are obtained in cases with and without hybrid automatic
repeat request (HARQ). Taking the MMW characteristics of the RF links into
account, we derive closed-form expressions for the networks’ outage probability
and ergodic achievable rates. We also evaluate the effect of various parameters
such as power amplifier efficiency, number of antennas, as well as different
coherence times of the RF and the FSO links on the system performance. Finally,
we determine the minimum number of the transmit antennas in the RF link such
that the same rate is supported in the RF- and the FSO-based hops. The results
show the efficiency of the RF-FSO setups in different conditions. Moreover,
HARQ can effectively improve the outage probability/energy efficiency, and
compensate for the effect of hardware impairments in RF-FSO networks. For
common parameter settings of the RF-FSO dual-hop networks, outage probability
of $10^{-4}$ and code rate of 3 nats-per-channel-use, the implementation of HARQ
with a maximum of 2 and 3 retransmissions reduces the required power, compared
to cases with open-loop communication, by 13 and 17 dB, respectively.