Johannes Schemmel, Laura Kriener, Paul Müller, Karlheinz Meier
Comments: Accepted at IJCNN 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)
This paper presents an extension of the BrainScaleS accelerated analog
neuromorphic hardware model. The scalable neuromorphic architecture is extended
by the support for multi-compartment models and non-linear dendrites. These
features are part of a 65 nm prototype ASIC. It allows the emulation of
different spike types observed in cortical pyramidal neurons: NMDA
plateau potentials, calcium and sodium spikes. By replicating some of the
structures of these cells, they can be configured to perform coincidence
detection within a single neuron. Built-in plasticity mechanisms can modify not
only the synaptic weights, but also the dendritic synaptic composition to
efficiently train large multi-compartment neurons. Transistor-level simulations
demonstrate the functionality of the analog implementation and illustrate
analogies to biological measurements.
Alexander Hagg, Maximilian Mensing, Alexander Asteroth
Subjects: Neural and Evolutionary Computing (cs.NE)
Neuroevolution methods evolve the weights of a neural network, and in some
cases the topology, but little work has been done to analyze the effect of
evolving the activation functions of individual nodes on network size, which is
important when training networks with a small number of samples. In this work
we extend the neuroevolution algorithm NEAT to evolve the activation function
of neurons in addition to the topology and weights of the network. The size and
performance of networks produced using NEAT with uniform activation in all
nodes, or homogeneous networks, are compared to networks which contain a mixture
of activation functions, or heterogeneous networks. For a number of regression
and classification benchmarks it is shown that, (1) qualitatively different
activation functions lead to different results in homogeneous networks, (2) the
heterogeneous version of NEAT is able to select well performing activation
functions, (3) producing heterogeneous networks that are significantly smaller
than homogeneous networks.
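The per-node activation evolution described above can be illustrated with a minimal sketch. This is not the authors' implementation: the activation set, the function name mutate_activations, and the mutation rate are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math
import random

# Hypothetical pool of activation functions; the abstract does not list the real set.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    "gaussian": lambda x: math.exp(-x * x),
}


def mutate_activations(node_activations, rate, rng=random):
    """Return a copy of {node_id: activation_name} in which each node's
    activation is re-drawn from ACTIVATIONS with probability `rate`."""
    names = sorted(ACTIVATIONS)
    mutated = dict(node_activations)
    for node_id in mutated:
        if rng.random() < rate:
            mutated[node_id] = rng.choice(names)
    return mutated
```

In a homogeneous network every node would keep the same entry; a heterogeneous network lets this mutation run alongside the usual weight and topology mutations.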
William La Cava, Jason H. Moore
Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Recently we proposed a general, ensemble-based feature engineering wrapper
(FEW) that was paired with a number of machine learning methods to solve
regression problems. Here, we adapt FEW for supervised classification and
perform a thorough analysis of fitness and survival methods within this
framework. Our tests demonstrate that two fitness metrics, one introduced as an
adaptation of the silhouette score, outperform the more commonly used Fisher
criterion. We analyze survival methods and demonstrate that ε-lexicase
survival works best across our test problems, followed by random survival which
outperforms both tournament and deterministic crowding. We conduct
hyper-parameter optimization for several classification methods using a large
set of problems to benchmark the ability of FEW to improve data
representations. The results show that FEW can improve the best classifier
performance on several problems. We show that FEW generates readable and
meaningful features for a biomedical problem with different ML pairings.
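For readers unfamiliar with lexicase-style methods, the following is a rough sketch of generic epsilon-lexicase selection of a single individual. It is not FEW's code: the threshold here is the median absolute deviation of the current candidate pool on each case, which is one common variant and an assumption on our part, and how the selection is repeated to pick survivors is left out.

```python
import random
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def epsilon_lexicase_select(errors, rng=random):
    """Pick one individual index.

    errors[i][j] is the error of individual i on training case j (lower is better).
    Cases are visited in random order; at each case only candidates within
    epsilon of the best error on that case are kept.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for j in cases:
        if len(candidates) == 1:
            break
        case_errors = [errors[i][j] for i in candidates]
        eps = mad(case_errors)  # assumption: epsilon from the current pool
        best = min(case_errors)
        candidates = [i for i in candidates if errors[i][j] <= best + eps]
    return rng.choice(candidates)
```

Because the case order is shuffled on every call, individuals that are strong on different subsets of cases all have a chance of being picked.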
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Imitation learning has been commonly applied to solve different tasks in
isolation. This usually requires either careful feature engineering, or a
significant number of samples. This is far from what we desire: ideally, robots
should be able to learn from very few demonstrations of any given task, and
instantly generalize to new situations of the same task, without requiring
task-specific engineering. In this paper, we propose a meta-learning framework
for achieving such capability, which we call one-shot imitation learning.
Specifically, we consider the setting where there is a very large set of
tasks, and each task has many instantiations. For example, a task could be to
stack all blocks on a table into a single tower, another task could be to place
all blocks on a table into two-block towers, etc. In each case, different
instances of the task would consist of different sets of blocks with different
initial states. At training time, our algorithm is presented with pairs of
demonstrations for a subset of all tasks. A neural net is trained that takes as
input one demonstration and the current state (which initially is the initial
state of the other demonstration of the pair), and outputs an action with the
goal that the resulting sequence of states and actions matches as closely as
possible with the second demonstration. At test time, a demonstration of a
single instance of a new task is presented, and the neural net is expected to
perform well on new instances of this new task. The use of soft attention
allows the model to generalize to conditions and tasks unseen in the training
data. We anticipate that by training this model on a much greater variety of
tasks and settings, we will obtain a general system that can turn any
demonstrations into robust policies that can accomplish an overwhelming variety
of tasks.
Videos available at this https URL
Chris Donahue, Zachary C. Lipton, Julian McAuley
Subjects: Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)
Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players
perform steps on a dance platform in synchronization with music as directed by
on-screen step charts. While many step charts are available in standardized
packs, users may grow tired of existing charts, or wish to dance to a song for
which no chart exists. We introduce the task of learning to choreograph. Given
a raw audio track, the goal is to produce a new step chart. This task
decomposes naturally into two subtasks: deciding when to place steps and
deciding which steps to select. For the step placement task, we combine
recurrent and convolutional neural networks to ingest spectrograms of low-level
audio features to predict steps, conditioned on chart difficulty. For step
selection, we present a conditional LSTM generative model that substantially
outperforms n-gram and fixed-window approaches.
Shichao Yang, Yu Song, Michael Kaess, Sebastian Scherer
Comments: International Conference on Intelligent Robots and Systems (IROS) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Existing simultaneous localization and mapping (SLAM) algorithms are not
robust in challenging low-texture environments because there are only few
salient features. The resulting sparse or semi-dense map also conveys little
information for motion planning. Though some works utilize plane or scene layout
for dense map regularization, they require decent state estimation from other
sources. In this paper, we propose real-time monocular plane SLAM to
demonstrate that scene understanding could improve both state estimation and
dense mapping especially in low-texture environments. The plane measurements
come from a pop-up 3D plane model applied to each single image. We also combine
planes with point based SLAM to improve robustness. On a public TUM dataset,
our algorithm generates a dense semantic 3D model with pixel depth error of 6.2
cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our
method creates a much better 3D model with state estimation error of 0.67%.
Adrian Bulat, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper investigates how far a very deep neural network is from attaining
close to saturating performance on existing 2D and 3D face alignment datasets.
To this end, we make the following five contributions: (a) we construct, for
the first time, a very strong baseline by combining a state-of-the-art
architecture for landmark localization with a state-of-the-art residual block,
train it on a very large yet synthetically expanded 2D facial landmark dataset
and finally evaluate it on all other 2D facial landmark datasets. (b) We create
a network, guided by 2D landmarks, which converts 2D landmark annotations to 3D
and unifies all existing datasets, leading to the creation of LS3D-W, the
largest and most challenging 3D facial landmark dataset to date (~230,000
images). (c) Following that, we train a neural network for 3D face alignment
and evaluate it on the newly introduced LS3D-W. (d) We further look into the
effect of all “traditional” factors affecting face alignment performance like
large pose, initialization and resolution, and introduce a “new” one, namely
the size of the network. (e) We show that both 2D and 3D face alignment
networks achieve performance of remarkable accuracy which is probably close to
saturating the datasets used. Demo code and pre-trained models can be
downloaded from this http URL
Syed Zain Masood, Guang Shu, Afshin Dehghan, Enrique G. Ortiz
Comments: 10 pages, 4 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work details Sighthound's fully automated license plate detection and
recognition system. The core technology of the system is built using a sequence
of deep Convolutional Neural Networks (CNNs) interlaced with accurate and
efficient algorithms. The CNNs are trained and fine-tuned so that they are
robust under different conditions (e.g. variations in pose, lighting,
occlusion, etc.) and can work across a variety of license plate templates (e.g.
sizes, backgrounds, fonts, etc.). For quantitative analysis, we show that our
system outperforms the leading license plate detection and recognition
technology, i.e. ALPR, on several benchmarks. Our system is available to
developers through the Sighthound Cloud API at
this https URL
Daniel Peralta, Isaac Triguero, Salvador García, Yvan Saeys, Jose M. Benitez, Francisco Herrera
Comments: Preprint submitted to Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The growth of fingerprint databases creates a need for strategies to reduce
the identification time. Fingerprint classification reduces the search
penetration rate by grouping the fingerprints into several classes. Typically,
features describing the visual patterns of a fingerprint are extracted and fed
to a classifier. The extraction can be time-consuming and error-prone,
especially for fingerprints whose visual classification is dubious, and often
includes a criterion to reject ambiguous fingerprints. In this paper, we
propose to improve on this manually designed process by using deep neural
networks, which extract implicit features directly from the images and perform
the classification within a single learning process. An extensive experimental
study shows that convolutional neural networks outperform all other tested
approaches by achieving a very high accuracy with no rejection. Moreover,
multiple copies of the same fingerprint are consistently classified. The
runtime of convolutional networks is also lower than that of combining feature
extraction procedures with classification algorithms.
Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
Many problems in image processing and computer vision (e.g. colorization,
style transfer) can be posed as ‘manipulating’ an input image into a
corresponding output image given a user-specified guiding signal. A holy-grail
solution towards generic image manipulation should be able to efficiently alter
an input image with any personalized signals (even signals unseen during
training), such as diverse paintings and arbitrary descriptive attributes.
However, existing methods are either inefficient at simultaneously processing
multiple signals (let alone generalizing to unseen signals), or unable to handle
signals from other modalities. In this paper, we make the first attempt to
address the zero-shot image manipulation task. We cast this problem as
manipulating an input image according to a parametric model whose key
parameters can be conditionally generated from any guiding signal (even unseen
ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
fully-differentiable architecture that jointly optimizes an
image-transformation network (TNet) and a parameter network (PNet). The PNet
learns to generate key transformation parameters for the TNet given any guiding
signal while the TNet performs fast zero-shot image manipulation according to
both signal-dependent parameters from the PNet and signal-invariant parameters
from the TNet itself. Extensive experiments show that our ZM-Net can perform
high-quality image manipulation conditioned on different forms of guiding
signals (e.g. style images and attributes) in real-time (tens of milliseconds
per image) even for unseen signals. Moreover, a large-scale style dataset with
over 20,000 style images is also constructed to promote further research.
Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person re-identification (re-ID) and attribute recognition share a common
target, the description of pedestrians. They differ in
granularity. Attribute recognition focuses on local aspects of a person while
person re-ID usually extracts global representations. Considering their
similarity and difference, this paper proposes a very simple convolutional
neural network (CNN) that learns a re-ID embedding and predicts the pedestrian
attributes simultaneously. This multi-task method integrates an ID
classification loss and a number of attribute classification losses, and
back-propagates the weighted sum of the individual losses.
Although the method is simple, we demonstrate on two pedestrian benchmarks that, by learning a
more discriminative representation, our method significantly improves the re-ID
baseline and is scalable on large galleries. We report competitive re-ID
performance compared with the state-of-the-art methods on the two datasets.
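The weighted sum of an ID classification loss and several attribute classification losses can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the balancing parameter id_weight, its default value, and the averaging of the attribute losses are assumptions.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `label` is the true class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(id_logits, id_label, attr_logits, attr_labels, id_weight=0.9):
    """Weighted sum of one ID loss and the mean of the attribute losses.

    `id_weight` is a hypothetical balancing parameter, not a value from the abstract.
    """
    id_loss = cross_entropy(id_logits, id_label)
    attr_losses = [cross_entropy(z, y) for z, y in zip(attr_logits, attr_labels)]
    attr_loss = sum(attr_losses) / len(attr_losses)
    return id_weight * id_loss + (1.0 - id_weight) * attr_loss
```

In a real network both terms would be computed from a shared CNN embedding, so the gradient of this single scalar updates the backbone for both tasks at once.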
Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in generative adversarial networks (GANs) have shown
promising potential in conditional image generation. However, how to generate
high-resolution images remains an open problem. In this paper, we aim at
generating high-resolution well-blended images given composited copy-and-paste
ones, i.e. realistic high-resolution image blending. To achieve this goal, we
propose Gaussian-Poisson GAN (GP-GAN), a framework that combines the strengths
of classical gradient-based approaches and GANs, which is the first work that
explores the capability of GANs in the high-resolution image blending task, to the
best of our knowledge. In particular, we propose the Gaussian-Poisson Equation to
formulate the high-resolution image blending problem, which is a joint
optimisation constrained by the gradient and colour information. Gradient
filters can obtain gradient information. For generating the colour information,
we propose Blending GAN to learn the mapping between the composited image and
the well-blended one. Compared to the alternative methods, our approach can
deliver high-resolution, realistic images with less bleeding and fewer unpleasant
artefacts. Experiments confirm that our approach achieves state-of-the-art
performance on the Transient Attributes dataset. A user study on Amazon Mechanical
Turk finds that the majority of workers are in favour of the proposed approach.
Bumsub Ham, Minsu Cho, Cordelia Schmid, Jean Ponce
Comments: arXiv admin note: text overlap with arXiv:1511.05065
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout. Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, which exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that the corresponding sparse
proposal flow can effectively be transformed into a conventional dense flow
field. We introduce two new challenging datasets that can be used to evaluate
both general semantic flow techniques and region-based approaches such as
proposal flow. We use these benchmarks to compare different matching
algorithms, object proposals, and region features within proposal flow, to the
state of the art in semantic flow. This comparison, along with experiments on
standard datasets, demonstrates that proposal flow significantly outperforms
existing semantic flow methods in various settings.
Youngsung Kim, ByungIn Yoo, Youngjun Kwak, Changkyu Choi, Junmo Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As the expressive depth of an emotional face differs with individuals,
expressions, or situations, recognizing an expression using a single facial
image at a moment is difficult. One of the approaches to alleviate this
difficulty is using a video-based method that utilizes multiple frames to
extract temporal information between facial expression images. In this paper,
we attempt to utilize a generative image that is estimated based on a given
single image. Then, we propose to utilize a contrastive representation that
explains an expression difference for discriminative purposes. The contrastive
representation is calculated at the embedding layer of a deep network by
comparing a single given image with a reference sample generated by a deep
encoder-decoder network. Consequently, we deploy deep neural networks that
embed a combination of a generative model, a contrastive model, and a
discriminative model. In our proposed networks, we attempt to disentangle a
facial expressive factor in two steps including learning of a reference
generator network and learning of a contrastive encoder network. We conducted
extensive experiments on three publicly available face expression databases
(CK+, MMI, and Oulu-CASIA) that have been widely adopted in the recent
literature. The proposed method outperforms the known state-of-the-art methods
in terms of the recognition accuracy.
Mandar Kulkarni, Kalpesh Patil, Shirish Karande
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Current approaches for Knowledge Distillation (KD) either directly use
training data or sample from the training data distribution. In this paper, we
demonstrate the effectiveness of ‘mismatched’ unlabeled stimulus to perform KD for
image classification networks. For illustration, we consider scenarios where
there is a complete absence of training data, or mismatched stimulus has to be
used for augmenting a small amount of training data. We demonstrate that
stimulus complexity is a key factor for distillation’s good performance. Our
examples include use of various datasets for stimulating MNIST and CIFAR
teachers.
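Distillation on unlabeled stimulus needs only the teacher's and student's outputs on the same input, which a minimal sketch can make concrete. The abstract does not state the loss used; the temperature-softened KL divergence below is the commonly used soft-target formulation and is an assumption here, as are the function names and the default temperature.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence from the softened teacher distribution to the student's.

    No ground-truth label appears, so any input image can serve as stimulus.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The loss is zero when the student reproduces the teacher's softened output exactly and grows as the two distributions diverge.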
Krzysztof J. Geras, Stacey Wolfson, S. Gene Kim, Linda Moy, Kyunghyun Cho
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Recent advances in deep learning for object recognition in natural images have
prompted a surge of interest in applying a similar set of techniques to medical
images. Most of the initial attempts largely focused on replacing the input to
such a deep convolutional neural network from a natural image to a medical
image. This, however, does not take into consideration the fundamental
differences between these two types of data. More specifically, detection or
recognition of an anomaly in medical images depends significantly on fine
details, unlike object recognition in natural images where coarser, more global
structures matter more. This difference makes it inadequate to use the existing
deep convolutional neural network architectures, which were developed for
natural images, because they rely on heavily downsampling an image to a much
lower resolution to reduce the memory requirements. This hides details
necessary to make accurate predictions for medical images. Furthermore, a
single exam in medical imaging often comes with a set of different views which
must be seamlessly fused in order to reach a correct conclusion. In our work,
we propose to use a multi-view deep convolutional neural network that handles a
set of more than one high-resolution medical image. We evaluate this network on
large-scale mammography-based breast cancer screening (BI-RADS prediction)
using 103 thousand images. We focus on investigating the impact of training set
sizes and image sizes on the prediction accuracy. Our results highlight that
performance clearly increases with the size of training set, and that the best
performance can only be achieved using the images in the original resolution.
This suggests the future direction of medical imaging research using deep
neural networks is to utilize as much data as possible with the least amount of
potentially harmful preprocessing.
Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson
Comments: 13 Pages, 7 Figures, 9 Tables. arXiv admin note: substantial text overlap with arXiv:1611.05520
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In contrast to the widely studied problem of recognizing an action given a
complete sequence, action anticipation aims to identify the action from only
partially available videos. As such, it is key to the success of
computer vision applications that must react as early as possible, such as
autonomous navigation. In this paper, we propose a new action anticipation
method that achieves high prediction accuracy even in the presence of a very
small percentage of a video sequence. To this end, we develop a multi-stage
LSTM architecture that leverages context- and action-aware features, and
introduce a novel loss function that encourages the model to predict the
correct class as early as possible. Our experiments on standard benchmark
datasets demonstrate the benefits of our approach: we outperform the
state-of-the-art action anticipation methods for early prediction by a relative
increase in accuracy of 22.0% on JHMDB-21, 14.0% on UT-Interaction and 49.9% on
UCF-101.
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
A natural image usually conveys rich semantic content and can be viewed from
different angles. Existing image description methods are largely restricted by
small sets of biased visual paragraph annotations, and fail to cover rich
underlying semantics. In this paper, we investigate a semi-supervised paragraph
generative framework that is able to synthesize diverse and semantically
coherent paragraph descriptions by reasoning over local semantic regions and
exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
Generative Adversarial Network (RTT-GAN) builds an adversarial framework
between a structured paragraph generator and multi-level paragraph
discriminators. The paragraph generator generates sentences recurrently by
incorporating region-based visual and language attention mechanisms at each
step. The quality of generated paragraph sentences is assessed by multi-level
adversarial discriminators from two aspects, namely, plausibility at sentence
level and topic-transition coherence at paragraph level. The joint adversarial
training of RTT-GAN drives the model to generate realistic paragraphs with
smooth logical transition between sentence topics. Extensive quantitative
experiments on image and video paragraph datasets demonstrate the effectiveness
of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
results on telling diverse stories for an image also verify the
interpretability of RTT-GAN.
Behzad Hasani, Mohammad H. Mahoor
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automated Facial Expression Recognition (FER) has been a challenging task for
decades. Many of the existing works use hand-crafted features such as LBP, HOG,
LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
Support Vector Machines for expression recognition. These methods often require
rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural
Networks (DNNs) have been shown to outperform traditional methods in visual object
recognition. In this paper, we propose a two-part network consisting of a
DNN-based architecture followed by a Conditional Random Field (CRF) module for
facial expression recognition in videos. The first part captures the spatial
relation within facial images using convolutional layers followed by three
Inception-ResNet modules and two fully-connected layers. To capture the
temporal relation between the image frames, we use linear chain CRF in the
second part of our network. We evaluate our proposed network on three publicly
available databases, viz. CK+, MMI, and FERA. Experiments are performed in
subject-independent and cross-database manners. Our experimental results show
that cascading the deep network architecture with the CRF module considerably
increases the recognition of facial expressions in videos and in particular it
outperforms the state-of-the-art methods in the cross-database experiments and
yields comparable results in the subject-independent experiments.
Yan Wang, Lingxi Xie, Chenxi Liu, Ya Zhang, Wenjun Zhang, Alan Yuille
Comments: Submitted to ICCV 2017 (10 pages, 4 figures)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we reveal the importance and benefits of introducing
second-order operations into deep neural networks. We propose a novel approach
named Second-Order Response Transform (SORT), which appends an element-wise
product transform to the linear sum of a two-branch network module. A direct
advantage of SORT is to facilitate cross-branch response propagation, so that
each branch can update its weights based on the current status of the other
branch. Moreover, SORT augments the family of transform operations and
increases the nonlinearity of the network, making it possible to learn flexible
functions to fit the complicated distribution of feature space. SORT can be
applied to a wide range of network architectures, including a branched variant
of a chain-styled network and a residual network, with very lightweight
modifications. We observe consistent accuracy gain on both small (CIFAR10,
CIFAR100 and SVHN) and big (ILSVRC2012) datasets. In addition, SORT is very
efficient, as the extra computation overhead is less than 5%.
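The idea of appending an element-wise product to the sum of two branches can be sketched in a few lines. This is the simplest reading of the abstract, using plain lists for the branch responses; the paper may apply an additional function to the product term, and the function names here are hypothetical.

```python
def sort_merge(branch_a, branch_b):
    """Combine two same-shaped branch responses as a + b + a * b (element-wise)."""
    if len(branch_a) != len(branch_b):
        raise ValueError("branch responses must have the same length")
    return [a + b + a * b for a, b in zip(branch_a, branch_b)]


def sort_merge_grad_a(branch_b):
    """Partial derivative of the merged output with respect to branch_a.

    It equals 1 + b, so the gradient reaching one branch depends on the
    current response of the other branch.
    """
    return [1.0 + b for b in branch_b]
```

With a plain sum the derivative with respect to either branch would be the constant 1; the product term is what couples the two branches during back-propagation.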
Miriam W. Huijser, Jan C. van Gemert
Comments: ICCV submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper is on active learning where the goal is to reduce the data
annotation burden by interacting with a (human) oracle during training.
Standard active learning methods ask the oracle to annotate data samples.
Instead, we take a profoundly different approach: we ask for annotations of the
decision boundary. We achieve this using a deep generative model to create
novel instances along a 1D line. A point on the decision boundary is revealed
where the instances change class. Experimentally we show on three data sets
that our method can be plugged into other active learning schemes, that human
oracles can effectively annotate points on the decision boundary, that our
method is robust to annotation noise, and that decision boundary annotations
improve over annotating data samples.
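Locating the point on a 1D line where instances change class can be illustrated with a toy sketch. The generative model is omitted: the points below are plain vectors, and the oracle callable stands in for the human annotator, so every name and the default number of steps are assumptions for illustration only.

```python
def interpolate(z_start, z_end, steps):
    """Evenly spaced points on the line segment from z_start to z_end."""
    return [
        [a + (b - a) * t / (steps - 1) for a, b in zip(z_start, z_end)]
        for t in range(steps)
    ]


def boundary_point(z_start, z_end, oracle, steps=11):
    """Return the midpoint of the first adjacent pair whose oracle labels differ,
    or None if the label never changes along the line."""
    points = interpolate(z_start, z_end, steps)
    labels = [oracle(p) for p in points]
    for prev, cur, l_prev, l_cur in zip(points, points[1:], labels, labels[1:]):
        if l_prev != l_cur:
            return [(a + b) / 2 for a, b in zip(prev, cur)]
    return None
```

In the setting of the abstract, each point would be decoded into an instance by the generative model and shown to the annotator, who marks where the class flips.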
Hang Zhang, Kristin Dana
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent work in style transfer learns a feed-forward generative network to
approximate the prior optimization-based approaches, resulting in real-time
performance. However, these methods require training separate networks for
different target styles, which greatly limits the scalability. We introduce a
Multi-style Generative Network (MSG-Net) with a novel Inspiration Layer, which
retains the functionality of optimization-based approaches and has the fast
speed of feed-forward networks. The proposed Inspiration Layer explicitly
matches the feature statistics with the target styles at run time, which
dramatically improves the versatility of existing generative networks, so that
multiple styles can be realized within one network. The proposed MSG-Net
matches image styles at multiple scales and puts the computational burden into
the training. The learned generator is a compact feed-forward network that runs
in real-time after training. Compared to previous work, the proposed network
can achieve fast style transfer with at least comparable quality using a single
network. The experimental results have covered (but are not limited to)
simultaneous training of twenty different styles in a single network. The
complete software system and pre-trained models will be publicly available upon
publication.
Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Teddy Furon, Ondrej Chum
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite the success of deep learning on representing images for particular
object retrieval, recent studies show that the learned representations still
lie on manifolds in a high dimensional space. Therefore, nearest neighbor
search cannot be expected to be optimal for this task. Even if a nearest
neighbor graph is computed offline, exploring the manifolds online remains
expensive. This work introduces an explicit embedding reducing manifold search
to Euclidean search followed by dot product similarity search. We show this is
equivalent to linear graph filtering of a sparse signal in the frequency
domain, and we introduce a scalable offline computation of an approximate
Fourier basis of the graph. We improve the state of the art on standard particular
object retrieval datasets including a challenging one containing small objects.
At a scale of 10^5 images, the offline cost is only a few hours, while query
time is comparable to standard similarity search.
Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu
Comments: accepted by IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1504.06243
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
This paper addresses the problem of handling spatial misalignments due to
camera-view changes or human-pose variations in person re-identification. We
first introduce a boosting-based approach to learn a correspondence structure
which indicates the patch-wise matching probabilities between images from a
target camera pair. The learned correspondence structure can not only capture
the spatial correspondence pattern between cameras but also handle the
viewpoint or human-pose variation in individual images. We further introduce a
global constraint-based matching process. It integrates a global matching
constraint over the learned correspondence structure to exclude cross-view
misalignments during the image patch matching process, hence achieving a more
reliable matching score between images. Finally, we also extend our approach by
introducing a multi-structure scheme, which learns a set of local
correspondence structures to capture the spatial correspondence sub-patterns
between a camera pair, so as to handle the spatial misalignments between
individual images in a more precise way. Experimental results on various
datasets demonstrate the effectiveness of our approach.
Marco Fiorucci, Alessandro Torcinovich, Manuel Curado, Francisco Escolano, Marcello Pelillo
Comments: GbR2017 to appear in Lecture Notes in Computer Science (LNCS)
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)
In this paper we analyze the practical implications of Szemerédi's
regularity lemma in the preservation of metric information contained in large
graphs. To this end, we present a heuristic algorithm to find regular
partitions. Our experiments show that this method is quite robust to the
natural sparsification of proximity graphs. In addition, this robustness can be
enforced by graph densification.
Xin Huang, Yuxin Peng
Comments: 6 pages, 1 figure, to appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 – Jul 14, 2017, Hong Kong, Hong Kong
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
DNN-based cross-modal retrieval has become a research hotspot, by which users
can search results across various modalities like image and text. However,
existing methods mainly focus on the pairwise correlation and reconstruction
error of labeled data. They ignore the semantically similar and dissimilar
constraints between different modalities, and cannot take advantage of
unlabeled data. This paper proposes Cross-modal Deep Metric Learning with
Multi-task Regularization (CDMLMR), which integrates quadruplet ranking loss
and semi-supervised contrastive loss for modeling cross-modal semantic
similarity in a unified multi-task learning architecture. The quadruplet
ranking loss can model the semantically similar and dissimilar constraints to
preserve cross-modal relative similarity ranking information. The
semi-supervised contrastive loss is able to maximize the semantic similarity on
both labeled and unlabeled data. Compared to the existing methods, CDMLMR
exploits not only the similarity ranking information but also unlabeled
cross-modal data, and thus boosts cross-modal retrieval accuracy.
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Imitation learning has been commonly applied to solve different tasks in
isolation. This usually requires either careful feature engineering, or a
significant number of samples. This is far from what we desire: ideally, robots
should be able to learn from very few demonstrations of any given task, and
instantly generalize to new situations of the same task, without requiring
task-specific engineering. In this paper, we propose a meta-learning framework
for achieving such capability, which we call one-shot imitation learning.
Specifically, we consider the setting where there is a very large set of
tasks, and each task has many instantiations. For example, a task could be to
stack all blocks on a table into a single tower, another task could be to place
all blocks on a table into two-block towers, etc. In each case, different
instances of the task would consist of different sets of blocks with different
initial states. At training time, our algorithm is presented with pairs of
demonstrations for a subset of all tasks. A neural net is trained that takes as
input one demonstration and the current state (which initially is the initial
state of the other demonstration of the pair), and outputs an action with the
goal that the resulting sequence of states and actions matches as closely as
possible with the second demonstration. At test time, a demonstration of a
single instance of a new task is presented, and the neural net is expected to
perform well on new instances of this new task. The use of soft attention
allows the model to generalize to conditions and tasks unseen in the training
data. We anticipate that by training this model on a much greater variety of
tasks and settings, we will obtain a general system that can turn any
demonstrations into robust policies that can accomplish an overwhelming variety
of tasks.
Videos available at this https URL
Vladimir Marochko, Leonard Johard, Manuel Mazzara
Journal-ref: 11th International Conference on Agents and Multi-agent Systems
Technologies and Applications, 2017
Subjects: Artificial Intelligence (cs.AI)
Catastrophic forgetting is of special importance in reinforcement learning,
as the data distribution is generally non-stationary over time. We study and
compare several pseudorehearsal approaches for Q-learning with function
approximation in a pole balancing task. We have found that pseudorehearsal
seems to assist learning even in such very simple problems, given proper
initialization of the rehearsal parameters.
Julien Savaux, Julien Vion, Sylvain Piechowiak, René Mandiau, Toshihiro Matsui, Katsutoshi Hirayama, Makoto Yokoo, Shakre Elmane, Marius Silaghi
Subjects: Artificial Intelligence (cs.AI)
Privacy has traditionally been a major motivation for distributed problem
solving. Distributed Constraint Satisfaction Problem (DisCSP) as well as
Distributed Constraint Optimization Problem (DCOP) are fundamental models used
to solve various families of distributed problems. Even though several
approaches have been proposed to quantify and preserve privacy in such
problems, none of them is exempt from limitations. Here we approach the problem
by assuming that computation is performed among utilitarian agents. We
introduce a utilitarian approach where the utility of each state is estimated
as the difference between the reward for reaching an agreement on assignments
of shared variables and the cost of privacy loss. We investigate extensions to
solvers where agents integrate the utility function to guide their search and
decide which action to perform, defining thereby their policy. We show that
these extended solvers succeed in significantly reducing privacy loss without
significant degradation of the solution quality.
Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
Many problems in image processing and computer vision (e.g. colorization,
style transfer) can be posed as ‘manipulating’ an input image into a
corresponding output image given a user-specified guiding signal. A holy-grail
solution towards generic image manipulation should be able to efficiently alter
an input image with any personalized signals (even signals unseen during
training), such as diverse paintings and arbitrary descriptive attributes.
However, existing methods are either inefficient to simultaneously process
multiple signals (let alone generalize to unseen signals), or unable to handle
signals from other modalities. In this paper, we make the first attempt to
address the zero-shot image manipulation task. We cast this problem as
manipulating an input image according to a parametric model whose key
parameters can be conditionally generated from any guiding signal (even unseen
ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
fully-differentiable architecture that jointly optimizes an
image-transformation network (TNet) and a parameter network (PNet). The PNet
learns to generate key transformation parameters for the TNet given any guiding
signal while the TNet performs fast zero-shot image manipulation according to
both signal-dependent parameters from the PNet and signal-invariant parameters
from the TNet itself. Extensive experiments show that our ZM-Net can perform
high-quality image manipulation conditioned on different forms of guiding
signals (e.g. style images and attributes) in real-time (tens of milliseconds
per image) even for unseen signals. Moreover, a large-scale style dataset with
over 20,000 style images is also constructed to promote further research.
Niek Tax, Benjamin Dalmas, Natalia Sidorova, Wil M P van der Aalst, Sylvie Norre
Comments: submitted to the International Conference on Business Process Management (BPM) 2017
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Learning (cs.LG)
Local Process Models (LPM) describe structured fragments of process behavior
occurring in the context of less structured business processes. Traditional LPM
discovery aims to generate a collection of process models that describe highly
frequent behavior, but these models do not always provide useful answers for
questions posed by process analysts aiming at business process improvement. We
propose a framework for goal-driven LPM discovery, based on utility functions
and constraints. We describe four scopes on which these utility functions and
constraints can be defined, and show that utility functions and constraints on
different scopes can be combined to form composite utility
functions/constraints. Finally, we demonstrate the applicability of our
approach by presenting several actionable business insights discovered with LPM
discovery on two real life data sets.
Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
Comments: 5 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
Language understanding is a key component in a spoken dialogue system. In
this paper, we investigate how the language understanding module influences the
dialogue system performance by conducting a series of systematic experiments on
a task-oriented neural dialogue system in a reinforcement learning based
setting. The empirical study shows that among different types of language
understanding errors, slot-level errors can have more impact on the overall
performance of a dialogue system compared to intent-level errors. In addition,
our experiments demonstrate that the reinforcement learning based dialogue
system is able to learn when and what to confirm in order to achieve better
performance and greater robustness.
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
A natural image usually conveys rich semantic content and can be viewed from
different angles. Existing image description methods are largely restricted by
small sets of biased visual paragraph annotations, and fail to cover rich
underlying semantics. In this paper, we investigate a semi-supervised paragraph
generative framework that is able to synthesize diverse and semantically
coherent paragraph descriptions by reasoning over local semantic regions and
exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
Generative Adversarial Network (RTT-GAN) builds an adversarial framework
between a structured paragraph generator and multi-level paragraph
discriminators. The paragraph generator generates sentences recurrently by
incorporating region-based visual and language attention mechanisms at each
step. The quality of generated paragraph sentences is assessed by multi-level
adversarial discriminators from two aspects, namely, plausibility at sentence
level and topic-transition coherence at paragraph level. The joint adversarial
training of RTT-GAN drives the model to generate realistic paragraphs with
smooth logical transition between sentence topics. Extensive quantitative
experiments on image and video paragraph datasets demonstrate the effectiveness
of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
results on telling diverse stories for an image also verify the
interpretability of RTT-GAN.
Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang
Comments: 8 pages. arXiv admin note: text overlap with arXiv:1703.01024
Subjects: Computation and Language (cs.CL)
Recurrent neural networks (RNNs), especially long short-term memory (LSTM)
RNNs, are effective networks for sequential tasks such as speech recognition.
Deeper LSTM models perform well on large vocabulary continuous speech
recognition because of their impressive learning ability. However, a deeper
network is more difficult to train. We introduce a training framework with
layer-wise training and exponential moving average methods for deeper LSTM
models. With this framework, LSTM models of more than 7 layers are successfully
trained on Shenma voice search data in Mandarin, and they outperform the deep
LSTM models trained by the conventional approach. Moreover, for online
streaming speech recognition applications, a shallow model with a low real-time
factor is distilled from the very deep model. The recognition accuracy suffers
little loss in the distillation process. As a result, the model trained with
the proposed framework achieves a 14% relative reduction in character error
rate compared to the original model with similar real-time capability.
Furthermore, a novel transfer learning strategy with segmental Minimum
Bayes-Risk is also introduced in the framework. The strategy makes it possible
for training with only a small part of the dataset to outperform full-dataset
training from the beginning.
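The exponential moving average mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the average is applied, so the function name `ema_update`, the `decay` value, and the flat list of parameters are assumptions made purely for illustration.

```python
def ema_update(shadow, params, decay=0.99):
    """Blend the running average `shadow` with the latest `params`.

    Hypothetical helper: each averaged value moves a fraction
    (1 - decay) of the way toward the newest parameter value.
    """
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy run: the averaged parameters drift toward the constant target values.
shadow = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    shadow = ema_update(shadow, step_params, decay=0.5)
print(shadow)  # [0.875, 1.75]
```

A decay close to 1 gives a smoother but slower-moving average; the value 0.5 is used here only to keep the arithmetic easy to follow.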
Stefano Bennati, Catholijn M. Jonker
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning
method for reducing the privacy cost of information transmitted through a
network. Distributed sensor networks are often used for automated
classification and detection of abnormal events in high-stakes situations, e.g.
fire in buildings, earthquakes, or crowd disasters. Such networks might
transmit privacy-sensitive information, e.g. GPS location of smartphones, which
might be disclosed if the network is compromised. Privacy concerns might slow
down the adoption of the technology, in particular in the scenario of social
sensing where participation is voluntary, thus solutions are needed which
improve privacy without compromising on the event detection accuracy. PriMaL is
implemented as a machine-learning layer that works on top of an existing event
detection algorithm. Experiments are run in a general simulation framework, for
several network topologies and parameter values. The privacy footprint of
state-of-the-art event detection algorithms is compared within the proposed
framework. Results show that PriMaL is able to reduce the privacy cost of a
distributed event detection algorithm below that of the corresponding
centralized algorithm, within the bounds of some assumptions about the
protocol. Moreover the performance of the distributed algorithm is not
statistically worse than that of the centralized algorithm.
Haichuan Yang, Shupeng Gui, Chuyang Ke, Daniel Stefankovic, Ryohei Fujimaki, Ji Liu
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The cardinality constraint is an intrinsic way to restrict the solution
structure in many domains, for example, sparse learning, feature selection, and
compressed sensing. To solve a cardinality constrained problem, the key
challenge is to solve the projection onto the cardinality constraint set, which
is NP-hard in general when there exist multiple overlapped cardinality
constraints. In this paper, we consider the scenario where overlapped
cardinality constraints satisfy a Three-view Cardinality Structure (TVCS),
which reflects the natural restriction in many applications, such as
identification of gene regulatory networks and task-worker assignment problem.
We cast the projection onto the TVCS set as a linear program, and prove
that its solution can be obtained by finding an integer solution to this linear
program. We further prove that such an integer solution can be found with
complexity proportional to the problem scale. We finally use synthetic
experiments and two interesting applications in bioinformatics and
crowdsourcing to validate the proposed TVCS model and method.
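For context, the projection onto a single cardinality constraint (at most k nonzero entries) has a well-known closed form: keep the k largest-magnitude entries and zero the rest. The sketch below shows only this simple case. It is not the TVCS method of the abstract above, which handles multiple overlapped constraints, and the function name `project_cardinality` is hypothetical.

```python
def project_cardinality(x, k):
    """Euclidean projection of x onto {z : z has at most k nonzeros}.

    Keeps the k entries of largest absolute value and sets the others to zero.
    """
    if k <= 0:
        return [0.0] * len(x)
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


print(project_cardinality([0.5, -3.0, 2.0, 0.1], 2))  # [0.0, -3.0, 2.0, 0.0]
```

With overlapping constraints, this greedy selection is no longer valid in general, which is the difficulty the abstract addresses.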
Mandar Kulkarni, Shirish Karande
Journal-ref: Deep Learning for Pattern Recognition (DLPR) workshop at ICPR 2016
Subjects: Learning (cs.LG)
Deep learning has shown promising results in many machine learning
applications. The hierarchical feature representation built by deep networks
enables compact and precise encoding of the data. A kernel analysis of the
trained deep networks demonstrated that with deeper layers, simpler and
more accurate data representations are obtained. In this paper, we propose an
approach for layer-wise training of a deep network for the supervised
classification task. A transformation matrix of each layer is obtained by
solving an optimization aimed at a better representation where a subsequent
layer builds its representation on the top of the features produced by a
previous layer. We compared the performance of our approach with a DNN trained
using back-propagation with the same architecture as ours. Experimental
results on real image datasets demonstrate the efficacy of our approach. We
also performed kernel analysis of layer representations to validate the claim
of better feature encoding.
Esben Jannik Bjerrum
Subjects: Learning (cs.LG)
Simplified Molecular Input Line Entry System (SMILES) is a single line text
representation of a unique molecule. One molecule can, however, have multiple
SMILES strings, which is why canonical SMILES have been defined to
ensure a one-to-one correspondence between SMILES string and molecule.
Here the fact that multiple SMILES represent the same molecule is explored as a
technique for data augmentation of a molecular QSAR dataset modeled by a long
short term memory (LSTM) cell based neural network. The augmented dataset was
130 times bigger than the original. The network trained with the augmented
dataset shows better performance on a test set when compared to a model built
with only one canonical SMILES string per molecule. The correlation coefficient
R^2 on the test set was improved from 0.56 to 0.66 when using SMILES
enumeration, and the root mean square error (RMS) likewise fell from 0.62 to
0.55. The technique also works in the prediction phase. By taking the average
per molecule of the predictions for the enumerated SMILES a further improvement
to a correlation coefficient of 0.68 and a RMS of 0.52 was found.
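The prediction-phase averaging described above can be sketched as follows. The molecule identifiers, the prediction values, and the function name `average_per_molecule` are made up for illustration. In practice the per-SMILES predictions would come from the trained network, which is not reproduced here.

```python
from collections import defaultdict


def average_per_molecule(mol_ids, predictions):
    """Average the predictions of all enumerated SMILES of each molecule."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mol_id, pred in zip(mol_ids, predictions):
        sums[mol_id] += pred
        counts[mol_id] += 1
    return {mol_id: sums[mol_id] / counts[mol_id] for mol_id in sums}


# "CCO" and "OCC" are two valid SMILES strings for the same molecule (ethanol).
smiles = ["CCO", "OCC", "c1ccccc1"]
mol_ids = ["ethanol", "ethanol", "benzene"]
preds = [1.0, 2.0, 4.0]  # hypothetical model outputs, one per SMILES string
averaged = average_per_molecule(mol_ids, preds)
print(averaged)  # {'ethanol': 1.5, 'benzene': 4.0}
```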
Prasoon Goyal, Zhiting Hu, Xiaodan Liang, Chenyu Wang, Eric Xing
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The recently developed variational autoencoders (VAEs) have proved to be an
effective confluence of the rich representational power of neural networks with
Bayesian methods. However, most work on VAEs uses a rather simple prior over the
latent variables such as the standard normal distribution, thereby restricting its
applications to relatively simple phenomena. In this work, we propose
hierarchical nonparametric variational autoencoders, which combine
tree-structured Bayesian nonparametric priors with VAEs, to enable infinite
flexibility of the latent representation space. Both the neural parameters and
Bayesian priors are learned jointly using tailored variational inference. The
resulting model induces a hierarchical structure of latent semantic concepts
underlying the data corpus, and infers accurate representations of data
instances. We apply our model in video representation learning. Our method is
able to discover highly interpretable activity hierarchies, and obtain improved
clustering accuracy and generalization capacity based on the learned rich
representations.
Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu
Subjects: Learning (cs.LG)
Multivariate time series forecasting is an important machine learning problem
across many domains, including predictions of solar plant energy output,
electricity consumption, and traffic jam situations. Temporal data arising in
these real-world applications often involve a mixture of long-term and
short-term patterns, for which traditional approaches such as autoregressive
models and Gaussian processes may fail. In this paper, we propose a novel deep
learning framework, namely Long- and Short-term Time-series network (LSTNet),
to address this open challenge. LSTNet uses a Convolutional Neural Network
(CNN) to extract short-term local dependency patterns among variables, and the
Recurrent Neural Network (RNN) to discover long-term patterns and trends. In
our evaluation on real-world data with complex mixtures of repetitive patterns,
LSTNet achieved significant performance improvements over that of several
state-of-the-art baseline methods.
Harini Suresh, Peter Szolovits, Marzyeh Ghassemi
Journal-ref: NIPS Workshop on Machine Learning for Healthcare (NIPS ML4HC) 2016
Subjects: Learning (cs.LG)
We use autoencoders to create low-dimensional embeddings of underlying
patient phenotypes that we hypothesize are a governing factor in determining
how different patients will react to different interventions. We compare the
performance of autoencoders that take fixed length sequences of concatenated
timesteps as input with a recurrent sequence-to-sequence autoencoder. We
evaluate our methods on around 35,500 patients from the latest MIMIC III
dataset from Beth Israel Deaconess Hospital.
Ben Goertzel, Nil Geisweiller, Chris Poulin
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
A general formulation of optimization problems in which various candidate
solutions may use different feature-sets is presented, encompassing supervised
classification, automated program learning and other cases. A novel
characterization of the concept of a “good quality feature” for such an
optimization problem is provided; and a proposal regarding the integration of
quality based feature selection into metalearning is suggested, wherein the
quality of a feature for a problem is estimated using knowledge about related
features in the context of related problems. Results are presented regarding
extensive testing of this “feature metalearning” approach on supervised text
classification problems; it is demonstrated that, in this context, feature
metalearning can provide significant and sometimes dramatic speedup over
standard feature selection heuristics.
Natali Ruchansky, Sungyong Seo, Yan Liu
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)
In the recent political climate, the topic of fake news has drawn attention
both from the public and the academic communities. Such misinformation has been
cited to have a strong impact on public opinion, presenting the opportunity for
malicious manipulation. Detecting fake news is an important, yet challenging
problem since it is often difficult for humans to distinguish misinformation.
However, there have been three generally agreed upon characteristics of fake
news: the text, the response received, and the source users promoting it.
Existing work has largely focused on tailoring solutions to a particular
characteristic, but the complexity of the fake news epidemic has limited their
success and generality.
In this work, we propose a model that combines all three characteristics for
a more accurate and automated prediction. Specifically, we incorporate the
behavior of both parties, users and articles, and the group behavior of users
who propagate fake news. Motivated by the three characteristics, we propose a
model called CSI, which is composed of three modules: Capture, Score, and
Integrate. The first module uses a Recurrent Neural Network (RNN) to capture
the temporal pattern of user activity that occurred with a given article, and
the second captures the behavior of users over time. The two are then
integrated with the third module to classify an article as fake or not. Through
experimental analysis on real-world data, we demonstrate that CSI achieves
higher accuracy than existing models. Further, we show that each module
captures relevant behavioral information both on users and articles with
respect to the propagation of fake news.
Hiva Ghanbari, Katya Scheinberg
Subjects: Learning (cs.LG)
In this work, we utilize a Trust Region based Derivative Free Optimization
(DFO-TR) method to directly maximize the Area Under Receiver Operating
Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that
AUC is a smooth function, in expectation, if the distributions of the positive
and negative data points obey a jointly normal distribution. The practical
performance of this algorithm is compared to three prominent Bayesian
optimization methods and random search. The presented numerical results show
that DFO-TR surpasses Bayesian optimization and random search on various
black-box optimization problems, such as maximizing AUC and hyperparameter
tuning.
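To see why the empirical AUC is nonsmooth, it helps to write it as a pairwise ranking statistic: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below uses this standard definition. The function name `auc` and the toy scores are illustrative and do not come from the paper.

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC: share of positive/negative pairs ranked correctly."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Three of the four pairs are ranked correctly.
print(auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

Because the value changes only when a pair of scores swaps order, it is piecewise constant in the model parameters, which is one reason a derivative-free method is a natural fit.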
Iaroslav Omelianenko
Comments: arXiv admin note: text overlap with arXiv:1207.0580 by other authors
Subjects: Learning (cs.LG); Computers and Society (cs.CY)
In the modern era, each Internet user leaves enormous amounts of auxiliary
digital residuals (footprints) by using a variety of online services. All this
data has already been collected and stored for many years. Recent works
demonstrated that it is possible to apply simple machine learning methods to
analyze collected digital footprints and to create psychological profiles of
individuals. However, while these works clearly demonstrated the applicability
of machine learning methods for such an analysis, the resulting simple
prediction models still lack the accuracy necessary to be successfully applied
to practical needs. We assumed that using advanced deep machine learning
methods may considerably increase the accuracy of predictions. We started with
simple machine learning methods to estimate basic prediction performance and
moved further by applying advanced methods based on shallow and deep neural
networks. Then we compared the prediction power of the studied models and drew
conclusions about their performance. Finally, we formulated hypotheses on how
prediction accuracy can be further improved. As a result of this work, we
provide the full source code used in the experiments for all interested
researchers and practitioners in the corresponding GitHub repository. We
believe that applying deep machine learning for psychological profiling may
have an enormous impact on society (for better or worse), and by providing the
full source code of our research we hope to intensify further research by a
wider circle of scholars.
Chris Donahue, Zachary C. Lipton, Julian McAuley
Subjects: Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)
Dance Dance Revolution (DDR) is a popular rhythm-based video game. Players
perform steps on a dance platform in synchronization with music as directed by
on-screen step charts. While many step charts are available in standardized
packs, users may grow tired of existing charts, or wish to dance to a song for
which no chart exists. We introduce the task of learning to choreograph. Given
a raw audio track, the goal is to produce a new step chart. This task
decomposes naturally into two subtasks: deciding when to place steps and
deciding which steps to select. For the step placement task, we combine
recurrent and convolutional neural networks to ingest spectrograms of low-level
audio features to predict steps, conditioned on chart difficulty. For step
selection, we present a conditional LSTM generative model that substantially
outperforms n-gram and fixed-window approaches.
Mathurin Massias, Alexandre Gramfort, Joseph Salmon
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)
Convex sparsity-promoting regularizations are ubiquitous in modern
statistical learning. By construction, they yield solutions with few non-zero
coefficients, which correspond to saturated constraints in the dual
optimization formulation. Working set (WS) strategies are generic optimization
techniques that consist in solving simpler problems that only consider a subset
of constraints, whose indices form the WS. Working set methods therefore
involve two nested iterations: the outer loop corresponds to the definition of
the WS and the inner loop calls a solver for the subproblems. For the Lasso
estimator a WS is a set of features, while for a Group Lasso it refers to a set
of groups. In practice, WS are generally small in this context so the
associated feature Gram matrix can fit in memory. Here we show that the
Gauss-Southwell rule (a greedy strategy for block coordinate descent
techniques) leads to fast solvers in this case. Combined with a working set
strategy based on an aggressive use of so-called Gap Safe screening rules, we
propose a solver achieving state-of-the-art performance on sparse learning
problems. Results are presented on Lasso and multi-task Lasso estimators.
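As a rough illustration of the Gauss-Southwell idea on a precomputed Gram matrix, the sketch below runs greedy coordinate descent on a small Lasso problem. It is not the authors' solver: the working-set outer loop and the Gap Safe screening are omitted, and the objective scaling 0.5 * ||y - X w||^2 + lam * ||w||_1, the function names, and the "largest update" selection rule (one common Gauss-Southwell variant) are assumptions made for this example.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.| applied to the scalar z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def greedy_cd_lasso(gram, xty, lam, n_iter=10000, tol=1e-12):
    """Greedy coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1.

    gram is X^T X as a list of rows and xty is X^T y (both hypothetical
    argument names). Assumes every diagonal entry of gram is non-zero.
    """
    p = len(xty)
    w = [0.0] * p
    grad = [-v for v in xty]  # gradient of the smooth part at w = 0
    for _ in range(n_iter):
        # Change each coordinate would undergo if it were updated alone.
        deltas = [
            soft_threshold(w[j] - grad[j] / gram[j][j], lam / gram[j][j]) - w[j]
            for j in range(p)
        ]
        # Greedy rule: update only the coordinate that would move the most.
        j = max(range(p), key=lambda k: abs(deltas[k]))
        if abs(deltas[j]) < tol:
            break
        w[j] += deltas[j]
        # Keep the gradient in sync using one column of the Gram matrix.
        for k in range(p):
            grad[k] += gram[k][j] * deltas[j]
    return w
```

Because only the Gram matrix and X^T y are touched, each iteration costs O(p) once the candidate updates are computed, which is why a small working set whose Gram matrix fits in memory makes the greedy rule affordable.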
Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret
Comments: 8 pages, 6 figures
Subjects: Robotics (cs.RO); Learning (cs.LG)
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Hao Wang, Xiaodan Liang, Hao Zhang, Dit-Yan Yeung, Eric P. Xing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
Many problems in image processing and computer vision (e.g. colorization,
style transfer) can be posed as ‘manipulating’ an input image into a
corresponding output image given a user-specified guiding signal. A holy-grail
solution towards generic image manipulation should be able to efficiently alter
an input image with any personalized signals (even signals unseen during
training), such as diverse paintings and arbitrary descriptive attributes.
However, existing methods are either inefficient to simultaneously process
multiple signals (let alone generalize to unseen signals), or unable to handle
signals from other modalities. In this paper, we make the first attempt to
address the zero-shot image manipulation task. We cast this problem as
manipulating an input image according to a parametric model whose key
parameters can be conditionally generated from any guiding signal (even unseen
ones). To this end, we propose the Zero-shot Manipulation Net (ZM-Net), a
fully-differentiable architecture that jointly optimizes an
image-transformation network (TNet) and a parameter network (PNet). The PNet
learns to generate key transformation parameters for the TNet given any guiding
signal while the TNet performs fast zero-shot image manipulation according to
both signal-dependent parameters from the PNet and signal-invariant parameters
from the TNet itself. Extensive experiments show that our ZM-Net can perform
high-quality image manipulation conditioned on different forms of guiding
signals (e.g. style images and attributes) in real-time (tens of milliseconds
per image) even for unseen signals. Moreover, a large-scale style dataset with
over 20,000 style images is also constructed to promote further research.
Mandar Kulkarni, Kalpesh Patil, Shirish Karande
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Current approaches for Knowledge Distillation (KD) either directly use
training data or sample from the training data distribution. In this paper, we
demonstrate the effectiveness of ‘mismatched’ unlabeled stimulus to perform KD for
image classification networks. For illustration, we consider scenarios where
there is a complete absence of training data, or mismatched stimulus has to be
used for augmenting a small amount of training data. We demonstrate that
stimulus complexity is a key factor for distillation’s good performance. Our
examples include use of various datasets for stimulating MNIST and CIFAR
teachers.
Niek Tax, Benjamin Dalmas, Natalia Sidorova, Wil M P van der Aalst, Sylvie Norre
Comments: submitted to the International Conference on Business Process Management (BPM) 2017
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Learning (cs.LG)
Local Process Models (LPM) describe structured fragments of process behavior
occurring in the context of less structured business processes. Traditional LPM
discovery aims to generate a collection of process models that describe highly
frequent behavior, but these models do not always provide useful answers for
questions posed by process analysts aiming at business process improvement. We
propose a framework for goal-driven LPM discovery, based on utility functions
and constraints. We describe four scopes on which these utility functions and
constraints can be defined, and show that utility functions and constraints on
different scopes can be combined to form composite utility
functions/constraints. Finally, we demonstrate the applicability of our
approach by presenting several actionable business insights discovered with LPM
discovery on two real life data sets.
Atsushi Shibagaki, Ichiro Takeuchi
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
We study primal-dual type stochastic optimization algorithms with non-uniform
sampling. Our main theoretical contribution in this paper is to present a
convergence analysis of Stochastic Primal Dual Coordinate (SPDC) Method with
arbitrary sampling. Based on this theoretical framework, we propose Optimality
Violation-based Sampling SPDC (ovsSPDC), a non-uniform sampling method based on
Optimality Violation. We also propose two efficient heuristic variants of
ovsSPDC called ovsSPDC+ and ovsSPDC++. Through intensive numerical experiments,
we demonstrate that the proposed method and its variants are faster than other
state-of-the-art primal-dual type stochastic optimization methods.
Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
Comments: 5 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)
Language understanding is a key component in a spoken dialogue system. In
this paper, we investigate how the language understanding module influences the
dialogue system performance by conducting a series of systematic experiments on
a task-oriented neural dialogue system in a reinforcement learning based
setting. The empirical study shows that among different types of language
understanding errors, slot-level errors can have more impact on the overall
performance of a dialogue system compared to intent-level errors. In addition,
our experiments demonstrate that the reinforcement learning based dialogue
system is able to learn when and what to confirm in order to achieve better
performance and greater robustness.
Krzysztof J. Geras, Stacey Wolfson, S. Gene Kim, Linda Moy, Kyunghyun Cho
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
Recent advances in deep learning for object recognition in natural images have
prompted a surge of interest in applying a similar set of techniques to medical
images. Most of the initial attempts largely focused on replacing the input to
such a deep convolutional neural network from a natural image to a medical
image. This, however, does not take into consideration the fundamental
differences between these two types of data. More specifically, detection or
recognition of an anomaly in medical images depends significantly on fine
details, unlike object recognition in natural images where coarser, more global
structures matter more. This difference makes it inadequate to use the existing
deep convolutional neural network architectures, which were developed for
natural images, because they rely on heavily downsampling an image to a much
lower resolution to reduce the memory requirements. This hides details
necessary to make accurate predictions for medical images. Furthermore, a
single exam in medical imaging often comes with a set of different views which
must be seamlessly fused in order to reach a correct conclusion. In our work,
we propose to use a multi-view deep convolutional neural network that handles a
set of more than one high-resolution medical image. We evaluate this network on
large-scale mammography-based breast cancer screening (BI-RADS prediction)
using 103 thousand images. We focus on investigating the impact of training set
sizes and image sizes on the prediction accuracy. Our results highlight that
performance clearly increases with the size of training set, and that the best
performance can only be achieved using the images in the original resolution.
This suggests the future direction of medical imaging research using deep
neural networks is to utilize as much data as possible with the least amount of
potentially harmful preprocessing.
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
A natural image usually conveys rich semantic content and can be viewed from
different angles. Existing image description methods are largely restricted by
small sets of biased visual paragraph annotations, and fail to cover rich
underlying semantics. In this paper, we investigate a semi-supervised paragraph
generative framework that is able to synthesize diverse and semantically
coherent paragraph descriptions by reasoning over local semantic regions and
exploiting linguistic knowledge. The proposed Recurrent Topic-Transition
Generative Adversarial Network (RTT-GAN) builds an adversarial framework
between a structured paragraph generator and multi-level paragraph
discriminators. The paragraph generator generates sentences recurrently by
incorporating region-based visual and language attention mechanisms at each
step. The quality of generated paragraph sentences is assessed by multi-level
adversarial discriminators from two aspects, namely, plausibility at sentence
level and topic-transition coherence at paragraph level. The joint adversarial
training of RTT-GAN drives the model to generate realistic paragraphs with
smooth logical transition between sentence topics. Extensive quantitative
experiments on image and video paragraph datasets demonstrate the effectiveness
of our RTT-GAN in both supervised and semi-supervised settings. Qualitative
results on telling diverse stories for an image also verify the
interpretability of RTT-GAN.
Florian Bordes, Sina Honari, Pascal Vincent
Comments: Published as a conference paper at ICLR 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this work, we investigate a novel training procedure to learn a generative
model as the transition operator of a Markov chain, such that, when applied
repeatedly on an unstructured random noise sample, it will denoise it into a
sample that matches the target distribution from the training set. The novel
training procedure to learn this progressive denoising operation involves
sampling from a slightly different chain than the model chain used for
generation in the absence of a denoising target. In the training chain we
infuse information from the training target example that we would like the
chains to reach with a high probability. The thus learned transition operator
is able to produce quality and varied samples in a small number of steps.
Experiments show competitive results compared to the samples generated with a
basic Generative Adversarial Net.
Miriam W. Huijser, Jan C. van Gemert
Comments: ICCV submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
This paper is on active learning where the goal is to reduce the data
annotation burden by interacting with a (human) oracle during training.
Standard active learning methods ask the oracle to annotate data samples.
Instead, we take a profoundly different approach: we ask for annotations of the
decision boundary. We achieve this using a deep generative model to create
novel instances along a 1D line. A point on the decision boundary is revealed
where the instances change class. Experimentally we show on three data sets
that our method can be plugged into other active learning schemes, that human
oracles can effectively annotate points on the decision boundary, that our
method is robust to annotation noise, and that decision boundary annotations
improve over annotating data samples.
William La Cava, Jason H. Moore
Comments: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin, Germany
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Recently we proposed a general, ensemble-based feature engineering wrapper
(FEW) that was paired with a number of machine learning methods to solve
regression problems. Here, we adapt FEW for supervised classification and
perform a thorough analysis of fitness and survival methods within this
framework. Our tests demonstrate that two fitness metrics, one introduced as an
adaptation of the silhouette score, outperform the more commonly used Fisher
criterion. We analyze survival methods and demonstrate that $\epsilon$-lexicase
survival works best across our test problems, followed by random survival which
outperforms both tournament and deterministic crowding. We conduct
hyper-parameter optimization for several classification methods using a large
set of problems to benchmark the ability of FEW to improve data
representations. The results show that FEW can improve the best classifier
performance on several problems. We show that FEW generates readable and
meaningful features for a biomedical problem with different ML pairings.
Caifa Zhou, Andreas Wieser
Comments: 11 pages, 11 figures, published in proceedings UPINLBS 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We propose a scheme to employ backpropagation neural networks (BPNNs) for
both stages of fingerprinting-based indoor positioning using WLAN/WiFi signal
strengths (FWIPS): radio map construction during the offline stage, and
localization during the online stage. Given a training radio map (TRM), i.e., a
set of coordinate vectors and associated WLAN/WiFi signal strengths of the
available access points, a BPNN can be trained to output the expected signal
strengths for any input position within the region of interest (BPNN-RM). This
can be used to provide a continuous representation of the radio map and to
filter, densify or decimate a discrete radio map. Correspondingly, the TRM can
also be used to train another BPNN to output the expected position within the
region of interest for any input vector of recorded signal strengths and thus
carry out localization (BPNN-LA). Key aspects of the design of such artificial
neural networks for a specific application are the selection of design
parameters like the number of hidden layers and nodes within the network, and
the training procedure. Summarizing extensive numerical simulations, based on
real measurements in a testbed, we analyze the impact of these design choices
on the performance of the BPNN and compare the results in particular to those
obtained using the $k$ nearest neighbors ($k$NN) and weighted $k$ nearest
neighbors approaches to FWIPS.
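For readers unfamiliar with the weighted nearest-neighbor baseline mentioned above, the following minimal sketch estimates a position from a discrete radio map. The inverse-distance weighting, the Euclidean distance in signal space, the function name and the tiny two-point radio map are all assumptions for illustration; the abstract does not specify these details.

```python
import math


def wknn_locate(radio_map, rss, k=3, eps=1e-9):
    """Weighted k-nearest-neighbor position estimate.

    radio_map: list of (position, fingerprint) pairs, where position is a
    coordinate tuple and fingerprint is a tuple of signal strengths (dBm).
    rss: observed signal strengths, same length as each fingerprint.
    """
    # Rank reference points by distance in signal space and keep the k closest.
    nearest = sorted(
        ((math.dist(fingerprint, rss), position) for position, fingerprint in radio_map),
        key=lambda pair: pair[0],
    )[:k]
    # Inverse-distance weights (assumed scheme); eps avoids division by zero.
    weights = [1.0 / (distance + eps) for distance, _ in nearest]
    total = sum(weights)
    dim = len(nearest[0][1])
    return tuple(
        sum(w * position[i] for w, (_, position) in zip(weights, nearest)) / total
        for i in range(dim)
    )


# Hypothetical radio map: two reference points, two access points.
example_map = [
    ((0.0, 0.0), (-40.0, -70.0)),
    ((10.0, 0.0), (-70.0, -40.0)),
]
estimate = wknn_locate(example_map, (-55.0, -55.0), k=2)
```

Setting all weights equal instead of inverse-distance recovers the plain $k$NN estimate, which is the other baseline named in the abstract.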
Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel
Comments: 8 pages, 7 figures. Submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)
Subjects: Robotics (cs.RO); Learning (cs.LG)
Bridging the ‘reality gap’ that separates simulated robotics from experiments
on hardware could accelerate robotic research through improved data
availability. This paper explores domain randomization, a simple technique for
training models on simulated images that transfer to real images by randomizing
rendering in the simulator. With enough variability in the simulator, the real
world may appear to the model as just another variation. We focus on the task
of object localization, which is a stepping stone to general robotic
manipulation skills. We find that it is possible to train a real-world object
detector that is accurate to 1.5 cm and robust to distractors and partial
occlusions using only data from a simulator with non-realistic random textures.
To demonstrate the capabilities of our detectors, we show they can be used to
perform grasping in a cluttered environment. To our knowledge, this is the
first successful transfer of a deep neural network trained only on simulated
RGB images (without pre-training on real images) to the real world for the
purpose of robotic control.
Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das
Comments: 5 pages including reference
Journal-ref: published at ICASSP 2017
Subjects: Sound (cs.SD); Learning (cs.LG)
Environmental sound detection is a challenging application of machine
learning because of the noisy nature of the signal, and the small amount of
(labeled) data that is typically available. This work thus presents a
comparison of several state-of-the-art Deep Learning models on the IEEE
challenge on Detection and Classification of Acoustic Scenes and Events (DCASE)
2016 challenge task and data, classifying sounds into one of fifteen common
indoor and outdoor acoustic scenes, such as bus, cafe, car, city center, forest
path, library, train, etc. In total, 13 hours of stereo audio recordings are
available, making this one of the largest datasets available. We perform
experiments on six sets of features, including standard Mel-frequency cepstral
coefficients (MFCC), Binaural MFCC, log Mel-spectrum and two different
large-scale temporal pooling features extracted using OpenSMILE. On these features,
we apply five models: Gaussian Mixture Model (GMM), Deep Neural Network (DNN),
Recurrent Neural Network (RNN), Convolutional Deep Neural Network (CNN) and
i-vector. Using the late-fusion approach, we improve the performance of the
baseline 72.5% by 15.6% in 4-fold Cross Validation (CV) avg. accuracy and 11%
in test accuracy, which matches the best result of the DCASE 2016 challenge.
With large feature sets, deep neural network models outperform traditional
methods and achieve the best performance among all the studied methods.
Consistent with other work, the best performing single model is the
non-temporal DNN model, which we take as evidence that sounds in the DCASE
challenge do not exhibit strong temporal dynamics.
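The late-fusion step mentioned above can be pictured as combining the per-class scores of several independently trained models before taking a decision. The sketch below assumes a weighted average of class posteriors, which is only one possible fusion rule; the abstract does not say which rule was used, and the function name and the example scores are hypothetical.

```python
def late_fusion(model_posteriors, weights=None):
    """Fuse per-class scores from several models by weighted averaging.

    model_posteriors: one list of class scores per model, all the same length.
    weights: optional per-model weights; defaults to a uniform average.
    Returns (index of the winning class, fused scores).
    """
    n_models = len(model_posteriors)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(model_posteriors[0])
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, model_posteriors))
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical scores from three models over three acoustic scenes.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
winner, fused_scores = late_fusion(scores)
```

Here the first model alone would pick class 0, but the averaged scores favor class 1, which is the kind of disagreement late fusion is meant to resolve.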
Enrico Piovano, Bruno Clerckx
Subjects: Information Theory (cs.IT)
We consider the $K$-user Multiple-Input-Single-Output (MISO) Broadcast
Channel (BC) where the transmitter, equipped with $M$ antennas, serves $K$
users, with $K \leq M$. The transmitter has access to partial channel state
information of the users. This is modelled by letting the variance of the
Channel State Information at the Transmitter (CSIT) error of user $i$ scale as
$O(P^{-\alpha_i})$ for the Signal-to-Noise Ratio (SNR) $P$ and some constant
$\alpha_i \geq 0$. In this work we derive the optimal Degrees-of-Freedom (DoF)
region in such a setting.
Behrooz Makki, Tommy Svensson, Maite Brandt-Pearce, Mohamed-Slim Alouini
Comments: Presented at IEEE WCNC 2017
Subjects: Information Theory (cs.IT)
We study the performance of multi-hop networks composed of millimeter wave
(MMW)-based radio frequency (RF) and free-space optical (FSO) links. The
results are obtained in the cases with and without hybrid automatic repeat
request (HARQ). Taking the MMW characteristics of the RF links into account, we
derive closed-form expressions for the network outage probability. We also
evaluate the effect of various parameters such as power amplifier efficiency,
number of antennas as well as different coherence times of the RF and the FSO
links on the system performance. Finally, we present mappings between the
performance of RF-FSO multi-hop networks and the ones using only the RF- or the
FSO-based communication, in the sense that with appropriate parameter settings
the same outage probability is achieved in these setups. The results show the
efficiency of the RF-FSO setups in different conditions. Moreover, the HARQ can
effectively improve the outage probability/energy efficiency, and compensate
for the effect of hardware impairments in RF-FSO networks. For common parameter
settings of the RF-FSO dual-hop networks, outage probability $10^{-4}$ and code
rate 3 nats-per-channel-use, the implementation of HARQ with a maximum of 2 and
3 retransmissions reduces the required power, compared to the cases with no
HARQ, by 13 and 17 dB, respectively.
Yanxiang Jiang, Xiqi Gao, Xiaohu You
Comments: 11 pages, 9 figures, IEICE Trans. Commun., 2006
Journal-ref: IEICE Trans. Commun., Apr. 2006
Subjects: Information Theory (cs.IT)
A novel frequency domain training sequence and the corresponding carrier
frequency offset (CFO) estimator are proposed for orthogonal frequency division
multiplexing (OFDM) systems over frequency-selective fading channels. The
proposed frequency domain training sequence comprises two types of pilot tones,
namely distinctively spaced pilot tones with high energies and uniformly spaced
ones with low energies. Based on the distinctively spaced pilot tones, integer
CFO estimation is accomplished. After the subcarriers occupied by the
distinctively spaced pilot tones and their adjacent subcarriers are nulled for
the sake of interference cancellation, fractional CFO estimation is executed
according to the uniformly spaced pilot tones. By exploiting a predefined
lookup table making the best of the structure of the distinctively spaced pilot
tones, computational complexity of the proposed CFO estimator can be decreased
considerably. With the aid of the uniformly spaced pilot tones generated from
Chu sequence with cyclically orthogonal property, the ability of the proposed
estimator to combat multipath effect is enhanced to a great extent. Simulation
results illustrate the good performance of the proposed CFO estimator.
Shuai Li, Kun Yang, Mingxin Zhou, Jianjun Wu, Lingyang Song, Yonghui Li, Hongbin Li
Comments: tvt,journal
Subjects: Information Theory (cs.IT)
In this paper, we consider a full-duplex (FD) amplify-and-forward (AF) relay
system and optimize its power allocation and relay location to minimize the
system symbol error rate (SER). We first derive the asymptotic expressions of
the outage probability and SER performance by taking into account the residual
self interference (RSI) in FD systems. We then formulate the optimization
problem based on the minimal SER criterion. Analytical and numerical results
show that optimized relay location and power allocation can greatly improve
system SER performance, and the performance floor caused by the RSI can be
significantly reduced via optimizing relay location or power allocation.
Yanxiang Jiang, Hlaing Minn, Xiaohu You, Xiqi Gao
Comments: 5 pages, 3 figures, IEEE TVT, 2008
Journal-ref: IEEE TVT, Sept. 2008
Subjects: Information Theory (cs.IT)
This paper addresses a simplified frequency offset estimator for
multiple-input multiple-output (MIMO) orthogonal frequency division
multiplexing (OFDM) systems over frequency selective fading channels. By
exploiting the good correlation property of the training sequences, which are
constructed from the Chu sequence, carrier frequency offset (CFO) estimation is
obtained through factor decomposition for the derivative of the cost function
with great complexity reduction. The mean-squared error (MSE) of the CFO
estimation is derived to optimize the key parameter of the simplified estimator
and also to evaluate the estimator performance. Simulation results confirm the
good performance of the training-assisted CFO estimator.
Yanxiang Jiang, Yanxing Hu, Xiaohu You
Comments: 4 pages, 4 figures, IEICE Trans. Commun
Journal-ref: IEICE Trans. Commun., Jan. 2012
Subjects: Information Theory (cs.IT)
In this letter, signal-to-noise ratio (SNR) performance is analyzed for
orthogonal frequency division multiplexing (OFDM) based amplify-and-forward
(AF) relay systems in the presence of carrier frequency offset (CFO) for fading
channels. The SNR expression is derived under one-relay-node scenario, and is
further extended to multiple-relay-node scenario. Analytical results show that
the SNR is quite sensitive to CFO and the sensitivity of the SNR to CFO is
mainly determined by the power of the corresponding link channel and gain
factor.
Jiadian Zhang, Yanxiang Jiang, Peng Li, Fuchun Zheng, Xiaohu You
Comments: 6 pages, 5 figures, IEEE VTC 2016-S
Journal-ref: IEEE VTC 2016-S, May 2016
Subjects: Information Theory (cs.IT)
In this paper, energy efficient power allocation for downlink massive MIMO
systems is investigated. A constrained non-convex optimization problem is
formulated to maximize the energy efficiency (EE), which takes into account the
quality of service (QoS) requirements. By exploiting the properties of
fractional programming and the lower bound of the user data rate, the
non-convex optimization problem is transformed into a convex optimization
problem. The Lagrangian dual function method is utilized to convert the
constrained convex problem into an unconstrained convex one. Due to the
multi-variable coupling problem caused by the intra-user interference, it is
intractable to derive an explicit solution to the above optimization problem.
Exploiting the standard interference function, we propose an implicit iterative
algorithm to solve the unconstrained convex optimization problem and obtain the
optimal power allocation scheme. Simulation results show that the proposed
iterative algorithm converges in just a few iterations, and demonstrate the
impact of the number of users and the number of antennas on the EE.
Peng Li, Yanxiang Jiang, Wei Li, Fuchun Zheng, Xiaohu You
Comments: 6 pages, 5 figures, IEEE Wireless Communications and Networking Conference (WCNC’16)
Journal-ref: IEEE Wireless Communications and Networking Conference (WCNC’16),
April 2016
Subjects: Information Theory (cs.IT)
In this paper, energy efficient power allocation for the uplink of a
multi-cell massive MIMO system is investigated. With the simplified power
consumption model, the problem of power allocation is formulated as a
constrained Markov decision process (CMDP) framework with infinite-horizon
expected discounted total reward, which takes into account different quality of
service (QoS) requirements for each user terminal (UT). We propose an offline
solution containing the value iteration and Q-learning algorithms, which can
obtain the global optimum power allocation policy. Simulation results show that
our proposed policy performs very close to the ergodic optimal policy.
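To make the value-iteration ingredient concrete, here is a minimal sketch for an infinite-horizon discounted MDP. It deliberately ignores the QoS constraints of the CMDP formulation and the Q-learning part, and the two-state toy problem, the data layout and the function name are hypothetical choices for the example rather than the model used in the paper.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10, max_iter=10000):
    """Value iteration for a finite discounted MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    Returns (state values, greedy policy).
    """
    n_states = len(transitions)

    def q_value(values, s, a):
        # Immediate reward plus discounted expected value of the next state.
        return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])

    values = [0.0] * n_states
    for _ in range(max_iter):
        new_values = [
            max(q_value(values, s, a) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        change = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if change < tol:
            break
    policy = [
        max(range(len(transitions[s])), key=lambda a: q_value(values, s, a))
        for s in range(n_states)
    ]
    return values, policy


# Toy problem: action 0 stays in the current state, action 1 switches state.
toy_transitions = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
toy_rewards = [[0.0, 1.0], [2.0, 0.0]]
toy_values, toy_policy = value_iteration(toy_transitions, toy_rewards, gamma=0.5)
```

In the toy problem the greedy policy moves from state 0 to state 1 and then stays there, since staying in state 1 yields the largest discounted return.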
Yanxiang Jiang, Qiang Liu, Fuchun Zheng, Xiqi Gao, Xiaohu You
Comments: 9 pages, 5 figures, this paper has been published by IEEE Transactions on Vehicular Technology
Journal-ref: IEEE Transactions on Vehicular Technology, August 2016
Subjects: Information Theory (cs.IT)
In this paper, joint resource allocation and power control for energy
efficient device-to-device (D2D) communications underlaying cellular networks
are investigated. The resource and power are optimized for maximization of the
energy efficiency (EE) of D2D communications. Exploiting the properties of
fractional programming, we transform the original nonconvex optimization
problem in fractional form into an equivalent optimization problem in
subtractive form. Then, an efficient iterative resource allocation and power
control scheme is proposed. In each iteration, part of the constraints of the
EE optimization problem is removed by exploiting the penalty function approach.
We further propose a novel two-layer approach which allows finding the optimum
at each iteration by decoupling the EE optimization problem of joint resource
allocation and power control into two separate steps. In the first layer, the
optimal power values are obtained by solving a series of maximization problems
through root-finding with or without considering the loss of cellular users’
rates. In the second layer, the formulated optimization problem belongs to a
classical resource allocation problem with single allocation format which
admits a network flow formulation so that it can be solved to optimality.
Simulation results demonstrate the remarkable improvements in terms of EE by
using the proposed iterative resource allocation and power control scheme.
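The move from a fractional objective to a subtractive one is commonly carried out with a Dinkelbach-type iteration, and the sketch below shows that idea on a deliberately simplified single-link problem. It is an assumption that this matches the spirit of the transformation described above; the rate model log2(1 + gain * p), the circuit-power term, the parameter values and the function name are all hypothetical, and none of the D2D resource allocation or penalty-function machinery is modeled.

```python
import math


def dinkelbach_energy_efficiency(gain, p_circuit, p_max, tol=1e-12, max_iter=100):
    """Maximise rate(p) / (p + p_circuit) over 0 <= p <= p_max.

    Each iteration solves the subtractive problem
        max_p rate(p) - q * (p + p_circuit)
    in closed form and then updates q to the achieved ratio.
    Returns (transmit power, energy efficiency).
    """
    def rate(p):
        return math.log2(1.0 + gain * p)

    q = 0.0
    p = p_max
    for _ in range(max_iter):
        if q > 0.0:
            # Stationary point of the concave subtractive objective, clipped.
            p = min(max(1.0 / (q * math.log(2.0)) - 1.0 / gain, 0.0), p_max)
        else:
            # With q = 0 the objective is just the rate, so use full power.
            p = p_max
        surplus = rate(p) - q * (p + p_circuit)
        q = rate(p) / (p + p_circuit)
        if surplus < tol:
            break
    return p, q


best_power, best_ee = dinkelbach_energy_efficiency(gain=2.0, p_circuit=1.0, p_max=10.0)
```

The ratio q never decreases across iterations, and the loop stops once the subtractive objective evaluated at the current q is numerically zero, which is the usual optimality condition for this type of iteration.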
Peng Li, Yanxiang Jiang, Shaoli Kang, Fuchun Zheng, Xiaohu You
Comments: 6 pages, 5 figures, this paper has been accepted by IEEE VTC 2017-Spring
Subjects: Information Theory (cs.IT)
In this paper, pattern division multiple access with large-scale antenna
array (LSA-PDMA) is proposed as a novel non-orthogonal multiple access (NOMA)
scheme. In the proposed scheme, the pattern is designed jointly in both the beam
domain and the power domain. At the transmitter, pattern mapping utilizes
power allocation to improve the system sum rate and beam allocation to enhance
the access connectivity and realize the integration of LSA into multiple access
spontaneously. At the receiver, hybrid detection of spatial filter (SF) and
successive interference cancellation (SIC) is employed to separate the
superposed multiple-domain signals. Furthermore, we formulate the sum rate
maximization problem to obtain the optimal pattern mapping policy, and the
optimization problem is proved to be convex through proper mathematical
manipulations. Simulation results show that the proposed LSA-PDMA scheme
achieves a significant performance gain in system sum rate compared to both the
orthogonal multiple access scheme and the power-domain NOMA scheme.
Zhengdao Yuan
Comments: arXiv admin note: text overlap with arXiv:1409.4671 by other authors
Subjects: Information Theory (cs.IT)
This paper investigates the problem of estimating sparse channels in massive
MIMO systems. Most wireless channels are sparse with large delay spread, while
some channels can be observed to have common support within a certain area of
the antenna array. This common support property is attractive when it comes to
the estimation of a large number of channels in massive MIMO systems. In this
paper, we propose a novel channel estimation approach which utilizes the common
support by imposing a Dirichlet process (DP) prior over the sparse Bayesian
learning (SBL) model. In addition, this Dirichlet process is modeled based on a
factor graph and combined BP-MF message passing. Compared to the variational
Bayesian (VB) method in the literature, the proposed method can improve the
performance while significantly reducing the complexity. Simulation results
demonstrate that the proposed algorithm outperforms other reported ones in both
performance and complexity.
K. Denia Kanellopoulou, Kostas P. Peppas, P. Takis Mathiopoulos
Subjects: Information Theory (cs.IT)
The effective capacity (EC) has been recently established as a rigorous
alternative to the classical Shannon’s ergodic capacity since it accounts for
the delay constraints imposed by future wireless applications and their impact
on the overall system performance. This paper develops a novel unified approach
for the EC analysis of dispersed spectrum cognitive radio (CR) with equal gain
combining (EGC) and maximal ratio combining (MRC) diversity receivers over
generalized fading channels under a maximum delay constraint. The mathematical
formalism is validated with selected numerical and equivalent simulation
performance evaluation results, thus confirming the correctness of the proposed
unified approach.
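For readers unfamiliar with the metric, the effective capacity is commonly defined as EC(theta) = -(1/theta) * ln E[exp(-theta * R)], where R is the service rate per block and theta is the delay (QoS) exponent. The sketch below estimates it by Monte Carlo and is not the paper's unified analytical approach. It assumes unit block length and bandwidth, and it assumes MRC over i.i.d. Rayleigh branches so that the combined SNR is gamma distributed; the paper itself treats generalized fading and dispersed spectrum CR. All function and parameter names are hypothetical.

```python
import math
import random


def effective_capacity(theta, snr_samples):
    """Monte Carlo estimate of -(1/theta) * ln E[exp(-theta * R)].

    R = log2(1 + snr) is the per-block rate, with block length and
    bandwidth normalized to 1 (an assumption of this sketch).
    """
    rates = [math.log2(1.0 + s) for s in snr_samples]
    mean_exp = sum(math.exp(-theta * r) for r in rates) / len(rates)
    return -math.log(mean_exp) / theta


def mrc_rayleigh_snr(n_branches, mean_snr, n_samples, seed=0):
    """Combined SNR samples for MRC over i.i.d. Rayleigh branches.

    The sum of n_branches exponential branch SNRs with mean mean_snr is
    gamma distributed with shape n_branches and scale mean_snr.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(n_branches, mean_snr) for _ in range(n_samples)]


if __name__ == "__main__":
    samples = mrc_rayleigh_snr(n_branches=2, mean_snr=5.0, n_samples=20000)
    ergodic = sum(math.log2(1.0 + s) for s in samples) / len(samples)
    for theta in (0.1, 1.0, 2.0):
        print(theta, effective_capacity(theta, samples), ergodic)
```

As theta approaches zero the estimate approaches the ergodic capacity, and it shrinks as the delay constraint tightens (larger theta), which is the behavior the abstract refers to.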