Syed Shakib Sarwar, Priyadarshini Panda, Kaushik Roy
Comments: Accepted in ISLPED 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (CNN) are being increasingly used in computer
vision for a wide range of classification and recognition problems. However,
training these large networks demands high computational time and energy
requirements; hence, their energy-efficient implementation is of great
interest. In this work, we reduce the training complexity of CNNs by replacing
certain weight kernels of a CNN with Gabor filters. The convolutional layers
use the Gabor filters as fixed weight kernels, which extract intrinsic
features, alongside regular trainable weight kernels. This combination creates a
balanced system that gives better training performance in terms of energy and
time, compared to the standalone CNN (without any Gabor kernels), in exchange
for tolerable accuracy degradation. We show that the accuracy degradation can
be mitigated by partially training the Gabor kernels, for a small fraction of
the total training cycles. We evaluated the proposed approach on four benchmark
applications. Simple tasks like face detection and character recognition (MNIST
and TiCH) were implemented using the LeNet architecture, while the more complex
task of object recognition (CIFAR10) was implemented on a state-of-the-art deep
CNN (Network in Network) architecture. The proposed approach yields 1.31-1.53x
improvement in training energy in comparison to conventional CNN
implementation. We also obtain improvement up to 1.4x in training time, up to
2.23x in storage requirements, and up to 2.2x in memory access energy. The
accuracy degradation suffered by the approximate implementations is within 0-3%
of the baseline.
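As a rough illustration of the idea (not the authors' implementation), the sketch below mixes frozen Gabor kernels with regular trainable kernels in one convolutional block; the kernel size, orientations, and 50/50 fixed/trainable split are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a (size x size) tensor (size odd)."""
    half = size // 2
    ys, xs = torch.meshgrid(
        torch.arange(-half, half + 1, dtype=torch.float32),
        torch.arange(-half, half + 1, dtype=torch.float32), indexing="ij")
    xr = xs * math.cos(theta) + ys * math.sin(theta)
    yr = -xs * math.sin(theta) + ys * math.cos(theta)
    return torch.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) \
        * torch.cos(2 * math.pi * xr / lambd + psi)

class MixedGaborConv(nn.Module):
    """Conv block: some fixed Gabor kernels plus regular trainable kernels."""
    def __init__(self, in_ch=1, fixed=8, trainable=8, ksize=5):
        super().__init__()
        self.gabor = nn.Conv2d(in_ch, fixed, ksize, padding=ksize // 2, bias=False)
        with torch.no_grad():
            for o in range(fixed):                   # evenly spaced orientations
                k = gabor_kernel(ksize, o * math.pi / fixed)
                self.gabor.weight[o] = k.expand(in_ch, ksize, ksize)
        self.gabor.weight.requires_grad = False      # Gabor kernels stay fixed
        self.conv = nn.Conv2d(in_ch, trainable, ksize, padding=ksize // 2)

    def forward(self, x):
        return torch.cat([self.gabor(x), self.conv(x)], dim=1)
```

Freezing the Gabor path removes its weights from backpropagation, which is where the claimed training-energy savings would come from.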
Qiang Lu, Kyoung-Dae Kim
Comments: 34 pages, 11 figures
Subjects: Systems and Control (cs.SY); Neural and Evolutionary Computing (cs.NE)
In this paper, we address the problem of safe and efficient intersection
crossing traffic management for autonomous and connected ground traffic. Toward
this objective, we propose the Discrete-time occupancies trajectory based
Intersection traffic Coordination Algorithm (DICA). We first prove that the
basic DICA is deadlock free and also starvation
free. Then, we show that the basic DICA has a computational complexity of
\(\mathcal{O}(n^2 L_m^3)\) where \(n\) is the number of vehicles granted to cross
an intersection and \(L_m\) is the maximum length of intersection crossing
routes.
To improve the overall computational efficiency of the algorithm, the basic
DICA is enhanced by several computational approaches that are proposed in this
paper. The enhanced algorithm has the computational complexity of
\(\mathcal{O}(n^2 L_m \log_2 L_m)\). The improved computational efficiency of the
enhanced algorithm is validated through simulation using an open source traffic
simulator, called the Simulation of Urban MObility (SUMO). The overall
throughput as well as the computational efficiency of the enhanced algorithm
are also compared with those of an optimized traffic light control.
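For intuition about the stated bounds, the following back-of-envelope computation compares the two complexity expressions; the values of n and L_m are illustrative assumptions, not figures from the paper.

```python
import math

n, Lm = 30, 100                           # assumed: 30 vehicles, 100-cell routes
basic = n ** 2 * Lm ** 3                  # O(n^2 * Lm^3), basic DICA
enhanced = n ** 2 * Lm * math.log2(Lm)    # O(n^2 * Lm * log2 Lm), enhanced DICA
print(f"basic/enhanced operation-count ratio: {basic / enhanced:.0f}x")
```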
Paschalis Panteleris (1), Antonis Argyros (1 and 2) ((1) Institute of Computer Science, FORTH, (2) Computer Science Department, University of Crete)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a novel solution to the problem of 3D tracking of the articulated
motion of human hand(s), possibly in interaction with other objects. The vast
majority of contemporary relevant work capitalizes on depth information
provided by RGBD cameras. In this work, we show that accurate and efficient 3D
hand tracking is possible, even for the case of RGB stereo. A straightforward
approach for solving the problem based on such input would be to first recover
depth and then apply a state of the art depth-based 3D hand tracking method.
Unfortunately, this does not work well in practice because the stereo-based,
dense 3D reconstruction of hands is far less accurate than the one obtained by
RGBD cameras. Our approach bypasses 3D reconstruction and follows a completely
different route: 3D hand tracking is formulated as an optimization problem
whose solution is the hand configuration that maximizes the color consistency
between the two views of the hand. We demonstrate the applicability of our
method for real time tracking of a single hand, of a hand manipulating an
object and of two interacting hands. The method has been evaluated
quantitatively on standard datasets and in comparison to relevant, state of the
art RGBD-based approaches. The obtained results demonstrate that the proposed
stereo-based method performs equally well to its RGBD-based competitors, and in
some cases, it even outperforms them.
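A minimal sketch of the kind of objective described, assuming calibrated cameras with known 3x4 projection matrices P1 and P2 and a hypothetical sampler of hand-surface points; the paper's actual hand model and optimizer are not specified in the abstract.

```python
import numpy as np

def color_consistency(points3d, img1, img2, P1, P2):
    """points3d: (N,3) sampled hand-surface points; imgX: HxWx3 float arrays."""
    homog = np.hstack([points3d, np.ones((len(points3d), 1))])
    uv1 = P1 @ homog.T
    uv1 = (uv1[:2] / uv1[2]).T.astype(int)     # project into view 1
    uv2 = P2 @ homog.T
    uv2 = (uv2[:2] / uv2[2]).T.astype(int)     # project into view 2
    c1 = img1[uv1[:, 1], uv1[:, 0]]            # assumes points land inside images
    c2 = img2[uv2[:, 1], uv2[:, 0]]
    return -float(np.mean(np.abs(c1 - c2)))    # higher = more color-consistent
```

An optimizer over hand configurations would then seek the pose whose sampled surface points maximize this score.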
Ebenezer Isaac, Susan Elias, Srinivasan Rajagopalan, K.S. Easwarakumar
Comments: Submitted to IEEE Signal Processing Letters on 24-April-2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The template-based model-free approach provides by far the most successful
solution to the gait recognition problem in the literature. Recent work discusses
how isolating the head and leg portions of the template increases the performance
of a gait recognition system, making it robust against covariates like clothing
and carrying conditions. However, most methods involve a manual definition of the
boundaries. The method we propose, the genetic template segmentation (GTS),
employs the genetic algorithm to automate the boundary selection process. This
method was tested on the GEI, GEnI and AEI templates. GEI seems to exhibit the
best result when segmented with our approach. Experimental results show that
our approach significantly outperforms the existing implementations of
view-invariant gait recognition.
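A toy genetic-algorithm sketch of automated boundary selection in this spirit; the chromosome encoding, rates, and the stand-in fitness function are illustrative assumptions rather than the GTS specifics.

```python
import random

HEIGHT = 128                                   # template height, e.g. a GEI

def fitness(ind):
    """Stub: in GTS this would be the recognition accuracy obtained when the
    template is segmented at the encoded head/leg boundary rows."""
    head, leg = ind
    return -abs(head - 30) - abs(leg - 90)     # toy objective for illustration

pop = [(random.randint(0, 60), random.randint(64, HEIGHT)) for _ in range(20)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                         # elitist selection
    children = []
    for _ in range(10):
        (h1, _), (_, l2) = random.sample(parents, 2)
        child = (h1, l2)                       # one-point crossover
        if random.random() < 0.2:              # mutate the head boundary
            child = (min(60, max(0, child[0] + random.randint(-5, 5))), child[1])
        children.append(child)
    pop = parents + children
print(max(pop, key=fitness))                   # best (head, leg) boundary pair
```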
Xuefeng Xiao, Yafeng Yang, Tasweer Ahmad, Lianwen Jin, Tianhai Chang
Comments: 5 pages, 2 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Currently, owing to the ubiquity of mobile devices, online handwritten
Chinese character recognition (HCCR) has become one of the most suitable choices
for input on cell phones and tablet devices. Over the past few years,
larger and deeper convolutional neural networks (CNNs) have extensively been
employed for improving character recognition performance. However, their
substantial storage requirements are a significant obstacle to deploying such
networks into portable electronic devices. To circumvent this problem, we
propose a novel technique called DropWeight for pruning redundant connections
in the CNN architecture. We show that the proposed method not only
handles streamlined architectures such as AlexNet and VGGNet well but also
exhibits remarkable performance on deep residual and inception
networks. We also demonstrate that global pooling is a better choice for
building very compact online HCCR systems. Experiments were performed on the
ICDAR-2013 online HCCR competition dataset using our proposed network, and it
is found that the proposed approach requires only 0.57 MB for storage, whereas
state-of-the-art CNN-based methods require up to 135 MB; meanwhile the
performance is decreased only by 0.91%.
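The abstract does not spell out the DropWeight criterion, so the snippet below is a generic magnitude-pruning sketch that only conveys the general idea of zeroing redundant connections.

```python
import torch

def prune_by_magnitude(model, sparsity=0.9):
    """Zero out the smallest-magnitude weights in each weight matrix/tensor.
    The global sparsity target and threshold rule are assumptions."""
    for p in model.parameters():
        if p.dim() < 2:                    # skip biases and norm parameters
            continue
        k = max(1, int(p.numel() * sparsity))
        thresh = p.abs().flatten().kthvalue(k).values
        p.data.mul_((p.abs() > thresh).float())   # keep only large weights
```

Pruned tensors can then be stored sparsely, which is where the reported storage reduction would come from.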
Qingbo Wu, Hongliang Li, Fanman Meng, King N. Ngan
Comments: This paper has been submitted to IEEE Transactions on Image Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the field of objective image quality assessment (IQA), Spearman’s
\(\rho\) and Kendall’s \(\tau\) are the two most popular rank correlation indicators,
which straightforwardly assign uniform weight to all quality levels and assume
that each pair of images is sortable. They are successful for measuring the average
accuracy of an IQA metric in ranking multiple processed images. However, two
important perceptual properties are ignored by them as well. Firstly, the
sorting accuracy (SA) of high quality images is usually more important than
that of poor quality ones in many real world applications, where only the
top-ranked images would be pushed to the users. Secondly, due to the subjective
uncertainty in making judgements, two perceptually similar images are usually
hardly sortable, and their ranks do not contribute to the evaluation of an IQA
metric. To more accurately compare different IQA algorithms, we explore a
perceptually weighted rank correlation indicator in this paper, which rewards
the capability of correctly ranking high quality images, and suppresses the
attention towards insensitive rank mistakes. More specifically, we focus on
activating ‘valid’ pairwise comparisons of image quality, whose difference
exceeds a given sensory threshold (ST). Meanwhile, each image pair is assigned
a unique weight, which is determined by both the quality level and rank
deviation. By modifying the perception threshold, we can illustrate the sorting
accuracy with a more sophisticated SA-ST curve, rather than a single rank
correlation coefficient. The proposed indicator offers a new insight for
interpreting visual perception behaviors. Furthermore, the applicability of our
indicator is validated in recommending robust IQA metrics for both the degraded
and enhanced image data.
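A hedged sketch of such an indicator: only pairs whose subjective-quality gap exceeds the sensory threshold ST count, and each pair is weighted by its quality level. The exact weighting formula below is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def perceptually_weighted_corr(mos, pred, st=0.5):
    """mos: subjective quality scores; pred: IQA metric scores."""
    num = den = 0.0
    n = len(mos)
    for i in range(n):
        for j in range(i + 1, n):
            gap = mos[i] - mos[j]
            if abs(gap) <= st:            # perceptually unsortable pair: ignore
                continue
            w = max(mos[i], mos[j])       # emphasize high-quality pairs (assumed)
            agree = np.sign(gap) == np.sign(pred[i] - pred[j])
            num += w if agree else -w
            den += w
    return num / den if den else 0.0
```

Sweeping st then traces out an SA-ST curve rather than a single coefficient, mirroring the evaluation described above.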
Liangli Zhen, Dezhong Peng, Xin Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Subspace clustering aims to group data points into multiple clusters of which
each corresponds to one subspace. Most existing subspace clustering methods
assume that the data can be linearly represented by each other in the input
space. In practice, however, this assumption is hard to satisfy. To
achieve nonlinear subspace clustering, we propose a novel method which consists
of the following three steps: 1) projecting the data into a hidden space in
which the data can be linearly reconstructed from each other; 2) calculating
the globally linear reconstruction coefficients in the kernel space; 3)
truncating the trivial coefficients to achieve robustness and
block-diagonality, and then achieving clustering by solving a graph Laplacian
problem. Our method has the advantages of a closed-form solution and capacity
of clustering data points that lie in nonlinear subspaces. The first advantage
makes our method efficient in handling large-scale data sets, and the second
one enables the proposed method to address the nonlinear subspace clustering
challenge. Extensive experiments on five real-world datasets demonstrate the
effectiveness and the efficiency of the proposed method in comparison with ten
state-of-the-art approaches regarding four evaluation metrics.
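The three steps might look roughly as follows, assuming an RBF kernel and a ridge-style closed-form reconstruction; the paper's actual objective and truncation rule may differ.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def kernel_subspace_clustering(X, n_clusters, lam=1e-2, keep=10):
    K = rbf_kernel(X)                            # step 1: implicit hidden space
    n = K.shape[0]
    C = np.linalg.solve(K + lam * np.eye(n), K)  # step 2: reconstruction coeffs
    np.fill_diagonal(C, 0)                       # no self-representation
    A = np.abs(C) + np.abs(C).T                  # symmetric affinity
    for i in range(n):                           # step 3: truncate small coeffs
        A[i, np.argsort(A[i])[:-keep]] = 0
    A = np.maximum(A, A.T)
    return SpectralClustering(n_clusters, affinity="precomputed").fit_predict(A)
```

The closed form in step 2 is what makes the approach efficient on large data sets.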
Amara Tariq, Hassan Foroosh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image search and retrieval engines rely heavily on textual annotation in
order to match word queries to a set of candidate images. A system that can
automatically annotate images with meaningful text can be highly beneficial for
such engines. Currently, the approaches to develop such systems try to
establish relationships between keywords and visual features of images. In this
paper, we make three main contributions to this area: (i) We transform this
problem from the low-level keyword space to the high-level semantics space that
we refer to as the “image theme”; (ii) Instead of treating each possible
keyword independently, we use latent Dirichlet allocation to learn image themes
from the associated texts in a training phase. Images are then annotated with
image themes rather than keywords, using a modified continuous relevance model,
which takes into account the spatial coherence and the visual continuity among
images of a common theme. (iii) To achieve more coherent annotations among images
of a common theme, we have integrated ConceptNet into learning the semantics of
images, and hence augment image descriptions beyond the annotations provided by
humans. Images are thus further annotated by a few most significant words of
the prominent image theme. Our extensive experiments show that a coherent
theme-based image annotation using high-level semantics results in improved
precision and recall as compared with equivalent classical keyword annotation
systems.
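A minimal sketch of the theme-learning step using off-the-shelf LDA; the toy captions and theme count are placeholders, and the modified continuous relevance model and ConceptNet integration are beyond this example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = ["beach sand waves sunny sea",          # toy associated texts
         "snow mountain ski winter peak"]
counts = CountVectorizer().fit(texts)
X = counts.transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
vocab = counts.get_feature_names_out()
for t, comp in enumerate(lda.components_):
    top = [vocab[i] for i in comp.argsort()[-3:]]
    print(f"theme {t}: {top}")   # annotate images with the theme's top words
```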
Xiaoyi Jia, Xiangmin Xu, Bolun Cai, Kailing Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Methods based on convolutional neural networks (CNNs) have demonstrated
tremendous improvements on single image super-resolution. However, the previous
methods mainly restore images from one single area in the low resolution (LR)
input, which limits the flexibility of models to infer various scales of
details for high resolution (HR) output. Moreover, most of them train a
specific model for each up-scale factor. In this paper, we propose a
multi-scale super resolution (MSSR) network. Our network consists of
multi-scale paths to make the HR inference, which can learn to synthesize
features from different scales. This property helps reconstruct various kinds
of regions in HR images. In addition, only one single model is needed for
multiple up-scale factors, which is more efficient without loss of restoration
quality. Experiments on four public datasets demonstrate that the proposed
method achieves state-of-the-art performance at high speed.
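A hedged sketch of one multi-scale inference block, with parallel paths of different receptive fields fused by a 1x1 convolution; depths and widths are assumptions, as the abstract does not give the exact topology.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel conv paths with different receptive fields, fused for HR output."""
    def __init__(self, ch=64):
        super().__init__()
        self.path3 = nn.Conv2d(ch, ch, 3, padding=1)   # small receptive field
        self.path5 = nn.Conv2d(ch, ch, 5, padding=2)   # larger receptive field
        self.fuse = nn.Conv2d(2 * ch, ch, 1)           # 1x1 fusion

    def forward(self, x):
        return self.fuse(torch.cat([torch.relu(self.path3(x)),
                                    torch.relu(self.path5(x))], dim=1))
```

Because the fused features cover multiple scales, a single such model can plausibly serve several up-scale factors, as the abstract claims.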
Yong Khoo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
This paper presents a new method for 3D shape reconstruction based on two
existing methods. A 3D reconstruction from a single photograph is introduced in
both papers: the first one uses a photograph and a set of existing 3D models to
generate the 3D object in the photograph, while the second one uses a
photograph and a selected similar model to create the 3D object in the
photograph. Building on their differences, we propose a relaxation-based method
for more accurate correspondence establishment and shape recovery. The
experiment demonstrates promising results compared to the state-of-the-art work
on 3D shape estimation.
Bálint Zoltán Daróczy
Comments: doctoral thesis, 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this thesis we examined several multimodal feature extraction and learning
methods for retrieval and classification purposes. We briefly revisited some
theoretical results of learning in Section 2, reviewed several generative
and discriminative models in Section 3, and described the similarity kernel
in Section 4. We examined different aspects of multimodal image retrieval
and classification in Section 5 and suggested methods for identifying quality
assessments of Web documents in Section 6. For our last problem we proposed a
similarity kernel for time-series based classification. The experiments were
carried out on publicly available datasets, and the source code for the most
essential parts is either open source or has been released. Since the similarity
graphs used (Section 4.2) are greatly constrained for computational reasons, we
would like to continue working with more complex, evolving and capable graphs
and to apply them to different problems, such as capturing rapid changes in the
distribution (e.g. session-based recommendation) or complex graphs of the
literature. The similarity kernel with the proper metrics reaches, and in many
cases improves over, the state-of-the-art. Hence we may conclude that generative
models based on instance similarities with multiple modes are a generally
applicable model for classification and regression tasks ranging over various
domains, including but not limited to the ones presented in this thesis. More
generally, the Fisher kernel is not only unique in many ways but one of the
most powerful kernel functions. Therefore we may exploit the Fisher kernel in
the future over widely used generative models, such as Boltzmann Machines
[Hinton et al., 1984] (and a particular subset, the Restricted Boltzmann Machines
and Deep Belief Networks [Hinton et al., 2006]), Latent Dirichlet Allocation
[Blei et al., 2003] or Hidden Markov Models [Baum and Petrie, 1966] to name a
few.
Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran He
Comments: Github: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Object Transfiguration replaces an object in an image with another object
from a second image. For example it can perform tasks like “putting exactly
those eyeglasses from image A on the nose of the person in image B”. Usage of
exemplar images allows more precise specification of desired modifications and
improves the diversity of conditional image generation. However, previous
methods that rely on feature space operations require paired data and/or
appearance models for training or for disentangling objects from the background. In
this work, we propose a model that can learn object transfiguration from two
unpaired sets of images: one set containing images that “have” that kind of
object, and the other set being the opposite, with the mild constraint that the
objects be located approximately at the same place. For example, the training
data can be one set of reference face images that have eyeglasses, and another
set of images that do not, both of which are spatially aligned by face landmarks.
Despite the weak 0/1 labels, our model can learn an “eyeglasses” subspace that
contains multiple representatives of different types of glasses. Consequently,
we can perform fine-grained control of generated images, like swapping the
glasses in two images by swapping the projected components in the “eyeglasses”
subspace, to create novel images of people wearing eyeglasses.
Overall, our deterministic generative model learns disentangled attribute
subspaces from weakly labeled data by adversarial training. Experiments on
CelebA and Multi-PIE datasets validate the effectiveness of the proposed model
on real world data, in generating images with specified eyeglasses, smiling,
hair styles, and lighting conditions. The code is available online.
Mais Alnasser, Hassan Foroosh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a new mathematical foundation for image-based
lighting. Using a simple manipulation of the local coordinate system, we derive
a closed-form solution to the light integral equation under distant environment
illumination. We derive our solution for different BRDFs such as Lambertian
and Phong-like. The method is free of noise, and provides the possibility of
using the full spectrum of frequencies captured by images taken from the
environment. This allows for the color of the rendered object to be toned
according to the color of the light in the environment. Experimental results
also show that one can gain an order of magnitude or higher in rendering time
compared to Monte Carlo quadrature methods and spherical harmonics.
Rohith AP, Salman S. Khan, Kumar Anubhav, Angshuman Paul
Comments: 7 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Grading of cancer is important to know the extent of its spread. Prior to
grading, segmentation of glandular structures is important. Manual segmentation
is a time consuming process and is subject to observer bias. Hence, an
automated process is required to segment the gland structures. These glands
show a large variation in shape, size and texture. This makes the task
challenging as the glands cannot be segmented using mere morphological
operations and conventional segmentation mechanisms. In this project we propose
a method which detects the boundary epithelial cells of glands and then a novel
approach is used to construct the complete gland boundary. The region enclosed
within the boundary can then be obtained to get the segmented gland regions.
Shinjini Kundu, Soheil Kolouri, Kirk I Erickson, Arthur F Kramer, Edward McAuley, Gustavo K Rohde
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Disease in the brain is often associated with subtle, spatially diffuse, or
complex tissue changes that may lie beneath the level of gross visual
inspection, even on magnetic resonance imaging (MRI). Unfortunately, current
computer-assisted approaches that examine pre-specified features, whether
anatomically-defined (e.g. thalamic volume, cortical thickness) or based on
pixelwise comparison (e.g. deformation-based methods), are prone to missing a
vast array of physical changes that are not well-encapsulated by these metrics.
In this paper, we have developed a technique for automated pattern analysis
that can fully determine the relationship between brain structure and
observable phenotype without requiring any a priori features. Our technique,
called transport-based morphometry (TBM), is an image transformation that maps
brain images losslessly to a domain where they become much more separable. The
new approach is validated on structural brain images of healthy older adult
subjects where even linear models for discrimination, regression, and blind
source separation enable TBM to independently discover the characteristic
changes of aging and highlight potential mechanisms by which aerobic fitness
may mediate brain health later in life. TBM is a generative approach that can
provide visualization of physically meaningful shifts in tissue distribution
through inverse transformation. The proposed framework is a powerful technique
that can potentially elucidate genotype-structural-behavioral associations in
myriad diseases.
Suryansh Kumar, Yuchao Dai, Hongdong Li
Comments: Author version of this paper has been accepted by Pattern Recognition Journal in the special issue on Articulated Motion and Deformable Objects. This work was originally submitted to ACCV 16 conference on 27th May 2016 for review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Non-rigid structure-from-motion (NRSfM) has so far been mostly studied for
recovering the 3D structure of a single non-rigid/deforming object. To handle
challenging real-world scenarios with multiple deforming objects, existing methods
either pre-segment different objects in the scene or treat multiple non-rigid
objects as a whole to obtain the 3D non-rigid reconstruction. However, these
methods fail to exploit the inherent structure of the problem, as the solutions
of segmentation and reconstruction cannot benefit each
other. In this paper, we propose a unified framework to jointly segment and
reconstruct multiple non-rigid objects. To compactly represent complex
multi-body non-rigid scenes, we propose to exploit the structure of the scenes
along both temporal direction and spatial direction, thus achieving a
spatio-temporal representation. Specifically, we represent the 3D non-rigid
deformations as lying in a union of subspaces along the temporal direction and
represent the 3D trajectories as lying in the union of subspaces along the
spatial direction. This spatio-temporal representation not only provides
competitive 3D reconstruction but also outputs robust segmentation of multiple
non-rigid objects. The resultant optimization problem is solved efficiently
using the Alternating Direction Method of Multipliers (ADMM). Extensive
experimental results on both synthetic and real multi-body NRSfM datasets
demonstrate the superior performance of our proposed framework compared with
the state-of-the-art methods.
Nam Vo, Nathan Jacobs, James Hays
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image geolocalization, inferring the geographic location of an image, is a
challenging computer vision problem with many potential applications. The
recent state-of-the-art approach to this problem is a deep image classification
approach in which the world is spatially divided into cells and a deep network
is trained to predict the correct cell for a given image. We propose to combine
this approach with the original Im2GPS approach in which a query image is
matched against a database of geotagged images and the location is inferred
from the retrieved set. We estimate the geographic location of a query image by
applying kernel density estimation to the locations of its nearest neighbors in
the reference database. Interestingly, we find that the best features for our
retrieval task are derived from networks trained with classification loss even
though we do not use a classification approach at test time. Training with
classification loss outperforms several deep feature learning methods (e.g.
Siamese networks with contrastive or triplet loss) more typical for retrieval
applications. Our simple approach achieves state-of-the-art geolocalization
accuracy while also requiring significantly less training data.
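The geolocation step can be sketched as follows, assuming the GPS coordinates of the retrieved neighbors are at hand; the default KDE bandwidth and the argmax-over-neighbors rule are simplifications of whatever the paper actually uses.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_location(neighbor_latlon):
    """neighbor_latlon: (N, 2) lat/lon of the query's retrieved neighbors."""
    kde = gaussian_kde(neighbor_latlon.T)        # density over neighbor locations
    dens = kde(neighbor_latlon.T)                # score each candidate location
    return neighbor_latlon[np.argmax(dens)]      # densest neighbor = prediction
```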
Yiluan Guo, Hossein Nejati, Ngai-Man Cheung
Comments: Accepted by ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Brain imaging data such as EEG or MEG are high-dimensional spatiotemporal
data often degraded by complex, non-Gaussian noise. For reliable analysis of
brain imaging data, it is important to extract discriminative, low-dimensional
intrinsic representation of the recorded data. This work proposes a new method
to learn the low-dimensional representations from the noise-degraded
measurements. In particular, our work proposes a new deep neural network design
that integrates graph information such as brain connectivity with
fully-connected layers. Our work leverages efficient graph filter design using
Chebyshev polynomials and recent work on convolutional nets on graph-structured
data. Our approach exploits graph structure as prior side information,
localized graph filters for feature extraction, and neural networks for high-
capacity learning. Experiments on real MEG datasets show that our approach can
extract more discriminative representations, leading to improved accuracy in a
supervised classification task.
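A sketch of the Chebyshev graph-filtering building block referenced above, where L_hat is the graph Laplacian rescaled so its eigenvalues lie in [-1, 1] and theta are learnable coefficients.

```python
import numpy as np

def chebyshev_filter(L_hat, x, theta):
    """y = sum_k theta[k] * T_k(L_hat) x, via the Chebyshev recurrence
    T_k = 2 L_hat T_{k-1} - T_{k-2}. Requires len(theta) >= 2."""
    t_prev, t_curr = x, L_hat @ x
    y = theta[0] * t_prev + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * L_hat @ t_curr - t_prev
        y = y + theta[k] * t_curr
    return y
```

Because each application of L_hat only mixes neighboring nodes, a K-term filter stays localized within K hops, which suits brain-connectivity graphs.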
Yao Yao, Jialv He, Jinbao Zhang, Yatao Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Impervious surface area is a direct consequence of urbanization and
plays an important role in urban planning and environmental management.
With the rapid technical development of remote sensing, monitoring urban
impervious surface via high spatial resolution (HSR) images has attracted
unprecedented attention recently. Traditional multi-class models are
inefficient for impervious surface extraction because they require exhaustively
labeling all needed and unneeded classes that occur in the image. Therefore, we
need to find a reliable one-class model to classify one specific land cover
type without labeling other classes. In this study, we investigate several
one-class classifiers, such as Presence and Background Learning (PBL), Positive
Unlabeled Learning (PUL), OCSVM, BSVM and MAXENT, to extract urban impervious
surface area using high spatial resolution imagery of GF-1, China’s new
generation of high spatial remote sensing satellite, and evaluate the
classification accuracy based on manual interpretation results. Compared with
traditional multi-class classifiers (ANN and SVM), the experimental results
indicate that PBL and PUL provide higher classification accuracy, similar to
that of the ANN model, and outperform the OCSVM, BSVM,
MAXENT and SVM models. Hence, the one-class
classifiers need only a small set of specific samples to train models without
losing predictive accuracy, and thus deserve more attention for urban
impervious surface extraction and other single land cover type classification tasks.
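An illustrative one-class setup, with OCSVM standing in for the classifiers compared in the paper; the toy spectral features and the nu parameter are assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train on positive (impervious-surface) samples only -- no other labels needed.
impervious_pixels = np.random.rand(200, 4)      # toy per-pixel spectral features
clf = OneClassSVM(nu=0.1, kernel="rbf").fit(impervious_pixels)

all_pixels = np.random.rand(1000, 4)            # the full scene
mask = clf.predict(all_pixels) == 1             # +1 = classified as impervious
print(mask.sum(), "pixels labeled impervious")
```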
EL-Hachemi Guerrout, Samy Ait-Aoudia, Dominique Michelucci, Ramdane Mahiou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image segmentation is the process of partitioning an image into significant
regions that are easier to analyze. Nowadays, segmentation has become a necessity
in many practical medical imaging tasks, such as locating tumors and diseases. The Hidden
Markov Random Field model is one of several techniques used in image
segmentation. It provides an elegant way to model the segmentation process.
This modeling leads to the minimization of an objective function. Conjugate
Gradient algorithm (CG) is one of the best known optimization techniques. This
paper proposes the use of the Conjugate Gradient algorithm (CG) for image
segmentation, based on the Hidden Markov Random Field. Since derivatives are
not available for this expression, finite differences are used in the CG
algorithm to approximate the first derivative. The approach is evaluated using
a number of publicly available images, where ground truth is known. The Dice
Coefficient is used as an objective criterion to measure the quality of
segmentation. The results show that the proposed CG approach compares favorably
with other variants of Hidden Markov Random Field segmentation algorithms.
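A hedged sketch of the optimization setup: SciPy's conjugate-gradient minimizer approximates the gradient by finite differences when no analytic Jacobian is supplied, matching the situation described above. The energy below is a stand-in data-plus-smoothness term, not the paper's exact HMRF objective.

```python
import numpy as np
from scipy.optimize import minimize

SIDE = 16
image = np.random.rand(SIDE * SIDE)            # toy observed image

def hmrf_energy(labels):
    data_term = np.sum((labels - image) ** 2)              # fidelity to the image
    field = labels.reshape(SIDE, SIDE)
    smooth = (np.abs(np.diff(field, axis=0)).sum()
              + np.abs(np.diff(field, axis=1)).sum())      # neighbor agreement
    return data_term + 0.5 * smooth

# method='CG' with jac=None -> first derivative via finite differences
result = minimize(hmrf_energy, x0=np.zeros(SIDE * SIDE), method="CG")
print(result.fun)
```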
Wei Li, Xiatian Zhu, Shaogang Gong
Comments: IJCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Existing person re-identification (re-id) methods rely mostly on either
localised or global feature representation alone. This ignores their joint
benefit and mutual complementary effects. In this work, we show the advantages
of jointly learning local and global features in a Convolutional Neural Network
(CNN) by aiming to discover correlated local and global features in different
contexts. Specifically, we formulate a method for joint learning of local and
global feature selection losses designed to optimise person re-id when using
only generic matching metrics such as the L2 distance. We design a novel CNN
architecture for Jointly Learning Multi-Loss (JLML) of local and global
discriminative feature optimisation subject concurrently to the same re-id
labelled information. Extensive comparative evaluations demonstrate the
advantages of this new JLML model for person re-id over a wide range of
state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03,
Market-1501).
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
Comments: In ICML 2017. Website at this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
In many real-world scenarios, rewards extrinsic to the agent are extremely
sparse, or absent altogether. In such cases, curiosity can serve as an
intrinsic reward signal to enable the agent to explore its environment and
learn skills that might be useful later in its life. We formulate curiosity as
the error in an agent’s ability to predict the consequence of its own actions
in a visual feature space learned by a self-supervised inverse dynamics model.
Our formulation scales to high-dimensional continuous state spaces like images,
bypasses the difficulties of directly predicting pixels, and, critically,
ignores the aspects of the environment that cannot affect the agent. The
proposed approach is evaluated in two environments: VizDoom and Super Mario
Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where
curiosity allows for far fewer interactions with the environment to reach the
goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent
to explore more efficiently; and 3) generalization to unseen scenarios (e.g.
new levels of the same game) where the knowledge gained from earlier experience
helps the agent explore new places much faster than starting from scratch. Demo
video and code available at this https URL
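The core curiosity signal can be sketched as follows; the feature and forward-model networks here are placeholders, and the full method also trains an inverse dynamics model to shape the feature space.

```python
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(128, 32), nn.ReLU())   # phi(s): learned features
forward_model = nn.Linear(32 + 4, 32)                 # predicts phi(s') from (phi(s), a)

def intrinsic_reward(s, a, s_next):
    """Curiosity = forward-model prediction error in feature space."""
    phi, phi_next = feat(s), feat(s_next)
    pred = forward_model(torch.cat([phi, a], dim=-1))
    return 0.5 * (pred - phi_next).pow(2).sum(-1)      # per-sample reward
```

Predicting in feature space rather than pixel space is what lets the signal ignore environment details the agent cannot affect.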
Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter I. Corke
Comments: 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop in CVPR 2017
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Systems and Control (cs.SY)
This paper introduces an end-to-end fine-tuning method to improve hand-eye
coordination in modular deep visuo-motor policies (modular networks) where each
module is trained independently. Benefiting from weighted losses, the
fine-tuning method significantly improves the performance of the policies for a
robotic planar reaching task.
Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (cs.SY)
Developing and testing algorithms for autonomous vehicles in the real world is an
expensive and time-consuming process. Also, in order to utilize recent advances
in machine intelligence and deep learning we need to collect a large amount of
annotated training data in a variety of conditions and environments. We present
a new simulator built on Unreal Engine that offers physically and visually
realistic simulations for both of these goals. Our simulator includes a physics
engine that can operate at a high frequency for real-time hardware-in-the-loop
(HITL) simulations with support for popular protocols (e.g. MavLink). The
simulator is designed from the ground up to be extensible to accommodate new
types of vehicles, hardware platforms and software protocols. In addition, the
modular design enables various components to be easily usable independently in
other projects. We demonstrate the simulator by first implementing a quadrotor
as an autonomous vehicle and then experimentally comparing the software
components with real-world flights.
Yair Rivenson, Zoltan Gorocs, Harun Gunaydin, Yibo Zhang, Hongda Wang, Aydogan Ozcan
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
We demonstrate that a deep neural network can significantly improve optical
microscopy, enhancing its spatial resolution over a large field-of-view and
depth-of-field. After its training, the only input to this network is an image
acquired using a regular optical microscope, without any changes to its design.
We blindly tested this deep learning approach using various tissue samples that
were imaged with low-resolution and wide-field systems, where the network
rapidly outputs an image with remarkably better resolution, matching the
performance of higher numerical aperture lenses, also significantly surpassing
their limited field-of-view and depth-of-field. These results are
transformative for various fields that use microscopy tools, including, e.g.,
life sciences, where optical microscopy is considered one of the most widely
used and deployed techniques. Beyond such applications, our presented approach
is broadly applicable to other imaging modalities, also spanning different
parts of the electromagnetic spectrum, and can be used to design computational
imagers that get better and better as they continue to image specimens and
establish new transformations among different modes of imaging.
Paul Beaumont, Michael Huth
Comments: 43 pages, 18 figures
Subjects: Artificial Intelligence (cs.AI)
We develop the theory and practice of an approach to modelling and
probabilistic inference in causal networks that is suitable when
application-specific or analysis-specific constraints should inform such
inference or when little or no data for the learning of causal network
structure or probability values at nodes are available. Constrained Bayesian
Networks generalize a Bayesian Network such that probabilities can be symbolic
arithmetic expressions and where the meaning of the network is constrained by
finitely many formulas from the theory of the reals. A formal semantics for
constrained Bayesian Networks over first-order logic of the reals is given,
which enables non-linear and non-convex optimisation algorithms that rely on
decision procedures for this logic, and supports the composition of several
constrained Bayesian Networks. A non-trivial case study in arms control, where
few or no data are available to assess the effectiveness of an arms inspection
process, evaluates our approach. An open-access prototype implementation of
these foundations and their algorithms uses the SMT solver Z3 as its decision
procedure, extends an open-source package for Bayesian inference to symbolic
computation, and is evaluated experimentally.
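A toy query in this spirit using the Z3 Python bindings: node probabilities are symbolic and tied by constraints instead of fixed numeric CPT entries. The particular constraints are invented for illustration.

```python
from z3 import Real, Solver, And, sat

p, q = Real("p"), Real("q")            # symbolic node probabilities
s = Solver()
s.add(And(p >= 0, p <= 1, q >= 0, q <= 1))   # probabilities live in [0, 1]
s.add(q >= 2 * p)                      # an analysis-specific constraint
s.add(p * q >= 0.1)                    # constraint on a joint probability
if s.check() == sat:
    print(s.model())                   # one assignment consistent with all constraints
```

Optimization over such constrained systems (e.g. bounding a query probability) is what the non-linear, non-convex machinery mentioned above enables.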
Minas Dasygenis, Kostas Stergiou
Comments: special issue of the International Journal on AI Tools (IJAIT) (this http URL)
Subjects: Artificial Intelligence (cs.AI)
Local consistencies stronger than arc consistency have received a lot of
attention since the early days of CSP research because of the strong pruning
they can achieve. However, they have not been widely adopted by CSP solvers.
This is because applying such consistencies can sometimes result in
considerably smaller search tree sizes and therefore in important speed-ups,
but in other cases the search space reduction may be small, causing severe run
time penalties. Taking advantage of recent advances in parallelization, we
propose a novel approach for the application of strong local consistencies
(SLCs) that can improve their performance by largely preserving the speed-ups
they offer in cases where they are successful, and eliminating the run time
penalties in cases where they are unsuccessful. This approach is presented in
the form of two search algorithms. Both algorithms consist of a master search
process, which is a typical CSP solver, and a number of slave processes, with
each one implementing an SLC method. The first algorithm runs the different SLCs
synchronously at each node of the search tree explored in the master process,
while the second one can run them asynchronously at different nodes of the
search tree. Experimental results demonstrate the benefits of the proposed
method.
Raul Fervari, Andreas Herzig, Yanjun Li, Yanjing Wang
Comments: an earlier version of the paper to appear in IJCAI 2017
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
In this paper, we propose a single-agent logic of goal-directed knowing how
extending the standard epistemic logic of knowing that with a new knowing how
operator. The semantics of the new operator is based on the idea that knowing
how to achieve \(\phi\) means that there exists a (uniform) strategy such that
the agent knows that it can make sure \(\phi\). We give an intuitive
axiomatization of our logic and prove the soundness, completeness, and
decidability of the logic. The crucial axioms relating knowing that and knowing
how illustrate our understanding of knowing how in this setting. This logic can
be used in representing both knowledge-that and knowledge-how.
Lahari Poddar, Wynne Hsu, Mong Li Lee
Comments: Accepted for publication in IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)
The opinions of users, expressed in the form of observed ratings, can influence an
individual’s view of an item. However, the true quality of an item is often
obfuscated by user biases, and the importance users place on different aspects
of an item is not obvious from the observed ratings. In this paper, we
propose a probabilistic modeling of the observed aspect ratings to infer (i)
each user’s aspect bias and (ii) latent intrinsic quality of an item. We model
multi-aspect ratings as ordered discrete data and encode the dependency between
different aspects by using a latent Gaussian structure. We handle the
Gaussian-Categorical non-conjugacy using a stick-breaking formulation coupled
with recently developed Polya-Gamma auxiliary variable augmentation for a
simple, fully Bayesian inference. On two real world datasets, we demonstrate
the predictive ability of our model over state-of-the-art baselines and its
effectiveness in learning explainable user biases to provide insights towards a
more reliable product quality estimation.
José F. Fontanari
Subjects: Artificial Intelligence (cs.AI)
The brain’s self-monitoring of activities, including internal activities — a
functionality that we refer to as awareness — has been suggested as a key
element of consciousness. Here we investigate whether the presence of an
inner-eye-like process (monitor) that supervises the activities of a number of
subsystems (operative agents) engaged in the solution of a problem can improve
the problem-solving efficiency of the system. The problem is to find the global
maximum of an NK fitness landscape, and the performance is measured by the time
required to find that maximum. The operative agents blindly explore the fitness
landscape and the monitor provides them with feedback on the quality (fitness)
of the proposed solutions. This feedback is then used by the operative agents
to bias their searches towards the fittest regions of the landscape. We find
that a weak feedback between the monitor and the operative agents improves the
performance of the system, regardless of the difficulty of the problem, which
is gauged by the number of local maxima in the landscape. For easy problems
(i.e., landscapes without local maxima), the performance improves monotonically
as the feedback strength increases, but for difficult problems, there is an
optimal value of the feedback strength beyond which the system performance
degrades very rapidly.
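For reference, a minimal NK-landscape fitness function of the kind used as the benchmark here: each of the N loci contributes a random table value that depends on itself and its K (cyclic) neighbours. Parameter values are illustrative.

```python
import numpy as np

def make_nk_landscape(N, K, seed=0):
    rng = np.random.default_rng(seed)
    tables = rng.random((N, 2 ** (K + 1)))   # one lookup table per locus
    def fitness(bits):
        total = 0.0
        for i in range(N):
            idx = 0
            for j in range(K + 1):            # locus i plus K cyclic neighbours
                idx = (idx << 1) | int(bits[(i + j) % N])
            total += tables[i, idx]
        return total / N
    return fitness

f = make_nk_landscape(N=12, K=2)              # K > 0 introduces local maxima
print(f(np.random.randint(0, 2, 12)))
```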
Yevgeny Kazakov, Denis Ponomaryov
Subjects: Artificial Intelligence (cs.AI)
We propose a new mechanism for integration of OWL ontologies using semantic
import relations. In contrast to the standard OWL importing, we do not require
all axioms of the imported ontologies to be taken into account for reasoning
tasks, but only their logical implications over a chosen signature. This
property comes naturally in many ontology integration scenarios, especially when
the number of ontologies is large. In this paper, we study the complexity of
reasoning over ontologies with semantic import relations and establish a range
of tight complexity bounds for various fragments of OWL.
Denis Ponomaryov, Mikhail Soutchanski
Subjects: Artificial Intelligence (cs.AI)
In many tasks related to reasoning about consequences of a logical theory, it
is desirable to decompose the theory into a number of weakly-related or
independent components. However, a theory may represent knowledge that is
subject to change, as a result of executing actions that have effects on some
of the initial properties mentioned in the theory. Having once computed a
decomposition of a theory, it is advantageous to know whether a decomposition
has to be computed again in the newly-changed theory (obtained from taking into
account changes resulting from execution of an action). In the paper, we
address this problem in the scope of the situation calculus, where a change of
an initial theory is related to the notion of progression. Progression provides
a form of forward reasoning; it relies on forgetting values of those
properties, which are subject to change, and computing new values for them. We
consider decomposability and inseparability, two component properties known
from the literature, and contribute by 1) studying the conditions under which these
properties are preserved and 2) identifying when they are lost under progression and the
related operation of forgetting. To show the latter, we demonstrate the
boundaries using a number of negative examples. To show the former, we identify
cases when these properties are preserved under forgetting and progression of
initial theories in local-effect basic action theories of the situation
calculus. Our paper contributes to bridging two different communities in
Knowledge Representation, namely research on modularity and research on
reasoning about actions.
Cedric Nugteren
Subjects: Mathematical Software (cs.MS); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
This work demonstrates how to accelerate dense linear algebra computations
using CLBlast, an open-source OpenCL BLAS library providing optimized routines
for a wide variety of devices. It is targeted at machine learning and HPC
applications and thus provides a fast matrix-multiplication routine (GEMM) to
accelerate the core of many applications (e.g. deep learning, iterative
solvers, astrophysics, computational fluid dynamics, quantum chemistry).
CLBlast has four main advantages over other BLAS libraries: 1) it is optimized
for and tested on a large variety of OpenCL devices including less commonly
used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned
for specific problem-sizes on specific hardware platforms, 3) it can perform
operations in half-precision floating-point FP16, saving precious bandwidth,
time and energy, and 4) it can combine multiple operations in a single batched
routine, accelerating smaller problems significantly. This paper describes the
library and demonstrates the advantages of CLBlast experimentally for different
use-cases on a wide variety of OpenCL hardware.
Chen Zhang, Hao Wang, Yingcai Wu
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Massive public resume data emerging on the Web reveals individual
characteristics in terms of profiles and career experiences. Resume Analysis
(RA) provides opportunities for many applications, such as talent seeking and
evaluation. Existing RA studies based on statistical analysis have primarily
focused on talent recruitment by identifying explicit attributes. However, they
fail to discover implicit semantic information, i.e., individual career
progress patterns and social relations, which are vital to a comprehensive
understanding of career development. Besides, how to visualize them for better
human cognition is also challenging. To tackle these issues, we propose a
visual analytics system ResumeVis to mine and visualize resume data. Firstly, a
text-mining based approach is presented to extract semantic information. Then,
a set of visualizations are devised to represent the semantic information in
multiple perspectives. By interactive exploration on ResumeVis performed by
domain experts, the following tasks can be accomplished: tracing individual
career trajectories; mining latent social relations among individuals;
and grasping the full picture of massive resumes’ collective mobility. Case
studies with over 2500 online officer resumes demonstrate the effectiveness of
our system. We provide a demonstration video.
Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Comments: To be published in Machine Learning Journal
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO); Machine Learning (stat.ML)
This article provides the first survey of computational models of emotion in
reinforcement learning (RL) agents. The survey focuses on agent/robot emotions,
and mostly ignores human user emotions. Emotions are recognized as functional
in decision-making by influencing motivation and action selection. Therefore,
computational emotion models are usually grounded in the agent’s decision
making architecture, of which RL is an important subclass. Studying emotions in
RL-based agents is useful for three research fields. For machine learning (ML)
researchers, emotion models may improve learning efficiency. For the
interactive ML and human-robot interaction (HRI) community, emotions can
communicate state and enhance user investment. Lastly, it allows affective
modelling (AM) researchers to investigate their emotion theories in a
successful AI agent class. This survey provides background on emotion theory
and RL. It systematically addresses 1) from what underlying dimensions (e.g.,
homeostasis, appraisal) emotions can be derived and how these can be modelled
in RL-agents, 2) what types of emotions have been derived from these
dimensions, and 3) how these emotions may either influence the learning
efficiency of the agent or be useful as social signals. We also systematically
compare evaluation criteria, and draw connections to important RL sub-domains
like (intrinsic) motivation and model-based RL. In short, this survey provides
both a practical overview for engineers wanting to implement emotions in their
RL agents, and identifies challenges and directions for future emotion-RL
research.
Michael Backes, Jörg Hoffmann, Robert Künnemann, Patrick Speicher, Marcel Steinmetz
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Penetration testing is a well-established practical concept for the
identification of potentially exploitable security weaknesses and an important
component of a security audit. Providing a holistic security assessment for
networks consisting of several hundred hosts is hardly feasible, though, without
some sort of mechanization. Mitigation, prioritizing countermeasures subject
to a given budget, currently lacks a solid theoretical understanding and is
hence more art than science. In this work, we propose the first approach for
conducting comprehensive what-if analyses in order to reason about mitigation
in a conceptually well-founded manner. To evaluate and compare mitigation
strategies, we use simulated penetration testing, i.e., automated
attack-finding, based on a network model to which a subset of a given set of
mitigation actions, e.g., changes to the network topology, system updates,
configuration changes etc. is applied. We determine optimal combinations that
minimize the maximal attacker success (similar to a Stackelberg game), and thus
provide a well-founded basis for a holistic mitigation strategy. We show that
these what-if analysis models can largely be derived from network scans, public
vulnerability databases and manual inspection with various degrees of
automation and detail, and we simulate mitigation analysis on networks of
different size and vulnerability.
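A toy rendering of the min-max mitigation idea, with a stubbed attacker-success oracle standing in for simulated penetration testing; the action names, costs, and budget are invented for illustration.

```python
from itertools import combinations

actions = ["patch_A", "segment_net", "harden_cfg"]
cost = {"patch_A": 2, "segment_net": 3, "harden_cfg": 1}
budget = 3

def attack_success(applied):
    """Stub: in the paper this value comes from automated attack-finding
    on the network model with the mitigation subset applied."""
    return 1.0 / (1 + len(applied))

# Enumerate affordable mitigation subsets; pick the one minimizing the
# attacker's best success value (a Stackelberg-style min-max).
best = min((frozenset(c) for r in range(len(actions) + 1)
            for c in combinations(actions, r)
            if sum(cost[a] for a in c) <= budget),
           key=attack_success)
print(sorted(best), attack_success(best))
```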
Luke Metz, Julian Ibarz, Navdeep Jaitly, James Davidson
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
It has long been assumed that high dimensional continuous control problems
cannot be solved effectively by discretizing individual dimensions of the
action space due to the exponentially large number of bins over which policies
would have to be learned. In this paper, we draw inspiration from the recent
success of sequence-to-sequence models for structured prediction problems to
develop policies over discretized spaces. Central to this method is the
realization that complex functions over high dimensional spaces can be modeled
by neural networks that use next step prediction. Specifically, we show how
Q-values and policies over continuous spaces can be modeled using a next step
prediction model over discretized dimensions. With this parameterization, it is
possible to both leverage the compositional structure of action spaces during
learning, as well as compute maxima over action spaces (approximately). On a
simple example task we demonstrate empirically that our method can perform
global search, which effectively gets around the local optimization issues that
plague DDPG and NAF. We apply the technique to off-policy (Q-learning) methods
and show that our method can achieve the state-of-the-art for off-policy
methods on several continuous control tasks.
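A minimal sketch of the core decoding idea, assuming a toy stand-in for the learned next-step prediction model: bins are chosen one action dimension at a time, each choice conditioned on the bins already fixed, which approximates an argmax over the joint discretized space:

    # Hedged sketch of sequentially discretized action selection; toy_q is
    # an arbitrary stand-in for the paper's learned next-step model.
    import numpy as np

    def decode_action(q_next, n_dims, n_bins):
        """Greedy per-dimension decoding over discretized bins."""
        chosen = []
        for _ in range(n_dims):
            scores = [q_next(chosen + [b]) for b in range(n_bins)]
            chosen.append(int(np.argmax(scores)))
        return chosen

    def toy_q(prefix):
        # Prefers bins near the middle, with coupling across dimensions.
        return -sum((b - 3 + 0.5 * i) ** 2 for i, b in enumerate(prefix))

    print(decode_action(toy_q, n_dims=4, n_bins=7))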
Shunji Umetani, Masanao Arakawa, Mutsunori Yagiura
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
We consider an extension of the set covering problem (SCP) introducing (i)
multicover and (ii) generalized upper bound (GUB) constraints. For the
conventional SCP, the pricing method has been introduced to reduce the size of
instances, and several efficient heuristic algorithms based on such reduction
techniques have been developed to solve large-scale instances. However, GUB
constraints often make the pricing method less effective, because they often
prevent solutions from containing highly evaluated variables together. To
overcome this, we develop heuristic algorithms to reduce the size of instances,
in which new evaluation schemes of variables are introduced taking account of
GUB constraints. We also develop an efficient implementation of a 2-flip
neighborhood local search algorithm that reduces the number of candidates in
the neighborhood without sacrificing the solution quality. In a
computational comparison on benchmark instances with recent solvers, the
proposed algorithm performs quite effectively for instances having large gaps
between lower and upper bounds.
Wei Li, Xiatian Zhu, Shaogang Gong
Comments: IJCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Existing person re-identification (re-id) methods rely mostly on either
localised or global feature representation alone. This ignores their joint
benefit and mutual complementary effects. In this work, we show the advantages
of jointly learning local and global features in a Convolutional Neural Network
(CNN) by aiming to discover correlated local and global features in different
contexts. Specifically, we formulate a method for joint learning of local and
global feature selection losses designed to optimise person re-id when using
only generic matching metrics such as the L2 distance. We design a novel CNN
architecture for Jointly Learning Multi-Loss (JLML) of local and global
discriminative feature optimisation subject concurrently to the same re-id
labelled information. Extensive comparative evaluations demonstrate the
advantages of this new JLML model for person re-id over a wide range of
state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03,
Market-1501).
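A hedged PyTorch sketch of the two-branch, multi-loss idea (layer sizes, stripe count, and backbone are illustrative; the actual JLML architecture differs): a shared base feeds a global head and a striped local head, and both are supervised by the same identity labels:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointLocalGlobal(nn.Module):
        def __init__(self, n_ids, stripes=4, ch=32):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
            self.stripes = stripes
            self.global_head = nn.Linear(ch, n_ids)
            self.local_head = nn.Linear(ch * stripes, n_ids)

        def forward(self, x):
            f = self.backbone(x)                      # B x C x H x W
            g = f.mean(dim=(2, 3))                    # global descriptor
            parts = torch.chunk(f, self.stripes, dim=2)   # horizontal stripes
            l = torch.cat([p.mean(dim=(2, 3)) for p in parts], dim=1)
            return self.global_head(g), self.local_head(l)

    model = JointLocalGlobal(n_ids=10)
    x = torch.randn(8, 3, 64, 32)
    labels = torch.randint(0, 10, (8,))
    logits_g, logits_l = model(x)
    # Joint multi-loss: both branches supervised by the same labels.
    loss = F.cross_entropy(logits_g, labels) + F.cross_entropy(logits_l, labels)
    loss.backward()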
Jinfeng Rao, Ferhan Ture, Hua He, Oliver Jojic, Jimmy Lin
Subjects: Information Retrieval (cs.IR)
We tackle the novel problem of navigational voice queries posed against an
entertainment system, where viewers interact with a voice-enabled remote
controller to specify the program to watch. This is a difficult problem for
several reasons: such queries are short, even shorter than comparable voice
queries in other domains, which leaves fewer opportunities for deciphering user
intent. Furthermore, ambiguity is exacerbated by underlying speech recognition
errors. We address these challenges by integrating word- and character-level
representations of the queries and by modeling voice search sessions to capture
the contextual dependencies in query sequences. Both are accomplished with a
probabilistic framework in which recurrent and feedforward neural network
modules are organized in a hierarchical manner. From a raw dataset of 32M voice
queries from 2.5M viewers on the Comcast Xfinity X1 entertainment system, we
extracted data to train and test our models. We demonstrate the benefits of our
hybrid representation and context-aware model, which significantly outperforms
models without context as well as the current deployed product.
Federico Nanni, Bhaskar Mitra, Matt Magnusson, Laura Dietz
Subjects: Information Retrieval (cs.IR)
Retrieving paragraphs to populate a Wikipedia article is a challenging task.
The new TREC Complex Answer Retrieval (TREC CAR) track introduces a
comprehensive dataset that targets this retrieval scenario. We present early
results from a variety of approaches — from standard information retrieval
methods (e.g., tf-idf) to complex systems that use query expansion based on
knowledge bases and deep neural networks. The goal is to offer future
participants of this track an overview of some promising approaches to tackle
this problem.
Vedran Vukotic, Christian Raymond, Guillaume Gravier
Comments: 4 pages, 1 figure, 2 tables, published at ACM International Conference in Multimedia Retrieval (ICMR) 2017
Subjects: Multimedia (cs.MM); Information Retrieval (cs.IR)
Continuous multimodal representations suitable for multimodal information
retrieval are usually obtained with methods that heavily rely on multimodal
autoencoders. In video hyperlinking, a task that aims at retrieving video
segments, the state of the art is a variation of two interlocked networks
working in opposing directions. These systems provide good multimodal
embeddings and are also capable of translating from one representation space to
the other. Operating on representation spaces, these networks lack the ability
to operate in the original spaces (text or image), which makes it difficult to
visualize the crossmodal function, and do not generalize well to unseen data.
Recently, generative adversarial networks have gained popularity and have been
used for generating realistic synthetic data and for obtaining high-level,
single-modal latent representation spaces. In this work, we evaluate the
feasibility of using GANs to obtain multimodal representations. We show that
GANs can be used for multimodal representation learning and that they provide
multimodal representations that are superior to representations obtained with
multimodal autoencoders. Additionally, we illustrate the ability of visualizing
crossmodal translations that can provide human-interpretable insights on
learned GAN-based video hyperlinking models.
Sahil Manchanda, Ashish Anand
Comments: Accepted to appear in 3rd IEEE International Conference on Cybernetics (Spl Session: Deep Learning for Prediction and Estimation)
Subjects: Computation and Language (cs.CL)
Drug repositioning (DR) refers to the identification of novel indications for
approved drugs. The huge investment of time and money required, together with the
risk of failure in clinical trials, has led to a surge of interest in drug
repositioning. DR exploits two major aspects associated with drugs and
diseases: existence of similarity among drugs and among diseases due to their
shared genes, pathways, or common biological effects. Existing
methods of identifying drug-disease associations rely mainly on the information
available in structured databases. On the other hand, the abundant
information available in the form of free text in biomedical research articles is
not being fully exploited. Word embedding, i.e., obtaining vector representations
of words from a large corpus of free text using neural network methods, has been
shown to give significant performance gains on several natural language processing
tasks. In this work we propose a novel way of representation learning to obtain
features of drugs and diseases by combining complementary information available
in unstructured texts and structured datasets. Next, we use a matrix completion
approach on these feature vectors to learn a projection matrix between the drug and
disease vector spaces. The proposed method shows competitive performance
with state-of-the-art methods. Further, case studies on Alzheimer's disease and
hypertension show that the predicted associations are consistent
with existing knowledge.
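As a simplified stand-in for this pipeline (random vectors replace the learned text/database embeddings, and a least-squares fit replaces the matrix completion step), the projection between the two vector spaces can be sketched as:

    # Hedged sketch: learn a linear map from drug space to disease space
    # from known associations; embeddings here are random placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    d_drug, d_dis, n_pairs = 50, 40, 200
    drugs = rng.normal(size=(n_pairs, d_drug))      # drug feature vectors
    diseases = rng.normal(size=(n_pairs, d_dis))    # associated disease vectors

    # Least-squares projection W mapping drug space into disease space.
    W, *_ = np.linalg.lstsq(drugs, diseases, rcond=None)

    def score(drug_vec, disease_vec):
        """Rank candidate indications by cosine similarity after projection."""
        p = drug_vec @ W
        return p @ disease_vec / (np.linalg.norm(p) * np.linalg.norm(disease_vec))

    print(score(drugs[0], diseases[0]))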
Lu Wang, Nick Beauchamp, Sarah Shugars, Kechen Qin
Comments: Accepted by TACL, 14 pages
Subjects: Computation and Language (cs.CL)
Debate and deliberation play essential roles in politics and government, but
most models presume that debates are won mainly via superior style or agenda
control. Ideally, however, debates would be won on the merits, as a function of
which side has the stronger arguments. We propose a predictive model of debate
that estimates the effects of linguistic features and the latent persuasive
strengths of different topics, as well as the interactions between the two.
Using a dataset of 118 Oxford-style debates, our model’s combination of content
(as latent topics) and style (as linguistic features) allows us to predict
audience-adjudicated winners with 74% accuracy, significantly outperforming
linguistic features alone (66%). Our model finds that winning sides employ
stronger arguments, and allows us to identify the linguistic features
associated with strong or weak arguments.
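A toy sketch of the prediction setup, with synthetic stand-ins for the latent topics and linguistic features (the paper's model learns the topics jointly rather than taking them as given), including topic-style interaction terms:

    # Hedged sketch: predict winners from combined topic + style features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n, n_topics, n_style = 118, 10, 5
    topics = rng.random((n, n_topics))     # latent topic strengths (stand-in)
    style = rng.random((n, n_style))       # linguistic features (stand-in)
    inter = (topics[:, :, None] * style[:, None, :]).reshape(n, -1)
    X = np.hstack([topics, style, inter])  # content, style, interactions
    y = rng.integers(0, 2, n)              # audience-adjudicated winner

    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, cv=5).mean())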
Kechen Qin, Lu Wang, Joseph Kim
Comments: Accepted by ACL 2017. 11 pages
Subjects: Computation and Language (cs.CL)
We present a joint modeling approach to identify salient discussion points in
spoken meetings as well as to label the discourse relations between speaker
turns. A variation of our model is also discussed when discourse relations are
treated as latent variables. Experimental results on two popular meeting
corpora show that our joint model can outperform state-of-the-art approaches
for both phrase-based content selection and discourse relation prediction
tasks. We also evaluate our model on predicting the consistency among team
members’ understanding of their group decisions. Classifiers trained with
features constructed from our model achieve significantly better predictive
performance than the state-of-the-art.
Firoj Alam, Morena Danieli, Giuseppe Riccardi
Comments: Journal
Subjects: Computation and Language (cs.CL)
Empathy, as defined in behavioral sciences, expresses the ability of human
beings to recognize, understand and react to emotions, attitudes and beliefs of
others. The lack of an operational definition of empathy makes it difficult to
measure. In this paper, we address two related problems in automatic
affective behavior analysis: the design of the annotation protocol and the
automatic recognition of empathy from spoken conversations. We propose and
evaluate an annotation scheme for empathy inspired by the modal model of
emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic
spoken conversations. In the context of behavioral analysis, we designed an
automatic segmentation and classification system for empathy. Given the
different speech and language levels of representation where empathy may be
communicated, we investigated features derived from the lexical and acoustic
spaces. The feature development process was designed to support both the fusion
and automatic selection of relevant features from high dimensional space. The
automatic classification system was evaluated on call center conversations
where it showed significantly better performance than the baseline.
Kyle Richardson, Jonas Kuhn
Comments: accepted to ACL-2017
Subjects: Computation and Language (cs.CL)
We consider the problem of translating high-level textual descriptions to
formal representations in technical documentation as part of an effort to model
the meaning of such documentation. We focus specifically on the problem of
learning translational correspondences between text descriptions and grounded
representations in the target documentation, such as formal representation of
functions or code templates. Our approach exploits the parallel nature of such
documentation, or the tight coupling between high-level text and the low-level
representations we aim to learn. Data is collected by mining technical
documents for such parallel text-representation pairs, which we use to train a
simple semantic parsing model. We report new baseline results on sixteen novel
datasets, including the standard library documentation for nine popular
programming languages across seven natural languages, and a small collection of
Unix utility manuals.
Lukas Galke, Florian Mai, Alan Schelten, Dennis Brunsch, Ansgar Scherp
Comments: 10 pages, 1 figure, 3 tables
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)
To date, there has been no systematic comparison of how well document
classification can be conducted using just the titles of the documents.
However, methods using only the titles are very important since automated
processing of titles has no legal barriers. Copyright laws often hinder
automated document classification on full-text and even abstracts. In this
paper, we compare established methods like Bayes, Rocchio, kNN, SVM, and
logistic regression as well as recent methods like Learning to Rank and neural
networks on the multi-label document classification problem. We demonstrate
that classifications solely using the documents’ titles can be very good and
very close to the classification results using full-text. We use two
established news corpora and two scientific document collections. The
experiments are large-scale in terms of documents per corpus (up to 100,000) as
well as number of labels (up to 10,000). The best method on title data is a
modern variant of neural networks. For three datasets, the difference to
full-text is very small. For one dataset, a stacking of logistic regression and
decision trees performs slightly better than neural networks. Furthermore, we
observe that the best methods on titles are even better than several
state-of-the-art methods on full-text.
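A minimal sketch of one of the established baselines (TF-IDF plus logistic regression) applied to titles only; the corpus here is a tiny toy stand-in, not one of the paper's datasets:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    titles = ["neural nets for vision", "interest rates rise again",
              "deep learning speeds up imaging", "markets react to fed decision"]
    labels = ["science", "finance", "science", "finance"]

    # Title-only classifier: TF-IDF features + logistic regression.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(titles, labels)
    print(clf.predict(["convolutional networks for microscopy"]))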
Guy Even, Reut Levi, Moti Medina
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
In this paper we present distributed testing algorithms of graph properties
in the CONGEST-model [Censor-Hillel et al. 2016]. We present one-sided error
testing algorithms in the general graph model.
We first describe a general procedure for converting \(\epsilon\)-testers with
a number of rounds \(f(D)\), where \(D\) denotes the diameter of the graph, to
\(O((\log n)/\epsilon) + f((\log n)/\epsilon)\) rounds, where \(n\) is the number of
processors of the network. We then apply this procedure to obtain an optimal
tester, in terms of \(n\), for testing bipartiteness, whose round complexity is
\(O(\epsilon^{-1}\log n)\), which improves over the \(poly(\epsilon^{-1} \log
n)\)-round algorithm by Censor-Hillel et al. (DISC 2016). Moreover, for
cycle-freeness, we obtain a corrector of the graph that locally corrects
the graph so that the corrected graph is acyclic. Note that, unlike a tester, a
corrector needs to mend the graph in many places in the case that the graph is
far from having the property.
In the second part of the paper we design algorithms for testing whether the
network is \(H\)-free for any connected \(H\) of size up to four with round
complexity of \(O(\epsilon^{-1})\). This improves over the
\(O(\epsilon^{-2})\)-round algorithms for testing triangle freeness by
Censor-Hillel et al. (DISC 2016) and for testing excluded graphs of size \(4\) by
Fraigniaud et al. (DISC 2016).
In the last part we generalize the global tester by Iwama and Yoshida (ITCS
2014) of testing \(k\)-path freeness to testing the exclusion of any tree of
order \(k\). We then show how to simulate this algorithm in the CONGEST-model in
\(O(k^{k^2+1}\cdot\epsilon^{-k})\) rounds.
Damien Imbs, Achour Mostéfaoui, Matthieu Perrin, Michel Raynal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
It is well-known that consensus (one-set agreement) and total order broadcast
are equivalent in asynchronous systems prone to process crash failures.
Considering wait-free systems, this article addresses and answers the following
question: which is the communication abstraction that “captures” \(k\)-set
agreement? To this end, it introduces a new broadcast communication
abstraction, called \(k\)-BO-Broadcast, which restricts the disagreement on the
local deliveries of the messages that have been broadcast (\(1\)-BO-Broadcast
boils down to total order broadcast). Hence, in this context, \(k=1\) is not a
special number, but only the first integer in an increasing integer sequence.
This establishes a new “correspondence” between distributed agreement
problems and communication abstractions, which enriches our understanding of
the relations linking fundamental issues of fault-tolerant distributed
computing.
Hsiang-Huang Wu, Chien-Min Wang, Hsuan-Chi Kuo, Wei-Chun Chung, Jan-Ming Ho
Comments: 10 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Suffix Array (SA) is a cardinal data structure in many pattern matching
applications, including data compression, plagiarism detection and sequence
alignment. However, as data volumes grow rapidly, the construction
of SA is no longer amenable to current large-scale data processing frameworks
due to the intrinsic proliferation of suffixes during the construction.
That is, improving performance by just adding resources to the
frameworks becomes less cost-effective and even exhibits severe diminishing
returns. At issue now is whether we can make SA construction more
scalable and efficient for the everlasting accretion of data by creating a
radical shift in perspective. Taking TeraSort [1] as our baseline, we first
demonstrate the fragile scalability of TeraSort and investigate its causes
through experiments on the sequence alignment of a grouper (i.e., the SA
construction used in bioinformatics). As such, we propose a scheme that
amalgamates the distributed key-value store system into MapReduce to leverage
the in-memory queries about suffixes. Rather than handling the communication of
suffixes, MapReduce is in charge of the communication of their indexes, which
means better capacity for more data. It significantly abates the required disk
space for constructing SA and better utilizes the memory, which in turn
improves the scalability radically. We also examine the efficiency of our
scheme in terms of memory and show it outperforms TeraSort. Finally, our scheme
can complete paired-end sequencing and alignment with two input files
without any degradation in scalability, and can accommodate the suffixes of
nearly 6.7 TB in a small cluster composed of 16 nodes and Gigabit Ethernet
without any compression.
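For reference, the object being computed is simply the suffix array; the naive construction below is O(n^2 log n) and bears no resemblance to the paper's distributed scheme, but it makes the proliferation of suffixes concrete:

    # Naive suffix array: indexes of all suffixes in lexicographic order.
    def suffix_array(s):
        return sorted(range(len(s)), key=lambda i: s[i:])

    text = "banana"
    sa = suffix_array(text)
    print(sa)                       # [5, 3, 1, 0, 4, 2]
    for i in sa:
        print(i, text[i:])          # each suffix, in sorted order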
Cedric Nugteren
Subjects: Mathematical Software (cs.MS); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
This work demonstrates how to accelerate dense linear algebra computations
using CLBlast, an open-source OpenCL BLAS library providing optimized routines
for a wide variety of devices. It is targeted at machine learning and HPC
applications and thus provides a fast matrix-multiplication routine (GEMM) to
accelerate the core of many applications (e.g. deep learning, iterative
solvers, astrophysics, computational fluid dynamics, quantum chemistry).
CLBlast has four main advantages over other BLAS libraries: 1) it is optimized
for and tested on a large variety of OpenCL devices including less commonly
used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned
for specific problem-sizes on specific hardware platforms, 3) it can perform
operations in half-precision floating-point (FP16), saving precious bandwidth,
time and energy, and 4) it can combine multiple operations in a single batched
routine, accelerating smaller problems significantly. This paper describes the
library and demonstrates the advantages of CLBlast experimentally for different
use-cases on a wide variety of OpenCL hardware.
Saptarshi Das, Xi Chen, Michael P. Hobson
Comments: 13 pages, 23 figures
Journal-ref: IEEE Transactions on Computational Imaging, vol. 3, no. 2, pp.
316-329, Jun. 2017
Subjects: Geophysics (physics.geo-ph); Distributed, Parallel, and Cluster Computing (cs.DC); Computational Physics (physics.comp-ph)
A novel approach is presented for fast generation of synthetic seismograms
due to microseismic events, using heterogeneous marine velocity models. The
partial differential equations (PDEs) for the 3D elastic wave equation have
been numerically solved using the Fourier domain pseudo-spectral method which
is parallelizable on the graphics processing unit (GPU) cards, thus making it
faster compared to traditional CPU based computing platforms. Due to
computationally expensive forward simulation of large geological models,
several combinations of individual synthetic seismic traces are used for
specified microseismic event locations, in order to simulate the effect of
realistic microseismic activity patterns in the subsurface. We here explore the
patterns generated by a few hundred microseismic events with different source
mechanisms using various combinations, both in event amplitudes and origin
times, using the simulated pressure and three component particle velocity
fields via 1D, 2D and 3D seismic visualizations.
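The core pseudo-spectral kernel can be illustrated in 1D (the paper solves the full 3D elastic equations on GPUs; the parameters below are arbitrary): spatial derivatives are taken in the Fourier domain and time is stepped explicitly:

    # Hedged 1D sketch of the Fourier pseudo-spectral method for the
    # scalar wave equation u_tt = c^2 u_xx, with leapfrog time stepping.
    import numpy as np

    n, L, c, dt, steps = 256, 1.0, 1.0, 5e-4, 400
    x = np.linspace(0, L, n, endpoint=False)
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)     # spectral wavenumbers

    u = np.exp(-((x - 0.5) / 0.05) ** 2)           # initial pressure pulse
    u_prev = u.copy()                               # zero initial velocity

    for _ in range(steps):
        # Second spatial derivative via FFT: multiply by -k^2 in k-space.
        u_xx = np.real(np.fft.ifft(-(k ** 2) * np.fft.fft(u)))
        u, u_prev = 2 * u - u_prev + (c * dt) ** 2 * u_xx, u
    print(u[:5])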
Zane Witherspoon
Comments: 12 pages, blockchain theory, business theory
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
Blockchain technology as a whole is experiencing a dramatic rise in adoption,
in no small part due to the developer-friendly Ethereum network. While the
number of smart-contract powered distributed applications (Dapps) continues to
rise, they face many of the same challenges all new technologies face as they
are introduced to a market. By modeling the consumer adoption of blockchain
technology and analyzing scholarly literature on supply-side factors affecting
the diffusion of technology, we seek to show that the growth of a Dapp can be
accelerated using abstraction, whole product planning, and complementaries.
Bogdan Czejdo, Sambit Bhattacharya, Mikołaj Baszun, Wiktor B. Daszczuk
Comments: 11 pages, 5 figures
Journal-ref: Autobusy-TEST, vol. 17, No.3, pp.1294-1301 (2016)
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
Environmental changes, failures, collisions or even terrorist attacks can
cause serious malfunctions of delivery systems. We present a novel
approach for improving the resilience of Autonomous Moving Platforms (AMPs). The
approach is based on multi-level state diagrams describing environmental trigger
specifications, movement actions and synchronization primitives. The upper-level
diagrams allow us to model advanced interactions between autonomous
AMPs and detect irregularities such as deadlocks, livelocks, etc. Techniques
are presented to verify and analyze combined AMPs’ behaviors using model
checking. The described system, the Dedan verifier, is still under
development. In the near future, a graphical form of verified system
representation is planned.
Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh
Subjects: Learning (cs.LG)
We consider \((\epsilon,\delta)\)-PAC maximum-selection and ranking for general
probabilistic models whose comparison probabilities satisfy strong stochastic
transitivity and the stochastic triangle inequality. Modifying the popular knockout
tournament, we propose a maximum-selection algorithm that uses
\(\mathcal{O}\left(\frac{n}{\epsilon^2}\log \frac{1}{\delta}\right)\)
comparisons, a number tight up to a constant factor. We then derive a general
framework that improves the performance of many ranking algorithms, and combine
it with merge sort and binary search to obtain a ranking algorithm that uses
\(\mathcal{O}\left(\frac{n\log n (\log \log n)^3}{\epsilon^2}\right)\)
comparisons for any \(\delta \ge \frac{1}{n}\), a number optimal up to a \((\log \log
n)^3\) factor.
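A hedged sketch of one knockout round, with a noisy comparison oracle standing in for the probabilistic model (the repetition count per duel is illustrative, not the paper's exact bound):

    import math
    import random

    def compare(i, j, p=0.7):
        """Noisy comparison: the larger index is truly better and wins
        each duel with probability p."""
        better = max(i, j)
        return (random.random() < p) == (better == i)

    def duel(i, j, eps, delta):
        """Repeat comparisons and keep the empirical winner."""
        m = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
        wins = sum(compare(i, j) for _ in range(m))
        return i if wins >= m / 2 else j

    def knockout(arms, eps, delta):
        """Pair up survivors and advance duel winners until one remains."""
        while len(arms) > 1:
            nxt = [duel(arms[t], arms[t + 1], eps, delta)
                   for t in range(0, len(arms) - 1, 2)]
            if len(arms) % 2:
                nxt.append(arms[-1])
            arms = nxt
        return arms[0]

    print(knockout(list(range(16)), eps=0.1, delta=0.05))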
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
Comments: In ICML 2017. Website at this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
In many real-world scenarios, rewards extrinsic to the agent are extremely
sparse, or absent altogether. In such cases, curiosity can serve as an
intrinsic reward signal to enable the agent to explore its environment and
learn skills that might be useful later in its life. We formulate curiosity as
the error in an agent’s ability to predict the consequence of its own actions
in a visual feature space learned by a self-supervised inverse dynamics model.
Our formulation scales to high-dimensional continuous state spaces like images,
bypasses the difficulties of directly predicting pixels, and, critically,
ignores the aspects of the environment that cannot affect the agent. The
proposed approach is evaluated in two environments: VizDoom and Super Mario
Bros. Three broad settings are investigated: 1) sparse extrinsic reward, where
curiosity allows for far fewer interactions with the environment to reach the
goal; 2) exploration with no extrinsic reward, where curiosity pushes the agent
to explore more efficiently; and 3) generalization to unseen scenarios (e.g.
new levels of the same game) where the knowledge gained from earlier experience
helps the agent explore new places much faster than starting from scratch. Demo
video and code available at this https URL
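A minimal numeric sketch of the intrinsic reward, assuming tiny linear stand-ins for the learned feature map and forward model (the paper learns the features with an inverse dynamics model):

    # Hedged sketch: curiosity as forward-model prediction error in a
    # feature space; the feature map and model here are random stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)
    d_state, d_feat, n_actions = 8, 4, 3
    phi = rng.normal(size=(d_state, d_feat))                  # feature map
    W = rng.normal(size=(d_feat + n_actions, d_feat)) * 0.1   # forward model

    def intrinsic_reward(s, a, s_next):
        """Error in predicting next-state features from (features, action)."""
        f, f_next = s @ phi, s_next @ phi
        a_onehot = np.eye(n_actions)[a]
        pred = np.concatenate([f, a_onehot]) @ W
        return 0.5 * np.sum((pred - f_next) ** 2)

    s = rng.normal(size=d_state)
    print(intrinsic_reward(s, a=1, s_next=s + 0.1 * rng.normal(size=d_state)))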
Ahmed M. Alaa, Scott Hu, Mihaela van der Schaar
Subjects: Learning (cs.LG)
Critically ill patients in regular wards are vulnerable to unanticipated
adverse events which require prompt transfer to the intensive care unit (ICU).
To allow for accurate prognosis of deteriorating patients, we develop a novel
continuous-time probabilistic model for a monitored patient’s temporal sequence
of physiological data. Our model captures “informatively sampled” patient
episodes: the clinicians’ decisions on when to observe a hospitalized patient’s
vital signs and lab tests over time are represented by a marked Hawkes process,
with intensity parameters that are modulated by the patient’s latent clinical
states, and with observable physiological data (mark process) modeled as a
switching multi-task Gaussian process. In addition, our model captures
“informatively censored” patient episodes by representing the patient’s latent
clinical states as an absorbing semi-Markov jump process. The model parameters
are learned from offline patient episodes in the electronic health records via
an EM-based algorithm. Experiments conducted on a cohort of patients admitted
to a major medical center over a 3-year period show that risk prognosis based
on our model significantly outperforms the currently deployed medical risk
scores and other baseline machine learning algorithms.
Nicolas Papernot, Patrick McDaniel
Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Machine learning is vulnerable to adversarial examples: inputs carefully
modified to force misclassification. Designing defenses against such inputs
remains largely an open problem. In this work, we revisit defensive
distillation—which is one of the mechanisms proposed to mitigate adversarial
examples—to address its limitations. We view our results not only as an
effective way of addressing some of the recently discovered attacks but also as
reinforcing the importance of improved training techniques.
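A small sketch of the distillation step itself, assuming toy logit tables in place of networks: the student is trained against the teacher's temperature-softened outputs:

    import numpy as np

    def softmax_T(logits, T):
        """Temperature-scaled softmax; high T smooths the distribution."""
        z = logits / T
        z -= z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    teacher_logits = np.array([[8.0, 2.0, 0.5],
                               [1.0, 6.0, 1.5]])
    T = 20.0                               # high temperature softens labels
    soft_labels = softmax_T(teacher_logits, T)

    def distill_loss(student_logits, soft_labels, T):
        """Cross-entropy of the student's softened outputs vs. soft labels."""
        p = softmax_T(student_logits, T)
        return -np.mean(np.sum(soft_labels * np.log(p + 1e-12), axis=-1))

    print(distill_loss(teacher_logits * 0.9, soft_labels, T))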
Ivo Danihelka, Balaji Lakshminarayanan, Benigno Uria, Daan Wierstra, Peter Dayan
Subjects: Learning (cs.LG)
We train a generator by maximum likelihood and we also train the same
generator architecture by Wasserstein GAN. We then compare the generated
samples, exact log-probability densities and approximate Wasserstein distances.
We show that an independent critic trained to approximate Wasserstein distance
between the validation set and the generator distribution helps detect
overfitting. Finally, we use ideas from the one-shot learning literature to
develop a novel fast learning critic.
Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker
Comments: To be published in Machine Learning Journal
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO); Machine Learning (stat.ML)
This article provides the first survey of computational models of emotion in
reinforcement learning (RL) agents. The survey focuses on agent/robot emotions,
and mostly ignores human user emotions. Emotions are recognized as functional
in decision-making by influencing motivation and action selection. Therefore,
computational emotion models are usually grounded in the agent’s decision
making architecture, of which RL is an important subclass. Studying emotions in
RL-based agents is useful for three research fields. For machine learning (ML)
researchers, emotion models may improve learning efficiency. For the
interactive ML and human-robot interaction (HRI) community, emotions can
communicate state and enhance user investment. Lastly, it allows affective
modelling (AM) researchers to investigate their emotion theories in a
successful AI agent class. This survey provides background on emotion theory
and RL. It systematically addresses 1) from what underlying dimensions (e.g.,
homeostasis, appraisal) emotions can be derived and how these can be modelled
in RL-agents, 2) what types of emotions have been derived from these
dimensions, and 3) how these emotions may either influence the learning
efficiency of the agent or be useful as social signals. We also systematically
compare evaluation criteria, and draw connections to important RL sub-domains
like (intrinsic) motivation and model-based RL. In short, this survey provides
both a practical overview for engineers wanting to implement emotions in their
RL agents, and identifies challenges and directions for future emotion-RL
research.
Heng Guo, Kaan Kara, Ce Zhang
Comments: 15 pages
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
For Markov chain Monte Carlo methods, one of the greatest discrepancies
between theory and system is the scan order – while most theoretical
development on the mixing time analysis deals with random updates, real-world
systems are implemented with systematic scans. We bridge this gap for models
that exhibit a bipartite structure, including, most notably, the
Restricted/Deep Boltzmann Machine. The de facto implementation for these models
scans variables in a layerwise fashion. We show that the Gibbs sampler with a
layerwise alternating scan order has its relaxation time (in terms of epochs)
no larger than that of a random-update Gibbs sampler (in terms of variable
updates). We also construct examples to show that this bound is asymptotically
tight. Through standard inequalities, our result also implies a comparison on
the mixing times.
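A minimal sketch of the systematic layerwise scan on a binary RBM: one epoch updates all hidden units given the visibles, then all visibles given the hiddens:

    # Hedged sketch of layerwise alternating Gibbs sampling in a tiny RBM.
    import numpy as np

    rng = np.random.default_rng(0)
    n_v, n_h = 6, 4
    W = rng.normal(scale=0.1, size=(n_v, n_h))
    b_v, b_h = np.zeros(n_v), np.zeros(n_h)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    v = rng.integers(0, 2, n_v).astype(float)
    for epoch in range(100):
        # One systematic-scan epoch: whole hidden layer, then whole visible.
        h = (rng.random(n_h) < sigmoid(v @ W + b_h)).astype(float)
        v = (rng.random(n_v) < sigmoid(W @ h + b_v)).astype(float)
    print(v, h)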
Nicolò Cesa-Bianchi, Ohad Shamir
Comments: 21 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We study how the regret guarantees of nonstochastic multi-armed bandits can
be improved, if the effective range of the losses in each round is small (e.g.
the maximal difference between two losses in a given round). Despite a recent
impossibility result, we show how this can be made possible under certain mild
additional assumptions, such as availability of rough estimates of the losses,
or advance knowledge of the loss of a single, possibly unspecified arm. Along
the way, we develop a novel technique which might be of independent interest,
to convert any multi-armed bandit algorithm with regret depending on the loss
range, to an algorithm with regret depending only on the effective range, while
avoiding predictably bad arms altogether.
Hongyun Cai, Vincent W. Zheng, Kevin Chen-Chuan Chang
Comments: Technical Report
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Graph embedding provides an efficient solution for graph analysis by
converting the graph into a low-dimensional space which preserves the structure
information. In contrast to the graph structure data, the i.i.d. node embedding
can be processed efficiently in terms of both time and space. Current
semi-supervised graph embedding algorithms assume the labelled nodes are given,
which may not always be true in the real world. Since manually labelling all
training data is impractical, how to select the subset of training data to
label so as to maximize the graph analysis task performance is of great
importance. This motivates our proposed active graph embedding (AGE) framework,
in which we design a general active learning query strategy for any
semi-supervised graph embedding algorithm. AGE selects the most informative
nodes as the training labelled nodes based on the graphical information (i.e.,
node centrality) as well as the learnt node embedding (i.e., node
classification uncertainty and node embedding representativeness). Different
query criteria are combined with the time-sensitive parameters which shift the
focus from graph based query criteria to embedding based criteria as the
learning progresses. Experiments have been conducted on three public data sets
and the results verified the effectiveness of each component of our query
strategy and the power of combining them using time-sensitive parameters. Our
code is available online at: this https URL
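A hedged sketch of the query scoring (all scores are synthetic and the exact combination rule is illustrative): a time-sensitive weight shifts the emphasis from the graph criterion to the embedding criteria as learning progresses:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    centrality = rng.random(n)          # graph-based criterion
    uncertainty = rng.random(n)         # embedding-based: label uncertainty
    represent = rng.random(n)           # embedding-based: representativeness

    def query_score(t, t_max):
        """Early on trust graph structure; later trust the learnt embedding."""
        gamma = t / t_max               # time-sensitive weight in [0, 1]
        emb = 0.5 * (uncertainty + represent)
        return (1 - gamma) * centrality + gamma * emb

    for t in (0, 50, 100):
        print(t, int(np.argmax(query_score(t, 100))))   # node to label next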
Luo Luo, Cheng Chen, Zhihua Zhang, Wu-Jun Li
Subjects: Learning (cs.LG)
Online Newton step algorithms usually achieve good performance with less
training samples than first order methods, but require higher space and time
complexity in each iteration. In this paper, we develop a new sketching
strategy called regularized frequent direction (RFD) to improve the performance
of online Newton algorithms. Unlike the standard frequent direction (FD) which
only maintains a sketching matrix, the RFD introduces a regularization term
additionally. The regularization provides an adaptive stepsize for update,
which makes the algorithm more stable. The RFD also reduces the approximation
error of FD with almost the same cost and makes the online learning more robust
to hyperparameters. Empirical studies demonstrate that our approach outperforms
state-of-the-art second order online learning algorithms.
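For context, a sketch of the standard frequent directions (FD) step that RFD builds on; the accumulation of the regularizer below reflects one reading of RFD and is an assumption, not the paper's verbatim algorithm:

    # Hedged FD sketch: keep an ell x d sketch; when it fills, shrink all
    # singular values by the smallest one, freeing a row for the next sample.
    import numpy as np

    def frequent_directions(X, ell):
        n, d = X.shape
        B = np.zeros((ell, d))
        alpha = 0.0                   # RFD would return this as a regularizer
        row = 0
        for x in X:
            B[row] = x
            row += 1
            if row == ell:
                U, s, Vt = np.linalg.svd(B, full_matrices=False)
                delta = s[-1] ** 2
                s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
                B = s[:, None] * Vt
                alpha += delta / 2.0  # assumed RFD-style accumulation
                row = ell - 1         # last row is now zero; reuse it
        return B, alpha

    X = np.random.default_rng(0).normal(size=(500, 20))
    B, alpha = frequent_directions(X, ell=8)
    print(B.shape, alpha)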
Emanuel Laude, Jan-Hendrik Lange, Frank Schmidt, Bjoern Andres, Daniel Cremers
Subjects: Learning (cs.LG)
This paper proposes an approach for tackling an abstract formulation of
weakly supervised learning, which is posed as a joint optimization problem in
the continuous model parameters and discrete label variables. We devise a novel
decomposition of the latter into purely discrete and continuous subproblems
within the framework of the Alternating Direction Method of Multipliers (ADMM),
which allows us to efficiently compute a local minimum of the nonconvex objective
function. Our approach preserves integrality of the discrete label variables
and admits a globally convergent kernel formulation. The resulting method
implicitly alternates between a discrete and a continuous variable update,
however, it is inherently different from simple alternating optimization (hard
EM). In numerous experiments we illustrate that our method can learn a
classifier from weak and abstract combinatorial supervision, thereby proving
superior to hard EM.
Qunwei Li, Yi Zhou, Yingbin Liang, Pramod K. Varshney
Comments: Accepted in ICML 2017, 9 pages, 4 figures
Subjects: Learning (cs.LG)
In many modern machine learning applications, structures of underlying
mathematical models often yield nonconvex optimization problems. Due to the
intractability of nonconvexity, there is a rising need to develop efficient
methods for solving general nonconvex problems with certain performance
guarantee. In this work, we investigate the accelerated proximal gradient
method for nonconvex programming (APGnc). The method compares a usual
proximal gradient step with a linear extrapolation step, and accepts the one
that has the lower function value to achieve a monotonic decrease. Specifically,
under a general nonsmooth and nonconvex setting, we provide a rigorous argument
to show that the limit points of the sequence generated by APGnc are critical
points of the objective function. Then, by exploiting the
Kurdyka-Łojasiewicz (KL) property for a broad class of functions, we
establish the linear and sub-linear convergence rates of the function value
sequence generated by APGnc. We further propose a stochastic variance reduced
APGnc (SVRG-APGnc), and establish its linear convergence under a special case
of the KL property. We also extend the analysis to the inexact version of
these methods and develop an adaptive momentum strategy that improves the
numerical performance.
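A toy sketch of the APGnc acceptance rule on a smooth nonconvex function (the proximal operator of the nonsmooth part is taken as the identity for simplicity, so the proximal step reduces to a gradient step):

    import numpy as np

    f = lambda x: 0.25 * x ** 4 - x ** 2 + 0.5 * x      # nonconvex objective
    grad = lambda x: x ** 3 - 2 * x + 0.5
    step = 0.1

    x_prev, x = 2.0, 1.8
    for k in range(100):
        y = x + 0.5 * (x - x_prev)            # extrapolation (momentum) point
        cand_x = x - step * grad(x)           # plain gradient/proximal step
        cand_y = y - step * grad(y)           # accelerated step
        # Accept whichever candidate has the lower function value.
        x_prev, x = x, cand_y if f(cand_y) <= f(cand_x) else cand_x
    print(x, f(x))                            # converges to a critical point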
Alfredo V. Clemente, Humberto N. Castejón, Arjun Chandra
Subjects: Learning (cs.LG)
We propose a novel framework for efficient parallelization of deep
reinforcement learning algorithms, enabling these algorithms to learn from
multiple actors on a single machine. The framework is algorithm agnostic and
can be applied to on-policy, off-policy, value based and policy gradient based
algorithms. Given its inherent parallelism, the framework can be efficiently
implemented on a GPU, allowing the usage of powerful models while significantly
reducing training time. We demonstrate the effectiveness of our framework by
implementing an advantage actor-critic algorithm on a GPU, using on-policy
experiences and employing synchronous updates. Our algorithm achieves
state-of-the-art performance on the Atari domain after only a few hours of
training. Our framework thus opens the door for much faster experimentation on
demanding problem domains. Our implementation is open-source and is made public
at this https URL
Shuchu Han, Hao Huang, Hong Qin
Comments: submitted to ACM TKDD
Subjects: Learning (cs.LG)
Redundant features in high dimensional datasets degrade
the performance of learning and mining algorithms. How to detect and remove
them is an important topic in machine learning and data mining
research. In this paper, we propose a graph based approach to find and remove
those redundant features automatically for high dimensional data. Based on the
framework of sparse learning based unsupervised feature selection, Sparse
Feature Graph (SFG) is introduced not only to model the redundancy between two
features, but also to disclose the group redundancy between two groups of
features. With SFG, we can divide the whole features into different groups, and
improve the intrinsic structure of data by removing detected redundant
features. With accurate data structure, quality indicator vectors can be
obtained to improve the learning performance of existing unsupervised feature
selection algorithms such as multi-cluster feature selection (MCFS). Our
experimental results on benchmark datasets show that the proposed SFG and
feature-redundancy removal algorithm can improve the performance of unsupervised
feature selection algorithms consistently.
Yair Rivenson, Zoltan Gorocs, Harun Gunaydin, Yibo Zhang, Hongda Wang, Aydogan Ozcan
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
We demonstrate that a deep neural network can significantly improve optical
microscopy, enhancing its spatial resolution over a large field-of-view and
depth-of-field. After its training, the only input to this network is an image
acquired using a regular optical microscope, without any changes to its design.
We blindly tested this deep learning approach using various tissue samples that
are imaged with low-resolution and wide-field systems, where the network
rapidly outputs an image with remarkably better resolution, matching the
performance of higher numerical aperture lenses, also significantly surpassing
their limited field-of-view and depth-of-field. These results are
transformative for various fields that use microscopy tools, including e.g.,
life sciences, where optical microscopy is considered as one of the most widely
used and deployed techniques. Beyond such applications, our presented approach
is broadly applicable to other imaging modalities, also spanning different
parts of the electromagnetic spectrum, and can be used to design computational
imagers that get better and better as they continue to image specimen and
establish new transformations among different modes of imaging.
Zhou Xing, Eddy Baik, Yan Jiao, Nilesh Kulkarni, Chris Li, Gautam Muralidhar, Marzieh Parandehgheibi, Erik Reed, Abhishek Singhal, Fei Xiao, Chris Pouliot
Subjects: Sound (cs.SD); Learning (cs.LG)
While both the data volume and the heterogeneity of digital music content are
huge, it has become increasingly important and convenient to build
recommendation or search systems to facilitate surfacing this content to the
user or consumer community. Most recommendation models fall into two
primary categories: collaborative filtering based and content based approaches.
Variants of the collaborative filtering approach suffer from the
common issues of the so-called “cold start” and “long tail” problems, where there is
not much user interaction data to reveal user opinions or affinities on the
content, as well as distortion towards popular content. Content-based
approaches are sometimes limited by the richness of the available content data
resulting in a heavily biased and coarse recommendation result. In recent
years, deep neural networks have enjoyed great success in large-scale image
and video recognition. In this paper, we propose and experiment with a deep
convolutional neural network to imitate how the human brain processes hierarchical
structures in the auditory signals, such as music, speech, etc., at various
timescales. This approach can be used to discover the latent factor models of
the music based upon acoustic hyper-images that are extracted from the raw
audio waves of music. These latent embeddings can be used either as features to
feed to subsequent models, such as collaborative filtering, or to build
similarity metrics between songs, or to classify music based on the labels for
training such as genre, mood, sentiment, etc.
Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter I. Corke
Comments: 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop in CVPR 2017
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Systems and Control (cs.SY)
This paper introduces an end-to-end fine-tuning method to improve hand-eye
coordination in modular deep visuo-motor policies (modular networks) where each
module is trained independently. Benefiting from weighted losses, the
fine-tuning method significantly improves the performance of the policies for a
robotic planar reaching task.
Michael Tsang, Dehua Cheng, Yan Liu
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Interpreting deep neural networks can enable new applications for predictive
modeling where both accuracy and interpretability are required. In this paper,
we examine the underlying structure of a deep neural network to interpret the
statistical interactions it captures. Our key observation is that any input
features that interact with each other must follow strongly weighted
connections to a common hidden unit before the final output. We propose a novel
framework for detecting feature interactions of arbitrary order by interpreting
neural network weights. Our framework, which we call Neural Interaction
Detector (NID), accurately identifies meaningful interactions without an
exhaustive search on an exponential solution space of interaction candidates.
Empirical evaluation on both synthetic and real-world data showed the
effectiveness of NID, which can uncover interactions omitted by other methods
in orders of magnitude less time.
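A hedged sketch in the spirit of NID (the exact aggregation below, a minimum over input weights times an output-path influence, is an illustrative simplification, not the paper's precise definition): interacting features must share a strongly weighted hidden unit:

    import numpy as np

    rng = np.random.default_rng(0)
    d, h = 5, 16
    W1 = rng.normal(size=(h, d))            # input -> hidden weights
    w2 = rng.normal(size=h)                 # hidden -> output weights

    influence = np.abs(w2)                  # per-unit path strength to output
    A = np.abs(W1)
    scores = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            # Pairwise strength: shared weight mass into common hidden units.
            scores[i, j] = np.sum(influence * np.minimum(A[:, i], A[:, j]))

    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    print("strongest pairwise interaction:", (i, j), scores[i, j])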
Ali Jadbabaie, Elchanan Mossel, M. Amin Rahimian
Subjects: Statistics Theory (math.ST); Computational Complexity (cs.CC); Learning (cs.LG); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
Many important real-world decision-making problems involve interactions of
individuals with purely informational externalities, for example, in jury
deliberations, expert committees, etc. We model such interactions of rational
agents in a group, where they receive private information and act based on that
information while also observing other people’s beliefs or actions. As a
Bayesian agent attempts to infer the truth from her sequence of observations of
actions of others and her own private signal, she recursively refines her
belief on the signals that other players could have observed and actions that
they could have taken given that other players are also rational. The existing
literature addresses asymptotic properties of Bayesian group decisions
(important questions such as convergence to consensus and learning). In this
work, we address the computations that the Bayesian agent should undertake to
realize the optimal actions at every decision epoch. We use the iterated
eliminations of infeasible signals (IEIS) to model the thinking process as well
as the calculations of a Bayesian agent in a group decision scenario. We show
that IEIS algorithm runs in exponential time; however, when the group structure
is a partially ordered set the Bayesian calculations simplify and
polynomial-time computation of the Bayesian recommendations is possible. We
next shift attention to the case where agents reveal their beliefs (instead of
actions) at every decision epoch. We analyze the computational complexity of
the Bayesian belief formation in groups and show that it is NP-hard. We also
investigate the factors underlying this computational complexity and show how
belief calculations simplify in special network structures or cases with strong
inherent symmetries. We finally give insights about the statistical efficiency
(optimality) of the beliefs and its relations to computational efficiency.
Tianhua Xu
Subjects: Information Theory (cs.IT)
The achievable information rates of optical communication networks have
increased greatly over the past four decades with the introduction and
development of optical amplifiers, coherent detection, advanced modulation
formats, and digital signal processing techniques. These developments promoted
the revolution of optical communication systems and the growth of the Internet,
in the direction of high-capacity and long-distance transmissions. The
performance of long-haul high-capacity optical fiber communication systems is
significantly degraded by transmission impairments, such as chromatic
dispersion, polarization mode dispersion, laser phase noise and Kerr fiber
nonlinearities. With the full capture of the amplitude and phase of the
signals using coherent optical detection, powerful compensation and
effective mitigation of the transmission impairments can be implemented using
digital signal processing in the electrical domain. This has become one of the
most promising techniques for next-generation optical communication networks to
achieve a performance close to the Shannon capacity limit. This chapter will
focus on the introduction and investigation of digital signal processing
employed for channel impairments compensation based on the coherent detection
of optical signals, to provide a roadmap for the design and implementation of
realtime optical fiber communication systems.
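One representative DSP block, frequency-domain chromatic dispersion compensation, can be sketched as follows (parameter values and sign conventions are illustrative): the receiver applies the inverse of the fibre's quadratic phase to the coherently detected samples:

    import numpy as np

    n, fs = 4096, 32e9                      # samples, sampling rate (Hz)
    beta2, length = -2.17e-26, 1e6          # GVD (s^2/m), 1000 km of fibre

    rng = np.random.default_rng(0)
    tx = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], n)   # toy QPSK symbols

    w = 2 * np.pi * np.fft.fftfreq(n, d=1/fs)        # angular frequencies
    H_fiber = np.exp(-1j * beta2 / 2 * w**2 * length)
    rx = np.fft.ifft(np.fft.fft(tx) * H_fiber)       # dispersed signal

    # Compensation: all-pass filter with the conjugate quadratic phase.
    H_comp = np.exp(+1j * beta2 / 2 * w**2 * length)
    eq = np.fft.ifft(np.fft.fft(rx) * H_comp)
    print(np.max(np.abs(eq - tx)))                   # ~ numerical precision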
Chao He, Sheng Yang, Pablo Piantanida
Comments: 30 pages, 3 figures, submitted to IEEE Transactions on Wireless Communications (revised version)
Subjects: Information Theory (cs.IT)
The state-dependent \(K\)-user memoryless Broadcast Channel (BC) with state
feedback is investigated. We propose a novel transmission scheme and derive its
corresponding achievable rate region, which, compared to some general schemes
that deal with feedback, has the advantage of being relatively simple and thus
is easy to evaluate. In particular, it is shown that the capacity region of the
symmetric erasure BC with an arbitrary input alphabet size is achievable with
the proposed scheme. For the fading Gaussian BC, we derive a symmetric
achievable rate as a function of the signal-to-noise ratio (SNR) and a small
set of parameters. Besides achieving the optimal degrees of freedom at high
SNR, the proposed scheme is shown, through numerical results, to outperform
existing schemes from the literature in the finite SNR regime.
Jarkko Kaleva, Antti Tölli, Markku Juntti, Randall Berry, Michael Honig
Subjects: Information Theory (cs.IT)
Downlink beamforming techniques with low signaling overhead are proposed for
joint processing coordinated (JP) multi-point transmission. The objective is to
maximize the weighted sum rate within joint transmission clusters. As the
considered weighted sum rate maximization is a non-convex problem, successive
convex approximation techniques, based on weighted mean-squared error
minimization, are applied to devise algorithms with tractable computational
complexity. Decentralized algorithms are proposed to enable JP even with
limited backhaul connectivity. These algorithms provide a variety of
alternatives in terms of signaling overhead, computational complexity and convergence
behavior. Time division duplexing is exploited to design transceiver training
techniques for two scenarios: stream specific estimation and direct estimation.
In the stream specific estimation, the base station and user equipment estimate
all of the stream specific precoded pilots individually and construct the
transmit/receive covariance matrices based on these pilot estimates. With the
direct estimation, only the intended transmission is separately estimated and
the covariance matrices constructed directly from the aggregate system-wide
pilots. The impact of feedback/backhaul signaling quantization is considered,
in order to further reduce the signaling overhead. User admission is
also considered for time-correlated channels. The enhanced transceiver
convergence rate enables periodic beamformer reinitialization, which greatly
improves the achieved system performance in dense networks.
Yeong Foong Choo, Brian L. Evans, Alan Gatherer
Comments: 6 pages, 6 figures, submitted to Asilomar Conference on Signals, Systems, and Computers 2017
Subjects: Information Theory (cs.IT)
We propose a new complex block floating-point format to reduce implementation
complexity. The new format achieves wordlength reduction by sharing an exponent
across the block of samples, and uses box encoding for the shared exponent to
reduce quantization error. Arithmetic operations are performed on blocks of
samples at a time, which can also reduce implementation complexity. For a case
study of a baseband quadrature amplitude modulation (QAM) transmitter and
receiver, we quantify the tradeoffs in signal quality vs. implementation
complexity using the new approach to represent IQ samples. Signal quality is
measured using error vector magnitude (EVM) in the receiver, and implementation
complexity is measured in terms of arithmetic complexity as well as memory
allocation and memory input/output rates. The primary contributions of this
paper are (1) a complex block floating-point format with box encoding of the
shared exponent to reduce quantization error, (2) arithmetic operations using
the new complex block floating-point format, and (3) a QAM transceiver case
study to quantify signal quality vs. implementation complexity tradeoffs using
the new format and arithmetic operations.
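A hedged sketch of a complex block floating-point encode/decode with one shared exponent per block of IQ samples (the paper's box encoding of the shared exponent is not reproduced here):

    import numpy as np

    def bfp_encode(iq, mant_bits=8):
        """Quantize a block of complex samples with one shared exponent."""
        peak = max(np.abs(iq.real).max(), np.abs(iq.imag).max())
        exp = int(np.ceil(np.log2(peak))) if peak > 0 else 0
        scale = 2.0 ** (mant_bits - 1 - exp)
        lo, hi = -2 ** (mant_bits - 1), 2 ** (mant_bits - 1) - 1
        re = np.clip(np.round(iq.real * scale), lo, hi).astype(int)
        im = np.clip(np.round(iq.imag * scale), lo, hi).astype(int)
        return re, im, exp

    def bfp_decode(re, im, exp, mant_bits=8):
        scale = 2.0 ** (mant_bits - 1 - exp)
        return (re + 1j * im) / scale

    rng = np.random.default_rng(0)
    iq = rng.normal(size=16) + 1j * rng.normal(size=16)
    re, im, exp = bfp_encode(iq)
    print(np.max(np.abs(bfp_decode(re, im, exp) - iq)))  # quantization error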
Andjela Draganic, Irena Orovic, Srdjan Stankovic
Comments: submitted to Facta Universitatis Scientific Journal, Series: Electronics and Energetics, March 2017
Subjects: Information Theory (cs.IT)
Compressive Sensing, as an emerging technique in signal processing, is
reviewed in this paper together with its common applications. As an alternative
to the traditional signal sampling, Compressive Sensing allows a new
acquisition strategy with a significantly reduced number of samples needed for
accurate signal reconstruction. The basic ideas and motivation behind this
approach are provided in the theoretical part of the paper. The commonly used
algorithms for missing data reconstruction are presented. The Compressive
Sensing applications have gained significant attention leading to an intensive
growth of signal processing possibilities. Hence, some of the existing
practical applications assuming different types of signals in real-world
scenarios are described and analyzed as well.
Nicolas Auguin, David Morales-Jimenez, Matthew McKay
Comments: 6 pages, 1 figure, 1 table
Subjects: Information Theory (cs.IT); Probability (math.PR)
This paper is concerned with the statistical properties of the Gram matrix
\(\mathbf{W}=\mathbf{H}\mathbf{H}^\dagger\), where \(\mathbf{H}\) is a \(2\times 2\)
complex central Gaussian matrix whose elements have arbitrary variances. With
such arbitrary variance profile, this random matrix model fundamentally departs
from classical Wishart models and presents a significant challenge as the
classical analytical toolbox no longer directly applies. We derive new exact
expressions for the distribution of \(\mathbf{W}\) and that of its eigenvalues by
means of an explicit parameterization of the group of unitary matrices. Our
results yield remarkably simple expressions, which are further leveraged to
study the outage data rate of a dual-antenna communication system under
different variance profiles.
Shudi Yang, Chuangqiang Hu
Comments: 12 pages
Subjects: Information Theory (cs.IT)
In this paper, by employing the results over Kummer extensions, we give an
arithmetic characterization of pure gaps at many totally ramified places over
the quotients of Hermitian curves, including the well-studied Hermitian curves
as special cases. The cardinality of these pure gaps is explicitly
investigated. In particular, the numbers of gaps and pure gaps at a pair of
distinct places are determined precisely, which can be regarded as an extension
of the previous work by Matthews (2001), which considered Hermitian curves.
Additionally, some concrete examples are provided to illustrate our results.
Samith Abeywickrama, Tharaka Samarasinghe, Chin Keong Ho
Comments: 12 pages, 13 figures, Submitted to IEEE Transactions on Signal Processing, 2016 (Revised: March 2017)
Subjects: Information Theory (cs.IT)
Multiple antenna techniques that allow energy beamforming have been looked
upon as a possible candidate for increasing the transfer efficiency between the
energy transmitter (ET) and the energy receiver (ER) in wireless power
transfer. This paper introduces a novel scheme that facilitates energy
beamforming by utilizing Received Signal Strength Indicator (RSSI) values to
estimate the channel. Firstly, in the training stage, the ET will transmit
using each beamforming vector in a codebook, which is pre-defined using a
Cramer-Rao lower bound analysis. The RSSI value corresponding to each beamforming
vector is fed back to the ET, and these values are used to estimate the channel
through a maximum likelihood analysis. The resulting estimates are
remarkably simple, require minimal processing, and can be easily implemented.
The paper also validates the analytical results numerically, as well as
experimentally, and it is shown that the proposed method achieves impressive
results.
Abhishek Aich, P. Palanisamy
Subjects: Information Theory (cs.IT)
Direction of Arrival (DOA) estimation of multiple narrow-band coherent or
partially coherent sources is a major challenge in array signal processing.
Though many subspace-based algorithms are available in the literature, none of
them tackles the problem of resolving coherent sources directly, i.e., without
modifying the sample data covariance matrix. Compressive Sensing (CS) based
sparse recovery algorithms are being applied as a novel technique in this area.
In this paper, we introduce Orthogonal Matching Pursuit (OMP) to the DOA
estimation problem. We demonstrate how a DOA estimation problem can be modelled
as a sparse recovery problem and then solved using OMP to obtain the set of
DOAs. Moreover, the algorithm uses only one snapshot to obtain the results.
The simulation results demonstrate the validity and advantages of using the OMP
algorithm over the existing subspace-based algorithms.
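As a rough illustration of the single-snapshot idea, the following sketch
applies OMP to a DOA grid (assumptions, not taken from the paper: a
half-wavelength uniform linear array, a 0.5-degree angle grid, and a known
source count):

    import numpy as np

    def steering(n_sensors, theta):
        # ULA response, half-wavelength element spacing
        k = np.arange(n_sensors)
        return np.exp(1j * np.pi * k * np.sin(theta)) / np.sqrt(n_sensors)

    def omp_doa(y, grid, n_sensors, n_sources):
        # Dictionary with one steering vector per candidate angle
        A = np.stack([steering(n_sensors, th) for th in grid], axis=1)
        residual, support = y.copy(), []
        for _ in range(n_sources):
            corr = np.abs(A.conj().T @ residual)
            corr[support] = 0.0                         # never re-pick an atom
            support.append(int(np.argmax(corr)))
            As = A[:, support]
            x, *_ = np.linalg.lstsq(As, y, rcond=None)  # LS fit on the support
            residual = y - As @ x
        return np.sort(grid[support])

    # One snapshot from two coherent sources at -10 and 20 degrees
    n = 16
    y = sum(steering(n, a) for a in np.deg2rad([-10.0, 20.0]))
    y += 0.05 * (np.random.randn(n) + 1j * np.random.randn(n))
    grid = np.deg2rad(np.linspace(-90, 90, 361))
    print(np.rad2deg(omp_doa(y, grid, n, 2)))

Since the greedy selection correlates atoms against the raw snapshot rather
than a sample covariance matrix, source coherence does not collapse the signal
subspace, which is the intuition behind applying sparse recovery here.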
Qimei Cui, Yu Gu, Wei Ni, Renping Liu
Subjects: Information Theory (cs.IT)
License-assisted access (LAA) is a promising technology to offload
dramatically increasing cellular traffic to unlicensed bands. Challenges arise
from the provision of quality-of-service (QoS) and the quantification of
capacity, due to the distributed and heterogeneous nature of LAA and legacy
systems (such as WiFi) coexisting in the bands. In this paper, we develop new
theories of the effective capacity to measure LAA under statistical QoS
requirements. A new four-state semi-Markovian model is developed to capture
transmission collisions, random backoffs, and lossy wireless channels of LAA in
distributed heterogeneous network environments. A closed-form expression for
the effective capacity is derived to comprehensively analyze LAA. The
four-state model is further abstracted to an insightful two-state equivalent
which reveals the concavity of the effective capacity in terms of transmit
rate. Validated by simulations, the concavity is exploited to maximize the
effective capacity and effective energy efficiency of LAA, and provide
significant improvements of 62.7% and 171.4%, respectively, over existing
approaches. Our results are of practical value to holistic designs and
deployments of LAA systems.
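The paper's closed-form expression is not reproduced here, but the two-state
abstraction can be sketched from the standard definition of effective capacity,
EC = -(1/theta) ln E[exp(-theta S)] for i.i.d. per-block service S. A minimal
sketch, assuming a hypothetical success probability p(r) that decays with the
transmit rate r (a stand-in for the paper's collision, backoff, and channel
analysis):

    import numpy as np

    def effective_capacity(r, theta, p_success):
        # On/off service: r bits delivered with probability p_success(r), else 0
        p = p_success(r)
        return -np.log(p * np.exp(-theta * r) + (1.0 - p)) / theta

    p_success = lambda r: np.exp(-0.1 * r)   # hypothetical loss model
    theta = 0.5                              # QoS exponent
    rates = np.linspace(0.01, 30.0, 3000)
    ec = effective_capacity(rates, theta, p_success)
    print(f"EC-maximizing rate ~ {rates[np.argmax(ec)]:.2f}, EC ~ {ec.max():.3f}")

Under such a model the effective capacity vanishes at both very low and very
high transmit rates and peaks in between, which is the concavity the paper
exploits for optimization.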
George C. Alexandropoulos, Melissa Duarte
Comments: 8 pages, 4 figures, IEEE ICC 2017
Subjects: Information Theory (cs.IT)
Incorporating full duplex operation in Multiple Input Multiple Output (MIMO)
systems offers the potential of boosting throughput performance. However, the
hardware complexity of the analog self-interference canceller scales with the
number of transmit and receive antennas, so exploiting the benefits of analog
cancellation becomes impractical for full duplex MIMO transceivers. In this
paper, we present a novel architecture for the analog canceller comprising a
reduced number of taps (a tap refers to a line with a fixed delay and a variable
phase shifter and attenuator) and simple multiplexers for efficient signal routing
among the transmit and receive radio frequency chains. In contrast to the
available analog cancellation architectures, the values for each tap and the
configuration of the multiplexers are jointly designed with the digital
beamforming filters according to certain performance objectives. Focusing on a
narrowband flat fading channel model as an example, we present a general
optimization framework for the joint design of analog cancellation and digital
beamforming. We also detail a particular optimization objective together with
its derived solution for the latter architectural components. Representative
computer simulation results demonstrate the superiority of the proposed low
complexity full duplex MIMO system over recently available ones.
Junan Zhu
Comments: PhD dissertation
Subjects: Information Theory (cs.IT)
Many real-world problems in machine learning, signal processing, and
communications assume that an unknown vector x is measured by a matrix A,
resulting in a vector y=Ax+z, where z denotes the noise; we call this a single
measurement vector (SMV) problem. Sometimes, multiple dependent vectors
x^{(j)}, j ∈ {1,…,J}, are measured at the same time, forming the so-called
multi-measurement vector (MMV) problem. Both SMV and MMV are linear models
(LM’s), and the process of estimating the underlying vector(s) x from an LM
given the matrices, noisy measurements, and knowledge of the noise statistics,
is called a linear inverse problem. In some scenarios, the matrix A is stored
in a single processor and this processor also records its measurements y; this
is called centralized LM. In other scenarios, multiple sites are measuring the
same underlying unknown vector x, where each site only possesses part of the
matrix A; we call this multi-processor LM. Recently, due to an ever-increasing
amount of data and ever-growing dimensions in LM’s, it has become more
important to study large-scale linear inverse problems. In this dissertation,
we take advantage of tools in statistical physics and information theory to
advance the understanding of large-scale linear inverse problems. The intuition
of the application of statistical physics to our problem is that statistical
physics deals with large-scale problems, and we can make an analogy between an
LM and a thermodynamic system. In terms of information theory, although it was
originally developed to characterize the theoretic limits of digital
communication systems, information theory was later found to be rather useful
in analyzing and understanding other inference problems. (The full abstract
exceeds the space limit; please refer to the PDF.)
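For readers unfamiliar with the SMV setup, a minimal self-contained example
(generic, not one of the dissertation's algorithms) generates y = Ax + z for a
sparse x and recovers it with plain ISTA:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 80, 200, 8                 # measurements, dimension, sparsity
    A = rng.standard_normal((n, d)) / np.sqrt(n)
    x = np.zeros(d)
    x[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
    y = A @ x + 0.01 * rng.standard_normal(n)

    # ISTA for min_x 0.5*||y - Ax||^2 + lam*||x||_1
    lam, step = 0.01, 1.0 / np.linalg.norm(A, 2) ** 2
    xh = np.zeros(d)
    for _ in range(500):
        g = xh + step * (A.T @ (y - A @ xh))                       # gradient step
        xh = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
    print("relative recovery error:", np.linalg.norm(xh - x) / np.linalg.norm(x))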
Mahesh Babu Vaddi, B. Sundar Rajan
Comments: Considerable overlap with that of arXiv:1705.03192v1 [cs.IT] 9 May 2017 (especially the introductory parts). Figures: 9; Tables: 2
Subjects: Information Theory (cs.IT)
A single unicast index coding problem (SUICP) with symmetric neighboring
interference (SNI) has an equal number of (K) messages and (K) receivers, the
(k)th receiver (R_{k}) wanting the (k)th message (x_{k}) and having the
side-information (\mathcal{K}_{k} = (\mathcal{I}_{k} \cup x_{k})^c,) where
(\mathcal{I}_k = \{x_{k-U}, \dots, x_{k-2}, x_{k-1}\} \cup \{x_{k+1},
x_{k+2}, \dots, x_{k+D}\}) is the interference, consisting of (D) messages after
and (U) messages before its desired message. Maleki, Cadambe and Jafar obtained
the capacity of this symmetric neighboring interference single unicast index
coding problem (SNI-SUICP) with (K) tending to infinity, and Blasiak, Kleinberg
and Lubetzky for the special case of (D=U=1) with (K) being finite. In this work,
for any finite (K) and arbitrary (D) we obtain the capacity for the case
(U = \gcd(K, D+1) - 1.) Our proof is constructive, i.e., we give an explicit
construction of a linear index code achieving the capacity.
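As a concrete instance of this side-information structure (our illustration,
not taken from the paper): take (K = 6) and (D = 2), so the covered case is
(U = \gcd(6, 3) - 1 = 2). Receiver (R_1) wants (x_1), its interference is
(\mathcal{I}_1 = \{x_5, x_6\} \cup \{x_2, x_3\}) (indices modulo 6), and its
side-information is the single remaining message, (\mathcal{K}_1 = \{x_4\}).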
Elena Boshkovska, Xiaoming Chen, Linglong Dai, Derrick Wing Kwan Ng, Robert Schober
Comments: Invited paper, IEEE VTC 2017, Fall, Toronto, Canada
Subjects: Information Theory (cs.IT)
We study the beamforming design for multiuser systems with simultaneous
wireless information and power transfer (SWIPT). Employing a practical
non-linear energy harvesting (EH) model, the design is formulated as a
non-convex optimization problem for the maximization of the minimum harvested
power across several energy harvesting receivers. The proposed problem
formulation takes into account imperfect channel state information (CSI) and a
minimum required signal-to-interference-plus-noise ratio (SINR). The globally
optimal solution of the design problem is obtained via the semidefinite
programming (SDP) relaxation approach. Interestingly, we can show that at most
one dedicated energy beam is needed to achieve optimality. Numerical results
demonstrate that with the proposed design a significant performance gain and
improved fairness can be provided to the users compared to two baseline
schemes.
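The skeleton of the relaxed problem fits in a few lines of cvxpy. The sketch
below simplifies heavily (a linear EH model, perfect CSI, and no co-existing
information receivers, whereas the paper handles a non-linear EH model,
imperfect CSI, and SINR constraints), so it only shows the max-min SDR
structure:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    N, J, P = 4, 3, 1.0                  # antennas, EH receivers, power budget
    H = (rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))) / np.sqrt(2)
    G = [np.outer(H[j], H[j].conj()) for j in range(J)]  # G_j = h_j h_j^H

    W = cp.Variable((N, N), hermitian=True)  # relaxed transmit covariance
    t = cp.Variable()                        # minimum harvested power
    cons = [W >> 0, cp.real(cp.trace(W)) <= P]
    cons += [cp.real(cp.trace(G[j] @ W)) >= t for j in range(J)]
    cp.Problem(cp.Maximize(t), cons).solve()
    print("max-min harvested power:", t.value)

A rank-one optimal W corresponds to a single beamforming vector, and the
abstract's observation that at most one dedicated energy beam suffices is a
statement about the rank structure of such solutions.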
Nikhilesh Rajaraman, Andrew Thangaraj, Ananda Theertha Suresh
Comments: IEEE International Symposium on Information Theory 2017, Aachen, Germany
Subjects: Information Theory (cs.IT)
The problem of estimating the missing mass or total probability of unseen
elements in a sequence of (n) random samples is considered under the squared
error loss function. The worst-case risk of the popular Good-Turing estimator
is shown to be between (0.6080/n) and (0.6179/n). The minimax risk is shown to
be lower bounded by (0.25/n). This appears to be the first such published
result on minimax risk for estimation of missing mass, which has several
practical and theoretical applications.
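The estimator under study is simple to state: the Good-Turing estimate of the
missing mass is the fraction of the n samples whose symbol occurs exactly once.
A minimal sketch:

    from collections import Counter

    def good_turing_missing_mass(samples):
        # Estimate of the total probability of unseen symbols:
        # (number of symbols observed exactly once) / (sample size)
        counts = Counter(samples)
        n1 = sum(1 for c in counts.values() if c == 1)
        return n1 / len(samples)

    print(good_turing_missing_mass("abracadabra"))  # 2 singletons / 11 samples

The risk bounds quoted above say that, in the worst case over distributions,
the expected squared error of this estimator decays as roughly 0.61/n, while no
estimator can do better than 0.25/n.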
Sourbh Bhadane, Andrew Thangaraj
Comments: expanded version of a paper that appeared in the National Conference on Communications 2017, IIT Madras, Chennai, India
Subjects: Information Theory (cs.IT)
A code is said to be a Locally Recoverable Code (LRC) with availability if
every coordinate can be recovered from multiple disjoint sets of other
coordinates called recovering sets. The vector of sizes of recovering sets of a
coordinate is called its recovery profile. In this work, we consider LRCs with
availability under two different settings: (1) irregular recovery: non-constant
recovery profile that remains fixed for all coordinates, (2) unequal locality:
regular recovery profile that can vary with coordinates. For each setting, we
derive bounds for the minimum distance that generalize previously known bounds
to the cases of irregular or varying recovery profiles. For the case of regular
and fixed recovery profile, we show that a specific Tamo-Barg
polynomial-evaluation construction is optimal for all-symbol locality, and we
provide parity-check matrix constructions for information locality with
availability.
Yaning Zou, Mario Castañeda, Tommy Svensson, Gerhard Fettweis
Subjects: Information Theory (cs.IT)
In this paper, we propose a sequential hybrid beamforming design for
multi-link transmission over mmWave frequency bands. As a starting point, a
baseline data communication link is established via traditional analog
beamforming at both the BS and UE. If an extra RF chain is available at the UE,
it can continue to probe the propagation environment at the same frequencies.
In case the environment is favorable and system resources allow, a secondary
data communication link is established to enable multi-stream transmission. In
principle, the secondary link could be served by the same BS and/or one or
several other BS(s). To initialize the secondary data communication link, a
parallel beam search scheme is proposed, which helps the UE/BS to find a
suitable beam pair with given optimization criteria without interrupting the
baseline data communication. By applying the proposed two-step approach, hybrid
beamforming becomes an add-on feature that can be switched on over an analog
beamforming enabled system, without interrupting its operation, whenever the
system requires. Meanwhile, the information obtained by the proposed parallel
beam search scheme can also be used to select a back-up beam pair if signal
blockage occurs on the baseline data communication link.
Yaning Zou, Per Zetterberg, Ulf Gustavsson, Tommy Svensson, Ali Zaidi, Tobias Kadur, Wolfgang Rave, Gerhard Fettweis
Subjects: Information Theory (cs.IT)
In this paper, we study the joint impact of three major RF impairments,
namely, oscillator phase noise, power amplifier non-linearity and I/Q imbalance,
on the performance of a mm-wave communication link based on OFDM modulation.
General impairment models are first derived for describing the joint effects
in each TX, each RX, as well as a mm-wave communication link. Based on the
obtained signal models and the initial air interface design from the mmMAGIC
project, we numerically evaluate the impact of RF impairments on channel
estimation in terms of channel-to-noise ratio (CNR) and also channel
fluctuation due to common phase error (CPE) caused by phase noise within the
channel coherence time. Then the impact on the link performance in terms of
maximum sum rate is evaluated using extensive computer simulations. The
simulation results show that the considered air interface design is generally
robust to the presence of RF impairments. With regard to the use of high order
modulation alphabets and the implementation of low-power and low-cost RF
transceivers in mm-wave communication, special attention needs to be paid to
phase noise, where the inter-carrier-interference (ICI) can become a major
limiting factor.
Yaning Zou, Wolfgang Rave, Gerhard Fettweis
Subjects: Information Theory (cs.IT)
In this paper, we propose an analog beamsteering approach for enabling
flexible hybrid beamforming design that can achieve performance close to
singular value decomposition (SVD) based digital beamforming in the single user
case. As a starting point, assuming the use of a codebook with infinite
precision, we apply transposes of antenna array response matrices as analog
beamformers at the BS and the UE sides. The resulting effective channel
matrix, including both the propagation channel and the analog beamformers, is
shown to provide a maximal achievable rate very close to SVD based digital
beamforming at low to medium SNR. Next, the impact of a finite size codebook is
analyzed for practical analog beamformer design. A closed-form derivation is
obtained for mapping the rate loss to the number of antennas, the codebook
sizes and the received SNR. The analysis shows that in order to achieve
performance comparable to analog beamforming with infinite precision, the
codebook size should be at least larger than the number of implemented
antennas. Therefore, the proposed analog beamforming approach can be designed
independently, without taking digital beamforming into account. Such an
approach can lead to the development of flexible hybrid beamforming
architectures that can adapt to a changing propagation environment via agile
analog beam selection and maximize spatial multiplexing gain via proper digital
beamformer design.
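A small numerical experiment (our own construction, using a DFT-style codebook
on a half-wavelength uniform linear array rather than the paper's closed-form
rate-loss mapping) illustrates the qualitative claim:

    import numpy as np

    def steer(n, theta):
        # ULA array response, half-wavelength spacing, unit norm
        return np.exp(1j * np.pi * np.arange(n) * np.sin(theta)) / np.sqrt(n)

    def codebook_gain(n_ant, n_beams, trials=2000):
        # Average best-codeword beamforming gain relative to the ideal
        # matched beamformer (normalized so the ideal value is 1.0)
        rng = np.random.default_rng(2)
        dirs = np.arcsin(np.linspace(-1.0, 1.0, n_beams, endpoint=False))
        C = np.stack([steer(n_ant, d) for d in dirs], axis=1)
        g = 0.0
        for _ in range(trials):
            h = steer(n_ant, rng.uniform(-np.pi / 2, np.pi / 2))
            g += np.max(np.abs(C.conj().T @ h) ** 2)
        return g / trials

    for c in (4, 8, 16, 32, 64):
        print(f"16 antennas, codebook size {c}: relative gain {codebook_gain(16, c):.3f}")

The relative gain should approach the infinite-precision value of 1 only once
the codebook outsizes the 16-element array, mirroring the conclusion above.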
Chuili Kong, Caijun Zhong, Shi Jin, Sheng Yang, Hai Lin, Zhaoyang Zhang
Comments: 14 pages, 10 figures, Accepted to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
This paper considers a multipair amplify-and-forward massive MIMO relaying
system with low-resolution ADCs at both the relay and destinations. The channel
state information (CSI) at the relay is obtained via pilot training, which is
then utilized to perform simple maximum-ratio combining/maximum-ratio
transmission processing by the relay. Also, it is assumed that the destinations
use statistical CSI to decode the transmitted signals. Exact and approximated
closed-form expressions for the achievable sum rate are presented, which enable
the efficient evaluation of the impact of key system parameters on the system
performance. In addition, an optimal relay power allocation scheme is studied,
and the power scaling law is characterized. It is found that, with only low-resolution
ADCs at the relay, increasing the number of relay antennas is an effective
method to compensate for the rate loss caused by coarse quantization. However,
it becomes ineffective to handle the detrimental effect of low-resolution ADCs
at the destination. Moreover, it is shown that deploying massive relay antenna
arrays can still bring significant power savings, i.e., the transmit power of
each source can be scaled down proportionally to (1/M) while maintaining a
constant rate, where (M) is the number of relay antennas.
Reinhard Heckel, Ilan Shomorony, Kannan Ramchandran, David N. C. Tse
Comments: To appear in Proc. of IEEE International Symposium on Information Theory (ISIT). Slightly extended version containing the proofs
Subjects: Information Theory (cs.IT)
Due to its longevity and enormous information density, DNA is an attractive
medium for archival storage. In this work, we study the fundamental limits and
tradeoffs of DNA-based storage systems under a simple model, motivated by
current technological constraints on DNA synthesis and sequencing. Our model
captures two key distinctive aspects of DNA storage systems: (1) the data is
written onto many short DNA molecules that are stored in an unordered way and
(2) the data is read by randomly sampling from this DNA pool. Under this model,
we characterize the storage capacity, and show that a simple index-based coding
scheme is optimal.
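The index-based scheme can be mimicked in a few lines (a toy sketch; real
systems add error-correcting codes and map bits onto the {A,C,G,T} alphabet):

    import random

    def write_pool(data, chunk, idx_width=4):
        # Prefix each short "molecule" with its index so order can be discarded
        chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        return [f"{i:0{idx_width}d}{c}" for i, c in enumerate(chunks)]

    def read_pool(pool, idx_width=4):
        # Random sampling with replacement models the sequencing step; with
        # enough coverage every index appears with high probability
        reads = [random.choice(pool) for _ in range(10 * len(pool))]
        recovered = {r[:idx_width]: r[idx_width:] for r in reads}
        return "".join(recovered[k] for k in sorted(recovered))

    pool = write_pool("the quick brown fox jumps over the lazy dog", chunk=8)
    random.shuffle(pool)              # the stored pool is unordered
    print(read_pool(pool))

Prefixing each molecule with its index is what makes the unordered pool
decodable, and the result above says that, under the stated model, this simple
strategy already achieves the storage capacity.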
Mingxiong Zhao, Jong Yeol Ryu, Jemin Lee, Tony Q.S. Quek, Suili Feng
Comments: 15 pages, 9 figures, to appear in IEEE Transactions on Wireless Communications
Subjects: Information Theory (cs.IT)
For a user cooperation system with multiple antennas, we consider a trust
degree based cooperation technique to explore the influence of the
trustworthiness between users on the communication systems. For the system with
two communication pairs, when one communication pair achieves its quality of
service (QoS) requirement, they can help the transmission of the other
communication pair according to the trust degree, which quantifies the
trustworthiness between users in the cooperation. For a given trust degree, we
investigate the user cooperation strategies, which include the power allocation
and precoder design for various antenna configurations. For SISO and MISO
cases, we provide the optimal power allocation and beamformer design that
maximize the expected achievable rates while guaranteeing the QoS requirement.
For a SIMO case, we resort to semidefinite relaxation (SDR) technique and block
coordinate update (BCU) method to solve the corresponding problem, and
guarantee the rank-one solutions at each step. For a MIMO case, as MIMO is the
generalization of MISO and SIMO, the similarities among their problem
structures inspire us to combine the methods from MISO and SIMO to efficiently
tackle the MIMO case. Simulation results show that the trust degree information
has a great effect on the performance of the user cooperation in terms of the
expected achievable rate, and that the proposed user cooperation strategies
achieve high achievable rates for a given trust degree.
Nicolás Rubido, Celso Grebogi, Murilo S. Baptista
Comments: 5 pages, 3 figures, proceedings paper
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Information Theory (cs.IT)
Information Theory concepts and methodologies form the background of how
communication systems are studied and understood. They are mainly focused on
the source-channel-receiver problem and on the asymptotic limits of accuracy
and communication rates, which are the classical problems studied by Shannon.
However, the impact of Information Theory on networks (acting as the channel)
is just starting to be explored. Here, we present an approach to understand how
information flows in any connected complex network. Our approach is based on
defining linear conservative flows that travel through the network from source
to receiver. This framework allows us to give an analytical description of the
problem and also to link topological invariants of the network, such as the
node degree, with the information flow. In particular, our approach is able to
deal with information transmission in modular networks (networks containing
community structures) or multiplex networks (networks with multiple layers),
which are nowadays of paramount importance.
Jarek Duda
Comments: 3 pages, 4 figures
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)
Pyramid Vector Quantizer (PVQ) is a promising technique, especially for
multimedia data compression, already used in the Opus audio codec and considered
for the AV1 video codec. It quantizes vectors from the Euclidean unit sphere by
first projecting them to the (L^1) norm unit sphere, then quantizing and
encoding there. This paper shows that the standard radial projection used is
suboptimal and proposes to tune its deformations by using a parameterized power
projection (x \to x^p / \|x^p\|) instead, where the optimized power (p) is
applied coordinate-wise, usually obtaining a (\geq 0.5) dB improvement compared
to radial projection.
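A toy encode/decode pair makes the pipeline concrete (our sketch: the greedy
pulse fix-up, the pulse budget K, and the sample value of p are illustrative;
the paper tunes p and reports an average gain):

    import numpy as np

    def pvq_encode(x, K, p=1.0):
        # Power map (p = 1 is the standard radial projection), then
        # integer-pulse quantization on the L1 sphere with exactly K pulses
        a = np.abs(x) ** p
        a = a / a.sum()                      # magnitudes on the L1 simplex
        q = np.round(K * a).astype(int)
        while q.sum() < K:                   # greedy fix-up of the pulse budget
            q[np.argmax(K * a - q)] += 1
        while q.sum() > K:
            q[np.argmax(q - K * a)] -= 1
        return np.sign(x).astype(int) * q

    def pvq_decode(q, p=1.0):
        m = np.abs(q) / np.abs(q).sum()      # back to the L1 simplex
        xh = np.sign(q) * m ** (1.0 / p)     # invert the power map
        return xh / np.linalg.norm(xh)       # back onto the Euclidean sphere

    rng = np.random.default_rng(3)
    x = rng.standard_normal(16)
    x /= np.linalg.norm(x)
    for p in (1.0, 0.75):
        xh = pvq_decode(pvq_encode(x, K=32, p=p), p=p)
        print(f"p={p}: SNR = {10 * np.log10(1.0 / np.sum((x - xh) ** 2)):.2f} dB")

Whether a particular p beats p = 1 on a single vector varies; the quoted 0.5 dB
figure is an average effect of optimizing the projection.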
Qing Xue, Xuming Fang, Cheng-Xiang Wang
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
For future networks (i.e., the fifth generation (5G) wireless networks and
beyond), millimeter-wave (mmWave) communication with large available unlicensed
spectrum is a promising technology that enables gigabit multimedia
applications. Thanks to the short wavelength of mmWave radio, massive antenna
arrays can be packed into the limited dimensions of mmWave transceivers.
Therefore, with directional beamforming (BF), both mmWave transmitters (MTXs)
and mmWave receivers (MRXs) are capable of supporting multiple beams in 5G
networks. However, for the transmission between an MTX and an MRX, most works
have considered only a single beam, which does not exploit the full potential
of mmWave. Furthermore, a single-beam link can easily be blocked. In this
context, we propose a single-user multi-beam concurrent transmission scheme for
future mmWave networks with multiple reflected paths. Based on spatial spectrum
reuse, the scheme can be described as a multiple-input multiple-output (MIMO)
technique in beamspace (i.e., in the beam-number domain). Moreover, this study
investigates the challenges and potential solutions for implementing this
scheme, including multi-beam selection, cooperative beam tracking, multi-beam
power allocation and synchronization. The theoretical and numerical results
show that the proposed beamspace SU-MIMO can significantly improve the
achievable rate of the transmission between an MTX and an MRX while maintaining
connectivity.
Mário S. Alvim, Konstantinos Chatzikokolakis, Yusuke Kawamoto, Catuscia Palamidessi
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
We formalize the interplay between defender and adversary in a game-theoretic
framework adapted to the specific issues of quantitative information flow.
Assuming that both defender and adversary may be active and influence the
system during the attack, we define a general framework of information leakage
games in which the payoff function of the game is information leakage. We
provide methods for finding the solution and the equilibria of various kinds of
leakage games, varying according to whether players act simultaneously or
sequentially, and to whether the defender can conceal from the adversary the
channel used. We demonstrate that in some cases it is in the best interest of
not only the defender, but also of the adversary, to employ randomization in
the choice of actions. We illustrate the power of our framework with a detailed
case study of the Crowds protocol running on a MANET (Mobile Ad-hoc NETwork).
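The value of randomization already appears in the simplest zero-sum abstraction
of such a game, where an optimal mixed defender strategy comes out of a small
linear program (a sketch with a hypothetical 2x2 leakage matrix; the paper's
framework is more general, covering sequential play and hidden channels):

    import numpy as np
    from scipy.optimize import linprog

    def defender_optimal(L):
        # Defender mixes over rows of L to minimize the worst-case leakage
        # max_j sum_i p_i * L[i, j]. LP variables: [p_1, ..., p_m, v].
        m, k = L.shape
        c = np.zeros(m + 1)
        c[-1] = 1.0                                    # minimize v
        A_ub = np.hstack([L.T, -np.ones((k, 1))])      # (L^T p)_j <= v
        A_eq = np.hstack([np.ones((1, m)), [[0.0]]])   # probabilities sum to 1
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k),
                      A_eq=A_eq, b_eq=[1.0], bounds=bounds)
        return res.x[:m], res.x[-1]

    # Hypothetical leakage matrix: rows = defender actions, cols = adversary actions
    L = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    p, v = defender_optimal(L)
    print("defender mix:", p, "guaranteed leakage:", v)

Here any deterministic defender choice leaks 1 against the worst adversary
response, while the uniform mix guarantees 0.5, echoing the observation that
randomized strategies can be strictly better than deterministic ones.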
V.I. Yukalov, E.P. Yukalova
Comments: Latex file, 10 pages, 2 figures
Journal-ref: J. Phys. Conf. Ser. 826 (2017) 012021
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Entanglement production, generated by an evolution operator, is considered.
The measure of entanglement production, introduced earlier by one of the
authors, is employed. As an illustration, a two-qubit register is studied and
the corresponding measure of evolutional entanglement production is calculated.
Such two-qubit registers can be realized by atomic systems in a double well or
by trapped atoms with two coherent modes.