Timothy J. Draelos, Nadine E. Miner, Christopher C. Lamb, Craig M. Vineyard, Kristofor D. Carlson, Conrad D. James, James B. Aimone
Comments: Submitted to IJCNN 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Neural machine learning methods, such as deep neural networks (DNN), have
achieved remarkable success in a number of complex data processing tasks. These
methods have arguably had their strongest impact on tasks such as image and
audio processing – data processing domains in which humans have long held clear
advantages over conventional algorithms. In contrast to biological neural
systems, which are capable of learning continuously, deep artificial networks
have a limited ability for incorporating new information in an already trained
network. As a result, methods for continuous learning are potentially highly
impactful in enabling the application of deep networks to dynamic data sets.
Here, inspired by the process of adult neurogenesis in the hippocampus, we
explore the potential for adding new neurons to deep layers of artificial
neural networks in order to facilitate their acquisition of novel information
while preserving previously trained data representations. Our results on the
MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes
lower and upper case letters and digits, demonstrate that neurogenesis is well
suited for addressing the stability-plasticity dilemma that has long challenged
adaptive machine learning algorithms.
Yuzhen Lu
Comments: 5 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
The standard LSTM, although it succeeds in the modeling long-range
dependences, suffers from a highly complex structure that can be simplified
through modifications to its gate units. This paper was to perform an empirical
comparison between the standard LSTM and three new simplified variants that
were obtained by eliminating input signal, bias and hidden unit signal from
individual gates, on the tasks of modeling two sequence datasets. The
experiments show that the three variants, with reduced parameters, can achieve
comparable performance with the standard LSTM. Due attention should be paid to
turning the learning rate to achieve high accuracies
Andrzej Jaszkiewicz
Subjects: Neural and Evolutionary Computing (cs.NE)
In this paper, we present an improved version of recently proposed Quick
Hypervolume algorithm for calculating exact hypervolume of the space dominated
by a set of d-dimensional points. This value is often used as a quality
indicator in multiobjective evolutionary algorithms and the efficiency of
calculating this indicator is of crucial importance especially in the case of
large set or many dimensional spaces. We use a similar divide and conquer
scheme as in the original Quick Hypervolume algorithm, we modify, however, the
way the problem is split into smaller sub-problems. Through both theoretical
analysis and computational study we show that our approach improves
computational complexity of the algorithm and running time of its
implementation.
Yuansi Chen, Cengiz Pehlevan, Dmitri B. Chklovskii
Comments: 2016 Asilomar Conference on Signals, Systems and Computers
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
Recently, a novel family of biologically plausible online algorithms for
reducing the dimensionality of streaming data has been derived from the
similarity matching principle. In these algorithms, the number of output
dimensions can be determined adaptively by thresholding the singular values of
the input data matrix. However, setting such threshold requires knowing the
magnitude of the desired singular values in advance. Here we propose online
algorithms where the threshold is self-calibrating based on the singular values
computed from the existing observations. To derive these algorithms from the
similarity matching cost function we propose novel regularizers. As before,
these online algorithms can be implemented by Hebbian/anti-Hebbian neural
networks in which the learning rule depends on the chosen regularizer. We
demonstrate both mathematically and via simulation the effectiveness of these
online algorithms in various settings.
Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We present a Deep Convolutional Neural Network architecture which serves as a
generic image-to-image regressor that can be trained end-to-end without any
further machinery. Our proposed architecture: the Recursively Branched
Deconvolutional Network (RBDN) develops a cheap multi-context image
representation very early on using an efficient recursive branching scheme with
extensive parameter sharing and learnable upsampling. This multi-context
representation is subjected to a highly non-linear locality preserving
transformation by the remainder of our network comprising of a series of
convolutions/deconvolutions without any spatial downsampling. The RBDN
architecture is fully convolutional and can handle variable sized images during
inference. We provide qualitative/quantitative results on (3) diverse tasks:
relighting, denoising and colorization and show that our proposed RBDN
architecture obtains comparable results to the state-of-the-art on each of
these tasks when used off-the-shelf without any post processing or
task-specific architectural modifications.
Thomas Mesnard, Wulfram Gerstner, Johanni Brea
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In machine learning, error back-propagation in multi-layer neural networks
(deep learning) has been impressively successful in supervised and
reinforcement learning tasks. As a model for learning in the brain, however,
deep learning has long been regarded as implausible, since it relies in its
basic form on a non-local plasticity rule. To overcome this problem,
energy-based models with local contrastive Hebbian learning were proposed and
tested on a classification task with networks of rate neurons. We extended this
work by implementing and testing such a model with networks of leaky
integrate-and-fire neurons. Preliminary results indicate that it is possible to
learn a non-linear regression task with hidden layers, spiking neurons and a
local synaptic plasticity rule.
Xiaofang Wang, Yi Shi, Kris M. Kitani
Comments: Appear in ACCV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Hashing is one of the most popular and powerful approximate nearest neighbor
search techniques for large-scale image retrieval. Most traditional hashing
methods first represent images as off-the-shelf visual features and then
produce hashing codes in a separate stage. However, off-the-shelf visual
features may not be optimally compatible with the hash code learning procedure,
which may result in sub-optimal hash codes. Recently, deep hashing methods have
been proposed to simultaneously learn image features and hash codes using deep
neural networks and have shown superior performance over traditional hashing
methods. Most deep hashing methods are given supervised information in the form
of pairwise labels or triplet labels. The current state-of-the-art deep hashing
method DPSH~cite{li2015feature}, which is based on pairwise labels, performs
image feature learning and hash code learning simultaneously by maximizing the
likelihood of pairwise similarities. Inspired by DPSH~cite{li2015feature}, we
propose a triplet label based deep hashing method which aims to maximize the
likelihood of the given triplet labels. Experimental results show that our
method outperforms all the baselines on CIFAR-10 and NUS-WIDE datasets,
including the state-of-the-art method DPSH~cite{li2015feature} and all the
previous triplet label based deep hashing methods.
Chen-Hsuan Lin, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we establish a theoretical connection between the classical
Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer
Networks (STNs). STNs are of interest to the vision and learning communities
due to their natural ability to combine alignment and classification within the
same theoretical framework. Inspired by the Inverse Compositional (IC) variant
of the LK algorithm, we present Inverse Compositional Spatial Transformer
Networks (IC-STNs). We demonstrate that IC-STNs can achieve better performance
than conventional STNs with less model capacity; in particular, we show
superior performance in pure image alignment tasks as well as joint
alignment/classification problems on real-world problems.
Alexander Krull, Eric Brachmann, Sebastian Nowozin, Frank Michel, Jamie Shotton, Carsten Rother
Subjects: Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art computer vision algorithms often achieve efficiency by
making discrete choices about which hypotheses to explore next. This allows
allocation of computational resources to promising candidates, however, such
decisions are non-differentiable. As a result, these algorithms are hard to
train in an end-to-end fashion. In this work we propose to learn an efficient
algorithm for the task of 6D object pose estimation. Our system optimizes the
parameters of an existing state-of-the art pose estimation system using
reinforcement learning, where the pose estimation system now becomes the
stochastic policy, parametrized by a CNN. Additionally, we present an efficient
training algorithm that dramatically reduces computation time. We show
empirically that our learned pose estimation procedure makes better use of
limited resources and improves upon the state-of-the-art on challenging
datasets. Our approach enables differentiable end-to-end training of complex
algorithmic pipelines and learns to make optimal use of a given computational
budget.
Nima Sedaghat
Comments: 8 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
CNN-based optical flow estimators have attracted attentions recently, mainly
due to their impressive speed. As successful as they’ve been on synthetic
datasets, they are still far behind the classical methods in real-world
scenarios, mainly due to lack of flow ground-truth. In the current work, we
seek to boost CNN-based flow estimation in real scenes with the help of the
freely available self-supervised task of next-frame prediction. To this end we
train the network in a hybrid way, providing it with a mixture of synthetic and
real videos. With the help of a novel time-variant multi-tasking architecture,
the network decides which of the tasks to learn at each point of time,
depending on the availability of ground-truth. Our proposed method improves
optical flow estimation in real scenes dramatically. We also experiment with
prediction of “next-flow” instead of estimation of the current flow, which is
intuitively more related to the task of next-frame prediction and improve the
results even more. We report and evaluate results both qualitatively and
quantitatively. The latter is done by training a single-stream action
classifier on the estimated flow fields on UCF101 & HMDB51 and demonstrate high
improvements of accuracy over the baseline: 10.2% and 9.6% respectively. Also
as a side product of this work, we report significant improvements over
state-of-the-art results in the task of next-frame prediction.
Holger Caesar, Jasper Uijlings, Vittorio Ferrari
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic classes can be either things (objects with a well-defined shape,
e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky).
While lots of classification and detection works focus on thing classes, less
attention has been given to stuff classes. Nonetheless, stuff classes are
important as they allow to explain important aspects of an image, including (1)
scene type; (2) which thing classes are likely to be present and their location
(determined through contextual reasoning); (3) physical attributes, material
types and geometric properties of the scene. To understand stuff and things in
context we annotate 10,000 images of the COCO dataset with a broad range of
stuff classes, using a specialized stuff annotation protocol allowing us to
efficiently label each pixel. On this dataset, we analyze several aspects: (a)
the importance of stuff and thing classes in terms of their surface cover and
how frequently they are mentioned in image captions; (b) the importance of
several visual criteria to discriminate stuff and thing classes; (c) we study
the spatial relations between stuff and things, highlighting the rich
contextual relations that make our dataset unique. Furthermore, we show
experimentally how modern semantic segmentation methods perform on stuff and
thing classes and answer the question whether stuff is easier to segment than
things. We release our new dataset and the trained models online, hopefully
promoting further research on stuff and stuff-thing contextual relations.
Oscar A. C. Linares, Glenda Michele Botelho, Francisco Aparecido Rodrigues, João Batista Neto
Comments: 20 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Image segmentation has many applications which range from machine learning to
medical diagnosis. In this paper, we propose a framework for the segmentation
of images based on super-pixels and algorithms for community identification in
graphs. The super-pixel pre-segmentation step reduces the number of nodes in
the graph, rendering the method the ability to process large images. Moreover,
community detection algorithms provide more accurate segmentation than
traditional approaches, such as those based on spectral graph partition. We
also compare our method with two algorithms: a) the graph-based approach by
Felzenszwalb and Huttenlocher and b) the contour-based method by Arbelaez.
Results have shown that our method provides more precise segmentation and is
faster than both of them.
Maksim Lapin, Matthias Hein, Bernt Schiele
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Top-k error is currently a popular performance measure on large scale image
classification benchmarks such as ImageNet and Places. Despite its wide
acceptance, our understanding of this metric is limited as most of the previous
research is focused on its special case, the top-1 error. In this work, we
explore two directions that shed more light on the top-k error. First, we
provide an in-depth analysis of established and recently proposed single-label
multiclass methods along with a detailed account of efficient optimization
algorithms for them. Our results indicate that the softmax loss and the smooth
multiclass SVM are surprisingly competitive in top-k error uniformly across all
k, which can be explained by our analysis of multiclass top-k calibration.
Further improvements for a specific k are possible with a number of proposed
top-k loss functions. Second, we use the top-k methods to explore the
transition from multiclass to multilabel learning. In particular, we find that
it is possible to obtain effective multilabel classifiers on Pascal VOC using a
single label per image for training, while the gap between multiclass and
multilabel methods on MS COCO is more significant. Finally, our contribution of
efficient algorithms for training with the considered top-k and multilabel loss
functions is of independent interest.
Zichuan Liu, Yixing Li, Fengbo Ren, Hao Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we develop a binary convolutional encoder-decoder network
(B-CEDNet) for natural scene text processing (NSTP). It converts a text image
to a class-distinguished salience map that reveals the categorical, spatial and
morphological information of characters. The existing solutions are either
memory consuming or run-time consuming that cannot be applied to real-time
applications on resource-constrained devices such as advanced driver assistance
systems. The developed network can process multiple regions containing
characters by one-off forward operation, and is trained to have binary weights
and binary feature maps, which lead to both remarkable inference run-time
speedup and memory usage reduction. By training with over 200, 000 synthesis
scene text images (size of (32 imes128)), it can achieve (90\%) and (91\%)
pixel-wise accuracy on ICDAR-03 and ICDAR-13 datasets. It only consumes (4.59
ms) inference run-time realized on GPU with a small network size of 2.14 MB,
which is up to (8 imes) faster and (96\%) smaller than it full-precision
version.
Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
Comments: Submitted to IbPRIA’17, 8 pages, 3 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we address the problem of visual question answering by
proposing a novel model, called VIBIKNet. Our model is based on integrating
Kernelized Convolutional Neural Networks and Long-Short Term Memory units to
generate an answer given a question about an image. We prove that VIBIKNet is
an optimal trade-off between accuracy and computational load, in terms of
memory and time consumption. We validate our method on the VQA challenge
dataset and compare it to the top performing methods in order to illustrate its
performance and speed.
Qiulei Dong, Zhanyi Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cadieu et al. (Cadieu,2014) reported that deep neural networks(DNNs) could
rival the representation of primate inferotemporal cortex for object
recognition. Lehky et al. (Lehky,2011) provided a statistical analysis on
neural responses to object stimuli in primate AIT cortex. They found the
intrinsic dimensionality of object representations in AIT cortex is around 100
(Lehky,2014). Considering the outstanding performance of DNNs in object
recognition, it is worthwhile investigating whether the responses of DNN
neurons have similar response statistics to those of AIT neurons. Following
Lehky et al.’s works, we analyze the response statistics to image stimuli and
the intrinsic dimensionality of object representations of DNN neurons. Our
findings show in terms of kurtosis and Pareto tail index, the response
statistics on single-neuron selectivity and population sparseness of DNN
neurons are fundamentally different from those of IT neurons except some
special cases. By increasing the number of neurons and stimuli, the conclusions
could alter substantially. In addition, with the ascendancy of the
convolutional layers of DNNs, the single-neuron selectivity and population
sparseness of DNN neurons increase, indicating the last convolutional layer is
to learn features for object representations, while the following
fully-connected layers are to learn categorization features. It is also found
that a sufficiently large number of stimuli and neurons are necessary for
obtaining a stable dimensionality. To our knowledge, this is the first work to
analyze the response statistics of DNN neurons comparing with AIT neurons, and
our results provide not only some insights into the discrepancy of DNN neurons
with respect to IT neurons in object representation, but also shed some light
on possible outcomes of IT neurons when the number of recorded neurons and
stimuli is beyond the level in (Lehky,2011,2014).
Jonghwan Mun, Minsu Cho, Bohyung Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual attention plays an important role to understand images and
demonstrates its effectiveness in generating natural language descriptions of
images. On the other hand, recent studies show that language associated with an
image can steer visual attention in the scene during our cognitive process.
Inspired by this, we introduce a text-guided attention model for image
captioning, which learns to drive visual attention using associated captions.
For this model, we propose an exemplar-based learning approach that retrieves
from training data associated captions with each image, and use them to learn
attention on visual features. Our attention model enables to describe a
detailed state of scenes by distinguishing small or confusable objects
effectively. We validate our model on MS-COCO Captioning benchmark and achieve
the state-of-the-art performance in standard metrics.
Dongkuan Xu, Jia Wu, Wei Zhang, Yingjie Tian
Comments: 11 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Positive instance detection, especially for these in positive bags (true
positive instances, TPIs), plays a key role for multiple instance learning
(MIL) arising from a specific classification problem only provided with bag (a
set of instances) label information. However, most previous MIL methods on this
issue ignore the global similarity among positive instances and that negative
instances are non-i.i.d., usually resulting in the detection of TPI not precise
and sensitive to outliers. To the end, we propose a positive instance detection
via graph updating for multiple instance learning, called PIGMIL, to detect TPI
accurately. PIGMIL selects instances from working sets (WSs) of some working
bags (WBs) as positive candidate pool (PCP). The global similarity among
positive instances and the robust discrimination of instances of PCP from
negative instances are measured to construct the consistent similarity and
discrimination graph (CSDG). As a result, the primary goal (i.e. TPI detection)
is transformed into PCP updating, which is approximated efficiently by updating
CSDG with a random walk ranking algorithm and an instance updating strategy. At
last bags are transformed into feature representation vector based on the
identified TPIs to train a classifier. Extensive experiments demonstrate the
high precision of PIGMIL’s detection of TPIs and its excellent performance
compared to classic baseline MIL methods.
Diqi Chen, Yizhou Wang, Tianfu Wu, Wen Gao
Comments: 9 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a recurrent attentional model (RAM) for general
no-reference image quality assessment (NR-IQA), that is to predict the
perceptual quality score for an input image without using any reference image
and/or prior knowledge regarding underlying distortions. The proposed RAM is
inspired by the well known visual attention mechanism, both covert and overt,
which affects many aspects of visual perception including image quality
assessment. The attentional mechanism is, however, largely ignored in the
NR-IQA literature. The proposed RAM hypothesizes that the attentional scanning
path in an image should contain intrinsic information for IQA. The RAM thus
consists of three components: a glimpse sub-network analyzing the quality at a
fixation using multi-scale information, a location sub-network selecting where
to look next by sampling a stochastic node, and a recurrent network aggregating
information along the scanning path to compute the final prediction. The RAM is
formulated under multi-task learning for the joint prediction of distortion
type and image quality score and for the REINFORCE
rule~cite{williams1992simple} used to handle the stochastic node. The RAM is
trained through back-propagation. In experiments, the RAM is tested on the
TID2008 dataset with promising performance obtained, which shows the
effectiveness of the proposed RAM. Furthermore, the RAM is very efficient in
the sense that a small number of glimpses are used usually in testing.
Daniël Reichman, Leslie M. Collins, Jordan M. Malof
Comments: 9 pages, 8 figures, journal paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Ground penetrating radar (GPR) is one of the most popular and successful
sensing modalities that has been investigated for landmine and subsurface
threat detection. Many of the detection algorithms applied to this task are
supervised and therefore require labeled examples of target and non-target data
for training. Training data most often consists of 2-dimensional images (or
patches) of GPR data, from which features are extracted, and provided to the
classifier during training and testing. Identifying desirable training and
testing locations to extract patches, which we term “keypoints”, is well
established in the literature. In contrast however, a large variety of
strategies have been proposed regarding keypoint utilization (e.g., how many of
the identified keypoints should be used at targets, or non-target, locations).
Given the variety keypoint utilization strategies that are available, it is
very unclear (i) which strategies are best, or (ii) whether the choice of
strategy has a large impact on classifier performance. We address these
questions by presenting a taxonomy of existing utilization strategies, and then
evaluating their effectiveness on a large dataset using many different
classifiers and features. We analyze the results and propose a new strategy,
called PatchSelect, which outperforms other strategies across all experiments.
Sahar Yousefi, M.T. Manzuri Shalmani
Comments: 15 pages, 14 figures, 3 Tables, Journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, there has been a considerable attention to motion detection due to
the explosive growth of its application in video analysis and surveillance
systems. While good results can be achieved in the previous approaches,
accurate motion detection remains a challenging task due to the difficulties
raised by illumination variations, occlusion, camouflage, burst physical
motion, dynamic texture and environmental changes such as those on climate
changes, sunlight changes during a day, etc. In this paper, we propose a novel
perpixel motion descriptor for both motion detection and dynamic texture
segmentation which outperforms the literature in severe scenarios. The proposed
descriptor is based on two complementary three-dimensional-discrete wavelet
transform (3D-DWT) and three dimensional wavelet leader. In this approach, a
feature vector is extracted for each pixel by applying a novel three
dimensional wavelet based motion descriptor. Then, the extracted features are
clustered by the well-known k-means algorithm. The experimental results
demonstrate the effectiveness of our proposed method compared to the literature
motion detection approaches.
Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
Yongqing Liang, Cheng Jin, Yuejie Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we establish a novel bottom-up cue named Convex Hull Overlap
(CHO), and then propose an effective approach to detect salient regions using
the combination of the CHO cue and global contrast cue. Our scheme
significantly differs from other earlier work in: 1) The hierarchical
segmentation model based on Normalized Graph-Cut fits the splitting and merging
processes in human visual perception; 2) Previous work only focuses on color
and texture cues, while our CHO cue makes up the obvious gap between the
spatial region covering and the region saliency. CHO is a kind of improved and
enhanced Gestalt cue, while other popular figure-ground cues such as convexity
and surroundedness can be regarded as the special cases of CHO. Our experiments
on a large number of public data have obtained very positive results.
Sankaraganesh Jonna, Krishna K. Nakka, Rajiv R. Sahay
Comments: The paper was accepted in VISAPP-2015
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Conventional approaches to image de-fencing suffer from non-robust fence
detection and are limited to processing images of static scenes. In this
position paper, we propose an automatic de-fencing algorithm for images of
dynamic scenes. We divide the problem of image de-fencing into the tasks of
automated fence detection, motion estimation and fusion of data from multiple
frames of a captured video of the dynamic scene. Fences are detected
automatically using two approaches, namely, employing Gabor filter and a
machine learning method. We cast the fence removal problem in an optimization
framework, by modeling the formation of the degraded observations. The inverse
problem is solved using split Bregman technique assuming total variation of the
de-fenced image as the regularization constraint.
Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We present a Deep Convolutional Neural Network architecture which serves as a
generic image-to-image regressor that can be trained end-to-end without any
further machinery. Our proposed architecture: the Recursively Branched
Deconvolutional Network (RBDN) develops a cheap multi-context image
representation very early on using an efficient recursive branching scheme with
extensive parameter sharing and learnable upsampling. This multi-context
representation is subjected to a highly non-linear locality preserving
transformation by the remainder of our network comprising of a series of
convolutions/deconvolutions without any spatial downsampling. The RBDN
architecture is fully convolutional and can handle variable sized images during
inference. We provide qualitative/quantitative results on (3) diverse tasks:
relighting, denoising and colorization and show that our proposed RBDN
architecture obtains comparable results to the state-of-the-art on each of
these tasks when used off-the-shelf without any post processing or
task-specific architectural modifications.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Synthesizing photo-realistic images from text descriptions is a challenging
problem in computer vision and has many practical applications. Samples
generated by existing text-to-image approaches can roughly reflect the meaning
of the given descriptions, but they fail to contain necessary details and vivid
object parts. In this paper, we propose stacked Generative Adversarial Networks
(StackGAN) to generate photo-realistic images conditioned on text descriptions.
The Stage-I GAN sketches the primitive shape and basic colors of the object
based on the given text description, yielding Stage-I low resolution images.
The Stage-II GAN takes Stage-I results and text descriptions as inputs, and
generates high resolution images with photo-realistic details. The Stage-II GAN
is able to rectify defects and add compelling details with the refinement
process. Samples generated by StackGAN are more plausible than those generated
by existing approaches. Importantly, our StackGAN for the first time generates
realistic 256 x 256 images conditioned on only text descriptions, while
state-of-the-art methods can generate at most 128 x 128 images. To demonstrate
the effectiveness of the proposed StackGAN, extensive experiments are conducted
on CUB and Oxford-102 datasets, which contain enough object appearance
variations and are widely-used for text-to-image generation analysis.
Hieu Le, Chen-Ping Yu, Gregory Zelinsky, Dimitris Samaras
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Co-localization is the problem of localizing categorical objects using only
positive sets of example images, without any form of further supervision. This
is a challenging task as there is no pixel-level annotations. Motivated by
human visual learning, we find the common features of an object category from
convolutional kernels of a pre-trained convolutional neural network (CNN). We
call these category-consistent CNN features. Then, we co-propagate their
activated spatial regions using superpixel geodesic distances for localization.
In our first set of experiments, we show that the proposed method achieves
state-of-the-art performance on three related benchmarks: PASCAL 2007,
PASCAL-2012, and the Object Discovery dataset. We also show that our method is
able to detect and localize truly unseen categories, using six held-out ImagNet
subset of categories with state-of-the-art accuracies. Our intuitive approach
achieves this success without any region proposals or object detectors, and can
be based on a CNN that was pre-trained purely on image classification tasks
without further fine-tuning.
Jianxu Chen, Chukka Srinivas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic detection of lymphocyte in H&E images is a necessary first step in
lots of tissue image analysis algorithms. An accurate and robust automated
lymphocyte detection approach is of great importance in both computer science
and clinical studies. Most of the existing approaches for lymphocyte detection
are based on traditional image processing algorithms and/or classic machine
learning methods. In the recent years, deep learning techniques have
fundamentally transformed the way that a computer interprets images and have
become a matchless solution in various pattern recognition problems. In this
work, we design a new deep neural network model which extends the fully
convolutional network by combining the ideas in several recent techniques, such
as shortcut links. Also, we design a new training scheme taking the prior
knowledge about lymphocytes into consideration. The training scheme not only
efficiently exploits the limited amount of free-form annotations from
pathologists, but also naturally supports efficient fine-tuning. As a
consequence, our model has the potential of self-improvement by leveraging the
errors collected during real applications. Our experiments show that our deep
neural network model achieves good performance in the images of different
staining conditions or different types of tissues.
Mehdi Mirza, Aaron Courville, Yoshua Bengio
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Humans learn a predictive model of the world and use this model to reason
about future events and the consequences of actions. In contrast to most
machine predictors, we exhibit an impressive ability to generalize to unseen
scenarios and reason intelligently in these settings. One important aspect of
this ability is physical intuition(Lake et al., 2016). In this work, we explore
the potential of unsupervised learning to find features that promote better
generalization to settings outside the supervised training distribution. Our
task is predicting the stability of towers of square blocks. We demonstrate
that an unsupervised model, trained to predict future frames of a video
sequence of stable and unstable block configurations, can yield features that
support extrapolating stability prediction to blocks configurations outside the
training set distribution
Jie Wang, Luyan Ji, Xiaomeng Huang, Haohuan Fu, Shiming Xu, Congcong Li
Comments: 38 pages, 5 figures
Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV)
There is a trend to acquire high accuracy land-cover maps using multi-source
classification methods, most of which are based on data fusion, especially
pixel- or feature-level fusions. A probabilistic graphical model (PGM) approach
is proposed in this research for 30 m resolution land-cover mapping with
multi-temporal Landsat and MODerate Resolution Imaging Spectroradiometer
(MODIS) data. Independent classifiers were applied to two single-date Landsat 8
scenes and the MODIS time-series data, respectively, for probability
estimation. A PGM was created for each pixel in Landsat 8 data. Conditional
probability distributions were computed based on data quality and reliability
by using information selectively. Using the administrative territory of Beijing
City (Area-1) and a coastal region of Shandong province, China (Area-2) as
study areas, multiple land-cover maps were generated for comparison.
Quantitative results show the effectiveness of the proposed method. Overall
accuracies promoted from 74.0% (maps acquired from single-temporal Landsat
images) to 81.8% (output of the PGM) for Area-1. Improvements can also be seen
when using MODIS data and only a single-temporal Landsat image as input
(overall accuracy: 78.4% versus 74.0% for Area-1, and 86.8% versus 83.0% for
Area-2). Information from MODIS data did not help much when the PGM was applied
to cloud free regions of. One of the advantages of the proposed method is that
it can be applied where multi-temporal data cannot be simply stacked as a
multi-layered image.
Hanie Sedghi, Ashish Sabharwal
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We consider knowledge base (KB) completion for KBs rich in facts about common
nouns (generics), such as “trees produce oxygen” or “dogs have tails”. While KB
completion has received much attention for named entity KBs, little emphasis
has been placed on generics despite their importance for capturing general
knowledge. Compared with named entity KBs such as Freebase, KBs about generics
have more complex underlying regularities, are substantially more incomplete,
and violate the commonly used locally closed world assumption (LCWA).
Consequently, existing completion methods struggle with this new task, and the
commonly used evaluation metrics become less meaningful. To address these
challenges, we make three contributions: (a) a tensor factorization approach
that achieves state-of-the-art results by incorporating external knowledge
about relation schema and entity taxonomy, (b) taxonomy guided submodular
active learning to efficiently collect additional annotations to address KB
incompleteness, and (c) a more appropriate metric (yield at high precision)
along with a constant-factor deterministic approximation algorithm to compute
it cost-effectively with only a logarithmic number of human annotations. An
empirical evaluation shows that our techniques achieve state-of-the-art results
on two novel generics KBs about elementary level science.
Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, Stig Petersen
Comments: 11 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI)
DeepMind Lab is a first-person 3D game platform designed for research and
development of general artificial intelligence and machine learning systems.
DeepMind Lab can be used to study how autonomous artificial agents may learn
complex tasks in large, partially observed, and visually diverse worlds.
DeepMind Lab has a simple and flexible API enabling creative task-designs and
novel AI-designs to be explored and quickly iterated upon. It is powered by a
fast and widely recognised game engine, and tailored for effective use by the
research community.
Ludovic Hofer, Hugo Gimbert
Journal-ref: ICAPS 26th, PlanRob 4th (Workshop) (2016) 37-48
Subjects: Artificial Intelligence (cs.AI)
This paper presents a new method to learn online policies in continuous
state, continuous action, model-free Markov decision processes, with two
properties that are crucial for practical applications. First, the policies are
implementable with a very low computational cost: once the policy is computed,
the action corresponding to a given state is obtained in logarithmic time with
respect to the number of samples used. Second, our method is versatile: it does
not rely on any a priori knowledge of the structure of optimal policies. We
build upon the Fitted Q-iteration algorithm which represents the (Q)-value as
the average of several regression trees. Our algorithm, the Fitted Policy
Forest algorithm (FPF), computes a regression forest representing the Q-value
and transforms it into a single tree representing the policy, while keeping
control on the size of the policy using resampling and leaf merging. We
introduce an adaptation of Multi-Resolution Exploration (MRE) which is
particularly suited to FPF. We assess the performance of FPF on three classical
benchmarks for reinforcement learning: the “Inverted Pendulum”, the “Double
Integrator” and “Car on the Hill” and show that FPF equals or outperforms other
algorithms, although these algorithms rely on the use of particular
representations of the policies, especially chosen in order to fit each of the
three problems. Finally, we exhibit that the combination of FPF and MRE allows
to find nearly optimal solutions in problems where (epsilon)-greedy approaches
would fail.
Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers
Comments: NIPS workshop on Deep Learning for Action and Interaction, 2016
Subjects: Artificial Intelligence (cs.AI)
We propose an inverse reinforcement learning (IRL) approach using Deep
Q-Networks to extract the rewards in problems with large state spaces. We
evaluate the performance of this approach in a simulation-based autonomous
driving scenario. Our results resemble the intuitive relation between the
reward function and readings of distance sensors mounted at different poses on
the car. We also show that, after a few learning rounds, our simulated agent
generates collision-free motions and performs human-like lane change behaviour.
Vasileios Lampos
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We provide a brief technical description of an online platform for disease
monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu
Detector, in its current version (v.0.5), uses either Twitter or Google search
data in conjunction with statistical Natural Language Processing models to
estimate the rate of influenza-like illness in the population of England. Its
back-end is a live service that collects online data, utilises modern
technologies for large-scale text processing, and finally applies statistical
inference models that are trained offline. The front-end visualises the various
disease rate estimates. Notably, the models based on Google data achieve a high
level of accuracy with respect to the most recent four flu seasons in England
(2012/13 to 2015/16). This highlighted Flu Detector as having a great potential
of becoming a complementary source to the domestic traditional flu surveillance
schemes.
Xiao Li, Cristian-Ioan Vasile, Calin Belta
Subjects: Artificial Intelligence (cs.AI)
The reward function plays a critical role in rein- forcement learning (RL).
It is a place where designers specify the desired behavior and impose important
constraints for the system. While most reward functions used in the current RL
literature have been quite simple for relatively simple tasks, real world
applications typically involve tasks that are logically more complex. In this
paper we take advantage of the expressive power of temporal logic (TL) to
express complex rules the system should follow, or incorporate domain knowledge
into learning. We propose a TL that we argue is suitable for robotic task
specification in RL. We also present the concept of one-step robustness, and
show that the problem of sparse reward that occurs when using TL specification
as the reward function can be alleviated while leaving the resultant policy
invariant. Simulated manipulation tasks are used to illustrate RL with the
proposed TL and the one-step robustness.
Feng Chen, Baojian Zhou
Comments: International Joint Conference on Artificial Intelligence, 2016
Subjects: Artificial Intelligence (cs.AI)
Sparsity-constrained optimization is an important and challenging problem
that has wide applicability in data mining, machine learning, and statistics.
In this paper, we focus on sparsity-constrained optimization in cases where the
cost function is a general nonlinear function and, in particular, the sparsity
constraint is defined by a graph-structured sparsity model. Existing methods
explore this problem in the context of sparse estimation in linear models. To
the best of our knowledge, this is the first work to present an efficient
approximation algorithm, namely, Graph-structured Matching Pursuit (Graph-Mp),
to optimize a general nonlinear function subject to graph-structured
constraints. We prove that our algorithm enjoys the strong guarantees analogous
to those designed for linear models in terms of convergence rate and
approximation accuracy. As a case study, we specialize Graph-Mp to optimize a
number of well-known graph scan statistic models for the connected subgraph
detection task, and empirical evidence demonstrates that our general algorithm
performs superior over state-of-the-art methods that are designed specifically
for the task of connected subgraph detection.
Judson Bandeira, Ig Ibert Bittencourt, Patricia Espinheira, Seiji Isotani
Subjects: Artificial Intelligence (cs.AI)
Modeling an ontology is a hard and time-consuming task. Although
methodologies are useful for ontologists to create good ontologies, they do not
help with the task of evaluating the quality of the ontology to be reused. For
these reasons, it is imperative to evaluate the quality of the ontology after
constructing it or before reusing it. Few studies usually present only a set of
criteria and questions, but no guidelines to evaluate the ontology. The effort
to evaluate an ontology is very high as there is a huge dependence on the
evaluator’s expertise to understand the criteria and questions in depth.
Moreover, the evaluation is still very subjective. This study presents a novel
methodology for ontology evaluation, taking into account three fundamental
principles: i) it is based on the Goal, Question, Metric approach for empirical
evaluation; ii) the goals of the methodologies are based on the roles of
knowledge representations combined with specific evaluation criteria; iii) each
ontology is evaluated according to the type of ontology. The methodology was
empirically evaluated using different ontologists and ontologies of the same
domain. The main contributions of this study are: i) defining a step-by-step
approach to evaluate the quality of an ontology; ii) proposing an evaluation
based on the roles of knowledge representations; iii) the explicit difference
of the evaluation according to the type of the ontology iii) a questionnaire to
evaluate the ontologies; iv) a statistical model that automatically calculates
the quality of the ontologies.
Pedram Daee, Tomi Peltola, Marta Soare, Samuel Kaski
Comments: 22 pages, 9 figures, all codes and data available at this https URL
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Machine Learning (stat.ML)
Prediction in a small-sized sample with a large number of covariates, the
“small n, large p” problem, is challenging. This setting is encountered in
multiple applications, such as precision medicine, where obtaining additional
samples can be extremely costly or even impossible, and extensive research
effort has recently been dedicated to finding principled solutions for accurate
prediction. However, a valuable source of additional information, domain
experts, has not yet been efficiently exploited. We formulate knowledge
elicitation generally as a probabilistic inference process, where expert
knowledge is sequentially queried to improve predictions. In the specific case
of sparse linear regression, where we assume the expert has knowledge about the
values of the regression coefficients or about the relevance of the features,
we propose an algorithm and computational approximation for fast and efficient
interaction, which sequentially identifies the most informative features on
which to query expert knowledge. Evaluations of our method in experiments with
simulated and real users show improved prediction accuracy already with a small
effort from the expert.
Rajendra Rana Bhat, Vivek Viswanath, Xiaolin Li
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Genomics (q-bio.GN)
Transcriptional profiling on microarrays to obtain gene expressions has been
used to facilitate cancer diagnosis. We propose a deep generative machine
learning architecture (called DeepCancer) that learn features from unlabeled
microarray data. These models have been used in conjunction with conventional
classifiers that perform classification of the tissue samples as either being
cancerous or non-cancerous. The proposed model has been tested on two different
clinical datasets. The evaluation demonstrates that DeepCancer model achieves a
very high precision score, while significantly controlling the false positive
and false negative scores.
Mason Bretan, Gil Weinberg, Larry Heck
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
Several methods exist for a computer to generate music based on data
including Markov chains, recurrent neural networks, recombinancy, and grammars.
We explore the use of unit selection and concatenation as a means of generating
music using a procedure based on ranking, where, we consider a unit to be a
variable length number of measures of music. We first examine whether a unit
selection method, that is restricted to a finite size unit library, can be
sufficient for encompassing a wide spectrum of music. We do this by developing
a deep autoencoder that encodes a musical input and reconstructs the input by
selecting from the library. We then describe a generative model that combines a
deep structured semantic model (DSSM) with an LSTM to predict the next unit,
where units consist of four, two, and one measures of music. We evaluate the
generative model using objective metrics including mean rank and accuracy and
with a subjective listening test in which expert musicians are asked to
complete a forced-choiced ranking task. We compare our model to a note-level
generative baseline that consists of a stacked LSTM trained to predict forward
by one note.
Oscar A. C. Linares, Glenda Michele Botelho, Francisco Aparecido Rodrigues, João Batista Neto
Comments: 20 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Image segmentation has many applications which range from machine learning to
medical diagnosis. In this paper, we propose a framework for the segmentation
of images based on super-pixels and algorithms for community identification in
graphs. The super-pixel pre-segmentation step reduces the number of nodes in
the graph, rendering the method the ability to process large images. Moreover,
community detection algorithms provide more accurate segmentation than
traditional approaches, such as those based on spectral graph partition. We
also compare our method with two algorithms: a) the graph-based approach by
Felzenszwalb and Huttenlocher and b) the contour-based method by Arbelaez.
Results have shown that our method provides more precise segmentation and is
faster than both of them.
Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Kawahara Daisuke, Kurohashi Sadao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper introduces a novel neural network model for question answering,
the emph{entity-based memory network}. It enhances neural networks’ ability of
representing and calculating information over a long period by keeping records
of entities contained in text. The core component is a memory pool which
comprises entities’ states. These entities’ states are continuously updated
according to the input text. Questions with regard to the input text are used
to search the memory pool for related entities and answers are further
predicted based on the states of retrieved entities. Compared with previous
memory network models, the proposed model is capable of handling fine-grained
information and more sophisticated relations based on entities. We formulated
several different tasks as question answering problems and tested the proposed
model. Experiments reported satisfying results.
Michael Crosscombe, Jonathan Lawry
Journal-ref: Adaptive Behavior. 24 (4). pp 249-260. 2016
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Consensus formation is investigated for multi-agent systems in which agents’
beliefs are both vague and uncertain. Vagueness is represented by a third truth
state meaning emph{borderline}. This is combined with a probabilistic model of
uncertainty. A belief combination operator is then proposed which exploits
borderline truth values to enable agents with conflicting beliefs to reach a
compromise. A number of simulation experiments are carried out in which agents
apply this operator in pairwise interactions, under the bounded confidence
restriction that the two agents’ beliefs must be sufficiently consistent with
each other before agreement can be reached. As well as studying the consensus
operator in isolation we also investigate scenarios in which agents are
influenced either directly or indirectly by the state of the world. For the
former we conduct simulations which combine consensus formation with belief
updating based on evidence. For the latter we investigate the effect of
assuming that the closer an agent’s beliefs are to the truth the more visible
they are in the consensus building process. In all cases applying the consensus
operators results in the population converging to a single shared belief which
is both crisp and certain. Furthermore, simulations which combine consensus
formation with evidential updating converge faster to a shared opinion which is
closer to the actual state of the world than those in which beliefs are only
changed as a result of directly receiving new evidence. Finally, if agent
interactions are guided by belief quality measured as similarity to the true
state of the world, then applying the consensus operator alone results in the
population converging to a high quality shared belief.
Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Synthesizing photo-realistic images from text descriptions is a challenging
problem in computer vision and has many practical applications. Samples
generated by existing text-to-image approaches can roughly reflect the meaning
of the given descriptions, but they fail to contain necessary details and vivid
object parts. In this paper, we propose stacked Generative Adversarial Networks
(StackGAN) to generate photo-realistic images conditioned on text descriptions.
The Stage-I GAN sketches the primitive shape and basic colors of the object
based on the given text description, yielding Stage-I low resolution images.
The Stage-II GAN takes Stage-I results and text descriptions as inputs, and
generates high resolution images with photo-realistic details. The Stage-II GAN
is able to rectify defects and add compelling details with the refinement
process. Samples generated by StackGAN are more plausible than those generated
by existing approaches. Importantly, our StackGAN for the first time generates
realistic 256 x 256 images conditioned on only text descriptions, while
state-of-the-art methods can generate at most 128 x 128 images. To demonstrate
the effectiveness of the proposed StackGAN, extensive experiments are conducted
on CUB and Oxford-102 datasets, which contain enough object appearance
variations and are widely-used for text-to-image generation analysis.
Zhe Yu, Nicholas A. Kraft, Tim Menzies
Comments: 16 pages, 13 figures, submitted to Information and Software Technology
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Context: Systematic literature reviews (SLRs) are the primary method for
aggregating and synthesizing evidence in evidence-based software engineering.
Primary study selection is a critical and time-consuming SLR step in which
reviewers use titles, abstracts, or even full texts to evaluate thousands of
studies to find the dozens of them that are relevant to the research questions.
Objective: We seek to reduce the effort of primary study selection in SLRs with
machine assisted reading techniques. Method: In this paper we explore and
refactor the state-of-the-art machine assisted reading techniques from both
evidence-based medicine and legal electronic discovery to support SLRs. By
refactoring those methods, we discovered FASTREAD, which is a new
state-of-the-art in machine assisted primary studies for SLRs. Tested on two
data sets generated from existing SLRs of Hall, Wahono, et al., FASTREAD
outperforms the current state-of-the-art methods. Results: Using FASTREAD, it
is possible to find 90% of the studies found by standard manual methods, but
after only reading less than 10% of the candidate studies. Conclusions: With
the help of FASTREAD, conducting an SLR is much more efficient and less
difficult. Software Engineering researchers now have no excuse: they should
conduct SLRs.
Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
Comments: In Proceedings of the 39th European Conference on Information Retrieval, ECIR 2017, to appear
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Recent research has shown that the performance of search personalization
depends on the richness of user profiles which normally represent the user’s
topical interests. In this paper, we propose a new embedding approach to
learning user profiles, where users are embedded on a topical interest space.
We then directly utilize the user profiles for search personalization.
Experiments on query logs from a major commercial web search engine demonstrate
that our embedding approach improves the performance of the search engine and
also achieves better search performance than other strong baselines.
Omar Alonso
Subjects: Information Retrieval (cs.IR)
There is a renaissance in visual analytics systems for data analysis and
sharing, in particular, in the current wave of big data applications. We
introduce RAVE, a prototype that automates the generation of an interface that
uses facets and visualization techniques for exploring and analyzing relevance
assessments data sets collected via crowdsourcing. We present a technical
description of the main components and demonstrate its use.
Seyed-Mehdi-Reza Beheshti, Alireza Tabebordbar, Boualem Benatallah, Reza Nouri
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Understanding and analyzing big data is firmly recognized as a powerful and
strategic priority. For deeper interpretation of and better intelligence with
big data, it is important to transform raw data (unstructured, semi-structured
and structured data sources, e.g., text, video, image data sets) into curated
data: contextualized data and knowledge that is maintained and made available
for use by end-users and applications. In particular, data curation acts as the
glue between raw data and analytics, providing an abstraction layer that
relieves users from time consuming, tedious and error prone curation tasks. In
this context, the data curation process becomes a vital analytics asset for
increasing added value and insights.
In this paper, we identify and implement a set of curation APIs and make them
available (on GitHub) to researchers and developers to assist them transforming
their raw data into curated data. The curation APIs enable developers to easily
add features – such as extracting keyword, part of speech, and named entities
such as Persons, Locations, Organizations, Companies, Products, Diseases,
Drugs, etc.; providing synonyms and stems for extracted information items
leveraging lexical knowledge bases for the English language such as WordNet;
linking extracted entities to external knowledge bases such as Google Knowledge
Graph and Wikidata; discovering similarity among the extracted information
items, such as calculating similarity between string, number, date and time
data; classifying, sorting and categorizing data into various types, forms or
any other distinct class; and indexing structured and unstructured data – into
their applications.
Yongjun Zhu, Erjia Yan, Il-Yeol Song
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
With the ever-increasing scientific literature, there is a need on a natural
language interface to bibliographic information retrieval systems to retrieve
related information effectively. In this paper, we propose a natural language
interface, NLI-GIBIR, to a graph-based bibliographic information retrieval
system. In designing NLI-GIBIR, we developed a novel framework that can be
applicable to graph-based bibliographic information retrieval systems. Our
framework integrates algorithms/heuristics for interpreting and analyzing
natural language bibliographic queries. NLI-GIBIR allows users to search for a
variety of bibliographic data through natural language. A series of text- and
linguistic-based techniques are used to analyze and answer natural language
queries, including tokenization, named entity recognition, and syntactic
analysis. We find that our framework can effectively represents and addresses
complex bibliographic information needs. Thus, the contributions of this paper
are as follows: First, to our knowledge, it is the first attempt to propose a
natural language interface to graph-based bibliographic information retrieval.
Second, we propose a novel customized natural language processing framework
that integrates a few original algorithms/heuristics for interpreting and
analyzing natural language bibliographic queries. Third, we show that the
proposed framework and natural language interface provide a practical solution
in building real-world natural language interface-based bibliographic
information retrieval systems. Our experimental results show that the presented
system can correctly answer 39 out of 40 example natural language queries with
varying lengths and complexities.
David Tsurel, Dan Pelleg, Ido Guy, Dafna Shahaf
Comments: To appear in Proceedings of tenth ACM International Conference on Web Search and Data Mining, WSDM 2017
Subjects: Social and Information Networks (cs.SI); Databases (cs.DB); Information Retrieval (cs.IR)
A significant portion of web search queries directly refers to named
entities. Search engines explore various ways to improve the user experience
for such queries. We suggest augmenting search results with {em trivia facts}
about the searched entity. Trivia is widely played throughout the world, and
was shown to increase users’ engagement and retention.
Most random facts are not suitable for the trivia section. There is skill
(and art) to curating good trivia. In this paper, we formalize a notion of
emph{trivia-worthiness} and propose an algorithm that automatically mines
trivia facts from Wikipedia. We take advantage of Wikipedia’s category
structure, and rank an entity’s categories by their trivia-quality.
Our algorithm is capable of finding interesting facts, such as Obama’s Grammy
or Elvis’ stint as a tank gunner. In user studies, our algorithm captures the
intuitive notion of “good trivia” 45\% higher than prior work. Search-page
tests show a 22\% decrease in bounce rates and a 12\% increase in dwell time,
proving our facts hold users’ attention.
Xiaopeng Li, Ming Cheung, James She
Comments: IEEE International Conference on Big Data 2016
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Social graphs, representing online friendships among users, are one of the
fundamental types of data for many applications, such as recommendation,
virality prediction and marketing in social media. However, this data may be
unavailable due to the privacy concerns of users, or kept private by social
network operators, which makes such applications difficult. Inferring user
interests and discovering user connections through their shared multimedia
content has attracted more and more attention in recent years. This paper
proposes a Gaussian relational topic model for connection discovery using user
shared images in social media. The proposed model not only models user
interests as latent variables through their shared images, but also considers
the connections between users as a result of their shared images. It explicitly
relates user shared images to user connections in a hierarchical, systematic
and supervisory way and provides an end-to-end solution for the problem. This
paper also derives efficient variational inference and learning algorithms for
the posterior of the latent variables and model parameters. It is demonstrated
through experiments with over 200k images from Flickr that the proposed method
significantly outperforms the methods in previous works.
Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, Ghyslain Gagnon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne
Comments: Submitted as EACL2017 short paper
Subjects: Computation and Language (cs.CL)
We present a novel scheme to combine neural machine translation (NMT) with
traditional statistical machine translation (SMT). Our approach borrows ideas
from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is
combined with the Bayes-risk of the translation according the SMT lattice. This
makes our approach much more flexible than (n)-best list or lattice rescoring
as the neural decoder is not restricted to the SMT search space. We show an
efficient and simple way to integrate risk estimation into the NMT decoder. We
test our method on English-German and Japanese-English and report significant
gains over lattice rescoring on several data sets for both single and ensembled
NMT.
Yushi Yao, Guangjian Li
Comments: 15 pages
Subjects: Computation and Language (cs.CL)
Traditional sentiment analysis often uses sentiment dictionary to extract
sentiment information in text and classify documents. However, emerging
informal words and phrases in user generated content call for analysis aware to
the context. Usually, they have special meanings in a particular context.
Because of its great performance in representing inter-word relation, we use
sentiment word vectors to identify the special words. Based on the distributed
language model word2vec, in this paper we represent a novel method about
sentiment representation of word under particular context, to be detailed, to
identify the words with abnormal sentiment polarity in long answers. Result
shows the improved model shows better performance in representing the words
with special meaning, while keep doing well in representing special idiomatic
pattern. Finally, we will discuss the meaning of vectors representing in the
field of sentiment, which may be different from general object-based
conditions.
Carlo Combi, Margherita Zorzi, Gabriele Pozzani, Ugo Moretti
Comments: arXiv admin note: substantial text overlap with arXiv:1506.08052
Subjects: Computation and Language (cs.CL)
The collection of narrative spontaneous reports is an irreplaceable source
for the prompt detection of suspected adverse drug reactions (ADRs): qualified
domain experts manually revise a huge amount of narrative descriptions and then
encode texts according to MedDRA standard terminology. The manual annotation of
narrative documents with medical terminology is a subtle and expensive task,
since the number of reports is growing up day-by-day. MagiCoder, a Natural
Language Processing algorithm, is proposed for the automatic encoding of
free-text descriptions into MedDRA terms. MagiCoder procedure is efficient in
terms of computational complexity (in particular, it is linear in the size of
the narrative input and the terminology). We tested it on a large dataset of
about 4500 manually revised reports, by performing an automated comparison
between human and MagiCoder revisions. For the current base version of
MagiCoder, we measured: on short descriptions, an average recall of (86\%) and
an average precision of (88\%); on medium-long descriptions (up to 255
characters), an average recall of (64\%) and an average precision of (63\%).
From a practical point of view, MagiCoder reduces the time required for
encoding ADR reports. Pharmacologists have simply to review and validate the
MagiCoder terms proposed by the application, instead of choosing the right
terms among the 70K low level terms of MedDRA. Such improvement in the
efficiency of pharmacologists’ work has a relevant impact also on the quality
of the subsequent data analysis. We developed MagiCoder for the Italian
pharmacovigilance language. However, our proposal is based on a general
approach, not depending on the considered language nor the term dictionary.
Iris Hendrickx, Louis Onrust, Florian Kunneman, Ali Hürriyetoğlu, Antal van den Bosch, Wessel Stoop
Subjects: Computation and Language (cs.CL)
We investigate what distinguishes reported dreams from other personal
narratives. The continuity hypothesis, stemming from psychological dream
analysis work, states that most dreams refer to a person’s daily life and
personal concerns, similar to other personal narratives such as diary entries.
Differences between the two texts may reveal the linguistic markers of dream
text, which could be the basis for new dream analysis work and for the
automatic detection of dream descriptions. We used three text analytics
methods: text classification, topic modeling, and text coherence analysis, and
applied these methods to a balanced set of texts representing dreams, diary
entries, and other personal stories. We observed that dream texts could be
distinguished from other personal narratives nearly perfectly, mostly based on
the presence of uncertainty markers and descriptions of scenes. Important
markers for non-dream narratives are specific time expressions and
conversational expressions. Dream texts also exhibit a lower discourse
coherence than other personal narratives.
Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, Tomas Mikolov
Comments: Submitted to ICLR 2017
Subjects: Computation and Language (cs.CL)
We consider the problem of producing compact architectures for text
classification, such that the full model fits in a limited amount of memory.
After considering different solutions inspired by the hashing literature, we
propose a method built upon product quantization to store word embeddings.
While the original technique leads to a loss in accuracy, we adapt this method
to circumvent quantization artefacts. Our experiments carried out on several
benchmarks show that our approach typically requires two orders of magnitude
less memory than fastText while being only slightly inferior with respect to
accuracy. As a result, it outperforms the state of the art by a good margin in
terms of the compromise between memory usage and accuracy.
Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Kawahara Daisuke, Kurohashi Sadao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper introduces a novel neural network model for question answering,
the emph{entity-based memory network}. It enhances neural networks’ ability of
representing and calculating information over a long period by keeping records
of entities contained in text. The core component is a memory pool which
comprises entities’ states. These entities’ states are continuously updated
according to the input text. Questions with regard to the input text are used
to search the memory pool for related entities and answers are further
predicted based on the states of retrieved entities. Compared with previous
memory network models, the proposed model is capable of handling fine-grained
information and more sophisticated relations based on entities. We formulated
several different tasks as question answering problems and tested the proposed
model. Experiments reported satisfying results.
Matti Lankinen, Hannes Heikinheimo, Pyry Takala, Tapani Raiko, Juha Karhunen
Subjects: Computation and Language (cs.CL)
Inspired by recent research, we explore ways to model the highly
morphological Finnish language at the level of characters while maintaining the
performance of word-level models. We propose a new
Character-to-Word-to-Character (C2W2C) compositional language model that uses
characters as input and output while still internally processing word level
embeddings. Our preliminary experiments, using the Finnish Europarl V7 corpus,
indicate that C2W2C can respond well to the challenges of morphologically rich
languages such as high out of vocabulary rates, the prediction of novel words,
and growing vocabulary size. Notably, the model is able to correctly score
inflectional forms that are not present in the training data and sample
grammatically and semantically correct Finnish sentences character by
character.
Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates
Comments: published as a workshop paper at NIPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
In training speech recognition systems, labeling audio clips can be
expensive, and not all data is equally valuable. Active learning aims to label
only the most informative samples to reduce cost. For speech recognition,
confidence scores and other likelihood-based active learning methods have been
shown to be effective. Gradient-based active learning methods, however, are
still not well-understood. This work investigates the Expected Gradient Length
(EGL) approach in active learning for end-to-end speech recognition. We justify
EGL from a variance reduction perspective, and observe that EGL’s measure of
informativeness picks novel samples uncorrelated with confidence scores.
Experimentally, we show that EGL can reduce word errors by 11\%, or
alternatively, reduce the number of samples to label by 50\%, when compared to
random sampling.
Peter Potash, Alexey Romanov, Anna Rumshisky
Comments: 10 Pages
Subjects: Computation and Language (cs.CL)
In this work, we present a new dataset for computational humor, specifically
comparative humor ranking, which attempts to eschew the ubiquitous binary
approach to humor detection. The dataset consists of tweets that are humorous
responses to a given hashtag. We describe the motivation for this new dataset,
as well as the collection process, which includes a description of our
semi-automated system for data collection. We also present initial experiments
for this dataset using both unsupervised and supervised approaches. Our best
supervised system achieved 63.7% accuracy, suggesting that this task is much
more difficult than comparable humor detection tasks. Initial experiments
indicate that a character-level model is more suitable for this task than a
token-level model, likely due to a large amount of puns that can be captured by
a character-level model.
Peter Potash, Alexey Romanov, Anna Rumshisky
Comments: 10 pages
Subjects: Computation and Language (cs.CL)
Language generation tasks that seek to mimic human ability to use language
creatively are difficult to evaluate, since one must consider creativity,
style, and other non-trivial aspects of the generated text. The goal of this
paper is to develop evaluation methods for one such task, ghostwriting of rap
lyrics, and to provide an explicit, quantifiable foundation for the goals and
future directions of this task. Ghostwriting must produce text that is similar
in style to the emulated artist, yet distinct in content. We develop a novel
evaluation methodology that addresses several complementary aspects of this
task, and illustrate how such evaluation can be used to meaningfully analyze
system performance. We provide a corpus of lyrics for 13 rap artists, annotated
for stylistic similarity, which allows us to assess the feasibility of manual
evaluation for generated verse.
Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
Comments: Submitted to IbPRIA’17, 8 pages, 3 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we address the problem of visual question answering by
proposing a novel model, called VIBIKNet. Our model is based on integrating
Kernelized Convolutional Neural Networks and Long-Short Term Memory units to
generate an answer given a question about an image. We prove that VIBIKNet is
an optimal trade-off between accuracy and computational load, in terms of
memory and time consumption. We validate our method on the VQA challenge
dataset and compare it to the top performing methods in order to illustrate its
performance and speed.
Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
Comments: In Proceedings of the 39th European Conference on Information Retrieval, ECIR 2017, to appear
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Recent research has shown that the performance of search personalization
depends on the richness of user profiles which normally represent the user’s
topical interests. In this paper, we propose a new embedding approach to
learning user profiles, where users are embedded on a topical interest space.
We then directly utilize the user profiles for search personalization.
Experiments on query logs from a major commercial web search engine demonstrate
that our embedding approach improves the performance of the search engine and
also achieves better search performance than other strong baselines.
Vasileios Lampos
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We provide a brief technical description of an online platform for disease
monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu
Detector, in its current version (v.0.5), uses either Twitter or Google search
data in conjunction with statistical Natural Language Processing models to
estimate the rate of influenza-like illness in the population of England. Its
back-end is a live service that collects online data, utilises modern
technologies for large-scale text processing, and finally applies statistical
inference models that are trained offline. The front-end visualises the various
disease rate estimates. Notably, the models based on Google data achieve a high
level of accuracy with respect to the most recent four flu seasons in England
(2012/13 to 2015/16). This highlighted Flu Detector as having a great potential
of becoming a complementary source to the domestic traditional flu surveillance
schemes.
Seyed-Mehdi-Reza Beheshti, Alireza Tabebordbar, Boualem Benatallah, Reza Nouri
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Understanding and analyzing big data is firmly recognized as a powerful and
strategic priority. For deeper interpretation of and better intelligence with
big data, it is important to transform raw data (unstructured, semi-structured
and structured data sources, e.g., text, video, image data sets) into curated
data: contextualized data and knowledge that is maintained and made available
for use by end-users and applications. In particular, data curation acts as the
glue between raw data and analytics, providing an abstraction layer that
relieves users from time consuming, tedious and error prone curation tasks. In
this context, the data curation process becomes a vital analytics asset for
increasing added value and insights.
In this paper, we identify and implement a set of curation APIs and make them
available (on GitHub) to researchers and developers to assist them transforming
their raw data into curated data. The curation APIs enable developers to easily
add features – such as extracting keyword, part of speech, and named entities
such as Persons, Locations, Organizations, Companies, Products, Diseases,
Drugs, etc.; providing synonyms and stems for extracted information items
leveraging lexical knowledge bases for the English language such as WordNet;
linking extracted entities to external knowledge bases such as Google Knowledge
Graph and Wikidata; discovering similarity among the extracted information
items, such as calculating similarity between string, number, date and time
data; classifying, sorting and categorizing data into various types, forms or
any other distinct class; and indexing structured and unstructured data – into
their applications.
Yongjun Zhu, Erjia Yan, Il-Yeol Song
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
With the ever-increasing scientific literature, there is a need on a natural
language interface to bibliographic information retrieval systems to retrieve
related information effectively. In this paper, we propose a natural language
interface, NLI-GIBIR, to a graph-based bibliographic information retrieval
system. In designing NLI-GIBIR, we developed a novel framework that can be
applicable to graph-based bibliographic information retrieval systems. Our
framework integrates algorithms/heuristics for interpreting and analyzing
natural language bibliographic queries. NLI-GIBIR allows users to search for a
variety of bibliographic data through natural language. A series of text- and
linguistic-based techniques are used to analyze and answer natural language
queries, including tokenization, named entity recognition, and syntactic
analysis. We find that our framework can effectively represents and addresses
complex bibliographic information needs. Thus, the contributions of this paper
are as follows: First, to our knowledge, it is the first attempt to propose a
natural language interface to graph-based bibliographic information retrieval.
Second, we propose a novel customized natural language processing framework
that integrates a few original algorithms/heuristics for interpreting and
analyzing natural language bibliographic queries. Third, we show that the
proposed framework and natural language interface provide a practical solution
in building real-world natural language interface-based bibliographic
information retrieval systems. Our experimental results show that the presented
system can correctly answer 39 out of 40 example natural language queries with
varying lengths and complexities.
Sérgio Esteves, Helena Galhardas, Luís Veiga
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
To extract value from evergrowing volumes of data, coming from a number of
different sources, and to drive decision making, organizations frequently
resort to the composition of data processing workflows, since they are
expressive, flexible, and scalable. The typical workflow model enforces strict
temporal synchronization across processing steps without accounting the actual
effect of intermediate computations on the final workflow output. However, this
is not the most desirable behavior in a multitude of scenarios. We identify a
class of applications for continuous data processing where workflow output
changes slowly and without great significance in a short-to-medium time window,
thus wasting compute resources and energy with current approaches.
To overcome such inefficiency, we introduce a novel workflow model, for
continuous and data-intensive processing, capable of relaxing triggering
semantics according to the impact input data is assessed to have on changing
the workflow output. To assess this impact, learn the correlation between input
and output variation, and guarantee correctness within a given tolerated error
constant, we rely on Machine Learning. The functionality of this workflow model
is implemented in SmartFlux, a middleware framework which can be effortlessly
integrated with existing workflow managers. Experimental results indicate we
are able to save a significant amount of resources while not deviating the
workflow output beyond a small error constant with high confidence level.
Robert Escriva, Robbert van Renesse
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Consus is a strictly serializable geo-replicated transactional key-value
store. The key contribution of Consus is a new commit protocol that reduces the
cost of executing a transaction to three wide area message delays in the common
case. Augmenting the commit protocol are multiple Paxos implementations
optimized for different purposes. Together the different implementations and
optimizations comprise a cohesive system that provides low latency, high
availability, and strong guarantees. This paper describes the techniques
implemented in the open source release of Consus, and lays the groundwork for
evaluating Consus once the system implementation is sufficiently robust for a
thorough evaluation.
George Teodoro, Tahsin Kurc, Luis F. R. Taveira, Alba C. M. A. Melo, Jun Kong, Joel Saltz
Comments: 36 pages, 10 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Background: We describe an informatics framework for researchers and clinical
investigators to efficiently perform parameter sensitivity analysis and
auto-tuning for algorithms that segment and classify image features in a large
dataset of high-resolution images. The computational cost of the sensitivity
analysis process can be very high, because the process requires processing the
input dataset several times to systematically evaluate how output varies when
input parameters are varied. Thus, high performance computing techniques are
required to quickly execute the sensitivity analysis process.
Results: We carried out an empirical evaluation of the proposed method on
high performance computing clusters with multi-core CPUs and co-processors
(GPUs and Intel Xeon Phis). Our results show that (1) the framework achieves
excellent scalability and efficiency on a high performance computing cluster —
execution efficiency remained above 85% in all experiments; (2) the parameter
auto-tuning methods are able to converge by visiting only a small fraction
(0.0009%) of the search space with limited impact to the algorithm output
(0.56% on average).
Conclusions: The sensitivity analysis framework provides a range of
strategies for the efficient exploration of the parameter space, as well as
multiple indexes to evaluate the effect of parameter modification to outputs or
even correlation between parameters. Our work demonstrates the feasibility of
performing sensitivity analyses, parameter studies, and auto-tuning with large
datasets with the use of high-performance systems and techniques. The proposed
technologies will enable the quantification of error estimations and output
variations in these pipelines, which may be used in application specific ways
to assess uncertainty of conclusions extracted from data generated by these
image analysis pipelines.
Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis
Comments: 13 pages, Presented at the Machine Learning Systems Workshop at NIPS 2016
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Computation (stat.CO)
We propose a novel coding theoretic framework for mitigating stragglers in
distributed learning. We show how carefully replicating data blocks and coding
across gradients can provide tolerance to failures and stragglers for
synchronous Gradient Descent. We implement our scheme in MPI and show how we
compare against baseline architectures in running time and generalization
error.
Denise M. Reeves
Comments: 122 pages, 25 figures. arXiv admin note: text overlap with arXiv:1511.05102
Subjects: Learning (cs.LG)
This paper considers the design and development of Bayes’ minimax, linear
classification systems using linear discriminant functions that are Bayes’
equalizer rules. Bayes’ equalizer rules divide two-class feature spaces into
decision regions that have equal classification errors. I will formulate the
problem of learning unknown linear discriminant functions from data as a locus
problem, thereby formulating geometric locus methods within a statistical
framework. Solving locus problems involves finding the equation of a curve or
surface defined by a given property, and finding the graph or locus of a given
equation. I will devise a system of locus equations that determines Bayes’
equalizer rules which is based on a variant of the inequality constrained
optimization problem for linear kernel support vector machines. Thereby, I will
define a class of learning machines which are fundamental building blocks for
Bayes’ minimax pattern recognition systems.
Miaoyan Wang, Yun S. Song
Comments: 33 pages, 5 figures
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Tensor decompositions have rich applications in statistics and machine
learning, and developing efficient, accurate algorithms for the problem has
received much attention recently. Here, we present a new method built on
Kruskal’s uniqueness theorem to decompose symmetric, nearly orthogonally
decomposable tensors. Unlike the classical higher-order singular value
decomposition which unfolds a tensor along a single mode, we consider
unfoldings along two modes and use rank-1 constraints to characterize the
underlying components. This tensor decomposition method provably handles a
greater level of noise compared to previous methods and achieves a high
estimation accuracy. Numerical results demonstrate that our algorithm is robust
to various noise distributions and that it performs especially favorably as the
order increases.
Daniel Romero, Vassilis N. Ioannidis, Georgios B. Giannakis
Comments: Submitted to IEEE Journal of Selected Topics in Signal processing, Oct. 2016
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Graph-based methods pervade the inference toolkits of numerous disciplines
including sociology, biology, neuroscience, physics, chemistry, and
engineering. A challenging problem encountered in this context pertains to
determining the attributes of a set of vertices given those of another subset
at possibly different time instants. Leveraging spatiotemporal dynamics can
drastically reduce the number of observed vertices, and hence the cost of
sampling. Alleviating the limited flexibility of existing approaches, the
present paper broadens the existing kernel-based graph function reconstruction
framework to accommodate time-evolving functions over possibly time-evolving
topologies. This approach inherits the versatility and generality of
kernel-based methods, for which no knowledge on distributions or second-order
statistics is required. Systematic guidelines are provided to construct two
families of space-time kernels with complementary strengths. The first
facilitates judicious control of regularization on a space-time frequency
plane, whereas the second can afford time-varying topologies. Batch and online
estimators are also put forth, and a novel kernel Kalman filter is developed to
obtain these estimates at affordable computational cost. Numerical tests with
real data sets corroborate the merits of the proposed methods relative to
competing alternatives.
Michael Tschannen, Helmut Bölcskei
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Sparsity-based subspace clustering algorithms have attracted significant
attention thanks to their excellent performance in practical applications. A
prominent example is the sparse subspace clustering (SSC) algorithm by
Elhamifar and Vidal, which performs spectral clustering based on an adjacency
matrix obtained by sparsely representing each data point in terms of all the
other data points via the Lasso. When the number of data points is large or the
dimension of the ambient space is high, the computational complexity of SSC
quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by
replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm
results in significantly lower computational complexity, while often yielding
comparable performance. The central goal of this paper is an analytical
performance characterization of SSC-OMP for noisy data. Moreover, we introduce
and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu
of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces
intersect and when the data points are contaminated by severe noise. The
clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for
SSC and for the thresholding-based subspace clustering (TSC) algorithm due to
Heckel and B”olcskei. Analytical results in combination with numerical results
indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion
automatically detect the dimensions of the subspaces underlying the data.
Moreover, experiments on synthetic and real data show that SSC-MP compares very
favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor (NSN)
algorithm, both in terms of clustering performance and running time. In
addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is
very robust with respect to the choice of parameters in the stopping criteria.
Yochai Blau, Tomer Michaeli
Subjects: Learning (cs.LG)
Spectral dimensionality reduction algorithms are widely used in numerous
domains, including for recognition, segmentation, tracking and visualization.
However, despite their popularity, these algorithms suffer from a major
limitation known as the “repeated Eigen-directions” phenomenon. That is, many
of the embedding coordinates they produce typically capture the same direction
along the data manifold. This leads to redundant and inefficient
representations that do not reveal the true intrinsic dimensionality of the
data. In this paper, we propose a general method for avoiding redundancy in
spectral algorithms. Our approach relies on replacing the orthogonality
constraints underlying those methods by unpredictability constraints.
Specifically, we require that each embedding coordinate be unpredictable (in
the statistical sense) from all previous ones. We prove that these constraints
necessarily prevent redundancy, and provide a simple technique to incorporate
them into existing methods. As we illustrate on challenging high-dimensional
scenarios, our approach produces significantly more informative and compact
representations, which substantially improve visualization and classification
tasks.
Thomas Mesnard, Wulfram Gerstner, Johanni Brea
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In machine learning, error back-propagation in multi-layer neural networks
(deep learning) has been impressively successful in supervised and
reinforcement learning tasks. As a model for learning in the brain, however,
deep learning has long been regarded as implausible, since it relies in its
basic form on a non-local plasticity rule. To overcome this problem,
energy-based models with local contrastive Hebbian learning were proposed and
tested on a classification task with networks of rate neurons. We extended this
work by implementing and testing such a model with networks of leaky
integrate-and-fire neurons. Preliminary results indicate that it is possible to
learn a non-linear regression task with hidden layers, spiking neurons and a
local synaptic plasticity rule.
Chen-Hsuan Lin, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we establish a theoretical connection between the classical
Lucas & Kanade (LK) algorithm and the emerging topic of Spatial Transformer
Networks (STNs). STNs are of interest to the vision and learning communities
due to their natural ability to combine alignment and classification within the
same theoretical framework. Inspired by the Inverse Compositional (IC) variant
of the LK algorithm, we present Inverse Compositional Spatial Transformer
Networks (IC-STNs). We demonstrate that IC-STNs can achieve better performance
than conventional STNs with less model capacity; in particular, we show
superior performance in pure image alignment tasks as well as joint
alignment/classification problems on real-world problems.
Hanie Sedghi, Ashish Sabharwal
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We consider knowledge base (KB) completion for KBs rich in facts about common
nouns (generics), such as “trees produce oxygen” or “dogs have tails”. While KB
completion has received much attention for named entity KBs, little emphasis
has been placed on generics despite their importance for capturing general
knowledge. Compared with named entity KBs such as Freebase, KBs about generics
have more complex underlying regularities, are substantially more incomplete,
and violate the commonly used locally closed world assumption (LCWA).
Consequently, existing completion methods struggle with this new task, and the
commonly used evaluation metrics become less meaningful. To address these
challenges, we make three contributions: (a) a tensor factorization approach
that achieves state-of-the-art results by incorporating external knowledge
about relation schema and entity taxonomy, (b) taxonomy guided submodular
active learning to efficiently collect additional annotations to address KB
incompleteness, and (c) a more appropriate metric (yield at high precision)
along with a constant-factor deterministic approximation algorithm to compute
it cost-effectively with only a logarithmic number of human annotations. An
empirical evaluation shows that our techniques achieve state-of-the-art results
on two novel generics KBs about elementary level science.
Mehdi Mirza, Aaron Courville, Yoshua Bengio
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Humans learn a predictive model of the world and use this model to reason
about future events and the consequences of actions. In contrast to most
machine predictors, we exhibit an impressive ability to generalize to unseen
scenarios and reason intelligently in these settings. One important aspect of
this ability is physical intuition(Lake et al., 2016). In this work, we explore
the potential of unsupervised learning to find features that promote better
generalization to settings outside the supervised training distribution. Our
task is predicting the stability of towers of square blocks. We demonstrate
that an unsupervised model, trained to predict future frames of a video
sequence of stable and unstable block configurations, can yield features that
support extrapolating stability prediction to blocks configurations outside the
training set distribution
Timothy J. Draelos, Nadine E. Miner, Christopher C. Lamb, Craig M. Vineyard, Kristofor D. Carlson, Conrad D. James, James B. Aimone
Comments: Submitted to IJCNN 2017
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG); Machine Learning (stat.ML)
Neural machine learning methods, such as deep neural networks (DNN), have
achieved remarkable success in a number of complex data processing tasks. These
methods have arguably had their strongest impact on tasks such as image and
audio processing – data processing domains in which humans have long held clear
advantages over conventional algorithms. In contrast to biological neural
systems, which are capable of learning continuously, deep artificial networks
have a limited ability for incorporating new information in an already trained
network. As a result, methods for continuous learning are potentially highly
impactful in enabling the application of deep networks to dynamic data sets.
Here, inspired by the process of adult neurogenesis in the hippocampus, we
explore the potential for adding new neurons to deep layers of artificial
neural networks in order to facilitate their acquisition of novel information
while preserving previously trained data representations. Our results on the
MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes
lower and upper case letters and digits, demonstrate that neurogenesis is well
suited for addressing the stability-plasticity dilemma that has long challenged
adaptive machine learning algorithms.
Zheng Xu, Furong Huang, Louiqa Raschid, Tom Goldstein
Comments: NIPS tensor workshop
Subjects: Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)
We propose an algorithm for the non-negative factorization of an occurrence
tensor built from heterogeneous networks. We use l0 norm to model sparse errors
over discrete values (occurrences), and use decomposed factors to model the
embedded groups of nodes. An efficient splitting method is developed to
optimize the nonconvex and nonsmooth objective. We study both synthetic
problems and a new dataset built from financial documents, resMBS.
Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, Tom Goldstein
Comments: NIPS nonconvex workshop
Subjects: Optimization and Control (math.OC); Learning (cs.LG)
The alternating direction method of multipliers (ADMM) is a common
optimization tool for solving constrained and non-differentiable problems. We
provide an empirical study of the practical performance of ADMM on several
nonconvex applications, including l0 regularized linear regression, l0
regularized image denoising, phase retrieval, and eigenvector computation. Our
experiments suggest that ADMM performs well on a broad class of non-convex
problems. Moreover, recently proposed adaptive ADMM methods, which
automatically tune penalty parameters as the method runs, can improve algorithm
efficiency and solution quality compared to ADMM with a non-tuned penalty.
Pedram Daee, Tomi Peltola, Marta Soare, Samuel Kaski
Comments: 22 pages, 9 figures, all codes and data available at this https URL
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG); Machine Learning (stat.ML)
Prediction in a small-sized sample with a large number of covariates, the
“small n, large p” problem, is challenging. This setting is encountered in
multiple applications, such as precision medicine, where obtaining additional
samples can be extremely costly or even impossible, and extensive research
effort has recently been dedicated to finding principled solutions for accurate
prediction. However, a valuable source of additional information, domain
experts, has not yet been efficiently exploited. We formulate knowledge
elicitation generally as a probabilistic inference process, where expert
knowledge is sequentially queried to improve predictions. In the specific case
of sparse linear regression, where we assume the expert has knowledge about the
values of the regression coefficients or about the relevance of the features,
we propose an algorithm and computational approximation for fast and efficient
interaction, which sequentially identifies the most informative features on
which to query expert knowledge. Evaluations of our method in experiments with
simulated and real users show improved prediction accuracy already with a small
effort from the expert.
Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis
Comments: Submitted to CVPR on November 15th, 2016. Code will be made available soon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We present a Deep Convolutional Neural Network architecture which serves as a
generic image-to-image regressor that can be trained end-to-end without any
further machinery. Our proposed architecture: the Recursively Branched
Deconvolutional Network (RBDN) develops a cheap multi-context image
representation very early on using an efficient recursive branching scheme with
extensive parameter sharing and learnable upsampling. This multi-context
representation is subjected to a highly non-linear locality preserving
transformation by the remainder of our network comprising of a series of
convolutions/deconvolutions without any spatial downsampling. The RBDN
architecture is fully convolutional and can handle variable sized images during
inference. We provide qualitative/quantitative results on (3) diverse tasks:
relighting, denoising and colorization and show that our proposed RBDN
architecture obtains comparable results to the state-of-the-art on each of
these tasks when used off-the-shelf without any post processing or
task-specific architectural modifications.
Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates
Comments: published as a workshop paper at NIPS 2016
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
In training speech recognition systems, labeling audio clips can be
expensive, and not all data is equally valuable. Active learning aims to label
only the most informative samples to reduce cost. For speech recognition,
confidence scores and other likelihood-based active learning methods have been
shown to be effective. Gradient-based active learning methods, however, are
still not well-understood. This work investigates the Expected Gradient Length
(EGL) approach in active learning for end-to-end speech recognition. We justify
EGL from a variance reduction perspective, and observe that EGL’s measure of
informativeness picks novel samples uncorrelated with confidence scores.
Experimentally, we show that EGL can reduce word errors by 11\%, or
alternatively, reduce the number of samples to label by 50\%, when compared to
random sampling.
Rajendra Rana Bhat, Vivek Viswanath, Xiaolin Li
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Genomics (q-bio.GN)
Transcriptional profiling on microarrays to obtain gene expressions has been
used to facilitate cancer diagnosis. We propose a deep generative machine
learning architecture (called DeepCancer) that learn features from unlabeled
microarray data. These models have been used in conjunction with conventional
classifiers that perform classification of the tissue samples as either being
cancerous or non-cancerous. The proposed model has been tested on two different
clinical datasets. The evaluation demonstrates that DeepCancer model achieves a
very high precision score, while significantly controlling the false positive
and false negative scores.
Christian Grussler, Pontus Giselsson
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
Optimization problems with rank constraints appear in many diverse fields
such as control, machine learning and image analysis. Since the rank constraint
is non-convex, these problems are often approximately solved via convex
relaxations. Nuclear norm regularization is the prevailing convexifying
technique for dealing with these types of problem. This paper introduces a
family of low-rank inducing norms and regularizers which includes the nuclear
norm as a special case. A posteriori guarantees on solving an underlying rank
constrained optimization problem with these convex relaxations are provided. We
evaluate the performance of the low-rank inducing norms on three matrix
completion problems. In all examples, the nuclear norm heuristic is
outperformed by convex relaxations based on other low-rank inducing norms. For
two of the problems there exist low-rank inducing norms that succeed in
recovering the partially unknown matrix, while the nuclear norm fails. These
low-rank inducing norms are shown to be representable as semi-definite programs
and to have cheaply computable proximal mappings. The latter makes it possible
to also solve problems of large size with the help of scalable first-order
methods. Finally, it is proven that our findings extend to the more general
class of atomic norms. In particular, this allows us to solve corresponding
vector-valued problems, as well as problems with other non-convex constraints.
Yuanwei Liu, Zhijin Qin, Maged Elkashlan, Yue Gao, Lajos Hanzo
Comments: IEEE Transactions on Wireless Communications, revision
Subjects: Information Theory (cs.IT)
This paper investigates the physical layer security of non-orthogonal
multiple access (NOMA) in large-scale networks with invoking stochastic
geometry. Both single-antenna and multiple-antenna aided transmission scenarios
are considered, where the base station (BS) communicates with randomly
distributed NOMA users. In the single-antenna scenario, we adopt a protected
zone around the BS to establish an eavesdropper-exclusion area with the aid of
careful channel-ordering of the NOMA users. In the multiple-antenna scenario,
artificial noise is generated at the BS for further improving the security of a
beamforming-aided system. In order to characterize the secrecy performance, we
derive new exact expressions of the security outage probability for both
single-antenna and multiple-antenna aided scenarios. To obtain further
insights, 1) for the single antenna scenario, we perform secrecy diversity
order analysis of the selected user pair. The analytical results derived
demonstrate that the secrecy diversity order is determined by the specific user
having the worse channel condition among the selected user pair; and 2) for the
multiple-antenna scenario, we derive the asymptotic secrecy outage probability,
when the number of transmit antennas tends to infinity. Monte Carlo simulations
are provided for verifying the analytical results derived and to show that:
i)~The security performance of the NOMA networks can be improved by invoking
the protected zone and by generating artificial noise at the BS; and ii)~The
asymptotic secrecy outage probability is close to the exact secrecy outage
probability.
Minglei You, Hongjian Sun, Jing Jiang, Jiayi Zhang
Subjects: Information Theory (cs.IT)
This paper proposes a unified framework for the effective rate analysis over
arbitrary correlated and not necessarily identical multiple inputs single
output (MISO) fading channels, which uses moment generating function (MGF)
based approach and H transform representation. The proposed framework has the
potential to simplify the cumbersome analysis procedure compared to the
probability density function (PDF) based approach. Moreover, the effective
rates over two specific fading scenarios are investigated, namely independent
but not necessarily identical distributed (i.n.i.d.) MISO hyper Fox’s H fading
channels and arbitrary correlated generalized K fading channels. The exact
analytical representations for these two scenarios are also presented. By
substituting corresponding parameters, the effective rates in various practical
fading scenarios, such as Rayleigh, Nakagami-m, Weibull/Gamma and generalized K
fading channels, are readily available. In addition, asymptotic approximations
are provided for the proposed H transform and MGF based approach as well as for
the effective rate over i.n.i.d. MISO hyper Fox’s H fading channels.
Simulations under various fading scenarios are also presented, which support
the validity of the proposed method.
Anastasia Lavrenko, Florian Roemer, Giovanni Del Galdo Member, Reiner Thomae
Comments: 4 pages + references
Subjects: Information Theory (cs.IT)
Compressed sensing is a sampling paradigm that allows to simultaneously
measure and compress signals that are sparse or compressible in some domain.
The choice of a sensing matrix that carries out the measurement has a defining
impact on the system performance and it is often advocated to draw its elements
randomly. It has been noted that in the presence of input (signal) noise, the
application of the sensing matrix causes SNR degradation due to the noise
folding effect. In fact, it might also result in the variations of the output
SNR in compressive measurements over the support of the input signal. In this
work, we study the impact of a distribution from which the elements of a
sensing matrix are drawn on the spread of the output SNR. We derive analytic
expressions for several common types of sensing matrices and show that SNR
variations can be significant, especially for lower dimensional problems.
Hence, this effect should not be ignored during system design and performance
evaluation.
Fan Liu, Christos Masouros, Ang Li, Jianming Zhou
Subjects: Information Theory (cs.IT)
In this letter, we consider the coexistence and spectrum sharing between
downlink multiple-input-multiple-output (MIMO) communication and a MIMO radar.
Based on the knowledge of the communication and interference channels, the
communication beamforming matrix is designed to minimize the transmit power
while guaranteeing the performance requirement of the radar and downlink
communication users. We investigate the beamforming design for perfect Channel
State Information (CSI) and the robust beamforming for imperfect CSI case.
Simulation results verify that the proposed approach facilitates the
coexistence between radar and communication links, and illustrates a scalable
trade-off of the two systems’ performance.
Zhanzhan Zhang, Zhiyong Chen, Hao Feng, Bin Xia, Weiliang Xie, Yong Zhao
Comments: submitted to IEEE Trans. Veh. Tech., Sep. 2016
Subjects: Information Theory (cs.IT)
It is starting to become a big trend in the era of social networking that
people produce and upload user-generated contents to Internet via wireless
networks, bringing a significant burden on wireless uplink networks. In this
paper, we contribute to designing and theoretical understanding of wireless
cache-enabled upload transmission in a delay-tolerant small cell network to
relieve the burden, and then propose the corresponding scheduling policies for
the small base station (SBS) under the infinite and finite cache sizes.
Specifically, the cache ability introduced by SBS enables SBS to eliminate the
redundancy among the upload contents from users. This strategy not only
alleviates the wireless backhual traffic congestion from SBS to a macro base
station (MBS), but also improves the transmission efficiency of SBS. We then
investigate the scheduling schemes of SBS to offload more data traffic under
caching size constraint. Moreover, two operational regions for the wireless
cache-enabled upload network, namely, the delay-limited region and the
cache-limited region, are established to reveal the fundamental tradeoff
between the delay tolerance and the cache ability. Finally, numerical results
are provided to demonstrate the significant performance gains of the proposed
wireless cache-enabled upload network.
Hongwei Liu, Maouche Youcef
Subjects: Information Theory (cs.IT)
For any prime (p), (lambda)-constacyclic codes of length (p^s) over ({cal
R}=mathbb{F}_{p^m} + umathbb{F}_{p^m}) are precisely the ideals of the local
ring ({cal R}_{lambda}=frac{{cal R}[x]}{leftlangle x^{p^s}-lambda
ight
angle}), where (u^2=0). In this paper, we first investigate the Hamming
distances of cyclic codes of length (p^s) over ({cal R}). The minimum Hamming
distances of all cyclic codes of length (p^s) over ({cal R}) are determined.
Moreover, an isometry between cyclic and (alpha)-constacyclic codes of length
(p^s) over ({cal R}) is established, where (alpha) is a nonzero element of
(mathbb{F}_{p^m}), which carries over the results regarding cyclic codes
corresponding to (alpha)-constacyclic codes of length (p^s) over ({cal R}).
Mohammad J. Khojasteh, Morteza H. Shoreh, Jawad A. Salehi
Comments: Communication and Information Theory (IWCIT), 2015 Iran Workshop on. IEEE, 2015
Subjects: Information Theory (cs.IT); Rings and Algebras (math.RA)
In this paper, we investigate PN-sequences with ideal autocorrelation
property and the consequences of this property on the number of +1s and -1s and
run structure of sequences. We begin by discussing and surveying about the
length of PNsequences with ideal autocorrelation property. From our discussion
and survey we introduce circulant matrix representation of PN-sequence. Through
circulant matrix representation we obtain system of non-linear equations that
lead to ideal autocorrelation property. Rewriting PN-sequence and its
autocorrelation property in {0,1} leads to a definition based on Hamming weight
and Hamming distance and hence we can easily prove some results on the
PN-sequences with ideal autocorrelation property.
Paul Hand, Vladislav Voroninski
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC); Probability (math.PR)
We consider the problem of phase retrieval from corrupted magnitude
observations. In particular we show that a fixed (x_0 in mathbb{R}^n) can be
recovered exactly from corrupted magnitude measurements (|langle a_i, x_0
angle | + eta_i, quad i =1,2ldots m) with high probability for (m = O(n)),
where (a_i in mathbb{R}^n) are i.i.d standard Gaussian and (eta in
mathbb{R}^m) has fixed sparse support and is otherwise arbitrary, by using a
version of the PhaseMax algorithm augmented with slack variables subject to a
penalty. This linear programming formulation, which we call RobustPhaseMax,
operates in the natural parameter space, and our proofs rely on a direct
analysis of the optimality conditions using concentration inequalities.
Sinem Unal, Aaron B. Wagner
Comments: Presented in part at the IEEE Int. Symposium on Information Theory (ISIT), Barcelona, July 2016 and submitted for presentation in part to Data Compression Conference (DCC), Snowbird, UT April, 2017. Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
We consider a rate-distortion problem with side information at multiple
decoders. Several upper and lower bounds have been proposed for this general
problem or special cases of it. We provide an upper bound for general instances
of this problem, which takes the form of a linear program, by utilizing random
binning and simultaneous decoding techniques and compare it with the existing
bounds. We also provide a lower bound for the general problem, which was
inspired by a linear-programming lower bound for index coding, and show that it
subsumes most of the lower bounds in literature. Using these upper and lower
bounds, we explicitly characterize the rate-distortion function of a problem
that can be seen as a Gaussian analogue of the “odd-cycle” index coding
problem.
Sinem Unal, Aaron B. Wagner
Comments: Presented in parts at the IEEE Int. Symposium on Information Theory (ISIT), Hawaii, July 2014, the Information Theory and Applications Workshop, Jan./Feb. 2016, and the Conference on Information Sciences and Systems (CISS), Princeton, March 2016. Submitted to IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
We consider rate-distortion with two decoders, each with distinct side
information. This problem is well understood when the side information at the
decoders satisfies a certain degradedness condition. We consider cases in which
this degradedness condition is violated but the source and the side information
consist of jointly Gaussian vectors. We provide a hierarchy of four lower
bounds on the optimal rate. These bounds are then used to determine the optimal
rate for several classes of instances.
Amin Gohari, Mahtab Mirmohseni, Masoumeh Nasiri-Kenari
Comments: Accepted for publication in the IEEE Transactions on Molecular, Biological, and Multi-Scale Communications as a Shannon Centennial Special Issue
Subjects: Information Theory (cs.IT)
Molecular Communication (MC) is a communication strategy that uses molecules
as carriers of information, and is widely used by biological cells. As an
interdisciplinary topic, it has been studied by biologists, communication
theorists and a growing number of information theorists. This paper aims to
specifically bring MC to the attention of information theorists. To do this, we
first highlight the unique mathematical challenges of studying the capacity of
molecular channels. Addressing these problems require use of known, or
development of new mathematical tools. Toward this goal, we review a subjective
selection of the existing literature on information theoretic aspect of
molecular communication. The emphasis here is on the mathematical techniques
used, rather than on the setup or modeling of a specific paper. Finally, as an
example, we propose a concrete information theoretic problem that was motivated
by our study of molecular communication.
Jianhua Mo, Robert W. Heath Jr
Comments: To appear in the Proceedings of 50th Asilomar Conference on Signals, Systems and Computers
Subjects: Information Theory (cs.IT)
We analyze limited feedback in systems where a multiple-antenna transmitter
sends signals to single-antenna receivers with finite-bit ADCs. If channel
state information (CSI) is not available with high resolution at the
transmitter and the precoding is not well designed, the inter-user interference
is a big decoding challenge for receivers with low-resolution quantization. In
this paper, we derive achievable rates with finite-bit ADCs and finite-bit CSI
feedback. The performance loss compared to the case with perfect CSI is then
analyzed. The results show that the number of bits per feedback should increase
linearly with the ADC resolution to restrict the loss.
Monika Polak, Eustrat Zhupa
Comments: 9 pages, 5 figures, 2 tables
Subjects: Information Theory (cs.IT)
In this article we present a construction of error correcting codes, that
have representation as very sparse matrices and belong to the class of Low
Density Parity Check Codes. LDPC codes are in the classical Hamming metric.
They are very close to well known Shannon bound. The ability to use graphs for
code construction was first discussed by Tanner in 1981 and has been used in a
number of very effective implementations. We describe how to construct such
codes by using special a family of graphs introduced by Ustimenko and Woldar.
Graphs that we used are bipartite, bi-regular, very sparse and do not have
short cycles C 4 . Due to the very low density of such graphs, the obtained
codes are fast decodable. We describe how to choose parameters to obtain a
desired code rate. We also show results of computer simulations of BER (bit
error rate) of the obtained codes in order to compare them with other known
LDPC codes.
Yongzhi Li, Cheng Tao, Amine Mezghani, A. Lee Swindlehurst, Gonzalo Seco-Granados, Liu Liu
Comments: 14 pages, 7 figures
Subjects: Information Theory (cs.IT)
This paper considers a single-cell massive multiple-input multiple-output
(MIMO) system equipped with a base station (BS) that uses one-bit quantization
and investigates the energy efficiency (EE) and spectral efficiency (SE)
trade-off by simultaneously looking at the uplink and downlink transmission. To
this end, we first propose a new precoding scheme and downlink power allocation
strategy, which makes the uplink-downlink SINR duality hold in the one-bit MIMO
systems. Then, by taking into account the effect of the imperfect channel state
information (CSI), we obtain approximate closed-form expressions for the uplink
achievable rate with maximum ratio combining (MRC) and zero-forcing (ZF)
receivers, which, according to the duality property, can also be achieved in
the downlink transmission. By employing the multiple objective optimization
(MOO) framework, we focus on the optimal design for the EE and SE trade-off
that jointly selects the number of active terminals, pilot training duration
and operating power to maximize both EE and SE. The weighted Chebyshev method
is used to obtain the Pareto boundary of EE and SE, which allows the system to
know all the possible operating points and balance the EE and SE in an
efficient way. In order to go beyond the Pareto boundary and actually solve the
MOO problem, this problem is transformed into a single-objective problem using
the a priori method such as the weighted product method, where the operating EE
and SE are chosen with equal importance by adjusting the weights. Numerical
results are presented to verify our analytical results and demonstrate the
fundamental tradeoff between EE and SE for different parameter settings.
Ahmed Ewaisha, Cihan Tepedelenlioglu
Comments: Asilomar 2016 (to appear). arXiv admin note: text overlap with arXiv:1601.00608, arXiv:1602.08010
Subjects: Information Theory (cs.IT)
We study an uplink multi secondary user (SU) cognitive radio system suffering
statistical heterogeneity among SUs’ channels. This heterogeneity may result in
differentiated delay performances to these SUs and result in harmful
interference to the PU. We first derive an explicit closed-form expression for
the average delay in terms of an arbitrary power-control policy. Then, we
propose a delay-optimal closed-form scheduling and power-control policy that
can provide the required average delay guarantees to all SUs besides protecting
the PU from harmful interference. We support our findings by extensive system
simulations and show that it outperforms existing policies substantially.
Ahmed Ewaisha, Cihan Tepedelenlioglu
Comments: Asilomar 2015. arXiv admin note: text overlap with arXiv:1602.08010
Subjects: Information Theory (cs.IT)
We study an uplink multi secondary user (SU) system having statistical delay
constraints, and an average interference constraint to the primary user (PU).
SUs with heterogeneous interference channel statistics, to the PU, experience
heterogeneous delay performances since SUs causing low interference are
scheduled more frequently than those causing high interference. We propose a
scheduling algorithm that can provide arbitrary average delay guarantees to SUs
irrespective of their statistical channel qualities. We derive the algorithm
using the Lyapunov technique and show that it yields bounded queues and satisfy
the interference constraints. Using simulations, we show its superiority over
the Max-Weight algorithm.
Kathryn Haymaker, Beth Malmskog, Gretchen Matthews
Subjects: Number Theory (math.NT); Information Theory (cs.IT)
We generalize the construction of locally recoverable codes on algebraic
curves given by Barg, Tamo and Vladut to those with arbitrarily many recovery
sets by exploiting the structure of fiber products of curves. Employing maximal
curves, we create several new families of locally recoverable codes with
multiple recovery sets, including codes with two recovery sets from the
generalized Giulietti and Korchm’aros (GK) curves and the Suzuki curves, and a
new locally recoverable code with many recovery sets based on the Hermitian
curve, using a fiber product construction of Van der Geer and Van der Vlugt. In
addition, we consider the relationship between local error recovery and global
error correction, and the availability required to locally recover any pattern
of a fixed number of erasures.
E.O. Kiktenko, A.S. Trushechkin, C.C.W. Lim, Y.V. Kurochkin, A.K. Fedorov
Comments: 10 pages, 4 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Quantum key distribution (QKD) is a quantum-proof key exchange scheme which
is fast approaching the communication industry. An essential component in QKD
is the information reconciliation step, which is used for correcting the
quantum channel noise errors. The recently suggested blind reconciliation
technique, based on low-density parity-check (LDPC) codes, offers remarkable
prospectives for efficient information reconciliation without an a priori error
rate estimation. In the present work, we suggest an improvement of the blind
information reconciliation protocol allowing significant increase the
efficiency of the procedure and reducing its interactivity. The proposed
technique is based on introducing symmetry in operations of parties, and the
consideration of results of unsuccessful belief propagation decodings.
Fedor Nazarov, Yuval Peres
Comments: 9 pages
Subjects: Probability (math.PR); Information Theory (cs.IT); Statistics Theory (math.ST)
In the trace reconstruction problem, an unknown bit string (x in {0,1}^n)
is observed through the deletion channel, which deletes each bit of (x) with
some constant probability (q), yielding a contracted string (widetilde{x}).
How many independent copies of (widetilde{x}) are needed to reconstruct (x)
with high probability? Prior to this work, the best upper bound, due to
Holenstein, Mitzenmacher, Panigrahy, and Wieder (2008), was
(exp(widetilde{O}(n^{1/2}))). We improve this bound to (exp(O(n^{1/3})))
using statistics of individual bits in the output and show that this bound is
sharp in the restricted model where this is the only information used. Our
method, that uses elementary complex analysis, can also handle insertions.
Michael Tschannen, Helmut Bölcskei
Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Sparsity-based subspace clustering algorithms have attracted significant
attention thanks to their excellent performance in practical applications. A
prominent example is the sparse subspace clustering (SSC) algorithm by
Elhamifar and Vidal, which performs spectral clustering based on an adjacency
matrix obtained by sparsely representing each data point in terms of all the
other data points via the Lasso. When the number of data points is large or the
dimension of the ambient space is high, the computational complexity of SSC
quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by
replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm
results in significantly lower computational complexity, while often yielding
comparable performance. The central goal of this paper is an analytical
performance characterization of SSC-OMP for noisy data. Moreover, we introduce
and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu
of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces
intersect and when the data points are contaminated by severe noise. The
clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for
SSC and for the thresholding-based subspace clustering (TSC) algorithm due to
Heckel and B”olcskei. Analytical results in combination with numerical results
indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion
automatically detect the dimensions of the subspaces underlying the data.
Moreover, experiments on synthetic and real data show that SSC-MP compares very
favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor (NSN)
algorithm, both in terms of clustering performance and running time. In
addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is
very robust with respect to the choice of parameters in the stopping criteria.
Wojciech Słomczyński, Anna Szczepanek
Comments: 8 double column pages, 4 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
We introduce two information-theoretical invariants for the projective
unitary group acting on a finite-dimensional complex Hilbert space: PVM- and
POVM-dynamical (quantum) entropies. They quantify the randomness of the
successive quantum measurement results in the case where the evolution of the
system between each two consecutive measurements is described by a given
unitary operator. We study the class of chaotic unitaries, i.e., the ones of
maximal entropy or, equivalently, such that they can be represented by suitably
rescaled complex Hadamard matrices in some orthonormal bases. We provide
necessary conditions for a unitary operator to be chaotic, which become also
sufficient for qubits and qutrits. These conditions are expressed in terms of
the relation between the trace and the determinant of the operator. We also
compute the volume of the set of chaotic unitaries in dimensions two and three,
and the average PVM-dynamical entropy over the unitary group in dimension two.
We prove that this mean value behaves as the logarithm of the dimension of the
Hilbert space, which implies that the probability that the dynamical entropy of
a unitary is almost as large as possible approaches unity as the dimension
tends to infinity.