Huiling Zhen, Shang-Nan Wang, Hai-Jun Zhou
Comments: 10 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)
Unsupervised learning in a generalized Hopfield associative-memory network is
investigated in this work. First, we prove that the (generalized) Hopfield
model is equivalent to a semi-restricted Boltzmann machine with a layer of
visible neurons and another layer of hidden binary neurons, so it could serve
as the building block for a multilayered deep-learning system. We then
demonstrate that the Hopfield network can learn to form a faithful internal
representation of the observed samples, with the learned memory patterns being
prototypes of the input data. Furthermore, we propose a spectral method to
extract a small set of concepts (idealized prototypes) as the most concise
summary or abstraction of the empirical data.
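As a hedged illustration of the classical Hopfield dynamics that this abstract generalizes (not the authors' semi-restricted Boltzmann machine formulation or their spectral concept extraction), here is a minimal NumPy sketch of Hebbian storage and retrieval; the network size and patterns are made up for the example:

    import numpy as np

    rng = np.random.default_rng(0)
    N, P = 64, 3                               # neurons and stored patterns (hypothetical sizes)
    patterns = rng.choice([-1, 1], size=(P, N))

    # Hebbian weights: W_ij = (1/N) * sum_mu xi_i^mu xi_j^mu, no self-coupling
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)

    def retrieve(state, steps=20):
        """Asynchronous sign-updates until the state settles into a memory."""
        s = state.copy()
        for _ in range(steps):
            for i in rng.permutation(N):
                s[i] = 1 if W[i] @ s >= 0 else -1
        return s

    # Corrupt one stored pattern and recover it
    noisy = patterns[0] * rng.choice([1, -1], size=N, p=[0.9, 0.1])
    recovered = retrieve(noisy)
    print("overlap with stored prototype:", recovered @ patterns[0] / N)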
Mahardhika Pratama, Plamen P. Angelov, Edwin Lughofer, Meng Joo Er
Comments: this paper is submitted for publication in Information Sciences
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
The theory of random vector functional link network (RVFLN) has provided a
breakthrough in the design of neural networks (NNs) since it conveys solid
theoretical justification of randomized learning. Existing works on RVFLN are
hardly scalable for data stream analytics because they suffer from complexity
issues resulting from the absence of structural learning scenarios. A novel
class of RVFLN, namely the parsimonious random vector functional
link network (pRVFLN), is proposed in this paper. pRVFLN features an open
structure paradigm where its network structure can be built from scratch and
can be automatically generated in accordance with the degree of nonlinearity
and the time-varying properties of the system being modelled. pRVFLN is equipped with
complexity reduction scenarios where inconsequential hidden nodes can be pruned
and input features can be dynamically selected. pRVFLN incorporates an online
active learning mechanism which expedites the training process and reduces
operator labelling effort. In addition, pRVFLN introduces a
non-parametric type of hidden node, developed using an interval-valued data
cloud. The hidden node completely reflects the real data distribution and is
not constrained by a specific shape of the cluster. All learning procedures of
pRVFLN follow a strictly single-pass learning mode, which is applicable for an
online real-time deployment. The efficacy of pRVFLN was rigorously validated
through numerous simulations and comparisons with state-of-the-art algorithms,
where it produced the most encouraging numerical results. Furthermore, the
robustness of pRVFLN was investigated, leading to a new conclusion about the
scope of the random parameters, which plays a vital role in the success of
randomized learning.
Mengyuan Wu, Ke Li, Sam Kwong, Qingfu Zhang
Comments: 24 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE)
The decomposition-based method has been recognized as a major approach for
multi-objective optimization. It decomposes a multi-objective optimization
problem into several single-objective optimization subproblems, each of which
is usually defined as a scalarizing function using a weight vector. Due to the
characteristics of the contour lines of a particular scalarizing function, the
performance of a decomposition-based method that merely uses a single
scalarizing function strongly depends on the shape of the Pareto front,
especially when facing a large number of objectives. To improve the flexibility of the
decomposition-based method, this paper develops an adversarial decomposition
method that leverages the complementary characteristics of two different
scalarizing functions within a single paradigm. More specifically, we maintain
two co-evolving populations simultaneously by using different scalarizing
functions. In order to avoid allocating redundant computational resources to
the same region of the Pareto front, we stably match these two co-evolving
populations into one-to-one solution pairs according to their working regions on
the Pareto front. Then, each solution pair can at most contribute one mating
parent during the mating selection process. Compared with nine
state-of-the-art many-objective optimizers, our proposed algorithm shows
competitive performance on 130 many-objective test instances with various
characteristics and Pareto front shapes.
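For reference, two scalarizing functions commonly used in decomposition-based methods are sketched below; the paper pairs two complementary scalarizing functions, but the specific choices shown here (weighted sum and weighted Tchebycheff) and the example vectors are illustrative assumptions, not necessarily the authors' pair:

    import numpy as np

    def weighted_sum(f, w):
        """g_ws(x | w) = sum_i w_i * f_i(x); contour lines are hyperplanes."""
        return np.dot(w, f)

    def tchebycheff(f, w, z_star):
        """g_tch(x | w, z*) = max_i w_i * |f_i(x) - z*_i|; contour lines are L-inf balls."""
        return np.max(w * np.abs(f - z_star))

    f = np.array([0.4, 0.7])        # objective vector of a candidate solution (made up)
    w = np.array([0.5, 0.5])        # weight vector defining the subproblem
    z_star = np.zeros(2)            # ideal point
    print(weighted_sum(f, w), tchebycheff(f, w, z_star))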
Martin Simonovsky, Nikos Komodakis
Comments: Accepted to CVPR 2017; extended version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A number of problems can be formulated as prediction on graph-structured
data. In this work, we generalize the convolution operator from regular grids
to arbitrary graphs while avoiding the spectral domain, which allows us to
handle graphs of varying size and connectivity. To move beyond a simple
diffusion, filter weights are conditioned on the specific edge labels in the
neighborhood of a vertex. Together with the proper choice of graph coarsening,
we explore constructing deep neural networks for graph classification. In
particular, we demonstrate the generality of our formulation in point cloud
classification, where we set the new state of the art, and on a graph
classification dataset, where we outperform other deep learning approaches.
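A hedged sketch of the core idea (filter weights generated from edge labels by a small network, then averaged over the neighborhood); the layer sizes, the toy linear filter-generating network and the example graph are illustrative, not the authors' exact architecture:

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, d_edge = 8, 16, 4            # hypothetical feature / edge-label sizes

    # Toy "filter-generating network": maps an edge label to a d_out x d_in matrix.
    F = rng.normal(scale=0.1, size=(d_out * d_in, d_edge))

    def edge_conditioned_conv(X, neighbors, edge_labels):
        """X: (num_nodes, d_in); neighbors[i]: list of j; edge_labels[(i, j)]: (d_edge,)."""
        out = np.zeros((X.shape[0], d_out))
        for i, nbrs in enumerate(neighbors):
            if not nbrs:
                continue
            for j in nbrs:
                Theta = (F @ edge_labels[(i, j)]).reshape(d_out, d_in)  # edge-specific weights
                out[i] += Theta @ X[j]
            out[i] /= len(nbrs)                                         # average over neighborhood
        return out

    X = rng.normal(size=(3, d_in))
    neighbors = [[1, 2], [0], [0]]
    edge_labels = {(0, 1): rng.normal(size=d_edge), (0, 2): rng.normal(size=d_edge),
                   (1, 0): rng.normal(size=d_edge), (2, 0): rng.normal(size=d_edge)}
    print(edge_conditioned_conv(X, neighbors, edge_labels).shape)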
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The purported “black box” nature of neural networks is a barrier to adoption
in applications where interpretability is essential. Here we present DeepLIFT
(Deep Learning Important FeaTures), a method for decomposing the output
prediction of a neural network on a specific input by backpropagating the
contributions of all neurons in the network to every feature of the input.
DeepLIFT compares the activation of each neuron to its ‘reference activation’
and assigns contribution scores according to the difference. By optionally
giving separate consideration to positive and negative contributions, DeepLIFT
can also reveal dependencies which are missed by other approaches. Scores can
be computed efficiently in a single backward pass. We apply DeepLIFT to models
trained on MNIST and simulated genomic data, and show significant advantages
over gradient-based methods. A detailed video tutorial on the method is at
this http URL and code is at this http URL
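As a rough, non-authoritative illustration of the reference-based idea (contributions computed from activation differences rather than gradients), here is a toy single-linear-layer version; the real DeepLIFT rules for nonlinear units are more involved, and the weights and reference below are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(1, 4)), np.zeros(1)       # toy linear model (made-up weights)

    def linear_contributions(x, x_ref):
        """Split the output difference from a reference input into per-feature scores.
        For a linear unit, the contribution of feature i is W_i * (x_i - x_ref_i),
        and the scores sum exactly to f(x) - f(x_ref)."""
        return (W * (x - x_ref)).ravel()

    x      = np.array([1.0, 0.5, -0.3, 2.0])
    x_ref  = np.zeros(4)                              # e.g. an all-background reference
    scores = linear_contributions(x, x_ref)
    assert np.isclose(scores.sum(), float(W @ x + b) - float(W @ x_ref + b))
    print(scores)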
Vincenzo Liguori
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce
the computational cost for a variety of neural networks (NNs) while, at the
same time, compressing the weights that describe them. This is based on the
fact that the dot product between an N dimensional vector of real numbers and
an N dimensional PVQ vector can be calculated with only additions and
subtractions and one multiplication. This is advantageous since tensor
products, commonly used in NNs, can be reduced to a dot product or a set of
dot products. Finally, it is stressed that any NN architecture based on
an operation that can be reduced to a dot product can benefit from the
techniques described here.
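A hedged sketch of why the dot product is cheap: a PVQ codeword has small integer entries (sum of absolute values equal to K), so the dot product reduces to repeated signed additions of the input components followed by a single multiplication by the codeword's scale. The vectors and scale below are made up for illustration:

    import numpy as np

    def pvq_dot(x, q_int, scale):
        """Dot product of a real vector x with a PVQ-quantized vector scale * q_int,
        where q_int has integer entries with sum(|q_int|) = K.
        Accumulate x_i added/subtracted |q_int_i| times, then multiply once."""
        acc = 0.0
        for xi, qi in zip(x, q_int):
            for _ in range(abs(int(qi))):          # only additions and subtractions
                acc = acc + xi if qi > 0 else acc - xi
        return scale * acc                         # the single multiplication

    x = np.array([0.2, -1.0, 0.5, 0.3])
    q_int = np.array([2, 0, -1, 1])                # K = sum(|q_int|) = 4
    scale = 0.37                                   # per-vector scale from quantization
    assert np.isclose(pvq_dot(x, q_int, scale), float(x @ (scale * q_int)))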
Samuel Rota Bulò, Gerhard Neuhold, Peter Kontschieder
Comments: accepted at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We introduce a novel loss max-pooling concept for handling imbalanced
training data distributions, applicable as alternative loss layer in the
context of deep neural networks for semantic image segmentation. Most
real-world semantic segmentation datasets exhibit long tail distributions with
few object categories comprising the majority of data and consequently biasing
the classifiers towards them. Our method adaptively re-weights the
contributions of each pixel based on their observed losses, targeting
under-performing classification results as often encountered for
under-represented object classes. Our approach goes beyond conventional
cost-sensitive learning attempts through adaptive considerations that allow us
to indirectly address both inter- and intra-class imbalances. We provide a
theoretical justification of our approach, complementary to experimental
analyses on benchmark datasets. In our experiments on the Cityscapes and Pascal
VOC 2012 segmentation datasets we find consistently improved results,
demonstrating the efficacy of our approach.
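As a rough sketch of the re-weighting idea (not the authors' exact formulation, which is derived from a concave upper bound on the loss), the per-pixel losses can be re-weighted so that the highest-loss pixels dominate the training signal, e.g. by averaging only over the worst p fraction of pixels; the data and the fraction p are illustrative:

    import numpy as np

    def loss_max_pooling(pixel_losses, p=0.25):
        """Average the per-pixel losses over the worst p fraction of pixels only.
        pixel_losses: flat array of non-negative per-pixel losses."""
        k = max(1, int(p * pixel_losses.size))
        worst = np.sort(pixel_losses)[-k:]         # the k highest losses
        return worst.mean()

    losses = np.random.default_rng(0).exponential(size=1000)   # made-up per-pixel losses
    print("plain mean:", losses.mean(), " max-pooled:", loss_max_pooling(losses))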
Adrian Albert, Jasleen Kaur, Marta Gonzalez
Comments: 18 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Urban planning applications (energy audits, investment, etc.) require an
understanding of built infrastructure and its environment, i.e., both
low-level, physical features (amount of vegetation, building area and geometry
etc.), as well as higher-level concepts such as land use classes (which encode
expert understanding of socio-economic end uses). This kind of data is
expensive and labor-intensive to obtain, which limits its availability
(particularly in developing countries). We analyze patterns in land use in
urban neighborhoods using large-scale satellite imagery data (which is
available worldwide from third-party providers) and state-of-the-art computer
vision techniques based on deep convolutional neural networks. For supervision,
given the limited availability of standard benchmarks for remote-sensing data,
we obtain ground truth land use class labels carefully sampled from open-source
surveys, in particular the Urban Atlas land classification dataset of 20 land
use classes across ~300 European cities. We use this data to train and
compare deep architectures which have recently shown good performance on
standard computer vision tasks (image classification and segmentation),
including on geospatial data. Furthermore, we show that the deep
representations extracted from satellite imagery of urban environments can be
used to compare neighborhoods across several cities. We make our dataset
available for other machine learning researchers to use for remote-sensing
applications.
Weifeng Chen, Donglai Xiang, Jia Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We study the problem of single-image depth estimation for images in the wild.
We collect human annotated surface normals and use them to train a neural
network that directly predicts pixel-wise depth. We propose two novel loss
functions for training with surface normal annotations. Experiments on NYU
Depth and our own dataset demonstrate that our approach can significantly
improve the quality of depth estimation in the wild.
Björn Barz, Erik Rodner, Christoph Käding, Joachim Denzler
Comments: Technical Report about the possibilities introduced with ARTOS v2, originally created March 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We combine features extracted from pre-trained convolutional neural networks
(CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of
both: the high detection performance of CNNs, automatic feature engineering,
fast model learning from few training samples and efficient sliding-window
detection. The Adaptive Real-Time Object Detection System (ARTOS) has been
refactored broadly to be used in combination with Caffe for the experimental
studies reported in this work.
Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
This paper describes an intuitive generalization to the Generative
Adversarial Networks (GANs) to generate samples while capturing diverse modes
of the true data distribution. Firstly, we propose a very simple and intuitive
multi-agent GAN architecture that incorporates multiple generators capable of
generating samples from high probability modes. Secondly, in order to enforce
different generators to generate samples from diverse modes, we propose two
extensions to the standard GAN objective function. (1) We augment the
generator-specific GAN objective function with a diversity-enforcing term that
encourages different generators to generate diverse samples using a
user-defined similarity-based function. (2) We modify the discriminator
objective function so that, along with distinguishing real from fake samples,
the discriminator has to predict which generator produced a given fake sample. Intuitively, in
order to succeed in this task, the discriminator must learn to push different
generators towards different identifiable modes. Our framework is generalizable
in the sense that it can be easily combined with other existing variants of
GANs to produce diverse samples. Experimentally we show that our framework is
able to produce high quality diverse samples for challenging tasks such as
image/face generation and image-to-image translation. We also show that it is
capable of learning a better feature representation in an unsupervised setting.
Roy R. Lederman, Amit Singer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Single particle cryo-electron microscopy (EM) is an increasingly popular
method for determining the 3-D structure of macromolecules from noisy 2-D
images of single macromolecules whose orientations and positions are random and
unknown. One of the great opportunities in cryo-EM is to recover the structure
of macromolecules in heterogeneous samples, where multiple types or multiple
conformations are mixed together. Indeed, in recent years, many tools have been
introduced for the analysis of multiple discrete classes of molecules mixed
together in a cryo-EM experiment. However, many interesting structures have a
continuum of conformations which do not fit discrete models nicely; the
analysis of such continuously heterogeneous models has remained a more elusive
goal. In this manuscript, we propose to represent heterogeneous molecules and
similar structures as higher dimensional objects. We generalize the basic
operations used in many existing reconstruction algorithms, making our approach
generic in the sense that, in principle, existing algorithms can be adapted to
reconstruct those higher dimensional objects. As proof of concept, we present a
prototype of a new algorithm which we use to solve simulated reconstruction
problems.
Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell
Comments: Accepted to CVPR 2017. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we introduce a new video representation for action
classification that aggregates local convolutional features across the entire
spatio-temporal extent of the video. We do so by integrating state-of-the-art
two-stream networks with learnable spatio-temporal feature aggregation. The
resulting architecture is end-to-end trainable for whole-video classification.
We investigate different strategies for pooling across space and time and
combining signals from the different streams. We find that: (i) it is important
to pool jointly across space and time, but (ii) appearance and motion streams
are best aggregated into their own separate representations. Finally, we show
that our representation outperforms the two-stream base architecture by a large
margin (13% relative) and outperforms other baselines with comparable
base architectures on HMDB51, UCF101, and Charades video classification
benchmarks.
Partha Ghosh, Jie Song, Emre Aksan, Otmar Hilliges
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a new architecture for the learning of predictive spatio-temporal
motion models from data alone. Our approach, dubbed the Dropout Autoencoder
LSTM, is capable of synthesizing natural looking motion sequences over long
time horizons without catastrophic drift or motion degradation. The model
consists of two components, a 3-layer recurrent neural network to model
temporal aspects and a novel auto-encoder that is trained to implicitly recover
the spatial structure of the human skeleton via randomly removing information
about joints during training time. This Dropout Autoencoder (D-AE) is then used
to filter each predicted pose of the LSTM, reducing accumulation of error and
hence drift over time. Furthermore, we propose new evaluation protocols to
assess the quality of synthetic motion sequences even when no ground truth
data exists. The proposed protocols can be used to assess generated sequences
of arbitrary length. Finally, we evaluate our proposed method on two of the
largest motion-capture datasets available to date and show that our model
outperforms the state-of-the-art on a variety of actions, including cyclic and
acyclic motion, and that it can produce natural looking sequences over longer
time horizons than previous methods.
Estefania Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, Petia Radeva
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a new method for egocentric video temporal
segmentation based on integrating a statistical mean change detector and
agglomerative clustering (AC) within an energy-minimization framework. Given the
tendency of most AC methods to oversegment video sequences when clustering
their frames, we combine the clustering with a concept drift detection
technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as
a statistical upper bound for the clustering-based video segmentation. We
integrate both techniques in an energy-minimization framework that serves to
disambiguate the decision of both techniques and to complete the segmentation
taking into account the temporal continuity of video frames descriptors. We
present experiments over egocentric sets of more than 13,000 images acquired
with different wearable cameras, showing that our method outperforms
state-of-the-art clustering methods.
Xiangteng He, Yuxin Peng
Comments: 9 pages, to appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-grained image classification is a challenging task due to the large
intra-class variance and small inter-class variance, aiming at recognizing
hundreds of sub-categories belonging to the same basic-level category. Most
existing fine-grained image classification methods generally learn part
detection models to obtain the semantic parts for better classification
accuracy. Despite achieving promising results, these methods mainly have two
limitations: (1) not all the parts obtained through the part detection
models are beneficial and indispensable for classification, and (2)
fine-grained image classification requires more detailed visual descriptions
which could not be provided by the part locations or attribute annotations. To
address the above two limitations, this paper proposes a two-stream model
combining vision and language (CVL) for learning latent semantic representations.
The vision stream learns deep representations from the original visual
information via a deep convolutional neural network. The language stream utilizes
the natural language descriptions which could point out the discriminative
parts or characteristics for each image, and provides a flexible and compact
way of encoding the salient visual aspects for distinguishing sub-categories.
Since the two streams are complementary, combining them further improves
classification accuracy. Compared with 12 state-of-the-art methods on the
widely used CUB-200-2011 dataset for fine-grained image classification, the
experimental results demonstrate that our CVL approach achieves the best
performance.
Spyridon Thermos, Georgios Th. Papadopoulos, Petros Daras, Gerasimos Potamianos
Comments: 9 pages, 7 figures, dataset link included, accepted to CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
It is well-established by cognitive neuroscience that human perception of
objects constitutes a complex process, where object appearance information is
combined with evidence about the so-called object “affordances”, namely the
types of actions that humans typically perform when interacting with them. This
fact has recently motivated the “sensorimotor” approach to the challenging task
of automatic object recognition, where both information sources are fused to
improve robustness. In this work, the aforementioned paradigm is adopted,
surpassing current limitations of sensorimotor object recognition research.
Specifically, the deep learning paradigm is introduced to the problem for the
first time, developing a number of novel neuro-biologically and
neuro-physiologically inspired architectures that utilize state-of-the-art
neural networks for fusing the available information sources in multiple ways.
The proposed methods are evaluated using a large RGB-D corpus, which is
specifically collected for the task of sensorimotor object recognition and is
made publicly available. Experimental results demonstrate the utility of
affordance information to object recognition, achieving an up to 29% relative
error reduction by its inclusion.
Laura Leal-Taixé, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Standardized benchmarks are crucial for the majority of computer vision
applications. Although leaderboards and ranking tables should not be
over-claimed, benchmarks often provide the most objective measure of
performance and are therefore important guides for research. We present a
benchmark for Multiple Object Tracking, launched in late 2014, with the goal
of creating a framework for the standardized evaluation of multiple object
tracking methods. This paper collects the two releases of the benchmark made so
far, and provides an in-depth analysis of almost 50 state-of-the-art trackers
that were tested on over 11000 frames. We show the current trends and
weaknesses of multiple people tracking methods, and provide pointers of what
researchers should be focusing on to push the field forward.
Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia
Comments: 9 pages, submitted to conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Previous CNN-based video super-resolution approaches need to align multiple
frames to the reference. In this paper, we show that proper frame alignment and
motion compensation is crucial for achieving high quality results. We
accordingly propose a 'sub-pixel motion compensation' (SPMC) layer in a CNN
framework. Analysis and experiments show the suitability of this layer in video
SR. The final end-to-end, scalable CNN framework effectively incorporates the
SPMC layer and fuses multiple frames to reveal image details. Our
implementation can generate visually and quantitatively high-quality results,
superior to the current state of the art, without the need for parameter tuning.
Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
Comments: Accepted in IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a principled approach to uncover the structure of visual data by
solving a novel deep learning task coined visual permutation learning. The goal
of this task is to find the permutation that recovers the structure of data
from shuffled versions of it. In the case of natural images, this task boils
down to recovering the original image from patches shuffled by an unknown
permutation matrix. Unfortunately, permutation matrices are discrete, thereby
posing difficulties for gradient-based methods. To this end, we resort to a
continuous approximation of these matrices using doubly-stochastic matrices
which we generate from standard CNN predictions using Sinkhorn iterations.
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet,
an end-to-end CNN model for this task. The utility of DeepPermNet is
demonstrated on two challenging computer vision problems, namely, (i) relative
attributes learning and (ii) self-supervised representation learning. Our
results show state-of-the-art performance on the Public Figures and OSR
benchmarks for (i) and on the classification and segmentation tasks on the
PASCAL VOC dataset for (ii).
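For context, Sinkhorn iterations turn a non-negative score matrix into an (approximately) doubly-stochastic one by alternating row and column normalization; the unrolled, differentiable layer in DeepPermNet follows the same scheme. A minimal NumPy sketch with made-up scores:

    import numpy as np

    def sinkhorn(scores, n_iters=20):
        """Project a matrix of scores towards the set of doubly-stochastic matrices
        by exponentiating and then alternately normalizing rows and columns."""
        P = np.exp(scores)                       # ensure positivity
        for _ in range(n_iters):
            P /= P.sum(axis=1, keepdims=True)    # rows sum to 1
            P /= P.sum(axis=0, keepdims=True)    # columns sum to 1
        return P

    scores = np.random.default_rng(0).normal(size=(4, 4))   # e.g. CNN patch-position scores
    P = sinkhorn(scores)
    print(P.sum(axis=0), P.sum(axis=1))          # both close to all-ones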
Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)
Many modern computer vision and machine learning applications rely on solving
difficult optimization problems that involve non-differentiable objective
functions and constraints. The alternating direction method of multipliers
(ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
generalization of ADMM that often achieves better performance, but its
efficiency depends strongly on algorithm parameters that must be chosen by an
expert user. We propose an adaptive method that automatically tunes the key
algorithm parameters to achieve optimal performance without user oversight.
Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
(ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
detailed convergence analysis of ARADMM is provided, and numerical results on
several applications demonstrate fast practical convergence.
Lei Bi, Jinman Kim, Ashnil Kumar, Dagan Feng
Comments: Submission for 2017 ISBI LiTS Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation of liver lesions is a fundamental requirement towards
the creation of computer aided diagnosis (CAD) and decision support systems
(CDS). Traditional segmentation approaches depend heavily upon hand-crafted
features and a priori knowledge of the user. As such, these methods are
difficult to adopt within a clinical environment. Recently, deep learning
methods based on fully convolutional networks (FCNs) have been successful in
many segmentation problems primarily because they leverage a large labelled
dataset to hierarchically learn the features that best correspond to the
shallow visual appearance as well as the deep semantics of the areas to be
segmented. However, FCNs based on a 16 layer VGGNet architecture have limited
capacity to add additional layers. Therefore, it is challenging to learn more
discriminative features among different classes for FCNs. In this study, we
overcome these limitations using deep residual networks (ResNet) to segment
liver lesions. ResNets contain skip connections between convolutional layers,
which solve the problem of the degradation of training accuracy in very deep
networks and thereby enable the use of additional layers for learning more
discriminative features. In addition, we achieve more precise
boundary definitions through a novel cascaded ResNet architecture with
multi-scale fusion to gradually learn and infer the boundaries of both the
liver and the liver lesions. Our proposed method achieved 4th place in the ISBI
2017 Liver Tumor Segmentation Challenge by the submission deadline.
Rodney LaLonde, Dong Zhang, Mubarak Shah
Comments: Under review at the International Conference on Computer Vision (ICCV), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Multiple object detection in wide area aerial videos has drawn the attention
of the computer vision research community for a number of years. A novel
framework is proposed in this paper using a fully convolutional deep neural
network, which is able to detect all objects simultaneously for a given region
of interest. The network is designed to accept multiple video frames at a time
as the input and yields detection results for all objects in the temporally
center frame. This multi-frame approach yields far better results than its
single frame counterpart. Additionally, the proposed method can detect vehicles
which are slowing, stopped, and/or partially or fully occluded during some
frames, which cannot be handled by nearly all state-of-the-art methods. To the
best of our knowledge, this is the first use of a multiple-frame, fully
convolutional deep model for detecting multiple small objects and the only
framework which can detect stopped and temporarily occluded vehicles, for
aerial videos. The proposed network exceeds state-of-the-art results
significantly on the WPAFB 2009 dataset.
Kaveh Fathian, J. Pablo Ramirez-Paredes, Emily A. Doucette, J. Willard Curtis, Nicholas R. Gans
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We present a novel solution to the camera pose estimation problem, where
rotation and translation of a camera between two views are estimated from
matched feature points in the images. The camera pose estimation problem is
traditionally solved via algorithms that are based on the essential matrix or
the Euclidean homography. With six or more feature points in general positions
in the space, essential matrix based algorithms can recover a unique solution.
However, such algorithms fail when points are on critical surfaces (e.g.,
coplanar points) and homography should be used instead. By formulating the
problem in quaternions and decoupling the rotation and translation estimation,
our proposed algorithm works for all point configurations. Using both simulated
and real world images, we compare the estimation accuracy of our algorithm with
some of the most commonly used algorithms. Our method is shown to be more
robust to noise and outliers. For the benefit of the community, we have made
the implementation of our algorithm freely available online.
Shanxin Yuan, Qi Ye, Bjorn Stenger, Siddhand Jain, Tae-Kyun Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we introduce a large-scale hand pose dataset, collected using a
novel capture method. Existing datasets are either generated synthetically or
captured using depth sensors: synthetic datasets exhibit a certain level of
appearance difference from real depth images, and real datasets are limited in
quantity and coverage, mainly due to the difficulty of annotating them. We
propose a tracking system with six 6D magnetic sensors and inverse kinematics
to automatically obtain 21-joint hand pose annotations of depth maps captured
with minimal restriction on the range of motion. The capture protocol aims to
fully cover the natural hand pose space. As shown in embedding plots, the new
dataset exhibits a significantly wider and denser range of hand poses compared
to existing benchmarks. Current state-of-the-art methods are evaluated on the
dataset, and we demonstrate significant improvements in cross-benchmark
performance. We also show significant improvements in egocentric hand pose
estimation with a CNN trained on the new dataset.
Hongsong Wang, Liang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, skeleton based action recognition has gained popularity due to
cost-effective depth sensors coupled with real-time skeleton estimation
algorithms. Traditional approaches based on handcrafted features are limited to
represent the complexity of motion patterns. Recent methods that use Recurrent
Neural Networks (RNN) to handle raw skeletons only focus on the contextual
dependency in the temporal domain and neglect the spatial configurations of
articulated skeletons. In this paper, we propose a novel two-stream RNN
architecture to model both temporal dynamics and spatial configurations for
skeleton based action recognition. We explore two different structures for the
temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
according to human body kinematics. We also propose two effective methods to
model the spatial structure by converting the spatial graph into a sequence of
joints. To improve generalization of our model, we further exploit 3D
transformation based data augmentation techniques including rotation and
scaling transformation to transform the 3D coordinates of skeletons during
training. Experiments on 3D action recognition benchmark datasets show that our
method brings a considerable improvement for a variety of actions, i.e.,
generic actions, interaction activities and gestures.
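A minimal sketch of the kind of 3D-transformation augmentation mentioned above (a random rotation about the vertical axis plus isotropic scaling of the joint coordinates); the axis choice, parameter ranges and joint count are illustrative assumptions rather than the paper's exact settings:

    import numpy as np

    def augment_skeleton(joints, rng):
        """joints: (num_joints, 3) array of 3D coordinates for one frame."""
        theta = rng.uniform(-np.pi / 6, np.pi / 6)      # random rotation about the y axis
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[ c, 0.0,  s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0,  c]])
        scale = rng.uniform(0.9, 1.1)                   # random isotropic scaling
        return scale * joints @ R.T

    rng = np.random.default_rng(0)
    skeleton = rng.normal(size=(25, 3))                 # e.g. 25 joints (made-up data)
    print(augment_skeleton(skeleton, rng).shape)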
Xin Chen, Emma Marriott, Yuling Yan
Comments: 4 pages 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, high-speed videoendoscopy (HSV) has significantly aided the
diagnosis of voice pathologies and furthered the understanding of voice
production. As the first step of these studies, automatic segmentation of
glottal images still presents a major challenge for this
technique. In this paper, we propose an improved Saliency Network that
automatically delineates the contour of the glottis from HSV image sequences.
Our proposed additional saliency measure, Motion Saliency (MS), improves upon
the original Saliency Network by using the velocities of defined edges. In our
results and analysis, we demonstrate the effectiveness of our approach and
discuss its potential applications for computer-aided assessment of voice
pathologies and understanding voice production.
Leonardo Galteri, Lorenzo Seidenari, Marco Bertini, Alberto Del Bimbo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Compression artifacts arise in images whenever a lossy compression algorithm
is applied. These artifacts eliminate details present in the original image, or
add noise and small structures; because of these effects they make images less
pleasant for the human eye, and may also lead to decreased performance of
computer vision algorithms such as object detectors. To eliminate such
artifacts, when decompressing an image, it is required to recover the original
image from a disturbed version. To this end, we present a feed-forward fully
convolutional residual network model that directly optimizes the Structural
Similarity (SSIM), which is a better loss than the simpler Mean
Squared Error (MSE). We then build on the same architecture to re-formulate the
problem in a generative adversarial framework. Our GAN is able to produce
images with more photorealistic details than MSE or SSIM based networks.
Moreover we show that our approach can be used as a pre-processing step for
object detection when images are degraded by compression to a point where
state-of-the-art detectors fail. In this task, our GAN method obtains better
performance than MSE or SSIM trained networks.
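For reference, a simplified global SSIM between two image patches is shown below; the paper optimizes a differentiable, windowed SSIM inside the network, so this plain NumPy version only illustrates the underlying formula, and the test images are synthetic:

    import numpy as np

    def ssim_global(x, y, data_range=1.0):
        """Single-window SSIM over two patches x, y with values in [0, data_range]."""
        C1, C2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx**2 + my**2 + C1) * (vx + vy + C2))

    rng = np.random.default_rng(0)
    clean = rng.random((32, 32))
    noisy = np.clip(clean + 0.05 * rng.normal(size=(32, 32)), 0, 1)
    print("SSIM:", ssim_global(clean, noisy))           # close to 1 for similar patches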
Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal
Comments: 11 pages, 4 figures, accepted in CVPR 2017 (poster)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We study the problem of answering questions about images in the harder
setting, where the test questions and corresponding images contain novel
objects, which were not queried about in the training data. Such a setting is
inevitable in the real world: owing to the heavy-tailed distribution of the
visual categories, some objects would not be annotated in the training set. We
show that the performance of two popular existing methods drops
significantly (up to 28%) when evaluated on novel objects cf. known objects. We
propose methods which use large existing external corpora of (i) unlabeled
text, i.e. books, and (ii) images tagged with classes, to achieve novel object
based visual question answering. We do systematic empirical studies, for both
an oracle case where the novel objects are known textually, as well as a fully
automatic case without any explicit knowledge of the novel objects, but with
the minimal assumption that the novel objects are semantically related to the
existing objects in training. The proposed methods for novel object based
visual question answering are modular and can potentially be used with many
visual question answering architectures. We show consistent improvements with
the two popular architectures and give qualitative analysis of the cases where
the model does well and of those where it fails to bring improvements.
Zili Yi, Hao Zhang, Ping Tan, Minglun Gong
Comments: First submitted to ICCV on Mar 9, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Using conditional Generative Adversarial Network (conditional GAN) for
cross-domain image-to-image translation has achieved significant improvements
in the past year. Depending on the degree of task complexity, thousands or even
millions of labeled image pairs are needed to train conditional GANs. However,
human labeling is very expensive and sometimes impractical. Inspired by the
success of the dual learning paradigm in natural language translation, we develop a
novel dual-GAN mechanism, which enables image translators to be trained from
two sets of unlabeled images each representing a domain. In our architecture,
the primal GAN learns to translate images from domain U to those in domain
V, while the dual GAN learns to convert images from V to U. The closed
loop made by the primal and dual tasks allows images from either domain to be
translated and then reconstructed. Hence a loss function that accounts for the
reconstruction error of images can be used to train the translation models.
Experiments on multiple image translation tasks with unlabeled data show
considerable performance gain of our dual-GAN architecture over a single GAN.
For some tasks, our model can even achieve results comparable to or slightly
better than a conditional GAN trained on fully labeled data.
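A hedged PyTorch-style sketch of the closed-loop reconstruction loss described above; the actual DualGAN objective also includes the two adversarial terms, and the toy linear "translators" below stand in for the real image-to-image generators:

    import torch
    import torch.nn as nn

    # Toy translators between 64-dimensional "images" in domains U and V (illustrative).
    G_UV = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    G_VU = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    l1 = nn.L1Loss()

    def reconstruction_loss(u_batch, v_batch):
        """Translate each domain to the other and back; penalize the round-trip error."""
        u_rec = G_VU(G_UV(u_batch))         # U -> V -> U
        v_rec = G_UV(G_VU(v_batch))         # V -> U -> V
        return l1(u_rec, u_batch) + l1(v_rec, v_batch)

    u = torch.randn(8, 64)                  # unlabeled samples from domain U (made up)
    v = torch.randn(8, 64)                  # unlabeled samples from domain V (made up)
    loss = reconstruction_loss(u, v)        # add the adversarial terms before optimizing in practice
    loss.backward()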
Lu Tian, Shengjin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Person re-identification is generally divided into two parts: first, how to
represent a pedestrian by discriminative visual descriptors and second, how to
compare them by suitable distance metrics. Conventional methods treat these
two parts in isolation, with the first part usually unsupervised and the second part supervised.
The Bag-of-Words (BoW) model is a widely used image representing descriptor in
part one. Its codebook is simply generated by clustering visual features in
Euclidean space. In this paper, we propose to use the metric learning
techniques from the second part in the codebook generation phase of BoW. In
particular, the proposed codebook is clustered under a Mahalanobis distance
that is learned in a supervised manner.
Extensive experiments prove that our proposed method is effective. With several
low level features extracted on superpixel and fused together, our method
outperforms the state of the art on person re-identification benchmarks including
VIPeR, PRID450S, and Market1501.
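A non-authoritative sketch of clustering a codebook under a learned Mahalanobis metric: since d_M(a, b) = (a - b)^T M (a - b) equals the squared Euclidean distance after mapping features through the Cholesky factor of M, one can transform the features and reuse standard k-means. The metric M below is a random positive-definite placeholder, not the supervised metric learned in the paper:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 16))        # local visual features (made up)

    # Placeholder positive-definite metric; in the paper M is learned with supervision.
    A = rng.normal(size=(16, 16))
    M = A @ A.T + 0.1 * np.eye(16)

    L = np.linalg.cholesky(M)                    # M = L L^T
    transformed = features @ L                   # Euclidean distance here == Mahalanobis under M

    codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(transformed)
    words = codebook.predict(transformed)        # visual-word assignments for BoW histograms
    print(np.bincount(words).shape)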
Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite a rapid rise in the quality of built-in smartphone cameras, their
physical limitations (small sensor size, compact lenses and the lack of
specific hardware) prevent them from achieving the quality of DSLR
cameras. In this work we present an end-to-end deep learning approach that
bridges this gap by translating ordinary photos into DSLR-produced images. We
propose learning the translation function using a residual convolutional neural
network that improves both color rendition and image sharpness. Since the
standard mean squared loss is not well suited for measuring perceptual image
quality, we introduce a composite perceptual error function that combines
content, color and texture losses. The first two losses are defined
analytically, while the texture loss is learned using an adversarial network.
We also present a large-scale dataset that consists of real photos captured
from three different phones and one high-end reflex camera. Our quantitative
and qualitative assessments reveal that the enhanced images demonstrate
quality comparable to that of DSLR-taken photos, while the method itself can
be applied to any type of digital camera.
Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, Tae-Kyun Kim
Comments: Dataset can be visualized here: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work we study the use of 3D hand poses to recognize first-person hand
actions interacting with 3D objects. Towards this goal, we collected RGB-D
video sequences of more than 100K frames of 45 daily hand action categories,
involving 25 different objects in several hand grasp configurations. To obtain
high quality hand pose annotations from real sequences, we used our own mo-cap
system that automatically infers the location of each of the 21 joints of the
hand via 6 magnetic sensors on the finger tips and the inverse-kinematics of a
hand model. To the best of our knowledge, this is the first benchmark for RGB-D
hand action sequences with 3D hand poses. Additionally, we recorded the 6D
(i.e. 3D rotations and locations) object poses and provide 3D object models for
a subset of hand-object interaction sequences. We present extensive
experimental evaluations of RGB-D and pose-based action recognition by 18
baselines/state-of-the-art. The impact of using appearance features, poses and
their combinations is measured, and different training/testing protocols,
including cross-person protocols, are evaluated. Finally, we assess how ready
current hand pose estimation is when hands are severely occluded by objects in
egocentric views, and its influence on action recognition. From the results, we
see clear benefits of using hand pose as a cue for action recognition compared
to other data modalities. Our dataset and experiments can be of interest to
communities of 6D object pose, robotics, and 3D hand pose estimation as well as
action recognition.
Mohammad Reza Khosravi, Habib Rostami, Gholam Reza Ahmadi, Suleiman Mansouri, Ahmad Keshavarz
Journal-ref: International Journal of Electronics Communication and Computer
Engineering, vol. 6, no. 3, pp. 324-329 (2015)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Remote sensing image processing is very important in the geosciences. Images
obtained by different types of sensors might initially be unrecognizable.
To achieve an acceptable visual perception of the images, some pre-processing
steps (e.g. noise removal) are performed, and these affect the
analysis of the images. There are different types of processing according to the
types of remote sensing images. The method that we are going to introduce in
this paper is to use virtual colors to colorize the gray-scale images of
satellite sensors. This approach helps us to perform a better analysis on a
sample single-band image taken by the Landsat-8 (OLI) sensor (as a multi-band
sensor with natural color bands, its images' natural colors can be compared to
the synthetic colors produced by our approach). A good feature of this method
is its reversibility to the original image, which keeps a suitable resolution
in the output images.
Xiang Wu, Lingxiao Song, Ran He, Tieniu Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Heterogeneous face matching is a challenging issue in face recognition due to
the large domain difference as well as insufficient pairwise images across
different modalities during training. This paper proposes a coupled deep learning (CDL)
approach for the heterogeneous face matching. CDL seeks a shared feature space
in which the heterogeneous face matching problem can be approximately treated
as a homogeneous face matching problem. The objective function of CDL mainly
includes two parts. The first part contains a trace norm as a relevance
constraint, which makes unpaired images from multiple modalities be clustered
and correlated. An approximate variational formulation is introduced to deal
with the difficulty of optimizing the low-rank constraint directly. The second
part contains a cross-modal ranking among triplets of domain-specific images
to maximize the margin between different identities and to augment the limited
number of training samples. Besides, an alternating minimization method is
employed to iteratively update the parameters of CDL. Experimental results show
that CDL achieves better performance on the challenging CASIA NIR-VIS 2.0 face
recognition database, the IIIT-D Sketch database, the CUHK Face Sketch (CUFS),
and the CUHK Face Sketch FERET (CUFSF), which significantly outperforms
state-of-the-art heterogeneous face recognition methods.
Xingyi Zhou, Qixing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei
Comments: Submitted to ICCV 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we study the task of 3D human pose estimation in the wild.
This task is challenging because existing benchmark datasets provide either 2D
annotations in the wild or 3D annotations in controlled environments.
We propose a weakly-supervised transfer learning method that learns an
end-to-end network using training data with mixed 2D and 3D labels. The network
augments a state-of-the-art 2D pose estimation network with a 3D depth
regression network. Unlike previous approaches that train these two
sub-networks in a sequential manner, we introduce a unified training method
that fully exploits the correlation between these two sub-tasks and learns
common feature representations. In doing so, the 3D pose labels in controlled
environments are transferred to images in the wild that only possess 2D
annotations. In addition, we introduce a 3D geometric constraint to regularize
the predicted 3D poses, which is effective on images that only have 2D
annotations.
Our method leads to considerable performance gains and achieves competitive
results on both 2D and 3D benchmarks. It produces high quality 3D human poses
in the wild, without supervision of in-the-wild 3D data.
Feng Qian, Miao Yin, Ming-Jun Su, Yaojun Wang, Guangmin Hu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Prestack seismic data carries much useful information that can help us find
more complex atypical reservoirs. Therefore, we are increasingly inclined to
use prestack seismic data for seismic facies recognition. However, due to the
inclusion of excessive redundancy, effective feature extraction from prestack
seismic data becomes critical. In this paper, we consider seismic facies
recognition based on prestack data as an image clustering problem in computer
vision (CV) by thinking of each prestack seismic gather as a picture. We
propose a convolutional autoencoder (CAE) network for deep feature learning
from prestack seismic data, which is more effective than principal component
analysis (PCA) at removing redundancy and extracting valid information. Then,
using conventional classification or clustering techniques (e.g. K-means or
self-organizing maps) on the extracted features, we can achieve seismic facies
recognition. We applied our method to prestack data from a physical model and
the LZB region. The result shows that our approach is superior to conventional
methods.
Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Comments: Accepted at CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel method for detecting pedestrians under adverse
illumination conditions. Our approach relies on a novel cross-modality learning
framework and it is based on two main phases. First, given a multimodal
dataset, a deep convolutional network is employed to learn a non-linear
mapping, modeling the relations between RGB and thermal data. Then, the learned
feature representations are transferred to a second deep network, which
receives as input an RGB image and outputs the detection results. In this way,
features which are both discriminative and robust to bad illumination
conditions are learned. Importantly, at test time, only the second pipeline is
considered and no thermal data are required. Our extensive evaluation
demonstrates that the proposed approach outperforms the state-of-the-art on
the challenging KAIST multispectral pedestrian dataset and it is competitive
with previous methods on the popular Caltech dataset.
Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
Comments: arXiv admin note: substantial text overlap with arXiv:1703.00555
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Inspired by recent advances in deep learning, we propose a framework for
reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images
from undersampled data using a deep cascade of convolutional neural networks
(CNNs) to accelerate the data acquisition process. In particular, we address
the case where data is acquired using aggressive Cartesian undersampling.
Firstly, we show that when each 2D image frame is reconstructed independently,
the proposed method outperforms state-of-the-art 2D compressed sensing
approaches such as dictionary learning-based MR image reconstruction, in terms
of reconstruction error and reconstruction speed. Secondly, when reconstructing
the frames of the sequences jointly, we demonstrate that CNNs can learn
spatio-temporal correlations efficiently by combining convolution and data
sharing approaches. We show that the proposed method consistently outperforms
Dictionary Learning with Temporal Gradients (DLTG) and is capable of preserving
anatomical structure more faithfully up to 11-fold undersampling. Moreover,
reconstruction is very fast: each complete dynamic sequence can be
reconstructed in less than 10s and, for the 2D case, each image frame can be
reconstructed in 23ms, enabling real-time applications.
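As a hedged illustration of the data-consistency idea used between CNN stages in such cascades (for noiseless data, re-insert the acquired k-space samples into the CNN reconstruction), here is a toy NumPy version; the sampling mask, image and CNN output are synthetic stand-ins:

    import numpy as np

    def data_consistency(cnn_image, acquired_kspace, mask):
        """Replace the reconstructed k-space values at acquired locations with the
        measured samples (hard data consistency, noiseless case)."""
        k = np.fft.fft2(cnn_image)
        k = np.where(mask, acquired_kspace, k)
        return np.fft.ifft2(k)

    rng = np.random.default_rng(0)
    ground_truth = rng.random((64, 64))
    mask = rng.random((64, 64)) < 0.33                   # ~3-fold undersampling (toy mask)
    acquired = np.fft.fft2(ground_truth) * mask

    cnn_output = rng.random((64, 64))                    # stand-in for a CNN de-aliasing result
    corrected = data_consistency(cnn_output, acquired, mask)
    print(np.abs(corrected).shape)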
Yuhang Wu, Shishir K. Shah, Ioannis A. Kakadiaris
Comments: Submitted to Image and Vision Computing in Feb. 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Facial landmark localization is a fundamental module for face recognition.
The current common approach for facial landmark detection is cascaded regression,
which is composed of two steps: feature extraction and facial shape regression.
Recent methods employ deep convolutional networks to extract robust features in
each step and the whole system could be regarded as a deep cascaded regression
architecture. Unfortunately, this architecture is problematic. First,
parameters in the networks are optimized from a greedy stage-wise perspective.
Second, the network cannot efficiently merge landmark coordinate vectors with
2D convolutional layers. Third, the facial shape regression relies on a feature
vector generated from the bottom layer of the convolutional neural network,
which has recently been criticized for lacking spatial resolution to accomplish
pixel-wise localization tasks. We propose a globally optimized dual-pathway
system (GoDP) to handle the optimization and precision weaknesses of deep
cascaded regression without resorting to high-level inference models or complex
stacked architecture. This end-to-end system relies on distance-aware softmax
functions and dual-pathway proposal-refinement architecture. The proposed
system outperforms the state-of-the-art cascaded regression-based methods on
multiple in-the-wild face alignment databases. Experiments on face
identification demonstrate that GoDP significantly improves the quality of face
frontalization in face recognition.
Kyle Genova, Manolis Savva, Angel X. Chang, Thomas Funkhouser
Comments: ICCV submission, combined main paper and supplemental material
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The use of rendered images, whether from completely synthetic datasets or
from 3D reconstructions, is increasingly prevalent in vision tasks. However,
little attention has been given to how the selection of viewpoints affects the
performance of rendered training sets. In this paper, we propose a data-driven
approach to view set selection. Given a set of example images, we extract
statistics describing their contents and generate a set of views matching the
distribution of those statistics. Motivated by semantic segmentation tasks, we
model the spatial distribution of each semantic object category within an image
view volume. We provide a search algorithm that generates a sampling of likely
candidate views according to the example distribution, and a set selection
algorithm that chooses a subset of the candidates that jointly cover the
example distribution. Results of experiments with these algorithms on SUNCG
indicate that they are indeed able to produce view distributions similar to an
example set from NYUDv2 according to the earth mover’s distance. Furthermore,
the selected views improve performance on semantic segmentation compared to
alternative view selection algorithms.
Anurag Arnab, Philip H.S Torr
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Semantic segmentation and object detection research have recently achieved
rapid progress. However, the former task has no notion of different instances
of the same object, and the latter operates at a coarse, bounding-box level. We
propose an Instance Segmentation system that produces a segmentation map where
each pixel is assigned an object class and instance identity label. Most
approaches adapt object detectors to produce segments instead of boxes. In
contrast, our method is based on an initial semantic segmentation module, which
feeds into an instance subnetwork. This subnetwork uses the initial
category-level segmentation, along with cues from the output of an object
detector, within an end-to-end CRF to predict instances. This part of our model
is dynamically instantiated to produce a variable number of instances per
image. Our end-to-end approach requires no post-processing and considers the
image holistically, instead of processing independent proposals. Therefore,
unlike some related work, a pixel cannot belong to multiple instances.
Furthermore, far more precise segmentations are achieved, as shown by our
state-of-the-art results (particularly at high IoU thresholds) on the Pascal
VOC and Cityscapes datasets.
P. Saponaro, W. Treible, A. Kolagunda, S. Rhein, J. Caplan, C. Kambhamettu, R. Wisser
Comments: This is submitted to ICIP 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automating the extraction and quantification of features from
three-dimensional (3-D) image stacks is a critical task for advancing computer
vision research. The union of 3-D image acquisition and analysis enables the
quantification of biological resistance of a plant tissue to fungal infection
through the analysis of attributes such as fungal penetration depth, fungal
mass, and branching of the fungal network of connected cells. From an image
processing perspective, these tasks reduce to segmentation of vessel-like
structures and the extraction of features from their skeletonization. In order
to sample multiple infection events for analysis, we have developed an approach
we refer to as macroscopic microscopy. However, macroscopic microscopy produces
high-resolution image stacks that pose challenges to routine approaches and are
difficult for a human to annotate to obtain ground truth data. We present a
synthetic hyphal network generator, a comparison of several vessel segmentation
methods, and a minimum spanning tree method for connecting small gaps resulting
from imperfections in imaging or incomplete skeletonization of hyphal networks.
Qualitative results are shown for real microscopic data. We believe the
comparison of vessel detectors on macroscopic microscopy data, the synthetic
vessel generator, and the gap closing technique are beneficial to the image
processing community.
Jana Lipková, Markus Rempfler, Patrick Christ, John Lowengrub, Bjoern H. Menze
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The segmentation of liver lesions is crucial for detection, diagnosis and
monitoring the progression of liver cancer. However, the design of accurate
automated methods remains challenging due to high noise in CT scans, low
contrast between liver and lesions, as well as large lesion variability. We
propose an automatic, unsupervised 3D method for liver lesion segmentation
based on a phase separation approach. The liver is assumed to be a mixture of
two phases, healthy liver and lesions, represented by different image
intensities polluted by noise. The Cahn-Hilliard equation is used to remove
the noise and separate the mixture into two distinct phases with well-defined
interfaces. This drastically simplifies the lesion detection and segmentation
task and allows liver lesions to be segmented by thresholding the
Cahn-Hilliard solution. The method was tested on the 3Dircadb and LITS
datasets.
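To make the phase-separation idea concrete, here is a rough numerical sketch (not the
authors' implementation): an explicit finite-difference Cahn-Hilliard smoothing of a
2-D slice followed by thresholding. The parameters, periodic boundaries and intensity
rescaling are assumptions.

```python
# Minimal sketch: explicit Cahn-Hilliard phase separation on a 2-D image, then threshold.
import numpy as np

def laplacian(u):
    # 5-point Laplacian with periodic boundaries, grid spacing h = 1
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)

def cahn_hilliard_segment(slice_2d, gamma=0.5, dt=0.005, steps=2000):
    # rescale intensities to [-1, 1] so the two phases sit in the wells of the potential
    c = 2.0 * (slice_2d - slice_2d.min()) / (np.ptp(slice_2d) + 1e-12) - 1.0
    for _ in range(steps):
        mu = c ** 3 - c - gamma * laplacian(c)   # chemical potential
        c = c + dt * laplacian(mu)               # explicit Euler step of c_t = Laplacian(mu)
    return c > 0.0                               # threshold the separated phases

# mask = cahn_hilliard_segment(ct_slice)  # ct_slice: 2-D numpy array of intensities
```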
Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
Comments: Submitted to Journal Paper, 28 pages, 12 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Major advances have recently been made in merging language and vision
representations. But most tasks considered so far have confined themselves to
the processing of objects and lexicalised relations amongst objects (content
words). We know, however, that humans (even pre-school children) can abstract
over raw data to perform certain types of higher-level reasoning, expressed in
natural language by function words. A case in point is given by their ability
to learn quantifiers, i.e. expressions like ‘few’, ‘some’ and ‘all’. From
formal semantics and cognitive linguistics, we know that quantifiers are
relations over sets which, as a simplification, we can see as proportions. For
instance, in ‘most fish are red’, most encodes the proportion of fish which are
red fish. In this paper, we study how well current language and vision
strategies model such relations. We show that state-of-the-art attention
mechanisms coupled with a traditional linguistic formalisation of quantifiers
give the best performance on the task. Additionally, we provide insights on the
role of ‘gist’ representations in quantification. A ‘logical’ strategy to
tackle the task would be to first obtain a numerosity estimation for the two
involved sets and then compare their cardinalities. We however argue that
precisely identifying the composition of the sets is not only beyond current
state-of-the-art models but perhaps even detrimental to a task that is most
efficiently performed by refining the approximate numerosity estimator of the
system.
Dat Tien Nguyen, Firoj Alam, Ferda Ofli, Muhammad Imran
Comments: Accepted for publication in the 14th International Conference on Information Systems For Crisis Response and Management (ISCRAM), 2017
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
The extensive use of social media platforms, especially during disasters,
creates unique opportunities for humanitarian organizations to gain situational
awareness and launch relief operations accordingly. In addition to the textual
content, people post overwhelming amounts of imagery data on social networks
within minutes of a disaster. Studies point to the importance of this online
imagery content for emergency response. Despite recent advances in the
computer vision field, automatic processing of crisis-related social media
imagery remains a challenging task, because a majority of it consists of
redundant and irrelevant content. In this paper, we present an
image processing pipeline that comprises de-duplication and relevancy filtering
mechanisms to collect and filter social media image content in real-time during
a crisis event. Results obtained from extensive experiments on real-world
crisis datasets demonstrate the significance of the proposed pipeline for
optimal utilization of both human and machine computing resources.
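As an illustration of what a first-stage de-duplication filter in such a pipeline
could look like (the paper's actual method may differ), the sketch below drops
near-duplicate images using a simple average hash and a Hamming-distance threshold;
the hash size and threshold are assumptions.

```python
# Illustrative average-hash de-duplication, not the authors' pipeline.
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    gray = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=float)
    return (gray > gray.mean()).flatten()          # 64-bit boolean fingerprint

def deduplicate(paths, max_hamming=5):
    kept, hashes = [], []
    for p in paths:
        h = average_hash(p)
        # keep the image only if it is far (in Hamming distance) from everything kept so far
        if all(np.count_nonzero(h != h2) > max_hamming for h2 in hashes):
            kept.append(p)
            hashes.append(h)
    return kept
```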
El Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx, Alexandre Maurer
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Recent progress in artificial intelligence enabled the design and
implementation of autonomous computing devices, agents, that may interact and
learn from each other to achieve certain goals. Sometimes however, a human
operator needs to intervene and interrupt an agent in order to prevent certain
dangerous situations. Yet, as part of their learning process, agents may link
these interruptions that impact their reward to specific states, and
deliberately avoid them. The situation is particularly challenging in a
distributed context because agents might not only learn from their own past
interruptions, but also from those of other agents. This paper defines the
notion of safe interruptibility as a distributed computing problem, and studies
this notion in the two main learning frameworks: joint action learners and
independent learners. We give realistic sufficient conditions on the learning
algorithm for safe interruptibility in the case of joint action learners, yet
show that these conditions are not sufficient for independent learners. We
show, however, that if agents can detect interruptions, it is possible to
prune the observations to ensure safe interruptibility even for independent
learners.
Martin Biehl
Comments: PhD thesis, 198 pages
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Multiagent Systems (cs.MA)
This thesis contributes to the formalisation of the notion of an agent within
the class of finite multivariate Markov chains. Agents are seen as entities
that act, perceive, and are goal-directed.
We present a new measure that can be used to identify entities (called
(iota)-entities), some general requirements for entities in multivariate
Markov chains, as well as formal definitions of actions and perceptions
suitable for such entities.
The intuition behind (iota)-entities is that entities are spatiotemporal
patterns for which every part makes every other part more probable. The
measure, complete local integration (CLI), is formally investigated in general
Bayesian networks. It is based on the specific local integration (SLI) which is
measured with respect to a partition. CLI is the minimum value of SLI over all
partitions. We prove that (iota)-entities are blocks in specific partitions of
the global trajectory. These partitions are the finest partitions that achieve
a given SLI value. We also establish the transformation behaviour of SLI under
permutations of nodes in the network.
We go on to present three conditions on general definitions of entities.
These are not fulfilled by sets of random variables; i.e., the
perception-action loop, which is often used to model agents, is too
restrictive. We propose that any general entity definition should in effect
specify a subset (called an entity-set) of the set of all spatiotemporal
patterns of a given multivariate Markov chain. The set of (iota)-entities is
such a set. Importantly, the perception-action loop also induces an
entity-set.
We then propose formal definitions of actions and perceptions for arbitrary
entity-sets. These specialise to standard notions in case of the
perception-action loop entity-set.
Finally we look at some very simple examples.
Andrew J Sedgewick, Joseph D. Ramsey, Peter Spirtes, Clark Glymour, Panayiotis V. Benos
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Graphical causal models are an important tool for knowledge discovery because
they can represent both the causal relations between variables and the
multivariate probability distributions over the data. Once learned, causal
graphs can be used for classification, feature selection and hypothesis
generation, while revealing the underlying causal network structure and thus
allowing for arbitrary likelihood queries over the data. However, current
algorithms for learning sparse directed graphs are generally designed to handle
only one type of data (continuous-only or discrete-only), which limits their
applicability to a large class of multi-modal biological datasets that include
mixed type variables. To address this issue, we developed new methods that
modify and combine existing methods for finding undirected graphs with methods
for finding directed graphs. These hybrid methods are not only faster, but also
perform better than the directed graph estimation methods alone for a variety
of parameter settings and data set sizes. Here, we describe a new conditional
independence test for learning directed graphs over mixed data types and we
compare performances of different graph learning strategies on synthetic data.
Mieczysław A. Kłopotek, Sławomir T. Wierzchoń
Comments: 23 pages
Journal-ref: This is the preliminary version of the paper published in
Demonstratio Mathematica. Vol XXXI No 3,1998, pp. 669-688
Subjects: Artificial Intelligence (cs.AI)
The paper presents a novel view of the Dempster-Shafer belief function as a
measure of diversity in relational databases. It is demonstrated that, under
this interpretation, the Dempster rule of evidence combination corresponds to
the join operator of relational database theory. This rough-set based
interpretation is qualitative in nature and can represent a number of belief
function operators.
The interpretation has the property that, given a definition of the belief
measure of objects in the interpretation domain, we can perform operations in
this domain and the measure of the resulting object is derivable from the
measures of the component objects via a belief operator. We demonstrate this
property for the Dempster rule of combination, marginalization, Shafer’s
conditioning, independent variables, and Shenoy’s notion of conditional
independence of variables.
The interpretation is based on rough sets (in connection with decision
tables), but differs from previous interpretations of this type in that it
counts the diversity rather than frequencies in a decision table.
Thales Felipe Costa Bertaglia, Maria das Graças Volpe Nunes
Comments: Published in Proceedings of the 2nd Workshop on Noisy User-generated Text, 9 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Text normalization techniques based on rules, lexicons or supervised training
requiring large corpora are neither scalable nor domain-interchangeable, which
makes them unsuitable for normalizing user-generated content (UGC). Current
tools available for Brazilian Portuguese make use of such techniques. In this
work we propose a technique based on distributed representation of words (or
word embeddings). It generates continuous numeric vectors of
high-dimensionality to represent words. The vectors explicitly encode many
linguistic regularities and patterns, as well as syntactic and semantic word
relationships. Words that share semantic similarity are represented by similar
vectors. Based on these features, we present a totally unsupervised, expandable
and language and domain independent method for learning normalization lexicons
from word embeddings. Our approach obtains a high correction rate for
orthographic errors and internet slang in product reviews, outperforming the
currently available tools for Brazilian Portuguese.
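The core idea lends itself to a short sketch (hypothetical names and threshold, not
the authors' exact procedure): map each noisy form to its nearest in-vocabulary word
in embedding space and keep the pair only if the cosine similarity is high enough.

```python
# Hedged sketch of learning a normalization lexicon from word embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def build_lexicon(embeddings, noisy_words, canonical_words, threshold=0.7):
    """embeddings: dict word -> vector, trained on the noisy corpus (placeholder)."""
    lexicon = {}
    for noisy in noisy_words:
        if noisy not in embeddings:
            continue
        best, best_sim = None, threshold
        for cand in canonical_words:
            if cand in embeddings:
                sim = cosine(embeddings[noisy], embeddings[cand])
                if sim > best_sim:
                    best, best_sim = cand, sim
        if best is not None:
            lexicon[noisy] = best        # e.g. a slang form mapped to its standard spelling
    return lexicon
```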
Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
This paper describes an intuitive generalization to the Generative
Adversarial Networks (GANs) to generate samples while capturing diverse modes
of the true data distribution. Firstly, we propose a very simple and intuitive
multi-agent GAN architecture that incorporates multiple generators capable of
generating samples from high probability modes. Secondly, in order to enforce
different generators to generate samples from diverse modes, we propose two
extensions to the standard GAN objective function. (1) We augment the
generator-specific GAN objective function with a diversity-enforcing term that
encourages different generators to generate diverse samples, using a
user-defined similarity-based function. (2) We modify the discriminator
objective function so that, along with distinguishing real from fake samples,
the discriminator has to predict which generator produced a given fake sample.
Intuitively, in
order to succeed in this task, the discriminator must learn to push different
generators towards different identifiable modes. Our framework is generalizable
in the sense that it can be easily combined with other existing variants of
GANs to produce diverse samples. Experimentally, we show that our framework is
able to produce high-quality, diverse samples for challenging tasks such as
image/face generation and image-to-image translation. We also show that it is
capable of learning a better feature representation in an unsupervised setting.
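A rough PyTorch sketch of how such a modified discriminator objective could look (an
assumption about the formulation, not the paper's code): the discriminator outputs
K+1 classes, one per generator plus a "real" class, so it must also identify which
generator produced a fake sample.

```python
# Sketch of a (K+1)-way discriminator loss for a multi-generator GAN (illustrative only).
import torch
import torch.nn as nn

K, data_dim, z_dim = 3, 784, 64
generators = [nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
              for _ in range(K)]
disc = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, K + 1))
ce = nn.CrossEntropyLoss()

real = torch.randn(32, data_dim)                              # stand-in for a real batch
d_loss = ce(disc(real), torch.zeros(32, dtype=torch.long))    # class 0 = real
for k, G in enumerate(generators):
    fake = G(torch.randn(32, z_dim)).detach()                 # no generator gradients here
    d_loss = d_loss + ce(disc(fake), torch.full((32,), k + 1, dtype=torch.long))
d_loss.backward()   # a discriminator optimizer step would follow; generator losses are analogous
```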
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
Journal-ref: SemEval 2017
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We describe the SemEval task of extracting keyphrases and relations between
them from scientific documents, which is crucial for understanding which
publications describe which processes, tasks and materials. Although this was a
new task, we had a total of 26 submissions across 3 evaluation scenarios. We
expect the task and the findings reported in this paper to be relevant for
researchers working on understanding scientific content, as well as the broader
knowledge base population and information extraction communities.
Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)
Many modern computer vision and machine learning applications rely on solving
difficult optimization problems that involve non-differentiable objective
functions and constraints. The alternating direction method of multipliers
(ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
generalization of ADMM that often achieves better performance, but its
efficiency depends strongly on algorithm parameters that must be chosen by an
expert user. We propose an adaptive method that automatically tunes the key
algorithm parameters to achieve optimal performance without user oversight.
Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
(ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
detailed convergence analysis of ARADMM is provided, and numerical results on
several applications demonstrate fast practical convergence.
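For context, the sketch below shows plain relaxed ADMM on a lasso problem, making the
roles of the penalty rho and the relaxation parameter alpha explicit; ARADMM's
adaptive rules for tuning these parameters are the paper's contribution and are not
reproduced here.

```python
# Relaxed ADMM for lasso: minimize 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def relaxed_admm_lasso(A, b, lam, rho=1.0, alpha=1.5, iters=200):
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))      # factor once, reuse every iteration
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        x_hat = alpha * x + (1.0 - alpha) * z          # over-relaxation step
        z = soft_threshold(x_hat + u, lam / rho)
        u = u + x_hat - z                              # scaled dual update
    return z

# x_sparse = relaxed_admm_lasso(np.random.randn(50, 20), np.random.randn(50), lam=0.1)
```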
David S. Warren, Yanhong A. Liu
Comments: David S. Warren and Yanhong A. Liu (Editors). 33 pages. Including summaries by Christopher Kane and abstracts or position papers by M. Aref, J. Rosenwald, I. Cervesato, E.S.L. Lam, M. Balduccini, J. Lobo, A. Russo, E. Lupu, N. Leone, F. Ricca, G. Gupta, K. Marple, E. Salazar, Z. Chen, A. Sobhi, S. Srirangapalli, C.R. Ramakrishnan, N. Bjørner, N.P. Lopes, A. Rybalchenko, and P. Tarau
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
This document describes the contributions of the 2016 Applications of Logic
Programming Workshop (AppLP), which was held on October 17 and associated with
the International Conference on Logic Programming (ICLP) in Flushing, New York
City.
Yubo Zhou, Ali Nadaf
Subjects: Information Retrieval (cs.IR)
Using only implicit data, many recommender systems fail in general to provide
a precise set of recommendations to users with limited interaction history.
This issue is regarded as the “Cold Start” problem and is typically resolved by
switching to content-based approaches where extra costly information is
required. In this paper, we use Word2Vec (W2V), a dimensionality reduction
algorithm originally applied to Natural Language Processing problems, within
the framework of Collaborative Filtering (CF) to tackle the “Cold Start”
problem using only implicit data. This combined method is named Embedded Collaborative
Filtering (ECF). An experiment is conducted to determine the performance of ECF
on two different implicit data sets. We show that the ECF approach outperforms
other popular and state-of-the-art approaches in “Cold Start” scenarios.
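A conceptual sketch of the W2V-on-implicit-feedback idea (hyperparameters and item IDs
are placeholders, not the ECF paper's settings): treat each user's interaction history
as a "sentence" of item IDs, train Word2Vec on these sequences, and recommend items
whose vectors are closest to the user's recent items. Assumes gensim >= 4.

```python
# Illustrative item-embedding collaborative filtering with Word2Vec (gensim >= 4).
from gensim.models import Word2Vec

sessions = [["item12", "item7", "item33"],   # implicit-feedback sequences per user/session
            ["item7", "item33", "item90"],
            ["item12", "item90", "item7"]]

model = Word2Vec(sentences=sessions, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

# items most similar to something a user with little history just interacted with
print(model.wv.most_similar("item7", topn=3))
```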
Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq
Journal-ref: European Chapter of the Association for Computational Linguistics
(EACL) 2017, Valencia, Spain, pp. 417-427
Subjects: Computation and Language (cs.CL)
We present a Character-Word Long Short-Term Memory Language Model which both
reduces the perplexity with respect to a baseline word-level language model and
reduces the number of parameters of the model. Character information can reveal
structural (dis)similarities between words and can even be used when a word is
out-of-vocabulary, thus improving the modeling of infrequent and unknown words.
By concatenating word and character embeddings, we achieve up to 2.77% relative
improvement on English compared to a baseline model with a similar amount of
parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level
models with a larger number of parameters.
Chuanqi Tan, Furu Wei, Pengjie Ren, Weifeng Lv, Ming Zhou
Subjects: Computation and Language (cs.CL)
We present a simple yet effective approach for linking entities in queries.
The key idea is to search sentences similar to a query from Wikipedia articles
and directly use the human-annotated entities in the similar sentences as
candidate entities for the query. Then, we employ a rich set of features, such
as link-probability, context-matching, word embeddings, and relatedness among
candidate entities as well as their related entities, to rank the candidates
under a regression based framework. The advantages of our approach lie in two
aspects, which contribute to the ranking process and final linking result.
First, it can greatly reduce the number of candidate entities by filtering out
irrelevant entities with the words in the query. Second, we can obtain the
query sensitive prior probability in addition to the static link-probability
derived from all Wikipedia articles. We conduct experiments on two benchmark
datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
dataset. Experimental results show that our method outperforms state-of-the-art
systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ
dataset.
Quynh Ngoc Thi Do, Steven Bethard, Marie-Francine Moens
Subjects: Computation and Language (cs.CL)
We introduce an approach to implicit semantic role labeling (iSRL) based on a
recurrent neural semantic frame model that learns probability distributions
over sequences of explicit semantic frame arguments. On the NomBank iSRL test
set, the approach achieves new state-of-the-art performance with much less
reliance on manually constructed language resources.
Dafydd Gibbon
Comments: 29 pages, 21 figures
Subjects: Computation and Language (cs.CL)
The present contribution is a tutorial on selected aspects of prosody, the
rhythms and melodies of speech, based on a course of the same name at the
Summer School on Contemporary Phonetics and Phonology at Tongji University,
Shanghai, China in July 2016. The tutorial is not intended as an introduction
to experimental methodology or as an overview of the literature on the topic,
but as an outline of observationally accessible aspects of fundamental
frequency and timing patterns with the aid of computational visualisation,
situated in a semiotic framework of sign ranks and interpretations. After an
informal introduction to the basic concepts of prosody in the introduction and
a discussion of the place of prosody in the architecture of language, a
selection of acoustic phonetic topics in phonemic tone and accent prosody,
word prosody, phrasal prosody and discourse prosody is discussed, and a stylisation
method for visualising aspects of prosody is introduced. Examples are taken
from a number of typologically different languages: Anyi/Agni (Niger-Congo>Kwa,
Ivory Coast), English, Kuki-Thadou (Sino-Tibetan, North-East India and
Myanmar), Mandarin Chinese, Tem (Niger-Congo>Gur, Togo) and Farsi. The main
focus is on fundamental frequency patterns, but issues of timing and rhythm are
also discussed. In the final section, further reading and possible future
research directions are outlined.
Steffen Eger, Alexander Mehler
Comments: Published at ACL 2016, Berlin (short papers)
Subjects: Computation and Language (cs.CL)
We consider two graph models of semantic change. The first is a time-series
model that relates embedding vectors from one time period to embedding vectors
of previous time periods. In the second, we construct one graph for each word:
nodes in this graph correspond to time points and edge weights to the
similarity of the word’s meaning across two time points. We apply our two
models to corpora across three different languages. We find that semantic
change is linear in two senses. Firstly, today’s embedding vectors (= meaning)
of words can be derived as linear combinations of embedding vectors of their
neighbors in previous time periods. Secondly, self-similarity of words decays
linearly in time. We consider both findings as new laws/hypotheses of semantic
change.
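A toy check of the second, linear-decay claim could look like the following
(illustrative only; corpus processing and embedding alignment across time periods are
omitted): fit a line to a word's cosine self-similarity against the earliest time
point.

```python
# Fit a linear trend to a word's self-similarity over time (toy illustration).
import numpy as np

def self_similarity_slope(vectors_over_time):
    base = vectors_over_time[0]
    sims = [float(np.dot(base, v) / (np.linalg.norm(base) * np.linalg.norm(v)))
            for v in vectors_over_time]
    slope, intercept = np.polyfit(np.arange(len(sims)), sims, deg=1)
    return slope, intercept

# vecs = [np.random.randn(100) for _ in range(5)]   # stand-in for per-period embeddings
# print(self_similarity_slope(vecs))
```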
Luis Gerardo Mojica
Subjects: Computation and Language (cs.CL)
An ever-increasing number of social media websites, electronic newspapers and
Internet forums allow visitors to leave comments for others to read and
interact with. This exchange is not free from participants with malicious
intentions, who do not contribute to the written conversation. Different
communities adopt strategies to handle such users. In this paper we present a
comprehensive categorization of the trolling phenomenon, inspired by
politeness research, and propose a model that jointly predicts four crucial
aspects of trolling: intention, interpretation, intention
disclosure and response strategy. Finally, we present a new annotated dataset
containing excerpts of conversations involving trolls and the interactions with
other users that we hope will be a useful resource for the research community.
Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud, Vibha Sinha
Subjects: Computation and Language (cs.CL)
One problem that every presenter faces when delivering a public discourse is
how to hold the listeners’ attention and keep them involved. Many studies in
conversation analysis have therefore worked on this issue and qualitatively
suggest constructions that can effectively lead to the audience’s applause. To
investigate these proposals quantitatively, in this study we analyze the
transcripts of 2,135 TED Talks, with a particular focus on the rhetorical
devices that are used by the presenters for applause elicitation. Through
regression analysis, we identify and interpret 24 rhetorical devices as
triggers of audience applause. We further build models that can recognize
applause-evoking sentences and conclude this work with potential implications.
Maria Chiara Caschera, Fernando Ferri, Patrizia Grifoni
Comments: 23 pages
Journal-ref: JNIT (Journal of Next Generation Information Technology), Volume 4
Issue 5, July, 2013,Pages 87-109, ISSN 2092-8637. GlobalCIS (Convergence
Information Society, Republic of Korea)
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
This paper deals with classifying ambiguities for Multimodal Languages. It
evolves the classifications and the methods of the literature on ambiguities
for Natural Language and Visual Language, empirically defining an original
classification of ambiguities for multimodal interaction using a linguistic
perspective. This classification distinguishes between Semantic and Syntactic
multimodal ambiguities and their subclasses, which are intercepted using a
rule-based method implemented in a software module. In the experiments, the
obtained classification, compared with the expected classification defined by
human judgment, achieved an accuracy of 94.6% for the semantic ambiguity
classes and 92.1% for the syntactic ambiguity classes.
Eric Bailey, Shuchin Aeron
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
Most popular word embedding techniques involve implicit or explicit
factorization of a word co-occurrence based matrix into low rank factors. In
this paper, we aim to generalize this trend by using numerical methods to
factor higher-order word co-occurrence based arrays, or tensors. We
present four word embeddings using tensor factorization and analyze their
advantages and disadvantages. One of our main contributions is a novel joint
symmetric tensor factorization technique related to the idea of coupled tensor
factorization. We show that embeddings based on tensor factorization can be
used to discern the various meanings of polysemous words without being
explicitly trained to do so, and motivate the intuition behind why this works
in a way that existing methods do not. We also modify an existing word
embedding evaluation metric known as Outlier Detection [Camacho-Collados and
Navigli, 2016] to evaluate the quality of the order-(N) relations that a word
embedding captures, and show that tensor-based methods outperform existing
matrix-based methods at this task. Experimentally, we show that all of our word
embeddings either outperform or are competitive with state-of-the-art baselines
commonly used today on a variety of recent datasets. Suggested applications of
tensor factorization-based word embeddings are given, and all source code and
pre-trained vectors are publicly available online.
Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
Voice conversion (VC) using sequence-to-sequence learning of context
posterior probabilities is proposed. Conventional VC using shared context
posterior probabilities predicts target speech parameters from the context
posterior probabilities estimated from the source speech parameters. Although
conventional VC can be built from non-parallel data, it is difficult to convert
speaker individuality such as phonetic property and speaking rate contained in
the posterior probabilities because the source posterior probabilities are
directly used for predicting target speech parameters. In this work, we assume
that the training data partly include parallel speech data and propose
sequence-to-sequence learning between the source and target posterior
probabilities. The conversion models perform non-linear and variable-length
transformation from the source probability sequence to the target one. Further,
we propose a joint training algorithm for the modules. In contrast to
conventional VC, which separately trains the speech recognition that estimates
posterior probabilities and the speech synthesis that predicts target speech
parameters, our proposed method jointly trains these modules along with the
proposed probability conversion modules. Experimental results demonstrate that
our approach outperforms the conventional VC.
Ioannis Giannakopoulos, Dimitrios Tsoumakos, Nectarios Koziris
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Performance (cs.PF)
Cloud computing has allowed applications to allocate and elastically utilize
massive amounts of resources of different types, leading to an exponential
growth of the applications’ configuration space and increased difficulty in
predicting their performance. In this work, we describe a novel, automated
profiling methodology that makes no assumptions on application structure. Our
approach utilizes oblique Decision Trees in order to recursively partition an
application’s configuration space in disjoint regions, choose a set of
representative samples from each subregion according to a defined policy and
returns a model for the entire configuration space as a composition of linear
models over each subregion. An extensive experimental evaluation over real-life
applications and synthetic performance functions showcases that our scheme
outperforms other state-of-the-art profiling methodologies. It particularly
excels at reflecting abnormalities and discontinuities of the performance
function, allowing the user to influence the sampling policy based on the
modeling accuracy, the space coverage and the deployment cost.
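A simplified sketch of the overall scheme, using axis-aligned rather than oblique
trees as a deliberate simplification: partition the configuration space with a
regression tree, fit one linear model per leaf, and compose them into a single
predictor.

```python
# Piecewise-linear performance model: regression tree for partitioning + per-leaf linear fits.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

def fit_piecewise_linear(X, y, max_leaves=8):
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves).fit(X, y)
    leaves = tree.apply(X)                              # leaf id of every training sample
    leaf_models = {leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
                   for leaf in np.unique(leaves)}

    def predict(Xq):
        out = np.empty(len(Xq))
        for i, leaf in enumerate(tree.apply(Xq)):
            out[i] = leaf_models[leaf].predict(Xq[i:i + 1])[0]
        return out

    return predict

# X = np.random.rand(500, 3); y = np.sin(4 * X[:, 0]) + X[:, 1]   # toy "performance" surface
# model = fit_piecewise_linear(X, y); print(model(X[:5]))
```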
Shaoshan Liu, Jie Tang, Chao Wang, Quan Wang, Jean-Luc Gaudiot
Comments: 8 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)
Autonomous driving clouds provide essential services to support autonomous
vehicles. Today these services include, but are not limited to, distributed
simulation tests for new algorithm deployment, offline deep learning model
training, and High-Definition (HD) map generation. These services require
infrastructure support including distributed computing, distributed storage, as
well as heterogeneous computing. In this paper, we present the details of how
we implement a unified autonomous driving cloud infrastructure, and how we
support these services on top of this infrastructure.
Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri
Comments: Journal Article
Journal-ref: International Journal of Current Engineering and Scientific
Research, volume 3(11), pages 88-100, 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Undoubtedly, MapReduce is the most powerful programming paradigm in
distributed computing. Enhancing MapReduce is essential, since it can make the
computing faster. There are therefore many scheduling algorithms to discuss on
the basis of their characteristics, and many shortcomings still to be
uncovered in this field. In this article, we present state-of-the-art
scheduling algorithms to improve the understanding of this area. The
algorithms are presented systematically, so that this article can open up many
future directions in scheduling research. We provide in-depth insight into
MapReduce scheduling algorithms and discuss various issues of MapReduce
schedulers developed for large-scale computing as well as heterogeneous
environments.
Barun Gorain, Partha Sarathi Mandal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Time-varying coverage, namely sweep coverage, is a recent development in the
area of wireless sensor networks, where a small number of mobile sensors sweep
or monitor a comparatively large number of locations periodically. In this
article we study barrier sweep coverage with mobile sensors where the barrier
is considered as a finite length continuous curve on a plane. The coverage at
every point on the curve is time-variant. We propose an optimal solution for
sweep coverage of a finite length continuous curve. Usually the energy source
of a mobile sensor is a battery with limited power, so energy-restricted sweep
coverage is a challenging problem for long-running applications. We propose an
energy-restricted sweep coverage problem where every mobile sensor must visit
an energy source frequently to recharge or replace its battery. We propose a
(13/3)-approximation algorithm for this problem. The proposed algorithm
for multiple curves achieves the best possible approximation factor 2 for a
special case. We propose a 5-approximation algorithm for the general problem.
As an application of the barrier sweep coverage problem for a set of line
segments, we formulate a data gathering problem. In this problem, a set of
mobile sensors arbitrarily monitors the line segments, one sensor per segment.
A set of data mules periodically collects the monitoring data from the set of mobile
sensors. We prove that finding the minimum number of data mules to collect data
periodically from every mobile sensor is NP-hard and propose a 3-approximation
algorithm to solve it.
Giuseppe Antonio Di Luna, Paola Flocchini, Linda Pagli, Giuseppe Prencipe, Nicola Santoro, Giovanni Viglietta
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The gathering problem requires a set of mobile agents, arbitrarily positioned
at different nodes of the network, to group within finite time at the same
location, which is not fixed in advance.
The extensive existing literature on this problem shares the same fundamental
assumption that the topological structure does not change during the rendezvous
or the gathering.
In this paper we start the investigation of gathering in dynamic graphs, that
is networks where the topology changes continuously and at unpredictable
locations.
We study the feasibility of gathering mobile agents, identical and without
explicit communication capabilities, in a dynamic ring of anonymous nodes; the
class of dynamics we consider is the classic 1-interval-connectivity. In
particular, we focus on the impact that factors such as chirality (i.e., common
sense of orientation) and cross detection (i.e., the ability to detect, when
traversing an edge, whether some agent is traversing it in the other
direction), have on the solvability of the problem.
We establish several results. We provide a complete characterization of the
classes of initial configurations from which gathering problem is solvable in
presence and in absence of cross detection. We provide distributed algorithms
that allow the agents to gather within low polynomial time. In particular, the
protocol for gathering with cross detection is time optimal. We show that cross
detection is a powerful computational element; furthermore, we prove that, with
cross detection, knowledge of the ring size is strictly more powerful than
knowledge of the number of agents.
From our investigation it follows that, for the gathering problem, the
computational obstacles created by the dynamic nature of the ring can be
overcome by the presence of chirality or of cross-detection.
Ling Ren, Kartik Nayak, Ittai Abraham, Srinivas Devadas
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
We present new protocols for Byzantine state machine replication and
Byzantine agreement in the synchronous and authenticated setting. The
celebrated PBFT state machine replication protocol tolerates (f) Byzantine
faults in an asynchronous setting using (3f+1) replicas, and has since been
studied or deployed by numerous works. In this work, we improve the Byzantine
fault tolerance to (n=2f+1) by utilizing the synchrony assumption. The key
challenge is to ensure a quorum intersection at one emph{honest} replica. Our
solution is to rely on the synchrony assumption to form a emph{post-commit}
quorum of size (2f+1), which intersects at (f+1) replicas with any
emph{pre-commit} quorums of size (f+1). Our protocol also solves synchronous
authenticated Byzantine agreement in fewer rounds than the best existing
solution (Katz and Koo, 2006). A challenge in this direction is to handle
non-simultaneous termination, which we solve by introducing a notion of
emph{virtual} participation after termination. Our protocols may be applied to
build practical synchronous Byzantine fault tolerant systems and improve
cryptographic protocols such as secure multiparty computation and
cryptocurrencies when synchrony can be assumed.
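The quorum-intersection arithmetic stated in the abstract can be spelled out as
follows (a restatement of the counting argument, not additional material from the
paper):

```latex
% With n = 2f+1 replicas, any post-commit quorum Q_1 (|Q_1| = 2f+1) and any
% pre-commit quorum Q_2 (|Q_2| = f+1) overlap in at least f+1 replicas.
\[
  |Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; (2f+1) + (f+1) - (2f+1) \;=\; f+1.
\]
% Since at most f replicas are Byzantine, the intersection contains at least
% (f+1) - f = 1 honest replica, which is the required quorum intersection.
```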
Ehsan Totoni, Wajih Ul Hassan, Todd A. Anderson, Tatiana Shpeisman
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Data frames in scripting languages are essential abstractions for processing
structured data. However, existing data frame solutions are either not
distributed (e.g., Pandas in Python) and therefore have limited scalability, or
they are not tightly integrated with array computations (e.g., Spark SQL). This
paper proposes a novel compiler-based approach where we integrate data frames
into the High Performance Analytics Toolkit (HPAT) to build HiFrames. It
provides expressive and flexible data frame APIs which are tightly integrated
with array operations. HiFrames then automatically parallelizes and compiles
relational operations along with other array computations in end-to-end data
analytics programs, and generates efficient MPI/C++ code. We demonstrate that
HiFrames is significantly faster than alternatives such as Spark SQL on
clusters, without forcing the programmer to switch to embedded SQL for part of
the program. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational
operations, and can be up to 20,000x faster for advanced analytics operations,
such as weighted moving averages (WMA), that the map-reduce paradigm cannot
handle effectively. HiFrames is also 5x faster than Spark SQL for TPCx-BB Q26
on 64 nodes of Cori supercomputer.
Marlon Brenes, Vipin Kerala Varma, Antonello Scardicchio, Ivan Girotto
Comments: 16 pages, 6 figures, 3 tables
Subjects: Computational Physics (physics.comp-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Strongly Correlated Electrons (cond-mat.str-el); Distributed, Parallel, and Cluster Computing (cs.DC)
We have developed an application and implemented parallel algorithms in order
to provide a computational framework suitable for massively parallel
supercomputers to study the unitary dynamics of quantum systems. We use
renowned parallel libraries such as PETSc/SLEPc combined with high-performance
computing approaches in order to overcome the large memory requirements to be
able to study systems whose Hilbert space dimension comprises over 9 billion
independent quantum states. Moreover, we provide descriptions on the parallel
approach used for the three most important stages of the simulation: handling
the Hilbert subspace basis, constructing a matrix representation for a generic
Hamiltonian operator and the time evolution of the system by means of the
Krylov subspace methods. We employ our setup to study the evolution of
quasidisordered and clean many-body systems, focussing on the return
probability and related dynamical exponents: the large system sizes accessible
provide novel insights into their thermalization properties.
Stanislav Minsker, Nate Strawn
Subjects: Statistics Theory (math.ST); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
This paper presents new algorithms for distributed statistical estimation
that can take advantage of the divide-and-conquer approach. We show that one of
the key benefits attained by an appropriate divide-and-conquer strategy is
robustness, an important characteristic of large distributed systems. We
introduce a class of algorithms that are based on the properties of the
geometric median, establish connections between performance of these
distributed algorithms and rates of convergence in normal approximation, and
provide tight deviation guarantees for the resulting estimators in the form of
exponential concentration inequalities. We illustrate our techniques with
several examples; in particular, we obtain new results for the median-of-means
estimator, as well as provide performance guarantees for robust distributed
maximum likelihood estimation.
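A minimal sketch of the median-of-means construction referred to above (the block
count is chosen arbitrarily; for 1-D data the geometric median reduces to the ordinary
median):

```python
# Median-of-means: split into disjoint blocks, average each block, take the median.
import numpy as np

def median_of_means(x, n_blocks=10):
    x = np.asarray(x)
    blocks = np.array_split(np.random.permutation(x), n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Robust to a few gross outliers, unlike the plain sample mean:
sample = np.concatenate([np.random.randn(1000), [1e6, -1e6, 1e6]])
print(np.mean(sample), median_of_means(sample))
```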
Vasco T. Vasconcelos (University of Lisbon), Philipp Haller (KTH Royal Institute of Technology)
Journal-ref: EPTCS 246, 2017
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
PLACES 2017 (full title: Programming Language Approaches to Concurrency- and
Communication-cEntric Software) is the tenth edition of the PLACES workshop
series. After the first PLACES, which was affiliated to DisCoTec in 2008, the
workshop has been part of ETAPS every year since 2009 and is now an established
part of the ETAPS satellite events. PLACES 2017 was held on 29th April in
Uppsala, Sweden. The workshop series was started in order to promote the
application of novel programming language ideas to the increasingly important
problem of developing software for systems in which concurrency and
communication are intrinsic aspects. This includes software for both multi-core
systems and large-scale distributed and/or service-oriented systems. The scope
of PLACES includes new programming language features, whole new programming
language designs, new type systems, new semantic approaches, new program
analysis techniques, and new implementation mechanisms. This volume consists of
the papers accepted for presentation at the workshop.
Lihi Cohen, Yuval Emek, Oren Louidor, Jara Uitto
Comments: 33 pages
Subjects: Probability (math.PR); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
Consider a small number of scouts exploring the infinite (d)-dimensional grid
with the aim of hitting a hidden target point. Each scout is controlled by a
probabilistic finite automaton that determines its movement (to a neighboring
grid point) based on its current state. The scouts, which operate under a
fully synchronous schedule, communicate with each other (in a way that affects their
respective states) when they share the same grid point and operate
independently otherwise. Our main research question is: How many scouts are
required to guarantee that the target admits a finite mean hitting time?
Recently, it was shown that (d + 1) is an upper bound on the answer to this
question for any dimension (d geq 1) and the main contribution of this paper
comes in the form of proving that this bound is tight for (d in { 1, 2 }).
Ahmed M. Alaa, Mihaela van der Schaar
Subjects: Learning (cs.LG)
We consider the problem of obtaining individualized estimates for the effect
of a certain treatment given observational data. The problem differs
fundamentally from classical supervised learning since for each individual
subject, we either observe the response with or without the treatment but never
both. Hence, estimating the effect of a treatment entails a causal inference
task in which we need to estimate counterfactual outcomes. To address this
problem, we propose a novel multi-task learning framework in which the
individuals’ responses with and without the treatment are modeled as a
vector-valued function that belongs to a reproducing kernel Hilbert space.
Unlike previous methods for causal inference that use the G-computation
formula, our approach does not obtain separate estimates for the treatment and
control response surfaces, but rather obtains a joint estimate that ensures
data efficiency in scenarios where the selection bias is strong. In order to be
able to provide individualized measures of uncertainty in our estimates, we
adopt a Bayesian approach for learning this vector-valued function using a
multi-task Gaussian process prior; uncertainty is quantified via posterior
credible intervals. We develop a novel risk based empirical Bayes approach for
calibrating the Gaussian process hyper-parameters in a data-driven fashion
based on gradient descent in which the update rule is itself learned from the
data using a recurrent neural network. Experiments conducted on semi-synthetic
data show that our algorithm significantly outperforms state-of-the-art causal
inference methods.
Israa Ahmed Zriqat, Ahmad Mousa Altamimi, Mohammad Azzeh
Journal-ref: ISSN 1947-5500
Subjects: Learning (cs.LG)
Improving the precision of heart disease detection has been investigated by
many researchers in the literature. Such improvement is motivated by the
overwhelming health care expenditures and erroneous diagnoses. As a result,
various methodologies have been proposed to analyze the disease factors aiming
to decrease the physicians practice variation and reduce medical costs and
errors. In this paper, our main motivation is to develop an effective
intelligent medical decision support system based on data mining techniques. In
this context, five data mining classifying algorithms, with large datasets,
have been utilized to assess and analyze the risk factors statistically related
to heart diseases in order to compare the performance of the implemented
classifiers (e.g., Naïve Bayes, Decision Tree, Discriminant, Random Forest,
and Support Vector Machine). To underscore the practical viability of our
approach, the selected classifiers have been implemented using MATLAB tool with
two datasets. Results of the conducted experiments showed that all
classification algorithms are predictive and can give relatively correct
answers. However, the decision tree outperforms the other classifiers with an
accuracy rate of 99.0%, followed by Random Forest. This is because both have
essentially the same mechanism, except that the Random Forest builds an
ensemble of decision trees. Although ensemble learning has been shown to
produce superior results, in our case the decision tree outperformed its
ensemble version.
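For readers who want to reproduce a comparison of this kind, a hedged sketch with
scikit-learn and synthetic data is given below; the paper itself uses MATLAB and real
heart-disease datasets, so the numbers will differ.

```python
# Illustrative classifier comparison on synthetic data (default hyperparameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=13, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("DecisionTree", DecisionTreeClassifier()),
                  ("RandomForest", RandomForestClassifier()),
                  ("NaiveBayes", GaussianNB()),
                  ("SVM", SVC())]:
    clf.fit(Xtr, ytr)
    print(name, accuracy_score(yte, clf.predict(Xte)))
```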
Meire Fortunato, Charles Blundell, Oriol Vinyals
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this work we explore a straightforward variational Bayes scheme for
Recurrent Neural Networks. Firstly, we show that a simple adaptation of
truncated backpropagation through time can yield good quality uncertainty
estimates and superior regularisation at only a small extra computational cost
during training. Secondly, we demonstrate how a novel kind of posterior
approximation yields further improvements to the performance of Bayesian RNNs.
We incorporate local gradient information into the approximate posterior to
sharpen it around the current batch statistics. This technique is not exclusive
to recurrent neural networks and can be applied more widely to train Bayesian
neural networks. We also empirically demonstrate how Bayesian RNNs are superior
to traditional RNNs on a language modelling benchmark and an image captioning
task, as well as showing how each of these methods improves our model over a
variety of other schemes for training them. We also introduce a new benchmark
for studying uncertainty for language models so future methods can be easily
compared.
Richard Nock, Frank Nielsen
Subjects: Learning (cs.LG)
In Valiant’s model of evolution, a class of representations is evolvable iff
a polynomial-time process of random mutations guided by selection converges
with high probability to a representation as (epsilon)-close as desired from
the optimal one, for any required (epsilon>0). Several previous positive
results exist that can be related to evolving a vector space, but each former
result imposes restrictions either on (re)initialisations, distributions,
performance functions and/or the mutator. In this paper, we show that all it
takes to evolve a complete normed vector space is merely a set that generates
the space. Furthermore, it takes only (tilde{O}(1/epsilon^2)) steps and it is
essentially strictly monotonic, agnostic and handles target drifts that rival
some proven in fairly restricted settings. In the context of the model, we
bring to the fore new results not documented previously. Evolution appears to
occur in a mean-divergence model reminiscent of Markowitz mean-variance model
for portfolio selection, and the risk-return efficient frontier of evolution
shows an interesting pattern: when far from the optimum, the mutator always has
access to mutations close to the efficient frontier. Toy experiments in
supervised and unsupervised learning display promising directions for this
scheme to be used as a (new) provable gradient-free stochastic optimisation
algorithm.
Vincenzo Liguori
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce
the computational cost for a variety of neural networks (NNs) while, at the
same time, compressing the weights that describe them. This is based on the
fact that the dot product between an N dimensional vector of real numbers and
an N dimensional PVQ vector can be calculated with only additions and
subtractions and one multiplication. This is advantageous since tensor
products, commonly used in NNs, can be re-conduced to a dot product or a set of
dot products. Finally, it is stressed that any NN architecture that is based on
an operation that can be re-conduced to a dot product can benefit from the
techniques described here.
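A toy illustration of the property the approach relies on (the shapes, scale factor
and codeword below are made up for the example): the dot product with an integer PVQ
codeword reduces to signed additions plus one final multiplication by a scale.

```python
# Dot product with a PVQ codeword using only additions/subtractions and one multiply.
import numpy as np

def pvq_dot(x, pvq_vec, scale=1.0):
    """x: real weights/activations; pvq_vec: integer PVQ codeword; scale: one multiply."""
    acc = 0.0
    for xi, qi in zip(x, pvq_vec):
        # |qi| additions or subtractions replace a multiply-accumulate
        for _ in range(abs(int(qi))):
            acc = acc + xi if qi > 0 else acc - xi
    return scale * acc          # the only multiplication

x = np.array([0.3, -1.2, 0.7, 2.0])
q = np.array([1, 0, -2, 1])     # sum of |entries| = K = 4, as in a PVQ codeword
print(pvq_dot(x, q), float(np.dot(x, q)))   # the two results agree
```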
Sadegh Eskandari, Emre Akbas
Subjects: Learning (cs.LG)
In this paper, we present a new feature selection method that is suitable for
both unsupervised and supervised problems. We build upon the recently proposed
Infinite Feature Selection (IFS) method where feature subsets of all sizes
(including infinity) are considered. We extend IFS in two ways. First, we
propose a supervised version of it. Second, we propose new ways of forming the
feature adjacency matrix that perform better for unsupervised problems. We
extensively evaluate our methods on many benchmark datasets, including large
image-classification datasets (PASCAL VOC), and show that our methods
outperform both the IFS and the widely used “minimum-redundancy
maximum-relevancy (mRMR)” feature selection algorithm.
Keigo Kimura, Lu Sun, Mineichi Kudo
Comments: Instruction pages are now under construction
Subjects: Learning (cs.LG)
Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label
Classification (MLC). There exist a few Java libraries for MLC, but no
MATLAB/OCTAVE library that covers various methods. This toolbox offers an
environment for evaluation, comparison and visualization of the MLC results.
One attraction of this toolbox is that it enables us to try many combinations
of feature space dimension reduction, sample clustering, label space dimension
reduction and ensemble, etc.
Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng
Subjects: Learning (cs.LG)
Policy gradient methods have been successfully applied to many complex
reinforcement learning problems. However, policy gradient methods suffer from
high variance, slow convergence, and inefficient exploration. In this work, we
introduce a maximum entropy policy optimization framework which explicitly
encourages parameter exploration, and show that this framework can be reduced
to a Bayesian inference problem. We then propose a novel Stein variational
policy gradient method (SVPG) which combines existing policy gradient methods
and a repulsive functional to generate a set of diverse but well-behaved
policies. SVPG is robust to initialization and can easily be implemented in a
parallel manner. On continuous control problems, we find that implementing SVPG
on top of REINFORCE and advantage actor-critic algorithms improves both average
return and data efficiency.
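A small numpy sketch of the Stein-variational particle update described here, assuming an RBF kernel and externally supplied policy-gradient estimates; the function names, kernel bandwidth, and toy gradients are illustrative and not the paper's implementation.

    import numpy as np

    def rbf_kernel(thetas, h=1.0):
        """RBF kernel matrix and its gradients w.r.t. the first argument."""
        diffs = thetas[:, None, :] - thetas[None, :, :]          # (n, n, d)
        K = np.exp(-(diffs ** 2).sum(-1) / (2 * h))
        gradK = -(diffs / h) * K[:, :, None]     # d k(t_j, t_i) / d t_j
        return K, gradK

    def svpg_step(thetas, policy_grads, stepsize=1e-2, temperature=1.0):
        """One Stein-variational update of a set of policy-parameter particles.

        policy_grads[j] is a policy-gradient estimate for particle j (e.g. from
        REINFORCE or advantage actor-critic); the kernel-gradient term pushes
        particles apart, which is what yields a diverse set of policies.
        """
        n = thetas.shape[0]
        K, gradK = rbf_kernel(thetas)
        drive = K @ (policy_grads / temperature)   # attraction toward high return
        repulse = gradK.sum(axis=0)                # repulsion between particles
        return thetas + stepsize * (drive + repulse) / n

    # toy usage: 8 particles of a 3-parameter policy, with fake gradients
    thetas = np.random.randn(8, 3)
    thetas = svpg_step(thetas, -thetas)            # pretend the optimum is at 0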
Amit Dhurandhar, Margareta Ackerman, Xiang Wang
Comments: accepted to SDM 2017 (oral)
Subjects: Learning (cs.LG)
Clustering is a widely-used data mining tool, which aims to discover
partitions of similar items in data. We introduce a new clustering paradigm,
\emph{accordant clustering}, which enables the discovery of (predefined) group
level insights. Unlike previous clustering paradigms that aim to understand
relationships amongst the individual members, the goal of accordant clustering
is to uncover insights at the group level through the analysis of their
members. Group level insight can often support a call to action that cannot be
informed through previous clustering techniques. We propose the first accordant
clustering algorithm, and prove that it finds near-optimal solutions when data
possesses inherent cluster structure. The insights revealed by accordant
clusterings enabled experts in the field of medicine to isolate successful
treatments for a neurodegenerative disease, and those in finance to discover
patterns of unnecessary spending.
Luciana Ferrer
Comments: Technical report
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Standard probabilistic linear discriminant analysis (PLDA) for speaker recognition
assumes that the sample’s features (usually, i-vectors) are given by a sum of
three terms: a term that depends on the speaker identity, a term that models
the within-speaker variability and is assumed independent across samples, and a
final term that models any remaining variability and is also independent across
samples. In this work, we propose a generalization of this model where the
within-speaker variability is not necessarily assumed independent across
samples but dependent on another discrete variable. This variable, which we
call the channel variable as in the standard PLDA approach, could be, for
example, a discrete category for the channel characteristics, the language
spoken by the speaker, the type of speech in the sample (conversational,
monologue, read), etc. The value of this variable is assumed to be known during
training but not during testing. Scoring is performed, as in standard PLDA, by
computing a likelihood ratio between the null hypothesis that the two sides of
a trial belong to the same speaker versus the alternative hypothesis that the
two sides belong to different speakers. The two likelihoods are computed by
marginalizing over two hypotheses about the channels on both sides of a trial:
that they are the same and that they are different. This way, we expect that
the new model will be better at coping with same-channel versus
different-channel trials than standard PLDA, since knowledge about the channel
(or language, or speech style) is used during training and implicitly
considered during scoring.
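For reference, the three-term decomposition the abstract starts from is the standard PLDA model, which in our notation (not necessarily the paper's) reads

    \phi_{s,j} \;=\; \mu + V y_{s} + U x_{s,j} + \varepsilon_{s,j},
    \qquad y_{s},\, x_{s,j} \sim \mathcal{N}(0, I),
    \quad \varepsilon_{s,j} \sim \mathcal{N}(0, \Sigma),

where \phi_{s,j} is the j-th i-vector of speaker s. In the proposed generalization, the channel factor x would be shared by all samples carrying the same discrete channel label instead of being drawn independently for every sample.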
Ershad Banijamali, Ali Ghodsi
Comments: 8 Pages- Accepted in 14th International Conference on Image Analysis and Recognition
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we introduce an algorithm for performing spectral clustering
efficiently. Spectral clustering is a powerful clustering algorithm that
suffers from high computational complexity, due to eigen decomposition. In this
work, we first build the adjacency matrix of the corresponding graph of the
dataset. To build this matrix, we only consider a limited number of points,
called landmarks, and compute the similarity of all data points with the
landmarks. Then, we present a definition of the Laplacian matrix of the graph
that enables us to perform eigen decomposition efficiently, using a deep
autoencoder. The overall complexity of the algorithm for eigen decomposition is
(O(np)), where (n) is the number of data points and (p) is the number of
landmarks. Finally, we evaluate the performance of the algorithm in different
experiments.
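A minimal Python sketch of the landmark-affinity construction, assuming k-means landmarks and a Gaussian similarity; the paper's efficient eigen decomposition via a deep autoencoder is replaced here by a plain SVD, so this only illustrates the data flow and the linear-in-(n) cost of the affinity step.

    import numpy as np
    from sklearn.cluster import KMeans

    def landmark_affinity(X, p=20, sigma=1.0, seed=0):
        """n x p affinity between all points and p landmark points."""
        landmarks = KMeans(n_clusters=p, n_init=10,
                           random_state=seed).fit(X).cluster_centers_
        d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
        Z = np.exp(-d2 / (2 * sigma ** 2))
        return Z / Z.sum(axis=1, keepdims=True)    # row-normalized affinities

    def spectral_embedding(Z, k):
        """k-dim spectral embedding from the thin SVD of the n x p matrix Z."""
        U, _, _ = np.linalg.svd(Z, full_matrices=False)
        return U[:, :k]

    X = np.random.randn(500, 10)
    emb = spectral_embedding(landmark_affinity(X), k=3)
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(emb)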
Arturs Backurs, Piotr Indyk, Ludwig Schmidt
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)
Empirical risk minimization (ERM) is ubiquitous in machine learning and
underlies most supervised learning methods. While there has been a large body
of work on algorithms for various ERM problems, the exact computational
complexity of ERM is still not understood. We address this issue for multiple
popular ERM problems including kernel SVMs, kernel ridge regression, and
training the final layer of a neural network. In particular, we give
conditional hardness results for these problems based on complexity-theoretic
assumptions such as the Strong Exponential Time Hypothesis. Under these
assumptions, we show that there are no algorithms that solve the aforementioned
ERM problems to high accuracy in sub-quadratic time. We also give similar
hardness results for computing the gradient of the empirical loss, which is the
main computational burden in many non-convex learning tasks.
Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)
This paper describes an intuitive generalization to the Generative
Adversarial Networks (GANs) to generate samples while capturing diverse modes
of the true data distribution. Firstly, we propose a very simple and intuitive
multi-agent GAN architecture that incorporates multiple generators capable of
generating samples from high probability modes. Secondly, in order to enforce
different generators to generate samples from diverse modes, we propose two
extensions to the standard GAN objective function. (1) We augment the generator
specific GAN objective function with a diversity-enforcing term that encourages
different generators to generate diverse samples using a user-defined
similarity based function. (2) We modify the discriminator objective function
where along with finding the real and fake samples, the discriminator has to
predict the generator which generated the given fake sample. Intuitively, in
order to succeed in this task, the discriminator must learn to push different
generators towards different identifiable modes. Our framework is generalizable
in the sense that it can be easily combined with other existing variants of
GANs to produce diverse samples. Experimentally we show that our framework is
able to produce high-quality diverse samples for challenging tasks such as
image/face generation and image-to-image translation. We also show that it is
capable of learning a better feature representation in an unsupervised setting.
Martin Simonovsky, Nikos Komodakis
Comments: Accepted to CVPR 2017; extended version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
A number of problems can be formulated as prediction on graph-structured
data. In this work, we generalize the convolution operator from regular grids
to arbitrary graphs while avoiding the spectral domain, which allows us to
handle graphs of varying size and connectivity. To move beyond a simple
diffusion, filter weights are conditioned on the specific edge labels in the
neighborhood of a vertex. Together with the proper choice of graph coarsening,
we explore constructing deep neural networks for graph classification. In
particular, we demonstrate the generality of our formulation in point cloud
classification, where we set the new state of the art, and on a graph
classification dataset, where we outperform other deep learning approaches.
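A toy numpy sketch of an edge-conditioned convolution, assuming a one-hidden-layer tanh network that generates the filter weights from the edge label and a mean aggregation over the neighbourhood; names and shapes are illustrative rather than the paper's exact parameterisation.

    import numpy as np

    def ecc_layer(H, edges, edge_labels, W1, W2, b):
        """One edge-conditioned convolution over an arbitrary graph (sketch).

        H           : (num_nodes, d_in) node features
        edges       : list of (target, source) index pairs
        edge_labels : (num_edges, d_edge) edge labels
        W1, W2, b   : weights of a tiny filter-generating network that maps an
                      edge label to a d_out x d_in filter matrix
        """
        num_nodes, d_in = H.shape
        d_out = b.shape[0]
        agg = np.zeros((num_nodes, d_out))
        counts = np.zeros(num_nodes)
        for (t, s), lab in zip(edges, edge_labels):
            theta = np.tanh(lab @ W1) @ W2               # edge-specific filter
            agg[t] += theta.reshape(d_out, d_in) @ H[s]
            counts[t] += 1
        return agg / np.maximum(counts, 1)[:, None] + b  # mean over neighbours

    d_in, d_out, d_edge, hidden = 4, 2, 3, 8
    H = np.random.randn(3, d_in)                 # 3 nodes, 2 labelled edges
    edges, edge_labels = [(0, 1), (2, 0)], np.random.randn(2, d_edge)
    W1, W2 = np.random.randn(d_edge, hidden), np.random.randn(hidden, d_out * d_in)
    out = ecc_layer(H, edges, edge_labels, W1, W2, np.zeros(d_out))   # shape (3, 2)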
Sven Banisch, Eckehard Olbrich
Comments: Submitted to the Social Simulation Conference (Dublin 2017)
Subjects: Physics and Society (physics.soc-ph); Learning (cs.LG); Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)
We explore a new mechanism to explain polarization phenomena in opinion
dynamics. The model is based on the idea that agents evaluate alternative views
on the basis of the social feedback obtained on expressing them. High support
for the favored, and therefore expressed, opinion in the social environment is
treated as positive social feedback which reinforces the value associated with
this opinion. In this paper we concentrate on the model with dyadic
communication and encounter probabilities defined by an unweighted,
time-homogeneous network. The model captures polarization dynamics more
plausibly compared to bounded confidence opinion models and avoids extensive
opinion flipping usually present in binary opinion dynamics. We perform
systematic simulation experiments to understand the role of network
connectivity for the emergence of polarization.
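A toy Python simulation in the spirit of the mechanism described above, assuming a simple reinforcement of the expressed opinion by neighbour agreement on a fixed network; the update rule and parameters are stand-ins, not the model's exact specification.

    import numpy as np

    def simulate(adj, steps=20000, alpha=0.1, seed=0):
        """Agents express their currently preferred opinion to a random
        neighbour; agreement gives feedback +1, disagreement -1, and the value
        of the expressed opinion is nudged toward the received feedback."""
        rng = np.random.default_rng(seed)
        n = adj.shape[0]
        Q = rng.uniform(-0.1, 0.1, size=(n, 2))       # values of opinions 0 / 1
        for _ in range(steps):
            i = rng.integers(n)
            nbrs = np.flatnonzero(adj[i])
            if nbrs.size == 0:
                continue
            j = rng.choice(nbrs)
            oi, oj = Q[i].argmax(), Q[j].argmax()     # expressed opinions
            feedback = 1.0 if oi == oj else -1.0
            Q[i, oi] += alpha * (feedback - Q[i, oi])
        return Q.argmax(axis=1)                       # final opinion per agent

    # two weakly coupled cliques: the setting where polarization is expected
    A = np.zeros((20, 20), dtype=int)
    A[:10, :10] = 1; A[10:, 10:] = 1; A[0, 10] = A[10, 0] = 1
    np.fill_diagonal(A, 0)
    print(simulate(A))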
El Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx, Alexandre Maurer
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Recent progress in artificial intelligence enabled the design and
implementation of autonomous computing devices, agents, that may interact and
learn from each other to achieve certain goals. Sometimes however, a human
operator needs to intervene and interrupt an agent in order to prevent certain
dangerous situations. Yet, as part of their learning process, agents may link
these interruptions that impact their reward to specific states, and
deliberately avoid them. The situation is particularly challenging in a
distributed context because agents might not only learn from their own past
interruptions, but also from those of other agents. This paper defines the
notion of safe interruptibility as a distributed computing problem, and studies
this notion in the two main learning frameworks: joint action learners and
independent learners. We give realistic sufficient conditions on the learning
algorithm for safe interruptibility in the case of joint action learners, yet
show that these conditions are not sufficient for independent learners. We show
however that if agents can detect interruptions, it is possible to prune the
observations to ensure safe interruptibility even for independent learners.
L. Martino, V. Elvira, G. Camps-Valls
Subjects: Computation (stat.CO); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Importance Sampling (IS) is a well-known Monte Carlo technique that
approximates integrals involving a posterior distribution by means of weighted
samples. In this work, we study the assignment of a single weighted sample
which compresses the information contained in a population of weighted samples.
Part of the theory that we present as Group Importance Sampling (GIS) has been
employed implicitly in different works in the literature. The provided analysis
yields several theoretical and practical consequences. For instance, we discuss
the application of GIS into the Sequential Importance Resampling (SIR)
framework and show that Independent Multiple Try Metropolis (I-MTM) schemes can
be interpreted as a standard Metropolis-Hastings algorithm, following the GIS
approach. We also introduce two novel Markov Chain Monte Carlo (MCMC)
techniques based on GIS. The first one, named Group Metropolis Sampling (GMS)
method, produces a Markov chain of sets of weighted samples. All these sets are
then employed for obtaining a unique global estimator. The second one is the
Distributed Particle Metropolis-Hastings (DPMH) technique, where different
parallel particle filters are jointly used to drive an MCMC algorithm.
Different resampled trajectories are compared and then tested with a proper
acceptance probability. The novel schemes are tested in different numerical
experiments such as learning the hyperparameters of Gaussian Processes (GP),
the localization problem in a sensor network and the tracking of the Leaf Area
Index (LAI), where they are compared with several benchmark Monte Carlo
techniques. Three descriptive Matlab demos are also provided.
Angelia Nedić, Alex Olshevsky, César A. Uribe
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Multiagent Systems (cs.MA); Probability (math.PR); Machine Learning (stat.ML)
We study the problem of cooperative inference where a group of agents
interact over a network and seek to estimate a joint parameter that best
explains a set of observations. Agents do not know the network topology or the
observations of other agents. We explore a variational interpretation of the
Bayesian posterior density, and its relation to the stochastic mirror descent
algorithm, to propose a new distributed learning algorithm. We show that, under
appropriate assumptions, the beliefs generated by the proposed algorithm
concentrate around the true parameter exponentially fast. We provide explicit
non-asymptotic bounds for the convergence rate. Moreover, we develop explicit
and computationally efficient algorithms for observation models belonging to
exponential families.
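A small numpy sketch of the log-linear belief update analysed in this line of work, assuming a row-stochastic mixing matrix and per-agent likelihoods over a finite set of candidate parameters; it conveys the flavour of the algorithm rather than the paper's exact variational derivation.

    import numpy as np

    def distributed_update(beliefs, A, likelihoods):
        """Geometric averaging of neighbours' beliefs followed by a Bayesian
        reweighting with each agent's new private observation.

        beliefs     : (agents, hypotheses), rows sum to 1
        A           : (agents, agents) row-stochastic network mixing matrix
        likelihoods : (agents, hypotheses) likelihood of the new observation
        """
        log_mix = A @ np.log(beliefs + 1e-300)
        unnorm = np.exp(log_mix) * likelihoods
        return unnorm / unnorm.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(1)
    A = np.full((4, 4), 0.25)                      # 4 fully connected agents
    beliefs = np.full((4, 3), 1 / 3)               # 3 candidate parameters
    for _ in range(50):
        lik = np.column_stack([rng.uniform(0.5, 1.0, 4),   # truth = parameter 0
                               rng.uniform(0.0, 0.6, 4),
                               rng.uniform(0.0, 0.6, 4)])
        beliefs = distributed_update(beliefs, A, lik)
    print(beliefs.round(3))                        # mass concentrates on column 0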
Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
Comments: CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)
Many modern computer vision and machine learning applications rely on solving
difficult optimization problems that involve non-differentiable objective
functions and constraints. The alternating direction method of multipliers
(ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
generalization of ADMM that often achieves better performance, but its
efficiency depends strongly on algorithm parameters that must be chosen by an
expert user. We propose an adaptive method that automatically tunes the key
algorithm parameters to achieve optimal performance without user oversight.
Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
(ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
detailed convergence analysis of ARADMM is provided, and numerical results on
several applications demonstrate fast practical convergence.
Eric Bailey, Shuchin Aeron
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)
Most popular word embedding techniques involve implicit or explicit
factorization of a word co-occurrence based matrix into low rank factors. In
this paper, we aim to generalize this trend by using numerical methods to
factor higher-order word co-occurrence based arrays, or \textit{tensors}. We
present four word embeddings using tensor factorization and analyze their
advantages and disadvantages. One of our main contributions is a novel joint
symmetric tensor factorization technique related to the idea of coupled tensor
factorization. We show that embeddings based on tensor factorization can be
used to discern the various meanings of polysemous words without being
explicitly trained to do so, and motivate the intuition behind why this works
in a way that existing methods do not. We also modify an existing word
embedding evaluation metric known as Outlier Detection [Camacho-Collados and
Navigli, 2016] to evaluate the quality of the order-(N) relations that a word
embedding captures, and show that tensor-based methods outperform existing
matrix-based methods at this task. Experimentally, we show that all of our word
embeddings either outperform or are competitive with state-of-the-art baselines
commonly used today on a variety of recent datasets. Suggested applications of
tensor factorization-based word embeddings are given, and all source code and
pre-trained vectors are publicly available online.
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The purported “black box” nature of neural networks is a barrier to adoption
in applications where interpretability is essential. Here we present DeepLIFT
(Deep Learning Important FeaTures), a method for decomposing the output
prediction of a neural network on a specific input by backpropagating the
contributions of all neurons in the network to every feature of the input.
DeepLIFT compares the activation of each neuron to its ‘reference activation’
and assigns contribution scores according to the difference. By optionally
giving separate consideration to positive and negative contributions, DeepLIFT
can also reveal dependencies which are missed by other approaches. Scores can
be computed efficiently in a single backward pass. We apply DeepLIFT to models
trained on MNIST and simulated genomic data, and show significant advantages
over gradient-based methods. A detailed video tutorial on the method is at
this http URL and code is at this http URL
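A single-neuron Python sketch of the idea (our simplification, not the released code): the change of the output relative to its value on the reference input is redistributed over inputs in proportion to w_i (x_i - x_i^ref), so the scores sum to the output difference.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def deeplift_rescale(x, x_ref, w, b):
        """Contribution scores for one ReLU neuron via a rescale-style rule."""
        z, z_ref = w @ x + b, w @ x_ref + b
        delta_out, delta_z = relu(z) - relu(z_ref), z - z_ref
        multiplier = delta_out / delta_z if abs(delta_z) > 1e-12 else 0.0
        return w * (x - x_ref) * multiplier        # one score per input feature

    x, x_ref = np.array([1.0, -2.0, 0.5]), np.zeros(3)   # all-zero reference
    w, b = np.array([0.7, 0.2, -1.0]), 0.1
    scores = deeplift_rescale(x, x_ref, w, b)
    assert np.isclose(scores.sum(), relu(w @ x + b) - relu(w @ x_ref + b))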
Arjun Nitin Bhagoji, Daniel Cullina, Prateek Mittal
Comments: 20 pages
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
We propose the use of dimensionality reduction as a defense against evasion
attacks on ML classifiers. We present and investigate a strategy for
incorporating dimensionality reduction via Principal Component Analysis to
enhance the resilience of machine learning, targeting both the classification
and the training phase. We empirically evaluate and demonstrate the feasibility
of dimensionality reduction of data as a defense mechanism against evasion
attacks using multiple real-world datasets. Our key findings are that the
defenses are (i) effective against strategic evasion attacks in the literature,
increasing the resources required by an adversary for a successful attack by a
factor of about two, (ii) applicable across a range of ML classifiers,
including Support Vector Machines and Deep Neural Networks, and (iii)
generalizable to multiple application domains, including image classification,
malware classification, and human activity classification.
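A minimal scikit-learn sketch of the defense as described, assuming a linear SVM and synthetic data purely for illustration; the number of retained principal components is a free parameter whose effect the paper studies.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 50))                     # stand-in training data
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Project onto the top-k principal components before fitting the classifier,
    # so evasion perturbations living in the discarded directions are suppressed.
    defended = make_pipeline(PCA(n_components=10), LinearSVC())
    defended.fit(X, y)
    print(defended.score(X, y))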
Ahmad El Sallab, Mohammed Abdou, Etienne Perot, Senthil Yogamani
Comments: Reprinted with permission of IS&T: The Society for Imaging Science and Technology, sole copyright owners of Electronic Imaging, Autonomous Vehicles and Machines 2017
Journal-ref: IS&T Electronic Imaging, Autonomous Vehicles and Machines 2017,
AVM-023, pg. 70-76 (2017)
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Robotics (cs.RO)
Reinforcement learning is considered to be a strong AI paradigm which can be
used to teach machines through interaction with the environment and learning
from their mistakes. Despite its perceived utility, it has not yet been
successfully applied in automotive applications. Motivated by the successful
demonstrations of learning of Atari games and Go by Google DeepMind, we propose
a framework for autonomous driving using deep reinforcement learning. This is
of particular relevance as it is difficult to pose autonomous driving as a
supervised learning problem due to strong interactions with the environment
including other vehicles, pedestrians and roadworks. As it is a relatively new
area of research for autonomous driving, we provide a short overview of deep
reinforcement learning and then describe our proposed framework. It
incorporates Recurrent Neural Networks for information integration, enabling
the car to handle partially observable scenarios. It also integrates the recent
work on attention models to focus on relevant information, thereby reducing the
computational complexity for deployment on embedded hardware. The framework was
tested in an open source 3D car racing simulator called TORCS. Our simulation
results demonstrate learning of autonomous maneuvering in a scenario of complex
road curvatures and simple interaction of other vehicles.
Achintya Kr. Sarkar, Zheng-Hua Tan
Subjects: Sound (cs.SD); Learning (cs.LG)
In this paper, we present a time-contrastive learning (TCL) based unsupervised
bottleneck (BN) feature extraction method for speech signals with an
application to speaker verification. The method exploits the temporal structure
of a speech signal and more specifically, it trains deep neural networks (DNNs)
to discriminate temporal events obtained by uniformly segmenting the signal
without using any label information, in contrast to conventional DNN based BN
feature extraction methods that train DNNs using labeled data to discriminate
speakers or passphrases or phones or a combination of them. We consider
different strategies for TCL and its combination with transfer learning.
Experimental results on the RSR2015 database show that the TCL method is
superior to the conventional speaker and pass-phrase discriminant BN feature
and Mel-frequency cepstral coefficients (MFCCs) feature for text-dependent
speaker verification. The unsupervised TCL method further has the advantage of
being able to leverage the huge amount of unlabeled data that are often
available in real life.
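The label construction at the heart of TCL fits in a few lines of Python; the segment count and feature dimensions below are illustrative assumptions, and the subsequent DNN training on these labels (whose bottleneck layer yields the features) is omitted.

    import numpy as np

    def time_contrastive_labels(num_frames, num_segments):
        """Split an utterance's frames into contiguous chunks and label each
        frame with its chunk index; no speaker or phone labels are needed."""
        return (np.arange(num_frames) * num_segments) // num_frames

    feats = np.random.randn(1000, 39)          # 1000 frames of MFCC-like features
    labels = time_contrastive_labels(len(feats), num_segments=10)
    print(np.bincount(labels))                 # ~100 frames per temporal class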
Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)
Voice conversion (VC) using sequence-to-sequence learning of context
posterior probabilities is proposed. Conventional VC using shared context
posterior probabilities predicts target speech parameters from the context
posterior probabilities estimated from the source speech parameters. Although
conventional VC can be built from non-parallel data, it is difficult to convert
speaker individuality, such as phonetic properties and speaking rate, contained in
the posterior probabilities because the source posterior probabilities are
directly used for predicting target speech parameters. In this work, we assume
that the training data partly include parallel speech data and propose
sequence-to-sequence learning between the source and target posterior
probabilities. The conversion models perform non-linear and variable-length
transformation from the source probability sequence to the target one. Further,
we propose a joint training algorithm for the modules. In contrast to
conventional VC, which separately trains the speech recognition that estimates
posterior probabilities and the speech synthesis that predicts target speech
parameters, our proposed method jointly trains these modules along with the
proposed probability conversion modules. Experimental results demonstrate that
our approach outperforms the conventional VC.
Loren Lugosch, Warren J. Gross
Subjects: Information Theory (cs.IT); Learning (cs.LG)
Recently, it was shown that if multiplicative weights are assigned to the
edges of a Tanner graph used in belief propagation decoding, it is possible to
use deep learning techniques to find values for the weights which improve the
error-correction performance of the decoder. Unfortunately, this approach
requires many multiplications, which are generally expensive operations. In
this paper, we suggest a more hardware-friendly approach in which offset
min-sum decoding is augmented with learnable offset parameters. Our method uses
no multiplications and has a parameter count less than half that of the
multiplicative algorithm. This both speeds up training and provides a feasible
path to hardware architectures. After describing our method, we compare the
performance of the two neural decoding algorithms and show that our method
achieves error-correction performance within 0.1 dB of the multiplicative
approach and as much as 1 dB better than traditional belief propagation for the
codes under consideration.
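A minimal Python sketch of the check-node update with learnable offsets, assuming one trained offset per outgoing edge; the training procedure and the full iterative decoder over the Tanner graph are omitted.

    import numpy as np

    def offset_min_sum_check(msgs, beta):
        """Outgoing check-to-variable messages of offset min-sum decoding.

        msgs : incoming variable-to-check LLRs on the edges of one check node
        beta : learned non-negative offsets, one per outgoing edge (trained in
               place of the multiplicative weights of neural belief propagation)
        """
        out = np.empty_like(msgs)
        for i in range(len(msgs)):
            others = np.delete(msgs, i)
            sign = np.prod(np.sign(others))
            out[i] = sign * max(np.min(np.abs(others)) - beta[i], 0.0)
        return out

    msgs = np.array([1.3, -0.4, 2.2, 0.9])
    beta = np.full(4, 0.15)                    # offsets learned offline
    print(offset_min_sum_check(msgs, beta))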
Wei Feng, Yanmin Wang, Dengsheng Lin, Ning Ge, Jianhua Lu, Shaoqian Li
Comments: 12 pages, 6 figures, accepted by IEEE JSAC
Journal-ref: IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2017
Subjects: Information Theory (cs.IT)
The millimeter-wave (mmWave) communication is envisioned to provide orders of
magnitude capacity improvement. However, it is challenging to realize a
sufficient link margin due to high path loss and blockages. To address this
difficulty, in this paper, we explore the potential gain of ultra-densification
for enhancing mmWave communications from a network-level perspective. By
deploying the mmWave base stations (BSs) in an extremely dense and amorphous
fashion, the access distance is reduced and the choice of serving BSs is
enriched for each user, which are intuitively effective for mitigating the
propagation loss and blockages. Nevertheless, co-channel interference under
this model will become a performance-limiting factor. To solve this problem, we
propose a large-scale channel state information (CSI) based interference
coordination approach. Note that the large-scale CSI is highly
location-dependent, and can be obtained with a quite low cost. Thus, the
scalability of the proposed coordination framework can be guaranteed.
Particularly, using only the large-scale CSI of interference links, a
coordinated frequency resource block allocation problem is formulated for
maximizing the minimum achievable rate of the users, which turns out to be an
NP-hard integer programming problem. To circumvent this difficulty, a greedy
scheme with polynomial-time complexity is proposed by adopting the bisection
method and linear integer programming tools. Simulation results demonstrate
that the proposed coordination scheme based on large-scale CSI only can still
offer substantial gains over the existing methods. Moreover, although the
proposed scheme is only guaranteed to converge to a local optimum, it performs
well in terms of both user fairness and system efficiency.
Jialing Liao, Muhammad R. A. Khandaker, Kai-Kit Wong
Subjects: Information Theory (cs.IT)
This letter considers simultaneous wireless information and power transfer
(SWIPT) for a multiple-input multiple-output (MIMO) relay system. The relay is
powered by harvesting energy from the source via time switching (TS) and
utilizes the harvested energy to forward the information signal. Our aim is to
maximize the rate of the system subject to the power constraints at both the
source and relay nodes. In the first scenario in which the source covariance
matrix is an identity matrix, we present the joint-optimal solution for
relaying and the TS ratio in closed form. An iterative scheme is then proposed
for jointly optimizing the source and relaying matrices and the TS ratio.
Ioannis Dimitriou, Nikolaos Pappas
Comments: Submitted for journal publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
In this work we consider a two-user and a three-user slotted ALOHA network
with multi-packet reception (MPR) capabilities. The nodes can adapt their
transmission probabilities and their transmission parameters based on the
status of the other nodes. Each user has external bursty arrivals that are
stored in their infinite capacity queues. For the two- and the three-user cases
we obtain the stability region of the system. For the two-user case we provide
the conditions where the stability region is a convex set. We perform a
detailed mathematical analysis in order to study the queueing delay by
formulating two boundary value problems (a Dirichlet and a Riemann-Hilbert
boundary value problem), the solution of which provides the generating function
of the joint stationary probability distribution of the queue size at user
nodes. Furthermore, for the two-user symmetric case with MPR we obtain a lower
and an upper bound for the average delay without explicitly computing the
generating function for the stationary joint queue length distribution. The
bounds as it is seen in the numerical results appear to be tight. Explicit
expressions for the average delay are obtained for the symmetrical model with
capture effect which is a subclass of MPR models. We also provide the optimal
transmission probability in a closed-form expression that minimizes the average
delay in the symmetric capture case. Finally, we evaluate numerically the
presented theoretical results.
Trygve Johnsen, Hugues Verdure
Subjects: Information Theory (cs.IT)
We describe a two-party wire-tap channel of type II in the framework of
almost affine codes. Its cryptological performance is related to some relative
profiles of a pair of almost affine codes. These profiles are analogues of
relative generalized Hamming weights in the linear case.
Zeinab Yazdanshenasan, Harpreet S. Dhillon, Peter Han Joo Chong
Comments: Proc., Biennial Symposium on Communications (BSC), 2016
Subjects: Information Theory (cs.IT)
Heterogeneous cellular networks (HCNs) usually exhibit spatial separation
amongst base stations (BSs) of different types (termed tiers in this paper).
For instance, operators will usually not deploy a picocell in close proximity
to a macrocell, thus inducing separation amongst the locations of pico and
macrocells. This separation has recently been captured by modeling the small
cell locations by a Poisson Hole Process (PHP) with the hole centers being the
locations of the macrocells. Due to the presence of exclusion zones, the
analysis of the resulting model is significantly more complex compared to the
more popular Poisson Point Process (PPP) based models. In this paper, we derive
a tight bound on the distribution of the distance of a typical user to the
closest point of a PHP. Since the exact distribution of this distance is not
known, it is often approximated in the literature. For this model, we then
provide tight characterization of the downlink coverage probability for a
typical user in a two-tier closed-access HCN under two cases: (i) typical user
is served by the closest macrocell, and (ii) typical user is served by its
closest small cell. The proposed approach can be extended to analyze other
relevant cases of interest, e.g., coverage in a PHP-based open access HCN.
Keerthi Kumar Nagalapur, Erik G. Ström, Fredrik Brännström, Jan Carlsson, Kristian Karlsson
Subjects: Information Theory (cs.IT)
For critical services, such as traffic safety and traffic efficiency, it is
advisable to design systems with robustness as the main criteria, possibly at
the price of reduced peak performance and efficiency. Ensuring robust
communications in case of embedded or hidden antennas is a challenging task due
to nonisotropic radiation patterns of these antennas. The challenges due to the
nonisotropic radiation patterns can be overcome with the use of multiple
antennas. In this paper, we describe a simple, low-cost method for combining
the output of multiple nonisotropic antennas to guarantee robustness, i.e.,
support reliable communications in worst-case scenarios. The combining method
is designed to minimize the burst error probability, i.e., the probability of
consecutive decoding errors of status messages arriving periodically at a
receiver from an arbitrary angle of arrival. The proposed method does not
require the knowledge of instantaneous signal-to-noise ratios or the
complex-valued channel gains at the antenna outputs. The proposed method is
applied to measured and theoretical antenna radiation patterns, and it is shown
that the method supports robust communications from an arbitrary angle of
arrival.
Yo-Seb Jeon, Namyoon Lee, Ravi Tandon
Comments: Submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
This paper considers a (K)-cell multiple access channel with inter-symbol
interference. The primary finding of this paper is that, without instantaneous
channel state information at the transmitters (CSIT), the sum
degrees-of-freedom (DoF) of the considered channel is (\frac{\eta-1}{\eta}K)
with (\eta \geq 2) when the number of users per cell is sufficiently large,
where (\eta) is the ratio of the maximum channel-impulse-response (CIR) length
of desired links to that of interfering links in each cell. Our finding implies
that even without instantaneous CSIT, \textit{interference-free DoF per cell}
is achievable as (\eta) approaches infinity with a sufficiently large number
of users per cell. This achievability is shown by a blind interference
management method that exploits the relativity in delay spreads between desired
and interfering links. In this method, all inter-cell-interference signals are
aligned to the same direction by using a discrete-Fourier-transform-based
precoding with cyclic prefix that only depends on the number of CIR taps. Using
this method, we also characterize the achievable sum rate of the considered
channel, in a closed-form expression.
Zheng Wang, Cong Ling
Comments: submitted to Transaction on Information Theory
Subjects: Information Theory (cs.IT)
Sampling from the lattice Gaussian distribution is an efficient way for
solving the closest vector problem (CVP) in lattice decoding. In this paper,
decoding by MCMC-based lattice Gaussian sampling is investigated in full
details. First of all, the spectral gap of the transition matrix of the Markov
chain induced by the independent Metropolis-Hastings-Klein (MHK) algorithm is
derived, dictating an exponential convergence rate to the target lattice
Gaussian distribution. Then, the decoding complexity of CVP is derived as
(O(e^{d^2(\Lambda, \mathbf{c})/\min_i^2\|\widehat{\mathbf{b}}_i\|})), where
(d(\Lambda, \mathbf{c})) represents the Euclidean distance between the query
point (\mathbf{c}) and the lattice (\Lambda), and (\widehat{\mathbf{b}}_i) is
the (i)th Gram-Schmidt vector of the lattice basis (\mathbf{B}). Furthermore,
the decoding radius from the perspective of bounded distance decoding (BDD)
given a fixed number of Markov moves (t) is also derived, revealing a flexible
trade-off between the decoding performance and complexity. Finally, by taking
advantages of (k) trial samples from the proposal distribution, the independent
multiple-try Metropolis-Klein (MTMK) algorithm is proposed to further enhance
the exponential convergence rate. By adjusting (k), the independent MTMK
sampler enjoys a flexible decoding performance, where the independent MHK
algorithm is just a case with (k=1). Additionally, the proposed decoding allows
a fully parallel implementation, which is beneficial for the practical
interest.
Gozde Ozcan, M. Cenk Gursoy, Jian Tang
Comments: This paper is accepted for publication in IEEE Transactions on Communications
Subjects: Information Theory (cs.IT)
This paper studies energy efficiency (EE) and average throughput maximization
for cognitive radio systems in the presence of unslotted primary users. It is
assumed that primary user activity follows an ON-OFF alternating renewal
process. Secondary users first sense the channel possibly with errors in the
form of miss detections and false alarms, and then start the data transmission
only if no primary user activity is detected. The secondary user transmission
is subject to constraints on collision duration ratio, which is defined as the
ratio of average collision duration to transmission duration. In this setting,
the optimal power control policy which maximizes the EE of the secondary users
or maximizes the average throughput while satisfying a minimum required EE
under average/peak transmit power and average interference power constraints
is derived. Subsequently, low-complexity algorithms for jointly determining
the optimal power level and frame duration are proposed. The impact of
probabilities of detection and false alarm, transmit and interference power
constraints on the EE, average throughput of the secondary users, optimal
transmission power, and the collisions with primary user transmissions are
evaluated. In addition, some important properties of the collision duration
ratio are investigated. The tradeoff between the EE and average throughput
under imperfect sensing decisions and different primary user traffic is
further analyzed.
Alexander M. Romanov
Subjects: Information Theory (cs.IT)
The paper deals with the perfect 1-error correcting codes over a finite field
with (q) elements (briefly (q)-ary 1-perfect codes). We show that the
orthogonal code to the (q)-ary non-full-rank 1-perfect code of length (n =
(q^{m}-1)/(q-1)) is a (q)-ary constant-weight code with Hamming weight equal
to (q^{m-1}), where (m) is any natural number not less than two. We derive
necessary and sufficient conditions for (q)-ary 1-perfect codes of non-full
rank. We suggest a generalization of the concatenation construction to the
(q)-ary case and construct the ternary 1-perfect codes of length 13 and rank
12.
Ahmed El Shafie, Ahmed Sultan, Ioannis Krikidis, Naofal Al-Dhahir
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
We derive the mutual information (MI) of the wireless links in a buffer-aided
full-duplex (FD) multiple-input multiple-output (MIMO) relaying network. The FD
relay still suffers from residual self-interference (RSI), after the
application of self-interference mitigation techniques. We investigate both
cases of the slow-RSI channel, where the RSI is fixed over the entire codeword,
and the fast-RSI channel, where the RSI changes from one symbol duration to
another within the codeword. We show that the RSI can be completely eliminated
when the FD relay is equipped with a buffer in the case of slow RSI. In the
case of fast RSI, the RSI cannot be eliminated. Closed-form expressions for the
links’ MI are derived under both RSI scenarios. For the fixed-rate data
transmission scenario, we derive the optimal transmission strategy that should
be adopted by the source and relay nodes to maximize the system throughput. We
verify our analytical findings through simulations.
Xianming Liu, Guangyue Han
Subjects: Information Theory (cs.IT)
We establish natural connections between continuous-time Gaussian
feedback/memory channels and their associated discrete-time versions in the
forms of sampling and approximating theorems. It turns out that these
connections, together with relevant tools from stochastic calculus, can enhance
our understanding of continuous-time Gaussian channels in terms of giving
alternative interpretations to some long-held “folklores”, recovering known
results from new perspectives, and obtaining new results inspired by the
insights and ideas that come along with the connections. In particular, we
derive the capacity regions of a continuous-time white Gaussian multiple access
channel, a continuous-time white Gaussian interference channel, and a
continuous-time white Gaussian broadcast channel. Furthermore, applying the
sampling and approximating theorems and the ideas and techniques in their
proofs, we analyze how feedback affects the capacity regions of families of
continuous-time multi-user one-hop Gaussian channels: feedback will increase
the capacity regions of some continuous-time white Gaussian broadcast and
interference channels, while it will not increase capacity regions of
continuous-time white Gaussian multiple access channels.
Yiming Huo, Xiaodai Dong, Wei Xu
Comments: Submitted to IEEE Access. 15 pages, 17 figures, 4 tables
Subjects: Information Theory (cs.IT)
Research and development on the next generation wireless systems, namely 5G,
has experienced explosive growth in recent years. In the physical layer (PHY),
the massive multiple-input-multiple-output (MIMO) technique and the use of high
GHz frequency bands are two promising trends for adoption. Millimeter-wave
(mmWave) bands such as 28 GHz, 38 GHz, 64 GHz and 71 GHz which were previously
considered not suitable for commercial cellular networks, will play an
important role in 5G. Currently, most 5G research deals with the algorithms and
implementations of modulation and coding schemes, new spatial signal processing
technologies, new spectrum opportunities, channel modeling, 5G proof of concept
(PoC) systems, and other system-level enabling technologies. In this paper,
based on a review of leading mainstream mobile handset devices, we conduct a
thorough investigation on the contemporary wireless user equipment (UE)
hardware design, and unveil the critical 5G UE hardware design constraints on
the radio frequency (RF) architecture, antenna system, RF and baseband (BB)
circuits, etc. On top of the said investigation and design trade-offs analysis,
a new highly reconfigurable system architecture for 5G cellular user equipment,
namely distributed phased arrays based MIMO (DPA-MIMO) is proposed. Finally,
the link budget calculation and data throughput numerical results are presented
for the evaluation of the proposed architecture.
Mohammadali Mohammadi, Batu K. Chalise, Himal A. Suraweera, Zhiguo Ding
Comments: Accepted for the IEEE International Conference on Communications (ICC 2017)
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
We consider a multiuser wireless system with a full-duplex hybrid access
point (HAP) that transmits to a set of users in the downlink channel, while
receiving data from a set of energy-constrained sensors in the uplink channel.
We assume that the HAP is equipped with a massive antenna array, while all
users and sensor nodes have a single antenna. We adopt a time-switching
protocol where in the first phase, sensors are powered through wireless energy
transfer from the HAP and the HAP estimates the downlink channel of the users. In the
second phase, sensors use the harvested energy to transmit to the HAP. The
downlink-uplink sum-rate region is obtained by solving downlink sum-rate
maximization problem under a constraint on uplink sum-rate. Moreover, assuming
perfect and imperfect channel state information, we derive expressions for the
achievable uplink and downlink rates in the large-antenna limit and approximate
results that hold for any finite number of antennas. Based on these analytical
results, we obtain the power-scaling law and analyze the effect of the number
of antennas on the cancellation of intra-user interference and the
self-interference.
Atri Rudra, Mary Wootters
Subjects: Information Theory (cs.IT)
We analyze the list-decodability, and related notions, of random linear
codes. This has been studied extensively before: there are many different
parameter regimes and many different variants. Previous works have used
complementary styles of arguments—which each work in their own parameter
regimes but not in others—and moreover have left some gaps in our
understanding of the list-decodability of random linear codes. In particular,
none of these arguments work well for list-recovery, a generalization of
list-decoding that has been useful in a variety of settings.
In this work, we present a new approach, which works across parameter regimes
and further generalizes to list-recovery. Our main theorem can establish better
list-decoding and list-recovery results for low-rate random linear codes over
large fields; list-recovery of high-rate random linear codes; and it can
recover the rate bounds of Guruswami, Hastad, and Kopparty for constant-rate
random linear codes (although with large list sizes).
Martin Biehl
Comments: PhD thesis, 198 pages
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Multiagent Systems (cs.MA)
This thesis contributes to the formalisation of the notion of an agent within
the class of finite multivariate Markov chains. Agents are seen as entities
that act, perceive, and are goal-directed.
We present a new measure that can be used to identify entities (called
((\iota)-entities), some general requirements for entities in multivariate
Markov chains, as well as formal definitions of actions and perceptions
suitable for such entities.
The intuition behind (\iota)-entities is that entities are spatiotemporal
patterns for which every part makes every other part more probable. The
measure, complete local integration (CLI), is formally investigated in general
Bayesian networks. It is based on the specific local integration (SLI) which is
measured with respect to a partition. CLI is the minimum value of SLI over all
partitions. We prove that (\iota)-entities are blocks in specific partitions of
the global trajectory. These partitions are the finest partitions that achieve
a given SLI value. We also establish the transformation behaviour of SLI under
permutations of nodes in the network.
We go on to present three conditions on general definitions of entities.
These are not fulfilled by sets of random variables, i.e., the perception-action
loop, which is often used to model agents, is too restrictive. We propose that
any general entity definition should in effect specify a subset (called an
entity-set) of the set of all spatiotemporal patterns of a given multivariate
Markov chain. The set of (\iota)-entities is such a set. Importantly, the
perception-action loop also induces an entity-set.
We then propose formal definitions of actions and perceptions for arbitrary
entity-sets. These specialise to standard notions in case of the
perception-action loop entity-set.
Finally we look at some very simple examples.
Jiange Li
Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA)
This paper is twofold. In the first part, we derive an improvement of the
Rényi Entropy Power Inequality (EPI) recently obtained by Bobkov and
Marsiglietti \cite{BM16}. The proof largely follows Lieb’s \cite{Lieb78}
approach of employing Young’s inequality. In the second part, we prove a
reverse Rényi EPI that verifies a conjecture proposed in \cite{BNT15, MMX16}
in two cases. Connections with various (p)-th mean bodies in convex geometry
are also explored.
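For readers less familiar with the objects involved, recall (in our notation) the Rényi entropy and Rényi entropy power of a random vector X on \mathbb{R}^n with density f_X:

    h_r(X) = \frac{1}{1-r}\,\log \int_{\mathbb{R}^n} f_X(x)^r \, dx,
    \qquad
    N_r(X) = e^{\frac{2}{n} h_r(X)}.

As r tends to 1 these recover the Shannon entropy and the classical EPI N(X+Y) \geq N(X) + N(Y) for independent X and Y; Rényi EPIs of the kind improved here assert N_r(X+Y) \geq c\,(N_r(X) + N_r(Y)) for r > 1 with an explicit constant c, whose precise value is the subject of the paper and of \cite{BM16}.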