Ramin M. Hasani, Giorgio Ferrari, Hideaki Yamamoto, Sho Kono, Koji Ishihara, Soya Fujimori, Takashi Tanii, Enrico Prati
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
There are several indications that brain is organized not on a basis of
individual unreliable neurons, but on a micro-circuital scale providing Lego
blocks employed to create complex architectures. At such an intermediate scale,
the firing activity in the microcircuits is governed by collective effects
emerging by the background noise soliciting spontaneous firing, the degree of
mutual connections between the neurons, and the topology of the connections. We
compare spontaneous firing activity of small populations of neurons adhering to
an engineered scaffold with simulations of biologically plausible CMOS
artificial neuron populations whose spontaneous activity is ignited by tailored
background noise. We provide a full set of flexible and low-power consuming
silicon blocks including neurons, excitatory and inhibitory synapses, and both
white and pink noise generators for spontaneous firing activation. We achieve a
comparable degree of correlation of the firing activity of the biological
neurons by controlling the kind and the number of connection among the silicon
neurons. The correlation between groups of neurons, organized as a ring of four
distinct populations connected by the equivalent of interneurons, is triggered
more effectively by adding multiple synapses to the connections than increasing
the number of independent point-to-point connections. The comparison between
the biological and the artificial systems suggests that a considerable number
of synapses is active also in biological populations adhering to engineered
scaffolds.
Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Designing a practical, low complexity, close to optimal, channel decoder for
powerful algebraic codes with short to moderate block length is an open
research problem. Recently it has been shown that a feed-forward neural network
architecture can improve on standard belief propagation decoding, despite the
large example space. In this paper we introduce a recurrent neural network
architecture for decoding linear block codes. Our method shows comparable bit
error rate results compared to the feed-forward neural network with
significantly less parameters. We also demonstrate improved performance over
belief propagation on sparser Tanner graph representations of the codes.
Furthermore, we demonstrate that the RNN decoder can be used to improve the
performance or alternatively reduce the computational complexity of the mRRD
algorithm for low complexity, close to optimal, decoding of short BCH codes.
Alfredo Nava-Tudela
Comments: 22 pages, 34 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We build a collaborative filtering recommender system to restore images with
impulse noise for which the noisy pixels have been previously identified. We
define this recommender system in terms of a new color image representation
using three matrices that depend on the noise-free pixels of the image to
restore, and two parameters: (k), the number of features; and (lambda), the
regularization factor. We perform experiments on a well known image database to
test our algorithm and we provide image quality statistics for the results
obtained. We discuss the roles of bias and variance in the performance of our
algorithm as determined by the values of (k) and (lambda), and provide
guidance on how to choose the values of these parameters. Finally, we discuss
the possibility of using our collaborative filtering recommender system to
perform image inpainting and super-resolution.
Dipan K. Pal, Marios Savvides
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we theoretically address three fundamental problems involving
deep convolutional networks regarding invariance, depth and hierarchy. We
introduce the paradigm of Transformation Networks (TN) which are a direct
generalization of Convolutional Networks (ConvNets). Theoretically, we show
that TNs (and thereby ConvNets) are can be invariant to non-linear
transformations of the input despite pooling over mere local translations. Our
analysis provides clear insights into the increase in invariance with depth in
these networks. Deeper networks are able to model much richer classes of
transformations. We also find that a hierarchical architecture allows the
network to generate invariance much more efficiently than a non-hierarchical
network. Our results provide useful insight into these three fundamental
problems in deep learning using ConvNets.
Amy Tabb, Henry Medeiros
Comments: 33 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
We consider the problem of extracting curve skeletons of three-dimensional,
elongated objects given a noisy surface, which has applications in agricultural
contexts such as extracting the branching structure of plants. We describe an
efficient and robust method based on breadth-first search that can determine
curve skeletons in these contexts. Our approach is capable of automatically
detecting junction points as well as spurious segments and loops. All of that
is accomplished with only one user-adjustable parameter. The run time of our
method ranges from hundreds of milliseconds to less than four seconds on large,
challenging datasets, which makes it appropriate for situations where real-time
decision making is needed. Experiments on synthetic models as well as on data
from real world objects, some of which were collected in challenging field
conditions, show that our approach compares favorably to classical thinning
algorithms as well as to recent contributions to the field.
Amy Tabb, Henry Medeiros
Comments: 5 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Segmentation in dynamic outdoor environments can be difficult when the
illumination levels and other aspects of the scene cannot be controlled. In
this paper, we describe a method that uses superpixels to determine low texture
regions of the image that correspond to the background material, and then show
how this information can be integrated with the color distribution of the image
to compute optimal segmentation parameters for traditional binary segmentation
as well as to produce silhouette probability maps. We show results of this
algorithm in the context of an application for tree modeling.
Klaas Kelchtermans, Tinne Tuytelaars
Comments: 12 pages, 30 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work explores the feasibility of steering a drone with a (recurrent)
neural network, based on input from a forward looking camera, in the context of
a high-level navigation task. We set up a generic framework for training a
network to perform navigation tasks based on imitation learning. It can be
applied to both aerial and land vehicles. As a proof of concept we apply it to
a UAV (Unmanned Aerial Vehicle) in a simulated environment, learning to cross a
room containing a number of obstacles. So far only feedforward neural networks
(FNNs) have been used to train UAV control. To cope with more complex tasks, we
propose the use of recurrent neural networks (RNN) instead and successfully
train an LSTM (Long-Short Term Memory) network for controlling UAVs. Vision
based control is a sequential prediction problem, known for its highly
correlated input data. The correlation makes training a network hard,
especially an RNN. To overcome this issue, we investigate an alternative
sampling method during training, namely window-wise truncated backpropagation
through time (WW-TBPTT). Further, end-to-end training requires a lot of data
which often is not available. Therefore, we compare the performance of
retraining only the Fully Connected (FC) and LSTM control layers with networks
which are trained end-to-end. Performing the relatively simple task of crossing
a room already reveals important guidelines and good practices for training
neural control networks. Different visualizations help to explain the behavior
learned.
Songxuan Lai, Lianwen Jin, Weixin Yang
Comments: 10 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents an investigation of several techniques that increase the
accuracy of online handwritten Chinese character recognition (HCCR). We propose
a new training strategy named DropDistortion to train a deep convolutional
neural network (DCNN) with distorted samples. DropDistortion gradually lowers
the degree of character distortion during training, which allows the DCNN to
better generalize. Path signature is used to extract effective features for
online characters. Further improvement is achieved by employing spatial
stochastic max-pooling as a method of feature map distortion and model
averaging. Experiments were carried out on three publicly available datasets,
namely CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and the ICDAR2013 online HCCR
competition dataset. The proposed techniques yield state-of-the-art recognition
accuracies of 97.67%, 97.30%, and 97.99%, respectively.
Judith Bütepage, Michael Black, Danica Kragic, Hedvig Kjellström
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generative models of 3D human motion are often restricted to a small number
of activities and can therefore not generalize well to novel movements or
applications. In this work we propose a deep learning framework for human
motion capture data that learns a generic representation from a large corpus of
motion capture data and generalizes well to new, unseen, motions. Using an
encoding-decoding network that learns to predict future 3D poses from the most
recent past, we extract a feature representation of human motion. Most work on
deep learning for sequence prediction focuses on video and speech. Since
skeletal data has a different structure, we present and evaluate different
network architectures that make different assumptions about time dependencies
and limb correlations. To quantify the learned features, we use the output of
different layers for action classification and visualize the receptive fields
of the network units. Our method outperforms the recent state of the art in
skeletal motion prediction even though these use action specific training data.
Our results show that deep feedforward networks, trained from a generic mocap
database, can successfully be used for feature extraction from human motion
data and that this representation can be used as a foundation for
classification and prediction.
Wensen Feng, Yunjin Chen
Comments: to appear in Journal of Mathematical Imaging and Vision. Demo codes are available from this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Speckle reduction is a prerequisite for many image processing tasks in
synthetic aperture radar (SAR) images, as well as all coherent images. In
recent years, predominant state-of-the-art approaches for despeckling are
usually based on nonlocal methods which mainly concentrate on achieving utmost
image restoration quality, with relatively low computational efficiency.
Therefore, in this study we aim to propose an efficient despeckling model with
both high computational efficiency and high recovery quality. To this end, we
exploit a newly-developed trainable nonlinear reaction diffusion(TNRD)
framework which has proven a simple and effective model for various image
restoration problems. {In the original TNRD applications, the diffusion network
is usually derived based on the direct gradient descent scheme. However, this
approach will encounter some problem for the task of multiplicative noise
reduction exploited in this study. To solve this problem, we employed a new
architecture derived from the proximal gradient descent method.} {Taking into
account the speckle noise statistics, the diffusion process for the despeckling
task is derived. We then retrain all the model parameters in the presence of
speckle noise. Finally, optimized nonlinear diffusion filtering models are
obtained, which are specialized for despeckling with various noise levels.
Experimental results substantiate that the trained filtering models provide
comparable or even better results than state-of-the-art nonlocal approaches.
Meanwhile, our proposed model merely contains convolution of linear filters
with an image, which offers high level parallelism on GPUs. As a consequence,
for images of size (512 imes 512), our GPU implementation takes less than 0.1
seconds to produce state-of-the-art despeckling performance.}
Fei Han, Xue Yang, Christopher Reardon, Yu Zhang, Hao Zhang
Comments: 8 pages, 6 figures, accepted by ICRA’17
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Robot awareness of human actions is an essential research problem in robotics
with many important real-world applications, including human-robot
collaboration and teaming. Over the past few years, depth sensors have become a
standard device widely used by intelligent robots for 3D perception, which can
also offer human skeletal data in 3D space. Several methods based on skeletal
data were designed to enable robot awareness of human actions with satisfactory
accuracy. However, previous methods treated all body parts and features equally
important, without the capability to identify discriminative body parts and
features. In this paper, we propose a novel simultaneous Feature And Body-part
Learning (FABL) approach that simultaneously identifies discriminative body
parts and features, and efficiently integrates all available information
together to enable real-time robot awareness of human behaviors. We formulate
FABL as a regression-like optimization problem with structured
sparsity-inducing norms to model interrelationships of body parts and features.
We also develop an optimization algorithm to solve the formulated problem,
which possesses a theoretical guarantee to find the optimal solution. To
evaluate FABL, three experiments were performed using public benchmark
datasets, including the MSR Action3D and CAD-60 datasets, as well as a Baxter
robot in practical assistive living applications. Experimental results show
that our FABL approach obtains a high recognition accuracy with a processing
speed of the order-of-magnitude of 10e4 Hz, which makes FABL a promising method
to enable real-time robot awareness of human behaviors in practical robotics
applications.
Peng Qiao, Yong Dou, Wensen Feng, Yunjin Chen
Comments: under review in a journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image diffusion plays a fundamental role for the task of image denoising.
Recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a
simple but very effective framework for image denoising. However, as the TNRD
model is a local model, the diffusion behavior of which is purely controlled by
information of local patches, it is prone to create artifacts in the homogenous
regions and over-smooth highly textured regions, especially in the case of
strong noise levels. Meanwhile, it is widely known that the non-local
self-similarity (NSS) prior stands as an effective image prior for image
denoising, which has been widely exploited in many non-local methods. In this
work, we are highly motivated to embed the NSS prior into the TNRD model to
tackle its weaknesses. In order to preserve the expected property that
end-to-end training is available, we exploit the NSS prior by a set of
non-local filters, and derive our proposed trainable non-local reaction
diffusion (TNLRD) model for image denoising. Together with the local filters
and influence functions, the non-local filters are learned by employing
loss-specific training. The experimental results show that the trained TNLRD
model produces visually plausible recovered images with more textures and less
artifacts, compared to its local versions. Moreover, the trained TNLRD model
can achieve strongly competitive performance to recent state-of-the-art image
denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural
similarity index (SSIM).
Patrick Wang, Kenneth Morton, Peter Torrione, Leslie Collins
Subjects: Computer Vision and Pattern Recognition (cs.CV)
An object detector performs suboptimally when applied to image data taken
from a viewpoint different from the one with which it was trained. In this
paper, we present a viewpoint adaptation algorithm that allows a trained
single-view object detector to be adapted to a new, distinct viewpoint. We
first illustrate how a feature space transformation can be inferred from a
known homography between the source and target viewpoints. Second, we show that
a variety of trained classifiers can be modified to behave as if that
transformation were applied to each testing instance. The proposed algorithm is
evaluated on a person detection task using images from the PETS 2007 and CAVIAR
datasets, as well as from a new synthetic multi-view person detection dataset.
It yields substantial performance improvements when adapting single-view person
detectors to new viewpoints, and simultaneously reduces computational
complexity. This work has the potential to improve detection performance for
cameras viewing objects from arbitrary viewpoints, while simplifying data
collection and feature extraction.
Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang
Comments: The first two authors contribute equally to this work
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose to incorporate convolutional neural networks with a
multi-context attention mechanism into an end-to-end framework for human pose
estimation. We adopt stacked hourglass networks to generate attention maps from
features at multiple resolutions with various semantics. The Conditional Random
Field (CRF) is utilized to model the correlations among neighboring regions in
the attention map. We further combine the holistic attention model, which
focuses on the global consistency of the full human body, and the body part
attention model, which focuses on the detailed description for different body
parts. Hence our model has the ability to focus on different granularity from
local salient regions to global semantic-consistent spaces. Additionally, we
design novel Hourglass Residual Units (HRUs) to increase the receptive field of
the network. These units are extensions of residual units with a side branch
incorporating filters with larger receptive fields, hence features with various
scales are learned and combined within the HRUs. The effectiveness of the
proposed multi-context attention mechanism and the hourglass residual units is
evaluated on two widely used human pose estimation benchmarks. Our approach
outperforms all existing methods on both benchmarks over all the body parts.
Jie Li, Katherine A. Skinner, Ryan M. Eustice, Matthew Johnson-Roberson
Comments: 8 pages, 16 figures, submission to RA-letter/IROS 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
This paper reports on WaterGAN, a generative adversarial network (GAN) for
generating realistic underwater images from in-air image and depth pairings in
an unsupervised pipeline used for color correction of monocular underwater
images. Cameras onboard autonomous and remotely operated vehicles can capture
high resolution images to map the seafloor, however, underwater image formation
is subject to the complex process of light propagation through the water
column. The raw images retrieved are characteristically different than images
taken in air due to effects such as absorption and scattering, which cause
attenuation of light at different rates for different wavelengths. While this
physical process is well described theoretically, the model depends on many
parameters intrinsic to the water column as well as the objects in the scene.
These factors make recovery of these parameters difficult without simplifying
assumptions or field calibration, hence, restoration of underwater images is a
non-trivial problem. Deep learning has demonstrated great success in modeling
complex nonlinear systems but requires a large amount of training data, which
is difficult to compile in deep sea environments. Using WaterGAN, we generate a
large training dataset of paired imagery, both raw underwater and true color
in-air, as well as depth data. This data serves as input to a novel end-to-end
network for color correction of monocular underwater images. Due to the
depth-dependent water column effects inherent to underwater environments, we
show that our end-to-end network implicitly learns a coarse depth estimate of
the underwater scene from monocular underwater images. Our proposed pipeline is
validated with testing on real data collected from both a pure water tank and
from underwater surveys in field testing. Source code is made publicly
available with sample datasets and pretrained models.
Shibani Santurkar, David Budden, Alexander Matveev, Heather Berlin, Hayk Saribekyan, Yaron Meirovitch, Nir Shavit
Comments: 10 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Connectomics is an emerging field in neuroscience that aims to reconstruct
the 3-dimensional morphology of neurons from electron microscopy (EM) images.
Recent studies have successfully demonstrated the use of convolutional neural
networks (ConvNets) for segmenting cell membranes to individuate neurons.
However, there has been comparatively little success in high-throughput
identification of the intercellular synaptic connections required for deriving
connectivity graphs.
In this study, we take a compositional approach to segmenting synapses,
modeling them explicitly as an intercellular cleft co-located with an
asymmetric vesicle density along a cell membrane. Instead of requiring a deep
network to learn all natural combinations of this compositionality, we train
lighter networks to model the simpler marginal distributions of membranes,
clefts and vesicles from just 100 electron microscopy samples. These feature
maps are then combined with simple rules-based heuristics derived from prior
biological knowledge.
Our approach to synapse detection is both more accurate than previous
state-of-the-art (7% higher recall and 5% higher F1-score) and yields a 20-fold
speed-up compared to the previous fastest implementations. We demonstrate by
reconstructing the first complete, directed connectome from the largest
available anisotropic microscopy dataset (245 GB) of mouse somatosensory cortex
(S1) in just 9.7 hours on a single shared-memory CPU system. We believe that
this work marks an important step toward the goal of a microscope-pace
streaming connectomics pipeline.
Tanu Srivastava, Raj Shree Singh, Sunil Kumar, Pavan Chakraborty
Comments: conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Nowadays actions are increasingly being handled in electronic ways, instead
of physical interaction. From earlier times biometrics is used in the
authentication of a person. It recognizes a person by using a human trait
associated with it like eyes (by calculating the distance between the eyes) and
using hand gestures, fingerprint detection, face detection etc. Advantages of
using these traits for identification are that they uniquely identify a person
and cannot be forgotten or lost. These are unique features of a human being
which are being used widely to make the human life simpler. Hand gesture
recognition system is a powerful tool that supports efficient interaction
between the user and the computer. The main moto of hand gesture recognition
research is to create a system which can recognise specific hand gestures and
use them to convey useful information for device control. This paper presents
an experimental study over the feasibility of principal component analysis in
hand gesture recognition system. PCA is a powerful tool for analyzing data. The
primary goal of PCA is dimensionality reduction. Frames are extracted from the
Sheffield KInect Gesture (SKIG) dataset. The implementation is done by creating
a training set and then training the recognizer. It uses Eigen space by
processing the eigenvalues and eigenvectors of the images in training set.
Euclidean distance with the threshold value is used as similarity metric to
recognize the gestures. The experimental results show that PCA is feasible to
be used for hand gesture recognition system.
Hamid Reza Shahdoosti
Comments: 7 pages, in Persian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In an appropriate image fusion method, spatial information of the
panchromatic image is injected into the multispectral images such that the
spectral information is not distorted. The high-pass modulation method is a
successful method in image fusion. However, the main drawback of this method is
that this technique uses the boxcar filter to extract the high frequency
information of the panchromatic image. Using the boxcar filter introduces the
ringing effect into the fused image. To cope with this problem, we use the
wavelet transform instead of boxcar filters. Then, the results of the proposed
method and those of other methods such as, Brovey, IHS, and PCA ones are
compared. Experiments show the superiority of the proposed method in terms of
correlation coefficient and mutual information.
Charlotte Revel, Yannick Deville, Véronique Achard, Xavier Briottet
Subjects: Methodology (stat.ME); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Blind source separation is a common processing tool to analyse the
constitution of pixels of hyperspectral images. Such methods usually suppose
that pure pixel spectra (endmembers) are the same in all the image for each
class of materials. In the framework of remote sensing, such an assumption is
no more valid in the presence of intra-class variabilities due to illumination
conditions, weathering, slight variations of the pure materials, etc… In this
paper, we first describe the results of investigations highlighting intra-class
variability measured in real images. Considering these results, a new
formulation of the linear mixing model is presented leading to two new methods.
Unconstrained Pixel-by-pixel NMF (UP-NMF) is a new blind source separation
method based on the assumption of a linear mixing model, which can deal with
intra-class variability. To overcome UP-NMF limitations an extended method is
proposed, named Inertia-constrained Pixel-by-pixel NMF (IP-NMF). For each
sensed spectrum, these extended versions of NMF extract a corresponding set of
source spectra. A constraint is set to limit the spreading of each source’s
estimates in IP-NMF. The methods are tested on a semi-synthetic data set built
with spectra extracted from a real hyperspectral image and then numerically
mixed. We thus demonstrate the interest of our methods for realistic source
variabilities. Finally, IP-NMF is tested on a real data set and it is shown to
yield better performance than state of the art methods.
Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
Comments: The paper is published in IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2016
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
For robots to coexist with humans in a social world like ours, it is crucial
that they possess human-like social interaction skills. Programming a robot to
possess such skills is a challenging task. In this paper, we propose a
Multimodal Deep Q-Network (MDQN) to enable a robot to learn human-like
interaction skills through a trial and error method. This paper aims to develop
a robot that gathers data during its interaction with a human and learns human
interaction behaviour from the high-dimensional sensory information using
end-to-end reinforcement learning. This paper demonstrates that the robot was
able to learn basic interaction skills successfully, after 14 days of
interacting with people.
Fei Han, Xue Yang, Yu Zhang, Hao Zhang
Comments: 8 pages, 6 figures, accepted by ICRA’17
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Apprenticeship learning has recently attracted a wide attention due to its
capability of allowing robots to learn physical tasks directly from
demonstrations provided by human experts. Most previous techniques assumed that
the state space is known a priori or employed simple state representations that
usually suffer from perceptual aliasing. Different from previous research, we
propose a novel approach named Sequence-based Multimodal Apprenticeship
Learning (SMAL), which is capable to simultaneously fusing temporal information
and multimodal data, and to integrate robot perception with decision making. To
evaluate the SMAL approach, experiments are performed using both simulations
and real-world robots in the challenging search and rescue scenarios. The
empirical study has validated that our SMAL approach can effectively learn
plans for robots to make decisions using sequence of multimodal observations.
Experimental results have also showed that SMAL outperforms the baseline
methods using individual images.
Domenic Curro, Konstantinos G. Derpanis, Andriy V. Miranskyy
Subjects: Software Engineering (cs.SE); Computer Vision and Pattern Recognition (cs.CV)
To improve software quality, one needs to build test scenarios resembling the
usage of a software product in the field. This task is rendered challenging
when a product’s customer base is large and diverse. In this scenario, existing
profiling approaches, such as operational profiling, are difficult to apply. In
this work, we consider publicly available video tutorials of a product to
profile usage. Our goal is to construct an automatic approach to extract
information about user actions from instructional videos. To achieve this goal,
we use a Deep Convolutional Neural Network (DCNN) to recognize user actions.
Our pilot study shows that a DCNN trained to recognize user actions in video
can classify five different actions in a collection of 236 publicly available
Microsoft Word tutorial videos (published on YouTube). In our empirical
evaluation we report a mean average precision of 94.42% across all actions.
This study demonstrates the efficacy of DCNN-based methods for extracting
software usage information from videos. Moreover, this approach may aid in
other software engineering activities that require information about customer
usage of a product.
Elias Mueggler, Guillermo Gallego, Henri Rebecq, Davide Scaramuzza
Comments: 14 pages, 12 figures, 4 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
In contrast to traditional cameras, which output images at a fixed rate,
event cameras have independent pixels that output asynchronous pixel-level
brightness changes with microsecond resolution. In this paper, we leverage a
continuous-time framework to perform trajectory estimation by fusing visual
data from a moving event camera with inertial data from an IMU. This framework
allows direct integration of the asynchronous events with micro-second accuracy
and the inertial measurements at high frequency. The pose trajectory is
approximated by a smooth curve in the space of rigid-body motions using cubic
splines. This formulation significantly reduces the number of variables in
trajectory estimation problems. We evaluate our method on real data from
several scenes and compare the results against ground truth from a
motion-capture system. We show superior performance of the proposed technique
compared to non-batch event-based algorithms. We also show that both the map
orientation and scale can be recovered accurately by fusing events and inertial
data. To the best of our knowledge, this is the first work on visual-inertial
fusion with event cameras using a continuous-time framework.
Dipan K. Pal, Vishnu Boddeti, Marios Savvides
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Many theories have emerged which investigate how in- variance is generated in
hierarchical networks through sim- ple schemes such as max and mean pooling.
The restriction to max/mean pooling in theoretical and empirical studies has
diverted attention away from a more general way of generating invariance to
nuisance transformations. We con- jecture that hierarchically building
selective invariance (i.e. carefully choosing the range of the transformation
to be in- variant to at each layer of a hierarchical network) is im- portant
for pattern recognition. We utilize a novel pooling layer called adaptive
pooling to find linear pooling weights within networks. These networks with the
learnt pooling weights have performances on object categorization tasks that
are comparable to max/mean pooling networks. In- terestingly, adaptive pooling
can converge to mean pooling (when initialized with random pooling weights),
find more general linear pooling schemes or even decide not to pool at all. We
illustrate the general notion of selective invari- ance through object
categorization experiments on large- scale datasets such as SVHN and ILSVRC
2012.
Mengya Wang, Hankui Zhuo, Huiling Zhu
Subjects: Artificial Intelligence (cs.AI)
Representation learning of knowledge graphs encodes entities and relation
types into a continuous low-dimensional vector space, learns embeddings of
entities and relation types. Most existing methods only concentrate on
knowledge triples, ignoring logic rules which contain rich background
knowledge. Although there has been some work aiming at leveraging both
knowledge triples and logic rules, they ignore the transitivity and
antisymmetry of logic rules. In this paper, we propose a novel approach to
learn knowledge representations with entities and ordered relations in
knowledges and logic rules. The key idea is to integrate knowledge triples and
logic rules, and approximately order the relation types in logic rules to
utilize the transitivity and antisymmetry of logic rules. All entries of the
embeddings of relation types are constrained to be non-negative. We translate
the general constrained optimization problem into an unconstrained optimization
problem to solve the non-negative matrix factorization. Experimental results
show that our model significantly outperforms other baselines on knowledge
graph completion task. It indicates that our model is capable of capturing the
transitivity and antisymmetry information, which is significant when learning
embeddings of knowledge graphs.
Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro
Comments: The paper is published in IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2016
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
For robots to coexist with humans in a social world like ours, it is crucial
that they possess human-like social interaction skills. Programming a robot to
possess such skills is a challenging task. In this paper, we propose a
Multimodal Deep Q-Network (MDQN) to enable a robot to learn human-like
interaction skills through a trial and error method. This paper aims to develop
a robot that gathers data during its interaction with a human and learns human
interaction behaviour from the high-dimensional sensory information using
end-to-end reinforcement learning. This paper demonstrates that the robot was
able to learn basic interaction skills successfully, after 14 days of
interacting with people.
Fei Han, Xue Yang, Yu Zhang, Hao Zhang
Comments: 8 pages, 6 figures, accepted by ICRA’17
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Apprenticeship learning has recently attracted a wide attention due to its
capability of allowing robots to learn physical tasks directly from
demonstrations provided by human experts. Most previous techniques assumed that
the state space is known a priori or employed simple state representations that
usually suffer from perceptual aliasing. Different from previous research, we
propose a novel approach named Sequence-based Multimodal Apprenticeship
Learning (SMAL), which is capable to simultaneously fusing temporal information
and multimodal data, and to integrate robot perception with decision making. To
evaluate the SMAL approach, experiments are performed using both simulations
and real-world robots in the challenging search and rescue scenarios. The
empirical study has validated that our SMAL approach can effectively learn
plans for robots to make decisions using sequence of multimodal observations.
Experimental results have also showed that SMAL outperforms the baseline
methods using individual images.
David Balduzzi
Comments: 13 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
As artificial agents proliferate, it is becoming increasingly important to
ensure that their interactions with one another are well-behaved. In this
paper, we formalize a common-sense notion of when algorithms are well-behaved:
an algorithm is safe if it does no harm. Motivated by recent progress in deep
learning, we focus on the specific case where agents update their actions
according to gradient descent. The first result is that gradient descent
converges to a Nash equilibrium in safe games.
The paper provides sufficient conditions that guarantee safe interactions.
The main contribution is to define strongly-typed agents and show they are
guaranteed to interact safely. A series of examples show that strong-typing
generalizes certain key features of convexity and is closely related to blind
source separation. The analysis introduce a new perspective on classical
multilinear games based on tensor decomposition.
Dipan K. Pal, Marios Savvides
Comments: 11 page main paper (including references), 2 page supplementary, for a total of 13 pages. Submitted for review at ICLR 2016
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The study of representations invariant to common transformations of the data
is important to learning. Most techniques have focused on local approximate
invariance implemented within expensive optimization frameworks lacking
explicit theoretical guarantees. In this paper, we study kernels that are
invariant to the unitary group while having theoretical guarantees in
addressing practical issues such as (1) unavailability of transformed versions
of labelled data and (2) not observing all transformations. We present a
theoretically motivated alternate approach to the invariant kernel SVM. Unlike
previous approaches to the invariant SVM, the proposed formulation solves both
issues mentioned. We also present a kernel extension of a recent technique to
extract linear unitary-group invariant features addressing both issues and
extend some guarantees regarding invariance and stability. We present
experiments on the UCI ML datasets to illustrate and validate our methods.
Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell
Comments: 4 pages, 2 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Word embedding models offer continuous vector representations that can
capture rich contextual semantics based on their word co-occurrence patterns.
While these word vectors can provide very effective features used in many NLP
tasks such as clustering similar words and inferring learning relationships,
many challenges and open research questions remain. In this paper, we propose a
solution that aligns variations of the same model (or different models) in a
joint low-dimensional latent space leveraging carefully generated synthetic
data points. This generative process is inspired by the observation that a
variety of linguistic relationships is captured by simple linear operations in
embedded space. We demonstrate that our approach can lead to substantial
improvements in recovering embeddings of local neighborhoods.
Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M. Campbell
Comments: 4 pages, 2 figures
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Word embedding models offer continuous vector representations that can
capture rich contextual semantics based on their word co-occurrence patterns.
While these word vectors can provide very effective features used in many NLP
tasks such as clustering similar words and inferring learning relationships,
many challenges and open research questions remain. In this paper, we propose a
solution that aligns variations of the same model (or different models) in a
joint low-dimensional latent space leveraging carefully generated synthetic
data points. This generative process is inspired by the observation that a
variety of linguistic relationships is captured by simple linear operations in
embedded space. We demonstrate that our approach can lead to substantial
improvements in recovering embeddings of local neighborhoods.
Nafise Sadat Moosavi, Michael Strube
Comments: CORBON workshop@EACL 2017
Subjects: Computation and Language (cs.CL)
Only a year ago, all state-of-the-art coreference resolvers were using an
extensive amount of surface features. Recently, there was a paradigm shift
towards using word embeddings and deep neural networks, where the use of
surface features is very limited. In this paper, we show that a simple SVM
model with surface features outperforms more complex neural models for
detecting anaphoric mentions. Our analysis suggests that using generalized
representations and surface features have different strength that should be
both taken into account for improving coreference resolution.
Shaohua Li
Comments: 5 pages
Subjects: Computation and Language (cs.CL)
This document is about the multi-document Von-Mises-Fisher mixture model with
a Dirichlet prior, referred to as VMFMix. VMFMix is analogous to Latent
Dirichlet Allocation (LDA) in that they can capture the co-occurrence patterns
acorss multiple documents. The difference is that in VMFMix, the topic-word
distribution is defined on a continuous n-dimensional hypersphere. Hence VMFMix
is used to derive topic embeddings, i.e., representative vectors, from multiple
sets of embedding vectors. An efficient Variational Expectation-Maximization
inference algorithm is derived. The performance of VMFMix on two document
classification tasks is reported, with some preliminary analysis.
Chen Wu, Rodrigo Tobar, Kevin Vinsen, Andreas Wicenec, Dave Pallot, Baoqiang Lao, Ruonan Wang, Tao An, Mark Boulton, Ian Cooper, Richard Dodson, Markus Dolensky, Ying Mei, Feng Wang
Comments: 31 pages, 12 figures, currently under review by Astronomy and Computing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Detectors (physics.ins-det)
The Data Activated Liu Graph Engine – DALiuGE – is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from less than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE’s scalability to very large numbers of tasks on
two supercomputing facilities.
Lélia Blin, Sébastien Tixeuil
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
We present a self-stabilizing leader election algorithm for arbitrary
networks, with space-complexity (O(max{log Delta, log log n})) bits per
node in (n)-node networks with maximum degree~(Delta). This space complexity
is sub-logarithmic in (n) as long as (Delta = n^{o(1)}). The best
space-complexity known so far for arbitrary networks was (O(log n)) bits per
node, and algorithms with sub-logarithmic space-complexities were known for the
ring only. To our knowledge, our algorithm is the first algorithm for
self-stabilizing leader election to break the (Omega(log n)) bound for silent
algorithms in arbitrary networks. Breaking this bound was obtained via the
design of a (non-silent) self-stabilizing algorithm using sophisticated tools
such as solving the distance-2 coloring problem in a silent self-stabilizing
manner, with space-complexity (O(max{log Delta, log log n})) bits per
node. Solving this latter coloring problem allows us to implement a
sub-logarithmic encoding of spanning trees — storing the IDs of the neighbors
requires (Omega(log n)) bits per node, while we encode spanning trees using
(O(max{log Delta, log log n})) bits per node. Moreover, we show how to
construct such compactly encoded spanning trees without relying on variables
encoding distances or number of nodes, as these two types of variables would
also require (Omega(log n)) bits per node.
Hesham Arafat Ali, Salah Attiya, Ibrahim El-henawy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper we develop parallel cluster sampling algorithms and show that a
multi-chain version is embarrassingly parallel and can be used efficiently for
medical image retrieval among other applications.
Justin M Wozniak, Jonathan Ozik, Daniel S. Katz, Michael Wilde
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This is a position paper, submitted to the Future Online Analysis Platform
Workshop (this https URL), which argues that simple
data analysis applications are common today, but future online supercomputing
workloads will need to couple multiple advanced technologies (streams, caches,
analysis, and simulations) to rapidly deliver scientific results. Each of these
technologies are active research areas when integrated with high-performance
computing. These components will interact in complex ways, therefore coupling
them needs to be programmed. Programming in the large, on top of existing
applications, enables us to build much more capable applications and to
productively manage this complexity.
Carlos Mera-Gómez, Francisco Ramírez, Rami Bahsoon, Rajkumar Buyya
Subjects: Software Engineering (cs.SE); Distributed, Parallel, and Cluster Computing (cs.DC)
Elasticity is a cloud property that enables applications and its execution
systems to dynamically acquire and release shared computational resources on
demand. Moreover, it unfolds the advantage of economies of scale in the cloud
through a drop in the average costs of these shared resources. However, it is
still an open challenge to achieve a perfect match between resource demand and
provision in autonomous elasticity management. Resource adaptation decisions
essentially involve a trade-off between economics and performance, which
produces a gap between the ideal and actual resource provisioning. This gap, if
not properly managed, can negatively impact the aggregate utility of a cloud
customer in the long run. To address this limitation, we propose a technical
debt-aware learning approach for autonomous elasticity management based on a
reinforcement learning of elasticity debts in resource provisioning; the
adaptation pursues strategic decisions that trades off economics against
performance. We extend CloudSim and Burlap to evaluate our approach. The
evaluation shows that a reinforcement learning of technical debts in elasticity
obtains a higher utility for a cloud customer, while conforming expected levels
of performance.
Keren Censor-Hillel, Ran Gelles, Bernhard Haeupler
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider the problem of making distributed computations robust to noise,
in particular to worst-case (adversarial) corruptions of messages. We give a
general distributed interactive coding scheme which simulates any asynchronous
distributed protocol while tolerating an optimal corruption of a (Theta(1/n))
fraction of all messages while incurring a moderate blowup of (O(nlog^2 n)) in
the communication complexity.
Our result is the first fully distributed interactive coding scheme in which
the topology of the communication network is not known in advance. Prior work
required either a coordinating node to be connected to all other nodes in the
network or assumed a synchronous network in which all nodes already know the
complete topology of the network.
Alon Cohen, Tamir Hazan, Tomer Koren
Subjects: Learning (cs.LG)
We revisit the study of optimal regret rates in bandit combinatorial
optimization—a fundamental framework for sequential decision making under
uncertainty that abstracts numerous combinatorial prediction problems. We prove
that the attainable regret in this setting grows as
(widetilde{Theta}(k^{3/2}sqrt{dT})) where (d) is the dimension of the
problem and (k) is a bound over the maximal instantaneous loss, disproving a
conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal
rate should be of the form (widetilde{Theta}(ksqrt{dT})). Our bounds apply
to several important instances of the framework, and in particular, imply a
tight bound for the well-studied bandit shortest path problem. By that, we also
resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
Stefan Elfwing, Eiji Uchibe, Kenji Doya
Comments: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.03118
Subjects: Learning (cs.LG)
The efficiency of reinforcement learning algorithms depends critically on a
few meta-parameters that modulates the learning updates and the trade-off
between exploration and exploitation. The adaptation of the meta-parameters is
an open question in reinforcement learning, which arguably has become more of
an issue recently with the success of deep reinforcement learning in
high-dimensional state spaces. The long learning times in domains such as Atari
2600 video games makes it not feasible to perform comprehensive searches of
appropriate meta-parameter values. We propose the Online Meta-learning by
Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several
instances of a reinforcement learning algorithm are run in parallel with small
differences in the initial values of the meta-parameters. After a fixed number
of episodes, the instances are selected based on their performance in the task
at hand. Before continuing the learning, Gaussian noise is added to the
meta-parameters with a predefined probability. We validate the OMPAC method by
improving the state-of-the-art results in stochastic SZ-Tetris and in standard
Tetris with a smaller, 10( imes)10, board, by 31% and 84%, respectively, and
by improving the results for deep Sarsa((lambda)) agents in three Atari 2600
games by 62% or more. The experiments also show the ability of the OMPAC method
to adapt the meta-parameters according to the learning progress in different
tasks.
David Balduzzi
Comments: 13 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
As artificial agents proliferate, it is becoming increasingly important to
ensure that their interactions with one another are well-behaved. In this
paper, we formalize a common-sense notion of when algorithms are well-behaved:
an algorithm is safe if it does no harm. Motivated by recent progress in deep
learning, we focus on the specific case where agents update their actions
according to gradient descent. The first result is that gradient descent
converges to a Nash equilibrium in safe games.
The paper provides sufficient conditions that guarantee safe interactions.
The main contribution is to define strongly-typed agents and show they are
guaranteed to interact safely. A series of examples show that strong-typing
generalizes certain key features of convexity and is closely related to blind
source separation. The analysis introduce a new perspective on classical
multilinear games based on tensor decomposition.
Tomer Koren, Roi Livni, Yishay Mansour
Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)
We extend the model of Multi-armed Bandit with unit switching cost to
incorporate a metric between the actions. We consider the case where the metric
over the actions can be modeled by a complete binary tree, and the distance
between two leaves is the size of the subtree of their least common ancestor,
which abstracts the case that the actions are points on the continuous interval
([0,1]) and the switching cost is their distance. In this setting, we give a
new algorithm that establishes a regret of (widetilde{O}(sqrt{kT} + T/k)),
where (k) is the number of actions and (T) is the time horizon. When the set of
actions corresponds to whole ([0,1]) interval we can exploit our method for the
task of bandit learning with Lipschitz loss functions, where our algorithm
achieves an optimal regret rate of (widetilde{Theta}(T^{2/3})), which is the
same rate one obtains when there is no penalty for movements. As our main
application, we use our new algorithm to solve an adaptive pricing problem.
Specifically, we consider the case of a single seller faced with a stream of
patient buyers. Each buyer has a private value and a window of time in which
they are interested in buying, and they buy at the lowest price in the window,
if it is below their value. We show that with an appropriate discretization of
the prices, the seller can achieve a regret of (widetilde{O}(T^{2/3}))
compared to the best fixed price in hindsight, which outperform the previous
regret bound of (widetilde{O}(T^{3/4})) for the problem.
Simon S. Du, Sivaraman Balakrishnan, Aarti Singh
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
Many conventional statistical procedures are extremely sensitive to seemingly
minor deviations from modeling assumptions. This problem is exacerbated in
modern high-dimensional settings, where the problem dimension can grow with and
possibly exceed the sample size. We consider the problem of robust estimation
of sparse functionals, and provide a computationally and statistically
efficient algorithm in the high-dimensional setting. Our theory identifies a
unified set of deterministic conditions under which our algorithm guarantees
accurate recovery. By further establishing that these deterministic conditions
hold with high-probability for a wide range of statistical models, our theory
applies to many problems of considerable interest including sparse mean and
covariance estimation; sparse linear regression; and sparse generalized linear
models.
Stephen N. Pallone, Peter I. Frazier, Shane G. Henderson
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
We analyze the problem of learning a single user’s preferences in an active
learning setting, sequentially and adaptively querying the user over a finite
time horizon. Learning is conducted via choice-based queries, where the user
selects her preferred option among a small subset of offered alternatives.
These queries have been shown to be a robust and efficient way to learn an
individual’s preferences. We take a parametric approach and model the user’s
preferences through a linear classifier, using a Bayesian prior to encode our
current knowledge of this classifier. The rate at which we learn depends on the
alternatives offered at every time epoch. Under certain noise assumptions, we
show that the Bayes-optimal policy for maximally reducing entropy of the
posterior distribution of this linear classifier is a greedy policy, and that
this policy achieves a linear lower bound when alternatives can be constructed
from the continuum. Further, we analyze a different metric called
misclassification error, proving that the performance of the optimal policy
that minimizes misclassification error is bounded below by a linear function of
differential entropy. Lastly, we numerically compare the greedy entropy
reduction policy with a knowledge gradient policy under a number of scenarios,
examining their performance under both differential entropy and
misclassification error.
Dipan K. Pal, Marios Savvides
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
In this paper, we theoretically address three fundamental problems involving
deep convolutional networks regarding invariance, depth and hierarchy. We
introduce the paradigm of Transformation Networks (TN) which are a direct
generalization of Convolutional Networks (ConvNets). Theoretically, we show
that TNs (and thereby ConvNets) are can be invariant to non-linear
transformations of the input despite pooling over mere local translations. Our
analysis provides clear insights into the increase in invariance with depth in
these networks. Deeper networks are able to model much richer classes of
transformations. We also find that a hierarchical architecture allows the
network to generate invariance much more efficiently than a non-hierarchical
network. Our results provide useful insight into these three fundamental
problems in deep learning using ConvNets.
Mahdi Imani, Ulisses Braga-Neto
Subjects: Molecular Networks (q-bio.MN); Learning (cs.LG); Machine Learning (stat.ML)
This paper is concerned with the problem of stochastic control of gene
regulatory networks (GRNs) observed indirectly through noisy measurements and
with uncertainty in the intervention inputs. The partial observability of the
gene states and uncertainty in the intervention process are accounted for by
modeling GRNs using the partially-observed Boolean dynamical system (POBDS)
signal model with noisy gene expression measurements. Obtaining the optimal
infinite-horizon control strategy for this problem is not attainable in
general, and we apply reinforcement learning and Gaussian process techniques to
find a near-optimal solution. The POBDS is first transformed to a
directly-observed Markov Decision Process in a continuous belief space, and the
Gaussian process is used for modeling the cost function over the belief and
intervention spaces. Reinforcement learning then is used to learn the cost
function from the available gene expression data. In addition, we employ
sparsification, which enables the control of large partially-observed GRNs. The
performance of the resulting algorithm is studied through a comprehensive set
of numerical experiments using synthetic gene expression data generated from a
melanoma gene regulatory network.
Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Designing a practical, low complexity, close to optimal, channel decoder for
powerful algebraic codes with short to moderate block length is an open
research problem. Recently it has been shown that a feed-forward neural network
architecture can improve on standard belief propagation decoding, despite the
large example space. In this paper we introduce a recurrent neural network
architecture for decoding linear block codes. Our method shows comparable bit
error rate results compared to the feed-forward neural network with
significantly less parameters. We also demonstrate improved performance over
belief propagation on sparser Tanner graph representations of the codes.
Furthermore, we demonstrate that the RNN decoder can be used to improve the
performance or alternatively reduce the computational complexity of the mRRD
algorithm for low complexity, close to optimal, decoding of short BCH codes.
Muhammad Farooqn, Ingo Steinwart
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Conditional expectiles are becoming an increasingly important tool in finance
as well as in other areas of applications. We analyse a support vector machine
type approach for estimating conditional expectiles and establish learning
rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF
kernels are used and the desired expectile is smooth in a Besov sense. As a
special case, our learning rates improve the best known rates for kernel-based
least squares regression in this scenario. Key ingredients of our statistical
analysis are a general calibration inequality for the asymmetric least squares
loss, a corresponding variance bound as well as an improved entropy number
bound for Gaussian RBF kernels.
Briland Hitaj, Giuseppe Ateniese, Fernando Perez-Cruz
Comments: 14 pages, 12 figures
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)
In recent years, a branch of machine learning called Deep Learning has become
incredibly popular thanks to the ability of a new class of algorithms to model
and interpret a large quantity of data in a similar way to humans. Properly
training deep learning models involves collecting a vast amount of users’
private data, including habits, geographical positions, interests, and much
more. Another major issue is that it is possible to extract from trained models
useful information about the training set and this hinders collaboration among
distrustful participants or parties that deal with sensitive information. To
tackle this problem, collaborative deep learning models have recently been
proposed where parties share only a subset of the parameters in the attempt to
keep their respective training sets private. Parameters can also be obfuscated
via differential privacy to make information extraction even more challenging,
as shown by Shokri and Shmatikov at CCS’15. Unfortunately, we show that any
privacy-preserving collaborative deep learning is susceptible to a powerful
attack that we devise in this paper. In particular, we show that a distributed
or decentralized deep learning approach is fundamentally broken and does not
protect the training sets of honest participants. The attack we developed
exploits the real-time nature of the learning process that allows the adversary
to train a Generative Adversarial Network (GAN) that generates valid samples of
the targeted training set that was meant to be private. Interestingly, we show
that differential privacy applied to shared parameters of the model as
suggested at CCS’15 and CCS’16 is utterly futile. In our generative model
attack, all techniques adopted to scramble or obfuscate shared parameters in
collaborative deep learning are rendered ineffective with no possibility of a
remedy under the threat model considered.
Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng
Comments: recurrent neural networks, dynamic programming, structured prediction
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Segmental structure is a common pattern in many types of sequences such as
phrases in human languages. In this paper, we present a probabilistic model for
sequences via their segmentations. The probability of a segmented sequence is
calculated as the product of the probabilities of all its segments, where each
segment is modeled using existing tools such as recurrent neural networks.
Since the segmentation of a sequence is usually unknown in advance, we sum over
all valid segmentations to obtain the final probability for the sequence. An
efficient dynamic programming algorithm is developed for forward and backward
computations without resorting to any approximation. We demonstrate our
approach on text segmentation and speech recognition tasks. In addition to
quantitative results, we also show that our approach can discover meaningful
segments in their respective application contexts.
Randall Balestriero
Comments: 11 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this paper we propose a synergistic melting of neural networks and
decision trees into a deep hashing neural network (HNN) having a modeling
capability exponential with respect to its number of neurons. We first derive a
soft decision tree named neural decision tree allowing the optimization of
arbitrary decision function at each split node. We then rewrite this soft space
partitioning as a new kind of neural network layer, namely the hashing layer
(HL), which can be seen as a generalization of the known soft-max layer. This
HL can easily replace the standard last layer of ANN in any known network
topology and thus can be used after a convolutional or recurrent neural network
for example. We present the modeling capacity of this deep hashing function on
small datasets where one can reach at least equally good results as standard
neural networks by diminishing the number of output neurons. Finally, we show
that for the case where the number of output neurons is large, the neural
network can mitigate the absence of linear decision boundaries by learning for
each difficult class a collection of not necessarily connected sub-regions of
the space leading to more flexible decision surfaces. Finally, the HNN can be
seen as a deep locality sensitive hashing function which can be trained in a
supervised or unsupervised setting as we will demonstrate for classification
and regression problems.
M. Dalai, Y. Polyanskiy
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
New lower and upper bounds on the reliability function of typewriter channels
are given. Our lower bounds improve upon the (multiletter) expurgated bound of
Gallager, furnishing a new and simple counterexample to a conjecture made in
1967 by Shannon, Gallager and Berlekamp on its tightness. The only other known
counterexample is due to Katsman, Tsfasman and Vlu{a}duc{t} who used
algebraic-geometric codes on a (q)-ary symmetric channels, (qgeq 49). Here we
prove, by introducing dependence between codewords of a random ensemble, that
the conjecture is false even for a typewriter channel with (q=4) inputs. In the
process, we also demonstrate that Lov’asz’s proof of the capacity of the
pentagon was implicitly contained (but unnoticed!) in the works of Jelinek and
Gallager on the expurgated bound done at least ten years before Lov’asz. In
the opposite direction, new upper bounds on the reliability function are
derived for channels with an odd number of inputs by using an adaptation of
Delsarte’s linear programming bound. First we derive a bound based on the
minimum distance, which combines Lov’asz’s construction for bounding the graph
capacity with the McEliece-Rodemich-Rumsey-Welch construction for bounding the
minimum distance of codes in the Hamming space. Then, for the particular case
of cross-over probability (1/2), we derive an improved bound by also using the
method of Kalai and Linial to study the spectrum distribution of codes.
Daniel J. Katz, Sangman Lee, Stanislav A. Trunov
Comments: 30 pages
Subjects: Information Theory (cs.IT); Complex Variables (math.CV); Number Theory (math.NT)
We consider the Rudin-Shapiro-like polynomials, whose (L^4) norms on the
complex unit circle were studied by Borwein and Mossinghoff. The polynomial
(f(z)=f_0+f_1 z + cdots + f_d z^d) is identified with the sequence
((f_0,f_1,ldots,f_d)) of its coefficients. From the (L^4) norm of a
polynomial, one can easily calculate the autocorrelation merit factor of its
associated sequence, and conversely. In this paper, we study the
crosscorrelation properties of pairs of sequences associated to
Rudin-Shapiro-like polynomials. We find an explicit formula for the
crosscorrelation merit factor. A computer search is then used to find pairs of
Rudin-Shapiro-like polynomials whose autocorrelation and crosscorrelation merit
factors are simultaneously high. Pursley and Sarwate proved a bound that limits
how good this combined autocorrelation and crosscorrelation performance can be.
We find infinite families of polynomials whose performance approaches quite
close to this fundamental limit.
Richard J. Barton
Subjects: Information Theory (cs.IT)
In this paper, we derive upper and lower bounds as well as a simple
closed-form approximation for the capacity of the continuous-time, bandlimited,
additive white Gaussian noise channel in a three-dimensional free-space
electromagnetic propagation environment subject to constraints on the total
effective antenna aperture area of the link and a total transmitter power
constraint. We assume that the communication range is much larger than the
radius of the sphere containing the antennas at both ends of the link, and we
show that, in general, the capacity can only be achieved by transmitting
multiple spatially-multiplexed data streams simultaneously over the channel.
Furthermore, the lower bound on capacity can be approached asymptotically by
transmitting the data streams between a pair of physically-realizable
distributed antenna arrays at either end of the link. A consequence of this
result is that, in general, communication at close to the maximum achievable
data rate on a deep-space communication link can be achieved in practice if and
only if the communication system utilizes spatial multiplexing over a
distributed MIMO antenna array. Such an approach to deep-space communication
does not appear to be envisioned currently by any of the international space
agencies or any commercial space companies. A second consequence is that the
capacity of a long-range free-space communication link, if properly utilized,
grows asymptotically as a function of the square root of the received SNR
rather than only logarithmically in the received SNR.
Cheng Zhang, Wangdong Qi, Li Wei, Jiang Chang, Yuexin Zhao
Comments: 5 pages, 5 figures
Subjects: Information Theory (cs.IT)
The radio interferometric positioning system (RIPS) is an accurate node
localization method featuring a novel phase-based ranging process. Multipath is
the limiting error source for RIPS in ground-deployed scenarios or indoor
applications. There are four distinct channels involved in the ranging process
for RIPS. Multipath reflections affect both the phase and amplitude of the
ranging signal for each channel. By exploiting untapped amplitude information,
we put forward a scheme to estimate each channel’s multipath profile, which is
then subsequently used to correct corresponding errors in phase measurements.
Simulations show that such a scheme is very effective in reducing multipath
phase errors, which are essentially brought down to the level of receiver noise
under moderate multipath conditions. It is further demonstrated that ranging
errors in RIPS are also greatly reduced via the proposed scheme.
Hien Quoc Ngo, Le-Nam Tran, Trung Q. Duong, Michail Matthaiou, Erik G. Larsson
Subjects: Information Theory (cs.IT)
We consider the cell-free massive multiple-input multiple-output (MIMO)
downlink, where a very large number of distributed multiple-antenna access
points (APs) serve many single-antenna users in the same time-frequency
resource. A simple (distributed) conjugate beamforming scheme is applied at
each AP via the use of local channel state information (CSI). This CSI is
acquired through time-division duplex operation and the reception of uplink
training signals transmitted by the users. We derive a closed-form expression
for the spectral efficiency taking into account the effects of channel
estimation errors and power control. This closed-form result enables us to
analyze the effects of backhaul power consumption, the number of APs, and the
number of antennas per AP on the total energy efficiency, as well as, to design
an optimal power allocation algorithm. The optimal power allocation algorithm
aims at maximizing the total energy efficiency, subject to a per-user spectral
efficiency constraint and a per-AP power constraint. Compared with the case of
without power control, our proposed power allocation scheme can double the
total energy efficiency. Furthermore, we propose AP selections schemes, in
which each user chooses a subset of APs, to reduce the power consumption caused
by the backhaul links. With our proposed AP selection schemes, the total energy
efficiency increases significantly, especially for large numbers of APs.
Moreover, under a requirement of good quality-of-service for all users,
cell-free massive MIMO outperforms the colocated counterpart in terms of energy
efficiency.
Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery
Subjects: Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Designing a practical, low complexity, close to optimal, channel decoder for
powerful algebraic codes with short to moderate block length is an open
research problem. Recently it has been shown that a feed-forward neural network
architecture can improve on standard belief propagation decoding, despite the
large example space. In this paper we introduce a recurrent neural network
architecture for decoding linear block codes. Our method shows comparable bit
error rate results compared to the feed-forward neural network with
significantly less parameters. We also demonstrate improved performance over
belief propagation on sparser Tanner graph representations of the codes.
Furthermore, we demonstrate that the RNN decoder can be used to improve the
performance or alternatively reduce the computational complexity of the mRRD
algorithm for low complexity, close to optimal, decoding of short BCH codes.
Zahra Rezaei, Ehsan Yazdian, Foroogh S. Tabataba
Comments: 29 pages
Subjects: Information Theory (cs.IT)
Energy beamforming (EB) is a key technique to significantly enhance the
efficiency of wireless power transfer (WPT). In this paper, we study optimal EB
under per-antenna power constraint (PAC) which is more practical than
conventional sum-power constraint (SPC) at multi antenna energy transmitter
(ET). We consider a broadcast network, where one multi antenna ET with PAC,
transfers wireless energy for energy receivers (ER)s which are randomly placed
within the cell area. First, we consider sum energy maximization problem with
PAC without fairness and provide the optimal solution structure for general
case. This optimal structure implies that similar to SPC, sending one energy
beam is optimal with PAC which means that the rank of transmit covariance
matrix is one. We also derive closed-form solutions for two special cases and
propose two sub-optimal solutions for general case, which are very close to
optimal numerical results. To consider the fairness among the ERs, we further
propose max-min fair problem with PAC, and analyze it for the special case of
two transmit antennas. Simulation results show its advantages in comparison to
the recent works in the literature.
Georg Böcherer, Patrick Schulte, Fabian Steiner
Comments: Including 7 diagrams and 5 figures with simulation results
Subjects: Information Theory (cs.IT)
Product distribution matching (PDM) is proposed to generate target
distributions over large alphabets by combining the output of several parallel
distribution matchers (DMs) with smaller output alphabets. The parallel
architecture of PDM enables low-complexity and high-throughput implementation.
PDM is used as a shaping device for probabilistic amplitude shaping (PAS). For
64-ASK and a spectral efficiency of 4.5 bits per channel use (bpcu), PDM is as
power efficient as a single full-fledged DM. It is shown how PDM enables PAS
for parallel channels present in multi-carrier systems like digital subscriber
line (DSL) and orthogonal frequency-division multiplexing (OFDM). The key
feature is that PDM shares the DMs for lower bit-levels among different
sub-carriers, which improves the power efficiency significantly. A
representative parallel channel example shows that PAS with PDM is 0.93 dB more
power efficient than conventional uniform signaling and PDM is 0.35 dB more
power efficient than individual per channel DMs.
Beongjun Choi, Jy-yong Sohn, Sung Whan Yoon, Jaekyun Moon
Comments: 6 pages, accepted at IEEE ICC 2017
Subjects: Information Theory (cs.IT)
This paper considers the security issue of practical distributed storage
systems (DSSs) which consist of multiple clusters of storage nodes. Noticing
that actual storage nodes constituting a DSS are distributed in multiple
clusters, two novel eavesdropper models – the node-restricted model and the
cluster-restricted model – are suggested which reflect the clustered nature of
DSSs. In the node-restricted model, an eavesdropper cannot access the
individual nodes, but can eavesdrop incoming/outgoing data for (L_c)
compromised clusters. In the cluster-restricted model, an eavesdropper can
access a total of (l) individual nodes but the number of accessible clusters is
limited to (L_c). We provide an upper bound on the securely storable data for
each model, while a specific network coding scheme which achieves the upper
bound is obtained for the node-restricted model, given some mild condition on
the node storage size.
Chung Chan, Manuj Mukherjee, Navin Kashyap, Qiaoqiao Zhou
Subjects: Information Theory (cs.IT)
For the multiterminal secret key agreement problem under a private source
model, it is known that the maximum key rate, i.e., the secrecy capacity, can
be achieved through communication for omniscience, but the omniscience strategy
can be strictly suboptimal in terms of minimizing the public discussion rate.
While a single-letter characterization is not known for the minimum discussion
rate needed for achieving the secrecy capacity, we derive single-letter lower
and upper bounds that yield some simple conditions for omniscience to be
discussion-rate optimal. These conditions turn out to be enough to deduce the
optimality of omniscience for a large class of sources including the
hypergraphical sources. Through conjectures and examples, we explore other
source models to which our methods do not easily extend.
Şuayb Ş. Arslan
Comments: To be submitted to Elsevier Open access journal SoftwareX, 2017
Subjects: Information Theory (cs.IT)
Founsure is an open-source software library, distributed under LGPLv3 license
and implements a multi-dimensional graph-based erasure coding entirely based on
fast exclusive OR (XOR) logic. Its implementation uses compiler optimizations
to generate the right assembly for the given SIMD-enabled architectures.
Founsure 1.0 supports a variety of features that shall find interesting
applications in modern data storage as well as communication and computer
network systems which are becoming hungry in terms of network bandwidth,
computational resources and average consumed power. In particular, Founsure
erasure code provides a three dimensional design space i.e., computation
complexity, coding overhead and repair bandwidth to meet the requirements of
modern distributed data storage and processing systems in which the data needs
to be protected against device/hardware failure
Stephen N. Pallone, Peter I. Frazier, Shane G. Henderson
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
We analyze the problem of learning a single user’s preferences in an active
learning setting, sequentially and adaptively querying the user over a finite
time horizon. Learning is conducted via choice-based queries, where the user
selects her preferred option among a small subset of offered alternatives.
These queries have been shown to be a robust and efficient way to learn an
individual’s preferences. We take a parametric approach and model the user’s
preferences through a linear classifier, using a Bayesian prior to encode our
current knowledge of this classifier. The rate at which we learn depends on the
alternatives offered at every time epoch. Under certain noise assumptions, we
show that the Bayes-optimal policy for maximally reducing entropy of the
posterior distribution of this linear classifier is a greedy policy, and that
this policy achieves a linear lower bound when alternatives can be constructed
from the continuum. Further, we analyze a different metric called
misclassification error, proving that the performance of the optimal policy
that minimizes misclassification error is bounded below by a linear function of
differential entropy. Lastly, we numerically compare the greedy entropy
reduction policy with a knowledge gradient policy under a number of scenarios,
examining their performance under both differential entropy and
misclassification error.
Dong Xia, Ming Yuan
Comments: 56 pages, 4 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
In this paper, we investigate the sample size requirement for exact recovery
of a high order tensor of low rank from a subset of its entries. We show that a
gradient descent algorithm with initial value obtained from a spectral method
can, in particular, reconstruct a ({d imes d imes d}) tensor of multilinear
ranks ((r,r,r)) with high probability from as few as
(O(r^{7/2}d^{3/2}log^{7/2}d+r^7dlog^6d)) entries. In the case when the ranks
(r=O(1)), our sample size requirement matches those for nuclear norm
minimization (Yuan and Zhang, 2016a), or alternating least squares assuming
orthogonal decomposability (Jain and Oh, 2014). Unlike these earlier
approaches, however, our method is efficient to compute, easy to implement, and
does not impose extra structures on the tensor. Numerical results are presented
to further demonstrate the merits of the proposed approach.
Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)
The problem of population recovery refers to estimating a distribution based
on incomplete or corrupted samples. Consider a random poll of sample size (n)
conducted on a population of individuals, where each pollee is asked to answer
(d) binary questions. We consider one of the two polling impediments: (a) in
lossy population recovery, a pollee may skip each question with probability
(epsilon); (b) in noisy population recovery, a pollee may lie on each question
with probability (epsilon). Given (n) lossy or noisy samples, the goal is to
estimate the probabilities of all (2^d) binary vectors simultaneously within
accuracy (delta) with high probability.
This paper settles the sample complexity of population recovery. For lossy
model, the optimal sample complexity is ( ildeTheta(delta^{
-2max{frac{epsilon}{1-epsilon},1}})), improving the state of the art by
Moitra and Saks (2013) in several ways: a lower bound is established, the upper
bound is improved and the result is dimension-free. Surprisingly, the sample
complexity undergoes a phase transition from parametric to nonparametric rate
when (epsilon) exceeds (1/2). For noisy population recovery, the sharp sample
complexity turns out to be dimension-dependent and scales as
(exp(Theta(d^{1/3} log^{2/3}(1/delta)))) except for the trivial cases of
(epsilon=0,1/2) or (1).
For both models, our estimators simply compute the empirical mean of a
certain function, which is found by pre-solving a linear program (LP).
Curiously, the dual LP can be understood as Le Cam’s method for lower-bounding
the minimax risk, thus establishing the statistical optimality of the proposed
estimators. The value of the LP is determined by complex-analytic methods.