Carlos Garcia Cordero
Comments: 70 pages, MSc. Thesis in Artificial Intelligence, University of Edinburgh
Subjects: Neural and Evolutionary Computing (cs.NE)
Generality is one of the main advantages of heuristic algorithms; as such,
multiple parameters are exposed to the user with the objective of allowing them
to shape the algorithms to their specific needs. Parameter selection,
therefore, becomes an intrinsic problem of every heuristic algorithm. Selecting
good parameter values relies not only on knowledge of the problem at
hand, but also of the algorithms themselves. This research explores the use of
self-organized criticality to reduce user interaction in the process of
selecting suitable parameters for particle swarm optimization (PSO) heuristics.
A particle swarm variant (named Adaptive PSO) with self-organized criticality
is developed and benchmarked against the standard PSO. Criticality emerges
in the dynamic behaviour of this swarm, and excellent results are obtained in
the long run. In contrast with the standard PSO, the Adaptive PSO does not
stagnate at any point in time, striking a better balance between exploration and
exploitation. A software platform for experimenting with particle
swarms, called PSO Laboratory, is also developed. This software is used to test
the standard PSO as well as all other PSO variants developed in the process of
creating the Adaptive PSO. As the software is intended to aid future
and related research, special attention has been paid to the development of a
user-friendly graphical interface. Particle swarms are executed in real time,
allowing users to experiment by changing parameters on the fly.
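For reference, the standard (global-best) PSO update that exposes the parameters
discussed above can be sketched as follows; this is a minimal illustration rather
than the thesis code, and the inertia weight, acceleration coefficients, swarm
size and search bounds below are arbitrary assumptions:

    import numpy as np

    def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        # w, c1, c2 are the user-exposed parameters whose selection the thesis aims to automate
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5.0, 5.0, (n_particles, dim))   # positions (assumed bounds)
        v = np.zeros_like(x)                             # velocities
        pbest = x.copy()
        pbest_val = np.array([f(p) for p in x])
        gbest = pbest[pbest_val.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
            vals = np.array([f(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()
        return gbest, pbest_val.min()

    # e.g. minimizing the sphere function in 5 dimensions
    best_x, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=5)
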
Catherine D. Schuman, Thomas E. Potok, Robert M. Patton, J. Douglas Birdwell, Mark E. Dean, Garrett S. Rose, James S. Plank
Subjects: Neural and Evolutionary Computing (cs.NE)
Neuromorphic computing has come to refer to a variety of brain-inspired
computers, devices, and models that contrast with the pervasive von Neumann computer
architecture. This biologically inspired approach has created highly connected
synthetic neurons and synapses that can be used to model neuroscience theories
as well as solve challenging machine learning problems. The promise of the
technology is to create a brain-like ability to learn and adapt, but the
technical challenges are significant, starting with an accurate neuroscience
model of how the brain works, to finding materials and engineering
breakthroughs to build devices to support these models, to creating a
programming framework so the systems can learn, to creating applications with
brain-like capabilities. In this work, we provide a comprehensive survey of the
research and motivations for neuromorphic computing over its history. We begin
with a 35-year review of the motivations and drivers of neuromorphic computing,
then look at the major research areas of the field, which we define as
neuro-inspired models, algorithms and learning approaches, hardware and
devices, supporting systems, and finally applications. We conclude with a broad
discussion on the major research topics that need to be addressed in the coming
years to see the promise of neuromorphic computing fulfilled. The goals of this
work are to provide an exhaustive review of the research conducted in
neuromorphic computing since the inception of the term, and to motivate further
work by illuminating gaps in the field where new research is needed.
Akhilesh Jaiswal, Amogh Agrawal, Kaushik Roy
Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
Conventional von Neumann computing models have achieved remarkable feats for
the past few decades. However, they fail to deliver the required efficiency for
certain basic tasks like image and speech recognition when compared to
biological systems. As such, taking cues from biological systems, novel
computing paradigms are being explored for efficient hardware implementations
of recognition/classification tasks. The basic building blocks of such
neuromorphic systems are neurons and synapses. Towards that end, we propose a
leaky-integrate-fire (LIF) neuron using domain wall motion induced by
magnetoelectric effect. Due to a strong elastic pinning between the
ferromagnetic domain wall (FM-DW) and the underlying ferro-electric domain wall
(FE-DW), the FM-DW gets dragged by the FE-DW on application of a voltage pulse.
The fact that FE materials are insulators allows for pure voltage-driven FM-DW
motion, which in turn can be used to mimic both the leaky and the integrate
behavior of a spiking neuron. Further, the voltage-driven nature of the
proposed neuron allows energy-efficient operation. A detailed
device-to-circuit-level simulation framework based on micromagnetic simulations has been
developed to analyze the feasibility of the proposed device.
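For context, the leaky-integrate-fire dynamics that the proposed device is meant
to mimic can be sketched as a simple discrete-time simulation (an illustrative
model only; the time constant, threshold and reset value below are arbitrary
assumptions, not device parameters):

    import numpy as np

    def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
        # membrane potential leaks towards zero and integrates the input; a spike
        # is emitted and the potential reset whenever the threshold is crossed
        v, trace, spikes = 0.0, [], []
        for i in input_current:
            v += dt * (-v / tau + i)
            if v >= v_th:
                spikes.append(1)
                v = v_reset
            else:
                spikes.append(0)
            trace.append(v)
        return np.array(trace), np.array(spikes)

    trace, spikes = lif_neuron(np.full(200, 60.0))   # constant drive produces periodic spiking
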
Zhengyang Wang, Shuiwang Ji
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Visual question answering is a recently proposed artificial intelligence task
that requires a deep understanding of both images and texts. In deep learning,
images are typically modeled through convolutional neural networks, and texts
are typically modeled through recurrent neural networks. While the requirement
for modeling images is similar to traditional computer vision tasks, such as
object recognition and image classification, visual question answering raises a
different need for textual representation as compared to other natural language
processing tasks. In this work, we perform a detailed analysis on natural
language questions in visual question answering. Based on the analysis, we
propose to rely on convolutional neural networks for learning textual
representations. By exploring the various properties of convolutional neural
networks specialized for text data, such as width and depth, we present our
“CNN Inception + Gate” model. We show that our model improves question
representations and thus the overall accuracy of visual question answering
models. We also show that the text representation requirement in visual
question answering is more complicated and comprehensive than that in
conventional natural language processing tasks, making it a better task to
evaluate textual representation methods. Shallow models like fastText, which
can obtain results comparable to those of deep learning models in tasks like text
classification, are not suitable for visual question answering.
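As a rough illustration of convolutional question encoding with gating (not the
authors' exact "CNN Inception + Gate" architecture; the vocabulary size, embedding
width and kernel size below are placeholders):

    import torch
    import torch.nn as nn

    class GatedTextCNN(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=300, channels=256, k=3):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.conv = nn.Conv1d(emb_dim, channels, k, padding=k // 2)
            self.gate = nn.Conv1d(emb_dim, channels, k, padding=k // 2)

        def forward(self, tokens):                 # tokens: (batch, seq_len) word ids
            x = self.emb(tokens).transpose(1, 2)   # -> (batch, emb_dim, seq_len)
            h = torch.tanh(self.conv(x)) * torch.sigmoid(self.gate(x))
            return h.max(dim=2).values             # max-pool over time -> question vector

    q = GatedTextCNN()(torch.randint(0, 10000, (4, 12)))   # 4 questions of 12 tokens
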
Zhengyang Wang, Hao Yuan, Shuiwang Ji
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The key idea of variational auto-encoders (VAEs) resembles that of
traditional auto-encoder models in which spatial information is supposed to be
explicitly encoded in the latent space. However, the latent variables in VAEs
are vectors, which are commonly interpreted as multiple feature maps of size
1×1. Such representations can only convey spatial information implicitly when
coupled with powerful decoders. In this work, we propose spatial VAEs that use
latent variables as feature maps of larger size to explicitly capture spatial
information. This is achieved by allowing the latent variables to be sampled
from matrix-variate normal (MVN) distributions whose parameters are computed
from the encoder network. To increase dependencies among locations on latent
feature maps and reduce the number of parameters, we further propose spatial
VAEs via low-rank MVN distributions. Experimental results show that the
proposed spatial VAEs outperform original VAEs in capturing rich structural and
spatial information.
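For intuition, sampling a latent feature map from a matrix-variate normal
distribution reduces to the reparameterization below (a minimal sketch; in the
paper the mean and the row/column covariance factors would come from the encoder,
and the low-rank variant simply uses factors with few columns):

    import numpy as np

    def sample_mvn(M, A, B, rng=np.random.default_rng(0)):
        # Z ~ MN(M, A A^T, B B^T): draw an i.i.d. standard normal matrix E, set Z = M + A E B^T
        E = rng.standard_normal((A.shape[1], B.shape[1]))
        return M + A @ E @ B.T

    M = np.zeros((8, 8))          # mean feature map
    A = np.random.randn(8, 2)     # low-rank row-covariance factor
    B = np.random.randn(8, 2)     # low-rank column-covariance factor
    Z = sample_mvn(M, A, B)       # 8x8 latent feature map
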
Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji
Comments: 10 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Deconvolutional layers have been widely used in a variety of deep models for
up-sampling, including encoder-decoder networks for semantic segmentation and
deep generative models for unsupervised learning. One of the key limitations of
deconvolutional operations is that they result in the so-called checkerboard
problem. This is caused by the fact that no direct relationship exists among
adjacent pixels on the output feature map. To address this problem, we propose
the pixel deconvolutional layer (PixelDCL) to establish direct relationships
among adjacent pixels on the up-sampled feature map. Our method is based on a
fresh interpretation of the regular deconvolution operation. The resulting
PixelDCL can be used to replace any deconvolutional layer in a plug-and-play
manner without compromising the fully trainable capabilities of original
models. The proposed PixelDCL may result in a slight decrease in efficiency, but
this can be overcome by an implementation trick. Experimental results on
semantic segmentation demonstrate that PixelDCL can capture spatial features
such as edges and shapes and yield more accurate segmentation outputs than
deconvolutional layers. When used in image generation tasks, our PixelDCL can
largely overcome the checkerboard problem suffered by regular deconvolution
operations.
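The core idea, generating interdependent sub-feature maps sequentially and
interleaving them into the up-sampled output, can be pictured roughly as follows
(a simplified 2x up-sampling sketch, not the authors' implementation; layer sizes
and the exact conditioning pattern are assumptions):

    import torch
    import torch.nn as nn

    class PixelDCLSketch(nn.Module):
        def __init__(self, c_in, c_out):
            super().__init__()
            self.f1 = nn.Conv2d(c_in, c_out, 3, padding=1)
            self.f2 = nn.Conv2d(c_in + c_out, c_out, 3, padding=1)
            self.f3 = nn.Conv2d(c_in + c_out, c_out, 3, padding=1)
            self.f4 = nn.Conv2d(c_in + 2 * c_out, c_out, 3, padding=1)

        def forward(self, x):
            a = self.f1(x)                         # first sub-map from the input alone
            b = self.f2(torch.cat([x, a], 1))      # later sub-maps also see earlier ones,
            c = self.f3(torch.cat([x, a], 1))      # creating direct dependencies among
            d = self.f4(torch.cat([x, b, c], 1))   # adjacent output pixels
            n, ch, h, w = a.shape
            out = a.new_zeros(n, ch, 2 * h, 2 * w)
            out[:, :, 0::2, 0::2] = a              # interleave the four sub-maps
            out[:, :, 1::2, 1::2] = b
            out[:, :, 0::2, 1::2] = c
            out[:, :, 1::2, 0::2] = d
            return out

    y = PixelDCLSketch(16, 8)(torch.randn(1, 16, 32, 32))   # -> (1, 8, 64, 64)
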
Martin Mundt, Tobias Weis, Kishore Konda, Visvanathan Ramesh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Successful training of convolutional neural networks is often associated with
the training of sufficiently deep architectures composed of high amounts of
features while relying on a variety of regularization and pruning techniques to
converge to less redundant states. We introduce an easy-to-compute metric,
based on feature time evolution, to evaluate feature importance during training
and demonstrate its potency in determining a network's effective capacity.
Consequently, we propose a novel algorithm to evolve fixed-depth architectures,
starting from just a single feature per layer and greedily adding features until
the effective representational capacity needed for a specific task is attained.
We revisit popular CNN architectures and demonstrate how the
evolved architectures not only converge to similar topologies that benefit from
fewer parameters or improved accuracy, but furthermore exhibit systematic
correspondence in representational complexity with the specified task. In
contrast to conventional design patterns that typically show a monotonic
increase in the number of features with increasing depth, we observe that CNNs
perform better when learnable parameters peak in intermediate layers,
with falloffs towards earlier and later layers.
Alex Kendall, Yarin Gal, Roberto Cipolla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Numerous deep learning applications benefit from multi-task learning with
multiple regression and classification objectives. In this paper we make the
observation that the performance of such systems is strongly dependent on the
relative weighting between each task’s loss. Tuning these weights by hand is a
difficult and expensive process, making multi-task learning prohibitive in
practice. We propose a principled approach to multi-task deep learning which
weighs multiple loss functions by considering the homoscedastic uncertainty of
each task. This allows us to simultaneously learn various quantities with
different units or scales in both classification and regression settings. We
demonstrate our model learning per-pixel depth regression, semantic and
instance segmentation from a monocular input image. Perhaps surprisingly, we
show our model can learn multi-task weightings and outperform separate models
trained individually on each task.
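A minimal sketch of the weighting scheme, with one learnable log-variance per
task (constant factors and the regression/classification distinction made in the
paper are omitted here):

    import torch

    log_vars = torch.nn.Parameter(torch.zeros(3))   # one log sigma^2 per task, learned with the model

    def weighted_total(task_losses):
        # total = sum_i exp(-s_i) * L_i + s_i, so noisier tasks are automatically down-weighted
        return sum(torch.exp(-s) * L + s for s, L in zip(log_vars, task_losses))
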
Clara Callenberg, Felix Heide, Gordon Wetzstein, Matthias Hullin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Computational photography encompasses a diversity of imaging techniques, but
one of the core operations performed by many of them is to compute image
differences. An intuitive approach to computing such differences is to capture
several images sequentially and then process them jointly. Usually, this
approach leads to artifacts when recording dynamic scenes. In this paper, we
introduce a snapshot difference imaging approach that is directly implemented
in the sensor hardware of emerging time-of-flight cameras. With a variety of
examples, we demonstrate that the proposed snapshot difference imaging
technique is useful for direct-global illumination separation, for direct
imaging of spatial and temporal image gradients, for direct depth edge imaging,
and more.
Chih-Chung Hsu, Chia-Wen Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Given a large unlabeled set of images, how to efficiently and effectively
group them into clusters based on the extracted visual representations remains
a challenging problem. To address this problem, we propose a convolutional
neural network (CNN) to jointly solve clustering and representation learning in
an iterative manner. In the proposed method, given an input image set, we first
randomly pick k samples and extract their features as initial cluster centroids
using the proposed CNN with an initial model pre-trained from the ImageNet
dataset. Mini-batch k-means is then performed to assign cluster labels to
individual input samples for a mini-batch of images randomly sampled from the
input image set until all images are processed. Subsequently, the proposed method
simultaneously updates the parameters of the CNN and the centroids of the
image clusters iteratively based on stochastic gradient descent. We also
propose a feature drift compensation scheme to mitigate the drift error caused
by feature mismatch in representation learning. Experimental results
demonstrate that the proposed method outperforms state-of-the-art clustering schemes
in terms of accuracy and storage complexity on large-scale image sets
containing millions of images.
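The alternating scheme can be pictured with the toy loop below, where a fixed
random projection stands in for the CNN (which in the paper is itself updated by
stochastic gradient descent alongside the centroids):

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    rng = np.random.default_rng(0)
    images = rng.random((10000, 32 * 32 * 3))          # placeholder flattened images
    W = rng.standard_normal((images.shape[1], 128))    # stand-in feature extractor
    kmeans = MiniBatchKMeans(n_clusters=10, batch_size=256, n_init=3, random_state=0)
    for start in range(0, len(images), 256):
        feats = images[start:start + 256] @ W          # extract features for one mini-batch
        kmeans.partial_fit(feats)                      # update centroids (the paper also updates the CNN here)
    labels = kmeans.predict(images @ W)
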
Karttikeya Mangalam, K S Venkatesh
Comments: 5 Pages. The code is available at : this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cellular Automata (CA) are discrete models in which each cell takes a state
from a finite set of possible values and evolves in time
according to a pre-defined set of transition rules. CA have been applied to a
number of image processing tasks such as Convex Hull Detection, Image Denoising
etc. but mostly under the limitation of restricting the input to binary images.
In general, a gray-scale image may be converted to a number of different binary
images which are finally recombined after CA operations on each of them
individually. We have developed a multinomial regression based weighted
summation method to recombine binary images for better performance of CA based
image processing algorithms. The recombination algorithm is evaluated for the
specific case of denoising salt-and-pepper noise against standard
benchmark algorithms such as the Median Filter for various images and noise
levels. The results indicate several interesting invariances in the application
of the CA, such as the particular noise realization and the choice of
sub-sampling of pixels to determine recombination weights. Additionally, it
appears that simpler algorithms for weight optimization which seek local minima
work as effectively as those that seek global minima such as Simulated
Annealing.
Haifeng Li, Jian Peng, Chao Tao, Jie Chen, Min Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Recently, deep convolutional neural networks (DCNNs) have achieved increasingly
remarkable success and developed rapidly in the field of natural image
recognition. Compared with natural images, remote sensing images are larger in
scale, and the scenes and objects they represent are more macroscopic. This
study asks whether remote sensing scene recognition and natural scene recognition
differ and raises the following questions: What are the key factors in remote
sensing scene recognition? Is the DCNN recognition mechanism centered on object
recognition still applicable to the scenarios of remote sensing scene
understanding? We performed several experiments to explore the influence of the
DCNN structure and the scale of remote sensing scene understanding from the
perspective of scene complexity. Our experiments show that understanding a
complex scene depends on a deep network and multi-scale perception.
Using a visualization method, we qualitatively and quantitatively analyze the
recognition mechanism in a complex remote sensing scene and demonstrate the
importance of multi-objective joint semantic support.
Nathanael L. Baisa, Stephanie Bricq, Alain Lalande
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic 3-D registration of Magnetic Resonance Imaging (MRI) and Positron
Emission Tomography (PET) is implemented and validated for small animal image
volumes so that the high-resolution anatomical MRI information can be fused
with the low-spatial-resolution functional PET information for lesion
localization, which is currently in high demand in the study of cancer
(oncology) and the corresponding preparation of pharmaceutical drugs.
Though many registration algorithms are developed and applied on human brain
volumes, these methods may not be as efficient on small animal datasets due to
lack of intensity information and often the high anisotropy in voxel
dimensions. Therefore, a fully automatic registration algorithm which can
register not only presumably rigid small animal volumes such as the brain but also
deformable organs such as the kidney, heart and chest is developed using a
combination of global affine and local B-spline transformation models in which
mutual information is used as a similarity criterion. The global affine
registration uses a multi-resolution pyramid on image volumes of 3 levels
whereas in local B-spline registration, a multi-resolution scheme is applied on
the B-spline grid of 2 levels on the finest resolution of the image volumes in
which only the transform itself is affected rather than the image volumes.
Since mutual information lacks sufficient spatial information, PCA is used to
inject it by estimating initial translation and rotation parameters. The method
is computationally efficient since it is implemented in C++ with the ITK library,
and it is shown qualitatively and quantitatively that this PCA-initialized global
registration followed by local registration is in close agreement with expert
manual registration and outperforms the variant without PCA initialization when
tested on small-animal brain and kidney volumes.
Hung Le, Ali Borji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we explain in detail how receptive fields, effective receptive
fields, and projective fields of neurons in different layers, convolution or
pooling, of a Convolutional Neural Network (CNN) are calculated. While our
focus here is on CNNs, the same operations, but in the reverse order, can be
used to calculate these quantities for deconvolutional neural networks. These
are important concepts, not only for better understanding and analyzing
convolutional and deconvolutional networks, but also for optimizing their
performance in real-world applications.
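The forward recursion underlying such calculations is compact enough to state
directly (a sketch of the standard formula, assuming square kernels and ignoring
padding offsets):

    def receptive_field(layers):
        # layers: (kernel_size, stride) per conv/pool layer, listed from input to output
        r, j = 1, 1                  # receptive field size and cumulative stride ("jump")
        for k, s in layers:
            r += (k - 1) * j         # each layer widens the field by (k-1) input-grid steps
            j *= s                   # strides compound multiplicatively
        return r

    # three 3x3 stride-1 convolutions followed by 2x2 stride-2 pooling -> receptive field of 8
    print(receptive_field([(3, 1), (3, 1), (3, 1), (2, 2)]))
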
Jen-wei Kuo, Jonathan Mamou, Yao Wang, Emi Saegusa-Beecroft, Junji Machi, Ernest J. Feleppa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Previous studies by our group have shown that three-dimensional
high-frequency quantitative ultrasound methods have the potential to
differentiate metastatic lymph nodes from cancer-free lymph nodes dissected
from human cancer patients. To successfully perform these methods inside the
lymph node parenchyma, an automatic segmentation method is highly desired to
exclude the surrounding thin layer of fat from quantitative ultrasound
processing and accurately correct for ultrasound attenuation. In high-frequency
ultrasound images of lymph nodes, the intensity distribution of lymph node
parenchyma and fat varies spatially because of acoustic attenuation and
focusing effects. Thus, the intensity contrast between two object regions
(e.g., lymph node parenchyma and fat) is also spatially varying. In our
previous work, nested graph cut demonstrated its ability to simultaneously
segment lymph node parenchyma, fat, and the outer phosphate-buffered saline
bath even when some boundaries are lost because of acoustic attenuation and
focusing effects. This paper describes a novel approach called graph cut with
locally adaptive energy to further deal with spatially varying distributions of
lymph node parenchyma and fat caused by inhomogeneous acoustic attenuation. The
proposed method achieved Dice similarity coefficients of 0.937 ± 0.035 when
compared to expert manual segmentation on a representative dataset consisting
of 115 three-dimensional lymph node images obtained from colorectal cancer
patients.
Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We describe the DeepMind Kinetics human action video dataset. The dataset
contains 400 human action classes, with at least 400 video clips for each
action. Each clip lasts around 10s and is taken from a different YouTube video.
The actions are human focussed and cover a broad range of classes including
human-object interactions such as playing instruments, as well as human-human
interactions such as shaking hands. We describe the statistics of the dataset,
how it was collected, and give some baseline performance figures for neural
network architectures trained and tested for human action classification on
this dataset. We also carry out a preliminary analysis of whether imbalance in
the dataset leads to bias in the classifiers.
Muhammad Ahmad, Stanislav Protasov, Adil Mehmood Khan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In order to make hyperspectral image classification computationally
tractable, it is often necessary to select the most informative bands instead
of processing the whole data, without losing the geometrical representation of
the original data. To cope with this issue, an improved unsupervised non-linear
deep auto-encoder (UDAE) based band selection method is proposed. The proposed
UDAE is able to select the most informative bands in such a way that preserves
the key information in a lower-dimensional space, where the hidden
representation is a non-linear transformation that maps the original space to
a space of lower dimension. This work emphasizes analyzing what type of
information is needed to preserve the hierarchical UDAE representation while
selecting a subset of the original space. Our experiments on publicly
available hyperspectral datasets demonstrate the effectiveness of the UDAE method,
which compares favorably with other state-of-the-art methods.
Dmytro Derkach, Federico M. Sukno
Comments: 12th IEEE International Conference on Face and Gesture Recognition, Washington, DC, USA, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We investigate the problem of facial expression recognition using 3D data.
Building from one of the most successful frameworks for facial analysis using
exclusively 3D geometry, we extend the analysis from a curve-based
representation into a spectral representation, which allows a complete
description of the underlying surface that can be further tuned to the desired
level of detail. Spectral representations are based on the decomposition of the
geometry in its spatial frequency components, much like a Fourier transform,
which are related to intrinsic characteristics of the surface. In this work, we
propose the use of Graph Laplacian Features (GLF), which results from the
projection of local surface patches into a common basis obtained from the Graph
Laplacian eigenspace. We test the proposed approach on the BU-3DFE database in
terms of expression and Action Unit recognition. Our results confirm that the
proposed GLF produces consistently higher recognition rates than the
curves-based approach, thanks to a more complete description of the surface,
while requiring a lower computational complexity. We also show that the GLF
outperforms the most popular alternative approach for spectral representation,
ShapeDNA, which is based on the Laplace-Beltrami operator and cannot provide
a stable basis that guarantees that the extracted signatures for the different
patches are directly comparable.
You Hao, Shirui Li, Hanlin Mo, Hua Li
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a novel Affine-Gradient based Local Binary Pattern (AGLBP)
descriptor for texture classification. It is very hard to describe complicated
texture using a single type of information, such as the Local Binary Pattern
(LBP), which only utilizes the sign information of the differences between a
pixel and its local neighbors. Our descriptor has three characteristics: 1) In
order to make full use of the information contained in the texture, the
Affine-Gradient, which differs from the Euclidean-Gradient and is invariant to
affine transformations, is incorporated into AGLBP. 2) An improved method is
proposed for rotation invariance, which depends on a reference direction
calculated with respect to the local neighbors. 3) A feature selection method,
considering both the statistical frequency and the intraclass variance of the
training dataset, is also applied to reduce the dimensionality of descriptors.
Experiments on three standard texture datasets, Outex12, Outex10 and KTH-TIPS2,
are conducted to evaluate the performance of AGLBP. The results show that our
proposed descriptor achieves better performance than some state-of-the-art
rotation-invariant texture descriptors in texture classification.
Chuyang Ye, Jerry L. Prince
Comments: A shorter version is accepted by MICCAI 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Diffusion magnetic resonance imaging (dMRI) is currently the only tool for
noninvasively imaging the brain’s white matter tracts. The fiber orientation
(FO) is a key feature computed from dMRI for fiber tract reconstruction.
Because the number of FOs in a voxel is usually small, dictionary-based sparse
reconstruction has been used to estimate FOs with a relatively small number of
diffusion gradients. However, accurate FO estimation in regions with complex FO
configurations in the presence of noise can still be challenging. In this work
we explore the use of a deep network for FO estimation in a dictionary-based
framework and propose an algorithm named Fiber Orientation Reconstruction
guided by a Deep Network (FORDN). FORDN consists of two steps. First, we use a
smaller dictionary encoding coarse basis FOs to represent the diffusion
signals. To estimate the mixture fractions of the dictionary atoms (and thus
coarse FOs), a deep network is designed specifically for solving the sparse
reconstruction problem. Here, the smaller dictionary is used to reduce the
computational cost of training. Second, the coarse FOs inform the final FO
estimation, where a larger dictionary encoding dense basis FOs is used and a
weighted l1-norm regularized least squares problem is solved to encourage FOs
that are consistent with the network output. FORDN was evaluated and compared
with state-of-the-art algorithms that estimate FOs using sparse reconstruction
on simulated and real dMRI data, and the results demonstrate the benefit of
using a deep network for FO estimation.
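The second-step optimisation has the familiar weighted-l1 form, which can be
sketched with a plain ISTA loop (an illustrative stand-in for whatever solver the
paper uses; the regularization strength and iteration count are arbitrary):

    import numpy as np

    def weighted_l1_least_squares(D, y, w, lam=0.1, n_iter=300):
        # ISTA for min_x 0.5*||D x - y||^2 + lam * sum_i w_i |x_i|
        L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
        x = np.zeros(D.shape[1])
        for _ in range(n_iter):
            g = x - D.T @ (D @ x - y) / L      # gradient step
            t = lam * w / L                    # per-coefficient soft-threshold levels
            x = np.sign(g) * np.maximum(np.abs(g) - t, 0.0)
        return x
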
Yan Yang, Jian Sun, Huibin Li, Zongben Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Compressive sensing (CS) is an effective approach for fast Magnetic Resonance
Imaging (MRI). It aims at reconstructing MR images from a small number of
under-sampled data in k-space, and accelerating the data acquisition in MRI. To
improve the current MRI system in reconstruction accuracy and speed, in this
paper, we propose two novel deep architectures, dubbed ADMM-Nets in basic and
generalized versions. ADMM-Nets are defined over data flow graphs, which are
derived from the iterative procedures in Alternating Direction Method of
Multipliers (ADMM) algorithm for optimizing a general CS-based MRI model. They
take the sampled k-space data as inputs and output reconstructed MR images.
Moreover, we extend our network to cope with complex-valued MR images. In the
training phase, all parameters of the nets, e.g., transforms, shrinkage
functions, etc., are discriminatively trained end-to-end. In the testing phase,
they have a computational overhead similar to the ADMM algorithm but use optimized
parameters learned from the data for the CS-based reconstruction task. We
investigate different configurations in network structures and conduct
extensive experiments on MR image reconstruction under different sampling
rates. By combining the advantages of the model-based and deep learning
approaches, the ADMM-Nets achieve state-of-the-art reconstruction
accuracies with fast computational speed.
Qin Zhang, Hui Wang, Junyu Dong, Guoqiang Zhong, Xin Sun
Comments: 5 page, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This letter adopts long short-term memory (LSTM) to predict sea surface
temperature (SST), which is, to our knowledge, the first attempt to use a
recurrent neural network to solve the problem of SST prediction and to make
one-week and one-month daily predictions. We formulate the SST prediction
problem as a time series regression problem. LSTM is a special kind of
recurrent neural network, which introduces a gate mechanism into the vanilla
RNN to prevent the vanishing or exploding gradient problem. It has a strong ability to
model the temporal relationship of time series data and can handle the
long-term dependency problem well. The proposed network architecture is
composed of two kinds of layers: an LSTM layer and a fully-connected dense layer.
The LSTM layer is utilized to model the time series relationship, and the
fully-connected layer is utilized to map the output of the LSTM layer to a final
prediction. We explore the optimal setting of this architecture through
experiments and report the prediction accuracy for the coastal seas of China to
confirm the effectiveness of the proposed method. In addition, we also show its
online updating characteristics.
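The described two-layer architecture maps onto a very small model; a hedged Keras
sketch, assuming a 7-day input window of daily SST values and a one-step-ahead
target (window length and layer width are placeholders):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.LSTM(64, input_shape=(7, 1)),   # LSTM layer models the temporal relationship
        layers.Dense(1),                       # fully-connected layer maps to the SST prediction
    ])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(X, y) with X of shape (samples, 7, 1) and y of shape (samples, 1)
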
Songxuan Lai, Lianwen Jin, Weixin Yang
Comments: 6 pages, 5 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Inspired by the great success of recurrent neural networks (RNNs) in
sequential modeling, we introduce a novel RNN system to improve the performance
of online signature verification. The training objective is to directly
minimize intra-class variations and to push the distances between skilled
forgeries and genuine samples above a given threshold. By back-propagating the
training signals, our RNN produced discriminative features with the desired
metrics. Additionally, we propose a novel descriptor, called the
length-normalized path signature (LNPS), and apply it to online signature
verification. LNPS has interesting properties, such as scale invariance and
rotation invariance after linear combination, and shows promising results in
online signature verification. Experiments on the publicly available SVC-2004
dataset yielded state-of-the-art performance of 2.37% equal error rate (EER).
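The stated training objective resembles a contrastive margin loss; a rough sketch
(not the authors' exact formulation; the margin value is an assumption):

    import torch

    def verification_loss(d_genuine, d_forgery, margin=1.0):
        # pull distances between genuine pairs towards zero and push distances
        # to skilled forgeries above the margin
        pull = d_genuine.pow(2).mean()
        push = torch.clamp(margin - d_forgery, min=0.0).pow(2).mean()
        return pull + push
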
Nasim Nematzadeh, David M.W. Powers
Comments: Submitted to PLOS ONE
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper explores the tilt illusion effect in the Cafe Wall pattern using a
classical Gaussian Receptive Field model. In this illusion, the mortar lines
are misperceived as diverging or converging rather than horizontal. We examine
the capability of a simple bioplausible filtering model to recognize different
degrees of tilt effect in the Cafe Wall illusion based on different
characteristics of the pattern. Our study employed a Difference of Gaussians
model of retinal to cortical ON center and/or OFF center receptive fields. A
wide range of parameters of the stimulus, for example mortar thickness,
brightness, tile contrast, phase of the tile displacement, have been studied.
Our model constructs a multiple scale edge map representation that reveals tilt
cues and clues involved in the illusory perception of the Cafe Wall pattern. We
will show here that our model not only can detect the tilt in this pattern, but
also can predict the strength of the illusion and quantify the degree of tilt.
The results of our simulations are consistent with previous psychophysical
findings across the full range of Cafe Wall variations tested. Our results also
suggest that the Difference of Gaussians mechanism lies at the heart of the effects
explained by, and the mechanisms proposed by, the Irradiation, Brightness
Induction, and Band-pass Filtering models.
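The filtering stage itself is simple; a minimal Difference of Gaussians sketch
(one scale of an ON-centre response; the centre/surround sigma ratio is an
assumption, and the paper's multi-scale edge map would stack such responses over
several sigmas):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_response(img, sigma_centre, surround_ratio=2.0):
        # ON-centre Difference of Gaussians: narrow centre minus wider surround
        return gaussian_filter(img, sigma_centre) - gaussian_filter(img, surround_ratio * sigma_centre)

    edge_map = [dog_response(np.random.rand(128, 128), s) for s in (1.0, 2.0, 4.0)]  # toy multi-scale stack
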
Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, Chen Huang, Simon Lucey
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we present a new approach for efficient regression-based object
tracking which we refer to as Deep-LK. Our approach is closely related to the
Generic Object Tracking Using Regression Networks (GOTURN) framework of Held et
al. We make the following contributions. First, we demonstrate that there is a
theoretical relationship between siamese regression networks like GOTURN and
the classical Inverse-Compositional Lucas & Kanade (IC-LK) algorithm. Further,
we demonstrate that, unlike GOTURN, IC-LK adapts its regressor to the appearance
of the currently tracked frame. We argue that GOTURN's poor performance on unseen
objects and/or viewpoints can be attributed to this missing property.
Second, we propose a novel framework for object tracking – which we refer to as
Deep-LK – that is inspired by the IC-LK framework. Finally, we show impressive
results demonstrating that Deep-LK substantially outperforms GOTURN.
Additionally, we demonstrate tracking performance comparable to current
state-of-the-art deep trackers whilst being an order of magnitude more
computationally efficient (i.e. 100 FPS).
Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a method which combines the flexibility of the
neural algorithm of artistic style with the speed of fast style transfer
networks to allow real-time stylization using any content/style image pair. We
build upon recent work leveraging conditional instance normalization for
multi-style transfer networks by learning to predict the conditional instance
normalization parameters directly from a style image. The model is successfully
trained on a corpus of roughly 80,000 paintings and is able to generalize to
paintings previously unobserved. We demonstrate that the learned embedding
space is smooth, contains rich structure, and organizes semantic
information associated with paintings in an entirely unsupervised manner.
Xi'ai Chen, Zhi Han, Yao Wang, Qian Zhao, Deyu Meng, Lin Lin, Yandong Tang
Comments: 13 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Because of the limitations of matrix factorization, such as losing spatial
structure information, the concept of low-rank tensor factorization (LRTF) has
been applied for the recovery of a low dimensional subspace from high
dimensional visual data. The low-rank tensor recovery is generally achieved by
minimizing the loss function between the observed data and the factorization
representation. The loss function is designed in various forms under different
noise distribution assumptions, such as the $L_1$ norm for the Laplacian
distribution and the $L_2$ norm for the Gaussian distribution. However, they
often fail to tackle real data corrupted by noise with an unknown distribution. In this
paper, we propose a generalized weighted low-rank tensor factorization method
(GWLRTF) integrated with the idea of noise modelling. This procedure treats the
target data as high-order tensor directly and models the noise by a Mixture of
Gaussians, which is called MoG GWLRTF. The parameters in the model are
estimated under the EM framework and through a new developed algorithm of
weighted low-rank tensor factorization. We provide two versions of the
algorithm with different tensor factorization operations, i.e., CP
factorization and Tucker factorization. Extensive experiments indicate the
respective advantages of these two versions in different applications and also
demonstrate the effectiveness of MoG GWLRTF compared with other competing
methods.
Charalampos Nikolaou, Egor V. Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks
Subjects: Artificial Intelligence (cs.AI)
Ontology-based data access (OBDA) is a popular approach for integrating and
querying multiple data sources by means of a shared ontology. The ontology is
linked to the sources using mappings, which assign views over the data to
ontology predicates. Motivated by the need for OBDA systems supporting
database-style aggregate queries, we propose a bag semantics for OBDA, where
duplicate tuples in the views defined by the mappings are retained, as is the
case in standard databases. We show that bag semantics makes conjunctive query
answering in OBDA coNP-hard in data complexity. To regain tractability, we
consider a rather general class of queries and show its rewritability to a
generalisation of the relational calculus to bags.
Ondrej Kuzelka, Jesse Davis, Steven Schockaert
Comments: Longer version of a paper appearing in IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)
The field of Statistical Relational Learning (SRL) is concerned with learning
probabilistic models from relational data. Learned SRL models are typically
represented using some kind of weighted logical formulas, which make them
considerably more interpretable than those obtained by e.g. neural networks. In
practice, however, these models are often still difficult to interpret
correctly, as they can contain many formulas that interact in non-trivial ways
and weights do not always have an intuitive meaning. To address this, we
propose a new SRL method which uses possibilistic logic to encode relational
models. Learned models are then essentially stratified classical theories,
which explicitly encode what can be derived with a given level of certainty.
Compared to Markov Logic Networks (MLNs), our method is faster and produces
considerably more interpretable models.
Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks
Comments: 23 pages. To appear in IJCAI-17
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Motivated by applications in declarative data analysis, we study
$\mathit{Datalog}_{\mathbb{Z}}$—an extension of positive Datalog with
arithmetic functions over integers. This language is known to be undecidable,
so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$
predicates are axiomatised to keep minimal/maximal numeric values, allowing us
to show that fact entailment is coNExpTime-complete in combined, and
coNP-complete in data complexity. Moreover, an additional $\mathit{stability}$
requirement causes the complexity to drop to ExpTime and PTime, respectively.
Finally, we show that stable $\mathit{Datalog}_{\mathbb{Z}}$ can express many
useful data analysis tasks, and so our results provide a sound foundation for
the development of advanced information systems.
Jing Wu Lian, Nicholas Mattei, Renee Noble, Toby Walsh
Comments: 3 figures
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Motivated by the common academic problem of allocating papers to referees for
conference reviewing, we propose a novel mechanism for solving the assignment
problem when we have a two-sided matching problem with preferences from one
side (the agents/reviewers) over the other side (the objects/papers) and both
sides have capacity constraints. The assignment problem is a fundamental
problem in both computer science and economics with application in many areas
including task and resource allocation. We draw inspiration from multi-criteria
decision making and voting and use order weighted averages (OWAs) to propose a
novel and flexible class of algorithms for the assignment problem. We show an
algorithm for finding a $\Sigma$-OWA assignment in polynomial time, in contrast
to the NP-hardness of finding an egalitarian assignment. Inspired by this
setting we observe an interesting connection between our model and the classic
proportional multi-winner election problem in social choice.
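For readers unfamiliar with OWAs: an order weighted average applies a fixed weight
vector to the sorted values rather than to particular positions, e.g. (a small
sketch; the weight choices are illustrative):

    import numpy as np

    def owa(values, weights):
        # weights are applied to the values sorted in descending order
        v = np.sort(np.asarray(values, dtype=float))[::-1]
        return float(np.dot(np.asarray(weights, dtype=float), v))

    utilities = [3.0, 1.0, 4.0]
    print(owa(utilities, [1.0, 0.0, 0.0]))   # 4.0: rewards only the best-off
    print(owa(utilities, [0.0, 0.0, 1.0]))   # 1.0: egalitarian, focuses on the worst-off
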
Hamid Arabnejad, Claus Pahl, Pooyan Jamshidi, Giovani Estrada
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
A goal of cloud service management is to design a self-adaptable auto-scaler
that reacts to workload fluctuations by changing the assigned resources. The key
problem is how and when to add/remove resources in order to meet agreed
service-level agreements. Reducing application cost and guaranteeing
service-level agreements (SLAs) are two critical factors of dynamic controller
design. In this paper, we compare two dynamic learning strategies based on a
fuzzy logic system, which learns and modifies fuzzy scaling rules at runtime. A
self-adaptive fuzzy logic controller is combined with two reinforcement
learning (RL) approaches: (i) Fuzzy SARSA learning (FSL) and (ii) Fuzzy
Q-learning (FQL). As an off-policy approach, Q-learning learns independently of
the policy currently followed, whereas SARSA, as an on-policy approach, always
incorporates the actual agent’s behavior and leads to faster learning. Both
approaches are implemented, and their advantages and disadvantages compared,
on the OpenStack cloud platform. We demonstrate that both auto-scaling
approaches can handle various load traffic situations, both sudden and periodic,
delivering resources on demand while reducing operating costs and preventing
SLA violations. The experimental results demonstrate that FSL and FQL achieve
acceptable performance in terms of the adjusted number of virtual machines,
targeting optimized SLA compliance and response time.
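The off-policy/on-policy distinction referred to above comes down to the two
tabular update rules below (a generic sketch, independent of the fuzzy controller
that the paper wraps around them):

    def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
        # off-policy: bootstraps from the greedy (max) action in the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
        # on-policy: bootstraps from the action the agent actually takes next
        Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

    # Q is a nested dict, e.g. Q = {state: {action: value, ...}, ...}
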
Emmanouil A. Platanios, Hoifung Poon, Tom M. Mitchell, Eric Horvitz
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose an efficient method to estimate the accuracy of classifiers using
only unlabeled data. We consider a setting with multiple classification
problems where the target classes may be tied together through logical
constraints. For example, a set of classes may be mutually exclusive, meaning
that a data instance can belong to at most one of them. The proposed method is
based on the intuition that: (i) when classifiers agree, they are more likely
to be correct, and (ii) when the classifiers make a prediction that violates
the constraints, at least one classifier must be making an error. Experiments
on four real-world data sets produce accuracy estimates within a few percent of
the true accuracy, using solely unlabeled data. Our models also outperform
existing state-of-the-art solutions in both estimating accuracies, and
combining multiple classifier outputs. The results emphasize the utility of
logical constraints in estimating accuracy, thus validating our intuition.
Robert Adamski, Tomasz Grel, Maciej Klimek, Henryk Michalewski
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
The asynchronous nature of state-of-the-art reinforcement learning
algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes
them exceptionally suitable for CPU computations. However, given the fact that
deep reinforcement learning often deals with interpreting visual information, a
large part of the training and inference time is spent performing convolutions. In
this work we present our results on learning strategies in Atari games using a
Convolutional Neural Network, the Math Kernel Library and TensorFlow 0.11rc0
machine learning framework. We also analyze effects of asynchronous
computations on the convergence of reinforcement learning algorithms.
Nat Dilokthanakul, Christos Kaplanis, Nick Pawlowski, Murray Shanahan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The problem of sparse rewards is one of the hardest challenges in
contemporary reinforcement learning. Hierarchical reinforcement learning (HRL)
tackles this problem by using a set of temporally-extended actions, or options,
each of which has its own subgoal. These subgoals are normally handcrafted for
specific tasks. Here, though, we introduce a generic class of subgoals with
broad applicability in the visual domain. Underlying our approach (in common
with work using “auxiliary tasks”) is the hypothesis that the ability to
control aspects of the environment is an inherently useful skill to have. We
incorporate such subgoals in an end-to-end hierarchical reinforcement learning
system and test two variants of our algorithm on a number of games from the
Atari suite. We highlight the advantage of our approach in one of the hardest
games — Montezuma’s revenge — for which the ability to handle sparse rewards
is key. Our agent learns several times faster than the current state-of-the-art
HRL agent in this game, reaching a similar level of performance.
Cory Stephenson, Patrick Callier, Abhinav Ganesh, Karl Ni
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose an algorithm to separate simultaneously speaking persons from each
other, the “cocktail party problem”, using a single microphone. Our approach
involves a deep recurrent neural network regression to a vector space that is
descriptive of independent speakers. Such a vector space can embed empirically
determined speaker characteristics and is optimized by distinguishing between
speaker masks. We call this technique source-contrastive estimation. The
methodology is inspired by negative sampling, which has seen success in natural
language processing, where an embedding is learned by correlating and
de-correlating a given input vector with output weights. Although the matrix
determined by the output weights is dependent on a set of known speakers, we
only use the input vectors during inference. Doing so will ensure that source
separation is explicitly speaker-independent. Our approach is similar to recent
deep neural network clustering and permutation-invariant training research; we
use weighted spectral features and masks to augment individual speaker
frequencies while filtering out other speakers. We avoid, however, the severe
computational burden of other approaches with our technique. Furthermore, by
training a vector space rather than combinations of different speakers or
differences thereof, we avoid the so-called permutation problem during
training. Our algorithm offers an intuitive, computationally efficient response
to the cocktail party problem, and most importantly boasts better empirical
performance than other current techniques.
Aida Slavic
Comments: Aida Slavic (2011) Classification revisited: a web of knowledge. In: Innovations in information retrieval: perspectives for theory and practice. Eds. Allen Foster and Pauline Rafferty. London: Facet, pp. 23-48
Subjects: Information Retrieval (cs.IR)
The vision of the Semantic Web (SW) is gradually unfolding and taking shape
through a web of linked data, a part of which is built by capturing semantics
stored in existing knowledge organization systems (KOS), subject metadata and
resource metadata. The content of vast bibliographic collections is currently
categorized by widely used bibliographic classifications, and we may soon
see them being mined for information and linked in a meaningful way across the
Web. Bibliographic classifications are designed for knowledge mediation, which
offers both a rich terminology and different ways in which concepts can be
categorized and related to each other in the universe of knowledge. From
1990 to 2010 they were used in various resource discovery services on the Web
and continue to be used to support information integration in a number of
international digital library projects. In this chapter we will revisit some of
the ways in which universal classifications, as language independent concept
schemes, can assist humans and computers in structuring and presenting
information and formulating queries. Most importantly, we highlight issues
important to understanding bibliographic classifications, both in terms of
their unused potential and technical limitations.
Gustavo R. Lima, Carlos E. Mello, Geraldo Zimbrao
Subjects: Information Retrieval (cs.IR)
Recommender systems play an important role in many scenarios where users are
overwhelmed with too many choices to make. In this context, Collaborative
Filtering (CF) arises by providing a simple and widely used approach for
personalized recommendation. Memory-based CF algorithms mostly rely on
similarities between pairs of users or items, which are subsequently employed in
classifiers like k-Nearest Neighbor (kNN) to generalize to unknown ratings. A
major issue regarding this approach is to build the similarity matrix.
Depending on the dimensionality of the rating matrix, the similarity
computations may become computationally intractable. To overcome this issue, we
propose to represent users by their distances to preselected users, namely
landmarks. This procedure drastically reduces the computational cost
associated with the similarity matrix. We evaluated our proposal on two
distinct databases, and the results showed that our method has
consistently and considerably outperformed eight CF algorithms (including both
memory-based and model-based) in terms of computational performance.
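The landmark representation itself is a one-liner in spirit (a hedged sketch; the
distance measure and the landmark selection strategy are placeholders for whatever
the paper actually uses):

    import numpy as np

    def landmark_features(R, n_landmarks=20, seed=0):
        # R: users x items rating matrix; represent each user by distances to landmark users
        rng = np.random.default_rng(seed)
        idx = rng.choice(R.shape[0], size=n_landmarks, replace=False)
        return np.linalg.norm(R[:, None, :] - R[idx][None, :, :], axis=2)   # users x landmarks

    X = landmark_features(np.random.rand(1000, 500))   # kNN can now run on 1000x20 instead of 1000x500
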
Aida Slavic
Journal-ref: Axiomathes, 18 (2), pp. 257-71 (2008)
Subjects: Information Retrieval (cs.IR)
The paper discusses issues related to the use of faceted classifications in
an online environment. The author argues that knowledge organization systems
can be fully utilized in information retrieval only if they are exposed and
made available for machine processing. The experience with classification
automation to date may be used to speed up and ease the conversion of existing
faceted schemes or the creation of management tools for new systems. The author
suggests that it is possible to agree on a set of functional requirements for
supporting faceted classifications online that are equally relevant for the
maintenance of classifications, the creation of classification indexing tools,
or the management of classifications in an authority file. It is suggested that
a set of requirements for analytico-synthetic classifications may be put
forward to improve standards for the use and exchange of knowledge organization
systems.
Matthias Dorfer, Jan Schlüter, Andreu Vall, Filip Korzeniowski, Gerhard Widmer
Comments: 15 pages
Subjects: Information Retrieval (cs.IR)
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA layer allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three datasets, surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available.
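For reference, the analytic projections that such a layer computes can be obtained
with a few linear-algebra steps (a numpy sketch of classical CCA with a small
ridge term; this is not the authors' layer, only the underlying computation):

    import numpy as np

    def cca_projections(X, Y, eps=1e-4):
        # returns projection matrices A, B maximizing correlation between X @ A and Y @ B
        X = X - X.mean(0)
        Y = Y - Y.mean(0)
        n = X.shape[0]
        Sxx = X.T @ X / n + eps * np.eye(X.shape[1])
        Syy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
        Sxy = X.T @ Y / n

        def inv_sqrt(S):                      # symmetric inverse square root
            w, V = np.linalg.eigh(S)
            return V @ np.diag(w ** -0.5) @ V.T

        Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
        U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
        return Kx @ U, Ky @ Vt.T

    A, B = cca_projections(np.random.randn(500, 32), np.random.randn(500, 16))
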
Leandro B. dos Santos, Magali S. Duran, Nathan S. Hartmann, Arnaldo Candido Jr., Gustavo H. Paetzold, Sandra M. Aluisio
Comments: Paper accepted for TSD2017
Subjects: Computation and Language (cs.CL)
Psycholinguistic properties of words have been used in various approaches to
Natural Language Processing tasks, such as text simplification and readability
assessment. Most of these properties are subjective, requiring costly and
time-consuming surveys to gather. Recent approaches use the limited
datasets of psycholinguistic properties to extend them automatically to large
lexicons. However, some of the resources used by such approaches are not
available to most languages. This study presents a method to infer
psycholinguistic properties for Brazilian Portuguese (BP) using regressors
built with a light set of features usually available for less resourced
languages: word length, frequency lists, lexical databases composed of school
dictionaries and word embedding models. The correlations between the properties
inferred are close to those obtained by related works. The resulting resource
contains 26,874 words in BP annotated with concreteness, age of acquisition,
imageability and subjective frequency.
Zhengyang Wang, Shuiwang Ji
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Visual question answering is a recently proposed artificial intelligence task
that requires a deep understanding of both images and texts. In deep learning,
images are typically modeled through convolutional neural networks, and texts
are typically modeled through recurrent neural networks. While the requirement
for modeling images is similar to traditional computer vision tasks, such as
object recognition and image classification, visual question answering raises a
different need for textual representation as compared to other natural language
processing tasks. In this work, we perform a detailed analysis on natural
language questions in visual question answering. Based on the analysis, we
propose to rely on convolutional neural networks for learning textual
representations. By exploring the various properties of convolutional neural
networks specialized for text data, such as width and depth, we present our
“CNN Inception + Gate” model. We show that our model improves question
representations and thus the overall accuracy of visual question answering
models. We also show that the text representation requirement in visual
question answering is more complicated and comprehensive than that in
conventional natural language processing tasks, making it a better task to
evaluate textual representation methods. Shallow models like fastText, which
can obtain results comparable to deep learning models in tasks like text
classification, are not suitable for visual question answering.
Hamid Arabnejad, Claus Pahl, Pooyan Jamshidi, Giovani Estrada
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
A goal of cloud service management is to design a self-adaptable auto-scaler
that reacts to workload fluctuations and changes the assigned resources. The key
problem is how and when to add/remove resources in order to meet agreed
service-level agreements. Reducing application cost and guaranteeing
service-level agreements (SLAs) are two critical factors of dynamic controller
design. In this paper, we compare two dynamic learning strategies based on a
fuzzy logic system, which learns and modifies fuzzy scaling rules at runtime. A
self-adaptive fuzzy logic controller is combined with two reinforcement
learning (RL) approaches: (i) Fuzzy SARSA learning (FSL) and (ii) Fuzzy
Q-learning (FQL). As an off-policy approach, Q-learning learns independently of
the policy currently followed, whereas SARSA, as an on-policy approach, always
incorporates the actual agent’s behavior and leads to faster learning. Both
approaches are implemented, and their advantages and disadvantages compared,
on the OpenStack cloud platform. We demonstrate that both auto-scaling
approaches can handle various load traffic situations, sudden and periodic,
delivering resources on demand while reducing operating costs and preventing
SLA violations. The experimental results demonstrate that FSL and FQL have
acceptable performance in terms of the number of virtual machines adjusted
to optimize SLA compliance and response time.
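As a rough illustration of the reinforcement learning side of such a controller, the sketch below runs tabular Q-learning over discretized utilisation states with scale-down, no-op and scale-up actions. It is not the fuzzy SARSA/Q-learning controller evaluated in the paper; the reward weights, workload model and discretization are invented for the example.

    import numpy as np

    rng = np.random.default_rng(2)

    # States: discretized CPU utilisation bands; actions: remove / keep / add one VM.
    N_STATES, ACTIONS = 10, (-1, 0, +1)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    alpha, gamma, eps = 0.1, 0.9, 0.1

    def reward(util, vms):
        # Penalise SLA risk (high utilisation) and cost (many VMs); weights are arbitrary.
        return -(3.0 * max(util - 0.8, 0.0) + 0.1 * vms)

    vms, demand, state = 2, 5.0, 0
    for step in range(5000):
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[state]))
        vms = max(1, vms + ACTIONS[a])
        demand = max(0.5, demand + rng.normal(0, 0.5))     # drifting workload
        util = min(1.0, demand / (vms * 4.0))              # 4 demand units per VM
        next_state = min(N_STATES - 1, int(util * N_STATES))
        r = reward(util, vms)
        # Off-policy Q-learning update (SARSA would instead use the action actually taken next).
        Q[state, a] += alpha * (r + gamma * Q[next_state].max() - Q[state, a])
        state = next_state

    print("greedy action per utilisation band:", [ACTIONS[i] for i in np.argmax(Q, axis=1)])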
Robert Adamski, Tomasz Grel, Maciej Klimek, Henryk Michalewski
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Learning (cs.LG)
The asynchronous nature of state-of-the-art reinforcement learning
algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes
them exceptionally suitable for CPU computations. However, given that
deep reinforcement learning often deals with interpreting visual information, a
large part of the training and inference time is spent performing convolutions. In
this work we present our results on learning strategies in Atari games using a
Convolutional Neural Network, the Math Kernel Library and TensorFlow 0.11rc0
machine learning framework. We also analyze effects of asynchronous
computations on the convergence of reinforcement learning algorithms.
Sarvar Patel, Giuseppe Persiano, Kevin Yeo
Comments: 15 pages, 4 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
In this work, we present CacheShuffle, an algorithm for obliviously shuffling
data on an untrusted server. Our methods improve the previously best known
results by reducing the number of block accesses to \((4+\epsilon)N\), which is
close to the lower bound of \(2N\). Experimentation shows that for practical sizes
of \(N\), there is a 4x improvement over the previous best result. Moreover, our
algorithm requires only 2 roundtrips, compared to the 5 roundtrips needed by the
previous result. The novelty in our algorithm involves introducing a persistent client
cache. Additionally, we show a recursive version of our algorithm which allows
oblivious shuffling with smaller client memory.
Emmanouil A. Platanios, Hoifung Poon, Tom M. Mitchell, Eric Horvitz
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose an efficient method to estimate the accuracy of classifiers using
only unlabeled data. We consider a setting with multiple classification
problems where the target classes may be tied together through logical
constraints. For example, a set of classes may be mutually exclusive, meaning
that a data instance can belong to at most one of them. The proposed method is
based on the intuition that: (i) when classifiers agree, they are more likely
to be correct, and (ii) when the classifiers make a prediction that violates
the constraints, at least one classifier must be making an error. Experiments
on four real-world data sets produce accuracy estimates within a few percent of
the true accuracy, using solely unlabeled data. Our models also outperform
existing state-of-the-art solutions in both estimating accuracies, and
combining multiple classifier outputs. The results emphasize the utility of
logical constraints in estimating accuracy, thus validating our intuition.
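The agreement intuition can be illustrated with a crude first-order calculation, sketched below for three simulated classifiers: under an independence assumption, pairwise disagreement rates observed on unlabeled data roughly determine individual error rates. This is only the simplest form of the idea, not the constraint-based method proposed in the paper, and the independence assumption and simulated error rates are illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20000
    truth = rng.integers(0, 2, n)                       # unknown ground-truth labels

    # Three simulated classifiers with different (unknown) error rates.
    true_err = [0.10, 0.20, 0.30]
    preds = [np.where(rng.random(n) < e, 1 - truth, truth) for e in true_err]

    # Pairwise disagreement: d_ij = e_i + e_j - 2 e_i e_j under independent errors.
    d = {(i, j): np.mean(preds[i] != preds[j]) for i in range(3) for j in range(3) if i < j}

    # Dropping the quadratic term gives the crude estimate e_i ~ (d_ij + d_ik - d_jk) / 2,
    # computable from unlabeled data alone.
    est = [
        (d[(0, 1)] + d[(0, 2)] - d[(1, 2)]) / 2,
        (d[(0, 1)] + d[(1, 2)] - d[(0, 2)]) / 2,
        (d[(0, 2)] + d[(1, 2)] - d[(0, 1)]) / 2,
    ]
    print("true error rates:        ", true_err)
    print("estimated from agreement:", [round(e, 3) for e in est])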
Mehmet A. Donmez, Maxim Raginsky, Andrew C. Singer
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We present a generic framework for trading off fidelity and cost in computing
stochastic gradients when the costs of acquiring stochastic gradients of
different quality are not known a priori. We consider a mini-batch oracle that
distributes a limited query budget over a number of stochastic gradients and
aggregates them to estimate the true gradient. Since the optimal mini-batch
size depends on the unknown cost-fidelity function, we propose an algorithm,
\emph{EE-Grad}, that sequentially explores the performance of mini-batch oracles
and exploits the accumulated knowledge to estimate the one achieving the best
performance in terms of cost-efficiency. We provide performance guarantees for
EE-Grad with respect to the optimal mini-batch oracle, and illustrate these
results in the case of strongly convex objectives. We also provide a simple
numerical example that corroborates our theoretical findings.
Daniel Hsu, Kevin Shi, Xiaorui Sun
Subjects: Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
This article considers algorithmic and statistical aspects of linear
regression when the correspondence between the covariates and the responses is
unknown. First, a fully polynomial-time approximation scheme is given for the
natural least squares optimization problem in any constant dimension. Next, in
an average-case and noise-free setting where the responses exactly correspond
to a linear function of i.i.d. draws from a standard multivariate normal
distribution, an efficient algorithm based on lattice basis reduction is shown
to exactly recover the unknown linear function in arbitrary dimension. Finally,
lower bounds on the signal-to-noise ratio are established for approximate
recovery of the unknown linear function by any estimator.
Shipra Agrawal, Randy Jia
Subjects: Learning (cs.LG)
We present an algorithm based on posterior sampling (aka Thompson sampling)
that achieves near-optimal worst-case regret bounds when the underlying Markov
Decision Process (MDP) is communicating with a finite, though unknown,
diameter. Our main result is a high probability regret upper bound of
\(\tilde{O}(D\sqrt{SAT})\) for any communicating MDP with \(S\) states, \(A\) actions
and diameter \(D\), when \(T \ge S^5 A\). Here, regret compares the total reward
achieved by the algorithm to the total expected reward of an optimal
infinite-horizon undiscounted average reward policy, in time horizon (T). This
result improves over the best previously known upper bound of
\(\tilde{O}(DS\sqrt{AT})\) achieved by any algorithm in this setting, and matches
the dependence on \(S\) in the established lower bound of \(\Omega(\sqrt{DSAT})\)
for this problem. Our techniques involve proving some novel results about the
anti-concentration of Dirichlet distribution, which may be of independent
interest.
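A minimal sketch of posterior sampling for an unknown MDP, assuming NumPy: transition probabilities get a Dirichlet posterior, an MDP is sampled at the start of each artificial episode, and the greedy policy of the sample is followed. The episode length, reward model and planning horizon are arbitrary choices for illustration, not the algorithm analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(4)
    S, A, H = 5, 2, 20                      # states, actions, planning horizon

    # Unknown ground-truth MDP (random transitions, fixed known rewards).
    P_true = rng.dirichlet(np.ones(S), size=(S, A))
    R = rng.random((S, A))

    # Dirichlet posterior over transition probabilities (counts start at 1).
    counts = np.ones((S, A, S))

    def greedy_policy(P):
        """Finite-horizon value iteration on a sampled MDP."""
        V = np.zeros(S)
        for _ in range(H):
            Q = R + P @ V                   # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
            V = Q.max(axis=1)
        return Q.argmax(axis=1)

    s, total = 0, 0.0
    for t in range(20000):
        if t % H == 0:                      # resample an MDP at the start of each episode
            P_sample = np.array([[rng.dirichlet(counts[s_, a]) for a in range(A)] for s_ in range(S)])
            pi = greedy_policy(P_sample)
        a = pi[s]
        s_next = rng.choice(S, p=P_true[s, a])
        total += R[s, a]
        counts[s, a, s_next] += 1           # Bayesian update of the transition posterior
        s = s_next

    print("average reward under posterior sampling:", round(total / 20000, 3))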
Zhenfang Hu, Gang Pan, Zhaohui Wu
Subjects: Learning (cs.LG)
Spectral graph theory has been widely applied in unsupervised and
semi-supervised learning. In this paper, we find for the first time, to our
knowledge, that it also plays a concrete role in supervised classification. It
turns out that two classifiers are inherently related to the theory: linear
regression for classification (LRC) and normalized radial basis function
network (nRBFN), corresponding to linear and nonlinear kernel respectively. The
spectral graph theory provides us with a new insight into a fundamental aspect
of classification: the tradeoff between fitting error and overfitting risk.
With the theory, ideal working conditions for LRC and nRBFN are presented,
which ensure not only zero fitting error but also low overfitting risk. For
quantitative analysis, two concepts, the fitting error and the spectral risk
(indicating overfitting), have been defined. Their bounds for nRBFN and LRC are
derived. A special result shows that the spectral risk of nRBFN is lower
bounded by the number of classes and upper bounded by the size of radial basis.
When the conditions are not met exactly, the classifiers will pursue the
minimum fitting error, running into the risk of overfitting. It turns out that
\(\ell_2\)-norm regularization can be applied to control overfitting. Its effect
is explored in the spectral context. It is found that the two terms in the
\(\ell_2\)-regularized objective are in one-to-one correspondence with the fitting error
and the spectral risk, revealing a tradeoff between the two quantities.
Concerning practical performance, we devise a basis selection strategy to
address the main problem hindering the application of nRBFN. With the
strategy, nRBFN is easy to implement yet flexible. Experiments on 14 benchmark
data sets show the performance of nRBFN is comparable to that of SVM, whereas
the parameter tuning of nRBFN is much easier, leading to reduction of model
selection time.
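A small sketch of the LRC classifier mentioned above, with the \(\ell_2\) regularizer that the abstract relates to the spectral risk: ridge-regress one-hot class indicators and predict by the largest response. nRBFN would replace the raw features by a normalized RBF basis; the dataset and regularization strength here are arbitrary.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

    def lrc_fit(X, y, lam=1.0):
        # Linear regression for classification: solve (X^T X + lam I) W = X^T Y
        # for one-hot targets Y; lam is the l2 regularizer discussed above.
        X = np.hstack([X, np.ones((X.shape[0], 1))])          # bias column
        Y = np.eye(y.max() + 1)[y]                            # one-hot targets
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

    def lrc_predict(W, X):
        X = np.hstack([X, np.ones((X.shape[0], 1))])
        return (X @ W).argmax(axis=1)

    W = lrc_fit(Xtr, ytr, lam=1.0)
    print("LRC test accuracy:", round(float((lrc_predict(W, Xte) == yte).mean()), 3))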
Michal Derezinski, Manfred K. Warmuth
Subjects: Learning (cs.LG)
Given a full rank matrix \(X\) with more columns than rows, consider the task of
estimating the pseudo inverse \(X^+\) based on the pseudo inverse of a sampled
subset of columns (of size at least the number of rows). We show that this is
possible if the subset of columns is chosen proportional to the squared volume
spanned by the rows of the chosen submatrix (i.e., volume sampling). The
resulting estimator is unbiased and, surprisingly, the covariance of the
estimator also has a closed form: it equals a specific factor times
\(X^+ X^{+\top}\).
The pseudo inverse plays an important part in solving the linear least squares
problem, where we try to predict a label for each column of \(X\). We assume
labels are expensive and we are only given the labels for the small subset of
columns we sample from \(X\). Using our methods we show that the weight vector of
the solution for the subproblem is an unbiased estimator of the optimal
solution for the whole problem based on all column labels.
We believe that these new formulas establish a fundamental connection between
linear least squares and volume sampling. We use our methods to obtain an
algorithm for volume sampling that is faster than state-of-the-art and for
obtaining bounds for the total loss of the estimated least-squares solution on
all labeled columns.
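The unbiasedness claim can be checked numerically on a tiny instance. The sketch below enumerates all size-\(d\) column subsets of a random \(d \times n\) matrix, weights them by squared volume, and compares the exact expectation of the zero-padded subset inverse with the pseudo inverse; if the claim holds on this instance, the printed difference should be at machine precision. This covers only the square-subset case and is an illustration, not the authors' sampling algorithm.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(5)
    d, n = 3, 6
    X = rng.normal(size=(d, n))

    # Volume sampling over size-d column subsets: P(S) proportional to det(X_S)^2.
    subsets = list(combinations(range(n), d))
    weights = np.array([np.linalg.det(X[:, list(S)]) ** 2 for S in subsets])
    probs = weights / weights.sum()

    estimate = np.zeros((n, d))
    for S, p in zip(subsets, probs):
        padded = np.zeros((n, d))
        padded[list(S), :] = np.linalg.inv(X[:, list(S)])   # (X_S)^{-1} placed back into the full index set
        estimate += p * padded                              # exact expectation over volume sampling

    print("max |E[estimator] - X^+| =", np.abs(estimate - np.linalg.pinv(X)).max())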
Haotian Jiang, Jian Li, Mingda Qiao
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
In the Best-\(K\) identification problem (Best-\(K\)-Arm), we are given \(N\)
stochastic bandit arms with unknown reward distributions. Our goal is to
identify the \(K\) arms with the largest means with high confidence, by drawing
samples from the arms adaptively. This problem is motivated by various
practical applications and has attracted considerable attention in the past
decade. In this paper, we propose new practical algorithms for the Best-\(K\)-Arm
problem, which have nearly optimal sample complexity bounds (matching the lower
bound up to logarithmic factors) and outperform the state-of-the-art algorithms
for the Best-\(K\)-Arm problem (even for \(K=1\)) in practice.
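For orientation, the sketch below runs a generic confidence-interval elimination baseline for Best-\(K\)-Arm: arms are sampled uniformly, accepted once their lower confidence bound provably places them in the top \(K\) of the remaining arms, and rejected once their upper bound rules them out. This is a standard baseline under assumed Gaussian rewards, not the nearly optimal algorithms proposed in the paper.

    import numpy as np

    rng = np.random.default_rng(6)

    means = np.array([0.9, 0.8, 0.7, 0.5, 0.45, 0.3, 0.2, 0.1])   # unknown to the algorithm
    N, K, delta = len(means), 3, 0.05

    counts, sums = np.zeros(N), np.zeros(N)
    active, accepted = set(range(N)), set()

    while len(accepted) < K:
        k = K - len(accepted)
        if len(active) <= k:                    # everything left must be a top-K arm
            accepted |= active
            break
        for i in active:                        # uniform sampling of the active arms
            sums[i] += rng.normal(means[i], 0.5)
            counts[i] += 1
        mu = sums / counts.clip(min=1)
        rad = np.sqrt(np.log(4 * N * counts.clip(min=1) ** 2 / delta) / (2 * counts.clip(min=1)))
        for i in list(active):
            others = [j for j in active if j != i]
            beaten = sum(mu[i] - rad[i] > mu[j] + rad[j] for j in others)
            dominated = sum(mu[i] + rad[i] < mu[j] - rad[j] for j in others)
            if beaten >= len(others) - (k - 1):     # provably among the top k of the active set
                accepted.add(i); active.remove(i)
                break                               # recompute k before accepting more arms
            elif dominated >= k:                    # provably outside the top k
                active.remove(i)

    print("identified top-K arms:", sorted(accepted), "total samples:", int(counts.sum()))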
Zhengyang Wang, Hao Yuan, Shuiwang Ji
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The key idea of variational auto-encoders (VAEs) resembles that of
traditional auto-encoder models in which spatial information is supposed to be
explicitly encoded in the latent space. However, the latent variables in VAEs
are vectors, which are commonly interpreted as multiple feature maps of size
1×1. Such representations can only convey spatial information implicitly when
coupled with powerful decoders. In this work, we propose spatial VAEs that use
latent variables as feature maps of larger size to explicitly capture spatial
information. This is achieved by allowing the latent variables to be sampled
from matrix-variate normal (MVN) distributions whose parameters are computed
from the encoder network. To increase dependencies among locations on latent
feature maps and reduce the number of parameters, we further propose spatial
VAEs via low-rank MVN distributions. Experimental results show that the
proposed spatial VAEs outperform original VAEs in capturing rich structural and
spatial information.
Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji
Comments: 10 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Deconvolutional layers have been widely used in a variety of deep models for
up-sampling, including encoder-decoder networks for semantic segmentation and
deep generative models for unsupervised learning. One of the key limitations of
deconvolutional operations is that they result in the so-called checkerboard
problem. This is caused by the fact that no direct relationship exists among
adjacent pixels on the output feature map. To address this problem, we propose
the pixel deconvolutional layer (PixelDCL) to establish direct relationships
among adjacent pixels on the up-sampled feature map. Our method is based on a
fresh interpretation of the regular deconvolution operation. The resulting
PixelDCL can be used to replace any deconvolutional layer in a plug-and-play
manner without compromising the fully trainable capabilities of original
models. The proposed PixelDCL may result in a slight decrease in efficiency, but
this can be overcome by an implementation trick. Experimental results on
semantic segmentation demonstrate that PixelDCL can consider spatial features
such as edges and shapes and yields more accurate segmentation outputs than
deconvolutional layers. When used in image generation tasks, our PixelDCL can
largely overcome the checkerboard problem suffered by regular deconvolution
operations.
Nat Dilokthanakul, Christos Kaplanis, Nick Pawlowski, Murray Shanahan
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The problem of sparse rewards is one of the hardest challenges in
contemporary reinforcement learning. Hierarchical reinforcement learning (HRL)
tackles this problem by using a set of temporally-extended actions, or options,
each of which has its own subgoal. These subgoals are normally handcrafted for
specific tasks. Here, though, we introduce a generic class of subgoals with
broad applicability in the visual domain. Underlying our approach (in common
with work using “auxiliary tasks”) is the hypothesis that the ability to
control aspects of the environment is an inherently useful skill to have. We
incorporate such subgoals in an end-to-end hierarchical reinforcement learning
system and test two variants of our algorithm on a number of games from the
Atari suite. We highlight the advantage of our approach in one of the hardest
games — Montezuma’s revenge — for which the ability to handle sparse rewards
is key. Our agent learns several times faster than the current state-of-the-art
HRL agent in this game, reaching a similar level of performance.
Masaki Onuki, Shunsuke Ono, Keiichiro Shirai, Yuichi Tanaka
Comments: This is a journal paper
Subjects: Numerical Analysis (cs.NA); Learning (cs.LG)
We propose an approximation method for thresholding of singular values using
Chebyshev polynomial approximation (CPA). Many signal processing problems
require iterative application of singular value decomposition (SVD) for
minimizing the rank of a given data matrix with other cost functions and/or
constraints, which is called matrix rank minimization. In matrix rank
minimization, singular values of a matrix are shrunk by hard-thresholding,
soft-thresholding, or weighted soft-thresholding. However, the computational
cost of SVD is generally too expensive to handle high dimensional signals such
as images; hence, in this case, matrix rank minimization requires enormous
computation time. In this paper, we leverage CPA to (approximately) manipulate
singular values without computing singular values and vectors. The thresholding
of singular values is expressed by a multiplication of certain matrices, which
is derived from a characteristic of CPA. The multiplication is also efficiently
computed using the sparsity of signals. As a result, the computational cost is
significantly reduced. Experimental results suggest the effectiveness of our
method through several image processing applications based on matrix rank
minimization with nuclear norm relaxation in terms of computation time and
approximation precision.
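The core trick, applying a polynomial of \(AA^{\top}\) to \(A\) so that singular values are (approximately) soft-thresholded without an SVD, can be sketched as follows with NumPy. The degree, node count and threshold are arbitrary, and this is only the basic idea, not the authors' full CPA scheme with sparsity-exploiting multiplications.

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(60, 80))
    tau = 5.0                                              # soft-threshold level

    def sigma_max_bound(A, iters=30):
        # Cheap estimate of the largest singular value via power iteration (no SVD).
        v = np.ones(A.shape[1]) / np.sqrt(A.shape[1])
        for _ in range(iters):
            v = A.T @ (A @ v)
            v /= np.linalg.norm(v)
        return 1.05 * np.sqrt(v @ (A.T @ (A @ v)))         # small safety margin

    L = sigma_max_bound(A) ** 2                            # eigenvalues of A A^T lie in [0, L]
    deg, nodes = 60, 400
    lam = (np.cos(np.pi * (np.arange(nodes) + 0.5) / nodes) + 1) / 2 * L   # Chebyshev nodes on (0, L)
    s_nodes = np.sqrt(lam)
    g = np.maximum(s_nodes - tau, 0.0) / s_nodes           # g(lam) = soft(sqrt(lam)) / sqrt(lam)
    coef = np.polynomial.chebyshev.chebfit(2 * lam / L - 1, g, deg)

    # Evaluate the matrix polynomial p(A A^T) with the Chebyshev three-term recurrence.
    M = 2 * (A @ A.T) / L - np.eye(A.shape[0])
    T_prev, T_curr = np.eye(A.shape[0]), M
    P = coef[0] * T_prev + coef[1] * T_curr
    for c in coef[2:]:
        T_prev, T_curr = T_curr, 2 * M @ T_curr - T_prev
        P += c * T_curr
    approx = P @ A                                         # approximately U diag(soft(sigma)) V^T

    U, s, Vt = np.linalg.svd(A, full_matrices=False)       # exact result, for comparison only
    exact = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
    print("relative approximation error:",
          round(float(np.linalg.norm(approx - exact) / np.linalg.norm(exact)), 4))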
Yağmur Güçlütürk, Umut Güçlü, Katja Seeliger, Sander Bosch, Rob van Lier, Marcel van Gerven
Subjects: Neurons and Cognition (q-bio.NC); Learning (cs.LG); Machine Learning (stat.ML)
Here, we present a novel approach to solve the problem of reconstructing
perceived stimuli from brain responses by combining probabilistic inference
with deep learning. Our approach first inverts the linear transformation from
latent features to brain responses with maximum a posteriori estimation and
then inverts the nonlinear transformation from perceived stimuli to latent
features with adversarial training of convolutional neural networks. We test
our approach with a functional magnetic resonance imaging experiment and show
that it can generate state-of-the-art reconstructions of perceived faces from
brain activations.
Yingzhen Li, Richard E. Turner
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Implicit models, which allow for the generation of samples but not for
point-wise evaluation of probabilities, are omnipresent in real world problems
tackled by machine learning and a hot topic of current research. Some examples
include data simulators that are widely used in engineering and scientific
research, generative adversarial networks (GANs) for image synthesis, and
hot-off-the-press approximate inference techniques relying on implicit
distributions. The majority of existing approaches to learning implicit models
rely on approximating the intractable distribution or optimisation objective
for gradient-based optimisation, which is liable to produce inaccurate updates
and thus poor models. This paper alleviates the need for such approximations by
proposing the Stein gradient estimator, which directly estimates the score
function of the implicitly defined distribution. The efficacy of the proposed
estimator is empirically demonstrated by examples that include meta-learning
for approximate inference, and entropy regularised GANs that provide improved
sample diversities.
Laetitia Le, Camille Marini, Alexandre Gramfort, David Nguyen, Mehdi Cherti, Sana Tfaili, Ali Tfayli, Arlette Baillet-Guffroy, Patrice Prognon, Pierre Chaminade, Eric Caudron, Balázs Kégl
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG)
Monoclonal antibodies constitute one of the most important strategies to
treat patients suffering from cancers such as hematological malignancies and
solid tumors. In order to guarantee the quality of these preparations prepared
at the hospital, quality control has to be developed. The aim of this study was to
explore a noninvasive, nondestructive, and rapid analytical method to ensure
the quality of the final preparation without causing any delay in the process.
We analyzed four mAbs (Infliximab, Bevacizumab, Ramucirumab and Rituximab)
diluted at therapeutic concentrations in 0.9% sodium chloride using Raman
spectroscopy. To reduce the prediction errors obtained with traditional
chemometric data analysis, we explored a data-driven approach using statistical
machine learning methods where preprocessing and predictive models are jointly
optimized. We prepared a data analytics workflow and submitted the problem to a
collaborative data challenge platform called Rapid Analytics and Model
Prototyping (RAMP). This allowed us to use solutions from about 300 data
scientists during five days of collaborative work. The prediction of the four
mAb samples was considerably improved, with a misclassification rate of 0.8%
and a mean error rate of 4%.
Haifeng Li, Jian Peng, Chao Tao, Jie Chen, Min Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Recently, deep convolutional neural network (DCNN) achieved increasingly
remarkable success and rapidly developed in the field of natural image
recognition. Compared with natural images, remote sensing images have a larger
scale, and the scenes and objects they represent are more macroscopic. This
study inquires whether remote sensing scene recognition and natural scene recognition
differ and raises the following questions: What are the key factors in remote
sensing scene recognition? Is the DCNN recognition mechanism centered on object
recognition still applicable to the scenarios of remote sensing scene
understanding? We performed several experiments to explore the influence of the
DCNN structure and the scale of remote sensing scene understanding from the
perspective of scene complexity. Our experiments show that understanding a
complex scene depends on a deep network and multi-scale perception.
Using a visualization method, we qualitatively and quantitatively analyze the
recognition mechanism in a complex remote sensing scene and demonstrate the
importance of multi-objective joint semantic support.
George Papamakarios, Theo Pavlakou, Iain Murray
Comments: 17 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Autoregressive models are among the best performing neural density
estimators. We describe an approach for increasing the flexibility of an
autoregressive model, based on modelling the random numbers that the model uses
internally when generating data. By constructing a stack of autoregressive
models, each modelling the random numbers of the next model in the stack, we
obtain a type of normalizing flow suitable for density estimation, which we
call Masked Autoregressive Flow. This type of flow is closely related to
Inverse Autoregressive Flow and is a generalization of Real NVP. Masked
Autoregressive Flow achieves state-of-the-art performance in a range of
general-purpose density estimation tasks.
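The density computation at the heart of a single MAF layer can be written down in a few lines; the sketch below uses fixed strictly lower-triangular linear maps as stand-ins for the masked conditioner network (MADE) that a real MAF would train, so the numbers are only meant to show the change-of-variables bookkeeping.

    import numpy as np

    rng = np.random.default_rng(8)
    D = 4
    W_mu = np.tril(rng.normal(size=(D, D)), k=-1)      # strictly lower-triangular => autoregressive
    W_alpha = 0.1 * np.tril(rng.normal(size=(D, D)), k=-1)

    def maf_log_prob(x):
        mu = x @ W_mu.T                                # mu_i depends only on x_{<i}
        alpha = x @ W_alpha.T
        u = (x - mu) * np.exp(-alpha)                  # forward pass: data -> base noise
        log_base = -0.5 * np.sum(u ** 2 + np.log(2 * np.pi), axis=-1)
        return log_base - np.sum(alpha, axis=-1)       # log|det du/dx| = -sum_i alpha_i

    def maf_sample(n):
        u = rng.normal(size=(n, D))
        x = np.zeros_like(u)
        for i in range(D):                             # sampling is sequential (D passes)
            mu = x @ W_mu.T
            alpha = x @ W_alpha.T
            x[:, i] = u[:, i] * np.exp(alpha[:, i]) + mu[:, i]
        return x

    x = maf_sample(5)
    print("log-densities of sampled points:", np.round(maf_log_prob(x), 3))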
Pan Zhou, Jiashi Feng
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
This paper studies the landscape of empirical risk of deep neural networks by
theoretically analyzing its convergence behavior to the population risk as well
as its stationary points and properties. For an \(l\)-layer linear neural
network, we prove that its empirical risk uniformly converges to its population risk
at the rate of \(\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})\), where \(n\) is the
training sample size, \(d\) the total weight dimension, and \(r\) the magnitude
bound of the weights of each layer. We then derive the stability and generalization
bounds for the empirical risk based on this result. Moreover, we establish the
uniform convergence of the gradient of the empirical risk to its population
counterpart. We prove the one-to-one correspondence of the non-degenerate
stationary points between the empirical and population risks with convergence
guarantees, which describes the landscape of deep neural networks. In addition,
we analyze these properties for deep nonlinear neural networks with sigmoid
activation functions. We prove similar results for convergence behavior of
their empirical risks as well as the gradients and analyze properties of their
non-degenerate stationary points.
To the best of our knowledge, this work is the first to theoretically
characterize the landscapes of deep learning algorithms. Moreover, our results
provide the sample complexity of training a good deep neural network. We also
provide theoretical understanding of how the neural network depth \(l\), the
layer width, the network size \(d\) and the parameter magnitude determine the
neural network landscape.
Sebastien Dubois, David C. Kale, Nigam Shah, Kenneth Jung
Comments: Under review for the Machine Learning for Healthcare Conference 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Clinical notes are a rich source of information about patient state. However,
using them effectively presents many challenges. In this work we present two
methods for summarizing clinical notes into patient-level representations. The
resulting representations are evaluated on a range of prediction tasks and
cohort sizes. The new representations offer significant predictive performance
gains over the common baselines of Bag of Words and topic model representations
across all tested tasks and cohort sizes.
Yang Cao, Yao Xie, Huan Xu
Subjects: Statistics Theory (math.ST); Learning (cs.LG)
Sequential hypothesis test and change-point detection when the distribution
parameters are unknown is a fundamental problem in statistics and machine
learning. We show that for such problems, detection procedures based on
sequential likelihood ratios with simple online mirror descent estimators are
nearly optimal. This is a blessing, since although the well-known generalized
likelihood ratio statistics are theoretically optimal, their exact
computation usually requires infinite memory of historical data. We prove the
optimality by making a connection of sequential analysis with online convex
optimization and leveraging the logarithmic regret bound property of online
mirror descent algorithm. Numerical examples validate our theory.
Raymond Brummelhuis, Zhongmin Luo
Comments: 51 pages; 21 Figures; 15 Tables
Subjects: Statistical Finance (q-fin.ST); Learning (cs.LG); Risk Management (q-fin.RM); Machine Learning (stat.ML)
Regulators require financial institutions to estimate counterparty default
risks from liquid CDS quotes for the valuation and risk management of OTC
derivatives. However, the vast majority of counterparties do not have liquid
CDS quotes and need proxy CDS rates. Existing methods cannot account for
counterparty-specific default risks; we propose to construct proxy CDS rates by
associating each illiquid counterparty with a liquid CDS proxy based on machine
learning techniques. After testing 156 classifiers from the 8 most popular classifier
families, we found that some classifiers achieve highly satisfactory accuracy
rates. Furthermore, we have rank-ordered the performances and investigated
performance variations amongst and within the 8 classifier families. This paper
is, to the best of our knowledge, the first systematic study of CDS Proxy
construction by Machine Learning techniques, and the first systematic
classifier comparison study based entirely on financial market data. Its
findings both confirm and contrast existing classifier performance literature.
Given the typically highly correlated nature of financial data, we investigated
the impact of correlation on classifier performance. The techniques used in
this paper should be of interest for financial institutions seeking a CDS Proxy
method, and can serve for proxy construction for other financial variables.
Some directions for future research are indicated.
Renbo Zhao, William B. Haskell, Jiashi Feng
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
We propose a unified framework to speed up the existing stochastic matrix
factorization (SMF) algorithms via variance reduction. Our framework is general
and it subsumes several well-known SMF formulations in the literature. We
perform a non-asymptotic convergence analysis of our framework and derive
computational and sample complexities for our algorithm to converge to an
\(\epsilon\)-stationary point in expectation. In addition, extensive experiments
for a wide class of SMF formulations demonstrate that our framework
consistently yields faster convergence and a more accurate output dictionary
vis-à-vis state-of-the-art frameworks.
Evgeny Bauman, Konstantin Bauman
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In a standard cluster analysis, such as k-means, in addition to clusters
locations and the distances between them, it is important to know whether they are
connected or well separated from each other. The main focus of this paper is
discovering the relations between the resulting clusters. We propose a new
method which is based on pairwise overlapping k-means clustering, that in
addition to means of clusters provides the graph structure of their relations.
The proposed method has a set of parameters that can be tuned in order to
control the sensitivity of the model and the desired relative size of the
pairwise overlapping interval between means of two adjacent clusters, i.e.,
level of overlapping. We present the exact formula for calculating that
parameter. The empirical study presented in the paper demonstrates that our
approach works well not only on toy data but also complements standard
clustering results with a reasonable graph structure on real datasets, such as
financial indices and restaurants.
Cory Stephenson, Patrick Callier, Abhinav Ganesh, Karl Ni
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We propose an algorithm to separate simultaneously speaking persons from each
other, the “cocktail party problem”, using a single microphone. Our approach
involves deep recurrent neural network regression to a vector space that is
descriptive of independent speakers. Such a vector space can embed empirically
determined speaker characteristics and is optimized by distinguishing between
speaker masks. We call this technique source-contrastive estimation. The
methodology is inspired by negative sampling, which has seen success in natural
language processing, where an embedding is learned by correlating and
de-correlating a given input vector with output weights. Although the matrix
determined by the output weights is dependent on a set of known speakers, we
only use the input vectors during inference. Doing so will ensure that source
separation is explicitly speaker-independent. Our approach is similar to recent
deep neural network clustering and permutation-invariant training research; we
use weighted spectral features and masks to augment individual speaker
frequencies while filtering out other speakers. We avoid, however, the severe
computational burden of other approaches with our technique. Furthermore, by
training a vector space rather than combinations of different speakers or
differences thereof, we avoid the so-called permutation problem during
training. Our algorithm offers an intuitive, computationally efficient response
to the cocktail party problem, and most importantly boasts better empirical
performance than other current techniques.
Deepesh Data, Vinod M. Prabhakaran
Comments: This is an extended version of a submission to ITW, 2017
Subjects: Information Theory (cs.IT)
We consider secure computation of randomized functions between two users,
where both the users (Alice and Bob) have inputs, Alice sends a message to Bob
over a rate-limited, noise-free link, and then Bob produces the output. We
study two cases: (i) when privacy condition is required only against Bob, who
tries to learn more about Alice’s input from the message than what can be
inferred by his own input and output, and (ii) when there is no privacy
requirement. For both the problems, we give single-letter expressions for the
optimal rates. For the first problem, we also explicitly characterize securely
computable randomized functions when input has full support, which leads to a
much simpler expression for the optimal rate. Recently, Data (ISIT 2016)
studied the remaining two cases (first, when privacy conditions are against
both the users; and second, when privacy condition is only against Alice) and
obtained single-letter expressions for optimal rates in both the scenarios.
Alireza Sheikh, Alexandre Graell i Amat, Gianluigi Liva
Subjects: Information Theory (cs.IT)
We analyze the achievable information rates (AIRs) for coded modulation
schemes with QAM constellations with both bit-wise and symbol-wise decoders,
corresponding to the case where a binary code is used in combination with a
higher-order modulation using the bit-interleaved coded modulation (BICM)
paradigm and to the case where a nonbinary code over a field matched to the
constellation size is used, respectively. In particular, we consider hard
decision decoding, which is the preferable option for fiber-optic communication
systems where decoding complexity is a concern. Recently, Liga \emph{et al.}
analyzed the AIRs for bit-wise and symbol-wise decoders considering what the
authors called a \emph{hard decision decoder}, which, however, exploits \emph{soft
information} about the transition probabilities of the discrete-input discrete-output
channel resulting from the hard detection. As such, the complexity of the
decoder is essentially the same as the complexity of a soft decision decoder.
In this paper, we analyze instead the AIRs for the standard hard decision
decoder, commonly used in practice, where the decoding is based on the Hamming
distance metric. We show that if standard hard decision decoding is used,
bit-wise decoders yield significantly higher AIRs than symbol-wise decoders. As
a result, contrary to the conclusion by Liga \emph{et al.}, binary decoders
together with the BICM paradigm are preferable for spectrally-efficient
fiber-optic systems. We also design binary and nonbinary staircase codes and
show that, in agreement with the AIRs, binary codes yield better performance.
He Huang, Chenglin Zhao
Comments: 5 pages
Subjects: Information Theory (cs.IT)
This paper addresses the optimization of energy detection (ED) thresholds
for cooperative spectrum sensing (CSS) over generalized Gaussian channels in
cognitive radio (CR) systems. Enhanced ED thresholds are used to deal with
excessive sensitivity to Gaussian noise uncertainty, which makes them suitable
for interference from multiple, irregularly distributed noise sources. A
two-step decision pattern and a convex-constrained threshold solution within
the voting process are used for secondary users (SUs) over coherent complex
Gaussian random channels. By deriving the probabilities of detection and
false alarm under the independent and identically distributed (i.i.d.)
assumption for multiple SUs, we obtain optimal parameters (such as ED
thresholds and the number of collaborating users) that enable fusion of the
mixed noises at low signal-to-noise-ratio (SNR) levels. Furthermore,
simulation results show that the proposed schemes significantly outperform
most other noise-uncertainty schemes, markedly improving the total error rate
in low-quality communication conditions.
Huang He, Zhao Chenglin
Comments: 9 pages
Subjects: Information Theory (cs.IT)
Energy detection is a reliable non-coherent signal processing technique for
spectrum sensing in cognitive radio networks, thanks to its low complexity,
fast sensing ability, and lack of any requirement for prior received
information. Since the performance of energy detection is affected by physical
multipath fading, this paper concentrates on the characteristic analysis of
energy detection over composite shadowed fading channels. The small-scale and
line-of-sight fading distribution includes particular cases such as the
Rayleigh, Hoyt, Nakagami-m, and one-sided Gaussian distributions. Based on
this, we derive the probability density functions of the signal envelope and
the signal-to-noise ratio of the composite shadowed fading channels, which
accurately capture the line-of-sight shadowed fading characterization.
Subsequently, exact closed-form expressions with an infinite-series
formulation for the detection probability are, for the first time, extended
to estimate the detection capacity of the above model by adopting an inverse
Gaussian asymptotic distribution. In addition, the absolute truncation error
is deduced for evaluating the minimum detection efficiency. The established
model can also be applied to detection estimation with non-integer fading
parameters. Finally, the analytical results and quantified performance are
verified by numerical evaluation with MATHEMATICA and MATLAB as the power
variables of the dominant components change.
Huang He, Zhao Chenglin
Comments: 14 pages
Subjects: Information Theory (cs.IT)
This paper analyzes the unified performance of energy detection (ED) for
spectrum sensing (SS) over generalized fading channels in cognitive radio (CR)
networks. The detection performance of SS schemes is clearly affected by the
fading channel between communication nodes, and ED has the advantages of fast
implementation, no requirement of prior received information and low
complexity, so it is meaningful to investigate ED performed over fading
channels such as the Nakagami-m and Rice channels, or over generalized fading
channels such as the \(\kappa\)-\(\mu\) and \(\eta\)-\(\mu\) fading
distributions. The \(\alpha\)-\(\kappa\)-\(\mu\) fading distribution is a
generalized fading model that represents the nonlinear and small-scale
variation of fading channels. The probability density function (p.d.f.) of the
instantaneous signal-to-noise ratio (SNR) of the \(\alpha\)-\(\kappa\)-\(\mu\)
distribution is derived from the envelope p.d.f. to evaluate energy efficiency
for sensing systems. Next, the probability of detection model with the Marcum
Q-function is derived, and closed-form detection expressions based on the
moment generating function (MGF) method are deduced to achieve sensing
communications over generalized fading channels. Furthermore, novel and exact
closed-form analytic expressions for the average area under the receiver
operating characteristic curve are deduced to analyze the performance
characteristics of ED over \(\alpha\)-\(\kappa\)-\(\mu\) fading channels. In
addition, cooperative spectrum sensing (CSS) with diversity reception and the
OR rule is applied to improve the detection accuracy and mitigate shadowed
fading effects. Finally, the results show that the detection capacity is
evidently affected by the \(\alpha\)-\(\kappa\)-\(\mu\) fading conditions, but
appropriate channel parameters improve the sensing performance.
Junyeong Seo, Youngchul Sung
Comments: 29 pages, 8 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)
In this paper, an efficient transmit beam design and user scheduling method
is proposed for multi-user (MU) multiple-input single-output (MISO)
non-orthogonal multiple access (NOMA) downlink, based on Pareto-optimality. The
proposed beam design and user scheduling method groups simultaneously served
users into multiple clusters, with a practical two users per cluster, and then
applies spatial zero-forcing (ZF) across clusters to control inter-cluster
interference (ICI) and Pareto-optimal beam design with successive interference
cancellation (SIC) to the two users in each cluster, so as to remove
interference to strong users and improve the signal-to-interference-plus-noise
ratios (SINRs) of the interference-experiencing weak users. The proposed method
has the flexibility to control the rates of strong and weak users, and numerical
results show that the
proposed method yields good performance.
Trung Kien Vu, Chen-Feng Liu, Mehdi Bennis, Mérouane Debbah, Matti Latva-aho, Choong Seon Hong
Comments: Accepted May 12, 2017 by IEEE Communications Letters. Topic is Ultra-Reliable and Low Latency Communication in 5G mmWave Networks
Subjects: Information Theory (cs.IT)
Ultra-reliability and low-latency are two key components in 5G networks. In
this letter, we investigate the problem of ultra-reliable and low-latency
communication (URLLC) in millimeter wave (mmWave)-enabled massive
multiple-input multiple-output (MIMO) networks. The problem is cast as a
network utility maximization subject to probabilistic latency and reliability
constraints. To solve this problem, we resort to the Lyapunov technique whereby
a utility-delay control approach is proposed, which adapts to channel
variations and queue dynamics. Numerical results demonstrate that our proposed
approach ensures reliable communication with a guaranteed probability of
99.99%, and reduces latency by 28.41% and 77.11% as compared to baselines with
and without probabilistic latency constraints, respectively.
Mohammad Hadi, Mohammad Reza Pakravan
Comments: 28 pages, 11 figures, 4 tables
Subjects: Information Theory (cs.IT)
We propose a two-stage algorithm for energy-efficient resource allocation
constrained to QoS and physical requirements in OFDM-based EONs. The first
stage deals with routing, grooming and traffic ordering and aims at minimizing
amplifier power consumption and number of active transponders. We provide a
heuristic procedure which yields an acceptable solution for the complex ILP
formulation of the routing and grooming. In the second stage, we optimize
transponder configuration including spectrum and transmit power parameters to
minimize transponder power consumption. We show how QoS and transponder power
consumption are represented by convex expressions and use the results to
formulate a convex problem for configuring transponders in which transmit
optical power is an optimization variable. Simulation results demonstrate that
the power consumption is reduced by 9% when the proposed routing and grooming
algorithm is applied to the European Cost239 network with an aggregate traffic
of 60 Tbps. It is shown that our convex formulation for transponder parameter
assignment is considerably faster than its MINLP counterpart and its ability to
optimize transmit optical power improves transponder power consumption by 8%
for an aggregate traffic of 60 Tbps. Furthermore, we investigate the effect of
adaptive modulation assignment and transponder capacity on inherent tradeoff
between network CAPEX and OPEX.
Robert F.H. Fischer, Susanne Sparrer
Subjects: Information Theory (cs.IT)
We consider iterative (‘turbo’) algorithms for compressed sensing. First, a
unified exposition of the different approaches available in the literature is
given, thereby enlightening the general principles and main differences. In
particular we discuss i) the estimation step (matched filter vs. optimum MMSE
estimator), ii) the unbiasing operation (implicitly or explicitly done and
equivalent to the calculation of extrinsic information), and iii) thresholding
vs. the calculation of soft values. Based on these insights we propose a
low-complexity but well-performing variant utilizing a Krylov space
approximation of the optimum linear MMSE estimator. The derivations are valid
for any probability density of the signal vector. However, numerical results
are shown for the discrete case. The novel algorithm shows very good
performance and even slightly faster convergence compared to approximate
message passing.
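For reference, the simplest member of the family discussed above, a matched-filter estimation step followed by soft thresholding (ISTA), is sketched below. The Krylov-space MMSE variant proposed in the paper replaces the matched filter with an approximate linear MMSE estimator; this sketch only shows the common loop structure, with invented problem sizes and threshold.

    import numpy as np

    rng = np.random.default_rng(9)
    n, m, k = 256, 100, 10
    A = rng.normal(size=(m, n)) / np.sqrt(m)            # measurement matrix
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
    y = A @ x_true + 0.01 * rng.normal(size=m)           # noisy measurements

    def ista(A, y, lam=0.02, iters=500):
        step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1 / Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            r = y - A @ x                                 # residual
            z = x + step * (A.T @ r)                      # matched-filter / gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
        return x

    x_hat = ista(A, y)
    print("relative reconstruction error:",
          round(float(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)), 3))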
Sha Hu, Fredrik Rusek, Ove Edfors
Comments: Submitted to IEEE Trans. on Signal Processing on Apr. 2017; 30 pages; 13 figures
Subjects: Information Theory (cs.IT)
We consider the potential for positioning with a system where antenna arrays
are deployed as a large intelligent surface (LIS), which is a newly proposed
concept beyond massive-MIMO where future man-made structures are electronically
active with integrated electronics and wireless communication, making the entire
environment “intelligent”. In a first step, we derive
Fisher-information and Cramér-Rao lower bounds (CRLBs) in closed-form for
positioning a terminal located perpendicular to the center of the LIS, whose
location we refer to as being on the central perpendicular line (CPL) of the
LIS. For a terminal that is not on the CPL, closed-form expressions of the
Fisher-information and CRLB seem out of reach, and we alternatively find
approximations of them which are shown to be accurate. Under mild conditions,
we show that the CRLB for all three Cartesian dimensions (\(x\), \(y\) and \(z\))
decreases quadratically in the surface-area of the LIS, except for a terminal
exactly on the CPL where the CRLB for the \(z\)-dimension (distance from the LIS)
decreases linearly in the same. In a second step, we analyze the CRLB for
positioning when there is an unknown phase \(\varphi\) present in the analog
circuits of the LIS. We then show that the CRLBs are dramatically increased for
all three dimensions but decrease in the third order of the surface-area.
Moreover, with an infinitely large LIS the CRLB for the \(z\)-dimension with an
unknown \(\varphi\) is 6 dB higher than in the case without phase uncertainty, and
the CRLB for estimating \(\varphi\) converges to a constant that is independent
of the wavelength \(\lambda\). Finally, we extensively discuss the impact of
centralized and distributed deployments of LIS, and show that a distributed
deployment of LIS can enlarge the coverage for terminal-positioning and improve
the overall positioning performance.
Pengfei Huang, Yi Liu, Xiaojie Zhang, Paul H. Siegel, Erich F. Haratsch
Comments: Submitted to ITW 2017
Subjects: Information Theory (cs.IT)
Rate-compatible error-correcting codes (ECCs), which consist of a set of
extended codes, are of practical interest in both wireless communications and
data storage. In this work, we first study the lower bounds for rate-compatible
ECCs, thus proving the existence of good rate-compatible codes. Then, we
propose a general framework for constructing rate-compatible ECCs based on
cosets and syndromes of a set of nested linear codes. We evaluate our
construction from two points of view. From a combinatorial perspective, we show
that we can construct rate-compatible codes with increasing minimum distances.
From a probabilistic point of view, we prove that we are able to construct
capacity-achieving rate-compatible codes.
Daniele Pinchera, Marco Donald Migliore, Fulvio Schettino, Gaetano Panariello
Comments: 9 Pages, 19 figures, to be submitted for publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
The aim of this paper is to analyse linear array synthesis for 5G
massive MIMO systems in the line-of-sight working condition. The main result of
this paper is that non-uniform arrays are the natural choice in this kind of
application. In particular, by using non-equispaced arrays we show that it is
possible to achieve a better average condition number of the channel matrix.
Furthermore, we verify that increasing the array size is also beneficial for
circular arrays, and we provide some useful rules of thumb for antenna array
design for massive MIMO applications. These results are in contrast to the
widely accepted idea in the 5G massive MIMO literature, in which the
half-wavelength uniform linear array is universally adopted.
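A small Monte Carlo experiment of the kind described can be set up in a few lines: build line-of-sight channel matrices for users at random angles and compare the average condition number of a uniform half-wavelength array with a non-equispaced array of the same aperture. The jitter model, sector width and sizes below are illustrative assumptions, not the synthesis procedure of the paper.

    import numpy as np

    rng = np.random.default_rng(10)

    M, K, trials = 64, 8, 500                       # array elements, users, Monte Carlo runs
    d_uniform = 0.5 * np.arange(M)                  # element positions in wavelengths
    jitter = rng.uniform(-0.2, 0.2, size=M)
    d_nonuniform = np.sort(d_uniform + jitter)      # same aperture, non-equispaced

    def los_channel(positions, angles):
        # Steering vectors a_m(theta) = exp(j * 2*pi * d_m * sin(theta)); rows = users.
        return np.exp(2j * np.pi * np.outer(np.sin(angles), positions))

    def avg_cond(positions):
        conds = []
        for _ in range(trials):
            angles = rng.uniform(-np.pi / 3, np.pi / 3, size=K)   # users within a 120-degree sector
            conds.append(np.linalg.cond(los_channel(positions, angles)))
        return np.mean(conds)

    print("average condition number, uniform array:       ", round(float(avg_cond(d_uniform)), 1))
    print("average condition number, non-equispaced array:", round(float(avg_cond(d_nonuniform)), 1))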
Mustafa A. Kishk, Harpreet S. Dhillon
Subjects: Information Theory (cs.IT)
Ambient radio frequency (RF) energy harvesting has emerged as a promising
solution for powering small devices and sensors in massive Internet of Things
(IoT) ecosystem due to its ubiquity and cost efficiency. In this paper, we
study joint uplink and downlink coverage of cellular-based ambient RF energy
harvesting IoT where the cellular network is assumed to be the only source of
RF energy. We consider a time division-based approach for power and information
transmission where each time-slot is partitioned into three sub-slots: (i)
charging sub-slot during which the cellular base stations (BSs) act as RF
chargers for the IoT devices, which then use the energy harvested in this
sub-slot for information transmission and/or reception during the remaining two
sub-slots, (ii) downlink sub-slot during which the IoT device receives
information from the associated BS, and (iii) uplink sub-slot during which the
IoT device transmits information to the associated BS. For this setup, we
characterize the joint coverage probability, which is the joint probability of
the events that the typical device harvests sufficient energy in the given time
slot and is under both uplink and downlink signal-to-interference-plus-noise
ratio (SINR) coverage with respect to its associated BS. This metric
significantly generalizes the prior art on energy harvesting communications,
which usually focused on downlink or uplink coverage separately. The key
technical challenge is in handling the correlation between the amount of energy
harvested in the charging sub-slot and the information signal quality (SINR) in
the downlink and uplink sub-slots. Dominant BS-based approach is developed to
derive tight approximation for this joint coverage probability. Several system
design insights including comparison with regularly powered IoT network and
throughput-optimal slot partitioning are also provided.
Ali Dehghan, Amir H. Banihashemi
Subjects: Information Theory (cs.IT)
The performance of low-density parity-check (LDPC) codes in the error floor
region is closely related to some combinatorial structures of the code’s Tanner
graph, collectively referred to as \textit{trapping sets (TSs)}. In this paper, we
study the asymptotic average number of different types of trapping sets such as
\emph{elementary TSs (ETS)}, \emph{leafless ETSs (LETS)}, \emph{absorbing sets
(ABS)}, \emph{elementary ABSs (EABS)}, and \emph{stopping sets (SS)}, in random
variable-regular and irregular LDPC code ensembles. We demonstrate that,
regardless of the type of the TS, as the code’s length tends to infinity, the
average number of a given structure tends to infinity, to a positive constant,
or to zero, if the structure contains no cycle, only one cycle, or more than
one cycle, respectively. For the case where the structure contains a single
cycle, we obtain an estimate of the expected number of the structure through
the available approximations for the average number of its constituent cycle.
These estimates, which are independent of the block length and only depend on
the code’s degree distributions, are shown to be accurate even for
finite-length codes.
Man Chu, Biao He, Xuewen Liao, Zhenzhen Gao, Shihua Zhu
Comments: 6 pages, 6 figures, to appear in Proc. VTC Fall, Toronto, Canada
Subjects: Information Theory (cs.IT)
In this paper, we study a multi-user multi-relay interference-channel
network, where energy-constrained relays harvest energy from sources’ radio
frequency (RF) signals and use the harvested energy to forward the information
to destinations. We adopt the interference alignment (IA) technique to address
the issue of interference, and propose a novel transmission scheme with the IA
at sources and the power splitting (PS) at relays. A distributed and iterative
algorithm to obtain the optimal PS ratios is further proposed, aiming at
maximizing the sum rate of the network. The analysis is then validated by
simulation results. Our results show that the proposed scheme with the optimal
design significantly improves the performance of the network.
Stella Civelli, Enrico Forestieri, Marco Secondini
Comments: The manuscript has been submitted to Photonics Technology Letters for publication
Subjects: Information Theory (cs.IT); Applied Physics (physics.app-ph)
The performance of optical fiber systems based on nonlinear
frequency-division multiplexing (NFDM) or on more conventional transmission
techniques is compared through numerical simulations. Some critical issues
affecting NFDM systems, namely the strict requirements needed to avoid burst
interaction due to signal dispersion and the unfavorable dependence of
performance on burst length, are investigated, highlighting their potentially
disruptive effect in terms of spectral efficiency. Two digital processing
techniques are finally proposed to halve the guard time between NFDM symbol
bursts and reduce the size of the processing window at the receiver, increasing
spectral efficiency and reducing computational complexity.
Marco Giordani, Andrea Zanella, Michele Zorzi
Comments: This document is a preliminary report which describes the mathematical model developed to carry out the results proposed in our work “Millimeter Wave Communication in Vehicular Networks: Challenges and Opportunities”, accepted to International Conference on Modern Circuits and Systems Technologies (MOCAST), 2017. A more updated version of this technical report will appear in the next weeks
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
In this technical report (TR), we describe the mathematical model we
developed to carry out a preliminary coverage and connectivity analysis in an
automotive communication scenario based on mmWave links. The purpose is to
exemplify some of the complex and interesting tradeoffs that have to be
considered when designing solutions for mmWave automotive scenarios.