David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Deep neural networks trained on large supervised datasets have led to
impressive results in recent years. However, since well-annotated datasets can
be prohibitively expensive and time-consuming to collect, recent work has
explored the use of larger but noisy datasets that can be more easily obtained.
In this paper, we investigate the behavior of deep neural networks on training
sets with massively noisy labels. We show that successful learning is possible
even with an essentially arbitrary amount of noise. For example, on MNIST we
find that accuracy of above 90 percent is still attainable even when the
dataset has been diluted with 100 noisy examples for each clean example. Such
behavior holds across multiple patterns of label noise, even when noisy labels
are biased towards confusing classes. Further, we show how the required dataset
size for successful training increases with higher label noise. Finally, we
present simple actionable techniques for improving learning in the regime of
high label noise.
José Novoa, Josué Fredes, Néstor Becerra Yoma
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE)
In this paper, the uncertainty is defined as the mean square error between a
given enhanced noisy observation vector and the corresponding clean one. Then,
a DNN is trained by using enhanced noisy observation vectors as input and the
uncertainty as output with a training database. In testing, the DNN receives an
enhanced noisy observation vector and delivers the estimated uncertainty. This
uncertainty in employed in combination with a weighted DNN-HMM based speech
recognition system and compared with an existing estimation of the noise
cancelling uncertainty variance based on an additive noise model. Experiments
were carried out with Aurora-4 task. Results with clean, multi-noise and
multi-condition training are presented.
Erbo Li, Hua Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Symmetry detection and discrimination are of fundamental meaning in science,
technology, and engineering. This paper introduces reflection invariants and
defines the directional moment to detect symmetry for shape analysis and object
recognition. And it demonstrates that detection of reflection symmetry can be
done in a simple way by solving a trigonometric system derived from the
directional moment, and discrimination of reflection symmetry can be achieved
by application of the reflection invariants in 2D and 3D. Rotation symmetry can
also be determined based on that.The experiments in 2D and 3D, including the
regular triangle, the square, and the five Platonic objects, show that all the
reflection lines or planes can be deterministically found using directional
moments up to order six. This result can be used to simplify the efforts of
symmetry detection in research areas, such as protein structure, model
retrieval, inverse engineering, and machine vision etc.
Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep Convolutional Neural Networks (CNN) have won a significant place in the
computer vision recently, which repeatedly convolving an image to extract the
knowledge behind it. However, with the depth of convolutional layers getting
deeper and deeper in recent years, the computational complexity also increases
significantly, which make it difficult to be deployed on embedded systems with
limited hardware resources.
In this paper we propose a method to reduce the redundant convolution kernels
during the computation of CNN and apply it to a network for super resolution
(SR). Using PSNR drop compared to the original network as performance
criterion, our method can get the optimal PSNR under a certain computation
budget constraint. On the other hand, our method is also capable of minimizing
the computation required under a given PSNR drop.
Tong Zhang (1 and 2), Wenming Zheng (2), Zhen Cui (2), Yuan Zong (2), Yang Li (1 and 2) ((1) the Department of Information Science and Engineering, Southeast University, Nanjing, China (2) the Key Laboratory of Child Development and Learning Science of Ministry of Education, Research Center for Learning Science, Southeast University, Nanjing, China)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, a novel deep manifold-to-manifold transforming network
(DMT-Net) is proposed for action recognition, in which symmetric positive
definite (SPD) matrix is adopted to describe the spatial-temporal information
of action feature vectors. Since each SPD matrix is a point of the Riemannian
manifold space, the proposed DMT-Net aims to learn more discriminative feature
by hierarchically transforming the data points from one Riemannian manifold to
another more discriminative one. To this end, several novel layers are proposed
in DMT-Net, including SPD convolutional layer, channel convolution layer,
diagonalizing layer and kernel regression layer. Specifically, SPD
convolutional layer enables multi-channel convolution to be well applied to
Riemannian manifold, and the kernel regression layer enables the distance
metric computation between two SPD points in Riemannian manifold to be done in
the Euclidean space, in which a novel reference set dynamically generated
during the network training is also introduced to relieve the computational
burden of the kernel method. To evaluate the effectiveness of the proposed
method, three action recognition databases are respectively used to testify our
method and the experimental results show that our algorithm outperforms the
state-of-the-art methods.
Ali Taalimi, Liu Liu, Hairong Qi
Comments: 5 pages, Accepted in International Conference of Image Processing, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel hierarchical approach for the simultaneous
tracking of multiple targets in a video. We use a network flow approach to link
detections in low-level and tracklets in high-level. At each step of the
hierarchy, the confidence of candidates is measured by using a new scoring
system, ConfRank, that considers the quality and the quantity of its
neighborhood. The output of the first stage is a collection of safe tracklets
and unlinked high-confidence detections. For each individual detection, we
determine if it belongs to an existing or is a new tracklet. We show the effect
of our framework to recover missed detections and reduce switch identity. The
proposed tracker is referred to as TVOD for multi-target tracking using the
visual tracker and generic object detector. We achieve competitive results with
lower identity switches on several datasets comparing to state-of-the-art.
Ali Taalimi, Alireza Rahimpour, Liu Liu, Hairong Qi
Comments: 5 pages, Accepted in International Conference of Image Processing, 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Nowadays, distributed smart cameras are deployed for a wide set of tasks in
several application scenarios, ranging from object recognition, image
retrieval, and forensic applications. Due to limited bandwidth in distributed
systems, efficient coding of local visual features has in fact been an active
topic of research. In this paper, we propose a novel approach to obtain a
compact representation of high-dimensional visual data using sensor fusion
techniques. We convert the problem of visual analysis in resource-limited
scenarios to a multi-view representation learning, and we show that the key to
finding properly compressed representation is to exploit the position of
cameras with respect to each other as a norm-based regularization in the
particular signal representation of sparse coding. Learning the representation
of each camera is viewed as an individual task and a multi-task learning with
joint sparsity for all nodes is employed. The proposed representation learning
scheme is referred to as the multi-view task-driven learning for visual sensor
network (MT-VSN). We demonstrate that MT-VSN outperforms state-of-the-art in
various surveillance recognition tasks.
Mark Marsden, Kevin McGuinness, Suzanne Little, Noel E. O'Connor
Comments: 7 Pages, AVSS 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we propose ResnetCrowd, a deep residual architecture for
simultaneous crowd counting, violent behaviour detection and crowd density
level classification. To train and evaluate the proposed multi-objective
technique, a new 100 image dataset referred to as Multi Task Crowd is
constructed. This new dataset is the first computer vision dataset fully
annotated for crowd counting, violent behaviour detection and density level
classification. Our experiments show that a multi-task approach boosts
individual task performance for all tasks and most notably for violent
behaviour detection which receives a 9\% boost in ROC curve AUC (Area under the
curve). The trained ResnetCrowd model is also evaluated on several additional
benchmarks highlighting the superior generalisation of crowd analysis models
trained for multiple objectives.
Jingya Wang, Xiatian Zhu, Shaogang Gong
Comments: Artificial Intelligence journal 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Discovering automatically the semantic structure of tagged visual data (e.g.
web videos and images) is important for visual data analysis and
interpretation, enabling the machine intelligence for effectively processing
the fast-growing amount of multi-media data. However, this is non-trivial due
to the need for jointly learning underlying correlations between heterogeneous
visual and tag data. The task is made more challenging by inherently sparse and
incomplete tags. In this work, we develop a method for modelling the inherent
visual data concept structures based on a novel Hierarchical-Multi-Label Random
Forest model capable of correlating structured visual and tag information so as
to more accurately interpret the visual semantics, e.g. disclosing meaningful
visual groups with similar high-level concepts, and recovering missing tags for
individual visual data samples. Specifically, our model exploits hierarchically
structured tags of different semantic abstractness and multiple tag statistical
correlations in addition to modelling visual and tag interactions. As a result,
our model is able to discover more accurate semantic correlation between
textual tags and visual features, and finally providing favourable visual
semantics interpretation even with highly sparse and incomplete tags. We
demonstrate the advantages of our proposed approach in two fundamental
applications, visual data clustering and missing tag completion, on
benchmarking video (i.e. TRECVID MED 2011) and image (i.e. NUS-WIDE) datasets.
Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler
Comments: Accepted in Proc. IEEE International Conference on Image Processing (ICIP), 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Imaging the atmosphere using ground-based sky cameras is a popular approach
to study various atmospheric phenomena. However, it usually focuses on the
daytime. Nighttime sky/cloud images are darker and noisier, and thus harder to
analyze. An accurate segmentation of sky/cloud images is already challenging
because of the clouds’ non-rigid structure and size, and the lower and less
stable illumination of the night sky increases the difficulty. Nonetheless,
nighttime cloud imaging is essential in certain applications, such as
continuous weather analysis and satellite communication.
In this paper, we propose a superpixel-based method to segment nighttime
sky/cloud images. We also release the first nighttime sky/cloud image
segmentation database to the research community. The experimental results show
the efficacy of our proposed algorithm for nighttime images.
Rui Gao, Sergiy A. Vorobyov
Comments: 27 pages, 8 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We address the multi-focus image fusion problem, where multiple images
captured with different focal settings are to be fused into an all-in-focus
image of higher quality. Algorithms for this problem necessarily admit the
source image characteristics along with focused and blurred feature. However,
most sparsity-based approaches use a single dictionary in focused feature space
to describe multi-focus images, and ignore the representations in blurred
feature space. Here, we propose a multi-focus image fusion approach based on
coupled sparse representation. The approach exploits the facts that (i) the
patches in given training set can be sparsely represented by a couple of
overcomplete dictionaries related to the focused and blurred categories of
images; and (ii) merging such representations leads to a more flexible and
therefore better fusion strategy than the one based on just selecting the
sparsest representation in the original image estimate. By jointly learning the
coupled dictionary, we enforce the similarity of sparse representations in the
focused and blurred feature spaces, and then introduce a fusion approach to
combine these representations for generating an all-in-focus image. We also
discuss the advantages of the fusion approach based on coupled sparse
representation and present an efficient algorithm for learning the coupled
dictionary. Extensive experimental comparisons with state-of-the-art
multi-focus image fusion algorithms validate the effectiveness of the proposed
approach.
Wenhan Luo, Peng Sun, Yadong Mu, Wei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we propose an active object tracking approach, which provides a
tracking solution simultaneously addressing tracking and camera control.
Crucially, these two tasks are tackled in an end-to-end manner via
reinforcement learning. Specifically, a ConvNet-LSTM function approximator is
adopted, which takes as input only visual observations (i.e., frame sequences)
and directly outputs camera motions (e.g., move forward, turn left, etc.). The
tracker, regarded as an agent, is trained with the A3C algorithm, where we
harness an environment augmentation technique and a customized rewarding
function to encourage robust object tracking. We carry out experiments on the
AI research platform ViZDoom. The yielded tracker can automatically pay
attention to the most likely object in the initial frame and perform tracking
subsequently, not requiring a manual bounding box as initialization. Moreover,
our approach shows a good generalization ability when performing tracking in
case of unseen object moving path, object appearance, background and
distracting object. The tracker can even restore tracking once it occasionally
loses the target.
Longquan Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we will disclose that the Guided Filter (GF) can be
interpreted as the Cyclic Coordinate Descent (CCD) solver of a Least Square
(LS) objective function. This discovery implies a possible way to extend GF
because we can alter the objective function of GF and define new filters as the
first pass iteration of the CCD solver of modified objective functions.
Moreover, referring to the iterative minimizing procedure of CCD, we can derive
new rolling filtering schemes. Hence, under the guidance of this discovery, we
not only propose new GF-like filters adapting to the specific requirements of
applications but also offer thoroughly explanations for two rolling filtering
schemes of GF as well as the way to extend them. Experiments show that our new
filters and extensions produce state-of-the-art results.
Hamed R. Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper revisits visual saliency prediction by evaluating the recent
advancements in this field such as crowd-sourced mouse tracking-based databases
and contextual annotations. We pursue a critical and quantitative approach
towards some of the new challenges including the quality of mouse tracking
versus eye tracking for model training and evaluation. We extend quantitative
evaluation of models in order to incorporate contextual information by
proposing an evaluation methodology that allows accounting for contextual
factors such as text, faces, and object attributes. The proposed contextual
evaluation scheme facilitates detailed analysis of models and helps identify
their pros and cons. Through several experiments, we find that (1) mouse
tracking data has lower inter-participant visual congruency and higher
dispersion, compared to the eye tracking data, (2) mouse tracking data does not
totally agree with eye tracking in general and in terms of different contextual
regions in specific, and (3) mouse tracking data leads to acceptable results in
training current existing models, and (4) mouse tracking data is less reliable
for model selection and evaluation. The contextual evaluation also reveals
that, among the studied models, there is no single model that performs best on
all the tested annotations.
Hannah Spitzer, Katrin Amunts, Stefan Harmeling, Timo Dickscheid
Comments: Accepted for oral presentation at International Symposium of Biomedical Imaging (ISBI) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Microscopic analysis of histological sections is considered the “gold
standard” to verify structural parcellations in the human brain. Its high
resolution allows the study of laminar and columnar patterns of cell
distributions, which build an important basis for the simulation of cortical
areas and networks. However, such cytoarchitectonic mapping is a semiautomatic,
time consuming process that does not scale with high throughput imaging. We
present an automatic approach for parcellating histological sections at 2um
resolution. It is based on a convolutional neural network that combines
topological information from probabilistic atlases with the texture features
learned from high-resolution cell-body stained images. The model is applied to
visual areas and trained on a sparse set of partial annotations. We show how
predictions are transferable to new brains and spatially consistent across
sections.
Zhanyu Ma, Jing-Hao Xue, Arne Leijon, Zheng-Hua Tan, Zhen Yang, Jun Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In this paper, we propose novel strategies for neutral vector variable
decorrelation. Two fundamental invertible transformations, namely serial
nonlinear transformation and parallel nonlinear transformation, are proposed to
carry out the decorrelation. For a neutral vector variable, which is not
multivariate Gaussian distributed, the conventional principal component
analysis (PCA) cannot yield mutually independent scalar variables. With the two
proposed transformations, a highly negatively correlated neutral vector can be
transformed to a set of mutually independent scalar variables with the same
degrees of freedom. We also evaluate the decorrelation performances for the
vectors generated from a single Dirichlet distribution and a mixture of
Dirichlet distributions. The mutual independence is verified with the distance
correlation measurement. The advantages of the proposed decorrelation
strategies are intensively studied and demonstrated with synthesized data and
practical application evaluations.
Haifeng Li, Chao Tao, Zhixiang Wu, Jie Chen, Jianya Gong, Min Deng
Comments: 39 pages, 21 figures, 7 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Remote sensing image classification is a fundamental task in remote sensing
image processing. Remote sensing field still lacks of such a large-scale
benchmark compared to ImageNet, Place2. We propose a remote sensing image
classification benchmark (RSI-CB) based on crowd-source data which is massive,
scalable, and diversity. Using crowdsource data, we can efficiently annotate
ground objects in remotes sensing image by point of interests, vectors data
from OSM or other crowd-source data. Based on this method, we construct a
worldwide large-scale benchmark for remote sensing image classification. In
this benchmark, there are two sub datasets with 256 * 256 and 128 * 128 size
respectively since different convolution neural networks requirement different
image size. The former sub dataset contains 6 categories with 35 subclasses
with total of more than 24,000 images; the later one contains 6 categories with
45 subclasses with total of more than 36,000 images. The six categories are
agricultural land, construction land and facilities, transportation and
facilities, water and water conservancy facilities, woodland and other land,
and each category has several subclasses. This classification system is defined
according to the national standard of land use classification in China, and is
inspired by the hierarchy mechanism of ImageNet. Finally, we have done a large
number of experiments to compare RSI-CB with SAT-4, UC-Merced datasets on
handcrafted features, such as such as SIFT, and classical CNN models, such as
AlexNet, VGG, GoogleNet, and ResNet. We also show CNN models trained by RSI-CB
have good performance when transfer to other dataset, i.e. UC-Merced, and good
generalization ability. The experiments show that RSI-CB is more suitable as a
benchmark for remote sensing image classification task than other ones in big
data era, and can be potentially used in practical applications.
Jimmy Ren, Zhiyang Yu, Jianbo Liu, Rui Zhang, Wenxiu Sun, Jiahao Pang, Xiaohao Chen, Qiong Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advances in visual tracking showed that deep Convolutional Neural
Networks (CNN) trained for image classification can be strong feature
extractors for discriminative trackers. However, due to the drastic difference
between image classification and tracking, extra treatments such as model
ensemble and feature engineering must be carried out to bridge the two domains.
Such procedures are either time consuming or hard to generalize well across
datasets. In this paper we discovered that the internal structure of Region
Proposal Network (RPN)’s top layer feature can be utilized for robust visual
tracking. We showed that such property has to be unleashed by a novel loss
function which simultaneously considers classification accuracy and bounding
box quality. Without ensemble and any extra treatment on feature maps, our
proposed method achieved state-of-the-art results on several large scale
benchmarks including OTB50, OTB100 and VOT2016. We will make our code publicly
available.
Hehe Fan, Liang Zheng, Yi Yang
Comments: This version is a draft. We will update it with more results, parameter analysis and comparisons
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The superiority of deeply learned pedestrian representations has been
reported in very recent literature of person re-identification (re-ID). In this
paper, we consider the more pragmatic issue of learning a deep feature with
only a few or no labels. We propose a progressive unsupervised learning (PUL)
method to transfer pretrained deep representations to unseen domains. Our
method is easy to implement and can be viewed as an effective baseline for
unsupervised feature learning. Specifically, PUL iterates between 1) pedestrian
clustering and 2) fine-tuning of the convolutional neural network (CNN) to
improve the original model trained on the irrelevant labeled dataset. At the
beginning when the model is weak, CNN is fine-tuned on a small amount of
reliable examples which locate near to cluster centroids in the feature space.
As the model becomes stronger in subsequent iterations, more images are being
selected as CNN training samples. Progressively, pedestrian clustering and the
CNN model are improved simultaneously until algorithm convergence. We then
point out promising directions that may lead to further improvement.
Basura Fernando, Stephen Gould
Comments: International Journal of Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we present novel temporal encoding methods for action and
activity classification by extending the unsupervised rank pooling temporal
encoding method in two ways. First, we present “discriminative rank pooling” in
which the shared weights of our video representation and the parameters of the
action classifiers are estimated jointly for a given training dataset of
labelled vector sequences using a bilevel optimization formulation of the
learning problem. When the frame level features vectors are obtained from a
convolutional neural network (CNN), we rank pool the network activations and
jointly estimate all parameters of the model, including CNN filters and
fully-connected weights, in an end-to-end manner which we coined as “end-to-end
trainable rank pooled CNN”. Importantly, this model can make use of any
existing convolutional neural network architecture (e.g., AlexNet or VGG)
without modification or introduction of additional parameters. Then, we extend
rank pooling to a high capacity video representation, called “hierarchical rank
pooling”. Hierarchical rank pooling consists of a network of rank pooling
functions, which encode temporal semantics over arbitrary long video clips
based on rich frame level features. By stacking non-linear feature functions
and temporal sub-sequence encoders one on top of the other, we build a high
capacity encoding network of the dynamic behaviour of the video. The resulting
video representation is a fixed-length feature vector describing the entire
video clip that can be used as input to standard machine learning classifiers.
We demonstrate our approach on the task of action and activity recognition.
Obtained results are comparable to state-of-the-art methods on three important
activity recognition benchmarks with classification performance of 76.7% mAP on
Hollywood2, 69.4% on HMDB51, and 93.6% on UCF101.
Evgeny Zamyatin, Andrey Filchenkov
Comments: Submitted to NIPS 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generative adversarial networks (GANs) has gained tremendous popularity
lately due to an ability to reinforce quality of its predictive model with
generated objects and the quality of the generative model with and supervised
feedback. GANs allow to synthesize images with a high degree of realism.
However, the learning process of such models is a very complicated optimization
problem and certain limitation for such models were found. It affects the
choice of certain layers and nonlinearities when designing architectures. In
particular, it does not allow to train convolutional GAN models with
fully-connected hidden layers. In our work, we propose a modification of the
previously described set of rules, as well as new approaches to designing
architectures that will allow us to train more powerful GAN models. We show the
effectiveness of our methods on the problem of synthesizing projections of 3D
objects with the possibility of interpolation by class and view point.
Junjie Bai, Abhay Shah, Xiaodong Wu
Comments: Paper in review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Shape priors have been widely utilized in medical image segmentation to
improve segmentation accuracy and robustness. A major way to encode such a
prior shape model is to use a mesh representation, which is prone to causing
self-intersection or mesh folding. Those problems require complex and expensive
algorithms to mitigate. In this paper, we propose a novel shape prior directly
embedded in the voxel grid space, based on gradient vector flows of a
pre-segmentation. The flexible and powerful prior shape representation is ready
to be extended to simultaneously segmenting multiple interacting objects with
minimum separation distance constraint. The problem is formulated as a Markov
random field problem whose exact solution can be efficiently computed with a
single minimum s-t cut in an appropriately constructed graph. The proposed
algorithm is validated on two multi-object segmentation applications: the brain
tissue segmentation in MRI images, and the bladder/prostate segmentation in CT
images. Both sets of experiments show superior or competitive performance of
the proposed method to other state-of-the-art methods.
Titus Cieslewski, Davide Scaramuzza
Comments: 3 pages, 4 figures. This is a self-published paper that accompanies our original work [1] as well as the ICRA 2017 Workshop on Multi-robot Perception-Driven Control and Planning [2]
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we discuss the adaptation of our decentralized place
recognition method described in [1] to full image descriptors. As we had shown,
the key to making a scalable decentralized visual place recognition lies in
exploting deterministic key assignment in a distributed key-value map. Through
this, it is possible to reduce bandwidth by up to a factor of n, the robot
count, by casting visual place recognition to a key-value lookup problem. In
[1], we exploited this for the bag-of-words method [3], [4]. Our method of
casting bag-of-words, however, results in a complex decentralized system, which
has inherently worse recall than its centralized counterpart. In this paper, we
instead start from the recent full-image description method NetVLAD [5]. As we
show, casting this to a key-value lookup problem can be achieved with k-means
clustering, and results in a much simpler system than [1]. The resulting system
still has some flaws, albeit of a completely different nature: it suffers when
the environment seen during deployment lies in a different distribution in
feature space than the environment seen during training.
David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Deep neural networks trained on large supervised datasets have led to
impressive results in recent years. However, since well-annotated datasets can
be prohibitively expensive and time-consuming to collect, recent work has
explored the use of larger but noisy datasets that can be more easily obtained.
In this paper, we investigate the behavior of deep neural networks on training
sets with massively noisy labels. We show that successful learning is possible
even with an essentially arbitrary amount of noise. For example, on MNIST we
find that accuracy of above 90 percent is still attainable even when the
dataset has been diluted with 100 noisy examples for each clean example. Such
behavior holds across multiple patterns of label noise, even when noisy labels
are biased towards confusing classes. Further, we show how the required dataset
size for successful training increases with higher label noise. Finally, we
present simple actionable techniques for improving learning in the regime of
high label noise.
Lorien X. Hayden, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Computer Vision and Pattern Recognition (cs.CV)
Neural networks have been shown to have a remarkable ability to uncover low
dimensional structure in data: the space of possible reconstructed images form
a reduced model manifold in image space. We explore this idea directly by
analyzing the manifold learned by Deep Belief Networks and Stacked Denoising
Autoencoders using Monte Carlo sampling. The model manifold forms an only
slightly elongated hyperball with actual reconstructed data appearing
predominantly on the boundaries of the manifold. In connection with the results
we present, we discuss problems of sampling high-dimensional manifolds as well
as recent work [M. Transtrum, G. Hart, and P. Qiu, Submitted (2014)] discussing
the relation between high dimensional geometry and model reduction.
Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)
Inspired by previous work on emergent language in referential games, we
propose a novel multi-modal, multi-step referential game, where the sender and
receiver have access to distinct modalities of an object, and their information
exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
setting allows agents to develop an internal language significantly closer to
natural language, in that they share a single set of messages, and that the
length of the conversation may vary according to the difficulty of the task. We
examine these properties empirically using a dataset consisting of images and
textual descriptions of mammals, where the agents are tasked with identifying
the correct object. Our experiments indicate that a robust and efficient
communication protocol emerges, where gradual information exchange informs
better predictions and higher communication bandwidth improves generalization.
Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
The Wasserstein probability metric has received much attention from the
machine learning community. Unlike the Kullback-Leibler divergence, which
strictly measures change in probability, the Wasserstein metric reflects the
underlying geometry between outcomes. The value of being sensitive to this
geometry has been demonstrated, among others, in ordinal regression and
generative modelling. In this paper we describe three natural properties of
probability divergences that reflect requirements from machine learning: sum
invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein
metric possesses the first two properties but, unlike the Kullback-Leibler
divergence, does not possess the third. We provide empirical evidence
suggesting that this is a serious issue in practice. Leveraging insights from
probabilistic forecasting we propose an alternative to the Wasserstein metric,
the Cram’er distance. We show that the Cram’er distance possesses all three
desired properties, combining the best of the Wasserstein and Kullback-Leibler
divergences. To illustrate the relevance of the Cram’er distance in practice
we design a new algorithm, the Cram’er Generative Adversarial Network (GAN),
and show that it performs significantly better than the related Wasserstein
GAN.
Tina Fang, Martin Jaggi, Katerina Argyraki
Comments: ACL 2017 Student Research Workshop
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multimedia (cs.MM)
Motivated by concerns for user privacy, we design a steganographic system
(“stegosystem”) that enables two users to exchange encrypted messages without
an adversary detecting that such an exchange is taking place. We propose a new
linguistic stegosystem based on a Long Short-Term Memory (LSTM) neural network.
We demonstrate our approach on the Twitter and Enron email datasets and show
that it yields high-quality steganographic text while significantly improving
capacity (encrypted bits per word) relative to the state-of-the-art.
Naveen Sundar Govindarajulu, Selmer Bringsjord
Subjects: Artificial Intelligence (cs.AI)
We present a new system S for handling uncertainty in a quantified modal
logic (first-order modal logic). The system is based on both probability theory
and proof theory. The system is derived from Chisholm’s epistemology. We
concretize Chisholm’s system by grounding his undefined and primitive (i.e.
foundational) concept of reasonablenes in probability and proof theory. S can
be useful in systems that have to interact with humans and provide
justifications for their uncertainty. As a demonstration of the system, we
apply the system to provide a solution to the lottery paradox. Another
advantage of the system is that it can be used to provide uncertainty values
for counterfactual statements. Counterfactuals are statements that an agent
knows for sure are false. Among other cases, counterfactuals are useful when
systems have to explain their actions to users. Uncertainties for
counterfactuals fall out naturally from our system.
Efficient reasoning in just simple first-order logic is a hard problem.
Resolution-based first-order reasoning systems have made significant progress
over the last several decades in building systems that have solved non-trivial
tasks (even unsolved conjectures in mathematics). We present a sketch of a
novel algorithm for reasoning that extends first-order resolution.
Finally, while there have been many systems of uncertainty for propositional
logics, first-order logics and propositional modal logics, there has been very
little work in building systems of uncertainty for first-order modal logics.
The work described below is in progress; and once finished will address this
lack.
Stuart Armstrong, Benjamin Levinstein
Subjects: Artificial Intelligence (cs.AI)
There are many goals for an AI that could become dangerous if the AI becomes
superintelligent or otherwise powerful. Much work on the AI control problem has
been focused on constructing AI goals that are safe even for such AIs. This
paper looks at an alternative approach: defining a general concept of `low
impact’. The aim is to ensure that a powerful AI which implements low impact
will not modify the world extensively, even if it is given a simple or
dangerous goal. The paper proposes various ways of defining and grounding low
impact, and discusses methods for ensuring that the AI can still be allowed to
have a (desired) impact despite the restriction. The end of the paper addresses
known issues with this approach and avenues for future research.
Ti-Rong Wu, I-Chen Wu, Guan-Wun Chen, Ting-han Wei, Tung-Yi Lai, Hung-Chun Wu, Li-Cheng Lan
Comments: This version was also submitted to IEEE TCIAIG on May 30, 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper proposes a new approach to a novel value network architecture for
the game Go, called a multi-labelled (ML) value network. In the ML value
network, different values (win rates) are trained simultaneously for different
settings of komi, a compensation given to balance the initiative of playing
first. The ML value network has three advantages, (a) it outputs values for
different komi, (b) it supports dynamic komi, and (c) it lowers the mean
squared error (MSE). This paper also proposes a new dynamic komi method to
improve game-playing strength. This paper also performs experiments to
demonstrate the merits of the architecture. First, the MSE of the ML value
network is generally lower than the value network alone. Second, the program
based on the ML value network wins by a rate of 67.6% against the program based
on the value network alone. Third, the program with the proposed dynamic komi
method significantly improves the playing strength over the baseline that does
not use dynamic komi, especially for handicap games. To our knowledge, up to
date, no handicap games have been played openly by programs using value
networks. This paper provides these programs with a useful approach to playing
handicap games.
John Aslanides, Jan Leike, Marcus Hutter
Comments: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Subjects: Artificial Intelligence (cs.AI)
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Victor do Nascimento Silva, Luiz Chaimowicz
Subjects: Artificial Intelligence (cs.AI)
Games have always been popular testbeds for Artificial Intelligence (AI). In
the last decade, we have seen the rise of the Multiple Online Battle Arena
(MOBA) games, which are the most played games nowadays. In spite of this, there
are few works that explore MOBA as a testbed for AI Research. In this paper we
present and discuss the main features and opportunities offered by MOBA games
to Game AI Research. We describe the various challenges faced along the game
and also propose a discrete model that can be used to better understand and
explore the game. With this, we aim to encourage the use of MOBA as a novel
research platform for Game AI.
Hamid Mirzaei, Tony Givargis
Comments: Accepted in IEEE Smart World Congress 2017
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)
Recent advances in combining deep learning and Reinforcement Learning have
shown a promising path for designing new control agents that can learn optimal
policies for challenging control tasks. These new methods address the main
limitations of conventional Reinforcement Learning methods such as customized
feature engineering and small action/state space dimension requirements. In
this paper, we leverage one of the state-of-the-art Reinforcement Learning
methods, known as Trust Region Policy Optimization, to tackle intersection
management for autonomous vehicles. We show that using this method, we can
perform fine-grained acceleration control of autonomous vehicles in a grid
street plan to achieve a global design objective.
Patrick Hohenecker, Thomas Lukasiewicz
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
In this work, we present a novel approach to ontology reasoning that is based
on deep learning rather than logic-based formal reasoning. To this end, we
introduce a new model for statistical relational learning that is built upon
deep recursive neural networks, and give experimental evidence that it can
easily compete with, or even outperform, existing logic-based reasoners on the
task of ontology reasoning. More precisely, we compared our implemented system
with one of the best logic-based ontology reasoners at present, RDFox, on a
number of large standard benchmark datasets, and found that our system attained
high reasoning quality, while being up to two orders of magnitude faster.
Rudolf Kadlec, Ondrej Bajgar, Jan Kleindienst
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many papers have been published on the knowledge base completion task in the
past few years. Most of these introduce novel architectures for relation
learning that are evaluated on standard datasets such as FB15k and WN18. This
paper shows that the accuracy of almost all models published on the FB15k can
be outperformed by an appropriately tuned baseline – our reimplementation of
the DistMult model. Our findings cast doubt on the claim that the performance
improvements of recent models are due to architectural changes as opposed to
hyper-parameter tuning or different training objectives. This should prompt
future research to re-consider how the performance of models is evaluated and
reported.
David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Deep neural networks trained on large supervised datasets have led to
impressive results in recent years. However, since well-annotated datasets can
be prohibitively expensive and time-consuming to collect, recent work has
explored the use of larger but noisy datasets that can be more easily obtained.
In this paper, we investigate the behavior of deep neural networks on training
sets with massively noisy labels. We show that successful learning is possible
even with an essentially arbitrary amount of noise. For example, on MNIST we
find that accuracy of above 90 percent is still attainable even when the
dataset has been diluted with 100 noisy examples for each clean example. Such
behavior holds across multiple patterns of label noise, even when noisy labels
are biased towards confusing classes. Further, we show how the required dataset
size for successful training increases with higher label noise. Finally, we
present simple actionable techniques for improving learning in the regime of
high label noise.
Gianluca Cima
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
Despite the current interest in Open Data publishing, a formal and
comprehensive methodology supporting an organization in deciding which data to
publish and carrying out precise procedures for publishing high-quality data,
is still missing. In this paper we argue that the Ontology-based Data
Management paradigm can provide a formal basis for a principled approach to
publish high quality, semantically annotated Open Data. We describe two main
approaches to using an ontology for this endeavor, and then we present some
technical results on one of the approaches, called bottom-up, where the
specification of the data to be published is given in terms of the sources, and
specific techniques allow deriving suitable annotations for interpreting the
published data under the light of the ontology.
Guan-Horng Liu, Avinash Siravuru, Sai Prabhakar, Manuela Veloso, George Kantor
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Sensor fusion is indispensable to improve accuracy and robustness in an
autonomous navigation setting. However, in the space of end-to-end sensorimotor
control, this multimodal outlook has received limited attention. In this work,
we propose a novel stochastic regularization technique, called Sensor Dropout,
to robustify multimodal sensor policy learning outcomes. We also introduce an
auxiliary loss on policy network along with the standard DRL loss that help
reduce the action variations of the multimodal sensor policy. Through empirical
testing we demonstrate that our proposed policy can 1) operate with minimal
performance drop in noisy environments, 2) remain functional even in the face
of a sensor subset failure. Finally, through the visualization of gradients, we
show that the learned policies are conditioned on the same latent input
distribution despite having multiple sensory observations spaces – a hallmark
of true sensor-fusion. This efficacy of a multimodal policy is shown through
simulations on TORCS, a popular open-source racing car game. A demo video can
be seen here: this https URL
Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz
Comments: 8 pages Accepted at WWW 2017
Subjects: Information Retrieval (cs.IR)
Many online services, such as search engines, social media platforms, and
digital marketplaces, are advertised as being available to any user, regardless
of their age, gender, or other demographic factors. However, there are growing
concerns that these services may systematically underserve some groups of
users. In this paper, we present a framework for internally auditing such
services for differences in user satisfaction across demographic groups, using
search engines as a case study. We first explain the pitfalls of na”ively
comparing the behavioral metrics that are commonly used to evaluate search
engines. We then propose three methods for measuring latent differences in user
satisfaction from observed differences in evaluation metrics. To develop these
methods, we drew on ideas from the causal inference literature and the
multilevel modeling literature. Our framework is broadly applicable to other
online services, and provides general insight into interpreting their
evaluation metrics.
Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang
Comments: 10 pages
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a game
theoretical minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results have demonstrated significant performance gains as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Hamidreza Alvari
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Twitter, one of the biggest and most popular microblogging Websites, has
evolved into a powerful communication platform which allows millions of active
users to generate huge volume of microposts and queries on a daily basis. To
accommodate effective categorization and easy search, users are allowed to make
use of hashtags, keywords or phrases prefixed by hash character, to categorize
and summarize their posts. However, valid hashtags are not restricted and thus
are created in a free and heterogeneous style, increasing difficulty of the
task of tweet categorization. In this paper, we propose a low-rank weighted
matrix factorization based method to recommend hashtags to the users solely
based on their hashtag usage history and independent from their tweets’
contents. We confirm using two-sample t-test that users are more likely to
adopt new hashtags similar to the ones they have previously adopted. In
particular, we formulate the problem of hashtag recommendation into an
optimization problem and incorporate hashtag correlation weight matrix into it
to account for the similarity between different hashtags. We finally leverage
widely used matrix factorization from recommender systems to solve the
optimization problem by capturing the latent factors of users and hashtags.
Empirical experiments demonstrate that our method is capable to properly
recommend hashtags.
Eric S. Tellez, Guillermo Ruiz, Edgar Chavez, Mario Graff
Comments: extended version of SISAP’15 “Guillermo Ruiz, Edgar Chavez, Mario Graff, Eric S. Tellez: Finding Near Neighbors Through Local Search. SISAP 2015: 103-109”
Subjects: Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)
Near neighbor search (NNS) has been traditionally addressed from an
algorithmic perspective, that is, given a dataset and a distance function, the
goal is to build a data structure along with discarding rules that quickly find
the near neighbor of a query under the given distance function. In this
manuscript, we take another approach. We restate NNS as a combinatorial
optimization problem, being the goal to minimize the distance from the query to
the data set. This cost function is well defined, and the minimum is realized
precisely with the nearest object to the query. Adopting this new view allows
the use of a rich collection of optimization algorithms with a long tradition.
We present three local search algorithms that solve the NNS problem from the
combinatorial optimization point of view. Two of the algorithms can be
described as the hyper-parameter optimization of two known indexing methods,
simplifying their adoption for final users. The third method achieves excellent
search tradeoffs among speed, memory, and quality of the approximation. For
instance, for moderately large dimensions, our indexes can reach above 0.99
recall reviewing less than 1% of the database. We conducted extensive
experimentation with five real-world standard benchmark and five synthetic
datasets to prove our claims.
Francisco Rangel, Marc Franco-Salvador, Paolo Rosso
Journal-ref: CICLing – Computational Linguistics and Intelligent Text
Processing, 2016
Subjects: Computation and Language (cs.CL)
Language variety identification aims at labelling texts in a native language
(e.g. Spanish, Portuguese, English) with its specific variation (e.g.
Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work
we propose a low dimensionality representation (LDR) to address this task with
five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain.
We compare our LDR method with common state-of-the-art representations and show
an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference
distributed representation models. Experimental results show competitive
performance while dramatically reducing the dimensionality –and increasing the
big data suitability– to only 6 features per variety. Additionally, we analyse
the behaviour of the employed machine learning algorithms and the most
discriminating features. Finally, we employ an alternative dataset to test the
robustness of our low dimensionality representation with another set of similar
languages.
Thai-Hoang Pham, Phuong Le-Hong
Comments: 7 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1705.04044
Subjects: Computation and Language (cs.CL)
This paper presents a state-of-the-art system for Vietnamese Named Entity
Recognition (NER). By incorporating automatic syntactic features with word
embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our
system, although simpler than some deep learning architectures, achieves a much
better result for Vietnamese NER. The proposed method achieves an overall F1
score of 92.05% on the test set of an evaluation campaign, organized in late
2016 by the Vietnamese Language and Speech Processing (VLSP) community. Our
named entity recognition system outperforms the best previous systems for
Vietnamese NER by a large margin.
Zhenzhou Wu, Xin Zheng, Daniel Dahlmeier
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Despite the success of deep learning on many fronts especially image and
speech, its application in text classification often is still not as good as a
simple linear SVM on n-gram TF-IDF representation especially for smaller
datasets. Deep learning tends to emphasize on sentence level semantics when
learning a representation with models like recurrent neural network or
recursive neural network, however from the success of TF-IDF representation, it
seems a bag-of-words type of representation has its strength. Taking advantage
of both representions, we present a model known as TDSM (Top Down Semantic
Model) for extracting a sentence representation that considers both the
word-level semantics by linearly combining the words with attention weights and
the sentence-level semantics with BiLSTM and use it on text classification. We
apply the model on characters and our results show that our model is better
than all the other character-based and word-based convolutional neural network
models by cite{zhang15} across seven different datasets with only 1\% of their
parameters. We also demonstrate that this model beats traditional linear models
on TF-IDF vectors on small and polished datasets like news article in which
typically deep learning models surrender.
Vanessa Q. Marinho, Henrique F. de Arruda, Thales S. Lima, Luciano F. Costa, Diego R. Amancio
Comments: TextGraphs ACL 2017 (to appear)
Subjects: Computation and Language (cs.CL)
Authorship attribution is a natural language processing task that has been
widely studied, often by considering small order statistics. In this paper, we
explore a complex network approach to assign the authorship of texts based on
their mesoscopic representation, in an attempt to capture the flow of the
narrative. Indeed, as reported in this work, such an approach allowed the
identification of the dominant narrative structure of the studied authors. This
has been achieved due to the ability of the mesoscopic approach to take into
account relationships between different, not necessarily adjacent, parts of the
text, which is able to capture the story flow. The potential of the proposed
approach has been illustrated through principal component analysis, a
comparison with the chance baseline method, and network visualization. Such
visualizations reveal individual characteristics of the authors, which can be
understood as a kind of calligraphy.
Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)
Inspired by previous work on emergent language in referential games, we
propose a novel multi-modal, multi-step referential game, where the sender and
receiver have access to distinct modalities of an object, and their information
exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
setting allows agents to develop an internal language significantly closer to
natural language, in that they share a single set of messages, and that the
length of the conversation may vary according to the difficulty of the task. We
examine these properties empirically using a dataset consisting of images and
textual descriptions of mammals, where the agents are tasked with identifying
the correct object. Our experiments indicate that a robust and efficient
communication protocol emerges, where gradual information exchange informs
better predictions and higher communication bandwidth improves generalization.
Tom Vander Aa, Imen Chakroun, Tom Haber
Comments: arXiv admin note: substantial text overlap with arXiv:1705.04159
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Using the matrix factorization technique in machine learning is very common
mainly in areas like recommender systems. Despite its high prediction accuracy
and its ability to avoid over-fitting of the data, the Bayesian Probabilistic
Matrix Factorization algorithm (BPMF) has not been widely used on large scale
data because of the prohibitive cost. In this paper, we propose a distributed
high-performance parallel implementation of the BPMF using Gibbs sampling on
shared and distributed architectures. We show by using efficient load balancing
using work stealing on a single node, and by using asynchronous communication
in the distributed version we beat state of the art implementations.
Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Convolution is a fundamental operation in many applications, such as computer
vision, natural language processing, image processing, etc. Recent successes of
convolutional neural networks in various deep learning applications put even
higher demand on fast convolution. The high computation throughput and memory
bandwidth of graphics processing units (GPUs) make GPUs a natural choice for
accelerating convolution operations. However, maximally exploiting the
available memory bandwidth of GPUs for convolution is a challenging task. This
paper introduces a general model to address the mismatch between the memory
bank width of GPUs and computation data width of threads. Based on this model,
we develop two convolution kernels, one for the general case and the other for
a special case with one input channel. By carefully optimizing memory access
patterns and computation patterns, we design a communication-optimized kernel
for the special case and a communication-reduced kernel for the general case.
Experimental data based on implementations on Kepler GPUs show that our kernels
achieve 5.16X and 35.5% average performance improvement over the latest cuDNN
library, for the special case and the general case, respectively.
Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider a large-scale matrix multiplication problem where the computation
is carried out using a distributed system with a master node and multiple
worker nodes, where each worker can store parts of the input matrices. We
propose a computation strategy that leverages ideas from coding theory to
design intermediate computations at the worker nodes, in order to efficiently
deal with straggling workers. The proposed strategy, named as emph{polynomial
codes}, achieves the optimum recovery threshold, defined as the minimum number
of workers that the master needs to wait for in order to compute the output.
Furthermore, by leveraging the algebraic structure of polynomial codes, we can
map the reconstruction problem of the final output to a polynomial
interpolation problem, which can be solved efficiently. Polynomial codes
provide order-wise improvement over the state of the art in terms of recovery
threshold, and are also optimal in terms of several other metrics. Furthermore,
we extend this code to distributed convolution and show its order-wise
optimality.
Mercy O. Jaiyeola, Kyle Patron, Jared Saia, Maxwell Young, Qian M. Zhou
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
A popular technique for tolerating Byzantine faults in open distributed
systems is to group machines into sets called quorums, each of which has an
honest majority. These quorums are then used as basic building blocks to design
systems that are robust to adversarial faults.
Despite over a decade of active research, all current algorithms require
quorum sizes of (Omega(log n)), where (n) is the number of machines in the
network. This size is important since communication cost scales polynomially in
the size of the quorum. Given the stubbornness of this (Omega(log n))
barrier, a natural question is whether better bounds are possible.
In this paper, we demonstrate that it is possible to reduce quorums sizes to
(O(loglog n)), despite an adversary that controls a constant fraction of the
computational resources in the network. In particular, we show that even with
such small quorums, we can ensure that all but an (o(1))-fraction of the
machines can communicate with all but an (o(1))-fraction of the machines in the
network.
Giorgos Borboudakis, Ioannis Tsamardinos
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Forward-backward selection is one of the most basic and commonly-used feature
selection algorithms available. It is also general and conceptually applicable
to many different types of data. In this paper, we propose a heuristic that
significantly improves its running time, while preserving predictive accuracy.
The idea is to temporarily discard the variables that are conditionally
independent with the outcome given the selected variable set. Depending on how
those variables are reconsidered and reintroduced, this heuristic gives rise to
a family of algorithms with increasingly stronger theoretical guarantees. In
distributions that can be faithfully represented by Bayesian networks or
maximal ancestral graphs, members of this algorithmic family are able to
correctly identify the Markov blanket in the sample limit. In experiments we
show that the proposed heuristic increases computational efficiency by about
two orders of magnitude in high-dimensional problems, while selecting fewer
variables and retaining predictive performance. Furthermore, we show that the
proposed algorithm and feature selection with LASSO perform similarly when
restricted to select the same number of variables, making the proposed
algorithm an attractive alternative for problems where no (efficient) algorithm
for LASSO exists.
Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy
Comments: 25 pages, 14 figures, Submitted to NIPS 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Consider how easy it is for people to imagine what a “purple hippo” would
look like, even though they do not exist. If we instead said “purple hippo with
wings”, they could just as easily create a different internal mental
representation, to represent this more specific concept. To assess whether the
person has correctly understood the concept, we can ask them to draw a few
sketches, to illustrate their thoughts. We call the ability to map text
descriptions of concepts to latent representations and then to images (or vice
versa) visually grounded semantic imagination. We propose a latent variable
model for images and attributes, based on variational auto-encoders, which can
perform this task. Our method uses a novel training objective, and a novel
product-of-experts inference network, which can handle partially specified
(abstract) concepts in a principled and efficient way. We also propose a set of
easy-to-compute evaluation metrics that capture our intuitive notions of what
it means to have good imagination, namely correctness, coverage, and
compositionality (the 3 C’s). Finally, we perform a detailed comparison (in
terms of the 3 C’s) of our method with two existing joint image-attribute VAE
methods (the JMVAE method of (Suzuki et al., 2017) and the bi-VCCA method of
(Wang et al., 2016)) by applying them to two simple datasets based on MNIST,
where it is easy to objectively evaluate performance in a controlled way.
Junier B. Oliva, Kumar Avinava Dubey, Barnabas Poczos, Eric Xing, Jeff Schneider
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper presents the recurrent estimation of distributions (RED) for
modeling real-valued data in a semiparametric fashion. RED models make two
novel uses of recurrent neural networks (RNNs) for density estimation of
general real-valued data. First, RNNs are used to transform input covariates
into a latent space to better capture conditional dependencies in inputs.
After, an RNN is used to compute the conditional distributions of the latent
covariates. The resulting model is efficient to train, compute, and sample
from, whilst producing normalized pdfs. The effectiveness of RED is shown via
several real-world data experiments. Our results show that RED models achieve a
lower held-out negative log-likelihood than other neural network approaches
across multiple dataset sizes and dimensionalities. Further context of the
efficacy of RED is provided by considering anomaly detection tasks, where we
also observe better performance over alternative models.
Rudolf Kadlec, Ondrej Bajgar, Jan Kleindienst
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Many papers have been published on the knowledge base completion task in the
past few years. Most of these introduce novel architectures for relation
learning that are evaluated on standard datasets such as FB15k and WN18. This
paper shows that the accuracy of almost all models published on the FB15k can
be outperformed by an appropriately tuned baseline – our reimplementation of
the DistMult model. Our findings cast doubt on the claim that the performance
improvements of recent models are due to architectural changes as opposed to
hyper-parameter tuning or different training objectives. This should prompt
future research to re-consider how the performance of models is evaluated and
reported.
David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Deep neural networks trained on large supervised datasets have led to
impressive results in recent years. However, since well-annotated datasets can
be prohibitively expensive and time-consuming to collect, recent work has
explored the use of larger but noisy datasets that can be more easily obtained.
In this paper, we investigate the behavior of deep neural networks on training
sets with massively noisy labels. We show that successful learning is possible
even with an essentially arbitrary amount of noise. For example, on MNIST we
find that accuracy of above 90 percent is still attainable even when the
dataset has been diluted with 100 noisy examples for each clean example. Such
behavior holds across multiple patterns of label noise, even when noisy labels
are biased towards confusing classes. Further, we show how the required dataset
size for successful training increases with higher label noise. Finally, we
present simple actionable techniques for improving learning in the regime of
high label noise.
Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan
Comments: arXiv admin note: text overlap with arXiv:1605.06636
Subjects: Learning (cs.LG)
Adversarial learning has been successfully embedded into deep networks to
learn transferable features for domain adaptation, which reduce distribution
discrepancy between the source and target domains and improve generalization
performance. Prior domain adversarial adaptation methods could not align
complex multimode distributions since the discriminative structures and
inter-layer interactions across multiple domain-specific layers have not been
exploited for distribution alignment. In this paper, we present randomized
multilinear adversarial networks (RMAN), which exploit multiple feature layers
and the classifier layer based on a randomized multilinear adversary to enable
both deep and discriminative adversarial adaptation. The learning can be
performed by stochastic gradient descent with the gradients computed by
back-propagation in linear-time. Experiments demonstrate that our models exceed
the state-of-the-art results on standard domain adaptation datasets.
Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel
Comments: Accepted to ICML 2017
Subjects: Learning (cs.LG)
For many applications of reinforcement learning it can be more convenient to
specify both a reward function and constraints, rather than trying to design
behavior through the reward function. For example, systems that physically
interact with or around humans should satisfy safety constraints. Recent
advances in policy search algorithms (Mnih et al., 2016, Schulman et al., 2015,
Lillicrap et al., 2016, Levine et al., 2016) have enabled new capabilities in
high-dimensional control, but do not consider the constrained setting.
We propose Constrained Policy Optimization (CPO), the first general-purpose
policy search algorithm for constrained reinforcement learning with guarantees
for near-constraint satisfaction at each iteration. Our method allows us to
train neural network policies for high-dimensional control while making
guarantees about policy behavior all throughout training. Our guarantees are
based on a new theoretical result, which is of independent interest: we prove a
bound relating the expected returns of two policies to an average divergence
between them. We demonstrate the effectiveness of our approach on simulated
robot locomotion tasks where the agent must satisfy constraints motivated by
safety.
Marko V. Jankovic
Subjects: Learning (cs.LG); Information Theory (cs.IT)
In this paper, we propose the classification method based on a learning
paradigm we are going to call Quantum Low Entropy based Associative Reasoning
or QLEAR learning. The approach is based on the idea that classification can be
understood as supervised clustering, where a quantum entropy in the context of
the quantum probabilistic model, will be used as a “capturer” (measure, or
external index), of the “natural structure” of the data. By using quantum
entropy we do not make any assumption about linear separability of the data
that are going to be classified. The basic idea is to find close neighbors to a
query sample and then use relative change in the quantum entropy as a measure
of similarity of the newly arrived sample with the representatives of interest.
In other words, method is based on calculation of quantum entropy of the
referent system and its relative change with the addition of the newly arrived
sample. Referent system consists of vectors that represent individual classes
and that are the most similar, in Euclidean distance sense, to the vector that
is analyzed. Here, we analyze the classification problem in the context of
measuring similarities to prototype examples of categories. While nearest
neighbor classifiers are natural in this setting, they suffer from the problem
of high variance (in bias-variance decomposition) in the case of limited
sampling. Alternatively, one could use machine learning techniques (like
support vector machines) but they involve time-consuming optimization. Here we
propose a hybrid of nearest neighbor and machine learning technique which deals
naturally with the multi-class setting, has reasonable computational complexity
both in training and at run time, and yields excellent results in practice.
Luisa F. Polania, Kenneth E. Barner
Comments: Accepted for publication at IEEE Transactions on Signal Processing
Subjects: Learning (cs.LG)
This paper proposes a CS scheme that exploits the representational power of
restricted Boltzmann machines and deep learning architectures to model the
prior distribution of the sparsity pattern of signals belonging to the same
class. The determined probability distribution is then used in a maximum a
posteriori (MAP) approach for the reconstruction. The parameters of the prior
distribution are learned from training data. The motivation behind this
approach is to model the higher-order statistical dependencies between the
coefficients of the sparse representation, with the final goal of improving the
reconstruction. The performance of the proposed method is validated on the
Berkeley Segmentation Dataset and the MNIST Database of handwritten digits.
Kfir Y. Levy
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We present an approach towards convex optimization that relies on a novel
scheme which converts online adaptive algorithms into offline methods. In the
offline optimization setting, our derived methods are shown to obtain
favourable adaptive guarantees which depend on the harmonic sum of the queried
gradients. We further show that our methods implicitly adapt to the objective’s
structure: in the smooth case fast convergence rates are ensured without any
prior knowledge of the smoothness parameter, while still maintaining guarantees
in the non-smooth setting. Our approach has a natural extension to the
stochastic setting, resulting in a lazy version of SGD (stochastic GD), where
minibathces are chosen emph{adaptively} depending on the magnitude of the
gradients. Thus providing a principled approach towards choosing minibatch
sizes.
Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, Ameet Talwalkar
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Federated learning poses new statistical and systems challenges in training
machine learning models over distributed networks of devices. In this work, we
show that multi-task learning is naturally suited to handle the statistical
challenges of this setting, and propose a novel systems-aware optimization
method, MOCHA, that is robust to practical systems issues. Our method and
theory for the first time consider issues of high communication cost,
stragglers, and fault tolerance for distributed multi-task learning. The
resulting method achieves significant speedups compared to alternatives in the
federated setting, as we demonstrate through simulations on real-world
federated datasets.
Lars Mescheder, Sebastian Nowozin, Andreas Geiger
Subjects: Learning (cs.LG)
In this paper, we analyze the numerics of common algorithms for training
Generative Adversarial Networks (GANs). Using the formalism of smooth
two-player games we analyze the associated gradient vector field of GAN
training objectives. Our findings suggest that the convergence of current
algorithms suffers due to two factors: i) presence of eigenvalues of the
Jacobian of the gradient vector field with zero real-part, and ii) eigenvalues
with big imaginary part. Using these findings, we design a new algorithm that
overcomes some of these limitations and has better convergence properties.
Experimentally, we demonstrate its superiority on training common GAN
architectures and show convergence on GAN architectures that are known to be
notoriously hard to train.
Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)
Inspired by previous work on emergent language in referential games, we
propose a novel multi-modal, multi-step referential game, where the sender and
receiver have access to distinct modalities of an object, and their information
exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
setting allows agents to develop an internal language significantly closer to
natural language, in that they share a single set of messages, and that the
length of the conversation may vary according to the difficulty of the task. We
examine these properties empirically using a dataset consisting of images and
textual descriptions of mammals, where the agents are tasked with identifying
the correct object. Our experiments indicate that a robust and efficient
communication protocol emerges, where gradual information exchange informs
better predictions and higher communication bandwidth improves generalization.
Dajiang Zhu, Brandalyn C. Riedel, Neda Jahanshad, Nynke A. Groenewold, Dan J. Stein, Ian H. Gotlib, Danai Dima, James H. Cole, Cynthia H.Y. Fu, Henrik Walter, Ilya M. Veer, Thomas Frodl, Lianne Schmaal, Dick J. Veltman, Paul M. Thompson
Comments: Accepted by MICCAI 2017
Subjects: Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP)
Large-scale collaborative analysis of brain imaging data, in psychiatry and
neu-rology, offers a new source of statistical power to discover features that
boost ac-curacy in disease classification, differential diagnosis, and outcome
prediction. However, due to data privacy regulations or limited accessibility
to large datasets across the world, it is challenging to efficiently integrate
distributed information. Here we propose a novel classification framework
through multi-site weighted LASSO: each site performs an iterative weighted
LASSO for feature selection separately. Within each iteration, the
classification result and the selected features are collected to update the
weighting parameters for each feature. This new weight is used to guide the
LASSO process at the next iteration. Only the fea-tures that help to improve
the classification accuracy are preserved. In tests on da-ta from five sites
(299 patients with major depressive disorder (MDD) and 258 normal controls),
our method boosted classification accuracy for MDD by 4.9% on average. This
result shows the potential of the proposed new strategy as an ef-fective and
practical collaborative platform for machine learning on large scale
distributed imaging and biobank data.
Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
The Wasserstein probability metric has received much attention from the
machine learning community. Unlike the Kullback-Leibler divergence, which
strictly measures change in probability, the Wasserstein metric reflects the
underlying geometry between outcomes. The value of being sensitive to this
geometry has been demonstrated, among others, in ordinal regression and
generative modelling. In this paper we describe three natural properties of
probability divergences that reflect requirements from machine learning: sum
invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein
metric possesses the first two properties but, unlike the Kullback-Leibler
divergence, does not possess the third. We provide empirical evidence
suggesting that this is a serious issue in practice. Leveraging insights from
probabilistic forecasting we propose an alternative to the Wasserstein metric,
the Cram’er distance. We show that the Cram’er distance possesses all three
desired properties, combining the best of the Wasserstein and Kullback-Leibler
divergences. To illustrate the relevance of the Cram’er distance in practice
we design a new algorithm, the Cram’er Generative Adversarial Network (GAN),
and show that it performs significantly better than the related Wasserstein
GAN.
Eric Price, Zhao Song, David P. Woodruff
Comments: ICALP 2017
Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)
Sketching has emerged as a powerful technique for speeding up problems in
numerical linear algebra, such as regression. In the overconstrained regression
problem, one is given an (n imes d) matrix (A), with (n gg d), as well as an
(n imes 1) vector (b), and one wants to find a vector (hat{x}) so as to
minimize the residual error (|Ax-b|_2). Using the sketch and solve paradigm,
one first computes (S cdot A) and (S cdot b) for a randomly chosen matrix
(S), then outputs (x’ = (SA)^{dagger} Sb) so as to minimize (|SAx’ – Sb|_2).
The sketch-and-solve paradigm gives a bound on (|x’-x^*|_2) when (A) is
well-conditioned. Our main result is that, when (S) is the subsampled
randomized Fourier/Hadamard transform, the error (x’ – x^*) behaves as if it
lies in a “random” direction within this bound: for any fixed direction (ain
mathbb{R}^d), we have with (1 – d^{-c}) probability that
[
langle a, x’-x^*
angle lesssim
frac{|a|_2|x’-x^*|_2}{d^{frac{1}{2}-gamma}}, quad (1)
]
where (c, gamma > 0) are arbitrary constants.
This implies (|x’-x^*|_{infty}) is a factor (d^{frac{1}{2}-gamma})
smaller than (|x’-x^*|_2). It also gives a better bound on the generalization
of (x’) to new examples: if rows of (A) correspond to examples and columns to
features, then our result gives a better bound for the error introduced by
sketch-and-solve when classifying fresh examples. We show that not all
oblivious subspace embeddings (S) satisfy these properties. In particular, we
give counterexamples showing that matrices based on Count-Sketch or leverage
score sampling do not satisfy these properties.
We also provide lower bounds, both on how small (|x’-x^*|_2) can be, and
for our new guarantee (1), showing that the subsampled randomized
Fourier/Hadamard transform is nearly optimal.
Ti-Rong Wu, I-Chen Wu, Guan-Wun Chen, Ting-han Wei, Tung-Yi Lai, Hung-Chun Wu, Li-Cheng Lan
Comments: This version was also submitted to IEEE TCIAIG on May 30, 2017
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
This paper proposes a new approach to a novel value network architecture for
the game Go, called a multi-labelled (ML) value network. In the ML value
network, different values (win rates) are trained simultaneously for different
settings of komi, a compensation given to balance the initiative of playing
first. The ML value network has three advantages, (a) it outputs values for
different komi, (b) it supports dynamic komi, and (c) it lowers the mean
squared error (MSE). This paper also proposes a new dynamic komi method to
improve game-playing strength. This paper also performs experiments to
demonstrate the merits of the architecture. First, the MSE of the ML value
network is generally lower than the value network alone. Second, the program
based on the ML value network wins by a rate of 67.6% against the program based
on the value network alone. Third, the program with the proposed dynamic komi
method significantly improves the playing strength over the baseline that does
not use dynamic komi, especially for handicap games. To our knowledge, up to
date, no handicap games have been played openly by programs using value
networks. This paper provides these programs with a useful approach to playing
handicap games.
Weilin Xu, David Evans, Yanjun Qi
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Feature squeezing is a recently-introduced framework for mitigating and
detecting adversarial examples. In previous work, we showed that it is
effective against several earlier methods for generating adversarial examples.
In this short note, we report on recent results showing that simple feature
squeezing techniques also make deep learning models significantly more robust
against the Carlini/Wagner attacks, which are the best known adversarial
methods discovered to date.
Rick Smetsers
Comments: Submitted and selected for oral presentation at the LearnAut workshop at LICS 2017
Subjects: Formal Languages and Automata Theory (cs.FL); Learning (cs.LG); Logic in Computer Science (cs.LO)
The problem of learning a minimal consistent model from a set of labeled
sequences of symbols is addressed from a satisfiability modulo theories
perspective. We present two encodings for deterministic finite automata and
extend one of these for Moore and Mealy machines. Our experimental results show
that these encodings improve upon the state-of-the-art, and are useful in
practice for learning small models.
Zhulin Liu, C. L. Philip Chen
Comments: 2016 3rd International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS)
Subjects: Numerical Analysis (math.NA); Learning (cs.LG)
A new Hardy space Hardy space approach of Dirichlet type problem based on
Tikhonov regularization and Reproducing Hilbert kernel space is discussed in
this paper, which turns out to be a typical extremal problem located on the
upper upper-high complex plane. If considering this in the Hardy space, the
optimization operator of this problem will be highly simplified and an
efficient algorithm is possible. This is mainly realized by the help of
reproducing properties of the functions in the Hardy space of upper-high
complex plane, and the detail algorithm is proposed. Moreover, harmonic
mappings, which is a significant geometric transformation, are commonly used in
many applications such as image processing, since it describes the energy
minimization mappings between individual manifolds. Particularly, when we focus
on the planer mappings between two Euclid planer regions, the harmonic mappings
are exist and unique, which is guaranteed solidly by the existence of harmonic
function. This property is attractive and simulation results are shown in this
paper to ensure the capability of applications such as planer shape distortion
and surface registration.
Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Convolution is a fundamental operation in many applications, such as computer
vision, natural language processing, image processing, etc. Recent successes of
convolutional neural networks in various deep learning applications put even
higher demand on fast convolution. The high computation throughput and memory
bandwidth of graphics processing units (GPUs) make GPUs a natural choice for
accelerating convolution operations. However, maximally exploiting the
available memory bandwidth of GPUs for convolution is a challenging task. This
paper introduces a general model to address the mismatch between the memory
bank width of GPUs and computation data width of threads. Based on this model,
we develop two convolution kernels, one for the general case and the other for
a special case with one input channel. By carefully optimizing memory access
patterns and computation patterns, we design a communication-optimized kernel
for the special case and a communication-reduced kernel for the general case.
Experimental data based on implementations on Kepler GPUs show that our kernels
achieve 5.16X and 35.5% average performance improvement over the latest cuDNN
library, for the special case and the general case, respectively.
Zhenzhou Wu, Xin Zheng, Daniel Dahlmeier
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
Despite the success of deep learning on many fronts especially image and
speech, its application in text classification often is still not as good as a
simple linear SVM on n-gram TF-IDF representation especially for smaller
datasets. Deep learning tends to emphasize on sentence level semantics when
learning a representation with models like recurrent neural network or
recursive neural network, however from the success of TF-IDF representation, it
seems a bag-of-words type of representation has its strength. Taking advantage
of both representions, we present a model known as TDSM (Top Down Semantic
Model) for extracting a sentence representation that considers both the
word-level semantics by linearly combining the words with attention weights and
the sentence-level semantics with BiLSTM and use it on text classification. We
apply the model on characters and our results show that our model is better
than all the other character-based and word-based convolutional neural network
models by cite{zhang15} across seven different datasets with only 1\% of their
parameters. We also demonstrate that this model beats traditional linear models
on TF-IDF vectors on small and polished datasets like news article in which
typically deep learning models surrender.
Rui Gao, Sergiy A. Vorobyov
Comments: 27 pages, 8 figures, 1 table
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We address the multi-focus image fusion problem, where multiple images
captured with different focal settings are to be fused into an all-in-focus
image of higher quality. Algorithms for this problem necessarily admit the
source image characteristics along with focused and blurred feature. However,
most sparsity-based approaches use a single dictionary in focused feature space
to describe multi-focus images, and ignore the representations in blurred
feature space. Here, we propose a multi-focus image fusion approach based on
coupled sparse representation. The approach exploits the facts that (i) the
patches in given training set can be sparsely represented by a couple of
overcomplete dictionaries related to the focused and blurred categories of
images; and (ii) merging such representations leads to a more flexible and
therefore better fusion strategy than the one based on just selecting the
sparsest representation in the original image estimate. By jointly learning the
coupled dictionary, we enforce the similarity of sparse representations in the
focused and blurred feature spaces, and then introduce a fusion approach to
combine these representations for generating an all-in-focus image. We also
discuss the advantages of the fusion approach based on coupled sparse
representation and present an efficient algorithm for learning the coupled
dictionary. Extensive experimental comparisons with state-of-the-art
multi-focus image fusion algorithms validate the effectiveness of the proposed
approach.
Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang
Comments: 10 pages
Subjects: Information Retrieval (cs.IR); Learning (cs.LG)
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a game
theoretical minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results have demonstrated significant performance gains as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Francesc Wilhelmi, Boris Bellalta, Cristina Cano, Anders Jonsson
Comments: Conference
Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)
Reinforcement Learning is gaining attention by the wireless networking
community due to its potential to learn good-performing configurations only
from the observed results. In this work we propose a stateless variation of
Q-learning, which we apply to exploit spatial reuse in a wireless network. In
particular, we allow networks to modify both their transmission power and the
channel used solely based on the experienced throughput. We concentrate in a
completely decentralized scenario in which no information about neighbouring
nodes is available to the learners. Our results show that although the
algorithm is able to find the best-performing actions to enhance aggregate
throughput, there is high variability in the throughput experienced by the
individual networks. We identify the cause of this variability as the
adversarial setting of our setup, in which the most played actions provide
intermittent good/poor performance depending on the neighbouring decisions. We
also evaluate the effect of the intrinsic learning parameters of the algorithm
on this variability.
Guillaume Gautier, Rémi Bardenet, Michal Valko
Comments: 19 pages, accepted to ICML 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO)
Determinantal point processes (DPPs) are distributions over sets of items
that model diversity using kernels. Their applications in machine learning
include summary extraction and recommendation systems. Yet, the cost of
sampling from a DPP is prohibitive in large-scale applications, which has
triggered an effort towards efficient approximate samplers. We build a novel
MCMC sampler that combines ideas from combinatorial geometry, linear
programming, and Monte Carlo methods to sample from DPPs with a fixed sample
cardinality, also called projection DPPs. Our sampler leverages the ability of
the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous
theoretical results yield a fast mixing time of our chain when targeting a
distribution that is close to a projection DPP, but not a DPP in general. Our
empirical results demonstrate that this extends to sampling projection DPPs,
i.e., our sampler is more sample-efficient than previous approaches.
Baruch Epstein. Ron Meir, Tomer Michaeli
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The incorporation of prior knowledge into learning is essential in achieving
good performance based on small noisy samples. Such knowledge is often
incorporated through the availability of related data arising from domains and
tasks similar to the one of current interest. Ideally one would like to allow
both the data for the current task and for previous related tasks to
self-organize the learning system in such a way that commonalities and
differences between the tasks are learned in a data-driven fashion. We develop
a framework for learning multiple tasks simultaneously, based on sharing
features that are common to all tasks, achieved through the use of a modular
deep feedforward neural network consisting of shared branches, dealing with the
common features of all tasks, and private branches, learning the specific
unique aspects of each task. Once an appropriate weight sharing architecture
has been established, learning takes place through standard algorithms for
feedforward networks, e.g., stochastic gradient descent and its variations. The
method deals with domain adaptation and multi-task learning in a unified
fashion, and can easily deal with data arising from different types of sources.
Numerical experiments demonstrate the effectiveness of learning in domain
adaptation and transfer learning setups, and provide evidence for the flexible
and task-oriented representations arising in the network.
Karol Hausman, Yevgen Chebotar, Stefan Schaal, Gaurav Sukhatme, Joseph Lim
Subjects: Robotics (cs.RO); Learning (cs.LG)
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at this http URL
Weiyang Liu, Bo Dai, James M. Rehg, Le Song
Comments: To appear in ICML 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
In this paper, we consider the problem of machine teaching, the inverse
problem of machine learning. Different from traditional machine teaching which
views the learners as batch algorithms, we study a new paradigm where the
learner uses an iterative algorithm and a teacher can feed examples
sequentially and intelligently based on the current performance of the learner.
We show that the teaching complexity in the iterative case is very different
from that in the batch case. Instead of constructing a minimal training set for
learners, our iterative machine teaching focuses on achieving fast convergence
in the learner model. Depending on the level of information the teacher has
from the learner model, we design teaching algorithms which can provably reduce
the number of teaching examples and achieve faster convergence than learning
without teachers. We also validate our theoretical findings with extensive
experiments on different data distribution and real image datasets.
Guan-Horng Liu, Avinash Siravuru, Sai Prabhakar, Manuela Veloso, George Kantor
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Sensor fusion is indispensable to improve accuracy and robustness in an
autonomous navigation setting. However, in the space of end-to-end sensorimotor
control, this multimodal outlook has received limited attention. In this work,
we propose a novel stochastic regularization technique, called Sensor Dropout,
to robustify multimodal sensor policy learning outcomes. We also introduce an
auxiliary loss on policy network along with the standard DRL loss that help
reduce the action variations of the multimodal sensor policy. Through empirical
testing we demonstrate that our proposed policy can 1) operate with minimal
performance drop in noisy environments, 2) remain functional even in the face
of a sensor subset failure. Finally, through the visualization of gradients, we
show that the learned policies are conditioned on the same latent input
distribution despite having multiple sensory observations spaces – a hallmark
of true sensor-fusion. This efficacy of a multimodal policy is shown through
simulations on TORCS, a popular open-source racing car game. A demo video can
be seen here: this https URL
Jonathan Gryak, Robert M. Haralick, Delaram Kahrobaei
Subjects: Group Theory (math.GR); Learning (cs.LG)
Machine learning and pattern recognition techniques have been successfully
applied to algorithmic problems in free groups. In this paper, we seek to
extend these techniques to finitely presented non-free groups, with a
particular emphasis on polycyclic and metabelian groups that are of interest to
non-commutative cryptography.
As a prototypical example, we utilize supervised learning methods to
construct classifiers that can solve the conjugacy decision problem, i.e.,
determine whether or not a pair of elements from a specified group are
conjugate. The accuracies of classifiers created using decision trees, random
forests, and N-tuple neural network models are evaluated for several non-free
groups. The very high accuracy of these classifiers suggests an underlying
mathematical relationship with respect to conjugacy in the tested groups.
Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, Aarti Singh
Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)
Although gradient descent (GD) almost always escapes saddle points
asymptotically [Lee et al., 2016], this paper shows that even with fairly
natural random initialization schemes and non-pathological functions, GD can be
significantly slowed down by saddle points, taking exponential time to escape.
On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et
al., 2017] is not slowed down by saddle points – it can find an approximate
local minimizer in polynomial time. This result implies that GD is inherently
slower than perturbed GD, and justifies the importance of adding perturbations
for efficient non-convex optimization. While our focus is theoretical, we also
present experiments that illustrate our theoretical findings.
Clément Calauzènes, Nicolas Le Roux
Subjects: Optimization and Control (math.OC); Learning (cs.LG)
In recent years, variance-reducing stochastic methods have shown great
practical performance, exhibiting linear convergence rate when other stochastic
methods offered a sub-linear rate. However, as datasets grow ever bigger and
clusters become widespread, the need for fast distribution methods is pressing.
We propose here a distribution scheme for SAGA which maintains a linear
convergence rate, even when communication between nodes is limited.
Minje Kim
Journal-ref: Proc. of the IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pp 76-80, March 2017
Subjects: Sound (cs.SD); Learning (cs.LG)
We show that a Modular Neural Network (MNN) can combine various speech
enhancement modules, each of which is a Deep Neural Network (DNN) specialized
on a particular enhancement job. Differently from an ordinary ensemble
technique that averages variations in models, the propose MNN selects the best
module for the unseen test signal to produce a greedy ensemble. We see this as
Collaborative Deep Learning (CDL), because it can reuse various already-trained
DNN models without any further refining. In the proposed MNN selecting the best
module during run time is challenging. To this end, we employ a speech
AutoEncoder (AE) as an arbitrator, whose input and output are trained to be as
similar as possible if its input is clean speech. Therefore, the AE can gauge
the quality of the module-specific denoised result by seeing its AE
reconstruction error, e.g. low error means that the module output is similar to
clean speech. We propose an MNN structure with various modules that are
specialized on dealing with a specific noise type, gender, and input
Signal-to-Noise Ratio (SNR) value, and empirically prove that it almost always
works better than an arbitrarily chosen DNN module and sometimes as good as an
oracle result.
Benjamin Paul Chamberlain, James Clough, Marc Peter Deisenroth
Comments: 7 pages, 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Neural embeddings have been used with great success in Natural Language
Processing (NLP). They provide compact representations that encapsulate word
similarity and attain state-of-the-art performance in a range of linguistic
tasks. The success of neural embeddings has prompted significant amounts of
research into applications in domains other than language. One such domain is
graph-structured data, where embeddings of vertices can be learned that
encapsulate vertex similarity and improve performance on tasks including edge
prediction and vertex labelling. For both NLP and graph based tasks, embeddings
have been learned in high-dimensional Euclidean spaces. However, recent work
has shown that the appropriate isometric space for embedding complex networks
is not the flat Euclidean space, but negatively curved, hyperbolic space. We
present a new concept that exploits these recent insights and propose learning
neural embeddings of graphs in hyperbolic space. We provide experimental
evidence that embedding graphs in their natural geometry significantly improves
performance on downstream tasks for several real-world public datasets.
Patrick Hohenecker, Thomas Lukasiewicz
Comments: 9 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
In this work, we present a novel approach to ontology reasoning that is based
on deep learning rather than logic-based formal reasoning. To this end, we
introduce a new model for statistical relational learning that is built upon
deep recursive neural networks, and give experimental evidence that it can
easily compete with, or even outperform, existing logic-based reasoners on the
task of ontology reasoning. More precisely, we compared our implemented system
with one of the best logic-based ontology reasoners at present, RDFox, on a
number of large standard benchmark datasets, and found that our system attained
high reasoning quality, while being up to two orders of magnitude faster.
Taiji Suzuki
Comments: 36 pages
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Machine Learning (stat.ML)
We develop a new theoretical framework to analyze the generalization error of
deep learning, and derive a new fast learning rate for two representative
algorithms: empirical risk minimization and Bayesian deep learning. The series
of theoretical analyses of deep learning has revealed its high expressive power
and universal approximation capability. Although these analyses are highly
nonparametric, existing generalization error analyses have been developed
mainly in a fixed dimensional parametric model. To compensate this gap, we
develop an infinite dimensional model that is based on an integral form as
performed in the analysis of the universal approximation capability. This
allows us to define a reproducing kernel Hilbert space corresponding to each
layer. Our point of view is to deal with the ordinary finite dimensional deep
neural network as a finite approximation of the infinite dimensional one. The
approximation error is evaluated by the degree of freedom of the reproducing
kernel Hilbert space in each layer. To estimate a good finite dimensional
model, we consider both of empirical risk minimization and Bayesian deep
learning. We derive its generalization error bound and it is shown that there
appears bias-variance trade-off in terms of the number of parameters of the
finite dimensional approximation. We show that the optimal width of the
internal layers can be determined through the degree of freedom and the
convergence rate can be faster than (O(1/sqrt{n})) rate which has been shown
in the existing studies.
Stanislav Kruglik, Valeriya Potapova, Alexey Frolov
Comments: submitted to WCC 2017
Subjects: Information Theory (cs.IT)
An algorithm for constructing parity-check matrices of non-binary
quasi-cyclic low-density parity-check (NB QC-LDPC) codes is proposed. The
algorithm finds short cycles in the base matrix and tries to eliminate them by
selecting the circulants and the elements of GF(q). Firstly the algorithm tries
to eliminate the cycles with the smallest number edges going outside the cycle.
The efficiency of the algorithm is demonstrated by means of simulations. In
particular, it was shown that NB QC-LDPC codes constructed with use of our
algorithm loose less that 0.1 dB in comparison to the best NB LDPC codes.
Hemani Kaushal, Georges Kaddoum
Comments: 41 pages, 13 Figures and 8 Tables. arXiv admin note: substantial text overlap with arXiv:1506.04836
Journal-ref: IEEE Communications Surveys & Tutorials ( Volume: 19, Issue: 1,
Firstquarter 2017 ), pp. 57-96, 2016
Subjects: Information Theory (cs.IT)
In recent years, free space optical communication has gained significant
importance owing to its unique features: large bandwidth, license-free
spectrum, high data rate, easy and quick deployability, less power and low mass
requirements. FSO communication uses the optical carrier in the near infrared
band to establish either terrestrial links within the Earth’s atmosphere or
inter-satellite or deep space links or ground-to-satellite or
satellite-to-ground links. However, despite the great potential of FSO
communication, its performance is limited by the adverse effects viz.,
absorption, scattering, and turbulence of the atmospheric channel. This paper
presents a comprehensive survey on various challenges faced by FSO
communication system for ground-to-satellite or satellite-to-ground and
inter-satellite links. It also provides details of various performance
mitigation techniques in order to have high link availability and reliability.
The first part of the paper will focus on various types of impairments that
pose a serious challenge to the performance of optical communication system for
ground-to-satellite or satellite-to-ground and inter-satellite links. The
latter part of the paper will provide the reader with an exhaustive review of
various techniques both at physical layer as well as at the other layers i.e.,
link, network or transport layer to combat the adverse effects of the
atmosphere. It also uniquely presents a recently developed technique using
orbital angular momentum for utilizing the high capacity advantage of the
optical carrier in case of space-based and near-Earth optical communication
links. This survey provides the reader with comprehensive details on the use of
space-based optical backhaul links in order to provide high-capacity and
low-cost backhaul solutions.
Mahesh Babu Vaddi, B. Sundar Rajan
Comments: 14 pages, 8 figures and 3 tables. arXiv admin note: substantial text overlap with arXiv:1705.05060, arXiv:1705.03192
Subjects: Information Theory (cs.IT)
A single unicast index coding problem (SUICP) with symmetric neighboring
interference (SNI) has equal number of (K) messages and (K) receivers, the
(k)th receiver (R_{k}) wanting the (k)th message (x_{k}) and having the
side-information (mathcal{K}_{k}=(mathcal{I}_{k} cup x_{k})^c,) where
({I}_k= {x_{k-U},dots,x_{k-2},x_{k-1}}cup{x_{k+1},
x_{k+2},dots,x_{k+D}}) is the interference with (D) messages after and (U)
messages before its desired message. Maleki, Cadambe and Jafar obtained the
capacity of this single unicast index coding problem with symmetric neighboring
interference (SUICP-SNI) with (K) tending to infinity and Blasiak, Kleinberg
and Lubetzky for the special case of ((D=U=1)) with (K) being finite. In our
previous work, we proved the capacity of SUICP-SNI for arbitrary (K) and (D)
with (U= ext{gcd}(K,D+1)-1). This paper deals with near-optimal linear code
construction for SUICP-SNI with arbitrary (K,U) and (D.) For SUICP-SNI with
arbitrary (K,U) and (D), we define a set of (2)-tuples such that for every
((a,b)) in that set the rate (D+1+frac{a}{b}) is achieved by using vector
linear index codes over every field. We prove that the set
(mathcal{mathbf{S}}) consists of ((a,b)) such that the rate of constructed
vector linear index codes are at most (frac{K~ ext{mod}~(D+1)}{left lfloor
frac{K}{D+1}
ight
floor}) away from a known lower bound on broadcast rate
of SUICP-SNI. The three known results on the exact capacity of the SUICP-SNI
are recovered as special cases of our results. Also, we give a low complexity
decoding procedure for the proposed vector linear index codes for the
SUICP-SNI.
Umberto Martínez-Peñas
Comments: 21 pages, LaTeX; parts of this paper have been accepted for presentation at the IEEE International Symposium on Information Theory, Aachen, Germany, June 2017
Subjects: Information Theory (cs.IT)
We study the problem of reducing the communication overhead from a noisy
wire-tap channel or storage system where data is encoded as a matrix, when more
columns (or their linear combinations) are available. We present its
applications to reducing communication overheads in universal secure linear
network coding and secure distributed storage with crisscross errors and
erasures and in the presence of a wire-tapper. Our main contribution is a
method to transform coding schemes based on linear rank-metric codes, with
certain properties, to schemes with lower communication overheads. By applying
this method to pairs of Gabidulin codes, we obtain coding schemes with optimal
information rate with respect to their security and rank error correction
capability, and with universally optimal communication overheads, when ( n leq
m ), being ( n ) and ( m ) the number of columns and number of rows,
respectively. Moreover, our method can be applied to other families of maximum
rank distance codes when ( n > m ). The downside of the method is generally
expanding the packet length, but some practical instances come at no cost.
Laszlo Csirmaz, Peter Ligeti
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
We investigate graph based secret sharing schemes and its information ratio,
also called complexity, measuring the maximal amount of information the
vertices has to store. It was conjectured that in large girth graphs, where the
interaction between far away nodes is restricted to a single path, this ratio
is bounded. This conjecture was supported by several result, most notably by a
result of Csirmaz and Ligeti saying that the complexity of graphs with girth at
least six and no neighboring high degree vertices is strictly below 2. In this
paper we refute the above conjecture. First, a family of (d)-regular graphs is
defined iteratively such that the complexity of these graphs is the largest
possible ((d+1)/2) allowed by Stinson’s bound. This part extends earlier
results of van Dijk and Blundo et al, and uses the so-called entropy method.
Second, using combinatorial arguments, we show that this family contains graphs
with arbitrary large girth. In particular, we obtain the following purely
combinatorial result, which might be interesting on its own: there are
(d)-regular graphs with arbitrary large girth such that any fractional
edge-cover by stars (or by complete multipartite graphs) must cover some vertex
((d+1)/2) times.
Dogay Altinel, Gunes Karabulut Kurt
Subjects: Information Theory (cs.IT)
RF energy harvesting (RFEH) is a promising technology for energy requirements
of wireless communication nodes. However, providing sufficient amount of energy
to ensure self-sufficient devices based on RFEH may be challenging. In this
paper, the use of diversity combining in RFEH systems is proposed to increase
the amount of harvested energy. The power consumption of diversity combining
process is also taken into account to analyze the net benefit of diversity
combining. Performances of RFEH systems are investigated for selection
combining (SC), equal gain combining (EGC), and maximal ratio combining (MRC)
techniques. Simulations are conducted to compare the numerical results of SC,
EGC, and MRC, and the results show that although the diversity combining
techniques can improve the energy harvesting performance, the power consumption
parameters have a critical importance while determining the suitable technique.
Mahyar Shirvanimoghaddam, Massimo Condoluci, Mischa Dohler, Sarah Johnson
Comments: To appear in IEEE JSAC Special Issue on Non-Orthogonal Multiple Access for 5G Systems
Subjects: Information Theory (cs.IT)
Machine-to-machine (M2M) constitutes the communication paradigm at the basis
of Internet of Things (IoT) vision. M2M solutions allow billions of multi-role
devices to communicate with each other or with the underlying data transport
infrastructure without, or with minimal, human intervention. Current solutions
for wireless transmissions originally designed for human-based applications
thus require a substantial shift to cope with the capacity issues in managing a
huge amount of M2M devices. In this paper, we consider the multiple access
techniques as promising solutions to support a large number of devices in
cellular systems with limited radio resources. We focus on non-orthogonal
multiple access (NOMA) where, with the aim to increase the channel efficiency,
the devices share the same radio resources for their data transmission. This
has been shown to provide optimal throughput from an information theoretic
point of view.We consider a realistic system model and characterise the system
performance in terms of throughput and energy efficiency in a NOMA scenario
with a random packet arrival model, where we also derive the stability
condition for the system to guarantee the performance.
Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider a large-scale matrix multiplication problem where the computation
is carried out using a distributed system with a master node and multiple
worker nodes, where each worker can store parts of the input matrices. We
propose a computation strategy that leverages ideas from coding theory to
design intermediate computations at the worker nodes, in order to efficiently
deal with straggling workers. The proposed strategy, named as emph{polynomial
codes}, achieves the optimum recovery threshold, defined as the minimum number
of workers that the master needs to wait for in order to compute the output.
Furthermore, by leveraging the algebraic structure of polynomial codes, we can
map the reconstruction problem of the final output to a polynomial
interpolation problem, which can be solved efficiently. Polynomial codes
provide order-wise improvement over the state of the art in terms of recovery
threshold, and are also optimal in terms of several other metrics. Furthermore,
we extend this code to distributed convolution and show its order-wise
optimality.
Avi Zanko, Itsik Bergel, Amir Leshem
Subjects: Information Theory (cs.IT)
In this paper we propose a rapidly converging LMS algorithm for crosstalk
cancellation. The architecture is similar to deep neural networks, where
multiple layers are adapted sequentially. The application motivating this
approach is gigabit rate transmission over unshielded twisted pairs using a
vectored system. The crosstalk cancellation algorithm uses an adaptive
non-diagonal preprocessing matrix prior to a conventional LMS crosstalk
canceler. The update of the preprocessing matrix is inspired by deep neural
networks. However, since most the operations in the Deep-LMS algorithm are
linear, we are capable of providing an exact convergence speed analysis. The
role of the preprocessing matrix is to speed up the convergence of the
conventional LMS crosstalk canceler and hence the convergence of the overall
system. The Deep-LMS is important for crosstalk cancellation in the novel
G.fast standard, where traditional LMS converges very slowly due to the
ill-conditioned covariance matrix of the received signal at the extended
bandwidth. Simulation results support our analysis and show significant
reduction in convergence time compared to existing LMS variants.
Christopher Portmann
Comments: 39+18 pages, 11 figures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
We model (interactive) resources that provide Alice with a string (X) and a
guarantee that any Eve interacting with her interface of the resource obtains a
(quantum) system (E) such that the conditional (smooth) min-entropy of (X)
given (E) is lower bounded by some (k). This (abstract) resource specification
encompasses any setting that results in the honest players holding such a
string (or aborting). For example, it could be constructed from, e.g., noisy
channels, quantum key distribution (QKD), or a violation of Bell inequalities,
which all may be used to derive bounds on the min-entropy of (X).
As a first application, we use this min-entropy resource to modularize key
distribution (KD) schemes by dividing them in two parts, which may be analyzed
separately. In the first part, a KD protocol constructs a min-entropy resource
given the (physical) resources available in the specific setting considered. In
the second, it distills secret key from the min-entropy resource—i.e., it
constructs a secret key resource. We prove security for a generic key
distillation protocol that may use any min-entropy resource. Since the notion
of resource construction is composable—security of a composed protocol
follows from the security of its parts— this reduces proving security of a KD
protocol (e.g., QKD) to proving that it constructs a min-entropy resource.
As a second application, we provide a composable security proof for the
recent Fehr-Salvail protocol [EUROCRYPT 2017] that authenticates classical
messages with a quantum message authentication code (Q-MAC), and recycles all
the key upon successfully verifying the authenticity of the message. This
protocol uses (and recycles) a non-uniform key, which we model as consuming and
constructing a min-entropy resource.
Marko V. Jankovic
Subjects: Learning (cs.LG); Information Theory (cs.IT)
In this paper, we propose the classification method based on a learning
paradigm we are going to call Quantum Low Entropy based Associative Reasoning
or QLEAR learning. The approach is based on the idea that classification can be
understood as supervised clustering, where a quantum entropy in the context of
the quantum probabilistic model, will be used as a “capturer” (measure, or
external index), of the “natural structure” of the data. By using quantum
entropy we do not make any assumption about linear separability of the data
that are going to be classified. The basic idea is to find close neighbors to a
query sample and then use relative change in the quantum entropy as a measure
of similarity of the newly arrived sample with the representatives of interest.
In other words, method is based on calculation of quantum entropy of the
referent system and its relative change with the addition of the newly arrived
sample. Referent system consists of vectors that represent individual classes
and that are the most similar, in Euclidean distance sense, to the vector that
is analyzed. Here, we analyze the classification problem in the context of
measuring similarities to prototype examples of categories. While nearest
neighbor classifiers are natural in this setting, they suffer from the problem
of high variance (in bias-variance decomposition) in the case of limited
sampling. Alternatively, one could use machine learning techniques (like
support vector machines) but they involve time-consuming optimization. Here we
propose a hybrid of nearest neighbor and machine learning technique which deals
naturally with the multi-class setting, has reasonable computational complexity
both in training and at run time, and yields excellent results in practice.
Gang Wang, Georgios B. Giannakis, Yousef Saad, Jie Chen
Comments: 27 pages, 8 figures
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Machine Learning (stat.ML)
This paper deals with finding an (n)-dimensional solution (x) to a system of
quadratic equations of the form (y_i=|langle{a}_i,x
angle|^2) for (1le i le
m), which is also known as phase retrieval and is NP-hard in general. We put
forth a novel procedure for minimizing the amplitude-based least-squares
empirical loss, that starts with a weighted maximal correlation initialization
obtainable with a few power or Lanczos iterations, followed by successive
refinements based upon a sequence of iteratively reweighted (generalized)
gradient iterations. The two (both the initialization and gradient flow) stages
distinguish themselves from prior contributions by the inclusion of a fresh
(re)weighting regularization technique. The overall algorithm is conceptually
simple, numerically scalable, and easy-to-implement. For certain random
measurement models, the novel procedure is shown capable of finding the true
solution (x) in time proportional to reading the data ({(a_i;y_i)}_{1le i
le m}). This holds with high probability and without extra assumption on the
signal (x) to be recovered, provided that the number (m) of equations is some
constant (c>0) times the number (n) of unknowns in the signal vector, namely,
(m>cn). Empirically, the upshots of this contribution are: i) (almost) (100\%)
perfect signal recovery in the high-dimensional (say e.g., (nge 2,000)) regime
given only an information-theoretic limit number of noiseless equations,
namely, (m=2n-1) in the real-valued Gaussian case; and, ii) (nearly) optimal
statistical accuracy in the presence of additive noise of bounded support.
Finally, substantial numerical tests using both synthetic data and real images
corroborate markedly improved signal recovery performance and computational
efficiency of our novel procedure relative to state-of-the-art approaches.
Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Multiagent Systems (cs.MA)
Inspired by previous work on emergent language in referential games, we
propose a novel multi-modal, multi-step referential game, where the sender and
receiver have access to distinct modalities of an object, and their information
exchange is bidirectional and of arbitrary duration. The multi-modal multi-step
setting allows agents to develop an internal language significantly closer to
natural language, in that they share a single set of messages, and that the
length of the conversation may vary according to the difficulty of the task. We
examine these properties empirically using a dataset consisting of images and
textual descriptions of mammals, where the agents are tasked with identifying
the correct object. Our experiments indicate that a robust and efficient
communication protocol emerges, where gradual information exchange informs
better predictions and higher communication bandwidth improves generalization.