IT博客汇 | arXiv Paper Daily: Fri, 17 Feb 2017

arXiv Paper Daily: Fri, 17 Feb 2017

我爱机器学习(52ml.net)发表于 2017-02-17 00:00:00

Neural and Evolutionary Computing

Unbiased Online Recurrent Optimization

Corentin Tallec, Yann Ollivier
Comments: 11 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for
online learning of general recurrent computational graphs such as recurrent
network models. It works in a streaming fashion and avoids backtracking through
past activations and inputs. UORO is a modification of NoBackTrack that
bypasses the need for model sparsity and makes implementation easy in current
deep learning frameworks, even for complex models. Computationally, UORO is as
costly as Truncated Backpropagation Through Time (TBPTT). Contrary to TBPTT,
UORO is guaranteed to provide unbiased gradient estimates, and does not favor
short-term dependencies. The downside is added noise, requiring smaller
learning rates.

On synthetic tasks, UORO is found to overcome several deficiencies of TBPTT.
For instance, when a parameter has a positive short-term but negative long-term
influence, TBPTT may require truncation lengths substantially larger than the
intrinsic temporal range of the interactions, while UORO performs well thanks
to the unbiasedness of its gradients.

Precise Recovery of Latent Vectors from Generative Adversarial Networks

Zachary C. Lipton, Subarna Tripathi
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Generative adversarial networks (GANs) transform latent vectors into visually
plausible images. It is generally thought that the original GAN formulation
gives no out-of-the-box method to reverse the mapping, projecting images back
into latent space. We introduce a simple, gradient-based technique called
stochastic clipping. In experiments, for images generated by the GAN, we
exactly recover their latent vector pre-images 100% of the time. Additional
experiments demonstrate that this method is robust to noise. Finally, we show
that even for unseen images, our method appears to recover unique encodings.

Training Language Models Using Target-Propagation

Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

While Truncated Back-Propagation through Time (BPTT) is the most popular
approach to training Recurrent Neural Networks (RNNs), it suffers from being
inherently sequential (making parallelization difficult) and from truncating
gradient flow between distant time-steps. We investigate whether Target
Propagation (TPROP) style approaches can address these shortcomings.
Unfortunately, extensive experiments suggest that TPROP generally underperforms
BPTT, and we end with an analysis of this phenomenon, and suggestions for
future work.

Computer Vision and Pattern Recognition

Improving Text Proposals for Scene Images with Fully Convolutional Networks

Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov
Comments: 6 pages, 8 figures, International Conference on Pattern Recognition (ICPR) – DLPR (Deep Learning for Pattern Recognition) workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text Proposals have emerged as a class-dependent version of object proposals
– efficient approaches to reduce the search space of possible text object
locations in an image. Combined with strong word classifiers, text proposals
currently yield top state of the art results in end-to-end scene text
recognition. In this paper we propose an improvement over the original Text
Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully
Convolutional Networks to improve the ranking of proposals. Results on the
ICDAR RRC and the COCO-text datasets show superior performance over current
state-of-the-art.

KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors

Amit Kumar, Azadeh Alavi, Rama Chellappa
Comments: Accept as Oral FG’17
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Keypoint detection is one of the most important pre-processing steps in tasks
such as face modeling, recognition and verification. In this paper, we present
an iterative method for Keypoint Estimation and Pose prediction of
unconstrained faces by Learning Efficient H-CNN Regressors (KEPLER) for
addressing the face alignment problem. Recent state of the art methods have
shown improvements in face keypoint detection by employing Convolution Neural
Networks (CNNs). Although a simple feed forward neural network can learn the
mapping between input and output spaces, it cannot learn the inherent
structural dependencies. We present a novel architecture called H-CNN
(Heatmap-CNN) which captures structured global and local features and thus
favors accurate keypoint detecion. HCNN is jointly trained on the visibility,
fiducials and 3D-pose of the face. As the iterations proceed, the error
decreases making the gradients small and thus requiring efficient training of
DCNNs to mitigate this. KEPLER performs global corrections in pose and
fiducials for the first four iterations followed by local corrections in the
subsequent stage. As a by-product, KEPLER also provides 3D pose (pitch, yaw and
roll) of the face accurately. In this paper, we show that without using any 3D
information, KEPLER outperforms state of the art methods for alignment on
challenging datasets such as AFW and AFLW.

Improving automated multiple sclerosis lesion segmentation with a cascaded 3D convolutional neural network approach

Sergi Valverde, Mariano Cabezas, Eloy Roura, Sandra González-Villà, Deborah Pareto, Joan-Carles Vilanova, LLuís Ramió-Torrentà, Àlex Rovira, Arnau Oliver, Xavier Lladó
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we present a novel automated method for White Matter (WM)
lesion segmentation of Multiple Sclerosis (MS) patient images. Our approach is
based on a cascade of two 3D patch-wise convolutional neural networks (CNN).
The first network is trained to be more sensitive revealing possible candidate
lesion voxels while the second network is trained to reduce the number of
misclassified voxels coming from the first network. This cascaded CNN
architecture tends to learn well from small sets of training data, which can be
very interesting in practice, given the difficulty to obtain manual label
annotations and the large amount of available unlabeled Magnetic Resonance
Imaging (MRI) data. We evaluate the accuracy of the proposed method on the
public MS lesion segmentation challenge MICCAI2008 dataset, comparing it with
respect to other state-of-the-art MS lesion segmentation tools. Furthermore,
the proposed method is also evaluated on two private MS clinical datasets,
where the performance of our method is also compared with different recent
public available state-of-the-art MS lesion segmentation methods. At the time
of writing this paper, our method is the best ranked approach on the MICCAI2008
challenge, outperforming the rest of 60 participant methods when using all the
available input modalities (T1-w, T2-w and FLAIR), while still in the top-rank
(3rd position) when using only T1-w and FLAIR modalities. On clinical MS data,
our approach exhibits a significant increase in the accuracy segmenting of WM
lesions when compared with the rest of evaluated methods, highly correlating
((r ge 0.97)) also with the expected lesion volume.

Deep Hybrid Similarity Learning for Person Re-identification

Jianqing Zhu, Huanqiang Zeng, Shengcai Liao, Zhen Lei, Canhui Cai, LiXin Zheng
Comments: 10 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Person Re-IDentification (Re-ID) aims to match person images captured from
two non-overlapping cameras. In this paper, a deep hybrid similarity learning
(DHSL) method for person Re-ID based on a convolution neural network (CNN) is
proposed. In our approach, a CNN learning feature pair for the input image pair
is simultaneously extracted. Then, both the element-wise absolute difference
and multiplication of the CNN learning feature pair are calculated. Finally, a
hybrid similarity function is designed to measure the similarity between the
feature pair, which is realized by learning a group of weight coefficients to
project the element-wise absolute difference and multiplication into a
similarity score. Consequently, the proposed DHSL method is able to reasonably
assign parameters of feature learning and metric learning in a CNN so that the
performance of person Re-ID is improved. Experiments on three challenging
person Re-ID databases, QMUL GRID, VIPeR and CUHK03, illustrate that the
proposed DHSL method is superior to multiple state-of-the-art person Re-ID
methods.

Chord Angle Deviation using Tangent (CADT), an Efficient and Robust Contour-based Corner Detector

Mohammad Asiful Hossain, Abdul Kawsar Tushar
Comments: Conference Name – 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR17); Conference Date – 13 Feb, 2017; Conference Venue – University of Dhaka, Dhaka, Bangladesh
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Detection of corner is the most essential process in a large number of
computer vision and image processing applications. We have mentioned a number
of popular contour-based corner detectors in our paper. Among all these
detectors chord to triangular arm angle (CTAA) has been demonstrated as the
most dominant corner detector in terms of average repeatability. We introduce a
new effective method to calculate the value of curvature in this paper. By
demonstrating experimental results, our proposed technique outperforms CTAA and
other detectors mentioned in this paper. The results exhibit that our proposed
method is simple yet efficient at finding out corners more accurately and
reliably.

Discovering objects and their relations from entangled scene representations

David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia
Comments: ICLR Workshop 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Our world can be succinctly and compactly described as structured scenes of
objects and relations. A typical room, for example, contains salient objects
such as tables, chairs and books, and these objects typically relate to each
other by their underlying causes and semantics. This gives rise to correlated
features, such as position, function and shape. Humans exploit knowledge of
objects and their relations for learning a wide spectrum of tasks, and more
generally when learning the structure underlying observed data. In this work,
we introduce relation networks (RNs) – a general purpose neural network
architecture for object-relation reasoning. We show that RNs are capable of
learning object relations from scene description data. Furthermore, we show
that RNs can act as a bottleneck that induces the factorization of objects from
entangled scene description inputs, and from distributed deep representations
of scene images provided by a variational autoencoder. The model can also be
used in conjunction with differentiable memory mechanisms for implicit relation
discovery in one-shot learning tasks. Our results suggest that relation
networks are a potentially powerful architecture for solving a variety of
problems that require object relation reasoning.

Artificial Intelligence

Reflexive Regular Equivalence for Bipartite Data

Aaron Gerow, Mingyang Zhou, Stan Matwin, Feng Shi
Comments: A condensed version of this paper will appear in Proceedings of the 30th Canadian Conference on Artificial Intelligence, Edmonton, Alberta, Canada
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Bipartite data is common in data engineering and brings unique challenges,
particularly when it comes to clustering tasks that impose on strong structural
assumptions. This work presents an unsupervised method for assessing similarity
in bipartite data. Similar to some co-clustering methods, the method is based
on regular equivalence in graphs. The algorithm uses spectral properties of a
bipartite adjacency matrix to estimate similarity in both dimensions. The
method is reflexive in that similarity in one dimension is used to inform
similarity in the other. Reflexive regular equivalence can also use the
structure of transitivities — in a network sense — the contribution of which
is controlled by the algorithm’s only free-parameter, (alpha). The method is
completely unsupervised and can be used to validate assumptions of
co-similarity, which are required but often untested, in co-clustering
analyses. Three variants of the method with different normalizations are tested
on synthetic data. The method is found to be robust to noise and well-suited to
asymmetric co-similar structure, making it particularly informative for cluster
analysis and recommendation in bipartite data of unknown structure. In
experiments, the convergence and speed of the algorithm are found to be stable
for different levels of noise. Real-world data from a network of malaria genes
are analyzed, where the similarity produced by the reflexive method is shown to
out-perform other measures’ ability to correctly classify genes.

Theoretical and Practical Advances on Smoothing for Extensive-Form Games

Christian Kroer, Kevin Waugh, Fatma Kilinc-Karzan, Tuomas Sandholm
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

Sparse iterative methods, in particular first-order methods, are known to be
among the most effective in solving large-scale two-player zero-sum
extensive-form games. The convergence rates of these methods depend heavily on
the properties of the distance-generating function that they are based on. We
investigate the acceleration of first-order methods for solving extensive-form
games through better design of the dilated entropy function—a class of
distance-generating functions related to the domains associated with the
extensive-form games. By introducing a new weighting scheme for the dilated
entropy function, we develop the first distance-generating function for the
strategy spaces of sequential games that has no dependence on the branching
factor of the player. This result improves the convergence rate of several
first-order methods by a factor of (Omega(b^dd)), where (b) is the branching
factor of the player, and (d) is the depth of the game tree.

Thus far, counterfactual regret minimization methods have been faster in
practice, and more popular, than first-order methods despite their
theoretically inferior convergence rates. Using our new weighting scheme and
practical tuning we show that, for the first time, the excessive gap technique
can be made faster than the fastest counterfactual regret minimization
algorithm, CFR+, in practice.

Efficient Computation of Moments in Sum-Product Networks

Han Zhao, Geoff Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Bayesian online learning algorithms for Sum-Product Networks (SPNs) need to
compute moments of model parameters under the one-step update posterior
distribution. The best existing method for computing such moments scales
quadratically in the size of the SPN, although it scales linearly for trees. We
propose a linear-time algorithm that works even when the SPN is a directed
acyclic graph (DAG). We achieve this goal by reducing the moment computation
problem into a joint inference problem in SPNs and by taking advantage of a
special structure of the one-step update posterior distribution: it is a
multilinear polynomial with exponentially many monomials, and we can evaluate
moments by differentiating. The latter is known as the emph{differential
trick}. We apply the proposed algorithm to develop a linear time assumed
density filter (ADF) for SPN parameter learning. As an additional contribution,
we conduct extensive experiments comparing seven different online learning
algorithms for SPNs on 20 benchmark datasets. The new linear-time ADF method
consistently achieves low runtime due to the efficient linear-time algorithm
for moment computation; however, we discover that two other methods (CCCP and
SMA) typically perform better statistically, while a third (BMM) is comparable
to ADF. Interestingly, CCCP can be viewed as implicitly using the same
differentiation trick that we make explicit here. The fact that two of the top
four fastest methods use this trick suggests that the same trick might find
other uses for SPN learning in the future.

Information Retrieval

Luandri: a Clean Lua Interface to the Indri Search Engine

Bhaskar Mitra, Fernando Diaz, Nick Craswell
Comments: Under review for SIGIR’17
Subjects: Information Retrieval (cs.IR)

In recent years, the information retrieval (IR) community has witnessed the
first successful applications of deep neural network models to short-text
matching and ad-hoc retrieval. It is exciting to see the research on deep
neural networks and IR converge on these tasks of shared interest. However, the
two communities have less in common when it comes to the choice of programming
languages. Indri, an indexing framework popularly used by the IR community, is
written in C++, while Torch, a popular machine learning library for deep
learning, is written in the light-weight scripting language Lua. To bridge this
gap, we introduce Luandri (pronounced “laundry”), a simple interface for
exposing the search capabilities of Indri to Torch models implemented in Lua.

Multimodal Content Representation and Similarity Ranking of Movies

Konstantinos Bougiatiotis
Comments: Preliminary work
Subjects: Information Retrieval (cs.IR)

In this paper we examine the existence of correlation between movie
similarity and low level features from respective movie content. In particular,
we demonstrate the extraction of multi-modal representation models of movies
based on subtitles, audio and metadata mining. We emphasize our research in
topic modeling of movies based on their subtitles. In order to demonstrate the
proposed content representation approach, we have built a small dataset of 160
widely known movies. We assert movie similarities, as propagated by the
singular modalities and fusion models, in the form of recommendation rankings.
We showcase a novel topic model browser for movies that allows for exploration
of the different aspects of similarities between movies and an information
retrieval system for movie similarity based on multi-modal content.

Computation and Language

Addressing the Data Sparsity Issue in Neural AMR Parsing

Xiaochang Peng, Chuan Wang, Daniel Gildea, Nianwen Xue
Comments: Accepted by EACL-17
Subjects: Computation and Language (cs.CL)

Neural attention models have achieved great success in different NLP tasks.
How- ever, they have not fulfilled their promise on the AMR parsing task due to
the data sparsity issue. In this paper, we de- scribe a sequence-to-sequence
model for AMR parsing and present different ways to tackle the data sparsity
problem. We show that our methods achieve significant improvement over a
baseline neural atten- tion model and our results are also compet- itive
against state-of-the-art systems that do not use extra linguistic resources.

Fast and unsupervised methods for multilingual cognate clustering

Taraka Rama, Johannes Wahle, Pavel Sofroniev, Gerhard Jäger
Subjects: Computation and Language (cs.CL)

In this paper we explore the use of unsupervised methods for detecting
cognates in multilingual word lists. We use online EM to train sound segment
similarity weights for computing similarity between two words. We tested our
online systems on geographically spread sixteen different language groups of
the world and show that the Online PMI system (Pointwise Mutual Information)
outperforms a HMM based system and two linguistically motivated systems:
LexStat and ALINE. Our results suggest that a PMI system trained in an online
fashion can be used by historical linguists for fast and accurate
identification of cognates in not so well-studied language families.

An Analysis of Machine Learning Intelligence

John P. Lalor, Hao Wu, Tsendsuren Munkhdalai, Hong Yu
Comments: 10 pages, 2 figures, 3 tables
Subjects: Computation and Language (cs.CL)

Deep neural networks (DNNs) have set state of the art results in many machine
learning and NLP tasks. However, we do not have a strong understanding of what
DNN models learn. In this paper, we examine learning in DNNs through analysis
of their outputs. We compare DNN performance directly to a human population,
and use characteristics of individual data points such as difficulty to see how
well models perform on easy and hard examples. We investigate how training size
and the incorporation of noise affect a DNN’s ability to generalize and learn.
Our experiments show that unlike traditional machine learning models (e.g.,
Naive Bayes, Decision Trees), DNNs exhibit human-like learning properties. As
they are trained with more data, they are more able to distinguish between easy
and difficult items, and performance on easy items improves at a higher rate
than difficult items. We find that different DNN models exhibit different
strengths in learning and are robust to noise in training data.

Training Language Models Using Target-Propagation

While Truncated Back-Propagation through Time (BPTT) is the most popular
approach to training Recurrent Neural Networks (RNNs), it suffers from being
inherently sequential (making parallelization difficult) and from truncating
gradient flow between distant time-steps. We investigate whether Target
Propagation (TPROP) style approaches can address these shortcomings.
Unfortunately, extensive experiments suggest that TPROP generally underperforms
BPTT, and we end with an analysis of this phenomenon, and suggestions for
future work.

Distributed, Parallel, and Cluster Computing

Integration of QoS aspects in the Cloud Computing Research and Selection System

Manar Abourezq, Abdellah Idrissi
Journal-ref: International Journal of Advanced Computer Science and Application
(IJACSA) Vol. 6, No. 6, 2015
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud Computing is a business model revolution more than a technological one.
It capitalized on various technologies that have proved themselves and reshaped
the use of computers by replacing their local use by a centralized one where
shared resources are stored and managed by a third-party in a way transparent
to end-users. With this new use came new needs and one of them is the need to
search through Cloud services and select the ones that meet certain
requirements. To address this need, we have developed, in a previous work, the
Cloud Service Research and Selection System (CSRSS) which aims to allow Cloud
users to search through Cloud services in the database and find the ones that
match their requirements. It is based on the Skyline and ELECTRE IS. In this
paper, we improve the system by introducing 7 new dimensions related to QoS
constraints. Our work’s main contribution is conceiving an Agent that uses both
the Skyline and an outranking method, called ELECTREIsSkyline, to determine
which Cloud services meet better the users’ requirements while respecting QoS
properties. We programmed and tested this method for a total of 10 dimensions
and for 50 000 cloud services. The first results are very promising and show
the effectiveness of our approach.

Ignore or Comply? On Breaking Symmetry in Consensus

Petra Berenbrink, Andrea Clementi, Robert Elsässer, Peter Kling, Frederik Mallmann-Trenn, Emanuele Natale
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We study consensus processes on the complete graph of (n) nodes. Initially,
each node supports one from up to n opinions. Nodes randomly and in parallel
sample the opinions of constant many nodes. Based on these samples, they use an
update rule to change their own opinion.

The goal is to reach consensus, a configuration where all nodes support the
same opinion. We compare two well-known update rules: 2-Choices and 3-Majority.
In the former, each node samples two nodes and adopts their opinion if they
agree. In the latter, each node samples three nodes: If an opinion is supported
by at least two samples the node adopts it, otherwise it randomly adopts one of
the sampled opinions. Known results for these update rules focus on initial
configurations with a limited number of colors (say (n^{1/3}) ), or typically
assume a bias, where one opinion has a much larger support than any other. For
such biased configurations, the time to reach consensus is roughly the same for
2-Choices and 3-Majority.

Interestingly, we prove that this is no longer true for configurations with a
large number of initial colors. In particular, we show that 3-Majority reaches
consensus with high probability in (O(n^{3/4}log^{7/8}n)) rounds, while
2-Choices can need (Omega(n/log n)) rounds. We thus get the first
unconditional sublinear bound for 3-Majority and the first result separating
the consensus time of these processes. Along the way, we develop a framework
that allows a fine-grained comparison between consensus processes from a
specific class. We believe that this framework might help to classify the
performance of more consensus processes.

Proust: A Design Space for Highly-Concurrent Transactional Data Structures

Thomas D. Dickerson, Paul Gazzillo, Eric Koskinen, Maurice Herlihy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Most STM systems are poorly equipped to support libraries of concurrent data
structures. One reason is that they typically detect conflicts by tracking
transactions’ read sets and write sets, an approach that often leads to false
conflicts. A second is that existing data structures and libraries often need
to be rewritten from scratch to support transactional conflict detection and
rollback. This paper introduces Proust, a framework for the design and
implementation of transactional data structures. Proust is designed to maximize
re-use of existing well-engineered by providing transactional “wrappers” to
make existing thread-safe concurrent data structures transactional. Proustian
objects are also integrated with an underling STM system, allowing them to take
advantage of well-engineered STM conflict detection mechanisms. Proust
generalizes and unifies prior approaches such as boosting and predication.

Coded TeraSort

Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali, A. Salman Avestimehr
Comments: to appear in proceedings of 2017 International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

We focus on sorting, which is the building block of many machine learning
algorithms, and propose a novel distributed sorting algorithm, named Coded
TeraSort, which substantially improves the execution time of the TeraSort
benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose
structured redundancy in data, in order to enable in-network coding
opportunities that overcome the data shuffling bottleneck of TeraSort. We
empirically evaluate the performance of CodedTeraSort algorithm on Amazon EC2
clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with
TeraSort, for typical settings of interest.

An Efficient Parallel Data Clustering Algorithm Using Isoperimetric Number of Trees

Ramin Javadi, Saleh Ashkboos
Comments: 16 pages, 6 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We propose a parallel graph-based data clustering algorithm using CUDA GPU,
based on exact clustering of the minimum spanning tree in terms of a minimum
isoperimetric criteria. We also provide a comparative performance analysis of
our algorithm with other related ones which demonstrates the general
superiority of this parallel algorithm over other competing algorithms in terms
of accuracy and speed.

Learning

Discovering objects and their relations from entangled scene representations

Our world can be succinctly and compactly described as structured scenes of
objects and relations. A typical room, for example, contains salient objects
such as tables, chairs and books, and these objects typically relate to each
other by their underlying causes and semantics. This gives rise to correlated
features, such as position, function and shape. Humans exploit knowledge of
objects and their relations for learning a wide spectrum of tasks, and more
generally when learning the structure underlying observed data. In this work,
we introduce relation networks (RNs) – a general purpose neural network
architecture for object-relation reasoning. We show that RNs are capable of
learning object relations from scene description data. Furthermore, we show
that RNs can act as a bottleneck that induces the factorization of objects from
entangled scene description inputs, and from distributed deep representations
of scene images provided by a variational autoencoder. The model can also be
used in conjunction with differentiable memory mechanisms for implicit relation
discovery in one-shot learning tasks. Our results suggest that relation
networks are a potentially powerful architecture for solving a variety of
problems that require object relation reasoning.

Reflexive Regular Equivalence for Bipartite Data

Bipartite data is common in data engineering and brings unique challenges,
particularly when it comes to clustering tasks that impose on strong structural
assumptions. This work presents an unsupervised method for assessing similarity
in bipartite data. Similar to some co-clustering methods, the method is based
on regular equivalence in graphs. The algorithm uses spectral properties of a
bipartite adjacency matrix to estimate similarity in both dimensions. The
method is reflexive in that similarity in one dimension is used to inform
similarity in the other. Reflexive regular equivalence can also use the
structure of transitivities — in a network sense — the contribution of which
is controlled by the algorithm’s only free-parameter, (alpha). The method is
completely unsupervised and can be used to validate assumptions of
co-similarity, which are required but often untested, in co-clustering
analyses. Three variants of the method with different normalizations are tested
on synthetic data. The method is found to be robust to noise and well-suited to
asymmetric co-similar structure, making it particularly informative for cluster
analysis and recommendation in bipartite data of unknown structure. In
experiments, the convergence and speed of the algorithm are found to be stable
for different levels of noise. Real-world data from a network of malaria genes
are analyzed, where the similarity produced by the reflexive method is shown to
out-perform other measures’ ability to correctly classify genes.

Learning to Use Learners' Advice

Adish Singla, Hamed Hassani, Andreas Krause
Subjects: Learning (cs.LG)

In this paper, we study a variant of the framework of online learning using
expert advice with limited/bandit feedback—we consider each expert a learning
entity and thereby capture scenarios that are more realistic and practical for
real-world applications. In our setting, the feedback at any time (t) is
limited in a sense that it is only available to the expert (i^t) that has been
selected by the central algorithm (forecaster), i.e., only the expert (i^t)
receives feedback from the environment and gets to learn at time (t). We
consider a generic black-box approach whereby the forecaster doesn’t control or
know the learning dynamics of the experts apart from knowing the following
no-regret learning property: the average regret of any expert (j) vanishes at a
rate of at least (O(t_j^{eta-1})) with (t_j) learning steps where (eta in
[0, 1]) is a parameter. We prove the following hardness result: without any
coordination between the forecaster and the experts, it is impossible to design
a forecaster achieving no-regret guarantees in the worst-case. In order to
circumvent this hardness result, we consider a practical assumption allowing
the forecaster to “guide” the learning process of the experts by
filtering/blocking some of the feedbacks observed by them from the environment,
i.e., not allowing the selected expert (i^t) to learn at time (t) for some time
steps. Then, we design a novel no-regret learning algorithm algo for this
problem setting by carefully guiding the feedbacks observed by experts. We
prove that algo achieves the worst-case expected cumulative regret of
(O(T^frac{1}{2 – eta})) after (T) time steps and matches the regret bound of
(Theta(T^frac{1}{2})) for the special case of multi-armed bandits.

Precise Recovery of Latent Vectors from Generative Adversarial Networks

Zachary C. Lipton, Subarna Tripathi
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Generative adversarial networks (GANs) transform latent vectors into visually
plausible images. It is generally thought that the original GAN formulation
gives no out-of-the-box method to reverse the mapping, projecting images back
into latent space. We introduce a simple, gradient-based technique called
stochastic clipping. In experiments, for images generated by the GAN, we
exactly recover their latent vector pre-images 100% of the time. Additional
experiments demonstrate that this method is robust to noise. Finally, we show
that even for unseen images, our method appears to recover unique encodings.

Efficient Computation of Moments in Sum-Product Networks

Han Zhao, Geoff Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Bayesian online learning algorithms for Sum-Product Networks (SPNs) need to
compute moments of model parameters under the one-step update posterior
distribution. The best existing method for computing such moments scales
quadratically in the size of the SPN, although it scales linearly for trees. We
propose a linear-time algorithm that works even when the SPN is a directed
acyclic graph (DAG). We achieve this goal by reducing the moment computation
problem into a joint inference problem in SPNs and by taking advantage of a
special structure of the one-step update posterior distribution: it is a
multilinear polynomial with exponentially many monomials, and we can evaluate
moments by differentiating. The latter is known as the emph{differential
trick}. We apply the proposed algorithm to develop a linear time assumed
density filter (ADF) for SPN parameter learning. As an additional contribution,
we conduct extensive experiments comparing seven different online learning
algorithms for SPNs on 20 benchmark datasets. The new linear-time ADF method
consistently achieves low runtime due to the efficient linear-time algorithm
for moment computation; however, we discover that two other methods (CCCP and
SMA) typically perform better statistically, while a third (BMM) is comparable
to ADF. Interestingly, CCCP can be viewed as implicitly using the same
differentiation trick that we make explicit here. The fact that two of the top
four fastest methods use this trick suggests that the same trick might find
other uses for SPN learning in the future.

Unbiased Online Recurrent Optimization

Corentin Tallec, Yann Ollivier
Comments: 11 pages, 5 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for
online learning of general recurrent computational graphs such as recurrent
network models. It works in a streaming fashion and avoids backtracking through
past activations and inputs. UORO is a modification of NoBackTrack that
bypasses the need for model sparsity and makes implementation easy in current
deep learning frameworks, even for complex models. Computationally, UORO is as
costly as Truncated Backpropagation Through Time (TBPTT). Contrary to TBPTT,
UORO is guaranteed to provide unbiased gradient estimates, and does not favor
short-term dependencies. The downside is added noise, requiring smaller
learning rates.

On synthetic tasks, UORO is found to overcome several deficiencies of TBPTT.
For instance, when a parameter has a positive short-term but negative long-term
influence, TBPTT may require truncation lengths substantially larger than the
intrinsic temporal range of the interactions, while UORO performs well thanks
to the unbiasedness of its gradients.

Generalizing Jensen and Bregman divergences with comparative convexity and the statistical Bhattacharyya distances with comparable means

Frank Nielsen, Richard Nock
Comments: 19 pages
Subjects: Information Theory (cs.IT); Learning (cs.LG)

Comparative convexity is a generalization of convexity relying on abstract
notions of means. We define the Jensen divergence and the Jensen diversity from
the viewpoint of comparative convexity, and show how to obtain the generalized
Bregman divergences as limit cases of skewed Jensen divergences. In particular,
we report explicit formula of these generalized Bregman divergences when
considering quasi-arithmetic means. Finally, we introduce a generalization of
the Bhattacharyya statistical distances based on comparative means using
relative convexity.

Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

Shusen Wang, Alex Gittens, Michael W. Mahoney
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Numerical Analysis (cs.NA)

We address the statistical and optimization impacts of using classical or
Hessian sketch to approximately solve the Matrix Ridge Regression (MRR)
problem. Prior research has considered the effects of classical sketch on least
squares regression (LSR), a strictly simpler problem. We establish that
classical sketch has a similar effect upon the optimization properties of MRR
as it does on those of LSR — namely, it recovers nearly optimal solutions. In
contrast, Hessian sketch does not have this guarantee; instead, the
approximation error is governed by a subtle interplay between the “mass” in the
responses and the optimal objective value.

For both types of approximations, the regularization in the sketched MRR
problem gives it significantly different statistical properties from the
sketched LSR problem. In particular, there is a bias-variance trade-off in
sketched MRR that is not present in sketched LSR. We provide upper and lower
bounds on the biases and variances of sketched MRR; these establish that the
variance is significantly increased when classical sketch is used, while the
bias is significantly increased when using Hessian sketches. Empirically,
sketched MRR solutions can have risks that are higher by an order-of-magnitude
than those of the optimal MRR solutions.

We establish theoretically and empirically that model averaging greatly
decreases this gap. Thus, in the distributed setting, sketching combined with
model averaging is a powerful technique for quickly obtaining near-optimal
solutions to the MRR problem greatly mitigating the statistical losses incurred
by sketching.

Dynamic Partition Models

Marc Goessling, Yali Amit
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We present a new approach for learning compact and intuitive distributed
representations with binary encoding. Rather than summing up expert votes as in
products of experts, we employ for each variable the opinion of the most
reliable expert. Data points are hence explained through a partitioning of the
variables into expert supports. The partitions are dynamically adapted based on
which experts are active. During the learning phase we adopt a smoothed version
of this model that uses separate mixtures for each data dimension. In our
experiments we achieve accurate reconstructions of high-dimensional data points
with at most a dozen experts.

Training Language Models Using Target-Propagation

While Truncated Back-Propagation through Time (BPTT) is the most popular
approach to training Recurrent Neural Networks (RNNs), it suffers from being
inherently sequential (making parallelization difficult) and from truncating
gradient flow between distant time-steps. We investigate whether Target
Propagation (TPROP) style approaches can address these shortcomings.
Unfortunately, extensive experiments suggest that TPROP generally underperforms
BPTT, and we end with an analysis of this phenomenon, and suggestions for
future work.

Information Theory

Binary, Shortened Projective Reed Muller Codes for Coded Private Information Retrieval

Myna Vajha, Vinayak Ramkumar, P. Vijay Kumar
Comments: submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

The notion of a Private Information Retrieval (PIR) code was recently
introduced by Fazeli, Vardy and Yaakobi who showed that this class of codes
permit PIR at reduced levels of storage overhead in comparison with
replicated-server PIR. In the present paper, the construction of an ((n,k))
( au)-server binary, linear PIR code having parameters (n = sumlimits_{i =
0}^{ell} {m choose i}), (k = {m choose ell}) and ( au = 2^{ell}) is
presented. These codes are obtained through homogeneous-polynomial evaluation
and correspond to the binary, Projective Reed Muller (PRM) code. The
construction can be extended to yield PIR codes for any ( au) of the form
(2^{ell}), (2^{ell}-1) and any value of (k), through a combination of
single-symbol puncturing and shortening of the PRM code. Each of these code
constructions above, have smaller storage overhead in comparison with other PIR
codes appearing in the literature.

For the particular case of ( au=3,4), we show that the codes constructed
here are optimal, systematic PIR codes by providing an improved lower bound on
the block length (n(k, au)) of a systematic PIR code. It follows from a
result by Vardy and Yaakobi, that these codes also yield optimal, systematic
primitive multi-set ((n, k, au)_B) batch codes for ( au=3,4). The PIR code
constructions presented here also yield upper bounds on the generalized Hamming
weights of binary PRM codes.

Cache-Aided Full-Duplex Small Cells

Marco Maso, Italo Atzeni, Imène Ghamnia, Ejder Baştuğ, Mérouane Debbah
Comments: Submitted to CCDWN Workshop, International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt) 2017
Subjects: Information Theory (cs.IT)

Caching popular contents at the edge of the network can positively impact the
performance and future sustainability of wireless networks in several ways,
e.g., end-to-end access delay reduction and peak rate increase. In this paper,
we aim at showing that non-negligible performance enhancements can be observed
in terms of network interference footprint as well. To this end, we consider a
full-duplex small-cell network consisting of non-cooperative cache-aided base
stations, which communicate simultaneously with both downlink users and
wireless backhaul nodes. We propose a novel caching model seeking to mimic a
geographical policy based on local files popularity and calculate the
corresponding cache hit probability. Subsequently we study the performance of
the considered network in terms of throughput gain with respect to its
cache-free half-duplex counterpart. Numerical results corroborate our
theoretical findings and highlight remarkable performance gains when moving
from cache-free to cache-aided full-duplex small-cell networks.

A Universal Sampling Framework for Solving Physics-driven Inverse Source Problems

John Murray-Bruce, Pier Luigi Dragotti
Subjects: Information Theory (cs.IT); Analysis of PDEs (math.AP)

Partial differential equations are central to describing many physical
phenomena. Moreover in many applications these phenomena are observed through a
sensor network, with the aim of inferring its underlying properties. Here we
present a new framework for analysing fields governed by linear partial
differential equations. The framework leverages from certain results in
sampling and approximation theory to provide a unifying approach for solving a
class of inverse source problems. Specifically, we show that the unknown field
sources can be recovered from a sequence of so called generalised measurements
using multidimensional frequency estimation techniques. Moreover, we show that
this sequence of generalised measurements for our physics-driven fields, can be
computed by taking linear weighted-sums of the sensor measurements, whereby the
exact weights (of the sums) coincide with those that can reproduce
multidimensional exponentials from linearly combined translates of a particular
prototype function. The prototype function and the desired weights are shown to
depend on the Green’s function of the underlying field. Based on this new
framework we develop practical, noise robust, sensor network strategies for
solving the inverse source problem, and then present numerical simulation
results to verify their performance.

Cross-Mode Interference Characterization in Cellular Networks with Voronoi Guard Regions

Stelios Stefanatos, Antonis G. Gotsis, Angeliki Alexiou
Comments: 12 pages (two-column format); Submitted
Subjects: Information Theory (cs.IT)

Advances in cellular networks such as device-to-device communications and
full-duplex radios, as well as the inherent elimination of intra-cell
interference achieved by network-controlled multiple access schemes, motivates
the investigation of the cross-mode interference under a guard region
corresponding to the Voronoi cell of an access point. By modeling the positions
of interfering access points (APs) and user equipments (UEs) as Poisson
distributed, analytical expressions for the statistics of the interference
generated by either APs or UEs are obtained based on appropriately defined
density functions. The considered system model and analysis are general enough
to capture many operational scenarios of practical interest, including
conventional downlink/uplink transmissions with nearest AP association, as well
as transmissions where not both communicating nodes lie within the cell.
Analysis provides insights on the level of protection offered by a Voronoi
guard region and its dependence on type of interference and receiver position.
Various examples demonstrate the validity/accuracy of the analysis in obtaining
the system coverage probability for operational scenarios of practical
interest.

Sensor management with time, energy and communication constraints

Cristian Rusu, John Thompson, Neil M. Robertson
Subjects: Information Theory (cs.IT)

In this paper we present new algorithms and analysis for the linear inverse
sensor placement and scheduling problems over multiple time instances with
power and communications constraints. The proposed algorithms, which deal
directly with minimizing the mean squared and worst case errors (MSE and WCE),
are based on the convex relaxation approach to address the binary optimization
scheduling problems that are formulated in sensor network scenarios. We propose
to balance the energy and communications demands of operating a network of
sensors over time while we still guarantee a minimum level of estimation
accuracy. We measure this accuracy by the MSE and WCE for which we provide
tight average case and lower bounds analyses. We show experimentally how the
proposed algorithm perform against state-of-the-art methods previously
described in the literature.

Compressed sensing in Hilbert spaces

Yann Traonmilin (PANAMA), Gilles Puy, Rémi Gribonval (PANAMA), Mike Davies
Subjects: Information Theory (cs.IT)

In many linear inverse problems, we want to estimate an unknown vector
belonging to a high-dimensional (or infinite-dimensional) space from few linear
measurements. To overcome the ill-posed nature of such problems, we use a
low-dimension assumption on the unknown vector: it belongs to a low-dimensional
model set. The question of whether it is possible to recover such an unknown
vector from few measurements then arises. If the answer is yes, it is also
important to be able to describe a way to perform such a recovery. We describe
a general framework where appropriately chosen random measurements guarantee
that recovery is possible. We further describe a way to study the performance
of recovery methods that consist in the minimization of a regularization
function under a data-fit constraint.

Polar codes with a stepped boundary

Ilya Dumer
Comments: This article has been submitted to ISIT 2017
Subjects: Information Theory (cs.IT)

We consider explicit polar constructions of blocklength (n
ightarrowinfty)
for the two extreme cases of code rates (R
ightarrow1) and (R
ightarrow0.)
For code rates (R
ightarrow1,) we design codes with complexity order of (nlog
n) in code construction, encoding, and decoding. These codes achieve the
vanishing output bit error rates on the binary symmetric channels with any
transition error probability (p
ightarrow 0) and perform this task with a
substantially smaller redundancy ((1-R)n) than do other known high-rate codes,
such as BCH codes or Reed-Muller (RM). We then extend our design to the
low-rate codes that achieve the vanishing output error rates with the same
complexity order of (nlog n) and an asymptotically optimal code rate
(R
ightarrow0) for the case of (p
ightarrow1/2.)

Generalizing Jensen and Bregman divergences with comparative convexity and the statistical Bhattacharyya distances with comparable means

Frank Nielsen, Richard Nock
Comments: 19 pages
Subjects: Information Theory (cs.IT); Learning (cs.LG)

Comparative convexity is a generalization of convexity relying on abstract
notions of means. We define the Jensen divergence and the Jensen diversity from
the viewpoint of comparative convexity, and show how to obtain the generalized
Bregman divergences as limit cases of skewed Jensen divergences. In particular,
we report explicit formula of these generalized Bregman divergences when
considering quasi-arithmetic means. Finally, we introduce a generalization of
the Bhattacharyya statistical distances based on comparative means using
relative convexity.

Improved Converses and Gap-Results for Coded Caching

Chien-Yi Wang, Shirin Saeedi Bidokhti, Michele Wigger
Subjects: Information Theory (cs.IT)

Improved lower bounds on the average and the worst-case rate-memory tradeoffs
for the Maddah-Ali&Niesen coded caching scenario are presented. For any number
of users and files and for arbitrary cache sizes, the multiplicative gap
between the exact rate-memory tradeoff and the new lower bound is less than
2.315 in the worst-case scenario and less than 2.507 in the average-case
scenario.

ns-3 Implementation of the 3GPP MIMO Channel Model for Frequency Spectrum above 6 GHz

Menglei Zhang, Michele Polese, Marco Mezzavilla, Sundeep Rangan, Michele Zorzi
Subjects: Information Theory (cs.IT)

Communications at mmWave frequencies will be a key enabler of the next
generation of cellular networks, due to the multi-Gbps rate that can be
achieved. However, there are still several problems that must be solved before
this technology can be widely adopted, primarily associated with the interplay
between the variability of mmWave links and the complexity of mobile networks.
An end-to-end network simulator represents a great tool to assess the
performance of any proposed solution to meet the stringent 5G requirements.
Given the criticality of channel propagation characteristics at higher
frequencies, we present our implementation of the 3GPP channel model for the
6-100 GHz band for the ns-3 end-to-end 5G mmWave module, and detail its
associated MIMO beamforming architecture.

An Equivalence Between Secure Network and Index Coding

Lawrence Ong, Badri N. Vellambi, Jörg Kliewer, Phee Lep Yeoh
Journal-ref: Proceedings of the 2016 IEEE Globecom Workshop on Network Coding
and Applications (NetCod), Washington, USA, Dec. 4-8, 2016
Subjects: Information Theory (cs.IT)

We extend the equivalence between network coding and index coding by Effros,
El Rouayheb, and Langberg to the secure communication setting in the presence
of an eavesdropper. Specifically, we show that the most general versions of
secure network-coding setup by Chan and Grant and the secure index-coding setup
by Dau, Skachek, and Chee, which also include the randomised encoding setting,
are equivalent.

Achievable information rates estimates in optically-amplified transmission systems using nonlinearity compensation and probabilistic shaping

Daniel Semrau, Tianhua Xu, Nikita A. Shevchenko, Milen Paskov, Alex Alvarado, Robert I. Killey, Polina Bayvel
Journal-ref: Optics Letters, Vol. 42, No. 1, 121-124 (2017)
Subjects: Information Theory (cs.IT)

Achievable information rates (AIRs) of wideband optical communication systems
using ~40 nm (~5 THz) EDFA and ~100 nm (~12.5 THz) distributed Raman
amplification are estimated based on a first-order perturbation analysis. The
AIRs of each individual channel have been evaluated for DP-64QAM, DP-256QAM,
and DP-1024QAM modulation formats. The impact of full-field nonlinear
compensation (FF-NLC) and probabilistically shaped constellations using a
Maxwell-Boltzmann distribution were studied and compared to electronic
dispersion compensation. It is found that a probabilistically shaped DP-1024QAM
constellation combined with FF-NLC yields AIRs of ~75 Tbit/s for the EDFA
scheme and ~223 Tbit/s for the Raman amplification scheme over 2000 km standard
single mode fibre transmission.

Gaussian and Sparse Processes Are Limits of Generalized Poisson Processes

Julien Fageot, Virginie Uhlmann, Michael Unser
Comments: 16 pages, 11 figures
Subjects: Probability (math.PR); Information Theory (cs.IT)

The theory of sparse stochastic processes offers a broad class of statistical
models to study signals. In this framework, signals are represented as
realizations of random processes that are solution of linear stochastic
differential equations driven by white L’evy noises. Among these processes,
generalized Poisson processes based on compound-Poisson noises admit an
interpretation as random L-splines with random knots and weights. We
demonstrate that every generalized L’evy process-from Gaussian to sparse-can
be understood as the limit in law of a sequence of generalized Poisson
processes. This enables a new conceptual understanding of sparse processes and
suggests simple algorithms for the numerical generation of such objects.

Performance Analysis of Dense Small Cell Networks with Generalized Fading

Bin Yang, Ming Ding, Guoqiang Mao, Xiaohu Ge
Comments: 3 figures. Accepted by ICC 2017
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

In this paper, we propose a unified framework to analyze the performance of
dense small cell networks (SCNs) in terms of the coverage probability and the
area spectral efficiency (ASE). In our analysis, we consider a practical path
loss model that accounts for both non-line-of-sight (NLOS) and line-of-sight
(LOS) transmissions. Furthermore, we adopt a generalized fading model, in which
Rayleigh fading, Rician fading and Nakagami-m fading can be treated in a
unified framework. The analytical results of the coverage probability and the
ASE are derived, using a generalized stochastic geometry analysis. Different
from existing work that does not differentiate NLOS and LOS transmissions, our
results show that NLOS and LOS transmissions have a significant impact on the
coverage probability and the ASE performance, particularly when the SCNs grow
dense. Furthermore, our results establish for the first time that the
performance of the SCNs can be divided into four regimes, according to the
intensity (aka density) of BSs, where in each regime the performance is
dominated by different factors.

Coded TeraSort

We focus on sorting, which is the building block of many machine learning
algorithms, and propose a novel distributed sorting algorithm, named Coded
TeraSort, which substantially improves the execution time of the TeraSort
benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose
structured redundancy in data, in order to enable in-network coding
opportunities that overcome the data shuffling bottleneck of TeraSort. We
empirically evaluate the performance of CodedTeraSort algorithm on Amazon EC2
clusters, and demonstrate that it achieves 1.97x – 3.39x speedup, compared with
TeraSort, for typical settings of interest.