IT博客汇 | arXiv Paper Daily: Mon, 3 Oct 2016

arXiv Paper Daily: Mon, 3 Oct 2016

我爱机器学习(52ml.net)发表于 2016-10-03 00:00:00

Computer Vision and Pattern Recognition

Latent fingerprint minutia extraction using fully convolutional network

Yao Tang, Fei Gao, Jufu Feng
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Minutiae have played an important role in fingerprint identification.
Extracting effective minutiae is difficult for latent fingerprints which are
usually of poor quality. Instead of conventional hand-craft features, a fully
convolutional network(FCN) is learned end-to-end to extract minutiae from raw
fingerprints in pixel level. FCN is used to map raw fingerprints to a
correspondingly-sized minutia-score map with a fixed stride. And thus a large
number of minutiae will be extracted through a given thresh. Then small regions
centering at these minutia points are put into a convolutional neural
network(CNN) to reclassify these minutiae and calculate their orientations. The
CNN shares the convolutional layers with the fully convolutional network to
speed up. For the VGG model~cite{simonyan2014very}, 0.45 second is used on
average to detect one fingerprint on a GPU. On the NIST SD27 database we
achieve 53\% recall rate and 53\% precise rate that beats many other
algorithms. Our trained model is also visualized to see that we have
successfully extracted features preserving ridge information of a latent
fingerprint.

A deep representation for depth images from synthetic data

Fabio Maria Carlucci, Paolo Russo, Barbara Caputo
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional Neural Networks (CNNs) trained on large scale RGB databases
have become the secret sauce in the majority of recent approaches for object
categorization from RGB-D data. Thanks to colorization techniques, these
methods exploit the filters learned from 2D images to extract meaningful
representations in 2.5D. Still, the perceptual signature of these two kind of
images is very different, with the first usually strongly characterized by
textures, and the second mostly by silhouettes of objects. Ideally, one would
like to have two CNNs, one for RGB and one for depth, each trained on a
suitable data collection, able to capture the perceptual properties of each
channel for the task at hand. This has not been possible so far, due to the
lack of a suitable depth database. This paper addresses this issue, proposing
to opt for synthetically generated images rather than collecting by hand a 2.5D
large scale database. While being clearly a proxy for real data, synthetic
images allow to trade quality for quantity, making it possible to generate a
virtually infinite amount of data. We show that the filters learned from such
data collection, using the very same architecture typically used on visual
data, learns very different filters, resulting in depth features (a) able to
better characterize the different facets of depth images, and (b) complementary
with respect to those derived from CNNs pre-trained on 2D datasets. Experiments
on two publicly available databases show the power of our approach.

Training a Feedback Loop for Hand Pose Estimation

Markus Oberweger, Paul Wohlhart, Vincent Lepetit
Comments: Presented at ICCV 2015 (oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose an entirely data-driven approach to estimating the 3D pose of a
hand given a depth image. We show that we can correct the mistakes made by a
Convolutional Neural Network trained to predict an estimate of the 3D pose by
using a feedback loop. The components of this feedback loop are also Deep
Networks, optimized using training data. They remove the need for fitting a 3D
model to the input data, which requires both a carefully designed fitting
function and algorithm. We show that our approach outperforms state-of-the-art
methods, and is efficient as our implementation runs at over 400 fps on a
single GPU.

Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

Roberto DiCecco, Griffin Lacey, Jasmina Vasiljevic, Paul Chow, Graham Taylor, Shawki Areibi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

Convolutional Neural Networks (CNNs) have gained significant traction in the
field of machine learning, particularly due to their high accuracy in visual
recognition. Recent works have pushed the performance of GPU implementations of
CNNs to significantly improve their classification and training times. With
these improvements, many frameworks have become available for implementing CNNs
on both CPUs and GPUs, with no support for FPGA implementations. In this work
we present a modified version of the popular CNN framework Caffe, with FPGA
support. This allows for classification using CNN models and specialized FPGA
implementations with the flexibility of reprogramming the device when
necessary, seamless memory transactions between host and device, simple-to-use
test benches, and the ability to create pipelined layer implementations. To
validate the framework, we use the Xilinx SDAccel environment to implement an
FPGA-based Winograd convolution engine and show that the FPGA layer can be used
alongside other layers running on a host processor to run several popular CNNs
(AlexNet, GoogleNet, VGG A, Overfeat). The results show that our framework
achieves 50 GFLOPS across 3×3 convolutions in the benchmarks. This is achieved
within a practical framework, which will aid in future development of
FPGA-based CNNs.

A CNN Cascade for Landmark Guided Semantic Part Segmentation

Aaron Jackson, Michel Valstar, Georgios Tzimiropoulos
Comments: accepted to Geometry Meets Deep Learning ECCV 2016 Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a CNN cascade for semantic part segmentation guided by
pose-specific information encoded in terms of a set of landmarks (or
keypoints). There is large amount of prior work on each of these tasks
separately, yet, to the best of our knowledge, this is the first time in
literature that the interplay between pose estimation and semantic part
segmentation is investigated. To address this limitation of prior work, in this
paper, we propose a CNN cascade of tasks that firstly performs landmark
localisation and then uses this information as input for guiding semantic part
segmentation. We applied our architecture to the problem of facial part
segmentation and report large performance improvement over the standard
unguided network on the most challenging face datasets. Testing code and models
will be published online at this http URL

Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge

Adrian Bulat, Georgios Tzimiropoulos
Comments: Winner of 3D Face Alignment in the Wild (3DFAW) Challenge, ECCV 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper describes our submission to the 1st 3D Face Alignment in the Wild
(3DFAW) Challenge. Our method builds upon the idea of convolutional part
heatmap regression [1], extending it for 3D face alignment. Our method
decomposes the problem into two parts: (a) X,Y (2D) estimation and (b) Z
(depth) estimation. At the first stage, our method estimates the X,Y
coordinates of the facial landmarks by producing a set of 2D heatmaps, one for
each landmark, using convolutional part heatmap regression. Then, these
heatmaps, alongside the input RGB image, are used as input to a very deep
subnetwork trained via residual learning for regressing the Z coordinate. Our
method ranked 1st in the 3DFAW Challenge, surpassing the second best result by
more than 22%.

Digitizing Municipal Street Inspections Using Computer Vision

Varun Adibhatla (ARGO Labs), Shi Fan (NYU Center for Data Science), Krystof Litomisky (ARGO Labs), Patrick Atwater (ARGO Labs)
Comments: Presented at the Data For Good Exchange 2016
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV)

“People want an authority to tell them how to value things. But they chose
this authority not based on facts or results. They chose it because it seems
authoritative and familiar.” – The Big Short

The pavement condition index is one such a familiar measure used by many US
cities to measure street quality and justify billions of dollars spent every
year on street repair. These billion-dollar decisions are based on evaluation
criteria that are subjective and not representative. In this paper, we build
upon our initial submission to D4GX 2015 that approaches this problem of
information asymmetry in municipal decision-making.

We describe a process to identify street-defects using computer vision
techniques on data collected using the Street Quality Identification Device
(SQUID). A User Interface to host a large quantity of image data towards
digitizing the street inspection process and enabling actionable intelligence
for a core public service is also described. This approach of combining device,
data and decision-making around street repair enables cities make targeted
decisions about street repair and could lead to an anticipatory response which
can result in significant cost savings. Lastly, we share lessons learnt from
the deployment of SQUID in the city of Syracuse, NY.

Multi-dimensional signal approximation with sparse structured priors using split Bregman iterations

Yoann Isaac, Quentin Barthélemy, Cédric Gouy-Pailler, Michèle Sebag, Jamal Atif
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

This paper addresses the structurally-constrained sparse decomposition of
multi-dimensional signals onto overcomplete families of vectors, called
dictionaries. The contribution of the paper is threefold. Firstly, a generic
spatio-temporal regularization term is designed and used together with the
standard $ell_1$ regularization term to enforce a sparse decomposition
preserving the spatio-temporal structure of the signal. Secondly, an
optimization algorithm based on the split Bregman approach is proposed to
handle the associated optimization problem, and its convergence is analyzed.
Our well-founded approach yields same accuracy as the other algorithms at the
state-of-the-art, with significant gains in terms of convergence speed.
Thirdly, the empirical validation of the approach on artificial and real-world
problems demonstrates the generality and effectiveness of the method. On
artificial problems, the proposed regularization subsumes the Total Variation
minimization and recovers the expected decomposition. On the real-world problem
of electro-encephalography brainwave decomposition, the approach outperforms
similar approaches in terms of P300 evoked potentials detection, using
structured spatial priors to guide the decomposition.

Artificial Intelligence

Technical Report: Graph-Structured Sparse Optimization for Connected Subgraph Detection

Baojian Zhou, Feng Chen
Comments: 11 pages in 2016 IEEE International Conference of Data Mining
Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

Structured sparse optimization is an important and challenging problem for
analyzing high-dimensional data in a variety of applications such as
bioinformatics, medical imaging, social networks, and astronomy. Although a
number of structured sparsity models have been explored, such as trees, groups,
clusters, and paths, connected subgraphs have been rarely explored in the
current literature. One of the main technical challenges is that there is no
structured sparsity-inducing norm that can directly model the space of
connected subgraphs, and there is no exact implementation of a projection
oracle for connected subgraphs due to its NP-hardness. In this paper, we
explore efficient approximate projection oracles for connected subgraphs, and
propose two new efficient algorithms, namely, Graph-IHT and Graph-GHTP, to
optimize a generic nonlinear objective function subject to connectivity
constraint on the support of the variables. Our proposed algorithms enjoy
strong guarantees analogous to several current methods for sparsity-constrained
optimization, such as Projected Gradient Descent (PGD), Approximate Model
Iterative Hard Thresholding (AM-IHT), and Gradient Hard Thresholding Pursuit
(GHTP) with respect to convergence rate and approximation accuracy. We apply
our proposed algorithms to optimize several well-known graph scan statistics in
several applications of connected subgraph detection as a case study, and the
experimental results demonstrate that our proposed algorithms outperform
state-of-the-art methods.

Characterization of experts in crowdsourcing platforms

Amal Ben Rjab (LARODEC, DRUID), Mouloud Kharoune (DRUID), Zoltan Miklos (DRUID), Arnaud Martin (DRUID)
Comments: in The 4th International Conference on Belief Functions, Sep 2016, Prague, Czech Republic
Subjects: Artificial Intelligence (cs.AI)

Crowdsourcing platforms enable to propose simple human intelligence tasks to
a large number of participants who realise these tasks. The workers often
receive a small amount of money or the platforms include some other incentive
mechanisms, for example they can increase the workers reputation score, if they
complete the tasks correctly. We address the problem of identifying experts
among participants, that is, workers, who tend to answer the questions
correctly. Knowing who are the reliable workers could improve the quality of
knowledge one can extract from responses. As opposed to other works in the
literature, we assume that participants can give partial or incomplete
responses, in case they are not sure that their answers are correct. We model
such partial or incomplete responses with the help of belief functions, and we
derive a measure that characterizes the expertise level of each participant.
This measure is based on precise and exactitude degrees that represent two
parts of the expertise level. The precision degree reflects the reliability
level of the participants and the exactitude degree reflects the knowledge
level of the participants. We also analyze our model through simulation and
demonstrate that our richer model can lead to more reliable identification of
experts.

Structured Inference Networks for Nonlinear State Space Models

Rahul G. Krishnan, Uri Shalit, David Sontag
Comments: Main paper: 13 pages, 12 Figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Gaussian state space models have been used for decades as generative models
of sequential data. They admit an intuitive probabilistic interpretation, have
a simple functional form, and enjoy widespread adoption. We introduce a unified
algorithm to efficiently learn a broad class of linear and non-linear state
space models, including variants where the emission and transition
distributions are modeled by deep neural networks. Our learning algorithm
simultaneously learns a compiled inference network and the generative model,
leveraging a structured variational approximation parameterized by recurrent
neural networks to mimic the posterior distribution. We apply the learning
algorithm to both synthetic and real-world datasets, demonstrating its
scalability and versatility. We find that using the structured approximation to
the posterior results in models with significantly higher held-out likelihood.

Computation and Language

Referential Uncertainty and Word Learning in High-dimensional, Continuous Meaning Spaces

Michael Spranger, Katrien Beuls
Comments: Published as Spranger, M. and Beuls, K. (2016). Referential uncertainty and word learning in high-dimensional, continuous meaning spaces. In Hafner, V. and Pitti, A., editors, Development and Learning and Epigenetic Robotics (ICDL-Epirob), 2016 Joint IEEE International Conferences on, 2016. IEEE
Subjects: Computation and Language (cs.CL)

This paper discusses lexicon word learning in high-dimensional meaning spaces
from the viewpoint of referential uncertainty. We investigate various
state-of-the-art Machine Learning algorithms and discuss the impact of scaling,
representation and meaning space structure. We demonstrate that current Machine
Learning techniques successfully deal with high-dimensional meaning spaces. In
particular, we show that exponentially increasing dimensions linearly impact
learner performance and that referential uncertainty from word sensitivity has
no impact.

Controlling Output Length in Neural Encoder-Decoders

Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, Manabu Okumura
Comments: 11 pages. To appear in EMNLP 2016
Subjects: Computation and Language (cs.CL)

Neural encoder-decoder models have shown great success in many sequence
generation tasks. However, previous work has not investigated situations in
which we would like to control the length of encoder-decoder outputs. This
capability is crucial for applications such as text summarization, in which we
have to generate concise summaries with a desired length. In this paper, we
propose methods for controlling the output sequence length for neural
encoder-decoder models: two decoding-based methods and two learning-based
methods. Results show that our learning-based methods have the capability to
control length without degrading summary quality in a summarization task.

Distributed, Parallel, and Cluster Computing

Sprout: A functional caching approach to minimize service latency in erasure-coded storage

Vaneet Aggarwal, Yih-Farn R. Chen, Tian Lan, Yu Xiang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Modern distributed storage systems often use erasure codes to protect against
disk and node failures to increase reliability, while trying to meet the
latency requirements of the applications and clients. Storage systems may have
caches at the proxy or client ends in order to reduce the latency. In this
paper, we consider a novel caching framework with erasure code called {em
functional caching}. Functional Caching involves using erasure-coded chunks in
the cache such that the code formed by the chunks in storage nodes and cache
combined are maximal-distance-separable (MDS) erasure codes. Based on the
arrival rates of different files, placement of file chunks on the servers, and
service time distribution of storage servers, an optimal functional caching
placement and the access probabilities of the file request from different disks
are considered. The proposed algorithm gives significant latency improvement in
both simulations and a prototyped solution in an open-source, cloud storage
deployment.

ERA Revisited: Theoretical and Experimental Evaluation

Matevž Jekovec, Andrej Brodnik
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Efficient construction of the suffix tree given an input text is an active
area of research from the time it was first introduced. Both theoretical
computer scientists and engineers tackled the problem. In this paper we focus
on the fastest practical suffix tree construction algorithm to date, ERA. We
first provide a theoretical analysis of the algorithm assuming the uniformly
random text as an input and using the PEM model of computation with respect to
the lower bounds. Secondly, we empirically confirm the theoretical results in
different test scenarios exposing the critical terms. Thirdly, we discuss the
fundamental characteristics of the input text where the fastest suffix tree
construction algorithms in practice fail. This paper serves as a foundation for
further research in the parallel text indexing area.

On the Worst-case Communication Overhead for Distributed Data Shuffling

Mohamed Attia, Ravi Tandon
Comments: To appear in Allerton 2016
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Distributed learning platforms for processing large scale data-sets are
becoming increasingly prevalent. In typical distributed implementations, a
centralized master node breaks the data-set into smaller batches for parallel
processing across distributed workers to achieve speed-up and efficiency.
Several computational tasks are of sequential nature, and involve multiple
passes over the data. At each iteration over the data, it is common practice to
randomly re-shuffle the data at the master node, assigning different batches
for each worker to process. This random re-shuffling operation comes at the
cost of extra communication overhead, since at each shuffle, new data points
need to be delivered to the distributed workers.

In this paper, we focus on characterizing the information theoretically
optimal communication overhead for the distributed data shuffling problem. We
propose a novel coded data delivery scheme for the case of no excess storage,
where every worker can only store the assigned data batches under processing.
Our scheme exploits a new type of coding opportunity and is applicable to any
arbitrary shuffle, and for any number of workers. We also present an
information theoretic lower bound on the minimum communication overhead for
data shuffling, and show that the proposed scheme matches this lower bound for
the worst-case communication overhead.

Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have gained significant traction in the
field of machine learning, particularly due to their high accuracy in visual
recognition. Recent works have pushed the performance of GPU implementations of
CNNs to significantly improve their classification and training times. With
these improvements, many frameworks have become available for implementing CNNs
on both CPUs and GPUs, with no support for FPGA implementations. In this work
we present a modified version of the popular CNN framework Caffe, with FPGA
support. This allows for classification using CNN models and specialized FPGA
implementations with the flexibility of reprogramming the device when
necessary, seamless memory transactions between host and device, simple-to-use
test benches, and the ability to create pipelined layer implementations. To
validate the framework, we use the Xilinx SDAccel environment to implement an
FPGA-based Winograd convolution engine and show that the FPGA layer can be used
alongside other layers running on a host processor to run several popular CNNs
(AlexNet, GoogleNet, VGG A, Overfeat). The results show that our framework
achieves 50 GFLOPS across 3×3 convolutions in the benchmarks. This is achieved
within a practical framework, which will aid in future development of
FPGA-based CNNs.

Asynchronous Multi-Task Learning

Inci M. Baytas, Ming Yan, Anil K. Jain, Jiayu Zhou
Comments: IEEE International Conference on Data Mining (ICDM) 2016
Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Many real-world machine learning applications involve several learning tasks
which are inter-related. For example, in healthcare domain, we need to learn a
predictive model of a certain disease for many hospitals. The models for each
hospital may be different because of the inherent differences in the
distributions of the patient populations. However, the models are also closely
related because of the nature of the learning tasks modeling the same disease.
By simultaneously learning all the tasks, multi-task learning (MTL) paradigm
performs inductive knowledge transfer among tasks to improve the generalization
performance. When datasets for the learning tasks are stored at different
locations, it may not always be feasible to transfer the data to provide a
data-centralized computing environment due to various practical issues such as
high data volume and privacy. In this paper, we propose a principled MTL
framework for distributed and asynchronous optimization to address the
aforementioned challenges. In our framework, gradient update does not wait for
collecting the gradient information from all the tasks. Therefore, the proposed
method is very efficient when the communication delay is too high for some task
nodes. We show that many regularized MTL formulations can benefit from this
framework, including the low-rank MTL for shared subspace learning. Empirical
studies on both synthetic and real-world datasets demonstrate the efficiency
and effectiveness of the proposed framework.

Learning

Predicting the consequence of action in digital control state spaces

Emmanuel Daucé
Subjects: Learning (cs.LG); Systems and Control (cs.SY)

The objective of this dissertation is to shed light on some fundamental
impediments in learning control laws in continuous state spaces. In particular,
if one wants to build artificial devices capable to learn motor tasks the same
way they learn to classify signals and images, one needs to establish control
rules that do not necessitate comparisons between quantities of the surrounding
space. We propose, in that context, to take inspiration from the “end effector
control” principle, as suggested by neuroscience studies, as opposed to the
“displacement control” principle used in the classical control theory.

Asynchronous Multi-Task Learning

Many real-world machine learning applications involve several learning tasks
which are inter-related. For example, in healthcare domain, we need to learn a
predictive model of a certain disease for many hospitals. The models for each
hospital may be different because of the inherent differences in the
distributions of the patient populations. However, the models are also closely
related because of the nature of the learning tasks modeling the same disease.
By simultaneously learning all the tasks, multi-task learning (MTL) paradigm
performs inductive knowledge transfer among tasks to improve the generalization
performance. When datasets for the learning tasks are stored at different
locations, it may not always be feasible to transfer the data to provide a
data-centralized computing environment due to various practical issues such as
high data volume and privacy. In this paper, we propose a principled MTL
framework for distributed and asynchronous optimization to address the
aforementioned challenges. In our framework, gradient update does not wait for
collecting the gradient information from all the tasks. Therefore, the proposed
method is very efficient when the communication delay is too high for some task
nodes. We show that many regularized MTL formulations can benefit from this
framework, including the low-rank MTL for shared subspace learning. Empirical
studies on both synthetic and real-world datasets demonstrate the efficiency
and effectiveness of the proposed framework.

Algorithms for item categorization based on ordinal ranking data

Josh Girson, Shuchin Aeron
Comments: To appear in IEEE Allerton conference on computing, communications and control, 2016
Subjects: Learning (cs.LG); Social and Information Networks (cs.SI)

We present a new method for identifying the latent categorization of items
based on their rankings. Complimenting a recent work that uses a Dirichlet
prior on preference vectors and variational inference, we show that this
problem can be effectively dealt with using existing community detection
algorithms, with the communities corresponding to item categories. In
particular we convert the bipartite ranking data to a unipartite graph of item
affinities, and apply community detection algorithms. In this context we modify
an existing algorithm – namely the label propagation algorithm to a variant
that uses the distance between the nodes for weighting the label propagation –
to identify the categories. We propose and analyze a synthetic ordinal ranking
model and show its relation to the recently much studied stochastic block
model. We test our algorithms on synthetic data and compare performance with
several popular community detection algorithms. We also test the method on real
data sets of movie categorization from the Movie Lens database. In all of the
cases our algorithm is able to identify the categories for a suitable choice of
tuning parameter.

Charged Point Normalization: An Efficient Solution to the Saddle Point Problem

Armen Aghajanyan
Subjects: Learning (cs.LG)

Recently, the problem of local minima in very high dimensional non-convex
optimization has been challenged and the problem of saddle points has been
introduced. This paper introduces a dynamic type of normalization that forces
the system to escape saddle points. Unlike other saddle point escaping
algorithms, second order information is not utilized, and the system can be
trained with an arbitrary gradient descent learner. The system drastically
improves learning in a range of deep neural networks on various data-sets in
comparison to non-CPN neural networks.

Structured Inference Networks for Nonlinear State Space Models

Rahul G. Krishnan, Uri Shalit, David Sontag
Comments: Main paper: 13 pages, 12 Figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

Gaussian state space models have been used for decades as generative models
of sequential data. They admit an intuitive probabilistic interpretation, have
a simple functional form, and enjoy widespread adoption. We introduce a unified
algorithm to efficiently learn a broad class of linear and non-linear state
space models, including variants where the emission and transition
distributions are modeled by deep neural networks. Our learning algorithm
simultaneously learns a compiled inference network and the generative model,
leveraging a structured variational approximation parameterized by recurrent
neural networks to mimic the posterior distribution. We apply the learning
algorithm to both synthetic and real-world datasets, demonstrating its
scalability and versatility. We find that using the structured approximation to
the posterior results in models with significantly higher held-out likelihood.

On the Worst-case Communication Overhead for Distributed Data Shuffling

Mohamed Attia, Ravi Tandon
Comments: To appear in Allerton 2016
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Distributed learning platforms for processing large scale data-sets are
becoming increasingly prevalent. In typical distributed implementations, a
centralized master node breaks the data-set into smaller batches for parallel
processing across distributed workers to achieve speed-up and efficiency.
Several computational tasks are of sequential nature, and involve multiple
passes over the data. At each iteration over the data, it is common practice to
randomly re-shuffle the data at the master node, assigning different batches
for each worker to process. This random re-shuffling operation comes at the
cost of extra communication overhead, since at each shuffle, new data points
need to be delivered to the distributed workers.

In this paper, we focus on characterizing the information theoretically
optimal communication overhead for the distributed data shuffling problem. We
propose a novel coded data delivery scheme for the case of no excess storage,
where every worker can only store the assigned data batches under processing.
Our scheme exploits a new type of coding opportunity and is applicable to any
arbitrary shuffle, and for any number of workers. We also present an
information theoretic lower bound on the minimum communication overhead for
data shuffling, and show that the proposed scheme matches this lower bound for
the worst-case communication overhead.

Optimal spectral transportation with application to music transcription

Rémi Flamary, Cédric Févotte, Nicolas Courty, Valentin Emiya
Comments: NIPS 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

Many spectral unmixing methods rely on the non-negative decomposition of
spectral data onto a dictionary of spectral templates. In particular,
state-of-the-art music transcription systems decompose the spectrogram of the
input signal onto a dictionary of representative note spectra. The typical
measures of fit used to quantify the adequacy of the decomposition compare the
data and template entries frequency-wise. As such, small displacements of
energy from a frequency bin to another as well as variations of timber can
disproportionally harm the fit. We address these issues by means of optimal
transportation and propose a new measure of fit that treats the frequency
distributions of energy holistically as opposed to frequency-wise. Building on
the harmonic nature of sound, the new measure is invariant to shifts of energy
to harmonically-related frequencies, as well as to small and local
displacements of energy. Equipped with this new measure of fit, the dictionary
of note templates can be considerably simplified to a set of Dirac vectors
located at the target fundamental frequencies (musical pitch values). This in
turns gives ground to a very fast and simple decomposition algorithm that
achieves state-of-the-art performance on real musical data.

On Identification of Sparse Multivariable ARX Model: A Sparse Bayesian Learning Approach

J. Jin, Y. Yuan, W. Pan, D. L.T. Pham, C. J. Tomlin, A.Webb, J. Goncalves
Subjects: Systems and Control (cs.SY); Learning (cs.LG); Machine Learning (stat.ML)

This paper begins with considering the identification of sparse linear
time-invariant networks described by multivariable ARX models. Such models
possess relatively simple structure thus used as a benchmark to promote further
research. With identifiability of the network guaranteed, this paper presents
an identification method that infers both the Boolean structure of the network
and the internal dynamics between nodes. Identification is performed directly
from data without any prior knowledge of the system, including its order. The
proposed method solves the identification problem using Maximum a posteriori
estimation (MAP) but with inseparable penalties for complexity, both in terms
of element (order of nonzero connections) and group sparsity (network
topology). Such an approach is widely applied in Compressive Sensing (CS) and
known as Sparse Bayesian Learning (SBL). We then propose a novel scheme that
combines sparse Bayesian and group sparse Bayesian to efficiently solve the
problem. The resulted algorithm has a similar form of the standard Sparse Group
Lasso (SGL) while with known noise variance, it simplifies to exact re-weighted
SGL. The method and the developed toolbox can be applied to infer networks from
a wide range of fields, including systems biology applications such as
signaling and genetic regulatory networks.

Big Data analytics. Three use cases with R, Python and Spark

Philippe Besse (IMT), Brendan Guillouet (IMT), Jean-Michel Loubes (IMT)
Comments: in French, Apprentissage Statistique et Donn{‘e}es Massives, Technip, 2017, Journ{‘e}es d’Etudes en Statistisque
Subjects: Applications (stat.AP); Learning (cs.LG)

Management and analysis of big data are systematically associated with a data
distributed architecture in the Hadoop and now Spark frameworks. This article
offers an introduction for statisticians to these technologies by comparing the
performance obtained by the direct use of three reference environments: R,
Python Scikit-learn, Spark MLlib on three public use cases: character
recognition, recommending films, categorizing products. As main result, it
appears that, if Spark is very efficient for data munging and recommendation by
collaborative filtering (non-negative factorization), current implementations
of conventional learning methods (logistic regression, random forests) in MLlib
or SparkML do not ou poorly compete habitual use of these methods (R, Python
Scikit-learn) in an integrated or undistributed architecture

Social Computing for Mobile Big Data in Wireless Networks

Xing Zhang, Zhenglei Yi, Zhi Yan, Geyong Min, Wenbo Wang, Sabita Maharjan, Yan Zhang
Comments: 8 papges, 3 figures, 1 tables
Journal-ref: Computer, vol.49, no. 9, pp. 86-90, Sept. 2016
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)

Mobile big data contains vast statistical features in various dimensions,
including spatial, temporal, and the underlying social domain. Understanding
and exploiting the features of mobile data from a social network perspective
will be extremely beneficial to wireless networks, from planning, operation,
and maintenance to optimization and marketing. In this paper, we categorize and
analyze the big data collected from real wireless cellular networks. Then, we
study the social characteristics of mobile big data and highlight several
research directions for mobile big data in the social computing areas.

Multi-dimensional signal approximation with sparse structured priors using split Bregman iterations

This paper addresses the structurally-constrained sparse decomposition of
multi-dimensional signals onto overcomplete families of vectors, called
dictionaries. The contribution of the paper is threefold. Firstly, a generic
spatio-temporal regularization term is designed and used together with the
standard $ell_1$ regularization term to enforce a sparse decomposition
preserving the spatio-temporal structure of the signal. Secondly, an
optimization algorithm based on the split Bregman approach is proposed to
handle the associated optimization problem, and its convergence is analyzed.
Our well-founded approach yields same accuracy as the other algorithms at the
state-of-the-art, with significant gains in terms of convergence speed.
Thirdly, the empirical validation of the approach on artificial and real-world
problems demonstrates the generality and effectiveness of the method. On
artificial problems, the proposed regularization subsumes the Total Variation
minimization and recovers the expected decomposition. On the real-world problem
of electro-encephalography brainwave decomposition, the approach outperforms
similar approaches in terms of P300 evoked potentials detection, using
structured spatial priors to guide the decomposition.

Max-plus statistical leverage scores

James Hook
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

The statistical leverage scores of a complex matrix $Ainmathbb{C}^{n imes
d}$ record the degree of alignment between col$(A)$ and the coordinate axes in
$mathbb{C}^n$. These score are used in random sampling algorithms for solving
certain numerical linear algebra problems. In this paper we present a max-plus
algebraic analogue for statistical leverage scores. We show that max-plus
statistical leverage scores can be used to calculate the exact asymptotic
behavior of the conventional statistical leverage scores of a generic matrices
of Puiseux series and also provide a novel way to approximate the conventional
statistical leverage scores of a fixed or complex matrix. The advantage of
approximating a complex matrices scores with max-plus scores is that the
max-plus scores can be computed very quickly. This approximation is typically
accurate to within an order or magnitude and should be useful in practical
problems where the true scores are known to vary widely.

Tensor Completion by Alternating Minimization under the Tensor Train (TT) Model

Wenqi Wang, Vaneet Aggarwal, Shuchin Aeron
Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

Using the matrix product state (MPS) representation of tensor train
decompositions, in this paper we propose a tensor completion algorithm which
alternates over the matrices (tensors) in the MPS representation. This
development is motivated in part by the success of matrix completion algorithms
which alternate over the (low-rank) factors. We comment on the computational
complexity of the proposed algorithm and numerically compare it with existing
methods employing low rank tensor train approximation for data completion as
well as several other recently proposed methods. We show that our method is
superior to existing ones for a variety of real settings.

Information Theory

On the Worst-case Communication Overhead for Distributed Data Shuffling

Mohamed Attia, Ravi Tandon
Comments: To appear in Allerton 2016
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Distributed learning platforms for processing large scale data-sets are
becoming increasingly prevalent. In typical distributed implementations, a
centralized master node breaks the data-set into smaller batches for parallel
processing across distributed workers to achieve speed-up and efficiency.
Several computational tasks are of sequential nature, and involve multiple
passes over the data. At each iteration over the data, it is common practice to
randomly re-shuffle the data at the master node, assigning different batches
for each worker to process. This random re-shuffling operation comes at the
cost of extra communication overhead, since at each shuffle, new data points
need to be delivered to the distributed workers.

In this paper, we focus on characterizing the information theoretically
optimal communication overhead for the distributed data shuffling problem. We
propose a novel coded data delivery scheme for the case of no excess storage,
where every worker can only store the assigned data batches under processing.
Our scheme exploits a new type of coding opportunity and is applicable to any
arbitrary shuffle, and for any number of workers. We also present an
information theoretic lower bound on the minimum communication overhead for
data shuffling, and show that the proposed scheme matches this lower bound for
the worst-case communication overhead.

Bit-permuted coded modulation for polar codes

Saurabha R Tavildar
Comments: 6 pages; 14 figures
Subjects: Information Theory (cs.IT)

We consider the problem of using polar codes with higher order modulation
over AWGN channels. Unlike prior work, we focus on using modulation independent
polar codes. That is, the polar codes are not re-designed based on the
modulation used. Instead, we propose bit-permuted coded modulation (BPCM): a
technique for using the multilevel coding (MLC) approach for an arbitrary polar
code. The BPCM technique exploits a natural connection between MLC and polar
codes. It involves applying bit permutations prior to mapping the polar code to
a higher order modulation. The bit permutations are designed, via density
evolution, to match the rates provided by various bit levels of the higher
order modulation to that of the polar code.

We demonstrate performance of the BPCM technique using link simulations and
density evolution for the AWGN channel. We compare the BPCM technique with the
bit-interleaved coded modulation (BICM) technique. When using polar codes
designed for BPSK modulation, we show gains for BPCM over BICM with random
interleaver of up to 0.2 dB, 0.7 dB and 1.4 dB for 4-ASK, 8-ASK, and 16-ASK
respectively.

An Overview of Sustainable Green 5G Networks

Qingqing Wu, Geoffrey Ye Li, Wen Chen, Derrick Wing Kwan Ng, Robert Schober
Comments: Submitted for possible publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

The stringent requirements of a 1,000 times increase in data traffic and one
millisecond round trip latency have made limiting the potentially tremendous
ensuing energy consumption one of the most challenging problems for the design
of the upcoming fifth-generation (5G) networks. To enable sustainable 5G
networks, new technologies have been proposed to improve the system energy
efficiency and alternative energy sources are introduced to reduce our
dependence on traditional fossil fuels. In particular, various 5G techniques
target the reduction of the energy consumption without sacrificing the
quality-of-service. Meanwhile, energy harvesting technologies, which enable
communication transceivers to harvest energy from various renewable resources
and ambient radio frequency signals for communi- cation, have drawn significant
interest from both academia and industry. In this article, we provide an
overview of the latest research on both green 5G techniques and energy
harvesting for communication. In addition, some technical challenges and
potential research topics for realizing sustainable green 5G networks are also
identified.

Modeling of $K$-tier Downlink Heterogeneous Cellular Networks over $κ$-$μ$ Shadowed Fading Channels

Young Jin Chun, Simon L. Cotton, Harpreet S. Dhillon, F. Javier Lopez-Martinez, José F. Paris, Seong Ki Yoo
Subjects: Information Theory (cs.IT)

Emerging cellular technologies such as those proposed for use in 5G
communications will accommodate a wide range of usage scenarios with diverse
link requirements. This will include the necessity to operate over a versatile
set of wireless channels ranging from indoor to outdoor, from line-of-sight
(LOS) to non-LOS, and from circularly symmetric scattering to environments
which promote the clustering of scattered multipath waves. Unfortunately, many
of the conventional fading models adopted in the literature to develop network
models lack the flexibility to account for such disparate signal propagation
mechanisms. To bridge the gap between theory and practical channels, we
consider $kappa$-$mu$ shadowed fading, which contains as special cases, the
majority of the linear fading models proposed in the open literature, including
Rayleigh, Rician, Nakagami-m, Nakagami-q, One-sided Gaussian, $kappa$-$mu$,
$eta$-$mu$, and Rician shadowed to name but a few. In particular, we apply an
orthogonal expansion to represent the $kappa$-$mu$ shadowed fading
distribution as a simplified series expression. Then using the series
expressions with stochastic geometry, we propose an analytic framework to
evaluate the average of an arbitrary function of the SINR over $kappa$-$mu$
shadowed fading channels. Using the proposed method, we evaluate the spectral
efficiency, moments of the SINR, bit error probability and outage probability
of a $K$-tier HetNet with $K$ classes of BSs, differing in terms of the
transmit power, BS density, shadowing characteristics and small-scale fading.
Building upon these results, we provide important new insights into the network
performance of these emerging wireless applications while considering a diverse
range of fading conditions and link qualities.

Relative two-weight $mathbb{Z}_2 mathbb{Z}_4$-additive Codes

N. Annamalai, C. Durairajan
Subjects: Information Theory (cs.IT)

In this paper, we study a relative two-weight $mathbb{Z}_2
mathbb{Z}_4$-additive codes. It is shown that the Gray image of a two-distance
$mathbb{Z}_2 mathbb{Z}_4$-additive code is a binary two-distance code and
that the Gray image of a relative two-weight $mathbb{Z}_2
mathbb{Z}_4$-additive code, with nontrivial binary part, is a linear binary
relative two-weight code. The structure of relative two-weight $mathbb{Z}_2
mathbb{Z}_4$-additive codes are described. Finally, we discussed permutation
automorphism group of a $mathbb{Z}_2 mathbb{Z}_4$-additive codes.

Joint Channel Estimation / Data Detection in MIMO-FBMC/OQAM Systems – A Tensor-Based Approach

Eleftherios Kofidis, Christos Chatzichristos, Andre L. F. de Almeida
Subjects: Information Theory (cs.IT)

Filter bank-based multicarrier (FBMC) systems are currently being considered
as a prevalent candidate for replacing the long established cyclic prefix
(CP)-based orthogonal frequency division multiplexing (CP-OFDM) in the physical
layer of next generation communications systems. In particular, FBMC/OQAM has
received increasing attention due to, among other features, its potential for
maximum spectral efficiency. It suffers, however, from an intrinsic
self-interference effect, which complicates signal processing tasks at the
receiver, including synchronization, channel estimation and equalization. In a
multiple-input multiple-output (MIMO) configuration, the multi-antenna
interference has also to be taken into account. (Semi-)blind FBMC/OQAM
receivers have been little studied so far and mainly for single-antenna
systems. The problem of joint channel estimation and data detection in a
MIMO-FBMC/OQAM system, given limited or no training information, is studied in
this paper through a tensor-based approach in the light of the success of such
techniques in OFDM applications. Simulation-based comparisons with CP-OFDM are
included, for realistic transmission models.

Sparse Methods for Direction-of-Arrival Estimation

Zai Yang, Jian Li, Petre Stoica, Lihua Xie
Comments: 64 pages, overview article
Subjects: Information Theory (cs.IT)

Direction of arrival (DOA) estimation refers to the process of retrieving the
direction information of several electromagnetic waves/sources from the outputs
of a number of receiving antennas that form a sensor array. DOA estimation is a
major problem in array signal processing and has wide applications in radar,
sonar, wireless communications, etc. The purpose of this article is to provide
an overview of the recent work on sparse DOA estimation methods.

Numerical Approach to Maximize SNR for CDMA Systems

Hirofumi Tsuda, Ken Umeno
Comments: 6 pages, 5 figures
Subjects: Information Theory (cs.IT)

Signal to Noise Ratio (SNR) is an important index for wireless
communications. There are many methods for increasing SNR. In CDMA systems,
spreading sequences are used. To increase SNR, we have to improve spreading
sequences. In classical approaches, the expression of SNR is not differentiable
in terms of the parameter of the spreading sequences even in no fading
situations. Thus, we express it as the differentiable form and construct the
non-linear programing for maximizing SNR. In particular, we solve the problem
of maximizing SNR numerically by obtaining spreading sequences whose SNR is
guaranteed to be high.

On Weighted MSE Model for MIMO Transceiver Optimization

Chengwen Xing, Yindi Jing, Yiqing Zhou
Comments: 34 Pages, 4 figures
Subjects: Information Theory (cs.IT)

Mean-squared-error (MSE) is one of the most widely used performance metrics
for the designs and analysis of multi-input-multiple-output (MIMO)
communications. Weighted MSE minimization, a more general formulation of MSE
minimization, plays an important role in MIMO transceiver optimization. While
this topic has a long history and has been extensively studied, existing
treatments on the methods in solving the weighted MSE optimization are more or
less sporadic and non-systematic. In this paper, we firstly review the two
major methodologies, Lagrange multiplier method and majorization theory based
method, and their common procedures in solving the weighted MSE minimization.
Then some problems and limitations of the methods that were usually neglected
or glossed over in existing literature are provided. These problems are
fundamental and of critical importance for the corresponding MIMO transceiver
optimizations. In addition, a new extended matrix-field weighted MSE model is
proposed. Its solutions and applications are discussed in details. Compared
with existing models, this new model has wider applications, e.g., nonlinear
MIMO transceiver designs and capacity-maximization transceiver designs for
general MIMO networks.

The Generalized Reed-Muller codes in a modular group algebra

Harinaivo Andriatahiny, Vololona Harinoro Rakotomalala
Subjects: Information Theory (cs.IT)

First we study some properties of the modular group algebra
$mathbb{F}_{p^r}[G]$ where $G$ is the additive group of a Galois ring of
characteristic $p^r$ and $mathbb{F}_{p^r}$ is the field of $p^r$ elements.
Secondly a description of the Generalized Reed-Muller codes over
$mathbb{F}_{p^r}$ in $mathbb{F}_{p^r}[G]$ is presented.

Sprout: A functional caching approach to minimize service latency in erasure-coded storage

Vaneet Aggarwal, Yih-Farn R. Chen, Tian Lan, Yu Xiang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Modern distributed storage systems often use erasure codes to protect against
disk and node failures to increase reliability, while trying to meet the
latency requirements of the applications and clients. Storage systems may have
caches at the proxy or client ends in order to reduce the latency. In this
paper, we consider a novel caching framework with erasure code called {em
functional caching}. Functional Caching involves using erasure-coded chunks in
the cache such that the code formed by the chunks in storage nodes and cache
combined are maximal-distance-separable (MDS) erasure codes. Based on the
arrival rates of different files, placement of file chunks on the servers, and
service time distribution of storage servers, an optimal functional caching
placement and the access probabilities of the file request from different disks
are considered. The proposed algorithm gives significant latency improvement in
both simulations and a prototyped solution in an open-source, cloud storage
deployment.

Contextuality in multipartie pseudo-telepathy graph games

Peter Hoyer, Mehdi Mhalla, Simon Perdrix
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

Analyzing pseudo-telepathy graph games, we propose a way to build
contextuality scenarios exhibiting the quantum supremacy using graph states. We
consider the combinatorial structures generating equivalent scenarios. We
investigate which scenarios are more multipartite and show that there exist
graphs generating scenarios with a linear multipartiteness width.

The value of timing information in event-triggered control: The scalar case

Mohammad Javad Khojasteh, Pavankumar Tallapragada, Jorge Cortés, Massimo Franceschetti
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Systems and Control (cs.SY)

The problem of event-triggered control with rate limited communication is
considered. For continuous-time scalar systems without disturbances, a phase
transition behavior of the transmission rate required for stabilization as a
function of the communication delay is revealed. It is shown that for low
values of the delay the timing information carried by the triggering events is
large and the system can be stabilized with any positive rate. On the other
hand, when the delay exceeds a certain threshold that depends on the given
triggering strategy, the timing information alone is not enough to achieve
stabilization and the rate must begin to grow, eventually becoming larger than
what required by the classic data-rate theorem. The critical point where the
transmission rate equals the one imposed by the data-rate theorem occurs when
the delay equals the inverse of the entropy rate of the plant, representing the
intrinsic rate at which the system generates information. At this critical
point, the timing information supplied by event triggering is completely
balanced by the information loss due to the communication delay. Exponential
convergence guarantees are also discussed, and an explicit construction
providing a sufficient condition for stabilization is given.

One-Lee weight and two-Lee weight $mathbb{Z}_2mathbb{Z}_2[u]$-additive codes

Zhenliang Lu, Shixin Zhu, Liqi Wang, Xiaoshan Kai
Subjects: Rings and Algebras (math.RA); Information Theory (cs.IT)

In this paper, we study one-Lee weight and two-Lee weight codes over
$mathbb{Z}_{2}mathbb{Z}_{2}[u]$, where $u^{2}=0$. Some properties of one-Lee
weight $mathbb{Z}_{2}mathbb{Z}_{2}[u]$-additive codes are given, and a
complete classification of one-Lee weight
$mathbb{Z}_2mathbb{Z}_2[u]$-additive formally self-dual codes is obtained.
The structure of two-Lee weight projective $mathbb{Z}_2mathbb{Z}_2[u]$ codes
are determined. Some optimal binary linear codes are obtained directly from
one-Lee weight and two-Lee weight $mathbb{Z}_{2}mathbb{Z}_{2}[u]$-additive
codes via the extended Gray map.

Reversibility Problem of Multidimensional Finite Cellular Automata

Chih-Hung Chang, Hasan Akın
Subjects: Dynamical Systems (math.DS); Information Theory (cs.IT)

While the reversibility of multidimensional cellular automata is undecidable
and there exists a criterion for determining if a multidimensional linear
cellular automaton is reversible, there are only a few results about the
reversibility problem of multidimensional linear cellular automata under
boundary conditions. This work proposes a criterion for testing the
reversibility of a multidimensional linear cellular automaton under null
boundary condition and an algorithm for the computation of its reverse, if it
exists. The investigation of the dynamical behavior of a multidimensional
linear cellular automaton under null boundary condition is equivalent to
elucidating the properties of block Toeplitz matrix. The proposed criterion
significantly reduce the computational cost whenever the number of cells or the
dimension is large; the discussion can also apply to cellular automata under
periodic boundary condition with a minor modification.

Fast $L_1$-$L_2$ Minimization via a Proximal Operator

Yifei Lou, Ming Yan
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Numerical Analysis (math.NA)

This paper aims to develop new and fast algorithms for recovering a sparse
vector from a small number of measurements, which is a fundamental problem in
the field of compressive sensing (CS). Currently, CS favors incoherent systems,
in which any two measurements are as little correlated as possible. In reality,
however, many problems are coherent, and conventional methods such as $L_1$
minimization do not work well. Recently, the difference of the $L_1$ and $L_2$
norms, denoted as $L_1$-$L_2$, is shown to have superior performance over the
classic $L_1$ method, but it is computationally expensive. We derive an
analytical solution for the proximal operator of the $L_1$-$L_2$ metric, and it
makes some fast $L_1$ solvers such as forward-backward splitting (FBS) and
alternative direction method of multipliers (ADMM) applicable for $L_1$-$L_2$.
We describe in details how to incorporate the proximal operator into FBS and
ADMM and show that the resulting algorithms are convergent under mild
conditions. Both algorithms are shown to be much more efficient than the
original implementation of $L_1$-$L_2$ based on a difference-of-convex approach
in the numerical experiments.