Luke B. Godfrey, Michael S. Gashler
Comments: 13 pages, 11 figures, IEEE TNNLS Preprint
Subjects: Neural and Evolutionary Computing (cs.NE)
We present a neural network technique for the analysis and extrapolation of
time-series data called Neural Decomposition (ND). Units with a sinusoidal
activation function are used to perform a Fourier-like decomposition of
training samples into a sum of sinusoids, augmented by units with nonperiodic
activation functions to capture linear trends and other nonperiodic components.
We show how careful weight initialization can be combined with regularization
to form a simple model that generalizes well. Our method generalizes
effectively on the Mackey-Glass series, a dataset of unemployment rates as
reported by the U.S. Bureau of Labor Statistics, a time-series of monthly
international airline passengers, the monthly ozone concentration in downtown
Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements
from a cave in north India. We find that ND outperforms popular time-series
forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR
with a radial basis function kernel, and Gashler and Ashmore’s model.
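A minimal sketch of the model class described above, assuming a PyTorch implementation with illustrative sizes and initialization (this is not the authors' code): a trainable sum of sinusoids for the periodic part plus a linear unit for the nonperiodic trend, trained by ordinary gradient descent.

```python
import math
import torch

class NeuralDecompositionSketch(torch.nn.Module):
    """Sum of k trainable sinusoids plus a linear trend, in the spirit of ND."""
    def __init__(self, k=16):
        super().__init__()
        self.a = torch.nn.Parameter(0.1 * torch.randn(k))   # amplitudes
        # initialize frequencies near integer multiples of a base frequency,
        # mimicking the careful Fourier-like initialization described above
        self.w = torch.nn.Parameter(2 * math.pi * torch.arange(1, k + 1).float())
        self.phi = torch.nn.Parameter(torch.zeros(k))        # phases
        self.slope = torch.nn.Parameter(torch.zeros(1))      # nonperiodic trend
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, t):
        # t: (n, 1) time values scaled to [0, 1]
        periodic = (self.a * torch.sin(self.w * t + self.phi)).sum(dim=1)
        return periodic + self.slope * t.squeeze(1) + self.bias
```

Adding an L1 penalty on the amplitudes to the training loss would play the role of the regularization that the abstract credits for good generalization.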
Tao Lei, Wengong Jin, Regina Barzilay, Tommi Jaakkola
Comments: to appear at ICML 2017; includes additional discussions
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL)
The design of neural architectures for structured objects is typically guided
by experimental insights rather than a formal process. In this work, we appeal
to kernels over combinatorial structures, such as sequences and graphs, to
derive appropriate neural operations. We introduce a class of deep recurrent
neural operations and formally characterize their associated kernel spaces. Our
recurrent modules compare the input to virtual reference objects (cf. filters
in CNN) via the kernels. Similar to traditional neural operations, these
reference objects are parameterized and directly optimized in end-to-end
training. We empirically evaluate the proposed class of neural architectures on
standard applications such as language modeling and molecular graph regression,
achieving state-of-the-art or competitive results across these applications. We
also draw connections to existing architectures such as LSTMs.
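As a rough illustration of such a kernel-derived operation (a sketch under our own assumptions, not the paper's exact module), a first-order string-kernel computation can be realized as a recurrent accumulation of decayed matches between inputs and learned reference directions:

```python
import torch

def kernel_recurrent(x_seq, W, decay=0.9):
    """x_seq: (T, d_in) input sequence; W: (d_in, d_out) reference parameters.

    The state accumulates exponentially decayed inner products between the
    inputs and the columns of W (the 'virtual reference objects'); the decay
    plays the role of the gap penalty in a string kernel."""
    c = torch.zeros(W.shape[1])
    for x_t in x_seq:
        c = decay * c + x_t @ W
    return c
```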
Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The evidence lower bound (ELBO) appears in many algorithms for maximum
likelihood estimation (MLE) with latent variables because it is a sharp lower
bound of the marginal log-likelihood. For neural latent variable models,
optimizing the ELBO jointly in the variational posterior and model parameters
produces state-of-the-art results. Inspired by the success of the ELBO as a
surrogate MLE objective, we consider the extension of the ELBO to a family of
lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We
show that the tightness of such bounds is asymptotically related to the
variance of the underlying estimator. We introduce a special case, the
filtering variational objectives (FIVOs), which takes the same arguments as the
ELBO and passes them through a particle filter to form a tighter bound. FIVOs
can be optimized tractably with stochastic gradients, and are particularly
suited to MLE in sequential latent variable models. In standard sequential
generative modeling tasks we present uniform improvements over models trained
with ELBO, including some whole nat-per-timestep improvements.
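A hedged sketch of the construction: run a simple bootstrap particle filter and use the log of its likelihood estimate as the objective. The model interfaces (`init`, `propose`, `log_weight`) are our own illustrative assumptions, not the paper's API.

```python
import math
import torch

def fivo_bound(x, init, propose, log_weight, num_particles=8):
    """x: observations of shape (T, ...); returns a lower bound on log p(x)."""
    z = init(num_particles)            # sample initial particles
    bound = 0.0
    for t in range(x.shape[0]):
        z = propose(z, x[t])           # propagate each particle
        logw = log_weight(z, x[t])     # per-particle log importance weight
        # log of the average particle weight; the bound tightens as the
        # estimator's variance shrinks, matching the relation stated above
        bound = bound + torch.logsumexp(logw, dim=0) - math.log(num_particles)
        # multinomial resampling refocuses particles on likely states
        idx = torch.multinomial(torch.softmax(logw, dim=0),
                                num_particles, replacement=True)
        z = z[idx]
    return bound
```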
Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
Comments: 10 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
Content popularity prediction has been extensively studied due to its
importance and interest for both users and hosts of social media sites like
Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly
focuses on modeling popularity using a single metric such as the total number
of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based
deep recurrent network that learns to recursively predict the entire diffusion
path of an image through a social network. By combining user social features
and image features, and encoding the diffusion path taken thus far with an
explicit memory cell, our model predicts the diffusion path of an image more
accurately compared to alternate baselines that either encode only image or
social features, or lack memory. By mapping individual users to user
prototypes, our model can generalize to new users not seen during training.
Finally, we demonstrate our model’s capability of generating diffusion trees,
and show that the generated trees closely resemble ground-truth trees.
Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bruno G. Loos, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
Comments: Full version of ICANN 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Images are an important data source for diagnosis and treatment of oral
diseases. The manual classification of images may lead to misdiagnosis or
mistreatment due to subjective errors. In this paper an image classification
model based on Convolutional Neural Network is applied to Quantitative
Light-induced Fluorescence images. The deep neural network outperforms other
state-of-the-art shallow classification models in predicting labels derived
from three different dental plaque assessment scores. The model benefits
directly from the multi-channel representation of the images, resulting in
improved performance when the Green and Blue colour channels are used in
addition to the Red channel.
Konda Reddy Mopuri, Vishal B. Athreya, R. Venkatesh Babu
Comments: ICME 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning exploits large volumes of labeled data to learn powerful
models. When the target dataset is small, it is a common practice to perform
transfer learning using pre-trained models to learn new task specific
representations. However, pre-trained CNNs for image recognition receive only
limited information about each image during training, namely its label. Tasks
such as scene retrieval suffer from features learned under this weak
supervision and require stronger supervision to better understand the contents
of the image. In this paper, we exploit the features learned by
caption-generating models to learn novel task-specific image representations.
In particular, we consider the state-of-the-art captioning system Show and
Tell (SnT-pami-2016) and the dense region description model
DenseCap (densecap-cvpr-2016). We demonstrate that, owing to the richer
supervision provided during training, the features learned by the captioning
system perform better than those of CNNs. Further, we train a siamese network
with a modified pair-wise loss to fuse the features learned by the two models
and learn image representations suitable for retrieval. Experiments show that
the proposed fusion exploits the complementary nature of the individual
features and yields state-of-the-art retrieval results on benchmark datasets.
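For concreteness, a standard contrastive pair-wise loss for a siamese network is sketched below; the paper's modified loss is not reproduced here, so treat this only as the baseline form it departs from.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f1, f2, same, margin=1.0):
    """f1, f2: embedded feature pairs; same: 1.0 for matching pairs, else 0.0."""
    d = F.pairwise_distance(f1, f2)
    # pull matching pairs together, push non-matching pairs beyond the margin
    return (same * d.pow(2) + (1.0 - same) * F.relu(margin - d).pow(2)).mean()
```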
Nader Mahmoud, Alexandre Hostettler, Toby Collins, Luc Soler, Christophe Doignon, J.M.M. Montiel
Comments: ICRA 2017 workshop C4 Surgical Robots: Compliant, Continuum, Cognitive, and Collaborative
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recovering surgical scene structure in laparoscopic surgery is a crucial step
for surgical guidance and augmented reality applications. In this paper, a
quasi-dense reconstruction algorithm for the surgical scene is proposed. It
builds on a state-of-the-art SLAM system and exploits the initial exploration
phase that the surgeon typically performs at the beginning of the surgery. We
show how to convert the sparse SLAM map into a quasi-dense scene
reconstruction, using pairs of keyframe images and correlation-based
featureless patch matching. We have validated the approach with a live porcine
experiment using Computed Tomography as ground truth, yielding a Root Mean
Squared Error of 4.9 mm.
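The correlation-based featureless patch matching mentioned above is commonly scored with zero-mean normalized cross-correlation; a minimal sketch (our illustration, with details such as the window size and search range left out):

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two equal-size patches."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    # 1.0 means identical up to brightness/contrast; near 0 means no match
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```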
Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Training a Fully Convolutional Network (FCN) for semantic segmentation
requires a large number of pixel-level masks, which involves a large amount of
human labour and time for annotation. In contrast, image-level labels are much
easier to obtain. In this work, we propose a novel method for weakly supervised
semantic segmentation with only image-level labels. The method relies on a
large scale co-segmentation framework that can produce object masks for a group
of images containing objects belonging to the same semantic class. We first
retrieve images from search engines, e.g., Flickr and Google, using semantic
class names as queries, e.g., class names in PASCAL VOC 2012. We then use
high-quality masks produced by co-segmentation on the retrieved images, as
well as the target dataset images with image-level labels, to train the
segmentation networks. We obtain an IoU of 56.9 on the PASCAL VOC 2012 test
set, which reaches state-of-the-art performance.
Aiden Nibali, Zhen He, Stuart Morgan, Daniel Greenwood
Comments: To appear at CVsports 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Due to recent advances in technology, the recording and analysis of video
data has become an increasingly common component of athlete training
programmes. Today it is incredibly easy and affordable to set up a fixed camera
and record athletes in a wide range of sports, such as diving, gymnastics,
golf, tennis, etc. However, the manual analysis of the obtained footage is a
time-consuming task which involves isolating actions of interest and
categorizing them using domain-specific knowledge. In order to automate this
kind of task, three challenging sub-problems are often encountered: 1)
temporally cropping events/actions of interest from continuous video; 2)
tracking the object of interest; and 3) classifying the events/actions of
interest.
Most previous work has focused on solving just one of the above sub-problems
in isolation. In contrast, this paper provides a complete solution to the
overall action monitoring task in the context of a challenging real-world
exemplar. Specifically, we address the problem of diving classification. This
is a challenging problem since the person (diver) of interest typically
occupies fewer than 1% of the pixels in each frame. The model is required to
learn the temporal boundaries of a dive, even though other divers and
bystanders may be in view. Finally, the model must be sensitive to subtle
changes in body pose over a large number of frames to determine the
classification code. We provide effective solutions to each of the sub-problems
which combine to provide a highly functional solution to the task as a whole.
The techniques proposed can be easily generalized to video footage recorded
from other sports.
Gregery T. Buzzard, Suhas Sreehari, Charles A. Bouman
Comments: 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Regularized inversion methods for image reconstruction are used widely due to
their tractability and their ability to combine complex physical sensor models
with useful regularity criteria. Such methods were used in the recently
developed Plug-and-Play prior method, which provides a framework to use
advanced denoising algorithms as regularizers in inversion. However, the need
to formulate regularized inversion as the solution to an optimization problem
severely limits both the expressiveness of possible regularity conditions and
the variety of provably convergent Plug-and-Play denoising operators.
In this paper, we introduce the concept of consensus equilibrium (CE), which
generalizes regularized inversion to include a much wider variety of regularity
operators without the need for an optimization formulation. Consensus
equilibrium is based on the solution of a set of equilibrium equations that
balance data fit and regularity. In this framework, the problem of MAP
estimation in regularized inversion is replaced by the problem of solving these
equilibrium equations, which can be approached in multiple ways, including as a
fixed point problem that generalizes the ADMM approach used in the
Plug-and-Play method.
We present the Douglas-Rachford (DR) algorithm for computing the CE solution
as a fixed point and prove the convergence of this algorithm under conditions
that include denoising operators that do not arise from optimization problems
and that may not be nonexpansive. We give several examples to illustrate the
idea of consensus equilibrium and the convergence properties of the DR
algorithm and demonstrate this method on a sparse interpolation problem using
electron microscopy data.
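A minimal sketch of the fixed-point computation, assuming the two-agent case: a data-fitting proximal operator F and a plug-in denoiser G, both supplied by the application (an illustration of the DR iteration named above, not the paper's general multi-agent formulation):

```python
import numpy as np

def douglas_rachford(F, G, v0, iters=100):
    """Douglas-Rachford iteration; fixed points balance data fit and regularity."""
    v = v0.copy()
    for _ in range(iters):
        x = F(v)            # enforce agreement with the data
        z = G(2 * x - v)    # reflect, then apply the regularizing denoiser
        v = v + z - x       # DR update
    return F(v)
```

Note that G here need not be the proximal map of any regularizer, which is exactly the extra freedom the consensus equilibrium view provides.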
Quanzeng You, Darío García-García, Manohar Paluri, Jiebo Luo, Jungseock Joo
Comments: 10 pages, To appear in ICWSM 2017 (Full Paper)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
Online social media is a social vehicle in which people share various moments
of their lives with their friends, such as playing sports, cooking dinner or
just taking a selfie for fun, via visual means, that is, photographs. Our study
takes a closer look at the popular visual concepts illustrating various
cultural lifestyles from aggregated, de-identified photographs. We perform
analysis both at macroscopic and microscopic levels, to gain novel insights
about global and local visual trends as well as the dynamics of interpersonal
cultural exchange and diffusion among Facebook friends. We processed images by
automatically classifying the visual content by a convolutional neural network
(CNN). Through various statistical tests, we find that socially tied
individuals are more likely to post images showing similar cultural
lifestyles. To further identify the main cause of the observed social
correlation, we use the Shuffle test and the Preference-based Matched
Estimation (PME) test to distinguish the effects of influence and homophily.
The results indicate that the visual content of each user’s photographs is
temporally, although not necessarily causally, correlated with the photographs
of their friends, which may suggest an effect of influence. Our paper
demonstrates that Facebook
photographs exhibit diverse cultural lifestyles and preferences and that the
social interaction mediated through the visual channel in social media can be
an effective mechanism for cultural diffusion.
Clement Zotti, Zhiming Luo, Alain Lalande, Olivier Humbert, Pierre-Marc Jodoin
Comments: 14 pages, 4 tables, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose a fully automatic MRI cardiac segmentation method
based on a novel deep convolutional neural network (CNN). As opposed to most
cardiac segmentation methods, which focus on the left ventricle (and
especially the left cavity), our method segments the left ventricular cavity,
the left ventricular epicardium, and the right ventricular cavity. The novelty of
our network lies in its maximum a posteriori loss function, which is
specifically designed for the cardiac anatomy. Our loss function incorporates
the cross-entropy of the predicted labels, the predicted contours, a cardiac
shape prior, and an a priori term. Our model also includes a cardiac
center-of-mass regression module which allows for an automatic shape prior
registration. Also, since our method processes raw MR images without any manual
preprocessing and/or image cropping, our CNN learns both high-level features
(useful to distinguish the heart from other organs with a similar shape) and
low-level features (useful to get accurate segmentation results). Those
features are learned with a multi-resolution conv-deconv “grid” architecture
which can be seen as an extension of the U-Net.
We trained and tested our model on the ACDC MICCAI’17 challenge dataset of
150 patients whose diastolic and systolic images were manually outlined by 2
medical experts. Results reveal that our method can segment all three regions
of a 3D MRI cardiac volume in 0.4 seconds with an average Dice index of
0.90, which is significantly better than state-of-the-art deep learning
methods.
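An illustrative composite loss in the spirit of the description above: cross-entropy plus a term that penalizes predictions that are improbable under a registered shape prior. The weighting, the prior's registration, and the exact MAP formulation are assumptions here, not the paper's.

```python
import torch
import torch.nn.functional as F

def cardiac_loss(logits, target, shape_prior_logp, lam=0.1):
    """logits: (N, C, H, W); target: (N, H, W); shape_prior_logp: (N, C, H, W)."""
    ce = F.cross_entropy(logits, target)
    # a priori term: discourage labels that are unlikely under the shape prior
    prior = -(F.softmax(logits, dim=1) * shape_prior_logp).sum(dim=1).mean()
    return ce + lam * prior
```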
Tao Zhou, Muhao Chen, Jie Yu, Demetri Terzopoulos
Comments: CVPR 2017 Workshop (vision meets cognition)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Following the recent progress in image classification and captioning using
deep learning, we develop a novel natural language person retrieval system
based on an attention mechanism. More specifically, given the description of a
person, the goal is to localize the person in an image. To this end, we first
construct a benchmark dataset for natural language person retrieval. To do so,
we generate bounding boxes for persons in a public image dataset from the
segmentation masks, which are then annotated with descriptions and attributes
using the Amazon Mechanical Turk. We then adopt a region proposal network in
Faster R-CNN as a candidate region generator. The cropped images based on the
region proposals as well as the whole images with attention weights are fed
into Convolutional Neural Networks for visual feature extraction, while the
natural language expression and attributes are input to Bidirectional Long
Short-Term Memory (BLSTM) models for text feature extraction. The visual and
text features are integrated to score region proposals, and the one with the
highest score is retrieved as the output of our system. The experimental
results show significant improvement over the state-of-the-art method for
generic object retrieval and this line of research promises to benefit search
in surveillance video footage.
Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, Guoqi Li
Comments: 9 pages, 10 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
There is a pressing need to build an architecture that could subsume binary
and ternary networks under a unified framework that achieves both higher
performance and less overhead. To this end, two fundamental issues are yet to
be addressed. The first is how to implement back propagation when neuronal
activations are discrete. The second is how to remove the full-precision
hidden weights in the training phase to break the bottlenecks of
memory/computation consumption. To address the first issue, we present a
multistep neuronal activation discretization method and a derivative
approximation technique that enable the back propagation algorithm to be
implemented on discrete DNNs. For the second issue, we propose a discrete
state transition (DST) methodology to constrain the weights in a discrete
space without saving the hidden weights. In this way, we build a unified
framework that subsumes binary and ternary networks as its special cases. More
particularly, we find that when both the weights and activations become
ternary, the DNNs can be reduced to gated XNOR networks (or sparse binary
networks), since only the event of a non-zero weight coinciding with a
non-zero activation enables the control gate to start the XNOR logic
operations of the original binary networks. This promises event-driven
hardware designs for efficient mobile intelligence. We achieve advanced
performance compared with state-of-the-art algorithms. Furthermore, the
computational sparsity and the number of states in the discrete space can be
flexibly modified to make the framework suitable for various hardware
platforms.
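For flavor, a common way to backpropagate through discrete activations is a straight-through derivative approximation; the sketch below ternarizes values and passes gradients through unsaturated inputs. The paper's multistep discretization and DST rule are more elaborate, so this is only an illustrative stand-in.

```python
import torch

class TernarySTE(torch.autograd.Function):
    """Ternarize in the forward pass; straight-through gradient in the backward."""
    @staticmethod
    def forward(ctx, x, threshold=0.5):
        ctx.save_for_backward(x)
        # map values to {-1, 0, +1}; small magnitudes become exact zeros
        return torch.where(x.abs() < threshold, torch.zeros_like(x), torch.sign(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pass gradients through only where the input is not saturated
        return grad_out * (x.abs() <= 1.0).float(), None
```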
Milad Mozafari, Saeed Reza Kheradpisheh, Timothée Masquelier, Abbas Nowzari-Dalini, Mohammad Ganjtabesh
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)
Reinforcement learning (RL) has recently regained popularity, with major
achievements such as beating the European champion at the game of Go. Here, for the
first time, we show that RL can be used efficiently to train a spiking neural
network (SNN) to perform object recognition in natural images without using an
external classifier. We used a feedforward convolutional SNN and a temporal
coding scheme where the most strongly activated neurons fire first, while less
activated ones fire later, or not at all. In the highest layers, each neuron
was assigned to an object category, and it was assumed that the stimulus
category was the category of the first neuron to fire. If this assumption was
correct, the neuron was rewarded, i.e. spike-timing-dependent plasticity (STDP)
was applied, which reinforced the neuron’s selectivity. Otherwise, anti-STDP
was applied, which encouraged the neuron to learn something else. As
demonstrated on various image datasets (Caltech, ETH-80, and NORB), this
reward-modulated STDP (R-STDP) approach extracted particularly discriminative visual
features, whereas classic unsupervised STDP extracts any feature that
consistently repeats. As a result, R-STDP outperformed STDP on these datasets.
Furthermore, R-STDP is suitable for online learning, and can adapt to drastic
changes such as label permutations. Finally, it is worth mentioning that both
feature extraction and classification were done with spikes, using at most one
spike per neuron. Thus the network is hardware friendly and energy efficient.
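Schematically, the reward-modulated update flips the sign of the plasticity depending on whether the first-spike decision was rewarded; a toy sketch with illustrative constants and a simplified causal-pair test (not the authors' exact rule):

```python
import numpy as np

def r_stdp_update(w, pre_fired_before, rewarded, a_plus=0.004, a_minus=0.003):
    """w: synaptic weights in [0, 1]; pre_fired_before: boolean mask of
    presynaptic spikes preceding the postsynaptic spike; rewarded: bool."""
    sign = 1.0 if rewarded else -1.0            # reward -> STDP, else anti-STDP
    dw = np.where(pre_fired_before,
                  sign * a_plus * w * (1.0 - w),    # potentiate causal pairs
                  -sign * a_minus * w * (1.0 - w))  # depress the rest
    # the w * (1 - w) factor keeps weights softly bounded in [0, 1]
    return np.clip(w + dw, 0.0, 1.0)
```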
Quentin Bateux, Eric Marchand, Jurgen Leitner, Francois Chaumette
Comments: 6 pages, 7 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We present a deep neural network-based method to perform high-precision,
robust and real-time 6 DOF visual servoing. The paper describes how to create a
dataset simulating various perturbations (occlusions and lighting conditions)
from a single real-world image of the scene. A convolutional neural network is
fine-tuned using this dataset to estimate the relative pose between two images
of the same scene. The output of the network is then employed in a visual
servoing control scheme. The method converges robustly even in difficult
real-world settings with strong lighting variations and occlusions. A
positioning error of less than one millimeter is obtained in experiments with a
6 DOF robot.
Liang Zhao, Yang Wang, Yi Yang, Wei Xu
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This paper presents two unsupervised learning layers (UL layers) for
label-free video analysis: one for fully connected layers, and the other for
convolutional ones. The proposed UL layers can play two roles: they can serve
as the cost function layer, providing a global training signal; meanwhile,
they can be added to any regular neural network layer to provide local
training signals, which combine with the training signals backpropagated from
upper layers to extract both slow and fast changing features at layers of
different depths. Therefore, the UL layers can be used in either purely
unsupervised or semi-supervised settings. Both a closed-form solution and an
online learning algorithm for the two UL layers are provided. Experiments with
unlabeled synthetic
and real-world videos demonstrated that the neural networks equipped with UL
layers and trained with the proposed online learning algorithm can extract
shape and motion information from video sequences of moving objects. The
experiments demonstrated the potential applications of the UL layers and the
online learning algorithm to head orientation estimation and moving object
localization.
Matthew Amodio, Swarat Chaudhuri, Thomas Reps
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Recurrent neural networks have achieved remarkable success at generating
sequences with complex structures, thanks to advances that include richer
embeddings of input and cures for vanishing gradients. Trained only on
sequences from a known grammar, though, they can still struggle to learn rules
and constraints of the grammar.
Neural Attribute Machines (NAMs) are equipped with a logical machine that
represents the underlying grammar, which is used to teach the constraints to
the neural machine by (i) augmenting the input sequence, and (ii) optimizing a
custom loss function. Unlike traditional RNNs, NAMs are exposed to the grammar,
as well as samples from the language of the grammar. During generation, NAMs
make significantly fewer violations of the constraints of the underlying
grammar than RNNs trained only on samples from the language of the grammar.
Begum Genc, Mohamed Siala, Barry O'Sullivan, Gilles Simonin
Comments: Accepted for IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)
We study the notion of robustness in stable matching problems. We first
define robustness by introducing (a,b)-supermatches. An (a,b)-supermatch is a
stable matching in which, if any a pairs break up, it is possible to find
another stable matching by changing the partners of those a pairs and of at
most b other pairs. In this context, we define the most robust stable matching
as a (1,b)-supermatch where b is minimum. We show that checking whether a
given stable matching is a (1,b)-supermatch can be done in polynomial time.
Next, we use this procedure to design a constraint programming model, a local
search approach, and a genetic algorithm to find the most robust stable
matching. Our empirical evaluation on large instances shows that local search
outperforms the other approaches.
Yihui He, Ming Xiang
Comments: 4 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)
The traveling salesman problem (TSP) is a classical optimization problem in
computer science, with applications to industrial engineering, theoretical
computer science, bioinformatics, and several other disciplines. In recent
years, there has been a plethora of novel approaches for approximate
solutions, ranging from simplistic greedy heuristics to cooperative
distributed algorithms derived from artificial intelligence. In this paper, we
perform an evaluation and analysis of cornerstone algorithms for the Euclidean
TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use several
datasets as input for the algorithms, including a small dataset, a
medium-sized dataset representing cities in the United States, and a synthetic
dataset consisting of 200 cities to test algorithm scalability. We discover
that the greedy and 2-opt algorithms efficiently calculate solutions for
smaller datasets. The genetic algorithm has the best solution quality for
medium to large datasets, but generally has a longer runtime. Our
implementations are publicly available.
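For reference, a compact sketch of the 2-opt local search evaluated above (a naive implementation written for clarity rather than speed, not the paper's code):

```python
import math

def tour_length(tour, pts):
    """Total length of a closed tour over 2-D points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Repeatedly reverse a tour segment whenever doing so shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, pts) < tour_length(tour, pts):
                    tour, improved = cand, True
    return tour
```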
Ashley D. Edwards, Charles L. Isbell Jr
Comments: A shorter version of this paper was accepted to RLDM (this http URL)
Subjects: Artificial Intelligence (cs.AI)
In reinforcement learning, we often define goals by specifying rewards within
desirable states. One problem with this approach is that we typically need to
redefine the rewards each time the goal changes, which often requires some
understanding of the solution in the agent’s environment. When humans are
learning to complete tasks, we regularly utilize alternative sources that guide
our understanding of the problem. Such task representations allow one to
specify goals on their own terms, thus providing specifications that can be
appropriately interpreted across various environments. This motivates our own
work, in which we represent goals in environments that are different from the
agent’s. We introduce Cross-Domain Perceptual Reward (CDPR) functions, learned
rewards that represent the visual similarity between an agent’s state and a
cross-domain goal image. We report results for learning the CDPRs with a deep
neural network and using them to solve two tasks with deep reinforcement
learning.
Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Yannick Schroecker, Charles Isbell
Comments: 5 pages, 6 figures; 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2017), Ann Arbor, Michigan
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Typical reinforcement learning (RL) agents learn to complete tasks specified
by reward functions tailored to their domain. As such, the policies they learn
do not generalize even to similar domains. To address this issue, we develop a
framework through which a deep RL agent learns to generalize policies from
smaller, simpler domains to more complex ones using a recurrent attention
mechanism. The task is presented to the agent as an image and an instruction
specifying the goal. A meta-controller guides the agent towards its goal by
designing a sequence of smaller subtasks on the part of the state space within
its attention window, effectively decomposing the task. As a baseline, we also
consider a setup without attention. Our experiments show that the
meta-controller learns to create subgoals within its attention window.
Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez
Comments: 14 pages, 2 figures, IJCAI 2017
Subjects: Artificial Intelligence (cs.AI)
Semantic Image Interpretation (SII) is the task of extracting structured
semantic descriptions from images. It is widely agreed that the combined use of
visual data and background knowledge is of great importance for SII. Recently,
Statistical Relational Learning (SRL) approaches have been developed for
reasoning under uncertainty and learning in the presence of data and rich
knowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates
neural networks with first-order fuzzy logic to allow (i) efficient learning
from noisy data in the presence of logical constraints, and (ii) reasoning with
logical formulas describing general properties of the data. In this paper, we
develop and apply LTNs to two of the main tasks of SII, namely, the
classification of an image’s bounding boxes and the detection of the relevant
part-of relations between objects. To the best of our knowledge, this is the
first successful application of SRL to such SII tasks. The proposed approach is
evaluated on a standard image processing benchmark. Experiments show that the
use of background knowledge in the form of logical constraints can improve the
performance of purely data-driven approaches, including the state-of-the-art
Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show
that the use of logical background knowledge adds robustness to the learning
system when errors are present in the labels of the training data.
Roni Stern, Brendan Juba
Journal-ref: International Joint Conference on Artificial Intelligence (IJCAI)
2017
Subjects: Artificial Intelligence (cs.AI)
In this paper we explore the theoretical boundaries of planning in a setting
where no model of the agent’s actions is given. Instead of an action model, a
set of successfully executed plans is given, and the task is to generate a
plan that is safe, i.e., guaranteed to achieve the goal without failing. To
this end, we show how to learn a conservative model of the world in which
actions are guaranteed to be applicable. This conservative model is then given
to an off-the-shelf classical planner, resulting in a plan that is guaranteed
to achieve the goal. However, this reduction from model-free planning to
model-based planning is not complete: in some cases a plan will not be found
even when one exists. We analyze the relation between the number of observed
plans and the likelihood that our conservative approach will indeed fail to
solve a solvable problem. Our analysis shows that the number of trajectories
needed scales gracefully.
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents’ policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent’s action, while keeping the other agents’ actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state.
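The counterfactual baseline can be written compactly; below is a sketch for a single agent, with illustrative tensor shapes (our assumption, not the paper's implementation):

```python
import torch

def counterfactual_advantage(q_values, pi, action):
    """q_values: (n_actions,) centralized critic outputs for this agent's
    alternative actions, with the other agents' actions held fixed;
    pi: (n_actions,) this agent's policy; action: index of the taken action."""
    baseline = (pi * q_values).sum()   # marginalize out this agent's action only
    return q_values[action] - baseline
```

Because the critic outputs Q-values for all of one agent's actions in a single forward pass, this baseline is cheap to compute, which is the efficiency point made above.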
Walid Chaabene, Bert Huang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Incremental methods for structure learning of pairwise Markov random fields
(MRFs), such as grafting, improve scalability to large systems by avoiding
inference over the entire feature space in each optimization step. Instead,
inference is performed over an incrementally grown active set of features. In
this paper, we address the computational bottlenecks from which current
techniques still suffer by introducing online edge grafting, an incremental,
structured method that activates edges as groups of features in a streaming setting. The
framework is based on reservoir sampling of edges that satisfy a necessary
activation condition, approximating the search for the optimal edge to
activate. Online edge grafting performs an informed edge search set
reorganization using search history and structure heuristics. Experiments show
a significant computational speedup for structure learning and a controllable
trade-off between the speed and the quality of learning.
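The reservoir component can be sketched with the classic Algorithm R update, here keeping a bounded pool of candidate edges that passed the activation test (an illustration; the paper's informed search-set reorganization heuristics are omitted):

```python
import random

def reservoir_update(reservoir, edge, seen_count, capacity):
    """edge: the seen_count-th candidate satisfying the activation condition."""
    if len(reservoir) < capacity:
        reservoir.append(edge)
    else:
        j = random.randrange(seen_count)   # uniform over all candidates so far
        if j < capacity:
            reservoir[j] = edge            # each candidate kept w.p. capacity/seen_count
```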
Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We propose a probabilistic framework for domain adaptation that blends both
generative and discriminative modeling in a principled way. By maximizing both
the marginal and the conditional log-likelihoods, models derived from this
framework can use both labeled instances from the source domain as well as
unlabeled instances from both source and target domains. Under this framework,
we show that the popular reconstruction loss of autoencoders corresponds to an
upper bound of the negative marginal log-likelihood of unlabeled instances,
where the marginal distributions are given by proper kernel density estimates.
This provides a way to interpret the empirical success of autoencoders in
domain adaptation and semi-supervised learning. We instantiate our framework
using neural networks, and build a concrete model, DAuto. Empirically, we
demonstrate the effectiveness of DAuto on text, image and speech datasets,
showing that it outperforms related competitors when domain adaptation is
possible.
Shuai Xiao, Junchi Yan, Stephen M. Chu, Xiaokang Yang, Hongyuan Zha
Comments: Accepted at Thirty-First AAAI Conference on Artificial Intelligence (AAAI17)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Event sequences, asynchronously generated with random timestamps, are
ubiquitous across applications. The precise and arbitrary timestamps can carry
important clues about the underlying dynamics, and they make event data
fundamentally different from time series, in which the series is indexed at
fixed and equal time intervals. One expressive mathematical tool for modeling
events is the point process. The intensity functions of many point processes
involve two components: the background and the effect of the history. Due to
its inherent spontaneity, the background can be treated as a time series,
while the other component must handle the history of events. In this paper, we
model the background by a
Recurrent Neural Network (RNN) with its units aligned with time series indexes
while the history effect is modeled by another RNN whose units are aligned with
asynchronous events to capture the long-range dynamics. The whole model with
event type and timestamp prediction output layers can be trained end-to-end.
Our approach takes an RNN perspective on point processes, modeling their
background and history effect. In terms of utility, our method allows a
black-box treatment of the intensity, which in point processes is often a
pre-defined parametric form. Meanwhile, end-to-end training opens the avenue
for reusing the rich existing techniques of deep networks for point process
modeling. We apply our model to a predictive maintenance problem using a log
dataset from more than 1000 ATMs of a global bank headquartered in North
America.
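A schematic of the two-RNN design described above, with the layer sizes and the softplus link chosen for illustration (assumptions, not the paper's exact architecture):

```python
import torch

class TwoStreamIntensity(torch.nn.Module):
    """Background RNN on the evenly spaced series, history RNN on events."""
    def __init__(self, d_ts, d_ev, hidden=32):
        super().__init__()
        self.ts_rnn = torch.nn.RNN(d_ts, hidden, batch_first=True)
        self.ev_rnn = torch.nn.RNN(d_ev, hidden, batch_first=True)
        self.head = torch.nn.Linear(2 * hidden, 1)

    def forward(self, ts_seq, ev_seq):
        # ts_seq: (batch, T_ts, d_ts); ev_seq: (batch, T_ev, d_ev)
        _, h_ts = self.ts_rnn(ts_seq)   # background state
        _, h_ev = self.ev_rnn(ev_seq)   # long-range history state
        joint = torch.cat([h_ts[-1], h_ev[-1]], dim=-1)
        return torch.nn.functional.softplus(self.head(joint))  # intensity > 0
```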
Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank
Journal-ref: related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop
2017
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Systems and Control (cs.SY)
To run quantum algorithms on emerging gate-model quantum hardware, quantum
circuits must be compiled to take into account constraints on the hardware. For
near-term hardware, with only limited means to mitigate decoherence, it is
critical to minimize the duration of the circuit. We investigate the
application of temporal planners to the problem of compiling quantum circuits
to newly emerging quantum hardware. While our approach is general, we focus on
compiling to superconducting hardware architectures with nearest neighbor
constraints. Our initial experiments focus on compiling Quantum Approximate
Optimization Algorithm (QAOA) circuits, whose high number of commuting gates
allows great flexibility in the order in which the gates can be applied. That
freedom makes it more challenging to find optimal compilations but also means
there is a greater potential win from more optimized compilation than for less
flexible circuits. We map this quantum circuit compilation problem to a
temporal planning problem, and generate a test suite of compilation problems
for QAOA circuits of various sizes targeting a realistic hardware
architecture. We report compilation results from several state-of-the-art
temporal planners on this test set. This early empirical evaluation
demonstrates that temporal planning is a viable approach to quantum circuit
compilation.
Sirui Yao, Bert Huang
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We study fairness in collaborative-filtering recommender systems, which are
sensitive to discrimination that exists in historical data. Biased data can
lead collaborative-filtering methods to make unfair predictions for users from
minority groups. We identify the insufficiency of existing fairness metrics and
propose four new metrics that address different forms of unfairness. These
fairness metrics can be optimized by adding fairness terms to the learning
objective. Experiments on synthetic and real data show that our new metrics can
better measure fairness than the baseline, and that the fairness objectives
effectively help reduce unfairness.
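The general recipe of adding fairness terms to the learning objective can be sketched as a loss-plus-group-gap penalty; the four proposed metrics differ in which per-group statistics are compared, and this illustration (our own simplification) uses a plain mean-error gap:

```python
import torch

def fair_loss(pred, target, group, lam=0.5):
    """group: 0/1 tensor marking membership in the disadvantaged group."""
    err = (pred - target) ** 2
    # gap between the two groups' mean errors stands in for an unfairness metric
    gap = (err[group == 0].mean() - err[group == 1].mean()).abs()
    return err.mean() + lam * gap   # accuracy term plus fairness penalty
```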
Yang Liu, Mirella Lapata
Subjects: Computation and Language (cs.CL)
In this paper, we focus on learning structure-aware document representations
from data without recourse to a discourse parser or additional annotations.
Drawing inspiration from recent efforts to empower neural networks with a
structural bias, we propose a model that can encode a document while
automatically inducing rich structural dependencies. Specifically, we embed a
differentiable non-projective parsing algorithm into a neural model and use
attention mechanisms to incorporate the structural biases. Experimental
evaluation across different tasks and datasets shows that the proposed model
achieves state-of-the-art results on document modeling tasks while inducing
intermediate structures which are both interpretable and meaningful.
Jean Maillard, Stephen Clark, Dani Yogatama
Subjects: Computation and Language (cs.CL)
We introduce a neural network that represents sentences by composing their
words according to induced binary parse trees. We use Tree-LSTM as our
composition function, applied along a tree structure found by a fully
differentiable natural language chart parser. Our model simultaneously
optimises both the composition function and the parser, thus eliminating the
need for externally-provided parse trees which are normally required for
Tree-LSTM. It can therefore be seen as a tree-based RNN that is unsupervised
with respect to the parse trees. As it is fully differentiable, our model is
easily trained with an off-the-shelf gradient descent method and
backpropagation. We demonstrate that it achieves better performance compared to
various supervised Tree-LSTM architectures on a textual entailment task and a
reverse dictionary task.
Zhipeng Xie, Junfeng Hu
Journal-ref: DASFAA (1) 2017: 295-308
Subjects: Computation and Language (cs.CL)
Recognizing textual entailment is a fundamental task in a variety of text
mining and natural language processing applications. This paper proposes a
simple neural model for the RTE problem. It first matches each word in the
hypothesis with its most similar word in the premise, producing an augmented
representation of the hypothesis conditioned on the premise as a sequence of
word pairs. An LSTM model is then used to model this augmented sequence, and
the final output from the LSTM is fed into a softmax layer to make the
prediction. Beyond the base model, in order to enhance its performance, we
also propose three techniques: the integration of multiple word-embedding
libraries, bi-way integration, and an ensemble based on model averaging.
Experimental results on the SNLI dataset show that the three techniques are
effective in boosting the predictive accuracy and that our method outperforms
several state-of-the-art methods.
Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
Comments: Submitted to NIPS 2017
Subjects: Computation and Language (cs.CL)
We introduce a technique for augmenting neural text-to-speech (TTS) with
low-dimensional trainable speaker embeddings to generate different voices from
a single model. As a starting point, we show improvements over the two
state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and
Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to
Deep Voice 1 but is constructed with higher-performance building blocks, and
demonstrate a significant audio quality improvement over Deep Voice 1. We
improve Tacotron by introducing a post-processing neural vocoder, and
demonstrate a significant audio quality improvement. We then demonstrate our
technique for multi-speaker speech synthesis for both Deep Voice 2 and
Tacotron on two multi-speaker TTS datasets. We show that a single neural TTS
system can learn hundreds of unique voices from less than half an hour of data
per speaker, while achieving high-quality audio synthesis and preserving the
speaker identities almost perfectly.
Necva Bölücü, Burcu Can
Comments: 12 pages with 3 figures, accepted and presented at the CICLING 2017 – 18th International Conference on Intelligent Text Processing and Computational Linguistics
Journal-ref: CICLING 2017
Subjects: Computation and Language (cs.CL)
The number of word forms in agglutinative languages is theoretically infinite
and this variety in word forms introduces sparsity in many natural language
processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks
that often suffers from sparsity. In this paper, we present an unsupervised
Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and
stemming for agglutinative languages. We use stemming to reduce sparsity in PoS
tagging. Two tasks are jointly performed to provide a mutual benefit in both
tasks. Our results show that joint PoS tagging and stemming improves PoS
tagging scores. We present results for Turkish and Finnish as agglutinative
languages and English as a morphologically poor language.
Ashwini Jaya Kumar, Sören Auer, Christoph Schmidt, Joachim Köhler
Comments: Under Review in International Workshop on Grounding Language Understanding, Satellite of Interspeech 2017
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Applications which use human speech as an input require a speech interface
with high recognition accuracy. The words or phrases in the recognised text are
annotated with a machine-understandable meaning and linked to knowledge graphs
for further processing by the target application. These semantic annotations
of recognised words can be represented as subject-predicate-object triples,
which collectively form a graph often referred to as a knowledge graph. This
type of knowledge representation facilitates the use of speech interfaces with
any spoken-input application, since the information is represented in a
logical, semantic form, and retrieval and storage can be performed using
standard web query languages. In this work, we develop a methodology for
linking speech input to
knowledge graphs and study the impact of recognition errors in the overall
process. We show that for a corpus with lower WER, the annotation and linking
of entities to the DBpedia knowledge graph is considerable. DBpedia Spotlight,
a tool to interlink text documents with the linked open data is used to link
the speech recognition output to the DBpedia knowledge graph. Such a
knowledge-based speech recognition interface is useful for applications such as
question answering or spoken dialog systems.
Nicholas Harvey, David Karger, Virginia Savova, Leonid Peshkin
Subjects: Discrete Mathematics (cs.DM); Computation and Language (cs.CL); Data Structures and Algorithms (cs.DS)
This paper formulates a novel problem on graphs: find the minimal subset of
edges in a fully connected graph such that the resulting graph contains all
spanning trees for a set of specified sub-graphs. This formulation is
motivated by an unsupervised grammar induction problem from computational
linguistics. We present a reduction to some known problems and algorithms from
graph theory, provide computational complexity results, and describe an
approximation algorithm.
William C. Cooper, Maxwell Young
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Randomized binary exponential backoff (BEB) is a popular algorithm for
coordinating access to a shared channel. With an operational history exceeding
four decades, BEB is currently an important component of several wireless
standards.
Despite this track record, prior theoretical results indicate that under
bursty traffic (1) BEB yields poor makespan and (2) superior algorithms are
possible. To date, the degree to which these findings manifest in practice has
not been resolved.
To address this issue, we examine one of the strongest cases against BEB: n
packets that simultaneously begin contending for the wireless channel. Using
Network Simulator 3, we compare against more recent algorithms that are
inspired by BEB, but whose makespan guarantees are superior. Surprisingly, we
discover that these newer algorithms significantly underperform.
Through further investigation, we identify as the culprit a flawed but common
abstraction regarding the cost of collisions. Our experimental results are
complemented by analytical arguments that the number of collisions — and not
solely makespan — is an important metric to optimize. We believe that these
findings have implications for the design of contention-resolution algorithms.
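For context, the algorithm under test is simple to state; a toy model of randomized binary exponential backoff (illustrative, not the ns-3 implementation):

```python
import random

def beb_slot(attempt, max_exp=10):
    """Slot to wait before the next transmission attempt."""
    window = 2 ** min(attempt, max_exp)   # contention window doubles per collision
    return random.randrange(window)       # uniform draw within the window
```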
Muhammad Anis Uddin Nasir, Hiroshi Horii, Marco Serafini, Nicolas Kourtellis, Rudy Raymond, Sarunas Girdzijauskas, Takayuki Osogami
Comments: 13 pages, under submission
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Primitive partitioning strategies for streaming applications operate
efficiently under two very strict assumptions: the resources are homogeneous
and the messages are drawn from a uniform key distribution. These assumptions
are often not true for real-world use cases. Dealing with heterogeneity and
non-uniform workloads requires inferring the resource capacities and input
distribution at run time. However, gathering these statistics and finding an
optimal placement often become a challenge when microsecond latency is desired.
In this paper, we address the load balancing problem for streaming engines
running on a heterogeneous cluster and processing skewed workload. In doing so,
we propose a novel partitioning strategy called Consistent Grouping (CG) that
is inspired by traditional consistent hashing. CG is a lightweight distributed
strategy that enables each processing element instance (PEI) to process the
workload according to its capacity. The main idea behind CG is the notion of
equal-sized virtual workers at the sources, which are assigned to workers based
on their capacities. We provide a theoretical analysis of the proposed
algorithm and show via extensive empirical evaluation that the proposed scheme
outperforms the state-of-the-art approaches. In particular, CG achieves 3.44x
better latency than key grouping, the state-of-the-art grouping strategy for
stateful streaming applications.
Taisuke Izumi, François Le Gall
Comments: To appear in PODC 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Triangle-free graphs play a central role in graph theory, and triangle
detection (or triangle finding) as well as triangle enumeration (triangle
listing) play central roles in the field of graph algorithms. In distributed
computing, algorithms with sublinear round complexity for triangle finding and
listing have recently been developed in the powerful CONGEST clique model,
where communication is allowed between any two nodes of the network. In this
paper we present the first algorithms with sublinear complexity for triangle
finding and triangle listing in the standard CONGEST model, where the
communication topology is the same as the topology of the network. More
precisely, we give randomized algorithms for triangle finding and listing with
round complexity $O(n^{2/3}(\log n)^{2/3})$ and $O(n^{3/4}\log n)$,
respectively, where $n$ denotes the number of nodes of the network. We also
show a lower bound of $\Omega(n^{1/3}/\log n)$ on the round complexity of
triangle listing, which also holds for the CONGEST clique model.
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)
Most distributed machine learning systems nowadays, including TensorFlow and
CNTK, are built in a centralized fashion. One bottleneck of centralized
algorithms lies in the high communication cost on the central node. Motivated
by this, we ask: can decentralized algorithms be faster than their centralized
counterparts?
Although decentralized PSGD (D-PSGD) algorithms have been studied by the
control community, existing analysis and theory do not show any advantage over
centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario
where only the decentralized network is available. In this paper, we study a
D-PSGD algorithm and provide the first theoretical analysis that indicates a
regime in which decentralized algorithms might outperform centralized
algorithms for distributed stochastic gradient descent. This is because D-PSGD
has comparable total computational complexities to C-PSGD but requires much
less communication cost on the busiest node. We further conduct an empirical
study to validate our theoretical analysis across multiple frameworks (CNTK and
Torch), different network configurations, and computation platforms up to 112
GPUs. On network configurations with low bandwidth or high latency, D-PSGD can
be up to one order of magnitude faster than its well-optimized centralized
counterparts.
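Schematically, one D-PSGD step for worker i mixes the local model with its neighbors through a row of a doubly stochastic matrix W and then applies a local stochastic gradient (a sketch under our assumptions, not the paper's pseudocode):

```python
import numpy as np

def dpsgd_step(x_all, i, W, grad_i, lr):
    """x_all: list of per-worker parameter vectors; W: doubly stochastic
    mixing matrix with W[i, j] > 0 only for neighbors j of worker i."""
    mixed = sum(W[i, j] * x_all[j] for j in range(len(x_all)))  # gossip average
    return mixed - lr * grad_i                                  # local SGD step
```

Since each worker only needs the models of its neighbors, no single node carries the all-to-one traffic that bottlenecks centralized parameter servers.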
Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, Guoqi Li
Comments: 9 pages, 10 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
There is a pressing need to build an architecture that could subsume these
networks undera unified framework that achieves both higher performance and
less overhead. To this end, two fundamental issues are yet to be addressed. The
first one is how to implement the back propagation when neuronal activations
are discrete. The second one is how to remove the full-precision hidden weights
in the training phase to break the bottlenecks of memory/computation
consumption. To address the first issue, we present a multistep neuronal
activation discretization method and a derivative approximation technique that
enable the back-propagation algorithm to be implemented on discrete DNNs. For
the second issue, we propose a discrete state transition (DST) methodology
to constrain the weights in a discrete space without saving the hidden weights.
In this way, we build a unified framework that subsumes the binary and ternary
networks as its special cases. More particularly, we find that when both the
weights and activations become ternary values, the DNNs can be reduced to gated
XNOR networks (or sparse binary networks), since only the event of a non-zero
weight coinciding with a non-zero activation enables the control gate to start
the XNOR logic operations of the original binary networks. This promises
event-driven hardware design for efficient mobile intelligence. We achieve
advanced performance compared with state-of-the-art algorithms. Furthermore,
the computational sparsity and the number of states in the discrete space can
be flexibly modified to make the framework suitable for various hardware
platforms.
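As a sketch of what activation discretization with a surrogate derivative can look like in practice (the threshold and the boxcar-style surrogate below are illustrative choices, not the paper's exact rule):

    import numpy as np

    def ternarize(x, threshold=0.5):
        """Forward pass: map activations to {-1, 0, +1} (illustrative rule)."""
        return np.sign(x) * (np.abs(x) > threshold)

    def ternarize_grad(x, upstream, window=1.0):
        """Backward pass: straight-through-style surrogate derivative;
        pass the upstream gradient only where |x| lies within the window."""
        return upstream * (np.abs(x) <= window)

    x = np.array([-1.7, -0.3, 0.2, 0.8, 2.1])
    print(ternarize(x))                        # ternary activations
    print(ternarize_grad(x, np.ones_like(x)))  # gradients pass where |x| <= 1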
Joseph Anderson
Comments: 180 Pages, 7 Figures, PhD thesis, Ohio State (2017)
Subjects: Learning (cs.LG)
Machine learning and data analysis now find both scientific and industrial
application in biology, chemistry, geology, medicine, and physics. These
applications rely on large quantities of data gathered from automated sensors
and user input. Furthermore, the dimensionality of many datasets is extreme:
ever more details are gathered about each user interaction or sensor reading.
All of these applications encounter problems with a common theme: using
observed data to make inferences about the world. Our work obtains the first
provably efficient algorithms for Independent Component Analysis (ICA) in the
presence of heavy-tailed data. The main tool in this result is the centroid
body (a well-known topic in convex geometry), along with optimization and
random walks for sampling from a convex body. This is the first algorithmic use
of the centroid body and it is of independent theoretical interest, since it
effectively replaces the estimation of covariance from samples, and is more
generally accessible.
We also use ICA as an algorithmic primitive for learning an intersection of
halfspaces, via a reduction to ICA. This reduction relies on a non-linear
transformation of samples from such an
intersection of halfspaces (i.e. a simplex) to samples which are approximately
from a linearly transformed product distribution. Through this transformation
of samples, which can be done efficiently, one can then use an ICA algorithm to
recover the vertices of the intersection of halfspaces.
Finally, we again use ICA as an algorithmic primitive to construct an
efficient solution to the widely-studied problem of learning the parameters of
a Gaussian mixture model. Our algorithm again transforms samples from a
Gaussian mixture model into samples which fit into the ICA model and, when
processed by an ICA algorithm, result in recovery of the mixture parameters.
Our algorithm is effective even when the number of Gaussians in the mixture
grows polynomially with the ambient dimension.
Aditya Krishna Menon, Robert C. Williamson
Subjects: Learning (cs.LG)
We study the problem of learning classifiers with a fairness constraint, with
three main contributions towards the goal of quantifying the problem’s inherent
tradeoffs. First, we relate two existing fairness measures to cost-sensitive
risks. Second, we show that for cost-sensitive classification and fairness
measures, the optimal classifier is an instance-dependent thresholding of the
class-probability function. Third, we show how the tradeoff between accuracy
and fairness is determined by the alignment between the class-probabilities for
the target and sensitive features. Underpinning our analysis is a general
framework that casts the problem of learning with a fairness requirement as one
of minimising the difference of two statistical risks.
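A hypothetical numpy sketch of the thresholding picture from the second and third contributions (the group-dependent threshold offsets below are invented for illustration; the paper derives its thresholds from cost-sensitive risks):

    import numpy as np

    def threshold_classifier(eta, sensitive, base_cost=0.5, offsets=(0.0, -0.1)):
        """Predict 1 iff the class-probability eta(x) exceeds an
        instance-dependent threshold; here the threshold shifts with the
        sensitive attribute (offsets are hypothetical fairness adjustments)."""
        thresholds = base_cost + np.where(sensitive == 0, offsets[0], offsets[1])
        return (eta > thresholds).astype(int)

    eta = np.array([0.42, 0.47, 0.55, 0.61])        # class-probabilities
    sensitive = np.array([0, 1, 0, 1])              # sensitive feature
    print(threshold_classifier(eta, sensitive))     # [0 1 1 1]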
Walid Chaabene, Bert Huang
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Incremental methods for structure learning of pairwise Markov random fields
(MRFs), such as grafting, improve scalability to large systems by avoiding
inference over the entire feature space in each optimization step. Instead,
inference is performed over an incrementally grown active set of features. In
this paper, we address the computational bottlenecks from which current
techniques still suffer by introducing online edge grafting, an incremental,
structured
method that activates edges as groups of features in a streaming setting. The
framework is based on reservoir sampling of edges that satisfy a necessary
activation condition, approximating the search for the optimal edge to
activate. Online edge grafting performs an informed edge search set
reorganization using search history and structure heuristics. Experiments show
a significant computational speedup for structure learning and a controllable
trade-off between the speed and the quality of learning.
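The reservoir component can be pictured with the classic Algorithm R over a stream of candidate edges (a sketch; the activation predicate below is a stand-in for the paper's gradient-based activation condition):

    import random

    def reservoir_sample_edges(edge_stream, passes_activation, k=10, seed=0):
        """Keep a uniform sample of k edges among those satisfying the
        activation condition, in one pass over the stream (Algorithm R)."""
        rng = random.Random(seed)
        reservoir, seen = [], 0
        for edge in edge_stream:
            if not passes_activation(edge):
                continue
            seen += 1
            if len(reservoir) < k:
                reservoir.append(edge)
            else:
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = edge
        return reservoir

    # toy stream: all pairs over 50 nodes, with a stand-in activation test
    edges = [(i, j) for i in range(50) for j in range(i + 1, 50)]
    print(reservoir_sample_edges(edges, lambda e: (e[0] + e[1]) % 3 == 0))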
Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
We propose a probabilistic framework for domain adaptation that blends both
generative and discriminative modeling in a principled way. By maximizing both
the marginal and the conditional log-likelihoods, models derived from this
framework can use both labeled instances from the source domain as well as
unlabeled instances from both source and target domains. Under this framework,
we show that the popular autoencoder reconstruction loss corresponds to an
upper bound on the negative marginal log-likelihood of unlabeled instances,
where the marginal distributions are given by proper kernel density estimates.
This provides a way to interpret the empirical success of autoencoders in
domain adaptation and semi-supervised learning. We instantiate our framework
using neural networks, and build a concrete model, DAuto. Empirically, we
demonstrate the effectiveness of DAuto on text, image and speech datasets,
showing that it outperforms related competitors when domain adaptation is
possible.
Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri
Subjects: Learning (cs.LG)
Generative adversarial networks (GAN) approximate a target data distribution
by jointly optimizing an objective function through a “two-player game” between
a generator and a discriminator. Despite their empirical success, however, two
very basic questions on how well they can approximate the target distribution
remain unanswered. First, it is not known how restricting the discriminator
family affects the approximation quality. Second, while a number of different
objective functions have been proposed, we do not understand when convergence
to the global minima of the objective function leads to convergence to the
target distribution under various notions of distributional convergence.
In this paper, we address these questions in a broad and unified setting by
defining a notion of adversarial divergences that includes a number of recently
proposed objective functions. We show that if the objective function is an
adversarial divergence with some additional conditions, then using a restricted
discriminator family has a moment-matching effect. Additionally, we show that
for objective functions that are strict adversarial divergences, convergence in
the objective function implies weak convergence, thus generalizing previous
results.
Shuai Xiao, Junchi Yan, Stephen M. Chu, Xiaokang Yang, Hongyuan Zha
Comments: Accepted at Thirty-First AAAI Conference on Artificial Intelligence (AAAI17)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Event sequences, generated asynchronously with random timestamps, are
ubiquitous in applications. The precise and arbitrary timestamps carry
important clues about the underlying dynamics and make event data
fundamentally different from time series, where the series is indexed at fixed
and equal time intervals. One expressive mathematical tool for modeling events
is the point process. The intensity functions of many point processes involve
two components: the background and the effect of the history. Owing to its
inherent spontaneity, the background can be treated as a time series, while
the other component must handle the history of events. In this paper, we model
the background by a recurrent neural network (RNN) whose units are aligned
with the time-series indexes, while the history effect is modeled by another
RNN whose units are aligned with asynchronous events to capture long-range
dynamics. The whole model, with event type and timestamp prediction output
layers, can be trained end-to-end. Our approach takes an RNN perspective on
point processes, modeling both their background and history effect. In
practice, our method allows a black-box treatment of the intensity, which in
point processes is usually a pre-defined parametric form. Meanwhile,
end-to-end training opens the way to reusing the rich toolbox of deep networks
for point-process modeling. We apply our model to a predictive maintenance
problem using a log dataset from more than 1,000 ATMs operated by a global
bank headquartered in North America.
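A schematic numpy forward pass of the two-RNN idea (hidden sizes, input encodings, and the softplus intensity head are our illustrative assumptions, not the authors' exact architecture):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8  # hidden size (illustrative)

    def rnn_step(h, x, Wh, Wx):
        return np.tanh(Wh @ h + Wx @ x)

    Wh_ts, Wx_ts = rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, 1))
    Wh_ev, Wx_ev = rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, 2))
    w_out = rng.normal(scale=0.3, size=2 * d)

    # background RNN: evenly spaced time-series samples
    h_ts = np.zeros(d)
    for value in [0.1, 0.4, 0.3, 0.7]:
        h_ts = rnn_step(h_ts, np.array([value]), Wh_ts, Wx_ts)

    # event RNN: asynchronous events encoded as (inter-arrival gap, type)
    h_ev = np.zeros(d)
    for gap, etype in [(0.5, 1.0), (2.3, 0.0), (0.2, 1.0)]:
        h_ev = rnn_step(h_ev, np.array([gap, etype]), Wh_ev, Wx_ev)

    # both hidden states parameterize the intensity; softplus keeps it positive
    intensity = np.log1p(np.exp(w_out @ np.concatenate([h_ts, h_ev])))
    print(f"predicted intensity: {intensity:.4f}")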
Scott Cheng-Hsin Yang, Yue Yu, Arash Givchi, Pei Wang, Wai Keen Vong, Patrick Shafto
Comments: 14 pages (5 pages of Supplementary Material), 1 figure
Subjects: Learning (cs.LG)
Cooperative transmission of data fosters rapid accumulation of knowledge by
efficiently combining experience across learners. Although well studied in
human learning, there has been less attention to cooperative transmission of
data in machine learning, and we consequently lack strong formal frameworks
through which we may reason about the benefits and limitations of cooperative
inference. We present such a framework. We introduce a novel index for
measuring the effectiveness of probabilistic information transmission, and
cooperative information transmission specifically. We relate our cooperative
index to previous measures of teaching in deterministic settings. We prove
conditions under which optimal cooperative inference can be achieved, including
a representation theorem which constrains the form of inductive biases for
learners optimized for cooperative inference. We conclude by demonstrating how
these principles may inform the design of machine learning algorithms and
discuss implications for human learning, machine learning, and human-machine
learning systems.
Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally
Comments: submitted to NIPS 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Sparsity helps reduce the computational complexity of deep neural networks by
skipping zeros. Taking advantage of sparsity is listed as a high priority in
next generation DNN accelerators such as TPU. The structure of sparsity, i.e.,
the granularity of pruning, affects the efficiency of hardware accelerator
design as well as the prediction accuracy. Coarse-grained pruning creates
regular sparsity patterns, making it more amenable for hardware acceleration
but more challenging to maintain the same accuracy. In this paper we
quantitatively measure the trade-off between sparsity regularity and prediction
accuracy, providing insight into how to maintain accuracy while imposing a
more structured sparsity pattern. Our experimental results show that
coarse-grained pruning can achieve a sparsity ratio similar to unstructured
pruning without loss of accuracy. Moreover, due to the index saving effect,
coarse-grained pruning is able to obtain a better compression ratio than
fine-grained sparsity at the same accuracy threshold. Based on the recent
sparse convolutional neural network accelerator (SCNN), our experiments further
demonstrate that coarse-grained sparsity requires about 2x fewer memory
references than fine-grained sparsity. Since a memory reference is more than two
orders of magnitude more expensive than arithmetic operations, the regularity
of sparse structure leads to more efficient hardware design.
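The granularity trade-off is easy to state in code (a numpy sketch with an illustrative 4x4 block size): fine-grained pruning zeros individual weights by magnitude, while coarse-grained pruning zeros whole blocks by aggregate magnitude, so only one index per surviving block needs to be stored.

    import numpy as np

    def fine_grained_prune(w, sparsity=0.75):
        """Zero the smallest-magnitude individual weights."""
        cutoff = np.quantile(np.abs(w), sparsity)
        return w * (np.abs(w) >= cutoff)

    def coarse_grained_prune(w, sparsity=0.75, block=4):
        """Zero whole (block x block) tiles by their mean absolute magnitude."""
        h, v = w.shape[0] // block, w.shape[1] // block
        tiles = w.reshape(h, block, v, block)
        scores = np.abs(tiles).mean(axis=(1, 3))
        mask = (scores >= np.quantile(scores, sparsity))[:, None, :, None]
        return (tiles * mask).reshape(w.shape)

    w = np.random.default_rng(0).normal(size=(16, 16))
    print((fine_grained_prune(w) == 0).mean(),   # achieved sparsity, fine
          (coarse_grained_prune(w) == 0).mean()) # achieved sparsity, coarse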
Liang Zhao, Yang Wang, Yi Yang, Wei Xu
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This paper presents two unsupervised learning layers (UL layers) for
label-free video analysis: one for fully connected layers, and the other for
convolutional ones. The proposed UL layers can play two roles: they can serve
as the cost-function layer that provides a global training signal, and they
can be added to any regular neural network layer to provide local training
signals, which combine with the signals backpropagated from upper layers to
extract both slowly and rapidly changing features at layers of different
depths. The UL layers can therefore be used in either purely unsupervised or
semi-supervised settings. Both a closed-form solution and an online learning
algorithm for the two UL layers are provided. Experiments with unlabeled
synthetic and real-world videos demonstrate that neural networks equipped with
UL layers and trained with the proposed online learning algorithm can extract
shape and motion information from video sequences of moving objects. The
experiments also demonstrate potential applications to head orientation
estimation and moving object localization.
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We study implicit regularization when optimizing an underdetermined quadratic
objective over a matrix $X$ with gradient descent on a factorization of $X$. We
conjecture and provide empirical and theoretical evidence that with small
enough step sizes and initialization close enough to the origin, gradient
descent on a full dimensional factorization converges to the minimum nuclear
norm solution.
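A small numpy experiment in the spirit of the conjecture (sizes, step size, and iteration count are illustrative; full-dimensional factorization $X = UU^T$, tiny initialization, random linear measurements):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 10, 30                                  # matrix size, measurements
    A = rng.normal(size=(m, n, n))                 # random sensing matrices A_i
    U_star = rng.normal(size=(n, 2)) / np.sqrt(n)  # ground truth is rank-2 PSD
    X_star = U_star @ U_star.T
    b = np.einsum('mij,ij->m', A, X_star)          # measurements <A_i, X*>

    U = 1e-3 * rng.normal(size=(n, n))             # full-dimensional, tiny init
    lr = 0.02
    for _ in range(20000):
        r = np.einsum('mij,ij->m', A, U @ U.T) - b
        grad = (2.0 / m) * np.einsum('m,mij->ij', r, A + A.transpose(0, 2, 1)) @ U
        U -= lr * grad

    s = np.linalg.svd(U @ U.T, compute_uv=False)
    s_star = np.linalg.svd(X_star, compute_uv=False)
    print("nuclear norms (GD vs. truth):", s.sum().round(3), s_star.sum().round(3))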
Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabas Poczos
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We design and analyse variations of the classical Thompson sampling (TS)
procedure for Bayesian optimisation (BO) in settings where function evaluations
are expensive, but can be performed in parallel. Our theoretical analysis shows
that a direct application of the sequential Thompson sampling algorithm in
either synchronous or asynchronous parallel settings yields a surprisingly
powerful result: making $n$ evaluations distributed among $M$ workers is
essentially equivalent to performing $n$ evaluations in sequence. Further, by
modeling the time taken to complete a function evaluation, we show that, under
a time constraint, asynchronously parallel TS achieves asymptotically lower
regret than both the synchronous and sequential versions. These results are
complemented by an experimental analysis, showing that asynchronous TS
outperforms a suite of existing parallel BO algorithms in simulations and in a
hyper-parameter tuning application for convolutional neural networks. In
addition, the proposed procedure is conceptually and computationally much
simpler than existing work on parallel BO.
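The sequential building block is compact; a sketch with a candidate grid and scikit-learn's GP (illustrative; the asynchronous variant essentially lets each of the M workers run this draw-and-evaluate loop independently as its evaluations complete):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x          # black-box to maximize
    grid = np.linspace(-1.0, 2.0, 200).reshape(-1, 1)      # candidate points

    X, y = [[0.0]], [f(0.0)]
    for t in range(15):
        gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
        gp.fit(np.array(X), np.array(y))
        sample = gp.sample_y(grid, random_state=t).ravel()  # one posterior draw
        x_next = grid[np.argmax(sample)][0]                 # maximize the draw
        X.append([x_next]); y.append(f(x_next))

    print("best query:", X[int(np.argmax(y))], "value:", max(y))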
Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bruno G. Loos, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
Comments: Full version of ICANN 2017 submission
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Images are an important data source for diagnosis and treatment of oral
diseases. The manual classification of images may lead to misdiagnosis or
mistreatment due to subjective errors. In this paper, an image classification
model based on a convolutional neural network (CNN) is applied to Quantitative
Light-induced Fluorescence images. The deep neural network outperforms other
state-of-the-art shallow classification models in predicting labels derived
from three different dental plaque assessment scores. The model directly
benefits from the multi-channel representation of the images, which improves
performance when the green and blue colour channels are used in addition to
the red channel.
Timur Pekhovsky, Maxim Korenevsky
Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)
A new system for i-vector speaker recognition based on a variational
autoencoder (VAE) is investigated. The VAE is a promising approach to
developing accurate deep nonlinear generative models of complex data.
Experiments show that the VAE provides speaker embeddings and can be
effectively trained in an unsupervised manner. An LLR estimate for the VAE is
developed, and experiments on NIST SRE 2010 data demonstrate its correctness.
Additionally, we show that the performance of the VAE-based system in the
i-vector space is close to that of the diagonal PLDA. Several interesting
results are also observed in experiments with the $\beta$-VAE. In particular,
we found that for $\beta \ll 1$, the VAE can be trained to capture the
features of complex input data distributions in an effective way, which is
hard to achieve with the standard VAE ($\beta = 1$).
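The $\beta$-weighted objective itself is a one-liner (a numpy sketch of the per-example loss with a diagonal Gaussian encoder and squared-error reconstruction; $\beta = 1$ recovers the standard VAE):

    import numpy as np

    def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
        """Reconstruction term plus beta-weighted KL(q(z|x) || N(0, I));
        beta << 1 downweights the KL term as in the experiments above."""
        recon = np.sum((x - x_recon) ** 2)
        kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
        return recon + beta * kl

    x = np.array([0.2, -1.0, 0.5])
    print(beta_vae_loss(x, x * 0.9, mu=np.array([0.1, -0.2]),
                        log_var=np.array([-0.1, 0.3]), beta=0.1))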
Dongyu Meng, Hao Chen
Comments: In submission as a conference paper
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Deep learning has shown promising results on hard perceptual problems in
recent years. However, deep learning systems are found to be vulnerable to
small adversarial perturbations that are nearly imperceptible to humans. Such
specially crafted perturbations cause deep learning systems to output incorrect
decisions, with potentially disastrous consequences. These vulnerabilities
hinder the deployment of deep learning systems where safety or security is
important. Attempts to secure deep learning systems either target specific
attacks or have been shown to be ineffective.
In this paper, we propose MagNet, a framework for defending neural network
classifiers against adversarial examples. MagNet does not modify the protected
classifier or know the process for generating adversarial examples. MagNet
includes one or more separate detector networks and a reformer network.
Different from previous work, MagNet learns to differentiate between normal and
adversarial examples by approximating the manifold of normal examples. Since it
does not rely on any process for generating adversarial examples, it has
substantial generalization power. Moreover, MagNet reconstructs adversarial
examples by moving them towards the manifold, which is effective for helping
classify adversarial examples with small perturbation correctly. We discuss the
intrinsic difficulty in defending against whitebox attack and propose a
mechanism to defend against graybox attack. Inspired by the use of randomness
in cryptography, we propose to use diversity to strengthen MagNet. We show
empirically that MagNet is effective against most advanced state-of-the-art
attacks in blackbox and graybox scenarios while keeping false positive rate on
normal examples very low.
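The detector/reformer pipeline can be sketched with a linear autoencoder stand-in (PCA); this only illustrates the manifold-approximation idea, not the paper's trained networks or threshold selection:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    basis = rng.normal(size=(5, 20))               # toy 5-dim "manifold" in R^20
    normal = rng.normal(size=(500, 5)) @ basis     # clean training examples

    ae = PCA(n_components=5).fit(normal)           # autoencoder stand-in
    reconstruct = lambda z: ae.inverse_transform(ae.transform(np.atleast_2d(z)))

    errs = np.linalg.norm(normal - reconstruct(normal), axis=1)
    threshold = errs.max() + 1e-6                  # clean-data error ceiling (toy)

    def magnet_like_defense(x):
        """Reject inputs far from the learned manifold (detector); otherwise
        'reform' them by projecting back onto it before classification."""
        err = np.linalg.norm(np.atleast_2d(x) - reconstruct(x), axis=1)[0]
        return None if err > threshold else reconstruct(x)[0]

    perturbed = normal[0] + 2.0 * rng.normal(size=20)  # off-manifold perturbation
    print("rejected:", magnet_like_defense(perturbed) is None)          # True
    print("clean passes:", magnet_like_defense(normal[1]) is not None)  # True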
Mohamed Aslan, Ashraf Matrawy
Subjects: Networking and Internet Architecture (cs.NI); Learning (cs.LG)
Distributed controllers are oftentimes used in large-scale SDN deployments
where they run a myriad of network applications simultaneously. Such
applications could have different consistency and availability preferences.
These controllers need to communicate via east/west interfaces in order to
synchronize their state information. The consistency and the availability of
the distributed state information are governed by an underlying consistency
model. Earlier, we suggested the use of adaptively-consistent controllers that
can autonomously tune their consistency parameters in order to meet the
performance requirements of a certain application. In this paper, we examine
the feasibility of employing adaptive controllers that are built on-top of
tunable consistency models similar to that of Apache Cassandra. We present an
adaptation strategy that uses clustering techniques (sequential k-means and
incremental k-means) in order to map a given application performance indicator
into a feasible consistency level that can be used with the underlying tunable
consistency model. In the cases that we modeled and tested, the results show
that with sequential k-means and a reasonable number of clusters (>= 50), a
plausible mapping (low RMSE) can be estimated between the application
performance indicators and the consistency level indicator. With incremental
k-means, a similarly plausible mapping (low RMSE) can be estimated using a
comparable number of clusters (>= 50) together with a small threshold (~0.01).
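Sequential (online) k-means, the first of the two techniques, is a one-pass update (a sketch; mapping a cluster to a consistency level would then be a lookup on top of the learned centroids):

    import numpy as np

    def sequential_kmeans(stream, k, seed=0):
        """One-pass k-means: assign each point to its nearest centroid and
        move that centroid toward the point with a 1/count step size."""
        rng = np.random.default_rng(seed)
        centroids, counts = None, None
        for x in stream:
            if centroids is None:  # initialize near the first point
                centroids = np.tile(x, (k, 1)) + 0.01 * rng.normal(size=(k, len(x)))
                counts = np.ones(k)
            i = np.argmin(np.linalg.norm(centroids - x, axis=1))
            counts[i] += 1
            centroids[i] += (x - centroids[i]) / counts[i]
        return centroids

    # toy stream of 2-d performance-indicator points around two regimes
    stream = np.concatenate([np.random.default_rng(1).normal(loc=c, size=(100, 2))
                             for c in (0.0, 5.0)])
    print(sequential_kmeans(stream, k=2))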
Yongqiang Huang, Yu Sun
Subjects: Robotics (cs.RO); Learning (cs.LG)
Pouring is a simple task people perform daily. It is the second most
frequently executed motion in cooking scenarios, after pick-and-place. We
present a pouring trajectory generation approach, which uses force feedback
from the cup to determine the future velocity of pouring. The approach uses
recurrent neural networks as its building blocks. We collected pouring
demonstrations which we used for training. To test our approach in simulation,
we also created and trained a force estimation system. The simulated
experiments show that the system is able to generalize to a single unseen
element of the pouring characteristics.
Himanshu Sahni, Saurabh Kumar, Farhan Tejani, Yannick Schroecker, Charles Isbell
Comments: 5 pages, 6 figures; 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2017), Ann Arbor, Michigan
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Typical reinforcement learning (RL) agents learn to complete tasks specified
by reward functions tailored to their domain. As such, the policies they learn
do not generalize even to similar domains. To address this issue, we develop a
framework in which a deep RL agent learns to generalize policies from smaller,
simpler domains to more complex ones using a recurrent attention mechanism.
The task is presented to the agent as an image together with an instruction
specifying the goal. The framework's meta-controller guides the agent towards
its goal by designing a sequence of smaller subtasks on the part of the state
space within its attention, effectively decomposing the task. As a baseline,
we also consider a setup without attention. Our experiments show that the
meta-controller learns to create subgoals within the attention.
Jaan Altosaar, Rajesh Ranganath, David M. Blei
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Computation (stat.CO)
Variational inference is a powerful approach for approximate posterior
inference. However, it is sensitive to initialization and can be subject to
poor local optima. In this paper, we develop proximity variational inference
(PVI). PVI is a new method for optimizing the variational objective that
constrains subsequent iterates of the variational parameters to robustify the
optimization path. Consequently, PVI is less sensitive to initialization and
optimization quirks and finds better local optima. We demonstrate our method on
three proximity statistics. We study PVI on a Bernoulli factor model and
sigmoid belief network with both real and synthetic data and compare to
deterministic annealing (Katahira et al., 2008). We highlight the flexibility
of PVI by designing a proximity statistic for Bayesian deep learning models
such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al.,
2014). Empirically, we show that PVI consistently finds better local optima and
gives better predictive performance.
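The core mechanism, keeping each iterate near a slowly moving reference, fits in a few lines (an illustrative noisy concave objective and the squared-distance proximity statistic; the paper studies several statistics):

    import numpy as np

    def pvi_like_update(lam, lam_ref, grad_elbo, lr=0.05, k=1.0, decay=0.9):
        """Ascent step on ELBO(lam) - k * ||lam - lam_ref||^2, then update
        the exponential-moving-average reference point."""
        lam = lam + lr * (grad_elbo(lam) - 2.0 * k * (lam - lam_ref))
        lam_ref = decay * lam_ref + (1.0 - decay) * lam
        return lam, lam_ref

    rng = np.random.default_rng(0)
    grad = lambda lam: -(lam - 3.0) + rng.normal(scale=0.5)  # noisy toy gradient
    lam = lam_ref = 0.0
    for _ in range(300):
        lam, lam_ref = pvi_like_update(lam, lam_ref, grad)
    print(f"lambda ~ {lam:.2f} (optimum at 3.0)")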
Efrén Cruz Cortés, Clayton Scott
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Consistency of the kernel density estimator requires that the kernel
bandwidth tends to zero as the sample size grows. In this paper we investigate
the question of whether consistency is still possible when the bandwidth is
fixed, if we consider a more general class of weighted KDEs. To answer this
question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE),
obtained by solving a quadratic program, that consistently estimates any
continuous square-integrable density. Rates of convergence for the fbKDE are
also established for radial kernels and the box kernel under appropriate
smoothness assumptions. Furthermore, in an experimental study we demonstrate
that the fbKDE compares favorably to the standard KDE and to the previously
proposed variable-bandwidth KDE.
Sirui Yao, Bert Huang
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
We study fairness in collaborative-filtering recommender systems, which are
sensitive to discrimination that exists in historical data. Biased data can
lead collaborative-filtering methods to make unfair predictions for users from
minority groups. We identify the insufficiency of existing fairness metrics and
propose four new metrics that address different forms of unfairness. These
fairness metrics can be optimized by adding fairness terms to the learning
objective. Experiments on synthetic and real data show that our new metrics can
better measure fairness than the baseline, and that the fairness objectives
effectively help reduce unfairness.
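For instance, one metric in this family can be read as comparing the signed prediction error between groups, per item (a simplified dense-matrix sketch; the metric name and exact averaging are our paraphrase of the value-unfairness idea):

    import numpy as np

    def value_unfairness(pred, true, group):
        """Mean over items of |(avg signed error in group 0) -
        (avg signed error in group 1)|; zero when both groups are
        mis-estimated in the same direction by the same amount."""
        g0, g1 = group == 0, group == 1
        err0 = (pred[g0] - true[g0]).mean(axis=0)   # per-item signed error
        err1 = (pred[g1] - true[g1]).mean(axis=0)
        return np.abs(err0 - err1).mean()

    rng = np.random.default_rng(0)
    true = rng.uniform(1, 5, size=(100, 8))             # users x items
    group = (rng.random(100) < 0.3).astype(int)         # minority flag per user
    pred = true + np.where(group[:, None] == 1, -0.5, 0.1)  # biased predictor
    print(value_unfairness(pred, true, group))          # 0.6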
Pratik Chakraborty, Shankar Prakriya
Subjects: Information Theory (cs.IT)
In this paper, we analyze the throughput performance of two co-existing
downlink multiuser underlay secondary networks that use fixed-rate
transmissions. We assume that the interference temperature limit (ITL) is
apportioned to accommodate two concurrent transmissions using an interference
temperature apportioning parameter so as to ensure that the overall
interference to the primary receiver does not exceed the ITL. Using the derived
analytical expressions for throughput, we show that when there is only one
secondary user in each network, or when the secondary networks do not employ
opportunistic user selection (using round-robin scheduling, for example),
there exists a critical fixed rate below which the sum throughput of the
co-existing secondary networks is higher than the throughput of a single
secondary network. We derive an expression for this critical fixed rate. Below
this rate, careful apportioning of the ITL is critical to maximizing the sum
throughput of the co-existing networks, and we derive an expression for the
corresponding apportioning parameter. Throughput is seen to increase with the
number of users in each of the secondary networks. Computer simulations
demonstrate the accuracy of the derived expressions.
Onel L. A. López (1), Evelio M. G. Fernández (2), H. Alves (1), R. D. Souza (3) ((1) Centre for Wireless Communications (CWC), University of Oulu, Finland, (2) Federal University of Paraná (UFPR), Curitiba, Brazil, (3) Federal University of Santa Catarina (UFSC), Florianopolis, Brazil)
Comments: 23 pages, 6 main figures, submitted to IEEE TWC
Subjects: Information Theory (cs.IT)
We analyze a wireless communication system with finite block length and
finite battery energy, under quasi-static Nakagami-m fading. Wireless energy
transfer is carried out in the downlink while information transfer occurs in
the uplink. Transmission strategies for scenarios with/without energy
accumulation between transmission rounds are characterized in terms of error
probability and energy consumption. A power control protocol is proposed for
the energy accumulation scenario, and the results show its substantial impact
on system performance in terms of error probability and energy consumption.
The numerical results corroborate the existence and uniqueness of an optimum
target error probability, while showing that a relatively small battery can be
a limiting factor for some setups, especially when using the energy
accumulation strategy.
Ekant Sharma, Rohit Budhiraja, K Vasudevan
Comments: 32 pages
Subjects: Information Theory (cs.IT)
We consider multi-pair two-way amplify and forward (AF) relaying, where
multiple full-duplex source-node pairs exchange information via a shared
full-duplex massive multiple-input multiple-output (MIMO) relay. Most of the
previous massive MIMO relay works maximize spectral efficiency (SE). We
maximize global energy efficiency (GEE) with quality-of-service (QoS)
constraints that are expressed as the rate required by source nodes. The
problem is non-convex and is solved by approximating it as a pseudo-concave
problem, which is then solved using the Dinkelbach method. We also consider the
max-min EE problem, which maximizes the EE of the least energy-efficient user.
For solving the EE optimization, we derive approximate closed-form lower bounds
for the ergodic achievable rate for maximal-ratio and zero-forcing processing
at the relay by using minimum mean squared error channel estimation. We
numerically show the accuracy of the derived lower bounds and the improved GEE
with optimal power allocation, with and without QoS constraints.
Jiangbin Lyu, Yong Zeng, Rui Zhang
Comments: 6 pages, 3 figures, submitted for possible publication
Subjects: Information Theory (cs.IT)
In conventional terrestrial cellular systems, mobile terminals (MTs) at the
cell edge often pose the performance bottleneck due to their long distance from
the ground base station (GBS), especially in hotspot areas. This paper proposes
a new hybrid network architecture by leveraging the use of unmanned aerial
vehicle (UAV) as an aerial mobile base station, which flies cyclically along
the cell edge to serve the cell-edge MTs and help offloading the traffic from
the GBS. To achieve user fairness, we aim to maximize the minimum throughput of
all MTs in a single cell by jointly optimizing the UAV’s trajectory, as well as
the bandwidth allocation and user partitioning between the UAV and GBS.
Numerical results show that the proposed hybrid network with optimized spectrum
sharing and cyclical multiple access design significantly improves the spatial
throughput over the conventional cellular network with the GBS only.
Geert Leus, Santiago Segarra, Alejandro Ribeiro, Antonio G. Marques
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT)
Contemporary data is often supported by an irregular structure, which can be
conveniently captured by a graph. Accounting for this graph support is crucial
to analyze the data, leading to an area known as graph signal processing (GSP).
The two most important tools in GSP are the graph shift operator (GSO), which
is a sparse matrix accounting for the topology of the graph, and the graph
Fourier transform (GFT), which maps graph signals into a frequency domain
spanned by a number of graph-related Fourier-like basis vectors. This
alternative representation of a graph signal is termed the graph frequency
signal. Several attempts have been made to interpret the support of this graph
frequency signal, but they have all resulted in a one-dimensional
interpretation. However, if the support of the original signal is captured by
a graph, why would the graph frequency signal have a simple one-dimensional
support? Motivated by this question, we propose, for the first time, an
irregular support for the graph frequency signal, which we term the dual
graph. The dual GSO leads to
a better interpretation of the graph frequency signal and its domain, helps to
understand how the different graph frequencies are related and clustered,
enables the development of better graph filters and filter banks, and
facilitates the generalization of classical SP results to the graph domain.
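Both tools amount to a few lines of linear algebra (a sketch with a symmetric adjacency matrix as the GSO and its eigenvectors as the Fourier basis):

    import numpy as np

    # GSO: symmetric adjacency matrix of a small 4-node graph
    S = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)

    eigvals, V = np.linalg.eigh(S)        # eigenvectors = graph Fourier basis

    x = np.array([1.0, 2.0, 3.0, 4.0])    # graph signal, one value per node
    x_hat = V.T @ x                       # GFT: the graph frequency signal
    print(np.allclose(x, V @ x_hat))      # inverse GFT recovers the signal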
Yahya H. Ezzeldin, Mohammed Karmoose, Christina Fragouli
Subjects: Information Theory (cs.IT)
In this paper, we revisit the communication vs. distributed computing
trade-off, studied within the framework of MapReduce in [1]. An implicit
assumption in the aforementioned work is that each server performs all possible
computations on all the files stored in its memory. Our starting observation is
that, if servers can compute only the intermediate values they need, then
storage constraints do not directly imply computation constraints. We examine
how this affects the communication-computation trade-off and suggest that the
trade-off be studied with a predetermined storage constraint. We then proceed
to examine the case where servers need to perform computationally intensive
tasks, and may not have sufficient time to perform all computations required by
the scheme in [1]. Given a threshold that limits the computational load, we
derive a lower bound on the associated communication load, and propose a
heuristic scheme that in some cases achieves the lower bound.
Praneeth Narayanamurthy, Namrata Vaswani
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
Robust PCA (RPCA) is the problem of separating a given data matrix into the
sum of a sparse matrix and a low-rank matrix. Dynamic RPCA assumes that the
true data vectors lie in a low-dimensional subspace that can change with time,
albeit slowly. The goal is to track this changing subspace over time in the
presence of sparse outliers. This work provides the first guarantee for dynamic
RPCA that holds under (weakened) standard RPCA assumptions and a realistic
model of slow subspace change. We analyze an existing method called ReProCS.
Our result removes the strong assumptions needed by the two previous complete
guarantees for ReProCS. Both these required an unrealistic model of subspace
change and very specific assumptions on how the outlier support could change.
Most importantly, our guarantees show that, because it exploits slow subspace
change, ReProCS (and its offline counterpart) can provably tolerate much larger
outlier fractions, are faster than most other provable methods, and have
near-optimal storage complexity.
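The static decomposition underlying all of this can be illustrated with a simple alternating heuristic (truncated SVD for the low-rank part, entrywise soft-thresholding for the sparse part); this is a generic sketch, not ReProCS itself:

    import numpy as np

    def rpca_alternating(M, rank, sparse_thresh, iters=50):
        """Heuristic split M ~ L + S: alternate a rank-r projection for L
        and soft-thresholding for S (illustrative fixed threshold)."""
        S = np.zeros_like(M)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
            L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            R = M - L
            S = np.sign(R) * np.maximum(np.abs(R) - sparse_thresh, 0.0)
        return L, S

    rng = np.random.default_rng(0)
    L_true = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 40))   # rank 5
    S_true = 10.0 * (rng.random((40, 40)) < 0.05) * rng.choice([-1, 1], (40, 40))
    L, S = rpca_alternating(L_true + S_true, rank=5, sparse_thresh=1.0)
    print("relative low-rank error:",
          np.linalg.norm(L - L_true) / np.linalg.norm(L_true))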
E.O. Kiktenko, N.O. Pozhar, M.N. Anufriev, A.S. Trushechkin, R.R. Yunusov, Y.V. Kurochkin, A.I. Lvovsky, A.K. Fedorov
Comments: 6 pages, 2 figures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Blockchain is a distributed database which is cryptographically protected
against malicious modifications. While promising for a wide range of
applications, current blockchain platforms rely on digital signatures, which
are vulnerable to attacks by means of quantum computers. The same, albeit to a
lesser extent, applies to cryptographic hash functions that are used in
preparing new blocks, so parties with access to quantum computation would have
unfair advantage in procuring mining rewards. Here we propose a possible
solution to the quantum-era blockchain challenge and report an experimental
realization of a quantum-safe blockchain platform that utilizes quantum key
distribution across an urban fiber network for information-theoretically secure
authentication. These results address important questions about realizability
and scalability of quantum-safe blockchains for commercial and governmental
applications.
Dmitri Maslov, Martin Roetteler
Comments: Supersedes arXiv:1703.00874
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Information Theory (cs.IT)
In this paper we improve the layered implementation of arbitrary stabilizer
circuits introduced by Aaronson and Gottesman in Phys. Rev. A 70(052328), 2004:
to implement a general stabilizer circuit, we reduce their 11-stage computation
-H-C-P-C-P-C-H-P-C-P-C- over the gate set consisting of Hadamard,
Controlled-NOT, and Phase gates, into a 7-stage computation of the form
-C-CZ-P-H-P-CZ-C-. We show arguments in support of using -CZ- stages over the
-C- stages: not only the use of -CZ- stages allows a shorter layered
expression, but -CZ- stages are simpler and appear to be easier to implement
compared to the -C- stages. Based on this decomposition, we develop a two-qubit
gate depth-$(14n-4)$ implementation of stabilizer circuits over the gate
library {H, P, CNOT}, executable in the LNN architecture, improving the best
previously known depth-$25n$ circuit, also executable in the LNN architecture.
Our constructions rely on the Bruhat decomposition of the symplectic group and
on folding arbitrarily long sequences of the form (-P-C-)^m into a 3-stage
computation -P-CZ-C-. Our results include the reduction of the 11-stage
decomposition -H-C-P-C-P-C-H-P-C-P-C- into a 9-stage decomposition of the
form -C-P-C-P-H-C-P-C-P-. This reduction is based on the Bruhat decomposition
of the symplectic group. This result also implies a new normal form for
stabilizer circuits. We show that a circuit in this normal form is optimal in
the number of Hadamard gates used. We also show that the normal form has an
asymptotically optimal number of parameters.