Stefano Nichele, Andreas Molund
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)
Recurrent Neural Networks (RNNs) have been a prominent concept within
artificial intelligence. They are inspired by Biological Neural Networks (BNNs)
and provide an intuitive and abstract representation of how BNNs work. Derived
from the more generic Artificial Neural Networks (ANNs), the recurrent ones are
meant to be used for temporal tasks, such as speech recognition, because they
are capable of memorizing historic input. However, such networks are very time
consuming to train as a result of their inherent nature. Recently, Echo State
Networks and Liquid State Machines have been proposed as possible RNN
alternatives, under the name of Reservoir Computing (RC). RCs are far easier
to train. In this paper, Cellular Automata are used as a reservoir and are
tested on the 5-bit memory task (a well-known benchmark within the RC
community). The work herein provides a method of mapping binary inputs from the
task onto the automata, and a recurrent architecture for handling the
sequential aspects of it. Furthermore, a layered (deep) reservoir architecture
is proposed. Performance is compared with earlier work, as well as with the
single-layer version. Results show that the single CA reservoir system yields
results similar to state-of-the-art work, while the system comprising two
layered reservoirs shows a noticeable improvement over a single CA reservoir.
This indicates potential for further research and provides valuable insight on
how to design CA reservoir systems.
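As a hedged illustration of the general reservoir idea (not the authors' exact input mapping or architecture), the sketch below evolves an elementary cellular automaton for a few steps, concatenates the visited states into a reservoir feature vector, and trains a linear readout by least squares; the rule number, width, and step count are arbitrary choices for the example.

```python
import numpy as np

def ca_step(state, rule=90):
    """One step of an elementary CA (Wolfram rule-number encoding)."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right      # neighborhood pattern -> 0..7
    table = (rule >> np.arange(8)) & 1      # rule lookup table (bit i of rule)
    return table[idx]

def ca_reservoir_features(u, width=64, steps=16, rule=90):
    """Project a binary input vector onto the CA and collect all states."""
    state = np.zeros(width, dtype=int)
    state[:len(u)] = u                      # naive input mapping (assumption)
    history = [state]
    for _ in range(steps):
        state = ca_step(state, rule)
        history.append(state)
    return np.concatenate(history)          # reservoir feature vector

# Linear readout trained by least squares on placeholder data.
X = np.stack([ca_reservoir_features(np.random.randint(0, 2, 8))
              for _ in range(100)]).astype(float)
y = np.random.randint(0, 2, 100)            # placeholder targets
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```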
B Sengupta, E Vazquez, M Sasdelli, Y Qian, M Peniak, L Netherton, G Delfino
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE)
With the advent of specialized hardware such as Graphics Processing Units
(GPUs), large scale image localization, classification and retrieval have seen
increased prevalence. Designing scalable software architecture that co-evolves
with such specialized hardware is a challenge in the commercial setting. In
this paper, we describe one such architecture (Cortexica) that leverages the
scalability of GPUs and the sandboxing offered by Docker containers. This
allows for the flexibility of mixing different computer architectures as well
as computational algorithms with the security of a trusted environment. We
illustrate the utility of this framework in a commercial setting, i.e.,
searching for multiple products in an image by combining image localisation and
retrieval.
Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
The growth of data, the need for scalability, and the complexity of models
used in modern machine learning call for distributed implementations. Yet, as
of today, distributed machine learning frameworks have largely ignored the
possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
the robustness to Byzantine failures at the fundamental level of stochastic
gradient descent (SGD), the heart of most machine learning algorithms. Assuming
a set of $n$ workers, up to $f$ of them being Byzantine, we ask how robust SGD
can be without limiting either the dimension or the size of the parameter
space. We first show that no gradient descent update rule based on a linear
combination of the vectors proposed by the workers (i.e., current approaches)
tolerates a single Byzantine failure. We then formulate a resilience property
of the update rule capturing the basic requirements to guarantee convergence
despite $f$ Byzantine workers. We finally propose Krum, an update rule that
satisfies the aforementioned resilience property. For a $d$-dimensional
learning problem, the time complexity of Krum is $O(n^2 \cdot (d + \log n))$.
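A rough sketch of an update rule in this spirit (a simplified reading of Krum, not the paper's reference implementation): score each worker's proposed gradient by the sum of squared distances to its $n - f - 2$ closest peers, and apply the gradient with the lowest score.

```python
import numpy as np

def krum(gradients, f):
    """Krum-style selection: score each vector by the sum of squared
    distances to its n - f - 2 nearest neighbors, keep the minimum."""
    n = len(gradients)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2"
    G = np.stack(gradients)
    d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    scores = []
    for i in range(n):
        others = np.delete(d2[i], i)
        scores.append(np.sort(others)[: n - f - 2].sum())
    return gradients[int(np.argmin(scores))]

# Toy usage: 7 workers, 1 Byzantine outlier sending a huge vector.
grads = [np.random.randn(10) for _ in range(6)] + [np.full(10, 100.0)]
update = krum(grads, f=1)   # the outlier scores poorly and is ignored
```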
Benjamin Paul Chamberlain, Angelo Cardoso, C.H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth
Comments: 9 pages, 10 figures
Subjects: Learning (cs.LG); Computers and Society (cs.CY); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We describe the Customer Lifetime Value (CLTV) prediction system deployed at
ASOS.com, a global online fashion retailer. CLTV prediction is an important
problem in e-commerce, where an accurate estimate of future value allows
retailers to effectively allocate marketing spend, identify and nurture
high-value customers, and mitigate exposure to losses. The system at ASOS provides
daily estimates of the future value of every customer and is one of the
cornerstones of the personalised shopping experience. The state of the art in
this domain uses large numbers of handcrafted features and ensemble regressors
to forecast value, predict churn and evaluate customer loyalty. We describe our
system, which adopts this approach, and our ongoing efforts to further improve
it. Recently, domains including language, vision and speech have shown dramatic
advances by replacing handcrafted features with features that are learned
automatically from data. We show that learning feature representations is a
promising extension to the state of the art in CLTV modelling. We propose a
novel way to generate embeddings of customers, which addresses the issue of the
ever-changing product catalogue, and obtain a significant improvement over an
exhaustive set of handcrafted features.
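The baseline pipeline the authors describe (handcrafted features plus an ensemble regressor) might look roughly like the sketch below; the feature names, targets, and the choice of a random forest are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical handcrafted per-customer features (names are illustrative):
# recency (days), frequency (orders/year), monetary (avg order value), returns
X = np.array([
    [12, 8, 55.0, 1],
    [90, 2, 30.0, 0],
    [5, 15, 80.0, 3],
])
y = np.array([420.0, 60.0, 900.0])   # made-up next-12-month spend targets

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
predicted_cltv = model.predict(X)    # daily scoring would use fresh features
```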
Franziska Schirrmacher, Thomas Köhler, Lennart Husvogt, James G. Fujimoto, Joachim Hornegger, Andreas K. Maier
Comments: submitted to MICCAI’17
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Optical coherence tomography (OCT) enables high-resolution and non-invasive
3D imaging of the human retina but is inherently impaired by speckle noise.
This paper introduces a spatio-temporal denoising algorithm for OCT data on a
B-scan level using a novel quantile sparse image (QuaSI) prior. To remove
speckle noise while preserving image structures of diagnostic relevance, we
implement our QuaSI prior via median filter regularization coupled with a Huber
data fidelity model in a variational approach. For efficient energy
minimization, we develop an alternating direction method of multipliers (ADMM)
scheme using a linearization of median filtering. Our spatio-temporal method
can handle both denoising of single B-scans and of temporally consecutive
B-scans to obtain volumetric OCT data with an enhanced signal-to-noise ratio.
Based on only 4 B-scans, our algorithm achieved performance comparable to
averaging 13 B-scans and outperformed other current denoising methods.
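The variational idea (a data-fidelity term plus a prior that penalizes deviation from a median-filtered version of the image) can be caricatured with a simple fixed-point iteration; this toy stands in for the paper's ADMM scheme with linearized median filtering and Huber fidelity, and is not a reimplementation of it.

```python
import numpy as np
from scipy.ndimage import median_filter

def denoise_median_prior(noisy, lam=0.5, iters=20):
    """Toy splitting scheme: alternately pull the estimate toward the data
    (quadratic fidelity here, standing in for the Huber model) and toward
    its own median-filtered version (standing in for the QuaSI prior)."""
    x = noisy.copy()
    for _ in range(iters):
        m = median_filter(x, size=3)          # prior step
        x = (noisy + lam * m) / (1.0 + lam)   # fidelity step (closed form)
    return x

img = np.random.rand(64, 64)                  # placeholder B-scan
clean = denoise_median_prior(img)
```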
Guido Borghi, Roberto Vezzani, Rita Cucchiara
Comments: Accepted in ICPR 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
HMMs are widely used in action and gesture recognition due to their
implementation simplicity, low computational requirements, scalability and high
parallelism. They perform well even with a limited training set. All
these characteristics are hard to find together in other even more accurate
methods. In this paper, we propose a novel double-stage classification
approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and
3D skeleton joint data, able to reach high performance while maintaining all the
advantages listed above. The approach allows both to quickly classify
pre-segmented gestures (offline classification), and to perform temporal
segmentation on streams of gestures (online classification) faster than real
time. We test our system on three public datasets, MSRAction3D, UTKinect-Action
and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created
for Human-Computer Interaction (HCI). We obtain state-of-the-art performance
on all of them.
Eunbyung Park, Jimei Yang, Ersin Yumer, Duygu Ceylan, Alexander C. Berg
Comments: To appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a transformation-grounded image generation network for novel 3D
view synthesis from a single image. Instead of taking a ‘blank slate’ approach,
we first explicitly infer the parts of the geometry visible both in the input
and novel views and then re-cast the remaining synthesis problem as image
completion. Specifically, we predict a flow to move the pixels from the input
to the novel view, along with a novel visibility map that helps deal with
occlusion/disocclusion. Next, conditioned on those intermediate results, we
hallucinate (infer) parts of the object invisible in the input image. In
addition to the new network structure, training with a combination of
adversarial and perceptual loss results in a reduction in common artifacts of
novel view synthesis such as distortions and holes, while successfully
generating high frequency details and preserving visual aspects of the input
image. We evaluate our approach on a wide range of synthetic and real examples.
Both qualitative and quantitative results show our method achieves
significantly better results compared to existing methods.
Kosuke Takahashi, Akihiro Miyata, Shohei Nobuhara, Takashi Matsuyama
Comments: to appear in CVPR 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a new extrinsic calibration of a kaleidoscopic imaging
system by estimating the normals and distances of its mirrors. The problem to be
solved in this paper is a simultaneous estimation of all mirror parameters
consistent throughout multiple reflections. Unlike conventional methods
utilizing a pair of direct and mirrored images of a reference 3D object to
estimate the parameters on a per-mirror basis, our method reduces the
simultaneous estimation problem to solving a linear set of equations. The key
contribution of this paper is to introduce a linear estimation of multiple
mirror parameters from kaleidoscopic 2D projections of a single 3D point of
unknown geometry. Evaluations with synthesized and real images demonstrate the
performance of the proposed algorithm in comparison with conventional methods.
Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
One of the recent trends [30, 31, 14] in network architecture design is
stacking small filters (e.g., 1×1 or 3×3) in the entire network, because
stacked small filters are more efficient than a large kernel given the same
computational complexity. However, in the field of semantic segmentation,
where we need to perform dense per-pixel prediction, we find that the large
kernel (and effective receptive field) plays an important role when we have to
perform the classification and localization tasks simultaneously. Following
our design principle, we propose a Global Convolutional Network to address both
the classification and localization issues for semantic segmentation. We
also suggest a residual-based boundary refinement to further refine the object
boundaries. Our approach achieves state-of-the-art performance on two public
benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%)
on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.
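One common way to realize a large effective kernel at modest cost is to factorize a k×k convolution into parallel (k×1 then 1×k) and (1×k then k×1) branches that are summed; the PyTorch sketch below illustrates such a module as our reading of this design, not code from the paper.

```python
import torch
import torch.nn as nn

class GlobalConvModule(nn.Module):
    """Approximate a large k x k kernel with two separable branches:
    (k x 1 then 1 x k) + (1 x k then k x 1), summed."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

# Usage: same spatial size in and out, large 7x7 effective receptive field.
feat = torch.randn(1, 256, 32, 32)
out = GlobalConvModule(256, 21, k=7)(feat)   # e.g., 21 VOC classes
```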
Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Detecting activities in untrimmed videos is an important but challenging
task. The performance of existing methods remains unsatisfactory, e.g., they
often have difficulty locating the beginning and end of a long, complex
action. In this paper, we propose a generic framework that can accurately
detect a wide variety of activities from untrimmed videos. Our first
contribution is a novel proposal scheme that can efficiently generate
candidates with accurate temporal boundaries. The other contribution is a
cascaded classification pipeline that explicitly distinguishes between
relevance and completeness of a candidate instance. On two challenging temporal
activity detection datasets, THUMOS14 and ActivityNet, the proposed framework
significantly outperforms the existing state-of-the-art methods, demonstrating
superior accuracy and strong adaptivity in handling activities with various
temporal structures.
Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
Comments: Advances in Neural Information Processing Systems 2016
Journal-ref: In Advances in Neural Information Processing Systems (pp. 127-135)
(2016)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Existing object proposal algorithms usually search for possible object
regions over multiple locations and scales separately, which ignores the
interdependency among different objects and deviates from the human perception
procedure. To incorporate global interdependency between objects into object
localization, we propose an effective Tree-structured Reinforcement Learning
(Tree-RL) approach to sequentially search for objects by fully exploiting both
the current observation and historical search paths. The Tree-RL approach
learns multiple searching policies through maximizing the long-term reward that
reflects localization accuracies over all the objects. Starting with taking the
entire image as a proposal, the Tree-RL approach allows the agent to
sequentially discover multiple objects via a tree-structured traversing scheme.
Allowing multiple near-optimal policies, Tree-RL offers more diversity in
search paths and is able to find multiple objects with a single feed-forward
pass. Therefore, Tree-RL can better cover different objects with various
scales, which is quite appealing in the context of object proposal. Experiments on
PASCAL VOC 2007 and 2012 validate the effectiveness of the Tree-RL, which can
achieve recalls comparable to those of current object proposal algorithms with
far fewer candidate windows.
G M Mashrur E Elahi, Sanjay Kalra, Yee-Hong Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Texture analysis is a well-known research topic in computer vision and image
processing and has many applications. Gradient-based texture methods have
become popular in classification problems. For the first time we extend a
well-known gradient-based method, Co-occurrence Histograms of Oriented
Gradients (CoHOG) to extract texture features from 2D Magnetic Resonance Images
(MRI). Unlike the original CoHOG method, we use the whole image instead of
sub-regions for feature calculation. Also, we use a larger neighborhood size.
Gradient orientations of the image pixels are calculated using Sobel, Gaussian
Derivative (GD) and Local Frequency Descriptor Gradient (LFDG) operators. The
extracted feature vector size is very large and classification using a large
number of similar features does not provide the best results. In our proposed
method, for the first time to the best of our knowledge, only a minimum number
of significant features are selected using area under the receiver operating
characteristic (ROC) curve (AUC) thresholds of <= 0.01. In this paper, we
apply the proposed method to classify Amyotrophic Lateral Sclerosis (ALS)
patients from the controls. It is observed that selected texture features from
downsampled images are significantly different between patients and controls.
These features are used in a linear support vector machine (SVM) classifier to
determine the classification accuracy. Optimal sensitivity and specificity are
also calculated. Three different cohort datasets are used in the experiments.
The performance of the proposed method using three gradient operators and two
different neighborhood sizes is analyzed. Region-based analysis is performed to
demonstrate that significant changes between patients and controls are limited
to the motor cortex.
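A hedged sketch of the co-occurrence idea: quantize each pixel's gradient orientation into a small number of bins, then, for a given pixel offset, histogram the pairs of orientations that co-occur. The bin count and offsets here are arbitrary illustrative choices, and np.gradient stands in for the Sobel/GD/LFDG operators of the paper.

```python
import numpy as np

def cohog_features(img, n_bins=8, offsets=((0, 1), (1, 0), (1, 1))):
    """Co-occurrence histogram of oriented gradients over a whole image."""
    gy, gx = np.gradient(img.astype(float))          # simple gradient operator
    ang = np.arctan2(gy, gx)
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    feats = []
    for dy, dx in offsets:                            # non-negative offsets
        h = np.zeros((n_bins, n_bins))
        a = bins[:bins.shape[0] - dy, :bins.shape[1] - dx]
        b = bins[dy:, dx:]
        np.add.at(h, (a.ravel(), b.ravel()), 1)       # count orientation pairs
        feats.append(h.ravel())
    return np.concatenate(feats)

vec = cohog_features(np.random.rand(32, 32))          # placeholder MRI slice
```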
Christian Bailer, Bertram Taetz, Didier Stricker
Comments: extended version of conference publication arXiv:1508.05151
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Modern large displacement optical flow algorithms usually use an
initialization by either sparse descriptor matching techniques or dense
approximate nearest neighbor fields. While the latter have the advantage of
being dense, they have the major disadvantage of being very outlier-prone as
they are not designed to find the optical flow, but the visually most similar
correspondence. In this article we present a dense correspondence field
approach that is much less outlier-prone and thus much better suited for
optical flow estimation than approximate nearest neighbor fields. Our approach
does not require explicit regularization, smoothing (like median filtering) or
a new data term. Instead we solely rely on patch matching techniques and a
novel multi-scale matching strategy. We also present enhancements for outlier
filtering. We show that our approach is better suited for large displacement
optical flow estimation than modern descriptor matching techniques. We do so by
initializing EpicFlow with our approach instead of their originally used
state-of-the-art descriptor matching technique. We significantly outperform the
original EpicFlow on MPI-Sintel, KITTI 2012, KITTI 2015 and Middlebury. In this
extended version of our former conference publication, we further improve our
approach in matching accuracy as well as runtime, and present more experiments
and insights.
Seyed Ali Ossia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R. Rabiee, Nic Lane, Hamed Haddadi
Comments: Technical Report
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The increasing quality of smartphone cameras and variety of photo editing
applications, in addition to the rise in popularity of image-centric social
media, have all led to a phenomenal growth in mobile-based photography.
Advances in computer vision and machine learning techniques provide a large
number of cloud-based services with the ability to provide content analysis,
face recognition, and object detection facilities to third parties. These
inferences and analytics might come with undesired privacy risks to the
individuals.
In this paper, we address a fundamental challenge: Can we utilize the local
processing capabilities of modern smartphones efficiently to provide desired
features to approved analytics services, while protecting against undesired
inference attacks and preserving privacy on the cloud? We propose a hybrid
architecture for a distributed deep learning model between the smartphone and
the cloud. We rely on the Siamese network and machine learning approaches for
providing privacy based on defined privacy constraints. We also use transfer
learning techniques to evaluate the proposed method. Using the latest deep
learning models for Face Recognition, Emotion Detection, and Gender
Classification techniques, we demonstrate the effectiveness of our technique in
providing highly accurate classification results for the desired analytics,
while preserving strong privacy guarantees.
Yarin Gal, Riashat Islam, Zoubin Ghahramani
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Even though active learning forms an important pillar of machine learning,
deep learning tools are not prevalent within it. Deep learning poses several
difficulties when used in an active learning setting. First, active learning
(AL) methods generally rely on being able to learn and update models from small
amounts of data. Recent advances in deep learning, on the other hand, are
notorious for their dependence on large amounts of data. Second, many AL
acquisition functions rely on model uncertainty, yet deep learning methods
rarely represent such model uncertainty. In this paper we combine recent
advances in Bayesian deep learning into the active learning framework in a
practical way. We develop an active learning framework for high-dimensional
data, a task which has been extremely challenging so far, with very sparse
existing literature. Taking advantage of specialised models such as Bayesian
convolutional neural networks, we demonstrate our active learning techniques
with image data, obtaining a significant improvement on existing active
learning approaches. We demonstrate this on both the MNIST dataset, as well as
for skin cancer diagnosis from lesion images (ISIC2016 task).
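One standard uncertainty-based acquisition function in this setting (our assumption of the kind of technique meant; the paper evaluates several) is BALD estimated with Monte Carlo dropout: keep dropout active at test time, average the predictive distributions over stochastic passes, and score each candidate point by the mutual information between its prediction and the model parameters.

```python
import numpy as np

def bald_scores(mc_probs):
    """mc_probs: array (T, N, C) of class probabilities from T stochastic
    forward passes (dropout kept on). Returns one BALD score per point."""
    mean = mc_probs.mean(axis=0)                              # (N, C)
    entropy_mean = -(mean * np.log(mean + 1e-12)).sum(-1)     # H of mean
    mean_entropy = -(mc_probs * np.log(mc_probs + 1e-12)).sum(-1).mean(0)
    return entropy_mean - mean_entropy       # mutual information estimate

# Pick the next points to label (placeholder probabilities).
probs = np.random.dirichlet(np.ones(10), size=(20, 1000))     # (T, N, C)
query = np.argsort(bald_scores(probs))[-10:]                  # top-10 points
```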
Quercus Hernandez, Diego Gutierrez, Adrian Jarabo
Comments: 7 pages, 11 figures
Subjects: Instrumentation and Detectors (physics.ins-det); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Single-Photon Avalanche Diodes (SPADs) are affordable photodetectors capable
of collecting extremely fast, low-energy events due to their single-photon
sensitivity. This makes them very suitable for time-of-flight-based range
imaging systems, reducing costs and power requirements without sacrificing
much temporal resolution. In this work we describe a computational model that
simulates the behaviour of SPAD sensors, aiming to provide a realistic camera
model for time-resolved light transport simulation, with applications in
prototyping new reconstruction techniques based on SPAD time-of-flight data.
Our model accounts for the major effects of the sensor on the incoming signal.
We compare our model against real-world measurements, and apply it to a variety
of scenarios, including complex multiply-scattered light transport.
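As a toy illustration of what "effects of the sensor on the incoming signal" can mean (the specific effects and constants below are our assumptions, not the paper's model), one can jitter photon arrival times, add Poisson dark counts, and enforce a dead time after each detection:

```python
import numpy as np

rng = np.random.default_rng(0)

def spad_response(photon_times_ps, jitter_ps=50.0, dead_time_ps=5_000.0,
                  dark_rate_hz=1e6, window_ps=1e8):
    """Toy SPAD model: Poisson dark counts, Gaussian timing jitter, and a
    dead time after each detection. Effects and defaults are illustrative."""
    n_dark = rng.poisson(dark_rate_hz * window_ps * 1e-12)
    darks = rng.uniform(0, window_ps, n_dark)
    events = np.concatenate([photon_times_ps, darks])
    events = events + rng.normal(0.0, jitter_ps, events.size)   # jitter
    detections, last = [], -np.inf
    for t in np.sort(events):
        if t - last >= dead_time_ps:       # diode has recovered (re-armed)
            detections.append(t)
            last = t
    return np.array(detections)

arrivals = rng.exponential(20_000.0, size=500)   # toy photon arrival times
counts = spad_response(arrivals)
```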
Aliakbar Jafarpour, Holger Lorenz
Subjects: Biological Physics (physics.bio-ph); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an); Subcellular Processes (q-bio.SC)
Here we report on a set of programs developed at the ZMBH Bio-Imaging
Facility for tracking real-life images of cellular processes. These programs
perform 1) automated tracking; 2) quantitative and comparative track analyses
of different images in different groups; 3) different interactive visualization
schemes; and 4) interactive realistic simulation of different cellular
processes for validation and optimal problem-specific adjustment of image
acquisition parameters (tradeoff between speed, resolution, and quality with
feedback from the very final results). The collection of programs is primarily
developed for the common bio-image analysis software ImageJ (as a single Java
Plugin). Some programs are also available in other languages (C++ and
JavaScript) and may be run simply in a web browser, even on a low-end tablet
or smartphone. The programs are available at
this https URL
Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, Sergey Levine
Comments: Published as a conference paper at ICLR 2017
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
People can learn a wide range of tasks from their own experience, but can
also learn from observing other creatures. This can accelerate acquisition of
new skills even when the observed agent differs substantially from the learning
agent in terms of morphology. In this paper, we examine how reinforcement
learning algorithms can transfer knowledge between morphologically different
agents (e.g., different robots). We introduce a problem formulation where two
agents are tasked with learning multiple skills by sharing information. Our
method uses the skills that were learned by both agents to train invariant
feature spaces that can then be used to transfer other skills from one agent to
another. The process of learning these invariant feature spaces can be viewed
as a kind of “analogy making”, or implicit learning of partial correspondences
between two distinct domains. We evaluate our transfer learning algorithm on
two simulated robotic manipulation skills, and illustrate that we can transfer
knowledge between simulated robotic arms with different numbers of links, as
well as simulated arms with different actuation mechanisms, where one robot is
torque-driven while the other is tendon-driven.
Zichang He, Wen Jiang
Comments: 28 pages. arXiv admin note: text overlap with arXiv:1703.02386
Subjects: Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
Categorization is necessary for many decision-making tasks. However, the
categorization process may interfere with the decision-making result, and the
law of total probability can be violated in some situations. To predict the
interference effect of categorization, models based on quantum probability
have been proposed. In this paper, a new quantum dynamic belief (QDB) model is
proposed. Considering that a precise decision may not be made during the
process, the concept of uncertainty is introduced in our model to simulate the
real human thinking process. The interference effect of categorization can then
be predicted by handling the uncertain information. The proposed model is
applied to a categorization decision-making experiment to explain the
interference effect of categorization. Compared with other models, ours is
more succinct, and the results show its correctness and effectiveness.
Kayvan Bijari, Hadi Zare, Hadi Veisi, Hossein Bobarshad
Comments: 17 pages, 3 figures, 8 tables
Journal-ref: Neural Comput & Applic (2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Cluster analysis plays an important role in decision making process for many
knowledge-based systems. There exist a wide variety of different approaches for
clustering applications including the heuristic techniques, probabilistic
models, and traditional hierarchical algorithms. In this paper, a novel
heuristic approach based on the big bang-big crunch algorithm is proposed for
clustering problems. The proposed method not only takes advantage of its
heuristic nature to alleviate the shortcomings of typical clustering algorithms
such as k-means, but also benefits from a memory-based scheme compared to
similar heuristic techniques. Furthermore, the performance of the proposed algorithm is
investigated based on several benchmark test functions as well as on the
well-known datasets. The experimental results show the significant superiority
of the proposed method over the similar algorithms.
Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
Traffic on freeways can be managed by means of ramp meters from Road Traffic
Control rooms. Human operators cannot efficiently manage a network of ramp
meters. To support them, we present an intelligent platform for traffic
management which includes a new ramp metering coordination scheme in the
decision making module, an efficient dashboard for interacting with human
operators, machine learning tools for learning event definitions and Complex
Event Processing tools able to deal with uncertainties inherent to the traffic
use case. Unlike the usual approach, the devised event-driven platform is able
to predict congestion up to 4 minutes before it actually happens. Proactive
decision making can then be established, leading to a significant improvement
in traffic conditions.
Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
We consider the problem of learning a causal graph over a set of variables
with interventions. We study the cost-optimal causal graph learning problem:
For a given skeleton (undirected version of the causal graph), design the set
of interventions with minimum total cost that can uniquely identify any causal
graph with the given skeleton. We show that this problem is solvable in
polynomial time. Later, we consider the case when the number of interventions
is limited. For this case, we provide polynomial time algorithms when the
skeleton is a tree or a clique tree. For a general chordal skeleton, we develop
an efficient greedy algorithm, which can be improved when the causal graph
skeleton is an interval graph.
Visak CV Kumar, Sehoon Ha, C Karen Liu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Being able to fall safely is a necessary motor skill for humanoids performing
highly dynamic tasks, such as running and jumping. We propose a new method to
learn a policy that minimizes the maximal impulse during the fall. The
optimization solves for both a discrete contact planning problem and a
continuous optimal control problem. Once trained, the policy can compute the
optimal next contacting body part (e.g. left foot, right foot, or hands),
contact location and timing, and the required joint actuation. We represent the
policy as a mixture of actor-critic neural networks, which consists of n control
policies and the corresponding value functions. Each actor-critic pair is
associated with one of the n possible contacting body parts. During execution,
the policy corresponding to the highest value function will be executed while
the associated body part will be the next contact with the ground. With this
mixture of actor-critic architecture, the discrete contact sequence planning is
solved through the selection of the best critics while the continuous control
problem is solved by the optimization of actors. We show that our policy can
achieve rewards comparable to, and sometimes even higher than, a recursive
search of the action space using dynamic programming, while enjoying a 50- to
400-fold speed gain during online execution.
Dmitry I. Ignatov
Journal-ref: RuSSIR 2014, Nizhniy Novgorod, Russia, CCIS vol. 505, Springer
42-141
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
This paper is a tutorial on Formal Concept Analysis (FCA) and its
applications. FCA is an applied branch of Lattice Theory, a mathematical
discipline which enables formalisation of concepts as basic units of human
thinking and analysing data in the object-attribute form. Originating in the
early 1980s, it has over the last three decades become a popular human-centred
tool for knowledge representation and data analysis with numerous applications. Since
the tutorial was specially prepared for RuSSIR 2014, the covered FCA topics
include Information Retrieval with a focus on visualisation aspects, Machine
Learning, Data Mining and Knowledge Discovery, Text Mining and several others.
Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
Comments: 10 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Deep neural networks coupled with fast simulation and improved computation
have led to recent successes in the field of reinforcement learning (RL).
However, most current RL-based approaches fail to generalize since: (a) the gap
between simulation and real world is so large that policy-learning approaches
fail to transfer; (b) even if policy learning is done in real world, the data
scarcity leads to failed generalization from training to test scenarios (e.g.,
due to different friction or object masses). Inspired by H-infinity control
methods, we note that both modeling errors and differences in training and test
scenarios can be viewed as extra forces/disturbances in the system. This paper
proposes the idea of robust adversarial reinforcement learning (RARL), where we
train an agent to operate in the presence of a destabilizing adversary that
applies disturbance forces to the system. The jointly trained adversary is
reinforced — that is, it learns an optimal destabilization policy. We
formulate the policy learning as a zero-sum, minimax objective function.
Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah,
Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a)
improves training stability; (b) is robust to differences in training/test
conditions; and (c) outperforms the baseline even in the absence of the
adversary.
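A minimal runnable caricature of the alternating zero-sum training (a toy stand-in for the setup: the game, the quadratic disturbance penalty, and the random-search updates are all illustrative assumptions, whereas the paper trains neural network policies with RL):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy zero-sum game: the protagonist picks a control u, the adversary a
# disturbance d; the protagonist's reward is
#   r(u, d) = -(u + d - 1)^2 + 2 d^2
# (the quadratic penalty keeps the adversary's disturbance bounded, and the
# adversary receives -r).

def reward(u, d):
    return -(u + d - 1.0) ** 2 + 2.0 * d ** 2

def hill_climb(theta, objective, steps=300, sigma=0.1):
    """Crude random-search 'policy update' against a frozen opponent."""
    for _ in range(steps):
        cand = theta + sigma * rng.normal()
        if objective(cand) > objective(theta):
            theta = cand
    return theta

u, d = 0.0, 0.0
for _ in range(20):                                # alternating optimization
    u = hill_climb(u, lambda u_: reward(u_, d))    # protagonist maximizes r
    d = hill_climb(d, lambda d_: -reward(u, d_))   # adversary minimizes r
print(u, d)   # converges near the equilibrium u = 1, d = 0 of this toy game
```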
Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
Comments: Project page: this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)
This work shows that policies with simple linear and RBF parameterizations
can be trained to solve a variety of continuous control tasks, including the
OpenAI gym benchmarks. The performance of these trained policies is
competitive with state-of-the-art results obtained with more elaborate
parameterizations such as fully connected neural networks. Furthermore,
existing training and testing scenarios are shown to be very limited and prone
to over-fitting, thus giving rise to only trajectory-centric policies. Training
with a diverse initial state distribution is shown to produce more global
policies with better generalization. This allows for interactive control
scenarios where the system recovers from large on-line perturbations, as shown
in the supplementary video.
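One common way to get such an RBF parameterization (random Fourier features is our assumption here; the paper's exact construction may differ) is to pass the state through fixed random sinusoidal features and learn only a linear map on top:

```python
import numpy as np

rng = np.random.default_rng(0)

class RBFPolicy:
    """Linear policy on top of fixed random Fourier features of the state."""
    def __init__(self, state_dim, action_dim, n_features=500, bandwidth=1.0):
        self.P = rng.normal(size=(n_features, state_dim)) / bandwidth
        self.phase = rng.uniform(0, 2 * np.pi, n_features)
        self.W = np.zeros((action_dim, n_features))   # the only learned part

    def features(self, s):
        return np.sin(self.P @ s + self.phase)

    def act(self, s):
        return self.W @ self.features(s)

policy = RBFPolicy(state_dim=17, action_dim=6)   # Walker-like sizes (example)
action = policy.act(np.zeros(17))                # W would be trained by RL
```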
Mikko Lauri, Eero Heinänen, Simone Frintrop
Comments: IEEE International Conference on Robotics and Automation (ICRA), 2017
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
A team of robots sharing a common goal can benefit from coordination of the
activities of team members, helping the team to reach the goal more reliably or
quickly. We address the problem of coordinating the actions of a team of robots
with periodic communication capability executing an information gathering task.
We cast the problem as a multi-agent optimal decision-making problem with an
information theoretic objective function. We show that appropriate techniques
for solving decentralized partially observable Markov decision processes
(Dec-POMDPs) are applicable in such information gathering problems. We quantify
the usefulness of coordinated information gathering through simulation studies,
and demonstrate the feasibility of the method in a real-world target tracking
domain.
Gourav G. Shenoy, Mangirish A. Wagle, Anwar Shaikh
Comments: 12 pages, 8 tables, 7 figures
Subjects: Information Retrieval (cs.IR)
With hundreds, even thousands, of hotels to choose from at every destination,
it’s difficult to know which will suit your personal preferences. Expedia wants
to take the proverbial rabbit hole out of hotel search by providing
personalized hotel recommendations to their users. This is no small task for a
site with hundreds of millions of visitors every month! Currently, Expedia uses
search parameters to adjust their hotel recommendations, but there aren't
enough customer-specific data to personalize them for each user. In this
project, we have taken up the challenge to contextualize customer data and
predict the likelihood a user will stay at 100 different hotel groups.
Nesreen K. Ahmed, Nick Duffield, Theodore Willke, Ryan A. Rossi
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Statistics Theory (math.ST)
We propose Graph Priority Sampling (GPS), a new paradigm for order-based
reservoir sampling from massive streams of graph edges. GPS provides a general
way to weight edge sampling according to auxiliary and/or size variables so as
to accomplish various estimation goals of graph properties. In the context of
subgraph counting, we show how edge sampling weights can be chosen so as to
minimize the estimation variance of counts of specified sets of subgraphs. In
contrast to many prior graph sampling schemes, GPS separates the functions
of edge sampling and subgraph estimation. We propose two estimation frameworks:
(1) Post-Stream estimation, to allow GPS to construct a reference sample of
edges to support retrospective graph queries, and (2) In-Stream estimation, to
allow GPS to obtain lower variance estimates by incrementally updating the
subgraph count estimates during stream processing. Unbiasedness of subgraph
estimators is established through a new Martingale formulation of graph stream
order sampling, which shows that subgraph estimators, written as a product of
constituent edge estimators, are unbiased even when computed at different
points in the stream. The separation of estimation and sampling enables
significant resource savings relative to previous work. We illustrate our
framework with applications to triangle and wedge counting. We perform a
large-scale experimental study on real-world graphs from various domains and
types. GPS achieves high accuracy with less than 1% error for triangle and
wedge counting, while storing a small fraction of the graph with average update
times of a few microseconds per edge. Notably, for a large Twitter graph with
more than 260M edges, GPS accurately estimates triangle counts with less than
1% error, while storing only 40K edges.
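A hedged sketch of the order-based reservoir idea underlying GPS (generic priority sampling over an edge stream, not the paper's estimators): each edge gets priority w/u with u uniform in (0, 1], and a fixed-size heap keeps the m highest-priority edges.

```python
import heapq
import random

random.seed(0)

def priority_sample(edge_stream, m, weight=lambda e: 1.0):
    """Keep the m highest-priority edges of a stream; priority = w / u."""
    heap = []                                   # min-heap of (priority, edge)
    for e in edge_stream:
        u = 1.0 - random.random()               # Uniform in (0, 1]
        p = weight(e) / u
        if len(heap) < m:
            heapq.heappush(heap, (p, e))
        elif p > heap[0][0]:
            heapq.heapreplace(heap, (p, e))     # evict the lowest priority
    return heap

edges = [(random.randrange(100), random.randrange(100)) for _ in range(10_000)]
sample = priority_sample(edges, m=500)
# Estimators would weight each kept edge by max(w, z), where z is the
# smallest priority not kept (the threshold), Horvitz-Thompson style.
```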
Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, Jiebo Luo
Comments: 10 pages, published at ICWSM’17
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Emojis, as a new way of conveying nonverbal cues, are widely adopted in
computer-mediated communications. In this paper, first from a message sender
perspective, we focus on people’s motives in using four types of emojis —
positive, neutral, negative, and non-facial. We compare the willingness levels
of using these emoji types for seven typical intentions for which people
usually use nonverbal cues in communication. The results of extensive statistical
hypothesis tests not only report the popularities of the intentions, but also
uncover the subtle differences between emoji types in terms of intended uses.
Second, from a perspective of message recipients, we further study the
sentiment effects of emojis, as well as their duplications, on verbal messages.
Different from previous studies in emoji sentiment, we study the sentiments of
emojis and their contexts as a whole. The experimental results indicate that
the power to convey sentiment differs among the four emoji types, and that the
sentiment effects of emojis vary with the valence of the context.
Tianran Hu, Ruihua Song, Philip Ding, Xing Xie, Jiebo Luo
Comments: 4 pages, 1 figure, published at ICWSM’17
Subjects: Computation and Language (cs.CL)
Divergent word usages reflect differences among people. In this paper, we
present a novel angle for studying word usage divergence — word
interpretations. We propose an approach that quantifies semantic differences in
interpretations among different groups of people. The effectiveness of our
approach is validated by quantitative evaluations. Experiment results indicate
that divergences in word interpretations exist. We further apply the approach
to two well-studied types of differences between people — gender and region.
The detected words with divergent interpretations reveal the unique features of
specific groups of people. For gender, we discover that certain different
interests, social attitudes, and characters between males and females are
reflected in their divergent interpretations of many words. For region, we find
that specific interpretations of certain words reveal the geographical and
cultural features of different regions.
Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Subjects: Computation and Language (cs.CL)
Training recurrent neural networks to model long-term dependencies is
difficult. Hence, we propose to use external linguistic knowledge as an
explicit signal to inform the model which memories it should utilize.
Specifically, external knowledge is used to augment a sequence with typed edges
between arbitrarily distant elements, and the resulting graph is decomposed
into directed acyclic subgraphs. We introduce a model that encodes such graphs
as explicit memory in recurrent neural networks, and use it to model
coreference relations in text. We apply our model to several text comprehension
tasks and achieve new state-of-the-art results on all considered benchmarks,
including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out
of the 20 tasks with only 1000 training examples per task. Analysis of the
learned representations further demonstrates the ability of our model to encode
fine-grained entity information across a document.
Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng
Comments: ICLR 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL)
Data noising is an effective technique for regularizing neural network
models. While noising is widely adopted in application domains such as vision
and speech, commonly used noising primitives have not been developed for
discrete sequence-level settings such as language modeling. In this paper, we
derive a connection between input noising in neural network language models and
smoothing in $n$-gram models. Using this connection, we draw upon ideas from
smoothing to develop effective noising schemes. We demonstrate performance
gains when applying the proposed schemes to language modeling and machine
translation. Finally, we provide empirical analysis validating the relationship
between noising and smoothing.
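A sketch of one such noising primitive (unigram noising, as we understand the family of schemes; the probability and distribution choices are illustrative): each token in a training sequence is independently replaced, with some small probability, by a draw from the corpus unigram distribution.

```python
import random
from collections import Counter

def unigram_noise(tokens, corpus_counts, gamma=0.1):
    """Replace each token with probability gamma by a unigram-distribution
    sample, analogous to how smoothing mixes in lower-order statistics."""
    vocab = list(corpus_counts)
    weights = [corpus_counts[w] for w in vocab]
    return [random.choices(vocab, weights)[0] if random.random() < gamma
            else t
            for t in tokens]

corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)
noised = unigram_noise("the cat sat on the mat".split(), counts)
```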
Enrico Calore, Alessandro Gabbana, Sebastiano Fabio Schifano, Raffaele Tripiccione
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Energy efficiency is becoming increasingly important for computing systems,
in particular for large-scale HPC facilities. In this work we evaluate, from a
user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS)
techniques, assisted by the power and energy monitoring capabilities of modern
processors in order to tune applications for energy efficiency. We run selected
kernels and a full HPC application on two high-end processors widely used in
the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
the available trade-offs between energy-to-solution and time-to-solution,
attempting a function-by-function frequency tuning. We finally estimate the
benefits obtainable by running the full code on an HPC multi-GPU node, with
respect to default clock frequency governors. We instrument our code to accurately
monitor power consumption and execution time without the need of any additional
hardware, and we enable it to change CPUs and GPUs clock frequencies while
running. We analyze our results on the different architectures using a simple
energy-performance model, and derive a number of energy saving strategies which
can be easily adopted on recent high-end HPC systems for generic applications.
Jesus Arias Fisteus, Luis Sanchez Fernandez, Victor Corcoba Magaña, Mario Muñoz Organero, Jorge Yago Fernandez, Juan Antonio Alvarez Garcia
Comments: Preprint of a paper accepted for publication at this http URL as part of the Proceedings of JARCA 2016 (XVIII Jornadas de ARCA Sistemas Cualitativos y sus Aplicaciones en Diagnosis, Rob’otica e Inteligencia Ambiental), Almeria, Spain, June 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Many of the services a smart city can provide to its citizens rely on the
ability of its infrastructure to collect and process, in real time, the vast
amounts of continuous data produced by sensors deployed throughout the city. In
this paper we present the server infrastructure we have designed in the context
of the HERMES project to collect the data from sensors and aggregate them into
streams for use in the services of the smart city.
Tomasz Jurdzinski, Krzysztof Nowicki
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The congested clique is a synchronous, message-passing model of distributed
computing in which each computational unit (node) can, in each round, send a
message of O(log n) bits to each other node of the network, where n is the
number of nodes. This model has been considered under two extreme scenarios:
unicast or broadcast. In the unicast model, a node can send a (possibly)
different message to each other node of the network. In contrast, in the
broadcast model each node sends a single (the same) message to all other nodes.
We study the congested clique model parametrized by the range r, the maximum
number of different messages a node can send in one round. Following recent
progress in the design of algorithms for graph connectivity and minimum
spanning forest (MSF) in the unicast congested clique, we study these problems
in limited variants of the congested clique. We present the first
sub-logarithmic algorithm for connected components in the broadcast congested
clique. Then, we show that the efficient unicast deterministic algorithm for
MSF and the randomized algorithm for connected components can be efficiently
implemented in the rcast model with range r = 2, the weakest model of the
congested clique above the broadcast variant (r = 1) in the hierarchy with
respect to range. More importantly, our algorithms give the first solutions
with optimal capacity of communication edges, while preserving small round
complexity.
M. Cárcamo, P. Román, S. Casassus, V. Moral, F.R. Rannou
Comments: 11 pages, 13 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)
The maximum entropy method (MEM) is a well known deconvolution technique in
radio-interferometry. This method solves a non-linear optimization problem with
an entropy regularization term. Other heuristics, such as CLEAN, are faster but
highly user dependent. Nevertheless, MEM has the following advantages: it is
unsupervised, it has a statistical basis, and it offers better resolution and
better image quality under certain conditions. This work presents a high
performance GPU version of non-gridded MEM, which is tested using
interferometric and simulated data. We propose a single-GPU and a multi-GPU
implementation for single and multi-spectral data, respectively. We also make
use of the Peer-to-Peer and Unified Virtual Addressing features of newer GPUs,
which allow multiple GPUs to be exploited transparently and efficiently. Several
ALMA data sets are used to demonstrate the effectiveness in imaging and to
evaluate GPU performance. The results show that a speedup of 1000 to 5000
times over a sequential version can be achieved, depending on data and
image size. This has allowed us to reconstruct the HD142527 CO(6-5) short
baseline data set in 2.1 minutes, instead of the 2.5 days it takes on a CPU.
Nick Harvey, Chris Liaw, Abbas Mehrabian
Comments: 11 pages, 2 figures
Subjects: Learning (cs.LG)
We prove new upper and lower bounds on the VC-dimension of deep neural
networks with the ReLU activation function. These bounds are tight for almost
the entire range of parameters. Letting $W$ be the number of weights and $L$ be
the number of layers, we prove that the VC-dimension is $O(WL\log(W))$ and
$\Omega(WL\log(W/L))$. This improves both the previously known upper bounds
and lower bounds. In terms of the number $U$ of non-linear units, we prove a
tight bound $\Theta(WU)$ on the VC-dimension. All of these results generalize
to arbitrary piecewise linear activation functions.
Yingzhen Li, Yarin Gal
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
To obtain uncertainty estimates with real-world Bayesian deep learning
models, practical inference approximations are needed. Dropout variational
inference (VI), for example, has been used for machine vision and medical
applications, but VI can severely underestimate model uncertainty.
Alpha-divergences are alternative divergences to VI’s KL objective, which are
able to avoid VI’s uncertainty underestimation. But these are hard to use in
practice: existing techniques can only use Gaussian approximating
distributions, and require existing models to be changed radically, thus are of
limited use for practitioners. We propose a re-parametrisation of the
alpha-divergence objectives, deriving a simple inference technique which,
together with dropout, can be easily implemented with existing models by simply
changing the loss of the model. We demonstrate improved uncertainty estimates
and accuracy compared to VI in dropout networks. We study our model’s epistemic
uncertainty far away from the data using adversarial images, showing that these
can be distinguished from non-adversarial images by examining our model’s
uncertainty.
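The practical appeal here is that only the loss changes. As a rough illustration (not the authors' code), the following numpy sketch computes a BB-alpha-style objective from K stochastic (dropout) forward passes; the array shapes and names are assumptions.

    import numpy as np

    def alpha_divergence_dropout_loss(log_probs, alpha=0.5):
        # log_probs: (K, N) array holding log p(y_n | x_n, w_k) for K stochastic
        # (dropout) forward passes over N data points -- illustrative shapes.
        # Stable log-mean-exp of the alpha-scaled log-likelihoods per point;
        # alpha -> 0 recovers the usual MC-dropout (VI-style) objective.
        K = log_probs.shape[0]
        a = alpha * log_probs
        m = a.max(axis=0)
        log_mean = m + np.log(np.exp(a - m).sum(axis=0)) - np.log(K)
        return -(1.0 / alpha) * log_mean.sum()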
Yarin Gal, Riashat Islam, Zoubin Ghahramani
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Even though active learning forms an important pillar of machine learning,
deep learning tools are not prevalent within it. Deep learning poses several
difficulties when used in an active learning setting. First, active learning
(AL) methods generally rely on being able to learn and update models from small
amounts of data. Recent advances in deep learning, on the other hand, are
notorious for their dependence on large amounts of data. Second, many AL
acquisition functions rely on model uncertainty, yet deep learning methods
rarely represent such model uncertainty. In this paper we combine recent
advances in Bayesian deep learning into the active learning framework in a
practical way. We develop an active learning framework for high dimensional
data, a task which has been extremely challenging so far, with very sparse
existing literature. Taking advantage of specialised models such as Bayesian
convolutional neural networks, we demonstrate our active learning techniques
with image data, obtaining a significant improvement on existing active
learning approaches. We demonstrate this on both the MNIST dataset, as well as
for skin cancer diagnosis from lesion images (ISIC2016 task).
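One common way to combine MC dropout with an uncertainty-based acquisition function is BALD (the mutual information between predictions and model parameters); a minimal numpy sketch, with assumed array shapes, is:

    import numpy as np

    def bald_scores(probs):
        # probs: (K, N, C) class probabilities from K dropout passes over N
        # candidate points. Returns one mutual-information score per point;
        # the acquisition picks the points with the largest scores.
        eps = 1e-12
        mean = probs.mean(axis=0)                                      # (N, C)
        pred_entropy = -(mean * np.log(mean + eps)).sum(axis=1)
        exp_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
        return pred_entropy - exp_entropy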
Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe
Comments: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)
Subjects: Learning (cs.LG); Robotics (cs.RO); Systems and Control (cs.SY); Machine Learning (stat.ML)
PID control architectures are widely used in industrial applications. Despite
their low number of open parameters, tuning multiple, coupled PID controllers
can become tedious in practice. In this paper, we extend PILCO, a model-based
policy search framework, to automatically tune multivariate PID controllers
purely based on data observed on an otherwise unknown system. The system’s
state is extended appropriately to frame the PID policy as a static state
feedback policy. This renders PID tuning possible as the solution of a finite
horizon optimal control problem without further a priori knowledge. The
framework is applied to the task of balancing an inverted pendulum on a seven
degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast
and data-efficient policy learning, even on complex real world problems.
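The state-extension trick is simple to picture: on the augmented state [e, int(e), de/dt], the PID law becomes a static linear feedback u = theta @ x_aug, so the three gains form an ordinary parameter vector for policy search. A minimal single-input sketch with illustrative names:

    import numpy as np

    class PIDAsStateFeedback:
        # u = Kp*e + Ki*int(e) + Kd*de/dt, rewritten as a static feedback
        # u = theta @ x_aug on the augmented state [e, int(e), de/dt].
        def __init__(self, kp, ki, kd, dt):
            self.theta = np.array([kp, ki, kd])
            self.dt = dt
            self.int_e = 0.0
            self.prev_e = 0.0

        def act(self, e):
            self.int_e += e * self.dt
            de = (e - self.prev_e) / self.dt
            self.prev_e = e
            x_aug = np.array([e, self.int_e, de])  # augmented state
            return self.theta @ x_aug              # static state feedback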
Dylan J. Foster, Daniel Reichman, Karthik Sridharan
Comments: 33 pages
Subjects: Learning (cs.LG)
We consider the statistical problem of recovering a hidden “ground truth”
binary labeling for the vertices of a graph G up to low Hamming error from
noisy edge and vertex measurements. We present new algorithms and a sharp
finite-sample analysis for this problem on trees and sparse graphs with poor
expansion properties such as hypergrids and ring lattices. Our method
generalizes and improves over Globerson et al. (2015), who introduced the
problem for two-dimensional grid lattices.
For trees we provide a simple, efficient algorithm that infers the ground
truth with optimal Hamming error and implies recovery results for all connected
graphs. Here, the presence of side information is critical to obtain a
non-trivial recovery rate. We then show how to adapt this algorithm to tree
decompositions of edge-subgraphs of certain graph families such as lattices,
resulting in optimal recovery error rates that can be obtained in a highly
efficient manner.
The thrust of our analysis is to 1) use the tree decomposition along with
edge measurements to produce a small class of viable vertex labelings and 2)
apply fast rate results from statistical learning theory to show that we can
infer the ground truth from this class using vertex measurements. We show the
power of our method by providing several examples including hypergrids, ring
lattices, and the Newman-Watts model for small world graphs. For
two-dimensional grids, our results improve over Globerson et al. (2015) by
obtaining optimal recovery in the constant-height regime.
Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta
Comments: 10 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Deep neural networks coupled with fast simulation and improved computation
have led to recent successes in the field of reinforcement learning (RL).
However, most current RL-based approaches fail to generalize since: (a) the gap
between simulation and real world is so large that policy-learning approaches
fail to transfer; (b) even if policy learning is done in the real world, data
scarcity leads to failed generalization from training to test scenarios (e.g.,
due to different friction or object masses). Inspired by H-infinity control
methods, we note that both modeling errors and differences in training and test
scenarios can be viewed as extra forces/disturbances in the system. This paper
proposes the idea of robust adversarial reinforcement learning (RARL), where we
train an agent to operate in the presence of a destabilizing adversary that
applies disturbance forces to the system. The jointly trained adversary is
reinforced — that is, it learns an optimal destabilization policy. We
formulate the policy learning as a zero-sum, minimax objective function.
Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah,
Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a)
improves training stability; (b) is robust to differences in training/test
conditions; and (c) outperforms the baseline even in the absence of the
adversary.
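The zero-sum scheme alternates two policy-gradient learners on the same reward. The sketch below shows the loop structure only; the .act/.update interfaces and the environment's joint-action step are assumptions, and any policy-gradient method could fill them in:

    def rollout(env, protagonist, adversary):
        # One trajectory under the joint policy; the environment is assumed
        # to accept a (protagonist, adversary) action pair.
        traj, obs, done = [], env.reset(), False
        while not done:
            obs, reward, done = env.step(protagonist.act(obs), adversary.act(obs))
            traj.append((obs, reward))
        return traj

    def train_rarl(env, protagonist, adversary, n_iters=100, n_rollouts=10):
        for _ in range(n_iters):
            # protagonist maximizes reward with the adversary held fixed
            batch = [rollout(env, protagonist, adversary) for _ in range(n_rollouts)]
            protagonist.update(batch, maximize=True)
            # adversary minimizes the same reward with the protagonist fixed
            batch = [rollout(env, protagonist, adversary) for _ in range(n_rollouts)]
            adversary.update(batch, maximize=False)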
Tomo Miyazaki, Shinichiro Omachi
Comments: 8 pages
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a novel method for structural data recognition using a
large number of graph models. In general, prevalent methods for structural data
recognition have two shortcomings: 1) Only a single model is used to capture
structural variation. 2) Naive recognition methods are used, such as the
nearest neighbor method. In this paper, we propose strengthening the
recognition performance of these models as well as their ability to capture
structural variation. The proposed method constructs a large number of graph
models and trains decision trees using the models. This paper makes two main
contributions. The first is a novel graph model that can quickly perform
calculations, which allows us to construct several models in a feasible amount
of time. The second contribution is a novel approach to structural data
recognition: graph model boosting. Comprehensive structural variations can be
captured with a large number of graph models constructed in a boosting
framework, and a sophisticated classifier can be formed by aggregating the
decision trees. Consequently, we can carry out structural data recognition with
powerful recognition capability in the face of comprehensive structural
variation. The experiments show that the proposed method achieves impressive
results and outperforms existing methods on datasets from the IAM graph
database repository.
Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade
Comments: Project page: this https URL
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (cs.SY)
This work shows that policies with simple linear and RBF parameterizations
can be trained to solve a variety of continuous control tasks, including the
OpenAI gym benchmarks. The performance of these trained policies is
competitive with state-of-the-art results obtained with more elaborate
parameterizations such as fully connected neural networks. Furthermore,
existing training and testing scenarios are shown to be very limited and prone
to over-fitting, thus giving rise to only trajectory-centric policies. Training
with a diverse initial state distribution is shown to produce more global
policies with better generalization. This allows for interactive control
scenarios where the system recovers from large on-line perturbations, as shown
in the supplementary video.
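A random-feature (RBF) policy of the kind described is only a few lines: random Fourier features of the state followed by a learned linear map. A sketch under assumed shapes (the matrix W is what policy search would optimize):

    import numpy as np

    def make_rbf_policy(obs_dim, act_dim, n_features=500, bandwidth=1.0, seed=0):
        # P and phase are frozen at construction; W is the learnable map.
        rng = np.random.default_rng(seed)
        P = rng.normal(size=(n_features, obs_dim)) / bandwidth
        phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        W = np.zeros((act_dim, n_features + 1))    # +1 for a bias feature

        def policy(obs):
            feats = np.sin(P @ obs + phase)        # random Fourier features
            return W @ np.append(feats, 1.0)

        return policy, W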
Ashok Cutkosky, Kwabena Boahen
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The vast majority of optimization and online learning algorithms today
require some prior information about the data (often in the form of bounds on
gradients or on the optimal parameter value). When this information is not
available, these algorithms require laborious manual tuning of various
hyperparameters, motivating the search for algorithms that can adapt to the
data with no prior information. We describe a frontier of new lower bounds on
the performance of such algorithms, reflecting a tradeoff between a term that
depends on the optimal parameter value and a term that depends on the
gradients’ rate of growth. Further, we construct a family of algorithms whose
performance matches any desired point on this frontier, which no previous
algorithm reaches.
Sharan Vaswani, Mark Schmidt, Laks V.S. Lakshmanan
Subjects: Learning (cs.LG)
The gang of bandits (GOB) model of Cesa-Bianchi et al. (2013) is a recent contextual
bandits framework that shares information between a set of bandit problems,
related by a known (possibly noisy) graph. This model is useful in problems
like recommender systems where the large number of users makes it vital to
transfer information between users. Despite its effectiveness, the existing GOB
model can only be applied to small problems due to its quadratic
time-dependence on the number of nodes. Existing solutions to combat the
scalability issue require an often-unrealistic clustering assumption. By
exploiting a connection to Gaussian Markov random fields (GMRFs), we show that
the GOB model can be made to scale to much larger graphs without additional
assumptions. In addition, we propose a Thompson sampling algorithm which uses
the recent GMRF sampling-by-perturbation technique, allowing it to scale to
even larger problems (leading to a “horde” of bandits). We give regret bounds
and experimental results for GOB with Thompson sampling and epoch-greedy
algorithms, indicating that these methods are as good as or significantly
better than ignoring the graph or adopting a clustering-based approach.
Finally, when an existing graph is not available, we propose a heuristic for
learning it on the fly and show promising results.
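A dense-matrix caricature of the Thompson sampling step: the graph Laplacian enters as a GMRF prior precision over the stacked user parameters, and an arm is chosen under a posterior sample. The paper's contribution is doing this at scale via sampling-by-perturbation; the Cholesky-based sample below is just a small-problem stand-in, with illustrative names throughout:

    import numpy as np

    def gob_thompson_step(L, contexts, X_hist, r_hist, sigma2=0.1, lam=1.0):
        # L: prior precision built from the user graph's Laplacian (here
        # already expanded over the stacked user parameters); contexts:
        # (n_arms, d) candidate features; X_hist/r_hist: past data.
        d = contexts.shape[1]
        A = lam * (L + 1e-6 * np.eye(d)) + (X_hist.T @ X_hist) / sigma2
        b = X_hist.T @ r_hist / sigma2
        mean = np.linalg.solve(A, b)
        # Sample theta ~ N(mean, A^{-1}) with a Cholesky solve; the paper
        # replaces this step with GMRF sampling-by-perturbation to scale.
        z = np.random.default_rng().normal(size=d)
        theta = mean + np.linalg.solve(np.linalg.cholesky(A).T, z)
        return int(np.argmax(contexts @ theta))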
Ashok Cutkosky, Kwabena Boahen
Journal-ref: Advances in Neural Information Processing Systems 29 (2016)
748-756
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We propose an online convex optimization algorithm (RescaledExp) that
achieves optimal regret in the unconstrained setting without prior knowledge of
any bounds on the loss functions. We prove a lower bound showing an exponential
separation between the regret of existing algorithms that require a known bound
on the loss functions and any algorithm that does not require such knowledge.
RescaledExp matches this lower bound asymptotically in the number of
iterations. RescaledExp is naturally hyperparameter-free and we demonstrate
empirically that it matches prior optimization algorithms that require
hyperparameter optimization.
Eliav Buchnik, Edith Cohen
Comments: 11 pages, 5 figures, 6 tables
Subjects: Learning (cs.LG)
Graph-based semi-supervised learning (SSL) algorithms predict labels for all
nodes based on provided labels of a small set of seed nodes. Classic methods
capture the graph structure through some underlying diffusion process that
propagates through the graph edges. Spectral diffusion, which includes
personalized PageRank and label propagation, propagates through random walks.
Social diffusion propagates through shortest paths. A common ground to these
diffusions is their linearity, which does not distinguish between the
contributions of a few “strong” relations and many “weak” relations.
Recently, non-linear methods such as node embeddings and graph convolutional
networks (GCN) demonstrated a large gain in quality for SSL tasks. These
methods introduce multiple components and greatly vary on how the graph
structure, seed label information, and other features are used.
We aim here to study the contribution of non-linearity, as an isolated
ingredient, to the performance gain. To do so, we place classic linear graph
diffusions in a self-training framework. Surprisingly, we observe that SSL
using the resulting bootstrapped diffusions not only significantly
improves over the respective non-bootstrapped baselines but also outperforms
state-of-the-art non-linear SSL methods. Moreover, since the self-training
wrapper retains the scalability of the base method, we obtain both higher
quality and better scalability.
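The self-training wrapper is easy to state: run a linear diffusion from the current seed set, promote the most confident predictions to seeds, and repeat. A numpy sketch with label propagation as the base diffusion (assumed inputs; not the paper's exact procedure):

    import numpy as np

    def bootstrapped_diffusion(W, seed_labels, n_rounds=5, add_per_round=100):
        # W: (n, n) row-normalized adjacency; seed_labels: dict node -> class.
        n = W.shape[0]
        classes = sorted(set(seed_labels.values()))
        seeds = dict(seed_labels)
        for _ in range(n_rounds):
            F = np.zeros((n, len(classes)))
            for node, c in seeds.items():
                F[node, classes.index(c)] = 1.0
            for _ in range(20):                    # truncated propagation
                F = W @ F
                for node, c in seeds.items():      # clamp the seeds
                    F[node] = 0.0
                    F[node, classes.index(c)] = 1.0
            conf = F.max(axis=1)
            conf[list(seeds)] = -np.inf            # never re-add a seed
            for node in np.argsort(-conf)[:add_per_round]:
                seeds[int(node)] = classes[int(F[node].argmax())]
        return F.argmax(axis=1)                    # index into `classes`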
Benjamin Paul Chamberlain, Angelo Cardoso, C.H. Bryan Liu, Roberto Pagliari, Marc Peter Deisenroth
Comments: 9 pages, 10 figures
Subjects: Learning (cs.LG); Computers and Society (cs.CY); Information Retrieval (cs.IR); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We describe the Customer Life Time Value (CLTV) prediction system deployed at
ASOS.com, a global online fashion retailer. CLTV prediction is an important
problem in e-commerce where an accurate estimate of future value allows
retailers to effectively allocate marketing spend, identify and nurture high
value customers and mitigate exposure to losses. The system at ASOS provides
daily estimates of the future value of every customer and is one of the
cornerstones of the personalised shopping experience. The state of the art in
this domain uses large numbers of handcrafted features and ensemble regressors
to forecast value, predict churn and evaluate customer loyalty. We describe our
system, which adopts this approach, and our ongoing efforts to further improve
it. Recently, domains including language, vision and speech have shown dramatic
advances by replacing handcrafted features with features that are learned
automatically from data. We show that learning feature representations is a
promising extension to the state of the art in CLTV modelling. We propose a
novel way to generate embeddings of customers, which addresses the issue of the
ever-changing product catalogue, and we obtain a significant improvement over an
exhaustive set of handcrafted features.
Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng
Comments: ICLR 2017
Subjects: Learning (cs.LG); Computation and Language (cs.CL)
Data noising is an effective technique for regularizing neural network
models. While noising is widely adopted in application domains such as vision
and speech, commonly used noising primitives have not been developed for
discrete sequence-level settings such as language modeling. In this paper, we
derive a connection between input noising in neural network language models and
smoothing in (n)-gram models. Using this connection, we draw upon ideas from
smoothing to develop effective noising schemes. We demonstrate performance
gains when applying the proposed schemes to language modeling and machine
translation. Finally, we provide empirical analysis validating the relationship
between noising and smoothing.
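One of the smoothing-inspired primitives can be pictured as unigram noising: with some probability, an input token is replaced by a draw from the unigram distribution, mirroring interpolation smoothing in (n)-gram models. A hedged sketch (the paper derives several variants; this is just the simplest):

    import numpy as np

    def unigram_noising(tokens, unigram_probs, gamma=0.2, rng=None):
        # With probability gamma, replace a token by a draw from the unigram
        # distribution. tokens is a list of integer ids; unigram_probs sums
        # to one over the vocabulary.
        rng = rng or np.random.default_rng()
        vocab = len(unigram_probs)
        noised = list(tokens)
        for i in range(len(noised)):
            if rng.random() < gamma:
                noised[i] = int(rng.choice(vocab, p=unigram_probs))
        return noised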
Amina Mollaysa, Pablo Strasser, Alexandros Kalousis
Comments: 11 page with appendix
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Very often features come with their own vectorial descriptions which provide
detailed information about their properties. We refer to these vectorial
descriptions as feature side-information. In the standard learning scenario,
input is represented as a vector of features and the feature side-information
is most often ignored or used only for feature selection prior to model
fitting. We believe that feature side-information, which carries information
about the intrinsic properties of features, can help improve model prediction
if used properly during the learning process. In this paper, we propose a framework
that allows for the incorporation of the feature side-information during the
learning of very general model families to improve the prediction performance.
We control the structures of the learned models so that they reflect feature
similarities as defined by the side-information. We
perform experiments on a number of benchmark datasets which show significant
predictive performance gains, over a number of baselines, as a result of the
exploitation of the side-information.
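One concrete way to make a model reflect feature similarities is a Laplacian-style penalty that ties together the weights of features whose side-information vectors are similar. This is an illustration of the general idea, not the paper's exact construction:

    import numpy as np

    def side_information_penalty(W, S):
        # W: (n_features, n_outputs) model weights; S: (n_features, n_features)
        # feature-similarity matrix computed from the side-information vectors.
        # trace(W' L W) equals 0.5 * sum_ij S_ij ||w_i - w_j||^2, so similar
        # features are pushed toward similar weights.
        L = np.diag(S.sum(axis=1)) - S      # graph Laplacian of S
        return np.trace(W.T @ L @ W)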
Omer Dror, Boaz Nadler, Erhan Bilal, Yuval Kluger
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Consider a regression problem where there is no labeled data and the only
observations are the predictions (f_i(x_j)) of (m) experts (f_{i}) over many
samples (x_j). With no knowledge on the accuracy of the experts, is it still
possible to accurately estimate the unknown responses (y_{j})? Can one still
detect the least or most accurate experts? In this work we propose a framework
to study these questions, based on the assumption that the (m) experts have
uncorrelated deviations from the optimal predictor. Assuming the first two
moments of the response are known, we develop methods to detect the best and
worst regressors, and derive U-PCR, a novel principal components approach for
unsupervised ensemble regression. We provide theoretical support for U-PCR and
illustrate its improved accuracy over the ensemble mean and median on a variety
of regression problems.
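The rank-one intuition behind U-PCR can be sketched in a few lines: with uncorrelated expert deviations, the leading principal component of the predictions' covariance carries information about expert accuracy. The weighting below is illustrative, not the paper's exact estimator (which also uses the known second moment):

    import numpy as np

    def u_pcr_weights(F, mu_y):
        # F: (m, n) predictions of m experts on n samples; mu_y: known mean
        # of the response.
        Fc = F - F.mean(axis=1, keepdims=True)
        C = Fc @ Fc.T / F.shape[1]          # (m, m) sample covariance
        evals, evecs = np.linalg.eigh(C)
        v = evecs[:, -1]                    # leading principal component
        v = v if v.sum() >= 0 else -v       # fix the sign ambiguity
        w = np.clip(v, 0.0, None)
        w /= w.sum()
        return w, mu_y + w @ Fc             # weights and ensemble prediction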
Visak CV Kumar, Sehoon Ha, C Karen Liu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)
Being able to fall safely is a necessary motor skill for humanoids performing
highly dynamic tasks, such as running and jumping. We propose a new method to
learn a policy that minimizes the maximal impulse during the fall. The
optimization solves for both a discrete contact planning problem and a
continuous optimal control problem. Once trained, the policy can compute the
optimal next contacting body part (e.g. left foot, right foot, or hands),
contact location and timing, and the required joint actuation. We represent the
policy as a mixture of actor-critic neural networks, which consists of n control
policies and the corresponding value functions. Each actor-critic pair is
associated with one of the n possible contacting body parts. During execution,
the policy corresponding to the highest value function will be executed while
the associated body part will be the next contact with the ground. With this
mixture of actor-critic architecture, the discrete contact sequence planning is
solved through the selection of the best critics while the continuous control
problem is solved by the optimization of actors. We show that our policy can
achieve comparable, sometimes even higher, rewards than a recursive search of
the action space using dynamic programming, while enjoying a 50 to 400 times
speed gain during online execution.
Kayvan Bijari, Hadi Zare, Hadi Veisi, Hossein Bobarshad
Comments: 17 pages, 3 figures, 8 tables
Journal-ref: Neural Comput & Applic (2016)
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)
Cluster analysis plays an important role in decision making process for many
knowledge-based systems. There exist a wide variety of different approaches for
clustering applications including the heuristic techniques, probabilistic
models, and traditional hierarchical algorithms. In this paper, a novel
heuristic approach based on big bang-big crunch algorithm is proposed for
clustering problems. The proposed method not only takes advantage of its
heuristic nature to alleviate the shortcomings of typical clustering algorithms
such as k-means, but also benefits from a memory-based scheme compared to
similar heuristic techniques. Furthermore, the performance of the proposed algorithm is
investigated based on several benchmark test functions as well as on the
well-known datasets. The experimental results show the significant superiority
of the proposed method over similar algorithms.
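For readers unfamiliar with big bang-big crunch search, the loop alternates a randomized "explosion" around the current solution with a fitness-weighted contraction. A sketch applied to k-means-style centers (the memory-based scheme mentioned above is richer than the single elite kept here):

    import numpy as np

    def bbbc_cluster_centers(X, k, n_candidates=50, n_iters=100, seed=0):
        # Big bang: scatter candidate center sets around the incumbent with
        # a shrinking radius. Big crunch: contract to a fitness-weighted
        # center of mass. Fitness is the k-means objective (lower is better).
        rng = np.random.default_rng(seed)
        lo, hi = X.min(axis=0), X.max(axis=0)
        best = rng.uniform(lo, hi, size=(k, X.shape[1]))

        def fitness(centers):
            d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
            return d.min(axis=1).sum()

        for t in range(1, n_iters + 1):
            radius = (hi - lo) / t
            cands = [best + rng.normal(size=best.shape) * radius
                     for _ in range(n_candidates)]
            wts = np.array([1.0 / (fitness(c) + 1e-12) for c in cands])
            com = sum(w * c for w, c in zip(wts, cands)) / wts.sum()
            best = min([com, best], key=fitness)   # keep the better solution
        return best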
Tomas Pevny, Petr Somol
Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)
Modeling network traffic is gaining importance as a means to counter modern
threats of ever-increasing sophistication. It is, however, surprisingly
difficult and costly to construct reliable classifiers on top of telemetry
data, due to the variety and complexity of signals that no human can manage to
interpret in full. Obtaining training data with a sufficiently large and
variable body of labels can thus be seen as a prohibitive problem. The goal of
this work is to detect infected computers by observing their HTTP(S) traffic
collected from network sensors, which are typically proxy servers or network
firewalls, while relying on only minimal human input in the model training
phase. We propose a discriminative model that makes decisions based on all of a
computer’s traffic observed during a predefined time window (5 minutes in our
case). The model is trained on traffic samples collected over equally sized
time windows from a large number of computers, where the only labels needed are
human verdicts about each computer as a whole (presumed infected vs. presumed
clean). As part of training, the model itself recognizes discriminative
patterns in traffic targeted at individual servers and constructs the final
high-level classifier on top of them. We show that the classifier performs with
very high precision, while the learned traffic patterns can be interpreted as
Indicators of Compromise. We implement the discriminative model as a neural
network with a special structure reflecting two stacked multi-instance
problems. The main advantages of the proposed configuration include not only
improved accuracy and the ability to learn from gross labels, but also the
automatic learning of server types (together with their detectors) that are
typically visited by infected computers.
Quan Zou, Shixiang Wan, Ying Ju, Jijun Tang, Xiangxiang Zeng
Subjects: Quantitative Methods (q-bio.QM); Learning (cs.LG); Biomolecules (q-bio.BM)
Background: It is necessary and essential to discover protein function from
novel primary sequences. Wet lab experimental procedures are not only
time-consuming, but also costly, so predicting protein structure and function
reliably based only on amino acid sequence has significant value. TATA-binding
protein (TBP) is a kind of DNA-binding protein that plays a key role in
transcription regulation. Our study proposes an automatic approach for
identifying TATA-binding proteins efficiently, accurately, and conveniently.
This method can guide the identification of special proteins with
computational intelligence strategies. Results: First, we propose novel
fingerprint features for TBP based on pseudo amino acid composition,
physicochemical properties, and secondary structure. Second, hierarchical
dimensionality reduction strategies are employed to further improve
performance. Currently, Pretata achieves 92.92% TATA-binding
protein prediction accuracy, which is better than all other existing methods.
Conclusions: The experiments demonstrate that our method can greatly improve
prediction accuracy and speed, thus making large-scale NGS data prediction
practical. A web server has been developed to assist other researchers and can
be accessed at this http URL
Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)
Traffic on freeways can be managed by means of ramp meters from Road Traffic
Control rooms. Human operators cannot efficiently manage a network of ramp
meters. To support them, we present an intelligent platform for traffic
management which includes a new ramp metering coordination scheme in the
decision making module, an efficient dashboard for interacting with human
operators, machine learning tools for learning event definitions and Complex
Event Processing tools able to deal with uncertainties inherent to the traffic
use case. Unlike the usual approach, the devised event-driven platform is able
to predict congestion up to 4 minutes before it actually happens. Proactive
decision making can then be established leading to significant improvement of
traffic conditions.
Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
The growth of data, the need for scalability and the complexity of models
used in modern machine learning calls for distributed implementations. Yet, as
of today, distributed machine learning frameworks have largely ignored the
possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
the robustness to Byzantine failures at the fundamental level of stochastic
gradient descent (SGD), the heart of most machine learning algorithms. Assuming
a set of (n) workers, up to (f) of them being Byzantine, we ask how robust can
SGD be, without limiting the dimension, nor the size of the parameter space.
We first show that no gradient descent update rule based on a linear
combination of the vectors proposed by the workers (i.e, current approaches)
tolerates a single Byzantine failure. We then formulate a resilience property
of the update rule capturing the basic requirements to guarantee convergence
despite (f) Byzantine workers. We finally propose Krum, an update rule that
satisfies the resilience property aforementioned. For a (d)-dimensional
learning problem, the time complexity of Krum is (O(n^2 cdot (d + log n))).
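The Krum rule itself is short: score each proposed gradient by the summed squared distances to its n - f - 2 nearest peers, and apply the lowest-scoring one. A numpy sketch (assuming n > 2f + 2, as required for the guarantees):

    import numpy as np

    def krum(gradients, f):
        # gradients: list of n proposed gradient vectors, up to f Byzantine.
        G = np.asarray(gradients)                              # (n, d)
        n = len(G)
        d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)    # pairwise dists
        scores = []
        for i in range(n):
            others = np.delete(d2[i], i)
            scores.append(np.sort(others)[: n - f - 2].sum())
        return G[int(np.argmin(scores))]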
Anru Zhang, Dong Xia
Subjects: Statistics Theory (math.ST); Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Tensors, or high-order arrays, have attracted much attention in recent research. In
this paper, we propose a general framework for tensor principal component
analysis (tensor PCA), which focuses on the methodology and theory for
extracting the hidden low-rank structure from the high-dimensional tensor data.
A unified solution is provided for tensor PCA with considerations in both
statistical limits and computational costs. The problem exhibits three
different phases according to the signal-noise-ratio (SNR). In particular, with
strong SNR, we propose a fast spectral power iteration method that achieves the
minimax optimal rate of convergence in estimation; with weak SNR, the
information-theoretical lower bound shows that it is impossible to have
consistent estimation in general; with moderate SNR, we show that the
non-convex maximum likelihood estimation provides optimal solution, but with
NP-hard computational cost; moreover, under the hardness hypothesis of
hypergraphic planted clique detection, there are no polynomial-time algorithms
performing consistently in general. Simulation studies show that the proposed
spectral power iteration method has good performance under a variety of
settings.
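The strong-SNR regime's workhorse is tensor power iteration, sketched below for a symmetric order-3 spiked tensor; the spectral initialization used in the paper is omitted here for brevity:

    import numpy as np

    def tensor_power_iteration(T, n_iters=50, seed=0):
        # T: symmetric (p, p, p) tensor, e.g. s * (u outer u outer u) + noise.
        # Contract T along two modes with the iterate and renormalize.
        rng = np.random.default_rng(seed)
        u = rng.normal(size=T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('ijk,j,k->i', T, u, u)   # T(I, u, u)
            u /= np.linalg.norm(u)
        return u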
Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban, Joydeep Ghosh
Comments: To appear in AISTATS 2017
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
Greedy algorithms are widely used for problems in machine learning such as
feature selection and set function optimization. Unfortunately, for large
datasets, the running time of even greedy algorithms can be quite high. This is
because for each greedy step we need to refit a model or calculate a function
using the previously selected choices and the new candidate.
Two algorithms that are faster approximations to the greedy forward selection
were introduced recently ([Mirzasoleiman et al. 2013, 2015]). They achieve
better performance by exploiting distributed computation and stochastic
evaluation respectively. Both algorithms have provable performance guarantees
for submodular functions.
In this paper we show that, contrary to previously held opinion,
submodularity is not required to obtain approximation guarantees for these two
algorithms. Specifically, we show that a generalized concept of weak
submodularity suffices to give multiplicative approximation guarantees. Our
result extends the applicability of these algorithms to a larger class of
functions. Furthermore, we show that a bounded submodularity ratio can be used
to provide data dependent bounds that can sometimes be tighter also for
submodular functions. We empirically validate our work by showing superior
performance of fast greedy approximations versus several established baselines
on artificial and real datasets.
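For concreteness, the stochastic variant referenced above ([Mirzasoleiman et al. 2015]) evaluates marginal gains only on a random subsample per step. A sketch for a generic set function f (the guarantees discussed in the abstract extend to weakly submodular f):

    import numpy as np

    def stochastic_greedy(f, ground_set, k, eps=0.1, seed=0):
        # f: set function evaluated as f(python_set); ground_set: list of
        # hashable items. Each step evaluates marginal gains only on a random
        # subsample of size ~ (n/k) * log(1/eps) instead of the full set.
        rng = np.random.default_rng(seed)
        n = len(ground_set)
        sample_size = max(1, int(np.ceil(n / k * np.log(1.0 / eps))))
        chosen = set()
        for _ in range(k):
            pool = [e for e in ground_set if e not in chosen]
            cand = rng.choice(pool, size=min(sample_size, len(pool)),
                              replace=False)
            base = f(chosen)
            gains = [f(chosen | {e}) - base for e in cand]
            chosen.add(cand[int(np.argmax(gains))])
        return chosen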
Rajiv Khanna, Ethan Elenberg, Alexandros G. Dimakis, Sahand Negahban
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
We provide new approximation guarantees for greedy low rank matrix estimation
under standard assumptions of restricted strong convexity and smoothness. Our
novel analysis also uncovers previously unknown connections between the low
rank estimation and combinatorial optimization, so much so that our bounds are
reminiscent of corresponding approximation bounds in submodular maximization.
Additionally, we also provide statistical recovery guarantees. Finally, we
present empirical comparison of greedy estimation with established baselines on
two important real-world problems.
Erik M. Lindgren, Shanshan Wu, Alexandros G. Dimakis
Comments: In NIPS 2016
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)
The facility location problem is widely used for summarizing large datasets
and has additional applications in sensor placement, image retrieval, and
clustering. One difficulty of this problem is that submodular optimization
algorithms require the calculation of pairwise benefits for all items in the
dataset. This is infeasible for large problems, so recent work proposed to only
calculate nearest neighbor benefits. One limitation is that several strong
assumptions were invoked to obtain provable approximation guarantees. In this
paper we establish that these extra assumptions are not necessary—solving the
sparsified problem will be almost optimal under the standard assumptions of the
problem. We then analyze a different method of sparsification that is a better
model for methods such as Locality Sensitive Hashing to accelerate the nearest
neighbor computations and extend the use of the problem to a broader family of
similarities. We validate our approach by demonstrating that it rapidly
generates interpretable summaries.
Erik M. Lindgren, Alexandros G. Dimakis, Adam Klivans
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)
Given a graphical model, one essential problem is MAP inference, that is,
finding the most likely configuration of states according to the model.
Although this problem is NP-hard, large instances can be solved in practice. A
major open question is to explain why this is true. We give a natural condition
under which we can provably perform MAP inference in polynomial time. We
require that the number of fractional vertices in the LP relaxation exceeding
the optimal solution is bounded by a polynomial in the problem size. This
resolves an open question by Dimakis, Gohari, and Wainwright. In contrast, for
general LP relaxations of integer programs, known techniques can only handle a
constant number of fractional vertices whose value exceeds the optimal
solution. We experimentally verify this condition and demonstrate how efficient
various integer programming methods are at removing fractional solutions.
Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
We consider support recovery in the quadratic logistic regression setting,
where the target depends on both (p) linear terms (x_i) and up to (p^2) quadratic
terms (x_i x_j). Quadratic terms enable prediction/modeling of higher-order
effects between features and the target, but when incorporated naively may
involve solving a very large regression problem. We consider the sparse case,
where at most (s) terms (linear or quadratic) are non-zero, and provide a new
faster algorithm. It involves (a) identifying the weak support (i.e. all
relevant variables) and (b) standard logistic regression optimization only on
these chosen variables. The first step relies on a novel insight about
correlation tests in the presence of non-linearity, and takes (O(pn)) time for
(n) samples – giving potentially huge computational gains over the naive
approach. Motivated by insights from the boolean case, we propose a non-linear
correlation test for the non-binary finite support case that involves hashing a
variable and then correlating with the output variable. We also provide
experimental results to demonstrate the effectiveness of our methods.
Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
In many machine learning applications, it is important to explain the
predictions of a black-box classifier. For example, why does a deep neural
network assign an image to a particular class? We cast interpretability of
black-box classifiers as a combinatorial maximization problem and propose an
efficient streaming algorithm to solve it subject to cardinality constraints.
By extending ideas from Badanidiyuru et al. [2014], we provide a constant
factor approximation guarantee for our algorithm in the case of random stream
order and a weakly submodular objective function. This is the first such
theoretical guarantee for this general class of functions, and we also show
that no such algorithm exists for a worst case stream order. Our algorithm
obtains similar explanations of Inception V3 predictions (10) times faster than
the state-of-the-art LIME framework of Ribeiro et al. [2016].
Frederic Sala, Shahroze Kabir, Guy Van den Broeck, Lara Dolecek
Comments: 11 pages, 4 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
After being trained, classifiers must often operate on data that has been
corrupted by noise. In this paper, we consider the impact of such noise on the
features of binary classifiers. Inspired by tools for classifier robustness, we
introduce the same classification probability (SCP) to measure the resulting
distortion on the classifier outputs. We introduce a low-complexity estimate of
the SCP based on quantization and polynomial multiplication. We also study
channel coding techniques based on replication error-correcting codes. In
contrast to the traditional channel coding approach, where error-correction is
meant to preserve the data and is agnostic to the application, our schemes
specifically aim to maximize the SCP (equivalently minimizing the distortion of
the classifier output) for the same redundancy overhead.
Sevi Baltaoglu, Lang Tong, Qing Zhao
Subjects: Computer Science and Game Theory (cs.GT); Learning (cs.LG)
We study the online learning problem of a bidder who participates in repeated
auctions. With the goal of maximizing his total T-period payoff, the bidder
wants to determine the optimal allocation of his fixed budget among his bids
for (K) different goods at each period. As a bidding strategy, we propose a
polynomial time algorithm, referred to as dynamic programming on discrete set
(DPDS), which is inspired by the dynamic programming approach to Knapsack
problems. We show that DPDS achieves the regret order of (O(sqrt{Tlog{T}})).
Also, by showing that the regret growth rate is lower bounded by
(Omega(sqrt{T})) for any bidding strategy, we conclude that the DPDS algorithm
is order optimal up to a (sqrt{log{T}}) term. We also evaluate the performance
of DPDS empirically in the context of virtual bidding in wholesale electricity
markets by using historical data from the New York energy market.
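The knapsack flavor of DPDS can be sketched as a textbook budget-allocation DP over a discretized budget; the payoff estimates values[k][b] stand in for whatever the learner maintains from past auctions, and the names are illustrative:

    import numpy as np

    def allocate_budget(values, budget_steps):
        # values[k][b]: estimated payoff of spending b budget units on good k
        # (length budget_steps + 1 per good). Returns the optimal per-good
        # allocation and its total estimated payoff.
        K, B = len(values), budget_steps
        best = np.full((K + 1, B + 1), -np.inf)
        best[0, 0] = 0.0
        choice = np.zeros((K + 1, B + 1), dtype=int)
        for k in range(1, K + 1):
            for b in range(B + 1):
                for spend in range(b + 1):
                    cand = best[k - 1, b - spend] + values[k - 1][spend]
                    if cand > best[k, b]:
                        best[k, b] = cand
                        choice[k, b] = spend
        alloc, b = [], int(np.argmax(best[K]))     # best total spend <= B
        for k in range(K, 0, -1):
            alloc.append(int(choice[k, b]))
            b -= choice[k, b]
        return alloc[::-1], float(best[K].max())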
Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Evaluating similarity between graphs is of major importance in several
computer vision and pattern recognition problems, where graph representations
are often used to model objects or interactions between elements. The choice of
a distance or similarity metric is, however, not trivial and can be highly
dependent on the application at hand. In this work, we propose a novel metric
learning method to evaluate distance between graphs that leverages the power of
convolutional neural networks, while exploiting concepts from spectral graph
theory to allow these operations on irregular graphs. We demonstrate the
potential of our method in the field of connectomics, where neuronal pathways
or functional connections between brain regions are commonly modelled as
graphs. In this problem, the definition of an appropriate graph similarity
function is critical to unveil patterns of disruptions associated with certain
brain disorders. Experimental results on the ABIDE dataset show that our method
can learn a graph similarity metric tailored for a clinical application,
improving the performance of a simple k-nn classifier by 11.9% compared to a
traditional distance metric.
Bruno Clerckx, Ekaterina Bayguzina
Comments: submitted for publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Far-field Wireless Power Transfer (WPT) has attracted significant attention
in the last decade. Recently, channel-adaptive waveforms have been shown to
significantly increase the DC power level at the output of the rectifier.
However the design of those waveforms is generally computationally complex and
does not lend itself easily to practical implementation. We here propose a
low-complexity channel-adaptive multisine waveform design whose performance is
very close to that of the optimal design. Performance evaluations confirm the
benefits of the new design in various rectifier topologies.
Bruno Clerckx, Zati Bayani Zawawi, Kaibin Huang
Comments: submitted for publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
This paper shows that wirelessly powered backscatter communications is
subject to a fundamental tradeoff between the harvested energy at the tag and
the reliability of the backscatter communication, measured in terms of SNR at
the reader. Assuming the RF transmit signal is a multisine waveform adaptive to
the channel state information, we derive a systematic approach to optimize the
transmit waveform weights (amplitudes and phases) in order to enlarge as much
as possible the SNR-energy region. Performance evaluations confirm the
significant benefits of using multiple frequency components in the adaptive
transmit multisine waveform to exploit the nonlinearity of the rectifier and a
frequency diversity gain.
Chi Xu, Meng Zheng, Wei Liang, Haibin Yu, Ying-Chang Liang
Subjects: Information Theory (cs.IT)
This paper studies a green paradigm for the underlay coexistence of primary
users (PUs) and secondary users (SUs) in energy harvesting cognitive radio
networks (EH-CRNs), wherein battery-free SUs capture both the spectrum and the
energy of PUs to enhance spectrum efficiency and green energy utilization. To
lower the transmit powers of SUs, we employ multi-hop transmission with time
division multiple access, by which SUs first harvest energy from the RF signals
of PUs and then transmit data in the allocated time concurrently with PUs, all
in the licensed spectrum. In this way, the available transmit energy of each SU
mainly depends on the energy harvested before its turn to transmit, namely
energy causality. Meanwhile, the transmit powers of SUs must be strictly
controlled to protect PUs from harmful interference. Thus, subject to the
energy causality constraint and the interference power constraint, we study the
end-to-end throughput maximization problem for optimal time and power
allocation. To solve this nonconvex problem, we first equivalently transform it
into a convex optimization problem and then propose the joint optimal time and
power allocation (JOTPA) algorithm that iteratively solves a series of
feasibility problems until convergence. Extensive simulations evaluate the
performance of EH-CRNs with JOTPA in three typical deployment scenarios and
validate the superiority of JOTPA by making comparisons with two other resource
allocation algorithms.
Cheng Zhang, Yindi Jing, Yongming Huang, Luxi Yang
Comments: 11 pages, 5 figures, submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT)
This work provides a comprehensive scaling law based performance analysis for
multi-cell multi-user massive multiple-input-multiple-output (MIMO) downlink
systems with maximal-ratio transmission (MRT). Imperfect channel state
information (CSI), pilot contamination, and channel spatial correlation are all
considered. First, a sum-rate lower bound is derived by exploiting the
asymptotically deterministic property of the received signal power, while
keeping the random nature of other components in the
signal-to-interference-plus-noise-ratio (SINR) intact. Via a general scaling
model on important network parameters, including the number of users, the
channel training energy and the data transmission power, with respect to the
number of base station antennas, the asymptotic scaling law of the effective
SINR is obtained, which reveals quantitatively the tradeoff of the network
parameters. More importantly, pilot contamination and pilot contamination
elimination (PCE) are considered in the analytical framework. In addition, the
applicability of the derived asymptotic scaling law in practical systems with
large but finite antenna numbers is discussed. Finally, sufficient conditions
on the parameter scalings for the SINR to be asymptotically deterministic in
the sense of mean square convergence are provided, which covers existing
results on such analysis as special cases and shows the effect of PCE
explicitly.
Kunal Sharma, Eyuri Wakakuwa, Mark M. Wilde
Comments: 6 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Suppose that Alice and Bob are located in distant laboratories, which are
connected by an ideal quantum channel. Suppose further that they share many
copies of a quantum state (rho_{ABE}), such that Alice possesses the (A)
systems and Bob the (BE) systems. In our model, there is an identifiable part
of Bob’s laboratory that is insecure: a third party named Eve has infiltrated
Bob’s laboratory and gained control of the (E) systems. Alice, knowing this,
would like to use their shared state and the ideal quantum channel to communicate
a message in such a way that Bob, who has access to the whole of his laboratory
((BE) systems), can decode it, while Eve, who has access only to a sector of
Bob’s laboratory ((E) systems) and the ideal quantum channel connecting Alice
to Bob, cannot learn anything about Alice’s transmitted message. We call this
task the conditional one-time pad, and in this paper, we prove that the optimal
rate of secret communication for this task is equal to the conditional quantum
mutual information (I(A;B|E)) of their shared state. We thus give the
conditional quantum mutual information an operational meaning that is different
from those given in prior works, via state redistribution, conditional erasure,
or state deconstruction. We also generalize the model and method in several
ways, one of which demonstrates that the negative tripartite interaction
information (-I_{3}(A;B;E) = I(A;BE)-I(A;B)-I(A;E)) of a tripartite state
(rho_{ABE}) is an achievable rate for a secret-sharing task, i.e., the case in
which Alice’s message should be secure from someone possessing only the (AB) or
(AE) systems but should be decodable by someone possessing all systems (A),
(B), and (E).
Rebecca C. Steorts, Matt Barnes, Willie Neiswanger
Comments: 11 pages with supplement; 4 figures and 2 tables; to appear in AISTATS 2017
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME); Machine Learning (stat.ML)
Record linkage involves merging records in large, noisy databases to remove
duplicate entities. It has become an important area because of its widespread
occurrence in bibliometrics, public health, official statistics production,
political science, and beyond. Traditional linkage methods directly linking
records to one another are computationally infeasible as the number of records
grows. As a result, it is increasingly common for researchers to treat record
linkage as a clustering task, in which each latent entity is associated with
one or more noisy database records. We critically assess performance bounds
using the Kullback-Leibler (KL) divergence under a Bayesian record linkage
framework, making connections to Kolchin partition models. We provide an upper
bound using the KL divergence and a lower bound on the minimum probability of
misclassifying a latent entity. We give insights for when our bounds hold using
simulated data and provide practical user guidance.
Ming Jiang, Jérôme Bobin, Jean-Luc Starck
Subjects: Applications (stat.AP); Information Theory (cs.IT)
Blind Source Separation (BSS) is a challenging matrix factorization problem
that plays a central role in multichannel imaging science. In a large number of
applications, such as astrophysics, current unmixing methods are limited since
real-world mixtures are generally affected by extra instrumental effects like
blurring. Therefore, BSS has to be solved jointly with a deconvolution problem,
which requires tackling a new inverse problem: deconvolution BSS (DBSS). In
this article, we introduce an innovative DBSS approach, called DecGMCA, based
on sparse signal modeling and an efficient alternating projected least squares
algorithm. Numerical results demonstrate that the DecGMCA algorithm performs
very well on simulations. It further highlights the importance of jointly
solving BSS and deconvolution instead of considering these two problems
independently. Furthermore, the performance of the proposed DecGMCA algorithm
is demonstrated on simulated radio-interferometric data.
Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
We consider the problem of learning a causal graph over a set of variables
with interventions. We study the cost-optimal causal graph learning problem:
For a given skeleton (undirected version of the causal graph), design the set
of interventions with minimum total cost, that can uniquely identify any causal
graph with the given skeleton. We show that this problem is solvable in
polynomial time. Later, we consider the case when the number of interventions
is limited. For this case, we provide polynomial time algorithms when the
skeleton is a tree or a clique tree. For a general chordal skeleton, we develop
an efficient greedy algorithm, which can be improved when the causal graph
skeleton is an interval graph.