Mohammed A. Zidan, YeonJoo Jeong, Jong Hong Shin, Chao Du, Wei D. Lu
Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)
For decades, advances in electronics were directly related to the scaling of
CMOS transistors according to Moore’s law. However, both the CMOS scaling and
the classical computer architecture are approaching fundamental and practical
limits, and new computing architectures based on emerging devices, such as
non-volatile memories e.g. resistive memory (RRAM) devices, are expected to
sustain the exponential growth of computing capability. Here we propose a novel
memory-centric, reconfigurable, general purpose computing platform to handle
the exploding amount of data in a fast and energy-efficient manner. The
proposed computing architecture is based on a single physical resistive
memory-centric fabric that can be optimally reconfigured and utilized to
perform different computing and data storage tasks in a massively parallel
approach. The system can be tailored to achieve maximal energy efficiency based
on the data flow by dynamically allocating the basic computing fabric to
storage, arithmetic, and analog including neuromorphic computing tasks.
Hans Krupakar, Akshay Jayakumar, Dhivya G
Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Population growth and increasing droughts are creating unprecedented strain
on the continued availability of water resources. Since irrigation is a major
consumer of fresh water, wastage of resources in this sector could have strong
consequences. To address this issue, irrigation water management and prediction
techniques need to be employed effectively and should be able to account for
the variabilities present in the environment. The different techniques surveyed
in this paper can be classified into two categories: computational and
statistical. Computational methods deal with scientific correlations between
physical parameters whereas statistical methods involve specific prediction
algorithms that can be used to automate the process of irrigation water
prediction. These algorithms interpret semantic relationships between the
various parameters of temperature, pressure, evapotranspiration etc. and store
them as numerical precomputed entities specific to the conditions and the area
used as the data for the training corpus used to train it. We focus on
reviewing the computational methods used to determine Evapotranspiration and
its implications. We compare the efficiencies of different data mining and
machine learning methods implemented in this area, such as Logistic Regression,
Decision Tress Classifier, SysFor, Support Vector Machine(SVM), Fuzzy Logic
techniques, Artifical Neural Networks(ANNs) and various hybrids of Genetic
Algorithms (GA) applied to irrigation prediction. We also recommend a possible
technique for the same based on its superior results in other such time series
analysis tasks.
Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh
Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an approach to capture the 3D motion of a group of people engaged
in a social interaction. The core challenges in capturing social interactions
are: (1) occlusion is functional and frequent; (2) subtle motion needs to be
measured over a space large enough to host a social group; (3) human appearance
and configuration variation is immense; and (4) attaching markers to the body
may prime the nature of interactions. The Panoptic Studio is a system organized
around the thesis that social interactions should be measured through the
integration of perceptual analyses over a large variety of view points. We
present a modularized system designed around this principle, consisting of
integrated structural, hardware, and software innovations. The system takes, as
input, 480 synchronized video streams of multiple people engaged in social
activities, and produces, as output, the labeled time-varying 3D structure of
anatomical landmarks on individuals in the space. Our algorithm is designed to
fuse the “weak” perceptual processes in the large number of views by
progressively generating skeletal proposals from low-level appearance cues, and
a framework for temporal refinement is also presented by associating body parts
to reconstructed dense 3D trajectory stream. Our system and method are the
first in reconstructing full body motion of more than five people engaged in
social interactions without using markers. We also empirically demonstrate the
impact of the number of views in achieving this goal.
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Feature pyramids are a basic component in recognition systems for detecting
objects at different scales. But recent deep learning object detectors have
avoided pyramid representations, in part because they are compute and memory
intensive. In this paper, we exploit the inherent multi-scale, pyramidal
hierarchy of deep convolutional networks to construct feature pyramids with
marginal extra cost. A top-down architecture with lateral connections is
developed for building high-level semantic feature maps at all scales. This
architecture, called a Feature Pyramid Network (FPN), shows significant
improvement as a generic feature extractor in several applications. Using FPN
in a basic Faster R-CNN system, our method achieves state-of-the-art
single-model results on the COCO detection benchmark without bells and
whistles, surpassing all existing single-model entries including those from the
COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU
and thus is a practical and accurate solution to multi-scale object detection.
Code will be made publicly available.
Scott Workman, Richard Souvenir, Nathan Jacobs
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Capturing the beauty of outdoor scenes in an image motivates many amateur and
professional photographers and serves as the basis for many image sharing
sites. While natural beauty is often considered a subjective property of
images, in this paper, we take an objective approach and provide methods for
quantifying and predicting the scenicness of an image. Using a dataset
containing hundreds of thousands of outdoor images captured throughout Great
Britain with crowdsourced ratings of natural beauty, we propose an approach to
predict scenicness which explicitly accounts for the variance of human raters.
We demonstrate that quantitative measures of scenicness can benefit semantic
image understanding, content-aware image processing, and a novel application of
cross-view mapping, where the sparsity of labeled ground-level images can be
addressed by incorporating unlabeled aerial images in the training and
prediction steps. For each application, our methods for scenicness prediction
result in quantitative and qualitative improvements over baseline approaches.
Zeeshan Hayder, Xuming He, Mathieu Salzmann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We address the problem of instance-level semantic segmentation, which aims at
jointly detecting, segmenting and classifying every individual object in an
image. In this context, existing methods typically propose candidate objects,
usually as bounding boxes, and directly predict a binary mask within each such
proposal. As a consequence, they cannot recover from errors in the object
candidate generation process, such as too small or shifted boxes. In this
paper, we introduce a novel object segment representation based on the distance
transform of the object masks. We then design an object mask network (OMN) with
a new residual-deconvolution architecture that infers such a representation and
decodes it into the final binary object mask. This allows us to predict masks
that go beyond the scope of the bounding boxes and are thus robust to
inaccurate object candidates. We integrate our OMN into a Multitask Network
Cascade framework, and learn the resulting shape-aware instance segmentation
(SAIS) network in an end-to-end manner. Our experiments on the PASCAL VOC 2012
and the CityScapes datasets demonstrate the benefits of our approach, which
outperforms the state-of-the-art in both object proposal generation and
instance segmentation.
Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
Comments: 9 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Following the gaze of people inside videos is an important signal for
understanding people and their actions. In this paper, we present an approach
for following gaze across views by predicting where a particular person is
looking throughout a scene. We collect VideoGaze, a new dataset which we use as
a benchmark to both train and evaluate models. Given one view with a person in
it and a second view of the scene, our model estimates a density for gaze
location in the second view. A key aspect of our approach is an end-to-end
model that solves the following sub-problems: saliency, gaze pose, and
geometric relationships between views. Although our model is supervised only
with gaze, we show that the model learns to solve these subproblems
automatically without supervision. Experiments suggest that our approach
follows gaze better than standard baselines and produces plausible results for
everyday situations.
Joe Yue-Hei Ng, Jonghyun Choi, Jan Neumann, Larry S. Davis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Even with the recent advances in convolutional neural networks (CNN) in
various visual recognition tasks, the state-of-the-art action recognition
system still relies on hand crafted motion feature such as optical flow to
achieve the best performance. We propose a multitask learning model
ActionFlowNet to train a single stream network directly from raw pixels to
jointly estimate optical flow while recognizing actions with convolutional
neural networks, capturing both appearance and motion in a single model. We
additionally provide insights to how the quality of the learned optical flow
affects the action recognition. Our model not only significantly improves
action recognition accuracy by a large margin (17%) compared to
state-of-the-art CNN-based action recognition models trained without external
large scale data and additional optical flow input, but also produces the
optical flow as a side product.
Maurilio Di Cicco, Ciro Potena, Giorgio Grisetti, Alberto Pretto
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Selective weeding is one of the key challenges in the field of agriculture
robotics: in order to accomplish this task, a farm robot should be able to
accurately detect plants and to distinguish them between crop and weeds. In
this paper, we face this problem by proposing a novel and effective approach
that aims to dramatically minimize the human intervention needed to train the
detection and classification algorithms. The idea is to synthetically generate
the training datasets by using state-of-the-art computer graphics methods and
tools. We explicitly model the relevant aspects of the target environment
(i.e., crop and weeds species, type of soil, light conditions): by changing the
model parameters, it is possible to render a huge number of photo-realistic
views of the environment, to be used as training data. The proposed methodology
has been validated exploiting a very recent deep learning based image
segmentation architecture called SegNet [Kendall et al.]. We trained and tested
different custom-built variants of SegNet using both real and synthetically
generated datasets, the reported results confirm the effectiveness and the
potentiality of our approach.
Christopher Pramerdorfer, Martin Kampel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The ability to recognize facial expressions automatically enables novel
applications in human-computer interaction and other areas. Consequently, there
has been active research in this field, with several recent works utilizing
Convolutional Neural Networks (CNNs) for feature extraction and inference.
These works differ significantly in terms of CNN architectures and other
factors. Based on the reported results alone, the performance impact of these
factors is unclear. In this paper, we review the state of the art in
image-based facial expression recognition using CNNs and highlight algorithmic
differences and their performance impact. On this basis, we identify existing
bottlenecks and consequently directions for advancing this research field.
Furthermore, we demonstrate that overcoming one of these bottlenecks – the
comparatively basic architectures of the CNNs utilized in this field – leads to
a substantial performance increase. By forming an ensemble of modern deep CNNs,
we obtain a FER2013 test accuracy of 75.2%, outperforming previous works
without requiring auxiliary training data or face registration.
Yubo Zhang, Vishnu Naresh Boddeti, Kris M. Kitani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurately identifying hands in images is a key sub-task for human activity
understanding with wearable first-person point-of-view cameras. Traditional
hand segmentation approaches rely on a large corpus of manually labeled data to
generate robust hand detectors. However, these approaches still face challenges
as the appearance of the hand varies greatly across users, tasks, environments
or illumination conditions. A key observation in the case of many wearable
applications and interfaces is that, it is only necessary to accurately detect
the user’s hands in a specific situational context. Based on this observation,
we introduce an interactive approach to learn a person-specific hand
segmentation model that does not require any manually labeled training data.
Our approach proceeds in two steps, an interactive bootstrapping step for
identifying moving hand regions, followed by learning a personalized user
specific hand appearance model. Concretely, our approach uses two convolutional
neural networks: (1) a gesture network that uses pre-defined motion information
to detect the hand region; and (2) an appearance network that learns a person
specific model of the hand region based on the output of the gesture network.
During training, to make the appearance network robust to errors in the gesture
network, the loss function of the former network incorporates the confidence of
the gesture network while learning. Experiments demonstrate the robustness of
our approach with an F1 score over 0.8 on all challenging datasets across a
wide range of illumination and hand appearance variations, improving over a
baseline approach by over 10%.
Zibang Zhang, Xueying Wang, Jingang Zhong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fourier single-pixel imaging (FSI) has proven capable of reconstructing
high-quality two-dimensional and three-dimensional images. The utilization of
the sparsity of natural images in Fourier domain allows high-resolution images
to be reconstructed from far fewer measurements than effective image pixels.
However, applying original FSI in digital micro-mirror device (DMD) based
high-speed imaging system turns out to be challenging, because the original FSI
uses grayscale Fourier basis patterns for illumination while DMDs generate
grayscale patterns at a relatively low rate. DMDs are a binary device which can
only generate a black-and-white pattern at each instance. In this paper, we
adopt binary Fourier patterns for illumination to achieve DMD-based high-speed
single-pixel imaging. Binary Fourier patterns are generated by upsampling and
then applying error diffusion based dithering to the grayscale patterns.
Experiments demonstrate the proposed technique able to achieve static imaging
with high quality and dynamic imaging in real time. The proposed technique
potentially allows high-quality and high-speed imaging over broad wavebands.
Erik Wijmans, Yasutaka Furukawa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel algorithm that utilizes a 2D floorplan to align
panorama RGBD scans. While effective panorama RGBD alignment techniques exist,
such a system requires extremely dense RGBD image sampling. Our approach can
significantly reduce the number of necessary scans with the aid of a floorplan
image. We formulate a novel Markov Random Field inference problem as a scan
placement over the floorplan, as opposed to the conventional scan-to-scan
alignment. The technical contributions lie in multi-modal image correspondence
cues (between scans and schematic floorplan) as well as a novel coverage
potential avoiding an inherent stacking bias. The proposed approach has been
evaluated on five challenging large indoor spaces. To the best of our
knowledge, we present the first effective system that utilizes a 2D floorplan
image for building-scale 3D pointcloud alignment. The source code and the data
will be shared with the community to further enhance indoor mapping research.
Hang Zhang, Jia Xue, Kristin Dana
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding
Layer integrated on top of convolutional layers, which ports the entire
dictionary learning and encoding pipeline into a single model. Current methods
build from distinct components, using standard encoders with separate
off-the-shelf features such as SIFT descriptors or pre-trained CNN features for
material recognition. Our new approach provides an end-to-end learning
framework, where the inherent visual vocabularies are learned directly from the
loss function. The features, dictionaries and the encoding representation for
the classifier are all learned simultaneously. The representation is orderless
and therefore is particularly useful for material and texture recognition. The
Encoding Layer generalizes robust residual encoders such as VLAD and Fisher
Vectors, and has the property of discarding domain specific information which
makes the learned convolutional features easier to transfer. Additionally,
joint training using multiple datasets of varied sizes and class labels is
supported resulting in increased recognition performance. The experimental
results show superior performance as compared to state-of-the-art methods using
gold-standard databases such as MINC-2500, Flickr Material Database,
KTH-TIPS-2b, and two recent databases 4D-Light-Field-Material and GTOS. The
source code for the complete system are publicly available.
Frank Nielsen, Richard Nock
Comments: 18 pages
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a series of closed-form maximum entropy upper bounds for the
differential entropy of a continuous univariate random variable and study the
properties of that series. We then show how to use those generic bounds for
upper bounding the differential entropy of Gaussian mixture models. This
requires to calculate the raw moments and raw absolute moments of Gaussian
mixtures in closed-form that may also be handy in statistical machine learning
and information theory. We report on our experiments and discuss on the
tightness of those bounds.
Jessa Bekker, Arjen Hommersom, Martijn Lappenschaar, Jesse Davis
Comments: Machine Learning for Health @ NIPS 2016
Subjects: Artificial Intelligence (cs.AI)
Managing patients with multimorbidity often results in polypharmacy: the
prescription of multiple drugs. However, the long-term effects of specific
combinations of drugs and diseases are typically unknown. In particular, drugs
prescribed for one condition may result in adverse effects for the other. To
investigate which types of drugs may affect the further progression of
multimorbidity, we query models of diseases and prescriptions that are learned
from primary care data. State-of-the-art tractable Bayesian network
representations, on which such complex queries can be computed efficiently, are
employed for these large medical networks. Our results confirm that
prescriptions may lead to unintended negative consequences in further
development of multimorbidity in cardiovascular diseases. Moreover, a drug
treatment for one disease group may affect diseases of another group.
Davoud Mougouei, David Powers
Comments: Idea Paper
Subjects: Artificial Intelligence (cs.AI)
It has been widely recognized that uncertainty is an inevitable aspect of
diagnosis and treatment of medical disorders. Such uncertainties hence, need to
be considered in computerized medical models. The existing medical modeling
techniques however, have mainly focused on capturing uncertainty associated
with diagnosis of medical disorders while ignoring uncertainty of treatments.
To tackle this issue, we have proposed using a fuzzy-based modeling and
description technique for capturing uncertainties in treatment plans. We have
further contributed a formal framework which allows for goal-oriented modeling
and analysis of medical treatments.
Richard S. Sutton, Vivek Veeriah
Comments: preliminary draft
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Representations are fundamental to Artificial Intelligence. Typically, the
performance of a learning system depends on its data representation. These data
representations are usually hand-engineered based on some prior domain
knowledge regarding the task. More recently, the trend is to learn these
representations through deep neural networks as these can produce dramatical
performance improvements over hand-engineered data representations. In this
paper, we present a new incremental learning algorithm, called crossprop, for
learning representations based on prior learning experiences. Unlike
backpropagation, crossprop minimizes the cross-validation error. Specifically,
our algorithm considers the influences of all the past weights on the current
squared error, and uses this gradient for incrementally learning the weights in
a neural network. This idea is similar to that of tuning the learning system
through an offline cross-validation procedure. Crossprop is applicable to
incremental learning tasks, where a sequence of examples are encountered by the
learning system and they need to be processed one by one and then discarded.
The learning system can use each example only once and can spend only a limited
amount of computation for an example. From our preliminary experiments, we
concluce that crossprop is a promising alternative for backprop for
representation learning.
Kim Peter Wabersich, Marc Toussaint
Comments: Long version of accepted NIPS BayesOpt 2016 paper
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Bayesian Optimization (BO) has become a core method for solving expensive
black-box optimization problems. While much research focussed on the choice of
the acquisition function, we focus on online length-scale adaption and the
choice of kernel function. Instead of choosing hyperparameters in view of
maximum likelihood on past data, we propose to use the acquisition function to
decide on hyperparameter adaptation more robustly and in view of the future
optimization progress. Further, we propose a particular kernel function that
includes non-stationarity and local anisotropy and thereby implicitly
integrates the efficiency of local convex optimization with global Bayesian
optimization. Comparisons to state-of-the art BO methods underline the
efficiency of these mechanisms on global optimization benchmarks.
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Machine learning is being deployed in a growing number of applications which
demand real-time, accurate, and robust predictions under heavy query load.
However, most machine learning frameworks and systems only address model
training and not deployment.
In this paper, we introduce Clipper, the first general-purpose low-latency
prediction serving system. Interposing between end-user applications and a wide
range of machine learning frameworks, Clipper introduces a modular architecture
to simplify model deployment across frameworks. Furthermore, by introducing
caching, batching, and adaptive model selection techniques, Clipper reduces
prediction latency and improves prediction throughput, accuracy, and robustness
without modifying the underlying machine learning frameworks. We evaluate
Clipper on four common machine learning benchmark datasets and demonstrate its
ability to meet the latency, accuracy, and throughput demands of online serving
applications. Finally, we compare Clipper to the TensorFlow Serving system and
demonstrate comparable prediction throughput and latency on a range of models
while enabling new functionality, improved accuracy, and robustness.
Vitaly Buravlev, Rocco De Nicola, Claudio Antares Mezzina
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Among the paradigms for parallel and distributed computing, the one
popularized with Linda, and based on tuple spaces, is one of the least used,
despite the fact of being intuitive, easy to understand and to use. A tuple
space is a repository, where processes can add, withdraw or read tuples by
means of atomic operations. Tuples may contain different values, and processes
can inspect their content via pattern matching. The lack of a reference
implementation for this paradigm has prevented its widespread. In this paper,
first we perform an extensive analysis of a number of actual implementations of
the tuple space paradigm and summarise their main features. Then, we select
four such implementations and compare their performances on four different case
studies that aim at stressing different aspects of computing such as
communication, data manipulation, and cpu usage. After reasoning on strengths
and weaknesses of the four implementations, we conclude with some
recommendations for future work towards building an effective implementation of
the tuple space paradigm.
Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, Alexander Spiegelman
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
The decentralized cryptocurrency Bitcoin has experienced great success but
also encountered many challenges. One of the challenges has been the long
confirmation time and low transaction throughput. Another challenge is the lack
of incentives at certain steps of the protocol, raising concerns for
transaction withholding, selfish mining, etc. To address these challenges, we
propose Solidus, a decentralized cryptocurrency based on permissionless
Byzantine consensus. A core technique in Solidus is to use proof of work for
leader election to adapt the Practical Byzantine Fault Tolerance (PBFT)
protocol to a permissionless setting. We also design Solidus to be incentive
compatible and to mitigate selfish mining. Solidus improves on Bitcoin in
confirmation time and provides safety and liveness assuming Byzantine players
and the largest coalition of rational players collectively control less than
one-third of the computation power.
Constantinos Daskalakis, Qinxuan Pan
Subjects: Learning (cs.LG); Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
We show that the square Hellinger distance between two Bayesian networks on
the same directed graph, (G), is subadditive with respect to the neighborhoods
of (G). Namely, if (P) and (Q) are the probability distributions defined by two
Bayesian networks on the same DAG, our inequality states that the square
Hellinger distance, (H^2(P,Q)), between (P) and (Q) is upper bounded by the
sum, (sum_v H^2(P_{{v} cup Pi_v}, Q_{{v} cup Pi_v})), of the square
Hellinger distances between the marginals of (P) and (Q) on every node (v) and
its parents (Pi_v) in the DAG. Importantly, our bound does not involve the
conditionals but the marginals of (P) and (Q). We derive a similar inequality
for more general Markov Random Fields.
As an application of our inequality, we show that distinguishing whether two
Bayesian networks (P) and (Q) on the same (but potentially unknown) DAG satisfy
(P=Q) vs (d_{
m TV}(P,Q)>epsilon) can be performed from
( ilde{O}(|Sigma|^{3/4(d+1)} cdot n/epsilon^2)) samples, where (d) is the
maximum in-degree of the DAG and (Sigma) the domain of each variable of the
Bayesian networks. If (P) and (Q) are defined on potentially different and
potentially unknown trees, the sample complexity becomes
( ilde{O}(|Sigma|^{4.5} n/epsilon^2)), whose dependence on (n, epsilon) is
optimal up to logarithmic factors. Lastly, if (P) and (Q) are product
distributions over ({0,1}^n) and (Q) is known, the sample complexity becomes
(O(sqrt{n}/epsilon^2)), which is optimal up to constant factors.
Kim Peter Wabersich, Marc Toussaint
Comments: Long version of accepted NIPS BayesOpt 2016 paper
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Bayesian Optimization (BO) has become a core method for solving expensive
black-box optimization problems. While much research focussed on the choice of
the acquisition function, we focus on online length-scale adaption and the
choice of kernel function. Instead of choosing hyperparameters in view of
maximum likelihood on past data, we propose to use the acquisition function to
decide on hyperparameter adaptation more robustly and in view of the future
optimization progress. Further, we propose a particular kernel function that
includes non-stationarity and local anisotropy and thereby implicitly
integrates the efficiency of local convex optimization with global Bayesian
optimization. Comparisons to state-of-the art BO methods underline the
efficiency of these mechanisms on global optimization benchmarks.
Kareem Abdelfatah, Junshu Bao, Gabriel Terejanu
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
A probabilistic model is proposed by stacking a set of independently trained
Gaussian processes to obtain prediction of quantities of interests that require
composition of functions. Analytical derivations are provided for first and
second-order moments of the stacked Gaussian process using RBF and polynomial
kernels. The StackedGP model can be extended to any number of layers and nodes
per layer, and it provides flexibility in kernel selection for each node. The
proposed nonparametric stacked model is validated using different synthetic
datasets and its performance is measured in two real-world applications.
Hans Krupakar, Akshay Jayakumar, Dhivya G
Comments: 18 pages, 3 figures, 1 table, In National Conference on Computational Intelligence and High-Performance Computing (NCCIHPC)
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Population growth and increasing droughts are creating unprecedented strain
on the continued availability of water resources. Since irrigation is a major
consumer of fresh water, wastage of resources in this sector could have strong
consequences. To address this issue, irrigation water management and prediction
techniques need to be employed effectively and should be able to account for
the variabilities present in the environment. The different techniques surveyed
in this paper can be classified into two categories: computational and
statistical. Computational methods deal with scientific correlations between
physical parameters whereas statistical methods involve specific prediction
algorithms that can be used to automate the process of irrigation water
prediction. These algorithms interpret semantic relationships between the
various parameters of temperature, pressure, evapotranspiration etc. and store
them as numerical precomputed entities specific to the conditions and the area
used as the data for the training corpus used to train it. We focus on
reviewing the computational methods used to determine Evapotranspiration and
its implications. We compare the efficiencies of different data mining and
machine learning methods implemented in this area, such as Logistic Regression,
Decision Tress Classifier, SysFor, Support Vector Machine(SVM), Fuzzy Logic
techniques, Artifical Neural Networks(ANNs) and various hybrids of Genetic
Algorithms (GA) applied to irrigation prediction. We also recommend a possible
technique for the same based on its superior results in other such time series
analysis tasks.
Clement Canonne, Ilias Diakonikolas, Daniel Kane, Alistair Stewart
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
This work initiates a systematic investigation of testing {em
high-dimensional} structured distributions by focusing on testing {em Bayesian
networks} — the prototypical family of directed graphical models. A Bayesian
network is defined by a directed acyclic graph, where we associate a random
variable with each node. The value at any particular node is conditionally
independent of all the other non-descendant nodes once its parents are fixed.
Specifically, we study the properties of identity testing and closeness testing
of Bayesian networks. Our main contribution is the first non-trivial efficient
testing algorithms for these problems and corresponding information-theoretic
lower bounds. For a wide range of parameter settings, our testing algorithms
have sample complexity {em sublinear} in the dimension and are sample-optimal,
up to constant factors.
Anindya De, Ryan O'Donnell, Rocco Servedio
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
In the (deletion-channel) trace reconstruction problem, there is an unknown
(n)-bit source string (x). An algorithm is given access to independent traces
of (x), where a trace is formed by deleting each bit of~(x) independently with
probability~(delta). The goal of the algorithm is to recover~(x) exactly (with
high probability), while minimizing samples (number of traces) and running
time.
Previously, the best known algorithm for the trace reconstruction problem was
due to Holenstein~et~al.; it uses (exp( ilde{O}(n^{1/2}))) samples and
running time for any fixed (0 < delta < 1). It is also what we call a
“mean-based algorithm”, meaning that it only uses the empirical means of the
individual bits of the traces. Holenstein~et~al.~also gave a lower bound,
showing that any mean-based algorithm must use at least (n^{ ilde{Omega}(log
n)}) samples.
In this paper we improve both of these results, obtaining matching upper and
lower bounds for mean-based trace reconstruction. For any constant deletion
rate (0 < delta < 1), we give a mean-based algorithm that uses
(exp(O(n^{1/3}))) time and traces; we also prove that any mean-based algorithm
must use at least (exp(Omega(n^{1/3}))) traces. In fact, we obtain matching
upper and lower bounds even for (delta) subconstant and (
ho := 1-delta)
subconstant: when ((log^3 n)/n ll delta leq 1/2) the bound is
(exp(-Theta(delta n)^{1/3})), and when (1/sqrt{n} ll
ho leq 1/2) the
bound is (exp(-Theta(n/
ho)^{1/3})).
Our proofs involve estimates for the maxima of Littlewood polynomials on
complex disks. We show that these techniques can also be used to perform trace
reconstruction with random insertions and bit-flips in addition to deletions.
We also find a surprising result: for deletion probabilities (delta > 1/2),
the presence of insertions can actually help with trace reconstruction.
Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
Given samples from an unknown multivariate distribution (p), is it possible
to distinguish whether (p) is the product of its marginals versus (p) being far
from every product distribution? Similarly, is it possible to distinguish
whether (p) equals a given distribution (q) versus (p) and (q) being far from
each other? These problems of testing independence and goodness-of-fit have
received enormous attention in statistics, information theory, and theoretical
computer science, with sample-optimal algorithms known in several interesting
regimes of parameters. Unfortunately, it has also been understood that these
problems become intractable in large dimensions, necessitating exponential
sample complexity.
Motivated by the exponential lower bounds for general distributions as well
as the ubiquity of Markov Random Fields (MRFs) in the modeling of
high-dimensional distributions, we initiate the study of distribution testing
on structured multivariate distributions, and in particular the prototypical
example of MRFs: the Ising Model. We demonstrate that, in this structured
setting, we can avoid the curse of dimensionality, obtaining sample and time
efficient testers for independence and goodness-of-fit. Along the way, we
develop new tools for establishing concentration of functions of the Ising
model, using the exchangeable pairs framework developed by Chatterjee, and
improving upon this framework. In particular, we prove tighter concentration
results for multi-linear functions of the Ising model in the high-temperature
regime.
Adriano Barra, Giuseppe Genovese, Peter Sollich, Daniele Tantari
Comments: 5 pages, 4 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
We study generalised restricted Boltzmann machines with generic priors for
units and weights, interpolating between Boolean and Gaussian variables. We
present a complete analysis of the replica symmetric phase diagram of these
models, which can be regarded as generalised Hopfield models. We show the way
the paramagnetic phase boundary is directly related to the optimal size of the
training set necessary for good generalisation in a teacher- student scenario.
Moreover we underline the role of the retrieval phase for both inference and
learning processes. We show that retrieval is robust for a large class of
weight and unit priors, beyond the standard Hopfield scenario.
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Machine learning is being deployed in a growing number of applications which
demand real-time, accurate, and robust predictions under heavy query load.
However, most machine learning frameworks and systems only address model
training and not deployment.
In this paper, we introduce Clipper, the first general-purpose low-latency
prediction serving system. Interposing between end-user applications and a wide
range of machine learning frameworks, Clipper introduces a modular architecture
to simplify model deployment across frameworks. Furthermore, by introducing
caching, batching, and adaptive model selection techniques, Clipper reduces
prediction latency and improves prediction throughput, accuracy, and robustness
without modifying the underlying machine learning frameworks. We evaluate
Clipper on four common machine learning benchmark datasets and demonstrate its
ability to meet the latency, accuracy, and throughput demands of online serving
applications. Finally, we compare Clipper to the TensorFlow Serving system and
demonstrate comparable prediction throughput and latency on a range of models
while enabling new functionality, improved accuracy, and robustness.
Nathan H Lazar, Mehmet Gönen, Kemal Sönmez
Comments: 4 main pages with 14 supplemental pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)
The vast majority of current machine learning algorithms are designed to
predict single responses or a vector of responses, yet many types of response
are more naturally organized as matrices or higher-order tensor objects where
characteristics are shared across modes. We present a new machine learning
algorithm BaTFLED (Bayesian Tensor Factorization Linked to External Data) that
predicts values in a three-dimensional response tensor using input features for
each of the dimensions. BaTFLED uses a probabilistic Bayesian framework to
learn projection matrices mapping input features for each mode into latent
representations that multiply to form the response tensor. By utilizing a
Tucker decomposition, the model can capture weights for interactions between
latent factors for each mode in a small core tensor. Priors that encourage
sparsity in the projection matrices and core tensor allow for feature selection
and model regularization. This method is shown to far outperform elastic net
and neural net models on ‘cold start’ tasks from data simulated in a three-mode
structure. Additionally, we apply the model to predict dose-response curves in
a panel of breast cancer cell lines treated with drug compounds that was used
as a Dialogue for Reverse Engineering Assessments and Methods (DREAM)
challenge.
Frank Nielsen, Richard Nock
Comments: 18 pages
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a series of closed-form maximum entropy upper bounds for the
differential entropy of a continuous univariate random variable and study the
properties of that series. We then show how to use those generic bounds for
upper bounding the differential entropy of Gaussian mixture models. This
requires to calculate the raw moments and raw absolute moments of Gaussian
mixtures in closed-form that may also be handy in statistical machine learning
and information theory. We report on our experiments and discuss on the
tightness of those bounds.
Richard S. Sutton, Vivek Veeriah
Comments: preliminary draft
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Representations are fundamental to Artificial Intelligence. Typically, the
performance of a learning system depends on its data representation. These data
representations are usually hand-engineered based on some prior domain
knowledge regarding the task. More recently, the trend is to learn these
representations through deep neural networks as these can produce dramatical
performance improvements over hand-engineered data representations. In this
paper, we present a new incremental learning algorithm, called crossprop, for
learning representations based on prior learning experiences. Unlike
backpropagation, crossprop minimizes the cross-validation error. Specifically,
our algorithm considers the influences of all the past weights on the current
squared error, and uses this gradient for incrementally learning the weights in
a neural network. This idea is similar to that of tuning the learning system
through an offline cross-validation procedure. Crossprop is applicable to
incremental learning tasks, where a sequence of examples are encountered by the
learning system and they need to be processed one by one and then discarded.
The learning system can use each example only once and can spend only a limited
amount of computation for an example. From our preliminary experiments, we
concluce that crossprop is a promising alternative for backprop for
representation learning.
Andrew Stevens, Yunchen Pu, Yannan Sun, Greg Spell, Lawrence Carin
Comments: Rejected from ICML 2016
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
A nonparametric factor analysis framework is developed for tensor-variate
data. The dictionary elements (factor loadings) are inferred as tensors, as
opposed to vectors. Our tensor-factor analysis (TFA) framework is developed in
the context of the beta-process factor analysis (BPFA) using a nonparametric
tensor decomposition for each dictionary element. We extend the multiplicative
gamma prior to allow inference of the shape parameter, which can help control
model complexity during learning. Moreover, we extend the TFA model to include
translation invariance and a multi-layer structure—a deep convolutional TFA.
We test our approach on 2D & 3D image denoising, inpainting, and image
classification. Our TFA models provide more diverse dictionaries. In
particular, TFA provides state of the art results for simultaneous image
inpainting & denoising and on the Caltech 101 benchmark.
Xingguo Li, Jarvis Haupt
Comments: 16 pages, 4 figures
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
This paper examines the problem of locating outlier columns in a large,
otherwise low-rank matrix, in settings where {}{the data} are noisy, or where
the overall matrix has missing elements. We propose a randomized two-step
inference framework, and establish sufficient conditions on the required sample
complexities under which these methods succeed (with high probability) in
accurately locating the outliers for each task. Comprehensive numerical
experimental results are provided to verify the theoretical bounds and
demonstrate the computational efficiency of the proposed algorithm.
Holger Rauhut, Ulrich Terstiege
Comments: 3 pages
Subjects: Information Theory (cs.IT); Probability (math.PR)
The task of reconstructing a low rank matrix from incomplete linear
measurements arises in areas such as machine learning, quantum state tomography
and in the phase retrieval problem. In this note, we study the particular setup
that the measurements are taken with respect to rank one matrices constructed
from the elements of a random tight frame. We consider a convex optimization
approach and show both robustness of the reconstruction with respect to noise
on the measurements as well as stability with respect to passing to
approximately low rank matrices. This is achieved by establishing a version of
the null space property of the corresponding measurement map.
Yahia A. Eldemerdash, Octavia A. Dobre, Bruce J. Liao
Subjects: Information Theory (cs.IT)
This paper proposes an efficient identification algorithm for spatial
multiplexing (SM) and Alamouti (AL) coded orthogonal frequency division
multiplexing (OFDM) signals. The cross-correlation between the received signals
from different antennas is exploited to provide a discriminating feature to
identify SM-OFDM and AL-OFDM signals. The proposed algorithm requires neither
estimation of the channel coefficients and noise power, nor the modulation of
the transmitted signal. Moreover, it does not need space-time block code (STBC)
or OFDM block synchronization. The effectiveness of the proposed algorithm is
demonstrated through extensive simulation experiments in the presence of
diverse transmission impairments, such as time and frequency offsets, Doppler
frequency, and spatially correlated fading.
Christos G. Tsinos, Sina Maleki, Symeon Chatzinotas, Bjorn Ottersten
Comments: Millimeter wave (mmWave), Cognitive Radio, Multiple-Input Multiple-Output (MIMO), Large-scale antenna arrays, Hybrid Pre-coding, Alternating Direction Method of Multipliers (ADMM)
Subjects: Information Theory (cs.IT)
Milimeter wave (mmWave) band mobile communications can be a solution to the
continuously increasing traffic demand in modern wireless systems. Even though
mmWave bands are scarcely occupied, the design of a prospect transceiver should
guarantee the efficient coexistence with the incumbent services in these bands.
To that end, in this paper, two underlay cognitive transceiver designs are
proposed that enable the mmWave spectrum access while controlling the
interference to the incumbent users. MmWave systems usually require large
antenna arrays to achieve satisfactory performance and thus, they cannot
support fully digital transceiver designs due to high demands in hardware
complexity and power consumption. Thus, in order to develop efficient
solutions, the proposed approaches are based on a hybrid analog-digital
pre-coding architecture. In such hybrid designs, the overall beamformer can be
factorized in a low dimensional digital counterpart applied in the baseband and
in an analog one applied in the RF domain. The first cognitive solution
developed in this paper designs the cognitive hybrid pre-coder by maximizing
the mutual information between its two ends subject to interference, power and
hardware constraints related to the analog counterpart. The second solution
aims at reduced complexity requirements and thus derives the hybrid pre-coder
by minimizing the Frobenious norm of its difference to the optimal digital only
one. A novel solution for the post-coder at the cognitive receiver part is
further proposed here based on a hardware constrained Minimum Mean Square Error
criterion. Simulations show that the performance of both the proposed hybrid
approaches is very close to the one of the fully digital solution for typical
wireless environments.
Frank Nielsen, Richard Nock
Comments: 18 pages
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We present a series of closed-form maximum entropy upper bounds for the
differential entropy of a continuous univariate random variable and study the
properties of that series. We then show how to use those generic bounds for
upper bounding the differential entropy of Gaussian mixture models. This
requires to calculate the raw moments and raw absolute moments of Gaussian
mixtures in closed-form that may also be handy in statistical machine learning
and information theory. We report on our experiments and discuss on the
tightness of those bounds.
Rimvydas Aleksiejunas
Comments: 16 pages, 6 figures
Subjects: Information Theory (cs.IT)
A method for reconstructing multiple-input multiple-output (MIMO) channel
correlation matrices from lower dimensional channel measurements is presented.
Exploiting the symmetry of correlation matrix structure enables reproducing
higher dimensional MIMO channel matrices from available lower order
measurements. This leads to practically important applications allowing
prediction of higher dimensional MIMO system capacity. In particular, we study
Kronecker-type MIMO channels suitable for reconstructing full channel matrices
from partial information about transmit-receive fading in spatial and
polarimetric domains and analyze validity conditions for such models. One of
the important channel conditions is Doppler frequency related to
non-stationarity in the environment. We present simulations of cluster-type
scattering model using 2×2 MIMO channel correlation matrices to predict
performance of 2×4 MIMO system including recovery of angular power spectrum. An
example of dual circular polarized 2×4 MIMO land mobile satellite measurements
in 2.5 GHz frequency band illustrates applicability of the method to
reconstruct spatial and polarimetric channel correlation matrices for
estimating ergodic channel capacity from single-antenna or uni-polarized
measurements.
Mohammad Eslami, Farah Torkamani-Azar, Esfandiar Mehrshahi
Comments: The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), Ph.D. Forum
Subjects: Information Theory (cs.IT)
Spectrum resources are facing huge demands and cognitive radio (CR) can
improve the spectrum utilization. Recently, power spectral density (PSD) map is
defined to enable the CR to reuse the frequency resources regarding to the
area. For this reason, the sensed PSDs are fused by a Fusion Center (FC) which
the sensed PSDs are collected by the distributed sensors in the area. But, for
a given zone, the sensed PSD by neighbor CR sensors may contain a shared common
component for a while. This component can be exploited in the theory of the
distributed source coding (DSC) to compress sensing data more. In this paper
based on the distributed compressive sensing (DCS) a method is proposed to
compress and reconstruct the PSDs of the sensors when the data transmission is
slightly imperfect. Simulation results show the advantages of using proposed
method in compressing, reducing overhead and also recovering PSDs. % Proposed
method can be used to develop a framework when the holding times of the users
are large in comparison with the rate of the spectrum sensing.
Constantinos Daskalakis, Qinxuan Pan
Subjects: Learning (cs.LG); Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
We show that the square Hellinger distance between two Bayesian networks on
the same directed graph, (G), is subadditive with respect to the neighborhoods
of (G). Namely, if (P) and (Q) are the probability distributions defined by two
Bayesian networks on the same DAG, our inequality states that the square
Hellinger distance, (H^2(P,Q)), between (P) and (Q) is upper bounded by the
sum, (sum_v H^2(P_{{v} cup Pi_v}, Q_{{v} cup Pi_v})), of the square
Hellinger distances between the marginals of (P) and (Q) on every node (v) and
its parents (Pi_v) in the DAG. Importantly, our bound does not involve the
conditionals but the marginals of (P) and (Q). We derive a similar inequality
for more general Markov Random Fields.
As an application of our inequality, we show that distinguishing whether two
Bayesian networks (P) and (Q) on the same (but potentially unknown) DAG satisfy
(P=Q) vs (d_{
m TV}(P,Q)>epsilon) can be performed from
( ilde{O}(|Sigma|^{3/4(d+1)} cdot n/epsilon^2)) samples, where (d) is the
maximum in-degree of the DAG and (Sigma) the domain of each variable of the
Bayesian networks. If (P) and (Q) are defined on potentially different and
potentially unknown trees, the sample complexity becomes
( ilde{O}(|Sigma|^{4.5} n/epsilon^2)), whose dependence on (n, epsilon) is
optimal up to logarithmic factors. Lastly, if (P) and (Q) are product
distributions over ({0,1}^n) and (Q) is known, the sample complexity becomes
(O(sqrt{n}/epsilon^2)), which is optimal up to constant factors.
Clement Canonne, Ilias Diakonikolas, Daniel Kane, Alistair Stewart
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Statistics Theory (math.ST)
This work initiates a systematic investigation of testing {em
high-dimensional} structured distributions by focusing on testing {em Bayesian
networks} — the prototypical family of directed graphical models. A Bayesian
network is defined by a directed acyclic graph, where we associate a random
variable with each node. The value at any particular node is conditionally
independent of all the other non-descendant nodes once its parents are fixed.
Specifically, we study the properties of identity testing and closeness testing
of Bayesian networks. Our main contribution is the first non-trivial efficient
testing algorithms for these problems and corresponding information-theoretic
lower bounds. For a wide range of parameter settings, our testing algorithms
have sample complexity {em sublinear} in the dimension and are sample-optimal,
up to constant factors.
Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
Given samples from an unknown multivariate distribution (p), is it possible
to distinguish whether (p) is the product of its marginals versus (p) being far
from every product distribution? Similarly, is it possible to distinguish
whether (p) equals a given distribution (q) versus (p) and (q) being far from
each other? These problems of testing independence and goodness-of-fit have
received enormous attention in statistics, information theory, and theoretical
computer science, with sample-optimal algorithms known in several interesting
regimes of parameters. Unfortunately, it has also been understood that these
problems become intractable in large dimensions, necessitating exponential
sample complexity.
Motivated by the exponential lower bounds for general distributions as well
as the ubiquity of Markov Random Fields (MRFs) in the modeling of
high-dimensional distributions, we initiate the study of distribution testing
on structured multivariate distributions, and in particular the prototypical
example of MRFs: the Ising Model. We demonstrate that, in this structured
setting, we can avoid the curse of dimensionality, obtaining sample and time
efficient testers for independence and goodness-of-fit. Along the way, we
develop new tools for establishing concentration of functions of the Ising
model, using the exchangeable pairs framework developed by Chatterjee, and
improving upon this framework. In particular, we prove tighter concentration
results for multi-linear functions of the Ising model in the high-temperature
regime.