Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Arshak Navruzyan, Nigel Duffy, Babak Hodjat
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
The success of deep learning depends on finding an architecture to fit the
task. As deep learning has scaled up to more challenging tasks, the
architectures have become difficult to design by hand. This paper proposes an
automated method, CoDeepNEAT, for optimizing deep learning architectures
through evolution. By extending existing neuroevolution methods to topology,
components, and hyperparameters, this method achieves results comparable to
best human designs in standard benchmarks in object recognition and language
modeling. It also supports building a real-world application of automated image
captioning on a magazine website. Given the anticipated increases in available
computing power, the evolution of deep networks is a promising approach to
constructing deep learning applications in the future.
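To make the evolutionary search concrete, here is a minimal sketch of evolving architecture hyperparameters with mutation and truncation selection. It is not the authors' CoDeepNEAT implementation (which co-evolves modules and blueprints); the search space and the fitness function, a stand-in for validation accuracy after training, are invented for illustration.

```python
import random

# Toy search space over architecture hyperparameters (illustrative only).
SEARCH_SPACE = {
    "layers": [2, 4, 8, 16],
    "width": [32, 64, 128, 256],
    "lr": [1e-2, 1e-3, 1e-4],
}

def random_genome():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(genome):
    child = dict(genome)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])   # resample one gene
    return child

def fitness(genome):
    # Stand-in: in practice, decode the genome into a network, train it,
    # and return validation accuracy.
    return -abs(genome["layers"] - 8) - abs(genome["width"] - 128) / 64

population = [random_genome() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # truncation selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(10)]
print(max(population, key=fitness))
```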
Risto Miikkulainen, Neil Iscoe, Aaron Shagrin, Ron Cordell, Cory Schoolland, Myles Brundage, Jonathan Epstein, Randy Dean, Gurmeet Lamba
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Conversion optimization means designing a web interface so that as many users
as possible take a desired action on it, such as registering or purchasing. Such
design is usually done by hand, testing one change at a time through A/B
testing, or a limited number of combinations through multivariate testing,
making it possible to evaluate only a small fraction of designs in a vast
design space. This paper describes Sentient Ascend, an automatic conversion
optimization system that uses evolutionary optimization to create effective web
interface designs. Ascend makes it possible to discover and utilize
interactions between the design elements that are difficult to identify
otherwise. Moreover, evaluation of design candidates is done in parallel
online, i.e. with a large number of real users interacting with the system. A
case study on the lead generation site learnhowtobecome.org shows that significant
improvements (i.e. over 43%) are possible beyond human design. Ascend can
therefore be seen as an approach to massively multivariate conversion
optimization, based on a massively parallel interactive evolution.
Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
When training neural networks, the use of Synthetic Gradients (SG) allows
layers or modules to be trained without update locking – without waiting for a
true error gradient to be backpropagated – resulting in Decoupled Neural
Interfaces (DNIs). This ability to update parts of a neural network
asynchronously and with only local information was demonstrated to work
empirically by Jaderberg et al. (2016). However, there has been little
analysis of what changes DNIs and SGs impose from a functional,
representational, and learning dynamics point of view. In this paper, we study
DNIs through the use of synthetic gradients on feed-forward networks to better
understand their behaviour and elucidate their effect on optimisation. We show
that the incorporation of SGs does not affect the representational strength of
the learning system for a neural network, and prove the convergence of the
learning system for linear and deep linear models. On practical problems we
investigate the mechanism by which synthetic gradient estimators approximate
the true loss, and, surprisingly, how that leads to drastically different
layer-wise representations. Finally, we also expose the relationship of using
synthetic gradients to other error approximation techniques and find a unifying
language for discussion and comparison.
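As a rough illustration of the mechanism studied above, the following sketch trains a toy two-layer linear network in which the first layer is updated from a synthetic gradient predicted by an invented linear module M, while M itself is regressed toward the true gradient once it becomes available. This is a minimal caricature of a DNI, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network y = W2 @ (W1 @ x) on a scalar regression target.
W1 = rng.normal(scale=0.1, size=(8, 4))
W2 = rng.normal(scale=0.1, size=(1, 8))
M = np.zeros((8, 8))      # linear synthetic-gradient module: g_hat = M @ h

lr = 0.05
for step in range(2000):
    x = rng.normal(size=(4, 1))
    target = np.array([[x.sum()]])
    h = W1 @ x
    g_hat = M @ h                        # synthetic estimate of dL/dh
    W1 -= lr * g_hat @ x.T               # local update: no waiting for backprop
    y = W2 @ h                           # downstream pass
    dy = y - target                      # dL/dy for squared error
    g_true = W2.T @ dy                   # true dL/dh, arrives "later"
    W2 -= lr * dy @ h.T
    M -= lr * (g_hat - g_true) @ h.T     # regress M toward the true gradient
```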
Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas
Subjects: Applications (stat.AP); Databases (cs.DB); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Predictive business process monitoring methods exploit logs of completed
cases of a process in order to make predictions about running cases thereof.
Existing methods in this space are tailor-made for specific prediction tasks.
Moreover, their relative accuracy is highly sensitive to the dataset at hand,
thus requiring users to engage in trial-and-error and tuning when applying them
in a specific setting. This paper investigates Long Short-Term Memory (LSTM)
neural networks as an approach to build consistently accurate models for a wide
range of predictive process monitoring tasks. First, we show that LSTMs
outperform existing techniques to predict the next event of a running case and
its timestamp. Next, we show how to use models for predicting the next task in
order to predict the full continuation of a running case. Finally, we apply the
same approach to predict the remaining time, and show that this approach
outperforms existing tailor-made methods.
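A minimal sketch of the next-event prediction setup, assuming PyTorch and an illustrative activity vocabulary; the paper's exact architecture, input features (e.g., timestamps), and training details differ.

```python
import torch
import torch.nn as nn

NUM_ACTIVITIES = 20   # illustrative vocabulary of process activities

class NextEventLSTM(nn.Module):
    def __init__(self, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_ACTIVITIES, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_ACTIVITIES)

    def forward(self, prefixes):          # (batch, seq_len) activity indices
        h, _ = self.lstm(self.embed(prefixes))
        return self.head(h[:, -1])        # logits for the next activity

model = NextEventLSTM()
prefixes = torch.randint(0, NUM_ACTIVITIES, (8, 5))   # dummy running cases
next_logits = model(prefixes)
# Predicting the full continuation of a case amounts to repeatedly feeding
# the predicted activity back in until an end-of-case symbol is produced.
```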
Adrian Bulat, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This work is on landmark localization using binarized approximations of
Convolutional Neural Networks (CNNs). Our goal is to design architectures that
retain the groundbreaking performance of CNNs for landmark localization and at
the same time are lightweight, compact and suitable for applications with
limited computational resources. To this end, we make the following
contributions: (a) we are the first to study the effect of neural network
binarization on localization tasks, namely human pose estimation and face
alignment. We exhaustively evaluate various design choices, identify
performance bottlenecks, and more importantly propose multiple orthogonal ways
to boost performance. (b) Based on our analysis, we propose a novel
hierarchical, parallel and multi-scale residual block architecture that yields
large performance improvement over the standard bottleneck block when having
the same number of parameters, thus bridging the gap between the original
network and its binarized counterpart. (c) We also show that the performance
boost offered by the proposed architecture is not only observed for the case of
binary networks but also generalizes for the case of real valued weights and
activations. (d) We perform a large number of ablation studies that shed light
on the properties and the performance of the proposed block. (e) We present
results for experiments on the most challenging datasets for human pose
estimation and face alignment, reporting in many cases state-of-the-art
performance. Code can be downloaded from
this http URL
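For context, weight binarization in such networks typically follows the XNOR-Net recipe: replace real-valued weights by their sign times a per-filter scale. A sketch, assuming the standard L1-optimal scale alpha = mean(|W|), which is not necessarily the authors' exact scheme:

```python
import numpy as np

def binarize_filters(W):
    """W: (num_filters, ...) real-valued convolution weights."""
    flat = W.reshape(W.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)     # per-filter scaling factor
    B = np.sign(W)
    B[B == 0] = 1                         # resolve zeros arbitrarily
    return B, alpha

W = np.random.randn(16, 3, 3, 3)
B, alpha = binarize_filters(W)
W_approx = alpha[:, None, None, None] * B  # binary approximation of W
print(np.abs(W - W_approx).mean())         # approximation error
```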
Rafael Teixeira Sousa, Larissa Vasconcellos de Moraes
Comments: Abstract submitted as a requirement to ISIC2017 challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper describes the participation of Araguaia Medical Vision Lab at the
International Skin Imaging Collaboration 2017 Skin Lesion Challenge. We
describe the use of deep convolutional neural networks in an attempt to
classify images of Melanoma and Seborrheic Keratosis lesions. Using finetuned
GoogleNet and AlexNet, we attained AUCs of 0.950 and 0.846 on Seborrheic
Keratosis and Melanoma, respectively.
Ming-Yu Liu, Thomas Breuel, Jan Kautz
Comments: 19 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Most of the existing image-to-image translation frameworks—mapping an image
in one domain to a corresponding image in another—are based on supervised
learning, i.e., pairs of corresponding images in two domains are required for
learning the translation function. This largely limits their applications,
because capturing corresponding images in two different domains is often a
difficult task. To address the issue, we propose the UNsupervised
Image-to-image Translation (UNIT) framework, which is based on variational
autoencoders and generative adversarial networks. The proposed framework can
learn the translation function without any corresponding images in two domains.
We enable this learning capability by combining a weight-sharing constraint and
an adversarial training objective. Through visualization results from various
unsupervised image translation tasks, we verify the effectiveness of the
proposed framework. An ablation study further reveals the critical design
choices. Moreover, we apply the UNIT framework to the unsupervised domain
adaptation task and achieve better results than competing algorithms do in
benchmark datasets.
Luis Contreras, Walterio Mayol-Cuevas
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a study on the use of Convolutional Neural Networks for
camera relocalisation and its application to map compression. We follow state
of the art visual relocalisation results and evaluate response to different
data inputs — namely, depth, grayscale, RGB, spatial position and combinations
of these. We use a CNN map representation and introduce the notion of CNN map
compression by using a smaller CNN architecture. We evaluate our proposal in a
series of publicly available datasets. This formulation allows us to improve
relocalisation accuracy by increasing the number of training trajectories while
maintaining a constant-size CNN.
Guangcan Mai, Kai Cao, Pong C. Yuen, Anil K. Jain
Subjects: Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art face recognition systems are based on deep (convolutional)
neural networks. Therefore, it is imperative to determine to what extent face
templates derived from deep networks can be inverted to obtain the original
face image. In this paper, we study the vulnerability of a face recognition
system based on deep templates to image reconstruction attacks.
We propose a de-convolutional neural network (D-CNN) to
reconstruct images of faces from their deep templates. In our experiments, we
did not assume any knowledge of the target subject or of the deep network. To
train the D-CNN reconstruction models, we augmented existing face datasets with
a large collection of images synthesized using a face generator. The proposed
reconstruction method was evaluated using type-I (comparing the reconstructed
images against the original face images used to generate the deep template) and
type-II (comparing the reconstructed images against a different face image of
the same subject) attacks. We conducted a three-trial attack for each target
face image using three face images reconstructed from three different D-CNNs.
Each D-CNN was trained on a different dataset (VGG-Face, CASIA-Webface, or
Multi-PIE). The type-I attack achieved a true accept rate (TAR) of 85.48% at a
false accept rate (FAR) of 0.1% on the LFW dataset. The corresponding TAR for
the type-II attack is 14.71%. Our experimental results demonstrate the need to
secure deep templates in face recognition systems.
Felipe Petroski Such, Shagan Sah, Miguel Dominguez, Suhas Pillai, Chao Zhang, Andrew Michael, Nathan Cahill, Raymond Ptucha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (CNNs) have recently led to incredible
breakthroughs on a variety of pattern recognition problems. Banks of finite
impulse response filters are learned on a hierarchy of layers, each
contributing more abstract information than the previous layer. The simplicity
and elegance of the convolutional filtering process make CNNs well suited to
structured problems such as images, video, or voice, where vertices are
homogeneous in the sense of number, location, and strength of neighbors. The
vast majority of classification problems, for example in the pharmaceutical,
homeland security, and financial domains, are unstructured. As these problems
are formulated into unstructured graphs, the heterogeneity of these problems,
such as number of vertices, number of connections per vertex, and edge
strength, cannot be tackled with standard convolutional techniques. We propose
a novel neural learning framework that is capable of handling both homogeneous
and heterogeneous data, while retaining the benefits of traditional CNN
successes.
Recently, researchers have proposed variations of CNNs that can handle graph
data. In an effort to create learnable filter banks of graphs, these methods
either induce constraints on the data or require considerable preprocessing. As
opposed to defining filters as spectral multipliers applied to the eigenvectors
of the graph Laplacian, our framework, which we term Graph-CNNs, defines
filters as polynomials of functions of the graph adjacency matrix. Graph-CNNs
can handle both heterogeneous and homogeneous graph data, including graphs
having entirely different vertex or edge sets. We compare Graph-CNN to
traditional CNNs using the CIFAR-10 and Imagenet image classification datasets.
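A minimal sketch of a filter defined as a polynomial of the adjacency matrix, as described above; the coefficients theta would be learned, and the tiny graph is invented for illustration.

```python
import numpy as np

def graph_conv(A, x, theta):
    """Apply h(A) x = sum_k theta[k] * A^k x to a vertex signal x."""
    out = np.zeros_like(x, dtype=float)
    power = np.eye(A.shape[0])            # A^0
    for t in theta:
        out += t * (power @ x)
        power = power @ A                 # next power of A
    return out

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)    # path graph on three vertices
x = np.array([1.0, 0.0, 2.0])
print(graph_conv(A, x, theta=[0.5, 1.0, 0.25]))
```

Because the filter involves only powers of A, it applies unchanged to graphs with different vertex and edge sets, which is what allows a single network to handle heterogeneous inputs.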
Pranav Shyam, Shubham Gupta, Ambedkar Dukkipati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Many problems in Artificial Intelligence and Machine Learning can be reduced
to the problem of quantitative comparison of two entities. In Deep Learning the
ubiquitous architecture used for this task is the Siamese Neural Network which
maps each entity to a representation through a learnable function and expresses
similarity through the distances among the entities in the representation
space. In this paper, we argue that such a static and invariant mapping is both
naive and unnatural. We develop a novel neural model called Attentive Recurrent
Comparators (ARCs) that dynamically compares two entities and test the model
extensively on the Omniglot dataset. In the task of similarity learning, our
simple model without any convolutions performs on par with Deep
Convolutional Siamese Networks, and significantly better than them once
convolutional layers are also used. In the challenging task of one-shot learning on the same
dataset, an ARC based model achieves the first super-human performance for a
neural method with an error rate of 1.5%.
Jakub Sochor, Jakub Špaňhel, Adam Herout
Comments: under review in IJCV
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we focus on fine-grained recognition of vehicles mainly in
traffic surveillance applications. We propose an approach orthogonal to recent
advancement in fine-grained recognition (automatic part discovery, bilinear
pooling). Also, in contrast to other methods focused on fine-grained
recognition of vehicles, we do not limit ourselves to the frontal/rear viewpoint
but allow the vehicles to be seen from any viewpoint. Our approach is based on
3D bounding boxes built around the vehicles. The bounding box can be
automatically constructed from traffic surveillance data. For scenarios where
it is not possible to use the precise construction, we propose a method for
estimating the 3D bounding box. The 3D bounding box is used to normalize the
image viewpoint by unpacking the image into a plane. We also propose to randomly
alter the color of the image and add a rectangle of random noise at a random
position in the image when training Convolutional Neural Networks. We have
collected a large fine-grained vehicle dataset BoxCars116k, with 116k images of
vehicles from various viewpoints taken by numerous surveillance cameras. We
performed a number of experiments which show that our proposed method
significantly improves CNN classification accuracy (the accuracy is increased
by up to 12 percentage points and the error is reduced by up to 50% compared to
CNNs without the proposed modifications). We also show that our method
outperforms state-of-the-art methods for fine-grained recognition.
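The color-alteration and random-noise-rectangle augmentation described above can be sketched as follows; sizes and ranges are illustrative, not the paper's settings.

```python
import numpy as np

def augment(image, rng):
    """image: (H, W, 3) uint8 array."""
    img = image.astype(np.float32)
    img *= rng.uniform(0.7, 1.3, size=3)              # random color shift
    h, w = rng.integers(8, 32), rng.integers(8, 32)   # rectangle size
    y = rng.integers(0, image.shape[0] - h)
    x = rng.integers(0, image.shape[1] - w)
    img[y:y + h, x:x + w] = rng.integers(0, 256, size=(h, w, 3))
    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
augmented = augment(np.zeros((128, 128, 3), dtype=np.uint8), rng)
```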
Sarfaraz Hussein, Robert Gillies, Kunlin Cao, Qi Song, Ulas Bagci
Comments: Accepted for publication in IEEE International Symposium on Biomedical Imaging (ISBI) 2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Characterization of lung nodules as benign or malignant is one of the most
important tasks in lung cancer diagnosis, staging, and treatment planning.
Given the large variation in the appearance of the nodules, there is a need for
a fast and robust computer-aided system. In this work, we propose an
end-to-end trainable multi-view deep Convolutional Neural Network (CNN) for
nodule characterization. First, we use median intensity projection to obtain a
2D patch corresponding to each dimension. The three images are then
concatenated to form a tensor, where the images serve as different channels of
the input image. In order to increase the number of training samples, we
perform data augmentation by scaling, rotating and adding noise to the input
image. The trained network is used to extract features from the input image
followed by a Gaussian Process (GP) regression to obtain the malignancy score.
We also empirically establish the significance of different high level nodule
attributes such as calcification, sphericity and others for malignancy
determination. These attributes are found to be complementary to the deep
multi-view CNN features and a significant improvement over other methods is
obtained.
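The multi-view input construction can be sketched as follows, assuming a cubic CT patch so that the three median intensity projections share a size.

```python
import numpy as np

def multiview_input(volume):
    """volume: (D, D, D) CT patch centered on the nodule."""
    views = [np.median(volume, axis=a) for a in range(3)]  # one per dimension
    return np.stack(views, axis=-1)     # (D, D, 3): projections as channels

patch = np.random.rand(32, 32, 32)      # dummy nodule volume
x = multiview_input(patch)              # fed to the CNN as a 3-channel image
print(x.shape)
```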
Yanyan Geng, Weizhi Li, Gaoyuan Liang, Jingbin Wang, Yanbin Wu, Nitin Patil, Jing-Yan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the problems of image retrieval and annotation, complete textual tag lists
of images play critical roles. However, in real-world applications, the image
tags are usually incomplete; it is thus important to learn the complete tags
for images. In this paper, we study the problem of image tag completion and
propose a novel method for this problem based on a popular image
representation method, convolutional neural network (CNN). The method estimates
the complete tags from the CNN representations of images based on a linear
predictor. The CNN parameters, linear predictor, and the complete tags are
learned jointly by our method. We formulate a minimization problem to encourage the
consistency between the complete tags and the available incomplete tags, reduce
the estimation error, and reduce the model complexity. An iterative algorithm
is developed to solve the minimization problem. Experiments over benchmark
image data sets show its effectiveness.
Yuexiang Li, Linlin Shen
Comments: ISIC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Skin lesions are a severe disease of worldwide extent. Reliable automatic
detection of skin tumors can help increase the accuracy and
efficiency of pathologists. The International Skin Imaging Collaboration (ISIC)
hosts a challenge focusing on the automatic analysis of skin lesions. In this brief
paper, we introduce two deep learning methods to address all the three tasks
announced in ISIC 2017, i.e. lesion segmentation (task 1), lesion dermoscopic
feature extraction (task 2) and lesion classification (task 3). A
fully-convolutional network is proposed to simultaneously address the tasks of
lesion segmentation and classification, and a straightforward CNN is proposed
for the dermoscopic feature extraction task. Experimental results on the
validation set show promising accuracies: 75.1% for task 1, 84.4% for task
2, and 90.8% for task 3.
Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The acquisition of Magnetic Resonance Imaging (MRI) is inherently slow.
Inspired by recent advances in deep learning, we propose a framework for
reconstructing MR images from undersampled data using a deep cascade of
convolutional neural networks to accelerate the data acquisition process. We
show that for Cartesian undersampling of 2D cardiac MR images, the proposed
method outperforms the state-of-the-art compressed sensing approaches, such as
dictionary learning-based MRI (DLMRI) reconstruction, in terms of
reconstruction error, perceptual quality and reconstruction speed for both
3-fold and 6-fold undersampling. Compared to DLMRI, the proposed method
produces approximately half the reconstruction error, preserving
anatomical structures more faithfully. Using our method, each image can be
reconstructed in 23 ms, which is fast enough to enable real-time applications.
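Cascades of this kind typically interleave CNN stages with a data-consistency step that re-inserts the acquired k-space samples into the current reconstruction. A noiseless sketch, using a random mask for illustration where the paper uses Cartesian undersampling:

```python
import numpy as np

def data_consistency(reconstruction, kspace_acquired, mask):
    """mask: boolean array marking the sampled k-space locations."""
    k = np.fft.fft2(reconstruction)
    k[mask] = kspace_acquired[mask]   # keep measured data where available
    return np.fft.ifft2(k)

image = np.random.rand(64, 64)                 # ground-truth stand-in
mask = np.random.rand(64, 64) < 1 / 3          # ~3-fold undersampling
kspace = np.fft.fft2(image) * mask             # acquired (undersampled) data
recon = np.zeros_like(image, dtype=complex)
for _ in range(5):
    # In the full cascade, a CNN de-aliasing stage would run here.
    recon = data_consistency(recon, kspace, mask)
print(np.abs(image - recon.real).mean())
```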
Murase Tomoya, Tanaka Kanji
Comments: 8 pages, 9 figures, technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the problem of change detection from a novel perspective
of long-term map learning. We are particularly interested in designing an
approach that can scale to large maps and that can function under global
uncertainty in the viewpoint (i.e., GPS-denied situations). Our approach, which
utilizes a compact bag-of-words (BoW) scene model, makes several contributions
to the problem:
1) Two kinds of prior information are extracted from the view sequence map
and used for change detection. Further, we propose a novel type of prior,
called a motion prior, to predict the relative motions of stationary objects
and to detect anomalous ego-motion. The proposed prior is also useful for
distinguishing stationary from non-stationary objects.
2) A small set of good reference images (e.g., 10) is efficiently retrieved
from the view sequence map by employing the recently developed
Bag-of-Local-Convolutional-Features (BoLCF) scene model.
3) Change detection is reformulated as a scene retrieval over these reference
images to find changed objects using a novel spatial Bag-of-Words (SBoW) scene
model. Evaluations of the individual techniques and of their combinations,
conducted on a challenging dataset of highly dynamic scenes from the publicly
available Malaga dataset, verify their efficacy.
Md Amirul Islam, Shujon Naha, Mrigank Rochan, Neil Bruce, Yang Wang
Comments: 9 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We consider the problem of semantic image segmentation using deep
convolutional neural networks. We propose a novel network architecture called
the label refinement network that predicts segmentation labels in a
coarse-to-fine fashion at several resolutions. The segmentation labels at a
coarse resolution are used together with convolutional features to obtain finer
resolution segmentation labels. We define loss functions at several stages of
the network to provide supervision at each stage. Our experimental
results on several standard datasets demonstrate that the proposed model
provides an effective way of producing pixel-wise dense image labeling.
Hao Chang
Comments: 5 pages, 2 figures. ISIC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As one kind of skin cancer, melanoma is very dangerous. A dermoscopy-based
strategy for early detection and recognition is critical for melanoma therapy.
However, diagnostic accuracy depends heavily on well-trained dermatologists. To
address this problem, many efforts focus on developing automatic image analysis
systems. Here we report a novel strategy based on deep learning techniques that
achieves very high skin lesion segmentation and melanoma diagnosis accuracy: 1)
we build a segmentation neural network (skin_segnn), which achieves very high
lesion boundary detection accuracy; 2) we build another very deep neural
network based on the Google Inception v3 network (skin_recnn) and its
pretrained weights. The newly designed transfer-learning-based deep neural
network skin_inceptions_v3_nn helps to achieve a high prediction accuracy.
Matt Berseth
Comments: ISIC2017
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Our system addresses Part 1, Lesion Segmentation and Part 3, Lesion
Classification of the ISIC 2017 challenge. Both algorithms make use of deep
convolutional networks to achieve the challenge objective.
Yu-Chuan Su, Kristen Grauman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
360° video requires human viewers to actively control “where” to look
while watching the video. Although it provides a more immersive experience of
the visual content, it also introduces additional burden for viewers; awkward
interfaces to navigate the video lead to suboptimal viewing experiences.
Virtual cinematography is an appealing direction to remedy these problems, but
conventional methods are limited to virtual environments or rely on
hand-crafted heuristics. We propose a new algorithm for virtual cinematography
that automatically controls a virtual camera within a 360° video. Compared
to the state of the art, our algorithm allows more general camera control,
avoids redundant outputs, and extracts its output videos substantially more
efficiently. Experimental results on over 7 hours of real “in the wild” video
show that our generalized camera control is crucial for viewing 360°
video, while the proposed efficient algorithm is essential for making the
generalized control computationally tractable.
Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood
Comments: 8 pages, 4 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We draw a formal connection between using synthetic training data to optimize
neural network parameters and approximate, Bayesian, model-based reasoning. In
particular, training a neural network using synthetic data can be viewed as
learning a proposal distribution generator for approximate inference in the
synthetic-data generative model. We demonstrate this connection in a
recognition task where we develop a novel Captcha-breaking architecture and
train it using synthetic data, demonstrating both state-of-the-art performance
and a way of computing task-specific posterior uncertainty. Using a neural
network trained this way, we also demonstrate successful breaking of real-world
Captchas currently used by Facebook and Wikipedia. Reasoning from these
empirical results and drawing connections with Bayesian modeling, we discuss
the robustness of synthetic data results and suggest important considerations
for ensuring good neural network generalization when training with synthetic
data.
Huan-Chih Wang, Shih-Hao Ho, Furen Xiao, Jen-Hai Chou
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV)
Brain CT has become a standard imaging tool for emergent evaluation of brain
condition, and measurement of midline shift (MLS) is one of the most important
features to address for brain CT assessment. We present a simple method to
estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS
over the maximal width of intracranial region (MLS/ICWMAX). Three neurosurgeons
and our automated system were asked to measure MLS and MLS/ICWMAX in the same
sets of axial CT images obtained from 41 patients admitted to ICU under
neurosurgical service. A weighted midline (WML) was plotted based on individual
pixel intensities, with higher weights given to the darker portions. The MLS
could then be measured as the distance between the WML and ideal midline (IML)
near the foramen of Monro. The average processing time to output an automatic
MLS measurement was around 10 seconds. Our automated system achieved an overall
accuracy of 90.24% when the CT images were calibrated automatically, and
performed better when the calibrations of head rotation were done manually
(accuracy: 92.68%). MLS/ICWMAX and MLS yielded the same confusion
matrices and similar ROC curves. We demonstrated a simple, fast
and accurate automated system of MLS measurement and introduced a new parameter
(MLS/ICWMAX) as a good alternative to MLS in terms of estimating the degree of
brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily
accessed.
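The weighted-midline idea can be sketched as an intensity-weighted centroid per CT row, with darker pixels weighted more heavily; this is an illustration of the principle, not the authors' exact weighting.

```python
import numpy as np

def weighted_midline(ct_slice):
    """ct_slice: (H, W) grayscale image, larger values = brighter."""
    weights = ct_slice.max() - ct_slice.astype(np.float32)  # darker = heavier
    cols = np.arange(ct_slice.shape[1], dtype=np.float32)
    return (weights * cols).sum(axis=1) / weights.sum(axis=1)  # x per row

ct = np.random.randint(0, 255, size=(256, 256)).astype(np.float32)
wml = weighted_midline(ct)
iml = np.full_like(wml, ct.shape[1] / 2)       # ideal midline stand-in
mls = np.abs(wml - iml).max()                  # midline shift estimate
```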
Malte Schmidt, Dimitri Block, Uwe Meier
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The steadily growing use of license-free frequency bands requires reliable
coexistence management for deterministic medium utilization. For interference
mitigation, proper wireless interference identification (WII) is essential. In
this work we propose the first WII approach based upon deep convolutional
neural networks (CNNs). The CNN naively learns its features through
self-optimization during an extensive data-driven GPU-based training process.
We propose a CNN example which is based upon sensing snapshots with a limited
duration of 12.8 µs and an acquisition bandwidth of 10 MHz. The CNN
distinguishes between 15 classes. They represent packet transmissions of IEEE 802.11 b/g,
IEEE 802.15.4 and IEEE 802.15.1 with overlapping frequency channels within the
2.4 GHz ISM band. We show that the CNN outperforms state-of-the-art WII
approaches and has a classification accuracy greater than 95% for
signal-to-noise ratios of at least -5 dB.
Nicolas Gillis
Comments: 18 pages, 4 figures
Journal-ref: SIAG/OPT Views and News 25 (1), pp. 7-16 (2017)
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
In this paper, we introduce and provide a short overview of nonnegative
matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
complexity, algorithms, and its link with extended formulations of polyhedra.
In order to put NMF into perspective, the more general problem class of
constrained low-rank matrix approximation problems is first briefly introduced.
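For reference, the most common baseline algorithm for the NMF problem surveyed here is the Lee-Seung multiplicative update rule for the Frobenius objective ||X - WH||_F^2:

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-9):
    """Factor a nonnegative X (m x n) as W (m x r) times H (r x n)."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative updates keep
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W and H nonnegative
    return W, H

X = np.random.rand(20, 30)
W, H = nmf(X, r=5)
print(np.linalg.norm(X - W @ H))   # reconstruction error
```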
Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu
Comments: The 2017 IEEE International Conference on Robotics and Automation (ICRA)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a general framework for learning social affordance
grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human
interactions, and transfer the grammar to humanoids to enable a real-time
motion inference for human-robot interaction (HRI). Based on Gibbs sampling,
our weakly supervised grammar learning can automatically construct a
hierarchical representation of an interaction with long-term joint sub-tasks of
both agents and short term atomic actions of individual agents. Based on a new
RGB-D video dataset with rich instances of human interactions, our experiments
with Baxter simulations, human evaluations, and real Baxter tests demonstrate that
the model learned from limited training data successfully generates human-like
behaviors in unseen scenarios and outperforms both baselines.
Reuth Mirsky, Ya'akov (Kobi) Gal
Subjects: Artificial Intelligence (cs.AI)
Plan recognition algorithms are required to recognize a complete hierarchy
explaining the agent’s actions and goals. While the output of such algorithms
is informative to the recognizer, the cost of its calculation is high in
run-time, space, and completeness. Moreover, performing plan recognition online
requires the observing agent to reason about future actions that have not yet
been seen and maintain a set of hypotheses to support all possible options.
This paper presents a new and efficient algorithm for online plan recognition
called SLIM (Semi-Lazy Inference Mechanism). It combines bottom-up and
top-down parsing processes, which allow it to commit only to the minimum
necessary actions in real time while still providing complete hypotheses post
factum. We show both theoretically and empirically that although the
computational cost of this process is still exponential, there is a significant
improvement in run-time when compared to a state-of-the-art plan recognition
algorithm.
Pierre Roy, Alexandre Papadopoulos, François Pachet
Comments: 16 pages, 11 figures
Subjects: Artificial Intelligence (cs.AI)
Machine-learning techniques have been recently used with spectacular results
to generate artefacts such as music or text. However, these techniques are
still unable to capture and generate artefacts that are convincingly
structured. In this paper we present an approach to generate structured musical
sequences. We introduce a mechanism for efficiently sampling variations of
musical sequences. Given an input sequence and a statistical model, this
mechanism samples a set of sequences whose distance to the input sequence is
approximately within specified bounds. This mechanism is implemented as an
extension of belief propagation, and uses local fields to bias the generation.
We show experimentally that sampled sequences are indeed closely correlated to
the standard musical similarity measure defined by Mongeau and Sankoff. We then
show how this mechanism can be used to implement composition strategies that
enforce arbitrary structure on a musical lead sheet generation problem.
Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic
Comments: 19 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Online two-sided matching markets such as Q&A forums (e.g. StackOverflow,
Quora) and online labour platforms (e.g. Upwork) critically rely on the ability
to propose adequate matches based on imperfect knowledge of the two parties to
be matched. This prompts the following question: Which matching recommendation
algorithms can, in the presence of such uncertainty, lead to efficient platform
operation?
To answer this question, we develop a model of a task/server matching
system. For this model, we give a necessary and sufficient condition for an
incoming stream of tasks to be manageable by the system. We further identify a
so-called back-pressure policy under which the throughput that the system can
handle is optimized. We show that this policy achieves strictly larger
throughput than a natural greedy policy. Finally, we validate our model and
confirm our theoretical findings with experiments based on logs of
Math.StackExchange, a Stack Exchange forum dedicated to mathematics.
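A much-simplified illustration of a back-pressure-style policy: serve the task type whose queue backlog, weighted by the estimated matching success rate, is currently largest. The task types and rates below are invented; the paper's model and optimality analysis are considerably more involved.

```python
import random

TYPES = ["math", "stats", "cs"]
success_rate = {"math": 0.9, "stats": 0.5, "cs": 0.7}  # estimated skill
queues = {t: 0 for t in TYPES}

random.seed(0)
served = 0
for step in range(1000):
    queues[random.choice(TYPES)] += 1       # a task of a random type arrives
    # Back-pressure: pick the type with the largest weighted backlog.
    t = max(TYPES, key=lambda u: queues[u] * success_rate[u])
    if queues[t] > 0:
        queues[t] -= 1
        served += random.random() < success_rate[t]
print(served, queues)
```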
Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore
Comments: 14 pages, 5 figures, submitted for review to JMLR
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task, depending on the target problem and the goals
of a particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Shuchi Chawla, Nikhil Devanur, Janardhan Kulkarni, Rad Niazadeh
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
We consider a scheduling problem where a cloud service provider has multiple
units of a resource available over time. Selfish clients submit jobs, each with
an arrival time, deadline, length, and value. The service provider’s goal is to
implement a truthful online mechanism for scheduling jobs so as to maximize the
social welfare of the schedule. Recent work shows that under a stochastic
assumption on job arrivals, there is a single-parameter family of mechanisms
that achieves near-optimal social welfare. We show that given any such family
of near-optimal online mechanisms, there exists an online mechanism that in the
worst case performs nearly as well as the best of the given mechanisms. Our
mechanism is truthful whenever the mechanisms in the given family are truthful
and prompt, and achieves optimal (within constant factors) regret.
We model the problem of competing against a family of online scheduling
mechanisms as one of learning from expert advice. A primary challenge is that
any scheduling decisions we make affect not only the payoff at the current
step, but also the resource availability and payoffs in future steps.
Furthermore, switching from one algorithm (a.k.a. expert) to another in an
online fashion is challenging both because it requires synchronizing with the
state of the latter algorithm and because it affects the incentive
structure of the algorithms. We further show how to adapt our algorithm to a
non-clairvoyant setting where job lengths are unknown until jobs are run to
completion. Once again, in this setting, we obtain truthfulness along with
asymptotically optimal regret (within poly-logarithmic factors).
Shreesh Kumara Bhat, Aron Culotta
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Consumer protection agencies are charged with safeguarding the public from
hazardous products, but the thousands of products under their jurisdiction make
it challenging to identify and respond to consumer complaints quickly. From the
consumer’s perspective, online reviews can provide evidence of product defects,
but manually sifting through hundreds of reviews is not always feasible. In
this paper, we propose a system to mine Amazon.com reviews to identify products
that may pose safety or health hazards. Since labeled data for this task are
scarce, our approach combines positive unlabeled learning with domain
adaptation to train a classifier from consumer complaints submitted to the U.S.
Consumer Product Safety Commission. On a validation set of manually annotated
Amazon product reviews, we find that our approach results in an absolute F1
score improvement of 8% over the best competing baseline. Furthermore, we apply
the classifier to Amazon reviews of known recalled products; the classifier
identifies reviews reporting safety hazards prior to the recall date for 45% of
the products. This suggests that the system may be able to provide an early
warning system to alert consumers to hazardous products before an official
recall is announced.
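A sketch of the positive-unlabeled step in the spirit of Elkan and Noto, with synthetic features standing in for review text; the paper additionally combines PU learning with domain adaptation, which is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(200, 5))               # labeled complaints
X_unl = np.vstack([rng.normal(1.0, 1.0, size=(100, 5)),   # hidden positives
                   rng.normal(-1.0, 1.0, size=(700, 5))]) # negatives

# Train a "labeled vs. unlabeled" classifier, then rescale by the estimated
# labeling propensity c = P(labeled | positive).
X = np.vstack([X_pos, X_unl])
s = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
clf = LogisticRegression(max_iter=1000).fit(X, s)
c = clf.predict_proba(X_pos)[:, 1].mean()   # use held-out positives in practice
p_hazard = np.clip(clf.predict_proba(X_unl)[:, 1] / c, 0, 1)
print(p_hazard[:5])                          # P(hazardous | review features)
```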
Jason S. Kessler
Comments: 6 pages, 5 figures. See this https URL for source code and documentation
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Scattertext is an open source tool for visualizing linguistic variation
between document categories in a language-independent way. The tool presents a
scatterplot, where each axis corresponds to the rank frequency at which a term
occurs in a category of documents. Through a tie-breaking strategy, the tool is able to
display thousands of visible term-representing points and find space to legibly
label hundreds of them. Scattertext also lends itself to a query-based
visualization of how the use of terms with similar embeddings differs between
document categories, as well as a visualization for comparing the importance
scores of bag-of-words features to univariate metrics.
Shuming Ma, Xu Sun
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
To speed up the training process, many existing systems use parallel
technology for online learning algorithms. However, most research mainly
focuses on stochastic gradient descent (SGD) rather than on other algorithms. We propose a
generic online parallel learning framework for large margin models, and also
analyze our framework on popular large margin algorithms, including MIRA and
Structured Perceptron. Our framework is lock-free and easy to implement on
existing systems. Experiments show that systems with our framework can gain a
near-linear speedup by increasing the number of running threads, with no loss in
accuracy.
Xu Sun, Shuming Ma
Subjects: Computation and Language (cs.CL)
Dependency parsing is an important NLP task. A popular approach to
dependency parsing is the structured perceptron. However, graph-based dependency
parsing has a time complexity of O(n^3) and suffers from slow training.
To deal with this problem, we propose a parallel algorithm called parallel
perceptron. The parallel algorithm can make full use of a multi-core computer
which saves a lot of training time. In experiments we observe that dependency
parsing with the parallel perceptron achieves 8-fold faster training than
traditional structured perceptron methods when using 10 threads, with no loss
at all in accuracy.
Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)
Over the course of human language evolution, the semantic meanings of words
keep changing. The understanding of evolving semantics enables us to
capture the true meaning of the words in different usage contexts, and thus is
critical for various applications, such as machine translation. While it is
naturally promising to study word semantics in a time-aware manner, traditional
methods to learn word vector representation do not adequately capture the
change over time. To this end, in this paper, we aim at learning time-aware
vector representation of words through dynamic word embedding modeling.
Specifically, we first propose a method that captures time-specific semantics
and across-time alignment simultaneously in a way that is robust to data
sparsity. Then, we solve the resulting optimization problem using a scalable
coordinate descent method. Finally, we perform an empirical study on New York
Times data to learn the temporal embeddings and develop multiple evaluations
that illustrate the semantic evolution of words, discovered from news media.
Moreover, our qualitative and quantitative tests indicate that our method
not only reliably captures the semantic evolution over time, but also
consistently outperforms state-of-the-art temporal embedding approaches on both
semantic accuracy and alignment quality.
Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg
Subjects: Computation and Language (cs.CL)
This paper develops a model that addresses syntactic embedding for machine
comprehension, a key task of natural language understanding. Our proposed
model, structural embedding of syntactic trees (SEST), takes each word in a
sentence, constructs a sequence of syntactic nodes extracted from syntactic
parse trees, and encodes the sequence into a vector representation. The learned
vector is then incorporated into neural attention models, which allows learning
the mapping of syntactic structures between question and context pairs. We
evaluate our approach on the SQuAD dataset and demonstrate that our model can
accurately identify the syntactic boundaries of the sentences and extract
answers that are more syntactically coherent than those of the baseline methods.
Jinying Chen, Hong Yu
Subjects: Computation and Language (cs.CL)
Background: Electronic health record (EHR) notes contain abundant medical
jargon that can be difficult for patients to comprehend. One way to help
patients is to reduce information overload and help them focus on medical terms
that matter most to them.
Objective: The aim of this work was to develop FIT (Finding Important Terms
for patients), an unsupervised natural language processing (NLP) system that
ranks medical terms in EHR notes based on their importance to patients.
Methods: We built FIT on a new unsupervised ensemble ranking model derived
from the biased random walk algorithm to combine heterogeneous information
resources for ranking candidate terms from each EHR note. Specifically, FIT
integrates four single views for term importance: patient use of medical
concepts, document-level term salience, word-occurrence based term relatedness,
and topic coherence. It also incorporates partial information of term
importance as conveyed by terms’ unfamiliarity levels and semantic types. We
evaluated FIT on 90 expert-annotated EHR notes and compared it with three
benchmark unsupervised ensemble ranking methods.
Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR
notes to identify important terms. When including term identification, the
performance of FIT for identifying important terms from EHR notes was 0.813
AUC-ROC. It outperformed the three ensemble rankers for most metrics. Its
performance is relatively insensitive to its parameter settings.
Conclusions: FIT can automatically identify EHR terms important to patients
and may help develop personalized interventions to improve quality of care. By
using unsupervised learning as well as a robust and flexible framework for
information fusion, FIT can be readily applied to other domains and
applications.
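The biased random walk at the core of FIT's ensemble ranker belongs to the personalized-PageRank family; here is a sketch on an invented three-term graph, where edge weights encode term relatedness and the restart vector injects one of the single views (e.g., document-level salience):

```python
import numpy as np

def biased_random_walk(W, prior, alpha=0.85, iters=100):
    """W: nonnegative relatedness weights; prior: restart distribution."""
    P = W / W.sum(axis=1, keepdims=True)    # row-stochastic transitions
    r = prior.copy()
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * prior
    return r                                 # stationary importance scores

W = np.array([[0, 2, 1],
              [2, 0, 1],
              [1, 1, 0]], dtype=float)
prior = np.array([0.6, 0.3, 0.1])            # e.g., salience view of 3 terms
print(biased_random_walk(W, prior))
```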
Onur Mutlu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
As memory scales down to smaller technology nodes, new failure mechanisms
emerge that threaten its correct operation. If such failure mechanisms are not
anticipated and corrected, they can not only degrade system reliability and
availability but also, perhaps even more importantly, open up security
vulnerabilities: a malicious attacker can exploit the exposed failure mechanism
to take over the entire system. As such, new failure mechanisms in memory can
become practical and significant threats to system security.
In this work, we discuss the RowHammer problem in DRAM, which is a prime (and
perhaps the first) example of how a circuit-level failure mechanism in DRAM can
cause a practical and widespread system security vulnerability. RowHammer, as
it is popularly referred to, is the phenomenon that repeatedly accessing a row
in a modern DRAM chip causes bit flips in physically-adjacent rows at
consistently predictable bit locations. It is caused by a hardware failure
mechanism called DRAM disturbance errors, which is a manifestation of
circuit-level cell-to-cell interference in a scaled memory technology. We
analyze the root causes of the RowHammer problem and examine various solutions.
We also discuss what other vulnerabilities may be lurking in DRAM and other
types of memories, e.g., NAND flash memory or Phase Change Memory, that can
potentially threaten the foundations of secure systems, as the memory
technologies scale to higher densities. We conclude by describing and
advocating a principled approach to memory reliability and security research
that can enable us to better anticipate and prevent such vulnerabilities.
Pranjal Awasthi, Maria-Florina Balcan, Colin White
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
As datasets become larger and more distributed, algorithms for distributed
clustering have become more and more important. In this work, we present a
general framework for designing distributed clustering algorithms that are
robust to outliers. Using our framework, we give a distributed approximation
algorithm for k-means, k-median, or generally any L_p objective, with z
outliers and/or balance constraints, using O(m(k+z)(d+log n)) bits of
communication, where m is the number of machines, n is the size of the point
set, and d is the dimension. This generalizes and improves over previous work
of Bateni et al. and Malkomes et al. As a special case, we achieve the first
distributed algorithm for k-median with outliers, answering an open question
posed by Malkomes et al. For distributed k-means clustering, we provide the
first dimension-dependent communication complexity lower bound for finding the
optimal clustering. This improves over the lower bound from Chen et al. which
is dimension-agnostic.
Furthermore, we give distributed clustering algorithms which return nearly
optimal solutions, provided the data satisfies the approximation stability
condition of Balcan et al. or the spectral stability condition of Kumar and
Kannan. In certain clustering applications where a local clustering consistent
among all machines is sufficient, we show that no communication is necessary if
the data satisfies approximation stability.
Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, Samuel Kaski
Comments: 10 pages, 11 figures
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (cs.NA); Methodology (stat.ME)
Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank
representations of matrices, and giving principled predictions of missing
values. However, scaling up MCMC samplers to large matrices has proven to be
difficult with parallel algorithms that require communication between MCMC
iterations. On the other hand, designing communication-free algorithms is
challenging due to the inherent unidentifiability of BMF solutions. We propose
posterior propagation, an embarrassingly parallel inference procedure, which
hierarchically introduces dependencies between data subsets and thus alleviates
the unidentifiability problem.
Maciej Dlugosz, Sebastian Deorowicz, Marek Kokot
Subjects: Genomics (q-bio.GN); Distributed, Parallel, and Cluster Computing (cs.DC)
We introduce an improved version of RECKONER, an error corrector for Illumina
whole genome sequencing data. By modifying its workflow we reduce the
computation time by up to a factor of 10. We also propose a new method for
determining the k-mer length, the key parameter of the k-spectrum-based family
of correctors. The correction algorithms are examined on huge data sets, i.e.,
human and maize genomes for both Illumina HiSeq and MiSeq instruments.
Marek Kokot, Sebastian Deorowicz, Maciej Dlugosz
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper we introduce RADULS2, the fastest parallel sorter based on the
radix algorithm. It is optimized to process huge amounts of data making use of
modern multicore CPUs. The main novelties include an extremely optimized
algorithm for handling tiny arrays (up to about a hundred records), which can
appear billions of times as subproblems, and improved processing of larger
subarrays with better use of non-temporal memory stores.
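A single-threaded sketch of the two ideas, least-significant-digit radix passes plus an insertion-sort fallback for tiny inputs; the threshold is illustrative, and the real RADULS2 is a parallel, cache-conscious C++ implementation.

```python
import random

SMALL = 64   # illustrative cutoff below which radix passes don't pay off

def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def radix_sort(a, bits=32, radix_bits=8):
    if len(a) <= SMALL:
        return insertion_sort(a)               # tiny-subarray fast path
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):   # LSD passes
        buckets = [[] for _ in range(mask + 1)]
        for x in a:
            buckets[(x >> shift) & mask].append(x)
        a = [x for b in buckets for x in b]    # stable concatenation
    return a

data = [random.getrandbits(32) for _ in range(1000)]
assert radix_sort(data) == sorted(data)
```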
Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)
Robust estimation is much more challenging in high dimensions than it is in
one dimension: Most techniques either lead to intractable optimization problems
or estimators that can tolerate only a tiny fraction of errors. Recent work in
theoretical computer science has shown that, in appropriate distributional
models, it is possible to robustly estimate the mean and covariance with
polynomial time algorithms that can tolerate a constant fraction of
corruptions, independent of the dimension. However, the sample and time
complexity of these algorithms is prohibitively large for high-dimensional
applications. In this work, we address both of these issues by establishing
sample complexity bounds that are optimal, up to logarithmic factors, as well
as giving various refinements that allow the algorithms to tolerate a much
larger fraction of corruptions. Finally, we show on both synthetic and real
data that our algorithms have state-of-the-art performance and suddenly make
high-dimensional robust estimation a realistic possibility.
Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
This paper shows that a perturbed form of gradient descent converges to a
second-order stationary point in a number of iterations which depends only
poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The
convergence rate of this procedure matches the well-known convergence rate of
gradient descent to first-order stationary points, up to log factors. When all
saddle points are non-degenerate, all second-order stationary points are local
minima, and our result thus shows that perturbed gradient descent can escape
saddle points almost for free. Our results can be directly applied to many
machine learning applications, including deep learning. As a particular
concrete example of such an application, we show that our results can be used
directly to establish sharp global convergence rates for matrix factorization.
Our results rely on a novel characterization of the geometry around saddle
points, which may be of independent interest to the non-convex optimization
community.
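The mechanism is simple enough to sketch. Below is a hedged Python illustration of the idea, not the paper's exact algorithm or constants: run gradient descent, and whenever the gradient is small, add a perturbation sampled uniformly from a small ball, which suffices to escape non-degenerate saddle points.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, eta=0.01, g_thresh=1e-3,
                               radius=1e-2, t_noise=10, n_iters=500,
                               seed=0):
    """Gradient descent plus occasional ball-shaped perturbations at
    near-stationary points; constants here are illustrative only."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    last_noise = -t_noise
    for t in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh and t - last_noise >= t_noise:
            # Sample uniformly from a ball of the given radius around x.
            u = rng.normal(size=x.shape)
            u *= rng.uniform() ** (1 / x.size) * radius / np.linalg.norm(u)
            x = x + u
            last_noise = t
        else:
            x = x - eta * g
    return x

# f(x, y) = x^2 - y^2 has a saddle at the origin; plain GD started there
# never moves, while the perturbed variant escapes along the y direction.
grad = lambda v: np.array([2.0 * v[0], -2.0 * v[1]])
print(perturbed_gradient_descent(grad, np.array([0.0, 0.0])))
```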
Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood
Comments: 8 pages, 4 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We draw a formal connection between using synthetic training data to optimize
neural network parameters and approximate, Bayesian, model-based reasoning. In
particular, training a neural network using synthetic data can be viewed as
learning a proposal distribution generator for approximate inference in the
synthetic-data generative model. We demonstrate this connection in a
recognition task where we develop a novel Captcha-breaking architecture and
train it using synthetic data, demonstrating both state-of-the-art performance
and a way of computing task-specific posterior uncertainty. Using a neural
network trained this way, we also demonstrate successful breaking of real-world
Captchas currently used by Facebook and Wikipedia. Reasoning from these
empirical results and drawing connections with Bayesian modeling, we discuss
the robustness of synthetic data results and suggest important considerations
for ensuring good neural network generalization when training with synthetic
data.
Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Curating labeled training data has become the primary bottleneck in machine
learning. Recent frameworks address this bottleneck with generative models to
synthesize labels at scale from weak supervision sources. The generative
model’s dependency structure directly affects the quality of the estimated
labels, but selecting a structure automatically without any labeled data is a
distinct challenge. We propose a structure estimation method that maximizes the
\(\ell_1\)-regularized marginal pseudolikelihood of the observed data. Our
analysis shows that the amount of unlabeled data required to identify the true
structure scales sublinearly in the number of possible dependencies for a broad
class of models. Experiments on synthetic data show that our method is
100\(\times\) faster than a maximum likelihood approach and selects \(1/4\) as many
extraneous dependencies. We also show that our method provides an average of
1.5 F1 points of improvement over existing, user-developed information
extraction applications on real-world data such as PubMed journal articles.
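A minimal sketch of the pseudolikelihood idea, assuming binary labeling sources: regress each source on all the others with an \(\ell_1\)-penalized logistic model and read dependencies off the nonzero coefficients. This is neighborhood selection in the spirit of the abstract, not the authors' exact estimator; all names and the toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_dependencies(L, C=0.1):
    """l1-regularized neighborhood selection over binary sources:
    a nonzero coefficient when predicting source j from source k
    suggests a dependency edge (j, k)."""
    n, m = L.shape
    edges = set()
    for j in range(m):
        X = np.delete(L, j, axis=1)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X, L[:, j])
        others = [k for k in range(m) if k != j]
        for k, coef in zip(others, clf.coef_.ravel()):
            if abs(coef) > 1e-6:
                edges.add(tuple(sorted((j, k))))
    return edges

# Toy example: sources 0 and 1 are correlated by construction, source 2 is not.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000)
L = np.column_stack([z,
                     z ^ (rng.random(1000) < 0.1).astype(int),
                     rng.integers(0, 2, size=1000)])
print(select_dependencies(L))   # expected: {(0, 1)}
```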
Saurav Talukdar, Deepjyoti Deka, Donatello Materassi, Murti V. Salapaka
Comments: 6 pages
Subjects: Learning (cs.LG); Systems and Control (cs.SY)
In this article we present a method to reconstruct the interconnectedness of
dynamically related stochastic processes, where the interactions are
bi-directional and the underlying topology is a tree. Our approach is based on
multivariate Wiener filtering, which recovers spurious edges in addition to the
true edges of the topology. The main contribution of this work is to show that
all spurious links obtained using Wiener filtering can be eliminated if the
underlying topology is a tree; based on this, we present a three-stage network
reconstruction procedure for trees. We illustrate the effectiveness of
the method developed by applying it on a typical distribution system of the
electric grid.
Tsendsuren Munkhdalai, Hong Yu
Comments: initial submission
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Deep neural networks have been successfully applied in applications with a
large amount of labeled data. However, neural networks have major drawbacks
related to rapid generalization from small data and to continual learning of
new concepts without forgetting. We present a novel meta learning
method, Meta Networks (MetaNet), that acquires a meta-level knowledge across
tasks and shifts its inductive bias via fast parameterization for the rapid
generalization. When tested on the standard one-shot learning benchmarks, our
MetaNet models achieved near human-level accuracy. We demonstrated several
appealing properties of MetaNet relating to generalization and continual
learning.
Ravid Shwartz-Ziv, Naftali Tishby
Comments: 9 pages, 7 figures
Subjects: Learning (cs.LG)
Despite their great success, there is still no comprehensive theoretical
understanding of learning with Deep Neural Networks (DNNs) or their inner
organization. Previous work [Tishby & Zaslavsky (2015)] proposed to analyze
DNNs in the Information Plane; i.e., the plane of the Mutual Information values
that each layer preserves on the input and output variables. They suggested
that the goal of the network is to optimize the Information Bottleneck (IB)
tradeoff between compression and prediction, successively, for each layer.
In this work we follow up on this idea and demonstrate the effectiveness of
the Information-Plane visualization of DNNs. We first show that the stochastic
gradient descent (SGD) epochs have two distinct phases: fast empirical error
minimization followed by slow representation compression, for each layer. We
then argue that the DNN layers end up very close to the IB theoretical bound,
and present a new theoretical argument for the computational benefit of the
hidden layers.
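For readers who want to reproduce Information-Plane-style plots, the quantity involved is mutual information between (discretized) layer activations and the input or output variables. A minimal binning estimator, as commonly used for such plots, though not necessarily the authors' exact estimator:

```python
import numpy as np

def mutual_information(x, y, bins=30):
    """Plug-in estimate of I(X;Y) in bits from a 2-D histogram of samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # avoid log(0) terms
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Sanity check: a noisy copy of X is informative, an independent draw is not.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
print(mutual_information(x, x + 0.1 * rng.normal(size=10_000)))  # large
print(mutual_information(x, rng.normal(size=10_000)))            # near 0
```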
Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio
Comments: IJCNN 2017 Accepted Paper, An extension of our paper, “ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient”
Subjects: Learning (cs.LG)
Stochastic gradient algorithms are the main focus of large-scale optimization
problems and have led to important successes in the recent advancement of deep
learning algorithms. The convergence of SGD depends on the careful choice of
the learning rate and the amount of noise in the stochastic estimates of the
gradients. In this paper, we propose an adaptive learning rate algorithm, which
utilizes stochastic curvature information of the loss function for
automatically tuning the learning rates. The information about the element-wise
curvature of the loss function is estimated from the local statistics of the
stochastic first order gradients. We further propose a new variance reduction
technique to speed up the convergence. In our experiments with deep neural
networks, we obtained better performance compared to the popular stochastic
gradient algorithms.
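As a rough illustration of learning rates driven by local gradient statistics, here is a generic variance-normalized SGD update in Python. This is a hedged sketch of the general idea only, not the paper's algorithm; the decay constant and step size are arbitrary choices.

```python
import numpy as np

def variance_adaptive_sgd(grad, w, lr=0.05, beta=0.9, eps=1e-8, steps=200):
    """Per-parameter steps scaled by running first/second moments of the
    gradient: noisier (higher-variance) coordinates take smaller steps."""
    m = np.zeros_like(w)   # running mean of gradients
    v = np.zeros_like(w)   # running mean of squared gradients
    for _ in range(steps):
        g = grad(w)
        m = beta * m + (1 - beta) * g
        v = beta * v + (1 - beta) * g * g
        w = w - lr * m / (np.sqrt(v) + eps)
    return w

# Quadratic bowl with curvatures differing by 100x per coordinate:
# the normalization equalizes progress across coordinates.
grad = lambda w: np.array([1.0, 100.0]) * w
print(variance_adaptive_sgd(grad, np.array([1.0, 1.0])))  # both near 0
```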
Mike Czech, Eyke Hüllermeier, Marie-Christine Jakobs, Heike Wehrheim
Subjects: Learning (cs.LG); Software Engineering (cs.SE)
Software verification competitions, such as the annual SV-COMP, evaluate
software verification tools with respect to their effectiveness and efficiency.
Typically, the outcome of a competition is a (possibly category-specific)
ranking of the tools. For many applications, such as building portfolio
solvers, it would be desirable to have an idea of the (relative) performance of
verification tools on a given verification task beforehand, i.e., prior to
actually running all tools on the task.
In this paper, we present a machine learning approach to predicting rankings
of tools on verification tasks. The method builds upon so-called label ranking
algorithms, which we complement with appropriate kernels providing a similarity
measure for verification tasks. Our kernels employ a graph representation for
software source code that mixes elements of control flow and program dependence
graphs with abstract syntax trees. Using data sets from SV-COMP, we demonstrate
that our rank prediction technique generalizes well and achieves rather high
predictive accuracy. In particular, our method outperforms the recently
proposed feature-based approach of Demyanova et al. (when applied to rank
predictions).
Malte Schmidt, Dimitri Block, Uwe Meier
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The steadily growing use of license-free frequency bands requires reliable
coexistence management for deterministic medium utilization. For interference
mitigation, proper wireless interference identification (WII) is essential. In
this work we propose the first WII approach based upon deep convolutional
neural networks (CNNs). The CNN naively learns its features through
self-optimization during an extensive data-driven GPU-based training process.
We propose a CNN example which is based upon sensing snapshots with a limited
duration of 12.8 \(\mu\)s and an acquisition bandwidth of 10 MHz. The CNN
distinguishes between 15 classes. They represent packet transmissions of IEEE
802.11 b/g, IEEE 802.15.4 and IEEE 802.15.1 with overlapping frequency channels
within the 2.4 GHz ISM band. We show that the CNN outperforms state-of-the-art
WII approaches and has a classification accuracy greater than 95% for
signal-to-noise ratios of at least -5 dB.
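To make the setting concrete: 12.8 \(\mu\)s at 10 MHz complex sampling gives 128 I/Q samples per snapshot, i.e. a 2x128 input, with 15 output classes. A minimal PyTorch sketch of such a classifier follows; the layer sizes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class WiiCnn(nn.Module):
    """Small 1-D CNN over 2x128 I/Q snapshots with 15 output classes."""
    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),                      # 128 -> 64
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),                      # 64 -> 32
        )
        self.classifier = nn.Linear(64 * 32, n_classes)

    def forward(self, x):                         # x: (batch, 2, 128)
        return self.classifier(self.features(x).flatten(1))

model = WiiCnn()
logits = model(torch.randn(8, 2, 128))            # batch of 8 random snapshots
print(logits.shape)                               # torch.Size([8, 15])
```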
Michal Moshkovitz, Naftali Tishby
Subjects: Learning (cs.LG)
We suggest analyzing neural networks through the prism of space constraints.
We observe that most training algorithms applied in practice use bounded
memory, which enables us to use a new notion introduced in the study of
space-time tradeoffs that we call mixing complexity. This notion was devised in
order to measure the (in)ability to learn using a bounded-memory algorithm. In
this paper we describe how we use mixing complexity to obtain new results on
what can and cannot be learned using neural networks.
Nils M. Kriege, Marion Neumann, Christopher Morris, Kristian Kersting, Petra Mutzel
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Non-linear kernel methods can be approximated by fast linear ones using
suitable explicit feature maps allowing their application to large scale
problems. To this end, explicit feature maps of kernels for vectorial data have
been extensively studied. As much real-world data is structured, various
kernels for complex data like graphs have been proposed. Indeed, many of them
directly compute feature maps. However, the kernel trick is employed when the
number of features is very large or the individual vertices of graphs are
annotated by real-valued attributes.
Can we still compute explicit feature maps efficiently under these
circumstances? Triggered by this question, we investigate how general
convolution kernels are composed from base kernels and construct corresponding
feature maps. We apply our results to widely used graph kernels and analyze for
which kernels and graph properties computation by explicit feature maps is
feasible and actually more efficient. In particular, we derive feature maps for
random walk and subgraph matching kernels and apply them to real-world graphs
with discrete labels. Thereby, our theoretical results are confirmed
experimentally by observing a phase transition when comparing running time with
respect to label diversity, walk lengths and subgraph size, respectively.
Moreover, we derive approximative, explicit feature maps for state-of-the-art
kernels supporting real-valued attributes including the GraphHopper and Graph
Invariant kernels. In extensive experiments we show that our approaches often
achieve a classification accuracy close to the exact methods based on the
kernel trick, but require only a fraction of their running time.
Neil G. Marchant, Benjamin I. P. Rubinstein
Comments: 13 pages, 5 figures
Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
Entity resolution (ER) presents unique challenges for evaluation methodology.
While crowd sourcing provides a platform to acquire ground truth, sound
approaches to sampling must drive labelling efforts. In ER, extreme class
imbalance between matching and non-matching records can lead to enormous
labelling requirements when seeking statistically consistent estimates of
population parameters. This paper addresses this important challenge with the
OASIS algorithm. OASIS draws samples from a (biased) instrumental distribution,
chosen to have optimal asymptotic variance. As new labels are collected OASIS
updates this instrumental distribution via a Bayesian latent variable model of
the annotator oracle, to quickly focus on regions providing more information.
We prove that the resulting estimates of F-measure, precision, and recall converge to
the true population values. Thorough comparisons of sampling methods on a
variety of ER datasets demonstrate significant labelling reductions of up to
75% without loss to estimate accuracy.
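The core of the approach is importance sampling from a biased instrumental distribution. A minimal static sketch in Python, omitting the adaptive Bayesian update of the distribution that OASIS performs; all names and the toy pool are illustrative.

```python
import numpy as np

def is_f_measure(preds, labels, q, n_samples, seed=0):
    """Estimate F-measure by drawing items from an instrumental
    distribution q over the pool and reweighting counts by 1/q."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(preds), size=n_samples, p=q)
    w = 1.0 / (n_samples * q[idx])        # importance weights
    y, yhat = labels[idx], preds[idx]
    tp = np.sum(w * (yhat == 1) * (y == 1))
    fp = np.sum(w * (yhat == 1) * (y == 0))
    fn = np.sum(w * (yhat == 0) * (y == 1))
    return 2 * tp / (2 * tp + fp + fn)

# Toy pool with heavy class imbalance; oversample predicted matches,
# since they carry most of the information about precision.
rng = np.random.default_rng(1)
n = 100_000
y = (rng.random(n) < 0.01).astype(int)            # rare true matches
preds = ((0.6 * y + 0.4 * rng.random(n)) > 0.8).astype(int)
n1 = preds.sum()
q = np.where(preds == 1, 0.5 / n1, 0.5 / (n - n1))
print(is_f_measure(preds, y, q, n_samples=5000))  # close to the true F
```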
Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
From only positive (P) and unlabeled (U) data, a binary classifier could be
trained with PU learning. Unbiased PU learning that is based on unbiased risk
estimators is now state of the art. However, if its model is very flexible, its
empirical risk on training data will go negative, and we suffer from serious
overfitting. In this paper, we propose a novel non-negative risk
estimator for PU learning. When being minimized, it is more robust against
overfitting, and thus we are able to train very flexible models given limited P
data. Moreover, we analyze the bias, consistency and mean-squared-error
reduction of the proposed risk estimator as well as the estimation error of the
corresponding risk minimizer. Experiments show that the non-negative risk
estimator outperforms unbiased counterparts when they disagree.
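The estimator itself is compact. A hedged sketch of a non-negative PU risk of the form \(\pi R_p^+ + \max(0,\, R_u^- - \pi R_p^-)\), where the clipping prevents the negative-class term from going negative for flexible models; the decision function, loss, and toy data below are illustrative.

```python
import numpy as np

def nn_pu_risk(g, x_p, x_u, pi, loss=lambda z: np.log1p(np.exp(-z))):
    """Non-negative PU risk with a logistic loss:
    pi * R_p^+ + max(0, R_u^- - pi * R_p^-)."""
    r_p_pos = loss(g(x_p)).mean()    # positives scored as positive
    r_p_neg = loss(-g(x_p)).mean()   # positives scored as negative
    r_u_neg = loss(-g(x_u)).mean()   # unlabeled scored as negative
    return pi * r_p_pos + max(0.0, r_u_neg - pi * r_p_neg)

# Toy data: positives at mean +1, unlabeled is a 30/70 mixture; the class
# prior pi is assumed known, as is standard in PU learning.
rng = np.random.default_rng(0)
x_p = rng.normal(1.0, 1.0, size=500)
x_u = np.concatenate([rng.normal(1.0, 1.0, size=300),
                      rng.normal(-1.0, 1.0, size=700)])
print(nn_pu_risk(lambda x: x, x_p, x_u, pi=0.3))   # linear score g(x) = x
```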
Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
Subjects: Learning (cs.LG)
This paper makes progress on several open theoretical issues related to
Generative Adversarial Networks. A definition is provided for what it means for
the training to generalize, and it is shown that generalization is not
guaranteed for the popular distances between distributions such as
Jensen-Shannon or Wasserstein. We introduce a new metric called neural net
distance for which generalization does occur. We also show that an approximate
pure equilibrium in the 2-player game exists for a natural training objective
(Wasserstein). Showing such a result has been an open problem (for any training
objective).
Finally, the above theoretical ideas lead us to propose a new training
protocol, MIX+GAN, which can be combined with any existing method. We present
experiments showing that it stabilizes and improves some existing methods.
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande
Subjects: Learning (cs.LG); Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)
Molecular machine learning has been maturing rapidly over the last few years.
Improved methods and the presence of larger datasets have enabled machine
learning algorithms to make increasingly accurate predictions about molecular
properties. However, algorithmic progress has been limited due to the lack of a
standard benchmark to compare the efficacy of proposed methods; most new
algorithms are benchmarked on different datasets, making it challenging to gauge
the quality of proposed methods. This work introduces MoleculeNet, a large
scale benchmark for molecular machine learning. MoleculeNet curates multiple
public datasets, establishes metrics for evaluation, and offers high quality
open-source implementations of multiple previously proposed molecular
featurization and learning algorithms (released as part of the DeepChem open
source library). MoleculeNet benchmarks demonstrate that learnable
representations, and in particular graph convolutional networks, are powerful
tools for molecular machine learning and broadly offer the best performance.
However, for quantum mechanical and biophysical datasets, the use of
physics-aware featurizations can be significantly more important than choice of
particular learning algorithm.
David A. Moore, Stuart J. Russell
Comments: Appearing at AISTATS 2017
Subjects: Learning (cs.LG); Geophysics (physics.geo-ph)
Detecting weak seismic events from noisy sensors is a difficult perceptual
task. We formulate this task as Bayesian inference and propose a generative
model of seismic events and signals across a network of spatially distributed
stations. Our system, SIGVISA, is the first to directly model seismic
waveforms, allowing it to incorporate a rich representation of the physics
underlying the signal generation process. We use Gaussian processes over
wavelet parameters to predict detailed waveform fluctuations based on
historical events, while degrading smoothly to simple parametric envelopes in
regions with no historical seismicity. Evaluating on data from the western US,
we recover three times as many events as previous work, and reduce mean
location errors by a factor of four while greatly increasing sensitivity to
low-magnitude events.
Yuandong Tian
Subjects: Learning (cs.LG)
In this paper, we explore theoretical properties of training a two-layered
ReLU network \(g(\mathbf{x}; \mathbf{w}) = \sum_{j=1}^K
\sigma(\mathbf{w}_j^T\mathbf{x})\) with centered \(d\)-dimensional spherical
Gaussian input \(\mathbf{x}\) (\(\sigma\) = ReLU). We train our network with
gradient descent on \(\mathbf{w}\) to mimic the output of a teacher network
with the same architecture and fixed parameters \(\mathbf{w}^*\). We show that
its population gradient has an analytical formula, leading to interesting
theoretical analysis of critical points and convergence behaviors. First, we
prove that critical points outside the hyperplane spanned by the teacher
parameters (“out-of-plane”) are not isolated and form manifolds, and
characterize in-plane critical-point-free regions for the two-ReLU case. On the
other hand, convergence to \(\mathbf{w}^*\) for one ReLU node is guaranteed
with probability at least \((1-\epsilon)/2\), if weights are initialized
randomly with standard deviation upper-bounded by \(O(\epsilon/\sqrt{d})\),
consistent with empirical practice. For networks with many ReLU nodes, we prove
that an infinitesimal perturbation of the weight initialization results in
convergence towards \(\mathbf{w}^*\) (or its permutation), a phenomenon known
as spontaneous symmetry-breaking (SSB) in physics. We assume no independence of
ReLU activations. Simulation verifies our findings.
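The setting is easy to simulate. A small numpy sketch of the teacher-student experiment, assuming the abstract's setup; the sizes, learning rate, and the empirical approximation of the population gradient are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n, lr = 10, 3, 5000, 0.05

relu = lambda z: np.maximum(z, 0)
g = lambda W, X: relu(X @ W.T).sum(axis=1)     # g(x; w) = sum_j relu(w_j . x)

w_star = rng.normal(size=(K, d))               # fixed teacher
w = w_star + 1e-3 * rng.normal(size=(K, d))    # infinitesimally perturbed student

for step in range(2000):
    X = rng.normal(size=(n, d))                # spherical Gaussian inputs
    err = g(w, X) - g(w_star, X)               # residual against the teacher
    # Gradient of 0.5 * mean(err^2) w.r.t. each student weight vector.
    w -= lr * ((err[:, None] * (X @ w.T > 0)).T @ X) / n

print(np.linalg.norm(w - w_star))              # small: student near the teacher
```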
Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt
Subjects: Learning (cs.LG)
We consider influence maximization (IM) in social networks, which is
the problem of maximizing the number of users that become aware of a product by
selecting a set of “seed” users to expose the product to. While prior work
assumes a known model of information diffusion, we propose a parametrization in
terms of pairwise reachability which makes our framework agnostic to the
underlying diffusion model. We give a corresponding monotone, submodular
surrogate function, and show that it is a good approximation to the original IM
objective. We also consider the case of a new marketer looking to exploit an
existing social network, while simultaneously learning the factors governing
information propagation. For this, we propose a pairwise-influence semi-bandit
feedback model and develop a LinUCB-based bandit algorithm. Our
model-independent regret analysis shows that our bound on the cumulative regret
has a better (as compared to previous work) dependence on the size of the
network. By using the graph Laplacian eigenbasis to construct features, we
describe a practical LinUCB implementation. Experimental evaluation suggests
that our framework is robust to the underlying diffusion model and can
efficiently learn a near-optimal solution.
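For reference, the LinUCB machinery the algorithm builds on is compact. A generic sketch follows; in the paper the feature vectors would come from the graph Laplacian eigenbasis, which we abstract away here, and all names and constants are illustrative.

```python
import numpy as np

class LinUCB:
    """Ridge-regularized linear bandit with an upper-confidence bonus."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # Gram matrix X^T X + I
        self.b = np.zeros(d)      # reward-weighted features X^T r
        self.alpha = alpha        # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge estimate
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Usage: score candidate arms (e.g. seed-set features), pick the best, update.
bandit = LinUCB(d=5)
arms = np.random.default_rng(0).normal(size=(10, 5))
chosen = max(range(10), key=lambda i: bandit.score(arms[i]))
bandit.update(arms[chosen], reward=1.0)
```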
Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
When training neural networks, the use of Synthetic Gradients (SG) allows
layers or modules to be trained without update locking – without waiting for a
true error gradient to be backpropagated – resulting in Decoupled Neural
Interfaces (DNIs). This unlocked ability of being able to update parts of a
neural network asynchronously and with only local information was demonstrated
to work empirically in Jaderberg et al (2016). However, there has been very
little demonstration of what changes DNIs and SGs impose from a functional,
representational, and learning dynamics point of view. In this paper, we study
DNIs through the use of synthetic gradients on feed-forward networks to better
understand their behaviour and elucidate their effect on optimisation. We show
that the incorporation of SGs does not affect the representational strength of
the learning system for a neural network, and prove the convergence of the
learning system for linear and deep linear models. On practical problems we
investigate the mechanism by which synthetic gradient estimators approximate
the true loss, and, surprisingly, how that leads to drastically different
layer-wise representations. Finally, we also expose the relationship of using
synthetic gradients to other error approximation techniques and find a unifying
language for discussion and comparison.
Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, Jason H. Moore
Comments: 14 pages, 5 figures, submitted for review to JMLR
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task, depending on the target problem and the goals
of a particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Pedro M. Esperança, Louis J. M. Aslett, Chris C. Holmes
Comments: Accepted for AISTATS 2017
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Information that is stored in an encrypted format is, by definition, usually
not amenable to statistical analysis or machine learning methods. In this paper
we present detailed analysis of coordinate and accelerated gradient descent
algorithms which are capable of fitting least squares and penalised ridge
regression models, using data encrypted under a fully homomorphic encryption
scheme. Gradient descent is shown to dominate in terms of encrypted
computational speed, and theoretical results are proven to give parameter
bounds which ensure correctness of decryption. The characteristics of encrypted
computation are empirically shown to favour a non-standard acceleration
technique. This demonstrates the possibility of approximating conventional
statistical regression methods using encrypted data without compromising
privacy.
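The computation being encrypted is ordinary gradient descent for (penalised) least squares. A plain-text Python sketch of that inner loop; under an FHE scheme the same additions and multiplications would be evaluated on ciphertexts, with the step size and iteration count fixed in advance since comparisons on encrypted values are unavailable. All constants are illustrative.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam=0.1, n_iters=100):
    """Fixed-step gradient descent for ridge regression; the iteration
    count and step size must be chosen ahead of time in the encrypted
    setting."""
    n, d = X.shape
    eta = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + lam)   # 1/L step size
    beta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n + lam * beta
        beta -= eta * grad
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
print(ridge_gradient_descent(X, y))   # close to the generating coefficients
```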
Pranjal Awasthi, Maria-Florina Balcan, Colin White
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
As datasets become larger and more distributed, algorithms for distributed
clustering have become more and more important. In this work, we present a
general framework for designing distributed clustering algorithms that are
robust to outliers. Using our framework, we give a distributed approximation
algorithm for k-means, k-median, or generally any L_p objective, with z
outliers and/or balance constraints, using O(m(k+z)(d+log n)) bits of
communication, where m is the number of machines, n is the size of the point
set, and d is the dimension. This generalizes and improves over previous work
of Bateni et al. and Malkomes et al. As a special case, we achieve the first
distributed algorithm for k-median with outliers, answering an open question
posed by Malkomes et al. For distributed k-means clustering, we provide the
first dimension-dependent communication complexity lower bound for finding the
optimal clustering. This improves over the lower bound from Chen et al., which
is dimension-agnostic.
Furthermore, we give distributed clustering algorithms which return nearly
optimal solutions, provided the data satisfies the approximation stability
condition of Balcan et al. or the spectral stability condition of Kumar and
Kannan. In certain clustering applications where a local clustering consistent
among all machines is sufficient, we show that no communication is necessary if
the data satisfies approximation stability.
Daniel Lerch-Hostalot, David Megías
Subjects: Multimedia (cs.MM); Learning (cs.LG)
In this paper, an unsupervised steganalysis method that combines artificial
training sets and supervised classification is proposed. We provide a formal
framework for unsupervised classification of stego and cover images in the
typical situation of targeted steganalysis (i.e., for a known algorithm and
approximate embedding bit rate). We also present a complete set of experiments
using 1) eight different image databases, 2) image features based on Rich
Models, and 3) three different embedding algorithms: Least Significant Bit
(LSB) matching, Highly Undetectable steGO (HUGO) and Wavelet Obtained Weights
(WOW). We show that the experimental results outperform previous methods based
on Rich Models in the majority of the tested cases. At the same time, the
proposed approach bypasses the problem of Cover Source Mismatch (when the
embedding algorithm and bit rate are known), since it removes the need for a
training database when we have a large enough testing set. Furthermore, we
provide a generic proof of the proposed framework in the machine learning
context. Hence, the results of this paper could be extended to other
classification problems similar to steganalysis.
Shuming Ma, Xu Sun
Subjects: Computation and Language (cs.CL); Learning (cs.LG)
To speed up the training process, many existing systems use parallel
technology for online learning algorithms. However, most research mainly
focuses on stochastic gradient descent (SGD) instead of other algorithms. We
propose a generic online parallel learning framework for large margin models,
and also analyze our framework on popular large margin algorithms, including
MIRA and Structured Perceptron. Our framework is lock-free and easy to
implement on existing systems. Experiments show that systems with our framework
can achieve near-linear speed-up as the number of running threads increases,
with no loss in accuracy.
Pranav Shyam, Shubham Gupta, Ambedkar Dukkipati
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
Many problems in Artificial Intelligence and Machine Learning can be reduced
to the problem of quantitative comparison of two entities. In Deep Learning the
ubiquitous architecture used for this task is the Siamese Neural Network which
maps each entity to a representation through a learnable function and expresses
similarity through the distances among the entities in the representation
space. In this paper, we argue that such a static and invariant mapping is both
naive and unnatural. We develop a novel neural model called Attentive Recurrent
Comparators (ARCs) that dynamically compares two entities and test the model
extensively on the Omniglot dataset. In the task of similarity learning, our
simplistic model that does not use any convolutions performs on par with Deep
Convolutional Siamese Networks and significantly better when convolutional
layers are also used. In the challenging task of one-shot learning on the same
dataset, an ARC-based model achieves the first super-human performance for a
neural method, with an error rate of 1.5%.
Xiangju Qin, Paul Blomstedt, Eemeli Leppäaho, Pekka Parviainen, Samuel Kaski
Comments: 10 pages, 11 figures
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Numerical Analysis (cs.NA); Methodology (stat.ME)
Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank
representations of matrices, and giving principled predictions of missing
values. However, scaling up MCMC samplers to large matrices has proven to be
difficult with parallel algorithms that require communication between MCMC
iterations. On the other hand, designing communication-free algorithms is
challenging due to the inherent unidentifiability of BMF solutions. We propose
posterior propagation, an embarrassingly parallel inference procedure, which
hierarchically introduces dependencies between data subsets and thus alleviates
the unidentifiability problem.
Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic
Comments: 19 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Online two-sided matching markets such as Q&A forums (e.g. StackOverflow,
Quora) and online labour platforms (e.g. Upwork) critically rely on the ability
to propose adequate matches based on imperfect knowledge of the two parties to
be matched. This prompts the following question: Which matching recommendation
algorithms can, in the presence of such uncertainty, lead to efficient platform
operation?
To answer this question, we develop a model of a task/server matching
system. For this model, we give a necessary and sufficient condition for an
incoming stream of tasks to be manageable by the system. We further identify a
so-called back-pressure policy under which the throughput that the system can
handle is optimized. We show that this policy achieves strictly larger
throughput than a natural greedy policy. Finally, we validate our model and
confirm our theoretical findings with experiments based on logs of
Math.StackExchange, a StackOverflow forum dedicated to mathematics.
Nicolas Gillis
Comments: 18 pages, 4 figures
Journal-ref: SIAG/OPT Views and News 25 (1), pp. 7-16 (2017)
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Numerical Analysis (cs.NA); Machine Learning (stat.ML)
In this paper, we introduce and provide a short overview of nonnegative
matrix factorization (NMF). Several aspects of NMF are discussed, namely, the
application in hyperspectral imaging, geometry and uniqueness of NMF solutions,
complexity, algorithms, and its link with extended formulations of polyhedra.
In order to put NMF into perspective, the more general problem class of
constrained low-rank matrix approximation problems is first briefly introduced.
Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang, Bin Wang, Qiyong Lu
Subjects: Information Theory (cs.IT); Learning (cs.LG)
We consider a D2D-enabled cellular network where user equipments (UEs) owned
by rational users are incentivized to form D2D pairs using tokens. They
exchange tokens electronically to “buy” and “sell” D2D services. Meanwhile the
devices have the ability to choose the transmission mode, i.e. receiving data
via cellular links or D2D links. Thus taking the different benefits brought by
diverse traffic types as a prior, the UEs can utilize their tokens more
efficiently via transmission mode selection. In this paper, the optimal
transmission mode selection strategy as well as token collection policy are
investigated to maximize the long-term utility in the dynamic network
environment. The optimal policy is proved to be a threshold strategy, and the
thresholds have a monotonicity property. Numerical simulations verify our
analysis, and the gain from transmission mode selection is demonstrated.
Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric
Comments: 37 pages, 8 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We explore the sequential decision making problem where the goal is to
estimate uniformly well a number of linear models, given a shared budget of
random contexts independently sampled from a known distribution. The decision
maker must query one of the linear models for each incoming context, and
receives an observation corrupted by noise levels that are unknown, and depend
on the model instance. We present Trace-UCB, an adaptive allocation algorithm
that learns the noise levels while balancing contexts accordingly across the
different linear functions, and derive guarantees for simple regret in both
expectation and high probability. Finally, we extend the algorithm and its
guarantees to high dimensional settings, where the number of linear models
times the dimension of the contextual space is higher than the total budget of
samples. Simulations with real data suggest that Trace-UCB is remarkably
robust, outperforming a number of baselines even when its assumptions are
violated.
Sven Schmit, Carlos Riquelme
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Recommendation systems rely on historical user data to provide suggestions.
We propose an explicit and simple model for the interaction between users and
recommendations provided by a platform, and relate this model to the
multi-armed bandit literature. First, we show that this interaction leads to a
bias in naive estimators due to selection effects. This bias leads to
suboptimal outcomes, which we quantify in terms of linear regret. We end the
first part by discussing ways to obtain unbiased estimates. The second part of
this work considers exploration of alternatives. We show that although agents
are myopic, agents’ heterogeneous preferences ensure that recommendation
systems ‘learn’ about all alternatives without explicitly incentivizing this
exploration. This work provides new and practical insights relevant to a wide
range of systems designed to help users make better decisions.
Shuchi Chawla, Nikhil Devanur, Janardhan Kulkarni, Rad Niazadeh
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)
We consider a scheduling problem where a cloud service provider has multiple
units of a resource available over time. Selfish clients submit jobs, each with
an arrival time, deadline, length, and value. The service provider’s goal is to
implement a truthful online mechanism for scheduling jobs so as to maximize the
social welfare of the schedule. Recent work shows that under a stochastic
assumption on job arrivals, there is a single-parameter family of mechanisms
that achieves near-optimal social welfare. We show that given any such family
of near-optimal online mechanisms, there exists an online mechanism that in the
worst case performs nearly as well as the best of the given mechanisms. Our
mechanism is truthful whenever the mechanisms in the given family are truthful
and prompt, and achieves optimal (within constant factors) regret.
We model the problem of competing against a family of online scheduling
mechanisms as one of learning from expert advice. A primary challenge is that
any scheduling decisions we make affect not only the payoff at the current
step, but also the resource availability and payoffs in future steps.
Furthermore, switching from one algorithm (a.k.a. expert) to another in an
online fashion is challenging both because it requires synchronization with
the state of the latter algorithm and because it affects the incentive
structure of the algorithms. We further show how to adapt our algorithm to a
non-clairvoyant setting where job lengths are unknown until jobs are run to
completion. Once again, in this setting, we obtain truthfulness along with
asymptotically optimal regret (within poly-logarithmic factors).
Rika Antonova, Silvia Cruciani, Christian Smith, Danica Kragic
Comments: (Rika Antonova and Silvia Cruciani contributed equally)
Subjects: Robotics (cs.RO); Learning (cs.LG)
In this work we propose an approach to learn a robust policy for solving the
pivoting task. Recently, several model-free continuous control algorithms were
shown to learn successful policies without prior knowledge of the dynamics of
the task. However, obtaining successful policies required thousands to millions
of training episodes, limiting the applicability of these approaches to real
hardware. We developed a training procedure that allows us to use a simple
custom simulator to learn policies that are robust to the mismatch between the
simulator and the real robot. In our experiments, we demonstrate that the
policy learned in the
simulator is able to pivot the object to the desired target angle on the real
robot. We also show generalization to an object with different inertia, shape,
mass and friction properties than those used during training. This result is a
step towards making model-free reinforcement learning available for solving
robotics tasks via pre-training in simulators that offer only an imprecise
match to the real-world dynamics.
Yuan Cao, Yonglin Cao
Subjects: Information Theory (cs.IT)
\((1+pw)\)-constacyclic codes of arbitrary length over the non-principal ideal
ring \(\mathbb{Z}_{p^s}+u\mathbb{Z}_{p^s}\) are studied, where \(p\) is a
prime, \(w\in \mathbb{Z}_{p^s}^{\times}\) and \(s\) an integer satisfying
\(s\geq 2\). First, the structure of any \((1+pw)\)-constacyclic code over
\(\mathbb{Z}_{p^s}+u\mathbb{Z}_{p^s}\) is presented. Then enumerations for the
number of all codes and the number of codewords in each code, and the structure
of dual codes for these codes are given, respectively. Then self-dual
\((1+2w)\)-constacyclic codes over \(\mathbb{Z}_{2^s}+u\mathbb{Z}_{2^s}\) are
investigated, where \(w=2^{s-2}-1\) or \(2^{s-1}-1\) if \(s\geq 3\), and
\(w=1\) if \(s=2\).
José Gómez-Torrecillas, F. J. Lobillo, Gabriel Navarro
Subjects: Information Theory (cs.IT)
We design a non-commutative version of the Peterson-Gorenstein-Zierler
decoding algorithm for a class of codes that we call skew RS codes. These codes
are left ideals of a quotient of a skew polynomial ring, which endows them with
a sort of non-commutative cyclic structure. Since we work over an arbitrary
field, our techniques may be applied both to linear block codes and
convolutional codes. In particular, our decoding algorithm applies to block
codes beyond the classical cyclic case.
Masahito Hayashi, Masaki Owari, Go Kato, Ning Cai
Subjects: Information Theory (cs.IT)
In network coding, we discuss the effect of sequential error injection on
information leakage. We show that there is no improvement when the network is
composed of linear operations. However, when the network contains non-linear
operations, we find a counterexample in which the injection improves Eve's
obtained information. Further, we discuss the asymptotic rate in the linear
network under the secrecy and robustness conditions.
Vien V. Mai, Won-Yong Shin, Koji Ishibashi
Comments: 24 pages, 6 figures, To appear in IEEE Journal of Selected Topics in Signal Processing
Subjects: Information Theory (cs.IT); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Optimization and Control (math.OC)
This paper studies power allocation for distributed estimation of an unknown
scalar random source in sensor networks with a multiple-antenna fusion center
(FC), where wireless sensors are equipped with radio-frequency based energy
harvesting technology. The sensors’ observations are locally processed using
an uncoded amplify-and-forward scheme. The processed signals are then sent to
the FC, and are coherently combined at the FC, at which the best linear
unbiased estimator (BLUE) is adopted for reliable estimation. We aim to solve
the following two power allocation problems: 1) minimizing distortion under
various power constraints; and 2) minimizing total transmit power under
distortion constraints, where the distortion is measured in terms of
mean-squared error of the BLUE. Two iterative algorithms are developed to solve
the non-convex problems, which converge at least to a local optimum. In
particular, the above algorithms are designed to jointly optimize the
amplification coefficients, energy beamforming, and receive filtering. For each
problem, a suboptimal design, a single-antenna FC scenario, and a common
harvester deployment for colocated sensors, are also studied. Using the
powerful semidefinite relaxation framework, our result is shown to be valid for
any number of sensors, each with different noise power, and for an arbitrary
number of antennas at the FC.
Susanne Sparrer, Robert F.H. Fischer
Subjects: Information Theory (cs.IT)
In Compressed Sensing, a real-valued sparse vector has to be recovered from
an underdetermined system of linear equations. In many applications, however,
the elements of the sparse vector are drawn from a finite set. Adapted
algorithms incorporating this additional knowledge are required for the
discrete-valued setup. In this paper, turbo-based algorithms for both cases are
elucidated and analyzed from a communications engineering perspective, leading
to a deeper understanding of the algorithms. In particular, we gain the
intriguing insight that the calculation of extrinsic values is equal to the
unbiasing of a biased estimate and present an improved algorithm.
Weidong Mei, Zhi Chen, Jun Fang, Shaoqian Li
Comments: 12 pages, 5 figures
Subjects: Information Theory (cs.IT)
This paper considers an artificial noise (AN)-aided transmit design for
multi-user MIMO systems with integrated services. Specifically, two sorts of
service messages are combined and served simultaneously: one multicast message
intended for all receivers and one confidential message intended for only one
receiver and required to be perfectly secure from other unauthorized receivers.
Our interest lies in the joint design of input covariances of the multicast
message, confidential message and artificial noise (AN), such that the
achievable secrecy rate and multicast rate are simultaneously maximized. This
problem is identified as a secrecy rate region maximization (SRRM) problem in
the context of physical-layer service integration. Since this bi-objective
optimization problem is inherently complex to solve, we put forward two
different scalarization methods to convert it into a scalar optimization
problem. First, we propose to prefix the multicast rate as a constant, and
accordingly, the primal bi-objective problem is converted into a secrecy rate
maximization (SRM) problem with quality of multicast service (QoMS) constraint.
By varying the constant, we can obtain different Pareto optimal points. The
resulting SRM problem can be iteratively solved via a provably convergent
difference-of-concave (DC) algorithm. In the second method, we aim to maximize
the weighted sum of the secrecy rate and the multicast rate. Through varying
the weighted vector, one can also obtain different Pareto optimal points. We
show that this weighted sum rate maximization (WSRM) problem can be recast into
a primal decomposable form, which is amenable to alternating optimization (AO).
Then we compare these two scalarization methods in terms of their overall
performance and computational complexity via theoretical analysis as well as
numerical simulation, based on which new insights can be drawn.
Dong Min Kim, Henning Thomsen, Petar Popovski
Comments: to be presented in IEEE VTC 2017 Spring
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
In this paper, we describe CoMP2flex, a user-centric base station (BS)
cooperation scheme that provides improvements in reliability of both uplink
(UL) and downlink (DL) communications of wireless cellular networks. CoMP2flex
supports not only cooperation of two BSs with the same direction of traffic but
also cooperation of two BSs serving bidirectional traffic. The reliability
performance of CoMP2flex is shown with numerical simulations and analytical
expressions. We quantify and numerically validate the performance of the greedy
BS pairing algorithm by comparing it with maximum weight matching methods,
implemented as the Edmonds matching algorithm for weighted graphs.
Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang, Bin Wang, Qiyong Lu
Subjects: Information Theory (cs.IT); Learning (cs.LG)
We consider a D2D-enabled cellular network where user equipments (UEs) owned
by rational users are incentivized to form D2D pairs using tokens. They
exchange tokens electronically to “buy” and “sell” D2D services. Meanwhile the
devices have the ability to choose the transmission mode, i.e. receiving data
via cellular links or D2D links. Thus taking the different benefits brought by
diverse traffic types as a prior, the UEs can utilize their tokens more
efficiently via transmission mode selection. In this paper, the optimal
transmission mode selection strategy as well as token collection policy are
investigated to maximize the long-term utility in the dynamic network
environment. The optimal policy is proved to be a threshold strategy, and the
thresholds have a monotonicity property. Numerical simulations verify our
analysis, and the gain from transmission mode selection is demonstrated.
Dong Yin, Ramtin Pedarsani, Yudong Chen, Kannan Ramchandran
Subjects: Information Theory (cs.IT)
In this paper, we consider the mixture of sparse linear regressions model.
Let \({\beta}^{(1)},\ldots,{\beta}^{(L)}\in\mathbb{C}^n\) be \(L\) unknown
sparse parameter vectors with a total of \(K\) non-zero coefficients. Noisy
linear measurements are obtained in the form \(y_i={x}_i^H {\beta}^{(\ell_i)}
+ w_i\), each of which is generated randomly from one of the sparse vectors
with the label \(\ell_i\) unknown. The goal is to estimate the parameter
vectors efficiently with low sample and computational costs. This problem
presents significant challenges as one needs to simultaneously solve the
demixing problem of recovering the labels \(\ell_i\) as well as the estimation
problem of recovering the sparse vectors \({\beta}^{(\ell)}\).
Our solution to the problem leverages the connection between modern coding
theory and statistical inference. We introduce a new algorithm, Mixed-Coloring,
which samples the mixture strategically using query vectors \({x}_i\)
constructed based on ideas from sparse graph codes. Our novel code design
allows for both efficient demixing and parameter estimation. The algorithm
achieves the order-optimal sample and time complexities of \(\Theta(K)\) in the
noiseless setting, and near-optimal \(\Theta(K\,\mathrm{polylog}(n))\)
complexities in the noisy setting. In one of our experiments, to recover a
mixture of two regressions with dimension \(n=500\) and sparsity \(K=50\), our
algorithm is more than \(300\) times faster than the EM algorithm, with about
\(1/3\) of its sample cost.
Ronen Karni, Moshe Schwartz
Subjects: Information Theory (cs.IT)
We study covering codes of permutations with the \(\ell_\infty\)-metric. We
provide a general code construction, which uses smaller building-block codes.
We study cyclic transitive groups as building blocks, determining their exact
covering radius, and showing linear-time algorithms for finding a covering
codeword. We also bound the covering radius of relabeled cyclic transitive
groups under conjugation.
Chi-Yo Tsai, Gaurav Kumar Agarwal, Christina Fragouli, Suhas Diggavi
Subjects: Information Theory (cs.IT)
Eavesdropping attacks in inference systems aim to learn not the raw data, but
the system inferences to predict and manipulate system actions. We argue that
conventional entropy measures can be ambiguous on the adversary’s estimation
abilities, and adopt instead a distortion based framework. We show that
requiring perfect distortion-based security is more frugal than requiring
perfect entropy-based secrecy even for block length 1 codes, offering in some
cases unbounded gains. Within this framework, we design algorithms that enable
efficient use of shared randomness, and show that each shared random key is
exponentially useful in security.
Erik Agrell, Balázs Csébfalvi
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
A new lower bound on the average reconstruction error variance of
multidimensional sampling and reconstruction is presented. It applies to
sampling on arbitrary lattices in arbitrary dimensions, assuming a stochastic
process with constant, isotropically bandlimited spectrum and reconstruction by
the best linear interpolator. The lower bound is exact for any lattice at
sufficiently high and low sampling rates. The two threshold rates where the
error variance deviates from the lower bound give two optimality criteria for
sampling lattices. It is proved that at low rates, near the first threshold,
the optimal lattice is the dual of the best sphere-covering lattice, which for
the first time establishes a rigorous relation between optimal sampling and
optimal sphere covering. A previously known result is confirmed at high rates,
near the second threshold, namely, that the optimal lattice is the dual of the
best sphere-packing lattice. Numerical results quantify the performance of
various lattices for sampling and support the theoretical optimality criteria.
Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)
Robust estimation is much more challenging in high dimensions than it is in
one dimension: Most techniques either lead to intractable optimization problems
or estimators that can tolerate only a tiny fraction of errors. Recent work in
theoretical computer science has shown that, in appropriate distributional
models, it is possible to robustly estimate the mean and covariance with
polynomial time algorithms that can tolerate a constant fraction of
corruptions, independent of the dimension. However, the sample and time
complexity of these algorithms is prohibitively large for high-dimensional
applications. In this work, we address both of these issues by establishing
sample complexity bounds that are optimal, up to logarithmic factors, as well
as giving various refinements that allow the algorithms to tolerate a much
larger fraction of corruptions. Finally, we show on both synthetic and real
data that our algorithms have state-of-the-art performance and suddenly make
high-dimensional robust estimation a realistic possibility.
Luis David Alvarez Corrales, Anastasios Giovanidis, Philippe Martins, Laurent Decreusefond
Comments: submitted, 12 pages, double-column, 7 figures, 8 sub-figures in total
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Performance (cs.PF); Probability (math.PR)
Base station cooperation is a promising scheme to improve network performance
for next generation cellular networks. Up to this point, research has focused on
station grouping criteria based solely on geographic proximity. However, for
the cooperation to be meaningful, each station participating in a group should
have sufficient available resources to share with others. In this work we
consider an alternative grouping criterion based on a distance that considers
both geographic proximity and available resources of the stations. When the
network is modelled by a Poisson Point Process, we derive analytical formulas
on the proportion of cooperative pairs or single stations, and the expected sum
interference from each of the groups. The results illustrate that cooperation
gains strongly depend on the distribution of available resources over the
network.