Samuli Laine, Timo Aila
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
In this paper, we present a simple and efficient method for training deep
neural networks in a semi-supervised setting where only a small portion of
training data is labeled. We introduce temporal ensembling, where we form a
consensus prediction of the unknown labels under multiple instances of the
network-in-training on different epochs, and most importantly, under different
regularization and input augmentation conditions. This ensemble prediction can
be expected to be a better predictor for the unknown labels than the output of
the network at the most recent training epoch, and can thus be used as a target
for training. Using our method, we set new records for two standard
semi-supervised learning benchmarks, reducing the classification error rate
from 18.63% to 12.89% in CIFAR-10 with 4000 labels and from 18.44% to 6.83% in
SVHN with 500 labels.
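To make the idea concrete, the sketch below shows one way an exponentially averaged ensemble target could be maintained across epochs and combined with a supervised loss. The decay rate alpha, the bias correction, and the unsupervised weight w_unsup are illustrative assumptions, not necessarily the authors' exact formulation.

    import numpy as np

    def temporal_ensembling_targets(Z, z_epoch, epoch, alpha=0.6):
        """Accumulate per-sample predictions across epochs and return
        bias-corrected ensemble targets (assumed formulation)."""
        Z = alpha * Z + (1.0 - alpha) * z_epoch        # running average over epochs
        targets = Z / (1.0 - alpha ** (epoch + 1))     # correct the startup bias
        return Z, targets

    def semi_supervised_loss(pred, targets, labels, labeled_mask, w_unsup=1.0):
        """Cross-entropy on the labeled subset plus a consistency (MSE) term
        pulling all predictions toward the ensemble targets."""
        ce = -np.mean(np.log(pred[labeled_mask, labels[labeled_mask]] + 1e-8))
        consistency = np.mean((pred - targets) ** 2)
        return ce + w_unsup * consistency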
Dan Hendrycks, Kevin Gimpel
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We consider the two related problems of detecting if an example is
misclassified or out-of-distribution. We present a simple baseline that
utilizes probabilities from softmax distributions. Correctly classified
examples tend to have greater maximum softmax probabilities than erroneously
classified and out-of-distribution examples, allowing for their detection. We
assess performance by defining several tasks in computer vision, natural
language processing, and automatic speech recognition, showing the
effectiveness of this baseline across all. We then show the baseline can
sometimes be surpassed, demonstrating the room for future research on these
underexplored detection tasks.
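The baseline is simple enough to state in a few lines. The sketch below scores each example by its maximum softmax probability; the fixed threshold is only for illustration, whereas the paper's evaluation sweeps thresholds to compute ranking metrics.

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def confidence_scores(logits):
        """Score each example by its maximum softmax probability; low scores
        suggest misclassification or out-of-distribution inputs."""
        return softmax(logits).max(axis=1)

    def flag_suspicious(logits, threshold=0.5):
        """Flag low-confidence examples (threshold chosen for illustration)."""
        return confidence_scores(logits) < threshold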
Nancy Lynch, Cameron Musco, Merav Parter
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)
We initiate a line of investigation into biological neural networks from an
algorithmic perspective. We develop a simplified but biologically plausible
model for distributed computation in stochastic spiking neural networks and
study tradeoffs between computation time and network complexity in this model.
Our aim is to abstract real neural networks in a way that, while not capturing
all interesting features, preserves high-level behavior and allows us to make
biologically relevant conclusions.
In this paper, we focus on the important `winner-take-all’ (WTA) problem,
which is analogous to a neural leader election unit: a network consisting of
$n$ input neurons and $n$ corresponding output neurons must converge to a state
in which a single output corresponding to a firing input (the `winner’) fires,
while all other outputs remain silent. Neural circuits for WTA rely on
inhibitory neurons, which suppress the activity of competing outputs and drive
the network towards a converged state with a single firing winner. We attempt
to understand how the number of inhibitors used affects network convergence
time.
We show that it is possible to significantly outperform naive WTA
constructions through a more refined use of inhibition, solving the problem in
$O(\theta)$ rounds in expectation with just $O(\log^{1/\theta} n)$ inhibitors
for any $\theta$. An alternative construction gives convergence in
$O(\log^{1/\theta} n)$ rounds with $O(\theta)$ inhibitors. We complement these
upper bounds with our main technical contribution, a nearly matching lower
bound for networks using $\ge \log \log n$ inhibitors. Our lower bound uses
familiar indistinguishability and locality arguments from distributed computing
theory. It lets us derive a number of interesting conclusions about the
structure of any network solving WTA with good probability, and the use of
randomness and inhibition within such a network.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In the big data era, data are generated continuously and their distribution
may keep changing over time. These challenges in online data streams are known
as concept drift. In this paper, we propose the Adaptive Convolutional ELM
method (ACNNELM) as an enhancement of the Convolutional Neural Network (CNN)
with a hybrid Extreme Learning Machine (ELM) model plus adaptive capability.
This method is aimed at handling concept drift. We enhance the CNN as a
convolutional hierarchical feature representation learner combined with
Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an
Adaptive OS-ELM (AOS-ELM) for concept drift adaptability at the classifier
level (named ACNNELM-1) and matrix concatenation ensembles for concept drift
adaptability at the ensemble level (named ACNNELM-2). Our proposed Adaptive
CNNELM is flexible, working well at both the classifier level and the ensemble
level, while most current methods only work at one of the two levels.
We verified our method on the extended MNIST data set and the notMNIST data
set. We set up the experiments to simulate virtual drift, real drift, and
hybrid drift events, and we demonstrated how the adaptability of our CNNELM
works. Our proposed method gives better accuracy, computational scalability,
and concept drift adaptability compared to the regular ELM and CNN. Further
research is still required to study the optimum parameters and to use more
varied image data sets.
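The paper's AOS-ELM update and ensemble concatenation are not reproduced here; the sketch below only shows the basic ELM building block assumed to sit on top of the convolutional features: a random hidden projection followed by a closed-form, regularised least-squares output layer. X would be flattened CNN feature vectors and Y one-hot labels.

    import numpy as np

    class SimpleELM:
        """Basic Extreme Learning Machine: random hidden layer, closed-form
        output weights (no backpropagation)."""
        def __init__(self, n_hidden=512, seed=0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def fit(self, X, Y, reg=1e-3):
            d = X.shape[1]
            self.W = self.rng.standard_normal((d, self.n_hidden))
            self.b = self.rng.standard_normal(self.n_hidden)
            H = np.tanh(X @ self.W + self.b)            # random feature map
            A = H.T @ H + reg * np.eye(self.n_hidden)   # regularised normal equations
            self.beta = np.linalg.solve(A, H.T @ Y)     # output weights in closed form
            return self

        def predict(self, X):
            return np.tanh(X @ self.W + self.b) @ self.beta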
Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
Comments: 17 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
We propose a technique for making Convolutional Neural Network (CNN)-based
models more transparent by visualizing the regions of input that are
“important” for predictions from these models – or visual explanations.
Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
uses the class-specific gradient information flowing into the final
convolutional layer of a CNN to produce a coarse localization map of the
important regions in the image. Grad-CAM is a strict generalization of Class
Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
broadly applicable to any CNN-based architecture. We also show how Grad-CAM
may be combined with existing pixel-space visualizations to create a
high-resolution class-discriminative visualization (Guided Grad-CAM). We
generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
image classification, image captioning, and visual question answering (VQA)
models. In the context of image classification models, our visualizations (a)
lend insight into their failure modes showing that seemingly unreasonable
predictions have reasonable explanations, and (b) outperform pixel-space
gradient visualizations (Guided Backpropagation and Deconvolution) on the
ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
our visualizations expose the somewhat surprising insight that common CNN +
LSTM models can often be good at localizing discriminative input image regions
despite not being trained on grounded image-text pairs.
Finally, we design and conduct human studies to measure if Guided Grad-CAM
explanations help users establish trust in the predictions made by deep
networks. Interestingly, we show that Guided Grad-CAM helps untrained users
successfully discern a “stronger” deep network from a “weaker” one even when
both networks make identical predictions.
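The core Grad-CAM computation is compact. The sketch below assumes the last-layer activations and the class-score gradients have already been extracted from some framework (that extraction step is omitted), and only shows how they are combined into the coarse localization map:

    import numpy as np

    def grad_cam(feature_maps, grads):
        """Coarse localization map from the final conv layer.

        feature_maps: (K, H, W) activations of the last conv layer for one image.
        grads:        (K, H, W) gradient of the class score w.r.t. those activations.
        """
        weights = grads.mean(axis=(1, 2))                  # global-average-pooled gradients
        cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over channels
        cam = np.maximum(cam, 0.0)                         # ReLU: keep positive influence only
        if cam.max() > 0:
            cam = cam / cam.max()                          # normalise to [0, 1] for display
        return cam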
François Chollet
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an interpretation of Inception modules in convolutional neural
networks as being an intermediate step in-between regular convolution and the
recently introduced “separable convolution” operation. In this light, a
separable convolution can be understood as an Inception module with a maximally
large number of towers. This observation leads us to propose a novel deep
convolutional neural network architecture inspired by Inception, where
Inception modules have been replaced with separable convolutions. We show that
this architecture, dubbed Xception, slightly outperforms Inception V3 on the
ImageNet dataset (which Inception V3 was designed for), and significantly
outperforms Inception V3 on a larger image classification dataset comprising
350 million images and 17,000 classes. Since the Xception architecture has the
same number of parameters as Inception V3, the performance gains are not due to
increased capacity but rather to a more efficient use of model parameters.
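For reference, a naive loop-based sketch (stride 1, valid padding) of the depthwise separable convolution referred to above; the real Xception architecture stacks many such operations with batch normalisation and residual connections, which are not shown here.

    import numpy as np

    def separable_conv2d(x, depthwise_k, pointwise_w):
        """Depthwise separable convolution.

        x:            (C, H, W) input feature maps.
        depthwise_k:  (C, kh, kw) one spatial filter per input channel.
        pointwise_w:  (C_out, C) 1x1 convolution mixing the channels.
        """
        C, H, W = x.shape
        _, kh, kw = depthwise_k.shape
        Ho, Wo = H - kh + 1, W - kw + 1
        depthwise = np.zeros((C, Ho, Wo))
        for c in range(C):                       # spatial filtering, channel by channel
            for i in range(Ho):
                for j in range(Wo):
                    depthwise[c, i, j] = np.sum(x[c, i:i+kh, j:j+kw] * depthwise_k[c])
        # 1x1 pointwise convolution: mix channels at every spatial location
        return np.tensordot(pointwise_w, depthwise, axes=([1], [0]))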
Vina Ayumi, L.M. Rasdi Rere, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Comments: Accepted to be published at IEEE ICACSIS 2016. arXiv admin note: text overlap with arXiv:1610.01925
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The convolutional neural network (CNN) is one of the most prominent
architectures and algorithms in deep learning. It shows remarkable
improvements in the recognition and classification of objects. This method has
also proven to be very effective in a variety of computer vision and machine
learning problems. As with other deep learning methods, however, training the
CNN is interesting yet challenging. Recently, metaheuristic algorithms such as
the Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, and
Harmony Search have been used to optimize CNNs. In this paper, another type of
metaheuristic algorithm with a different strategy is proposed, namely
Microcanonical Annealing, to optimize the Convolutional Neural Network. The
performance of the proposed method is tested using the MNIST and CIFAR-10
datasets. Although the MNIST experiments indicate an increase in computation
time (1.02x – 1.38x), the proposed method can considerably enhance the
performance of the original CNN (up to 4.60\%). On the CIFAR-10 dataset, the
current state of the art is 96.53\% using fractional pooling, while the
proposed method achieves 99.14\%.
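The abstract does not spell out the optimization loop, so the following is only a generic Creutz-style microcanonical ("demon") annealing sketch applied to a weight vector; the Gaussian proposal, the demon budget, and its decay schedule are assumptions, not the authors' settings.

    import numpy as np

    def microcanonical_annealing(loss_fn, w, demon=1.0, steps=1000,
                                 step_size=0.01, decay=0.999, seed=0):
        """Creutz-style microcanonical annealing: a 'demon' energy buffer pays for
        uphill moves, collects energy from downhill moves, and is slowly drained
        to anneal the search."""
        rng = np.random.default_rng(seed)
        current = loss_fn(w)
        for _ in range(steps):
            proposal = w + step_size * rng.standard_normal(w.shape)
            delta = loss_fn(proposal) - current
            if delta <= demon:            # accept only if the demon can pay for it
                w, current = proposal, current + delta
                demon -= delta            # demon absorbs/releases the energy change
            demon *= decay                # annealing: gradually shrink the budget
        return w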
Xin Jin, Jingying Chi, Siwei Peng, Yulu Tian, Chaochen Ye, Xiaodong Li
Comments: To Appear in the Proceedings of the 8th International Conference on Wireless Communications and Signal Processing (WCSP), Yangzhou, China, 13-15 October, 2016. arXiv admin note: substantial text overlap with arXiv:1409.4842 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
In this paper we investigate the image aesthetics classification problem,
i.e., automatically classifying an image as of low or high aesthetic quality,
which is quite a challenging problem beyond image recognition. Deep
convolutional neural network (DCNN) methods have recently shown promising
results for image aesthetics assessment. Recently, a powerful Inception module
was proposed which shows very high performance in object classification.
However, the Inception module has not yet been considered for the image
aesthetics assessment problem. In this paper, we propose a novel DCNN
structure codenamed ILGNet for image aesthetics classification, which
introduces the Inception module and connects intermediate Local layers to the
Global layer for the output. In addition, we use GoogLeNet, an image
classification CNN pre-trained on the ImageNet dataset, and fine-tune our
connected local and global layers on the large-scale aesthetics assessment AVA
dataset. The experimental results show that the proposed ILGNet outperforms
the state-of-the-art results in image aesthetics assessment on the AVA
benchmark.
Samuel Albanie, Andrea Vedaldi
Comments: British Machine Vision Conference (BMVC) 2016
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Unlike computer vision systems, which require explicit supervision,
humans can learn facial expressions by observing people in their environment.
In this paper, we look at how similar capabilities could be developed in
machine vision. As a starting point, we consider the problem of relating facial
expressions to objectively measurable events occurring in videos. In
particular, we consider a gameshow in which contestants play to win significant
sums of money. We extract events affecting the game and corresponding facial
expressions objectively and automatically from the videos, obtaining large
quantities of labelled data for our study. We also develop, using benchmarks
such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial
expression recognition, showing that pre-training on face verification data can
be highly beneficial for this task. Then, we extend these models to use facial
expressions to predict events in videos and learn nameable expressions from
them. The dataset and emotion recognition models are available at
this http URL
Zhi Lu, Gustavo Carneiro, Neeraj Dhungel, Andrew P. Bradley
Comments: 5 Pages, ISBI 2017 Submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In mammography, the efficacy of computer-aided detection methods depends, in
part, on the robust localisation of micro-calcifications ($\mu$C). Currently,
the most effective methods are based on three steps: 1) detection of
individual $\mu$C candidates, 2) clustering of individual $\mu$C candidates,
and 3) classification of $\mu$C clusters. The second step is motivated both by
the need to reduce the number of false positive detections from the first step
and by the evidence that malignancy depends on a relatively large number of
$\mu$C detections within a certain area. In this paper, we propose a novel
approach to $\mu$C detection, consisting of the detection \emph{and}
classification of individual $\mu$C candidates, using shape and appearance
features and a cascade of boosting classifiers. The final step in our approach
then clusters the remaining individual $\mu$C candidates. The main advantage
of this approach lies in its ability to reject a significant number of false
positive $\mu$C candidates compared to previously proposed methods.
Specifically, on the INbreast dataset, we show that our approach has a true
positive rate (TPR) for individual $\mu$Cs of 40\% at one false positive per
image (FPI) and a TPR of 80\% at 10 FPI. These results are significantly more
accurate than the current state of the art, which has a TPR of less than 1\%
at one FPI and a TPR of 10\% at 10 FPI. Our results are competitive with the
state of the art at the subsequent stage of detecting clusters of $\mu$Cs.
Hilde Kuehne, Alexander Richard, Juergen Gall
Comments: 27 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present an approach for weakly supervised learning of human actions from
video transcriptions. Our system is based on the idea that, given a sequence of
input data and a transcript, i.e., a list of the actions in the order they
occur in the video, it is possible to infer the actions within the video
stream, and thus,
learn the related action models without the need for any frame-based
annotation. Starting from the transcript information at hand, we split the
given data sequences uniformly based on the number of expected actions. We then
learn action models for each class by maximizing the probability that the
training video sequences are generated by the action models given the sequence
order as defined by the transcripts. The learned model can be used to
temporally segment an unseen video with or without transcript. We evaluate our
approach on four distinct activity datasets, namely Hollywood Extended, MPII
Cooking, Breakfast and CRIM13. We show that our system is able to align the
scripted actions with the video data and that the learned models localize and
classify actions competitively in comparison to models trained with full
supervision, i.e. with frame level annotations, and that they outperform any
current state-of-the-art approach for aligning transcripts with video data.
Patrick Ferdinand Christ, Mohamed Ezzeldin A. Elshaer, Florian Ettlinger, Sunil Tatavarty, Marc Bickel, Patrick Bilic, Markus Rempfler, Marco Armbruster, Felix Hofmann, Melvin D'Anastasi, Wieland H. Sommer, Seyed-Ahmad Ahmadi, Bjoern H. Menze
Comments: Accepted at MICCAI 2016. Source code available on this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation of the liver and its lesions is an important step
towards deriving quantitative biomarkers for accurate clinical diagnosis and
computer-aided decision support systems. This paper presents a method to
automatically segment the liver and lesions in abdominal CT images using cascaded
fully convolutional neural networks (CFCNs) and dense 3D conditional random
fields (CRFs). We train and cascade two FCNs for a combined segmentation of the
liver and its lesions. In the first step, we train an FCN to segment the liver
as ROI input for a second FCN. The second FCN solely segments lesions from the
predicted liver ROIs of step 1. We refine the segmentations of the CFCN using a
dense 3D CRF that accounts for both spatial coherence and appearance. CFCN
models were trained in a 2-fold cross-validation on the abdominal CT dataset
3DIRCAD comprising 15 hepatic tumor volumes. Our results show that CFCN-based
semantic liver and lesion segmentation achieves Dice scores over 94% for liver
with computation times below 100s per volume. We experimentally demonstrate the
robustness of the proposed method as a decision support system with a high
accuracy and speed for usage in daily clinical routine.
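The cascade itself is straightforward to express. In the sketch below, liver_model and lesion_model stand in for the two trained FCNs (assumed callables returning per-voxel probabilities); the dense 3D CRF refinement step is not shown.

    import numpy as np

    def cascaded_segmentation(volume, liver_model, lesion_model, threshold=0.5):
        """Cascade idea: segment the liver first, then segment lesions only
        inside the predicted liver ROI."""
        liver_prob = liver_model(volume)
        liver_mask = liver_prob > threshold
        roi = volume * liver_mask                      # restrict stage 2 to the liver ROI
        lesion_prob = lesion_model(roi)
        lesion_mask = (lesion_prob > threshold) & liver_mask
        return liver_mask, lesion_mask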
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The rise of multi-million-item dataset initiatives has enabled data-hungry
machine learning algorithms to reach near-human semantic classification at
tasks such as object and scene recognition. Here we describe the Places
Database, a repository of 10 million scene photographs, labeled with scene
semantic categories and attributes, comprising a quasi-exhaustive list of the
types of environments encountered in the world. Using state of the art
Convolutional Neural Networks, we provide impressive baseline performances at
scene classification. With its high coverage and high diversity of exemplars,
the Places Database offers an ecosystem to guide future progress on currently
intractable visual recognition problems.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Increasing the scalability of machine learning to handle a big volume of data
is a challenging task, and the scale-up approach has some limitations. In this
paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
classifier level. The map process is the CNN-ELM training on a certain
partition of the data; it involves many CNN-ELM models that can be trained
asynchronously. The reduce process is the averaging of all CNN-ELM weights as
the final training result. This approach can save a lot of training time
compared to training a single CNN-ELM model alone, and it also increases the
scalability of machine learning by combining scale-out and scale-up
approaches. We verified our method in experiments on the extended MNIST data
set and the notMNIST data set. However, the approach has some drawbacks:
additional iteration learning parameters that need to be chosen carefully and
a training data distribution that needs to be selected carefully. Further
research using more complex image data sets is required.
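A minimal sketch of the scale-out scheme described above: the map step trains models on data partitions in parallel and the reduce step averages their weights. Here train_model is a hypothetical per-partition training routine returning a weight array, not a function from the paper.

    import numpy as np
    from multiprocessing import Pool

    def train_partition(partition):
        """Map step: train one model on its own data partition and return its
        weights (train_model is a hypothetical helper)."""
        X, y = partition
        return train_model(X, y)

    def scale_out_train(partitions, processes=4):
        """Reduce step: average the weights of all independently trained models."""
        with Pool(processes) as pool:
            weights = pool.map(train_partition, partitions)
        return np.mean(np.stack(weights), axis=0)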
Dan Hendrycks, Kevin Gimpel
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We consider the two related problems of detecting if an example is
misclassified or out-of-distribution. We present a simple baseline that
utilizes probabilities from softmax distributions. Correctly classified
examples tend to have greater maximum softmax probabilities than erroneously
classified and out-of-distribution examples, allowing for their detection. We
assess performance by defining several tasks in computer vision, natural
language processing, and automatic speech recognition, showing the
effectiveness of this baseline across all. We then show the baseline can
sometimes be surpassed, demonstrating the room for future research on these
underexplored detection tasks.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In the big data era, data are generated continuously and their distribution
may keep changing over time. These challenges in online data streams are known
as concept drift. In this paper, we propose the Adaptive Convolutional ELM
method (ACNNELM) as an enhancement of the Convolutional Neural Network (CNN)
with a hybrid Extreme Learning Machine (ELM) model plus adaptive capability.
This method is aimed at handling concept drift. We enhance the CNN as a
convolutional hierarchical feature representation learner combined with
Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an
Adaptive OS-ELM (AOS-ELM) for concept drift adaptability at the classifier
level (named ACNNELM-1) and matrix concatenation ensembles for concept drift
adaptability at the ensemble level (named ACNNELM-2). Our proposed Adaptive
CNNELM is flexible, working well at both the classifier level and the ensemble
level, while most current methods only work at one of the two levels.
We verified our method on the extended MNIST data set and the notMNIST data
set. We set up the experiments to simulate virtual drift, real drift, and
hybrid drift events, and we demonstrated how the adaptability of our CNNELM
works. Our proposed method gives better accuracy, computational scalability,
and concept drift adaptability compared to the regular ELM and CNN. Further
research is still required to study the optimum parameters and to use more
varied image data sets.
Sandra Castellanos-Paez (LIG Laboratoire d'Informatique de Grenoble), Damien Pellier (LIG Laboratoire d'Informatique de Grenoble), Humbert Fiorino (LIG Laboratoire d'Informatique de Grenoble), Sylvie Pesty (LIG Laboratoire d'Informatique de Grenoble)
Comments: Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2016), Jul 2016, Grenoble, France. 2016
Subjects: Artificial Intelligence (cs.AI)
Planning has achieved significant progress in recent years. Among the various
approaches to scale up plan synthesis, the use of macro-actions has been widely
explored. As a first stage towards the development of a solution to learn
on-line macro-actions, we propose an algorithm to identify useful macro-actions
based on data mining techniques. The integration of these learned
macro-actions into the planning search shows significant improvements on four
classical planning benchmarks.
Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
Comments: 17 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
We propose a technique for making Convolutional Neural Network (CNN)-based
models more transparent by visualizing the regions of input that are
“important” for predictions from these models – or visual explanations.
Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
uses the class-specific gradient information flowing into the final
convolutional layer of a CNN to produce a coarse localization map of the
important regions in the image. Grad-CAM is a strict generalization of Class
Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
broadly applicable to any CNN-based architecture. We also show how Grad-CAM
may be combined with existing pixel-space visualizations to create a
high-resolution class-discriminative visualization (Guided Grad-CAM). We
generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
image classification, image captioning, and visual question answering (VQA)
models. In the context of image classification models, our visualizations (a)
lend insight into their failure modes showing that seemingly unreasonable
predictions have reasonable explanations, and (b) outperform pixel-space
gradient visualizations (Guided Backpropagation and Deconvolution) on the
ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
our visualizations expose the somewhat surprising insight that common CNN +
LSTM models can often be good at localizing discriminative input image regions
despite not being trained on grounded image-text pairs.
Finally, we design and conduct human studies to measure if Guided Grad-CAM
explanations help users establish trust in the predictions made by deep
networks. Interestingly, we show that Guided Grad-CAM helps untrained users
successfully discern a “stronger” deep network from a “weaker” one even when
both networks make identical predictions.
Fahim T. Imam
Comments: 13 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
This paper presents a systematic survey on existing literature and seminal
works relevant to the application of ontologies in different aspects of Cloud
computing. Our hypothesis is that ontologies, along with their reasoning
capabilities, can have a significant impact on improving various aspects of
Cloud computing. Ontologies can promote intelligent decision support
mechanisms for various Cloud based services. They can also provide effective
interoperability among Cloud based systems and resources. This survey can
promote a comprehensive understanding of the roles and significance of
ontologies within the overall domain of Cloud computing. This project can also
potentially form the basis of a new research area and new possibilities for
both the ontology and Cloud computing communities.
Danijar Hafner
Comments: Bachelor’s thesis
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Using current reinforcement learning methods, it has recently become possible
to learn to play unknown 3D games from raw pixels. In this work, we study the
challenges that arise in such complex environments, and summarize current
methods to approach them. We choose a task within the Doom game that has not
been approached yet. The goal for the agent is to fight enemies in a 3D world
consisting of five rooms. We train the DQN and LSTM-A3C algorithms on this
task. Results show that both algorithms learn sensible policies, but fail to
achieve high scores given the amount of training. We provide insights into the
learned behavior, which can serve as a valuable starting point for further
research in the Doom domain.
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The rise of multi-million-item dataset initiatives has enabled data-hungry
machine learning algorithms to reach near-human semantic classification at
tasks such as object and scene recognition. Here we describe the Places
Database, a repository of 10 million scene photographs, labeled with scene
semantic categories and attributes, comprising a quasi-exhaustive list of the
types of environments encountered in the world. Using state of the art
Convolutional Neural Networks, we provide impressive baseline performances at
scene classification. With its high coverage and high diversity of exemplars,
the Places Database offers an ecosystem to guide future progress on currently
intractable visual recognition problems.
Lei Tai, Ming Liu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Exploration in an unknown environment is the core functionality for mobile
robots. Learning-based exploration methods, including convolutional neural
networks, provide excellent strategies without human-designed logic for
feature extraction. However, conventional supervised learning algorithms
inevitably require a lot of effort to label the datasets, and scenes not
included in the training set are mostly unrecognized. We propose a deep
reinforcement learning method for the exploration of mobile robots in an indoor
environment with the depth information from an RGB-D sensor only. Based on the
Deep Q-Network framework, the raw depth image is taken as the only input to
estimate the Q values corresponding to all moving commands. The training of the
network weights is end-to-end. In arbitrarily constructed simulation
environments, we show that the robot can be quickly adapted to unfamiliar
scenes without any man-made labeling. In addition, an analysis of the
receptive fields of the feature representations shows that deep reinforcement
learning motivates the convolutional networks to estimate the traversability
of the scenes. The test results are compared with exploration strategies based
separately on deep learning or reinforcement learning. Even though the model
is trained only in the simulated environment, experimental results in a
real-world environment demonstrate that the cognitive ability of the robot
controller is dramatically improved compared with the supervised method. We
believe this is the first time that raw sensor information has been used to
build a cognitive exploration strategy for mobile robots through end-to-end
deep reinforcement learning.
Tim Althoff, Ryen W. White, Eric Horvitz
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Physical activity helps people maintain a healthy weight and reduces the risk
for several chronic diseases. Although this knowledge is widely recognized,
adults and children in many countries around the world do not get recommended
amounts of physical activity. While many interventions are found to be
ineffective at increasing physical activity or reaching inactive populations,
there have been anecdotal reports of increased physical activity due to novel
mobile games that embed game play in the physical world. The most recent and
salient example of such a game is Pokémon Go, which has reportedly reached
tens of millions of users in the US and worldwide.
We study the effect of Pokémon Go on physical activity through a
combination of signals from large-scale corpora of wearable sensor data and
search engine logs for 32 thousand users over a period of three months.
Pokémon Go players are identified through search engine queries and activity
is measured through accelerometry. We find that Pokémon Go leads to
significant increases in physical activity over a period of 30 days, with
particularly engaged users (i.e., those making multiple search queries for
details about game usage) increasing their activity by 1473 steps a day on
average, a more than 25% increase compared to their prior activity level
($p<10^{-15}$). In the short time span of the study, we estimate that Pokémon
Go has added a total of 144 billion steps to US physical activity.
Furthermore, Pokémon Go has been able to increase physical activity across men
and women of all ages, weight status, and prior activity levels, showing that
this form of game leads to increases in physical activity with significant
implications for public health. We find that Pokémon Go is able to reach
low-activity populations, while all four leading mobile health apps studied in
this work largely draw from an already very active population.
Özlem Çetinoğlu, Sarah Schulz, Ngoc Thang Vu
Comments: Will appear in the Proceedings of the 2nd Workshop on Computational Approaches to Linguistic Code Switching @EMNLP, 2016
Subjects: Computation and Language (cs.CL)
This paper addresses challenges of Natural Language Processing (NLP) on
non-canonical multilingual data in which two or more languages are mixed. It
refers to code-switching, which has become more popular in our daily life and
therefore receives an increasing amount of attention from the research
community. We report our experience, which covers not only core NLP tasks such
as normalisation, language identification, language modelling, part-of-speech
tagging and dependency parsing but also more downstream ones such as machine
translation and automatic speech recognition. We highlight and discuss the key
problems for each of the tasks with supporting examples from different language
pairs and relevant previous work.
Marta R. Costa-jussà, Carlos Escolano
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)
Morphologically unbalanced language pairs remain a big challenge in the
context of machine translation. In this paper, we propose to decouple machine
translation from morphology generation in order to better deal with the
problem. We investigate morphology simplification with a reasonable trade-off
between expected gain and generation complexity.
For the Chinese-Spanish task, the optimum morphological simplification is in
gender and number. For this purpose, we design a new classification
architecture which, compared to other standard machine learning techniques,
obtains the best results. The proposed neural-based architecture consists of
several layers: an embedding layer, a convolutional layer followed by a
recurrent neural network, and, finally, sigmoid and softmax output layers. We obtain
classification results over 98% accuracy in gender classification, over 93% in
number classification, and an overall translation improvement of 0.7 METEOR.
Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault
Comments: to appear in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Subjects: Computation and Language (cs.CL)
Current methods for automatically evaluating grammatical error correction
(GEC) systems rely on gold-standard references. However, these methods suffer
from penalizing grammatical edits that are correct but not in the gold
standard. We show that reference-less grammaticality metrics correlate very
strongly with human judgments and are competitive with the leading
reference-based evaluation metrics. By interpolating both methods, we achieve
state-of-the-art correlation with human judgments. Finally, we show that GEC
metrics are much more reliable when they are calculated at the sentence level
instead of the corpus level. We have set up a CodaLab site for benchmarking GEC
output using a common dataset and different evaluation metrics.
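The interpolation itself can be as simple as a convex combination. The sketch below assumes both component metrics are normalised to [0, 1]; the mixing weight alpha is an illustrative parameter that would be tuned against human judgments.

    def interpolated_gec_score(reference_score, grammaticality_score, alpha=0.5):
        """Convex combination of a reference-based metric and a reference-less
        grammaticality metric (alpha is an illustrative mixing weight)."""
        return alpha * reference_score + (1.0 - alpha) * grammaticality_score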
Fahim T. Imam
Comments: 13 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
This paper presents a systematic survey on existing literature and seminal
works relevant to the application of ontologies in different aspects of Cloud
computing. Our hypothesis is that ontologies, along with their reasoning
capabilities, can have a significant impact on improving various aspects of
Cloud computing. Ontologies can promote intelligent decision support
mechanisms for various Cloud based services. They can also provide effective
interoperability among Cloud based systems and resources. This survey can
promote a comprehensive understanding of the roles and significance of
ontologies within the overall domain of Cloud computing. This project can also
potentially form the basis of a new research area and new possibilities for
both the ontology and Cloud computing communities.
Jiejie Wang, Bin Liu
Comments: 6 pages, 3 figures, conference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper proposes a Bayesian modeling approach to address the problem of
online fault-tolerant dynamic event region detection in wireless sensor
networks. In our model every network node is associated with a virtual
community and a trust index, which quantitatively measures the trustworthiness
of this node in its community. If a sensor node’s trust value is smaller than a
threshold, it suggests that this node has encountered a fault and thus its
sensor reading cannot be trusted at this moment. This concept of sensor node
trust distinguishes our model from other alternatives, e.g., Markov random
fields. The practical issues, including spatiotemporal correlations of
neighboring nodes' sensor readings, the presence of sensor faults, and the
requirement of online processing, are linked together by the concept of trust
and are all taken into account in the modeling stage. Based on the proposed model, the trust
value of each node is updated online by a particle filter algorithm upon the
arrival of new observations. The decision on whether a node is located in the
event region is made based upon the current estimate of this node’s trust
value. Experimental results demonstrate that the proposed solution can provide
strikingly better performance than existing methods in terms of error rate in
detecting the event region.
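For orientation, a generic bootstrap particle-filter step for a node's scalar trust value is sketched below; the random-walk dynamics, the process noise, and likelihood_fn (an assumed sensor model p(observation | trust)) are illustrative choices, not the paper's specific model.

    import numpy as np

    def particle_filter_trust_update(particles, weights, observation,
                                     likelihood_fn, process_noise=0.05, seed=0):
        """One bootstrap particle-filter step for a trust value in [0, 1].

        particles: (N,) samples of the trust value; weights: (N,) importance
        weights summing to one."""
        rng = np.random.default_rng(seed)
        # Propagate: random-walk dynamics on the trust value
        particles = np.clip(
            particles + process_noise * rng.standard_normal(particles.shape), 0, 1)
        # Weight by how well each particle explains the new observation
        weights = weights * likelihood_fn(observation, particles)
        weights /= weights.sum()
        # Resample to avoid weight degeneracy
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))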
Vincenzo De Florio
Comments: Revised version of Technical Report ESAT/ACCA/1997/3, ESAT Dept., University of Leuven, Belgium
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This document describes a class of C functions implementing a distributed
software voting mechanism for EPX or similar message passing multi-threaded
environments. Such a tool may be used, for example, to set up a restoring
organ, i.e., an NMR (N-modular redundant) system with N voters. In order to
describe the tool we start by defining its basic building block, the voter. A
voter is defined as a software module connected to one user module and to a
farm of fellow voters arranged into a clique. By means of the functions in the
class the user module is able: to create a static “picture” of the voting farm,
needed for the set up of the clique; to instantiate the local voter; to send
input or control messages to that voter. No interlocutor is needed other than
the local voter. The other user modules are supposed to create coherent
pictures and instances of voters on other nodes of the machine and to manage
consistently the task of their local intermediary. All technicalities
concerning the set up of the clique and the exchange of messages between the
voters are completely transparent to the user module. In the following the
basic functionalities of the VotingFarm class will be discussed, namely how to
set up a “passive farm”, or a non-alive topological representation of a
yet-to-be-activated voting farm; how to initiate the voting farm; how to
control the farm.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Increasing the scalability of machine learning to handle a big volume of data
is a challenging task, and the scale-up approach has some limitations. In this
paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
classifier level. The map process is the CNN-ELM training on a certain
partition of the data; it involves many CNN-ELM models that can be trained
asynchronously. The reduce process is the averaging of all CNN-ELM weights as
the final training result. This approach can save a lot of training time
compared to training a single CNN-ELM model alone, and it also increases the
scalability of machine learning by combining scale-out and scale-up
approaches. We verified our method in experiments on the extended MNIST data
set and the notMNIST data set. However, the approach has some drawbacks:
additional iteration learning parameters that need to be chosen carefully and
a training data distribution that needs to be selected carefully. Further
research using more complex image data sets is required.
Roly Perera, Deepak Garg, James Cheney
Comments: in Proceedings of 27th International Conference on Concurrency Theory (CONCUR 2016)
Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)
We offer a lattice-theoretic account of dynamic slicing for the $\pi$-calculus,
building on prior work in the sequential setting. For any run of a concurrent
program, we exhibit a Galois connection relating forward slices of the start
configuration to backward slices of the end configuration. We prove that, up to
lattice isomorphism, the same Galois connection arises for any causally
equivalent execution, allowing an efficient concurrent implementation of
slicing via a standard interleaving semantics. Our approach has been formalised
in the dependently-typed language Agda.
Hyeokjun Choe, Seil Lee, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon
Comments: 9 pages, 7 figures
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
In computer architecture, near-data processing (NDP) refers to augmenting the
memory or the storage with processing power so that it can process the data
stored therein, passing only the processed data upwards in the memory
hierarchy. By offloading the computational burden of the CPU and saving the need
for transferring raw data, NDP has a great potential in terms of accelerating
computation and reducing power consumption. Despite its potential, NDP had only
limited success until recently, mainly due to the performance mismatch in logic
and memory process technologies. Recently, there have been two major changes in
the game, making NDP more appealing than ever. The first is the success of deep
learning, which often requires frequent transfers of big data for training. The
second is the advent of NAND flash-based solid-state drives (SSDs) containing
multicore CPUs that can be used for data processing. In this paper, we evaluate
the potential of NDP for machine learning using a new SSD platform that allows
us to simulate in-storage processing (ISP) of machine learning workloads.
Although our platform, named ISP-ML, can execute various algorithms, this paper
focuses on the stochastic gradient descent (SGD) algorithm, which is the de
facto standard method for training deep neural networks. We implement and
compare three variants of SGD (synchronous, downpour, and elastic averaging)
using the ISP-ML platform, in which we exploit the multiple NAND channels for
implementing parallel SGD. In addition, we compare the performance of ISP
optimization and that of conventional in-host processing optimization. To the
best of our knowledge, this is one of the first attempts to apply NDP to the
optimization for machine learning.
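For orientation, highly simplified single-step sketches of two of the SGD variants mentioned above (downpour-style asynchronous updates and elastic averaging) follow; the learning rate and elastic coefficient are illustrative, and the channel-level in-storage parallelism of ISP-ML is not modelled here.

    import numpy as np

    def downpour_worker_step(w_local, grad, w_global, lr=0.01, push=True):
        """Downpour-style update: a worker applies its gradient locally and
        asynchronously pushes it to a shared parameter copy (simplified)."""
        w_local = w_local - lr * grad
        if push:
            w_global = w_global - lr * grad      # asynchronous gradient push
        return w_local, w_global

    def elastic_averaging_step(w_local, grad, w_center, lr=0.01, rho=0.1):
        """Elastic averaging: each worker is pulled toward a center variable,
        and the center drifts toward the workers."""
        w_local = w_local - lr * (grad + rho * (w_local - w_center))
        w_center = w_center + lr * rho * (w_local - w_center)
        return w_local, w_center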
Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)
Existing approaches to resource allocation for today's stochastic networks
are challenged to meet fast convergence and tolerable delay requirements. In
the era of data deluge and information explosion, the present paper leverages
online learning advances to facilitate stochastic resource allocation tasks. By
recognizing the central role of Lagrange multipliers, the underlying
constrained optimization problem is formulated as a machine learning task
involving both training and operational modes, with the goal of learning the
sought multipliers in a fast and efficient manner. To this end, an
order-optimal offline learning approach is developed first for batch training,
and it is then generalized to the online setting with a procedure termed
learn-and-adapt. The novel resource allocation protocol blends the benefits of
stochastic approximation and statistical learning to obtain low-complexity
online updates with learning errors close to the statistical accuracy limits,
while still preserving adaptation performance, which in the stochastic network
optimization context guarantees queue stability. Analysis and simulated tests
demonstrate that the proposed data-driven approach improves the delay and
convergence performance of existing resource allocation schemes.
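At the heart of such schemes is a stochastic update of the Lagrange multipliers. The generic dual-ascent step below is only meant to illustrate that role; the step size and the notion of constraint violation (e.g., a queue length change) are assumptions, not the paper's specific learn-and-adapt recursion.

    import numpy as np

    def dual_multiplier_step(lmbd, constraint_violation, step_size=0.1):
        """Stochastic dual ascent: increase a multiplier when its constraint is
        violated, decrease it otherwise, and keep it non-negative."""
        return np.maximum(lmbd + step_size * constraint_violation, 0.0)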
Nancy Lynch, Cameron Musco, Merav Parter
Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)
We initiate a line of investigation into biological neural networks from an
algorithmic perspective. We develop a simplified but biologically plausible
model for distributed computation in stochastic spiking neural networks and
study tradeoffs between computation time and network complexity in this model.
Our aim is to abstract real neural networks in a way that, while not capturing
all interesting features, preserves high-level behavior and allows us to make
biologically relevant conclusions.
In this paper, we focus on the important `winner-take-all’ (WTA) problem,
which is analogous to a neural leader election unit: a network consisting of
$n$ input neurons and $n$ corresponding output neurons must converge to a state
in which a single output corresponding to a firing input (the `winner’) fires,
while all other outputs remain silent. Neural circuits for WTA rely on
inhibitory neurons, which suppress the activity of competing outputs and drive
the network towards a converged state with a single firing winner. We attempt
to understand how the number of inhibitors used affects network convergence
time.
We show that it is possible to significantly outperform naive WTA
constructions through a more refined use of inhibition, solving the problem in
$O(\theta)$ rounds in expectation with just $O(\log^{1/\theta} n)$ inhibitors
for any $\theta$. An alternative construction gives convergence in
$O(\log^{1/\theta} n)$ rounds with $O(\theta)$ inhibitors. We complement these
upper bounds with our main technical contribution, a nearly matching lower
bound for networks using $\ge \log \log n$ inhibitors. Our lower bound uses
familiar indistinguishability and locality arguments from distributed computing
theory. It lets us derive a number of interesting conclusions about the
structure of any network solving WTA with good probability, and the use of
randomness and inhibition within such a network.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Increasing the scalability of machine learning to handle a big volume of data
is a challenging task, and the scale-up approach has some limitations. In this
paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
classifier level. The map process is the CNN-ELM training on a certain
partition of the data; it involves many CNN-ELM models that can be trained
asynchronously. The reduce process is the averaging of all CNN-ELM weights as
the final training result. This approach can save a lot of training time
compared to training a single CNN-ELM model alone, and it also increases the
scalability of machine learning by combining scale-out and scale-up
approaches. We verified our method in experiments on the extended MNIST data
set and the notMNIST data set. However, the approach has some drawbacks:
additional iteration learning parameters that need to be chosen carefully and
a training data distribution that needs to be selected carefully. Further
research using more complex image data sets is required.
Danijar Hafner
Comments: Bachelor’s thesis
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)
Using current reinforcement learning methods, it has recently become possible
to learn to play unknown 3D games from raw pixels. In this work, we study the
challenges that arise in such complex environments, and summarize current
methods to approach them. We choose a task within the Doom game that has not
been approached yet. The goal for the agent is to fight enemies in a 3D world
consisting of five rooms. We train the DQN and LSTM-A3C algorithms on this
task. Results show that both algorithms learn sensible policies, but fail to
achieve high scores given the amount of training. We provide insights into the
learned behavior, which can serve as a valuable starting point for further
research in the Doom domain.
Dan Alistarh, Jerry Li, Ryota Tomioka, Milan Vojnovic
Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Parallel implementations of stochastic gradient descent (SGD) have received
significant research attention, thanks to excellent scalability properties of
this algorithm, and to its efficiency in the context of training deep neural
networks. A fundamental barrier for parallelizing large-scale SGD is the fact
that the cost of communicating the gradient updates between nodes can be very
large. Consequently, lossy compression heuristics have been proposed, by which
nodes only communicate quantized gradients. Although effective in practice,
these heuristics do not always provably converge, and it is not clear whether
they are optimal.
In this paper, we propose Quantized SGD (QSGD), a family of compression
schemes which allow the compression of gradient updates at each node, while
guaranteeing convergence under standard assumptions. QSGD allows the user to
trade off compression and convergence time: it can communicate a sublinear
number of bits per iteration in the model dimension, and can achieve
asymptotically optimal communication cost. We complement our theoretical
results with empirical data, showing that QSGD can significantly reduce
communication cost, while being competitive with standard uncompressed
techniques on a variety of real tasks.
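A sketch of the kind of unbiased stochastic quantization QSGD builds on, with s quantization levels per coordinate; the actual encoding and communication format used by QSGD are not reproduced here, and s and the per-vector normalisation are shown only in their simplest form.

    import numpy as np

    def qsgd_quantize(g, s=4, seed=0):
        """Stochastically quantize a gradient vector to s levels per coordinate;
        the reconstruction is unbiased (its expectation equals g)."""
        rng = np.random.default_rng(seed)
        norm = np.linalg.norm(g)
        if norm == 0:
            return np.zeros_like(g)
        level = np.abs(g) / norm * s                  # position in [0, s]
        lower = np.floor(level)
        prob = level - lower                          # round up with this probability
        quantized = lower + (rng.random(g.shape) < prob)
        return np.sign(g) * quantized * norm / s      # unbiased reconstruction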
Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
Comments: 17 pages, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)
We propose a technique for making Convolutional Neural Network (CNN)-based
models more transparent by visualizing the regions of input that are
“important” for predictions from these models – or visual explanations.
Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
uses the class-specific gradient information flowing into the final
convolutional layer of a CNN to produce a coarse localization map of the
important regions in the image. Grad-CAM is a strict generalization of Class
Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
broadly applicable to any CNN-based architecture. We also show how Grad-CAM
may be combined with existing pixel-space visualizations to create a
high-resolution class-discriminative visualization (Guided Grad-CAM). We
generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
image classification, image captioning, and visual question answering (VQA)
models. In the context of image classification models, our visualizations (a)
lend insight into their failure modes showing that seemingly unreasonable
predictions have reasonable explanations, and (b) outperform pixel-space
gradient visualizations (Guided Backpropagation and Deconvolution) on the
ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
our visualizations expose the somewhat surprising insight that common CNN +
LSTM models can often be good at localizing discriminative input image regions
despite not being trained on grounded image-text pairs.
Finally, we design and conduct human studies to measure if Guided Grad-CAM
explanations help users establish trust in the predictions made by deep
networks. Interestingly, we show that Guided Grad-CAM helps untrained users
successfully discern a “stronger” deep network from a “weaker” one even when
both networks make identical predictions.
Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In the big data era, data are generated continuously and their distribution
may keep changing over time. These challenges in online data streams are known
as concept drift. In this paper, we propose the Adaptive Convolutional ELM
method (ACNNELM) as an enhancement of the Convolutional Neural Network (CNN)
with a hybrid Extreme Learning Machine (ELM) model plus adaptive capability.
This method is aimed at handling concept drift. We enhance the CNN as a
convolutional hierarchical feature representation learner combined with
Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an
Adaptive OS-ELM (AOS-ELM) for concept drift adaptability at the classifier
level (named ACNNELM-1) and matrix concatenation ensembles for concept drift
adaptability at the ensemble level (named ACNNELM-2). Our proposed Adaptive
CNNELM is flexible, working well at both the classifier level and the ensemble
level, while most current methods only work at one of the two levels.
We verified our method on the extended MNIST data set and the notMNIST data
set. We set up the experiments to simulate virtual drift, real drift, and
hybrid drift events, and we demonstrated how the adaptability of our CNNELM
works. Our proposed method gives better accuracy, computational scalability,
and concept drift adaptability compared to the regular ELM and CNN. Further
research is still required to study the optimum parameters and to use more
varied image data sets.
Ling Zhong, Jason T. L. Wang
Comments: 26 pages, 3 figures
Subjects: Genomics (q-bio.GN); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)
MicroRNAs (miRNAs) are non-coding RNAs with approximately 22 nucleotides (nt)
that are derived from precursor molecules. These precursor molecules or
pre-miRNAs often fold into stem-loop hairpin structures. However, a large
number of sequences with pre-miRNA-like hairpins can be found in genomes. It is
a challenge to distinguish the real pre-miRNAs from other hairpin sequences
with similar stem-loops (referred to as pseudo pre-miRNAs). Several
computational methods have been developed to tackle this challenge. In this
paper we propose a new method, called MirID, for identifying and classifying
microRNA precursors. We collect 74 features from the sequences and secondary
structures of pre-miRNAs; some of these features are taken from our previous
studies on non-coding RNA prediction while others were suggested in the
literature. We develop a combinatorial feature mining algorithm to identify
suitable feature sets. These feature sets are then used to train support vector
machines to obtain classification models, based on which a classifier ensemble
is constructed. Finally, we use an AdaBoost algorithm to further enhance the
accuracy of the classifier ensemble. Experimental results on a variety of
species demonstrate the good performance of the proposed method, and its
superiority over existing tools.
Hyeokjun Choe, Seil Lee, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon
Comments: 9 pages, 7 figures
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
In computer architecture, near-data processing (NDP) refers to augmenting the
memory or the storage with processing power so that it can process the data
stored therein, passing only the processed data upwards in the memory
hierarchy. By offloading the computational burden of the CPU and saving the need
for transferring raw data, NDP has a great potential in terms of accelerating
computation and reducing power consumption. Despite its potential, NDP had only
limited success until recently, mainly due to the performance mismatch in logic
and memory process technologies. Recently, there have been two major changes in
the game, making NDP more appealing than ever. The first is the success of deep
learning, which often requires frequent transfers of big data for training. The
second is the advent of NAND flash-based solid-state drives (SSDs) containing
multicore CPUs that can be used for data processing. In this paper, we evaluate
the potential of NDP for machine learning using a new SSD platform that allows
us to simulate in-storage processing (ISP) of machine learning workloads.
Although our platform, named ISP-ML, can execute various algorithms, this paper
focuses on the stochastic gradient descent (SGD) algorithm, which is the de
facto standard method for training deep neural networks. We implement and
compare three variants of SGD (synchronous, downpour, and elastic averaging)
using the ISP-ML platform, in which we exploit the multiple NAND channels for
implementing parallel SGD. In addition, we compare the performance of ISP
optimization and that of conventional in-host processing optimization. To the
best of our knowledge, this is one of the first attempts to apply NDP to the
optimization for machine learning.
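For intuition about the elastic-averaging variant mentioned above, here is a
minimal single-process sketch in which a few simulated workers (standing in for
NAND channels) run local SGD on a toy least-squares problem while being
elastically pulled toward a shared center variable; the loss, worker count, and
step sizes are illustrative assumptions, not the ISP-ML implementation.
```python
# Hedged sketch of elastic averaging SGD across several simulated workers.
import numpy as np

def grad(w, data):
    # gradient of the least-squares loss ||Xw - y||^2 / n
    X, y = data
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
d, n_workers, lr, rho = 10, 4, 0.05, 0.1
shards = [(rng.normal(size=(100, d)), rng.normal(size=100)) for _ in range(n_workers)]

center = np.zeros(d)                        # shared "host-side" center variable
workers = [np.zeros(d) for _ in range(n_workers)]

for step in range(200):
    for i in range(n_workers):
        # local SGD step plus an elastic pull toward the center variable
        workers[i] -= lr * (grad(workers[i], shards[i]) + rho * (workers[i] - center))
    # the center moves toward the average of the workers
    center += lr * rho * sum(w - center for w in workers)
```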
Samuli Laine, Timo Aila
Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)
In this paper, we present a simple and efficient method for training deep
neural networks in a semi-supervised setting where only a small portion of
training data is labeled. We introduce temporal ensembling, where we form a
consensus prediction of the unknown labels under multiple instances of the
network-in-training on different epochs, and most importantly, under different
regularization and input augmentation conditions. This ensemble prediction can
be expected to be a better predictor for the unknown labels than the output of
the network at the most recent training epoch, and can thus be used as a target
for training. Using our method, we set new records for two standard
semi-supervised learning benchmarks, reducing the classification error rate
from 18.63% to 12.89% in CIFAR-10 with 4000 labels and from 18.44% to 6.83% in
SVHN with 500 labels.
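A minimal sketch of the per-epoch target accumulation that temporal ensembling
relies on, with the array shapes and variable names chosen here for
illustration:
```python
# Hedged sketch of temporal ensembling's target update: keep an exponential
# moving average Z of each sample's predictions across epochs and use the
# bias-corrected average as the training target for the unsupervised loss.
import numpy as np

def update_ensemble_targets(Z, z_epoch, alpha, epoch):
    """Z: accumulated predictions (N x C); z_epoch: this epoch's network
    outputs (N x C); alpha: momentum; epoch: 0-based epoch index."""
    Z = alpha * Z + (1.0 - alpha) * z_epoch
    targets = Z / (1.0 - alpha ** (epoch + 1))   # startup bias correction
    return Z, targets
```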
Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)
Existing approaches to resource allocation in today's stochastic networks
struggle to meet fast convergence and tolerable delay requirements. In
the era of data deluge and information explosion, the present paper leverages
online learning advances to facilitate stochastic resource allocation tasks. By
recognizing the central role of Lagrange multipliers, the underlying
constrained optimization problem is formulated as a machine learning task
involving both training and operational modes, with the goal of learning the
sought multipliers in a fast and efficient manner. To this end, an
order-optimal offline learning approach is developed first for batch training,
and it is then generalized to the online setting with a procedure termed
learn-and-adapt. The novel resource allocation protocol combines the benefits of
stochastic approximation and statistical learning to obtain low-complexity
online updates with learning errors close to the statistical accuracy limits,
while still preserving adaptation performance, which in the stochastic network
optimization context guarantees queue stability. Analysis and simulated tests
demonstrate that the proposed data-driven approach improves the delay and
convergence performance of existing resource allocation schemes.
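The multiplier learning referred to above is typically built on a stochastic
dual update; a generic form (shown here as an illustration, not necessarily the
exact learn-and-adapt recursion) is
```latex
% Generic stochastic dual (sub)gradient step for learning Lagrange multipliers:
% \lambda_t are the multipliers, x_t the allocation, s_t the random network state,
% g(.) the instantaneous constraint violation, \mu the step size, and
% [.]^+ projection onto the nonnegative orthant.
\[
  \boldsymbol{\lambda}_{t+1}
  = \big[\, \boldsymbol{\lambda}_t + \mu \, \mathbf{g}(\mathbf{x}_t, \mathbf{s}_t) \,\big]^{+}
\]
```
In the queueing interpretation common in stochastic network optimization,
lambda_t / mu plays the role of a virtual queue length, which is why learning
the multipliers accurately is tied to queue stability.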
Dan Hendrycks, Kevin Gimpel
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)
We consider the two related problems of detecting if an example is
misclassified or out-of-distribution. We present a simple baseline that
utilizes probabilities from softmax distributions. Correctly classified
examples tend to have greater maximum softmax probabilities than erroneously
classified and out-of-distribution examples, allowing for their detection. We
assess performance by defining several tasks in computer vision, natural
language processing, and automatic speech recognition, showing the
effectiveness of this baseline across all. We then show the baseline can
sometimes be surpassed, demonstrating the room for future research on these
underexplored detection tasks.
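Since the baseline amounts to thresholding the maximum softmax probability, a
minimal sketch (with made-up logits and an arbitrary threshold) looks like
this:
```python
# Hedged sketch of the maximum-softmax-probability baseline: score each example
# by its largest softmax probability and flag low scores as likely
# misclassified or out-of-distribution.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits):
    return softmax(logits).max(axis=1)

logits = np.random.randn(5, 10)      # stand-in network outputs (5 examples, 10 classes)
scores = msp_score(logits)
flagged = scores < 0.5               # illustrative threshold, tuned in practice
```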
Seyed Rasoul Etesami, Walid Saad, Narayan Mandayam, H. Vincent Poor
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Learning (cs.LG); Systems and Control (cs.SY)
In this paper, the problem of smart grid energy management under stochastic
dynamics is investigated. In the considered model, at the demand side, it is
assumed that customers can act as prosumers who own renewable energy sources
and can both produce and consume energy. Due to the coupling between the
prosumers’ decisions and the stochastic nature of renewable energy, the
interaction among prosumers is formulated as a stochastic game, in which each
prosumer seeks to maximize its payoff, in terms of revenues, by controlling its
energy consumption and demand. In particular, the subjective behavior of
prosumers is explicitly reflected in their payoff functions using prospect
theory, a powerful framework that allows modeling real-life human choices. For
this prospect-based stochastic game, it is shown that there always exists a
stationary Nash equilibrium where the prosumers’ trading policies in the
equilibrium are independent of the time and their histories of the play.
Moreover, a novel distributed algorithm with no information sharing among
prosumers is proposed and shown to converge to an $\epsilon$-Nash equilibrium.
On the other hand, at the supply side, the interaction between the utility
company and the prosumers is formulated as an online optimization problem in
which the utility company’s goal is to learn its optimal energy allocation
rules. For this case, it is shown that such an optimization problem admits a
no-regret algorithm, meaning that regardless of the actual outcome of the game
among the prosumers, the utility company can follow a strategy that mitigates
its allocation costs as if it knew the entire demand market a priori.
Simulation results show the convergence of the proposed algorithms to their
predicted outcomes and present new insights resulting from prospect theory that
contribute toward more efficient energy management in smart grids.
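For readers unfamiliar with prospect theory, a standard value function of the
kind used to model such subjective payoffs is sketched below; the exponents and
loss-aversion coefficient are the classic Kahneman-Tversky estimates, not
parameters taken from this paper.
```python
# Hedged sketch of a prospect-theory value function: payoffs are measured
# relative to a reference point, gains are diminished, and losses are weighted
# more heavily (loss aversion).
def pt_value(payoff, reference=0.0, alpha=0.88, beta=0.88, lam=2.25):
    x = payoff - reference
    return x ** alpha if x >= 0 else -lam * (-x) ** beta
```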
Oskari Tervo, Antti Tölli, Markku Juntti, Le-Nam Tran
Comments: Submitted for possible publication, 32 pages, 11 figures
Subjects: Information Theory (cs.IT)
This paper proposes energy-efficient coordinated beamforming strategies for
a multi-cell multi-user multiple-input single-output (MISO) system. We consider a
practical power consumption model, where part of the consumed power depends on
the base station or user specific data rates due to coding, decoding and
backhaul. This is different from the existing approaches where the base station
power consumption has been assumed to be a convex or linear function. Two
optimization criteria are considered, namely network energy efficiency
maximization and weighted sum energy efficiency maximization. We develop
successive convex approximation based algorithms to tackle these difficult
nonconvex problems. We further propose decentralized implementations for the
considered problems, in which base stations perform parallel and distributed
computation based on local channel state information and limited backhaul
information exchange. The decentralized approaches admit closed-form solutions
and can be implemented without invoking a generic external convex solver. The
effect of pilot contamination caused by pilot reuse is also taken into account
in the energy efficiency problems. To achieve energy efficiency improvements
with a limited number of pilot resources, we propose a heuristic
energy-efficient pilot allocation strategy to mitigate the pilot contamination
effect. Numerical results are provided to demonstrate that the rate-dependent
power consumption has a large impact on the system energy efficiency,
and, thus, has to be taken into account when devising energy-efficient
transmission strategies. We also investigate the effect of pilot contamination
and show that the proposed pilot allocation strategy achieves significant
performance improvements when a limited number of pilot resources is available.
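As an illustration of why rate-dependent power matters, one way to write a
network energy efficiency objective of the kind described above is the
following; the notation and model here are assumptions for exposition, not the
paper's exact formulation.
```latex
% Illustrative network energy efficiency with rate-dependent power:
% w_{b,k} is the beamformer of base station b for user k, R_{b,k} the achieved
% rate, \eta the power amplifier efficiency, P_fix the static circuit power, and
% p_r the per-bit power cost of coding, decoding, and backhaul.
\[
  \max_{\{\mathbf{w}_{b,k}\}} \;
  \frac{\sum_{b,k} R_{b,k}(\mathbf{w})}
       {P_{\mathrm{fix}}
        + \tfrac{1}{\eta} \sum_{b,k} \lVert \mathbf{w}_{b,k} \rVert^2
        + p_{\mathrm{r}} \sum_{b,k} R_{b,k}(\mathbf{w})}
\]
```
Because the rates also appear in the denominator, higher throughput no longer
automatically improves energy efficiency, which is the effect the numerical
results quantify.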
Jens Steinwandt, Florian Roemer, Martin Haardt, Giovanni Del Galdo
Comments: submitted to IEEE Transactions on Signal Processing on 18/01/2016
Subjects: Information Theory (cs.IT)
Spatial smoothing is a widely used preprocessing scheme to improve the
performance of high-resolution parameter estimation algorithms in case of
coherent signals or if only a small number of snapshots is available. In this
paper, we present a first-order performance analysis of the spatially smoothed
versions of R-D Standard ESPRIT and R-D Unitary ESPRIT for sources with
arbitrary signal constellations as well as R-D NC Standard ESPRIT and R-D NC
Unitary ESPRIT for strictly second-order (SO) non-circular (NC) sources. The
derived expressions are asymptotic in the effective signal-to-noise ratio
(SNR), i.e., the approximations become exact for either high SNRs or a large
sample size. Moreover, no assumptions on the noise statistics are required
apart from zero mean and finite SO moments. We show that both R-D NC
ESPRIT-type algorithms with spatial smoothing perform asymptotically identically
in the high effective SNR regime. Generally, the performance of spatial
smoothing based algorithms depends on the number of subarrays, which is a
design parameter and needs to be chosen beforehand. In order to gain more
insights into the optimal choice of the number of subarrays, we simplify the
derived analytical R-D mean square error (MSE) expressions for the special case
of a single source. The obtained MSE expression explicitly depends on the
number of subarrays in each dimension, which allows us to analytically find the
optimal number of subarrays for spatial smoothing. Based on this result, we
additionally derive the maximum asymptotic gain from spatial smoothing and
explicitly compute the asymptotic efficiency for this special case. All the
analytical results are verified by simulations.
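For concreteness, a minimal one-dimensional forward spatial smoothing step (the
basic operation behind the smoothed ESPRIT variants analyzed above) can be
written as follows; the array geometry and sizes are illustrative, and the
paper's R-D and NC extensions are considerably more involved.
```python
# Hedged sketch of 1-D forward spatial smoothing on a uniform linear array:
# average the sample covariances of L maximally overlapping subarrays.
import numpy as np

def spatial_smoothing(X, L):
    """X: M x N complex snapshot matrix (M sensors, N snapshots);
    L: number of subarrays.  Returns the (M-L+1) x (M-L+1) smoothed covariance."""
    M, N = X.shape
    m_sub = M - L + 1
    R = np.zeros((m_sub, m_sub), dtype=complex)
    for l in range(L):
        X_l = X[l:l + m_sub, :]
        R += X_l @ X_l.conj().T / N
    return R / L
```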
Matthew W. Morency, Sergiy A. Vorobyov
Comments: 12 two columns pages, 5 figures, Submitted to IEEE Trans. Signal Processing on September 2016
Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)
A new approach to solving a class of rank-constrained semi-definite
programming (SDP) problems, which appear in many signal processing applications
such as transmit beamspace design in multiple-input multiple-output (MIMO)
radar, downlink beamforming design in MIMO communications, generalized sidelobe
canceller design, phase retrieval, etc., is presented. The essence of the
approach is the use of underlying algebraic structure enforced in such problems
by other practical constraints such as, for example, a null-shaping constraint.
According to this approach, instead of relaxing the non-convex rank-constrained
SDP problem to a feasible set of positive semidefinite matrices, we restrict it
to a space of polynomials whose dimension is equal to the desired rank. The
resulting optimization problem is then convex as its solution is required to be
full rank, and can be efficiently and exactly solved. A simple matrix
decomposition is needed to recover the solution of the original problem from
the solution of the restricted one. We show how this approach can be applied to
solving some important signal processing problems that contain null-shaping
constraints. As a byproduct of our study, the conjugacy of beamforming and
parameter estimation problems leads us to the formulation of a new and rigorous
criterion for signal/noise subspace identification. Simulations are performed
for the problem of rank-constrained beamforming design and show an
exact agreement of the solution with the proposed algebraic structure, as well
as significant performance improvements in terms of sidelobe suppression
compared to the existing methods.
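To fix notation, a generic rank-constrained SDP of the class the paper targets
can be written as follows; this template is an assumption for exposition and
omits the application-specific null-shaping structure.
```latex
% Generic rank-constrained SDP template: C and A_i are given Hermitian matrices,
% b_i scalars, and r the desired rank.
\[
  \min_{\mathbf{X} \succeq \mathbf{0}} \; \operatorname{tr}(\mathbf{C}\mathbf{X})
  \quad \text{s.t.} \quad
  \operatorname{tr}(\mathbf{A}_i \mathbf{X}) \le b_i, \; i = 1, \dots, m,
  \qquad \operatorname{rank}(\mathbf{X}) = r .
\]
```
Standard semidefinite relaxation simply drops the rank constraint; the approach
above instead restricts the variable to a structured subspace whose dimension
matches the desired rank, so the restricted problem is convex and its solution
satisfies the rank constraint by construction.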
Anibal Sanjab, Walid Saad
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
In this paper, a general model for cyber-physical systems (CPSs), that
captures the diffusion of attacks from the cyber layer to the physical system,
is studied. In particular, a game-theoretic approach is proposed to analyze the
interactions between one defender and one attacker over a CPS. In this game,
the attacker launches cyber attacks on a number of cyber components of the CPS
to maximize the potential harm to the physical system while the system operator
chooses to defend a number of cyber nodes to thwart the attacks and minimize
potential damage to the physical side. The proposed game explicitly accounts
for the fact that both attacker and defender can have different computational
capabilities and disparate levels of knowledge of the system. To capture such
bounded rationality of attacker and defender, a novel approach inspired from
the behavioral framework of cognitive hierarchy theory is developed. In this
framework, the defender is assumed to be faced with an attacker that can have
different possible thinking levels reflecting its knowledge of the system and
computational capabilities. To solve the game, the optimal strategies of each
attacker type are characterized and the optimal response of the defender facing
these different types is computed. This general approach is applied to smart
grid security considering wide area protection with energy markets
implications. Numerical results show that a deviation from the Nash equilibrium
strategy is beneficial when the bounded rationality of the attacker is
considered. Moreover, the results show that the defender’s incentive to deviate
from the Nash equilibrium decreases when faced with an attacker that has high
computational ability.
Georges El Rahi, Anibal Sanjab, Walid Saad, Narayan B. Mandayam, H. Vincent Poor
Comments: 54th Annual Allerton Conference on Communication, Control, and Computing
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
The proliferation of distributed generation and storage units is leading to
the development of local, small-scale distribution grids, known as microgrids
(MGs). In this paper, the problem of optimizing the energy trading decisions of
MG operators (MGOs) is studied using game theory. In the formulated game, each
MGO chooses the amount of energy that must be sold immediately or stored for
future emergencies, given the prospective market prices which are influenced by
other MGOs’ decisions. The problem is modeled using a Bayesian game to account
for the incomplete information that MGOs have about each other's levels of
surplus. The proposed game explicitly accounts for each MGO’s subjective
decision when faced with the uncertainty of its opponents’ energy surplus. In
particular, the so-called framing effect, from the framework of prospect theory
(PT), is used to account for each MGO’s valuation of its gains and losses with
respect to an individual utility reference point. The reference point is
typically different for each individual and originates from its past
experiences and future aspirations. A closed-form expression for the Bayesian
Nash equilibrium is derived for the standard game formulation. Under PT, a best
response algorithm is proposed to find the equilibrium. Simulation results show
that, depending on their individual reference points, MGOs can tend to store
more or less energy under PT compared to classical game theory. In addition,
the impact of the reference point is found to be more prominent as the
emergency price set by the power company increases.
Seyed Rasoul Etesami, Walid Saad, Narayan Mandayam, H. Vincent Poor
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Learning (cs.LG); Systems and Control (cs.SY)
In this paper, the problem of smart grid energy management under stochastic
dynamics is investigated. In the considered model, at the demand side, it is
assumed that customers can act as prosumers who own renewable energy sources
and can both produce and consume energy. Due to the coupling between the
prosumers’ decisions and the stochastic nature of renewable energy, the
interaction among prosumers is formulated as a stochastic game, in which each
prosumer seeks to maximize its payoff, in terms of revenues, by controlling its
energy consumption and demand. In particular, the subjective behavior of
prosumers is explicitly reflected in their payoff functions using prospect
theory, a powerful framework that allows modeling real-life human choices. For
this prospect-based stochastic game, it is shown that there always exists a
stationary Nash equilibrium where the prosumers’ trading policies in the
equilibrium are independent of the time and their histories of the play.
Moreover, a novel distributed algorithm with no information sharing among
prosumers is proposed and shown to converge to an $\epsilon$-Nash equilibrium.
On the other hand, at the supply side, the interaction between the utility
company and the prosumers is formulated as an online optimization problem in
which the utility company’s goal is to learn its optimal energy allocation
rules. For this case, it is shown that such an optimization problem admits a
no-regret algorithm, meaning that regardless of the actual outcome of the game
among the prosumers, the utility company can follow a strategy that mitigates
its allocation costs as if it knew the entire demand market a priori.
Simulation results show the convergence of the proposed algorithms to their
predicted outcomes and present new insights resulting from prospect theory that
contribute toward more efficient energy management in smart grids.