
    arXiv Paper Daily: Mon, 10 Oct 2016

    Published by 我爱机器学习 (52ml.net) on 2016-10-10 00:00:00

    Neural and Evolutionary Computing

    Temporal Ensembling for Semi-Supervised Learning

    Samuli Laine, Timo Aila
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In this paper, we present a simple and efficient method for training deep
    neural networks in a semi-supervised setting where only a small portion of
    training data is labeled. We introduce temporal ensembling, where we form a
    consensus prediction of the unknown labels using the outputs of the
    network-in-training on different epochs and, most importantly, under different
    regularization and input augmentation conditions. This ensemble prediction can
    be expected to be a better predictor for the unknown labels than the output of
    the network at the most recent training epoch, and can thus be used as a target
    for training. Using our method, we set new records for two standard
    semi-supervised learning benchmarks, reducing the classification error rate
    from 18.63% to 12.89% in CIFAR-10 with 4000 labels and from 18.44% to 6.83% in
    SVHN with 500 labels.
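
    To make the accumulation step concrete, the following is a minimal NumPy
    sketch of maintaining the temporal ensemble and turning it into training
    targets; the names are ours and the snippet is an illustration of the
    described mechanism, not the authors' code.

        import numpy as np

        def temporal_ensemble_update(Z, z_epoch, epoch, alpha=0.6):
            """Update the running ensemble of per-sample predictions.

            Z       : accumulated ensemble outputs, shape (n_samples, n_classes)
            z_epoch : current-epoch network outputs (after softmax)
            epoch   : 1-based epoch index
            alpha   : momentum controlling how far back the ensemble reaches
            """
            Z = alpha * Z + (1.0 - alpha) * z_epoch
            # Correct the startup bias from initializing Z to zeros, so that
            # early-epoch targets are not scaled towards zero.
            targets = Z / (1.0 - alpha ** epoch)
            return Z, targets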

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

    Dan Hendrycks, Kevin Gimpel
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the two related problems of detecting if an example is
    misclassified or out-of-distribution. We present a simple baseline that
    utilizes probabilities from softmax distributions. Correctly classified
    examples tend to have greater maximum softmax probabilities than erroneously
    classified and out-of-distribution examples, allowing for their detection. We
    assess performance by defining several tasks in computer vision, natural
    language processing, and automatic speech recognition, showing the
    effectiveness of this baseline across all. We then show the baseline can
    sometimes be surpassed, demonstrating the room for future research on these
    underexplored detection tasks.
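
    The baseline itself is only a few lines; a NumPy sketch (the threshold and
    shapes are illustrative, not from the paper):

        import numpy as np

        def softmax(logits):
            e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable
            return e / e.sum(axis=1, keepdims=True)

        def confidence_scores(logits):
            """Maximum softmax probability per example; low scores tend to
            indicate misclassified or out-of-distribution inputs."""
            return softmax(logits).max(axis=1)

        logits = np.random.randn(8, 10)              # stand-in network outputs
        flagged = confidence_scores(logits) < 0.5    # task-dependent threshold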

    Computational Tradeoffs in Biological Neural Networks: Self-Stabilizing Winner-Take-All Networks

    Nancy Lynch, Cameron Musco, Merav Parter
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)

    We initiate a line of investigation into biological neural networks from an
    algorithmic perspective. We develop a simplified but biologically plausible
    model for distributed computation in stochastic spiking neural networks and
    study tradeoffs between computation time and network complexity in this model.
    Our aim is to abstract real neural networks in a way that, while not capturing
    all interesting features, preserves high-level behavior and allows us to make
    biologically relevant conclusions.

    In this paper, we focus on the important ‘winner-take-all’ (WTA) problem,
    which is analogous to a neural leader election unit: a network consisting of
    $n$ input neurons and $n$ corresponding output neurons must converge to a state
    in which a single output corresponding to a firing input (the ‘winner’) fires,
    while all other outputs remain silent. Neural circuits for WTA rely on
    inhibitory neurons, which suppress the activity of competing outputs and drive
    the network towards a converged state with a single firing winner. We attempt
    to understand how the number of inhibitors used affects network convergence
    time.

    We show that it is possible to significantly outperform naive WTA
    constructions through a more refined use of inhibition, solving the problem in
    $O(\theta)$ rounds in expectation with just $O(\log^{1/\theta} n)$ inhibitors
    for any $\theta$. An alternative construction gives convergence in
    $O(\log^{1/\theta} n)$ rounds with $O(\theta)$ inhibitors. We complement these
    upper bounds with our main technical contribution, a nearly matching lower
    bound for networks using $\ge \log\log n$ inhibitors. Our lower bound uses
    familiar indistinguishability and locality arguments from distributed computing
    theory. It lets us derive a number of interesting conclusions about the
    structure of any network solving WTA with good probability, and the use of
    randomness and inhibition within such a network.
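
    As a toy illustration of these dynamics (not one of the paper's
    constructions), the sketch below simulates a naive single-inhibitor circuit
    in which, while two or more outputs compete, each competitor survives a
    round with probability 1/2; this yields the logarithmically many expected
    rounds that the refined constructions above improve on.

        import random

        def naive_wta_rounds(n, p_keep=0.5):
            """Rounds until a single competitor remains. All n outputs start
            firing; under inhibition each survives a round with probability
            p_keep. A round in which every competitor falls silent is
            replayed, so one winner always emerges."""
            competitors, rounds = n, 0
            while competitors > 1:
                survivors = sum(random.random() < p_keep
                                for _ in range(competitors))
                if survivors >= 1:
                    competitors = survivors
                rounds += 1
            return rounds

        # Average over trials, e.g.:
        # sum(naive_wta_rounds(1024) for _ in range(1000)) / 1000  # ~ log2(n)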

    Adaptive Convolutional ELM For Concept Drift Handling in Online Stream Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In the big data era, data is continuously generated and its distribution may
    change over time. These challenges in online data streams are known as concept
    drift. In this paper, we propose the Adaptive Convolutional ELM method
    (ACNNELM), an enhancement of the Convolutional Neural Network (CNN) with a
    hybrid Extreme Learning Machine (ELM) model plus adaptive capability. The
    method is aimed at concept drift handling. We enhance the CNN as a
    convolutional hierarchical feature representation learner combined with
    Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an
    Adaptive OS-ELM (AOS-ELM) for concept drift adaptability at the classifier
    level (named ACNNELM-1) and matrix concatenation ensembles for concept drift
    adaptability at the ensemble level (named ACNNELM-2). Our proposed Adaptive
    CNNELM is flexible in that it works well at both the classifier level and the
    ensemble level, while most current methods work on only one of the two levels.

    We verified our method on the extended MNIST and notMNIST data sets. We set
    up experiments simulating virtual drift, real drift, and hybrid drift events
    and demonstrated how the adaptability of CNNELM works. Our proposed method
    works well and gives better accuracy, computational scalability, and concept
    drift adaptability compared to the regular ELM and CNN. Further research is
    still required to study the optimum parameters and to use more varied image
    data sets.


    Computer Vision and Pattern Recognition

    Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

    Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
    Comments: 17 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We propose a technique for making Convolutional Neural Network (CNN)-based
    models more transparent by visualizing the regions of input that are
    “important” for predictions from these models – or visual explanations.

    Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
    uses the class-specific gradient information flowing into the final
    convolutional layer of a CNN to produce a coarse localization map of the
    important regions in the image. Grad-CAM is a strict generalization of Class
    Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
    broadly applicable to any CNN-based architecture. We also show how Grad-CAM
    may be combined with existing pixel-space visualizations to create a
    high-resolution class-discriminative visualization (Guided Grad-CAM). We
    generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
    image classification, image captioning, and visual question answering (VQA)
    models. In the context of image classification models, our visualizations (a)
    lend insight into their failure modes showing that seemingly unreasonable
    predictions have reasonable explanations, and (b) outperform pixel-space
    gradient visualizations (Guided Backpropagation and Deconvolution) on the
    ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
    our visualizations expose the somewhat surprising insight that common CNN +
    LSTM models can often be good at localizing discriminative input image regions
    despite not being trained on grounded image-text pairs.

    Finally, we design and conduct human studies to measure if Guided Grad-CAM
    explanations help users establish trust in the predictions made by deep
    networks. Interestingly, we show that Guided Grad-CAM helps untrained users
    successfully discern a “stronger” deep network from a “weaker” one even when
    both networks make identical predictions.
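
    The mechanism is compact enough to sketch. A minimal PyTorch version that
    pools the class-score gradients to weight the final convolutional feature
    maps follows; the helper and the toy network are our own illustration, not
    the authors' released implementation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def grad_cam(model, image, target_class, conv_layer):
            """image: (1, C, H, W); conv_layer: the model's last conv module.
            Assumes no in-place op overwrites the hooked activation."""
            feats = {}
            handle = conv_layer.register_forward_hook(
                lambda mod, inp, out: feats.update(a=out))
            score = model(image)[0, target_class]
            handle.remove()
            grads = torch.autograd.grad(score, feats['a'])[0]
            # Channel weights: spatially pooled gradients of the class score.
            w = grads.mean(dim=(2, 3), keepdim=True)
            cam = F.relu((w * feats['a']).sum(dim=1, keepdim=True))
            # Upsample the coarse map back to the input resolution.
            return F.interpolate(cam, size=image.shape[2:], mode='bilinear',
                                 align_corners=False)

        # Toy usage with a small stand-in network:
        net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                            nn.Linear(8, 10))
        heatmap = grad_cam(net, torch.randn(1, 3, 32, 32),
                           target_class=3, conv_layer=net[0])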

    Deep Learning with Separable Convolutions

    François Chollet
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an interpretation of Inception modules in convolutional neural
    networks as being an intermediate step in-between regular convolution and the
    recently introduced “separable convolution” operation. In this light, a
    separable convolution can be understood as an Inception module with a maximally
    large number of towers. This observation leads us to propose a novel deep
    convolutional neural network architecture inspired by Inception, where
    Inception modules have been replaced with separable convolutions. We show that
    this architecture, dubbed Xception, slightly outperforms Inception V3 on the
    ImageNet dataset (which Inception V3 was designed for), and significantly
    outperforms Inception V3 on a larger image classification dataset comprising
    350 million images and 17,000 classes. Since the Xception architecture has the
    same number of parameters as Inception V3, the performance gains are not due to
    increased capacity but rather to a more efficient use of model parameters.
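
    The building block is easy to state in code. Below is a PyTorch sketch of a
    depthwise separable convolution; the depthwise-then-pointwise order shown is
    the common convention, and the paper argues such ordering differences matter
    little when the operations are stacked.

        import torch.nn as nn

        class SeparableConv2d(nn.Module):
            """Depthwise separable convolution: a per-channel spatial
            convolution (groups=in_ch) followed by a 1x1 pointwise convolution
            that mixes channels; in the 'extreme' Inception view, one spatial
            tower per channel."""
            def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
                super().__init__()
                self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                           padding=padding, groups=in_ch,
                                           bias=False)
                self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

            def forward(self, x):
                return self.pointwise(self.depthwise(x))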

    Optimization of Convolutional Neural Network using Microcanonical Annealing Algorithm

    Vina Ayumi, L.M. Rasdi Rere, Mohamad Ivan Fanany, Aniati Murni Arymurthy
    Comments: Accepted to be published at IEEE ICACSIS 2016. arXiv admin note: text overlap with arXiv:1610.01925
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The convolutional neural network (CNN) is one of the most prominent
    architectures and algorithms in deep learning. It shows remarkable improvement
    in the recognition and classification of objects. The method has also proven
    very effective in a variety of computer vision and machine learning problems.
    As with other deep learning approaches, however, training the CNN is
    interesting yet challenging. Recently, metaheuristic algorithms such as the
    Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing and
    Harmony Search have been used to optimize CNNs. In this paper, another type of
    metaheuristic algorithm with a different strategy is proposed, namely
    Microcanonical Annealing, to optimize the Convolutional Neural Network. The
    performance of the proposed method is tested on the MNIST and CIFAR-10
    datasets. Although the experimental results on the MNIST dataset indicate an
    increase in computation time (1.02x–1.38x), the proposed method can
    considerably enhance the performance of the original CNN (up to 4.60%). On the
    CIFAR-10 dataset, the current state of the art is 96.53% using fractional
    pooling, while the proposed method achieves 99.14%.
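
    For readers unfamiliar with the strategy, here is a generic sketch of
    microcanonical (Creutz demon) annealing; the energy function, perturbation
    and cooling schedule are illustrative stand-ins, not the authors' exact CNN
    training setup.

        import random

        def microcanonical_annealing(energy, perturb, x0, demon_cap=1.0,
                                     iters=10000, shrink=0.9995):
            """Accept a move when the demon can pay the energy increase;
            downhill moves deposit energy into the demon (up to its cap).
            Slowly reducing the cap plays the role of temperature."""
            x, e, demon = x0, energy(x0), demon_cap
            for _ in range(iters):
                cand = perturb(x)
                delta = energy(cand) - e
                if delta <= demon:
                    demon = min(demon - delta, demon_cap)
                    x, e = cand, e + delta
                demon_cap *= shrink              # cooling schedule
                demon = min(demon, demon_cap)
            return x, e

        # e.g. minimizing a 1-D quadratic:
        # x, e = microcanonical_annealing(
        #     lambda v: v * v, lambda v: v + random.uniform(-0.1, 0.1), x0=5.0)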

    Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer

    Xin Jin, Jingying Chi, Siwei Peng, Yulu Tian, Chaochen Ye, Xiaodong Li
    Comments: To Appear in the Proceedings of the 8th International Conference on Wireless Communications and Signal Processing (WCSP), Yangzhou, China, 13-15 October, 2016. arXiv admin note: substantial text overlap with arXiv:1409.4842 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    In this paper we investigate the image aesthetics classification problem,
    i.e., automatically classifying an image into low or high aesthetic quality,
    which is quite a challenging problem beyond image recognition. Deep
    convolutional neural network (DCNN) methods have recently shown promising
    results for image aesthetics assessment. Recently, a powerful inception module
    was proposed that shows very high performance in object classification.
    However, the inception module has not yet been considered for the image
    aesthetics assessment problem. In this paper, we propose a novel DCNN
    structure codenamed ILGNet for image aesthetics classification, which
    introduces the Inception module and connects intermediate Local layers to the
    Global layer for the output. In addition, we take GoogLeNet, an image
    classification CNN pre-trained on the ImageNet dataset, and fine-tune our
    connected local and global layers on the large-scale AVA aesthetics assessment
    dataset. The experimental results show that the proposed ILGNet outperforms
    the state-of-the-art results in image aesthetics assessment on the AVA
    benchmark.

    Learning Grimaces by Watching TV

    Samuel Albanie, Andrea Vedaldi
    Comments: British Machine Vision Conference (BMVC) 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unlike computer vision systems, which require explicit supervision, humans
    can learn facial expressions by observing people in their environment.
    In this paper, we look at how similar capabilities could be developed in
    machine vision. As a starting point, we consider the problem of relating facial
    expressions to objectively measurable events occurring in videos. In
    particular, we consider a gameshow in which contestants play to win significant
    sums of money. We extract events affecting the game and corresponding facial
    expressions objectively and automatically from the videos, obtaining large
    quantities of labelled data for our study. We also develop, using benchmarks
    such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial
    expression recognition, showing that pre-training on face verification data can
    be highly beneficial for this task. Then, we extend these models to use facial
    expressions to predict events in videos and learn nameable expressions from
    them. The dataset and emotion recognition models are available at
    this http URL

    Automated Detection of Individual Micro-calcifications from Mammograms using a Multi-stage Cascade Approach

    Zhi Lu, Gustavo Carneiro, Neeraj Dhungel, Andrew P. Bradley
    Comments: 5 Pages, ISBI 2017 Submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In mammography, the efficacy of computer-aided detection methods depends, in
    part, on the robust localisation of micro-calcifications ($\mu$C). Currently,
    the most effective methods are based on three steps: 1) detection of individual
    $\mu$C candidates, 2) clustering of individual $\mu$C candidates, and 3)
    classification of $\mu$C clusters. The second step is motivated both by the
    need to reduce the number of false positive detections from the first step and
    by the evidence that malignancy depends on a relatively large number of $\mu$C
    detections within a certain area. In this paper, we propose a novel approach to
    $\mu$C detection, consisting of the detection and classification of individual
    $\mu$C candidates, using shape and appearance features and a cascade of
    boosting classifiers. The final step in our approach then clusters the
    remaining individual $\mu$C candidates. The main advantage of this approach
    lies in its ability to reject a significant number of false positive $\mu$C
    candidates compared to previously proposed methods. Specifically, on the
    INbreast dataset, we show that our approach has a true positive rate (TPR) for
    individual $\mu$Cs of 40% at one false positive per image (FPI) and a TPR of
    80% at 10 FPI. These results are significantly more accurate than the current
    state of the art, which has a TPR of less than 1% at one FPI and a TPR of 10%
    at 10 FPI. Our results are competitive with the state of the art at the
    subsequent stage of detecting clusters of $\mu$Cs.

    Weakly supervised learning of actions from transcripts

    Hilde Kuehne, Alexander Richard, Juergen Gall
    Comments: 27 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present an approach for weakly supervised learning of human actions from
    video transcriptions. Our system is based on the idea that, given a sequence of
    input data and a transcript, i.e. a list of the order the actions occur in the
    video, it is possible to infer the actions within the video stream, and thus,
    learn the related action models without the need for any frame-based
    annotation. Starting from the transcript information at hand, we split the
    given data sequences uniformly based on the number of expected actions. We then
    learn action models for each class by maximizing the probability that the
    training video sequences are generated by the action models given the sequence
    order as defined by the transcripts. The learned model can be used to
    temporally segment an unseen video with or without transcript. We evaluate our
    approach on four distinct activity datasets, namely Hollywood Extended, MPII
    Cooking, Breakfast and CRIM13. We show that our system is able to align the
    scripted actions with the video data and that the learned models localize and
    classify actions competitively in comparison to models trained with full
    supervision, i.e. with frame level annotations, and that they outperform any
    current state-of-the-art approach for aligning transcripts with video data.

    Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields

    Patrick Ferdinand Christ, Mohamed Ezzeldin A. Elshaer, Florian Ettlinger, Sunil Tatavarty, Marc Bickel, Patrick Bilic, Markus Rempfler, Marco Armbruster, Felix Hofmann, Melvin D'Anastasi, Wieland H. Sommer, Seyed-Ahmad Ahmadi, Bjoern H. Menze
    Comments: Accepted at MICCAI 2016. Source code available on this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic segmentation of the liver and its lesion is an important step
    towards deriving quantitative biomarkers for accurate clinical diagnosis and
    computer-aided decision support systems. This paper presents a method to
    automatically segment liver and lesions in CT abdomen images using cascaded
    fully convolutional neural networks (CFCNs) and dense 3D conditional random
    fields (CRFs). We train and cascade two FCNs for a combined segmentation of the
    liver and its lesions. In the first step, we train an FCN to segment the liver
    as ROI input for a second FCN. The second FCN solely segments lesions from the
    predicted liver ROIs of step 1. We refine the segmentations of the CFCN using a
    dense 3D CRF that accounts for both spatial coherence and appearance. CFCN
    models were trained in a 2-fold cross-validation on the abdominal CT dataset
    3DIRCAD comprising 15 hepatic tumor volumes. Our results show that CFCN-based
    semantic liver and lesion segmentation achieves Dice scores over 94% for liver
    with computation times below 100s per volume. We experimentally demonstrate the
    robustness of the proposed method as a decision support system with a high
    accuracy and speed for usage in daily clinical routine.
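
    At inference time the cascade reduces to a few lines, as in the hedged
    PyTorch sketch below; the two FCNs, thresholds and masking convention are
    assumptions for illustration, and the 3D CRF refinement is omitted.

        import torch

        def cascade_segment(liver_fcn, lesion_fcn, ct_slices):
            """Step 1: segment the liver; step 2: segment lesions only within
            the predicted liver ROI. ct_slices: (N, 1, H, W) tensor."""
            liver = (torch.sigmoid(liver_fcn(ct_slices)) > 0.5).float()
            roi = ct_slices * liver            # zero out non-liver voxels
            lesion = (torch.sigmoid(lesion_fcn(roi)) > 0.5).float() * liver
            return liver, lesion               # a dense 3D CRF refines both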

    Places: An Image Database for Deep Scene Understanding

    Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The rise of multi-million-item dataset initiatives has enabled data-hungry
    machine learning algorithms to reach near-human semantic classification at
    tasks such as object and scene recognition. Here we describe the Places
    Database, a repository of 10 million scene photographs, labeled with scene
    semantic categories and attributes, comprising a quasi-exhaustive list of the
    types of environments encountered in the world. Using state of the art
    Convolutional Neural Networks, we provide impressive baseline performances at
    scene classification. With its high coverage and high diversity of exemplars,
    the Places Database offers an ecosystem to guide future progress on currently
    intractable visual recognition problems.

    Distributed Averaging CNN-ELM for Big Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Increasing the scalability of machine learning to handle big volumes of data
    is a challenging task. The scale-up approach has some limitations. In this
    paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
    classifier level. The map process is the CNN-ELM training for a certain
    partition of the data; it involves many CNN-ELM models that can be trained
    asynchronously. The reduce process is the averaging of all CNN-ELM weights as
    the final training result. This approach can save a great deal of training
    time compared to a single CNN-ELM model trained alone, and it also increases
    the scalability of machine learning by combining the scale-out and scale-up
    approaches. We verified our method in experiments on the extended MNIST and
    notMNIST data sets. However, the approach has some drawbacks: additional
    iteration learning parameters need to be chosen carefully, and the training
    data distribution needs to be selected carefully. Further research using more
    complex image data sets is required.

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

    Dan Hendrycks, Kevin Gimpel
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the two related problems of detecting if an example is
    misclassified or out-of-distribution. We present a simple baseline that
    utilizes probabilities from softmax distributions. Correctly classified
    examples tend to have greater maximum softmax probabilities than erroneously
    classified and out-of-distribution examples, allowing for their detection. We
    assess performance by defining several tasks in computer vision, natural
    language processing, and automatic speech recognition, showing the
    effectiveness of this baseline across all. We then show the baseline can
    sometimes be surpassed, demonstrating the room for future research on these
    underexplored detection tasks.


    Artificial Intelligence

    Adaptive Convolutional ELM For Concept Drift Handling in Online Stream Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In the big data era, data is continuously generated and its distribution may
    change over time. These challenges in online data streams are known as concept
    drift. In this paper, we propose the Adaptive Convolutional ELM method
    (ACNNELM), an enhancement of the Convolutional Neural Network (CNN) with a
    hybrid Extreme Learning Machine (ELM) model plus adaptive capability. The
    method is aimed at concept drift handling. We enhance the CNN as a
    convolutional hierarchical feature representation learner combined with
    Elastic ELM (E$^2$LM) as a parallel supervised classifier. We propose an
    Adaptive OS-ELM (AOS-ELM) for concept drift adaptability at the classifier
    level (named ACNNELM-1) and matrix concatenation ensembles for concept drift
    adaptability at the ensemble level (named ACNNELM-2). Our proposed Adaptive
    CNNELM is flexible in that it works well at both the classifier level and the
    ensemble level, while most current methods work on only one of the two levels.

    We verified our method on the extended MNIST and notMNIST data sets. We set
    up experiments simulating virtual drift, real drift, and hybrid drift events
    and demonstrated how the adaptability of CNNELM works. Our proposed method
    works well and gives better accuracy, computational scalability, and concept
    drift adaptability compared to the regular ELM and CNN. Further research is
    still required to study the optimum parameters and to use more varied image
    data sets.

    Learning Macro-actions for State-Space Planning

    Sandra Castellanos-Paez (LIG Laboratoire d'Informatique de Grenoble), Damien Pellier (LIG Laboratoire d'Informatique de Grenoble), Humbert Fiorino (LIG Laboratoire d'Informatique de Grenoble), Sylvie Pesty (LIG Laboratoire d'Informatique de Grenoble)
    Comments: Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA 2016), Jul 2016, Grenoble, France. 2016
    Subjects: Artificial Intelligence (cs.AI)

    Planning has achieved significant progress in recent years. Among the various
    approaches to scale up plan synthesis, the use of macro-actions has been widely
    explored. As a first stage towards the development of a solution to learn
    on-line macro-actions, we propose an algorithm to identify useful macro-actions
    based on data mining techniques. The integration in the planning search of
    these learned macro-actions shows significant improvements over four classical
    planning benchmarks.

    Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

    Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
    Comments: 17 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We propose a technique for making Convolutional Neural Network (CNN)-based
    models more transparent by visualizing the regions of input that are
    “important” for predictions from these models – or visual explanations.

    Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
    uses the class-specific gradient information flowing into the final
    convolutional layer of a CNN to produce a coarse localization map of the
    important regions in the image. Grad-CAM is a strict generalization of Class
    Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
    broadly applicable to any CNN-based architecture. We also show how Grad-CAM
    may be combined with existing pixel-space visualizations to create a
    high-resolution class-discriminative visualization (Guided Grad-CAM). We
    generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
    image classification, image captioning, and visual question answering (VQA)
    models. In the context of image classification models, our visualizations (a)
    lend insight into their failure modes showing that seemingly unreasonable
    predictions have reasonable explanations, and (b) outperform pixel-space
    gradient visualizations (Guided Backpropagation and Deconvolution) on the
    ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
    our visualizations expose the somewhat surprising insight that common CNN +
    LSTM models can often be good at localizing discriminative input image regions
    despite not being trained on grounded image-text pairs.

    Finally, we design and conduct human studies to measure if Guided Grad-CAM
    explanations help users establish trust in the predictions made by deep
    networks. Interestingly, we show that Guided Grad-CAM helps untrained users
    successfully discern a “stronger” deep network from a “weaker” one even when
    both networks make identical predictions.

    Application of Ontologies in Cloud Computing: The State-Of-The-Art

    Fahim T. Imam
    Comments: 13 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

    This paper presents a systematic survey on existing literature and seminal
    works relevant to the application of ontologies in different aspects of Cloud
    computing. Our hypothesis is that ontologies, along with their reasoning
    capabilities, can have a significant impact on improving various aspects of
    Cloud computing. Ontologies can promote intelligent decision support
    mechanisms for various Cloud based services. They can also provide effective
    interoperability among Cloud based systems and resources. This survey can
    promote a comprehensive understanding of the roles and significance of
    ontologies within the overall domain of Cloud computing. Also, this project
    can potentially form the basis of new research areas and possibilities for
    both the ontology and Cloud computing communities.

    Deep Reinforcement Learning From Raw Pixels in Doom

    Danijar Hafner
    Comments: Bachelor’s thesis
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Using current reinforcement learning methods, it has recently become possible
    to learn to play unknown 3D games from raw pixels. In this work, we study the
    challenges that arise in such complex environments, and summarize current
    methods to approach these. We choose a task within the Doom game that has not
    been approached yet. The goal for the agent is to fight enemies in a 3D world
    consisting of five rooms. We train the DQN and LSTM-A3C algorithms on this
    task. Results show that both algorithms learn sensible policies, but fail to
    achieve high scores given the amount of training. We provide insights into the
    learned behavior, which can serve as a valuable starting point for further
    research in the Doom domain.

    Places: An Image Database for Deep Scene Understanding

    Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The rise of multi-million-item dataset initiatives has enabled data-hungry
    machine learning algorithms to reach near-human semantic classification at
    tasks such as object and scene recognition. Here we describe the Places
    Database, a repository of 10 million scene photographs, labeled with scene
    semantic categories and attributes, comprising a quasi-exhaustive list of the
    types of environments encountered in the world. Using state of the art
    Convolutional Neural Networks, we provide impressive baseline performances at
    scene classification. With its high coverage and high diversity of exemplars,
    the Places Database offers an ecosystem to guide future progress on currently
    intractable visual recognition problems.

    Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

    Lei Tai, Ming Liu
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Exploration in an unknown environment is a core functionality for mobile
    robots. Learning-based exploration methods, including convolutional neural
    networks, provide excellent strategies without human-designed logic for
    feature extraction. However, conventional supervised learning algorithms
    inevitably require a great deal of effort to label datasets, and scenes not
    included in the training set are mostly unrecognized. We propose a deep
    reinforcement learning method for the exploration of mobile robots in an
    indoor environment, using only the depth information from an RGB-D sensor.
    Based on the Deep Q-Network framework, the raw depth image is taken as the
    only input to estimate the Q values corresponding to all moving commands. The
    training of the network weights is end-to-end. In arbitrarily constructed
    simulation environments, we show that the robot can quickly adapt to
    unfamiliar scenes without any man-made labeling. Moreover, through analysis of
    the receptive fields of the feature representations, we find that deep
    reinforcement learning motivates the convolutional networks to estimate the
    traversability of the scenes. The test results are compared with exploration
    strategies based separately on deep learning or reinforcement learning. Even
    though it is trained only in simulated environments, experimental results in
    real-world environments demonstrate that the cognitive ability of the robot
    controller is dramatically improved compared with the supervised method. We
    believe this is the first time that raw sensor information has been used to
    build a cognitive exploration strategy for mobile robots through end-to-end
    deep reinforcement learning.
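
    A minimal PyTorch sketch of a depth-only Q-network in the spirit described
    above; layer sizes and the action count are illustrative assumptions, not
    the authors' architecture.

        import torch
        import torch.nn as nn

        class DepthDQN(nn.Module):
            """Maps a single raw depth image to one Q value per discrete
            moving command; acting greedily means taking the argmax."""
            def __init__(self, n_actions=5):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
                    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                    nn.Flatten(),
                    nn.LazyLinear(256), nn.ReLU(),
                    nn.Linear(256, n_actions))

            def forward(self, depth):            # depth: (B, 1, H, W)
                return self.net(depth)

        # action = DepthDQN()(torch.randn(1, 1, 84, 84)).argmax(dim=1)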


    Information Retrieval

    Influence of Pokémon Go on Physical Activity: Study and Implications

    Tim Althoff, Ryen W. White, Eric Horvitz
    Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)

    Physical activity helps people maintain a healthy weight and reduces the risk
    for several chronic diseases. Although this knowledge is widely recognized,
    adults and children in many countries around the world do not get recommended
    amounts of physical activity. While many interventions are found to be
    ineffective at increasing physical activity or reaching inactive populations,
    there have been anecdotal reports of increased physical activity due to novel
    mobile games that embed game play in the physical world. The most recent and
    salient example of such a game is Pokémon Go, which has reportedly reached
    tens of millions of users in the US and worldwide.

    We study the effect of Pokémon Go on physical activity through a
    combination of signals from large-scale corpora of wearable sensor data and
    search engine logs for 32 thousand users over a period of three months.
    Pokémon Go players are identified through search engine queries and activity
    is measured through accelerometry. We find that Pokémon Go leads to
    significant increases in physical activity over a period of 30 days, with
    particularly engaged users (i.e., those making multiple search queries for
    details about game usage) increasing their activity by 1473 steps a day on
    average, a more than 25% increase compared to their prior activity level
    ($p<10^{-15}$). In the short time span of the study, we estimate that Pokémon
    Go has added a total of 144 billion steps to US physical activity. Furthermore,
    Pokémon Go has been able to increase physical activity across men and women
    of all ages, weight statuses, and prior activity levels, showing that this
    form of game leads to increases in physical activity, with significant
    implications for public health. We find that Pokémon Go is able to reach
    low-activity populations, while all four leading mobile health apps studied
    in this work largely draw from an already very active population.


    Computation and Language

    Challenges of Computational Processing of Code-Switching

    Özlem Çetinoğlu, Sarah Schulz, Ngoc Thang Vu
    Comments: Will appear in the Proceedings of the 2nd Workshop on Computational Approaches to Linguistic Code Switching @EMNLP, 2016
    Subjects: Computation and Language (cs.CL)

    This paper addresses challenges of Natural Language Processing (NLP) on
    non-canonical multilingual data in which two or more languages are mixed. It
    refers to code-switching, which has become more popular in our daily life and
    therefore attracts an increasing amount of attention from the research
    community. We report our experience, which covers not only core NLP tasks such
    as normalisation, language identification, language modelling, part-of-speech
    tagging and dependency parsing but also more downstream ones such as machine
    translation and automatic speech recognition. We highlight and discuss the key
    problems for each of the tasks with supporting examples from different language
    pairs and relevant previous work.

    Morphology Generation for Statistical Machine Translation using Deep Learning Techniques

    Marta R. Costa-jussà, Carlos Escolano
    Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

    Morphologically unbalanced language pairs remain a big challenge in the
    context of machine translation. In this paper, we propose to decouple machine
    translation from morphology generation in order to better deal with the
    problem. We investigate morphology simplification with a reasonable trade-off
    between expected gain and generation complexity.

    For the Chinese-Spanish task, the optimal morphological simplification is in
    gender and number. For this purpose, we design a new classification
    architecture which, compared to other standard machine learning techniques,
    obtains the best results. The proposed neural-based architecture consists of
    several layers: an embedding layer, a convolutional layer followed by a
    recurrent neural network and, finally, sigmoid and softmax output layers. We
    obtain classification accuracy over 98% in gender, over 93% in number, and an
    overall translation improvement of 0.7 METEOR.
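
    A PyTorch sketch of the layered design just described (embedding,
    convolutional, recurrent, then sigmoid and softmax heads); all sizes are
    illustrative, with the two heads assumed to predict gender (binary) and
    number.

        import torch
        import torch.nn as nn

        class MorphClassifier(nn.Module):
            def __init__(self, vocab_size, emb=128, conv_ch=64,
                         hidden=64, n_number=3):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb)
                self.conv = nn.Conv1d(emb, conv_ch, kernel_size=3, padding=1)
                self.rnn = nn.LSTM(conv_ch, hidden, batch_first=True)
                self.gender = nn.Linear(hidden, 1)         # sigmoid head
                self.number = nn.Linear(hidden, n_number)  # softmax head

            def forward(self, tokens):                     # tokens: (B, T)
                x = self.emb(tokens).transpose(1, 2)       # (B, emb, T)
                x = torch.relu(self.conv(x)).transpose(1, 2)
                _, (h, _) = self.rnn(x)                    # final hidden state
                return (torch.sigmoid(self.gender(h[-1])),
                        torch.softmax(self.number(h[-1]), dim=-1))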

    There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction

    Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault
    Comments: to appear in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)
    Subjects: Computation and Language (cs.CL)

    Current methods for automatically evaluating grammatical error correction
    (GEC) systems rely on gold-standard references. However, these methods suffer
    from penalizing grammatical edits that are correct but not in the gold
    standard. We show that reference-less grammaticality metrics correlate very
    strongly with human judgments and are competitive with the leading
    reference-based evaluation metrics. By interpolating both methods, we achieve
    state-of-the-art correlation with human judgments. Finally, we show that GEC
    metrics are much more reliable when they are calculated at the sentence level
    instead of the corpus level. We have set up a CodaLab site for benchmarking GEC
    output using a common dataset and different evaluation metrics.


    Distributed, Parallel, and Cluster Computing

    Application of Ontologies in Cloud Computing: The State-Of-The-Art

    Fahim T. Imam
    Comments: 13 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

    This paper presents a systematic survey on existing literature and seminal
    works relevant to the application of ontologies in different aspects of Cloud
    computing. Our hypothesis is that ontologies, along with their reasoning
    capabilities, can have a significant impact on improving various aspects of
    Cloud computing. Ontologies can promote intelligent decision support
    mechanisms for various Cloud based services. They can also provide effective
    interoperability among Cloud based systems and resources. This survey can
    promote a comprehensive understanding of the roles and significance of
    ontologies within the overall domain of Cloud computing. Also, this project
    can potentially form the basis of new research areas and possibilities for
    both the ontology and Cloud computing communities.

    Online Fault-Tolerant Dynamic Event Region Detection in Sensor Networks via Trust Model

    Jiejie Wang, Bin Liu
    Comments: 6 pages, 3 figures, conference
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper proposes a Bayesian modeling approach to address the problem of
    online fault-tolerant dynamic event region detection in wireless sensor
    networks. In our model every network node is associated with a virtual
    community and a trust index, which quantitatively measures the trustworthiness
    of this node in its community. If a sensor node’s trust value is smaller than a
    threshold, it suggests that this node has encountered a fault and thus its
    sensor reading cannot be trusted at this moment. This concept of sensor node
    trust distinguishes our model from other alternatives, e.g., Markov random
    fields. The practical issues, including spatiotemporal correlations of
    neighbor nodes’ sensor readings, the presence of sensor faults and the
    requirement of online processing, are linked together by the concept of trust
    and are all taken into account in the modeling stage. Based on the proposed
    model, the trust value of each node is updated online by a particle filter
    algorithm upon the arrival of new observations. The decision on whether a node
    is located in the event region is made based upon the current estimate of this
    node’s trust value. Experimental results demonstrate that the proposed
    solution provides strikingly better performance than existing methods in terms
    of error rate in detecting the event region.
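
    To make the online update concrete, here is a generic bootstrap
    particle-filter sketch for a single node's trust index; the random-walk
    dynamics and the likelihood tying low trust to large deviations from the
    neighbourhood consensus are illustrative modeling assumptions, not the
    paper's exact equations.

        import numpy as np

        def update_trust(particles, weights, reading, consensus,
                         q=0.05, sigma=0.2):
            """One filtering step. particles approximate the posterior of the
            trust value in [0, 1]; a reading far from the neighbourhood
            consensus down-weights high-trust particles."""
            n = len(particles)
            particles = np.clip(particles + q * np.random.randn(n), 0.0, 1.0)
            dev = abs(reading - consensus)
            # Trusted nodes should deviate little; a faulty node's reading is
            # modeled as uninformative (flat likelihood term).
            lik = (particles * np.exp(-dev ** 2 / (2 * sigma ** 2))
                   + (1.0 - particles) * 0.1)
            weights = weights * lik
            weights /= weights.sum()
            if 1.0 / np.sum(weights ** 2) < n / 2:     # resample if ESS low
                idx = np.random.choice(n, n, p=weights)
                particles, weights = particles[idx], np.full(n, 1.0 / n)
            return particles, weights, float(np.sum(weights * particles))

        # Threshold the returned trust estimate as new readings arrive to
        # decide whether the node lies inside the event region.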

    The Voting Farm: A Distributed Class for Software Voting

    Vincenzo De Florio
    Comments: Revised version of Technical Report ESAT/ACCA/1997/3, ESAT Dept., University of Leuven, Belgium
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This document describes a class of C functions implementing a distributed
    software voting mechanism for EPX or similar message passing multi-threaded
    environments. Such a tool may be used, for example, to set up a restoring
    organ, i.e., an NMR (N-modular redundant) system with N voters. In order to
    describe the tool we start by defining its basic building block, the voter. A
    voter is defined as a software module connected to one user module and to a
    farm of fellow voters arranged into a clique. By means of the functions in the
    class the user module is able: to create a static “picture” of the voting farm,
    needed for the set up of the clique; to instantiate the local voter; to send
    input or control messages to that voter. No interlocutor is needed other than
    the local voter. The other user modules are supposed to create coherent
    pictures and instances of voters on other nodes of the machine and to manage
    consistently the task of their local intermediary. All technicalities
    concerning the set up of the clique and the exchange of messages between the
    voters are completely transparent to the user module. In the following the
    basic functionalities of the VotingFarm class will be discussed, namely how to
    set up a “passive farm”, or a non-alive topological representation of a
    yet-to-be-activated voting farm; how to initiate the voting farm; how to
    control the farm.

    Distributed Averaging CNN-ELM for Big Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Increasing the scalability of machine learning to handle big volumes of data
    is a challenging task. The scale-up approach has some limitations. In this
    paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
    classifier level. The map process is the CNN-ELM training for a certain
    partition of the data; it involves many CNN-ELM models that can be trained
    asynchronously. The reduce process is the averaging of all CNN-ELM weights as
    the final training result. This approach can save a great deal of training
    time compared to a single CNN-ELM model trained alone, and it also increases
    the scalability of machine learning by combining the scale-out and scale-up
    approaches. We verified our method in experiments on the extended MNIST and
    notMNIST data sets. However, the approach has some drawbacks: additional
    iteration learning parameters need to be chosen carefully, and the training
    data distribution needs to be selected carefully. Further research using more
    complex image data sets is required.

    Causally consistent dynamic slicing

    Roly Perera, Deepak Garg, James Cheney
    Comments: in Proceedings of 27th International Conference on Concurrency Theory (CONCUR 2016)
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Logic in Computer Science (cs.LO)

    We offer a lattice-theoretic account of dynamic slicing for the $\pi$-calculus,
    building on prior work in the sequential setting. For any run of a concurrent
    program, we exhibit a Galois connection relating forward slices of the start
    configuration to backward slices of the end configuration. We prove that, up to
    lattice isomorphism, the same Galois connection arises for any causally
    equivalent execution, allowing an efficient concurrent implementation of
    slicing via a standard interleaving semantics. Our approach has been formalised
    in the dependently-typed language Agda.

    Near-Data Processing for Machine Learning

    Hyeokjun Choe, Seil Lee, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon
    Comments: 9 pages, 7 figures
    Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    In computer architecture, near-data processing (NDP) refers to augmenting the
    memory or the storage with processing power so that it can process the data
    stored therein, passing only the processed data upwards in the memory
    hierarchy. By offloading the computational burden from the CPU and saving the
    need for transferring raw data, NDP has great potential in terms of accelerating
    computation and reducing power consumption. Despite its potential, NDP had only
    limited success until recently, mainly due to the performance mismatch in logic
    and memory process technologies. Recently, there have been two major changes in
    the game, making NDP more appealing than ever. The first is the success of deep
    learning, which often requires frequent transfers of big data for training. The
    second is the advent of NAND flash-based solid-state drives (SSDs) containing
    multicore CPUs that can be used for data processing. In this paper, we evaluate
    the potential of NDP for machine learning using a new SSD platform that allows
    us to simulate in-storage processing (ISP) of machine learning workloads.
    Although our platform, named ISP-ML, can execute various algorithms, this paper
    focuses on the stochastic gradient descent (SGD) algorithm, which is the de
    facto standard method for training deep neural networks. We implement and
    compare three variants of SGD (synchronous, downpour, and elastic averaging)
    using the ISP-ML platform, in which we exploit the multiple NAND channels for
    implementing parallel SGD. In addition, we compare the performance of ISP
    optimization and that of conventional in-host processing optimization. To the
    best of our knowledge, this is one of the first attempts to apply NDP to the
    optimization for machine learning.
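
    As a rough sketch of how the downpour variant differs from fully
    synchronous SGD (the parameter-server pattern below is generic, not the
    ISP-ML implementation): each worker, here a NAND channel, applies gradients
    locally and only periodically exchanges its accumulated update with the
    shared parameters.

        import numpy as np

        def downpour_worker_step(shared_w, local_w, acc_delta, grad, step,
                                 lr=0.01, sync_every=4):
            """One local SGD step; push the accumulated update and pull fresh
            parameters only every sync_every steps."""
            update = -lr * grad
            local_w = local_w + update
            acc_delta = acc_delta + update
            if step % sync_every == 0:
                shared_w = shared_w + acc_delta    # push to parameter server
                local_w = shared_w.copy()          # pull fresh parameters
                acc_delta = np.zeros_like(acc_delta)
            return shared_w, local_w, acc_delta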

    Stochastic Averaging for Constrained Optimization with Application to Online Resource Allocation

    Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    Existing approaches to resource allocation for today's stochastic networks
    are challenged to meet fast convergence and tolerable delay requirements. In
    the era of data deluge and information explosion, the present paper leverages
    online learning advances to facilitate stochastic resource allocation tasks. By
    recognizing the central role of Lagrange multipliers, the underlying
    constrained optimization problem is formulated as a machine learning task
    involving both training and operational modes, with the goal of learning the
    sought multipliers in a fast and efficient manner. To this end, an
    order-optimal offline learning approach is developed first for batch training,
    and it is then generalized to the online setting with a procedure termed
    learn-and-adapt. The novel resource allocation protocol permeates benefits of
    stochastic approximation and statistical learning to obtain low-complexity
    online updates with learning errors close to the statistical accuracy limits,
    while still preserving adaptation performance, which in the stochastic network
    optimization context guarantees queue stability. Analysis and simulated tests
    demonstrate that the proposed data-driven approach improves the delay and
    convergence performance of existing resource allocation schemes.
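
    The central objects being learned are the Lagrange multipliers; stripped of
    the paper's learn-and-adapt refinements, the core online recursion is a
    projected stochastic dual ascent in which each multiplier behaves like a
    scaled queue length, as in the sketch below (names and step size are
    illustrative).

        import numpy as np

        def dual_ascent_step(lam, constraint_slack, step=0.01):
            """Raise each multiplier while its constraint is violated
            (positive slack here means violation), let it decay otherwise,
            and project back onto the nonnegative orthant."""
            return np.maximum(0.0, lam + step * constraint_slack)

        # e.g. lam = dual_ascent_step(lam, arrivals - served_capacity)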

    Computational Tradeoffs in Biological Neural Networks: Self-Stabilizing Winner-Take-All Networks

    Nancy Lynch, Cameron Musco, Merav Parter
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Neurons and Cognition (q-bio.NC)

    We initiate a line of investigation into biological neural networks from an
    algorithmic perspective. We develop a simplified but biologically plausible
    model for distributed computation in stochastic spiking neural networks and
    study tradeoffs between computation time and network complexity in this model.
    Our aim is to abstract real neural networks in a way that, while not capturing
    all interesting features, preserves high-level behavior and allows us to make
    biologically relevant conclusions.

    In this paper, we focus on the important ‘winner-take-all’ (WTA) problem,
    which is analogous to a neural leader election unit: a network consisting of
    $n$ input neurons and $n$ corresponding output neurons must converge to a state
    in which a single output corresponding to a firing input (the ‘winner’) fires,
    while all other outputs remain silent. Neural circuits for WTA rely on
    inhibitory neurons, which suppress the activity of competing outputs and drive
    the network towards a converged state with a single firing winner. We attempt
    to understand how the number of inhibitors used affects network convergence
    time.

    We show that it is possible to significantly outperform naive WTA
    constructions through a more refined use of inhibition, solving the problem in
    $O(\theta)$ rounds in expectation with just $O(\log^{1/\theta} n)$ inhibitors
    for any $\theta$. An alternative construction gives convergence in
    $O(\log^{1/\theta} n)$ rounds with $O(\theta)$ inhibitors. We complement these
    upper bounds with our main technical contribution, a nearly matching lower
    bound for networks using $\ge \log\log n$ inhibitors. Our lower bound uses
    familiar indistinguishability and locality arguments from distributed computing
    theory. It lets us derive a number of interesting conclusions about the
    structure of any network solving WTA with good probability, and the use of
    randomness and inhibition within such a network.


    Learning

    Distributed Averaging CNN-ELM for Big Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Increasing the scalability of machine learning to handle big volumes of data
    is a challenging task. The scale-up approach has some limitations. In this
    paper, we propose a scale-out approach for CNN-ELM based on MapReduce at the
    classifier level. The map process is the CNN-ELM training for a certain
    partition of the data; it involves many CNN-ELM models that can be trained
    asynchronously. The reduce process is the averaging of all CNN-ELM weights as
    the final training result. This approach can save a great deal of training
    time compared to a single CNN-ELM model trained alone, and it also increases
    the scalability of machine learning by combining the scale-out and scale-up
    approaches. We verified our method in experiments on the extended MNIST and
    notMNIST data sets. However, the approach has some drawbacks: additional
    iteration learning parameters need to be chosen carefully, and the training
    data distribution needs to be selected carefully. Further research using more
    complex image data sets is required.

    Deep Reinforcement Learning From Raw Pixels in Doom

    Danijar Hafner
    Comments: Bachelor’s thesis
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Using current reinforcement learning methods, it has recently become possible
    to learn to play unknown 3D games from raw pixels. In this work, we study the
    challenges that arise in such complex environments, and summarize current
    methods to approach these. We choose a task within the Doom game that has not
    been approached yet. The goal for the agent is to fight enemies in a 3D world
    consisting of five rooms. We train the DQN and LSTM-A3C algorithms on this
    task. Results show that both algorithms learn sensible policies, but fail to
    achieve high scores given the amount of training. We provide insights into the
    learned behavior, which can serve as a valuable starting point for further
    research in the Doom domain.

    QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent

    Dan Alistarh, Jerry Li, Ryota Tomioka, Milan Vojnovic
    Subjects: Learning (cs.LG); Data Structures and Algorithms (cs.DS)

    Parallel implementations of stochastic gradient descent (SGD) have received
    significant research attention, thanks to excellent scalability properties of
    this algorithm, and to its efficiency in the context of training deep neural
    networks. A fundamental barrier for parallelizing large-scale SGD is the fact
    that the cost of communicating the gradient updates between nodes can be very
    large. Consequently, lossy compression heuristics have been proposed, by which
    nodes only communicate quantized gradients. Although effective in practice,
    these heuristics do not always provably converge, and it is not clear whether
    they are optimal.

    In this paper, we propose Quantized SGD (QSGD), a family of compression
    schemes which allow the compression of gradient updates at each node, while
    guaranteeing convergence under standard assumptions. QSGD allows the user to
    trade off compression and convergence time: it can communicate a sublinear
    number of bits per iteration in the model dimension, and can achieve
    asymptotically optimal communication cost. We complement our theoretical
    results with empirical data, showing that QSGD can significantly reduce
    communication cost, while being competitive with standard uncompressed
    techniques on a variety of real tasks.
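
    The heart of such schemes is an unbiased stochastic quantizer. A NumPy
    sketch in the spirit of QSGD with s quantization levels follows; the
    compact encoding of the norm, signs and integer levels into a bit stream is
    omitted.

        import numpy as np

        def qsgd_quantize(g, s=4):
            """Map each |g_i| / ||g|| onto one of s + 1 levels, rounding up
            with probability equal to the fractional part so the quantizer
            is unbiased: E[quantize(g)] = g."""
            norm = np.linalg.norm(g)
            if norm == 0.0:
                return np.zeros_like(g)
            level = np.abs(g) / norm * s           # real-valued level in [0, s]
            floor = np.floor(level)
            levels = floor + (np.random.rand(*g.shape) < (level - floor))
            return norm * np.sign(g) * levels / s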

    Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization

    Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
    Comments: 17 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We propose a technique for making Convolutional Neural Network (CNN)-based
    models more transparent by visualizing the regions of input that are
    “important” for predictions from these models – or visual explanations.

    Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM),
    uses the class-specific gradient information flowing into the final
    convolutional layer of a CNN to produce a coarse localization map of the
    important regions in the image. Grad-CAM is a strict generalization of Class
    Activation Mapping (CAM). Unlike CAM, Grad-CAM requires no re-training and is
    broadly applicable to any CNN-based architecture. We also show how Grad-CAM
    may be combined with existing pixel-space visualizations to create a
    high-resolution class-discriminative visualization (Guided Grad-CAM). We
    generate Grad-CAM and Guided Grad-CAM visual explanations to better understand
    image classification, image captioning, and visual question answering (VQA)
    models. In the context of image classification models, our visualizations (a)
    lend insight into their failure modes showing that seemingly unreasonable
    predictions have reasonable explanations, and (b) outperform pixel-space
    gradient visualizations (Guided Backpropagation and Deconvolution) on the
    ILSVRC-15 weakly supervised localization task. For image captioning and VQA,
    our visualizations expose the somewhat surprising insight that common CNN +
    LSTM models can often be good at localizing discriminative input image regions
    despite not being trained on grounded image-text pairs.

    Finally, we design and conduct human studies to measure if Guided Grad-CAM
    explanations help users establish trust in the predictions made by deep
    networks. Interestingly, we show that Guided Grad-CAM helps untrained users
    successfully discern a “stronger” deep network from a “weaker” one even when
    both networks make identical predictions.
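
    A compact PyTorch sketch of the described mechanism follows, assuming a
    recent torchvision and using ResNet-18 as a stand-in backbone (the paper
    covers a broader range of architectures): gradients of the class score are
    pooled over the last convolutional feature map to weight it, followed by a
    ReLU.

    import torch
    import torch.nn.functional as F
    from torchvision import models

    # Stand-in backbone; capture activations/gradients at the last conv block.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    store = {}
    model.layer4.register_forward_hook(
        lambda m, i, o: store.update(A=o.detach()))
    model.layer4.register_full_backward_hook(
        lambda m, gi, go: store.update(dA=go[0].detach()))

    def grad_cam(image, class_idx=None):
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()          # class-specific gradients
        w = store["dA"].mean(dim=(2, 3), keepdim=True)   # pooled weights
        cam = F.relu((w * store["A"]).sum(dim=1))        # (1, H, W)
        cam = F.interpolate(cam[None], size=image.shape[-2:],
                            mode="bilinear", align_corners=False)[0, 0]
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

    heatmap = grad_cam(torch.randn(1, 3, 224, 224))  # dummy input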

    Adaptive Convolutional ELM For Concept Drift Handling in Online Stream Data

    Arif Budiman, Mohamad Ivan Fanany, Chan Basaruddin
    Comments: Submitted to IEEE Transactions on Systems, Man and Cybernetics: Systems. Special Issue on Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In the big data era, data are generated continuously and their distribution
    may change over time. This challenge in online data streams is known as
    concept drift. In this paper, we propose the Adaptive Convolutional ELM
    method (ACNNELM), which enhances a Convolutional Neural Network (CNN) with a
    hybrid Extreme Learning Machine (ELM) model plus adaptive capability, aimed
    at concept drift handling. We enhance the CNN as a convolutional
    hierarchical feature representation learner combined with Elastic ELM
    (E$^2$LM) as a parallel supervised classifier. We propose an Adaptive OS-ELM
    (AOS-ELM) for concept drift adaptability at the classifier level (named
    ACNNELM-1) and matrix concatenation ensembles for concept drift adaptability
    at the ensemble level (named ACNNELM-2). Our proposed Adaptive CNNELM is
    flexible: it works well at both the classifier level and the ensemble level,
    whereas most current methods address only one of these levels.

    We verify our method on the extended MNIST and notMNIST data sets. The
    experiments simulate virtual drift, real drift, and hybrid drift events, and
    demonstrate how the adaptability of our CNNELM works. Our proposed method
    gives better accuracy, computational scalability, and concept drift
    adaptability than the regular ELM and CNN. Further research is still
    required to study the optimal parameters and to use more varied image data
    sets.
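
    For readers unfamiliar with the ELM building block, the minimal NumPy sketch
    below shows its core idea, which the proposed hybrid attaches to CNN
    features: a frozen random hidden layer whose output weights are solved in
    closed form. The online/adaptive OS-ELM machinery of the paper is omitted,
    and all names are ours.

    import numpy as np

    class ELM:
        # A single hidden layer with frozen random weights; only the
        # output weights are learned, in closed form via ridge regression.
        def __init__(self, n_in, n_hidden, reg=1e-3, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.normal(size=(n_in, n_hidden))
            self.b = rng.normal(size=n_hidden)
            self.reg = reg

        def _hidden(self, X):
            return np.tanh(X @ self.W + self.b)

        def fit(self, X, Y):
            H = self._hidden(X)
            # beta = (H'H + reg*I)^{-1} H'Y  -- the only trained weights
            A = H.T @ H + self.reg * np.eye(H.shape[1])
            self.beta = np.linalg.solve(A, H.T @ Y)
            return self

        def predict(self, X):
            return self._hidden(X) @ self.beta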

    Effective Classification of MicroRNA Precursors Using Combinatorial Feature Mining and AdaBoost Algorithms

    Ling Zhong, Jason T. L. Wang
    Comments: 26 pages, 3 figures
    Subjects: Genomics (q-bio.GN); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)

    MicroRNAs (miRNAs) are non-coding RNAs with approximately 22 nucleotides (nt)
    that are derived from precursor molecules. These precursor molecules or
    pre-miRNAs often fold into stem-loop hairpin structures. However, a large
    number of sequences with pre-miRNA-like hairpins can be found in genomes. It is
    a challenge to distinguish the real pre-miRNAs from other hairpin sequences
    with similar stem-loops (referred to as pseudo pre-miRNAs). Several
    computational methods have been developed to tackle this challenge. In this
    paper we propose a new method, called MirID, for identifying and classifying
    microRNA precursors. We collect 74 features from the sequences and secondary
    structures of pre-miRNAs; some of these features are taken from our previous
    studies on non-coding RNA prediction while others were suggested in the
    literature. We develop a combinatorial feature mining algorithm to identify
    suitable feature sets. These feature sets are then used to train support vector
    machines to obtain classification models, from which a classifier ensemble is
    constructed. Finally, we use an AdaBoost algorithm to further enhance the
    accuracy of the classifier ensemble. Experimental results on a variety of
    species demonstrate the good performance of the proposed method, and its
    superiority over existing tools.
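
    As a hedged illustration of the final boosting stage, here is a hand-rolled
    discrete AdaBoost over SVM base learners (labels in {-1, +1}); the feature
    matrix is a synthetic stand-in for the paper's 74 mined features, and all
    names are ours.

    import numpy as np
    from sklearn.svm import SVC

    def adaboost_svm(X, y, n_rounds=10):
        # Discrete AdaBoost with SVM base learners; labels in {-1, +1}.
        n = len(y)
        w = np.full(n, 1.0 / n)                 # per-example weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            clf = SVC(kernel="rbf").fit(X, y, sample_weight=w)
            pred = clf.predict(X)
            eps = w[pred != y].sum()            # weighted training error
            if eps >= 0.5:                      # no better than chance: stop
                break
            eps = max(eps, 1e-10)               # perfect-fit corner case
            alpha = 0.5 * np.log((1 - eps) / eps)
            learners.append(clf)
            alphas.append(alpha)
            w = w * np.exp(-alpha * y * pred)   # up-weight mistakes
            w /= w.sum()

        def predict(X_new):
            votes = sum(a * c.predict(X_new) for a, c in zip(alphas, learners))
            return np.sign(votes)
        return predict

    # Toy usage with a synthetic stand-in for the mined feature matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    print((adaboost_svm(X, y)(X) == y).mean())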

    Near-Data Processing for Machine Learning

    Hyeokjun Choe, Seil Lee, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon
    Comments: 9 pages, 7 figures
    Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

    In computer architecture, near-data processing (NDP) refers to augmenting the
    memory or the storage with processing power so that it can process the data
    stored therein, passing only the processed data upwards in the memory
    hierarchy. By offloading the computational burden from the CPU and removing
    the need to transfer raw data, NDP has great potential for accelerating
    computation and reducing power consumption. Despite its potential, NDP had only
    limited success until recently, mainly due to the performance mismatch in logic
    and memory process technologies. Recently, there have been two major changes in
    the game, making NDP more appealing than ever. The first is the success of deep
    learning, which often requires frequent transfers of big data for training. The
    second is the advent of NAND flash-based solid-state drives (SSDs) containing
    multicore CPUs that can be used for data processing. In this paper, we evaluate
    the potential of NDP for machine learning using a new SSD platform that allows
    us to simulate in-storage processing (ISP) of machine learning workloads.
    Although our platform, named ISP-ML, can execute various algorithms, this paper
    focuses on the stochastic gradient descent (SGD) algorithm, which is the de
    facto standard method for training deep neural networks. We implement and
    compare three variants of SGD (synchronous, downpour, and elastic averaging)
    using the ISP-ML platform, in which we exploit the multiple NAND channels for
    implementing parallel SGD. In addition, we compare the performance of ISP
    optimization and that of conventional in-host processing optimization. To the
    best of our knowledge, this is one of the first attempts to apply NDP to the
    optimization for machine learning.
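
    The synchronous variant is the simplest of the three: every worker (here, a
    simulated NAND channel) computes a gradient on its own shard, the gradients
    are averaged, and one update is applied. The toy NumPy simulation below is
    purely illustrative and unrelated to the ISP-ML platform code.

    import numpy as np

    def synchronous_sgd_step(params, shards, grad_fn, lr=0.1):
        # Each "channel" computes a gradient on its own shard; gradients
        # are averaged and one update is applied (downpour and elastic
        # averaging relax this strict synchrony).
        grads = [grad_fn(params, X, y) for (X, y) in shards]
        return params - lr * np.mean(grads, axis=0)

    # Toy linear regression spread over 4 simulated channels.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    shards = []
    for _ in range(4):
        X = rng.normal(size=(64, 2))
        shards.append((X, X @ true_w + 0.01 * rng.normal(size=64)))

    def grad_fn(w, X, y):                       # mean-squared-error gradient
        return 2.0 * X.T @ (X @ w - y) / len(y)

    w = np.zeros(2)
    for _ in range(100):
        w = synchronous_sgd_step(w, shards, grad_fn)
    print(w)                                    # close to [2, -1]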

    Temporal Ensembling for Semi-Supervised Learning

    Samuli Laine, Timo Aila
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    In this paper, we present a simple and efficient method for training deep
    neural networks in a semi-supervised setting where only a small portion of
    training data is labeled. We introduce temporal ensembling, where we form a
    consensus prediction of the unknown labels under multiple instances of the
    network-in-training on different epochs, and most importantly, under different
    regularization and input augmentation conditions. This ensemble prediction can
    be expected to be a better predictor for the unknown labels than the output of
    the network at the most recent training epoch, and can thus be used as a target
    for training. Using our method, we set new records for two standard
    semi-supervised learning benchmarks, reducing the classification error rate
    from 18.63% to 12.89% in CIFAR-10 with 4000 labels and from 18.44% to 6.83% in
    SVHN with 500 labels.
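
    The accumulation rule described above amounts to an exponential moving
    average of per-sample predictions with a startup bias correction; a minimal
    sketch follows (variable names are ours).

    import numpy as np

    def temporal_ensemble_targets(Z, z_epoch, t, alpha=0.6):
        # Z: accumulated (n_samples, n_classes) ensemble predictions;
        # z_epoch: this epoch's network outputs; t: 1-based epoch index.
        Z = alpha * Z + (1 - alpha) * z_epoch   # exponential moving average
        targets = Z / (1 - alpha ** t)          # correct zero-init bias
        return Z, targets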

    Stochastic Averaging for Constrained Optimization with Application to Online Resource Allocation

    Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Machine Learning (stat.ML)

    Existing approaches to resource allocation in today’s stochastic networks
    struggle to meet fast convergence and tolerable delay requirements. In
    the era of data deluge and information explosion, the present paper leverages
    online learning advances to facilitate stochastic resource allocation tasks. By
    recognizing the central role of Lagrange multipliers, the underlying
    constrained optimization problem is formulated as a machine learning task
    involving both training and operational modes, with the goal of learning the
    sought multipliers in a fast and efficient manner. To this end, an
    order-optimal offline learning approach is developed first for batch training,
    and it is then generalized to the online setting with a procedure termed
    learn-and-adapt. The novel resource allocation protocol combines the benefits
    of stochastic approximation and statistical learning to obtain low-complexity
    online updates with learning errors close to the statistical accuracy limits,
    while still preserving adaptation performance, which in the stochastic network
    optimization context guarantees queue stability. Analysis and simulated tests
    demonstrate that the proposed data-driven approach improves the delay and
    convergence performance of existing resource allocation schemes.
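
    The central object being learned is the vector of Lagrange multipliers. As a
    generic illustration (not the paper's learn-and-adapt procedure), one
    stochastic dual ascent step moves the multipliers along the observed
    constraint slack and projects them back onto the nonnegative orthant:

    import numpy as np

    def dual_ascent_step(lam, constraint_slack, step=0.05):
        # lam: one multiplier per constraint; constraint_slack: the
        # observed g(x_t) at the current random state (positive means
        # violated). The projection keeps the multipliers dual feasible;
        # in network optimization they play the role of queue lengths.
        return np.maximum(lam + step * constraint_slack, 0.0)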

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

    Dan Hendrycks, Kevin Gimpel
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the two related problems of detecting if an example is
    misclassified or out-of-distribution. We present a simple baseline that
    utilizes probabilities from softmax distributions. Correctly classified
    examples tend to have greater maximum softmax probabilities than erroneously
    classified and out-of-distribution examples, allowing for their detection. We
    assess performance by defining several tasks in computer vision, natural
    language processing, and automatic speech recognition, showing the
    effectiveness of this baseline across all. We then show the baseline can
    sometimes be surpassed, demonstrating the room for future research on these
    underexplored detection tasks.
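
    The baseline reduces to a one-liner: score each input by its maximum softmax
    probability and flag low-confidence cases. A minimal NumPy sketch, with an
    illustrative threshold:

    import numpy as np

    def msp_score(logits):
        # Maximum softmax probability: high for confident in-distribution
        # inputs, lower for misclassified or out-of-distribution ones.
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return (e / e.sum(axis=1, keepdims=True)).max(axis=1)

    logits = np.array([[5.0, 0.1, -2.0], [0.2, 0.1, 0.0]])
    suspicious = msp_score(logits) < 0.7        # threshold is illustrative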

    Stochastic Games for Smart Grid Energy Management with Prospect Prosumers

    Seyed Rasoul Etesami, Walid Saad, Narayan Mandayam, H. Vincent Poor
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Learning (cs.LG); Systems and Control (cs.SY)

    In this paper, the problem of smart grid energy management under stochastic
    dynamics is investigated. In the considered model, at the demand side, it is
    assumed that customers can act as prosumers who own renewable energy sources
    and can both produce and consume energy. Due to the coupling between the
    prosumers’ decisions and the stochastic nature of renewable energy, the
    interaction among prosumers is formulated as a stochastic game, in which each
    prosumer seeks to maximize its payoff, in terms of revenues, by controlling its
    energy consumption and demand. In particular, the subjective behavior of
    prosumers is explicitly reflected into their payoff functions using prospect
    theory, a powerful framework that allows modeling real-life human choices. For
    this prospect-based stochastic game, it is shown that there always exists a
    stationary Nash equilibrium where the prosumers’ trading policies in the
    equilibrium are independent of the time and their histories of the play.
    Moreover, a novel distributed algorithm with no information sharing among
    prosumers is proposed and shown to converge to an $\epsilon$-Nash equilibrium.
    On the other hand, at the supply side, the interaction between the utility
    company and the prosumers is formulated as an online optimization problem in
    which the utility company’s goal is to learn its optimal energy allocation
    rules. For this case, it is shown that such an optimization problem admits a
    no-regret algorithm meaning that regardless of the actual outcome of the game
    among the prosumers, the utility company can follow a strategy that mitigates
    its allocation costs as if it knew the entire demand market a priori.
    Simulation results show the convergence of the proposed algorithms to their
    predicted outcomes and present new insights resulting from prospect theory that
    contribute toward more efficient energy management in smart grids.


    Information Theory

    Energy-Efficient Beam Coordination Strategies with Rate Dependent Processing Power

    Oskari Tervo, Antti Tölli, Markku Juntti, Le-Nam Tran
    Comments: Submitted for possible publication, 32 pages, 11 figures
    Subjects: Information Theory (cs.IT)

    This paper proposes energy-efficient coordinated beamforming strategies for
    multi-cell multi-user multiple-input single-output systems. We consider a
    practical power consumption model, where part of the consumed power depends on
    the base station or user specific data rates due to coding, decoding and
    backhaul. This is different from the existing approaches where the base station
    power consumption has been assumed to be a convex or linear function. Two
    optimization criteria are considered, namely network energy efficiency
    maximization and weighted sum energy efficiency maximization. We develop
    successive convex approximation based algorithms to tackle these difficult
    nonconvex problems. We further propose decentralized implementations for the
    considered problems, in which base stations perform parallel and distributed
    computation based on local channel state information and limited backhaul
    information exchange. The decentralized approaches admit closed-form solutions
    and can be implemented without invoking a generic external convex solver. The
    effect of pilot contamination caused by pilot reuse is also taken into account
    in the energy efficiency problems. To achieve energy efficiency improvements
    with a limited number of pilot resources, we propose a heuristic
    energy-efficient pilot allocation strategy to mitigate the pilot contamination
    effect. Numerical results are provided to demonstrate that the rate-dependent
    power consumption has a large impact on the system energy efficiency and thus
    has to be taken into account when devising energy-efficient transmission
    strategies. We also investigate the effect of pilot contamination and show
    that the proposed pilot allocation strategy achieves significant performance
    improvements when a limited number of pilot resources is available.

    Performance analysis of multi-dimensional ESPRIT-type algorithms for arbitrary and strictly non-circular sources with spatial smoothing

    Jens Steinwandt, Florian Roemer, Martin Haardt, Giovanni Del Galdo
    Comments: submitted to IEEE Transactions on Signal Processing on 18/01/2016
    Subjects: Information Theory (cs.IT)

    Spatial smoothing is a widely used preprocessing scheme to improve the
    performance of high-resolution parameter estimation algorithms in case of
    coherent signals or if only a small number of snapshots is available. In this
    paper, we present a first-order performance analysis of the spatially smoothed
    versions of R-D Standard ESPRIT and R-D Unitary ESPRIT for sources with
    arbitrary signal constellations as well as R-D NC Standard ESPRIT and R-D NC
    Unitary ESPRIT for strictly second-order (SO) non-circular (NC) sources. The
    derived expressions are asymptotic in the effective signal-to-noise ratio
    (SNR), i.e., the approximations become exact for either high SNRs or a large
    sample size. Moreover, no assumptions on the noise statistics are required
    apart from zero mean and finite SO moments. We show that both R-D NC
    ESPRIT-type algorithms with spatial smoothing perform asymptotically
    identically in the high effective SNR regime. Generally, the performance of spatial
    smoothing based algorithms depends on the number of subarrays, which is a
    design parameter and needs to be chosen beforehand. In order to gain more
    insights into the optimal choice of the number of subarrays, we simplify the
    derived analytical R-D mean square error (MSE) expressions for the special case
    of a single source. The obtained MSE expression explicitly depends on the
    number of subarrays in each dimension, which allows us to analytically find the
    optimal number of subarrays for spatial smoothing. Based on this result, we
    additionally derive the maximum asymptotic gain from spatial smoothing and
    explicitly compute the asymptotic efficiency for this special case. All the
    analytical results are verified by simulations.

    An Algebraic Approach to a Class of Rank-Constrained Semi-Definite Programs With Applications

    Matthew W. Morency, Sergiy A. Vorobyov
    Comments: 12 two columns pages, 5 figures, Submitted to IEEE Trans. Signal Processing on September 2016
    Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)

    A new approach to solving a class of rank-constrained semi-definite
    programming (SDP) problems, which appear in many signal processing applications
    such as transmit beamspace design in multiple-input multiple-output (MIMO)
    radar, downlink beamforming design in MIMO communications, generalized sidelobe
    canceller design, phase retrieval, etc., is presented. The essence of the
    approach is the use of underlying algebraic structure enforced in such problems
    by other practical constraints such as, for example, null shaping constraint.
    According to this approach, instead of relaxing the non-convex rank-constrained
    SDP problem to a feasible set of positive semidefinite matrices, we restrict it
    to a space of polynomials whose dimension is equal to the desired rank. The
    resulting optimization problem is then convex as its solution is required to be
    full rank, and can be efficiently and exactly solved. A simple matrix
    decomposition is needed to recover the solution of the original problem from
    the solution of the restricted one. We show how this approach can be applied to
    solving some important signal processing problems that contain null-shaping
    constraints. As a byproduct of our study, the conjugacy of beamforming and
    parameter estimation problems leads us to the formulation of a new and rigorous
    criterion for signal/noise subspace identification. Simulations are
    performed for the problem of rank-constrained beamforming design and show an
    exact agreement of the solution with the proposed algebraic structure, as well
    as significant performance improvements in terms of sidelobe suppression
    compared to the existing methods.

    On Bounded Rationality in Cyber-Physical Systems Security: Game-Theoretic Analysis with Application to Smart Grid Protection

    Anibal Sanjab, Walid Saad
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    In this paper, a general model for cyber-physical systems (CPSs), that
    captures the diffusion of attacks from the cyber layer to the physical system,
    is studied. In particular, a game-theoretic approach is proposed to analyze the
    interactions between one defender and one attacker over a CPS. In this game,
    the attacker launches cyber attacks on a number of cyber components of the CPS
    to maximize the potential harm to the physical system while the system operator
    chooses to defend a number of cyber nodes to thwart the attacks and minimize
    potential damage to the physical side. The proposed game explicitly accounts
    for the fact that both attacker and defender can have different computational
    capabilities and disparate levels of knowledge of the system. To capture such
    bounded rationality of attacker and defender, a novel approach inspired from
    the behavioral framework of cognitive hierarchy theory is developed. In this
    framework, the defender is assumed to be faced with an attacker that can have
    different possible thinking levels reflecting its knowledge of the system and
    computational capabilities. To solve the game, the optimal strategies of each
    attacker type are characterized and the optimal response of the defender facing
    these different types is computed. This general approach is applied to smart
    grid security considering wide area protection with energy markets
    implications. Numerical results show that a deviation from the Nash equilibrium
    strategy is beneficial when the bounded rationality of the attacker is
    considered. Moreover, the results show that the defender’s incentive to deviate
    from the Nash equilibrium decreases when faced with an attacker that has high
    computational ability.

    Prospect Theory for Enhanced Smart Grid Resilience Using Distributed Energy Storage

    Georges El Rahi, Anibal Sanjab, Walid Saad, Narayan B. Mandayam, H. Vincent Poor
    Comments: 54th Annual Allerton Conference on Communication, Control, and Computing
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    The proliferation of distributed generation and storage units is leading to
    the development of local, small-scale distribution grids, known as microgrids
    (MGs). In this paper, the problem of optimizing the energy trading decisions of
    MG operators (MGOs) is studied using game theory. In the formulated game, each
    MGO chooses the amount of energy that must be sold immediately or stored for
    future emergencies, given the prospective market prices which are influenced by
    other MGOs’ decisions. The problem is modeled using a Bayesian game to account
    for the incomplete information that MGOs have about each other’s levels of
    surplus. The proposed game explicitly accounts for each MGO’s subjective
    decision when faced with the uncertainty of its opponents’ energy surplus. In
    particular, the so-called framing effect, from the framework of prospect theory
    (PT), is used to account for each MGO’s valuation of its gains and losses with
    respect to an individual utility reference point. The reference point is
    typically different for each individual and originates from its past
    experiences and future aspirations. A closed-form expression for the Bayesian
    Nash equilibrium is derived for the standard game formulation. Under PT, a best
    response algorithm is proposed to find the equilibrium. Simulation results show
    that, depending on their individual reference points, MGOs can tend to store
    more or less energy under PT compared to classical game theory. In addition,
    the impact of the reference point is found to be more prominent as the
    emergency price set by the power company increases.
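
    For concreteness, the standard Kahneman-Tversky value function below (with
    their commonly cited parameter estimates) shows how gains and losses are
    warped relative to a reference point; the paper's game builds its framing
    effect on this kind of valuation, so the sketch is background rather than
    the paper's model.

    def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
        # Outcome x is measured relative to the reference point.
        # Gains are diminished (concave), losses loom larger (lam > 1).
        if x >= 0:
            return x ** alpha
        return -lam * (-x) ** beta

    print(pt_value(100.0), pt_value(-100.0))    # losses weigh ~2.25x more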

    Stochastic Games for Smart Grid Energy Management with Prospect Prosumers

    Seyed Rasoul Etesami, Walid Saad, Narayan Mandayam, H. Vincent Poor
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Learning (cs.LG); Systems and Control (cs.SY)

    In this paper, the problem of smart grid energy management under stochastic
    dynamics is investigated. In the considered model, at the demand side, it is
    assumed that customers can act as prosumers who own renewable energy sources
    and can both produce and consume energy. Due to the coupling between the
    prosumers’ decisions and the stochastic nature of renewable energy, the
    interaction among prosumers is formulated as a stochastic game, in which each
    prosumer seeks to maximize its payoff, in terms of revenues, by controlling its
    energy consumption and demand. In particular, the subjective behavior of
    prosumers is explicitly reflected into their payoff functions using prospect
    theory, a powerful framework that allows modeling real-life human choices. For
    this prospect-based stochastic game, it is shown that there always exists a
    stationary Nash equilibrium where the prosumers’ trading policies in the
    equilibrium are independent of the time and their histories of the play.
    Moreover, a novel distributed algorithm with no information sharing among
    prosumers is proposed and shown to converge to an $\epsilon$-Nash equilibrium.
    On the other hand, at the supply side, the interaction between the utility
    company and the prosumers is formulated as an online optimization problem in
    which the utility company’s goal is to learn its optimal energy allocation
    rules. For this case, it is shown that such an optimization problem admits a
    no-regret algorithm meaning that regardless of the actual outcome of the game
    among the prosumers, the utility company can follow a strategy that mitigates
    its allocation costs as if it knew the entire demand market a priori.
    Simulation results show the convergence of the proposed algorithms to their
    predicted outcomes and present new insights resulting from prospect theory that
    contribute toward more efficient energy management in smart grids.



