
    arXiv Paper Daily: Fri, 11 Nov 2016

    我爱机器学习 (52ml.net), published 2016-11-11 00:00:00

    Neural and Evolutionary Computing

    Proceedings of the Workshop on Brain Analysis using COnnectivity Networks – BACON 2016

    Sarah Parisot, Jonathan Passerat-Palmbach, Markus D. Schirmer, Boris Gutman
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Understanding brain connectivity in a network-theoretic context has shown
    much promise in recent years. This type of analysis identifies brain
    organisational principles, bringing a new perspective to neuroscience. At the
    same time, large public databases of connectomic data are now available.
    However, connectome analysis is still an emerging field and there is a crucial
    need for robust computational methods to fully unravel its potential. This
    workshop provides a platform to discuss the development of new analytic
    techniques; methods for evaluating and validating commonly used approaches; as
    well as the effects of variations in pre-processing steps.

    Computing threshold functions using dendrites

    Romain Cazé, Bartosz Teleńczuk, Alain Destexhe
    Comments: 5 pages, 3 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Neurons, modeled as linear threshold units (LTUs), can in theory compute all
    threshold functions. In practice, however, some of these functions require
    synaptic weights of arbitrarily large precision. We show here that dendrites
    can alleviate this requirement. We introduce the non-Linear Threshold Unit
    (nLTU), which integrates synaptic input sub-linearly within distinct subunits
    to take into account local saturation in dendrites. We systematically search
    the parameter space of the nLTU and LTU to compare them. Firstly, this shows
    that the nLTU can compute all threshold functions with smaller-precision
    weights than the LTU. Secondly, we show that an nLTU can compute significantly
    more functions than an LTU when an input can only make a single synapse. This
    work paves the way for a new generation of networks made of nLTUs with binary
    synapses.
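
    To make the distinction concrete, here is a minimal Python sketch (not the
    authors' code; the saturation form, weights, and threshold are illustrative
    assumptions) contrasting an LTU with a two-subunit nLTU whose dendritic
    subunits saturate locally. With the same binary weights, sub-linear
    integration changes which threshold functions are computable.

    ```python
    import numpy as np

    def ltu(x, w, theta):
        """Linear threshold unit: fires iff the weighted input sum reaches theta."""
        return int(np.dot(w, x) >= theta)

    def nltu(x, subunit_weights, saturation, theta):
        """Non-linear threshold unit (sketch): each dendritic subunit integrates
        its own synapses and saturates at a local ceiling; the soma thresholds
        the sum of the subunit outputs."""
        total = sum(min(np.dot(w, x), saturation) for w in subunit_weights)
        return int(total >= theta)

    # Example: with the same binary weights, local saturation makes the nLTU
    # reject an input pattern that the single LTU accepts.
    x = np.array([1, 1, 0, 0])
    print(ltu(x, w=np.array([1, 1, 1, 1]), theta=2))             # 1
    print(nltu(x, subunit_weights=[np.array([1, 1, 0, 0]),
                                   np.array([0, 0, 1, 1])],
               saturation=1, theta=2))                           # 0: subunit saturates at 1
    ```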

    Temperature-Insensitive Analog Vector-by-Matrix Multiplier Based on 55 nm NOR Flash Memory Cells

    X. Guo, F. Merrikh Bayat, M. Prezioso, Y. Chen, B. Nguyen, N. Do, D. B. Strukov
    Comments: 4 pages, 11 figures
    Subjects: Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)

    We have fabricated and successfully tested an analog vector-by-matrix
    multiplier based on redesigned 10×12 arrays of 55 nm commercial NOR flash
    memory cells. The modified arrays enable high-precision individual analog
    tuning of each cell, with sub-1% accuracy, while keeping the highly optimized
    cells, with their long-term state retention, intact. The array has an area of
    0.33 um^2 per cell and is at least one order of magnitude denser than
    previously reported implementations of nonvolatile analog memories. The
    demonstrated vector-by-matrix multiplier, using gate coupling to additional
    periphery cells, has ~2% precision, limited by the aggregate effect of cell
    noise, retention, mismatch, process variations, tuning precision, and
    capacitive crosstalk. A differential version of the multiplier has allowed us
    to demonstrate sub-3% temperature drift of the output signal between 25°C and
    85°C.
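
    As a rough illustration of how per-cell tuning error and read noise aggregate
    into output precision, here is a hypothetical Monte-Carlo sketch; the 1%
    tuning error and the noise magnitude are assumptions for illustration, not
    the paper's measured error budget.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    rows, cols = 10, 12                       # array size used in the paper
    W = rng.uniform(0.0, 1.0, (rows, cols))   # ideal analog weights (arbitrary units)
    x = rng.uniform(0.0, 1.0, cols)           # input vector

    # Assumed error model: ~1% relative tuning error per cell plus additive read noise.
    tuning_err = 0.01 * rng.standard_normal((rows, cols))
    W_stored = W * (1.0 + tuning_err)
    read_noise = 0.005 * rng.standard_normal(rows)

    y_ideal = W @ x
    y_analog = W_stored @ x + read_noise
    rel_err = np.abs(y_analog - y_ideal) / np.abs(y_ideal)
    print(f"max relative output error: {rel_err.max():.3%}")
    ```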

    Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

    Lukas Cavigelli, Dominic Bernath, Michele Magno, Luca Benini
    Comments: Presented at SPIE Security + Defence 2016, Proc. SPIE 9997, Target and Background Signatures II
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    Detecting and classifying targets in video streams from surveillance cameras
    is a cumbersome, error-prone and expensive task. Often, the incurred costs are
    prohibitive for real-time monitoring. This leads to data being stored locally
    or transmitted to a central storage site for post-incident examination. The
    required communication links and archiving of the video data are still
    expensive and this setup excludes preemptive actions to respond to imminent
    threats. An effective way to overcome these limitations is to build a smart
    camera that transmits alerts when relevant video sequences are detected. Deep
    neural networks (DNNs) have come to outperform humans in visual classification
    tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be
    extended to make use of higher-dimensional input data such as multispectral
    data. We explore this opportunity in terms of achievable accuracy and required
    computational effort. To analyze the precision of DNNs for scene labeling in an
    urban surveillance scenario we have created a dataset with 8 classes obtained
    in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR
    snapshot sensor to assess the potential of multispectral image data for target
    classification. We evaluate several new DNNs, showing that the spectral
    information fused together with the RGB frames can be used to improve the
    accuracy of the system or to achieve similar accuracy with a 3x smaller
    computational effort. We achieve a very high per-pixel accuracy of 99.1%. Even
    for scarcely occurring, but particularly interesting classes, such as cars, 75%
    of the pixels are labeled correctly with errors occurring only around the
    border of the objects. This high accuracy was obtained with a training set of
    only 30 labeled images, paving the way for fast adaptation to various
    application scenarios.


    Computer Vision and Pattern Recognition

    Evaluating Urbanization from Satellite and Aerial Images by means of a statistical approach to the texture analysis

    Amelia Carolina Sparavigna
    Comments: Keywords: Image analysis, 2D textures; texture functions, satellite images, aerial images
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Statistical methods are usually applied in the processing of digital images
    to analyse the textures they display. Aiming to evaluate the urbanization of
    a given location from satellite or aerial images, here we consider a simple
    processing method to distinguish the ‘urban’ from the ‘rural’ texture in
    them. The method is based on the mean values and the standard deviations of
    the colour tones of the image pixels. Processing the input images yields maps
    from which a quantitative evaluation of the textures can be obtained.
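
    A minimal sketch of such a processing step, assuming a grayscale image and a
    square sliding window (the window size and the threshold are illustrative
    choices, not the paper's):

    ```python
    import numpy as np

    def texture_maps(gray, win=9):
        """Slide a win x win window over a grayscale image and return per-pixel
        maps of the local mean and standard deviation of the tones."""
        h, w = gray.shape
        pad = win // 2
        padded = np.pad(gray.astype(float), pad, mode="reflect")
        mean = np.empty((h, w))
        std = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                patch = padded[i:i + win, j:j + win]
                mean[i, j], std[i, j] = patch.mean(), patch.std()
        return mean, std

    # 'Urban' texture tends to show higher local standard deviation than
    # homogeneous 'rural' texture, so thresholding the std map separates the two.
    img = np.random.default_rng(1).uniform(0, 255, (64, 64)).astype(np.uint8)
    mean_map, std_map = texture_maps(img)
    urban_mask = std_map > std_map.mean()
    ```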

    Fast Algorithm of High-resolution Microwave Imaging Using the Non-parametric Generalized Reflectivity Model

    Long Gang Wang, Lianlin Li, Tie Jun Cui
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an efficient algorithm for high-resolution microwave
    imaging based on the concept of generalized reflectivity. The contribution
    made in this paper is two-fold. First, we introduce the concept of
    non-parametric generalized reflectivity (GR, for short) as a function of
    operational frequencies, view angles, etc. The GR extends the conventional
    Born-based imaging model, i.e., the single-scattering model, into one
    accounting for more realistic interaction between the electromagnetic
    wavefield and the imaged scene. Afterwards, the GR-based microwave imaging is
    formulated within the framework of sparsity-regularized optimization.
    Typically, sparsity-regularized optimization requires an iterative strategy,
    which is computationally expensive, especially for large-scale problems. To
    break this bottleneck, we convert the imaging problem into a physics-driven
    image-processing problem by introducing a dual transformation. Moreover, this
    image processing is performed over overlapping patches, which can be solved
    efficiently in a parallel or distributed manner. In this way, the proposed
    high-resolution imaging methodology is applicable to large-scale microwave
    imaging problems. Selected simulation results are provided to demonstrate the
    state-of-the-art performance of the proposed methodology.

    Variables affecting photomosaic reconstruction and ortho-rectification from aerial survey datasets

    Jonathan Byrne, Debra Laefer
    Comments: Presented at CERAI Conference 2016, Galway
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unmanned aerial vehicles now make it possible to obtain high quality aerial
    imagery at a low cost, but processing those images into a single, useful entity
    is neither simple nor seamless. Specifically, there are factors that must be
    addressed when merging multiple images into a single coherent one. While
    ortho-rectification can be done, it tends to be expensive and time consuming.
    Image stitching offers a more economical, low-tech approach. However, direct
    application tends to fail for low-elevation imagery due to one or more factors
    including insufficient keypoints, parallax issues, and homogeneity of the
    surveyed area. This paper discusses these problems and possible solutions when
    using techniques such as image stitching and structure from motion for
    generating ortho-rectified imagery. These are presented in terms of actual
    Irish projects including the Boland’s Mills building in Dublin’s city centre,
    the Kilmoon Cross Farm, and the Richview buildings on the University College
    Dublin campus. Implications for various Irish industries are explained in terms
    of both urban and rural projects.

    X-ray Scattering Image Classification Using Deep Learning

    Boyu Wang, Kevin Yager, Dantong Yu, Minh Hoai
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual inspection of x-ray scattering images is a powerful technique for
    probing the physical structure of materials at the molecular scale. In this
    paper, we explore the use of deep learning to develop methods for automatically
    analyzing x-ray scattering images. In particular, we apply Convolutional Neural
    Networks and Convolutional Autoencoders for x-ray scattering image
    classification. To acquire enough training data for deep learning, we use
    simulation software to generate synthetic x-ray scattering images. Experiments
    show that deep learning methods outperform previously published methods by 10%
    on synthetic and real datasets.

    Detecting Moving Regions in CrowdCam Images

    Adi Dafni, Yael Moses, Shai Avidan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the novel problem of detecting dynamic regions in CrowdCam images,
    a set of still images captured by a group of people. These regions capture the
    most interesting parts of the scene, and detecting them plays an important role
    in the analysis of visual data. Our method is based on the observation that
    matching static points must satisfy the epipolar geometry constraints, but
    computing exact matches is challenging. Instead, we compute the probability
    that a pixel has a match, not necessarily the correct one, along the
    corresponding epipolar line. The complement of this probability is not
    necessarily the probability of a dynamic point because of occlusions, noise,
    and matching errors. Therefore, information from all pairs of images is
    aggregated to obtain a high quality dynamic probability map, per image.
    Experiments on challenging datasets demonstrate the effectiveness of the
    algorithm on a broad range of settings; no prior knowledge about the scene, the
    camera characteristics or the camera locations is required.
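
    The geometric ingredient can be sketched as follows; the fundamental matrix
    F, the candidate coordinates, and the Gaussian weighting are assumptions
    standing in for the paper's actual match-probability computation:

    ```python
    import numpy as np

    def epipolar_line(F, x):
        """Epipolar line l' = F @ x (homogeneous coords) in the second image."""
        l = F @ np.append(x, 1.0)
        return l / np.linalg.norm(l[:2])    # normalize so point-line distance is in pixels

    def match_probability(F, x, candidates, descriptor_sim, sigma=2.0):
        """Soft probability that x has *some* match near its epipolar line:
        weight each candidate's descriptor similarity by its distance to the line."""
        l = epipolar_line(F, x)
        dists = np.abs(candidates @ l[:2] + l[2])        # candidates: (N, 2) pixels
        weights = np.exp(-0.5 * (dists / sigma) ** 2)
        return float(np.max(weights * descriptor_sim))   # high value => likely static

    # Toy usage with a random (not geometrically valid) F and 50 candidates:
    rng = np.random.default_rng(6)
    F = rng.standard_normal((3, 3))
    print(match_probability(F, np.array([100.0, 120.0]),
                            rng.uniform(0, 200, (50, 2)), rng.uniform(0, 1, 50)))
    ```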

    Error concealment by means of motion refinement and regularized Bregman divergence

    Alessandra M. Coelho, Vania V. Estrela, Felipe P. do Carmo, Sandro R. Fernandes
    Comments: 8 pages, 4 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This work addresses the problem of error concealment in video transmission
    systems over noisy channels, employing Bregman divergences along with
    regularization. Error concealment aims to mitigate the effects of
    disturbances at the receiver due to bit errors or cell loss in packet
    networks. Bregman regularization gives accurate answers after just a few
    iterations, with fast convergence, better accuracy, and stability. The
    technique is adaptive in nature: the regularization functional is updated
    according to Bregman functions that change from iteration to iteration,
    following the nature of the neighborhood under study at iteration n.
    Numerical experiments show that high-quality regularization parameter
    estimates can be obtained. Convergence is sped up while the regularization
    parameter estimation is made less empirical and more automatic.

    Real Time Video Analysis using Smart Phone Camera for Stroboscopic Image

    Somnath Mukherjee, Soumyajit Ganguly
    Comments: 5 pages, 2 figures. In SPIE Electronic Imaging 2015
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Motion capture, and thereby segmentation, of the motion of any moving object
    from a sequence of continuous images or a video is no exceptional task in
    computer vision. A smart-phone camera application is a useful addition for
    developing such tasks, and it also allows for smooth testing. A new approach
    has been proposed for segmenting the foreground moving object from the
    background and then masking the sequential motion onto the static background,
    producing what is commonly known as a stroboscopic image. In this paper the
    whole stroboscopic image construction technique is described in detail, along
    with some necessary constraints arising from the traditional problem of
    estimating and modeling dynamic background changes. The background
    subtraction technique is properly estimated here, and the number of
    sequential motions is calculated from the correlation between the motion of
    the object and its time of occurrence. This can be a very effective
    application that can replace the traditional stroboscopic setup of high-end
    SLR cameras, tripod stands, shutter-speed control, positioning, and so on.

    Mahalanobis Distance for Class Averaging of Cryo-EM Images

    Tejal Bhamre, Zhizhen Zhao, Amit Singer
    Subjects: Applications (stat.AP); Computer Vision and Pattern Recognition (cs.CV); Biomolecules (q-bio.BM); Machine Learning (stat.ML)

    Single particle reconstruction (SPR) from cryo-electron microscopy (EM) is a
    technique in which the 3D structure of a molecule needs to be determined from
    its contrast transfer function (CTF)-affected, noisy 2D projection images taken
    at unknown viewing directions. One of the main challenges in cryo-EM is the
    typically low signal to noise ratio (SNR) of the acquired images. 2D
    classification of images, followed by class averaging, improves the SNR of the
    resulting averages, and is used for selecting particles from micrographs and
    for inspecting the particle images. We introduce a new affinity measure, akin
    to the Mahalanobis distance, to compare cryo-EM images belonging to different
    defocus groups. The new similarity measure is employed to detect similar
    images, thereby leading to an improved algorithm for class averaging. We
    evaluate the performance of the proposed class averaging procedure on synthetic
    datasets, obtaining state of the art classification.
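
    A minimal sketch of a Mahalanobis-style affinity between vectorized images,
    assuming a covariance estimate is available per defocus group (the covariance
    model below is illustrative, not the paper's estimator):

    ```python
    import numpy as np

    def mahalanobis_affinity(img_a, img_b, cov):
        """Mahalanobis-style distance between two vectorized images: the
        difference is whitened by the inverse of an assumed noise/CTF covariance."""
        d = img_a.ravel() - img_b.ravel()
        return float(d @ np.linalg.solve(cov, d))

    # Toy usage: 16x16 images, covariance estimated from a stack of noisy copies.
    rng = np.random.default_rng(2)
    stack = rng.standard_normal((200, 256))
    cov = np.cov(stack, rowvar=False) + 1e-3 * np.eye(256)  # regularize for invertibility
    a, b = rng.standard_normal(256), rng.standard_normal(256)
    print(mahalanobis_affinity(a, b, cov))
    ```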


    Artificial Intelligence

    Song From PI: A Musically Plausible Network for Pop Music Generation

    Hang Chu, Raquel Urtasun, Sanja Fidler
    Comments: under review at ICLR 2017
    Subjects: Artificial Intelligence (cs.AI)

    We present a novel framework for generating pop music. Our model is a
    hierarchical Recurrent Neural Network, where the layers and the structure of
    the hierarchy encode our prior knowledge about how pop music is composed. In
    particular, the bottom layers generate the melody, while the higher levels
    produce the drums and chords. We conduct several human studies that show a
    strong preference for our generated music over that produced by the recent
    method by
    Google. We additionally show two applications of our framework: neural dancing
    and karaoke, as well as neural story singing.

    XCSP3: An Integrated Format for Benchmarking Combinatorial Constrained Problems

    Frederic Boussemart, Christophe Lecoutre, Cédric Piette
    Comments: 230 pages
    Subjects: Artificial Intelligence (cs.AI)

    We propose a major revision of the format XCSP 2.1, called XCSP3, to build
    integrated representations of combinatorial constrained problems. This new
    format is able to deal with mono/multi optimization, many types of variables,
    cost functions, reification, views, annotations, variable quantification,
    distributed, probabilistic and qualitative reasoning. The new format is made
    compact, highly readable, and rather easy to parse. Interestingly, it captures
    the structure of the problem models, through the possibilities of declaring
    arrays of variables, and identifying syntactic and semantic groups of
    constraints. The number of constraints is kept under control by introducing a
    limited set of basic constraint forms, and producing almost automatically some
    of their variations through lifting, restriction, sliding, logical combination
    and relaxation mechanisms. As a result, XCSP3 encompasses practically all
    constraints that can be found in major constraint solvers developed by the CP
    community. A website, which is developed conjointly with the format, contains
    many models and series of instances. The user can make sophisticated queries
    to select instances based on very precise criteria. The objective of XCSP3 is
    to ease the effort required to test and compare different algorithms by
    providing a common test-bed of combinatorial constrained instances.

    Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence

    Emilio Jorge, Mikael Kågebäck, Emil Gustavsson
    Comments: 8 pages with 1 page appendix. Accepted to Deep Reinforcement Learning Workshop at NIPS 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Multiagent Systems (cs.MA)

    Learning your first language is an incredible feat and not easily duplicated.
    Doing this using nothing but a few pictureless books, a corpus, would likely be
    impossible even for humans. As an alternative we propose to use situated
    interactions between agents as a driving force for communication, and the
    framework of Deep Recurrent Q-Networks (DRQN) for learning a common language
    grounded in the provided environment. We task the agents with interactive image
    search in the form of the game Guess Who?. The images from the game provide a
    non-trivial environment for the agents to discuss and a natural grounding for
    the concepts they decide to encode in their communication. Our experiments show
    that it is possible to learn this task using DRQN and, even more importantly,
    that the words the agents use correspond to physical attributes present in the
    images that make up the agents' environment.

    Importance Sampling with Unequal Support

    Philip S. Thomas, Emma Brunskill
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Importance sampling is often used in machine learning when training and
    testing data come from different distributions. In this paper we propose a new
    variant of importance sampling that can reduce the variance of importance
    sampling-based estimates by orders of magnitude when the supports of the
    training and testing distributions differ. After motivating and presenting our
    new importance sampling estimator, we provide a detailed theoretical analysis
    that characterizes both its bias and variance relative to the ordinary
    importance sampling estimator (in various settings, which include cases where
    ordinary importance sampling is biased, while our new estimator is not, and
    vice versa). We conclude with an example of how our new importance sampling
    estimator can be used to improve estimates of how well a new treatment policy
    for diabetes will work for an individual, using only data from when the
    individual used a previous treatment policy.
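
    For reference, here is a minimal sketch of the ordinary importance sampling
    estimator the paper builds on; the new unequal-support variant itself is not
    specified in the abstract, so only the baseline is shown.

    ```python
    import numpy as np

    def ordinary_is(samples, p_pdf, q_pdf, f):
        """Ordinary importance sampling: estimate E_p[f(X)] from samples drawn
        from q by reweighting with the density ratio p/q."""
        w = p_pdf(samples) / q_pdf(samples)
        return np.mean(w * f(samples))

    # Toy example: target p = N(1, 1), sampling distribution q = N(0, 1), f = identity.
    rng = np.random.default_rng(3)
    xs = rng.normal(0.0, 1.0, 100_000)
    normal = lambda mu: (lambda x: np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi))
    print(ordinary_is(xs, normal(1.0), normal(0.0), lambda x: x))  # ~1.0
    ```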

    A stochastically verifiable autonomous control architecture with reasoning

    Paolo Izzo, Hongyang Qu, Sandor M. Veres
    Comments: Accepted at IEEE Conf. Decision and Control, 2016
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    A new agent architecture called Limited Instruction Set Agent (LISA) is
    introduced for autonomous control. The new architecture is based on previous
    implementations of AgentSpeak and it is structurally simpler than its
    predecessors with the aim of facilitating design-time and run-time verification
    methods. The process of abstracting the LISA system to two different types of
    discrete probabilistic models (DTMC and MDP) is investigated and illustrated.
    The LISA system provides a tool for complete modelling of the agent and the
    environment for probabilistic verification. The agent program can be
    automatically compiled into a DTMC or an MDP model for verification with PRISM.
    The automatically generated PRISM model can be used for both design-time and
    run-time verification. The run-time verification is investigated and
    illustrated in the LISA system as an internal modelling mechanism for
    prediction of future outcomes.


    Information Retrieval

    On the Impact of Entity Linking in Microblog Real-Time Filtering

    Giacomo Berardi, Diego Ceccarelli, Andrea Esuli, Diego Marcheggiani
    Comments: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain – April 13 – 17, 2015
    Journal-ref: Proceedings of the 30th Annual ACM Symposium on Applied Computing
    (SAC 2015). pp 1066-1071. Salamanca, ES, 2015
    Subjects: Information Retrieval (cs.IR)

    Microblogging is a model of content sharing in which the temporal locality of
    posts with respect to important events, either of foreseeable or unforeseeable
    nature, makes applications of real-time filtering of great practical
    interest. We propose the use of Entity Linking (EL) in order to improve the
    retrieval effectiveness, by enriching the representation of microblog posts and
    filtering queries. EL is the process of recognizing in an unstructured text the
    mention of relevant entities described in a knowledge base. EL of short pieces
    of text is a difficult task, but it is also a scenario in which the information
    EL adds to the text can have a substantial impact on the retrieval process. We
    implement a state-of-the-art filtering method, based on the best systems from
    the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and
    extend it with a Wikipedia-based EL method. Results show that the use of EL
    significantly improves over non-EL based versions of the filtering methods.

    Getting Started with Neural Models for Semantic Matching in Web Search

    Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de Rijke
    Comments: under review for the Information Retrieval Journal
    Subjects: Information Retrieval (cs.IR)

    The vocabulary mismatch problem is a long-standing problem in information
    retrieval. Semantic matching holds the promise of solving the problem. Recent
    advances in language technology have given rise to unsupervised neural models
    for learning representations of words as well as bigger textual units. Such
    representations enable powerful semantic matching methods. This survey is meant
    as an introduction to the use of neural models for semantic matching. To remain
    focused we limit ourselves to web search. We detail the required background and
    terminology, a taxonomy grouping the rapidly growing body of work in the area,
    and then survey work on neural models for semantic matching in the context of
    three tasks: query suggestion, ad retrieval, and document retrieval. We include
    a section on resources and best practices that we believe will help readers who
    are new to the area. We conclude with an assessment of the state-of-the-art and
    suggestions for future work.

    Why is it Difficult to Detect Sudden and Unexpected Epidemic Outbreaks in Twitter?

    Avaré Stewart, Sara Romano, Nattiya Kanhabua, Sergio Di Martino, Wolf Siberski, Antonino Mazzeo, Wolfgang Nejdl, Ernesto Diaz-Aviles
    Comments: ACM CCS Concepts: Applied computing – Health informatics; Information systems – Web mining; Document filtering; Novelty in information retrieval; Recommender systems; Human-centered computing – Social media
    Subjects: Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (stat.ML)

    Social media services such as Twitter are a valuable source of information
    for decision support systems. Many studies have shown that this also holds for
    the medical domain, where Twitter is considered a viable tool for public health
    officials to sift through relevant information for the early detection,
    management, and control of epidemic outbreaks. This is possible due to the
    inherent capability of social media services to transmit information faster
    than traditional channels. However, the majority of current studies have
    limited their scope to the detection of common and seasonal health recurring
    events (e.g., Influenza-like Illness), partially due to the noisy nature of
    Twitter data, which makes outbreak detection and management very challenging.

    Within the European project M-Eco, we developed a Twitter-based Epidemic
    Intelligence (EI) system, which is designed to also handle a more general class
    of unexpected and aperiodic outbreaks. In particular, we faced three main
    research challenges in this endeavor:

    1) dynamic classification to manage terminology evolution of Twitter
    messages, 2) alert generation to produce reliable outbreak alerts analyzing the
    (noisy) tweet time series, and 3) ranking and recommendation to support domain
    experts for better assessment of the generated alerts.

    In this paper, we empirically evaluate our proposed approach to these
    challenges using real-world outbreak datasets and a large collection of tweets.
    We validate our solution with domain experts, describe our experiences, and
    give a more realistic view on the benefits and issues of analyzing social media
    for public health.


    Computation and Language

    Roadmap Enhanced Improvement to the VSIMM Tracker via a Constrained Stochastic Context Free Grammar

    Sijia Gao, Vikram Krishnamurthy
    Subjects: Computation and Language (cs.CL)

    The aim of syntactic tracking is to classify spatio-temporal patterns of a
    target’s motion using natural language processing models. In this paper, we
    generalize earlier work by considering a constrained stochastic context free
    grammar (CSCFG) for modeling patterns confined to a roadmap. The constrained
    grammar facilitates modeling specific directions and road names in a roadmap.
    We present a novel particle filtering algorithm that exploits the CSCFG model
    for estimating the target’s patterns. This meta-level algorithm operates in
    conjunction with a base-level tracking algorithm. Extensive numerical results
    using simulated ground moving target indicator (GMTI) radar measurements show
    substantial improvement in target tracking accuracy.

    Efficient Summarization with Read-Again and Copy Mechanism

    Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun
    Comments: 11 pages, 4 figures, 5 tables
    Subjects: Computation and Language (cs.CL)

    Encoder-decoder models have been widely used to solve sequence-to-sequence
    prediction tasks. However, current approaches suffer from two shortcomings.
    First, the encoders compute a representation of each word taking into account
    only the history of the words it has read so far, yielding suboptimal
    representations. Second, current decoders utilize large vocabularies in order
    to minimize the problem of unknown words, resulting in slow decoding times. In
    this paper we address both shortcomings. Towards this goal, we introduce a
    simple mechanism that first reads the input sequence before committing to a
    representation of each word. Furthermore, we propose a simple copy mechanism
    that is able to exploit very small vocabularies and handle out-of-vocabulary
    words. We demonstrate the effectiveness of our approach on the Gigaword
    dataset and the DUC competition, outperforming the state of the art.

    Tracing metaphors in time through self-distance in vector spaces

    Marco Del Tredici, Malvina Nissim, Andrea Zaninello
    Comments: Proceedings of the Third Italian Conference on Computational Linguistics (CLIC 2016)
    Subjects: Computation and Language (cs.CL)

    From a diachronic corpus of Italian, we build consecutive vector spaces in
    time and use them to compare a term’s cosine similarity to itself in different
    time spans. We assume that a drop in similarity might be related to the
    emergence of a metaphorical sense at a given time. Similarity-based
    observations are matched to the actual year when a figurative meaning was
    documented in a reference dictionary and through manual inspection of corpus
    occurrences.
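
    A minimal sketch of the self-distance computation, assuming per-time-slice
    embeddings that have already been aligned into a common space; the alignment
    step and the data structures here are hypothetical:

    ```python
    import numpy as np

    def self_similarity(vec_t1, vec_t2):
        """Cosine similarity of one term's embedding across two time slices;
        a drop may signal an emerging figurative sense."""
        return float(vec_t1 @ vec_t2 /
                     (np.linalg.norm(vec_t1) * np.linalg.norm(vec_t2)))

    def trace(word, embeddings_by_slice):
        """Assumes embeddings_by_slice maps a time slice to a {word: vector} dict
        trained on that slice, with the spaces aligned beforehand (e.g., via an
        orthogonal Procrustes mapping)."""
        slices = sorted(embeddings_by_slice)
        return {(a, b): self_similarity(embeddings_by_slice[a][word],
                                        embeddings_by_slice[b][word])
                for a, b in zip(slices, slices[1:])}
    ```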


    Distributed, Parallel, and Cluster Computing

    Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian Inference

    Jeffrey Regier, Kiran Pamnany, Ryan Giordano, Rollin Thomas, David Schlegel, Jon McAuliffe, Prabhat
    Comments: submitting to IPDPS’17
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

    Celeste is a procedure for inferring astronomical catalogs that attains
    state-of-the-art scientific results. To date, Celeste has been scaled to at
    most hundreds of megabytes of astronomical images: Bayesian posterior inference
    is notoriously demanding computationally. In this paper, we report on a
    scalable, parallel version of Celeste, suitable for learning catalogs from
    modern large-scale astronomical datasets. Our algorithmic innovations include a
    fast numerical optimization routine for Bayesian posterior inference and a
    statistically efficient scheme for decomposing astronomical optimization
    problems into subproblems.

    Our scalable implementation is written entirely in Julia, a new high-level
    dynamic programming language designed for scientific and numerical computing.
    We use Julia’s high-level constructs for shared and distributed memory
    parallelism, and demonstrate effective load balancing and efficient scaling on
    up to 8192 Xeon cores on the NERSC Cori supercomputer.

    Executing Dynamic Data Rate Actor Networks on OpenCL Platforms

    Jani Boutellier, Ilkka Hautala
    Comments: 2016 IEEE International Workshop on Signal Processing Systems
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Heterogeneous computing platforms consisting of general purpose processors
    (GPPs) and graphics processing units (GPUs) have become commonplace in personal
    mobile devices and embedded systems. For years, programming of these platforms
    was very tedious and simultaneous use of all available GPP and GPU resources
    required low-level programming to ensure efficient synchronization and data
    transfer between processors. However, in the last few years several high-level
    programming frameworks have emerged, which enable programmers to describe
    applications by means of abstractions such as dataflow or Kahn process networks
    and leave parallel execution, data transfer and synchronization to be handled
    by the framework.

    Unfortunately, even the most advanced high-level programming frameworks have
    had shortcomings that limit their applicability to certain classes of
    applications. This paper presents a new, dataflow-flavored programming
    framework targeting heterogeneous platforms, and differs from previous
    approaches by allowing GPU-mapped actors to have data dependent consumption of
    inputs / production of outputs. Such flexibility is essential for configurable
    and adaptive applications that are becoming increasingly common in signal
    processing. Our experiments show that this feature allows up to a 5x
    increase in application throughput.

    The proposed framework is validated by application examples from the video
    processing and wireless communications domains. In the experiments the
    framework is compared to a well-known reference framework and it is shown that
    the proposed framework enables both a higher degree of flexibility and better
    throughput.

    CMS software and computing for LHC Run 2

    Kenneth Bloom
    Comments: Contribution to proceedings of the 38th International Conference on High Energy Physics (ICHEP 2016)
    Subjects: Instrumentation and Detectors (physics.ins-det); Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics – Experiment (hep-ex)

    The CMS offline software and computing system has successfully met the
    challenge of LHC Run 2. In this presentation, we will discuss how the entire
    system was improved in anticipation of increased trigger output rate, increased
    rate of pileup interactions and the evolution of computing technology. The
    primary goals behind these changes were to increase the flexibility of
    computing facilities wherever possible, to increase our operational
    efficiency, and to decrease the computing resources needed to accomplish the
    primary offline computing workflows. These changes have resulted in a new
    approach to distributed computing in CMS for Run 2 and for the future as the
    LHC luminosity continues to increase. We will discuss changes and plans to
    our data federation, which was one of the key changes towards a more flexible
    computing model for Run 2. Our software framework and algorithms also
    underwent significant changes. We will summarize our experience with a new
    multi-threaded framework as deployed on our prompt reconstruction farm for
    2015 and across the CMS WLCG Tier-1 facilities. We will discuss our
    experience with an analysis data format which is ten times smaller than our
    primary Run 1 format. This “miniAOD” format has proven to be easier to
    analyze while being extremely flexible for analysts. Finally, we describe
    improvements to our
    workflow management system that have resulted in increased automation and
    reliability for all facets of CMS production and user analysis operations.


    Learning

    Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

    Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
    Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)

    We prove the first Statistical Query lower bounds for two fundamental
    high-dimensional learning problems involving Gaussian distributions: (1)
    learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning
    of a single unknown mean Gaussian. In particular, we show a super-polynomial
    gap between the (information-theoretic) sample complexity and the complexity
    of any Statistical Query algorithm for these problems.

    Our SQ lower bound for Problem (1) implies that, as far as SQ algorithms are
    concerned, the computational complexity of learning GMMs is inherently
    exponential in the dimension of the latent space, even though there is no
    such information-theoretic barrier. Our lower bound for Problem (2) implies
    that the accuracy of the robust learning algorithm in [DiakonikolasKKLMS16]
    is essentially best possible among all polynomial-time SQ algorithms. On the
    positive side, we give a new SQ learning algorithm for this problem with
    optimal accuracy whose running time nearly matches our lower bound. Both our
    SQ lower bounds are attained via a unified moment-matching technique that may
    be useful in other contexts. Our SQ learning algorithm for Problem (2) relies
    on a filtering technique that removes outliers based on higher-order tensors.

    Our lower bound technique also has implications for related inference
    problems, specifically for the problem of robust testing of an unknown mean
    Gaussian. Here we show an information-theoretic lower bound which separates
    the sample complexity of the robust testing problem from its non-robust
    variant. This result is surprising because such a separation does not exist
    for the corresponding learning problem.

    Disentangling factors of variation in deep representations using adversarial training

    Michael Mathieu, Junbo Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun
    Comments: Conference paper in NIPS 2016
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We introduce a conditional generative model for learning to disentangle the
    hidden factors of variation within a set of labeled observations, and separate
    them into complementary codes. One code summarizes the specified factors of
    variation associated with the labels. The other summarizes the remaining
    unspecified variability. During training, the only available source of
    supervision comes from our ability to distinguish among different observations
    belonging to the same class. Examples of such observations include images of a
    set of labeled objects captured at different viewpoints, or recordings of a set
    of speakers dictating multiple phrases. In both instances, the intra-class
    diversity is the source of the unspecified factors of variation: each object is
    observed at multiple viewpoints, and each speaker dictates multiple phrases.
    Learning to disentangle the specified factors from the unspecified ones becomes
    easier when strong supervision is possible. Suppose that during training, we
    have access to pairs of images, where each pair shows two different objects
    captured from the same viewpoint. This source of alignment allows us to solve
    our task using existing methods. However, labels for the unspecified factors
    are usually unavailable in realistic scenarios where data acquisition is not
    strictly controlled. We address the problem of disentanglement in this more
    general setting by combining deep convolutional autoencoders with a form of
    adversarial training. Both factors of variation are implicitly captured in the
    organization of the learned embedding space, and can be used for solving
    single-image analogies. Experimental results on synthetic and real datasets
    show that the proposed method is capable of generalizing to unseen classes and
    intra-class variabilities.

    Ultimate tensorization: compressing convolutional and FC layers alike

    Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, Dmitry Vetrov
    Comments: NIPS 2016 workshop: Learning with Tensors: Why Now and How?
    Subjects: Learning (cs.LG)

    Convolutional neural networks excel in image recognition tasks, but this
    comes at the cost of high computational and memory complexity. To tackle this
    problem, [1] developed a tensor factorization framework to compress
    fully-connected layers. In this paper, we focus on compressing convolutional
    layers. We show that while the direct application of the tensor framework [1]
    to the 4-dimensional kernel of convolution does compress the layer, we can do
    better. We reshape the convolutional kernel into a tensor of higher order and
    factorize it. We combine the proposed approach with the previous work to
    compress both convolutional and fully-connected layers of a network and achieve
    80x network compression rate with 1.1% accuracy drop on the CIFAR-10 dataset.
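
    To illustrate the reshaping idea, here is a sketch that flattens a 4-D
    convolutional kernel and applies a truncated SVD as a stand-in low-rank
    factorization; the paper instead reshapes the kernel into a higher-order
    tensor and uses the Tensor Train decomposition of [1].

    ```python
    import numpy as np

    # A conv kernel of shape (k, k, C_in, C_out), e.g. 3x3x64x128.
    k, c_in, c_out = 3, 64, 128
    kernel = np.random.default_rng(4).standard_normal((k, k, c_in, c_out))

    # Reshape the 4-D kernel into a matrix (the paper reshapes into a tensor of
    # even higher order before factorizing).
    mat = kernel.reshape(k * k * c_in, c_out)

    # Truncated SVD as an illustrative low-rank factorization.
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    r = 32
    approx = (U[:, :r] * s[:r]) @ Vt[:r]
    params_full = mat.size
    params_lowrank = U[:, :r].size + r + Vt[:r].size
    print(f"compression: {params_full / params_lowrank:.1f}x, "
          f"rel. error: {np.linalg.norm(mat - approx) / np.linalg.norm(mat):.3f}")
    ```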

    Low Data Drug Discovery with One-shot Learning

    Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, Vijay Pande
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Recent advances in machine learning have made significant contributions to
    drug discovery. Deep neural networks in particular have been demonstrated to
    provide significant boosts in predictive power when inferring the properties
    and activities of small-molecule compounds. However, the applicability of these
    techniques has been limited by the requirement for large amounts of training
    data. In this work, we demonstrate how one-shot learning can be used to
    significantly lower the amounts of data required to make meaningful predictions
    in drug discovery applications. We introduce a new architecture, the residual
    LSTM embedding, that, when combined with graph convolutional neural networks,
    significantly improves the ability to learn meaningful distance metrics over
    small-molecules. We open source all models introduced in this work as part of
    DeepChem, an open-source framework for deep learning in drug discovery.

    Using Neural Networks for Fast Reachable Set Computations

    Frank Jiang, Glen Chou, Mo Chen, Claire J. Tomlin
    Comments: Submitted to HSCC 2017
    Subjects: Learning (cs.LG)

    Hamilton-Jacobi (HJ) reachability is a powerful tool that provides
    performance and safety guarantees for dynamical systems. Unfortunately, using
    the state-of-the-art dynamic programming-based approaches, HJ reachability is
    intractable for systems with more than five dimensions because its
    computational complexity scales exponentially with system dimension. To
    sidestep the curse of dimensionality, we propose an algorithm that leverages a
    neural network to approximate the minimum time-to-reach function to synthesize
    controls. We show that our neural network generates near optimal controls which
    are guaranteed to successfully drive the system to a target state. Our
    framework is not dependent on state space discretization, leading to a
    significant reduction in computation time and space complexity in comparison
    with dynamic programming-based approaches. Using this grid-free approach also
    enables us to plan over longer time horizons with relatively little additional
    computation overhead. Unlike many previous neural network reachability
    formulations, our approximation is conservative and hence any trajectories we
    generate will be strictly feasible. For demonstration, we specialize our new
    general framework to the Dubins car model and discuss how the general framework
    can be applied to other models with higher-dimensional state spaces.
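
    A toy sketch of the pipeline, assuming a Dubins car and a placeholder
    time-to-reach label; the paper trains on values computed by HJ reachability,
    and nothing below reproduces its conservativeness guarantee.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    V, W, DT = 1.0, 1.0, 0.1                    # speed, max turn rate, time step

    def step(s, u):
        """One Euler step of the Dubins car: s = (x, y, theta), u = turn rate."""
        x, y, th = s
        return np.array([x + V * np.cos(th) * DT,
                         y + V * np.sin(th) * DT,
                         th + u * DT])

    # Placeholder labels: straight-line travel time to the origin. (The paper
    # instead trains on the true minimum time-to-reach; this only shows the pipeline.)
    rng = np.random.default_rng(5)
    states = np.column_stack([rng.uniform(-5, 5, 20_000),
                              rng.uniform(-5, 5, 20_000),
                              rng.uniform(-np.pi, np.pi, 20_000)])
    ttr = np.hypot(states[:, 0], states[:, 1]) / V

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200).fit(states, ttr)

    def greedy_control(s, controls=(-W, 0.0, W)):
        """Pick the turn rate whose successor state has the smallest predicted
        time-to-reach, i.e., synthesize controls from the learned value function."""
        preds = [net.predict(step(s, u).reshape(1, -1))[0] for u in controls]
        return controls[int(np.argmin(preds))]

    print(greedy_control(np.array([2.0, 1.0, 0.0])))
    ```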

    Diversity Leads to Generalization in Neural Networks

    Bo Xie, Yingyu Liang, Le Song
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Neural networks are a powerful class of functions that can be trained with
    simple gradient descent to achieve state-of-the-art performance on a variety of
    applications. Despite their practical success, there is a paucity of results
    that provide theoretical guarantees on why they are so effective. Lying in the
    center of the problem is the difficulty of analyzing the non-convex objective
    function with potentially numerous local minima and saddle points. Can neural
    networks corresponding to the stationary points of the objective function learn
    the true labeling function? If yes, what are the key factors contributing to
    such generalization ability?

    In this paper, we provide answers to these questions by analyzing
    one-hidden-layer neural networks with ReLU activation, and show that despite
    the non-convexity, neural networks with diverse units can learn the true
    function. We bypass the non-convexity issue by directly analyzing the first
    order condition, and show that the loss is bounded if the smallest singular
    value of the “extended feature matrix” is large enough. We make novel use of
    techniques from kernel methods and geometric discrepancy, and identify a new
    relation linking the smallest singular value to the spectrum of a kernel
    function associated with the activation function and to the diversity of the
    units. Our results also suggest a novel regularization function to promote unit
    diversity for potentially better generalization ability.

    A Modular Theory of Feature Learning

    Daniel McNamara, Cheng Soon Ong, Robert C. Williamson
    Subjects: Learning (cs.LG)

    Learning representations of data, and in particular learning features for a
    subsequent prediction task, has been a fruitful area of research delivering
    impressive empirical results in recent years. However, relatively little is
    understood about what makes a representation ‘good’. We propose the idea of a
    risk gap induced by representation learning for a given prediction context,
    which measures the difference in the risk of some learner using the learned
    features as compared to the original inputs. We describe a set of sufficient
    conditions for unsupervised representation learning to provide a benefit, as
    measured by this risk gap. These conditions decompose the problem of when
    representation learning works into its constituent parts, which can be
    separately evaluated using an unlabeled sample, suitable domain-specific
    assumptions about the joint distribution, and analysis of the feature learner
    and subsequent supervised learner. We provide two examples of such conditions
    in the context of specific properties of the unlabeled distribution, namely
    when the data lies close to a low-dimensional manifold and when it forms
    clusters. We compare our approach to a recently proposed analysis of
    semi-supervised learning.

    Energy-efficient Machine Learning in Silicon: A Communications-inspired Approach

    Naresh R. Shanbhag
    Comments: This paper was presented at the 2016 ICML Workshop on On-Device Intelligence, June 24, 2016
    Subjects: Learning (cs.LG); Hardware Architecture (cs.AR)

    This position paper advocates a communications-inspired approach to the
    design of machine learning systems on energy-constrained embedded ‘always-on’
    platforms. The communications-inspired approach has two versions – 1) a
    deterministic version where existing low-power communication IC design methods
    are repurposed, and 2) a stochastic version referred to as Shannon-inspired
    statistical information processing employing information-based metrics,
    statistical error compensation (SEC), and retraining-based methods to implement
    ML systems on stochastic circuit/device fabrics operating at the limits of
    energy-efficiency. The communications-inspired approach has the potential to
    fully leverage the opportunities afforded by ML algorithms and applications in
    order to address the challenges inherent in their deployment on
    energy-constrained platforms.

    Multi-Task Multiple Kernel Relationship Learning

    Keerthiram Murugesan, Jaime Carbonell
    Comments: in submission
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper presents a novel multitask multiple-kernel learning framework that
    efficiently learns the kernel weights by leveraging the relationship across
    multiple tasks. The idea is to automatically infer this task relationship in
    the RKHS space corresponding to the given base kernels. The problem is
    formulated as a regularization-based approach called Multi-Task Multiple
    Kernel Relationship Learning (MK-MTRL), which models the task relationship
    matrix from the weights learned from latent feature spaces of task-specific
    base kernels. Unlike in previous work, the proposed formulation allows one to
    incorporate prior knowledge for simultaneously learning several related
    tasks. We propose an alternating minimization algorithm to learn the model
    parameters, kernel weights and task relationship matrix. In order to tackle
    large-scale problems, we further propose a two-stage MK-MTRL online learning
    algorithm and show that it significantly reduces the computational time, and
    also achieves performance comparable to that of the joint learning framework.
    Experimental results on benchmark datasets show that the proposed
    formulations outperform several state-of-the-art multi-task learning methods.

    DiffSharp: An AD Library for .NET Languages

    Atılım Güneş Baydin, Barak A. Pearlmutter, Jeffrey Mark Siskind
    Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK
    Subjects: Mathematical Software (cs.MS); Learning (cs.LG)

    DiffSharp is an algorithmic differentiation or automatic differentiation (AD)
    library for the .NET ecosystem, which is targeted by the C# and F# languages,
    among others. The library has been designed with machine learning applications
    in mind, allowing very succinct implementations of models and optimization
    routines. DiffSharp is implemented in F# and exposes forward and reverse AD
    operators as general nestable higher-order functions, usable by any .NET
    language. It provides high-performance linear algebra primitives—scalars,
    vectors, and matrices, with a generalization to tensors underway—that are
    fully supported by all the AD operators, and which use a BLAS/LAPACK backend
    via the highly optimized OpenBLAS library. DiffSharp currently uses operator
    overloading, but we are developing a transformation-based version of the
    library using F#’s “code quotation” metaprogramming facility. Work on a
    CUDA-based GPU backend is also underway.
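
    To illustrate what "nestable higher-order AD operators" means in practice,
    here is a minimal Python analogue of forward mode built on dual numbers.
    This is not DiffSharp's API, only the underlying idea in toy form.

        class Dual:
            # Forward-mode AD value: a primal and a tangent component.
            def __init__(self, primal, tangent=0.0):
                self.p, self.t = primal, tangent
            def _lift(self, o):
                return o if isinstance(o, Dual) else Dual(o)
            def __add__(self, o):
                o = self._lift(o)
                return Dual(self.p + o.p, self.t + o.t)
            __radd__ = __add__
            def __mul__(self, o):
                o = self._lift(o)
                return Dual(self.p * o.p, self.p * o.t + self.t * o.p)
            __rmul__ = __mul__

        def diff(f):
            # Higher-order operator: maps f to its derivative. The arithmetic
            # above also works on Dual-of-Dual values, so diff nests:
            # diff(diff(f)) yields the second derivative.
            return lambda x: f(Dual(x, 1.0)).t

        print(diff(lambda x: x * x * x)(2.0))        # 3x^2 at x=2 -> 12.0
        print(diff(diff(lambda x: x * x * x))(2.0))  # 6x   at x=2 -> 12.0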

    Binomial Checkpointing for Arbitrary Programs with No User Annotation

    Jeffrey Mark Siskind, Barak A. Pearlmutter
    Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK
    Subjects: Programming Languages (cs.PL); Learning (cs.LG); Mathematical Software (cs.MS)

    Heretofore, automatic checkpointing at procedure-call boundaries, to reduce
    the space complexity of reverse mode, has been provided by systems like
    Tapenade. However, binomial checkpointing, or treeverse, has only been provided
    in Automatic Differentiation (AD) systems in special cases, e.g., through
    user-provided pragmas on DO loops in Tapenade, or as the nested taping
    mechanism in adol-c for time integration processes, which requires that user
    code be refactored. We present a framework for applying binomial checkpointing
    to arbitrary code with no special annotation or refactoring required. This is
    accomplished by applying binomial checkpointing directly to a program trace.
    This trace is produced by a general-purpose checkpointing mechanism that is
    orthogonal to AD.
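
    The core idea, recursively trading recomputation for storage over a
    program trace, fits in a few lines. Below is a Python sketch of a
    bisection-style schedule (the true binomial schedule chooses split points
    more carefully to balance snapshots against recomputation); step and
    adjoint_step are hypothetical stand-ins for one primitive step of the
    trace and its reverse-mode counterpart.

        def reverse_sweep(step, adjoint_step, state, grad, lo, hi):
            # Reverse steps lo..hi-1, given the state entering step lo and the
            # adjoint leaving step hi-1, keeping only O(log n) live snapshots
            # instead of storing every intermediate state.
            if hi - lo == 1:
                return adjoint_step(state, grad, lo)
            mid = (lo + hi) // 2
            s = state
            for i in range(lo, mid):   # recompute forward to the midpoint
                s = step(s, i)
            grad = reverse_sweep(step, adjoint_step, s, grad, mid, hi)
            return reverse_sweep(step, adjoint_step, state, grad, lo, mid)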

    Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian Inference

    Jeffrey Regier, Kiran Pamnany, Ryan Giordano, Rollin Thomas, David Schlegel, Jon McAuliffe, Prabhat
    Comments: submitting to IPDPS’17
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Instrumentation and Methods for Astrophysics (astro-ph.IM); Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

    Celeste is a procedure for inferring astronomical catalogs that attains
    state-of-the-art scientific results. To date, Celeste has been scaled to at
    most hundreds of megabytes of astronomical images: Bayesian posterior inference
    is notoriously demanding computationally. In this paper, we report on a
    scalable, parallel version of Celeste, suitable for learning catalogs from
    modern large-scale astronomical datasets. Our algorithmic innovations include a
    fast numerical optimization routine for Bayesian posterior inference and a
    statistically efficient scheme for decomposing astronomical optimization
    problems into subproblems.

    Our scalable implementation is written entirely in Julia, a new high-level
    dynamic programming language designed for scientific and numerical computing.
    We use Julia’s high-level constructs for shared and distributed memory
    parallelism, and demonstrate effective load balancing and efficient scaling on
    up to 8192 Xeon cores on the NERSC Cori supercomputer.

    Policy Search with High-Dimensional Context Variables

    Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Direct contextual policy search methods learn to improve policy parameters
    and simultaneously generalize these parameters to different context or task
    variables. However, learning from high-dimensional context variables, such as
    camera images, is still a prominent problem in many real-world tasks. A naive
    application of unsupervised dimensionality reduction methods to the context
    variables, such as principal component analysis, is insufficient as
    task-relevant input may be ignored. In this paper, we propose a contextual
    policy search method in the model-based relative entropy stochastic search
    framework with integrated dimensionality reduction. We learn a model of the
    reward that is locally quadratic in both the policy parameters and the context
    variables. Furthermore, we perform supervised linear dimensionality reduction
    on the context variables by nuclear norm regularization. The experimental
    results show that the proposed method outperforms naive dimensionality
    reduction via principal component analysis and a state-of-the-art contextual
    policy search method.
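
    As a rough illustration of supervised linear dimensionality reduction via
    nuclear norm regularization, the sketch below fits a reward model that is
    quadratic in the context only, using proximal gradient descent with
    singular value thresholding; the paper's model is jointly quadratic in the
    policy parameters and the context, so this is a deliberate simplification.

        import numpy as np

        def svt(A, tau):
            # Singular value thresholding: the prox of tau * nuclear norm.
            U, s, Vt = np.linalg.svd(A, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def fit_quadratic_reward(C, r, lam=0.1, lr=1e-3, iters=1000):
            # Fit r_i ~ c_i^T A c_i with a nuclear norm penalty on A, so that
            # A ends up low rank and its leading singular vectors act as a
            # supervised linear projection of the context.
            n, d = C.shape
            A = np.zeros((d, d))
            for _ in range(iters):
                resid = np.einsum('ni,ij,nj->n', C, A, C) - r
                grad = C.T @ (resid[:, None] * C) / n
                A = svt(A - lr * grad, lr * lam)
            return A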

    Faster Kernel Ridge Regression Using Sketching and Preconditioning

    Haim Avron, Kenneth L. Clarkson, David P. Woodruff
    Subjects: Numerical Analysis (cs.NA); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Numerical Analysis (math.NA)

    Random feature maps, such as random Fourier features, have recently emerged
    as a powerful technique for speeding up and scaling the training of
    kernel-based methods such as kernel ridge regression. However, random feature
    maps only provide crude approximations to the kernel function, so delivering
    state-of-the-art results requires the number of random features to be very
    large. Nevertheless, in some cases, even when the number of random features is
    driven to be as large as the training size, full recovery of the performance of
    the exact kernel method is not attained. In order to address this issue, we
    propose to use random feature maps to form preconditioners to be used in
    solving kernel ridge regression to high accuracy. We provide theoretical
    conditions on when this yields an effective preconditioner, and empirically
    evaluate our method and show it is highly effective for datasets of up to one
    million training examples.
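
    A minimal sketch of this recipe, using random Fourier features to build a
    preconditioner for conjugate gradient on the kernel ridge system, is given
    below; the Gaussian kernel, the Woodbury-based preconditioner, and the
    solver settings are illustrative choices rather than the authors' exact
    construction.

        import numpy as np
        from scipy.sparse.linalg import cg, LinearOperator

        def krr_rff_precond(X, y, gamma=1.0, lam=1e-2, s=200, seed=0):
            # Solve (K + lam*I) alpha = y for the Gaussian kernel
            # k(x, x') = exp(-gamma ||x - x'||^2) via preconditioned CG.
            rng = np.random.default_rng(seed)
            n, d = X.shape
            sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            K = np.exp(-gamma * sq)           # exact kernel (for the matvec)
            # Random Fourier features: z(x) = sqrt(2/s) * cos(W^T x + b),
            # with W ~ N(0, 2*gamma*I), so that K ~ Z @ Z.T.
            W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, s))
            b = rng.uniform(0, 2 * np.pi, size=s)
            Z = np.sqrt(2.0 / s) * np.cos(X @ W + b)
            # Preconditioner (Z Z^T + lam I)^{-1} via the Woodbury identity.
            inner = np.linalg.inv(Z.T @ Z + lam * np.eye(s))
            P = LinearOperator((n, n), matvec=lambda r:
                               (r - Z @ (inner @ (Z.T @ r))) / lam)
            A = LinearOperator((n, n), matvec=lambda v: K @ v + lam * v)
            alpha, info = cg(A, y, M=P, atol=1e-8)
            return alpha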

    Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence

    Emilio Jorge, Mikael Kågebäck, Emil Gustavsson
    Comments: 8 pages with 1 page appendix. Accepted to Deep Reinforcement Learning Workshop at NIPS 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Multiagent Systems (cs.MA)

    Learning your first language is an incredible feat and not easily duplicated.
    Doing this using nothing but a few pictureless books, a corpus, would likely be
    impossible even for humans. As an alternative we propose to use situated
    interactions between agents as a driving force for communication, and the
    framework of Deep Recurrent Q-Networks (DRQN) for learning a common language
    grounded in the provided environment. We task the agents with interactive image
    search in the form of the game Guess Who?. The images from the game provide a
    nontrivial environment for the agents to discuss and a natural grounding for
    the concepts they decide to encode in their communication. Our experiments show
    that it is possible to learn this task using DRQN and, even more importantly,
    that the words the agents use correspond to physical attributes present in the
    images that make up the agents' environment.

    SoK: Applying Machine Learning in Security – A Survey

    Heju Jiang, Jasvir Nagra, Parvez Ahammad
    Comments: 18 pages, 2 figures, 11 tables
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    The idea of applying machine learning (ML) to solve problems in security
    domains is almost three decades old. As information and communications grow
    more ubiquitous and more data become available, many security risks arise, as
    does the appetite to manage and mitigate such risks. Consequently, research on
    applying and designing ML algorithms and systems for security has grown
    rapidly, ranging from intrusion detection systems (IDS) and malware
    classification to security policy management (SPM) and information leak
    checking. In this paper, we systematically study the methods, algorithms, and
    system designs in academic publications from 2008-2015 that applied ML in
    security domains. 98 percent of the surveyed papers appeared in the six
    highest-ranked academic security conferences and one conference known for
    pioneering ML applications in security.
    We examine the generalized system designs, underlying assumptions,
    measurements, and use cases in active research. Our examinations lead to 1) a
    taxonomy on ML paradigms and security domains for future exploration and
    exploitation, and 2) an agenda detailing open and upcoming challenges. Based on
    our survey, we also suggest a point of view that treats security as a game
    theory problem instead of a batch-trained ML problem.


    Information Theory

    Reconfigurable Dual Mode IEEE 802.15.4 Digital Baseband Receiver for Diverse IoT Applications

    Mohammed Abdullah Zubair, P. Rajalakshmi
    Subjects: Information Theory (cs.IT)

    IEEE 802.15.4 takes center stage in IoT as the standard for Low-Rate
    Wireless Personal Area Networks (LR-WPANs). The standard specifies an Offset
    Quadrature Phase Shift Keying physical layer (O-QPSK PHY) with half-sine pulse
    shaping, which can be analyzed either under the class of M-ary PSK signals (a
    QPSK signal with offset) or as a Minimum Shift Keying (MSK) signal. M-ary PSK
    demodulation requires perfect carrier recovery but achieves minimal error. MSK
    signals, which fall under Continuous Phase Frequency Shift Keying, can be
    demodulated non-coherently, but the error performance is not as good. In this
    paper, this dual nature of the IEEE 802.15.4 PHY is exploited to propose a
    dual-mode receiver comprising a QPSK demodulator chain and an MSK demodulator
    chain on a single system on chip. The mode can be configured manually,
    depending on the type of application, or based on feedback from a
    Signal-to-Noise Ratio (SNR) indicator employed in the proposed receiver: the
    M-ary PSK chain is selected at lower SNRs and the MSK chain at higher SNRs.
    Both demodulator chains are analyzed in detail, and we show that MSK detection
    offers low power, low complexity, and low latency, while QPSK detection is
    employed for minimal error.
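
    The SNR-driven mode switch amounts to a small decision rule. The sketch
    below adds hysteresis to avoid rapid toggling near the threshold; both the
    threshold and the hysteresis margin are illustrative values, not figures
    from the paper.

        def select_demodulator(snr_db, current="QPSK",
                               threshold_db=12.0, hysteresis_db=1.0):
            # Low SNR -> coherent M-ary PSK chain (better error performance);
            # high SNR -> non-coherent MSK chain (lower power and latency).
            if current == "QPSK" and snr_db > threshold_db + hysteresis_db:
                return "MSK"
            if current == "MSK" and snr_db < threshold_db - hysteresis_db:
                return "QPSK"
            return current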

    Some bounds of separating hash families

    X. Niu, H. Cao
    Comments: Separating hash family, Frameproof code, Strong separating hash family
    Subjects: Information Theory (cs.IT)

    Separating hash families were first introduced by Stinson, Trung and Wei.

    In this paper, we present some new bounds for separating hash families (SHFs)
    with small parameters. Using these small-parameter bounds, we improve the
    previously known bounds for types ({w,w}) and ({w_1,w_2}). We also give a
    construction for strong separating hash families.

    Massive Machine Type Communication with Data Aggregation and Resource Scheduling

    Jing Guo, Salman Durrani, Xiangyun Zhou, Halim Yanikomeroglu
    Comments: submitted for possible journal publication
    Subjects: Information Theory (cs.IT)

    To enable massive machine type communication (mMTC), data aggregation is a
    promising approach to reduce the congestion caused by a massive number of
    machine type devices (MTDs). In this work, we consider a two-phase
    cellular-based mMTC network where MTDs transmit to aggregators (i.e.,
    aggregation phase) and the aggregated data is then relayed to base stations
    (i.e., relaying phase). Due to the limited resources, the aggregators not only
    aggregate data, but also schedule resources among MTDs. We consider two
    scheduling schemes: random resource scheduling (RRS) and channel-aware resource
    scheduling (CRS). By leveraging stochastic geometry, we present a tractable
    analytical framework to investigate the signal-to-interference ratio (SIR) for
    each phase, thereby computing the MTD success probability, the average number
    of successful MTDs, and the probability of successful channel utilization, which
    are the key metrics characterizing the overall mMTC performance. Our numerical
    results show that, although the CRS outperforms the RRS in terms of SIR at the
    aggregation phase, the simpler RRS has almost the same performance as the CRS
    for most cases with regards to the overall mMTC performance. Furthermore, the
    provision of more resources at the aggregation phase is not always beneficial
    to the mMTC performance.
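
    The two schemes differ only in how an aggregator picks which MTDs get the
    limited channels. A minimal sketch, with a hypothetical interface taking
    the per-MTD channel gains to the aggregator:

        import numpy as np

        def schedule(gains, n_channels, scheme="RRS", rng=None):
            # RRS: pick n_channels MTDs uniformly at random.
            # CRS: pick the MTDs with the strongest channel gains.
            rng = rng or np.random.default_rng()
            k = min(n_channels, len(gains))
            if scheme == "RRS":
                return rng.choice(len(gains), size=k, replace=False)
            return np.argsort(gains)[-k:]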

    5G Multimedia Massive MIMO Communications Systems

    Xiaohu Ge, Haichao Wang, Ran Zi, Qiang Li, Qiang Ni
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In fifth generation (5G) wireless communication systems, the majority of the
    traffic demand is contributed by various multimedia applications. To support
    future 5G multimedia communication systems, the massive multiple-input
    multiple-output (MIMO) technique is recognized as a key enabler due to its
    high spectral efficiency. However, the large numbers of antennas and radio
    frequency (RF) chains not only increase the implementation cost of 5G wireless
    communication systems but also result in an intense mutual coupling effect
    among antennas because of the limited space for deploying them. To reduce
    the cost, an optimal equivalent precoding matrix with the minimum number of RF
    chains is proposed for 5G multimedia massive MIMO communication systems
    considering the mutual coupling effect. Moreover, an upper bound of the
    effective capacity is derived for 5G multimedia massive MIMO communication
    systems. Two antenna receive diversity gain models are built and analyzed. The
    impacts of the antenna spacing, the number of antennas, the quality of service
    (QoS) statistical exponent, and the number of independent incident directions
    on the effective capacity of 5G multimedia massive MIMO communication systems
    are analyzed. Compared with the conventional zero-forcing precoding matrix,
    the proposed optimal equivalent precoding matrix achieves a higher rate in
    simulations of 5G multimedia massive MIMO communication systems.
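
    For reference, the zero-forcing baseline against which the proposed
    precoder is compared can be sketched as follows; the power normalization
    is one common convention, and the mutual coupling effect the paper models
    is deliberately omitted here.

        import numpy as np

        def zero_forcing_precoder(H, total_power=1.0):
            # H: K x M downlink channel (K single-antenna users, M antennas).
            # ZF inverts the channel, W = H^H (H H^H)^{-1}, then scales W to
            # meet the transmit power budget.
            W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
            scale = np.sqrt(total_power / np.trace(W @ W.conj().T).real)
            return scale * W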

    On (mathbb{Z}_{2}mathbb{Z}_{2}[u])-((1+u))-additive constacyclic codes

    Ping Li, Wei Dai, Xiaoshan Kai
    Subjects: Information Theory (cs.IT)

    In this paper, we study (mathbb{Z}_{2}mathbb{Z}_{2}[u])-((1+u))-additive
    constacyclic codes of arbitrary length. Firstly, we study the algebraic
    structure of this family of codes and give a set of generator polynomials for
    the family as a ((mathbb{Z}_{2}+umathbb{Z}_{2})[x])-submodule of the ring
    (R_{alpha,beta}). Secondly, we give the minimal generating sets of this
    family of codes, determine the relationship between the generators of
    (mathbb{Z}_{2}mathbb{Z}_{2}[u])-((1+u))-additive constacyclic codes and
    those of their duals, and give the parameters in terms of the degrees of the
    generator polynomials of the code. Lastly, we also study these codes in terms
    of their Gray images.

    Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

    Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
    Subjects: Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)

    We prove the first Statistical Query (SQ) lower bounds for two fundamental
    high-dimensional learning problems involving Gaussian distributions: (1)
    learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of
    a single unknown mean Gaussian. In particular, we show a super-polynomial
    gap between the (information-theoretic) sample complexity and the complexity
    of any Statistical Query algorithm for these problems.

    Our SQ lower bound for Problem (1) implies that, as far as SQ algorithms
    are concerned, the computational complexity of learning GMMs is inherently
    exponential in the dimension of the latent space, even though there is
    no such information-theoretic barrier. Our lower bound for Problem (2) implies
    that the accuracy of the robust learning algorithm
    in [DiakonikolasKKLMS16] is essentially best possible among all
    polynomial-time SQ algorithms. On the positive side, we give a new SQ learning
    algorithm for this problem with optimal accuracy whose running time nearly
    matches our lower bound. Both our SQ lower bounds are attained via a unified
    moment-matching technique that may be useful in other contexts. Our SQ learning
    algorithm for Problem (2) relies on a filtering technique that removes outliers
    based on higher-order tensors.

    Our lower bound technique also has implications for related inference
    problems, specifically for the problem of robust testing of an unknown
    mean Gaussian. Here we show an information-theoretic lower bound which
    separates the sample complexity of the robust testing problem from its
    non-robust variant. This result is surprising because such a separation does
    not exist for the corresponding learning problem.

    Fast Adaptation of Activity Sensing Policies in Mobile Devices

    Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, Dong In Kim
    Comments: 14 pages, 10 figures
    Subjects: Human-Computer Interaction (cs.HC); Information Theory (cs.IT)

    With the proliferation of sensors, such as accelerometers, in mobile devices,
    activity and motion tracking has become a viable technology to understand and
    create an engaging user experience. This paper proposes a fast adaptation and
    learning scheme of activity tracking policies when user statistics are unknown
    a priori, varying with time, and inconsistent for different users. In our
    stochastic optimization, user activities are required to be synchronized with a
    backend under a cellular data limit to avoid overcharges from cellular
    operators. The mobile device is charged intermittently using wireless or wired
    charging for receiving the required energy for transmission and sensing
    operations. Firstly, we propose an activity tracking policy by formulating a
    stochastic optimization as a constrained Markov decision process (CMDP).
    Secondly, we prove that the optimal policy of the CMDP has a threshold
    structure using a Lagrangian relaxation approach and the submodularity concept.
    We accordingly present a fast Q-learning algorithm by considering the policy
    structure to improve the convergence speed over that of conventional
    Q-learning. Finally, simulation examples are presented to support the
    theoretical findings of this paper.
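
    A minimal sketch of how the threshold structure can be folded into
    Q-learning follows; the environment interface, the action set, and the
    projection step are illustrative assumptions rather than the paper's exact
    algorithm.

        import numpy as np

        def threshold_q_learning(env, n_states, alpha=0.1, gamma=0.95,
                                 eps=0.1, episodes=500, seed=0):
            # Assumed interface: env.reset() -> s, env.step(s, a) -> (s2, r,
            # done), with actions {0: hold, 1: transmit}.
            rng = np.random.default_rng(seed)
            Q = np.zeros((n_states, 2))
            for _ in range(episodes):
                s, done = env.reset(), False
                while not done:
                    greedy_a = int(Q[s].argmax())
                    a = int(rng.integers(2)) if rng.random() < eps else greedy_a
                    s2, r, done = env.step(s, a)
                    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
                    s = s2
                # Project the greedy policy onto threshold form: once
                # transmitting is greedy at some state k, keep it greedy for
                # all states >= k. Exploiting this structure is what speeds
                # up convergence over plain Q-learning.
                ones = np.flatnonzero(Q.argmax(axis=1) == 1)
                if ones.size:
                    k = int(ones[0])
                    Q[k:, 1] = np.maximum(Q[k:, 1], Q[k:, 0] + 1e-9)
            return Q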



