IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Thu, 29 Sep 2016

    我爱机器学习(52ml.net)发表于 2016-09-29 00:00:00
    love 0

    Neural and Evolutionary Computing

    Training a Probabilistic Graphical Model with Resistive Switching Electronic Synapses

    S. Burc Eryilmaz, Emre Neftci, Siddharth Joshi, SangBum Kim, Matthew BrightSky, Hsiang-Lan Lung, Chung Lam, Gert Cauwenberghs, H.-S. Philip Wong
    Comments: submitted to IEEE Transactions on Electron Devices
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

    Current large scale implementations of deep learning and data mining require
    thousands of processors, massive amounts of off-chip memory, and consume
    gigajoules of energy. Emerging memory technologies such as nanoscale
    two-terminal resistive switching memory devices offer a compact, scalable and
    low power alternative that permits on-chip co-located processing and memory in
    fine-grain distributed parallel architecture. Here we report first use of
    resistive switching memory devices for implementing and training a Restricted
    Boltzmann Machine (RBM), a generative probabilistic graphical model as a key
    component for unsupervised learning in deep networks. We experimentally
    demonstrate a 45-synapse RBM realized with 90 resistive switching phase change
    memory (PCM) elements trained with a bio-inspired variant of the Contrastive
    Divergence (CD) algorithm, implementing Hebbian and anti-Hebbian weight
    updates. The resistive PCM devices show a two-fold to ten-fold reduction in
    error rate in a missing pixel pattern completion task trained over 30 epochs,
    compared to untrained case. Measured programming energy consumption is 6.1 nJ
    per epoch with the resistive switching PCM devices, a factor of ~150 times
    lower than conventional processor-memory systems. We analyze and discuss the
    dependence of learning performance on cycle-to-cycle variations as well as
    number of gradual levels in the PCM analog memory devices.

    Learning Genomic Representations to Predict Clinical Outcomes in Cancer

    Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
    Comments: ICLR 2016 Workshop Track- May 2nd 2016 International Conference on Learning Representations
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Genomics are rapidly transforming medical practice and basic biomedical
    research, providing insights into disease mechanisms and improving therapeutic
    strategies, particularly in cancer. The ability to predict the future course of
    a patient’s disease from high-dimensional genomic profiling will be essential
    in realizing the promise of genomic medicine, but presents significant
    challenges for state-of-the-art survival analysis methods. In this abstract we
    present an investigation in learning genomic representations with neural
    networks to predict patient survival in cancer. We demonstrate the advantages
    of this approach over existing survival analysis methods using brain tumor
    data.

    Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

    Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
    Comments: Submitted to ICASSP 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNNs) have shown clear superiority in sequence
    modeling, particularly the ones with gated units, such as long short-term
    memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
    behind the remarkable performance remain unclear in many applications, e.g.,
    automatic speech recognition (ASR). This paper employs visualization techniques
    to study the behavior of LSTM and GRU when performing speech recognition tasks.
    Our experiments show some interesting patterns in the gated memory, and some of
    them have inspired simple yet effective modifications on the network structure.
    We report two of such modifications: (1) lazy cell update in LSTM, and (2)
    shortcut connections for residual learning. Both modifications lead to more
    comprehensible and powerful networks.

    Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification

    Franck Dernoncourt, Ji Young Lee
    Comments: Accepted as a conference paper at IEEE SLT 2016. The two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Systems based on artificial neural networks (ANNs) have achieved
    state-of-the-art results in many natural language processing tasks. Although
    ANNs do not require manually engineered features, ANNs have many
    hyperparameters to be optimized. The choice of hyperparameters significantly
    impacts models’ performances. However, the ANN hyperparameters are typically
    chosen by manual, grid, or random search, which either requires expert
    experiences or is computationally expensive. Recent approaches based on
    Bayesian optimization using Gaussian processes (GPs) is a more systematic way
    to automatically pinpoint optimal or near-optimal machine learning
    hyperparameters. Using a previously published ANN model yielding
    state-of-the-art results for dialog act classification, we demonstrate that
    optimizing hyperparameters using GP further improves the results, and reduces
    the computational time by a factor of 4 compared to a random search. Therefore
    it is a useful technique for tuning ANN models to yield the best performances
    for natural language processing tasks.


    Computer Vision and Pattern Recognition

    A Simple, Fast and Highly-Accurate Algorithm to Recover 3D Shape from 2D Landmarks on a Single Image

    Ruiqi Zhao, Yan Wang, Aleix Martinez
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Three-dimensional shape reconstruction of 2D landmark points on a single
    image is a hallmark of human vision, but is a task that has been proven
    difficult for computer vision algorithms. We define a feed-forward deep neural
    network algorithm that can reconstruct 3D shapes from 2D landmark points almost
    perfectly (i.e., with extremely small reconstruction errors), even when these
    2D landmarks are from a single image. Our experimental results show an
    improvement of up to two-fold over state-of-the-art computer vision algorithms;
    3D shape reconstruction of human faces is given at a reconstruction error <
    .004, cars at .0022, human bodies at .022, and highly-deformable flags at an
    error of .0004. Our algorithm was also a top performer at the 2016 3D Face
    Alignment in the Wild Challenge competition (done in conjunction with the
    European Conference on Computer Vision, ECCV) that required the reconstruction
    of 3D face shape from a single image. The derived algorithm can be trained in a
    couple hours and testing runs at more than 1, 000 frames/s on an i7 desktop. We
    also present an innovative data augmentation approach that allows us to train
    the system efficiently with small number of samples. And the system is robust
    to noise (e.g., imprecise landmark points) and missing data (e.g., occluded or
    undetected landmark points).

    Deep Architectures for Face Attributes

    Tobi Baumgartner, Jack Culpepper
    Comments: 11 pages, 2 figures, accepted in “Workshop on Facial Informatics in conjunction with ACCV ’16”
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We train a deep convolutional neural network to perform identity
    classification using a new dataset of public figures annotated with age,
    gender, ethnicity and emotion labels, and then fine-tune it for attribute
    classification. An optimal sharing pattern of computational resources within
    this network is determined by experiment, requiring only 1 G flops to produce
    all predictions. Rather than fine-tune by relearning weights in one additional
    layer after the penultimate layer of the identity network, we try several
    different depths for each attribute. We find that prediction of age and emotion
    is improved by fine-tuning from earlier layers onward, presumably because
    deeper layers are progressively invariant to non-identity related changes in
    the input.

    Graph Based Convolutional Neural Network

    Michael Edwards, Xianghua Xie
    Comments: 11 pages, accepted into BMVC 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The benefit of localized features within the regular domain has given rise to
    the use of Convolutional Neural Networks (CNNs) in machine learning, with great
    proficiency in the image classification. The use of CNNs becomes problematic
    within the irregular spatial domain due to design and convolution of a kernel
    filter being non-trivial. One solution to this problem is to utilize graph
    signal processing techniques and the convolution theorem to perform
    convolutions on the graph of the irregular domain to obtain feature map
    responses to learnt filters. We propose graph convolution and pooling operators
    analogous to those in the regular domain. We also provide gradient calculations
    on the input data and spectral filters, which allow for the deep learning of an
    irregular spatial domain problem. Signal filters take the form of spectral
    multipliers, applying convolution in the graph spectral domain. Applying smooth
    multipliers results in localized convolutions in the spatial domain, with
    smoother multipliers providing sharper feature maps. Algebraic Multigrid is
    presented as a graph pooling method, reducing the resolution of the graph
    through agglomeration of nodes between layers of the network. Evaluation of
    performance on the MNIST digit classification problem in both the regular and
    irregular domain is presented, with comparison drawn to standard CNN. The
    proposed graph CNN provides a deep learning method for the irregular domains
    present in the machine learning community, obtaining 94.23% on the regular
    grid, and 94.96% on a spatially irregular subsampled MNIST.

    A Discriminative Framework for Anomaly Detection in Large Videos

    Allison Del Giorno, J. Andrew Bagnell, Martial Hebert
    Comments: 14 pages without references, 16 pages with. 7 figures. Accepted to ECCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We address an anomaly detection setting in which training sequences are
    unavailable and anomalies are scored independently of temporal ordering.
    Current algorithms in anomaly detection are based on the classical density
    estimation approach of learning high-dimensional models and finding
    low-probability events. These algorithms are sensitive to the order in which
    anomalies appear and require either training data or early context assumptions
    that do not hold for longer, more complex videos. By defining anomalies as
    examples that can be distinguished from other examples in the same video, our
    definition inspires a shift in approaches from classical density estimation to
    simple discriminative learning. Our contributions include a novel framework for
    anomaly detection that is (1) independent of temporal ordering of anomalies,
    and (2) unsupervised, requiring no separate training sequences. We show that
    our algorithm can achieve state-of-the-art results even when we adjust the
    setting by removing training sequences from standard datasets.

    Towards the effectiveness of Deep Convolutional Neural Network based Fast Random Forest Classifier

    Mrutyunjaya Panda (Reader, P.G.Department of Computer Science and Applications, Utkal University), Vani Vihar (Bhubaneswar, India)
    Comments: 11 pages, 9 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Learning is considered to be a quite young in the area of machine
    learning research, found its effectiveness in dealing complex yet high
    dimensional dataset that includes but limited to images, text and speech etc.
    with multiple levels of representation and abstraction. As there are a plethora
    of research on these datasets by various researchers , a win over them needs
    lots of attention. Careful setting of Deep learning parameters is of paramount
    importance in order to avoid the overfitting unlike conventional methods with
    limited parameter settings. Deep Convolutional neural network (DCNN) with
    multiple layers of compositions and appropriate settings might be is an
    efficient machine learning method that can outperform the conventional methods
    in a great way. However, due to its slow adoption in learning, there are also
    always a chance of overfitting during feature selection process, which can be
    addressed by employing a regularization method called dropout. Fast Random
    Forest (FRF) is a powerful ensemble classifier especially when the datasets are
    noisy and when the number of attributes is large in comparison to the number of
    instances, as is the case of Bioinformatics datasets. Several publicly
    available Bioinformatics dataset, Handwritten digits recognition and Image
    segmentation dataset are considered for evaluation of the proposed approach.
    The excellent performance obtained by the proposed DCNN based feature selection
    with FRF classifier on high dimensional datasets makes it a fast and accurate
    classifier in comparison the state-of-the-art.

    Understanding data augmentation for classification: when to warp?

    Sebastien C. Wong, Adam Gatt, Victor Stamatescu, Mark D. McDonnell
    Comments: 6 pages, 6 figures, DICTA 2016 conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we investigate the benefit of augmenting data with
    synthetically created samples when training a machine learning classifier. Two
    approaches for creating additional training samples are data warping, which
    generates additional samples through transformations applied in the data-space,
    and synthetic over-sampling, which creates additional samples in feature-space.
    We experimentally evaluate the benefits of data augmentation for a
    convolutional backpropagation-trained neural network, a convolutional support
    vector machine and a convolutional extreme learning machine classifier, using
    the standard MNIST handwritten digit dataset. We found that while it is
    possible to perform generic augmentation in feature-space, if plausible
    transforms for the data are known then augmentation in data-space provides a
    greater benefit for improving performance and reducing overfitting.

    Video Summarization using Deep Semantic Features

    Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
    Comments: 16 pages, the 13th Asian Conference on Computer Vision (ACCV’16)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a video summarization technique for an Internet video to
    provide a quick way to overview its content. This is a challenging problem
    because finding important or informative parts of the original video requires
    to understand its content. Furthermore the content of Internet videos is very
    diverse, ranging from home videos to documentaries, which makes video
    summarization much more tough as prior knowledge is almost not available. To
    tackle this problem, we propose to use deep video features that can encode
    various levels of content semantics, including objects, actions, and scenes,
    improving the efficiency of standard video summarization techniques. For this,
    we design a deep neural network that maps videos as well as descriptions to a
    common semantic space and jointly trained it with associated pairs of videos
    and descriptions. To generate a video summary, we extract the deep features
    from each segment of the original video and apply a clustering-based
    summarization technique to them. We evaluate our video summaries using the
    SumMe dataset as well as baseline approaches. The results demonstrated the
    advantages of incorporating our deep semantic features in a video summarization
    technique.

    Scalable Discrete Supervised Hash Learning with Asymmetric Matrix Factorization

    Shifeng Zhang, Jianmin Li, Jinma Guo, Bo Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hashing method maps similar data to binary hashcodes with smaller hamming
    distance, and it has received a broad attention due to its low storage cost and
    fast retrieval speed. However, the existing limitations make the present
    algorithms difficult to deal with large-scale datasets: (1) discrete
    constraints are involved in the learning of the hash function; (2) pairwise or
    triplet similarity is adopted to generate efficient hashcodes, resulting both
    time and space complexity are greater than O(n^2). To address these issues, we
    propose a novel discrete supervised hash learning framework which can be
    scalable to large-scale datasets. First, the discrete learning procedure is
    decomposed into a binary classifier learning scheme and binary codes learning
    scheme, which makes the learning procedure more efficient. Second, we adopt the
    Asymmetric Low-rank Matrix Factorization and propose the Fast Clustering-based
    Batch Coordinate Descent method, such that the time and space complexity is
    reduced to O(n). The proposed framework also provides a flexible paradigm to
    incorporate with arbitrary hash function, including deep neural networks and
    kernel methods. Experiments on large-scale datasets demonstrate that the
    proposed method is superior or comparable with state-of-the-art hashing
    algorithms.

    A Fast Factorization-based Approach to Robust PCA

    Chong Peng, Zhao Kang, Qiang Chen
    Comments: ICDM 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Robust principal component analysis (RPCA) has been widely used for
    recovering low-rank matrices in many data mining and machine learning problems.
    It separates a data matrix into a low-rank part and a sparse part. The convex
    approach has been well studied in the literature. However, state-of-the-art
    algorithms for the convex approach usually have relatively high complexity due
    to the need of solving (partial) singular value decompositions of large
    matrices. A non-convex approach, AltProj, has also been proposed with lighter
    complexity and better scalability. Given the true rank $r$ of the underlying
    low rank matrix, AltProj has a complexity of $O(r^2dn)$, where $d imes n$ is
    the size of data matrix. In this paper, we propose a novel factorization-based
    model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound
    of the true rank. Our method does not need the precise value of the true rank.
    From extensive experiments, we observe that AltProj can work only when $r$ is
    precisely known in advance; however, when the needed rank parameter $r$ is
    specified to a value different from the true rank, AltProj cannot fully
    separate the two parts while our method succeeds. Even when both work, our
    method is about 4 times faster than AltProj. Our method can be used as a
    light-weight, scalable tool for RPCA in the absence of the precise value of the
    true rank.

    YouTube-8M: A Large-Scale Video Classification Benchmark

    Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Many recent advancements in Computer Vision are attributed to large datasets.
    Open-source software packages for Machine Learning and inexpensive commodity
    hardware have reduced the barrier of entry for exploring novel approaches at
    scale. It is possible to train models over millions of examples within a few
    days. Although large-scale datasets exist for image understanding, such as
    ImageNet, there are no comparable size video classification datasets.

    In this paper, we introduce YouTube-8M, the largest multi-label video
    classification dataset, composed of ~8 million videos (500K hours of video),
    annotated with a vocabulary of 4800 visual entities. To get the videos and
    their labels, we used a YouTube video annotation system, which labels videos
    with their main topics. While the labels are machine-generated, they have
    high-precision and are derived from a variety of human-based signals including
    metadata and query click signals. We filtered the video labels (Knowledge Graph
    entities) using both automated and manual curation strategies, including asking
    human raters if the labels are visually recognizable. Then, we decoded each
    video at one-frame-per-second, and used a Deep CNN pre-trained on ImageNet to
    extract the hidden representation immediately prior to the classification
    layer. Finally, we compressed the frame features and make both the features and
    video-level labels available for download.

    We trained various (modest) classification models on the dataset, evaluated
    them using popular evaluation metrics, and report them as baselines. Despite
    the size of the dataset, some of our models train to convergence in less than a
    day on a single machine using TensorFlow. We plan to release code for training
    a TensorFlow model and for computing metrics.

    A Transportation $L^p$ Distance for Signal Analysis

    Matthew Thorpe, Serim Park, Soheil Kolouri, Gustavo K. Rohde, Dejan Slepčev
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Transport based distances, such as the Wasserstein distance and earth mover’s
    distance, have been shown to be an effective tool in signal and image analysis.
    The success of transport based distances is in part due to their Lagrangian
    nature which allows it to capture the important variations in many signal
    classes. However these distances require the signal to be nonnegative and
    normalized. Furthermore, the signals are considered as measures and compared by
    redistributing (transporting) them, which does not directly take into account
    the signal intensity. Here we study a transport-based distance, called the
    $TL^p$ distance, that combines Lagrangian and intensity modelling and is
    directly applicable to general, non-positive and multi-channelled signals. The
    framework allows the application of existing numerical methods. We give an
    overview of the basic properties of this distance and applications to
    classification, with multi-channelled, non-positive one and two-dimensional
    signals, and color transfer.

    Task Specific Adversarial Cost Function

    Antonia Creswell, Anil A. Bharath
    Comments: Submitted to TPAMI
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The cost function used to train a generative model should fit the purpose of
    the model. If the model is intended for tasks such as generating perceptually
    correct samples, it is beneficial to maximise the likelihood of a sample drawn
    from the model, Q, coming from the same distribution as the training data, P.
    This is equivalent to minimising the Kullback-Leibler (KL) distance, KL[Q||P].
    However, if the model is intended for tasks such as retrieval or classification
    it is beneficial to maximise the likelihood that a sample drawn from the
    training data is captured by the model, equivalent to minimising KL[P||Q]. The
    cost function used in adversarial training optimises the Jensen-Shannon entropy
    which can be seen as an even interpolation between KL[Q||P] and KL[P||Q]. Here,
    we propose an alternative adversarial cost function which allows easy tuning of
    the model for either task. Our task specific cost function is evaluated on a
    dataset of hand-written characters in the following tasks: Generation,
    retrieval and one-shot learning.

    Learning to Push by Grasping: Using multiple tasks for effective learning

    Lerrel Pinto, Abhinav Gupta
    Comments: Under review at the International Conference on Robotics and Automation (ICRA) 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, end-to-end learning frameworks are gaining prevalence in the field
    of robot control. These frameworks input states/images and directly predict the
    torques or the action parameters. However, these approaches are often critiqued
    due to their huge data requirements for learning a task. The argument of the
    difficulty in scalability to multiple tasks is well founded, since training
    these tasks often require hundreds or thousands of examples. But do end-to-end
    approaches need to learn a unique model for every task? Intuitively, it seems
    that sharing across tasks should help since all tasks require some common
    understanding of the environment. In this paper, we attempt to take the next
    step in data-driven end-to-end learning frameworks: move from the realm of
    task-specific models to joint learning of multiple robot tasks. In an
    astonishing result we show that models with multi-task learning tend to perform
    better than task-specific models trained with same amounts of data. For
    example, a deep-network learned with 2.5K grasp and 2.5K push examples performs
    better on grasping than a network trained on 5K grasp examples.

    Understanding and Exploiting Object Interaction Landscapes

    Sören Pirk, Vojtech Krs, Kaimo Hu, Suren Deepak Rajasekaran, Hao Kang, Bedrich Benes, Yusuke Yoshiyasu, Leonidas J. Guibas
    Comments: 14 pages, 19 figures
    Subjects: Graphics (cs.GR); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV)

    Interactions play a key role in understanding objects and scenes, for both
    virtual and real world agents. We introduce a new general representation for
    proximal interactions among physical objects that is agnostic to the type of
    objects or interaction involved. The representation is based on tracking
    particles on one of the participating objects and then observing them with
    sensors appropriately placed in the interaction volume or on the interaction
    surfaces. We show how to factorize these interaction descriptors and project
    them into a particular participating object so as to obtain a new functional
    descriptor for that object, its interaction landscape, capturing its observed
    use in a spatio-temporal framework. Interaction landscapes are independent of
    the particular interaction and capture subtle dynamic effects in how objects
    move and behave when in functional use. Our method relates objects based on
    their function, establishes correspondences between shapes based on functional
    key points and regions, and retrieves peer and partner objects with respect to
    an interaction.

    Analyzing the Behavior of Visual Question Answering Models

    Aishwarya Agrawal, Dhruv Batra, Devi Parikh
    Comments: 13 pages, 20 figures; To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, a number of deep-learning based models have been proposed for the
    task of Visual Question Answering (VQA). The performance of most models is
    clustered around 60-70%. In this paper we propose systematic methods to analyze
    the behavior of these models as a first step towards recognizing their
    strengths and weaknesses, and identifying the most fruitful directions for
    progress. We analyze two models, one each from two major classes of VQA models
    — with-attention and without-attention and show the similarities and
    differences in the behavior of these models. We also analyze the winning entry
    of the VQA Challenge 2016.

    Our behavior analysis reveals that despite recent progress, today’s VQA
    models are “myopic” (tend to fail on sufficiently novel instances), often “jump
    to conclusions” (converge on a predicted answer after ‘listening’ to just half
    the question), and are “stubborn” (do not change their answers across images).


    Artificial Intelligence

    Global Constraint Catalog, Volume II, Time-Series Constraints

    Ekaterina Arafailova, Nicolas Beldiceanu, Rémi Douence, Mats Carlsson, Pierre Flener, María Andreína Francisco Rodríguez, Justin Pearson, Helmut Simonis
    Comments: 2709 pages
    Subjects: Artificial Intelligence (cs.AI)

    First this report presents a restricted set of finite transducers used to
    synthesise structural time-series constraints described by means of a
    multi-layered function composition scheme. Second it provides the corresponding
    synthesised catalogue of structural time-series constraints where each
    constraint is explicitly described in terms of automata with accumulators.

    Learning from the Hindsight Plan — Episodic MPC Improvement

    Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Model predictive control (MPC) is a popular control method that has proved
    effective for robotics, among other fields. MPC performs re-planning at every
    time step. Re-planning is done with a limited horizon per computational and
    real-time constraints and often also for robustness to potential model errors.
    However, the limited horizon leads to suboptimal performance. In this work, we
    consider the iterative learning setting, where the same task can be repeated
    several times, and propose a policy improvement scheme for MPC. The main idea
    is that between executions we can, offline, run MPC with a longer horizon,
    resulting in a hindsight plan. To bring the next real-world execution closer to
    the hindsight plan, our approach learns to re-shape the original cost function
    with the goal of satisfying the following property: short horizon planning (as
    realistic during real executions) with respect to the shaped cost should result
    in mimicking the hindsight plan. This effectively consolidates long-term
    reasoning into the short-horizon planning. We empirically evaluate our approach
    in contact-rich manipulation tasks both in simulated and real environments,
    such as peg insertion by a real PR2 robot.

    Hierarchical Memory Networks for Answer Selection on Unknown Words

    Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu, Bo Xu
    Comments: 10 pages, to appear in COLING 2016
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Recently, end-to-end memory networks have shown promising results on Question
    Answering task, which encode the past facts into an explicit memory and perform
    reasoning ability by making multiple computational steps on the memory.
    However, memory networks conduct the reasoning on sentence-level memory to
    output coarse semantic vectors and do not further take any attention mechanism
    to focus on words, which may lead to the model lose some detail information,
    especially when the answers are rare or unknown words. In this paper, we
    propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
    past facts into sentence-level memory and word-level memory respectively. Then,
    (k)-max pooling is exploited following reasoning module on the sentence-level
    memory to sample the (k) most relevant sentences to a question and feed these
    sentences into attention mechanism on the word-level memory to focus the words
    in the selected sentences. Finally, the prediction is jointly learned over the
    outputs of the sentence-level reasoning module and the word-level attention
    mechanism. The experimental results demonstrate that our approach successfully
    conducts answer selection on unknown words and achieves a better performance
    than memory networks.

    A Fast Factorization-based Approach to Robust PCA

    Chong Peng, Zhao Kang, Qiang Chen
    Comments: ICDM 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Robust principal component analysis (RPCA) has been widely used for
    recovering low-rank matrices in many data mining and machine learning problems.
    It separates a data matrix into a low-rank part and a sparse part. The convex
    approach has been well studied in the literature. However, state-of-the-art
    algorithms for the convex approach usually have relatively high complexity due
    to the need of solving (partial) singular value decompositions of large
    matrices. A non-convex approach, AltProj, has also been proposed with lighter
    complexity and better scalability. Given the true rank $r$ of the underlying
    low rank matrix, AltProj has a complexity of $O(r^2dn)$, where $d imes n$ is
    the size of data matrix. In this paper, we propose a novel factorization-based
    model of RPCA, which has a complexity of $O(kdn)$, where $k$ is an upper bound
    of the true rank. Our method does not need the precise value of the true rank.
    From extensive experiments, we observe that AltProj can work only when $r$ is
    precisely known in advance; however, when the needed rank parameter $r$ is
    specified to a value different from the true rank, AltProj cannot fully
    separate the two parts while our method succeeds. Even when both work, our
    method is about 4 times faster than AltProj. Our method can be used as a
    light-weight, scalable tool for RPCA in the absence of the precise value of the
    true rank.


    Information Retrieval

    Hierarchical Memory Networks for Answer Selection on Unknown Words

    Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu, Bo Xu
    Comments: 10 pages, to appear in COLING 2016
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Recently, end-to-end memory networks have shown promising results on Question
    Answering task, which encode the past facts into an explicit memory and perform
    reasoning ability by making multiple computational steps on the memory.
    However, memory networks conduct the reasoning on sentence-level memory to
    output coarse semantic vectors and do not further take any attention mechanism
    to focus on words, which may lead to the model lose some detail information,
    especially when the answers are rare or unknown words. In this paper, we
    propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
    past facts into sentence-level memory and word-level memory respectively. Then,
    (k)-max pooling is exploited following reasoning module on the sentence-level
    memory to sample the (k) most relevant sentences to a question and feed these
    sentences into attention mechanism on the word-level memory to focus the words
    in the selected sentences. Finally, the prediction is jointly learned over the
    outputs of the sentence-level reasoning module and the word-level attention
    mechanism. The experimental results demonstrate that our approach successfully
    conducts answer selection on unknown words and achieves a better performance
    than memory networks.


    Computation and Language

    Stance classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

    Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik
    Comments: COLING 2016
    Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Rumour stance classification, the task that determines if each tweet in a
    collection discussing a rumour is supporting, denying, questioning or simply
    commenting on the rumour, has been attracting substantial interest. Here we
    introduce a novel approach that makes use of the sequence of transitions
    observed in tree-structured conversation threads in Twitter. The conversation
    threads are formed by harvesting users’ replies to one another, which results
    in a nested tree-like structure. Previous work addressing stance classification
    task in Twitter has treated each tweet as a separate unit. Here we analyse
    tweets by virtue of their position in a sequence and test two sequential
    classifiers, Linear-Chain CRF and Tree CRF, each of which makes different
    assumptions about the conversation structure. We experiment with eight Twitter
    datasets, collected during breaking news, and show that exploiting the
    sequential structure of Twitter conversations achieves significant improvements
    over the non-sequential methods. Our work is the first to model Twitter
    conversations as a tree structure in this manner, introducing a novel way of
    tackling NLP tasks on Twitter conversations.

    Psychologically Motivated Text Mining

    Ekaterina Shutova, Patricia Lichtenstein
    Subjects: Computation and Language (cs.CL)

    Natural language processing techniques are increasingly applied to identify
    social trends and predict behavior based on large text collections. Existing
    methods typically rely on surface lexical and syntactic information. Yet,
    research in psychology shows that patterns of human conceptualisation, such as
    metaphorical framing, are reliable predictors of human expectations and
    decisions. In this paper, we present a method to learn patterns of metaphorical
    framing from large text collections, using statistical techniques. We apply the
    method to data in three different languages and evaluate the identified
    patterns, demonstrating their psychological validity.

    Unsupervised Neural Hidden Markov Models

    Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight
    Comments: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP. Oral presentation
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    In this work, we present the first results for neuralizing an Unsupervised
    Hidden Markov Model. We evaluate our approach on tag in- duction. Our approach
    outperforms existing generative models and is competitive with the
    state-of-the-art though with a simpler model easily extended to include
    additional context.

    Byte-based Language Identification with Deep Convolutional Networks

    Johannes Bjerva
    Comments: 6 pages. arXiv admin note: text overlap with arXiv:1609.07053
    Subjects: Computation and Language (cs.CL)

    We report on our system for the shared task on discriminating between similar
    languages (DSL 2016). The system uses only byte representations in a deep
    residual network (ResNet). The system, named ResIdent, is trained only on the
    data released with the task (closed training). We obtain 84.88% accuracy on
    subtask A, 68.80% accuracy on subtask B1, and 69.80% accuracy on subtask B2. A
    large difference in accuracy on development data can be observed with
    relatively minor changes in our network’s architecture and hyperparameters. We
    therefore expect fine-tuning of these parameters to yield higher accuracies.

    Equation Parsing: Mapping Sentences to Grounded Equations

    Subhro Roy, Shyam Upadhyay, Dan Roth
    Subjects: Computation and Language (cs.CL)

    Identifying mathematical relations expressed in text is essential to
    understanding a broad range of natural language text from election reports, to
    financial news, to sport commentaries to mathematical word problems. This paper
    focuses on identifying and understanding mathematical relations described
    within a single sentence. We introduce the problem of Equation Parsing — given
    a sentence, identify noun phrases which represent variables, and generate the
    mathematical equation expressing the relation described in the sentence. We
    introduce the notion of projective equation parsing and provide an efficient
    algorithm to parse text to projective equations. Our system makes use of a high
    precision lexicon of mathematical expressions and a pipeline of structured
    predictors, and generates correct equations in $70\%$ of the cases. In $60\%$
    of the time, it also identifies the correct noun phrase $
    ightarrow$ variables
    mapping, significantly outperforming baselines. We also release a new annotated
    dataset for task evaluation.

    Effective Combination of Language and Vision Through Model Composition and the R-CCA Method

    Hagar Loeub, Roi Reichart
    Comments: 6 pages, 1 figure
    Subjects: Computation and Language (cs.CL)

    We address the problem of integrating textual and visual information in
    vector space models for word meaning representation. We first present the
    Residual CCA (R-CCA) method, that complements the standard CCA method by
    representing, for each modality, the difference between the original signal and
    the signal projected to the shared, max correlation, space. We then show that
    constructing visual and textual representations and then post-processing them
    through composition of common modeling motifs such as PCA, CCA, R-CCA and
    linear interpolation (a.k.a sequential modeling) yields high quality models. On
    five standard semantic benchmarks our sequential models outperform recent
    multimodal representation learning alternatives, including ones that rely on
    joint representation learning. For two of these benchmarks our R-CCA method is
    part of the Best configuration our algorithm yields.

    Character Sequence Models for ColorfulWords

    Kazuya Kawakami, Chris Dyer, Bryan R. Routledge, Noah A. Smith
    Subjects: Computation and Language (cs.CL)

    We present a neural network architecture to predict a point in color space
    from the sequence of characters in the color’s name. Using large scale
    color–name pairs obtained from an online color design forum, we evaluate our
    model on a “color Turing test” and find that, given a name, the colors
    predicted by our model are preferred by annotators to color names created by
    humans. Our datasets and demo system are available online at
    this http URL

    Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification

    Franck Dernoncourt, Ji Young Lee
    Comments: Accepted as a conference paper at IEEE SLT 2016. The two authors contributed equally to this work
    Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Systems based on artificial neural networks (ANNs) have achieved
    state-of-the-art results in many natural language processing tasks. Although
    ANNs do not require manually engineered features, ANNs have many
    hyperparameters to be optimized. The choice of hyperparameters significantly
    impacts models’ performances. However, the ANN hyperparameters are typically
    chosen by manual, grid, or random search, which either requires expert
    experiences or is computationally expensive. Recent approaches based on
    Bayesian optimization using Gaussian processes (GPs) is a more systematic way
    to automatically pinpoint optimal or near-optimal machine learning
    hyperparameters. Using a previously published ANN model yielding
    state-of-the-art results for dialog act classification, we demonstrate that
    optimizing hyperparameters using GP further improves the results, and reduces
    the computational time by a factor of 4 compared to a random search. Therefore
    it is a useful technique for tuning ANN models to yield the best performances
    for natural language processing tasks.

    Deep Reinforcement Learning for Mention-Ranking Coreference Models

    Kevin Clark, Christopher D. Manning
    Comments: To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL)

    Coreference resolution systems are typically trained with heuristic loss
    functions that require careful tuning. In this paper we instead apply
    reinforcement learning to directly optimize a neural mention-ranking model for
    coreference evaluation metrics. We experiment with two approaches: the
    REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective.
    We find the latter to be more effective, resulting in significant improvements
    over the current state-of-the-art on the English and Chinese portions of the
    CoNLL 2012 Shared Task.

    Hierarchical Memory Networks for Answer Selection on Unknown Words

    Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu, Bo Xu
    Comments: 10 pages, to appear in COLING 2016
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Recently, end-to-end memory networks have shown promising results on Question
    Answering task, which encode the past facts into an explicit memory and perform
    reasoning ability by making multiple computational steps on the memory.
    However, memory networks conduct the reasoning on sentence-level memory to
    output coarse semantic vectors and do not further take any attention mechanism
    to focus on words, which may lead to the model lose some detail information,
    especially when the answers are rare or unknown words. In this paper, we
    propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the
    past facts into sentence-level memory and word-level memory respectively. Then,
    (k)-max pooling is exploited following reasoning module on the sentence-level
    memory to sample the (k) most relevant sentences to a question and feed these
    sentences into attention mechanism on the word-level memory to focus the words
    in the selected sentences. Finally, the prediction is jointly learned over the
    outputs of the sentence-level reasoning module and the word-level attention
    mechanism. The experimental results demonstrate that our approach successfully
    conducts answer selection on unknown words and achieves a better performance
    than memory networks.

    Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

    Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
    Comments: Submitted to ICASSP 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNNs) have shown clear superiority in sequence
    modeling, particularly the ones with gated units, such as long short-term
    memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
    behind the remarkable performance remain unclear in many applications, e.g.,
    automatic speech recognition (ASR). This paper employs visualization techniques
    to study the behavior of LSTM and GRU when performing speech recognition tasks.
    Our experiments show some interesting patterns in the gated memory, and some of
    them have inspired simple yet effective modifications on the network structure.
    We report two of such modifications: (1) lazy cell update in LSTM, and (2)
    shortcut connections for residual learning. Both modifications lead to more
    comprehensible and powerful networks.

    Using Natural Language Processing and Qualitative Analysis to Intervene in Gang Violence: A Collaboration Between Social Work Researchers and Data Scientists

    Desmond Upton Patton (Columbia University), Kathleen McKeown (Columbia University), Owen Rambow (Columbia University), Jamie Macbeth (Columbia University)
    Comments: Presented at the Data For Good Exchange 2016
    Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

    The U.S. has the highest rate of firearm-related deaths when compared to
    other industrialized countries. Violence particularly affects low-income, urban
    neighborhoods in cities like Chicago, which saw a 40% increase in firearm
    violence from 2014 to 2015 to more than 3,000 shooting victims. While recent
    studies have found that urban, gang-involved individuals curate a unique and
    complex communication style within and between social media platforms,
    organizations focused on reducing gang violence are struggling to keep up with
    the growing complexity of social media platforms and the sheer volume of data
    they present. In this paper, describe the Digital Urban Violence Analysis
    Approach (DUVVA), a collaborative qualitative analysis method used in a
    collaboration between data scientists and social work researchers to develop a
    suite of systems for decoding the high- stress language of urban, gang-involved
    youth. Our approach leverages principles of grounded theory when analyzing
    approximately 800 tweets posted by Chicago gang members and participation of
    youth from Chicago neighborhoods to create a language resource for natural
    language processing (NLP) methods. In uncovering the unique language and
    communication style, we developed automated tools with the potential to detect
    aggressive language on social media and aid individuals and groups in
    performing violence prevention and interruption.


    Distributed, Parallel, and Cluster Computing

    A generic framework for the development of geospatial processing pipelines on clusters

    Remi Cresson
    Journal-ref: IEEE journal of geoscience and remote sensing letters, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The amount of remote sensing data available to applications is constantly
    growing due to the rise of very-high-resolution sensors and short repeat cycle
    satellites. Consequently, tackling computational complexity in Earth
    Observation information extraction is rising as a major challenge. Resorting to
    High Performance Computing (HPC) is becoming a common practice, since it
    provides environments and programming facilities able to speed-up processes. In
    particular, clusters are flexible, cost-effective systems able to perform
    data-intensive tasks ideally fulfilling any computational requirement. However,
    their use typically implies a significant coding effort to build proper
    implementations of specific processing pipelines. This paper presents a generic
    framework for the development of RS images processing applications targeting
    cluster computing. It is based on common open sources libraries, and leverages
    the parallelization of a wide variety of image processing pipelines in a
    transparent way. Performances on typical RS tasks implemented using the
    proposed framework demonstrate a great potential for the effective and timely
    processing of large amount of data.

    Training a Probabilistic Graphical Model with Resistive Switching Electronic Synapses

    S. Burc Eryilmaz, Emre Neftci, Siddharth Joshi, SangBum Kim, Matthew BrightSky, Hsiang-Lan Lung, Chung Lam, Gert Cauwenberghs, H.-S. Philip Wong
    Comments: submitted to IEEE Transactions on Electron Devices
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET)

    Current large scale implementations of deep learning and data mining require
    thousands of processors, massive amounts of off-chip memory, and consume
    gigajoules of energy. Emerging memory technologies such as nanoscale
    two-terminal resistive switching memory devices offer a compact, scalable and
    low power alternative that permits on-chip co-located processing and memory in
    fine-grain distributed parallel architecture. Here we report first use of
    resistive switching memory devices for implementing and training a Restricted
    Boltzmann Machine (RBM), a generative probabilistic graphical model as a key
    component for unsupervised learning in deep networks. We experimentally
    demonstrate a 45-synapse RBM realized with 90 resistive switching phase change
    memory (PCM) elements trained with a bio-inspired variant of the Contrastive
    Divergence (CD) algorithm, implementing Hebbian and anti-Hebbian weight
    updates. The resistive PCM devices show a two-fold to ten-fold reduction in
    error rate in a missing pixel pattern completion task trained over 30 epochs,
    compared to untrained case. Measured programming energy consumption is 6.1 nJ
    per epoch with the resistive switching PCM devices, a factor of ~150 times
    lower than conventional processor-memory systems. We analyze and discuss the
    dependence of learning performance on cycle-to-cycle variations as well as
    number of gradual levels in the PCM analog memory devices.


    Learning

    Statistical comparison of classifiers through Bayesian hierarchical modelling

    Giorgio Corani, Alessio Benavoli, Janez Demšar, Francesca Mangili, Marco Zaffalon
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new approach for the statistical comparison of algorithms which
    have been cross-validated on multiple data sets. It is a Bayesian hierarchical
    method; it draws inferences on single and on multiple datasets taking into
    account the mean and the variability of the cross-validation results. It is
    able to detect equivalent classifiers and to claim significances which have a
    practical impact. On each data sets it estimates more accurately than the
    existing methods the difference of accuracy between the two classifiers thanks
    to shrinkage. Such advantages are demonstrated by simulations on synthetic and
    real data.

    Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

    Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
    Comments: Submitted to ICASSP 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNNs) have shown clear superiority in sequence
    modeling, particularly the ones with gated units, such as long short-term
    memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties
    behind the remarkable performance remain unclear in many applications, e.g.,
    automatic speech recognition (ASR). This paper employs visualization techniques
    to study the behavior of LSTM and GRU when performing speech recognition tasks.
    Our experiments show some interesting patterns in the gated memory, and some of
    them have inspired simple yet effective modifications on the network structure.
    We report two of such modifications: (1) lazy cell update in LSTM, and (2)
    shortcut connections for residual learning. Both modifications lead to more
    comprehensible and powerful networks.

    Deep Reinforcement Learning for Tensegrity Robot Locomotion

    Xinyang Geng, Marvin Zhang, Jonathan Bruce, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine
    Comments: Under review at the International Conference on Robotics and Automation (ICRA), 2017
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    Tensegrity robots, composed of rigid rods connected by elastic cables, have a
    number of unique properties that make them appealing for use as planetary
    exploration rovers. However, control of tensegrity robots remains a difficult
    problem due to their unusual structures and complex dynamics. In this work, we
    show how locomotion gaits can be learned automatically using a novel extension
    of mirror descent guided policy search (MDGPS) applied to periodic locomotion
    movements, and we demonstrate the effectiveness of our approach on tensegrity
    robot locomotion. We evaluate our method with real-world and simulated
    experiments on the SUPERball tensegrity robot, showing that the learned
    policies generalize to changes in system parameters, unreliable sensor
    measurements, and variation in environmental conditions, including varied
    terrains and a range of different gravities. Our experiments demonstrate that
    our method not only learns fast, power-efficient feedback policies for rolling
    gaits, but that these policies can succeed with only the limited onboard
    sensing provided by SUPERball’s accelerometers. We compare the learned feedback
    policies to learned open-loop policies and hand-engineered controllers, and
    demonstrate that the learned policy enables the first continuous, reliable
    locomotion gait for the real SUPERball robot.

    Learning to Push by Grasping: Using multiple tasks for effective learning

    Lerrel Pinto, Abhinav Gupta
    Comments: Under review at the International Conference on Robotics and Automation (ICRA) 2017
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, end-to-end learning frameworks are gaining prevalence in the field
    of robot control. These frameworks input states/images and directly predict the
    torques or the action parameters. However, these approaches are often critiqued
    due to their huge data requirements for learning a task. The argument of the
    difficulty in scalability to multiple tasks is well founded, since training
    these tasks often require hundreds or thousands of examples. But do end-to-end
    approaches need to learn a unique model for every task? Intuitively, it seems
    that sharing across tasks should help since all tasks require some common
    understanding of the environment. In this paper, we attempt to take the next
    step in data-driven end-to-end learning frameworks: move from the realm of
    task-specific models to joint learning of multiple robot tasks. In an
    astonishing result we show that models with multi-task learning tend to perform
    better than task-specific models trained with same amounts of data. For
    example, a deep-network learned with 2.5K grasp and 2.5K push examples performs
    better on grasping than a network trained on 5K grasp examples.

    Unsupervised Neural Hidden Markov Models

    Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight
    Comments: accepted at EMNLP 2016, Workshop on Structured Prediction for NLP. Oral presentation
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    In this work, we present the first results for neuralizing an Unsupervised
    Hidden Markov Model. We evaluate our approach on tag in- duction. Our approach
    outperforms existing generative models and is competitive with the
    state-of-the-art though with a simpler model easily extended to include
    additional context.

    Learning from the Hindsight Plan — Episodic MPC Improvement

    Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Model predictive control (MPC) is a popular control method that has proved
    effective for robotics, among other fields. MPC performs re-planning at every
    time step. Re-planning is done with a limited horizon per computational and
    real-time constraints and often also for robustness to potential model errors.
    However, the limited horizon leads to suboptimal performance. In this work, we
    consider the iterative learning setting, where the same task can be repeated
    several times, and propose a policy improvement scheme for MPC. The main idea
    is that between executions we can, offline, run MPC with a longer horizon,
    resulting in a hindsight plan. To bring the next real-world execution closer to
    the hindsight plan, our approach learns to re-shape the original cost function
    with the goal of satisfying the following property: short horizon planning (as
    realistic during real executions) with respect to the shaped cost should result
    in mimicking the hindsight plan. This effectively consolidates long-term
    reasoning into the short-horizon planning. We empirically evaluate our approach
    in contact-rich manipulation tasks both in simulated and real environments,
    such as peg insertion by a real PR2 robot.

    Variational Autoencoder for Deep Learning of Images, Labels and Captions

    Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin
    Comments: NIPS 2016 (To appear)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    A novel variational autoencoder is developed to model images, as well as
    associated labels or captions. The Deep Generative Deconvolutional Network
    (DGDN) is used as a decoder of the latent image features, and a deep
    Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used
    to approximate a distribution for the latent DGDN features/code. The latent
    code is also linked to generative models for labels (Bayesian support vector
    machine) or captions (recurrent neural network). When predicting a
    label/caption for a new image at test, averaging is performed across the
    distribution of latent codes; this is computationally efficient as a
    consequence of the learned CNN-based encoder. Since the framework is capable of
    modeling the image in the presence/absence of associated labels/captions, a new
    semi-supervised setting is manifested for CNN learning with images; the
    framework even allows unsupervised CNN learning, based on images alone.

    Multiplicative weights, equalizers, and P=PPAD

    Ioannis Avramopoulos
    Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Learning (cs.LG)

    We show that, by using multiplicative weights in a game-theoretic thought
    experiment (and an important convexity result on the composition of
    multiplicative weights with the relative entropy function), a symmetric
    bimatrix game (that is, a bimatrix matrix wherein the payoff matrix of each
    player is the transpose of the payoff matrix of the other) either has an
    interior symmetric equilibrium or there is a pure strategy that is weakly
    dominated by some mixed strategy. Weakly dominated pure strategies can be
    detected and eliminated in polynomial time by solving a linear program.
    Furthermore, interior symmetric equilibria are a special case of a more general
    notion, namely, that of an “equalizer,” which can also be computed efficiently
    in polynomial time by solving a linear program. An elegant “symmetrization
    method” of bimatrix games [Jurg et al., 1992] and the well-known
    PPAD-completeness results on equilibrium computation in bimatrix games
    [Daskalakis et al., 2009, Chen et al., 2009] imply then the compelling P =
    PPAD.

    The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm

    George D. Montanez
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    No Free Lunch theorems show that the average performance across any
    closed-under-permutation set of problems is fixed for all algorithms, under
    appropriate conditions. Extending these results, we demonstrate that the
    proportion of favorable problems is itself strictly bounded, such that no
    single algorithm can perform well over a large fraction of possible problems.
    Our results explain why we must either continue to develop new learning methods
    year after year or move towards highly parameterized models that are both
    flexible and sensitive to their hyperparameters.

    Learning Genomic Representations to Predict Clinical Outcomes in Cancer

    Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
    Comments: ICLR 2016 Workshop Track- May 2nd 2016 International Conference on Learning Representations
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Genomics are rapidly transforming medical practice and basic biomedical
    research, providing insights into disease mechanisms and improving therapeutic
    strategies, particularly in cancer. The ability to predict the future course of
    a patient’s disease from high-dimensional genomic profiling will be essential
    in realizing the promise of genomic medicine, but presents significant
    challenges for state-of-the-art survival analysis methods. In this abstract we
    present an investigation in learning genomic representations with neural
    networks to predict patient survival in cancer. We demonstrate the advantages
    of this approach over existing survival analysis methods using brain tumor
    data.

    Analyzing the Behavior of Visual Question Answering Models

    Aishwarya Agrawal, Dhruv Batra, Devi Parikh
    Comments: 13 pages, 20 figures; To appear in EMNLP 2016
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Recently, a number of deep-learning based models have been proposed for the
    task of Visual Question Answering (VQA). The performance of most models is
    clustered around 60-70%. In this paper we propose systematic methods to analyze
    the behavior of these models as a first step towards recognizing their
    strengths and weaknesses, and identifying the most fruitful directions for
    progress. We analyze two models, one each from two major classes of VQA models
    — with-attention and without-attention and show the similarities and
    differences in the behavior of these models. We also analyze the winning entry
    of the VQA Challenge 2016.

    Our behavior analysis reveals that despite recent progress, today’s VQA
    models are “myopic” (tend to fail on sufficiently novel instances), often “jump
    to conclusions” (converge on a predicted answer after ‘listening’ to just half
    the question), and are “stubborn” (do not change their answers across images).


    Information Theory

    Binary Cyclic Codes that are Locally Repairable

    Sreechakra Goparaju, Robert Calderbank
    Comments: This 5 page paper appeared in the proceedings of the IEEE International Symposium on Information Theory (ISIT), 2014
    Journal-ref: 2014 IEEE International Symposium on Information Theory, pp.
    676-680. IEEE, 2014
    Subjects: Information Theory (cs.IT)

    Codes for storage systems aim to minimize the repair locality, which is the
    number of disks (or nodes) that participate in the repair of a single failed
    disk. Simultaneously, the code must sustain a high rate, operate on a small
    finite field to be practically significant and be tolerant to a large number of
    erasures. To this end, we construct new families of binary linear codes that
    have an optimal dimension (rate) for a given minimum distance and locality.
    Specifically, we construct cyclic codes that are locally repairable for
    locality 2 and distances 2, 6 and 10. In doing so, we discover new upper bounds
    on the code dimension, and prove the optimality of enabling local repair by
    provisioning disjoint groups of disks. Finally, we extend our construction to
    build codes that have multiple repair sets for each disk.

    Graph-Theoretic Approaches to Two-Sender Index Coding

    Chandra Thapa, Lawrence Ong, Sarah J. Johnson
    Comments: To be presented at 2016 IEEE Global Communications Conference (GLOBECOM 2016) Workshop on Network Coding and Applications (NetCod), Washington, USA, 2016
    Subjects: Information Theory (cs.IT)

    Consider a communication scenario over a noiseless channel where a sender is
    required to broadcast messages to multiple receivers, each having side
    information about some messages. In this scenario, the sender can leverage the
    receivers’ side information during the encoding of messages in order to reduce
    the required transmissions. This type of encoding is called index coding. In
    this paper, we study index coding with two cooperative senders, each with some
    subset of messages, and multiple receivers, each requesting one unique message.
    The index coding in this setup is called two-sender unicast index coding
    (TSUIC). The main aim of TSUIC is to minimize the total number of transmissions
    required by the two senders. Based on graph-theoretic approaches, we prove that
    TSUIC is equivalent to single-sender unicast index coding (SSUIC) for some
    special cases. Moreover, we extend the existing schemes for SSUIC, viz., the
    cycle-cover scheme, the clique-cover scheme, and the local-chromatic scheme to
    the corresponding schemes for TSUIC.

    Reliability of universal decoding based on vector-quantized codewords

    Neri Merhav
    Comments: 31 pages; submitted for publication
    Subjects: Information Theory (cs.IT)

    Motivated by applications of biometric identification and content
    identification systems, we consider the problem of random coding for channels,
    where each codeword undergoes lossy compression (vector quantization), and
    where the decoder bases its decision only on the compressed codewords and the
    channel output, which is in turn, the channel’s response to the transmission of
    an original codeword, before compression. For memoryless sources and memoryless
    channels with finite alphabets, we propose a new universal decoder and analyze
    its error exponent, which improves on an earlier result by Dasarathy and Draper
    (2011), who used the classic maximum mutual information (MMI) universal
    decoder. Further, we show that our universal decoder provides the same error
    exponent as that of the optimal, maximum likelihood (ML) decoder, at least as
    long as all single-letter transition probabilities of the channel are positive.
    We conjecture that the same argument remains true even without this positivity
    condition.

    On Shared Rate Time Series for Mobile Users in Poisson Networks

    Pranav Madadi, François Baccelli, Gustavo de Veciana
    Subjects: Information Theory (cs.IT)

    This paper focuses on modeling and analysis of the temporal performance
    variations experienced by a mobile user in a wireless network and its impact on
    system level design. We consider a simple stochastic geometry model: the
    infrastructure nodes are Poisson distributed while the user’s motion is the
    simplest possible i.e., constant velocity on a straight line. We first
    characterize variations in the SNR process, and associated downlink Shannon
    rate, resulting from variations in the infrastructure geometry seen by the
    mobile. Specifically, by making a connection between stochastic geometry and
    queueing theory the level crossings of the SNR process are shown to form an
    alternating renewal process whose distribution can be completely characterized.
    For large or small SNR levels, and associated rare events, we further derive
    simple distributional (exponential) models. We then characterize the second
    major contributor to variation, associated with changes in the number of other
    users sharing infrastructure. Combining these two effects, we study what are
    the dominant factors (infrastructure geometry or sharing number) given mobile
    experiences a very high or low shared rate. These results are then used to
    evaluate and optimize the system-level Quality of Service (QoS) and
    system-level capacity to support mobile users sharing wireless infrastructure,
    including mobile devices streaming video which proactively buffer content to
    prevent rebuffering and mobiles which are downloading large files. Finally, we
    use simulation to assess the fidelity of this model and its robustness to
    factors which are presently taken into account.

    Reduced-Complexity SCL Decoding of Multi-CRC-Aided Polar Codes

    Mao-Ching Chiu, Wei-De Wu
    Comments: 9 pages, 7 figures
    Subjects: Information Theory (cs.IT)

    Cyclic redundancy check (CRC) aided polar codes are capable of achieving
    better performance than low-density parity-check (LDPC) codes under the
    successive cancelation list (SCL) decoding scheme. However, the SCL decoding
    scheme suffers from very high space and time complexities. Especially, the high
    space complexity is a major concern for adopting polar codes in modern mobile
    communication standards. In this paper, we propose a novel reduced-complexity
    successive cancelation list (R-SCL) decoding scheme which is effective to
    reduce the space complexity. Simulation results show that, with a (2048, 1024)
    CRC-aided polar code, the R-SCL decoders with 25% reduction of space complexity
    and 8% reduction of time complexity can still achieve almost the same
    performance levels as those decoded by SCL decoders. To further reduce the
    complexity, we propose a multi-CRC coding scheme for polar codes. Simulation
    results show that, with a (16384, 8192) multi-CRC-aided polar code, a R-SCL
    decoder with about 85% reduction of space complexity and 20% reduction of time
    complexity results in a worst performance loss of only 0.04dB.

    Mitigating Pilot Contamination Through Location-Aware Pilot Assignment in Massive MIMO Networks

    Noman Akbar, Shihao Yan, Nan Yang, Jinhong Yuan
    Comments: Accepted to appear in IEEE Globecom Workshop (ET5G)
    Subjects: Information Theory (cs.IT)

    We propose a novel location-aware pilot assignment scheme to mitigate pilot
    contamination in massive multiple-input multiple-output (MIMO) networks, where
    the channels are subjected to Rician fading. Our proposed scheme utilizes the
    location information of users as the input to conduct pilot assignment in the
    network. Based on the location information, we first determine the line of
    sight (LOS) interference between the intended signal and the interfering
    signal. Our analysis reveals that the LOS interference converges to zero as the
    number of antennas at the base station (BS) goes to infinity, whereas for
    finite number of antennas at the BS the LOS interference indeed depends on
    specific pilot allocation strategies. Following this revelation, we assign
    pilot sequences to all the users in the massive MIMO network such that the LOS
    interference is minimized for finite number of antennas at the BS. Our proposed
    scheme outperforms the random pilot assignment in terms of achieving a much
    higher uplink sum rate for reasonable values of the Rician K-factor. Moreover,
    we propose a new performance metric, which measures the strength of the LOS
    interference and demonstrate that it is a good metric to analyze the
    performance of pilot assignment schemes. Theoretical analysis is provided and
    verified through numerical simulations to support the proposed scheme.

    Efficient Offline Waveform Design Using Quincunx/Hexagonal Time-Frequency Lattices For 5G Systems

    Raouia Ayadi, Inès Kammoun, Mohamed Siala
    Subjects: Information Theory (cs.IT)

    Conventional OFDM, adopted in LTE-A systems, cannot provide the quality of
    service requirements sought in 5G systems because of extreme natural channel
    impairments caused by higher Doppler spreads and unexpected artificial
    impairments caused by multi-source transmission, to be brought by 5G, and by
    synchronization relaxation for closed-loop signaling overhead reduction in some
    5G applications. These severe impairments induce a strong loss of orthogonality
    between subcarriers and OFDM symbols and, therefore, lead to a dramatic
    increase in ICI and ISI. To be well armed against these dramatic impairments,
    we, in the present paper, optimize the transmit/receive waveforms for FBMC
    systems, with hexagonal time-frequency lattices, operating over severe doubly
    dispersive channels, accounting for both natural and artificial impairments.
    For this, we exploit the POPS paradigm, recently proposed for rectangular
    time-frequency lattices, to design offline waveforms maximizing the SINR for
    hexagonal time-frequency lattices. We show that FBMC, with hexagonal lattices,
    offers a strong improvement in SIR with respect to conventional OFDM and an
    improvement of 1dB with respect to POPS-FBMC, with classical rectangular
    lattices. Furthermore, we show that the hexagonal POPS-FBMC brings more
    robustness to frequency synchronization errors and offers a 10dB reduction in
    OOB emissions with respect to rectangular POPS-FBMC.

    The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm

    George D. Montanez
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)

    No Free Lunch theorems show that the average performance across any
    closed-under-permutation set of problems is fixed for all algorithms, under
    appropriate conditions. Extending these results, we demonstrate that the
    proportion of favorable problems is itself strictly bounded, such that no
    single algorithm can perform well over a large fraction of possible problems.
    Our results explain why we must either continue to develop new learning methods
    year after year or move towards highly parameterized models that are both
    flexible and sensitive to their hyperparameters.




沪ICP备19023445号-2号
友情链接