
    arXiv Paper Daily: Wed, 24 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-24 00:00:00

    Neural and Evolutionary Computing

    Unbiasing Truncated Backpropagation Through Time

    Corentin Tallec, Yann Ollivier
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Truncated Backpropagation Through Time (truncated BPTT) is a widespread
    method for learning recurrent computational graphs. Truncated BPTT keeps the
    computational benefits of Backpropagation Through Time (BPTT) while relieving
    the need for a complete backtrack through the whole data sequence at every
    step. However, truncation favors short-term dependencies: the gradient estimate
    of truncated BPTT is biased, so that it does not benefit from the convergence
    guarantees from stochastic gradient theory. We introduce Anticipated Reweighted
    Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
    benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
    variable truncation lengths together with carefully chosen compensation factors
    in the backpropagation equation. We check the viability of ARTBP on two tasks.
    First, a simple synthetic task where careful balancing of temporal dependencies
    at different scales is needed: truncated BPTT displays unreliable performance,
    and in worst case scenarios, divergence, while ARTBP converges reliably.
    Second, on Penn Treebank character-level language modelling, ARTBP slightly
    outperforms truncated BPTT.
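
    The compensation idea can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the paper's ARTBP equations: it estimates a sum of per-timestep gradient contributions by truncating at a random point and dividing each kept term by its survival probability, which restores the correct expectation.

```python
import numpy as np

# Hedged illustration of the reweighting idea behind ARTBP on a toy sum.
# True quantity: G = sum_t g_t. A fixed truncation keeps only the first K
# terms (biased). Randomising the truncation point T and dividing each kept
# term g_t by the survival probability P(T >= t) makes the estimate unbiased.
rng = np.random.default_rng(0)
g = 0.9 ** np.arange(50)          # toy per-timestep gradient contributions
p_continue = 0.8                  # chance of not truncating at each step

def reweighted_truncated_sum(g, p, rng):
    est, survive = 0.0, 1.0       # survive = P(T >= t)
    for g_t in g:
        est += g_t / survive      # compensation factor 1 / P(T >= t)
        if rng.random() > p:      # truncate here with probability 1 - p
            break
        survive *= p
    return est

samples = [reweighted_truncated_sum(g, p_continue, rng) for _ in range(20000)]
print(g.sum(), np.mean(samples))  # the two agree up to Monte Carlo noise
```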

    A divide and conquer method for symbolic regression

    Changtong Luo, Chen Chen, Zonglin Jiang
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Symbolic regression aims to find a function that best explains the
    relationship between independent variables and the objective value based on a
    given set of sample data. Genetic programming (GP) is usually considered an
    appropriate method for the problem since it can optimize functional structure
    and coefficients simultaneously. However, the convergence speed of GP might be
    too slow for large-scale problems that involve a large number of variables.
    Fortunately, in many applications, the target function is separable or
    partially separable. This feature motivated us to design a new method, divide
    and conquer (D&C), for symbolic regression, in which the target function is
    divided into a number of sub-functions that are then determined by any
    available GP algorithm. The separability is probed by a newly proposed
    technique, the Bi-Correlation test (BiCT). D&C-powered GP has been tested on
    some real-world applications, and the study shows that D&C can help GP find
    the target function more rapidly and more reliably.
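
    The abstract does not spell out BiCT, but the underlying idea of probing separability via correlation can be sketched as follows; the test, tolerance, and function names here are illustrative guesses rather than the paper's exact procedure.

```python
import numpy as np

# Hedged sketch of a bi-correlation style separability probe. If
# f(x, y) = g(x) * h(y) (or g(x) + h(y)), then the curves x -> f(x, y1)
# and x -> f(x, y2) are proportional (or shifted) for any fixed y1, y2,
# so their Pearson correlation is close to 1 in absolute value.
def looks_separable_in_x(f, y1, y2, n=200, tol=1e-6):
    xs = np.linspace(-3, 3, n)
    a, b = f(xs, y1), f(xs, y2)
    r = np.corrcoef(a, b)[0, 1]
    return abs(r) > 1 - tol

f_sep  = lambda x, y: np.exp(-x**2) * np.sin(y)   # separable target
f_nsep = lambda x, y: np.sin(x * y)               # non-separable target
print(looks_separable_in_x(f_sep, 0.5, 1.7))      # True
print(looks_separable_in_x(f_nsep, 0.5, 1.7))     # False
```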

    An evolutionary strategy for ΔE – E identification

    Katarzyna Schmidt, Oskar Wyszynski
    Subjects: Instrumentation and Detectors (physics.ins-det); Neural and Evolutionary Computing (cs.NE); Software Engineering (cs.SE); Nuclear Experiment (nucl-ex)

    In this article we present an automatic method for charge and mass
    identification of charged nuclear fragments produced in heavy ion collisions at
    intermediate energies. The algorithm combines a generative model of the ΔE – E
    relation and a Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES). The
    CMA-ES is a stochastic and derivative-free method employed to search parameter
    space of the model by means of a fitness function. The article describes
    details of the method along with results of an application on simulated labeled
    data.
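
    As a rough illustration of the derivative-free search ingredient, here is a minimal (mu, lambda) evolution strategy; real CMA-ES additionally adapts a full covariance matrix, and the fitness function below is a toy placeholder for the paper's model fit.

```python
import numpy as np

# A minimal derivative-free (mu, lambda) evolution strategy, standing in for
# the full CMA-ES the paper uses (real CMA-ES also adapts the sampling
# covariance). The sphere fitness is a hypothetical placeholder; the paper
# instead scores how well a generative DeltaE-E model fits detector data.
def es_minimize(fitness, x0, sigma=0.5, pop=32, elite=8, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(iters):
        cand = mean + sigma * rng.standard_normal((pop, mean.size))
        scores = np.array([fitness(c) for c in cand])
        best = cand[np.argsort(scores)[:elite]]
        mean = best.mean(axis=0)            # move toward the elite samples
        sigma *= 0.98                       # simple step-size decay
    return mean

sphere = lambda x: float(np.sum(x ** 2))    # toy fitness for demonstration
print(es_minimize(sphere, x0=[2.0, -1.5]))  # converges near [0, 0]
```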

    Matching neural paths: transfer from recognition to correspondence search

    Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many machine learning tasks require finding per-part correspondences between
    objects. In this work we focus on low-level correspondences – a highly
    ambiguous matching problem. We propose to use a hierarchical semantic
    representation of the objects, coming from a convolutional neural network, to
    solve this ambiguity. Training it for low-level correspondence prediction
    directly might not be an option in some domains where the ground-truth
    correspondences are hard to obtain. We show how transfer from recognition can
    be used to avoid such training. Our idea is to mark parts as “matching” if
    their features are close to each other at all the levels of convolutional
    feature hierarchy (neural paths). Although the overall number of such paths is
    exponential in the number of layers, we propose a polynomial algorithm for
    aggregating all of them in a single backward pass. The empirical validation is
    done on the task of stereo correspondence and demonstrates that we achieve
    competitive results among the methods which do not use labeled target domain
    data.
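
    The polynomial aggregation claim can be sanity-checked on a toy layered graph: summing path products factorizes into a chain of matrix products. This is a hedged illustration of the counting argument, not the paper's matching algorithm.

```python
import numpy as np

# Hedged sketch of why exponentially many layer-wise paths can be aggregated
# in polynomial time: the sum over all paths of per-layer score products
# factorizes into a chain of matrix-vector products (a single backward
# pass). Shapes and scores here are illustrative, not the paper's features.
rng = np.random.default_rng(0)
layers = [rng.random((4, 5)), rng.random((5, 3)), rng.random((3, 2))]

# Brute force: enumerate every path, multiply scores along it, then sum.
def brute_force(mats):
    total = 0.0
    def walk(layer, node, prod):
        nonlocal total
        if layer == len(mats):
            total += prod
            return
        for nxt in range(mats[layer].shape[1]):
            walk(layer + 1, nxt, prod * mats[layer][node, nxt])
    for start in range(mats[0].shape[0]):
        walk(0, start, 1.0)
    return total

# Polynomial aggregation: one pass of matrix products.
def aggregate(mats):
    v = np.ones(mats[-1].shape[1])
    for m in reversed(mats):            # the "backward pass"
        v = m @ v
    return v.sum()

print(brute_force(layers), aggregate(layers))   # identical results
```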

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework goes beyond and generalizes over previous proposals in
    enabling hard or soft sharing of all combinations of subspaces, layers, and
    skip connections. We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
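
    A minimal sketch of the core sharing mechanism, assuming a simplified two-task, whole-layer version of the paper's subspace mixing; sizes and wiring are illustrative.

```python
import torch
import torch.nn as nn

# Hedged sketch of the sluice idea: learned alpha parameters linearly mix
# the hidden states of two task networks at each layer, so the amount of
# sharing is itself trained. The 2x2 whole-layer mixing is a simplification
# of the paper's subspace version.
class SluicePair(nn.Module):
    def __init__(self, dim_in, dim_h):
        super().__init__()
        self.layer_a = nn.Linear(dim_in, dim_h)
        self.layer_b = nn.Linear(dim_in, dim_h)
        # alpha[i, j] = how much task j's next layer reads from task i.
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x):
        h_a, h_b = torch.relu(self.layer_a(x)), torch.relu(self.layer_b(x))
        mixed_a = self.alpha[0, 0] * h_a + self.alpha[1, 0] * h_b
        mixed_b = self.alpha[0, 1] * h_a + self.alpha[1, 1] * h_b
        return mixed_a, mixed_b

block = SluicePair(16, 32)
a, b = block(torch.randn(8, 16))
print(a.shape, b.shape)   # torch.Size([8, 32]) torch.Size([8, 32])
```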

    Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

    Tayfun Gokmen, O. Murat Onen, Wilfried Haensch
    Comments: 22 pages, 6 figures, 2 tables
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    In previous work we detailed the requirements for obtaining a maximal
    performance benefit by implementing fully connected deep neural networks
    (DNNs) in the form of arrays of resistive devices for deep learning. Here we
    extend this concept of Resistive Processing Unit (RPU) devices to
    convolutional neural networks (CNNs). We show how to map the convolutional
    layers to RPU arrays such that the parallelism of the hardware can be fully
    utilized in all three cycles of the backpropagation algorithm. We find that
    the noise and bound limitations imposed by the analog nature of the
    computations performed on the arrays affect the training accuracy of the
    CNNs. We present noise and bound management techniques that mitigate these
    problems without introducing any additional complexity in the analog
    circuits, as they can be handled by the digital circuits. In addition, we
    discuss digitally programmable update management and device variability
    reduction techniques that can be used selectively for some of the layers in
    a CNN. We show that the combination of all these techniques enables a
    successful application of the RPU concept to training CNNs. The techniques
    discussed here are more general and can be applied beyond CNN architectures,
    enabling the applicability of the RPU approach to a large class of neural
    network architectures.
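
    A hedged sketch of what bound management might look like around a saturating analog matrix-vector product; the clipping model, bound value, and scaling rule below are assumptions for illustration, not the paper's circuit-level scheme.

```python
import numpy as np

# Hedged sketch of bound management for analog matrix-vector products: the
# analog array clips results to a fixed bound, so the digital side scales
# the input down before the analog op and scales the result back up.
BOUND = 12.0

def analog_matvec(W, x, bound=BOUND):
    return np.clip(W @ x, -bound, bound)        # saturating analog hardware

def managed_matvec(W, x, bound=BOUND):
    s = max(1.0, np.abs(x).max())               # digital pre-scaling factor
    return s * analog_matvec(W, x / s, bound)   # digital post-rescaling

rng = np.random.default_rng(0)
W, x = rng.standard_normal((4, 4)), 10 * rng.standard_normal(4)
print(W @ x)                  # exact result
print(analog_matvec(W, x))    # may saturate at +/- 12
print(managed_matvec(W, x))   # closer to exact
```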

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.
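
    A minimal image-to-sequence sketch in the spirit of the described approach, assuming a CNN encoder conditioning an LSTM token decoder; the vocabulary, layer sizes, and wiring are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of an image-to-code model: a CNN encodes the screenshot and
# an LSTM decodes DSL tokens conditioned on the image embedding. All sizes
# are assumptions; pix2code's exact architecture differs in detail.
class ScreenshotToCode(nn.Module):
    def __init__(self, vocab=64, emb=64, hid=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, hid))
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, image, token_ids):
        ctx = self.cnn(image)                        # image embedding
        h0 = ctx.unsqueeze(0)                        # init decoder state
        c0 = torch.zeros_like(h0)
        seq, _ = self.lstm(self.embed(token_ids), (h0, c0))
        return self.out(seq)                         # next-token logits

model = ScreenshotToCode()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 64, (2, 10)))
print(logits.shape)    # torch.Size([2, 10, 64])
```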

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic and distinct while appearing to depict
    the same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.
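
    The latent decomposition can be sketched as follows, with a stand-in MLP generator; the paper uses DCGAN/BEGAN generators and a Siamese discriminator on the produced pairs.

```python
import torch
import torch.nn as nn

# Hedged sketch of the SD-GAN latent split: each generated pair shares the
# identity part of the latent code but draws fresh observation parts, so a
# Siamese discriminator can demand "same subject, different photos".
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784))

def sample_pair(batch, id_dim=50, obs_dim=50):
    z_id = torch.randn(batch, id_dim)            # shared identity code
    z_obs = torch.randn(2, batch, obs_dim)       # two observation codes
    x1 = G(torch.cat([z_id, z_obs[0]], dim=1))
    x2 = G(torch.cat([z_id, z_obs[1]], dim=1))
    return x1, x2                                # same identity, two "photos"

a, b = sample_pair(4)
print(a.shape, b.shape)   # torch.Size([4, 784]) torch.Size([4, 784])
```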


    Computer Vision and Pattern Recognition

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    Chunhui Gu, Chen Sun, Sudheendra Vijayanarasimhan, Caroline Pantofaru, David A. Ross, George Toderici, Yeqing Li, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
    Comments: 15 pages, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a video dataset of spatio-temporally localized Atomic
    Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual
    actions in 64k movie clips with actions localized in space and time, resulting
    in 197k action labels with multiple labels per human occurring frequently. The
    main differences with existing video datasets are: (1) the definition of atomic
    visual actions, which avoids collecting data for each and every complex action;
    (2) precise spatio-temporal annotations with possibly multiple annotations for
    each human; (3) the use of diverse, realistic video material (movies). This
    departs from existing datasets for spatio-temporal action recognition, such as
    JHMDB and UCF datasets, which provide annotations for at most 24 composite
    actions, such as basketball dunk, captured in specific environments, i.e.,
    basketball court.

    We implement a state-of-the-art approach for action localization. Despite
    this, the performance on our dataset remains low and underscores the need for
    developing new approaches for video understanding. The AVA dataset is the first
    step in this direction, and enables the measurement of performance and progress
    in realistic scenarios.

    Classification of Aerial Photogrammetric 3D Point Clouds

    Carlos Becker, Nicolai Häni, Elena Rosinskaya, Emmanuel d'Angelo, Christoph Strecha
    Comments: ISPRS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a powerful method to extract per-point semantic class labels from
    aerial photogrammetry data. Labeling this kind of data is important for tasks
    such as environmental modelling, object classification and scene understanding.
    Unlike previous point cloud classification methods that rely exclusively on
    geometric features, we show that incorporating color information yields a
    significant increase in accuracy in detecting semantic classes. We test our
    classification method on three real-world photogrammetry datasets that were
    generated with Pix4Dmapper Pro, and with varying point densities. We show that
    off-the-shelf machine learning techniques coupled with our new features allow
    us to train highly accurate classifiers that generalize well to unseen data,
    processing point clouds containing 10 million points in less than 3 minutes on
    a desktop computer.

    Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

    Talha Qaiser, Abhik Mukherjee, Chaitanya Reddy Pb, Sai Dileep Munugoti, Vamsi Tallam, Tomi Pitkäaho, Taina Lehtimäki, Thomas Naughton, Matt Berseth, Aníbal Pedraza, Ramakrishnan Mukundan, Matthew Smith, Abhir Bhalerao, Erik Rodner, Marcel Simon, Joachim Denzler, Chao-Hui Huang, Gloria Bueno, David Snead, Ian Ellis, Mohammad Ilyas, Nasir Rajpoot
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Evaluating expression of the Human epidermal growth factor receptor 2 (Her2)
    by visual examination of immunohistochemistry (IHC) on invasive breast cancer
    (BCa) is a key part of the diagnostic assessment of BCa due to its recognised
    importance as a predictive and prognostic marker in clinical practice. However,
    visual scoring of Her2 is subjective and consequently prone to inter-observer
    variability. Given the prognostic and therapeutic implications of Her2 scoring,
    a more objective method is required. In this paper, we report on a recent
    automated Her2 scoring contest, held in conjunction with the annual PathSoc
    meeting held in Nottingham in June 2016, aimed at systematically comparing and
    advancing the state-of-the-art Artificial Intelligence (AI) based automated
    methods for Her2 scoring. The contest dataset comprised digitised whole
    slide images (WSI) of sections from 86 cases of invasive breast carcinoma
    stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting
    algorithms automatically predicted scores of the IHC slides for an unseen
    subset of the dataset and the predicted scores were compared with the ‘ground
    truth’ (a consensus score from at least two experts). We also report on a
    simple Man vs Machine contest for the scoring of Her2 and show that the
    automated methods could beat the pathology experts on this contest dataset.
    This paper presents a benchmark for comparing the performance of automated
    algorithms for scoring of Her2. It also demonstrates the enormous potential of
    automated algorithms in assisting the pathologist with objective IHC scoring.

    Improvements to Frank-Wolfe optimization for multi-detector multi-object tracking

    Roberto Henschel, Laura Leal-Taixé, Daniel Cremers, Bodo Rosenhahn
    Comments: 13 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a novel formulation for the multi-object
    tracking-by-detection paradigm for two (or more) input detectors. Using
    full-body and heads detections, the fusion helps to recover heavily occluded
    persons and to reduce false positives. The assignment of the two input features
    to a person and the extraction of the trajectories are jointly solved from one
    binary quadratic program (BQP). Due to the computational complexity of the
    NP-hard QP, we approximate the solution using the Frank-Wolfe algorithm. We
    propose several improvements to this solver that yield better minimization and
    shorter computation times, compared to off-the-shelf BQP solvers and the standard
    Frank-Wolfe algorithm. Evaluation on pedestrian tracking is provided for
    multiple scenarios, showing improved tracking quality over single input feature
    trackers and standard QP-solvers. Finally we present the performance of our
    tracker on the challenging MOTNEW benchmark, being comparable to
    state-of-the-art trackers.
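
    For reference, the vanilla Frank-Wolfe loop that the paper's improvements build on, shown here on a toy quadratic over the probability simplex rather than the paper's binary QP.

```python
import numpy as np

# Hedged sketch of the standard Frank-Wolfe iteration: each step solves a
# linear minimization over the feasible set (here the probability simplex,
# whose minimizer is a vertex), then moves by a convex combination.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); Q = A.T @ A        # PSD objective x'Qx
x = np.full(5, 0.2)                                 # start in the simplex
for k in range(200):
    grad = 2 * Q @ x
    s = np.zeros(5); s[np.argmin(grad)] = 1.0       # linear oracle: a vertex
    gamma = 2.0 / (k + 2)                           # standard step size
    x = (1 - gamma) * x + gamma * s
print(x, x @ Q @ x)    # feasible point with a small objective value
```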

    Anatomically Constrained Neural Networks (ACNN): Application to Cardiac Image Enhancement and Segmentation

    Ozan Oktay, Enzo Ferrante, Konstantinos Kamnitsas, Mattias Heinrich, Wenjia Bai, Jose Caballero, Ricardo Guerrero, Stuart Cook, Antonio de Marvao, Declan O'Regan, Bernhard Kainz, Ben Glocker, Daniel Rueckert
    Comments: Submission to IEEE Transactions on Medical Imaging / May’17
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Incorporation of prior knowledge about organ shape and location is key to
    improving the performance of image analysis approaches. In particular, priors can be
    useful in cases where images are corrupted and contain artefacts due to
    limitations in image acquisition. The highly constrained nature of anatomical
    objects can be well captured with learning based techniques. However, in most
    recent and promising techniques such as CNN based segmentation it is not
    obvious how to incorporate such prior knowledge. State-of-the-art methods
    operate as pixel-wise classifiers where the training objectives do not
    incorporate the structure and inter-dependencies of the output. To overcome
    this limitation, we propose a generic training strategy that incorporates
    anatomical prior knowledge into CNNs through a new regularisation model, which
    is trained end-to-end. The new framework encourages models to follow the global
    anatomical properties of the underlying anatomy (e.g. shape, label structure)
    via learned non-linear representations of the shape. We show that the proposed
    approach can be easily adapted to different analysis tasks (e.g. image
    enhancement, segmentation) and improve the prediction accuracy of the
    state-of-the-art models. The applicability of our approach is shown on
    multi-modal cardiac datasets and public benchmarks. Additionally, we
    demonstrate how the learned deep models of 3D shapes can be interpreted and
    used as biomarkers for classification of cardiac pathologies.

    An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions

    Yuping Shen, Hassan Foroosh
    Comments: arXiv admin note: substantial text overlap with arXiv:1705.04641, arXiv:1705.05741, arXiv:1705.04433
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we show that different body parts do not play equally
    important roles in recognizing a human action in video data. We investigate to
    what extent a body part plays a role in recognition of different actions and
    hence propose a generic method of assigning weights to different body points.
    The approach is inspired by the strong evidence in the applied perception
    community that humans perform recognition in a foveated manner, that is, they
    recognize events or objects by only focusing on visually significant aspects.
    An important contribution of our method is that the computation of the weights
    assigned to body parts is invariant to viewing directions and camera parameters
    in the input data. We have performed extensive experiments to validate the
    proposed approach and demonstrate its significance. In particular, results show
    that considerable improvement in performance is gained by taking into account
    the relative importance of different body parts as defined by our approach.

    How hard can it be? Estimating the difficulty of visual search in an image

    Radu Tudor Ionescu, Bogdan Alexe, Marius Leordeanu, Marius Popescu, Dim P. Papadopoulos, Vittorio Ferrari
    Comments: Published at CVPR 2016
    Journal-ref: In Proceedings of CVPR, pp. 2157-2166, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We address the problem of estimating image difficulty defined as the human
    response time for solving a visual search task. We collect human annotations of
    image difficulty for the PASCAL VOC 2012 data set through a crowd-sourcing
    platform. We then analyze what human interpretable image properties can have an
    impact on visual search difficulty, and how accurate those properties are for
    predicting difficulty. Next, we build a regression model based on deep features
    learned with state-of-the-art convolutional neural networks and show better
    results for predicting the ground-truth visual search difficulty scores
    produced by human annotators. Our model is able to correctly rank about 75% of
    image pairs according to their difficulty score. We also show that our
    difficulty predictor generalizes well to new classes not seen during training.
    Finally, we demonstrate that our predicted difficulty scores are useful for
    weakly supervised object localization (8% improvement) and semi-supervised
    object classification (1% improvement).

    A New 3D Segmentation Technique for QCT Scans of the Lumbar Spine to Determine BMD and Vertebral Geometry

    Andre Mastmeyer, Klaus Engelke, Christina Fuchs, Willi Kalender
    Comments: 2 pages, 2 figures, International Congress of Medical Physics 2005
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Quantitative computed tomography (QCT) is a standard method to determine bone
    mineral density (BMD) in the spine. Traditionally, only single 8–10 mm thick
    slices have been analyzed. Current spiral CT scanners provide true 3D acquisition
    schemes allowing for a more differential BMD analysis and an assessment of
    geometric parameters, which may improve fracture prediction. We developed a
    novel 3D segmentation approach that combines deformable balloons, multi seeded
    volume growing, and dedicated morphological operations to extract the vertebral
    bodies. An anatomy-oriented coordinate system attached automatically to each
    vertebra is used to define volumes of interest. We analyzed intra-operator
    precision of the segmentation procedure using abdominal scans from 10 patients
    (60 mAs, 120 kV, slice thickness 1 mm, B40s, Siemens Sensation 16). Our new
    segmentation method shows excellent precision, with errors on the order of < 1%
    for BMD and < 2% for volume.

    Accelerating Discrete Wavelet Transforms on GPUs

    David Barina, Michal Kula, Michal Matysek, Pavel Zemcik
    Comments: preprint submitted to ICIP 2017. arXiv admin note: substantial text overlap with arXiv:1704.08657
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    The two-dimensional discrete wavelet transform has a huge number of
    applications in image-processing techniques. Until now, several papers compared
    the performance of such transform on graphics processing units (GPUs). However,
    all of them only dealt with lifting and convolution computation schemes. In
    this paper, we show that corresponding horizontal and vertical lifting parts of
    the lifting scheme can be merged into non-separable lifting units, which halves
    the number of steps. We also discuss an optimization strategy leading to a
    reduction in the number of arithmetic operations. The schemes were assessed
    using OpenCL and pixel shaders. The proposed non-separable lifting scheme
    outperforms the existing schemes in many cases, despite its higher
    complexity.
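
    For readers unfamiliar with the lifting scheme being fused, a textbook one-dimensional lifting step pair (CDF 5/3, with periodic boundaries) looks like this; the paper's contribution concerns merging the 2D horizontal and vertical passes on GPUs, not this transform itself.

```python
import numpy as np

# Hedged sketch of one 1D lifting step pair (CDF 5/3 wavelet): a predict
# step turns odd samples into detail coefficients, an update step turns
# even samples into the approximation. Boundaries are handled periodically.
def lifting_53(signal):
    s = signal.astype(float)
    even, odd = s[0::2], s[1::2]              # in-place views into s
    odd -= 0.5 * (even + np.roll(even, -1))   # predict step -> detail
    even += 0.25 * (odd + np.roll(odd, 1))    # update step -> approximation
    return even, odd

approx, detail = lifting_53(np.arange(16))
# detail is zero on a linear ramp (away from the periodic boundary)
print(approx[:4], detail[:4])
```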

    Isomorphism between Differential and Moment Invariants under Affine Transform

    Erbo Li, Hua Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Invariants are one of the central topics in science, technology and
    engineering. Differential invariants are essential in understanding or
    describing important phenomena or procedures in mathematics, physics,
    chemistry, biology, computer science and other fields. The derivation of
    differential invariants is usually difficult or complicated. This paper
    reports the discovery that, under the affine transform, differential
    invariants have structures similar to those of moment invariants up to a
    scalar function of the transform parameters. If moment invariants are known,
    relative differential invariants can be obtained by substituting moments
    with derivatives of the same order. Whereas moment invariants are calculated
    by multiple integrals, this method provides a simple way to derive
    differential invariants without the need to solve any equation system. Since
    the definition of moments on different manifolds or in spaces of different
    dimension is well established, differential invariants on or in them are
    also well defined. Considering that moments have a strong background in
    mathematics and physics, this technique offers a new viewpoint on the inner
    structure of invariants. Projective differential invariants can also be
    found in this way with a screening process.

    Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery

    Menglong Ye, Edward Johns, Ankur Handa, Lin Zhang, Philip Pratt, Guang-Zhong Yang
    Comments: A two-page short report to be presented at the Hamlyn Symposium on Medical Robotics 2017. An extension of this work is in progress
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Robotic surgery has become a powerful tool for performing minimally invasive
    procedures, providing advantages in dexterity, precision, and 3D vision over
    traditional surgery. One popular robotic system is the da Vinci surgical
    platform, which allows preoperative information to be incorporated into live
    procedures using Augmented Reality (AR). Scene depth estimation is a
    prerequisite for AR, as accurate registration requires 3D correspondences
    between preoperative and intraoperative organ models. In the past decade, there
    has been much progress on depth estimation for surgical scenes, such as using
    monocular or binocular laparoscopes [1,2]. More recently, advances in deep
    learning have enabled depth estimation via Convolutional Neural Networks (CNNs)
    [3], but training requires a large image dataset with ground truth depths.
    Inspired by [4], we propose a deep learning framework for surgical scene depth
    estimation using self-supervision for scalable data acquisition. Our framework
    consists of an autoencoder for depth prediction, and a differentiable spatial
    transformer for training the autoencoder on stereo image pairs without ground
    truth depths. Validation was conducted on stereo videos collected in robotic
    partial nephrectomy.
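
    The self-supervision signal can be sketched as a differentiable warp plus a photometric loss, in the spirit of the described spatial transformer; the coordinate convention and the loss below are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of the self-supervision: predict a disparity map, warp the
# right image toward the left with a differentiable sampler, and penalise
# the photometric difference, so no ground-truth depth is needed. The grid
# below assumes normalised coordinates in [-1, 1].
def photometric_loss(left, right, disparity):
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs.expand(b, h, w) - disparity,
                        ys.expand(b, h, w)], dim=-1)
    warped = F.grid_sample(right, grid, align_corners=True)
    return (left - warped).abs().mean()

left, right = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
disp = torch.zeros(2, 32, 32, requires_grad=True)  # a network output in practice
loss = photometric_loss(left, right, disp)
loss.backward()                                     # gradients flow to disp
print(float(loss))
```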

    Distributed Algorithms for Feature Extraction Off-loading in Multi-Camera Visual Sensor Networks

    Emil Eriksson, György Dán, Viktoria Fodor
    Comments: 12 pages, 7 figures, submitted to Transactions on Circuits and Systems for Video Technology
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Real-time visual analysis tasks, like tracking and recognition, require swift
    execution of computationally intensive algorithms. Visual sensor networks can
    be enabled to perform such tasks by augmenting the sensor network with
    processing nodes and distributing the computational burden in a way that the
    cameras contend for the processing nodes while trying to minimize their task
    completion times. In this paper, we formulate the problem of minimizing the
    completion time of all camera sensors as an optimization problem. We propose
    algorithms for fully distributed optimization, analyze the existence of
    equilibrium allocations, evaluate the effect of the network topology and of the
    video characteristics, and the benefits of central coordination. Our results
    demonstrate that with sufficient information available, distributed
    optimization can provide low completion times; moreover, predictable and stable
    performance can be achieved with additional, sparse central coordination.

    On the mathematics of beauty: beautiful images

    Abdullah Khalili
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we will study the simplest kind of beauty that can be found in
    a simple visual pattern and can be appreciated universally. The proposed model
    suggests that there is a link between beautiful patterns and the Dyson
    distribution, which seems to be the result of a deeper optimisation process
    between randomness and regularity. Then we show that beautiful patterns need to
    satisfy a more fundamental law that seeks to deliver the highest amount of
    information using the least amount of energy. The proposed model is used to
    classify and generate beautiful patterns.

    Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

    Michael Gygli
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    Shot boundary detection (SBD) is an important component of many video
    analysis tasks, such as action recognition, video indexing, summarization and
    editing. Previous work typically used a combination of low-level features like
    color histograms, in conjunction with simple models such as SVMs. Instead, we
    propose to learn shot detection end-to-end, from pixels to final shot
    boundaries. For training such a model, we rely on our insight that all shot
    boundaries are generated. Thus, we create a dataset with one million frames and
    automatically generated transitions such as cuts, dissolves and fades. In order
    to efficiently analyze hours of videos, we propose a Convolutional Neural
    Network (CNN) which is fully convolutional in time, thus allowing it to use a
    large temporal context without the need to repeatedly process frames. With
    this architecture our method obtains state-of-the-art results while running at
    an unprecedented speed of more than 120x real-time.
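
    A minimal sketch of "fully convolutional in time": 3D convolutions without temporal pooling yield one boundary score per frame in a single pass. Channel counts and depths are illustrative, far smaller than the paper's network.

```python
import torch
import torch.nn as nn

# Hedged sketch: 3D convolutions with temporal padding and no temporal
# pooling map a clip of T frames to T per-frame boundary scores, so long
# videos can be scored in one pass with a shared temporal context.
net = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)), nn.ReLU(),
    nn.Conv3d(16, 1, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
    nn.AdaptiveAvgPool3d((None, 1, 1)))      # keep time, pool space

clip = torch.randn(1, 3, 100, 64, 64)        # (batch, rgb, frames, H, W)
scores = net(clip).squeeze()                 # one boundary score per frame
print(scores.shape)                          # torch.Size([100])
```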

    Salient Object Detection with Semantic Priors

    Tam V. Nguyen, Luoqi Liu
    Comments: accepted to IJCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Salient object detection has increasingly become a popular topic in cognitive
    and computational sciences, including computer vision and artificial
    intelligence research. In this paper, we propose integrating semantic
    priors into the salient object detection process. Our algorithm consists of
    three basic steps. Firstly, the explicit saliency map is obtained based on the
    semantic segmentation refined by the explicit saliency priors learned from the
    data. Next, the implicit saliency map is computed based on a trained model
    which maps the implicit saliency priors embedded into regional features with
    the saliency values. Finally, the explicit semantic map and the implicit map
    are adaptively fused to form a pixel-accurate saliency map which uniformly
    covers the objects of interest. We further evaluate the proposed framework on
    two challenging datasets, namely, ECSSD and HKU-IS. The extensive experimental
    results demonstrate that our method outperforms other state-of-the-art methods.

    Unmasking the abnormal events in video

    Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu
    Comments: Submitted at a conference for review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel framework for abnormal event detection in video that
    requires no training sequences. Our framework is based on unmasking, a
    technique previously used for authorship verification in text documents, which
    we adapt to our task. We iteratively train a binary classifier to distinguish
    between two consecutive video sequences while removing at each step the most
    discriminant features. Higher training accuracy rates of the intermediately
    obtained classifiers indicate abnormal events. To the best of our knowledge,
    this is the first work to apply unmasking for a computer vision task. We
    compare our method with several state-of-the-art supervised and unsupervised
    methods on four benchmark data sets. The empirical results indicate that our
    abnormal event detection framework can achieve state-of-the-art results, while
    running in real-time at 20 frames per second.
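
    The unmasking loop itself is easy to sketch on synthetic data; the feature type, classifier, and schedule below are illustrative stand-ins for the paper's video features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of unmasking: repeatedly train a linear classifier to tell
# two windows apart and delete the most discriminant features each round.
# If accuracy stays high as features are removed, the windows differ
# "deeply" (a candidate abnormal event); here the windows are synthetic.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 50)) + 0.8      # window A (shifted: "abnormal")
X2 = rng.standard_normal((200, 50))            # window B
X = np.vstack([X1, X2]); y = np.repeat([0, 1], 200)

active = np.arange(50)
for step in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[:, active], y)
    acc = clf.score(X[:, active], y)
    print(f"round {step}: {len(active)} features, accuracy {acc:.2f}")
    drop = np.argsort(np.abs(clf.coef_[0]))[-5:]   # most discriminant
    active = np.delete(active, drop)
```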

    Correlation Alignment by Riemannian Metric for Domain Adaptation

    Pietro Morerio, Vittorio Murino
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Domain adaptation techniques address the problem of reducing the sensitivity
    of machine learning methods to the so-called domain shift, namely the
    difference between source (training) and target (test) data distributions. In
    particular, unsupervised domain adaptation assumes no labels are available in
    the target domain. To this end, aligning second-order statistics (covariances)
    of the target and source domains has proven to be an effective approach to fill
    the gap between the domains. However, covariance matrices do not form a
    subspace of the Euclidean space, but live in a Riemannian manifold with
    non-positive curvature, making the usual Euclidean metric suboptimal to measure
    distances. In this paper, we extend the idea of training a neural network with
    a constraint on the covariances of the hidden layer features, by rigorously
    accounting for the curved structure of the manifold of symmetric positive
    definite matrices. The resulting loss function exploits a theoretically sound
    geodesic distance on this manifold. Results indeed show the suboptimal nature
    of the Euclidean distance, and enable us to perform better than previous
    approaches on the standard Office dataset, a benchmark for domain adaptation
    techniques.
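
    One concrete geodesic-style distance consistent with this idea is the Log-Euclidean distance between covariances, sketched below; whether this matches the paper's exact metric is an assumption.

```python
import numpy as np

# Hedged sketch: the Log-Euclidean distance ||log(Cs) - log(Ct)||_F respects
# the manifold of symmetric positive definite matrices, unlike the plain
# Euclidean ||Cs - Ct||_F used in Euclidean correlation alignment.
def spd_log(C):
    w, V = np.linalg.eigh(C)            # SPD: real spectral decomposition
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(Cs, Ct):
    return np.linalg.norm(spd_log(Cs) - spd_log(Ct), "fro")

rng = np.random.default_rng(0)
src = rng.standard_normal((500, 8))
tgt = 2.0 * rng.standard_normal((500, 8))    # domain shift: scaled features
Cs = np.cov(src, rowvar=False) + 1e-3 * np.eye(8)
Ct = np.cov(tgt, rowvar=False) + 1e-3 * np.eye(8)
print(log_euclidean_dist(Cs, Ct))
```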

    Look, Listen and Learn

    Relja Arandjelović, Andrew Zisserman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the question: what can be learnt by looking at and listening to a
    large amount of unlabelled videos? There is a valuable, but so far untapped,
    source of information contained in the video itself — the correspondence
    between the visual and the audio streams, and we introduce a novel
    “Audio-Visual Correspondence” learning task that makes use of this. Training
    visual and audio networks from scratch, without any additional supervision
    other than the raw unconstrained videos themselves, is shown to successfully
    solve this task, and, more interestingly, result in good vision and audio
    representations. These features set the new state-of-the-art on two sound
    classification benchmarks, and perform on par with the state-of-the-art
    self-supervised approaches on ImageNet classification. We also demonstrate that
    the network is able to localize objects in both modalities, as well as perform
    fine-grained recognition tasks.

    A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

    Benjamin Gutierrez, Loic Peter, Tassilo Klein, Christian Wachinger
    Comments: MICCAI 2017 Proceedings
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the availability of big medical image data, the selection of an adequate
    training set is becoming more important to address the heterogeneity of
    different datasets. Simply including all the data not only incurs high
    processing costs but can even harm the prediction. We formulate the smart and
    efficient selection of a training dataset from big medical image data as a
    multi-armed bandit problem, solved by Thompson sampling. Our method assumes
    that image features are not available at the time of the selection of the
    samples, and therefore relies only on meta information associated with the
    images. Our strategy simultaneously exploits data sources with high chances of
    yielding useful samples and explores new data regions. For our evaluation, we
    focus on the application of estimating the age from a brain MRI. Our results on
    7,250 subjects from 10 datasets show that our approach leads to higher accuracy
    while only requiring a fraction of the training data.
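
    A minimal Thompson-sampling loop over data sources, with a synthetic per-source usefulness model standing in for the paper's meta-information-driven rewards.

```python
import numpy as np

# Hedged sketch of Thompson sampling over data sources: each "arm" is a
# dataset, its usefulness is modelled with a Beta posterior, and at every
# step we draw from each posterior and sample from the winning source.
rng = np.random.default_rng(0)
true_useful = np.array([0.2, 0.5, 0.8])       # hidden quality per source
alpha = np.ones(3); beta = np.ones(3)         # Beta(1, 1) priors

picks = np.zeros(3, dtype=int)
for _ in range(2000):
    arm = int(np.argmax(rng.beta(alpha, beta)))   # posterior draw per arm
    reward = rng.random() < true_useful[arm]      # did the sample help?
    alpha[arm] += reward; beta[arm] += 1 - reward
    picks[arm] += 1
print(picks)    # concentrates on the most useful source
```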

    Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

    Hong Liu, Juanhui Tu, Mengyuan Liu
    Comments: 5 pages, 6 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It remains a challenge to efficiently extract spatial-temporal information
    from skeleton sequences for 3D human action recognition. Since a 3D
    convolutional neural network (3DCNN) is a powerful tool to simultaneously
    learn features from both the spatial and temporal dimensions, in this paper
    we propose a two-stream model using 3DCNNs. To the best of our knowledge,
    this is the first application of 3DCNNs to action recognition based on
    skeleton sequences. Our approach contains three stages. First, skeleton
    joint positions are encoded as spatial and temporal values. Second, 3DCNN
    models are adopted to extract deep features from each of the two streams.
    Third, to enhance the ability of the deep features to capture global
    relationships, we extend every stream into a multi-temporal version.
    Extensive experiments on the currently largest benchmark, the NTU RGB-D
    dataset, demonstrate that the two streams reinforce each other and that our
    method is superior to state-of-the-art 3DCNN-based action recognition
    approaches.

    Towards seamless multi-view scene analysis from satellite to street-level

    Sébastien Lefèvre, Devis Tuia, Jan Dirk Wegner, Timothée Produit, Ahmed Samy Nassar
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we discuss and review how combined multi-view imagery from
    satellite to street-level can benefit scene analysis. Numerous works exist that
    merge information from remote sensing and images acquired from the ground for
    tasks like land cover mapping, object detection, or scene understanding. What
    makes the combination of overhead and street-level images challenging is the
    strongly varying viewpoint, scale, illumination, sensor modality and
    time of acquisition. Direct (dense) matching of images on a per-pixel basis is
    thus often impossible, and one has to resort to alternative strategies that
    will be discussed in this paper. We review recent works that attempt to combine
    images taken from the ground and overhead views for purposes like scene
    registration, reconstruction, or classification. Three methods that represent
    the wide range of potential methods and applications (change detection, image
    orientation, and tree cataloging) are described in detail. We show that
    cross-fertilization between remote sensing, computer vision and machine
    learning is very valuable to make the best of geographic data available from
    Earth Observation sensors and ground imagery. Despite its challenges, we
    believe that integrating these complementary data sources will lead to major
    breakthroughs in Big GeoData.

    Universal Style Transfer via Feature Transforms

    Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Universal style transfer aims to transfer any arbitrary visual styles to
    content images. Existing feed-forward based methods, while enjoying inference
    efficiency, are mainly limited by their inability to generalize to unseen
    styles or by compromised visual quality. In this paper, we present a simple yet
    effective method that tackles these limitations without training on any
    pre-defined styles. The key ingredient of our method is a pair of feature
    transforms, whitening and coloring, that are embedded in an image
    reconstruction network. The whitening and coloring transforms reflect a direct
    matching of feature covariance of the content image to a given style image,
    which shares similar spirits with the optimization of Gram matrix based cost in
    neural style transfer. We demonstrate the effectiveness of our algorithm by
    generating high-quality stylized images with comparisons to a number of recent
    methods. We also analyze our method by visualizing the whitened features and
    synthesizing textures via simple feature coloring.
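
    The whitening and coloring pair can be sketched directly on flattened feature maps; real WCT operates on VGG features inside an encoder-decoder, so the random features here are purely illustrative.

```python
import numpy as np

# Hedged sketch of the whitening-and-coloring transform: whiten the content
# features to identity covariance, then colour them with the style
# features' covariance and mean.
def wct(content, style, eps=1e-5):
    def center_and_cov(F):
        mu = F.mean(axis=1, keepdims=True)
        Fc = F - mu
        return mu, Fc, Fc @ Fc.T / (F.shape[1] - 1)

    def mat_pow(C, p):
        w, V = np.linalg.eigh(C)
        return (V * np.maximum(w, eps) ** p) @ V.T

    _, Fc, Cc = center_and_cov(content)
    mu_s, _, Cs = center_and_cov(style)
    whitened = mat_pow(Cc, -0.5) @ Fc            # identity covariance
    return mat_pow(Cs, 0.5) @ whitened + mu_s    # style covariance + mean

content = np.random.default_rng(0).standard_normal((16, 400))
style = 3.0 * np.random.default_rng(1).standard_normal((16, 400)) + 1.0
out = wct(content, style)
print(np.round(np.cov(out)[:2, :2], 1))   # approximately the style covariance
```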

    Visual Semantic Planning using Deep Successor Representations

    Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    A crucial capability of real-world intelligent agents is their ability to
    plan a sequence of actions to achieve their goals in the visual world. In this
    work, we address the problem of visual semantic planning: the task of
    predicting a sequence of actions from visual observations that transform a
    dynamic environment from an initial state to a goal state. Doing so entails
    knowledge about objects and their affordances, as well as actions and their
    preconditions and effects. We propose learning these through interacting with a
    visual and dynamic environment. Our proposed solution involves bootstrapping
    reinforcement learning with imitation learning. To ensure cross-task
    generalization, we develop a deep predictive model based on successor
    representations. Our experimental results show near optimal results across a
    wide range of tasks in the challenging THOR environment. The supplementary
    video can be accessed at the following link: this https URL

    Patchnet: Interpretable Neural Networks for Image Classification

    Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The ability to visually understand and interpret learned features from
    complex predictive models is crucial for their acceptance in sensitive areas
    such as health care. To move closer to this goal of truly interpretable complex
    models, we present PatchNet, a network that restricts global context for image
    classification tasks in order to easily provide visual representations of
    learned texture features on a predetermined local scale. We demonstrate how
    PatchNet provides visual heatmap representations of the learned features, and
    we mathematically analyze the behavior of the network during convergence. We
    also present a version of PatchNet that is particularly well suited for
    lowering false positive rates in image classification tasks. We apply PatchNet
    to the classification of textures from the Describable Textures Dataset and to
    the ISBI-ISIC 2016 melanoma classification challenge.

    Multiple Images Recovery Using a Single Affine Transformation

    Bo Jiang, Chris Ding, Bin Luo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In many real-world applications, image data often come with noise,
    corruptions or large errors. One approach to dealing with noisy image data is
    to use data recovery techniques, which aim to recover the true uncorrupted
    signals from the observed noisy images. In this paper, we first introduce a novel
    corruption recovery transformation (CRT) model which aims to recover multiple
    (or a collection of) corrupted images using a single affine transformation.
    Then, we show that the introduced CRT can be efficiently constructed through
    learning from training data. Once CRT is learned, we can recover the true
    signals from the new incoming/test corrupted images explicitly. As an
    application, we apply our CRT to the image recognition task. Experimental results
    on six image datasets demonstrate that the proposed CRT model is effective in
    recovering noisy image data and thus leads to better recognition results.

    Learning multiple visual domains with residual adapters

    Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    There is a growing interest in learning data representations that work well
    for many different types of problems and data. In this paper, we look in
    particular at the task of learning a single visual representation that can be
    successfully utilized in the analysis of very different types of images, from
    dog breeds to stop signs and digits. Inspired by recent work on learning
    networks that predict the parameters of another network, we develop a tunable deep
    network architecture that, by means of adapter residual modules, can be steered
    on the fly to diverse visual domains. Our method achieves a high degree of
    parameter sharing while maintaining or even improving the accuracy of
    domain-specific representations. We also introduce the Visual Decathlon
    Challenge, a benchmark that evaluates the ability of representations to capture
    ten very different visual domains simultaneously and measures their ability to
    recognize them uniformly well.
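
    A hedged sketch of one adapter residual module, assuming the common 1x1-convolution form with a frozen shared convolution; the exact placement and normalization in the paper may differ.

```python
import torch
import torch.nn as nn

# Hedged sketch of a residual adapter: a small domain-specific 1x1
# convolution added residually around a shared frozen convolution, so only
# the adapter (and batch norm) is trained per visual domain.
class ResidualAdapter(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)
        self.shared.requires_grad_(False)                # shared across domains
        self.adapter = nn.Conv2d(channels, channels, 1)  # per-domain module
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = self.shared(x)
        return torch.relu(self.bn(h + self.adapter(h)))

block = ResidualAdapter(8)
print(block(torch.randn(2, 8, 16, 16)).shape)   # torch.Size([2, 8, 16, 16])
```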

    Unrolled Optimization with Deep Priors

    Steven Diamond, Vincent Sitzmann, Felix Heide, Gordon Wetzstein
    Comments: First two authors contributed equally
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A broad class of problems at the core of computational imaging, sensing, and
    low-level computer vision reduces to the inverse problem of extracting latent
    images that follow a prior distribution, from measurements taken under a known
    physical image formation model. Traditionally, hand-crafted priors along with
    iterative optimization methods have been used to solve such problems. In this
    paper we present unrolled optimization with deep priors, a principled framework
    for infusing knowledge of the image formation into deep networks that solve
    inverse problems in imaging, inspired by classical iterative methods. We show
    that instances of the framework outperform the state-of-the-art by a
    substantial margin for a wide variety of imaging problems, such as denoising,
    deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover,
    we conduct experiments that explain how the framework is best used and why it
    outperforms previous methods.

    Training with Confusion for Fine-Grained Visual Classification

    Abhimanyu Dubey, Otkrist Gupta, Pei Guo, Ramesh Raskar, Ryan Farrell, Nikhil Naik
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Research in Fine-Grained Visual Classification has focused on tackling the
    variations in pose, lighting, and viewpoint using sophisticated localization
    and segmentation techniques, and the usage of robust texture features to
    improve performance. In this work, we look at the fundamental optimization of
    neural network training for fine-grained classification tasks with minimal
    inter-class variance, and attempt to learn features with increased
    generalization to prevent overfitting. We introduce Training-with-Confusion, an
    optimization procedure for fine-grained classification tasks that regularizes
    training by introducing confusion in activations. Our method can be generalized
    to any fine-tuning task; it is robust to the presence of small training sets
    and label noise; and adds no overhead to the prediction time. We find that
    Training-with-Confusion improves the state-of-the-art on all major fine-grained
    classification datasets.

    GP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network

    Florian Dubost, Gerda Bortsova, Hieab Adams, Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen De Bruijne
    Comments: Article accepted in MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel convolutional neural network for lesion detection from
    weak labels. Only a single, global label per image – the lesion count – is
    needed for training. We train a regression network with a fully convolutional
    architecture combined with a global pooling layer to aggregate the 3D output
    into a scalar indicating the lesion count. When testing on unseen images, we
    first run the network to estimate the number of lesions. Then we remove the
    global pooling layer to compute localization maps of the size of the input
    image. We evaluate the proposed network on the detection of enlarged
    perivascular spaces in the basal ganglia in MRI. Our method achieves a
    sensitivity of 62% with on average 1.5 false positives per image. Compared with
    four other approaches based on intensity thresholding, saliency and class maps,
    our method has a 20% higher sensitivity.
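
    The weak-label trick can be sketched in a few lines: supervise only a pooled scalar during training, and read out the pre-pooling map at test time. The 2D toy below, with sum pooling, stands in for the paper's 3D network.

```python
import torch
import torch.nn as nn

# Hedged sketch of the weak-label trick: train a fully convolutional
# regressor through a global pooling head so only the scalar lesion count
# is supervised, then drop the pooling at test time to obtain a spatial
# localization map. Layer sizes are illustrative; the paper works in 3D.
body = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1))

x = torch.randn(2, 1, 64, 64)
count = body(x).sum(dim=(1, 2, 3))   # training output: predicted count
heatmap = body(x)                    # test-time output: localization map
print(count.shape, heatmap.shape)    # torch.Size([2]) torch.Size([2, 1, 64, 64])
```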

    Universal 3D Wearable Fingerprint Targets: Advancing Fingerprint Reader Evaluations

    Joshua J. Engelsma, Sunpreet S. Arora, Anil K. Jain, Nicholas G. Paulter Jr
    Comments: 14 pages, 14 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present the design and manufacturing of high fidelity universal 3D
    fingerprint targets, which can be imaged on a variety of fingerprint sensing
    technologies, namely capacitive, contact-optical, and contactless-optical.
    Universal 3D fingerprint targets enable, for the first time, not only a
    repeatable and controlled evaluation of fingerprint readers, but also the
    ability to conduct fingerprint reader interoperability studies. Fingerprint
    reader interoperability refers to how robust fingerprint recognition systems
    are to variations in the images acquired by different types of fingerprint
    readers. To build universal 3D fingerprint targets, we adopt a molding and
    casting framework consisting of (i) digital mapping of fingerprint images to a
    negative mold, (ii) CAD modeling a scaffolding system to hold the negative
    mold, (iii) fabricating the mold and scaffolding system with a high resolution
    3D printer, (iv) producing or mixing a material with similar electrical,
    optical, and mechanical properties to that of the human finger, and (v)
    fabricating a 3D fingerprint target using controlled casting. Our experiments
    conducted with PIV and Appendix F certified optical (contact and contactless)
    and capacitive fingerprint readers demonstrate the usefulness of universal 3D
    fingerprint targets for controlled and repeatable fingerprint reader
    evaluations and also fingerprint reader interoperability studies.

    Better Text Understanding Through Image-To-Text Transfer

    Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Generic text embeddings are successfully used in a variety of tasks. However,
    they are often learnt by capturing the co-occurrence structure from pure text
    corpora, which limits their ability to generalize. In this
    paper, we explore models that incorporate visual information into the text
    representation. Based on comprehensive ablation studies, we propose a
    conceptually simple, yet well performing architecture. It outperforms previous
    multimodal approaches on a set of well established benchmarks. We also improve
    the state-of-the-art results for image-related text datasets, using orders of
    magnitude less data.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% of accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are both photorealistic, distinct, and appear to depict the
    same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.
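
    The pairwise sampling scheme is easy to state in code. A minimal sketch
    follows, assuming a generator G that consumes a concatenated latent vector;
    the latent dimensions are illustrative.

        # Each generator sample shares one identity code across two
        # observation codes, yielding two images of the "same" identity for
        # the Siamese discriminator (dimensions are illustrative).
        import torch

        def sample_pair(G, batch, id_dim=50, obs_dim=50):
            z_id = torch.randn(batch, id_dim)       # shared identity code
            z_o1 = torch.randn(batch, obs_dim)      # observation code 1
            z_o2 = torch.randn(batch, obs_dim)      # observation code 2
            x1 = G(torch.cat([z_id, z_o1], dim=1))
            x2 = G(torch.cat([z_id, z_o2], dim=1))
            return x1, x2   # image pair fed to the Siamese discriminator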


    Artificial Intelligence

    Knowledge Acquisition, Representation & Manipulation in Decision Support Systems

    M. Michalewicz, S.T. Wierzchoń, M.A. Kłopotek
    Comments: Intelligent Information Systems: Proceedings of a Workshop held in Augustów, Poland, 7–11 June 1993, pages 210–238
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we present a methodology and discuss some implementation issues
    for a project on statistical/expert approach to data analysis and knowledge
    acquisition. We discuss some general assumptions underlying the project.
    Further, the requirements for a user-friendly computer assistant are specified
    along with the nature of the tools aiding the researcher. Next, we show some
    aspects of the belief network approach and the Dempster-Shafer (DST)
    methodology as introduced in practice in the SEAD system. Specifically, we
    present the application of DST methodology to the belief revision problem.
    Finally, we present a concept of an interface to probabilistic and DS belief
    networks that enables a user to understand the communication with a
    belief-network-based reasoning system.

    Thinking Fast and Slow with Deep Learning and Tree Search

    Thomas Anthony, Zheng Tian, David Barber
    Subjects: Artificial Intelligence (cs.AI)

    Solving sequential decision making problems, such as text parsing, robotic
    control, and game playing, requires a combination of planning policies and
    generalisation of those plans. In this paper, we present Expert Iteration, a
    novel algorithm which decomposes the problem into separate planning and
    generalisation tasks. Planning new policies is performed by tree search, while
    a deep neural network generalises those plans. In contrast, standard Deep
    Reinforcement Learning algorithms rely on a neural network not only to
    generalise plans, but to discover them too. We show that our method
    substantially outperforms Policy Gradients in the board game Hex, winning 84.4%
    of games against it when trained for equal time.
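
    The division of labour between the tree-search "expert" and the
    neural-network "apprentice" can be summarised in a schematic loop. The
    sketch below is pseudocode-level Python under assumed interfaces
    (tree_search, env, policy_net.fit), not the authors' implementation.

        def expert_iteration(policy_net, tree_search, env, n_iters, n_games):
            for _ in range(n_iters):
                dataset = []
                for _ in range(n_games):
                    state = env.reset()
                    while not env.done(state):
                        # Expert: tree search, guided by the current network,
                        # produces an improved move distribution.
                        expert_policy = tree_search(state, prior=policy_net)
                        dataset.append((state, expert_policy))
                        state = env.play(state, expert_policy.sample())
                # Apprentice: the network generalises the expert's plans by
                # imitating the collected search targets.
                policy_net.fit(dataset)
            return policy_net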

    Reinforcement Learning with a Corrupted Reward Channel

    Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg
    Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
    Subjects: Artificial Intelligence (cs.AI)

    No real-world reward function is perfect. Sensory errors and software bugs
    may result in RL agents observing higher (or lower) rewards than they should.
    For example, a reinforcement learning agent may prefer states where a sensory
    error gives it the maximum reward, but where the true reward is actually small.
    We formalise this problem as a generalised Markov Decision Problem called
    Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under
    strong simplifying assumptions and when trying to compensate for the possibly
    corrupt rewards. Two ways around the problem are investigated. First, by giving
    the agent richer data, such as in inverse reinforcement learning and
    semi-supervised reinforcement learning, reward corruption stemming from
    systematic sensory errors may sometimes be completely managed. Second, by using
    randomisation to blunt the agent’s optimisation, reward corruption can be
    partially managed under some assumptions.
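
    A toy illustration of a corrupt reward channel, assuming a hypothetical
    environment interface (not from the paper): in designated states the
    observed reward differs from the true reward, so an agent that maximises
    observed reward is systematically misled.

        class CorruptRewardEnv:
            """Wraps an environment so some states report a fake reward."""
            def __init__(self, env, corrupt_states, fake_reward=1.0):
                self.env = env
                self.corrupt_states = corrupt_states
                self.fake_reward = fake_reward

            def step(self, action):
                state, true_reward, done = self.env.step(action)
                observed = (self.fake_reward if state in self.corrupt_states
                            else true_reward)
                return state, observed, done   # true_reward is never exposed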

    Explaining Transition Systems through Program Induction

    Svetlin Penkov, Subramanian Ramamoorthy
    Subjects: Artificial Intelligence (cs.AI)

    Explaining and reasoning about processes which underlie observed black-box
    phenomena enables the discovery of causal mechanisms, derivation of suitable
    abstract representations and the formulation of more robust predictions. We
    propose to learn high level functional programs in order to represent abstract
    models which capture the invariant structure in the observed data. We introduce
    the \(\pi\)-machine (program-induction machine) — an architecture able to induce
    interpretable LISP-like programs from observed data traces. We propose an
    optimisation procedure for program learning based on backpropagation, gradient
    descent and A* search. We apply the proposed method to three problems: system
    identification of dynamical systems, explaining the behaviour of a DQN agent
    and learning by demonstration in a human-robot interaction scenario. Our
    experimental results show that the \(\pi\)-machine can efficiently induce
    interpretable programs from individual data traces.

    Enhanced Experience Replay Generation for Efficient Reinforcement Learning

    Vincent Huang, Tobias Ley, Martha Vlachou-Konchylaki, Wenfeng Hu
    Subjects: Artificial Intelligence (cs.AI)

    Applying deep reinforcement learning (RL) on real systems suffers from slow
    data sampling. We propose an enhanced generative adversarial network (EGAN) to
    initialize an RL agent in order to achieve faster learning. The EGAN utilizes
    the relation between states and actions to enhance the quality of data samples
    generated by a GAN. Pre-training the agent with the EGAN yields a steeper
    learning curve, with a roughly 20% improvement in training time at the
    beginning of learning compared to no pre-training, and about a 5%
    improvement with smaller variance compared to pre-training with a plain GAN.
    For real-time systems with sparse and slow data sampling, the EGAN could be
    used to speed up the early phases of the training process.

    XOR-Sampling for Network Design with Correlated Stochastic Events

    Xiaojian Wu, Yexiang Xue, Bart Selman, Carla P. Gomes
    Comments: In Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
    Subjects: Artificial Intelligence (cs.AI)

    Many network optimization problems can be formulated as stochastic network
    design problems in which edges are present or absent stochastically.
    Furthermore, protective actions can guarantee that edges will remain present.
    We consider the problem of finding the optimal protection strategy under a
    budget limit in order to maximize some connectivity measures of the
    network. Previous approaches rely on the assumption that edges are independent.
    In this paper, we consider a more realistic setting where multiple edges are
    not independent due to natural disasters or regional events that make the
    states of multiple edges stochastically correlated. We use Markov Random Fields
    to model the correlation and define a new stochastic network design framework.
    We provide a novel algorithm based on Sample Average Approximation (SAA)
    coupled with a Gibbs or XOR sampler. Experimental results on real road
    network data show that the policies produced by SAA with the XOR sampler
    have higher quality and lower variance than those produced by SAA with the
    Gibbs sampler.

    Logical Learning Through a Hybrid Neural Network with Auxiliary Inputs

    Fang Wan, Chaoyang Song
    Comments: 11 pages, 9 figures, 4 tables
    Subjects: Artificial Intelligence (cs.AI)

    The human reasoning process is seldom a one-way process from an input leading
    to an output. Instead, it often involves a systematic deduction by ruling out
    other possible outcomes as a self-checking mechanism. In this paper, we
    describe the design of a hybrid neural network for logical learning that
    mimics human reasoning through the introduction of an auxiliary input,
    namely the indicators, which act as hints to suggest logical outcomes. We
    generate these indicators by digging into the hidden information buried
    underneath the original training data for direct or indirect suggestions. We
    used the MNIST data to demonstrate the design and use of these indicators in a
    convolutional neural network. We trained a series of such hybrid neural
    networks with variations of the indicators. Our results show that these hybrid
    neural networks are very robust in generating logical outcomes with inherently
    higher prediction accuracy than the direct use of the original input and output
    in apparent models. Such improved predictability with reassured logical
    confidence is obtained through the exhaustion of all possible indicators to
    rule out all illogical outcomes, which is not available in the apparent models.
    Our logical learning process can effectively cope with the unknown unknowns
    using a full exploitation of all existing knowledge available for learning. The
    design and implementation of the hints, namely the indicators, become an
    essential part of artificial intelligence for logical learning. We also
    introduce an ongoing application setup for this hybrid neural network in an
    autonomous grasping robot, namely as_DeepClaw, aiming at learning an optimized
    grasping pose through logical learning.

    Poincaré Embeddings for Learning Hierarchical Representations

    Maximilian Nickel, Douwe Kiela
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representation learning has become an invaluable approach for learning from
    symbolic data such as text and graphs. However, while complex symbolic datasets
    often exhibit a latent hierarchical structure, state-of-the-art methods
    typically learn embeddings in Euclidean vector spaces, which do not account for
    this property. For this purpose, we introduce a new approach for learning
    hierarchical representations of symbolic data by embedding them into hyperbolic
    space — or more precisely into an \(n\)-dimensional Poincaré ball. Due to the
    underlying hyperbolic geometry, this allows us to learn parsimonious
    representations of symbolic data by simultaneously capturing hierarchy and
    similarity. We introduce an efficient algorithm to learn the embeddings based
    on Riemannian optimization and show experimentally that Poincaré embeddings
    outperform Euclidean embeddings significantly on data with latent hierarchies,
    both in terms of representation capacity and in terms of generalization
    ability.
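
    For reference, a minimal numpy sketch of the two formulas at the core of the
    method, the Poincaré distance and the conversion of a Euclidean gradient
    into a Riemannian one, follows; the training loop itself is omitted.

        import numpy as np

        def poincare_distance(u, v):
            # d(u,v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))
            sq = np.sum((u - v) ** 2)
            denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
            return np.arccosh(1.0 + 2.0 * sq / denom)

        def riemannian_grad(theta, euclidean_grad):
            # The ball's metric rescales gradients by ((1-||theta||^2)^2)/4.
            return ((1.0 - np.sum(theta ** 2)) ** 2 / 4.0) * euclidean_grad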

    Living Together: Mind and Machine Intelligence

    Neil D. Lawrence
    Subjects: Artificial Intelligence (cs.AI)

    In this paper we consider the nature of the machine intelligences we have
    created in the context of our human intelligence. We suggest that the
    fundamental difference between human and machine intelligence comes down to
    embodiment factors. We define an embodiment factor as the ratio between an
    entity’s ability to communicate information and its ability to compute
    information. We
    speculate on the role of embodiment factors in driving our own intelligence and
    consciousness. We briefly review dual process models of cognition and cast
    machine intelligence within that framework, characterising it as a dominant
    System Zero, which can drive behaviour through interfacing with us
    subconsciously. Driven by concerns about the consequence of such a system we
    suggest prophylactic courses of action that could be considered. Our main
    conclusion is that it is not sentient intelligence we should fear but
    non-sentient intelligence.

    Compatible extensions and consistent closures: a fuzzy approach

    Irina Georgescu
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, \(\ast\)-compatible extensions of fuzzy relations are studied,
    generalizing some results obtained by Duggan for crisp relations. From this
    general result, fuzzy versions of some important extension theorems for
    crisp relations (Szpilrajn, Hansson, Suzumura) are obtained as particular
    cases. Two notions of consistent closure of a fuzzy relation are introduced.

    Symbolic LTLf Synthesis

    Shufang Zhu, Lucas M. Tabajara, Jianwen Li, Geguang Pu, Moshe Y. Vardi
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

    LTLf synthesis is the process of finding a strategy that satisfies a linear
    temporal specification over finite traces. An existing solution to this problem
    relies on a reduction to a DFA game. In this paper, we propose a symbolic
    framework for LTLf synthesis based on this technique, by performing the
    computation over a representation of the DFA as a boolean formula rather than
    as an explicit graph. This approach enables strategy generation by utilizing
    the mechanism of boolean synthesis. We implement this symbolic synthesis method
    in a tool called Syft, and demonstrate by experiments on scalable benchmarks
    that the symbolic approach scales better than the explicit one.

    Continual Learning in Generative Adversarial Nets

    Ari Seff, Alex Beatson, Daniel Suo, Han Liu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Developments in deep generative models have allowed for tractable learning of
    high-dimensional data distributions. While the employed learning procedures
    typically assume that training data is drawn i.i.d. from the distribution of
    interest, it may be desirable to model distinct distributions which are
    observed sequentially, such as when different classes are encountered over
    time. Although conditional variations of deep generative models permit multiple
    distributions to be modeled by a single network in a disentangled fashion, they
    are susceptible to catastrophic forgetting when the distributions are
    encountered sequentially. In this paper, we adapt recent work in reducing
    catastrophic forgetting to the task of training generative adversarial networks
    on a sequence of distinct distributions, enabling continual generative
    modeling.

    Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues

    Talha Qaiser, Abhik Mukherjee, Chaitanya Reddy Pb, Sai Dileep Munugoti, Vamsi Tallam, Tomi Pitkäaho, Taina Lehtimäki, Thomas Naughton, Matt Berseth, Aníbal Pedraza, Ramakrishnan Mukundan, Matthew Smith, Abhir Bhalerao, Erik Rodner, Marcel Simon, Joachim Denzler, Chao-Hui Huang, Gloria Bueno, David Snead, Ian Ellis, Mohammad Ilyas, Nasir Rajpoot
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Evaluating expression of the Human epidermal growth factor receptor 2 (Her2)
    by visual examination of immunohistochemistry (IHC) on invasive breast cancer
    (BCa) is a key part of the diagnostic assessment of BCa due to its recognised
    importance as a predictive and prognostic marker in clinical practice. However,
    visual scoring of Her2 is subjective and consequently prone to inter-observer
    variability. Given the prognostic and therapeutic implications of Her2 scoring,
    a more objective method is required. In this paper, we report on a recent
    automated Her2 scoring contest, held in conjunction with the annual PathSoc
    meeting held in Nottingham in June 2016, aimed at systematically comparing and
    advancing the state-of-the-art Artificial Intelligence (AI) based automated
    methods for Her2 scoring. The contest dataset comprised digitised whole
    slide images (WSI) of sections from 86 cases of invasive breast carcinoma
    stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting
    algorithms automatically predicted scores of the IHC slides for an unseen
    subset of the dataset and the predicted scores were compared with the ‘ground
    truth’ (a consensus score from at least two experts). We also report on a
    simple Man vs Machine contest for the scoring of Her2 and show that the
    automated methods could beat the pathology experts on this contest dataset.
    This paper presents a benchmark for comparing the performance of automated
    algorithms for scoring of Her2. It also demonstrates the enormous potential of
    automated algorithms in assisting the pathologist with objective IHC scoring.

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework generalizes previous proposals by enabling hard or
    soft sharing of all combinations of subspaces, layers, and skip connections.
    We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
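
    A minimal sketch of the central mechanism, trainable parameters that control
    how much two task-specific columns share at a given layer, is shown below in
    PyTorch; it is illustrative, not the authors' code.

        import torch
        import torch.nn as nn

        class SluiceLayer(nn.Module):
            def __init__(self, dim):
                super().__init__()
                self.task_a = nn.Linear(dim, dim)
                self.task_b = nn.Linear(dim, dim)
                # Trainable 2x2 mixing matrix, initialised at the identity
                # (i.e. no sharing at the start of training).
                self.alpha = nn.Parameter(torch.eye(2))

            def forward(self, h_a, h_b):
                out_a = torch.relu(self.task_a(h_a))
                out_b = torch.relu(self.task_b(h_b))
                # Each task's next input is a learned mixture of both columns.
                mix_a = self.alpha[0, 0] * out_a + self.alpha[0, 1] * out_b
                mix_b = self.alpha[1, 0] * out_a + self.alpha[1, 1] * out_b
                return mix_a, mix_b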

    Effective injury prediction in professional soccer with GPS data and machine learning

    Alessio Rossi, Luca Pappalardo, Paolo Cintia, Marcello Iaia, Javier Fernandez, Daniel Medina
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Applications (stat.AP)

    Injuries have a great impact on professional soccer, due to their large
    influence on team performance and the considerable costs of rehabilitation for
    players. Existing studies in the literature provide just a preliminary
    understanding of which factors mostly affect injury risk, while an evaluation
    of the potential of statistical models in forecasting injuries is still
    missing. In this paper, we propose a multidimensional approach to injury
    prediction in professional soccer which is based on GPS measurements and
    machine learning. By using GPS tracking technology, we collect data describing
    the training workload of players in a professional soccer club during a season.
    We show that our injury predictors are both accurate and interpretable by
    providing a set of case studies of interest to soccer practitioners. Our
    approach opens a novel perspective on injury prevention, providing a set of
    simple and practical rules for evaluating and interpreting the complex
    relations between injury risk and training performance in professional soccer.

    Detection Algorithms for Communication Systems Using Deep Learning

    Nariman Farsad, Andrea Goldsmith
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    The design and analysis of communication systems typically rely on the
    development of mathematical models that describe the underlying communication
    channel, which dictates the relationship between the transmitted and the
    received signals. However, in some systems, such as molecular communication
    systems where chemical signals are used for transfer of information, it is not
    possible to accurately model this relationship. In these scenarios, because of
    the lack of mathematical channel models, a completely new approach to design
    and analysis is required. In this work, we focus on one important aspect of
    communication systems, the detection algorithms, and demonstrate that by
    borrowing tools from deep learning, it is possible to train detectors that
    perform well, without any knowledge of the underlying channel models. We
    evaluate these algorithms using experimental data that is collected by a
    chemical communication platform, where the channel model is unknown and
    difficult to model analytically. We show that deep learning algorithms perform
    significantly better than a simple detector that was used in previous works,
    which also did not assume any knowledge of the channel.
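
    The core recipe, training a detector purely from transmitted/received pairs
    with no channel model, reduces to supervised classification. A minimal
    scikit-learn sketch under assumed data shapes:

        from sklearn.neural_network import MLPClassifier

        # X_train: windows of received-signal features recorded over the real
        # channel; y_train: the symbols that were actually transmitted.
        def train_detector(X_train, y_train):
            detector = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
            detector.fit(X_train, y_train)
            return detector

        def detect(detector, X_received):
            return detector.predict(X_received)   # recovered symbols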

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic, distinct from each other, and appear
    to depict the same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.

    Quantifying the relation between performance and success in soccer

    Luca Pappalardo, Paolo Cintia
    Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    The availability of massive data about sports activities offers nowadays the
    opportunity to quantify the relation between performance and success. In this
    work, we analyze more than 6,000 games and 10 million events in the six major
    European leagues and investigate the relation between team performance and
    success in soccer competitions. We discover that a team’s success in the
    national tournament is significantly related to its typical performance.
    Moreover, we observe that while victory and defeats can be explained by the
    team’s performance during a game, draws are difficult to describe with a
    machine learning approach. We then perform a simulation of an entire season of
    the six leagues where the outcome of every game is replaced by a synthetic
    outcome (victory, defeat, or draw) based on a machine learning model trained on
    the previous seasons. We find that the final rankings in the simulated
    tournaments are close to the actual rankings in the real tournaments,
    suggesting that a complex systems’ view on soccer has the potential of
    revealing hidden patterns regarding the relation between performance and
    success.


    Information Retrieval

    Increasing Papers' Discoverability with Precise Semantic Labeling: the sci.AI Platform

    Roman Gurinovich, Alexander Pashuk, Yuriy Petrovskiy, Alex Dmitrievskij, Oleg Kuryan, Alexei Scerbacov, Antonia Tiggre, Elena Moroz, Yuri Nikolsky
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of published findings in biomedicine increases continually. At the
    same time, the specifics of the domain’s terminology complicate the task of
    retrieving relevant publications. In this research, we investigate the
    influence of term variability and ambiguity on a paper’s likelihood of being
    retrieved. We present statistics that demonstrate the significance of the
    issue and its challenges, and then present the sci.AI platform, which
    enables precise term labeling as a resolution.

    Music Playlist Continuation by Learning from Hand-Curated Examples and Song Features

    Andreu Vall, Hamid Eghbal-zadeh, Matthias Dorfer, Markus Schedl
    Subjects: Information Retrieval (cs.IR)

    Automated music playlist generation is a specific form of music
    recommendation. Generally stated, the user receives a set of song suggestions
    defining a coherent listening session. We hypothesize that the best way to
    convey such playlist coherence to new recommendations is by learning it from
    actual curated examples, in contrast to imposing ad hoc constraints. Examples
    of thoroughly curated playlists are rather scarce, especially compared to the
    amount of listening logs available (e.g., in music streaming services).
    Collaborative filtering methods can be used to capture underlying patterns in
    hand-curated playlists, but their performance is severely affected by the size
    of the data and the low number of song observations. We propose an alternative
    method based on a song-to-playlist classifier, which learns the underlying
    structure from actual playlist examples, while leveraging song features based
    on audio, social tags and independent listening logs. Experiments on two
    datasets of hand-curated playlists show competitive performance compared to
    collaborative filtering when enough training data is available and also more
    robust performance in cold-start situations. For example, both methods achieve
    a recall@100 of roughly 35% for songs observed 5 or more times in the training
    set, whereas the proposed model achieves a recall@100 of roughly 15% for songs
    observed 4 or fewer times in the training set, compared to the 3% achieved by
    collaborative filtering.

    Reference String Extraction Using Line-Based Conditional Random Fields

    Martin Körner
    Comments: 5 pages, preprint
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    The extraction of individual reference strings from the reference section of
    scientific publications is an important step in the citation extraction
    pipeline. Current approaches divide this task into two steps by first detecting
    the reference section areas and then grouping the text lines in such areas into
    reference strings. We propose a classification model that considers every line
    in a publication as a potential part of a reference string. By applying
    line-based conditional random fields rather than constructing the graphical
    model based on the individual words, dependencies and patterns that are typical
    in reference sections provide strong features while the overall complexity of
    the model is reduced.
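
    A minimal sketch of line-based CRF labelling using the third-party
    sklearn-crfsuite package; the per-line features and label scheme below are
    illustrative assumptions, not the paper's feature set.

        import sklearn_crfsuite

        def line_features(lines, i):
            line = lines[i]
            return {
                "starts_with_bracket": line.lstrip().startswith("["),
                "contains_year": any(t.isdigit() and len(t) == 4
                                     for t in line.split()),
                "num_commas": line.count(","),
                "is_first_line": i == 0,
            }

        def to_features(doc_lines):
            return [line_features(doc_lines, i) for i in range(len(doc_lines))]

        # docs: list of documents (each a list of text lines);
        # labels: per-line tag sequences, e.g. "B-REF" / "I-REF" / "O".
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
        # crf.fit([to_features(d) for d in docs], labels)
        # predicted = crf.predict([to_features(new_doc)])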

    TwiInsight: Discovering Topics and Sentiments from Social Media Datasets

    Zhengkui Wang, Guangdong Bai, Soumyadeb Chowdhury, Quanqing Xu, Zhi Lin Seow
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Social media platforms contain a great wealth of information which provides
    opportunities for us to explore hidden patterns or unknown correlations, and
    understand people’s satisfaction with what they are discussing. As one
    showcase, in this paper, we present a system, TwiInsight which explores the
    insight of Twitter data. Different from other Twitter analysis systems,
    TwiInsight automatically extracts the popular topics under different categories
    (e.g., healthcare, food, technology, sports and transport) discussed in Twitter
    via topic modeling and also identifies the correlated topics across different
    categories. Additionally, it discovers people’s opinions on the tweets and
    topics via sentiment analysis. The system also employs an intuitive and
    informative visualization to show the uncovered insights. Furthermore, we
    develop and compare six of the most popular algorithms – three for sentiment
    analysis and three for topic modeling.

    Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

    Arman Cohan, Nazli Goharian
    Comments: SIGIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Citation texts are sometimes not very informative or in some cases inaccurate
    by themselves; they need the appropriate context from the referenced paper to
    reflect its exact contributions. To address this problem, we propose an
    unsupervised model that uses distributed representation of words as well as
    domain knowledge to extract the appropriate context from the reference paper.
    Evaluation results show the effectiveness of our model by significantly
    outperforming the state-of-the-art. We furthermore demonstrate how an effective
    contextualization method results in improving citation-based summarization of
    the scientific articles.
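
    One simple way to apply distributed word representations to this task (an
    illustrative sketch, not the authors' model) is to rank the reference
    paper's sentences by embedding similarity to the citation text:

        import numpy as np

        def sentence_vector(sentence, word_vectors, dim):
            vecs = [word_vectors[w] for w in sentence.split()
                    if w in word_vectors]
            return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

        def rank_context(citation_text, reference_sentences, word_vectors,
                         dim=300):
            q = sentence_vector(citation_text, word_vectors, dim)
            def cos(a, b):
                n = np.linalg.norm(a) * np.linalg.norm(b)
                return float(a @ b / n) if n else 0.0
            scored = [(cos(q, sentence_vector(s, word_vectors, dim)), s)
                      for s in reference_sentences]
            return sorted(scored, reverse=True)   # best candidates first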


    Computation and Language

    Deep Learning of Grammatically-Interpretable Representations Through Question-Answering

    Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng
    Subjects: Computation and Language (cs.CL)

    We introduce an architecture in which internal representations, learned by
    end-to-end optimization in a deep neural network performing a textual
    question-answering task, can be interpreted using basic concepts from
    linguistic theory. This interpretability comes at a cost of only a few
    percentage-point reduction in accuracy relative to the original model on which
    the new one is based (BiDAF [1]). The internal representation that is
    interpreted is a Tensor Product Representation: for each input word, the model
    selects a symbol to encode the word, and a role in which to place the symbol,
    and binds the two together. The selection is via soft attention. The overall
    interpretation is built from interpretations of the symbols, as recruited by
    the trained model, and interpretations of the roles as used by the model. We
    find support for our initial hypothesis that symbols can be interpreted as
    lexical-semantic word meanings, while roles can be interpreted as
    approximations of grammatical roles (or categories) such as subject, wh-word,
    determiner, etc. Through extremely detailed, fine-grained analysis, we find
    specific correspondences between the learned roles and parts of speech as
    assigned by a standard parser [2], and find several discrepancies in the
    model’s favor. In this sense, the model learns significant aspects of grammar,
    after having been exposed solely to linguistically unannotated text, questions,
    and answers: no prior linguistic knowledge is given to the model. What is given
    is the means to represent using symbols and roles and an inductive bias
    favoring use of these in an approximately discrete manner.

    Better Text Understanding Through Image-To-Text Transfer

    Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

    Generic text embeddings are successfully used in a variety of tasks. However,
    they are often learnt by capturing the co-occurrence structure from pure text
    corpora, which limits their ability to generalize. In this
    paper, we explore models that incorporate visual information into the text
    representation. Based on comprehensive ablation studies, we propose a
    conceptually simple, yet well performing architecture. It outperforms previous
    multimodal approaches on a set of well established benchmarks. We also improve
    the state-of-the-art results for image-related text datasets, using orders of
    magnitude less data.

    Local Monotonic Attention Mechanism for End-to-End Speech Recognition

    Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
    Comments: 12 pages, 2 figures
    Subjects: Computation and Language (cs.CL)

    Recently, sequence-to-sequence models using encoder-decoder neural networks
    have gained popularity for automatic speech recognition (ASR). The
    architecture commonly uses an attention mechanism which allows the model to
    learn alignments between the source speech sequence and the target text
    sequence. Most attention mechanisms used today are based on a global
    attention property, which requires computing a weighted summarization of the
    whole input sequence generated by the encoder states. However, this is
    computationally expensive, often produces misalignments on longer input
    sequences, and does not fit the monotonic, left-to-right nature of the
    speech recognition task. In this paper, we propose a novel attention
    mechanism that has local and monotonic properties. Various ways to control
    those properties are also explored. Experimental results demonstrate that
    encoder-decoder based ASR with local monotonic attention achieves
    significant performance improvements and reduces computational complexity
    compared with the standard global attention architecture.
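
    A minimal sketch of attention that is both local and monotonic, restricting
    the weights to a window around a centre position that can only move forward;
    the paper explores several ways to predict and constrain this position, so
    the hard one-step advance below is an illustrative simplification.

        import torch
        import torch.nn.functional as F

        def local_monotonic_attention(enc, prev_center, query, window=3):
            # enc: (T, H) encoder states; query: (H,) decoder state.
            T = enc.size(0)
            center = min(prev_center + 1, T - 1)   # may only move forward
            lo, hi = max(0, center - window), min(T, center + window + 1)
            local = enc[lo:hi]                     # (W, H) window of states
            weights = F.softmax(local @ query, dim=0)
            context = weights @ local              # (H,) attention context
            return context, center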

    Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

    Arman Cohan, Nazli Goharian
    Comments: SIGIR 2017
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Citation texts are sometimes not very informative or in some cases inaccurate
    by themselves; they need the appropriate context from the referenced paper to
    reflect its exact contributions. To address this problem, we propose an
    unsupervised model that uses distributed representation of words as well as
    domain knowledge to extract the appropriate context from the reference paper.
    Evaluation results show the effectiveness of our model by significantly
    outperforming the state-of-the-art. We furthermore demonstrate how an effective
    contextualization method results in improving citation-based summarization of
    the scientific articles.

    Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

    Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz
    Comments: In submission to PLOS One
    Subjects: Computation and Language (cs.CL)

    Over the past century, personality theory and research have successfully
    identified core sets of characteristics that consistently describe and explain
    fundamental differences in the way people think, feel and behave. Such
    characteristics were derived through theory, dictionary analyses, and survey
    research using explicit self-reports. The availability of social media data
    spanning millions of users now makes it possible to automatically derive
    characteristics from language use — at large scale. Taking advantage of
    linguistic information available through Facebook, we study the process of
    inferring a new set of potential human traits based on unprompted language use.
    We subject these new traits to a comprehensive set of evaluations and compare
    them with a popular five factor model of personality. We find that our
    language-based trait construct is often more generalizable, in that it
    predicts non-questionnaire-based outcomes better than questionnaire-based
    traits (e.g. entities someone likes, income and intelligence quotient), while
    the factors remain nearly as stable as traditional factors. Our approach
    suggests a value in new constructs of personality derived from everyday human
    language use.

    Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

    Ashwini Jaya Kumar, Camilo Morales, Maria-Esther Vidal, Christoph Schmidt, Sören Auer
    Subjects: Computation and Language (cs.CL)

    With the evolution of neural-network-based methods, the field of automatic
    speech recognition (ASR) has advanced to a level where building an
    application with a speech interface is a reality. In spite of these
    advances, building a real-time speech recogniser faces several problems such
    as low recognition accuracy, domain constraints, and out-of-vocabulary
    words. The low recognition accuracy problem is addressed by improving the
    acoustic model, the language model, the decoder, and by rescoring the N-best
    list at the output of the decoder. We consider the N-best list rescoring
    approach to improve recognition accuracy. Most methods in the literature use
    the grammatical, lexical, syntactic and semantic connections between the
    words in a recognised sentence as features for rescoring. In this paper, we
    instead use the semantic relatedness between the words in a sentence to
    rescore the N-best list. Semantic relatedness is computed using TransE
    (Bordes et al., 2013), a method for low-dimensional embedding of triples in
    a knowledge graph. The novelty of the paper is the application of the
    semantic web to automatic speech recognition.
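
    A schematic sketch of such a rescoring pass, assuming entity linking,
    pretrained TransE entity vectors, and an interpolation weight lambda are
    given (all of these are illustrative assumptions):

        import numpy as np

        def relatedness(e1, e2, entity_vecs):
            # Entities close in the TransE embedding space count as related.
            return -np.linalg.norm(entity_vecs[e1] - entity_vecs[e2])

        def rescore(nbest, entity_vecs, link_entities, lam=0.1):
            # nbest: list of (hypothesis_text, asr_score) pairs.
            rescored = []
            for text, asr_score in nbest:
                ents = link_entities(text)   # entities in the hypothesis
                pairs = [(a, b) for i, a in enumerate(ents)
                         for b in ents[i + 1:]]
                sem = (np.mean([relatedness(a, b, entity_vecs)
                                for a, b in pairs]) if pairs else 0.0)
                rescored.append((asr_score + lam * sem, text))
            return sorted(rescored, reverse=True)   # best hypothesis first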

    Increasing Papers' Discoverability with Precise Semantic Labeling: the sci.AI Platform

    Roman Gurinovich, Alexander Pashuk, Yuriy Petrovskiy, Alex Dmitrievskij, Oleg Kuryan, Alexei Scerbacov, Antonia Tiggre, Elena Moroz, Yuri Nikolsky
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    The number of published findings in biomedicine increases continually. At the
    same time, the specifics of the domain’s terminology complicate the task of
    retrieving relevant publications. In this research, we investigate the
    influence of term variability and ambiguity on a paper’s likelihood of being
    retrieved. We present statistics that demonstrate the significance of the
    issue and its challenges, and then present the sci.AI platform, which
    enables precise term labeling as a resolution.

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework generalizes previous proposals by enabling hard or
    soft sharing of all combinations of subspaces, layers, and skip connections.
    We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.

    TwiInsight: Discovering Topics and Sentiments from Social Media Datasets

    Zhengkui Wang, Guangdong Bai, Soumyadeb Chowdhury, Quanqing Xu, Zhi Lin Seow
    Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

    Social media platforms contain a great wealth of information which provides
    opportunities for us to explore hidden patterns or unknown correlations, and
    understand people’s satisfaction with what they are discussing. As one
    showcase, in this paper, we present a system, TwiInsight which explores the
    insight of Twitter data. Different from other Twitter analysis systems,
    TwiInsight automatically extracts the popular topics under different categories
    (e.g., healthcare, food, technology, sports and transport) discussed in Twitter
    via topic modeling and also identifies the correlated topics across different
    categories. Additionally, it discovers people’s opinions on the tweets and
    topics via sentiment analysis. The system also employs an intuitive and
    informative visualization to show the uncovered insights. Furthermore, we
    develop and compare six of the most popular algorithms – three for sentiment
    analysis and three for topic modeling.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.


    Distributed, Parallel, and Cluster Computing

    Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers

    Richard Cole, Vijaya Ramachandran
    Comments: To appear in Proceedings of ACM Symp. on Parallel Alg. and Architectures (SPAA) 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We analyze the caching overhead incurred by a class of multithreaded
    algorithms when scheduled by an arbitrary scheduler. We obtain bounds that
    match or improve upon the well-known \(O(Q + S \cdot (M/B))\) caching cost
    for the randomized work stealing (RWS) scheduler, where \(S\) is the number
    of steals, \(Q\) is the sequential caching cost, and \(M\) and \(B\) are the
    cache size and block (or cache line) size, respectively.

    Towards Blockchain-based Auditable Storage and Sharing of IoT Data

    Hossein Shafagh, Anwar Hithnawi, Simon Duquennoy
    Comments: USENIX NSDI’17, poster
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Today the cloud plays a central role in storing, processing, and distributing
    data. Despite contributing to the rapid development of various applications,
    including the IoT, the current centralized storage architecture has led to a
    myriad of isolated data silos and is preventing the full potential of holistic
    data-driven analytics for IoT data. In this abstract, we advocate a
    data-centric design for IoT with focus on resilience, sharing, and auditable
    protection of information. We introduce the initial design of our
    blockchain-based end-to-end encrypted data storage system. We enable secure
    and persistent data management by utilizing the blockchain as an auditable
    access control layer on top of a decentralized storage layer.

    Parallel Accelerated Custom Correlation Coefficient Calculations for Genomics Applications

    Wayne Joubert, James Nance, Sharlee Climer, Deborah Weighill, Daniel Jacobson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    The massive quantities of genomic data being made available through gene
    sequencing techniques are enabling breakthroughs in genomic science in many
    areas such as medical advances in the diagnosis and treatment of diseases.
    Analyzing this data, however, is a computational challenge insofar as the
    computational costs of the relevant algorithms can grow with quadratic, cubic
    or higher complexity, leading to the need for leadership-scale computing. In
    this paper we describe a new approach to calculations of the Custom Correlation
    Coefficient (CCC) between Single Nucleotide Polymorphisms (SNPs) across a
    population, suitable for parallel systems equipped with graphics processing
    units (GPUs) or Intel Xeon Phi processors. We describe the mapping of the
    algorithms to accelerated processors, techniques used for eliminating redundant
    calculations due to symmetries, and strategies for efficient mapping of the
    calculations to many-node parallel systems. Results are presented demonstrating
    high per-node performance and near-ideal parallel scalability with rates of
    more than four quadrillion elementwise comparisons achieved per second on the
    ORNL Titan system. In a companion paper we describe corresponding techniques
    applied to calculations of the Proportional Similarity metric for comparative
    genomics applications.

    Parallel Accelerated Vector Similarity Calculations for Genomics Applications

    Wayne Joubert, James Nance, Deborah Weighill, Daniel Jacobson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    The surge in availability of genomic data holds promise for enabling
    determination of genetic causes of observed individual traits, with
    applications to problems such as discovery of the genetic roots of phenotypes,
    be they molecular phenotypes such as gene expression or metabolite
    concentrations, or complex phenotypes such as diseases. However, the growing
    sizes of these datasets and the quadratic, cubic or higher scaling
    characteristics of the relevant algorithms pose a serious computational
    challenge necessitating use of leadership scale computing. In this paper we
    describe a new approach to performing vector similarity metrics calculations,
    suitable for parallel systems equipped with graphics processing units (GPUs) or
    Intel Xeon Phi processors. Our primary focus is the Proportional Similarity
    metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide
    Association Studies (PheWAS). We describe the implementation of the algorithms
    on accelerated processors, methods used for eliminating redundant calculations
    due to symmetries, and techniques for efficient mapping of the calculations to
    many-node parallel systems. Results are presented demonstrating high per-node
    performance and parallel scalability with rates of more than five quadrillion
    elementwise comparisons achieved per second on the ORNL Titan system. In a
    companion paper we describe corresponding techniques applied to calculations of
    the Custom Correlation Coefficient for comparative genomics applications.
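
    For orientation, a naive numpy sketch of the Proportional Similarity
    (Czekanowski) metric and of the symmetric all-pairs computation follows; the
    accelerated many-node mapping described above is far more involved.

        import numpy as np

        def proportional_similarity(x, y):
            # S(x, y) = 2 * sum_i min(x_i, y_i) / (sum_i x_i + sum_i y_i)
            return 2.0 * np.minimum(x, y).sum() / (x.sum() + y.sum())

        def all_pairs(vectors):
            n = len(vectors)
            sims = np.zeros((n, n))
            for i in range(n):
                for j in range(i, n):   # exploit symmetry: compute one half
                    s = proportional_similarity(vectors[i], vectors[j])
                    sims[i, j] = sims[j, i] = s
            return sims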

    Distributed Testing of Conductance

    Hendrik Fichtenberger, Yadu Vasudev
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We study the problem of testing conductance in the distributed computing
    model and give a two-sided tester that takes \(\mathcal{O}(\log n)\) rounds
    to decide if a graph has conductance at least \(\Phi\) or is
    \(\epsilon\)-far from having conductance at least \(\Theta(\Phi^2)\) in the
    distributed CONGEST model.
    We also show that \(\Omega(\log n)\) rounds are necessary for testing conductance
    even in the LOCAL model. In the case of a connected graph, we show that we can
    perform the test even when the number of vertices in the graph is not known a
    priori. This is the first two-sided tester in the distributed model we are
    aware of. The key idea in our algorithm is a way to perform a polynomial number
    of random walks from a set of vertices, avoiding the congestion on the edges.

    Transformation of Python Applications into Function-as-a-Service Deployments

    Josef Spillner
    Comments: 14 pages, 2 tables, 5 figures, repeatable, unreviewed
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    New cloud programming and deployment models pose challenges to software
    application engineers who are looking, often in vain, for tools to automate any
    necessary code adaptation and transformation. Function-as-a-Service interfaces
    are particular non-trivial targets when considering that most cloud
    applications are implemented in non-functional languages. Among the most widely
    used of these languages is Python. This starting position calls for an
    automated approach to transform monolithic Python code into modular FaaS units
    by partially automated decomposition. Hence, this paper introduces and
    evaluates Lambada, a Python module to dynamically decompose, convert and deploy
    unmodified Python code into AWS Lambda functions. Beyond the tooling in the
    form of a measured open source prototype implementation, the paper contributes
    a description of the algorithms and code rewriting rules as blueprints for
    transformations of other scripting languages.
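
    To make the target concrete, here is a minimal sketch of the final
    deployment step such a tool must perform: packaging a single-file Python
    handler and registering it as an AWS Lambda function via boto3. This is
    illustrative of the FaaS interface, not Lambada's own decomposition
    machinery, and the role ARN is a placeholder.

        import io
        import zipfile
        import boto3

        def deploy(function_name, source_code, role_arn):
            # Zip a module whose handler(event, context) is the entry point.
            buf = io.BytesIO()
            with zipfile.ZipFile(buf, "w") as zf:
                zf.writestr("lambda_function.py", source_code)
            boto3.client("lambda").create_function(
                FunctionName=function_name,
                Runtime="python3.9",
                Role=role_arn,                       # placeholder IAM role
                Handler="lambda_function.handler",
                Code={"ZipFile": buf.getvalue()},
            )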

    Liquid Cloud Storage

    Michael G. Luby, Roberto Padovani, Thomas J. Richardson, Lorenz Minder, Pooja Aggarwal
    Comments: 44 pages, 21 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Systems and Control (cs.SY)

    A liquid system provides durable object storage based on spreading
    redundantly generated data across a network of hundreds to thousands of
    potentially unreliable storage nodes. A liquid system uses a combination of a
    large code, lazy repair, and a flow storage organization. We show that a liquid
    system can be operated to enable flexible and essentially optimal combinations
    of storage durability, storage overhead, repair bandwidth usage, and access
    performance.

    Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning

    Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi
    Comments: 18 pages
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    Consider a set of agents in a peer-to-peer communication network, where each
    agent has a personal dataset and a personal learning objective. The main
    question addressed in this paper is: how can agents collaborate to improve upon
    their locally learned model without leaking sensitive information about their
    data? Our first contribution is to reformulate this problem so that it can be
    solved by a block coordinate descent algorithm. We obtain an efficient and
    fully decentralized protocol working in an asynchronous fashion. Our second
    contribution is to make our algorithm differentially private to protect against
    the disclosure of any information about personal datasets. We prove convergence
    rates and exhibit the trade-off between utility and privacy. Our experiments
    show that our approach dramatically outperforms previous work in the
    non-private case, and that under privacy constraints we significantly improve
    over purely local models.
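
    A schematic sketch of one asynchronous, noisy update in such a protocol
    (illustrative; the paper's block coordinate descent and privacy calibration
    are more involved): an awake agent mixes its model with a neighbour's and
    takes a perturbed local gradient step.

        import numpy as np

        def private_update(theta_self, theta_neighbor, local_grad,
                           step=0.1, mix=0.5, noise_scale=0.01,
                           rng=np.random.default_rng()):
            # Laplace noise masks the contribution of any single data point.
            noise = rng.laplace(0.0, noise_scale, size=theta_self.shape)
            mixed = mix * theta_self + (1.0 - mix) * theta_neighbor
            return mixed - step * (local_grad + noise)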

    Randomized Composable Coresets for Matching and Vertex Cover

    Sepehr Assadi, Sanjeev Khanna
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    A common approach for designing scalable algorithms for massive data sets is
    to distribute the computation across, say, \(k\) machines and process the data
    using limited communication between them. A particularly appealing framework
    here is the simultaneous communication model whereby each machine constructs a
    small representative summary of its own data and one obtains an
    approximate/exact solution from the union of the representative summaries. If
    the representative summaries needed for a problem are small, then this results
    in a communication-efficient and round-optimal protocol. While many fundamental
    graph problems admit efficient solutions in this model, two prominent problems
    are notably absent from the list of successes, namely, the maximum matching
    problem and the minimum vertex cover problem. Indeed, it was shown recently
    that for both these problems, even achieving a \(\mathrm{polylog}(n)\) approximation
    requires essentially sending the entire input graph from each machine.

    The main insight of our work is that the intractability of matching and
    vertex cover in the simultaneous communication model is inherently connected to
    an adversarial partitioning of the underlying graph across machines. We show
    that when the underlying graph is randomly partitioned across machines, both
    these problems admit randomized composable coresets of size
    \(\widetilde{O}(n)\) that yield an \(\widetilde{O}(1)\)-approximate
    solution. This results in an \(\widetilde{O}(1)\)-approximation simultaneous
    protocol for these problems with \(\widetilde{O}(nk)\) total communication
    when the input is randomly partitioned across \(k\) machines. We further
    prove the optimality of our results. Finally,
    by a standard application of composable coresets, our results also imply
    MapReduce algorithms with the same approximation guarantee in one or two rounds
    of communication.
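
    The randomized composable coreset recipe can be sketched in a few lines with
    networkx (illustrative; a simple matching of each machine's share stands in
    for the paper's coreset construction):

        import random
        import networkx as nx

        def coreset_matching(G, k=4, seed=0):
            random.seed(seed)
            shares = [nx.Graph() for _ in range(k)]
            for u, v in G.edges():
                shares[random.randrange(k)].add_edge(u, v)  # random partition
            union = nx.Graph()
            for share in shares:
                # Each "machine" sends only a matching of its share.
                union.add_edges_from(nx.maximal_matching(share))
            return nx.maximal_matching(union)  # solve on the union of coresets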


    Learning

    Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning

    Aurélien Bellet, Rachid Guerraoui, Mahsa Taziki, Marc Tommasi
    Comments: 18 pages
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (cs.SY); Machine Learning (stat.ML)

    Consider a set of agents in a peer-to-peer communication network, where each
    agent has a personal dataset and a personal learning objective. The main
    question addressed in this paper is: how can agents collaborate to improve upon
    their locally learned model without leaking sensitive information about their
    data? Our first contribution is to reformulate this problem so that it can be
    solved by a block coordinate descent algorithm. We obtain an efficient and
    fully decentralized protocol working in an asynchronous fashion. Our second
    contribution is to make our algorithm differentially private to protect against
    the disclosure of any information about personal datasets. We prove convergence
    rates and exhibit the trade-off between utility and privacy. Our experiments
    show that our approach dramatically outperforms previous work in the
    non-private case, and that under privacy constraints we significantly improve
    over purely local models.
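
    The following Python sketch illustrates the flavor of such a protocol: a noisy,
    clipped collaborative update in which an agent mixes its model with its
    neighbors' and adds Laplace noise before sharing. The averaging rule, clipping,
    and noise calibration here are our simplifying assumptions for illustration,
    not the paper's exact algorithm.

        # Hypothetical sketch of one differentially private collaborative update.
        import numpy as np

        def private_update(w_local, neighbor_models, grad_fn, step=0.1,
                           clip=1.0, epsilon=0.5, rng=np.random.default_rng(0)):
            w_avg = np.mean(neighbor_models, axis=0)      # collaboration term
            g = grad_fn(w_local)                          # local objective gradient
            g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # bound sensitivity
            noise = rng.laplace(scale=clip / epsilon, size=w_local.shape)
            return 0.5 * (w_local + w_avg) - step * (g + noise)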

    Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues

    Noga Alon, Moshe Babaioff, Yannai A. Gonczarowski, Yishay Mansour, Shay Moran, Amir Yehudayoff
    Subjects: Learning (cs.LG); Computer Science and Game Theory (cs.GT)

    In this work we derive a variant of the classic Glivenko-Cantelli Theorem,
    which asserts uniform convergence of the empirical Cumulative Distribution
    Function (CDF) to the CDF of the underlying distribution. Our variant allows
    for tighter convergence bounds for extreme values of the CDF.

    We apply our bound in the context of revenue learning, which is a
    well-studied problem in economics and algorithmic game theory. We derive
    sample-complexity bounds on the uniform convergence rate of the empirical
    revenues to the true revenues, assuming a bound on the (k)th moment of the
    valuations, for any (possibly fractional) (k>1).

    For uniform convergence in the limit, we give a complete characterization and
    a zero-one law: if the first moment of the valuations is finite, then uniform
    convergence almost surely occurs; conversely, if the first moment is infinite,
    then uniform convergence almost never occurs.

    Continuous State-Space Models for Optimal Sepsis Treatment – a Deep Reinforcement Learning Approach

    Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
    Subjects: Learning (cs.LG)

    Sepsis is a leading cause of mortality in intensive care units (ICUs) and
    costs hospitals billions annually. Treating a septic patient is highly
    challenging, because individual patients respond very differently to medical
    interventions and there is no universally agreed-upon treatment for sepsis.
    Understanding more about a patient’s physiological state at a given time could
    hold the key to effective treatment policies. In this work, we propose a new
    approach to deduce optimal treatment policies for septic patients by using
    continuous state-space models and deep reinforcement learning. Learning
    treatment policies over continuous spaces is important, because we retain more
    of the patient’s physiological information. Our model is able to learn
    clinically interpretable treatment policies, similar in important aspects to
    the treatment policies of physicians. Evaluating our algorithm on past ICU
    patient data, we find that our model could reduce patient mortality in the
    hospital by up to 3.6% over observed clinical policies, from a baseline
    mortality of 13.7%. The learned treatment policies could be used to aid
    intensive care clinicians in medical decision making and improve the likelihood
    of patient survival.

    Ridesourcing Car Detection by Transfer Learning

    Leye Wang, Xu Geng, Jintao Ke, Chen Peng, Xiaojuan Ma, Daqing Zhang, Qiang Yang
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Ridesourcing platforms like Uber and Didi are getting more and more popular
    around the world. However, unauthorized ridesourcing activities taking
    advantage of the sharing economy can greatly impair the healthy development of
    this emerging industry. As the first step to regulate on-demand ride services
    and eliminate the black market, we design a method to detect ridesourcing cars from
    a pool of cars based on their trajectories. Since licensed ridesourcing car
    traces are not openly available and may be completely missing in some cities
    due to legal issues, we turn to transferring knowledge from public transport
    open data, i.e., taxis and buses, to ridesourcing detection among ordinary
    vehicles. We propose a two-stage transfer learning framework. In Stage 1, we
    take taxi and bus data as input to learn a random forest (RF) classifier using
    trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we
    use the RF to label all the candidate cars. In Stage 2, leveraging the subset
    of high-confidence labels from the previous stage as input, we further learn a
    convolutional neural network (CNN) classifier for ridesourcing detection, and
    iteratively refine RF and CNN, as well as the feature set, via a co-training
    process. Finally, we use the resulting ensemble of RF and CNN to identify the
    ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus
    traces show that our transfer learning framework, with no need of a pre-labeled
    ridesourcing dataset, can achieve similar accuracy as the supervised learning
    methods.
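
    A minimal sketch of Stage 1 under stated assumptions (trajectory feature
    extraction happens elsewhere; the confidence threshold is illustrative):

        # Train a random forest on public taxi/bus features, then keep only
        # high-confidence labels on the candidate pool for Stage 2.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def stage_one(X_public, y_public, X_candidates, confidence=0.9):
            rf = RandomForestClassifier(n_estimators=200, random_state=0)
            rf.fit(X_public, y_public)
            proba = rf.predict_proba(X_candidates)
            keep = proba.max(axis=1) >= confidence
            return X_candidates[keep], proba[keep].argmax(axis=1), rf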

    Continual Learning in Generative Adversarial Nets

    Ari Seff, Alex Beatson, Daniel Suo, Han Liu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Developments in deep generative models have allowed for tractable learning of
    high-dimensional data distributions. While the employed learning procedures
    typically assume that training data is drawn i.i.d. from the distribution of
    interest, it may be desirable to model distinct distributions which are
    observed sequentially, such as when different classes are encountered over
    time. Although conditional variations of deep generative models permit multiple
    distributions to be modeled by a single network in a disentangled fashion, they
    are susceptible to catastrophic forgetting when the distributions are
    encountered sequentially. In this paper, we adapt recent work in reducing
    catastrophic forgetting to the task of training generative adversarial networks
    on a sequence of distinct distributions, enabling continual generative
    modeling.
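
    One common way to reduce catastrophic forgetting, sketched below in PyTorch,
    is an elastic (EWC-style) penalty that keeps parameters important for earlier
    distributions close to their old values; whether this matches the paper's
    exact formulation is an assumption on our part.

        import torch

        def ewc_penalty(model, old_params, fisher, lam=1000.0):
            # old_params/fisher: per-parameter tensors saved after the previous task.
            loss = 0.0
            for name, p in model.named_parameters():
                loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
            return 0.5 * lam * loss  # added to the usual generator loss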

    Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions

    Aryeh Kontorovich, Sivan Sabato, Roi Weiss
    Subjects: Learning (cs.LG); Statistics Theory (math.ST)

    We examine the Bayes-consistency of a recently proposed
    1-nearest-neighbor-based multiclass learning algorithm. This algorithm is
    derived from sample compression bounds and enjoys the statistical advantages of
    tight, fully empirical generalization bounds, as well as the algorithmic
    advantages of runtime and memory savings. We prove that this algorithm is
    strongly Bayes-consistent in metric spaces with finite doubling dimension —
    the first consistency result for an efficient nearest-neighbor sample
    compression scheme. Rather surprisingly, we discover that this algorithm
    continues to be Bayes-consistent even in a certain infinite-dimensional
    setting, in which the basic measure-theoretic conditions on which classic
    consistency proofs hinge are violated. This is all the more surprising, since
    it is known that k-NN is not Bayes-consistent in this setting. We pose several
    challenging open problems for future research.

    Black-Box Attacks against RNN based Malware Detection Algorithms

    Weiwei Hu, Ying Tan
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR)

    Recent research has shown that machine learning based malware detection
    algorithms are very vulnerable to adversarial example attacks. These works
    mainly focused on detection algorithms that use features of fixed dimension,
    while some researchers have begun to use recurrent neural networks (RNN) to
    detect malware based on sequential API features. This paper proposes a
    novel algorithm to generate sequential adversarial examples, which are used to
    attack a RNN based malware detection system. It is usually hard for malicious
    attackers to know the exact structures and weights of the victim RNN. A
    substitute RNN is trained to approximate the victim RNN. Then we propose a
    generative RNN to output sequential adversarial examples from the original
    sequential malware inputs. Experimental results showed that RNN based malware
    detection algorithms fail to detect most of the generated malicious adversarial
    examples, which means the proposed model is able to effectively bypass the
    detection algorithms.

    Consistent Multitask Learning with Nonlinear Output Relations

    Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil
    Comments: 25 pages, 1 figure, 1 table
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Key to multitask learning is exploiting relationships between different tasks
    to improve prediction performance. If the relations are linear, regularization
    approaches can be used successfully. However, in practice assuming the tasks to
    be linearly related might be restrictive, and allowing for nonlinear structures
    is a challenge. In this paper, we tackle this issue by casting the problem
    within the framework of structured prediction. Our main contribution is a novel
    algorithm for learning multiple tasks which are related by a system of
    nonlinear equations that their joint outputs need to satisfy. We show that the
    algorithm is consistent and can be efficiently implemented. Experimental
    results show the potential of the proposed method.

    Semi-Bandits with Knapsacks

    Karthik Abinav Sankararaman, Aleksandrs Slivkins
    Subjects: Learning (cs.LG)

    This paper unifies two lines of work on multi-armed bandits, Bandits with
    Knapsacks (BwK) and semi-bandits. The former concerns scenarios with limited
    “resources” consumed by the algorithm, e.g., limited inventory in a dynamic
    pricing problem. The latter has a huge number of actions, but there is
    combinatorial structure and additional feedback which makes the problem
    tractable. Both lines of work have received considerable recent attention, and
    are supported by numerous application examples. We define a common
    generalization, and design a general algorithm for this model. Our regret rates
    are comparable with those for BwK and semi-bandits in general, and essentially
    optimal for important special cases.

    Learning from partial correction

    Sanjoy Dasgupta, Michael Luby
    Subjects: Learning (cs.LG)

    We introduce a new model of interactive learning in which an expert examines
    the predictions of a learner and partially fixes them if they are wrong.
    Although this kind of feedback is not i.i.d., we show statistical
    generalization bounds on the quality of the learned model.

    Compressing Recurrent Neural Network with Tensor Train

    Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
    Comments: Accepted at IJCNN 2017
    Subjects: Learning (cs.LG)

    Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and
    sequential tasks and achieve state-of-the-art performance on many complex
    problems. However, most of the state-of-the-art RNNs have millions of
    parameters and require many computational resources for training and predicting
    new data. This paper proposes an alternative RNN model to reduce the number of
    parameters significantly by representing the weight parameters based on Tensor
    Train (TT) format. In this paper, we implement the TT-format representation for
    several RNN architectures such as simple RNN and Gated Recurrent Unit (GRU). We
    compare and evaluate our proposed RNN model with uncompressed RNN model on
    sequence classification and sequence prediction tasks. Our proposed RNNs with
    TT-format are able to preserve the performance while reducing the number of RNN
    parameters significantly, by up to a factor of 40.
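
    For intuition, the TT format factors a reshaped weight tensor into a chain of
    small three-way cores. The sketch below (numpy, illustrative ranks; the paper
    trains the cores directly rather than factoring a trained weight) compresses a
    dense tensor with sequential truncated SVDs:

        import numpy as np

        def tt_svd(tensor, ranks):
            dims, cores, r_prev = tensor.shape, [], 1
            mat = tensor.reshape(1, -1)
            for k, d in enumerate(dims[:-1]):
                mat = mat.reshape(r_prev * d, -1)
                U, S, Vt = np.linalg.svd(mat, full_matrices=False)
                r = min(ranks[k], len(S))
                cores.append(U[:, :r].reshape(r_prev, d, r))
                mat = S[:r, None] * Vt[:r]   # carry the remainder forward
                r_prev = r
            cores.append(mat.reshape(r_prev, dims[-1], 1))
            return cores  # contracting the cores approximately reconstructs the tensor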

    Wasserstein Learning of Deep Generative Point Process Models

    Shuai Xiao, Mehrdad Farajtabar, Xiaojing Ye, Junchi Yan, Le Song, Hongyuan Zha
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Point processes are becoming very popular in modeling asynchronous sequential
    data due to their sound mathematical foundation and strength in modeling a
    variety of real-world phenomena. Currently, they are often characterized via an
    intensity function, which limits the model’s expressiveness due to the
    unrealistic parametric assumptions used in practice. Furthermore, they are
    learned via a maximum likelihood approach, which is prone to failure on
    multi-modal distributions of sequences. In this paper, we propose an
    intensity-free approach to point process modeling that transforms nuisance
    processes into a target one. Furthermore, we train the model using a
    likelihood-free approach that leverages the Wasserstein distance between point
    processes.
    Experiments on various synthetic and real-world data substantiate the
    superiority of the proposed point process model over conventional ones.

    Detection Algorithms for Communication Systems Using Deep Learning

    Nariman Farsad, Andrea Goldsmith
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

    The design and analysis of communication systems typically rely on the
    development of mathematical models that describe the underlying communication
    channel, which dictates the relationship between the transmitted and the
    received signals. However, in some systems, such as molecular communication
    systems where chemical signals are used for transfer of information, it is not
    possible to accurately model this relationship. In these scenarios, because of
    the lack of mathematical channel models, a completely new approach to design
    and analysis is required. In this work, we focus on one important aspect of
    communication systems, the detection algorithms, and demonstrate that by
    borrowing tools from deep learning, it is possible to train detectors that
    perform well, without any knowledge of the underlying channel models. We
    evaluate these algorithms using experimental data that is collected by a
    chemical communication platform, where the channel model is unknown and
    difficult to model analytically. We show that deep learning algorithms perform
    significantly better than a simple detector that was used in previous works,
    which also did not assume any knowledge of the channel.

    Parallel Stochastic Gradient Descent with Sound Combiners

    Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz
    Comments: 16 pages, 4 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Stochastic gradient descent (SGD) is a well known method for regression and
    classification tasks. However, it is an inherently sequential algorithm: at each
    step, the processing of the current example depends on the parameters learned
    from the previous examples. Prior approaches to parallelizing linear learners
    using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies
    across threads and thus can potentially suffer poor convergence rates and/or
    poor scalability. This paper proposes SYMSGD, a parallel SGD algorithm that, to
    a first-order approximation, retains the sequential semantics of SGD. Each
    thread learns a local model in addition to a model combiner, which allows local
    models to be combined to produce the same result as what a sequential SGD would
    have produced. This paper evaluates SYMSGD’s accuracy and performance on 6
    datasets on a shared-memory machine, showing up to an 11x speedup over our
    heavily optimized sequential baseline on 16 cores, and a 2.2x average speedup
    over HOGWILD!.
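
    The "sound combiner" idea is easiest to see for linear least-squares SGD,
    where each update is affine in the weights; a thread's whole pass can then be
    summarized as a matrix-vector pair and composed exactly. A sketch under that
    assumption (the real SYMSGD additionally projects the matrix to low dimension
    for scalability, which we omit):

        import numpy as np

        def local_pass(X, y, lr):
            # Each step w <- w - lr*(x.w - y)*x equals w <- A_i @ w + b_i,
            # so the whole pass composes to w_out = A @ w_in + b.
            d = X.shape[1]
            A, b = np.eye(d), np.zeros(d)
            for x_i, y_i in zip(X, y):
                step = np.eye(d) - lr * np.outer(x_i, x_i)
                A, b = step @ A, step @ b + lr * y_i * x_i
            return A, b

        def combine(w0, passes):
            w = w0
            for A, b in passes:   # reproduces one long sequential SGD run
                w = A @ w + b
            return w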

    Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

    Tayfun Gokmen, O. Murat Onen, Wilfried Haensch
    Comments: 22 pages, 6 figures, 2 tables
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    In a previous work we have detailed the requirements to obtain a maximal
    performance benefit by implementing fully connected deep neural networks (DNN)
    in the form of arrays of resistive devices for deep learning. Here we extend
    this concept of Resistive Processing Unit (RPU) devices towards convolutional
    neural networks (CNNs). We show how to map the convolutional layers to RPU
    arrays such that the parallelism of the hardware can be fully utilized in all
    three cycles of the backpropagation algorithm. We find that the noise and bound
    limitations imposed by the analog nature of the computations performed on the
    arrays affect the training accuracy of the CNNs. Noise and bound management
    techniques are presented that mitigate these problems without introducing any
    additional complexity in the analog circuits, as they can be addressed by the
    digital circuits. In addition, we discuss digitally programmable update
    management and device variability reduction techniques that can be used
    selectively for some of the layers in a CNN. We show that a combination of all
    these techniques enables a successful application of the RPU concept to
    training CNNs. The techniques discussed here are more general, can be applied
    beyond CNN architectures, and therefore enable the applicability of the RPU
    approach to a large class of neural network architectures.

    Convergence Analysis of Batch Normalization for Deep Neural Nets

    Yintai Ma, Diego Klabjan
    Subjects: Learning (cs.LG)

    Batch normalization (BN) is so effective in accelerating the convergence of
    neural network training that it has become common practice. We propose a
    generalization of BN, the diminishing batch normalization (DBN) algorithm. We
    provide an analysis showing that the DBN algorithm converges to a stationary
    point with respect to the trainable parameters. We analyze a two-layer model
    with linear activation. The main challenge of the analysis is the fact that
    some parameters are updated by gradient while others are not. In the numerical
    experiments, we use models with more layers and ReLU activation. We observe
    that DBN outperforms the original BN algorithm on the MNIST, NI and CIFAR-10
    datasets with reasonably complex FNN and CNN models.

    pix2code: Generating Code from a Graphical User Interface Screenshot

    Tony Beltramelli
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Transforming a graphical user interface screenshot created by a designer into
    computer code is a typical task conducted by a developer in order to build
    customized software, websites and mobile applications. In this paper, we show
    that Deep Learning techniques can be leveraged to automatically generate code
    given a graphical user interface screenshot as input. Our model is able to
    generate code targeting three different platforms (i.e. iOS, Android and
    web-based technologies) from a single input image with over 77% accuracy.

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Chris Donahue, Akshay Balsubramani, Julian McAuley, Zachary C. Lipton
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We propose a new algorithm for training generative adversarial networks to
    jointly learn latent codes for both identities (e.g. individual humans) and
    observations (e.g. specific photographs). In practice, this means that by
    fixing the identity portion of latent codes, we can generate diverse images of
    the same subject, and by fixing the observation portion we can traverse the
    manifold of subjects while maintaining contingent aspects such as lighting and
    pose. Our algorithm features a pairwise training scheme in which each sample
    from the generator consists of two images with a common identity code.
    Corresponding samples from the real dataset consist of two distinct photographs
    of the same subject. In order to fool the discriminator, the generator must
    produce images that are photorealistic and distinct, yet appear to depict the
    same person. We augment both the DCGAN and BEGAN approaches with Siamese
    discriminators to accommodate pairwise training. Experiments with human judges
    and an off-the-shelf face verification system demonstrate our algorithm’s
    ability to generate convincing, identity-matched photographs.

    Detecting Adversarial Examples in Deep Networks with Adaptive Noise Reduction

    Bin Liang, Hongcheng Li, Miaoqiang Su, Xirong Li, Wenchang Shi, Xiaofeng Wang
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Deep neural networks (DNNs) play a key role in many applications.
    Unsurprisingly, they have also become a potential attack target for
    adversaries. Some studies have demonstrated that DNN classifiers can be fooled
    by adversarial examples, which are crafted by introducing small perturbations
    into an original sample. Accordingly, some powerful defense techniques have
    been proposed against
    adversarial examples. However, existing defense techniques require modifying
    the target model or depend on the prior knowledge of attack techniques to
    different degrees. In this paper, we propose a straightforward method for
    detecting adversarial image examples. It doesn’t require any prior knowledge of
    attack techniques and can be directly deployed into unmodified off-the-shelf
    DNN models. Specifically, we consider the perturbation to images as a kind of
    noise and introduce two classical image processing techniques, scalar
    quantization and smoothing spatial filter, to reduce its effect. The image
    two-dimensional entropy is employed as a metric to implement an adaptive noise
    reduction for different kinds of images. As a result, the adversarial example
    can be effectively detected by comparing the classification results of a given
    sample and its denoised version. Thousands of adversarial examples, crafted
    with different attack techniques against state-of-the-art DNN models, are used
    to evaluate the proposed method. The experiments show that our
    detection method can achieve an overall recall of 93.73% and an overall
    precision of 95.45% without referring to any prior knowledge of attack
    techniques.
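
    A minimal sketch of the detection rule, with the entropy-based adaptivity
    simplified to fixed parameters (quantization levels and filter size are
    illustrative):

        import numpy as np
        from scipy.ndimage import uniform_filter

        def is_adversarial(image, predict, levels=16, window=3):
            # predict: any callable mapping an image array to a class label.
            q = np.round(image * (levels - 1)) / (levels - 1)  # scalar quantization
            smoothed = uniform_filter(q, size=window)          # smoothing filter
            return predict(image) != predict(smoothed)         # label change => flag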

    Efficient and principled score estimation

    Dougal J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton
    Comments: Full version including appendices
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    We propose a fast method with statistical guarantees for learning an
    exponential family density model where the natural parameter is in a
    reproducing kernel Hilbert space, and may be infinite dimensional. The model is
    learned by fitting the derivative of the log density, the score, thus avoiding
    the need to compute a normalization constant. We improved the computational
    efficiency of an earlier solution with a low-rank, Nyström-like solution. The
    new solution remains consistent, and is shown to converge in Fisher distance at
    the same rate as a full-rank solution, with guarantees on the degree of cost
    and storage reduction. We compare to a popular score learning approach using a
    denoising autoencoder, in experiments on density estimation and in the
    construction of an adaptive Hamiltonian Monte Carlo sampler. Apart from the
    lack of statistical guarantees for the autoencoder, our estimator is more
    data-efficient when estimating the score, runs faster, and has fewer parameters
    (which can be tuned in a principled and interpretable way).

    The Marginal Value of Adaptive Gradient Methods in Machine Learning

    Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Adaptive optimization methods, which perform local optimization with a metric
    constructed from the history of iterates, are becoming increasingly popular for
    training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We
    show that for simple overparameterized problems, adaptive methods often find
    drastically different solutions than gradient descent (GD) or stochastic
    gradient descent (SGD). We construct an illustrative binary classification
    problem where the data is linearly separable, GD and SGD achieve zero test
    error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to
    half. We additionally study the empirical generalization capability of adaptive
    methods on several state-of-the-art deep learning models. We observe that the
    solutions found by adaptive methods generalize worse (often significantly
    worse) than SGD, even when these solutions have better training performance.
    These results suggest that practitioners should reconsider the use of adaptive
    methods to train neural networks.

    Matching neural paths: transfer from recognition to correspondence search

    Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
    Comments: Submitted to NIPS 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Many machine learning tasks require finding per-part correspondences between
    objects. In this work we focus on low-level correspondences – a highly
    ambiguous matching problem. We propose to use a hierarchical semantic
    representation of the objects, coming from a convolutional neural network, to
    solve this ambiguity. Training it for low-level correspondence prediction
    directly might not be an option in some domains where the ground-truth
    correspondences are hard to obtain. We show how transfer from recognition can
    be used to avoid such training. Our idea is to mark parts as “matching” if
    their features are close to each other at all the levels of convolutional
    feature hierarchy (neural paths). Although the overall number of such paths is
    exponential in the number of layers, we propose a polynomial algorithm for
    aggregating all of them in a single backward pass. The empirical validation is
    done on the task of stereo correspondence and demonstrates that we achieve
    competitive results among the methods which do not use labeled target domain
    data.

    Unbiasing Truncated Backpropagation Through Time

    Corentin Tallec, Yann Ollivier
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Truncated Backpropagation Through Time (truncated BPTT) is a widespread
    method for learning recurrent computational graphs. Truncated BPTT keeps the
    computational benefits of Backpropagation Through Time (BPTT) while relieving
    the need for a complete backtrack through the whole data sequence at every
    step. However, truncation favors short-term dependencies: the gradient estimate
    of truncated BPTT is biased, so that it does not benefit from the convergence
    guarantees from stochastic gradient theory. We introduce Anticipated Reweighted
    Truncated Backpropagation (ARTBP), an algorithm that keeps the computational
    benefits of truncated BPTT, while providing unbiasedness. ARTBP works by using
    variable truncation lengths together with carefully chosen compensation factors
    in the backpropagation equation. We check the viability of ARTBP on two tasks.
    First, a simple synthetic task where careful balancing of temporal dependencies
    at different scales is needed: truncated BPTT displays unreliable performance,
    and in worst case scenarios, divergence, while ARTBP converges reliably.
    Second, on Penn Treebank character-level language modelling, ARTBP slightly
    outperforms truncated BPTT.
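
    The mechanism behind the compensation factors is the classic
    inverse-probability (Russian-roulette) trick: truncate at a random point, but
    reweight surviving terms so the estimator stays unbiased. A toy illustration
    on a plain sum (the truncation distribution here is our choice, not the
    paper's):

        import numpy as np

        def unbiased_truncated_sum(terms, p=0.5, rng=np.random.default_rng(0)):
            """E[return value] == sum(terms), despite random early stopping."""
            total, weight = 0.0, 1.0
            for t in terms:
                total += weight * t
                if rng.random() < p:     # truncate here
                    break
                weight /= (1.0 - p)      # compensation factor for surviving
            return total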

    Learning to Succeed while Teaching to Fail: Privacy in Closed Machine Learning Systems

    Jure Sokolic, Qiang Qiu, Miguel R. D. Rodrigues, Guillermo Sapiro
    Comments: 14 pages, 1 figure
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Security, privacy, and fairness have become critical in the era of data
    science and machine learning. More and more we see that achieving universally
    secure, private, and fair systems is practically impossible. We have seen for
    example how generative adversarial networks can be used to learn about the
    expected private training data; how the exploitation of additional data can
    reveal private information in the original one; and how seemingly unrelated
    features can teach us about each other. Confronted with this
    challenge, in this paper we open a new line of research, where the security,
    privacy, and fairness are learned and used in a closed environment. The goal is
    to ensure that a given entity (e.g., the company or the government), trusted to
    infer certain information with our data, is blocked from inferring protected
    information from it. For example, a hospital might be allowed to produce
    diagnosis on the patient (the positive task), without being able to infer the
    gender of the subject (negative task). Similarly, a company can guarantee that
    internally it is not using the provided data for any undesired task, an
    important goal that is not contradicting the virtually impossible challenge of
    blocking everybody from the undesired task. We design a system that learns to
    succeed on the positive task while simultaneously failing at the negative one, and
    illustrate this with challenging cases where the positive task is actually
    harder than the negative one being blocked. Fairness, to the information in the
    negative task, is often automatically obtained as a result of this proposed
    approach. The particular framework and examples open the door to security,
    privacy, and fairness in very important closed scenarios, ranging from private
    data accumulation companies like social networks to law-enforcement and
    hospitals.

    Look, Listen and Learn

    Relja Arandjelović, Andrew Zisserman
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We consider the question: what can be learnt by looking at and listening to a
    large amount of unlabelled videos? There is a valuable, but so far untapped,
    source of information contained in the video itself — the correspondence
    between the visual and the audio streams, and we introduce a novel
    “Audio-Visual Correspondence” learning task that makes use of this. Training
    visual and audio networks from scratch, without any additional supervision
    other than the raw unconstrained videos themselves, is shown to successfully
    solve this task, and, more interestingly, result in good vision and audio
    representations. These features set the new state-of-the-art on two sound
    classification benchmarks, and perform on par with the state-of-the-art
    self-supervised approaches on ImageNet classification. We also demonstrate that
    the network is able to localize objects in both modalities, as well as perform
    fine-grained recognition tasks.
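
    A sketch of how AVC training pairs can be assembled (the video objects with
    frame_at/audio_clip helpers are hypothetical):

        import random

        def avc_pair(videos, rng=random.Random(0)):
            v = rng.choice(videos)
            t = rng.uniform(0.0, v.duration - 1.0)
            frame = v.frame_at(t)                        # assumed helper
            if rng.random() < 0.5:
                return frame, v.audio_clip(t, 1.0), 1    # corresponding pair
            other = rng.choice([u for u in videos if u is not v])
            t2 = rng.uniform(0.0, other.duration - 1.0)
            return frame, other.audio_clip(t2, 1.0), 0   # mismatched pair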

    Visualizing LSTM decisions

    Jos van der Westhuizen, Joan Lasenby
    Comments: 10 pages, 7 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Applications (stat.AP)

    Long Short-Term Memory (LSTM) recurrent neural networks are renowned for
    being uninterpretable “black boxes”. In the medical domain where LSTMs have
    shown promise, this is especially concerning because it is imperative to
    understand the decisions made by machine learning models in such acute
    situations. This study employs techniques used in the Convolutional Neural
    Network domain to elucidate the operations that LSTMs perform on time series.
    The visualization techniques include input saliency by means of occlusion and
    derivatives, class mode visualization, and temporal outputs. Moreover, we
    demonstrate that LSTMs appear to extract features similar to those extracted by
    wavelets. It was found that differentiating with respect to the inputs is a
    poor approximation of saliency and that occlusion is a better approach.
    Moreover, analyzing LSTMs on different sets of data provides novel
    interpretations.
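
    Occlusion saliency for a sequence model can be sketched as follows (window
    size and baseline value are illustrative choices):

        import numpy as np

        def occlusion_saliency(model, series, window=5, baseline=0.0):
            # model: any callable returning a scalar score for a 1-D series.
            base_score = model(series)
            saliency = np.zeros(len(series))
            for start in range(len(series) - window + 1):
                occluded = series.copy()
                occluded[start:start + window] = baseline
                saliency[start:start + window] += (base_score - model(occluded)) / window
            return saliency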

    Sluice networks: Learning what to share between loosely related tasks

    Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
    Comments: 10 pages, 3 figures, 5 tables
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Multi-task learning is partly motivated by the observation that humans bring
    to bear what they know about related problems when solving new ones. Similarly,
    deep neural networks can profit from related tasks by sharing parameters with
    other networks. However, humans do not consciously decide to transfer knowledge
    between tasks (and are typically not aware of the transfer). In machine
    learning, it is hard to estimate if sharing will lead to improvements;
    especially if tasks are only loosely related. To overcome this, we introduce
    Sluice Networks, a general framework for multi-task learning where trainable
    parameters control the amount of sharing — including which parts of the models
    to share. Our framework goes beyond and generalizes over previous proposals in
    enabling hard or soft sharing of all combinations of subspaces, layers, and
    skip connections. We perform experiments on three task pairs from natural
    language processing, and across seven different domains, using data from
    OntoNotes 5.0, and achieve up to 15% average error reductions over common
    approaches to multi-task learning. We analyze when the architecture is
    particularly helpful, as well as its ability to fit noise. We show that a)
    label entropy is predictive of gains in sluice networks, confirming findings
    for hard parameter sharing, and b) while sluice networks easily fit noise, they
    are robust across domains in practice.
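
    The core sharing unit can be sketched in PyTorch as a learned mixing of two
    tasks' hidden states; the softmax constraint on the mixing weights is our
    simplification:

        import torch
        import torch.nn as nn

        class AlphaShare(nn.Module):
            def __init__(self):
                super().__init__()
                self.alpha = nn.Parameter(torch.eye(2))  # trainable 2x2 mixing

            def forward(self, h_a, h_b):
                w = torch.softmax(self.alpha, dim=1)     # rows mix the two streams
                return (w[0, 0] * h_a + w[0, 1] * h_b,
                        w[1, 0] * h_a + w[1, 1] * h_b)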

    Visual Semantic Planning using Deep Successor Representations

    Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Robotics (cs.RO)

    A crucial capability of real-world intelligent agents is their ability to
    plan a sequence of actions to achieve their goals in the visual world. In this
    work, we address the problem of visual semantic planning: the task of
    predicting a sequence of actions from visual observations that transform a
    dynamic environment from an initial state to a goal state. Doing so entails
    knowledge about objects and their affordances, as well as actions and their
    preconditions and effects. We propose learning these through interacting with a
    visual and dynamic environment. Our proposed solution involves bootstrapping
    reinforcement learning with imitation learning. To ensure cross-task
    generalization, we develop a deep predictive model based on successor
    representations. Our experimental results show near optimal results across a
    wide range of tasks in the challenging THOR environment. The supplementary
    video can be accessed at the following link: this https URL

    Ambiguity set and learning via Bregman and Wasserstein

    Xin Guo, Johnny Hong, Nan Yang
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Construction of ambiguity set in robust optimization relies on the choice of
    divergences between probability distributions. In distribution learning,
    choosing appropriate probability distributions based on observed data is
    critical for approximating the true distribution. To improve the performance of
    machine learning models, there has recently been interest in designing
    objective functions based on Lp-Wasserstein distance rather than the classical
    Kullback-Leibler (KL) divergence. In this paper, we derive concentration and
    asymptotic results using Bregman divergence. We propose a novel asymmetric
    statistical divergence called Wasserstein-Bregman divergence as a
    generalization of L2-Wasserstein distance. We discuss how these results can be
    applied to the construction of ambiguity set in robust optimization.

    Neural Network Memory Architectures for Autonomous Robot Navigation

    Steven W Chen, Nikolay Atanasov, Arbaaz Khan, Konstantinos Karydis, Daniel D. Lee, Vijay Kumar
    Subjects: Robotics (cs.RO); Learning (cs.LG)

    This paper highlights the significance of including memory structures in
    neural networks when the latter are used to learn perception-action loops for
    autonomous robot navigation. Traditional navigation approaches rely on global
    maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet,
    maintaining an accurate global map may be challenging in real-world settings. A
    possible way to mitigate this limitation is to use learning techniques that
    forgo hand-engineered map representations and infer appropriate control
    responses directly from sensed information. An important but unexplored aspect
    of such approaches is the effect of memory on their performance. This work is a
    first thorough study of memory structures for deep-neural-network-based robot
    navigation, and offers novel tools to train such networks from supervision and
    quantify their ability to generalize to unseen scenarios. We analyze the
    separation and generalization abilities of feedforward, long short-term memory,
    and differentiable neural computer networks. We introduce a new method to
    evaluate the generalization ability by estimating the VC-dimension of networks
    with a final linear readout layer. We validate that the VC estimates are good
    predictors of actual test performance. The reported method can be applied to
    deep learning problems beyond robotics.

    Poincaré Embeddings for Learning Hierarchical Representations

    Maximilian Nickel, Douwe Kiela
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Representation learning has become an invaluable approach for learning from
    symbolic data such as text and graphs. However, while complex symbolic datasets
    often exhibit a latent hierarchical structure, state-of-the-art methods
    typically learn embeddings in Euclidean vector spaces, which do not account for
    this property. For this purpose, we introduce a new approach for learning
    hierarchical representations of symbolic data by embedding them into hyperbolic
    space — or more precisely into an n-dimensional Poincaré ball. Due to the
    underlying hyperbolic geometry, this allows us to learn parsimonious
    representations of symbolic data by simultaneously capturing hierarchy and
    similarity. We introduce an efficient algorithm to learn the embeddings based
    on Riemannian optimization and show experimentally that Poincaré embeddings
    outperform Euclidean embeddings significantly on data with latent hierarchies,
    both in terms of representation capacity and in terms of generalization
    ability.
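
    The two standard hyperbolic-geometry formulas doing the work here are the
    Poincaré ball distance and the conformal rescaling of Euclidean gradients:

        import numpy as np

        def poincare_distance(u, v, eps=1e-9):
            duv = np.dot(u - v, u - v)
            denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v)) + eps
            return np.arccosh(1.0 + 2.0 * duv / denom)

        def riemannian_grad(theta, euclidean_grad):
            # The Poincare-ball metric is conformal; the inverse metric scales
            # Euclidean gradients by ((1 - |theta|^2)^2) / 4.
            return ((1.0 - np.dot(theta, theta)) ** 2 / 4.0) * euclidean_grad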

    Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

    Mark Eisen, Aryan Mokhtari, Alejandro Ribeiro
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Machine Learning (stat.ML)

    We consider large scale empirical risk minimization (ERM) problems, where
    both the problem dimension and variable size are large. In these cases, most
    second order methods are infeasible due to the high cost in both computing the
    Hessian over all samples and computing its inverse in high dimensions. In this
    paper, we propose a novel adaptive sample size second-order method, which
    reduces the cost of computing the Hessian by solving a sequence of ERM problems
    corresponding to a subset of samples and lowers the cost of computing the
    Hessian inverse using a truncated eigenvalue decomposition. We show that while
    we geometrically increase the size of the training set at each stage, a single
    iteration of the truncated Newton method is sufficient to solve the new ERM
    within its statistical accuracy. Moreover, for a large number of samples we are
    allowed to double the size of the training set at each stage, and the proposed
    method subsequently reaches the statistical accuracy of the full training set
    approximately after two effective passes. In addition to this theoretical
    result, we show empirically on a number of well known data sets that the
    proposed truncated adaptive sample size algorithm outperforms stochastic
    alternatives for solving ERM problems.

    Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

    Yichen Chen, Mengdi Wang
    Subjects: Computational Complexity (cs.CC); Learning (cs.LG)

    We study the computational complexity of the infinite-horizon
    discounted-reward Markov Decision Problem (MDP) with a finite state space
    \(|\mathcal{S}|\) and a finite action space \(|\mathcal{A}|\). We show that any
    randomized algorithm needs a running time of at least
    \(\Omega(|\mathcal{S}|^2|\mathcal{A}|)\) to compute an \(\epsilon\)-optimal policy
    with high probability. We consider two variants of the MDP where the input is
    given in specific data structures, including arrays of cumulative probabilities
    and binary trees of transition probabilities. For these cases, we show that the
    complexity lower bound reduces to
    \(\Omega\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon}\right)\). These
    results reveal the surprising observation that the computational complexity of
    the MDP depends on the data structure of the input.

    Modelling spatio-temporal routines in human mobility

    Luca Pappalardo, Filippo Simini
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Other Statistics (stat.OT)

    Human mobility modelling is of fundamental importance in a wide range of
    applications, such as the developing of protocols for mobile ad-hoc networks or
    what-if analysis in urban ecosystems. Current generative models fail to
    accurately reproduce individuals’ recurrent schedules while also accounting
    for the possibility that individuals may break their routine during periods of
    variable duration. In this article we present DITRAS
    (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal
    patterns of human mobility. DITRAS operates in two steps: the generation of a
    mobility diary and the translation of the mobility diary into a mobility
    trajectory. We propose a data-driven algorithm which constructs a diary
    generator from real data, capturing the tendency of individuals to follow or
    break their routine. We also propose a trajectory generator based on the
    concept of preferential exploration and preferential return. We instantiate
    DITRAS with the proposed diary and trajectory generators and compare the
    resulting spatio-temporal model with real data and synthetic data produced by
    other spatio-temporal mobility models, built by instantiating DITRAS with
    several combinations of diary and trajectory generators. We show that the
    proposed model reproduces the statistical properties of real trajectories in
    the most accurate way, taking a step forward in understanding the origin of
    the spatio-temporal patterns of human mobility.


    Information Theory

    Preserving Privacy while Broadcasting: \(k\)-Limited-Access Schemes

    Mohammed Karmoose, Linqi Song, Martina Cardone, Christina Fragouli
    Subjects: Information Theory (cs.IT)

    Index coding employs coding across clients within the same broadcast domain.
    This typically assumes that all clients learn the coding matrix so that they
    can decode and retrieve their requested data. However, learning the coding
    matrix can pose privacy concerns: it may enable clients to infer information
    about the requests and side information of other clients [1]. In this paper, we
    formalize the intuition that the achieved privacy can increase by decreasing
    the number of rows of the coding matrix that a client learns. Based on this, we
    propose the use of \(k\)-limited-access schemes: given an index coding scheme
    that employs \(T\) transmissions, we create a \(k\)-limited-access scheme with
    \(T_k \geq T\) transmissions, and with the property that each client learns at
    most \(k\) rows of the coding matrix to decode its message. We derive upper and
    lower bounds on \(T_k\) for all values of \(k\), and develop deterministic
    designs for these schemes for which \(T_k\) has an order-optimal exponent for
    some regimes.

    Information Theoretic Principles of Universal Discrete Denoising

    Janis Nötzel, Andreas Winter
    Comments: 10 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    Today, the internet makes tremendous amounts of data widely available. Often,
    the same information is behind multiple different available data sets. This
    lends growing importance to latent variable models that try to learn the hidden
    information from the available imperfect versions. For example, social media
    platforms can contain an abundance of pictures of the same person or object,
    yet all of which are taken from different perspectives. In a simplified
    scenario, one may consider pictures taken from the same perspective, which are
    distorted by noise. This latter application allows for a rigorous mathematical
    treatment, which is the content of this contribution. We apply a recently
    developed method of dependent component analysis to image denoising when
    multiple distorted copies of one and the same image are available, each being
    corrupted by a different and unknown noise process. In a simplified scenario,
    we assume that the distorted image is corrupted by noise that acts
    independently on each pixel. We answer completely the question of how to
    perform optimal denoising, when at least three distorted copies are available:
    First we define optimality of an algorithm in the presented scenario, and then
    we describe an asymptotically optimal universal discrete denoising algorithm
    (UDDA). In the case of binary data and binary symmetric noise, we develop a
    simplified variant of the algorithm, dubbed BUDDA, which we prove to attain
    universal denoising uniformly.

    Distributed Precoding Systems in Multi-Gateway Multibeam Satellites: Regularization and Coarse Beamforming

    Carlos Mosquera, Roberto Lopez-Valcarce, Vahid Joroughi
    Comments: Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper deals with the problem of beamforming design in a multibeam
    satellite, which is shared by different groups of terminals -clusters-, each
    served by an Earth station or gateway. Each gateway precodes the symbols
    addressed to its respective users; the design follows an MMSE criterion, and a
    regularization factor judiciously chosen allows to account for the presence of
    mutually interfering clusters, extending more classical results applicable to
    one centralized station. More importantly, channel statistics can be used
    instead of instantaneous channel state information, avoiding the exchange of
    information among gateways through backhaul links. The on-board satellite
    beamforming weights are designed to exploit the degrees of freedom of the
    satellite antennas to minimize the noise impact and the interference to some
    specific users. On-ground beamforming results are provided as a reference to
    compare the joint performance of MMSE precoders and on-board beamforming
    network. A non-adaptive design complements the results and makes them more
    amenable to practical use by designing a coarse beamforming network.

    Capacity Outer Bound and Degrees of Freedom of Wiener Phase Noise Channels with Oversampling

    Luca Barletta, Stefano Rini
    Comments: 5 pages, 1 figure, Submitted to Intern. Workshop Inf. Theory (ITW) 2017
    Subjects: Information Theory (cs.IT)

    The discrete-time Wiener phase noise channel with an integrate-and-dump
    multi-sample receiver is studied.

    A novel outer bound on the capacity with an average input power constraint is
    derived as a function of the oversampling factor.

    This outer bound yields the degrees of freedom for the scenario in which the
    oversampling factor grows with the transmit power \(P\) as \(P^{\alpha}\).

    The result shows, perhaps surprisingly, that the largest pre-log that can be
    attained with phase modulation at high signal-to-noise ratio is at most \(1/4\).

    An Improved Secretive Coded Caching Scheme exploiting Common Demands

    Hari Hara Suthan C, Ishani Chugh, Prasad Krishnan
    Comments: Submitted to Information Theory Workshop 2017
    Subjects: Information Theory (cs.IT)

    Coded caching schemes on broadcast networks with user caches help to offload
    traffic from peak times to off-peak times by prefetching information from the
    server to the receivers during off-peak times and thus serving the users more
    efficiently during peak times using coded transmissions. We consider the
    problem of secretive coded caching which was proposed recently, in which a user
    should not be able to decode any information about any file that the user has
    not demanded. We propose a new secretive coded caching scheme which has a lower
    average rate compared to the existing state-of-the-art scheme, for the same
    memory available at the receivers. The proposed scheme is based on exploiting
    the presence of common demands between multiple receivers.

    Asymptotically optimal codebooks based on generalized Jacobi sums

    Ziling Heng
    Subjects: Information Theory (cs.IT)

    Codebooks with small inner-product correlation are applied in many practical
    applications including direct spread code division multiple access
    communications, space-time codes and compressed sensing. It is extremely
    difficult to construct codebooks achieving the Welch or Levenshtein bound. In
    this paper, we firstly generalize Jacobi sums over finite fields and secondly
    obtain asymptotically optimal codebooks with respect to the Welch or
    Levenshtein bound. Our main results generalize those in [11] and the codebooks
    in this paper have more flexible parameters than those in [11].

    Capacity of Molecular Channels with Imperfect Particle-Intensity Modulation and Detection

    Nariman Farsad, Christopher Rose, Muriel Médard, Andrea Goldsmith
    Comments: Accepted at IEEE International Symposium on Information Theory (ISIT)
    Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)

    This work introduces the particle-intensity channel (PIC) as a model for
    molecular communication systems and characterizes the properties of the optimal
    input distribution and the capacity limits for this system. In the PIC, the
    transmitter encodes information, in symbols of a given duration, based on the
    number of particles released, and the receiver detects and decodes the message
    based on the number of particles detected during the symbol interval. In this
    channel, the transmitter may be unable to control precisely the number of
    particles released, and the receiver may not detect all the particles that
    arrive. We demonstrate that the optimal input distribution for this channel
    always has mass points at zero and the maximum number of particles that can be
    released. We then consider diffusive particle transport, derive the capacity
    expression when the input distribution is binary, and show conditions under
    which the binary input is capacity-achieving. In particular, we demonstrate
    that when the transmitter cannot generate particles at a high rate, the optimal
    input distribution is binary.

    Joint Uplink/Downlink Resource Allocation and Data Offloading in OFDMA-Based Wireless Powered HetNets

    Sepehr Rezvani, Nader Mokari, Mohammad R. Javan
    Comments: 36 pages, 11 figures, Submitted to IEEE Transactions on Wireless Communications
    Subjects: Information Theory (cs.IT)

    This paper considers joint uplink/downlink of an orthogonal frequency
    division multiple access (OFDMA)-based heterogeneous network (HetNet)
    consisting of a single macro base station (MBS), multiple femto base stations
    (FBSs) and access points (APs) where base stations (BSs) can offload data to
    APs and each mobile user (MU) is able to harvest the received energy using the
    simultaneous wireless information and power transfer (SWIPT) technique. We also
    suppose that the harvested energy of the MUs is used for their uplink information
    transmission. We devise a radio resource allocation (RRA) algorithm to maximize
    the uplink sum data rate of MUs subject to a minimum required downlink data
    rate of each MU and maximum allowable transmit power of each BS, AP, and MU.
    More specifically, both the frequency division duplex (FDD) and time division
    duplex (TDD) schemes are investigated. The proposed non-convex optimization
    problems are solved using an iterative algorithm. It is also proved that the
    proposed algorithm converges to a near-optimal solution. Simulation results
    illustrate that the TDD scheme improves the performance compared to the FDD
    scheme. In addition, it is shown that utilizing the data offloading technique
    improves the uplink sum data rate of MUs compared to the scenario without any
    AP.

    Exponential error rates of SDP for block models: Beyond Grothendieck's inequality

    Yingjie Fei, Yudong Chen
    Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Social and Information Networks (cs.SI); Statistics Theory (math.ST)

    In this paper we consider the cluster estimation problem under the Stochastic
    Block Model. We show that the semidefinite programming (SDP) formulation for
    this problem achieves an error rate that decays exponentially in the
    signal-to-noise ratio. The error bound implies weak recovery in the sparse
    graph regime with bounded expected degrees, as well as exact recovery in the
    dense regime. An immediate corollary of our results yields error bounds under
    the Censored Block Model. Moreover, these error bounds are robust, continuing
    to hold under heterogeneous edge probabilities and a form of the so-called
    monotone attack.

    Significantly, this error rate is achieved by the SDP solution itself without
    any further pre- or post-processing, and improves upon existing
    polynomially-decaying error bounds proved using Grothendieck’s inequality. Our
    analysis has two key ingredients: (i) showing that the graph
    has a well-behaved spectrum, even in the sparse regime, after discounting an
    exponentially small number of edges, and (ii) an order-statistics argument that
    governs the final error rate. Both arguments highlight the implicit
    regularization effect of the SDP formulation.

    Spectral Simplicity of Apparent Complexity, Part I: The Nondiagonalizable Metadynamics of Prediction

    Paul M. Riechers, James P. Crutchfield
    Comments: 24 pages, 3 figures, 4 tables; current version always at this http URL
    Subjects: Chaotic Dynamics (nlin.CD); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Dynamical Systems (math.DS); Functional Analysis (math.FA)

    Virtually all questions that one can ask about the behavioral and structural
    complexity of a stochastic process reduce to a linear algebraic framing of a
    time evolution governed by an appropriate hidden-Markov process generator. Each
    type of question—correlation, predictability, predictive cost, observer
    synchronization, and the like—induces a distinct generator class. Answers are
    then functions of the class-appropriate transition dynamic. Unfortunately,
    these dynamics are generically nonnormal, nondiagonalizable, singular, and so
    on. Tractably analyzing these dynamics relies on adapting the recently
    introduced meromorphic functional calculus, which specifies the spectral
    decomposition of functions of nondiagonalizable linear operators, even when the
    function poles and zeros coincide with the operator’s spectrum. Along the way,
    we establish special properties of the projection operators that demonstrate
    how they capture the organization of subprocesses within a complex system.
    Circumventing the spurious infinities of alternative calculi, this leads in the
    sequel, Part II, to the first closed-form expressions for complexity measures,
    couched either in terms of the Drazin inverse (negative-one power of a singular
    operator) or the eigenvalues and projection operators of the appropriate
    transition dynamic.

    Predicting stock market movements using network science: An information theoretic approach

    Minjun Kim, Hiroki Sayama
    Comments: 13 pages, 7 figures, 3 tables
    Subjects: Social and Information Networks (cs.SI); Computational Engineering, Finance, and Science (cs.CE); Information Theory (cs.IT); Physics and Society (physics.soc-ph)

    A stock market is considered as one of the highly complex systems, which
    consists of many components whose prices move up and down without having a
    clear pattern. The complex nature of a stock market challenges us on making a
    reliable prediction of its future movements. In this paper, we aim at building
    a new method to forecast the future movements of the Standard & Poor’s 500
    Index (S&P 500) by constructing time-series complex networks of the S&P 500
    underlying companies, connecting them with links whose weights are given by the
    mutual information of the 60-minute price movements of pairs of companies over
    5,340 consecutive minutes of price records. We showed that changes in the
    strength distributions of the networks provide important information on the
    market’s future movements. We built several metrics using the strength
    distributions and network measurements such as centrality, and we combined the
    best two predictors by performing a linear combination. We found that the
    combined predictor and the changes in S&P 500 show a quadratic relationship,
    and it allows us to predict the amplitude of the one step future change in S&P
    500. The result showed significant fluctuations in S&P 500 Index when the
    combined predictor was high. In terms of making the actual index predictions,
    we built ARIMA models. We found that adding the network measurements into the
    ARIMA models improves the model accuracy. These findings are useful for
    financial market policy makers as an indicator based on which they can
    interfere with the markets before the markets make a drastic change, and for
    quantitative investors to improve their forecasting models.
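
    A sketch of the network-construction step (the binning scheme and the strength
    computation are our illustrative choices):

        import numpy as np
        from itertools import combinations
        from sklearn.metrics import mutual_info_score

        def build_mi_network(returns, bins=3):
            # returns: dict ticker -> 1-D array of 60-minute price movements.
            digitized = {k: np.digitize(v, np.histogram_bin_edges(v, bins=bins))
                         for k, v in returns.items()}
            edges = {(a, b): mutual_info_score(digitized[a], digitized[b])
                     for a, b in combinations(digitized, 2)}
            strength = {k: sum(w for e, w in edges.items() if k in e)
                        for k in digitized}
            return edges, strength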



