
    arXiv Paper Daily: Tue, 23 May 2017

    Posted by 我爱机器学习 (52ml.net) on 2017-05-23 00:00:00

    Neural and Evolutionary Computing

    Block building programming for symbolic regression

    Chen Chen, Changtong Luo, Zonglin Jiang
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Symbolic regression, which aims to detect an underlying data-driven model,
    has become increasingly important for industrial data analysis when the
    experimental model structure is unknown or wrong, or when the concerned
    system has changed. For most existing symbolic regression algorithms, such
    as genetic programming, convergence can be too slow for large-scale
    problems with many variables, and the situation only worsens as the
    problem size grows. This difficulty limits symbolic regression in
    practical applications. Fortunately, in many engineering problems the
    independent variables in the target models are separable or partially
    separable. This feature inspires us to develop a new approach, block
    building programming (BBP). BBP divides the original target model into
    several simple models and then optimizes them sequentially, which greatly
    reduces the search space. The partition into separable blocks is based on
    a purpose-built method, block and factor detection. Using two different
    optimization engines, experiments are conducted on a set of symbolic
    regression problems with separability. Numerical results show that BBP has
    a good capability for structure and coefficient optimization as well as
    better computational efficiency.

    Improving classification accuracy of feedforward neural networks for spiking neuromorphic chips

    Antonio Jimeno Yepes, Jianbin Tang, Benjamin Scott Mashford
    Comments: IJCAI-2017. arXiv admin note: text overlap with arXiv:1605.07740
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Deep Neural Networks (DNN) achieve human level performance in many image
    analytics tasks but DNNs are mostly deployed to GPU platforms that consume a
    considerable amount of power. New hardware platforms using lower precision
    arithmetic achieve drastic reductions in power consumption. More recently,
    brain-inspired spiking neuromorphic chips have achieved even lower power
    consumption, on the order of milliwatts, while still offering real-time
    processing.

    However, to deploy DNNs to energy-efficient neuromorphic chips, the
    incompatibility between the continuous neurons and synaptic weights of
    traditional DNNs and the discrete spiking neurons and synapses of
    neuromorphic chips needs to be overcome. Previous work has achieved this
    by training a network to learn continuous probabilities before deploying
    it to a neuromorphic architecture, such as the IBM TrueNorth Neurosynaptic
    System, by randomly sampling these probabilities.

    The main contribution of this paper is a new learning algorithm that
    learns a TrueNorth configuration ready for deployment. We achieve this by
    directly training a binary hardware crossbar that accommodates the
    TrueNorth axon configuration constraints, and we propose a different
    neuron model.

    Results of our approach trained on electroencephalogram (EEG) data show a
    significant improvement over previous work (86% vs. 76% accuracy) while
    maintaining state-of-the-art performance on the MNIST handwritten digit
    dataset.
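
    As a rough illustration of what directly training a binary crossbar can
    look like in the simplest setting, the Python sketch below trains a
    binary-weight linear layer with a straight-through estimator. It is a
    hedged toy example, not the authors' TrueNorth-specific algorithm or
    neuron model; the layer sizes, learning rate, and regression target are
    made up for illustration.

        import numpy as np

        rng = np.random.default_rng(0)

        # Real-valued "shadow" weights; the forward pass only ever sees their
        # signs, i.e. a binary crossbar in {-1, +1} (a simplification of a
        # 0/1 hardware crossbar).
        W = rng.normal(scale=0.1, size=(16, 8))   # 16 inputs -> 8 outputs (toy sizes)
        lr = 0.1

        def forward(x, W):
            Wb = np.sign(W)            # binarized crossbar used at run time
            return x @ Wb

        for step in range(200):
            x = rng.normal(size=(32, 16))      # toy batch
            target = x[:, :8]                  # toy regression target
            y = forward(x, W)
            grad_y = (y - target) / len(x)     # d(mean squared error)/dy
            # Straight-through estimator: backpropagate through sign() as if it
            # were the identity, so the gradient computed for the binary weights
            # updates the underlying real-valued weights.
            W -= lr * (x.T @ grad_y)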

    Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

    Xin Dong, Shangyu Chen, Sinno Jialin Pan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    How to develop slim and accurate deep neural networks has become crucial
    for real-world applications, especially for those employed in embedded systems.
    Though previous work along this research line has shown some promising results,
    most existing methods either fail to significantly compress a well-trained deep
    network or require a heavy retraining process for the pruned deep network to
    re-boost its prediction performance. In this paper, we propose a new layer-wise
    pruning method for deep neural networks. In our proposed method, parameters of
    each individual layer are pruned independently based on second order
    derivatives of a layer-wise error function with respect to the corresponding
    parameters. We prove that the final prediction performance drop after pruning
    is bounded by a linear combination of the reconstruction errors introduced
    at each layer. Therefore, there is a guarantee that one only needs to
    perform a light retraining process on the pruned network to recover its
    original prediction
    performance. We conduct extensive experiments on benchmark datasets to
    demonstrate the effectiveness of our pruning method compared with several
    state-of-the-art baseline methods.
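
    As a hedged sketch of pruning by second-order derivatives of a layer-wise
    error, the Python snippet below prunes a linear layer y = X @ W using a
    diagonal Hessian approximation of the reconstruction error and then refits
    the surviving weights by least squares (standing in for the light
    retraining). The diagonal approximation and the refit are simplifications
    for illustration, not the paper's exact procedure.

        import numpy as np

        def prune_layer_diag_obs(X, W, prune_frac=0.5):
            """Prune weights of a linear layer using a diagonal second-order
            saliency, then refit the surviving weights.

            The Hessian of the layer-wise error ||X W - Y||^2 is approximated
            by its diagonal, diag(2 * X^T X), giving the per-weight saliency
            w_ij^2 * H_ii.
            """
            Y = X @ W                                # layer output to preserve
            h_diag = 2.0 * (X ** 2).sum(axis=0)      # diagonal Hessian approximation
            saliency = (W ** 2) * h_diag[:, None]    # per-weight importance
            k = int(prune_frac * W.size)
            thresh = np.partition(saliency.ravel(), k)[k]
            mask = saliency > thresh                 # keep the most salient weights
            W_pruned = np.where(mask, W, 0.0)
            # Light refit: re-solve each output column by least squares over
            # its surviving input connections.
            for j in range(W.shape[1]):
                keep = mask[:, j]
                if keep.any():
                    W_pruned[keep, j], *_ = np.linalg.lstsq(X[:, keep], Y[:, j],
                                                            rcond=None)
            return W_pruned, mask

    For example, with X of shape (256, 64) and W of shape (64, 32), the call
    removes roughly half of the weights while keeping the layer output close
    to the original X @ W.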

    Parallel and in-process compilation of individuals for genetic programming on GPU

    Hakan Ayral, Songül Albayrak
    Comments: Submitted to PeerJ Computer Science
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Three approaches to implement genetic programming on GPU hardware are
    compilation, interpretation, and direct generation of machine code. The
    compiled approach is known to have a prohibitive overhead compared to the
    other two. This
    paper investigates methods to accelerate compilation of individuals for genetic
    programming on GPU hardware. We apply in-process compilation to minimize the
    compilation overhead at each generation; and we investigate ways to parallelize
    in-process compilation. In-process compilation doesn’t lend itself to trivial
    parallelization with threads; we propose a multiprocess parallelization using
    memory sharing and operating systems interprocess communication primitives.
    With parallelized compilation we achieve further reductions on compilation
    overhead. Another contribution of this work is the code framework we built in
    C# for the experiments. The framework makes it possible to build arbitrary
    grammatical genetic programming experiments that run on GPU with minimal extra
    coding effort, and is available as open source.

    Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks

    Roby Velez, Jeff Clune
    Subjects: Neural and Evolutionary Computing (cs.NE)

    A long-term goal of AI is to produce agents that can learn a diversity of
    skills throughout their lifetimes and continuously improve those skills via
    experience. A longstanding obstacle towards that goal is catastrophic
    forgetting, which occurs when learning new information erases previously learned
    information. Catastrophic forgetting occurs in artificial neural networks
    (ANNs), which have fueled most recent advances in AI. A recent paper proposed
    that catastrophic forgetting in ANNs can be reduced by promoting modularity,
    which can limit forgetting by isolating task information to specific clusters
    of nodes and connections (functional modules). While the prior work did show
    that modular ANNs suffered less from catastrophic forgetting, it was not able
    to produce ANNs that possessed task-specific functional modules, thereby
    leaving the main theory regarding modularity and forgetting untested. We
    introduce diffusion-based neuromodulation, which simulates the release of
    diffusing, neuromodulatory chemicals within an ANN that can modulate
    (i.e., up- or down-regulate) learning in a spatial region. On the simple diagnostic
    problem from the prior work, diffusion-based neuromodulation 1) induces
    task-specific learning in groups of nodes and connections (task-specific
    localized learning), which 2) produces functional modules for each subtask, and
    3) yields higher performance by eliminating catastrophic forgetting. Overall,
    our results suggest that diffusion-based neuromodulation promotes task-specific
    localized learning and functional modularity, which can help solve the
    challenging, but important problem of catastrophic forgetting.

    TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

    Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
    Comments: 9 pages
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    High network communication cost for synchronizing gradients and parameters is
    the well-known bottleneck of distributed training. In this work, we propose
    TernGrad that uses ternary gradients to accelerate distributed deep learning in
    data parallelism. Our approach requires only three numerical levels,
    {-1, 0, 1}, aggressively reducing the communication time. We mathematically prove
    the convergence of TernGrad under the assumption of a bound on gradients.
    Guided by the bound, we propose layer-wise ternarizing and gradient clipping to
    improve its convergence. Our experiments show that applying TernGrad on AlexNet
    does not incur any accuracy loss and can even improve accuracy. The accuracy
    loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a
    performance model is proposed to study the scalability of TernGrad. Experiments
    show significant speed gains for various deep neural networks.
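
    To make the idea concrete, here is a hedged numpy sketch of layer-wise
    ternarization with gradient clipping in the spirit of the abstract. The
    stochastic rounding that keeps the quantizer unbiased, and the choice of
    clipping threshold, are assumptions of this sketch rather than details
    stated above.

        import numpy as np

        def ternarize_gradient(grad, clip_sigma=2.5, seed=0):
            """Map a per-layer gradient tensor to the three levels {-s, 0, +s}."""
            rng = np.random.default_rng(seed)
            # Gradient clipping: limit outliers to a few standard deviations.
            sigma = grad.std()
            g = np.clip(grad, -clip_sigma * sigma, clip_sigma * sigma)
            # Layer-wise scale: the largest clipped magnitude in this layer.
            s = np.abs(g).max()
            if s == 0.0:
                return np.zeros_like(g)
            # Stochastic ternarization: P(nonzero) = |g| / s, so the quantized
            # gradient equals the clipped gradient in expectation.
            keep = rng.random(g.shape) < (np.abs(g) / s)
            return s * np.sign(g) * keep

    Only the ternary values and one scalar per layer need to be communicated
    between workers, which is where the bandwidth saving comes from.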

    An Out-of-the-box Full-network Embedding for Convolutional Neural Networks

    Dario Garcia-Gasulla, Armand Vilalta, Ferran Parés, Jonatan Moreno, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Transfer learning for feature extraction can be used to exploit deep
    representations in contexts where there is very little training data,
    where computational resources are limited, or when tuning the hyper-parameters needed
    for training is not an option. While previous contributions to feature
    extraction propose embeddings based on a single layer of the network, in this
    paper we propose a full-network embedding which successfully integrates
    convolutional and fully connected features, coming from all layers of a deep
    convolutional neural network. To do so, the embedding normalizes features in
    the context of the problem, and discretizes their values to reduce noise and
    regularize the embedding space. Significantly, this also reduces the
    computational cost of processing the resultant representations. The proposed
    method is shown to outperform single layer embeddings on several image
    classification tasks, while also being more robust to the choice of the
    pre-trained model used for obtaining the initial features. The performance gap
    in classification accuracy between thoroughly tuned solutions and the
    full-network embedding is also reduced, which makes the proposed approach
    a competitive solution for a large set of applications.
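
    As a hedged illustration of the normalize-and-discretize step, the sketch
    below standardizes each feature in the context of the target dataset and
    maps values to three levels with a symmetric threshold. The specific
    threshold and the three-level choice are illustrative assumptions, not
    details given in the abstract.

        import numpy as np

        def full_network_style_embedding(feats, thresh=0.25):
            """Normalize then discretize an (n_samples, n_features) matrix of
            activations gathered from all layers of a pre-trained network."""
            mu = feats.mean(axis=0)
            sd = feats.std(axis=0) + 1e-8
            z = (feats - mu) / sd                  # normalize in the problem's context
            out = np.zeros_like(z, dtype=np.int8)  # discretize to reduce noise
            out[z > thresh] = 1
            out[z < -thresh] = -1
            return out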

    Bayesian Belief Updating of Spatiotemporal Seizure Dynamics

    Gerald K Cooray, Richard Rosch, Torsten Baldeweg, Louis Lemieux, Karl Friston, Biswa Sengupta
    Subjects: Machine Learning (stat.ML); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)

    Epileptic seizure activity shows complicated dynamics in both space and time.
    To understand the evolution and propagation of seizures, spatially
    extended sets of data need to be analysed. This requires efficient methods of inference. We
    have previously described an efficient filtering scheme using variational
    Laplace that can be used in the Dynamic Causal Modelling (DCM) framework to
    estimate the temporal dynamics of seizures recorded using either invasive or
    non-invasive electrical recordings (EEG/ECoG). In this note, we present a
    method to analyse the spatiotemporal dynamics of seizures. Spatiotemporal
    dynamics are modelled using a partial differential equation – in contrast to
    the ordinary differential equation used in our previous work on temporal
    estimation of seizure dynamics (Cooray et al 2015). We provide the requisite
    theoretical background for the method and test the ensuing scheme on simulated
    seizure activity data and empirical invasive ECoG data. The method provides a
    framework to assimilate the spatial and temporal dynamics of seizure activity,
    an aspect of great physiological and clinical importance.

    Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

    Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    The Particle Swarm Optimization Policy (PSO-P) has been recently introduced
    and shown to produce remarkable results when interacting with academic
    reinforcement learning benchmarks in an off-policy, batch-based setting. To
    further investigate its properties and its feasibility for real-world applications,
    this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a
    novel reinforcement learning (RL) benchmark that aims at being realistic by
    including a variety of aspects found in industrial applications, like
    continuous state and action spaces, a high dimensional, partially observable
    state space, delayed effects, and complex stochasticity. The experimental
    results of PSO-P on IB are compared to results of closed-form control policies
    derived from the model-based Recurrent Control Neural Network (RCNN) and the
    model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not
    only of interest for academic benchmarks, but also for real-world industrial
    applications, since it also yielded the best performing policy in our IB
    setting. Compared to other well established RL techniques, PSO-P produced
    outstanding results in performance and robustness, requiring only a relatively
    low amount of effort in finding adequate parameters or making complex design
    decisions.

    How to Train Your DRAGAN

    Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks have emerged as an effective technique for
    estimating data distributions. The basic setup consists of two deep networks
    playing against each other in a zero-sum game setting. However, it is not
    understood whether the networks eventually reach an equilibrium and what
    dynamics make this possible. The current GAN training procedure, which involves
    simultaneous gradient descent, lacks a clear game-theoretic justification in
    the literature. In this paper, we introduce regret minimization as a technique
    to reach equilibrium in games and use this to motivate the use of simultaneous
    GD in GANs. In addition, we present a hypothesis that mode collapse, which is a
    common occurrence in GAN training, happens due to the existence of spurious
    local equilibria in non-convex games. Motivated by these insights, we develop
    an algorithm called DRAGAN that is fast, simple to implement and achieves
    competitive performance in a stable fashion across different architectures,
    datasets (MNIST, CIFAR-10, and CelebA), and divergence measures with almost no
    hyperparameter tuning.
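
    To make "simultaneous gradient descent" in a two-player zero-sum game
    concrete, here is a tiny Python sketch on the toy bilinear game
    min_x max_y x*y; the game, step size, and step count are illustrative
    choices, not anything from the paper. It shows why the dynamics of
    simultaneous updates are non-trivial: the iterates spiral away from the
    equilibrium at (0, 0) instead of converging to it.

        # Simultaneous gradient descent/ascent on f(x, y) = x * y,
        # where x is the minimizing player and y the maximizing player.
        x, y, lr = 1.0, 1.0, 0.1
        for step in range(10):
            gx, gy = y, x                        # df/dx = y, df/dy = x
            x, y = x - lr * gx, y + lr * gy      # both players update at once
            print(f"step {step}: x={x:+.3f}, y={y:+.3f}, f={x * y:+.3f}")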

    Espresso: Efficient Forward Propagation for BCNNs

    Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi
    Comments: 10 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There are many application scenarios in which the computational
    performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs)
    need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an
    effective way of achieving this objective. In this paper, we show how
    Convolutional Neural Networks (CNNs) can be implemented using binary
    representations. Espresso is a compact, yet powerful library written in C/CUDA
    that features all the functionalities required for the forward propagation of
    CNNs, in a binary file less than 400KB, without any external dependencies.
    Although it is mainly designed to take advantage of massive GPU parallelism,
    Espresso also provides an equivalent CPU implementation for CNNs. Espresso
    provides special convolutional and dense layers for BCNNs, leveraging
    bit-packing and bit-wise computations for efficient execution. These techniques
    provide a speed-up of matrix-multiplication routines, and at the same time,
    reduce memory usage when storing parameters and activations. We experimentally
    show that Espresso is significantly faster than existing implementations of
    optimized binary neural networks (approximately two orders of magnitude). Espresso is
    released under the Apache 2.0 license and is available at
    this http URL
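
    A hedged illustration of the bit-packing and bit-wise computation the
    abstract refers to: the sketch below packs sign vectors into bytes and
    computes their dot product with XOR plus a popcount, the standard trick
    used by binary-network libraries. It is illustrative Python, not
    Espresso's actual C/CUDA code.

        import numpy as np

        def pack_signs(v):
            """Pack the signs of a real-valued vector into a uint8 bit array."""
            return np.packbits((v >= 0).astype(np.uint8))

        def binary_dot(pa, pb, n):
            """Dot product of two {-1, +1} sign vectors of length n from packed
            bits. Agreements contribute +1 and disagreements -1, so
            dot = n - 2 * popcount(a XOR b); padding bits XOR to zero."""
            return n - 2 * int(np.unpackbits(np.bitwise_xor(pa, pb)).sum())

        a, b = np.random.randn(64), np.random.randn(64)
        sa, sb = np.where(a >= 0, 1, -1), np.where(b >= 0, 1, -1)
        assert binary_dot(pack_signs(a), pack_signs(b), a.size) == int((sa * sb).sum())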

    Local Information with Feedback Perturbation Suffices for Dictionary Learning in Neural Circuits

    Tsung-Han Lin
    Comments: 10 pages, 4 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    While the sparse coding principle can successfully model information
    processing in sensory neural systems, it remains unclear how learning can be
    accomplished under neural architectural constraints. Feasible learning rules
    must rely solely on synaptically local information in order to be implemented
    on spatially distributed neurons. We describe a neural network with spiking
    neurons that can address the aforementioned fundamental challenge and solve the
    L1-minimizing dictionary learning problem, representing the first model able to
    do so. Our major innovation is to introduce feedback synapses to create a
    pathway that turns the seemingly non-local information into local information. The
    resulting network encodes the error signal needed for learning as the change of
    network steady states caused by feedback, and operates akin to the classical
    stochastic gradient descent method.


    Computer Vision and Pattern Recognition

    Facial Affect Estimation in the Wild Using Deep Residual and Convolutional Networks

    Behzad Hasani, Mohammad H. Mahoor
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automated affective computing in the wild is a challenging task in the field
    of computer vision. This paper presents three neural network-based methods
    proposed for the task of facial affect estimation submitted to the First
    Affect-in-the-Wild challenge. These methods are based on Inception-ResNet
    modules redesigned specifically for the task of facial affect estimation. These
    methods are: Shallow Inception-ResNet, Deep Inception-ResNet, and
    Inception-ResNet with LSTMs. These networks extract facial features in
    different scales and simultaneously estimate both the valence and arousal in
    each frame. Root Mean Square Error (RMSE) rates of 0.4 and 0.3 are achieved for
    the valence and arousal respectively, with corresponding Concordance Correlation
    Coefficient (CCC) rates of 0.04 and 0.29, using the Deep Inception-ResNet method.

    Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks

    Behzad Hasani, Mohammad H. Mahoor
    Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Neural Networks (DNNs) have been shown to outperform traditional
    methods in various visual recognition tasks, including Facial Expression
    Recognition (FER). In spite of efforts made to improve the accuracy of FER
    systems using DNNs, existing methods are still not generalizable enough in
    practical applications. This paper proposes a 3D Convolutional Neural
    Network method for FER in videos. This new network architecture consists
    of 3D Inception-ResNet layers followed by an LSTM unit, which together
    extract the spatial relations within facial images as well as the temporal
    relations between different frames in the video. Facial landmark points
    are also used as inputs to our network, emphasizing the importance of
    facial components over facial regions that may not contribute
    significantly to generating facial expressions. Our proposed
    method is evaluated using four publicly available databases in
    subject-independent and cross-database tasks and outperforms state-of-the-art
    methods.

    DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels

    Paul Guerrero, Holger Winnemöller, Wilmot Li, Niloy J. Mitra
    Comments: 12 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the context of scene understanding, a variety of methods exist to
    estimate different information channels from mono or stereo images, including
    disparity, depth, and normals. Although several advances have been reported
    for these tasks in recent years, the estimated information is often imprecise,
    particularly near depth discontinuities or creases. Studies have however shown
    that precisely such depth edges carry critical cues for the perception of
    shape, and play important roles in tasks like depth-based segmentation or
    foreground selection. Unfortunately, the currently extracted channels often
    carry conflicting signals, making it difficult for subsequent applications to
    effectively use them. In this paper, we focus on the problem of obtaining
    high-precision depth edges (i.e., depth contours and creases) by jointly
    analyzing such unreliable information channels. We propose DepthCut, a
    data-driven fusion of the channels using a convolutional neural network trained
    on a large dataset with known depth. The resulting depth edges can be used for
    segmentation, decomposing a scene into depth layers with relatively flat depth,
    or improving the accuracy of the depth estimate near depth edges by
    constraining its gradients to agree with these edges. Quantitatively, we
    compare against 15 variants of baselines and demonstrate that our depth edges
    result in an improved segmentation performance and an improved depth estimate
    near depth edges compared to data-agnostic channel fusion. Qualitatively, we
    demonstrate that the depth edges result in superior segmentation and depth
    orderings.

    Regularizing deep networks using efficient layerwise adversarial training

    Swami Sankaranarayanan, Arpit Jain, Rama Chellappa, Ser Nam Lim
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Adversarial training has been shown to regularize deep neural networks in
    addition to increasing their robustness to adversarial examples. However, its
    impact on very deep state-of-the-art networks has not been fully investigated.
    In this paper, we present an efficient approach to perform adversarial training
    by perturbing intermediate layer activations and study the use of such
    perturbations as a regularizer during training. We use these perturbations to
    train very deep models such as ResNets and show improvement in performance both
    on adversarial and original test data. Our experiments highlight the benefits
    of perturbing intermediate layer activations compared to perturbing only the
    inputs. The results on CIFAR-10 and CIFAR-100 datasets show the merits of the
    proposed adversarial training approach. Additional results on WideResNets show
    that our approach provides significant improvement in classification accuracy
    for a given base model, outperforming dropout and other base models of larger
    size.

    TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

    Li Ding, Chenliang Xu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Action segmentation, a milestone towards building automatic systems that
    understand untrimmed videos, has received considerable attention in recent
    years. It is typically modeled as a sequence labeling problem, but it
    differs in essential ways from text parsing or speech
    processing. In this paper, we introduce a novel hybrid temporal convolutional
    and recurrent network (TricorNet), which has an encoder-decoder architecture:
    the encoder consists of a hierarchy of temporal convolutional kernels that
    capture the local motion changes of different actions; the decoder is a
    hierarchy of recurrent neural networks that are able to learn and memorize
    long-term action dependencies after the encoding stage. Our model is simple but
    extremely effective in terms of video sequence labeling. The experimental
    results on three public action segmentation datasets have shown that the
    proposed model achieves superior performance over the state of the art.

    Robust Localized Multi-view Subspace Clustering

    Yanbo Fan, Jian Liang, Ran He, Bao-Gang Hu, Siwei Lyu
    Comments: 7 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In multi-view clustering, different views may have different confidence
    levels when learning a consensus representation. Existing methods usually
    address this by assigning distinctive weights to different views. However,
    due to the noisy nature of real-world applications, the confidence levels
    of samples in the same view may also vary. Thus, considering a unified weight for a view may
    lead to suboptimal solutions. In this paper, we propose a novel localized
    multi-view subspace clustering model that considers the confidence levels of
    both views and samples. By assigning weight to each sample under each view
    properly, we can obtain a robust consensus representation via fusing the
    noiseless structures among views and samples. We further develop a regularizer
    on weight parameters based on convex conjugacy theory, and sample weights
    are determined in an adaptive manner. An efficient iterative algorithm is
    developed with a convergence guarantee. Experimental results on four benchmarks
    demonstrate the correctness and effectiveness of the proposed model.

    Convolutional Networks with MuxOut Layers as Multi-rate Systems for Image Upscaling

    Pablo Navarrete Michelini, Hanwen Liu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We interpret convolutional networks as adaptive filters and combine them with
    so-called MuxOut layers to efficiently upscale low resolution images. We
    formalize this interpretation by deriving a linear and space-variant structure
    of a convolutional network when its activations are fixed. We introduce general
    purpose algorithms to analyze a network and show its overall filter effect for
    each given location. We use this analysis to evaluate two types of image
    upscalers: first, deterministic upscalers that target the recovery of
    details from the original content; and second, a new generation of upscalers that can sample the
    distribution of upscale aliases (images that share the same downscale version)
    that look like real content.

    Learning to Associate Words and Images Using a Large-scale Graph

    Heqing Ya, Haonan Sun, Jeffrey Helt, Tai Sing Lee
    Comments: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We develop an approach for unsupervised learning of associations between
    co-occurring perceptual events using a large graph. We applied this approach to
    successfully solve the image captcha of China’s railroad system. The approach
    is based on the principle of suspicious coincidence. In this particular
    problem, a user is presented with a deformed picture of a Chinese phrase and
    eight low-resolution images. They must quickly select the relevant images in
    order to purchase their train tickets. This problem presents several
    challenges: (1) the teaching labels for both the Chinese phrases and the images
    were not available for supervised learning, (2) no pre-trained deep
    convolutional neural networks are available for recognizing these Chinese
    phrases or the presented images, and (3) each captcha must be solved within a
    few seconds. We collected 2.6 million captchas, with 2.6 million deformed
    Chinese phrases and over 21 million images. From these data, we constructed an
    association graph, composed of over 6 million vertices, and linked these
    vertices based on co-occurrence information and feature similarity between
    pairs of images. We then trained a deep convolutional neural network to learn a
    projection of the Chinese phrases onto a 230-dimensional latent space. Using
    label propagation, we computed the likelihood of each of the eight images
    conditioned on the latent space projection of the deformed phrase for each
    captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on
    average. Our work, in answering this practical challenge, illustrates the power
    of this class of unsupervised association learning techniques, which may be
    related to the brain’s general strategy for associating language stimuli with
    visual objects on the principle of suspicious coincidence.

    Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

    Joao Carreira, Andrew Zisserman
    Comments: To appear at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The paucity of videos in current action classification datasets (UCF-101 and
    HMDB-51) has made it difficult to identify good video architectures, as most
    methods obtain similar performance on existing small-scale benchmarks. This
    paper re-evaluates state-of-the-art architectures in light of the new Kinetics
    Human Action Video dataset. Kinetics has two orders of magnitude more data,
    with 400 human action classes and over 400 clips per class, and is collected
    from realistic, challenging YouTube videos. We provide an analysis on how
    current architectures fare on the task of action classification on this dataset
    and how much performance improves on the smaller benchmark datasets after
    pre-training on Kinetics.

    We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on
    2D ConvNet inflation: filters and pooling kernels of very deep image
    classification ConvNets are expanded into 3D, making it possible to learn
    seamless spatio-temporal feature extractors from video while leveraging
    successful ImageNet architecture designs and even their parameters. We show
    that, after pre-training on Kinetics, I3D models considerably improve upon the
    state-of-the-art in action classification, reaching 80.7% on HMDB-51 and 98.0%
    on UCF-101.
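
    As a hedged sketch of the 2D-to-3D inflation idea, the snippet below tiles
    a 2D convolution kernel along the time axis; dividing by the temporal
    depth (so a temporally constant video produces the same response as the
    original image) is a common choice assumed here, not a detail stated in
    the abstract.

        import numpy as np

        def inflate_2d_filter(w2d, time_depth):
            """Inflate a 2D conv filter bank (out, in, kh, kw) into a 3D one
            (out, in, t, kh, kw) by repeating it along time and rescaling."""
            w3d = np.repeat(w2d[:, :, None, :, :], time_depth, axis=2)
            return w3d / float(time_depth)

        w2d = np.random.randn(64, 3, 7, 7)   # e.g. a first-layer 7x7 RGB filter bank
        print(inflate_2d_filter(w2d, time_depth=7).shape)   # (64, 3, 7, 7, 7)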

    Semantic Softmax Loss for Zero-Shot Learning

    Zhong Ji, Yunxin Sun, Yulong Yu, Jichang Guo, Yanwei Pang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A typical pipeline for Zero-Shot Learning (ZSL) is to integrate the visual
    features and the class semantic descriptors into a multimodal framework with a
    linear or bilinear model. However, the visual features and the class
    semantic descriptors lie in different structural spaces, so a linear or
    bilinear model cannot capture the semantic interactions between the
    modalities well. In this letter, we propose a nonlinear approach that
    casts ZSL as a multi-class classification problem via a Semantic Softmax
    Loss, embedding the class semantic descriptors into the softmax layer of a
    multi-class classification network. To narrow the structural differences
    between the visual features and the semantic descriptors, we further apply
    an L2 normalization constraint to the differences between the visual
    features and the visual prototypes reconstructed from the semantic
    descriptors. Results on three benchmark datasets, i.e., AwA, CUB, and SUN,
    demonstrate that the proposed approach steadily boosts performance and
    achieves state-of-the-art performance for both zero-shot classification
    and zero-shot retrieval.
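
    A hedged sketch of what embedding class semantic descriptors into the
    softmax layer can look like: class scores come from matching a projected
    visual feature against each class's semantic descriptor, so unseen classes
    can be scored from their descriptors alone. The array sizes and the single
    linear projection are illustrative assumptions, not the paper's exact
    architecture.

        import numpy as np

        rng = np.random.default_rng(0)
        d_vis, d_sem, n_classes = 512, 85, 40      # toy sizes (AwA-style attributes)
        S = rng.normal(size=(n_classes, d_sem))    # class semantic descriptors (given)
        W = rng.normal(scale=0.01, size=(d_vis, d_sem))  # projection learned elsewhere

        def semantic_softmax(visual_feat):
            """Score every class by comparing the projected visual feature with
            its semantic descriptor, then apply a softmax. Unseen classes can
            be added simply by appending rows to S."""
            logits = S @ (W.T @ visual_feat)       # one logit per class
            z = logits - logits.max()              # numerically stable softmax
            p = np.exp(z)
            return p / p.sum()

        probs = semantic_softmax(rng.normal(size=d_vis))
        print(probs.argmax(), probs.max())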

    Dynamics Based 3D Skeletal Hand Tracking

    Stan Melax, Leonid Keselman, Sterling Orsten
    Comments: Published in Graphics Interface 2013
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Tracking the full skeletal pose of the hands and fingers is a challenging
    problem that has a plethora of applications for user interaction. Existing
    techniques either require wearable hardware, add restrictions to user pose, or
    require significant computation resources. This research explores a new
    approach to tracking hands, or any articulated model, by using an augmented
    rigid body simulation. This allows us to phrase 3D object tracking as a linear
    complementarity problem with a well-defined solution. Based on a depth sensor’s
    samples, the system generates constraints that limit motion orthogonal to the
    rigid body model’s surface. These constraints, along with prior motion,
    collision/contact constraints, and joint mechanics, are resolved with a
    projected Gauss-Seidel solver. Due to camera noise properties and attachment
    errors, the numerous surface constraints are impulse capped to avoid
    overpowering mechanical constraints. To improve tracking accuracy, multiple
    simulations are spawned at each frame and fed a variety of heuristics,
    constraints and poses. A 3D error metric selects the best-fit simulation,
    helping the system handle challenging hand motions. Such an approach enables
    real-time, robust, and accurate 3D skeletal tracking of a user’s hand on a
    variety of depth cameras, while only utilizing a single x86 CPU core for
    processing.

    Computer vision-based food calorie estimation: dataset, method, and experiment

    Yanchao Liang, Jianhua Li
    Comments: 5 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Computer vision has been introduced to estimate calories from food images.
    However, current food image data sets don't contain the volume and quality
    information of foods, which leads to incomplete calorie estimation. In this
    paper, we present a novel food image data set with volume information and
    quality information of foods, and a deep learning method for food detection, to
    make a complete calorie estimation. Our data set includes 2978 images, and
    every image contains each food's corresponding annotation, volume, and
    quality information, as well as a calibration reference. To estimate the
    calories of the food in the proposed data set, a deep learning method
    using Faster R-CNN is first put forward to detect the food. The
    experimental results show that our method is effective for estimating
    calories and that our data set contains adequate information for calorie
    estimation. Our data set is the first released food
    image data set which can be used to evaluate computer vision-based calorie
    estimation methods.

    View-Invariant Recognition of Action Style Self-Dissimilarity

    Yuping Shen, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Self-similarity was recently introduced as a measure of inter-class
    congruence for classification of actions. Herein, we investigate the dual
    problem of intra-class dissimilarity for classification of action styles. We
    introduce self-dissimilarity matrices that discriminate between same actions
    performed by different subjects regardless of viewing direction and camera
    parameters. We investigate two frameworks using these invariant style
    dissimilarity measures based on Principal Component Analysis (PCA) and Fisher
    Discriminant Analysis (FDA). Extensive experiments performed on IXMAS dataset
    indicate remarkably good discriminant characteristics for the proposed
    invariant measures for gender recognition from video data.

    Learning Robust Object Recognition Using Composed Scenes from Generative Models

    Hao Wang, Xingyu Lin, Yimeng Zhang, Tai Sing Lee
    Comments: Accepted by 14th Conference on Computer and Robot Vision
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recurrent feedback connections in the mammalian visual system have been
    hypothesized to play a role in synthesizing input in the theoretical framework
    of analysis by synthesis. The comparison of internally synthesized
    representation with that of the input provides a validation mechanism during
    perceptual inference and learning. Inspired by these ideas, we propose that
    the synthesis machinery can compose new, unobserved images by imagination to
    train the network itself so as to increase the robustness of the system in
    novel scenarios. As a proof of concept, we investigated whether images composed
    by imagination could help an object recognition system to deal with occlusion,
    which is challenging for the current state-of-the-art deep convolutional neural
    networks. We fine-tuned a network on images containing objects in various
    occlusion scenarios that are imagined or self-generated through a deep
    generator network. Trained on imagined occluded scenarios under the object
    persistence constraint, our network discovered more subtle and localized image
    features that were neglected by the original network for object classification,
    obtaining better separability of different object classes in the feature space.
    This leads to significant improvement of object recognition under occlusion for
    our network relative to the original network trained only on un-occluded
    images. In addition to providing practical benefits in object recognition under
    occlusion, this work demonstrates that self-generated composition of
    visual scenes through the synthesis loop, combined with the object
    persistence constraint, can provide opportunities for neural networks to
    discover new relevant patterns in the data and to become more flexible in dealing with novel
    situations.

    Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network

    Yancong Wei, Qiangqiang Yuan, Huanfeng Shen, Liangpei Zhang
    Comments: 5 pages,5 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In the field of fusing multi-spectral and panchromatic images
    (pan-sharpening), deep neural networks have recently been employed to
    overcome the drawbacks of traditional linear models and boost fusion
    accuracy. However, to the best of our knowledge, existing research works
    are mainly based on simple and flat networks with relatively shallow
    architectures, which severely limits their performance. In this paper, the
    concept of residual learning is introduced to form a very deep
    convolutional neural network that makes full use of the high non-linearity
    of deep learning models. Both quantitative and visual assessments on a
    large number of high-quality multi-spectral images from various sources
    support that our proposed model is superior to all mainstream algorithms
    included in the comparison and achieves the highest spatial-spectral
    unified accuracy.

    Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks

    Hye-Rin Kim, Seon Joo Kim, In-Kwon Lee
    Comments: 11 pages, 15 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

    An image is a very effective tool for conveying emotions. Many researchers
    have investigated computing image emotions using various features
    extracted from images. In this paper, we focus on two high level features, the
    object and the background, and assume that the semantic information of images
    is a good cue for emotion prediction. An object is one of the most important
    elements that define an image, and we find out through experiments that there
    is a high correlation between the object and the emotion in images. Even with
    the same object, there may be slight differences in emotion due to different
    backgrounds, and we use the semantic information of the background to improve
    the prediction performance. By combining the different levels of features, we
    build an emotion based feed forward deep neural network which produces the
    emotion values of a given image. The output emotion values in our framework are
    continuous values in the 2-dimensional space (Valence and Arousal), which are
    more effective than using a small number of emotion categories for describing
    emotions. Experiments confirm the effectiveness of our network in predicting
    the emotion of images.

    Classification and Retrieval of Digital Pathology Scans: A New Dataset

    Morteza Babaie, Shivam Kalra, Aditya Sriram, Christopher Mitcheltree, Shujin Zhu, Amin Khatami, Shahryar Rahnamayan, H.R. Tizhoosh
    Comments: Accepted for presentation at Workshop for Computer Vision for Microscopy Image Analysis (CVMI 2017) @ CVPR 2017, Honolulu, Hawaii
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we introduce a new dataset, Kimia Path24, for image
    classification and retrieval in digital pathology. We use the whole-scan
    images of 24 different tissue textures to generate 1,325 test patches of
    size 1000×1000 pixels (0.5mm×0.5mm). Training data can be generated
    according to the preferences of the algorithm designer and can range from
    approximately 27,000 to over 50,000 patches if the preset parameters are
    adopted. We propose a compound patch-and-scan accuracy measurement that
    makes achieving high accuracies quite challenging. In addition, we set the
    benchmark line by applying LBP, a dictionary approach, and convolutional
    neural nets (CNNs), and report their results. The highest accuracy was
    41.80% for the CNN.

    Image Segmentation by Iterative Inference from Conditional Score Estimation

    Adriana Romero, Michal Drozdzal, Akram Erraqabi, Simon Jégou, Yoshua Bengio
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Inspired by the combination of feedforward and iterative computations in the
    visual cortex, and taking advantage of the ability of denoising autoencoders
    to estimate the score of a joint distribution, we propose a novel approach to
    iterative inference for capturing and exploiting the complex joint distribution
    of output variables conditioned on some input variables. This approach is
    applied to image pixel-wise segmentation, with the estimated conditional score
    used to perform gradient ascent towards a mode of the estimated conditional
    distribution. This extends previous work on score estimation by denoising
    autoencoders to the case of a conditional distribution, with a novel use of a
    corrupted feedforward predictor replacing Gaussian corruption. An advantage of
    this approach over more classical ways to perform iterative inference for
    structured outputs, like conditional random fields (CRFs), is that it is
    no longer necessary to define an explicit energy function linking the output
    variables. To keep computations tractable, such energy function
    parametrizations are typically fairly constrained, involving only a few
    neighbors of each of the output variables in each clique. We experimentally
    find that the proposed iterative inference from conditional score estimation by
    conditional denoising autoencoders performs better than comparable models based
    on CRFs or those not using any explicit modeling of the conditional joint
    distribution of outputs.

    The Do's and Don'ts for CNN-based Face Verification

    Ankan Bansal, Carlos Castillo, Rajeev Ranjan, Rama Chellappa
    Comments: 10 pages including references
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have become the most sought-after
    tools for addressing object recognition problems. Specifically, they have
    produced state-of-the-art results for unconstrained face recognition and verification
    tasks. While the research community appears to have developed a consensus on
    the methods of acquiring annotated data, design and training of CNNs, many
    questions still remain to be answered. In this paper, we explore the following
    questions that are critical to face recognition research: (i) Can we train on
    still images and expect the systems to work on videos? (ii) Are deeper datasets
    better than wider datasets? (iii) Does adding label noise lead to improvement
    in performance of deep networks? (iv) Is alignment needed for face recognition?
    We address these questions by training CNNs using CASIA-WebFace, UMDFaces, and
    a new video dataset and testing on YouTubeFaces, IJBA and a disjoint portion of
    UMDFaces datasets. Our new data set, which will be made publicly available, has
    22,075 videos and 3,735,476 human annotated frames extracted from them.

    Generative Partition Networks for Multi-Person Pose Estimation

    Xuecheng Nie, Jiashi Feng, Junliang Xing, Shuicheng Yan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a new framework, named Generative Partition Network
    (GPN), for addressing the challenging multi-person pose estimation problem.
    Different from existing pure top-down and bottom-up solutions, the proposed GPN
    models the multi-person partition detection as a generative process from joint
    candidates and infers joint configurations for person instances from each
    person partition locally, resulting in both low joint detection and joint
    partition complexities. In particular, GPN designs a generative model based on
    the Generalized Hough Transform framework to detect person partitions via votes
    from joint candidates in the Hough space, parameterized by centroids of
    persons. Such a generative model produces joint candidates and their
    corresponding person partitions by performing only one pass of joint detection.
    In addition, GPN formulates the inference procedure for joint configurations of
    human poses as a graph partition problem and optimizes it locally. Inspired by
    recent success of deep learning techniques for human pose estimation, GPN
    designs a multi-stage convolutional neural network with feature pyramid branch
    to jointly learn joint confidence maps and Hough transformation maps. Extensive
    experiments on two benchmarks demonstrate the efficiency and effectiveness of
    the proposed GPN.

    Structured Image Classification from Conditional Random Field with Deep Class Embedding

    Eran Goldman, Jacob Goldberger
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel deep learning architecture to classify structured
    objects in datasets with a large number of visually similar categories. Our
    model extends the CRF objective function to a nonlinear form, by factorizing
    the pairwise potential matrix, to learn neighboring-class embedding. The
    embedding and the classifier are jointly trained to optimize this highly
    nonlinear CRF objective function. The non-linear model is trained on
    object-level samples, which is much faster and more accurate than the standard
    sequence-level training of the linear model. This model overcomes the
    difficulty existing CRF methods have in thoroughly learning contextual
    relationships when there is a large number of classes and the data is sparse. The
    performance of the proposed method is illustrated on a huge dataset that
    contains images of retail-store product displays, taken in varying settings and
    viewpoints, and shows significantly improved results compared to linear CRF
    modeling and sequence-level training.

    CrossNets : A New Approach to Complex Learning

    Chirag Agarwal, Mehdi Sharifzhadeh, Joe Klobusicky, Dan Schonfeld
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel neural network structure called CrossNets, which considers
    architectures on directed acyclic graphs. This structure builds on previous
    generalizations of feed forward models, such as ResNets, by allowing for all
    forward cross connections between layers (both adjacent and non-adjacent). The
    addition of cross connections among the network increases information flow
    across the whole network, leading to better training and testing performances.
    The superior performance of the network is demonstrated on four benchmark
    datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof
    of convergence of CrossNets to a local minimum of the error, where the
    connection weights are chosen through backpropagation with momentum.

    DeepMasterPrint: Generating Fingerprints for Presentation Attacks

    Philip Bontrager, Julian Togelius, Nasir Memon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)

    We present two related methods for creating MasterPrints, synthetic
    fingerprints that a fingerprint verification system identifies as
    belonging to many different people. Both methods start by training a Generative Adversarial
    Network (GAN) on a set of real fingerprint images. The generator network is
    then used to search for images that can be recognized as multiple individuals.
    The first method uses evolutionary optimization in the space of latent
    variables, and the second uses gradient-based search. Our method is able to
    design a MasterPrint that a commercial fingerprint system matches to 22% of all
    users in a strict security setting, and 75% of all users at a looser security
    setting.

    Incorporating Depth into both CNN and CRF for Indoor Semantic Segmentation

    Jindong Jiang, Zhijun Zhang, Yongqian Huang, Lunan Zheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    To improve segmentation performance, a novel neural network architecture
    (termed DFCN-DCRF) is proposed, which combines an RGB-D fully convolutional
    neural network (DFCN) with a depth-sensitive fully-connected conditional random
    field (DCRF). First, a DFCN architecture which fuses depth information into the
    early layers and applies dilated convolution for later contextual reasoning is
    designed. Then, a depth-sensitive fully-connected conditional random field
    (DCRF) is proposed and combined with the previous DFCN to refine the
    preliminary result. Comparative experiments show that the proposed DFCN-DCRF
    has the best performance compared with most state-of-the-art methods.

    Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning

    Reza Abbasi-Asl, Bin Yu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have state-of-the-art performance on
    many problems in machine vision. However, networks with superior performance
    often have millions of weights, making it difficult or impossible to use
    CNNs on computationally limited devices or for humans to interpret them. A myriad
    of CNN compression approaches have been proposed and they involve pruning and
    compressing the weights and filters. In this article, we introduce a greedy
    structural compression scheme that prunes filters in a trained CNN. We define a
    filter importance index equal to the classification accuracy reduction (CAR) of
    the network after pruning that filter (similarly defined as RAR for
    regression). We then iteratively prune filters based on the CAR index. This
    algorithm achieves substantially higher classification accuracy in AlexNet
    compared to other structural compression schemes that prune filters. Pruning
    half of the filters in the first or second layer of AlexNet, our CAR algorithm
    achieves 26% and 20% higher classification accuracies respectively, compared to
    the best benchmark filter pruning scheme. Our CAR algorithm, combined with
    further weight pruning and compression, reduces the size of the first or
    second convolutional layer in AlexNet by a factor of 42 while achieving
    classification accuracy close to the original through retraining (or
    fine-tuning) the network. Finally, we demonstrate the interpretability of
    CAR-compressed CNNs by showing that our algorithm prunes filters with
    visually redundant functionalities. In fact, out of the top 20 CAR-pruned
    filters in AlexNet, 17 of them in the first
    layer and 14 of them in the second layer are color-selective filters as opposed
    to shape-selective filters. To our knowledge, this is the first reported result
    on the connection between compression and interpretability of CNNs.
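
    The greedy CAR loop itself is easy to state in code. The sketch below is a
    hedged illustration: evaluate stands for any function returning validation
    accuracy with a given set of filters zeroed out, and the toy example at
    the end replaces a real trained CNN layer.

        from typing import Callable, List, Set

        def greedy_car_prune(n_filters: int,
                             evaluate: Callable[[Set[int]], float],
                             n_prune: int) -> List[int]:
            """Greedily prune the filters whose removal hurts accuracy least.

            A filter's importance is its classification accuracy reduction
            (CAR): baseline accuracy minus accuracy with that filter (plus the
            already pruned ones) removed. Each step prunes the smallest-CAR
            filter.
            """
            pruned: Set[int] = set()
            order: List[int] = []
            for _ in range(n_prune):
                baseline = evaluate(pruned)
                best = min((f for f in range(n_filters) if f not in pruned),
                           key=lambda f: baseline - evaluate(pruned | {f}))
                pruned.add(best)
                order.append(best)
            return order

        def toy_eval(removed: Set[int]) -> float:
            # filters 0-3 are redundant; filters 4-7 each contribute 0.1 accuracy
            return 0.9 - 0.1 * sum(1 for f in removed if f >= 4)

        print(greedy_car_prune(8, toy_eval, 4))   # prunes four filters from {0,...,3}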

    Phase-Shifting Separable Haar Wavelets and Applications

    Mais Alnasser, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a new approach for tackling the shift-invariance problem
    in the discrete Haar domain, without trading off any of its desirable
    properties, such as compression, separability, orthogonality, and symmetry. The
    paper presents several key theoretical contributions. First, we derive closed
    form expressions for phase shifting in the Haar domain both in partially
    decimated and fully decimated transforms. Second, it is shown that the wavelet
    coefficients of the shifted signal can be computed solely by using the
    coefficients of the original transformed signal. Third, we derive closed-form
    expressions for non-integer shifts, which have not been previously reported in
    the literature. Fourth, we establish the complexity of the proposed phase
    shifting approach using the derived analytic expressions. As an application
    example of these results, we apply the new formulae to image rotation and
    interpolation, and evaluate its performance against standard methods.

    Critical Contours: An Invariant Linking Image Flow with Salient Surface Organization

    Benjamin S. Kunsberg, Steven W. Zucker
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We exploit a key result from visual psychophysics — that individuals
    perceive shape qualitatively — to develop a geometrical/topological invariant
    (the Morse-Smale complex) relating image structure with surface structure.
    Differences across individuals are minimal near certain configurations such as
    ridges and boundaries, and it is these configurations that are often
    represented in line drawings. In particular, we introduce a method for
    inferring qualitative 3D shape from shading patterns that link the
    shape-from-shading inference with shape-from-contour. For a given shape,
    certain shading patches become “line drawings” in a well-defined limit. Under
    this limit, and invariantly, these shading patterns provide a topological
    description of the surface. We further show that, under this model, the
    contours partition the surface into meaningful parts using the Morse-Smale
    complex. Critical contours are the (perceptually) stable parts of this complex
    and are invariant over a wide class of rendering models. Intuitively, our main
    result shows that critical contours partition smooth surfaces into bumps and
    valleys, in effect providing a scaffold on the image from which a full surface
    can be interpolated.

    Forecasting Hand and Object Locations in Future Frames

    Chenyou Fan, Jangwon Lee, Michael S. Ryoo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an approach to forecast future locations of human hands
    and objects. Given an image frame, the goal is to predict the presence and location
    of hands and objects in a future frame (e.g., 5 seconds later), even when
    they are not visible in the current frame. The key idea is that (1) an
    intermediate representation of a convolutional object recognition model
    abstracts scene information in its frame and that (2) we can predict (i.e.,
    regress) such representations corresponding to the future frames based on that
    of the current frame. We design a new two-stream convolutional neural network
    (CNN) architecture for videos by extending the state-of-the-art convolutional
    object detection network, and present a new fully convolutional regression
    network for predicting future scene representations. Our experiments confirm
    that combining the regressed future representation with our detection network
    allows reliable estimation of future hands and objects in videos.

    Gaze Distribution Analysis and Saliency Prediction Across Age Groups

    Onkar Krishna, Kiyoharu Aizawa, Andrea Helo, Rama Pia
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Knowledge of the human visual system helps to develop better computational
    models of visual attention. State-of-the-art models have been developed to
    mimic the visual attention system of young adults that, however, largely ignore
    the variations that occur with age. In this paper, we investigated how visual
    scene processing changes with age and we propose an age-adapted framework that
    helps to develop a computational model that can predict saliency across
    different age groups. Our analysis uncovers how the explorativeness of an
    observer varies with age, how well saliency maps of an age group agree with
    fixation points of observers from the same or different age groups, and how age
    influences the center bias. We analyzed the eye movement behavior of 82
    observers belonging to four age groups while they explored visual scenes.
    Explorativeness was quantified in terms of the entropy of a saliency map, and
    the area under the curve (AUC) metric was used to quantify the agreement and
    the center bias. These results were used to develop age-adapted saliency
    models. Our results suggest that the proposed age-adapted saliency model
    outperforms existing saliency models in predicting the regions of interest
    across age groups.
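
    As a concrete illustration of the explorativeness measure used above (the entropy of
    a saliency map), here is a minimal sketch; the normalization and the toy maps are
    illustrative assumptions, not the authors' pipeline.

    import numpy as np

    def saliency_entropy(saliency, eps=1e-12):
        """Shannon entropy (in bits) of a saliency map treated as a 2D distribution.

        Higher entropy means fixations are spread more evenly over the scene,
        which is how explorativeness is quantified here (illustrative sketch only).
        """
        s = np.asarray(saliency, dtype=np.float64)
        p = s / (s.sum() + eps)          # normalize to a probability distribution
        p = p[p > 0]                     # ignore zero-probability cells
        return float(-(p * np.log2(p)).sum())

    # Example: a flat map is maximally explorative, a single-peak map is not.
    flat = np.ones((32, 32))
    peaked = np.zeros((32, 32)); peaked[16, 16] = 1.0
    print(saliency_entropy(flat))    # ~10 bits (log2 of 1024 cells)
    print(saliency_entropy(peaked))  # 0 bits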

    Non-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency Lighting

    Mais Alnasser, Hassan Foroosh
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper focuses on real-time all-frequency image-based rendering using an
    innovative solution for run-time computation of light transport. The approach
    is based on new results derived for non-linear phase shifting in the Haar
    wavelet domain. Although image-based methods for real-time rendering of dynamic
    glossy objects have been proposed, they do not truly scale to all possible
    frequencies and high sampling rates without trading storage, glossiness, or
    computational time, while varying both lighting and viewpoint. This is due to
    the fact that current approaches are limited to precomputed radiance transfer
    (PRT), which is prohibitively expensive in terms of memory requirements and
    real-time rendering when both varying light and viewpoint changes are required
    together with high sampling rates for high frequency lighting of glossy
    material. On the other hand, current methods cannot handle object rotation,
    which is one of the paramount issues for all PRT methods using wavelets. This
    latter problem arises because the precomputed data are defined in a global
    coordinate system and encoded in the wavelet domain, while the object is
    rotated in a local coordinate system. At the root of all the above problems is
    the lack of an efficient run-time solution to the nontrivial problem of rotating
    wavelets (a non-linear phase shift), which we solve in this paper.

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Shu Kong, Charless Fowlkes
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Objects may appear at arbitrary scales in perspective images of a scene,
    posing a challenge for recognition systems that process an image at a fixed
    resolution. We propose a depth-aware gating module that adaptively chooses the
    pooling field size in a convolutional network architecture according to the
    object scale (inversely proportional to the depth) so that small details can be
    preserved for objects at distance and a larger receptive field can be used for
    objects nearer to the camera. The depth gating signal is provided from stereo
    disparity (when available) or estimated directly from a single image. We
    integrate this depth-aware gating into a recurrent convolutional neural network
    trained in an end-to-end fashion to perform semantic segmentation. Our
    recurrent module iteratively refines the segmentation results, leveraging the
    depth estimate and output prediction from the previous loop. Through extensive
    experiments on three popular large-scale RGB-D datasets, we demonstrate our
    approach achieves competitive semantic segmentation performance using a more
    compact model than existing methods. Interestingly, we find that segmentation
    performance improves when we estimate depth directly from the image rather than
    using “ground-truth”, and the model produces state-of-the-art results for
    quantitative depth estimation from a single image.

    Quadruplet Network with One-Shot Learning for Visual Tracking

    Xingping Dong, Jianbing Shen, Fatih Porikli
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    As a discriminative method for one-shot learning, the Siamese deep network
    allows recognizing an object from a single exemplar of the same class label.
    However, it does not take advantage of the underlying structure and
    relationships among a multitude of instances, since it only relies on pairs of
    instances for training. In this paper, we propose a quadruplet deep network to
    examine the potential connections among the training instances, aiming to
    achieve a more powerful representation. We design four shared networks that
    receive multi-tuple of instances as inputs and are connected by a novel loss
    function consisting of pair-loss and triplet-loss. According to the similarity
    metric, we select the most similar and the most dissimilar instances as the
    positive and negative inputs of triplet loss from each multi-tuple. We show
    that this scheme improves the training performance and convergence speed.
    Furthermore, we introduce a new weighted pair loss for an additional
    acceleration of the convergence. We demonstrate promising results for
    model-free tracking-by-detection of objects from a single initial exemplar in
    the Visual Object Tracking benchmark.
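
    A loss combining a pair term with a triplet term could look roughly like the sketch
    below, written with PyTorch's built-in margin losses; the margins, weighting and the
    random stand-in embeddings are illustrative, not the paper's configuration.

    import torch
    import torch.nn.functional as F

    def pair_plus_triplet_loss(anchor, positive, negative,
                               pair_weight=1.0, triplet_margin=1.0):
        """Combine a pair (contrastive-style) term with a triplet term.

        anchor, positive, negative: (batch, dim) embeddings from shared networks.
        The exact weighting/margins here are illustrative, not from the paper.
        """
        # Pair term: pull the anchor and the positive together.
        pair_loss = F.pairwise_distance(anchor, positive).pow(2).mean()
        # Triplet term: anchor should be closer to positive than to negative by a margin.
        triplet_loss = F.triplet_margin_loss(anchor, positive, negative,
                                             margin=triplet_margin)
        return pair_weight * pair_loss + triplet_loss

    # Toy usage with random embeddings standing in for the shared networks' outputs.
    a, p, n = (torch.randn(8, 128, requires_grad=True) for _ in range(3))
    loss = pair_plus_triplet_loss(a, p, n)
    loss.backward()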

    PixColor: Pixel Recursive Colorization

    Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel approach to automatically produce multiple colorized
    versions of a grayscale image. Our method results from the observation that the
    task of automated colorization is relatively easy given a low-resolution
    version of the color image. We first train a conditional PixelCNN to generate a
    low-resolution color image for a given grayscale image. Then, given the generated
    low-resolution color image and the original grayscale image as inputs, we train
    a second CNN to generate a high-resolution colorization of an image. We
    demonstrate that our approach produces more diverse and plausible colorizations
    than existing methods, as judged by human raters in a “Visual Turing Test”.
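
    The second stage described above (grayscale plus generated low-resolution color in,
    high-resolution colorization out) could be wired roughly as in this sketch; the layer
    sizes, the 4x resolution gap and the RGB output space are illustrative assumptions,
    not the paper's architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RefinementColorizer(nn.Module):
        """Sketch of a second-stage CNN: grayscale + low-res color -> high-res color."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1 + 3, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 3, 3, padding=1),
            )

        def forward(self, gray, lowres_color):
            # Upsample the coarse color guess to the grayscale resolution and fuse.
            up = F.interpolate(lowres_color, size=gray.shape[-2:],
                               mode='bilinear', align_corners=False)
            return self.net(torch.cat([gray, up], dim=1))

    gray = torch.randn(2, 1, 128, 128)        # grayscale input
    coarse = torch.randn(2, 3, 32, 32)        # stage-one low-resolution color
    print(RefinementColorizer()(gray, coarse).shape)  # torch.Size([2, 3, 128, 128])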

    Towards Real World Human Parsing: Multiple-Human Parsing in the Wild

    Jianshu Li, Jian Zhao, Yunchao Wei, Congyan Lang, Yidong Li, Jiashi Feng
    Comments: The first two authors are with equal contribution
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The recent progress of human parsing techniques has been largely driven by
    the availability of rich data resources. In this work, we demonstrate some
    critical discrepancies between the current benchmark datasets and the real
    world human parsing scenarios. For instance, existing human parsing datasets
    contain only one person per image, whereas multiple persons usually appear
    simultaneously in realistic scenes. Simultaneously parsing multiple persons is
    more practically demanded, yet presents a greater challenge to modern human
    parsing methods. Unfortunately, the absence of relevant data resources severely
    impedes the development of multiple-human parsing methods.

    To facilitate future human parsing research, we introduce the Multiple-Human
    Parsing (MHP) dataset, which contains multiple persons in a real world scene
    per single image. The MHP dataset contains various numbers of persons (from 2
    to 16) per image with 18 semantic classes for each parsing annotation. Persons
    appearing in the MHP images present sufficient variations in pose, occlusion
    and interaction. To tackle the multiple-human parsing problem, we also propose
    a novel Multiple-Human Parser (MH-Parser), which considers both the global
    context and local cues for each person in the parsing process. The model is
    demonstrated to outperform the naive “detect-and-parse” approach by a large
    margin, which will serve as a solid baseline and help drive the future research
    in real world human parsing.

    Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation

    Lei Cai, Hongyang Gao, Shuiwang Ji
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Variational auto-encoder (VAE) is a powerful unsupervised learning framework
    for image generation. One drawback of VAE is that it generates blurry images
    due to its Gaussianity assumption and thus L2 loss. To allow the generation of
    high quality images by VAE, we increase the capacity of the decoder network by
    employing residual blocks and skip connections, which also enable efficient
    optimization. To overcome the limitation of the L2 loss, we propose to generate
    images in a multi-stage manner from coarse to fine. In the simplest case, the
    proposed multi-stage VAE divides the decoder into two components, in which the
    second component generates refined images based on the coarse images generated
    by the first component. Since the second component is independent of the VAE
    model, it can employ other loss functions beyond the L2 loss and different
    model architectures. The proposed framework can be easily generalized to
    contain more than two components. Experiment results on the MNIST and CelebA
    datasets demonstrate that the proposed multi-stage VAE can generate sharper
    images as compared to those from the original VAE.

    A Lightweight Approach for On-the-Fly Reflectance Estimation

    Kihwan Kim, Jinwei Gu, Stephen Tyree, Pavlo Molchanov, Matthias Nießner, Jan Kautz
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Estimating surface reflectance (BRDF) is one key component for complete 3D
    scene capture, with wide applications in virtual reality, augmented reality,
    and human-computer interaction. Prior work is either limited to controlled
    environments (e.g., gonioreflectometers, light stages, or multi-camera domes), or
    requires the joint optimization of shape, illumination, and reflectance, which
    is often computationally too expensive (e.g., hours of running time) for
    real-time applications. Moreover, most prior work requires HDR images as input,
    which further complicates the capture process. In this paper, we propose a
    lightweight approach for surface reflectance estimation directly from 8-bit
    RGB images in real-time, which can be easily plugged into any 3D
    scanning-and-fusion system with a commodity RGBD sensor. Our method is
    learning-based, with an inference time of less than 90ms per scene and a model
    size of less than 340K bytes. We propose two novel network architectures,
    HemiCNN and Grouplet, to deal with the unstructured input data from multiple
    viewpoints under unknown illumination. We further design a loss function to
    resolve the color-constancy and scale ambiguity. In addition, we have created a
    large synthetic dataset, SynBRDF, which comprises a total of 500K RGBD images
    rendered with a physically-based ray tracer under a variety of natural
    illumination, covering 5000 materials and 5000 shapes. SynBRDF is the first
    large-scale benchmark dataset for reflectance estimation. Experiments on both
    synthetic data and real data show that the proposed method effectively recovers
    surface reflectance, and outperforms prior work for reflectance estimation in
    uncontrolled environments.

    A New 3D Method to Segment the Lumbar Vertebral Bodies and to Determine Bone Mineral Density and Geometry

    Andre Mastmeyer, Klaus Engelke, Sebastian Meller, Willi Kalender
    Comments: 8 pages, 6 figures, 1 table, MICCAI 2005 workshop paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we present a new 3D segmentation approach for the vertebrae of
    the lower thoracic and the lumbar spine in spiral computed tomography datasets.
    We implemented a multi-step procedure. Its main components are deformable
    models, volume growing, and morphological operations. The performance analysis
    that included an evaluation of accuracy using the European Spine Phantom, and
    of intra-operator precision using clinical CT datasets from 10 patients
    highlight the potential for clinical use. The intra-operator precision of the
    segmentation procedure was better than 1% for Bone Mineral Density (BMD) and
    better than 1.8% for volume. The long-term goal of this work is to enable
    better fracture prediction and improved patient monitoring in the field of
    osteoporosis. A true 3D segmentation also enables an accurate measurement of
    geometrical parameters that can augment the classical measurement of BMD.

    Sparse Coding on Stereo Video for Object Detection

    Sheng Y. Lundquist, Melanie Mitchell, Garrett T. Kenyon
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep Convolutional Neural Networks (DCNN) require millions of labeled
    training examples for image classification and object detection tasks, which
    restricts these models to domains where such a dataset is available. We explore
    the use of unsupervised sparse coding applied to stereo-video data to help
    alleviate the need for large amounts of labeled data. In this paper, we show
    that unsupervised sparse coding is able to learn disparity and motion sensitive
    basis functions when exposed to unlabeled stereo-video data. Additionally, we
    show that a DCNN that incorporates unsupervised learning exhibits better
    performance than fully supervised networks. Furthermore, finding a sparse
    representation in the first layer, which infers a sparse set of activations,
    allows for consistent performance over varying initializations and ordering of
    training examples when compared to representations computed from a single
    convolution. Finally, we compare activations between the unsupervised
    sparse-coding layer and the supervised layer when applied to stereo-video data,
    and show that sparse coding exhibits an encoding that is depth selective,
    whereas encodings from a single convolution do not. These results indicate
    promise for using unsupervised sparse-coding approaches in real-world computer
    vision tasks in domains with limited labeled training data.
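
    One standard way to infer such a sparse first-layer representation is to solve a
    LASSO-style objective; the ISTA iteration below is shown purely as an illustration of
    sparse coding, and the paper's actual sparse-coding and dictionary-learning procedure
    may differ.

    import numpy as np

    def ista_sparse_code(D, x, lam=0.1, n_iter=200):
        """Infer activations a minimizing 0.5*||x - D a||^2 + lam*||a||_1 via ISTA.

        D: dictionary of basis functions, shape (input_dim, num_atoms)
        x: input vector, shape (input_dim,)
        Illustrative only; the step size comes from the spectral norm of D.
        """
        L = np.linalg.norm(D, ord=2) ** 2          # Lipschitz constant of the gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = D.T @ (D @ a - x)
            z = a - grad / L
            a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        return a

    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    x = D[:, 3] * 2.0 + D[:, 100] * -1.5           # signal built from two atoms
    a = ista_sparse_code(D, x)
    print(np.nonzero(np.abs(a) > 0.1)[0])          # mostly atoms 3 and 100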

    A New 3D Segmentation Methodology for Lumbar Vertebral Bodies for the Measurement of BMD and Geometry

    Andre Mastmeyer, Klaus Engelke, Willi Kalender
    Comments: 4 pages,2 figures, MIUA05 conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper a new technique is presented that extracts the geometry of
    lumbar vertebral bodies from spiral CT scans. Our new multi-step segmentation
    approach yields highly accurate and precise measurement of the bone mineral
    density (BMD) in different volumes of interest which are defined relative to a
    local anatomical coordinate system. The approach also enables the analysis of
    the geometry of the relevant vertebrae. Intra- and inter-operator precision for
    segmentation, BMD measurement and position of the coordinate system are below
    1.5% in patient data, accuracy errors are below 1.5% for BMD and below 4% for
    volume in phantom data. The long-term goal of the approach is to improve
    fracture prediction in osteoporosis.

    Simultaneous Multiple Surface Segmentation Using Deep Learning

    Abhay Shah, Michael Abramoff, Xiaodong Wu
    Comments: 8 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The task of automatically segmenting 3-D surfaces representing boundaries of
    objects is important for quantitative analysis of volumetric images, and plays
    a vital role in biomedical image analysis. Recently, graph-based methods with a
    global optimization property have been developed and optimized for various
    medical imaging applications. Despite their widespread use, these methods require
    human experts to design transformations, image features and surface smoothness
    priors, and to re-design them for a different tissue, organ or imaging modality. Here, we
    propose a Deep Learning based approach for segmentation of the surfaces in
    volumetric medical images, by learning the essential features and
    transformations from training data, without any human expert intervention. We
    employ a regional approach to learn the local surface profiles. The proposed
    approach was evaluated on simultaneous intraretinal layer segmentation of
    optical coherence tomography (OCT) images of normal retinas and retinas
    affected by age related macular degeneration (AMD). The proposed approach was
    validated on 40 retina OCT volumes including 20 normal and 20 AMD subjects. The
    experiments showed statistically significant improvement in accuracy for our
    approach compared to state-of-the-art graph based optimal surface segmentation
    with convex priors (G-OSC). A single Convolution Neural Network (CNN) was used
    to learn the surfaces for both normal and diseased images. The mean unsigned
    surface positioning error obtained by the G-OSC method, 2.31 voxels (95% CI
    2.02-2.60 voxels), was improved to 1.27 voxels (95% CI 1.14-1.40 voxels) using
    our new approach. On average, our approach takes 94.34 s, requiring 95.35 MB
    memory, which is much faster than the 2837.46 s and 6.87 GB memory required by
    the G-OSC method on the same computer system.

    Deep De-Aliasing for Fast Compressive Sensing MRI

    Simiao Yu, Hao Dong, Guang Yang, Greg Slabaugh, Pier Luigi Dragotti, Xujiong Ye, Fangde Liu, Simon Arridge, Jennifer Keegan, David Firmin, Yike Guo
    Comments: 15 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fast Magnetic Resonance Imaging (MRI) is highly in demand for many clinical
    applications in order to reduce the scanning cost and improve the patient
    experience. This can also potentially increase the image quality by reducing
    the motion artefacts and contrast washout. However, once an image field of view
    and the desired resolution are chosen, the minimum scanning time is normally
    determined by the requirement of acquiring sufficient raw data to meet the
    Nyquist-Shannon sampling criteria. Compressive Sensing (CS) theory is well
    matched to MRI scanning sequence design, since it allows image reconstruction
    from much less raw data. Inspired by recent advances in deep
    learning for solving various inverse problems, we propose a conditional
    Generative Adversarial Networks-based deep learning framework for de-aliasing
    and reconstructing MRI images from highly undersampled data with great promise
    to accelerate the data acquisition process. By coupling an innovative content
    loss with the adversarial loss, our de-aliasing results are more realistic.
    Furthermore, we propose a refinement learning procedure for training the
    generator network, which can stabilise the training with fast convergence and
    less parameter tuning. We demonstrate that the proposed framework outperforms
    state-of-the-art CS-MRI methods, in terms of reconstruction error and
    perceptual image quality. In addition, our method can reconstruct each image in
    0.22ms–0.37ms, which is promising for real-time applications.

    Stabilizing GAN Training with Multiple Random Projections

    Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Training generative adversarial networks is unstable in high dimensions when
    the true data distribution lies on a lower-dimensional manifold. The
    discriminator is then easily able to separate nearly all generated samples
    leaving the generator without meaningful gradients. We propose training a
    single generator simultaneously against an array of discriminators, each of
    which looks at a different random low-dimensional projection of the data. We
    show that individual discriminators then provide stable gradients to the
    generator, and that the generator learns to produce samples consistent with the
    full data distribution to satisfy all discriminators. We demonstrate the
    practical utility of this approach experimentally, and show that it is able to
    produce image samples with higher quality than traditional training with a
    single discriminator.
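
    The training scheme can be sketched compactly for vector-valued data: a single
    generator is updated against several discriminators, each of which only ever sees one
    fixed random low-dimensional projection of real or generated samples. Network sizes,
    optimizers and the toy data below are illustrative assumptions, not the paper's setup.

    import torch
    import torch.nn as nn

    data_dim, proj_dim, latent_dim, K = 64, 8, 16, 4

    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
    Ds = [nn.Sequential(nn.Linear(proj_dim, 64), nn.ReLU(), nn.Linear(64, 1))
          for _ in range(K)]
    projections = [torch.randn(data_dim, proj_dim) / data_dim ** 0.5 for _ in range(K)]

    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_ds = [torch.optim.Adam(D.parameters(), lr=1e-4) for D in Ds]
    bce = nn.BCEWithLogitsLoss()

    def real_batch(n=32):                      # stand-in for the true data distribution
        return torch.randn(n, data_dim) + 3.0

    for step in range(100):
        x_real, z = real_batch(), torch.randn(32, latent_dim)
        x_fake = G(z)
        # Each discriminator only ever sees its own low-dimensional projection.
        for D, P, opt_d in zip(Ds, projections, opt_ds):
            d_loss = bce(D(x_real @ P), torch.ones(32, 1)) + \
                     bce(D(x_fake.detach() @ P), torch.zeros(32, 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # The generator must satisfy all discriminators simultaneously.
        g_loss = sum(bce(D(x_fake @ P), torch.ones(32, 1))
                     for D, P in zip(Ds, projections))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()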

    Improving classification accuracy of feedforward neural networks for spiking neuromorphic chips

    Antonio Jimeno Yepes, Jianbin Tang, Benjamin Scott Mashford
    Comments: IJCAI-2017. arXiv admin note: text overlap with arXiv:1605.07740
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Deep Neural Networks (DNN) achieve human level performance in many image
    analytics tasks but DNNs are mostly deployed to GPU platforms that consume a
    considerable amount of power. New hardware platforms using lower precision
    arithmetic achieve drastic reductions in power consumption. More recently,
    brain-inspired spiking neuromorphic chips have achieved even lower power
    consumption, on the order of milliwatts, while still offering real-time
    processing.

    However, for deploying DNNs to energy efficient neuromorphic chips the
    incompatibility between continuous neurons and synaptic weights of traditional
    DNNs, discrete spiking neurons and synapses of neuromorphic chips need to be
    overcome. Previous work has achieved this by training a network to learn
    continuous probabilities, before it is deployed to a neuromorphic architecture,
    such as IBM TrueNorth Neurosynaptic System, by random sampling these
    probabilities.

    The main contribution of this paper is a new learning algorithm that learns a
    TrueNorth configuration ready for deployment. We achieve this by training
    directly a binary hardware crossbar that accommodates the TrueNorth axon
    configuration constrains and we propose a different neuron model.

    Results of our approach trained on electroencephalogram (EEG) data show a
    significant improvement with previous work (76% vs 86% accuracy) while
    maintaining state-of-the-art performance on the MNIST handwritten digit dataset.

    Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

    Xin Dong, Shangyu Chen, Sinno Jialin Pan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    How to develop slim and accurate deep neural networks has become crucial for
    real-world applications, especially for those employed in embedded systems.
    Though previous work along this research line has shown some promising results,
    most existing methods either fail to significantly compress a well-trained deep
    network or require a heavy retraining process for the pruned deep network to
    re-boost its prediction performance. In this paper, we propose a new layer-wise
    pruning method for deep neural networks. In our proposed method, parameters of
    each individual layer are pruned independently based on second order
    derivatives of a layer-wise error function with respect to the corresponding
    parameters. We prove that the final prediction performance drop after pruning
    is bounded by a linear combination of the reconstruction errors incurred at each
    layer. Therefore, there is a guarantee that one only needs to perform a light
    retraining process on the pruned network to resume its original prediction
    performance. We conduct extensive experiments on benchmark datasets to
    demonstrate the effectiveness of our pruning method compared with several
    state-of-the-art baseline methods.
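
    For a single linear neuron with a quadratic layer-wise reconstruction error, the
    second-order pruning idea can be sketched with the classic Optimal Brain Surgeon
    formulas: the Hessian is X^T X / n, the saliency of weight q is w_q^2 / (2 [H^{-1}]_qq),
    and the surviving weights receive a compensating update. The damping constant and the
    one-weight-at-a-time loop are illustrative simplifications, not the authors' exact
    procedure.

    import numpy as np

    def obs_prune_one_weight(w, X, damp=1e-4):
        """Prune one weight of a linear neuron y = w.x using the OBS criterion.

        X: data matrix of shape (n_samples, n_features) used to form the layer-wise
        Hessian H = X^T X / n (plus damping). Returns updated weights and the
        pruned index. Illustrative sketch, not the paper's full algorithm.
        """
        n = X.shape[0]
        H = X.T @ X / n + damp * np.eye(len(w))
        H_inv = np.linalg.inv(H)
        saliency = w ** 2 / (2.0 * np.diag(H_inv))     # expected error increase per weight
        q = int(np.argmin(saliency))                   # cheapest weight to remove
        w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q] # compensate the other weights
        w_new[q] = 0.0
        return w_new, q

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    w = rng.standard_normal(10)
    w_pruned, q = obs_prune_one_weight(w, X)
    print(q, np.mean((X @ w - X @ w_pruned) ** 2))     # small reconstruction error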

    Shake-Shake regularization

    Xavier Gastaldi
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The method introduced in this paper aims at helping deep learning
    practitioners faced with an overfitting problem. The idea is to replace, in a
    multi-branch network, the standard summation of parallel branches with a
    stochastic affine combination. Applied to 3-branch residual networks,
    shake-shake regularization improves on the best published single-shot results
    on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%.
    Experiments on architectures without skip connections or Batch Normalization
    show encouraging results and open the door to a large set of applications. Code
    is available at this https URL
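
    The stochastic affine combination at the core of shake-shake can be sketched for two
    parallel branches as a custom autograd function: the forward pass mixes the branches
    with one random coefficient, and the backward pass redistributes the gradient with an
    independently drawn coefficient. Per-batch (rather than per-sample) coefficients and
    the toy branches below are simplifications of the method as published.

    import torch

    class ShakeShake(torch.autograd.Function):
        """out = a*x1 + (1-a)*x2 in the forward pass, with an independent random
        coefficient b used to split the gradient in the backward pass."""

        @staticmethod
        def forward(ctx, x1, x2):
            alpha = torch.rand(1, device=x1.device)
            return alpha * x1 + (1 - alpha) * x2

        @staticmethod
        def backward(ctx, grad_out):
            beta = torch.rand(1, device=grad_out.device)
            return beta * grad_out, (1 - beta) * grad_out

    # Toy usage: two "branches" of a residual block producing same-shaped tensors.
    x = torch.randn(4, 16, requires_grad=True)
    branch1, branch2 = torch.nn.Linear(16, 16), torch.nn.Linear(16, 16)
    out = x + ShakeShake.apply(branch1(x), branch2(x))   # shake-shake residual sum
    out.sum().backward()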

    Stabilizing Adversarial Nets With Prediction Methods

    Abhay Yadav (1), Sohil Shah (1), Zheng Xu (1), David Jacobs (1), Tom Goldstein (1) ((1) University of Maryland, College Park, MD, USA)
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA)

    Adversarial neural networks solve many important problems in data science,
    but are notoriously difficult to train. These difficulties come from the fact
    that optimal weights for adversarial nets correspond to saddle points, and not
    minimizers, of the loss function. The alternating stochastic gradient methods
    typically used for such problems do not reliably converge to saddle points, and
    when convergence does happen it is often highly sensitive to learning rates. We
    propose a simple modification of stochastic gradient descent that stabilizes
    adversarial networks. We show, both in theory and practice, that the proposed
    method reliably converges to saddle points. This makes adversarial networks
    less likely to “collapse”, and enables faster training with larger learning
    rates.
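
    The flavor of the prediction step can be shown on the toy saddle-point problem
    min_x max_y xy: each ascent step is taken against an extrapolated ("predicted") copy
    of the descending player's variable. The step size and the problem itself are
    illustrative; plain simultaneous gradient steps spiral away from this saddle, while
    the predicted variant converges.

    # Toy saddle-point problem: min_x max_y f(x, y) = x * y, with saddle at (0, 0).
    lr, x, y = 0.2, 1.0, 1.0
    for step in range(500):
        x_new = x - lr * y            # descent step for the minimizer (df/dx = y)
        x_bar = 2 * x_new - x         # prediction step: extrapolate the minimizer's move
        y = y + lr * x_bar            # ascent step for the maximizer against the prediction
        x = x_new

    print(round(x, 4), round(y, 4))   # both close to the saddle point at (0, 0)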

    Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

    Nicholas Carlini, David Wagner
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

    Neural networks are known to be vulnerable to adversarial examples: inputs
    that are close to valid inputs but classified incorrectly. We investigate the
    security of ten recent proposals that are designed to detect adversarial
    examples. We show that all can be defeated, even when the adversary does not
    know the exact parameters of the detector. We conclude that adversarial
    examples are significantly harder to detect than previously appreciated, and we
    propose several guidelines for evaluating future proposed defenses.

    End-to-end Planning of Fixed Millimeter-Wave Networks

    Tim Danford, Onur Filiz, Jing Huang, Brian Karrer, Manohar Paluri, Guan Pang, Vish Ponnampalam, Nicolas Stier-Moses, Birce Tezel
    Subjects: Networking and Internet Architecture (cs.NI); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)

    This article discusses a framework to support the design and end-to-end
    planning of fixed millimeter-wave networks. Compared to traditional techniques,
    the framework allows an organization to quickly plan a deployment in a
    cost-effective way. We start by using LiDAR data—basically, a 3D point cloud
    captured from a city—to estimate potential sites to deploy antennas and
    whether there is line-of-sight between them. With that data on hand, we use
    combinatorial optimization techniques to determine the optimal set of locations
    and how they should communicate with each other, to satisfy engineering (e.g.,
    latency, polarity), design (e.g., reliability) and financial (e.g., total cost
    of operation) constraints. The primary goal is to connect as many people as
    possible to the network. Our methodology can be used for strategic planning
    when an organization is in the process of deciding whether to adopt a
    millimeter-wave technology or choosing between locations, or for operational
    planning when conducting a detailed design of the actual network to be deployed
    in a selected location.

    How to Train Your DRAGAN

    Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks have emerged as an effective technique for
    estimating data distributions. The basic setup consists of two deep networks
    playing against each other in a zero-sum game setting. However, it is not
    understood if the networks reach an equilibrium eventually and what dynamics
    makes this possible. The current GAN training procedure, which involves
    simultaneous gradient descent, lacks a clear game-theoretic justification in
    the literature. In this paper, we introduce regret minimization as a technique
    to reach equilibrium in games and use this to motivate the use of simultaneous
    GD in GANs. In addition, we present a hypothesis that mode collapse, which is a
    common occurrence in GAN training, happens due to the existence of spurious
    local equilibria in non-convex games. Motivated by these insights, we develop
    an algorithm called DRAGAN that is fast, simple to implement and achieves
    competitive performance in a stable fashion across different architectures,
    datasets (MNIST, CIFAR-10, and CelebA), and divergence measures with almost no
    hyperparameter tuning.

    Espresso: Efficient Forward Propagation for BCNNs

    Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi
    Comments: 10 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There are many application scenarios for which the computational performance
    and memory footprint of the prediction phase of Deep Neural Networks (DNNs)
    needs to be optimized. Binary Neural Networks (BDNNs) have been shown to be an
    effective way of achieving this objective. In this paper, we show how
    Convolutional Neural Networks (CNNs) can be implemented using binary
    representations. Espresso is a compact, yet powerful library written in C/CUDA
    that features all the functionalities required for the forward propagation of
    CNNs, in a binary file less than 400KB, without any external dependencies.
    Although it is mainly designed to take advantage of massive GPU parallelism,
    Espresso also provides an equivalent CPU implementation for CNNs. Espresso
    provides special convolutional and dense layers for BCNNs, leveraging
    bit-packing and bit-wise computations for efficient execution. These techniques
    provide a speed-up of matrix-multiplication routines, and at the same time,
    reduce memory usage when storing parameters and activations. We experimentally
    show that Espresso is significantly faster than existing implementations of
    optimized binary neural networks (approximately two orders of magnitude). Espresso is
    released under the Apache 2.0 license and is available at
    this http URL
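
    The bit-packing trick can be illustrated in plain NumPy: pack ±1 vectors into bytes,
    then recover their dot product from an XOR followed by a popcount. Espresso does this
    with CUDA kernels and wider machine words; the snippet is only a conceptual sketch.

    import numpy as np

    def pack_pm1(v):
        """Pack a {-1,+1} vector into bits (1 encodes +1, 0 encodes -1)."""
        return np.packbits((v > 0).astype(np.uint8))

    def binary_dot(a_bits, b_bits, n):
        """dot(a, b) for ±1 vectors of length n from their packed forms:
        equal bits contribute +1, differing bits -1, so dot = n - 2*popcount(a XOR b)."""
        diff = np.unpackbits(np.bitwise_xor(a_bits, b_bits), count=n)
        return n - 2 * int(diff.sum())

    rng = np.random.default_rng(0)
    n = 1024
    a = rng.choice([-1, 1], size=n)
    b = rng.choice([-1, 1], size=n)
    assert binary_dot(pack_pm1(a), pack_pm1(b), n) == int(a @ b)
    print(binary_dot(pack_pm1(a), pack_pm1(b), n))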

    Evaluation of Direct Haptic 4D Volume Rendering of Partially Segmented Data for Liver Puncture Simulation

    Andre Mastmeyer, Dirk Fortmeier, Heinz Handels
    Comments: 15 pages, 16 figures, 1 tables, 11 equations, 39 references
    Journal-ref: Nature – Scientific Reports, Nature Publishing Group (NPG),
    7(671), 2017
    Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)

    This work presents an evaluation study using a force feedback evaluation
    framework for a novel direct needle force volume rendering concept in the
    context of liver puncture simulation. PTC/PTCD puncture interventions targeting
    the bile ducts have been selected to illustrate this concept. The haptic
    algorithms of the simulator system are based on (1) partially segmented patient
    image data and (2) a non-linear spring model effective at organ borders. The
    primary aim is to quantitatively evaluate the force errors caused by our patient
    modeling approach, in comparison to the haptic force output obtained from
    gold-standard, completely manually segmented data. Compared to this gold-standard
    force output, our force algorithms yield a low root mean squared force error of
    0.12 N, with systematic maximum absolute errors of up to 1.6 N. Force
    errors were evaluated on 31,222 preplanned test paths from 10 patients. Only
    twelve percent of the emitted forces along these paths were affected by errors.
    This is the first study evaluating haptic algorithms with deformable virtual
    patients in silico. We prove haptic rendering plausibility on a very high
    number of test paths. Important errors are below just noticeable differences
    for the hand-arm system.


    Artificial Intelligence

    A unified approach to interpreting model predictions

    Scott Lundberg, Su-In Lee
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Understanding why a model made a certain prediction is crucial in many
    applications. However, with large modern datasets the best accuracy is often
    achieved by complex models that even experts struggle to interpret, such as
    ensemble or deep learning models. This creates a tension between accuracy and
    interpretability. In response, a variety of methods have recently been proposed
    to help users interpret the predictions of complex models. Here, we present a
    unified framework for interpreting predictions, namely SHAP (SHapley Additive
    exPlanations), which assigns each feature an importance value for a particular
    prediction. The key novel components of the SHAP framework are the
    identification of a class of additive feature importance measures and
    theoretical results that there is a unique solution in this class with a set of
    desired properties. This class unifies six existing methods, and several recent
    methods in this class do not have these desired properties. This means that our
    framework can inform the development of new methods for explaining prediction
    models. We demonstrate that several new methods we presented in this paper
    based on the SHAP framework show better computational performance and better
    consistency with human intuition than existing methods.
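
    As an illustration of additive feature attribution, the sketch below estimates
    Shapley values for a single prediction by sampling feature orderings and replacing
    "absent" features with background values. This is a generic Monte Carlo estimator,
    not the specific Kernel SHAP or other estimation methods introduced in the paper.

    import numpy as np

    def sampled_shapley(predict, x, background, n_samples=500, rng=None):
        """Monte Carlo Shapley values for one prediction.

        predict: function mapping an array of shape (d,) to a scalar
        x: the instance being explained, shape (d,)
        background: reference values used for "missing" features, shape (d,)
        """
        rng = rng or np.random.default_rng()
        d = len(x)
        phi = np.zeros(d)
        for _ in range(n_samples):
            order = rng.permutation(d)
            z = background.copy()
            prev = predict(z)
            for j in order:               # add features one at a time in random order
                z[j] = x[j]
                cur = predict(z)
                phi[j] += cur - prev      # marginal contribution of feature j
                prev = cur
        return phi / n_samples

    # Toy linear model: Shapley values should equal w_i * (x_i - background_i).
    w = np.array([1.0, -2.0, 0.5])
    f = lambda v: float(w @ v)
    x, bg = np.array([1.0, 1.0, 1.0]), np.zeros(3)
    print(sampled_shapley(f, x, bg))      # approx [1.0, -2.0, 0.5]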

    AIXIjs: A Software Demo for General Reinforcement Learning

    John Aslanides
    Comments: Masters thesis. Australian National University, October 2016. 97 pp
    Subjects: Artificial Intelligence (cs.AI)

    Reinforcement learning is a general and powerful framework with which to
    study and implement artificial intelligence. Recent advances in deep learning
    have enabled RL algorithms to achieve impressive performance in restricted
    domains such as playing Atari video games (Mnih et al., 2015) and, recently,
    the board game Go (Silver et al., 2016). However, we are still far from
    constructing a generally intelligent agent. Many of the obstacles and open
    questions are conceptual: What does it mean to be intelligent? How does one
    explore and learn optimally in general, unknown environments? What, in fact,
    does it mean to be optimal in the general sense? The universal Bayesian agent
    AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a
    central role in the sub-field of general reinforcement learning (GRL).
    Recently, AIXI has been shown to be flawed in important ways; it doesn’t
    explore enough to be asymptotically optimal (Orseau, 2010), and it can perform
    poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI
    have been proposed to attempt to address these shortfalls: among them are
    entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al.,
    2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike,
    2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and
    Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL
    agents. This implementation is accompanied by a framework for running
    experiments against various environments, similar to OpenAI Gym (Brockman et
    al., 2016), and a suite of interactive demos that explore different properties
    of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to
    present numerous experiments illustrating fundamental properties of, and
    differences between, these agents.

    Shallow Updates for Deep Reinforcement Learning

    Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN)
    have achieved state-of-the-art results in a variety of challenging,
    high-dimensional domains. This success is mainly attributed to the power of
    deep neural networks to learn rich domain representations for approximating the
    value function or policy. Batch reinforcement learning methods with linear
    representations, on the other hand, are more stable and require less
    hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve
    good results. In this work we propose a hybrid approach — the Least Squares
    Deep Q-Network (LS-DQN), which combines rich feature representations learned by
    a DRL algorithm with the stability of a linear least squares method. We do this
    by periodically re-training the last hidden layer of a DRL network with a batch
    least squares update. Key to our approach is a Bayesian regularization term for
    the least squares update, which prevents over-fitting to the more recent data.
    We tested LS-DQN on five Atari games and demonstrate significant improvement
    over vanilla DQN and Double-DQN. We also investigated the reasons for the
    superior performance of our method. Interestingly, we found that the
    performance improvement can be attributed to the large batch size used by the
    LS method when optimizing the last layer.
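
    The core of the hybrid update, refitting the last linear layer by a batch least
    squares solve that is regularized toward the current SGD-trained weights instead of
    toward zero, can be sketched as follows; shapes, targets and the regularization
    strength are illustrative assumptions rather than the paper's exact configuration.

    import numpy as np

    def bayesian_ls_update(features, targets, w_sgd, lam=1.0):
        """Refit last-layer weights by regularized batch least squares.

        features: (n, d) activations of the last hidden layer over a replay batch
        targets:  (n,)   regression targets (e.g., bootstrapped Q-value targets)
        w_sgd:    (d,)   current last-layer weights, used as the prior mean so the
                         solution stays close to the deep network's own estimate
        """
        d = features.shape[1]
        A = features.T @ features + lam * np.eye(d)
        b = features.T @ targets + lam * w_sgd
        return np.linalg.solve(A, b)

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((4096, 32))
    w_true = rng.standard_normal(32)
    y = Phi @ w_true + 0.1 * rng.standard_normal(4096)
    w_sgd = w_true + 0.3 * rng.standard_normal(32)      # imperfect SGD-trained weights
    w_new = bayesian_ls_update(Phi, y, w_sgd, lam=10.0)
    print(np.linalg.norm(w_sgd - w_true), np.linalg.norm(w_new - w_true))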

    Experience enrichment based task independent reward model

    Min Xu
    Comments: 4 pages, 1 figure
    Subjects: Artificial Intelligence (cs.AI)

    For most reinforcement learning approaches, learning is performed by
    maximizing an expected accumulated reward that is manually defined for
    specific tasks. However, in the real world, rewards are emergent phenomena from the
    complex interactions between agents and environments. In this paper, we propose
    an implicit generic reward model for reinforcement learning. Unlike those
    rewards that are manually defined for specific tasks, such implicit reward is
    task independent. It only comes from the deviation from the agents’ previous
    experiences.

    Sketched Answer Set Programming

    Sergey Paramonov, Christian Bessiere, Anton Dries, Luc De Raedt
    Comments: 15 pages, 11 figures
    Subjects: Artificial Intelligence (cs.AI)

    Answer Set Programming (ASP) is a powerful modeling formalism for
    combinatorial problems. However, writing ASP models is not trivial. We propose
    a novel method, called Sketched Answer Set Programming (SkASP), aiming at
    supporting the user in resolving this issue. The user writes an ASP program
    while leaving uncertain parts open, marked with question marks. In addition, the user
    provides a number of positive and negative examples of the desired program
    behaviour. The sketched model is rewritten into another ASP program, which is
    solved by traditional methods. As a result, the user obtains a functional and
    reusable ASP program modelling her problem. We evaluate our approach on 21 well
    known puzzles and combinatorial problems inspired by Karp’s 21 NP-complete
    problems and demonstrate a use-case for a database application based on ASP.

    Generalizing the Role of Determinization in Probabilistic Planning

    Luis Pineda, Shlomo Zilberstein
    Subjects: Artificial Intelligence (cs.AI)

    The stochastic shortest path problem (SSP) is a highly expressive model for
    probabilistic planning. The computational hardness of SSPs has sparked interest
    in determinization-based planners that can quickly solve large problems.
    However, existing methods employ a simplistic approach to determinization. In
    particular, they ignore the possibility of tailoring the determinization to the
    specific characteristics of the target domain. In this work we examine this
    question, by showing that learning a good determinization for a planning domain
    can be done efficiently and can improve performance. Moreover, we show how to
    directly incorporate probabilistic reasoning into the planning problem when a
    good determinization is not sufficient by itself. Based on these insights, we
    introduce a planner, FF-LAO*, that outperforms state-of-the-art probabilistic
    planners on several well-known competition benchmarks.

    Why You Should Charge Your Friends for Borrowing Your Stuff

    Kijung Shin, Euiwoong Lee, Dhivya Eswaran, Ariel D. Procaccia
    Comments: to be published in 26th International Joint Conference on Artificial Intelligence (IJCAI-17)
    Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)

    We consider goods that can be shared with k-hop neighbors (i.e., the set of
    nodes within k hops from an owner) on a social network. We examine incentives
    to buy such a good by devising game-theoretic models where each node decides
    whether to buy the good or free ride. First, we find that social inefficiency,
    specifically excessive purchase of the good, occurs in Nash equilibria. Second,
    the social inefficiency decreases as k increases and thus a good can be shared
    with more nodes. Third, and most importantly, the social inefficiency can also
    be significantly reduced by charging free riders an access cost and paying it
    to owners, leading to the conclusion that organizations and system designers
    should impose such a cost. These findings are supported by our theoretical
    analysis in terms of the price of anarchy and the price of stability; and by
    simulations based on synthetic and real social networks.

    Combining tabu search and graph reduction to solve the maximum balanced biclique problem

    Yi Zhou, Jin-Kao Hao
    Subjects: Artificial Intelligence (cs.AI)

    The Maximum Balanced Biclique Problem is a well-known graph model with
    relevant applications in diverse domains. This paper introduces a novel
    algorithm, which combines an effective constraint-based tabu search procedure
    and two dedicated graph reduction techniques. We verify the effectiveness of
    the algorithm on 30 classical random benchmark graphs and 25 very large
    real-life sparse graphs from the popular Koblenz Network Collection (KONECT).
    The results show that the algorithm improves the best-known results (new lower
    bounds) for 10 classical benchmarks and obtains the optimal solutions for 14
    KONECT instances.

    RankPL: A Qualitative Probabilistic Programming Language

    Tjitze Rienstra
    Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

    In this paper we introduce RankPL, a modeling language that can be thought of
    as a qualitative variant of a probabilistic programming language with a
    semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used
    to represent and reason about processes that exhibit uncertainty expressible by
    distinguishing “normal” from “surprising” events. RankPL allows (iterated)
    revision of rankings over alternative program states and supports various types
    of reasoning, including abduction and causal inference. We present the
    language, its denotational semantics, and a number of practical examples. We
    also discuss an implementation of RankPL that is available for download.

    How to Train Your DRAGAN

    Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks have emerged as an effective technique for
    estimating data distributions. The basic setup consists of two deep networks
    playing against each other in a zero-sum game setting. However, it is not
    understood if the networks reach an equilibrium eventually and what dynamics
    makes this possible. The current GAN training procedure, which involves
    simultaneous gradient descent, lacks a clear game-theoretic justification in
    the literature. In this paper, we introduce regret minimization as a technique
    to reach equilibrium in games and use this to motivate the use of simultaneous
    GD in GANs. In addition, we present a hypothesis that mode collapse, which is a
    common occurrence in GAN training, happens due to the existence of spurious
    local equilibria in non-convex games. Motivated by these insights, we develop
    an algorithm called DRAGAN that is fast, simple to implement and achieves
    competitive performance in a stable fashion across different architectures,
    datasets (MNIST, CIFAR-10, and CelebA), and divergence measures with almost no
    hyperparameter tuning.

    Model-Based Planning in Discrete Action Spaces

    Mikael Henaff, William F. Whitney, Yann LeCun
    Subjects: Artificial Intelligence (cs.AI)

    Planning actions using learned and differentiable forward models of the world
    is a general approach which has a number of desirable properties, including
    improved sample complexity over model-free RL methods, reuse of learned models
    across different tasks, and the ability to perform efficient gradient-based
    optimization in continuous action spaces. However, this approach does not apply
    straightforwardly when the action space is discrete, which may have limited its
    adoption. In this work, we introduce two discrete planning tasks inspired by
    existing question-answering datasets and show that it is in fact possible to
    effectively perform planning via backprop in discrete action spaces using two
    simple yet principled modifications. Our experiments show that this approach
    can significantly outperform model-free RL based methods and supervised
    imitation learners.

    Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

    Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Andrea Gesmundo, Neil Houlsby, Wojciech Gajewski, Wei Wang
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose an active question answering agent that learns to reformulate
    questions and combine evidence to improve question answering. The agent sits
    between the user and a black box question-answering system and learns to
    optimally probe the system with natural language reformulations of the initial
    question and to aggregate the evidence to return the best possible answer. The
    system is trained end-to-end to maximize answer quality using policy gradient.
    We evaluate on SearchQA, a dataset of complex questions extracted from
    Jeopardy!. Our agent improves F1 by 11% over a state-of-the-art base model that
    uses the original question/answer pairs.

    A unified view of entropy-regularized Markov decision processes

    Gergely Neu, Anders Jonsson, Vicenç Gómez
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a general framework for entropy-regularized average-reward
    reinforcement learning in Markov decision processes (MDPs). Our approach is
    based on extending the linear-programming formulation of policy optimization in
    MDPs to accommodate convex regularization functions. Our key result is showing
    that using the conditional entropy of the joint state-action distributions as
    regularization yields a dual optimization problem closely resembling the
    Bellman optimality equations. This result enables us to formalize a number of
    state-of-the-art entropy-regularized reinforcement learning algorithms as
    approximate variants of Mirror Descent or Dual Averaging, and thus to argue
    about the convergence properties of these methods. In particular, we show that
    the exact version of the TRPO algorithm of Schulman et al. (2015) actually
    converges to the optimal policy, while the entropy-regularized policy gradient
    methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally,
    we illustrate empirically the effects of using various regularization
    techniques on learning performance in a simple reinforcement learning setup.
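
    In rough notation (simplified and not necessarily the paper's), the regularized
    linear program over stationary state-action distributions (mu) described above reads
    \[
      \max_{\mu \ge 0} \; \sum_{s,a}\mu(s,a)\,r(s,a)
        \;-\; \frac{1}{\eta}\sum_{s,a}\mu(s,a)\,\log\frac{\mu(s,a)}{\sum_{a'}\mu(s,a')}
      \quad \text{s.t.} \quad
      \sum_{a}\mu(s',a)=\sum_{s,a}P(s'\mid s,a)\,\mu(s,a)\;\;\forall s', \qquad
      \sum_{s,a}\mu(s,a)=1,
    \]
    where the regularizer is the conditional entropy of actions given states; the claim
    is that the dual of this problem closely resembles the Bellman optimality equations.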

    Note on Evolution and Forecasting of Requirements: Communications Example

    Mark Sh. Levin
    Comments: 8 pages, 8 figures, 9 tables
    Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

    Combinatorial evolution and forecasting of system requirements is examined.
    The morphological model is used for a hierarchical requirements system (i.e.,
    system parts, design alternatives for the system parts, ordinal estimates for
    the alternatives). A set of system changes involves changes of the system
    structure, component alternatives and their estimates. The composition process
    of the forecast is based on combinatorial synthesis (knapsack problem, multiple
    choice problem, hierarchical morphological design). An illustrative numerical
    example for four-phase evolution and forecasting of requirements to
    communications is described.

    Statistical inference using SGD

    Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We present a novel method for frequentist statistical inference in
    M-estimation problems, based on stochastic gradient descent (SGD) with a
    fixed step size: we demonstrate that the average of such SGD sequences can be
    used for statistical inference, after proper scaling. An intuitive analysis
    using the Ornstein-Uhlenbeck process suggests that such averages are
    asymptotically normal. From a practical perspective, our SGD-based inference
    procedure is a first order method, and is well-suited for large scale problems.
    To show its merits, we apply it to both synthetic and real datasets, and
    demonstrate that its accuracy is comparable to classical statistical methods,
    while requiring potentially far less computation.
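
    A toy sketch of the idea on linear regression: run constant-step-size SGD, average
    the iterates over independent segments, and use the spread of the segment averages to
    form a confidence interval. The burn-in, segment length and scaling below are
    illustrative choices rather than the paper's exact procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20000, 1
    theta_true = np.array([2.0])
    X = rng.standard_normal((n, d))
    y = X @ theta_true + rng.standard_normal(n)

    def sgd_segment_average(step, length, burn_in=200):
        """Run constant-step SGD on random samples and return the iterate average."""
        theta = np.zeros(d)
        acc = np.zeros(d)
        for t in range(burn_in + length):
            i = rng.integers(n)
            grad = (X[i] @ theta - y[i]) * X[i]   # squared-loss gradient for one sample
            theta -= step * grad
            if t >= burn_in:
                acc += theta
        return acc / length

    # Several independent segment averages behave like approximately normal replicates.
    reps = np.array([sgd_segment_average(0.05, 2000)[0] for _ in range(20)])
    est, se = reps.mean(), reps.std(ddof=1) / np.sqrt(len(reps))
    print(f"estimate {est:.3f}, ~95% CI [{est - 1.96*se:.3f}, {est + 1.96*se:.3f}]")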

    Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

    Sahil Sharma, Srivatsan Ramesh, Girish Raguvir J, Balaraman Ravindran
    Comments: 10 pages + 11 page appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Reinforcement Learning (RL) can model complex behavior policies for
    goal-directed sequential decision making tasks. A hallmark of RL algorithms is
    Temporal Difference (TD) learning: the value function for the current state is
    moved towards a bootstrapped target that is estimated using the next state’s
    value function. Lambda-returns generalize beyond 1-step returns and strike a
    balance between Monte Carlo and TD learning methods. While lambda-returns have
    been extensively studied in RL, they haven’t been explored a lot in Deep RL.
    This paper’s first contribution is an exhaustive benchmarking of
    lambda-returns. Although mathematically tractable, the use of exponentially
    decaying weighting of n-step returns based targets in lambda-returns is a
    rather ad-hoc design choice. Our second major contribution is that we propose a
    generalization of lambda-returns called Confidence-based Autodidactic Returns
    (CAR), wherein the RL agent learns the weighting of the n-step returns in an
    end-to-end manner. This allows the agent to learn to decide how much it wants
    to weigh the n-step returns based targets. In contrast, lambda-returns restrict
    RL agents to use an exponentially decaying weighting scheme. Autodidactic
    returns can be used for improving any RL algorithm which uses TD learning. We
    empirically demonstrate that using sophisticated weighted mixtures of
    multi-step returns (like CAR and lambda-returns) considerably outperforms the
    use of n-step returns. We perform our experiments on the Asynchronous Advantage
    Actor Critic (A3C) algorithm in the Atari 2600 domain.
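
    For reference, the lambda-return that CAR generalizes can be written, in standard
    (not necessarily the paper's) notation, as a fixed exponentially decaying mixture of
    n-step returns,
    \[
      G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n}
                  + \gamma^{n} V(s_{t+n}),
      \qquad
      G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{\,n-1}\, G_t^{(n)},
    \]
    whereas Confidence-based Autodidactic Returns replace the fixed weights
    \((1-\lambda)\lambda^{\,n-1}\) with weights (w_n) that the agent learns end-to-end.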

    Mixed Membership Word Embeddings for Computational Social Science

    James Foulds
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Word embeddings improve the performance of NLP systems by revealing the
    hidden structural relationships between words. These models have recently risen
    in popularity due to the performance of scalable algorithms trained in the big
    data setting. Despite their success, word embeddings have seen very little use
    in computational social science NLP tasks, presumably due to their reliance on
    big data, and to a lack of interpretability. I propose a probabilistic
    model-based word embedding method which can recover interpretable embeddings,
    without big data. The key insight is to leverage the notion of mixed membership
    modeling, in which global representations are shared, but individual entities
    (i.e. dictionary words) are free to use these representations to uniquely
    differing degrees. Leveraging connections to topic models, I show how to train
    these models in high dimensions using a combination of state-of-the-art
    techniques for word embeddings and topic modeling. Experimental results show an
    improvement in predictive performance of up to 63% in MRR over the skip-gram on
    small datasets. The models are interpretable, as embeddings of topics are used
    to encode embeddings for words (and hence, documents) in a model-based way. I
    illustrate this with two computational social science case studies, on NIPS
    articles and State of the Union addresses.

    Ensemble Sampling

    Xiuyuan Lu, Benjamin Van Roy
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Thompson sampling has emerged as an effective heuristic for a broad range of
    online decision problems. In its basic form, the algorithm requires computing
    and sampling from a posterior distribution over models, which is tractable only
    for simple special cases. This paper develops ensemble sampling, which aims to
    approximate Thompson sampling while maintaining tractability even in the face
    of complex models such as neural networks. Ensemble sampling dramatically
    expands on the range of applications for which Thompson sampling is viable. We
    establish a theoretical basis that supports the approach and present
    computational results that offer further insight.
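
    A rough sketch of the idea for a linear bandit (the ensemble size, perturbation scheme
    and update rule below are illustrative choices, not the paper's exact construction):

    import numpy as np

    rng = np.random.default_rng(0)

    class EnsembleSampler:
        """Approximate Thompson sampling with an ensemble of randomly perturbed models."""

        def __init__(self, n_arms, n_features, n_models=10, noise=1.0):
            self.features = rng.normal(size=(n_arms, n_features))  # known arm features
            self.models = rng.normal(size=(n_models, n_features))  # independent initializations
            self.noise = noise

        def act(self):
            theta = self.models[rng.integers(len(self.models))]  # sample one member uniformly
            return int(np.argmax(self.features @ theta))         # act greedily w.r.t. it

        def update(self, arm, reward, lr=0.1):
            x = self.features[arm]
            for m in self.models:
                # Each member fits its own perturbed copy of the observation, which keeps
                # the ensemble spread out as a stand-in for posterior uncertainty.
                target = reward + self.noise * rng.normal()
                m += lr * (target - x @ m) * x

    Sampling a member uniformly at each step plays the role of sampling a model from the
    posterior in exact Thompson sampling.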

    Fast Change Point Detection on Dynamic Social Networks

    Yu Wang, Aniket Chakrabarti, David Sivakoff, Srinivasan Parthasarathy
    Comments: IJCAI’17
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)

    A number of real world problems in many domains (e.g. sociology, biology,
    political science and communication networks) can be modeled as dynamic
    networks with nodes representing entities of interest and edges representing
    interactions among the entities at different points in time. A common
    representation for such models is the snapshot model – where a network is
    defined at logical time-stamps. An important problem under this model is change
    point detection. In this work we devise an effective and efficient
    three-step-approach for detecting change points in dynamic networks under the
    snapshot model. Our algorithm achieves up to 9X speedup over the
    state-of-the-art while improving quality on both synthetic and real world
    networks.

    Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning

    Sahil Sharma, Aravind Suresh, Rahul Ramesh, Balaraman Ravindran
    Comments: 11 pages + 7 pages appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Deep Reinforcement Learning (DRL) methods have performed well in an
    increasing number of high-dimensional visual decision making domains. Among
    all such visual decision making problems, those with discrete action spaces
    often tend to have underlying compositional structure in the said action space.
    Such action spaces often contain actions such as go left, go up as well as go
    diagonally up and left (which is a composition of the former two actions). The
    representations of control policies in such domains have traditionally been
    modeled without exploiting this inherent compositional structure in the action
    spaces. We propose a new learning paradigm, Factored Action space
    Representations (FAR) wherein we decompose a control policy learned using a
    Deep Reinforcement Learning Algorithm into independent components, analogous to
    decomposing a vector in terms of some orthogonal basis vectors. This
    architectural modification of the control policy representation allows the
    agent to learn about multiple actions simultaneously, while executing only one
    of them. We demonstrate that FAR yields considerable improvements on top of two
    DRL algorithms in Atari 2600: FARA3C outperforms A3C (Asynchronous Advantage
    Actor Critic) in 9 out of 14 tasks and FARAQL outperforms AQL (Asynchronous
    n-step Q-Learning) in 9 out of 13 tasks.

    Search Engine Guided Non-Parametric Neural Machine Translation

    Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li
    Comments: 8 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In this paper, we extend an attention-based neural machine translation (NMT)
    model by allowing it to access an entire training set of parallel sentence
    pairs even after training. The proposed approach consists of two stages. In the
    first stage (the retrieval stage), an off-the-shelf, black-box search engine is
    used to retrieve a small subset of sentence pairs from a training set given a
    source sentence. These pairs are further filtered based on a fuzzy matching
    score based on edit distance. In the second stage (the translation stage), a novel
    translation model, called translation memory enhanced NMT (TM-NMT), seamlessly
    uses both the source sentence and a set of retrieved sentence pairs to perform
    the translation. Empirical evaluation on three language pairs (En-Fr, En-De,
    and En-Es) shows that the proposed approach significantly outperforms the
    baseline approach and the improvement is more significant when more relevant
    sentence pairs were retrieved.
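
    The abstract does not give the exact filtering formula; the sketch below shows one
    common fuzzy matching score of the kind described, normalizing word-level edit
    distance by sentence length (the threshold and tokenization are illustrative):

    def edit_distance(a, b):
        """Word-level Levenshtein distance between two token sequences."""
        dp = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, y in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
        return dp[-1]

    def fuzzy_match_score(sent_a, sent_b):
        """Similarity in [0, 1]: 1 for identical sentences, lower for distant ones."""
        a, b = sent_a.split(), sent_b.split()
        return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

    # Keep only retrieved pairs whose source side is close enough to the input sentence.
    retrieved = [("he goes to school", "il va a l'ecole"), ("the cat sleeps", "le chat dort")]
    query = "she goes to school"
    filtered = [pair for pair in retrieved if fuzzy_match_score(query, pair[0]) > 0.5]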

    Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

    Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    The Particle Swarm Optimization Policy (PSO-P) has been recently introduced
    and proven to produce remarkable results on interacting with academic
    reinforcement learning benchmarks in an off-policy, batch-based setting. To
    further investigate the properties and feasibility on real-world applications,
    this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a
    novel reinforcement learning (RL) benchmark that aims at being realistic by
    including a variety of aspects found in industrial applications, like
    continuous state and action spaces, a high dimensional, partially observable
    state space, delayed effects, and complex stochasticity. The experimental
    results of PSO-P on IB are compared to results of closed-form control policies
    derived from the model-based Recurrent Control Neural Network (RCNN) and the
    model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not
    only of interest for academic benchmarks, but also for real-world industrial
    applications, since it also yielded the best performing policy in our IB
    setting. Compared to other well established RL techniques, PSO-P produced
    outstanding results in performance and robustness, requiring only a relatively
    low amount of effort in finding adequate parameters or making complex design
    decisions.

    AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms

    Marco F. Cusumano-Towner, Vikash K. Mansinghka
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Approximate probabilistic inference algorithms are central to many fields.
    Examples include sequential Monte Carlo inference in robotics, variational
    inference in machine learning, and Markov chain Monte Carlo inference in
    statistics. A key problem faced by practitioners is measuring the accuracy of
    an approximate inference algorithm on a specific dataset. This paper introduces
    the auxiliary inference divergence estimator (AIDE), an algorithm for measuring
    the accuracy of approximate inference algorithms. AIDE is based on the
    observation that inference algorithms can be treated as probabilistic models
    and the random variables used within the inference algorithm can be viewed as
    auxiliary variables. This view leads to a new estimator for the symmetric KL
    divergence between the output distributions of two inference algorithms. The
    paper illustrates application of AIDE to algorithms for inference in
    regression, hidden Markov, and Dirichlet process mixture models. The
    experiments show that AIDE captures the qualitative behavior of a broad class
    of inference algorithms and can detect failure modes of inference algorithms
    that are missed by standard heuristics.
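
    For reference, the symmetric KL divergence that AIDE estimates between the output
    distributions \(p\) and \(q\) of two inference algorithms is the sum of the two
    directed divergences:

    \[
    D_{\mathrm{sym}}(p, q) \;=\; D_{\mathrm{KL}}(p \| q) + D_{\mathrm{KL}}(q \| p)
    \;=\; \mathbb{E}_{x \sim p}\left[\log \frac{p(x)}{q(x)}\right]
          + \mathbb{E}_{x \sim q}\left[\log \frac{q(x)}{p(x)}\right].
    \]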


    Information Retrieval

    Learning to Rank Using Localized Geometric Mean Metrics

    Yuxin Su, Irwin King, Michael Lyu
    Comments: To appear in SIGIR’17
    Subjects: Information Retrieval (cs.IR)

    Many learning-to-rank (LtR) algorithms focus on the query-independent model, in
    which the query and the document do not lie in the same feature space, and the
    rankers rely on an ensemble of features of the query-document pair instead of
    the similarity between the query instance and the documents. However, existing algorithms
    do not consider local structures in query-document feature space, and are
    fragile to irrelevant noise features. In this paper, we propose a novel
    Riemannian metric learning algorithm to capture the local structures and
    develop a robust LtR algorithm. First, we design a concept called extit{ideal
    candidate document} to introduce metric learning algorithm to query-independent
    model. Previous metric learning algorithms aiming to find an optimal metric
    space are only suitable for query-dependent model, in which query instance and
    documents belong to the same feature space and the similarity is directly
    computed from the metric space. Then we extend the new and extremely fast
    global Geometric Mean Metric Learning (GMML) algorithm to develop a localized
    GMML, namely L-GMML. Based on the combination of local learned metrics, we
    employ the popular Normalized Discounted Cumulative Gain (NDCG) scorer and the
    Weighted Approximate Rank Pairwise (WARP) loss to optimize the ideal candidate
    document for each query candidate set. Finally, we can quickly evaluate all
    candidates via the similarity between the ideal candidate document and other
    candidates. By leveraging the ability of metric learning
    algorithms to describe the complex structural information, our approach gives
    us a principled and efficient way to perform LtR tasks. The experiments on
    real-world datasets demonstrate that our proposed L-GMML algorithm outperforms
    the state-of-the-art metric learning to rank methods and established
    query-independent LtR algorithms in terms of accuracy and computational
    efficiency.
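
    As background for the metric-learning step (a standard definition, not specific to
    this paper): a learned positive semidefinite matrix \(M\) induces the squared
    Mahalanobis distance

    \[
    d_M^2(\mathbf{x}, \mathbf{y}) \;=\; (\mathbf{x} - \mathbf{y})^{\top} M \,(\mathbf{x} - \mathbf{y}),
    \]

    and it is similarities of this form, under the locally learned metrics, that are used
    to compare each candidate with the ideal candidate document of its query.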

    Personalized Ranking for Context-Aware Venue Suggestion

    Mohammad Aliannejadi, Ida Mele, Fabio Crestani
    Comments: The 32nd ACM SIGAPP Symposium On Applied Computing (SAC), Marrakech, Morocco, April 4-6, 2017
    Subjects: Information Retrieval (cs.IR)

    Making personalized and context-aware venue suggestions to users is crucial
    in venue recommendation. These suggestions are often based on
    matching the venues’ features with the users’ preferences, which can be
    collected from previously visited locations. In this paper we present a novel
    user-modeling approach which relies on a set of scoring functions for making
    personalized suggestions of venues based on venue content and reviews as well
    as user context. Our experiments, conducted on the dataset of the TREC
    Contextual Suggestion Track, prove that our methodology outperforms
    state-of-the-art approaches by a significant margin.


    Computation and Language

    Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

    Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Andrea Gesmundo, Neil Houlsby, Wojciech Gajewski, Wei Wang
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We propose an active question answering agent that learns to reformulate
    questions and combine evidence to improve question answering. The agent sits
    between the user and a black box question-answering system and learns to
    optimally probe the system with natural language reformulations of the initial
    question and to aggregate the evidence to return the best possible answer. The
    system is trained end-to-end to maximize answer quality using policy gradient.
    We evaluate on SearchQA, a dataset of complex questions extracted from
    Jeopardy!. Our agent improves F1 by 11% over a state-of-the-art base model that
    uses the original question/answer pairs.

    W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis

    Aitor García-Pablos, Montse Cuadros, German Rigau
    Subjects: Computation and Language (cs.CL)

    With the increase of online customer opinions in specialised websites and
    social networks, the necessity of automatic systems to help to organise and
    classify customer reviews by domain-specific aspect/categories and sentiment
    polarity is more important than ever. Supervised approaches to Aspect Based
    Sentiment Analysis obtain good results for the domain/language they are
    trained on, but obtaining manually labelled data for training supervised systems
    for all domains and languages is usually very costly and time consuming. In this
    work we describe W2VLDA, an unsupervised system based on topic modelling, that
    combined with some other unsupervised methods and a minimal configuration,
    performs aspect/category classification, aspect-term/opinion-word separation
    and sentiment polarity classification for any given domain and language. We
    also evaluate the performance of the aspect and sentiment classification in the
    multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results
    for several languages (English, Spanish, French and Dutch) and domains (hotels,
    restaurants, electronic-devices).

    Learning Semantic Relatedness From Human Feedback Using Metric Learning

    Thomas Niebler, Martin Becker, Christian Pölitz, Andreas Hotho
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Assessing the degree of semantic relatedness between words is an important
    task with a variety of semantic applications, such as ontology learning for the
    Semantic Web, semantic search or query expansion. To accomplish this in an
    automated fashion, many relatedness measures have been proposed. However, most
    of these metrics only encode information contained in the underlying corpus and
    thus do not directly model human intuition. To solve this, we propose to
    utilize a metric learning approach to improve existing semantic relatedness
    measures by learning from additional information, such as explicit human
    feedback. For this, we argue to use word embeddings instead of traditional
    high-dimensional vector representations in order to leverage their semantic
    density and to reduce computational cost. We rigorously test our approach on
    several domains including tagging data as well as publicly available embeddings
    based on Wikipedia texts and navigation. Human feedback about semantic
    relatedness for learning and evaluation is extracted from publicly available
    datasets such as MEN or WS-353. We find that our method can significantly
    improve semantic relatedness measures by learning from additional information,
    such as explicit human feedback. For tagging data, we are the first to generate
    and study embeddings. Our results are of special interest for ontology and
    recommendation engineers, but also for any other researchers and practitioners
    of Semantic Web techniques.

    Recurrent Additive Networks

    Kenton Lee, Omer Levy, Luke Zettlemoyer
    Subjects: Computation and Language (cs.CL)

    We introduce recurrent additive networks (RANs), a new gated RNN which is
    distinguished by the use of purely additive latent state updates. At every time
    step, the new state is computed as a gated component-wise sum of the input and
    the previous state, without any of the non-linearities commonly used in RNN
    transition dynamics. We formally show that RAN states are weighted sums of the
    input vectors, and that the gates only contribute to computing the weights of
    these sums. Despite this relatively simple functional form, experiments
    demonstrate that RANs outperform both LSTMs and GRUs on benchmark language
    modeling problems. This result shows that many of the non-linear computations
    in LSTMs and related networks are not essential, at least for the problems we
    consider, and suggests that the gates are doing more of the computational work
    than previously understood.
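
    A minimal NumPy sketch of a gated, purely additive state update of the kind described
    above (the gate parametrization and output nonlinearity are illustrative and may
    differ from the paper's exact formulation):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ran_step(x, c_prev, params):
        """One step: the new state is a gated component-wise sum of (a projection of)
        the input and the previous state, with no nonlinearity on the state itself."""
        Wi, Ui, bi, Wf, Uf, bf, Wc = params
        i = sigmoid(Wi @ x + Ui @ c_prev + bi)  # input gate
        f = sigmoid(Wf @ x + Uf @ c_prev + bf)  # forget gate
        c = i * (Wc @ x) + f * c_prev           # purely additive update
        h = np.tanh(c)                          # optional output nonlinearity
        return c, h

    rng = np.random.default_rng(0)
    d_in, d_h = 8, 16
    params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
              rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
              rng.normal(size=(d_h, d_in)))
    c = np.zeros(d_h)
    for x in rng.normal(size=(5, d_in)):  # run the cell over a short input sequence
        c, h = ran_step(x, c, params)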

    Spelling Correction as a Foreign Language

    Yingbo Zhou, Utkarsh Porwal, Roberto Konow
    Subjects: Computation and Language (cs.CL)

    In this paper, we reformulated the spell correction problem as a machine
    translation task under the encoder-decoder framework. This reformulation
    enabled us to use a single model for solving the problem that is traditionally
    formulated as learning a language model and an error model. This model employs
    multi-layer recurrent neural networks as an encoder and a decoder. We
    demonstrate the effectiveness of this model using an internal dataset, where
    the training data is automatically obtained from user logs. The model offers
    competitive performance as compared to the state of the art methods but does
    not require any feature engineering nor hand tuning between models.

    Mixed Membership Word Embeddings for Computational Social Science

    James Foulds
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Word embeddings improve the performance of NLP systems by revealing the
    hidden structural relationships between words. These models have recently risen
    in popularity due to the performance of scalable algorithms trained in the big
    data setting. Despite their success, word embeddings have seen very little use
    in computational social science NLP tasks, presumably due to their reliance on
    big data, and to a lack of interpretability. I propose a probabilistic
    model-based word embedding method which can recover interpretable embeddings,
    without big data. The key insight is to leverage the notion of mixed membership
    modeling, in which global representations are shared, but individual entities
    (i.e. dictionary words) are free to use these representations to uniquely
    differing degrees. Leveraging connections to topic models, I show how to train
    these models in high dimensions using a combination of state-of-the-art
    techniques for word embeddings and topic modeling. Experimental results show an
    improvement in predictive performance of up to 63% in MRR over the skip-gram on
    small datasets. The models are interpretable, as embeddings of topics are used
    to encode embeddings for words (and hence, documents) in a model-based way. I
    illustrate this with two computational social science case studies, on NIPS
    articles and State of the Union addresses.

    Formalized Lambek Calculus in Higher Order Logic (HOL4)

    Chun Tian
    Comments: 37 pages
    Subjects: Computation and Language (cs.CL); Logic in Computer Science (cs.LO)

    In this project, a rather complete proof-theoretical formalization of Lambek
    Calculus (non-associative, with arbitrary extensions) has been ported from the Coq
    proof assistant to the HOL4 theorem prover, with some improvements and new
    theorems.

    Three deduction systems (Syntactic Calculus, Natural Deduction and Sequent
    Calculus) of Lambek Calculus are defined, with many related theorems proved. The
    equivalence between these systems is formally proved. Finally, a formalization
    of Sequent Calculus proofs (for which Coq has built-in support) has been designed
    and implemented in HOL4. Some basic results including the sub-formula
    properties of the so-called “cut-free” proofs are formally proved.

    This work can be considered preliminary work towards a language parser
    based on category grammars which is not multimodal but still has the ability to
    support context-sensitive languages through customized extensions.

    Search Engine Guided Non-Parametric Neural Machine Translation

    Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li
    Comments: 8 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In this paper, we extend an attention-based neural machine translation (NMT)
    model by allowing it to access an entire training set of parallel sentence
    pairs even after training. The proposed approach consists of two stages. In the
    first stage (the retrieval stage), an off-the-shelf, black-box search engine is
    used to retrieve a small subset of sentence pairs from a training set given a
    source sentence. These pairs are further filtered based on a fuzzy matching
    score based on edit distance. In the second stage (the translation stage), a novel
    translation model, called translation memory enhanced NMT (TM-NMT), seamlessly
    uses both the source sentence and a set of retrieved sentence pairs to perform
    the translation. Empirical evaluation on three language pairs (En-Fr, En-De,
    and En-Es) shows that the proposed approach significantly outperforms the
    baseline approach and the improvement is more significant when more relevant
    sentence pairs were retrieved.

    On-the-fly Operation Batching in Dynamic Computation Graphs

    Graham Neubig, Yoav Goldberg, Chris Dyer
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer
    more flexibility for implementing models that cope with data of varying
    dimensions and structure, relative to toolkits that operate on statically
    declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing
    toolkits – both static and dynamic – require that the developer organize the
    computations into the batches necessary for exploiting high-performance
    algorithms and hardware. This batching task is generally difficult, but it
    becomes a major hurdle as architectures become complex. In this paper, we
    present an algorithm, and its implementation in the DyNet toolkit, for
    automatically batching operations. Developers simply write minibatch
    computations as aggregations of single instance computations, and the batching
    algorithm seamlessly executes them, on the fly, using computationally efficient
    batched operations. On a variety of tasks, we obtain throughput similar to that
    obtained with manual batches, as well as comparable speedups over
    single-instance learning on architectures that are impractical to batch
    manually.

    A Regularized Framework for Sparse and Structured Neural Attention

    Vlad Niculae, Mathieu Blondel
    Comments: 21 pages, including appendix
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    Modern neural networks are often augmented with an attention mechanism, which
    tells the network where to focus within the input. We propose in this paper a
    new framework for sparse and structured attention, building upon a max operator
    regularized with a strongly convex function. We show that this operator is
    differentiable and that its gradient defines a mapping from real values to
    probabilities, suitable as an attention mechanism. Our framework includes
    softmax and a slight generalization of the recently-proposed sparsemax as
    special cases. However, we also show how our framework can incorporate modern
    structured penalties, resulting in new attention mechanisms that focus on
    entire segments or groups of an input, encouraging parsimony and
    interpretability. We derive efficient algorithms to compute the forward and
    backward passes of these attention mechanisms, enabling their use in a neural
    network trained with backpropagation. To showcase their potential as a drop-in
    replacement for existing attention mechanisms, we evaluate them on three
    large-scale tasks: textual entailment, machine translation, and sentence
    summarization. Our attention mechanisms improve interpretability without
    sacrificing performance; notably, on textual entailment and summarization, we
    outperform the existing attention mechanisms based on softmax and sparsemax.
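
    Concretely, the attention mappings in such a framework take the form (a sketch of the
    construction described above, with \(\Delta^d\) the probability simplex and \(\Omega\)
    a strongly convex regularizer):

    \[
    \Pi_{\Omega}(\mathbf{x}) \;=\; \operatorname*{arg\,max}_{\mathbf{p} \in \Delta^d}
    \; \langle \mathbf{p}, \mathbf{x} \rangle - \Omega(\mathbf{p}),
    \]

    where \(\Omega(\mathbf{p}) = \sum_i p_i \log p_i\) recovers softmax,
    \(\Omega(\mathbf{p}) = \tfrac{1}{2}\|\mathbf{p}\|_2^2\) recovers sparsemax, and adding
    structured penalties to \(\Omega\) yields the segment- and group-focused mechanisms.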

    Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

    Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy
    Comments: Under Review of NIPS 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Reward augmented maximum likelihood (RAML) is a simple and effective learning
    framework to directly optimize towards the reward function in structured
    prediction tasks. RAML incorporates task-specific reward by performing
    maximum-likelihood updates on candidate outputs sampled according to an
    exponentiated payoff distribution, which gives higher probabilities to
    candidates that are close to the reference output. While RAML is notable for
    its simplicity, efficiency, and its impressive empirical successes, the
    theoretical properties of RAML, especially the behavior of the exponentiated
    payoff distribution, has not been examined thoroughly. In this work, we
    introduce softmax Q-distribution estimation, a novel theoretical interpretation
    of RAML, which reveals the relation between RAML and Bayesian decision theory.
    The softmax Q-distribution can be regarded as a smooth approximation of Bayes
    decision boundary, and the Bayes decision rule is achieved by decoding with
    this Q-distribution. We further show that RAML is equivalent to approximately
    estimating the softmax Q-distribution. Experiments on three structured
    prediction tasks with rewards defined on sequential (named entity recognition),
    tree-based (dependency parsing) and irregular (machine translation) structures
    show notable improvements over maximum likelihood baselines.
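
    For reference, the exponentiated payoff distribution underlying RAML weights each
    candidate output \(y\) by its reward \(r(y, y^*)\) relative to the reference output
    \(y^*\), with temperature \(\tau\):

    \[
    q(y \mid y^*; \tau) \;=\; \frac{\exp\{ r(y, y^*)/\tau \}}{\sum_{y'} \exp\{ r(y', y^*)/\tau \}},
    \]

    and RAML performs maximum-likelihood updates on candidates sampled from \(q\), i.e.,
    it maximizes \(\mathbb{E}_{y \sim q}[\log p_{\theta}(y \mid x)]\).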


    Distributed, Parallel, and Cluster Computing

    Symmetry Breaking in the Congest Model: Time- and Message-Efficient Algorithms for Ruling Sets

    Shreyas Pai, Gopal Pandurangan, Sriram V. Pemmaraju, Talal Riaz, Peter Robinson
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We study local symmetry breaking problems in the CONGEST model, focusing on
    ruling set problems, which generalize the fundamental Maximal Independent Set
    (MIS) problem. A \(\beta\)-ruling set is an independent set such that every node
    in the graph is at most \(\beta\) hops from a node in the independent set. Our
    work is motivated by the following central question: can we break the
    \(\Theta(\log n)\) time complexity barrier and the \(\Theta(m)\) message complexity
    barrier in the CONGEST model for MIS or closely-related symmetry breaking
    problems? We present the following results:

    – Time Complexity: We show that we can break the \(O(\log n)\) “barrier” for 2-
    and 3-ruling sets. We compute 3-ruling sets in
    \(O\left(\frac{\log n}{\log\log n}\right)\) rounds with high probability (whp).
    More generally, we show that 2-ruling sets can be computed in
    \(O\left(\log \Delta \cdot (\log n)^{1/2+\varepsilon} + \frac{\log n}{\log\log n}\right)\)
    rounds for any \(\varepsilon > 0\), which is \(o(\log n)\) for a wide range of
    \(\Delta\) values (e.g., \(\Delta = 2^{(\log n)^{1/2-\varepsilon}}\)). These are
    the first 2- and 3-ruling set algorithms to improve over the \(O(\log n)\)-round
    complexity of Luby’s algorithm in the CONGEST model.

    – Message Complexity: We show an \(\Omega(n^2)\) lower bound on the message
    complexity of computing an MIS (i.e., a 1-ruling set), which holds also for
    randomized algorithms, and contrast this by showing a randomized algorithm for
    2-ruling sets that, whp, uses only \(O(n \log^2 n)\) messages and runs in
    \(O(\Delta \log n)\) rounds. This is the first message-efficient algorithm known
    for ruling sets, with message complexity nearly linear in \(n\) (which is
    optimal up to a polylogarithmic factor).

    Report of the HPC Correctness Summit, Jan 25–26, 2017, Washington, DC

    Ganesh Gopalakrishnan, Paul D. Hovland, Costin Iancu, Sriram Krishnamoorthy, Ignacio Laguna, Richard A. Lethin, Koushik Sen, Stephen F. Siegel, Armando Solar-Lezama
    Comments: 57 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Maintaining leadership in HPC requires the ability to support simulations at
    large scales and fidelity. In this study, we detail one of the most significant
    productivity challenges in achieving this goal, namely the increasing
    proclivity to bugs, especially in the face of growing hardware and software
    heterogeneity and sheer system scale. We identify key areas where timely new
    research must be proactively begun to address these challenges, and create new
    correctness tools that must ideally play a significant role even while ramping
    up toward exascale. We close with a proposal for a two-day workshop in which
    the problems identified in this report can be more broadly discussed, and
    specific plans to launch these new research thrusts identified.

    Broadcasting in Noisy Radio Networks

    Keren Censor-Hillel, Bernhard Haeupler, D. Ellis Hershkowitz, Goran Zuzic
    Comments: Principles of Distributed Computing 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    The widely-studied radio network model [Chlamtac and Kutten, 1985] is a
    graph-based description that captures the inherent impact of collisions in
    wireless communication. In this model, the strong assumption is made that node
    \(v\) receives a message from a neighbor if and only if exactly one of its
    neighbors broadcasts.

    We relax this assumption by introducing a new noisy radio network model in
    which random faults occur at senders or receivers. Specifically, for a constant
    noise parameter \(p \in [0,1)\), either every sender has probability \(p\) of
    transmitting noise, or every receiver of a single transmission in its
    neighborhood has probability \(p\) of receiving noise.

    We first study single-message broadcast algorithms in noisy radio networks
    and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in
    the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007
    does not. We give a modified version of the algorithm of Gasieniec et al., 2007
    that is robust to sender and receiver faults, and extend both this modified
    algorithm and the Decay algorithm to robust multi-message broadcast algorithms.

    We next investigate the extent to which (network) coding improves throughput
    in noisy radio networks. We address the previously perplexing result of Alon et
    al. 2014 that worst case coding throughput is no better than worst case routing
    throughput up to constants: we show that the worst case throughput performance
    of coding is, in fact, superior to that of routing, by a \(\Theta(\log n)\)
    gap, provided receiver faults are introduced. However, we show that any
    coding or routing scheme for the noiseless setting can be transformed to be
    robust to sender faults with only a constant throughput overhead. These
    transformations imply that the results of Alon et al., 2014 carry over to noisy
    radio networks with sender faults.

    Space Complexity of Fault Tolerant Register Emulations

    Gregory Chockler, Alexander Spiegelman
    Comments: Conference version appears in Proceedings of PODC ’17
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Driven by the rising popularity of cloud storage, the costs associated with
    implementing reliable storage services from a collection of fault-prone servers
    have recently become an actively studied question. The well-known ABD result
    shows that an f-tolerant register can be emulated using a collection of 2f + 1
    fault-prone servers each storing a single read-modify-write object type, which
    is known to be optimal. In this paper we generalize this bound: we investigate
    the inherent space complexity of emulating reliable multi-writer registers as a
    function of the type of the base objects exposed by the underlying servers, the
    number of writers to the emulated register, the number of available servers,
    and the failure threshold. We establish a sharp separation between registers,
    and both max-registers (the base object types assumed by ABD) and CAS in terms
    of the resources (i.e., the number of base objects of the respective types)
    required to support the emulation; we show that no such separation exists
    between max-registers and CAS. Our main technical contribution is lower and
    upper bounds on the resources required in case the underlying base objects are
    fault-prone read/write registers. We show that the number of required registers
    is directly proportional to the number of writers and inversely proportional to
    the number of servers.

    Espresso: Efficient Forward Propagation for BCNNs

    Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi
    Comments: 10 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There are many application scenarios for which the computational performance
    and memory footprint of the prediction phase of Deep Neural Networks (DNNs)
    need to be optimized. Binary Neural Networks (BDNNs) have been shown to be an
    effective way of achieving this objective. In this paper, we show how
    Convolutional Neural Networks (CNNs) can be implemented using binary
    representations. Espresso is a compact, yet powerful library written in C/CUDA
    that features all the functionalities required for the forward propagation of
    CNNs, in a binary file less than 400KB, without any external dependencies.
    Although it is mainly designed to take advantage of massive GPU parallelism,
    Espresso also provides an equivalent CPU implementation for CNNs. Espresso
    provides special convolutional and dense layers for BCNNs, leveraging
    bit-packing and bit-wise computations for efficient execution. These techniques
    provide a speed-up of matrix-multiplication routines, and at the same time,
    reduce memory usage when storing parameters and activations. We experimentally
    show that Espresso is significantly faster than existing implementations of
    optimized binary neural networks (by approximately two orders of magnitude). Espresso is
    released under the Apache 2.0 license and is available at
    this http URL
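
    As an illustration of the bit-packing and bit-wise computation mentioned above (a
    generic sketch of a binarized dot product, not Espresso's actual kernels): with
    weights and activations constrained to {-1, +1} and packed into machine words, a dot
    product reduces to XOR followed by a popcount.

    import numpy as np

    def binary_dot(a_signs, b_signs):
        """Dot product of two {-1, +1} vectors via bit-packing, XOR and popcount."""
        n = len(a_signs)
        a_bits = np.packbits(a_signs > 0)  # pack sign vectors into uint8 words
        b_bits = np.packbits(b_signs > 0)
        # Padding bits are zero in both vectors, so they cancel in the XOR.
        differing = int(np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum())
        return n - 2 * differing           # matches minus mismatches

    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=100)
    b = rng.choice([-1, 1], size=100)
    assert binary_dot(a, b) == int(a @ b)  # agrees with the ordinary dot product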

    TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

    Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
    Comments: 9 pages
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    High network communication cost for synchronizing gradients and parameters is
    the well-known bottleneck of distributed training. In this work, we propose
    TernGrad that uses ternary gradients to accelerate distributed deep learning in
    data parallelism. Our approach requires only three numerical levels {-1,0,1}
    which can aggressively reduce the communication time. We mathematically prove
    the convergence of TernGrad under the assumption of a bound on gradients.
    Guided by the bound, we propose layer-wise ternarizing and gradient clipping to
    improve its convergence. Our experiments show that applying TernGrad on AlexNet
    does not incur any accuracy loss and can even improve accuracy. The accuracy
    loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a
    performance model is proposed to study the scalability of TernGrad. Experiments
    show significant speed gains for various deep neural networks.
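
    A sketch of the kind of stochastic ternary quantization described above (per-tensor
    scaling shown here for simplicity; the paper's layer-wise ternarizing and gradient
    clipping are omitted, and the exact scheme may differ):

    import numpy as np

    rng = np.random.default_rng(0)

    def ternarize(grad):
        """Stochastically quantize a gradient tensor to s * {-1, 0, +1}, unbiased in
        expectation, so only the scale s and two bits per element need to be sent."""
        s = np.max(np.abs(grad))
        if s == 0:
            return np.zeros_like(grad)
        keep_prob = np.abs(grad) / s                 # probability of a nonzero level
        mask = rng.random(grad.shape) < keep_prob
        return s * np.sign(grad) * mask

    g = rng.normal(size=(4, 5))
    avg = np.mean([ternarize(g) for _ in range(20000)], axis=0)
    assert np.allclose(avg, g, atol=0.1)             # the average recovers the gradient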

    MITHRIL: Mining Sporadic Associations for Cache Prefetching

    Juncheng Yang, Reza Karimi, Trausti Sæmundsson, Avani Wildani, Ymir Vigfusson
    Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

    The growing pressure on cloud application scalability has accentuated storage
    performance as a critical bottleneck. Although cache replacement algorithms
    have been extensively studied, cache prefetching, i.e., reducing latency by
    retrieving items before they are actually requested, remains an underexplored
    area. Existing approaches to history-based prefetching, in particular, provide
    too few benefits for real systems for the resources they cost. We propose
    MITHRIL, a prefetching layer that efficiently exploits historical patterns in
    cache request associations. MITHRIL is inspired by sporadic association rule
    mining and only relies on the timestamps of requests. Through evaluation of 135
    block-storage traces, we show that MITHRIL is effective, giving an average
    55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio gain
    over AMP, at reasonable cost. We further show that MITHRIL can supplement any
    cache replacement algorithm and be readily integrated into existing systems.
    Furthermore, we demonstrate the improvement comes from MITHRIL being able to
    capture mid-frequency blocks.

    Dynamic Analysis of the Arrow Distributed Directory Protocol in General Networks

    Abdolhamid Ghodselahi, Fabian Kuhn
    Comments: 31 pages, 3 figures
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    The Arrow protocol is a simple and elegant protocol to coordinate exclusive
    access to a shared object in a network. The protocol solves the underlying
    distributed queueing problem by using path reversal on a pre-computed spanning
    tree (or any other tree topology simulated on top of the given network).

    It is known that the Arrow protocol solves the problem with a competitive
    ratio of O(log D) on trees of diameter D. This implies a distributed queueing
    algorithm with competitive ratio O(s*log D) for general networks with a
    spanning tree of diameter D and stretch s. In this work we show that when
    running the Arrow protocol on top of the well-known probabilistic tree
    embedding of Fakcharoenphol, Rao, and Talwar [STOC 03], we obtain a randomized
    distributed queueing algorithm with a competitive ratio of O(log n) even on
    general network topologies. The result holds even if the queueing requests
    occur in an arbitrarily dynamic and concurrent fashion and even if
    communication is asynchronous. From a technical point of view, the main result of the
    paper shows that the competitive ratio of the Arrow protocol is constant on a
    special family of tree topologies, known as hierarchically well separated
    trees.

    BRPL: Backpressure RPL for High-throughput and Mobile IoTs

    Yad Tahir, Shusen Yang, Julie McCann
    Comments: 14 pages, to appear in IEEE Transactions on Mobile Computing, 2017
    Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

    RPL, an IPv6 routing protocol for Low power Lossy Networks (LLNs), is
    considered to be the de facto routing standard for the Internet of Things
    (IoT). However, more and more experimental results demonstrate that RPL
    performs poorly when it comes to throughput and adaptability to network
    dynamics. This significantly limits the application of RPL in many practical
    IoT scenarios, such as an LLN with high-speed sensor data streams and mobile
    sensing devices. To address this issue, we develop BRPL, an extension of RPL,
    providing a practical approach that allows users to smoothly combine any RPL
    Object Function (OF) with backpressure routing. BRPL uses two novel algorithms,
    QuickTheta and QuickBeta, to support time-varying data traffic loads and node
    mobility respectively. We implement BRPL on Contiki OS, an open-source
    operating system for the Internet of Things. We conduct an extensive evaluation
    using both real-world experiments based on the FIT IoT-LAB testbed and
    large-scale simulations using Cooja over 18 virtual servers on the Cloud. The
    evaluation results demonstrate that BRPL not only is fully backward compatible
    with RPL (i.e. devices running RPL and BRPL can work together seamlessly), but
    also significantly improves network throughput and adaptability to changes in
    network topologies and data traffic loads. The observed packet loss reduction
    in mobile networks is at least 60%, and can reach up to 1000% in extreme
    cases.

    BAMHealthCloud: A Biometric Authentication and Data Management System for Healthcare Data in Cloud

    Kashish A. Shakil, Farhana J. Zareen, Mansaf Alam, Suraiya Jabin
    Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

    Advancements in the healthcare industry, together with new technology and
    population growth, have given rise to security threats to our most personal
    data. The healthcare data management system consists of records in different
    formats, such as text, numeric data, pictures and videos, leading to data which
    is big and unstructured.
    Also, hospitals have several branches at different locations throughout a
    country and overseas. In view of these requirements a cloud based healthcare
    management system can be an effective solution for efficient health care data
    management. One of the major concerns of a cloud based healthcare system is
    security. Threats include identity theft, tax fraud, insurance fraud, medical
    fraud and defamation of high profile patients. Hence, a secure
    data access and retrieval is needed in order to provide security of critical
    medical records in health care management system. Biometric authentication
    mechanism is suitable in this scenario since it overcomes the limitations of
    token theft and forgetting passwords in conventional token id-password
    mechanism used for providing security. It also has high accuracy rate for
    secure data access and retrieval. In this paper we propose BAMHealthCloud which
    is a cloud based system for management of healthcare data, it ensures security
    of data through biometric authentication. It has been developed after
    performing a detailed case study on healthcare sector in a developing country.
    Training of the signature samples for authentication purposes has been performed
    in parallel on the Hadoop MapReduce framework using a Resilient Backpropagation
    neural network. From rigorous experiments it can be concluded that the system
    achieves a speedup of 9x, an Equal Error Rate (EER) of 0.12, a sensitivity of 0.98
    and a specificity of 0.95 as compared to other approaches existing in the literature.


    Learning

    Dynamic Partition of Complex Networks

    Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Finding the reduced-dimensional structure is critical to understanding
    complex networks. Existing approaches such as spectral clustering are
    applicable only when the full network is explicitly observed. In this paper, we
    focus on the online factorization and partition of implicit large-scale
    networks based on observations from an associated random walk. We propose an
    efficient and scalable nonconvex stochastic gradient algorithm. It is able to
    process dependent data dynamically generated by the underlying network and
    learn a low-dimensional representation for each vertex. By applying a diffusion
    approximation analysis, we show that the nonconvex stochastic algorithm
    achieves nearly optimal sample complexity. Once given the learned
    low-dimensional representations, we further apply clustering techniques to
    recover the network partition. We show that, when the associated Markov process
    is lumpable, one can recover the partition exactly with high probability. The
    proposed approach is experimented on Manhattan taxi data.

    TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

    Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
    Comments: 9 pages
    Subjects: Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    High network communication cost for synchronizing gradients and parameters is
    the well-known bottleneck of distributed training. In this work, we propose
    TernGrad that uses ternary gradients to accelerate distributed deep learning in
    data parallelism. Our approach requires only three numerical levels {-1,0,1}
    which can aggressively reduce the communication time. We mathematically prove
    the convergence of TernGrad under the assumption of a bound on gradients.
    Guided by the bound, we propose layer-wise ternarizing and gradient clipping to
    improve its convergence. Our experiments show that applying TernGrad on AlexNet
    does not incur any accuracy loss and can even improve accuracy. The accuracy
    loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a
    performance model is proposed to study the scalability of TernGrad. Experiments
    show significant speed gains for various deep neural networks.

    SmartPaste: Learning to Adapt Source Code

    Miltiadis Allamanis, Marc Brockschmidt
    Subjects: Learning (cs.LG); Software Engineering (cs.SE)

    Deep Neural Networks have been shown to succeed at a range of natural
    language tasks such as machine translation and text summarization. While tasks
    on source code (i.e., formal languages) have been considered recently, most work
    in this area does not attempt to capitalize on the unique opportunities offered
    by its known syntax and structure. In this work, we introduce SmartPaste, a
    first task that requires the use of such information. The task is a variant of the
    program repair problem that requires adapting a given (pasted) snippet of code
    to surrounding, existing source code. As first solutions, we design a set of
    deep neural models that learn to represent the context of each variable
    location and variable usage in a data flow-sensitive way. Our evaluation
    suggests that our models can learn to solve the SmartPaste task in many cases,
    achieving 58.6% accuracy, while learning meaningful representation of variable
    usages.

    On-the-fly Operation Batching in Dynamic Computation Graphs

    Graham Neubig, Yoav Goldberg, Chris Dyer
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer
    more flexibility for implementing models that cope with data of varying
    dimensions and structure, relative to toolkits that operate on statically
    declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing
    toolkits – both static and dynamic – require that the developer organize the
    computations into the batches necessary for exploiting high-performance
    algorithms and hardware. This batching task is generally difficult, but it
    becomes a major hurdle as architectures become complex. In this paper, we
    present an algorithm, and its implementation in the DyNet toolkit, for
    automatically batching operations. Developers simply write minibatch
    computations as aggregations of single instance computations, and the batching
    algorithm seamlessly executes them, on the fly, using computationally efficient
    batched operations. On a variety of tasks, we obtain throughput similar to that
    obtained with manual batches, as well as comparable speedups over
    single-instance learning on architectures that are impractical to batch
    manually.

    Nonparametric Online Regression while Learning the Metric

    Ilja Kuzborskij, Nicolò Cesa-Bianchi
    Subjects: Learning (cs.LG)

    We study algorithms for online nonparametric regression that learn the
    directions along which the regression function is smoother. Our algorithm
    learns the Mahalanobis metric based on the gradient outer product matrix
    \(\boldsymbol{G}\) of the regression function (automatically adapting to the
    effective rank of this matrix), while simultaneously bounding the regret, on
    the same data sequence, in terms of the spectrum of \(\boldsymbol{G}\). As a
    preliminary step in our analysis, we generalize a nonparametric online learning
    algorithm by Hazan and Megiddo by enabling it to compete against functions
    whose Lipschitzness is measured with respect to an arbitrary Mahalanobis
    metric.
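
    For context, a standard definition of the gradient outer product matrix of a
    regression function \(f\) (assumed here to match the paper's \(\boldsymbol{G}\)) is

    \[
    \boldsymbol{G} \;=\; \mathbb{E}_{\mathbf{x}}\left[ \nabla f(\mathbf{x})\, \nabla f(\mathbf{x})^{\top} \right],
    \]

    whose leading eigenvectors identify the directions along which \(f\) varies most, so a
    Mahalanobis metric aligned with \(\boldsymbol{G}\) discounts the directions in which
    \(f\) is smoother.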

    Stabilizing GAN Training with Multiple Random Projections

    Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Training generative adversarial networks is unstable in high-dimensions when
    the true data distribution lies on a lower-dimensional manifold. The
    discriminator is then easily able to separate nearly all generated samples
    leaving the generator without meaningful gradients. We propose training a
    single generator simultaneously against an array of discriminators, each of
    which looks at a different random low-dimensional projection of the data. We
    show that individual discriminators then provide stable gradients to the
    generator, and that the generator learns to produce samples consistent with the
    full data distribution to satisfy all discriminators. We demonstrate the
    practical utility of this approach experimentally, and show that it is able to
    produce image samples with higher quality than traditional training with a
    single discriminator.
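
    A schematic, framework-agnostic sketch of the generator's training signal (the number
    of discriminators, projection dimensionality and loss form are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    data_dim, proj_dim, n_disc = 1024, 32, 8

    # Fixed random low-dimensional projections, one per discriminator.
    projections = [rng.normal(size=(proj_dim, data_dim)) / np.sqrt(data_dim)
                   for _ in range(n_disc)]

    def generator_loss(fake_batch, discriminators):
        """Average adversarial loss; each discriminator sees only its own projection."""
        loss = 0.0
        for P, disc in zip(projections, discriminators):
            scores = disc(fake_batch @ P.T)           # discriminator works in projected space
            loss += -np.mean(np.log(scores + 1e-8))   # non-saturating generator loss
        return loss / len(discriminators)

    # Dummy discriminators standing in for trained networks that output P(real) in (0, 1).
    discriminators = [lambda z: 1.0 / (1.0 + np.exp(-z.mean(axis=1))) for _ in range(n_disc)]
    fake_batch = rng.normal(size=(16, data_dim))
    print(generator_loss(fake_batch, discriminators))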

    Sparse hierarchical interaction learning with epigraphical projection

    Mingyuan Jiu, Nelly Pustelnik, Stefan Janaqi, Meriam Chebre, Philippe Ricoux
    Subjects: Learning (cs.LG)

    This work focuses on a regression optimization problem with hierarchical
    interactions between variables, which goes beyond the additive models of
    traditional linear regression. We investigate two different approaches in the
    literature to deal with this problem, “hierNet” and structural-sparsity
    regularization, and study their connections; we then propose a primal-dual
    proximal algorithm based on epigraphical projection to optimize the learning
    problem. The experiments first highlight the improvement of the proposed
    procedure compared to state-of-the-art methods based on FISTA or ADMM, and
    second provide comparisons between the different hierarchical penalizations.
    The experiments are conducted on both synthetic and real data.

    Minimax Statistical Learning and Domain Adaptation with Wasserstein Distances

    Jaeho Lee, Maxim Raginsky
    Subjects: Learning (cs.LG)

    As opposed to standard empirical risk minimization (ERM), distributionally
    robust optimization aims to minimize the worst-case risk over a larger
    ambiguity set containing the original empirical distribution of the training
    data. In this work, we describe a minimax framework for statistical learning
    with ambiguity sets given by balls in Wasserstein space. In particular, we
    prove a generalization bound that involves the covering number properties of
    the original ERM problem. As an illustrative example, we provide generalization
    guarantees for domain adaptation problems where the Wasserstein distance
    between the source and target domain distributions can be reliably estimated
    from unlabeled samples.
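
    In symbols, assuming the ambiguity set is a Wasserstein ball of radius \(\rho\) around
    the empirical distribution \(P_n\), the minimax objective reads

    \[
    \min_{f \in \mathcal{F}} \;\; \sup_{Q \,:\, W(Q, P_n) \le \rho} \; \mathbb{E}_{Z \sim Q}\left[ \ell(f, Z) \right],
    \]

    in contrast to ERM, which minimizes \(\mathbb{E}_{Z \sim P_n}[\ell(f, Z)]\) at the
    empirical distribution alone.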

    Information-theoretic analysis of generalization capability of learning algorithms

    Aolin Xu, Maxim Raginsky
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    We derive upper bounds on the generalization error of a learning algorithm in
    terms of the mutual information between its input and output. The upper bounds
    provide theoretical guidelines for striking the right balance between data fit
    and generalization by controlling the input-output mutual information of a
    learning algorithm. The results can also be used to analyze the generalization
    capability of learning algorithms under adaptive composition, and the
    bias-accuracy tradeoffs in adaptive data analytics. Our work extends and leads
    to nontrivial improvements on the recent results of Russo and Zou.
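
    A representative bound of this kind, stated here informally for a \(\sigma\)-sub-Gaussian
    loss and \(n\) i.i.d. training samples (the paper's exact conditions and constants
    should be checked): the expected generalization gap of an algorithm with input dataset
    \(S\) and output hypothesis \(W\) satisfies

    \[
    \bigl| \mathbb{E}[\,\mathrm{gen}(S, W)\,] \bigr| \;\le\; \sqrt{ \frac{2\sigma^{2}}{n} \, I(S; W) },
    \]

    so that explicitly controlling the input-output mutual information \(I(S; W)\) controls
    overfitting.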

    A unified view of entropy-regularized Markov decision processes

    Gergely Neu, Anders Jonsson, Vicenç Gómez
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a general framework for entropy-regularized average-reward
    reinforcement learning in Markov decision processes (MDPs). Our approach is
    based on extending the linear-programming formulation of policy optimization in
    MDPs to accommodate convex regularization functions. Our key result is showing
    that using the conditional entropy of the joint state-action distributions as
    regularization yields a dual optimization problem closely resembling the
    Bellman optimality equations. This result enables us to formalize a number of
    state-of-the-art entropy-regularized reinforcement learning algorithms as
    approximate variants of Mirror Descent or Dual Averaging, and thus to argue
    about the convergence properties of these methods. In particular, we show that
    the exact version of the TRPO algorithm of Schulman et al. (2015) actually
    converges to the optimal policy, while the entropy-regularized policy gradient
    methods of Mnih et al. (2016) may fail to converge to a fixed point. Finally,
    we illustrate empirically the effects of using various regularization
    techniques on learning performance in a simple reinforcement learning setup.

    Backprop without Learning Rates Through Coin Betting

    Francesco Orabona, Tatiana Tommasi
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Deep learning methods achieve state-of-the-art performance in many
    application scenarios. Yet, these methods require a significant amount of
    hyperparameters tuning in order to achieve the best results. In particular,
    tuning the learning rates in the stochastic optimization process is still one
    of the main bottlenecks. In this paper, we propose a new stochastic gradient
    descent procedure for deep networks that does not require any learning rate
    setting. Contrary to previous methods, we do not adapt the learning rates nor
    we make use of the assumed curvature of the objective function. Instead, we
    reduce the optimization process to a game of betting on a coin and propose a
    learning rate free optimal algorithm for this scenario. Theoretical convergence
    is proven for convex and quasi-convex functions and empirical evidence shows
    the advantage of our algorithm over popular stochastic gradient algorithms.

    Follow the Signs for Robust Stochastic Optimization

    Lukas Balles, Philipp Hennig
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Stochastic noise on gradients is now a common feature in machine learning. It
    complicates the design of optimization algorithms, and its effect can be
    unintuitive: We show that in some settings, particularly those of low
    signal-to-noise ratio, it can be helpful to discard all but the signs of
    stochastic gradient elements. In fact, we argue that three popular existing
    methods already approximate this very paradigm. We devise novel stochastic
    optimization algorithms that explicitly follow stochastic sign estimates while
    appropriately accounting for their uncertainty. These methods favorably compare
    to the state of the art on a number of benchmark problems.
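
    A bare-bones sketch of the sign-based update alluded to above (this is plain sign-SGD
    for illustration; the methods in the paper additionally account for the uncertainty of
    the sign estimates):

    import numpy as np

    def sign_sgd_step(params, stochastic_grad, lr=0.01):
        """Update using only the element-wise signs of a stochastic gradient estimate."""
        return params - lr * np.sign(stochastic_grad)

    # Toy quadratic objective f(w) = 0.5 * ||w||^2 with noisy gradients g = w + noise.
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    for _ in range(500):
        g = w + 0.5 * rng.normal(size=10)  # low signal-to-noise stochastic gradient
        w = sign_sgd_step(w, g)
    print(np.linalg.norm(w))               # settles near the optimum at the origin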

    An Out-of-the-box Full-network Embedding for Convolutional Neural Networks

    Dario Garcia-Gasulla, Armand Vilalta, Ferran Parés, Jonatan Moreno, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Transfer learning for feature extraction can be used to exploit deep
    representations in contexts where there is very little training data, where there
    are limited computational resources, or when tuning the hyper-parameters needed
    for training is not an option. While previous contributions to feature
    extraction propose embeddings based on a single layer of the network, in this
    paper we propose a full-network embedding which successfully integrates
    convolutional and fully connected features, coming from all layers of a deep
    convolutional neural network. To do so, the embedding normalizes features in
    the context of the problem, and discretizes their values to reduce noise and
    regularize the embedding space. Significantly, this also reduces the
    computational cost of processing the resultant representations. The proposed
    method is shown to outperform single layer embeddings on several image
    classification tasks, while also being more robust to the choice of the
    pre-trained model used for obtaining the initial features. The performance gap
    in classification accuracy between thoroughly tuned solutions and the
    full-network embedding is also reduced, which makes the proposed approach a
    competitive solution for a large set of applications.
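
    A minimal sketch of the ingredients described above (per-layer features
    concatenated, standardized on the target dataset, then discretized); the
    three-valued quantization and its thresholds are illustrative assumptions,
    not necessarily the paper's exact values:

        import numpy as np

        def full_network_embedding(layer_features, low=-0.25, high=0.25):
            # layer_features: list of (n_samples, d_l) activations, one entry per layer.
            feats = np.concatenate(layer_features, axis=1)      # concatenate all layers
            feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
            emb = np.zeros_like(feats, dtype=np.int8)           # discretize to {-1, 0, +1}
            emb[feats > high] = 1
            emb[feats < low] = -1
            return emb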

    Individualized Risk Prognosis for Critical Care Patients: A Multi-task Gaussian Process Model

    Ahmed M. Alaa, Jinsung Yoon, Scott Hu, Mihaela van der Schaar
    Subjects: Learning (cs.LG)

    We report the development and validation of a data-driven real-time risk
    score that provides timely assessments for the clinical acuity of ward patients
    based on their temporal lab tests and vital signs, which allows for timely
    intensive care unit (ICU) admissions. Unlike the existing risk scoring
    technologies, the proposed score is individualized; it uses the electronic
    health record (EHR) data to cluster the patients based on their static
    covariates into subcohorts of similar patients, and then learns a separate
    temporal, non-stationary multi-task Gaussian Process (GP) model that captures
    the physiology of every subcohort. Experiments conducted on data from a
    heterogeneous cohort of 6,094 patients admitted to the Ronald Reagan UCLA
    medical center show that our risk score significantly outperforms the
    state-of-the-art risk scoring technologies, such as the Rothman index and MEWS,
    in terms of timeliness, true positive rate (TPR), and positive predictive value
    (PPV). In particular, the proposed score increases the AUC by 20% and 38%
    compared to the Rothman index and MEWS respectively, and can predict ICU admissions
    8 hours before clinicians at a PPV of 35% and a TPR of 50%. Moreover, we show
    that the proposed risk score allows for better decisions on when to discharge
    clinically stable patients from the ward, thereby improving the efficiency of
    hospital resource utilization.

    CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters

    Ron Levie, Federico Monti, Xavier Bresson, Michael M. Bronstein
    Subjects: Learning (cs.LG)

    The rise of graph-structured data such as social networks, regulatory
    networks, citation graphs, and functional brain networks, in combination with
    the resounding success of deep learning in various applications, has brought
    growing interest in generalizing deep learning models to non-Euclidean domains. In this
    paper, we introduce a new spectral domain convolutional architecture for deep
    learning on graphs. The core ingredient of our model is a new class of
    parametric rational complex functions (Cayley polynomials) that allow
    efficient computation of localized regular filters on graphs specializing on
    frequency bands of interest. Our model scales linearly with the size of the
    input data for sparsely-connected graphs, can handle different constructions of
    Laplacian operators, and typically requires fewer parameters than previous
    models. Extensive experimental results show the superior performance of our
    approach on various graph learning problems.

    Streaming Binary Sketching based on Subspace Tracking and Diagonal Uniformization

    Anne Morvan, Antoine Souloumiac, Cédric Gouy-Pailler, Jamal Atif
    Subjects: Learning (cs.LG)

    In this paper, we address the problem of learning compact
    similarity-preserving embeddings for massive high-dimensional streams of data
    in order to perform efficient similarity search. We present a new method for
    computing binary compressed representations, or sketches, of
    high-dimensional real feature vectors. Given an expected code length (c) and
    high-dimensional input data points, our algorithm provides a binary code of (c)
    bits aiming at preserving the distance between the points from the original
    high-dimensional space. The offline version of our algorithm outperforms
    offline state-of-the-art methods in computation time complexity while
    producing sketches of similar quality, and it also provides convergence
    guarantees. Moreover, our algorithm can be used straightforwardly in the
    streaming context, since it requires storing neither the whole dataset nor
    a chunk of it. We demonstrate the quality of our binary sketches through extensive
    experiments on real data for the nearest neighbors search task in the offline
    and online settings.

    Classification Using Proximity Catch Digraphs (Technical Report)

    Artür Manukyan, Elvan Ceyhan
    Comments: 41 pages, 16 figures
    Subjects: Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

    We employ random geometric digraphs to construct semi-parametric classifiers.
    These data-random digraphs are from parametrized random digraph families called
    proximity catch digraphs (PCDs). A related geometric digraph family, class
    cover catch digraph (CCCD), has been used to solve the class cover problem by
    using its approximate minimum dominating set. CCCDs showed relatively good
    performance in the classification of imbalanced data sets, and although CCCDs
    have a convenient construction in (mathbb{R}^d), finding minimum dominating
    sets is NP-hard and its probabilistic behaviour is not mathematically tractable
    except for (d=1). On the other hand, a particular family of PCDs, called
    proportional-edge PCDs (PE-PCDs), has mathematically tractable minimum
    dominating sets in (mathbb{R}^d); however their construction in higher
    dimensions may be computationally demanding. More specifically, we show that
    the classifiers based on PE-PCDs are prototype-based classifiers such that the
    exact minimum number of prototypes (equivalent to minimum dominating sets) is
    found in time polynomial in the number of observations. We construct two types
    of classifiers based on PE-PCDs. One is a family of hybrid classifiers that
    depend on the location of the points of the training data set, and the other is a
    family of classifiers solely based on class covers. We assess the
    classification performance of our PE-PCD based classifiers by extensive Monte
    Carlo simulations, and compare them with those of other commonly used
    classifiers. We also show that, similar to CCCD classifiers, our classifiers
    are relatively better in classification in the presence of class imbalance.

    Infrastructure for Usable Machine Learning: The Stanford DAWN Project

    Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia
    Subjects: Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

    Despite incredible recent advances in machine learning, building machine
    learning applications remains prohibitively time-consuming and expensive for
    all but the best-trained, best-funded engineering organizations. This expense
    comes not from a need for new and improved statistical models but instead from
    a lack of systems and tools for supporting end-to-end machine learning
    application development, from data preparation and labeling to
    productionization and monitoring. In this document, we outline opportunities
    for infrastructure supporting usable, end-to-end machine learning applications
    in the context of the nascent DAWN (Data Analytics for What’s Next) project at
    Stanford.

    Shake-Shake regularization

    Xavier Gastaldi
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The method introduced in this paper aims at helping deep learning
    practitioners facing an overfitting problem. The idea is to replace, in a
    multi-branch network, the standard summation of parallel branches with a
    stochastic affine combination. Applied to 3-branch residual networks,
    shake-shake regularization improves on the best single shot published results
    on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%.
    Experiments on architectures without skip connections or Batch Normalization
    show encouraging results and open the door to a large set of applications. Code
    is available at this https URL
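
    As a hedged sketch of the core idea, a random convex combination of the two
    parallel branches is used at training time and replaced by its expectation
    at test time. The sampling granularity and the separate backward-pass
    coefficient used in the paper are omitted here; names are illustrative.

        import torch

        def shake_shake_combine(residual, branch1, branch2, training=True):
            if training:
                alpha = torch.rand(1, device=branch1.device)      # fresh random coefficient
            else:
                alpha = torch.tensor(0.5, device=branch1.device)  # expected value at test time
            return residual + alpha * branch1 + (1.0 - alpha) * branch2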

    Statistical inference using SGD

    Tianyang Li, Liu Liu, Anastasios Kyrillidis, Constantine Caramanis
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)

    We present a novel method for frequentist statistical inference in
    (M)-estimation problems, based on stochastic gradient descent (SGD) with a
    fixed step size: we demonstrate that the average of such SGD sequences can be
    used for statistical inference, after proper scaling. An intuitive analysis
    using the Ornstein-Uhlenbeck process suggests that such averages are
    asymptotically normal. From a practical perspective, our SGD-based inference
    procedure is a first order method, and is well-suited for large scale problems.
    To show its merits, we apply it to both synthetic and real datasets, and
    demonstrate that its accuracy is comparable to classical statistical methods,
    while requiring potentially far less computation.
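
    A rough, illustrative sketch of the recipe: several fixed-step-size SGD
    runs, iterate averages, and their spread as an uncertainty estimate. The
    paper's precise scaling of the resulting covariance is not reproduced here,
    and all constants are assumptions.

        import numpy as np

        def sgd_inference(grad_sample, dim, step=0.05, n_runs=20, run_len=2000):
            averages = []
            for _ in range(n_runs):
                w = np.zeros(dim)
                mean_w = np.zeros(dim)
                for t in range(1, run_len + 1):
                    w = w - step * grad_sample(w)   # fixed step size throughout
                    mean_w += (w - mean_w) / t      # running average of the iterates
                averages.append(mean_w)
            averages = np.array(averages)
            return averages.mean(axis=0), averages.std(axis=0)  # estimate and rough spread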

    Nice latent variable models have log-rank

    Madeleine Udell, Alex Townsend
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Matrices of low rank are pervasive in big data, appearing in recommender
    systems, movie preferences, topic models, medical records, and genomics. While
    there is a vast literature on how to exploit low rank structure in these
    datasets, there is less attention on explaining why the low rank structure
    appears in the first place. We explain the abundance of low rank matrices in
    big data by proving that certain latent variable models associated to piecewise
    analytic functions are of log-rank. A large matrix from such a latent variable
    model can be approximated, up to a small error, by a low rank matrix.

    Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

    Sahil Sharma, Srivatsan Ramesh, Girish Raguvir J, Balaraman Ravindran
    Comments: 10 pages + 11 page appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Reinforcement Learning (RL) can model complex behavior policies for
    goal-directed sequential decision making tasks. A hallmark of RL algorithms is
    Temporal Difference (TD) learning: the value function for the current state is
    moved towards a bootstrapped target estimated from the next state's value
    function. (lambda)-returns generalize beyond 1-step returns and strike a
    balance between Monte Carlo and TD learning methods. While lambda-returns have
    been extensively studied in RL, they have not been explored much in deep RL.
    This paper's first contribution is an exhaustive benchmarking of
    lambda-returns. Although mathematically tractable, the use of exponentially
    decaying weighting of n-step return-based targets in lambda-returns is a
    rather ad-hoc design choice. Our second major contribution is that we propose a
    generalization of lambda-returns called Confidence-based Autodidactic Returns
    (CAR), wherein the RL agent learns the weighting of the n-step returns in an
    end-to-end manner. This allows the agent to learn how much weight to place on
    each n-step return-based target. In contrast, lambda-returns restrict
    RL agents to use an exponentially decaying weighting scheme. Autodidactic
    returns can be used for improving any RL algorithm which uses TD learning. We
    empirically demonstrate that using sophisticated weighted mixtures of
    multi-step returns (like CAR and lambda-returns) considerably outperforms the
    use of n-step returns. We perform our experiments on the Asynchronous Advantage
    Actor Critic (A3C) algorithm in the Atari 2600 domain.
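
    For reference, the standard backward recursion for (lambda)-returns that CAR
    generalizes is sketched below; the learned, confidence-based weights of CAR
    replace the fixed exponential weighting used here.

        import numpy as np

        def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
            # rewards: length-T array; values: length-(T+1) array of state value estimates.
            T = len(rewards)
            targets = np.zeros(T)
            next_return = values[T]               # bootstrap from the final state value
            for t in reversed(range(T)):
                # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
                next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
                targets[t] = next_return
            return targets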

    Parallel Streaming Wasserstein Barycenters

    Matthew Staib, Sebastian Claici, Justin Solomon, Stefanie Jegelka
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)

    Efficiently aggregating data from different sources is a challenging problem,
    particularly when samples from each source are distributed differently. These
    differences can be inherent to the inference task or present for other reasons:
    sensors in a sensor network may be placed far apart, affecting their individual
    measurements. Conversely, it is computationally advantageous to split Bayesian
    inference tasks across subsets of data, but data need not be identically
    distributed across subsets. One principled way to fuse probability
    distributions is via the lens of optimal transport: the Wasserstein barycenter
    is a single distribution that summarizes a collection of input measures while
    respecting their geometry. However, computing the barycenter scales poorly and
    requires discretization of all input distributions and the barycenter itself.
    Improving on this situation, we present a scalable, communication-efficient,
    parallel algorithm for computing the Wasserstein barycenter of arbitrary
    distributions. Our algorithm can operate directly on continuous input
    distributions and is optimized for streaming data. Our method is even robust to
    nonstationary input distributions and produces a barycenter estimate that
    tracks the input measures over time. The algorithm is semi-discrete, needing to
    discretize only the barycenter estimate. To the best of our knowledge, we also
    provide the first bounds on the quality of the approximate barycenter as the
    discretization becomes finer. Finally, we demonstrate the practical
    effectiveness of our method, both in tracking moving distributions on a sphere,
    as well as in a large-scale Bayesian inference task.

    Stabilizing Adversarial Nets With Prediction Methods

    Abhay Yadav (1), Sohil Shah (1), Zheng Xu (1), David Jacobs (1), Tom Goldstein (1) ((1) University of Maryland, College Park, MD, USA)
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (cs.NA)

    Adversarial neural networks solve many important problems in data science,
    but are notoriously difficult to train. These difficulties come from the fact
    that optimal weights for adversarial nets correspond to saddle points, and not
    minimizers, of the loss function. The alternating stochastic gradient methods
    typically used for such problems do not reliably converge to saddle points, and
    when convergence does happen it is often highly sensitive to learning rates. We
    propose a simple modification of stochastic gradient descent that stabilizes
    adversarial networks. We show, both in theory and practice, that the proposed
    method reliably converges to saddle points. This makes adversarial networks
    less likely to “collapse”, and enables faster training with larger learning
    rates.
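
    A toy sketch of the prediction idea for a saddle-point problem
    min_u max_v f(u, v): after updating one set of variables, an extrapolated
    "predicted" copy is used when updating the other. The scalar setting, step
    sizes, and names below are illustrative assumptions.

        def predictive_alternating_sgd(u, v, grad_u, grad_v, lr=1e-3, steps=1000):
            u_prev = u
            for _ in range(steps):
                u_new = u - lr * grad_u(u, v)        # descent step on u
                u_bar = u_new + (u_new - u_prev)     # predict where u is heading next
                v = v + lr * grad_v(u_bar, v)        # ascent step on v against the prediction
                u_prev, u = u, u_new
            return u, v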

    Deep Sparse Coding Using Optimized Linear Expansion of Thresholds

    Debabrata Mahapatra, Subhadip Mukherjee, Chandra Sekhar Seelamantula
    Comments: Submission date: November 11, 2016. 19 pages; 9 figures
    Subjects: Learning (cs.LG)

    We address the problem of reconstructing sparse signals from noisy and
    compressive measurements using a feed-forward deep neural network (DNN) with an
    architecture motivated by the iterative shrinkage-thresholding algorithm
    (ISTA). We maintain the weights and biases of the network links as prescribed
    by ISTA and model the nonlinear activation function using a linear expansion of
    thresholds (LET), which has been very successful in image denoising and
    deconvolution. The optimal set of coefficients of the parametrized activation
    is learned over a training dataset containing measurement-sparse signal pairs,
    corresponding to a fixed sensing matrix. For training, we develop an efficient
    second-order algorithm, which requires only matrix-vector product computations
    in every training epoch (Hessian-free optimization) and offers superior
    convergence performance compared with gradient-descent optimization. Subsequently, we
    derive an improved network architecture inspired by FISTA, a faster version of
    ISTA, to achieve similar signal estimation performance with about 50% of the
    number of layers. The resulting architecture turns out to be a deep residual
    network, which has recently been shown to exhibit superior performance in
    several visual recognition tasks. Numerical experiments demonstrate that the
    proposed DNN architectures lead to 3 to 4 dB improvement in the reconstruction
    signal-to-noise ratio (SNR), compared with the state-of-the-art sparse coding
    algorithms.
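
    For context, one plain ISTA iteration, the template that each network layer
    follows, is shown below; in the paper the fixed soft-threshold is replaced
    by a learned linear expansion of thresholds (LET).

        import numpy as np

        def ista_layer(z, y, A, step, threshold):
            # Gradient step on the data-fit term 0.5 * ||y - A z||^2 ...
            r = z + step * A.T @ (y - A @ z)
            # ... followed by soft-thresholding (the nonlinearity that LET parametrizes).
            return np.sign(r) * np.maximum(np.abs(r) - threshold, 0.0)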

    Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning

    Sahil Sharma, Aravind Suresh, Rahul Ramesh, Balaraman Ravindran
    Comments: 11 pages + 7 pages appendix
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Deep Reinforcement Learning (DRL) methods have performed well in an
    increasing number of high-dimensional visual decision making domains. Among
    all such visual decision making problems, those with discrete action spaces
    often tend to have underlying compositional structure in the said action space.
    Such action spaces often contain actions such as go left, go up as well as go
    diagonally up and left (which is a composition of the former two actions). The
    representations of control policies in such domains have traditionally been
    modeled without exploiting this inherent compositional structure in the action
    spaces. We propose a new learning paradigm, Factored Action space
    Representations (FAR) wherein we decompose a control policy learned using a
    Deep Reinforcement Learning Algorithm into independent components, analogous to
    decomposing a vector in terms of some orthogonal basis vectors. This
    architectural modification of the control policy representation allows the
    agent to learn about multiple actions simultaneously, while executing only one
    of them. We demonstrate that FAR yields considerable improvements on top of two
    DRL algorithms in Atari 2600: FARA3C outperforms A3C (Asynchronous Advantage
    Actor Critic) in 9 out of 14 tasks and FARAQL outperforms AQL (Asynchronous
    n-step Q-Learning) in 9 out of 13 tasks.

    Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

    Nicholas Carlini, David Wagner
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

    Neural networks are known to be vulnerable to adversarial examples: inputs
    that are close to valid inputs but classified incorrectly. We investigate the
    security of ten recent proposals that are designed to detect adversarial
    examples. We show that all can be defeated, even when the adversary does not
    know the exact parameters of the detector. We conclude that adversarial
    examples are significantly harder to detect than previously appreciated, and we
    propose several guidelines for evaluating future proposed defenses.

    Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

    Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)

    The Particle Swarm Optimization Policy (PSO-P) has been recently introduced
    and proven to produce remarkable results when interacting with academic
    reinforcement learning benchmarks in an off-policy, batch-based setting. To
    further investigate the properties and feasibility on real-world applications,
    this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a
    novel reinforcement learning (RL) benchmark that aims at being realistic by
    including a variety of aspects found in industrial applications, like
    continuous state and action spaces, a high dimensional, partially observable
    state space, delayed effects, and complex stochasticity. The experimental
    results of PSO-P on IB are compared to results of closed-form control policies
    derived from the model-based Recurrent Control Neural Network (RCNN) and the
    model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not
    only of interest for academic benchmarks, but also for real-world industrial
    applications, since it also yielded the best performing policy in our IB
    setting. Compared to other well established RL techniques, PSO-P produced
    outstanding results in performance and robustness, requiring only a relatively
    low amount of effort in finding adequate parameters or making complex design
    decisions.

    Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

    Samet Oymak, Mehrdad Mahdavi, Jiasi Chen
    Comments: 22 pages, 7 figures
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)

    For various applications, the relations between the dependent and independent
    variables are highly nonlinear. Consequently, for large scale complex problems,
    neural networks and regression trees are commonly preferred over linear models
    such as Lasso. This work proposes learning the feature nonlinearities by
    binning feature values and finding the best fit in each quantile using
    non-convex regularized linear regression. The algorithm first captures the
    dependence between neighboring quantiles by enforcing smoothness via
    piecewise-constant/linear approximation and then selects a sparse subset of
    good features. We prove that the proposed algorithm is statistically and
    computationally efficient. In particular, it achieves linear rate of
    convergence while requiring a near-minimal number of samples. Evaluations on
    synthetic and real datasets demonstrate that the algorithm is competitive with
    the current state-of-the-art and accurately learns feature nonlinearities. Finally,
    we explore an interesting connection between the binning stage of our algorithm
    and sparse Johnson-Lindenstrauss matrices.

    SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms

    Yifei Jin, Lingxiao Huang, Jian Li
    Comments: 20 pages
    Subjects: Learning (cs.LG); Numerical Analysis (cs.NA)

    Support Vector Machine is one of the most classical approaches for
    classification and regression. Despite being studied for decades, obtaining
    practical algorithms for SVM is still an active research problem in machine
    learning. In this paper, we propose a new perspective for SVM via saddle point
    optimization. We provide an algorithm which achieves
    ((1-epsilon))-approximations with running time (tilde{O}(nd + nsqrt{d/epsilon}))
    for both the separable (hard-margin SVM) and non-separable ((nu)-SVM) cases,
    where (n) is the number of points and (d) is the dimensionality. To the best
    of our knowledge, the current best algorithm for hard-margin SVM, the Gilbert
    algorithm (Gärtner and Jaggi, 2009), requires (O(nd/epsilon)) time. Our
    algorithm improves the running time by a factor of (sqrt{d}/sqrt{epsilon}).
    For (nu)-SVM, besides the well-known quadratic programming approach, which
    requires (Omega(n^2 d)) time (Joachims, 1998; Platt, 1999), no better
    algorithm is known. In this paper, we provide the first nearly linear time
    algorithm for (nu)-SVM. We also consider distributed settings and provide
    distributed algorithms with low communication cost via saddle point
    optimization. Our algorithms require (tilde{O}(k(d + sqrt{d/epsilon})))
    communication cost, where (k) is the number of clients, almost matching the
    theoretical lower bound.

    Speedup from a different parametrization within the Neural Net algorithm

    Michael F. Zimmer
    Comments: 8 pages
    Subjects: Learning (cs.LG)

    A different parametrization of the hyperplanes is used in the neural net
    algorithm. As demonstrated on several autoencoder examples it significantly
    outperforms the usual parametrization, reaching lower training error values
    with only a fraction of the number of epochs. We argue that it also makes
    the parameters easier to understand and initialize.

    GAR: An efficient and scalable Graph-based Activity Regularization for semi-supervised learning

    Ozsel Kilinc, Ismail Uysal
    Comments: Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017)
    Subjects: Learning (cs.LG)

    In this paper, we propose a novel graph-based approach for semi-supervised
    learning problems, which considers an adaptive adjacency of the examples
    throughout the unsupervised portion of the training. Adjacency of the examples
    is inferred using the predictions of a neural network model which is first
    initialized by a supervised pretraining. These predictions are then updated
    according to a novel unsupervised objective which regularizes another
    adjacency, now linking the output nodes. Regularizing the adjacency of the
    output nodes, inferred from the predictions of the network, creates an easier
    optimization problem and ultimately drives the predictions of the network
    towards the optimal embedding. Overall, the proposed framework
    provides an effective and scalable graph-based solution which is natural to the
    operational mechanism of deep neural networks. Our results show
    state-of-the-art performance within semi-supervised learning with the highest
    accuracies reported to date in the literature for SVHN and NORB datasets.

    Securing Deep Neural Nets against Adversarial Attacks with Moving Target Defense

    Sailik Sengupta, Tathagata Chakraborti, Subbarao Kambhampati
    Comments: 10 page, 4 figures
    Subjects: Learning (cs.LG); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)

    Deep Neural Networks (DNNs) are presently the state-of-the-art for image
    classification tasks. However, recent works have shown that these systems can
    be easily fooled to misidentify images by modifying the image in a particular
    way. Moreover, defense mechanisms proposed in the literature so far are mostly
    attack-specific and prove to be ineffective against new attacks. Indeed, recent
    work on universal perturbations can generate a single modification for all test
    images that is able to make existing networks misclassify 90% of the time.
    Presently, to our knowledge, no defense mechanisms are effective in preventing
    this. As such, the design of a general defense strategy against a wide range of
    attacks for Neural Networks becomes a challenging problem. In this paper, we
    derive inspiration from recent advances in the field of cybersecurity and
    multi-agent systems and propose to use the concept of Moving Target Defense
    (MTD) for increasing the robustness of well-known deep networks trained on the
    ImageNet dataset towards such adversarial attacks. In using this technique, we
    formalize and exploit the notion of differential immunity of different networks
    to specific attacks. To classify a single test image, we pick one of the
    trained networks each time and then use its classification output. To ensure
    maximum robustness, we generate an effective strategy by formulating this
    interaction as a Repeated Bayesian Stackelberg Game with a Defender and the
    Users. As a network switching strategy, we compute a Strong Stackelberg
    Equilibrium that optimizes the accuracy of prediction while at the same time
    reduces the misclassification rate on adversarial modification of test images.
    We show that while our approach achieves 92.79% accuracy for legitimate
    users, attackers succeed in causing misclassification only 58% of the time
    (down from 93.7%), even when they select the best attack available to them.

    Two-temperature logistic regression based on the Tsallis divergence

    Ehsan Amid, Manfred K. Warmuth
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We develop a variant of multiclass logistic regression that achieves three
    properties: i) We minimize a non-convex surrogate loss which makes the method
    robust to outliers, ii) our method allows transitioning between non-convex and
    convex losses by the choice of the parameters, iii) the surrogate loss is Bayes
    consistent, even in the non-convex case. The algorithm has one weight vector
    per class and the surrogate loss is a function of the linear activations (one
    per class). The surrogate loss of an example with linear activation vector
    (mathbf{a}) and class (c) has the form (-log_{t_1} exp_{t_2} (a_c -
    G_{t_2}(mathbf{a}))), where the two temperatures (t_1) and (t_2) “temper” the
    (log) and (exp), respectively, and (G_{t_2}) is a generalization of the
    log-partition function. We motivate this loss using the Tsallis divergence. As
    the temperature of the logarithm becomes smaller than the temperature of the
    exponential, the surrogate loss becomes “more quasi-convex”. Various tunings of
    the temperatures recover previous methods and tuning the degree of
    non-convexity is crucial in the experiments. The choice (t_1<1) and (t_2>1)
    performs best experimentally. We explain this by showing that (t_1 < 1) caps
    the surrogate loss and (t_2 >1) makes the predictive distribution have a heavy
    tail.
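
    For concreteness, the standard tempered logarithm and exponential underlying
    this loss are sketched below; the generalized log-partition (G_{t_2}) is
    computed iteratively in the paper and is simply passed in here as a given
    value, so this is an illustration rather than the full method.

        import numpy as np

        def log_t(x, t):
            # Tempered logarithm; reduces to log(x) as t -> 1.
            return np.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

        def exp_t(x, t):
            # Tempered exponential; reduces to exp(x) as t -> 1.
            return np.exp(x) if t == 1.0 else np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

        def two_temperature_loss(a, c, G, t1=0.5, t2=1.5):
            # Surrogate loss -log_{t1} exp_{t2}(a_c - G(a)) for activations a and true class c.
            return -log_t(exp_t(a[c] - G, t2), t1)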

    Machine learning modeling for time series problem: Predicting flight ticket prices

    Jun Lu
    Comments: 17 pages, 19 figures
    Subjects: Learning (cs.LG)

    Machine learning has been used in many fields. In this article, we show how
    machine learning can be applied to a time series problem, using airline
    ticket price prediction as the specific example. Airline companies use many
    different variables to determine flight ticket prices: an indicator of
    whether the travel is during the holidays, the number of free seats in the
    plane, and so on. Some of these variables are observed, while others are
    hidden. Based on data collected over a 103-day period, we trained our models;
    the best model was AdaBoost-Decision Tree Classification, which performed
    61.35% better than the random purchase strategy over the 8 observed routes,
    with relatively small variance across routes. We also considered the
    situation in which little historical data is available for some routes (for
    example, a new route with no history), or in which we want to predict whether
    to buy or wait without training on historical data. For this setting, we used
    HMM-Sequence-Classification-based AdaBoost-Decision Tree Classification to
    make predictions on 12 new routes, obtaining 31.71% better performance than
    the random purchase strategy.

    The High-Dimensional Geometry of Binary Neural Networks

    Alexander G. Anderson, Cory P. Berg
    Comments: 12 pages, 4 Figures
    Subjects: Learning (cs.LG)

    Recent research has shown that one can train a neural network with binary
    weights and activations at train time by augmenting the weights with a
    high-precision continuous latent variable that accumulates small changes from
    stochastic gradient descent. However, there is a dearth of theoretical analysis
    to explain why we can effectively capture the features in our data with binary
    weights and activations. Our main result is that the neural networks with
    binary weights and activations trained using the method of Courbariaux, Hubara
    et al. (2016) work because of the high-dimensional geometry of binary vectors.
    In particular, the ideal continuous vectors that extract out features in the
    intermediate representations of these BNNs are well-approximated by binary
    vectors in the sense that dot products are approximately preserved. Compared to
    previous research that demonstrated the viability of such BNNs, our work
    explains why these BNNs work in terms of the HD geometry. Our theory serves as
    a foundation for understanding not only BNNs but a variety of methods that seek
    to compress traditional neural networks. Furthermore, a better understanding of
    multilayer binary neural networks serves as a starting point for generalizing
    BNNs to other neural network architectures such as recurrent neural networks.

    Nesterov's Acceleration For Second Order Method

    Haishan Ye, Zhihua Zhang
    Subjects: Learning (cs.LG)

    Optimization plays a key role in machine learning. Recently, stochastic
    second-order methods have attracted much attention due to their low
    computational cost in each iteration. However, these algorithms might perform
    poorly, especially when the Hessian is hard to approximate well and
    efficiently. As far as we know, there is no effective way to handle this
    problem. In this paper, we resort to Nesterov's acceleration technique to
    improve the convergence performance of a class of second-order methods called
    approximate Newton. We give a theoretical analysis showing that Nesterov's
    acceleration technique can improve the convergence performance of approximate
    Newton just as it does for first-order methods. We accordingly propose an accelerated regularized
    sub-sampled Newton. Our accelerated algorithm performs much better than the
    original regularized sub-sampled Newton in experiments, which validates our
    theory empirically. Moreover, the accelerated regularized sub-sampled Newton
    achieves performance comparable to or even better than state-of-the-art algorithms.

    Local Information with Feedback Perturbation Suffices for Dictionary Learning in Neural Circuits

    Tsung-Han Lin
    Comments: 10 pages, 4 figures
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    While the sparse coding principle can successfully model information
    processing in sensory neural systems, it remains unclear how learning can be
    accomplished under neural architectural constraints. Feasible learning rules
    must rely solely on synaptically local information in order to be implemented
    on spatially distributed neurons. We describe a neural network with spiking
    neurons that can address the aforementioned fundamental challenge and solve the
    L1-minimizing dictionary learning problem, representing the first model able to
    do so. Our major innovation is to introduce feedback synapses to create a
    pathway that turns the seemingly non-local information into local information. The
    resulting network encodes the error signal needed for learning as the change of
    network steady states caused by feedback, and operates akin to the classical
    stochastic gradient descent method.

    Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

    Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy
    Comments: Under Review of NIPS 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    Reward augmented maximum likelihood (RAML) is a simple and effective learning
    framework to directly optimize towards the reward function in structured
    prediction tasks. RAML incorporates task-specific reward by performing
    maximum-likelihood updates on candidate outputs sampled according to an
    exponentiated payoff distribution, which gives higher probabilities to
    candidates that are close to the reference output. While RAML is notable for
    its simplicity, efficiency, and its impressive empirical successes, the
    theoretical properties of RAML, especially the behavior of the exponentiated
    payoff distribution, have not been examined thoroughly. In this work, we
    introduce softmax Q-distribution estimation, a novel theoretical interpretation
    of RAML, which reveals the relation between RAML and Bayesian decision theory.
    The softmax Q-distribution can be regarded as a smooth approximation of Bayes
    decision boundary, and the Bayes decision rule is achieved by decoding with
    this Q-distribution. We further show that RAML is equivalent to approximately
    estimating the softmax Q-distribution. Experiments on three structured
    prediction tasks with rewards defined on sequential (named entity recognition),
    tree-based (dependency parsing) and irregular (machine translation) structures
    show notable improvements over maximum likelihood baselines.

    VAE with a VampPrior

    Jakub M. Tomczak, Max Welling
    Comments: 15 pages
    Subjects: Learning (cs.LG)

    Many different methods to train deep generative models have been proposed in
    the past. In this paper, we propose to extend the variational auto-encoder
    (VAE) framework with a new type of prior which we call “Variational Mixture of
    Posteriors” prior, or VampPrior for short. The VampPrior consists of a mixture
    distribution (e.g., a mixture of Gaussians) with components given by
    variational posteriors conditioned on learnable pseudo-inputs. We further
    extend this prior to a two-layer hierarchical model and show that this
    architecture, in which the prior and posterior are coupled, learns significantly better
    models. The model also avoids the usual local optima issues that plague VAEs
    related to useless latent dimensions. We provide empirical studies on three
    benchmark datasets, namely, MNIST, OMNIGLOT and Caltech 101 Silhouettes, and
    show that applying the hierarchical VampPrior delivers state-of-the-art results
    on all three datasets in the unsupervised permutation invariant setting.
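
    A minimal sketch of the prior density itself: a uniform mixture of the
    encoder's diagonal-Gaussian posteriors evaluated at K learnable
    pseudo-inputs. Tensor shapes and the encoder interface below are assumptions
    made for illustration.

        import math
        import torch

        def vamp_prior_log_prob(z, pseudo_inputs, encoder):
            # z: (n, d) latent codes; pseudo_inputs: (K, ...) learnable inputs;
            # encoder(x) is assumed to return (mu, logvar) of a diagonal Gaussian q(z|x).
            K = pseudo_inputs.shape[0]
            mu, logvar = encoder(pseudo_inputs)                    # each of shape (K, d)
            diff = z.unsqueeze(1) - mu.unsqueeze(0)                # (n, K, d)
            log_comp = -0.5 * (logvar.unsqueeze(0)
                               + diff ** 2 / logvar.exp().unsqueeze(0)
                               + math.log(2 * math.pi))
            log_comp = log_comp.sum(dim=-1)                        # (n, K): log q(z | u_k)
            return torch.logsumexp(log_comp, dim=1) - math.log(K)  # log of the uniform mixture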

    Deep Learning as a Tool to Predict Flow Patterns in Two-Phase Flow

    Mohammadmehdi Ezzatabadipour, Parth Singh, Melvin D. Robinson, Pablo Guillen-Rondon, Carlos Torres
    Comments: Part of DM4OG 2017 proceedings (arXiv:1705.03451)
    Subjects: Learning (cs.LG)

    In order to better model complex real-world data such as multiphase flow, one
    approach is to develop pattern recognition techniques and robust features that
    capture the relevant information. In this paper, we use deep learning methods,
    and in particular employ the multilayer perceptron, to build an algorithm that
    can predict flow patterns in two-phase flow from fluid properties and pipe
    conditions. The preliminary results show excellent performance when compared
    with classical methods of flow pattern prediction.

    A unified approach to interpreting model predictions

    Scott Lundberg, Su-In Lee
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Understanding why a model made a certain prediction is crucial in many
    applications. However, with large modern datasets the best accuracy is often
    achieved by complex models that even experts struggle to interpret, such as
    ensemble or deep learning models. This creates a tension between accuracy and
    interpretability. In response, a variety of methods have recently been proposed
    to help users interpret the predictions of complex models. Here, we present a
    unified framework for interpreting predictions, namely SHAP (SHapley Additive
    exPlanations), which assigns each feature an importance value for a particular
    prediction. The key novel components of the SHAP framework are the
    identification of a class of additive feature importance measures and
    theoretical results that there is a unique solution in this class with a set of
    desired properties. This class unifies six existing methods, and several recent
    methods in this class do not have these desired properties. This means that our
    framework can inform the development of new methods for explaining prediction
    models. We demonstrate that several new methods we presented in this paper
    based on the SHAP framework show better computational performance and better
    consistency with human intuition than existing methods.
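
    To make the notion of additive feature importances concrete, a generic Monte
    Carlo Shapley-value estimator is sketched below; this is not one of the
    specific SHAP approximations introduced in the paper, and all names and
    parameters are illustrative.

        import numpy as np

        def sampled_shapley_values(model, x, background, n_perms=200, seed=0):
            # model: callable mapping a 1-D feature vector to a scalar prediction.
            rng = np.random.default_rng(seed)
            d = x.shape[0]
            phi = np.zeros(d)
            for _ in range(n_perms):
                order = rng.permutation(d)
                current = background.copy()          # start from a "missing features" baseline
                prev_out = model(current)
                for j in order:
                    current[j] = x[j]                # reveal feature j
                    out = model(current)
                    phi[j] += out - prev_out         # marginal contribution of feature j
                    prev_out = out
            return phi / n_perms                     # sums to model(x) - model(background)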

    Regularizing deep networks using efficient layerwise adversarial training

    Swami Sankaranarayanan, Arpit Jain, Rama Chellappa, Ser Nam Lim
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Adversarial training has been shown to regularize deep neural networks in
    addition to increasing their robustness to adversarial examples. However, its
    impact on very deep state of the art networks has not been fully investigated.
    In this paper, we present an efficient approach to perform adversarial training
    by perturbing intermediate layer activations and study the use of such
    perturbations as a regularizer during training. We use these perturbations to
    train very deep models such as ResNets and show improvement in performance both
    on adversarial and original test data. Our experiments highlight the benefits
    of perturbing intermediate layer activations compared to perturbing only the
    inputs. The results on CIFAR-10 and CIFAR-100 datasets show the merits of the
    proposed adversarial training approach. Additional results on WideResNets show
    that our approach provides significant improvement in classification accuracy
    for a given base model, outperforming dropout and other base models of larger
    size.

    Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

    Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    This paper presents an approach to formalizing and enforcing a class of use
    privacy properties in data-driven systems. In contrast to prior work, we focus
    on use restrictions on proxies (i.e. strong predictors) of protected
    information types. Our definition relates proxy use to intermediate
    computations that occur in a program, and identifies two essential properties
    that characterize this behavior: 1) its result is strongly associated with the
    protected information type in question, and 2) it is likely to causally affect
    the final output of the program. For a specific instantiation of this
    definition, we present a program analysis technique that detects instances of
    proxy use in a model, and provides a witness that identifies which parts of the
    corresponding program exhibit the behavior. Recognizing that not all instances
    of proxy use of a protected information type are inappropriate, we make use of
    a normative judgment oracle that makes this inappropriateness determination for
    a given witness. Our repair algorithm uses the witness of an inappropriate
    proxy use to transform the model into one that provably does not exhibit proxy
    use, while avoiding changes that unduly affect classification accuracy. Using a
    corpus of social datasets, our evaluation shows that these algorithms are able
    to detect proxy use instances that would be difficult to find using existing
    techniques, and subsequently remove them while maintaining acceptable
    classification performance.

    Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

    Joao Carreira, Andrew Zisserman
    Comments: To appear at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The paucity of videos in current action classification datasets (UCF-101 and
    HMDB-51) has made it difficult to identify good video architectures, as most
    methods obtain similar performance on existing small-scale benchmarks. This
    paper re-evaluates state-of-the-art architectures in light of the new Kinetics
    Human Action Video dataset. Kinetics has two orders of magnitude more data,
    with 400 human action classes and over 400 clips per class, and is collected
    from realistic, challenging YouTube videos. We provide an analysis on how
    current architectures fare on the task of action classification on this dataset
    and how much performance improves on the smaller benchmark datasets after
    pre-training on Kinetics.

    We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on
    2D ConvNet inflation: filters and pooling kernels of very deep image
    classification ConvNets are expanded into 3D, making it possible to learn
    seamless spatio-temporal feature extractors from video while leveraging
    successful ImageNet architecture designs and even their parameters. We show
    that, after pre-training on Kinetics, I3D models considerably improve upon the
    state-of-the-art in action classification, reaching 80.7% on HMDB-51 and 98.0%
    on UCF-101.
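
    The inflation step can be summarized in a few lines: a pretrained 2D kernel
    is repeated along the temporal axis and rescaled so that a temporally
    constant video produces the same activations as the 2D network. The function
    below is an illustrative sketch, not the released implementation.

        import torch

        def inflate_conv_weights(w2d, time_dim=3):
            # w2d: (out_channels, in_channels, kH, kW) pretrained 2D kernel.
            w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)   # (out, in, T, kH, kW)
            return w3d / time_dim                                 # preserve the constant-video response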

    A Regularized Framework for Sparse and Structured Neural Attention

    Vlad Niculae, Mathieu Blondel
    Comments: 21 pages, including appendix
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    Modern neural networks are often augmented with an attention mechanism, which
    tells the network where to focus within the input. We propose in this paper a
    new framework for sparse and structured attention, building upon a max operator
    regularized with a strongly convex function. We show that this operator is
    differentiable and that its gradient defines a mapping from real values to
    probabilities, suitable as an attention mechanism. Our framework includes
    softmax and a slight generalization of the recently-proposed sparsemax as
    special cases. However, we also show how our framework can incorporate modern
    structured penalties, resulting in new attention mechanisms that focus on
    entire segments or groups of an input, encouraging parsimony and
    interpretability. We derive efficient algorithms to compute the forward and
    backward passes of these attention mechanisms, enabling their use in a neural
    network trained with backpropagation. To showcase their potential as a drop-in
    replacement for existing attention mechanisms, we evaluate them on three
    large-scale tasks: textual entailment, machine translation, and sentence
    summarization. Our attention mechanisms improve interpretability without
    sacrificing performance; notably, on textual entailment and summarization, we
    outperform the existing attention mechanisms based on softmax and sparsemax.
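
    As one concrete instance of the framework, the sparsemax mapping (the
    Euclidean projection of the scores onto the probability simplex, which can
    yield exactly sparse attention weights) can be computed as follows; this
    sketch covers only that special case and none of the structured penalties.

        import numpy as np

        def sparsemax(scores):
            z = np.sort(scores)[::-1]                 # scores in decreasing order
            cssv = np.cumsum(z) - 1.0
            k = np.arange(1, len(scores) + 1)
            support = z - cssv / k > 0                # coordinates that remain nonzero
            tau = cssv[support][-1] / k[support][-1]  # threshold defining the projection
            return np.maximum(scores - tau, 0.0)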

    A Linear-Time Kernel Goodness-of-Fit Test

    Wittawat Jitkrittum, Wenkai Xu, Zoltan Szabo, Kenji Fukumizu, Arthur Gretton
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a novel adaptive test of goodness-of-fit, with computational cost
    linear in the number of samples. We learn the test features that best indicate
    the differences between observed samples and a reference model, by minimizing
    the false negative rate. These features are constructed via Stein’s method,
    meaning that it is not necessary to compute the normalising constant of the
    model. We analyse the asymptotic Bahadur efficiency of the new test, and prove
    that under a mean-shift alternative, our test always has greater relative
    efficiency than a previous linear-time kernel test, regardless of the choice of
    parameters for that test. In experiments, the performance of our method exceeds
    that of the earlier linear-time test, and matches or exceeds the power of a
    quadratic-time kernel test. In high dimensions and where model structure may be
    exploited, our goodness of fit test performs far better than a quadratic-time
    two-sample test based on the Maximum Mean Discrepancy, with samples drawn from
    the model.

    LOGAN: Evaluating Privacy Leakage of Generative Models Using Generative Adversarial Networks

    Jamie Hayes, Luca Melis, George Danezis, Emiliano De Cristofaro
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    Recent advances in machine learning are paving the way for the artificial
    generation of high quality images and videos. In this paper, we investigate how
    generating synthetic samples through generative models can lead to information
    leakage and, consequently, to privacy breaches affecting individuals who
    contribute their personal or sensitive data to train these models. In
    order to quantitatively measure privacy leakage, we train a Generative
    Adversarial Network (GAN), which combines a discriminative model and a
    generative model, to detect overfitting by relying on the discriminator
    capacity to learn statistical differences in distributions.

    We present attacks based on both white-box and black-box access to the target
    model, and show how to improve them through auxiliary knowledge of samples in the
    dataset. We test our attacks on several state-of-the-art models such as Deep
    Convolutional GAN (DCGAN), Boundary Equilibrium GAN (BEGAN), and the
    combination of DCGAN with a Variational Autoencoder (DCGAN+VAE), using datasets
    consisting of complex representations of faces (LFW) and objects (CIFAR-10).
    Our white-box attacks are 100% successful at inferring which samples were used
    to train the target model, while the best black-box attacks can infer training
    set membership with over 60% accuracy.

    Multi-output Polynomial Networks and Factorization Machines

    Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda
    Comments: 16 pages, including appendix
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Factorization machines and polynomial networks are supervised polynomial
    models based on an efficient low-rank decomposition. We extend these models to
    the multi-output setting, i.e., for learning vector-valued functions, with
    application to multi-class or multi-task problems. We cast this as the problem
    of learning a 3-way tensor whose slices share a common decomposition and
    propose a convex formulation of that problem. We then develop an efficient
    conditional gradient algorithm and prove its global convergence, despite the
    fact that it involves a non-convex hidden unit selection step. On
    classification tasks, we show that our algorithm achieves excellent accuracy
    with much sparser models than existing methods. On recommendation system tasks,
    we show how to combine our algorithm with a reduction from ordinal regression
    to multi-output classification and show that the resulting algorithm
    outperforms existing baselines in terms of ranking accuracy.

    Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

    Paul Hand, Vladislav Voroninski
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR)

    We examine the theoretical properties of enforcing priors provided by
    generative deep neural networks via empirical risk minimization. In particular
    we consider two models, one in which the task is to invert a generative neural
    network given access to its last layer and another which entails recovering a
    latent code in the domain of a generative neural network from compressive
    linear observations of its last layer. We establish that in both cases, in
    suitable regimes of network layer sizes and a randomness assumption on the
    network weights, that the non-convex objective function given by empirical risk
    minimization does not have any spurious stationary points. That is, we
    establish that with high probability, at any point away from small
    neighborhoods around two scalar multiples of the desired solution, there is a
    descent direction. These results constitute the first theoretical guarantees
    which establish the favorable global geometry of these non-convex optimization
    problems, and bridge the gap between the empirical success of deep learning and
    a rigorous understanding of non-linear inverse problems.

    Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

    Xin Dong, Shangyu Chen, Sinno Jialin Pan
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    How to develop slim and accurate deep neural networks has become crucial for
    real-world applications, especially for those employed in embedded systems.
    Though previous work along this research line has shown some promising results,
    most existing methods either fail to significantly compress a well-trained deep
    network or require a heavy retraining process for the pruned deep network to
    re-boost its prediction performance. In this paper, we propose a new layer-wise
    pruning method for deep neural networks. In our proposed method, parameters of
    each individual layer are pruned independently based on second order
    derivatives of a layer-wise error function with respect to the corresponding
    parameters. We prove that the final prediction performance drop after pruning
    is bounded by a linear combination of the reconstruction errors incurred at each
    layer. Therefore, there is a guarantee that one only needs to perform a light
    retraining process on the pruned network to resume its original prediction
    performance. We conduct extensive experiments on benchmark datasets to
    demonstrate the effectiveness of our pruning method compared with several
    state-of-the-art baseline methods.

    Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent

    Chris Junchi Li, Lei Li, Junyang Qian, Jian-Guo Liu
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this paper, we study the stochastic gradient descent method in analyzing
    nonconvex statistical optimization problems from a diffusion approximation
    point of view. Using the theory of large deviations of random dynamical
    systems, we prove the following in the small stepsize regime and in the
    presence of omnidirectional noise: starting from a local minimizer (resp.
    saddle point), the SGD iteration escapes in a number of iterations that
    depends exponentially (resp. linearly) on the inverse stepsize. We take deep
    neural networks as an example to study this phenomenon. Based on a new
    analysis of the mixing rate of multidimensional Ornstein-Uhlenbeck processes,
    our theory substantiates very recent empirical results by Keskar et al.
    (2016), suggesting that large batch sizes in synchronous training of deep
    learning models lead to poor generalization error.

    Learning from Complementary Labels

    Takashi Ishida, Gang Niu, Masashi Sugiyama
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Collecting labeled data is costly and thus is a critical bottleneck in
    real-world classification tasks. To mitigate the problem, we consider a
    complementary label, which specifies a class that a pattern does not belong to.
    Collecting complementary labels would be less laborious than ordinary labels
    since users do not have to carefully choose the correct class from many
    candidate classes. However, complementary labels are less informative than
    ordinary labels and thus a suitable approach is needed to better learn from
    complementary labels. In this paper, we show that an unbiased estimator of the
    classification risk can be obtained only from complementary labels, if a loss
    function satisfies a particular symmetric condition. We theoretically prove the
    estimation error bounds for the proposed method, and experimentally demonstrate
    the usefulness of the proposed algorithms.

    Annealed Generative Adversarial Networks

    Arash Mehrjou, Bernhard Schölkopf, Saeed Saremi
    Comments: 9 pages, 6 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We introduce a novel framework for adversarial training where the target
    distribution is annealed between the uniform distribution and the data
    distribution. We posit the conjecture that learning under continuous annealing
    in the nonparametric regime is stable irrespective of the divergence measure
    in the objective function, and propose an algorithm, dubbed ß-GAN, as a
    corollary. In this framework, the fact that the initial support of the
    generative network is the whole ambient space, combined with annealing, is key
    to balancing the minimax game. In our experiments on synthetic data, MNIST, and
    CelebA, ß-GAN with a fixed annealing schedule was stable and did not suffer
    from mode collapse.

    Shallow Updates for Deep Reinforcement Learning

    Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN)
    have achieved state-of-the-art results in a variety of challenging,
    high-dimensional domains. This success is mainly attributed to the power of
    deep neural networks to learn rich domain representations for approximating the
    value function or policy. Batch reinforcement learning methods with linear
    representations, on the other hand, are more stable and require less
    hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve
    good results. In this work we propose a hybrid approach — the Least Squares
    Deep Q-Network (LS-DQN), which combines rich feature representations learned by
    a DRL algorithm with the stability of a linear least squares method. We do this
    by periodically re-training the last hidden layer of a DRL network with a batch
    least squares update. Key to our approach is a Bayesian regularization term for
    the least squares update, which prevents over-fitting to the more recent data.
    We tested LS-DQN on five Atari games and demonstrate significant improvement
    over vanilla DQN and Double-DQN. We also investigated the reasons for the
    superior performance of our method. Interestingly, we found that the
    performance improvement can be attributed to the large batch size used by the
    LS method when optimizing the last layer.
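
    A minimal sketch of the core step, assuming the last hidden layer's
    activations and regression targets have already been gathered into a large
    batch; the Bayesian regularization is modeled as a ridge penalty centered at
    the current DRL weights, and the per-action bookkeeping of the actual LS-DQN
    update is omitted.

    import numpy as np

    def ls_relearn_last_layer(features, targets, w_drl, lam=1.0):
        """Bayesian-regularized least-squares refit of the last linear layer.

        features: (N, d) last-hidden-layer activations over a large batch
        targets:  (N,)   regression targets (e.g., Q-learning targets)
        w_drl:    (d,)   current last-layer weights from the DRL network,
                         used as the prior mean of the Gaussian regularizer
        Solves argmin_w ||features @ w - targets||^2 + lam * ||w - w_drl||^2.
        """
        d = features.shape[1]
        A = features.T @ features + lam * np.eye(d)
        b = features.T @ targets + lam * w_drl
        return np.linalg.solve(A, b)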

    Learning Semantic Relatedness From Human Feedback Using Metric Learning

    Thomas Niebler, Martin Becker, Christian Pölitz, Andreas Hotho
    Subjects: Computation and Language (cs.CL); Learning (cs.LG)

    Assessing the degree of semantic relatedness between words is an important
    task with a variety of semantic applications, such as ontology learning for the
    Semantic Web, semantic search or query expansion. To accomplish this in an
    automated fashion, many relatedness measures have been proposed. However, most
    of these metrics only encode information contained in the underlying corpus and
    thus do not directly model human intuition. To solve this, we propose to
    utilize a metric learning approach to improve existing semantic relatedness
    measures by learning from additional information, such as explicit human
    feedback. For this, we argue to use word embeddings instead of traditional
    high-dimensional vector representations in order to leverage their semantic
    density and to reduce computational cost. We rigorously test our approach on
    several domains including tagging data as well as publicly available embeddings
    based on Wikipedia texts and navigation. Human feedback about semantic
    relatedness for learning and evaluation is extracted from publicly available
    datasets such as MEN or WS-353. We find that our method can significantly
    improve semantic relatedness measures by learning from additional information,
    such as explicit human feedback. For tagging data, we are the first to generate
    and study embeddings. Our results are of special interest for ontology and
    recommendation engineers, but also for any other researchers and practitioners
    of Semantic Web techniques.

    CrossNets : A New Approach to Complex Learning

    Chirag Agarwal, Mehdi Sharifzhadeh, Joe Klobusicky, Dan Schonfeld
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel neural network structure called CrossNets, which considers
    architectures on directed acyclic graphs. This structure builds on previous
    generalizations of feed forward models, such as ResNets, by allowing for all
    forward cross connections between layers (both adjacent and non-adjacent). The
    addition of cross connections among the network increases information flow
    across the whole network, leading to better training and testing performances.
    The superior performance of the network is tested against four benchmark
    datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof of
    convergence for CrossNets to a local minimum of the error, where weights for
    connections are chosen through backpropagation with momentum.
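
    The toy forward pass below illustrates the cross-connection idea: each layer
    adds scaled contributions from all earlier layer outputs, not just the
    adjacent one. The combination rule, equal layer widths, and ReLU choice are
    assumptions of this sketch rather than the paper's exact architecture.

    import numpy as np

    def crossnet_forward(x, weight_list, cross_weights):
        """Toy forward pass over a DAG with forward cross connections."""
        outputs = [x]
        for t, W in enumerate(weight_list):
            pre = outputs[-1] @ W
            # add cross connections from every earlier output of matching width
            for s, h in enumerate(outputs[:-1]):
                if h.shape[1] == pre.shape[1]:
                    pre = pre + cross_weights.get((s, t), 0.0) * h
            outputs.append(np.maximum(pre, 0.0))  # ReLU nonlinearity
        return outputs[-1]

    # usage: three layers of equal width so every cross connection applies
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    weights = [0.1 * rng.normal(size=(16, 16)) for _ in range(3)]
    cross = {(s, t): 0.5 for t in range(3) for s in range(t)}
    out = crossnet_forward(x, weights, cross)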

    DeepMasterPrint: Generating Fingerprints for Presentation Attacks

    Philip Bontrager, Julian Togelius, Nasir Memon
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Learning (cs.LG)

    We present two related methods for creating MasterPrints, synthetic
    fingerprints that a fingerprint verification system identifies as belonging to
    many different people. Both methods start with training a Generative Adversarial
    Network (GAN) on a set of real fingerprint images. The generator network is
    then used to search for images that can be recognized as multiple individuals.
    The first method uses evolutionary optimization in the space of latent
    variables, and the second uses gradient-based search. Our method is able to
    design a MasterPrint that a commercial fingerprint system matches to 22% of all
    users in a strict security setting, and 75% of all users at a looser security
    setting.

    Balanced Policy Evaluation and Learning

    Nathan Kallus
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    We present a new approach to the problems of evaluating and learning
    personalized decision policies from observational data of past contexts,
    decisions, and outcomes. Only the outcome of the enacted decision is available
    and the historical policy is unknown. These problems arise in personalized
    medicine using electronic health records and in internet advertising. Existing
    approaches use inverse propensity weighting (or, doubly robust versions) to
    make historical outcome (or, residual) data look like it were generated by a
    new policy being evaluated or learned. But this relies on a plug-in approach
    that rejects data points with a decision that disagrees with the new policy,
    leading to high variance estimates and ineffective learning. We propose a new,
    balance-based approach that likewise makes the data look like the new policy but
    does so directly by finding weights that optimize for balance between the
    weighted data and the target policy in the given, finite sample, which is
    equivalent to minimizing worst-case or posterior conditional mean square error.
    Our policy learner proceeds as a two-level optimization problem over policies
    and weights. We demonstrate that this approach markedly outperforms existing
    ones both in evaluation and learning, which is unsurprising given the wider
    support of balance-based weights. We establish extensive theoretical
    consistency guarantees and regret bounds that support this empirical success.

    Instrument-Armed Bandits

    Nathan Kallus
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We extend the classic multi-armed bandit (MAB) model to the setting of
    noncompliance, where the arm pull is a mere instrument and the treatment
    applied may differ from it, which gives rise to the instrument-armed bandit
    (IAB) problem. The IAB setting is relevant whenever the experimental units are
    human since free will, ethics, and the law may prohibit unrestricted or forced
    application of treatment. In particular, the setting is relevant in bandit
    models of dynamic clinical trials and other controlled trials on human
    interventions. Nonetheless, the setting has not been fully investigated in the
    bandit literature. We show that there are various and divergent notions of
    regret in this setting, all of which coincide only in the classic MAB setting.
    We characterize the behavior of these regrets and analyze standard MAB
    algorithms. We argue for a particular kind of regret that captures the causal
    effect of treatments but show that standard MAB algorithms cannot achieve
    sublinear control on this regret. Instead, we develop new algorithms for the
    IAB problem, prove new regret bounds for them, and compare them to standard MAB
    algorithms in numerical examples.

    Mixed Membership Word Embeddings for Computational Social Science

    James Foulds
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Word embeddings improve the performance of NLP systems by revealing the
    hidden structural relationships between words. These models have recently risen
    in popularity due to the performance of scalable algorithms trained in the big
    data setting. Despite their success, word embeddings have seen very little use
    in computational social science NLP tasks, presumably due to their reliance on
    big data, and to a lack of interpretability. I propose a probabilistic
    model-based word embedding method which can recover interpretable embeddings,
    without big data. The key insight is to leverage the notion of mixed membership
    modeling, in which global representations are shared, but individual entities
    (i.e. dictionary words) are free to use these representations to uniquely
    differing degrees. Leveraging connections to topic models, I show how to train
    these models in high dimensions using a combination of state-of-the-art
    techniques for word embeddings and topic modeling. Experimental results show an
    improvement in predictive performance of up to 63% in MRR over the skip-gram on
    small datasets. The models are interpretable, as embeddings of topics are used
    to encode embeddings for words (and hence, documents) in a model-based way. I
    illustrate this with two computational social science case studies, on NIPS
    articles and State of the Union addresses.

    Forward Thinking: Building Deep Random Forests

    Kevin Miller, Chris Hettinger, Jeffrey Humpherys, Tyler Jarvis, David Kartchner
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    The success of deep neural networks has inspired many to wonder whether other
    learners could benefit from deep, layered architectures. We present a general
    framework called forward thinking for deep learning that generalizes the
    architectural flexibility and sophistication of deep neural networks while also
    allowing for (i) different types of learning functions in the network, other
    than neurons, and (ii) the ability to adaptively deepen the network as needed
    to improve results. This is done by training one layer at a time, and once a
    layer is trained, the input data are mapped forward through the layer to create
    a new learning problem. The process is then repeated, transforming the data
    through multiple layers, one at a time, rendering a new dataset, which is
    expected to be better behaved, and on which a final output layer can achieve
    good performance. In the case where the neurons of deep neural nets are
    replaced with decision trees, we call the result a Forward Thinking Deep Random
    Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the
    MNIST dataset. We also provide a general mathematical formulation that allows
    for other types of deep learning problems to be considered.
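
    A small sketch of the layer-wise construction in the decision-tree case,
    using scikit-learn forests as the per-layer learners; appending class
    probabilities to the inputs is one plausible way to map the data forward and
    may differ from the exact mapping used for FTDRF.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def forward_thinking_forest(X, y, n_layers=3, trees_per_layer=50, seed=0):
        """Greedily train one forest per layer and pass the data forward."""
        layers, features = [], X
        for _ in range(n_layers):
            rf = RandomForestClassifier(n_estimators=trees_per_layer,
                                        random_state=seed)
            rf.fit(features, y)
            layers.append(rf)
            # the transformed data (inputs + class probabilities) becomes the
            # next layer's learning problem
            features = np.hstack([features, rf.predict_proba(features)])
        return layers

    def forward_thinking_predict(layers, X):
        features = X
        for rf in layers[:-1]:
            features = np.hstack([features, rf.predict_proba(features)])
        return layers[-1].predict(features)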

    Stability of cross-validation and minmax-optimal number of folds

    Ning Xu, Jian Hong, Timothy Fisher
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    In this paper, we analyze the properties of cross-validation from the
    perspective of the stability, that is, the difference between the training
    error and the error of the selected model applied to any other finite sample.
    In both the i.i.d. and non-i.i.d. cases, we derive the upper bounds of the
    one-round and average test error, referred to as the one-round/convoluted
    Rademacher-bounds, to quantify the stability of model evaluation for
    cross-validation. We show that the convoluted Rademacher-bounds quantify the
    stability of the out-of-sample performance of the model in terms of its
    training error. We also show that the stability of cross-validation is closely
    related to the sample sizes of the training and test sets, the `heaviness’ in
    the tails of the loss distribution, and the Rademacher complexity of the model
    class. Using the convoluted Rademacher-bounds, we also define the
    minmax-optimal number of folds, at which the performance of the selected model
    on new-coming samples is most stable, for cross-validation. The minmax-optimal
    number of folds also reveals that, given sample size, stability maximization
    (or upper bound minimization) may help to quantify optimality in
    hyper-parameter tuning or other learning tasks with large variation.

    Ensemble Sampling

    Xiuyuan Lu, Benjamin Van Roy
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Thompson sampling has emerged as an effective heuristic for a broad range of
    online decision problems. In its basic form, the algorithm requires computing
    and sampling from a posterior distribution over models, which is tractable only
    for simple special cases. This paper develops ensemble sampling, which aims to
    approximate Thompson sampling while maintaining tractability even in the face
    of complex models such as neural networks. Ensemble sampling dramatically
    expands on the range of applications for which Thompson sampling is viable. We
    establish a theoretical basis that supports the approach and present
    computational results that offer further insight.
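
    The sketch below approximates Thompson sampling for a linear bandit with a
    small ensemble of perturbed least-squares models, which is the spirit of
    ensemble sampling; the perturbation scheme, ensemble size, and noise scale
    are illustrative assumptions rather than the paper's prescription.

    import numpy as np

    class EnsembleSamplingBandit:
        """Ensemble sampling for a linear bandit."""

        def __init__(self, dim, n_models=10, noise=0.1, seed=0):
            self.rng = np.random.default_rng(seed)
            self.models = 0.1 * self.rng.normal(size=(n_models, dim))
            self.noise = noise
            self.A = [np.eye(dim) for _ in range(n_models)]
            self.b = [np.zeros(dim) for _ in range(n_models)]

        def act(self, arm_features):
            # sample one ensemble member uniformly and act greedily w.r.t. it
            m = self.rng.integers(len(self.models))
            return int(np.argmax(arm_features @ self.models[m]))

        def update(self, x, reward):
            # each member is refit on its own perturbed copy of the observation
            for m in range(len(self.models)):
                y = reward + self.rng.normal(scale=self.noise)
                self.A[m] += np.outer(x, x)
                self.b[m] += y * x
                self.models[m] = np.linalg.solve(self.A[m], self.b[m])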

    Search Engine Guided Non-Parametric Neural Machine Translation

    Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li
    Comments: 8 pages, 4 figures, 2 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)

    In this paper, we extend an attention-based neural machine translation (NMT)
    model by allowing it to access an entire training set of parallel sentence
    pairs even after training. The proposed approach consists of two stages. In the
    first stage–retrieval stage–, an off-the-shelf, black-box search engine is
    used to retrieve a small subset of sentence pairs from a training set given a
    source sentence. These pairs are further filtered based on a fuzzy matching
    score based on edit distance. In the second stage–translation stage–, a novel
    translation model, called translation memory enhanced NMT (TM-NMT), seamlessly
    uses both the source sentence and a set of retrieved sentence pairs to perform
    the translation. Empirical evaluation on three language pairs (En-Fr, En-De,
    and En-Es) shows that the proposed approach significantly outperforms the
    baseline approach and the improvement is more significant when more relevant
    sentence pairs are retrieved.

    Stochastic Recursive Gradient Algorithm for Nonconvex Optimization

    Lam M. Nguyen, Jie Liu, Katya Scheinberg, Martin Takáč
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)

    In this paper, we study and analyze the mini-batch version of StochAstic
    Recursive grAdient algoritHm (SARAH), a method employing the stochastic
    recursive gradient, for solving empirical loss minimization for the case of
    nonconvex losses. We provide a sublinear convergence rate (to stationary
    points) for general nonconvex functions and a linear convergence rate for
    gradient dominated functions, both of which have some advantages compared to
    other modern stochastic gradient algorithms for nonconvex losses.
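
    For reference, a minimal single-sample sketch of the recursive gradient
    estimator at the heart of SARAH (the paper analyzes the mini-batch variant);
    the step size, loop lengths, and the user-supplied component gradient grad_i
    are assumptions of this sketch.

    import numpy as np

    def sarah(w0, grad_i, n, step=0.05, epochs=10, inner=100, seed=0):
        """SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}.

        grad_i(w, i) returns the gradient of the i-th component function at w;
        each outer epoch starts from a full gradient.
        """
        rng = np.random.default_rng(seed)
        w = np.asarray(w0, dtype=float).copy()
        for _ in range(epochs):
            v = np.mean([grad_i(w, i) for i in range(n)], axis=0)   # full gradient
            w_prev, w = w.copy(), w - step * v
            for _ in range(inner):
                i = rng.integers(n)
                v = grad_i(w, i) - grad_i(w_prev, i) + v            # recursive update
                w_prev, w = w.copy(), w - step * v
        return w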

    AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms

    Marco F. Cusumano-Towner, Vikash K. Mansinghka
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Approximate probabilistic inference algorithms are central to many fields.
    Examples include sequential Monte Carlo inference in robotics, variational
    inference in machine learning, and Markov chain Monte Carlo inference in
    statistics. A key problem faced by practitioners is measuring the accuracy of
    an approximate inference algorithm on a specific dataset. This paper introduces
    the auxiliary inference divergence estimator (AIDE), an algorithm for measuring
    the accuracy of approximate inference algorithms. AIDE is based on the
    observation that inference algorithms can be treated as probabilistic models
    and the random variables used within the inference algorithm can be viewed as
    auxiliary variables. This view leads to a new estimator for the symmetric KL
    divergence between the output distributions of two inference algorithms. The
    paper illustrates application of AIDE to algorithms for inference in
    regression, hidden Markov, and Dirichlet process mixture models. The
    experiments show that AIDE captures the qualitative behavior of a broad class
    of inference algorithms and can detect failure modes of inference algorithms
    that are missed by standard heuristics.

    How to Train Your DRAGAN

    Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Generative Adversarial Networks have emerged as an effective technique for
    estimating data distributions. The basic setup consists of two deep networks
    playing against each other in a zero-sum game setting. However, it is not
    understood if the networks reach an equilibrium eventually and what dynamics
    makes this possible. The current GAN training procedure, which involves
    simultaneous gradient descent, lacks a clear game-theoretic justification in
    the literature. In this paper, we introduce regret minimization as a technique
    to reach equilibrium in games and use this to motivate the use of simultaneous
    GD in GANs. In addition, we present a hypothesis that mode collapse, which is a
    common occurrence in GAN training, happens due to the existence of spurious
    local equilibria in non-convex games. Motivated by these insights, we develop
    an algorithm called DRAGAN that is fast, simple to implement and achieves
    competitive performance in a stable fashion across different architectures,
    datasets (MNIST, CIFAR-10, and CelebA), and divergence measures with almost no
    hyperparameter tuning.

    PixColor: Pixel Recursive Colorization

    Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We propose a novel approach to automatically produce multiple colorized
    versions of a grayscale image. Our method results from the observation that the
    task of automated colorization is relatively easy given a low-resolution
    version of the color image. We first train a conditional PixelCNN to generate a
    low-resolution color image for a given grayscale image. Then, given the generated
    low-resolution color image and the original grayscale image as inputs, we train
    a second CNN to generate a high-resolution colorization of an image. We
    demonstrate that our approach produces more diverse and plausible colorizations
    than existing methods, as judged by human raters in a “Visual Turing Test”.

    Ensemble Adversarial Training: Attacks and Defenses

    Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, Patrick McDaniel
    Comments: 14 pages, 3 figures
    Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)

    Machine learning models are vulnerable to adversarial examples, inputs
    maliciously perturbed to mislead the model. These inputs transfer between
    models, thus enabling black-box attacks against deployed models. Adversarial
    training increases robustness to attacks by injecting adversarial examples into
    training data.

    Surprisingly, we find that although adversarially trained models exhibit
    strong robustness to some white-box attacks (i.e., with knowledge of the model
    parameters), they remain highly vulnerable to transferred adversarial examples
    crafted on other models. We show that the reason for this vulnerability is the
    model’s decision surface exhibiting sharp curvature in the vicinity of the data
    points, thus hindering attacks based on first-order approximations of the
    model’s loss, but permitting black-box attacks that use adversarial examples
    transferred from another model.

    We harness this observation in two ways: First, we propose a simple yet
    powerful novel attack that first applies a small random perturbation to an
    input, before finding the optimal perturbation under a first-order
    approximation. Our attack outperforms prior “single-step” attacks on models
    trained with or without adversarial training.

    Second, we propose Ensemble Adversarial Training, an extension of adversarial
    training that additionally augments training data with perturbed inputs
    transferred from a number of fixed pre-trained models. On MNIST and ImageNet,
    ensemble adversarial training vastly improves robustness to black-box attacks.
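
    A minimal sketch of the "small random step, then first-order step" single-step
    attack described above; grad_fn stands in for a framework-specific gradient of
    the loss with respect to the input, and the step sizes and [0, 1] input range
    are illustrative assumptions.

    import numpy as np

    def rand_then_fgsm(x, y, grad_fn, eps=0.3, alpha=0.15, seed=0):
        """Apply a random sign perturbation, then a signed-gradient step.

        grad_fn(x, y) must return d loss / d x for the attacked model.
        The total perturbation stays within an eps-sized L-infinity budget.
        """
        rng = np.random.default_rng(seed)
        x_rand = x + alpha * np.sign(rng.normal(size=x.shape))
        x_adv = x_rand + (eps - alpha) * np.sign(grad_fn(x_rand, y))
        return np.clip(x_adv, 0.0, 1.0)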

    Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation

    Lei Cai, Hongyang Gao, Shuiwang Ji
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Variational auto-encoder (VAE) is a powerful unsupervised learning framework
    for image generation. One drawback of VAE is that it generates blurry images
    due to its Gaussianity assumption and thus L2 loss. To allow the generation of
    high quality images by VAE, we increase the capacity of decoder network by
    employing residual blocks and skip connections, which also enable efficient
    optimization. To overcome the limitation of L2 loss, we propose to generate
    images in a multi-stage manner from coarse to fine. In the simplest case, the
    proposed multi-stage VAE divides the decoder into two components in which the
    second component generates refined images based on the coarse images generated
    by the first component. Since the second component is independent of the VAE
    model, it can employ other loss functions beyond the L2 loss and different
    model architectures. The proposed framework can be easily generalized to
    contain more than two components. Experiment results on the MNIST and CelebA
    datasets demonstrate that the proposed multi-stage VAE can generate sharper
    images as compared to those from the original VAE.

    Espresso: Efficient Forward Propagation for BCNNs

    Fabrizio Pedersoli, George Tzanetakis, Andrea Tagliasacchi
    Comments: 10 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There are many application scenarios for which the computational performance
    and memory footprint of the prediction phase of Deep Neural Networks (DNNs)
    needs to be optimized. Binary Neural Networks (BDNNs) have been shown to be an
    effective way of achieving this objective. In this paper, we show how
    Convolutional Neural Networks (CNNs) can be implemented using binary
    representations. Espresso is a compact, yet powerful library written in C/CUDA
    that features all the functionalities required for the forward propagation of
    CNNs, in a binary file less than 400KB, without any external dependencies.
    Although it is mainly designed to take advantage of massive GPU parallelism,
    Espresso also provides an equivalent CPU implementation for CNNs. Espresso
    provides special convolutional and dense layers for BCNNs, leveraging
    bit-packing and bit-wise computations for efficient execution. These techniques
    provide a speed-up of matrix-multiplication routines, and at the same time,
    reduce memory usage when storing parameters and activations. We experimentally
    show that Espresso is significantly faster than existing implementations of
    optimized binary neural networks ((\approx) 2 orders of magnitude). Espresso is
    released under the Apache 2.0 license and is available at
    this http URL
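
    The bit-packing and XNOR-popcount trick that such libraries build on can be
    sketched as follows for a single dot product of {-1, +1} vectors; the packing
    layout and word size here are illustrative and not Espresso's actual kernel.

    import numpy as np

    def pack_bits(x):
        """Pack a {-1, +1} vector into a uint8 bit array (1 encodes +1)."""
        return np.packbits((x > 0).astype(np.uint8))

    def binary_dot(a_packed, b_packed, n):
        """For {-1, +1} vectors of length n, dot = n - 2 * popcount(a XOR b)."""
        mismatches = int(np.unpackbits(np.bitwise_xor(a_packed, b_packed))[:n].sum())
        return n - 2 * mismatches

    # toy check against an ordinary dot product
    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=37).astype(np.int8)
    b = rng.choice([-1, 1], size=37).astype(np.int8)
    assert binary_dot(pack_bits(a), pack_bits(b), 37) == int(a @ b)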

    Relaxed Wasserstein with Applications to GANs

    Xin Guo, Johnny Hong, Tianyi Lin, Nan Yang
    Comments: 9 pages, 24 figures, submitted to NIPS 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Generative Adversarial Networks (GANs) provide a versatile class of models
    for generative modeling. To improve the performance of machine learning models,
    there has recently been interest in designing objective functions based on
    Wasserstein distance rather than Jensen-Shannon (JS) divergence. In this paper,
    we propose a novel asymmetric statistical divergence called Relaxed Wasserstein
    (RW) divergence as a generalization of Wasserstein-(L^2) distance of order 2.
    We show that RW is dominated by Total Variation (TV) and Wasserstein-(L^2)
    distance, and establish continuity, differentiability, and duality
    representation of RW divergence. Finally, we provide a nonasymptotic moment
    estimate and a concentration inequality for RW divergence. Our experiments show
    that RWGANs with Kullback-Leibler (KL) divergence produce recognizable images
    with a ReLU Multi-Layer Perceptron (MLP) generator in fewer iterations,
    compared to Wasserstein-(L^1) GAN (WGAN).

    Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

    Maria-Florina Balcan, Colin White
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    Recently, there has been substantial interest in clustering research that
    takes a beyond worst-case approach to the analysis of algorithms. The typical
    idea is to design a clustering algorithm that outputs a near-optimal solution,
    provided the data satisfy a natural stability notion. For example, Bilu and
    Linial (2010) and Awasthi et al. (2012) presented algorithms that output
    near-optimal solutions, assuming the optimal solution is preserved under small
    perturbations to the input distances. A drawback to this approach is that the
    algorithms are often explicitly built according to the stability assumption and
    give no guarantees in the worst case; indeed, several recent algorithms output
    arbitrarily bad solutions even when just a small section of the data does not
    satisfy the given stability notion.

    In this work, we address this concern in two ways. First, we provide
    algorithms that inherit the worst-case guarantees of clustering approximation
    algorithms, while simultaneously guaranteeing near-optimal solutions when the
    data is stable. Our algorithms are natural modifications to existing
    state-of-the-art approximation algorithms. Second, we initiate the study of
    local stability, which is a property of a single optimal cluster rather than an
    entire optimal solution. We show our algorithms output all optimal clusters
    which satisfy stability locally. Specifically, we achieve strong positive
    results in our local framework under recent stability notions including metric
    perturbation resilience (Angelidakis et al. 2017) and robust perturbation
    resilience (Balcan and Liang 2012) for the (k)-median, (k)-means, and
    symmetric/asymmetric (k)-center objectives.


    Information Theory

    Finite Blocklength Rates over a Fading Channel with CSIT and CSIR

    Deekshith P K, Vinod Sharma
    Comments: 6 pages, 2 figures
    Subjects: Information Theory (cs.IT)

    In this work, we obtain lower and upper bounds on the maximal transmission
    rate at a given codeword length (n), average probability of error (\epsilon),
    and power constraint (\bar{P}), over a block fading additive white Gaussian
    noise (AWGN) channel with channel state information (CSI) at the transmitter
    and the receiver. These bounds characterize deviation of the finite blocklength
    coding rates from the channel capacity which is in turn achieved by the water
    filling power allocation across time. The bounds obtained also characterize the
    rate enhancement possible due to the CSI at the transmitter in the finite
    blocklength regime. The results are further elucidated via numerical examples.

    An improvement of the asymptotic Elias bound for non-binary codes

    Krishna Kaipa
    Subjects: Information Theory (cs.IT)

    For non-binary codes the Elias bound is a good upper bound for the asymptotic
    information rate at low relative minimum distance, whereas the Plotkin bound
    is better at high relative minimum distance. In this work, we obtain a hybrid
    of these bounds which improves both. This in turn is based on the anticode
    bound which is a hybrid of the Hamming and Singleton bounds and improves both
    bounds.

    The question of convexity of the asymptotic rate function is an important
    open question. We conjecture a much weaker form of the convexity, and we show
    that our bounds follow immediately if we assume the conjecture.

    Compressed Sensing with Prior Information via Maximizing Correlation

    Xu Zhang, Wei Cui, Yulong Liu
    Comments: To appear in Proceedings of IEEE International Symposium on Information Theory 2017
    Subjects: Information Theory (cs.IT)

    Compressed sensing (CS) with prior information concerns the problem of
    reconstructing a sparse signal with the aid of a similar signal which is known
    beforehand. We consider a new approach to integrate the prior information into
    CS via maximizing the correlation between the prior knowledge and the desired
    signal. We then present a geometric analysis for the proposed method under
    sub-Gaussian measurements. Our results reveal that if the prior information is
    good enough, then the proposed approach can improve the performance of the
    standard CS. Simulations are provided to verify our results.
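
    One plausible convex formulation of the maximizing-correlation idea is
    sketched below: the usual l1 objective is augmented with a term rewarding
    correlation with the known similar signal phi. The weight lam and the
    noiseless equality-constrained measurement model are assumptions of this
    sketch, not necessarily the paper's exact program.

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, s = 200, 80, 10
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.normal(size=s)
    phi = x_true + 0.05 * rng.normal(size=n)       # prior: a similar known signal
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    y = A @ x_true

    lam = 0.5                                      # illustrative trade-off weight
    x = cp.Variable(n)
    objective = cp.Minimize(cp.norm1(x) - lam * phi @ x)   # sparsity vs. correlation
    cp.Problem(objective, [A @ x == y]).solve()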

    A decoding algorithm for Twisted Gabidulin codes

    Tovohery Randrianarisoa, Joachim Rosenthal
    Comments: This paper was submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    In this work, we modify the decoding algorithm for subspace codes by Koetter
    and Kschischang to get a decoding algorithm for (generalized) twisted Gabidulin
    codes. The decoding algorithm we present applies to cases where the code is
    linear over the base field (\mathbb{F}_q) but not linear over
    (\mathbb{F}_{q^n}).

    Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

    Paul Hand, Vladislav Voroninski
    Subjects: Information Theory (cs.IT); Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR)

    We examine the theoretical properties of enforcing priors provided by
    generative deep neural networks via empirical risk minimization. In particular
    we consider two models, one in which the task is to invert a generative neural
    network given access to its last layer and another which entails recovering a
    latent code in the domain of a generative neural network from compressive
    linear observations of its last layer. We establish that in both cases, in
    suitable regimes of network layer sizes and a randomness assumption on the
    network weights, that the non-convex objective function given by empirical risk
    minimization does not have any spurious stationary points. That is, we
    establish that with high probability, at any point away from small
    neighborhoods around two scalar multiples of the desired solution, there is a
    descent direction. These results constitute the first theoretical guarantees
    which establish the favorable global geometry of these non-convex optimization
    problems, and bridge the gap between the empirical success of deep learning and
    a rigorous understanding of non-linear inverse problems.

    An Overview of Massive MIMO Research at the University of Bristol

    Paul Harris, Wael Boukley Hasan, Henry Brice, Benny Chitambira, Mark Beach, Evangelos Mellios, Andrew Nix, Simon Armour, Angela Doufexi
    Comments: Presented at the IET Radio Propagation and Technologies for 5G Conference (2016). 5 pages
    Subjects: Information Theory (cs.IT)

    Massive MIMO has rapidly gained popularity as a technology crucial to the
    capacity advances required for 5G wireless systems. Since its theoretical
    conception six years ago, research activity has grown exponentially, and there
    is now a developing industrial interest to commercialise the technology. For
    this to happen effectively, we believe it is crucial that further pragmatic
    research is conducted with a view to establish how reality differs from
    theoretical ideals. This paper presents an overview of the massive MIMO
    research activities occurring within the Communication Systems & Networks Group
    at the University of Bristol centred around our 128-antenna real-time testbed,
    which has been developed through the BIO programmable city initiative in
    collaboration with NI and Lund University. Through recent preliminary trials,
    we achieved a world-first spectral efficiency of 79.4 bits/s/Hz, and
    subsequently demonstrated that this could be increased to 145.6 bits/s/Hz. We
    provide a summary of this work here along with some of our ongoing research
    directions such as large-scale array wave-front analysis, optimised power
    control and localisation techniques.

    On the Phase Transition of Corrupted Sensing

    Huan Zhang, Yulong Liu, Hong Lei
    Comments: To appear in Proceedings of IEEE International Symposium on Information Theory 2017
    Subjects: Information Theory (cs.IT)

    In [FOY2014], a sharp phase transition has been numerically observed
    when a constrained convex procedure is used to solve the corrupted sensing
    problem. In this paper, we present a theoretical analysis for this phenomenon.
    Specifically, we establish the threshold below which this convex procedure
    fails to recover signal and corruption with high probability. Together with the
    work in [FOY2014], we prove that a sharp phase transition occurs around
    the sum of the squares of spherical Gaussian widths of two tangent cones.
    Numerical experiments are provided to demonstrate the correctness and sharpness
    of our results.

    A Note on the Information-Theoretic-(in)Security of Fading Generated Secret Keys

    Robert Malaney
    Comments: 4 pages, 2 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
    Subjects: Information Theory (cs.IT); Quantum Physics (quant-ph)

    In this work we explore the security of secret keys generated via the
    electromagnetic reciprocity of the wireless fading channel. Identifying a new
    sophisticated colluding attack, we explore the information-theoretic-security
    for such keys in the presence of an all-powerful adversary constrained only by
    the laws of quantum mechanics. Specifically, we calculate the reduction in the
    conditional mutual information between transmitter and receiver that can occur
    when an adversary with unlimited computational and communication resources
    places directional antenna interceptors at chosen locations. Such locations, in
    principle, can be arbitrarily far from the intended receiver yet still
    influence the secret key rate.

    Corrupted Sensing with Sub-Gaussian Measurements

    Jinchi Chen, Yulong Liu
    Comments: To appear in Proceedings of IEEE International Symposium on Information Theory 2017
    Subjects: Information Theory (cs.IT)

    This paper studies the problem of accurately recovering a structured signal
    from a small number of corrupted sub-Gaussian measurements. We consider three
    different procedures to reconstruct signal and corruption when different kinds
    of prior knowledge are available. In each case, we provide conditions for
    stable signal recovery from structured corruption with added unstructured
    noise. The key ingredient in our analysis is an extended matrix deviation
    inequality for isotropic sub-Gaussian matrices.

    On Capacity of Noncoherent MIMO with Asymmetric Link Strengths

    J. Sebastian, A. Sengupta, S. N. Diggavi
    Subjects: Information Theory (cs.IT)

    We study the generalized degrees of freedom (gDoF) of the block-fading
    noncoherent MIMO channel with asymmetric distributions of link strengths, and a
    coherence time of (T) symbol durations. We first derive the optimal signaling
    structure for communication over this channel, which is distinct from that for
    the i.i.d MIMO setting. We prove that for (T=1), the gDoF is zero for MIMO
    channels with arbitrary link strength distributions, extending the result for
    MIMO with i.i.d links. We then show that selecting the statistically best
    antenna is gDoF-optimal for both Multiple Input Single Output (MISO) and Single
    Input Multiple Output (SIMO) channels. We also derive the gDoF for the
    (2\times 2) MIMO channel with different exponents in the direct and cross links.
    In this setting, we show that it is always necessary to use both antennas to
    achieve the optimal gDoF, in contrast to the results for (2\times 2) MIMO with
    identical link distributions.

    Full-Duplex Bidirectional Secure Communications under Perfect and Distributionally Ambiguous Eavesdropper's CSI

    Qiang Li, Ying Zhang, Jingran Lin, Sissi Xiaoxiao Wu
    Comments: Accepted by IEEE Transactions on Signal Processing, 14 pages, 8 figures
    Subjects: Information Theory (cs.IT)

    Consider a full-duplex (FD) bidirectional secure communication system, where
    two communication nodes, named Alice and Bob, simultaneously transmit and
    receive confidential information from each other, and an eavesdropper, named
    Eve, overhears the transmissions. Our goal is to maximize the sum secrecy rate
    (SSR) of the bidirectional transmissions by optimizing the transmit covariance
    matrices at Alice and Bob. To tackle this SSR maximization (SSRM) problem, we
    develop an alternating difference-of-concave (ADC) programming approach to
    alternately optimize the transmit covariance matrices at Alice and Bob. We show
    that the ADC iteration has a semi-closed-form beamforming solution, and is
    guaranteed to converge to a stationary solution of the SSRM problem. Besides
    the SSRM design, this paper also deals with a robust SSRM transmit design under
    a moment-based random channel state information (CSI) model, where only some
    roughly estimated first and second-order statistics of Eve’s CSI are available,
    but the exact distribution or other high-order statistics is not known. This
    moment-based error model is new and different from the widely used
    bounded-sphere error model and the Gaussian random error model. Under the
    considered CSI error model, the robust SSRM is formulated as an outage
    probability-constrained SSRM problem. By leveraging the Lagrangian duality
    theory and DC programming, a tractable safe solution to the robust SSRM problem
    is derived. The effectiveness and the robustness of the proposed designs are
    demonstrated through simulations.

    Polar Coding for Parallel Gaussian Channel

    David Tse, Bin Li, Kai Chen
    Subjects: Information Theory (cs.IT)

    In this paper, we propose a Polar coding scheme for parallel Gaussian
    channels. The encoder knows the sum rate of the parallel channels but does not
    know the rate of any individual channel. By using the nesting property of polar
    codes, we design a coding/decoding scheme to achieve the sum rate.

    Coexistence of RF-powered IoT and a Primary Wireless Network with Secrecy Guard Zones

    Mustafa A. Kishk, Harpreet S. Dhillon
    Subjects: Information Theory (cs.IT)

    This paper studies the secrecy performance of a wireless network (primary
    network) overlaid with an ambient RF energy harvesting IoT network (secondary
    network). The nodes in the secondary network are assumed to be solely powered
    by ambient RF energy harvested from the transmissions of the primary network.
    We assume that the secondary nodes can eavesdrop on the primary transmissions
    due to which the primary network uses secrecy guard zones. The primary
    transmitter goes silent if any secondary receiver is detected within its guard
    zone. Using tools from stochastic geometry, we derive the probability of
    successful connection of the primary network as well as the probability of
    secure communication. Two conditions must be jointly satisfied in order to
    ensure successful connection: (i) the SINR at the primary receiver is above a
    predefined threshold, and (ii) the primary transmitter is not silent. In order
    to ensure secure communication, the SINR value at each of the secondary nodes
    should be less than a predefined threshold. Clearly, when more secondary nodes
    are deployed, more primary transmitters will remain silent for a given guard
    zone radius, thus impacting the amount of energy harvested by the secondary
    network. Our results concretely show the existence of an optimal deployment
    density for the secondary network that maximizes the density of nodes that are
    able to harvest a sufficient amount of energy. Furthermore, we show the
    dependence of this optimal deployment density on the guard zone radius of the
    primary network. In addition, we show that the optimal guard zone radius
    selected by the primary network is a function of the deployment density of the
    secondary network. This interesting coupling between the two networks is
    studied using tools from game theory. Overall, this work is one of the few
    concrete works that symbiotically merge tools from stochastic geometry and game
    theory.

    Large System Analysis of Power Normalization Techniques in Massive MIMO

    Meysam Sadeghi, Luca Sanguinetti, Romain Couillet, Chau Yuen
    Comments: 12 pages, 3 figures, Accepted for publication in the IEEE Transactions on Vehicular Technology
    Subjects: Information Theory (cs.IT)

    Linear precoding has been widely studied in the context of Massive
    multiple-input-multiple-output (MIMO) together with two common power
    normalization techniques, namely, matrix normalization (MN) and vector
    normalization (VN). Despite this, their effect on the performance of Massive
    MIMO systems has not been thoroughly studied yet. The aim of this paper is to
    fill this gap by using large system analysis. Considering a system model
    that accounts for channel estimation, pilot contamination, arbitrary pathloss,
    and per-user channel correlation, we compute tight approximations for the
    signal-to-interference-plus-noise ratio and the rate of each user equipment in
    the system while employing maximum ratio transmission (MRT), zero forcing (ZF),
    and regularized ZF precoding under both MN and VN techniques. Such
    approximations are used to analytically reveal how the choice of power
    normalization affects the performance of MRT and ZF under uncorrelated fading
    channels. It turns out that ZF with VN resembles a sum rate maximizer while it
    provides a notion of fairness under MN. Numerical results are used to validate
    the accuracy of the asymptotic analysis and to show that in Massive MIMO,
    non-coherent interference and noise, rather than pilot contamination, are often
    the major limiting factors of the considered precoding schemes.
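
    The two normalization schemes can be sketched for a zero-forcing precoder as
    follows: matrix normalization rescales the whole precoding matrix to the
    power budget, while vector normalization rescales each user's column
    separately. The equal per-user power split under VN and the assumption of
    more antennas than users are illustrative choices of this sketch.

    import numpy as np

    def zf_precoder(H, P_total, mode="MN"):
        """Zero-forcing precoder with matrix (MN) or vector (VN) normalization.

        H: (K, M) downlink channel with K users and M >= K base-station antennas.
        """
        F = H.conj().T @ np.linalg.inv(H @ H.conj().T)     # unnormalized ZF directions
        if mode == "MN":
            # one common scaling so that the total transmit power equals P_total
            return np.sqrt(P_total) * F / np.linalg.norm(F, "fro")
        # VN: scale each user's beamforming vector to an equal share of the power
        K = H.shape[0]
        return F / np.linalg.norm(F, axis=0, keepdims=True) * np.sqrt(P_total / K)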

    On deep holes of generalized projective Reed-Solomon codes

    Xiaofan Xu, Shaofang Hong, Yongchao Xu
    Comments: 16 pages
    Subjects: Number Theory (math.NT); Information Theory (cs.IT)

    Determining deep holes is an important topic in decoding Reed-Solomon codes.
    Let (l\ge 1) be an integer and (a_1,\ldots,a_l) be arbitrarily given (l)
    distinct elements of the finite field (\mathbf{F}_q) of (q) elements with the
    odd prime number (p) as its characteristic. Let
    (D=\mathbf{F}_q\backslash\{a_1,\ldots,a_l\}) and (k) be an integer such that
    (2\le k\le q-l-1). In this paper, we study the deep holes of the generalized
    projective Reed-Solomon code (\mathrm{GPRS}_q(D,k)) of length (q-l+1) and
    dimension (k) over (\mathbf{F}_q). For any (f(x)\in \mathbf{F}_q[x]), we let
    (f(D)=(f(y_1),\ldots,f(y_{q-l}))) if (D=\{y_1,\ldots,y_{q-l}\}) and
    (c_{k-1}(f(x))) be the coefficient of (x^{k-1}) of (f(x)). By using Dür's
    theorem on the relation between the covering radius and minimum distance of
    (\mathrm{GPRS}_q(D,k)), we show that if (u(x)\in \mathbf{F}_q[x]) with
    (\deg(u(x))=k), then the received codeword ((u(D), c_{k-1}(u(x)))) is a deep
    hole of (\mathrm{GPRS}_q(D,k)) if and only if the sum (\sum_{y\in I}y) is
    nonzero for any subset (I\subseteq D) with (\#(I)=k). We show also that if
    (j) is an integer with (1\leq j\leq l) and
    (u_j(x):=\lambda_j(x-a_j)^{q-2}+\nu_j x^{k-1}+f_{\leq k-2}^{(j)}(x)) with
    (\lambda_j\in \mathbf{F}_q^*), (\nu_j\in \mathbf{F}_q) and
    (f_{\leq k-2}^{(j)}(x)\in\mathbf{F}_q[x]) being a polynomial of degree at
    most (k-2), then ((u_j(D), c_{k-1}(u_j(x)))) is a deep hole of
    (\mathrm{GPRS}_q(D,k)) if and only if the sum
    (\binom{q-2}{k-1}(-a_j)^{q-1-k}\prod_{y\in I}(a_j-y)+e) is nonzero for any
    subset (I\subseteq D) with (\#(I)=k), where (e) is the identity of the group
    (\mathbf{F}_q^*). This implies that ((u_j(D), c_{k-1}(u_j(x)))) is a deep
    hole of (\mathrm{GPRS}_q(D,k)) if (p|k).

    Information-theoretic analysis of generalization capability of learning algorithms

    Aolin Xu, Maxim Raginsky
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    We derive upper bounds on the generalization error of a learning algorithm in
    terms of the mutual information between its input and output. The upper bounds
    provide theoretical guidelines for striking the right balance between data fit
    and generalization by controlling the input-output mutual information of a
    learning algorithm. The results can also be used to analyze the generalization
    capability of learning algorithms under adaptive composition, and the
    bias-accuracy tradeoffs in adaptive data analytics. Our work extends and leads
    to nontrivial improvements on the recent results of Russo and Zou.
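
    The flavor of the bound, under the usual assumption that the loss is
    (\sigma)-subgaussian for every hypothesis and that the sample (S) consists of
    (n) i.i.d. points, is roughly

    (|\mathbb{E}[L_\mu(W) - L_S(W)]| \le \sqrt{(2\sigma^2/n)\, I(S;W)}),

    where (W) is the algorithm's output on input (S), (L_S) the empirical risk,
    (L_\mu) the population risk, and (I(S;W)) the input-output mutual
    information; the exact constants and conditions should be taken from the
    paper itself.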

    Detection Estimation and Grid matching of Multiple Targets with Single Snapshot Measurements

    Rakshith Jagannath
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    In this work, we explore the problems of detecting the number of narrow-band,
    far-field targets and estimating their corresponding directions from single
    snapshot measurements. The principles of sparse signal recovery (SSR) are used
    for the single snapshot detection and estimation of multiple targets. In the
    SSR framework, the DoA estimation problem is grid based and can be posed as the
    lasso optimization problem. However, the SSR framework for DoA estimation gives
    rise to the grid mismatch problem, when the unknown targets (sources) are not
    matched with the estimation grid chosen for the construction of the array
    steering matrix at the receiver. The block sparse recovery framework is known
    to mitigate the grid mismatch problem by jointly estimating the targets and
    their corresponding offsets from the estimation grid using the group lasso
    estimator. The corresponding detection problem reduces to estimating the
    optimal regularization parameter ((\tau)) of the lasso (in case of perfect
    grid-matching) or group-lasso estimation problem for achieving the required
    probability of correct detection ((P_c)). We propose asymptotic and finite
    sample test statistics for detecting the number of sources with the required
    (P_c) at moderate to high signal-to-noise ratios. Once the number of sources
    is detected, or equivalently the optimal (\hat{\tau}) is estimated, the
    corresponding estimation and grid matching of the DoAs can be performed by
    solving the lasso or group-lasso problem at (\hat{\tau}).
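
    A minimal sketch of the grid-based single-snapshot lasso formulation that the
    detection step selects (\tau) for; the array geometry, angle grid, noise
    level, and value of (\tau) below are illustrative assumptions, and the
    group-lasso variant handling grid mismatch is omitted.

    import cvxpy as cp
    import numpy as np

    n_sensors, n_grid = 16, 90
    angles = np.deg2rad(np.linspace(-90, 90, n_grid))
    # steering matrix of a half-wavelength-spaced uniform linear array
    A = np.exp(1j * np.pi * np.outer(np.arange(n_sensors), np.sin(angles)))

    rng = np.random.default_rng(0)
    x_true = np.zeros(n_grid, dtype=complex)
    x_true[[20, 55]] = [1.0, 0.8]                  # two on-grid sources
    noise = 0.05 * (rng.normal(size=n_sensors) + 1j * rng.normal(size=n_sensors))
    y = A @ x_true + noise                         # single snapshot

    tau = 0.1
    x = cp.Variable(n_grid, complex=True)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - y) + tau * cp.norm1(x))).solve()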

    Spatially Controlled Relay Beamforming: (2)-Stage Optimal Policies

    Dionysios S. Kalogerias, Athina P. Petropulu
    Comments: 68 pages, 10 figures, this work constitutes an extended preprint/version of a two part paper (soon to be) submitted for publication to the IEEE Transactions on Signal Processing in Spring/Summer 2017
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)

    The problem of enhancing Quality-of-Service (QoS) in power constrained,
    mobile relay beamforming networks, by optimally and dynamically controlling the
    motion of the relaying nodes, is considered, in a dynamic channel environment.
    We assume a time slotted system, where the relays update their positions before
    the beginning of each time slot. Modeling the wireless channel as a Gaussian
    spatiotemporal stochastic field, we propose a novel (2)-stage stochastic
    programming problem formulation for optimally specifying the positions of the
    relays at each time slot, such that the expected QoS of the network is
    maximized, based on causal Channel State Information (CSI) and under a total
    relay transmit power budget. This results in a schema where, at each time slot,
    the relays, apart from optimally beamforming to the destination, also
    optimally, predictively decide their positions at the next time slot, based on
    causally accumulated experience. Exploiting either the Method of Statistical
    Differentials, or the multidimensional Gauss-Hermite Quadrature Rule, the
    stochastic program considered is shown to be approximately equivalent to a set
    of simple subproblems, which are solved in a distributed fashion, one at each
    relay. Optimality and performance of the proposed spatially controlled system
    are also effectively assessed, under a rigorous technical framework; strict
    optimality is rigorously demonstrated via the development of a version of the
    Fundamental Lemma of Stochastic Control, and, performance-wise, it is shown
    that, quite interestingly, the optimal average network QoS exhibits an
    increasing trend across time slots, despite our myopic problem formulation.
    Numerical simulations are presented, experimentally corroborating the success
    of the proposed approach and the validity of our theoretical predictions.

    Performance Evaluation of Optimal Radio Access Technology Selection Algorithms for LTE-WiFi Network

    Arghyadip Roy, Prasanna Chaporkar, Abhay Karandikar
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    A Heterogeneous Network (HetNet) comprises of multiple Radio Access
    Technologies (RATs) allowing a user to associate with a specific RAT and steer
    to other RATs in a seamless manner. To cope with the unprecedented growth of
    data traffic, mobile data can be offloaded to Wireless Fidelity (WiFi) in a
    Long Term Evolution (LTE) based HetNet. In this paper, an optimal RAT selection
    problem is considered to maximize the total system throughput in an LTE-WiFi
    system with offload capability. Another formulation is also developed where
    maximizing the total system throughput is subject to a constraint on the voice
    user blocking probability. It is proved that the optimal policies for the
    association and offloading of voice/data users contain threshold structures.
    Based on the threshold structures, we propose algorithms for the association
    and offloading of users in LTE-WiFi HetNet. Simulation results are presented to
    demonstrate the voice user blocking probability and the total system throughput
    performance of the proposed algorithms in comparison to another benchmark
    algorithm.

    Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

    Samet Oymak, Mehrdad Mahdavi, Jiasi Chen
    Comments: 22 pages, 7 figures
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)

    For various applications, the relations between the dependent and independent
    variables are highly nonlinear. Consequently, for large scale complex problems,
    neural networks and regression trees are commonly preferred over linear models
    such as Lasso. This work proposes learning the feature nonlinearities by
    binning feature values and finding the best fit in each quantile using
    non-convex regularized linear regression. The algorithm first captures the
    dependence between neighboring quantiles by enforcing smoothness via
    piecewise-constant/linear approximation and then selects a sparse subset of
    good features. We prove that the proposed algorithm is statistically and
    computationally efficient. In particular, it achieves linear rate of
    convergence while requiring near-minimal number of samples. Evaluations on
    synthetic and real datasets demonstrate that the algorithm is competitive with
    the current state of the art and accurately learns feature nonlinearities. Finally,
    we explore an interesting connection between the binning stage of our algorithm
    and sparse Johnson-Lindenstrauss matrices.
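
    The binning stage can be sketched as a quantile one-hot expansion of each
    feature followed by an ordinary linear fit; the smoothness penalty and the
    non-convex regularizer of the actual algorithm are omitted here, and the bin
    count is an illustrative choice.

    import numpy as np

    def binned_features(X, n_bins=8):
        """Expand each feature into quantile-bin indicator columns."""
        n, d = X.shape
        blocks = []
        for j in range(d):
            edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
            idx = np.searchsorted(edges, X[:, j])     # bin index per sample
            onehot = np.zeros((n, n_bins))
            onehot[np.arange(n), idx] = 1.0
            blocks.append(onehot)
        return np.hstack(blocks)

    # usage: a plain least-squares fit on binned features learns a
    # piecewise-constant approximation of each feature's nonlinearity
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = np.sin(X[:, 0]) + np.abs(X[:, 1]) + 0.1 * rng.normal(size=500)
    Z = binned_features(X)
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)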



