IT博客汇

    arXiv Paper Daily: Fri, 18 Nov 2016

    Posted by 我爱机器学习 (52ml.net) on 2016-11-18 00:00:00

    Neural and Evolutionary Computing

    Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

    Hao Shen
    Comments: 15 pages, 2 figures, submitted for publication
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    Despite the recent great success of deep neural networks in various
    applications, designing and training a deep neural network is still among the
    greatest challenges in the field. In this work, we present a smooth
    optimisation perspective on designing and training multilayer Feedforward
    Neural Networks (FNNs) in the supervised learning setting. By characterising
    the critical point conditions of an FNN based optimisation problem, we identify
    the conditions to eliminate local optima of the corresponding cost function.
    Moreover, by studying the Hessian structure of the cost function at the global
    minima, we develop an approximate Newton FNN algorithm, which is capable of
    alleviating the vanishing gradient problem. Finally, our results are
    numerically verified on two classic benchmarks, i.e., the XOR problem and the
    four region classification problem.

    DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

    Jason Kuen, Xiangfei Kong, Gang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Human brains are adept at dealing with the deluge of information they
    continuously receive, by suppressing the non-essential inputs and focusing on
    the important ones. Inspired by such capability, we propose Deluge Networks
    (DelugeNets), a novel class of neural networks facilitating massive cross-layer
    information inflows from preceding layers to succeeding layers. The connections
    between layers in DelugeNets are efficiently established through cross-layer
    depthwise convolutional layers with learnable filters, acting as a flexible
    selection mechanism. By virtue of the massive cross-layer information inflows,
    DelugeNets can propagate information across many layers with greater
    flexibility and utilize network parameters more effectively, compared to
    existing ResNet models. Experiments show the superior performances of
    DelugeNets in terms of both classification accuracies and parameter
    efficiencies. Remarkably, a DelugeNet model with just 20.2M parameters achieves
    a state-of-the-art error of 19.02% on the CIFAR-100 dataset, outperforming
    a DenseNet model with 27.2M parameters.
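
    As a rough illustration of the cross-layer connection described above, the
    sketch below combines the outputs of several preceding layers one channel at
    a time. It assumes the cross-layer filters have 1x1 spatial extent, so each
    one reduces to a single scalar per (layer, channel) pair; the function name
    and array shapes are hypothetical and not taken from the paper.

```python
import numpy as np

def cross_layer_depthwise(prev_outputs, weights):
    """Combine outputs of preceding layers channel by channel.

    prev_outputs: (L, C, H, W) feature maps from L preceding layers.
    weights:      (L, C) one scalar per (layer, channel) pair (assumed 1x1 filters).
    Returns a (C, H, W) array: channel c is a weighted sum of channel c
    across layers only, with no mixing between different channels.
    """
    return np.einsum("lc,lchw->chw", weights, prev_outputs)

rng = np.random.default_rng(0)
prev_outputs = rng.standard_normal((3, 4, 5, 5))  # 3 layers, 4 channels, 5x5 maps
weights = rng.standard_normal((3, 4))
combined = cross_layer_depthwise(prev_outputs, weights)
print(combined.shape)
```

    Because channels are never mixed, the number of parameters grows with the
    number of layers times the number of channels, which is one plausible
    reading of why such connections are cheap.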


    Computer Vision and Pattern Recognition

    Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey

    D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabaly, C. Quek
    Comments: 23 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a survey on maritime object detection and tracking approaches,
    which are essential for the development of a navigational system for autonomous
    ships. The electro-optical (EO) sensors considered here are video cameras that
    operate in the visible or the infrared spectra, which conventionally
    complement radar and sonar and have demonstrated their effectiveness for
    situational awareness at sea over the last few years.
    This paper provides a comprehensive overview of various approaches of video
    processing for object detection and tracking in the maritime environment. We
    follow an approach-based taxonomy wherein the advantages and limitations of
    each approach are compared. The object detection system consists of the
    following modules: horizon detection, static background subtraction and
    foreground segmentation. Each of these has been studied extensively in maritime
    situations and has been shown to be challenging due to the presence of
    background motion especially due to waves and wakes. The main processes
    involved in object tracking include video frame registration, dynamic
    background subtraction, and the object tracking algorithm itself. The
    challenges for robust tracking arise due to camera motion, dynamic background
    and low contrast of tracked object, possibly due to environmental degradation.
    The survey also discusses multisensor approaches and commercial maritime
    systems that use EO sensors. The survey also highlights methods from computer
    vision research which hold promise to perform well in maritime EO data
    processing. Performance of several maritime and computer vision techniques is
    evaluated on the newly proposed Singapore Maritime Dataset.

    AutoScaler: Scale-Attention Networks for Visual Correspondence

    Shenlong Wang, Linjie Luo, Ning Zhang, Jia Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Finding visual correspondence between local features is key to many computer
    vision problems. While defining features with larger contextual scales usually
    implies greater discriminativeness, it could also lead to less spatial accuracy
    of the features. We propose AutoScaler, a scale-attention network to explicitly
    optimize this trade-off in visual correspondence tasks. Our network consists of
    a weight-sharing feature network to compute multi-scale feature maps and an
    attention network to combine them optimally in the scale space. This allows our
    network to have adaptive receptive field sizes over different scales of the
    input. The entire network is trained end-to-end in a siamese framework for
    visual correspondence tasks. Our method achieves favorable results compared to
    state-of-the-art methods on challenging optical flow and semantic matching
    benchmarks, including Sintel, KITTI and CUB-2011. We also show that our method
    can generalize to improve hand-crafted descriptors (e.g., Daisy) on general
    visual correspondence tasks. Finally, our attention network can generate
    visually interpretable scale attention maps.
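
    The idea of combining multi-scale feature maps with an attention map over
    scales can be sketched as follows. This is a minimal illustration, assuming
    the attention network emits one logit per scale and pixel and that fusion is
    a per-pixel softmax-weighted sum; the function name and shapes are
    hypothetical.

```python
import numpy as np

def scale_attention_fuse(features, logits):
    """Fuse multi-scale features with a per-pixel softmax over scales.

    features: (S, C, H, W) feature maps computed at S scales.
    logits:   (S, H, W) unnormalised attention scores, one per scale and pixel.
    Returns (fused, attn) with fused of shape (C, H, W) and attn of shape (S, H, W).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(shifted)
    attn /= attn.sum(axis=0, keepdims=True)
    fused = (attn[:, None, :, :] * features).sum(axis=0)
    return fused, attn

rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 4, 4))
logits = rng.standard_normal((3, 4, 4))
fused, attn = scale_attention_fuse(features, logits)
print(fused.shape, attn.shape)
```

    The returned `attn` array is the kind of quantity one could visualise as a
    scale attention map.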

    The Freiburg Groceries Dataset

    Philipp Jund, Nichola Abdo, Andreas Eitel, Wolfram Burgard
    Comments: Link to dataset: this http URL Link to code: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    With the increasing performance of machine learning techniques in the last
    few years, the computer vision and robotics communities have created a large
    number of datasets for benchmarking object recognition tasks. These datasets
    cover a large spectrum of natural images and object categories, making them not
    only useful as a testbed for comparing machine learning approaches, but also a
    great resource for bootstrapping different domain-specific perception and
    robotic systems. One such domain is domestic environments, where an autonomous
    robot has to recognize a large variety of everyday objects such as groceries.
    This is a challenging task due to the large variety of objects and products,
    and there is a great need for real-world training data that goes beyond the
    product images available online. In this paper, we address this issue and
    present a dataset consisting of 5,000 images covering 25 different classes of
    groceries, with at least 97 images per class. We collected all images from
    real-world settings at different stores and apartments. In contrast to existing
    groceries datasets, our dataset includes a large variety of perspectives,
    lighting conditions, and degrees of clutter. Overall, our images contain
    thousands of different object instances. It is our hope that machine learning
    and robotics researchers find this dataset of use for training, testing, and
    bootstrapping their approaches. As a baseline classifier to facilitate
    comparison, we re-trained the CaffeNet architecture (an adaptation of the
    well-known AlexNet) on our dataset and achieved a mean accuracy of 78.9%. We
    release this trained model along with the code and data splits we used in our
    experiments.

    DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins

    Hamid Reza Hassanzadeh, May D. Wang
    Comments: in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Transcription factors (TFs) are macromolecules that bind to
    cis-regulatory specific sub-regions of DNA promoters and initiate
    transcription. Finding the exact location of these binding sites (aka motifs)
    is important in a variety of domains such as drug design and development. To
    address this need, several in vivo and in vitro techniques
    have been developed so far that try to characterize and predict the binding
    specificity of a protein to different DNA loci. The major problem with these
    techniques is that they are not accurate enough in prediction of the binding
    affinity and characterization of the corresponding motifs. As a result,
    downstream analysis is required to uncover the locations where proteins of
    interest bind. Here, we propose DeeperBind, a long short term recurrent
    convolutional network for prediction of protein binding specificities with
    respect to DNA probes. DeeperBind can model the positional dynamics of probe
    sequences and hence reckons with the contributions made by individual
    sub-regions in DNA sequences, in an effective way. Moreover, it can be trained
    and tested on datasets containing varying-length sequences. We apply our
    pipeline to the datasets derived from protein binding microarrays (PBMs), an
    in-vitro high-throughput technology for quantification of protein-DNA binding
    preferences, and present promising results. To the best of our knowledge, this
    is the most accurate pipeline that can predict binding specificities of DNA
    sequences from the data produced by high-throughput technologies through
    utilization of the power of deep learning for feature generation and positional
    dynamics modeling.

    Examining the Impact of Blur on Recognition by Convolutional Networks

    Igor Vasiljevic, Ayan Chakrabarti, Gregory Shakhnarovich
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art algorithms for semantic visual tasks—such as image
    classification and semantic segmentation—are based on the use of
    convolutional neural networks. These networks are commonly trained, and
    evaluated, on large annotated datasets of high-quality images that are free of
    artifacts. In this paper, we investigate the effect of one such artifact that
    is quite common in natural capture settings—blur. We show that standard
    pre-trained network models suffer a significant degradation in performance when
    applied to blurred images. We investigate the extent to which this degradation
    is due to the mismatch between training and input image statistics.
    Specifically, we find that fine-tuning a pre-trained model with blurred images
    added to the training set allows it to regain much of the lost accuracy. By
    considering different combinations of sharp and blurred images in the training
    set, we characterize how much degradation is caused by loss of information, and
    how much by the uncertainty of not knowing the nature and magnitude of blur. We
    find that by fine-tuning on a diverse mix of blurred images, convolutional
    neural networks can in fact learn to generate a blur invariant representation
    in their hidden layers. Broadly, our results provide practitioners with useful
    insights for developing vision systems that perform reliably on real world
    images affected by blur.
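
    To make the notion of blurred training inputs concrete, here is a small
    sketch of a separable Gaussian blur that could be used to generate them.
    The abstract does not specify the blur types or how they were produced, so
    the Gaussian kernel, the edge padding and the function names are
    assumptions made only for illustration.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalised 1D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Blur a 2D grayscale image with a separable Gaussian (edge padding)."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows first, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

sharp = np.zeros((9, 9))
sharp[4, 4] = 1.0  # a single bright pixel
blurred = gaussian_blur(sharp, sigma=1.0)
print(blurred.shape, blurred.max())
```

    Drawing `sigma` at random for each training image would give one simple
    version of a "diverse mix" of blur levels.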

    Cross-Domain Face Verification: Matching ID Document and Self-Portrait Photographs

    Guilherme Folego, Marcus A. Angeloni, José Augusto Stuchi, Alan Godoy, Anderson Rocha
    Comments: XII WORKSHOP DE VISÃO COMPUTACIONAL (Campo Grande, Brazil). In XII Workshop de Visão Computacional (pp. 311-316) (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Cross-domain biometrics has been emerging as a new necessity, which poses
    several additional challenges, including harsh illumination changes, noise,
    pose variation, among others. In this paper, we explore approaches to
    cross-domain face verification, comparing self-portrait photographs (“selfies”)
    to ID documents. We approach the problem with proper image photometric
    adjustment and data standardization techniques, along with deep learning
    methods to extract the most prominent features from the data, reducing the
    effects of domain shift in this problem. We validate the methods using a novel
    dataset comprising 50 individuals. The obtained results are promising and
    indicate that the adopted path is worth further investigation.

    Compensating for Large In-Plane Rotations in Natural Images

    Lokesh Boominathan, Suraj Srinivas, R. Venkatesh Babu
    Comments: Accepted at Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Rotation invariance has been studied in the computer vision community
    primarily in the context of small in-plane rotations. This is usually achieved
    by building invariant image features. However, the problem of achieving
    invariance for large rotation angles remains largely unexplored. In this work,
    we tackle this problem by directly compensating for large rotations, as opposed
    to building invariant features. This is inspired by the neuro-scientific
    concept of mental rotation, which humans use to compare pairs of rotated
    objects. Our contributions here are three-fold. First, we train a Convolutional
    Neural Network (CNN) to detect image rotations. We find that generic CNN
    architectures are not suitable for this purpose. To this end, we introduce a
    convolutional template layer, which learns representations for canonical
    ‘unrotated’ images. Second, we use Bayesian Optimization to quickly sift
    through a large number of candidate images to find the canonical ‘unrotated’
    image. Third, we use this method to achieve robustness to large angles in an
    image retrieval scenario. Our method is task-agnostic, and can be used as a
    pre-processing step in any computer vision system.

    Building Deep Networks on Grassmann Manifolds

    Zhiwu Huang, Jiqing Wu, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Representing the data on Grassmann manifolds is popular in quite a few image
    and video recognition tasks. In order to enable deep learning on Grassmann
    manifolds, this paper proposes a deep network architecture which generalizes
    the Euclidean network paradigm to Grassmann manifolds. In particular, we design
    full rank mapping layers to transform input Grassmannian data into more
    desirable ones, exploit orthogonal re-normalization layers to normalize the
    resulting matrices, study projection pooling layers to reduce the model
    complexity in the Grassmannian context, and devise projection mapping layers to
    turn the resulting Grassmannian data into Euclidean forms for regular output
    layers. To train the deep network, we exploit a stochastic gradient descent
    setting on the manifolds on which the connection weights reside, and study a matrix
    generalization of backpropagation to update the structured data. We
    experimentally evaluate the proposed network for three computer vision tasks,
    and show that it has clear advantages over existing Grassmann learning methods,
    and achieves results comparable with state-of-the-art approaches.
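
    The layer types named above can be illustrated with a few lines of linear
    algebra. The sketch assumes that a Grassmannian point is stored as a matrix
    with orthonormal columns, that re-normalization is done with a QR
    decomposition, and that the projection mapping is the product of the matrix
    with its own transpose; these choices and all names are assumptions, not
    details given in the abstract.

```python
import numpy as np

def full_rank_mapping(W, X):
    """Linear map applied to a subspace basis X of shape (d, q)."""
    return W @ X

def reorthonormalize(Y):
    """Return an orthonormal basis for the column span of Y via reduced QR."""
    Q, _ = np.linalg.qr(Y)
    return Q

def projection_mapping(Q):
    """Map an orthonormal basis to its (Euclidean) projection matrix."""
    return Q @ Q.T

rng = np.random.default_rng(0)
X = reorthonormalize(rng.standard_normal((6, 2)))  # a point on Gr(2, 6)
W = rng.standard_normal((4, 6))                    # assumed full-rank weights
Q = reorthonormalize(full_rank_mapping(W, X))      # back to orthonormal columns
P = projection_mapping(Q)
print(Q.shape, P.shape)
```

    The projection matrix `P` is an ordinary symmetric matrix, so it can be
    flattened and passed to regular Euclidean output layers.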

    PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

    Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin
    Comments: Tech report
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A number of studies have shown that increasing the depth or width of
    convolutional networks is a rewarding approach to improve the performance of
    image recognition. In our study, however, we observed difficulties along both
    directions. On one hand, the pursuit of very deep networks is met with
    diminishing returns and increased training difficulty; on the other hand,
    widening a network would result in a quadratic growth in both computational
    cost and memory demand. These difficulties motivate us to explore structural
    diversity in designing deep networks, a new dimension beyond just depth and
    width. Specifically, we present a new family of modules, namely the
    PolyInception, which can be flexibly inserted in isolation or in a composition
    as replacements of different parts of a network. Choosing PolyInception modules
    with the guidance of architectural efficiency can improve the expressive power
    while preserving comparable computational cost. A benchmark on the ILSVRC 2012
    validation set demonstrates substantial improvements over the state-of-the-art.
    Compared to Inception-ResNet-v2, it reduces the top-5 error on single crops
    from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.

    Hard-Aware Deeply Cascaded Embedding

    Yuhui Yuan, Kuiyuan Yang, Chao Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Riding on the waves of deep neural networks, deep metric learning has also
    achieved promising results in various tasks using triplet network or Siamese
    network. Though the basic goal of making images from the same category closer
    than the ones from different categories is intuitive, it is hard to directly
    optimize due to the quadratic or cubic sample size. To solve the problem, hard
    example mining which only focuses on a subset of samples that are considered
    hard is widely used. However, hardness is defined relative to a model: complex
    models treat most samples as easy ones and simple models treat most as hard,
    and neither case is good for training. Samples also have different levels of
    hardness, so it is hard to define a model with just the right complexity and
    to choose hard examples adequately. This motivates us to ensemble a set of
    models with different complexities in a cascaded manner and mine hard examples
    adaptively: a sample is judged by a series of models with increasing
    complexities and only updates models that consider the sample as a hard case.
    We evaluate our method on CARS196, CUB-200-2011, Stanford Online Products,
    VehicleID and DeepFashion datasets. Our method outperforms state-of-the-art
    methods by a large margin.
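
    One possible reading of the cascaded mining rule is sketched below: each
    sample is passed down the cascade only while every model so far has judged
    it hard, and it contributes an update only to those models. The use of a
    per-model loss threshold as the hardness test, and all names, are
    assumptions made for illustration.

```python
import numpy as np

def cascade_update_mask(losses, thresholds):
    """Decide which models each sample updates in a cascade.

    losses:     (K, N) loss of each of K models (ordered by increasing
                complexity) on each of N samples.
    thresholds: length-K sequence; a sample is "hard" for model k if its
                loss exceeds thresholds[k] (assumed criterion).
    Returns a (K, N) boolean mask; mask[k, n] is True if sample n updates model k.
    """
    num_models, num_samples = losses.shape
    mask = np.zeros((num_models, num_samples), dtype=bool)
    active = np.ones(num_samples, dtype=bool)  # samples still in the cascade
    for k in range(num_models):
        hard = active & (losses[k] > thresholds[k])
        mask[k] = hard
        active = hard  # only hard samples move on to the next model
    return mask

losses = np.array([[0.9, 0.2, 0.8],
                   [0.7, 0.9, 0.1],
                   [0.6, 0.9, 0.9]])
update_mask = cascade_update_mask(losses, thresholds=[0.5, 0.5, 0.5])
print(update_mask)
```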

    Factorized Bilinear Models for Image Recognition

    Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Although Deep Convolutional Neural Networks (CNNs) have demonstrated their power
    in various computer vision tasks, the most important components of CNN,
    convolutional layers and fully connected layers, are still limited to linear
    transformations. In this paper, we propose a novel Factorized Bilinear (FB)
    layer to model the pairwise feature interactions by considering the quadratic
    terms in the transformations. Compared with existing methods that tried to
    incorporate complex non-linearity structures into CNNs, the factorized
    parameterization makes our FB layer only require a linear increase of
    parameters and affordable computational cost. To further reduce the risk of
    overfitting of the FB layer, a specific remedy called DropFactor is devised
    during the training process. We also analyze the connection between the FB layer
    and some existing models, and show that the FB layer is a generalization of them.
    Finally, we validate the effectiveness of FB layer on several widely adopted
    datasets including CIFAR-10, CIFAR-100 and ImageNet, and demonstrate superior
    results compared with various state-of-the-art deep models.
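
    A single output unit with a factorized quadratic term can be sketched as
    follows. The form y = b + w^T x + x^T F^T F x, with F a k-by-n factor
    matrix, is an assumption consistent with the description above rather than
    a formula quoted from the abstract; DropFactor is not modelled.

```python
import numpy as np

def fb_unit(x, w, F, b=0.0):
    """Linear term plus a factorized quadratic (pairwise interaction) term.

    x: (n,) input features, w: (n,) linear weights, F: (k, n) factor matrix.
    The quadratic term x^T (F^T F) x is computed as ||F x||^2, so the full
    n-by-n interaction matrix is never formed and the parameter count grows
    linearly in n for a fixed k.
    """
    Fx = F @ x
    return b + w @ x + Fx @ Fx

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
w = rng.standard_normal(5)
F = rng.standard_normal((2, 5))
y = fb_unit(x, w, F, b=0.1)
print(y)
```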

    Fusing 2D Uncertainty and 3D Cues for Monocular Body Pose Estimation

    Bugra Tekin, Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most recent approaches to monocular 3D human pose estimation rely on Deep
    Learning. They typically involve training a network to regress from an image to
    either 3D joint coordinates directly, or 2D joint locations from which the 3D
    coordinates are inferred by a model-fitting procedure. The former takes
    advantage of 3D cues present in the images but rarely models uncertainty. By
    contrast, the latter often models 2D uncertainty, for example in the form of
    joint location heatmaps, but discards all the image information, such as
    texture, shading and depth cues, in the fitting step.

    In this paper, we therefore propose to jointly model 2D uncertainty and
    leverage 3D image cues in a regression framework for monocular 3D human pose
    estimation. To this end, we introduce a novel two-stream deep architecture. One
    stream focuses on modeling uncertainty via probability maps of 2D joint
    locations and the other exploits 3D cues by directly acting on the image. We
    then study different approaches to fusing their outputs to obtain the final 3D
    prediction. In particular, our experiments show that our late-fusion
    mechanism improves upon the state-of-the-art by a large margin on standard 3D
    human pose estimation benchmarks.

    DSAC – Differentiable RANSAC for Camera Localization

    Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, Carsten Rother
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    RANSAC is an important algorithm in robust optimization and a central
    building block for many computer vision applications. In recent years,
    traditionally hand-crafted pipelines have been replaced by deep learning
    pipelines, which can be trained in an end-to-end fashion. However, RANSAC has
    so far not been used as part of such deep learning pipelines, because its
    hypothesis selection procedure is non-differentiable. In this work, we present
    two different ways to overcome this limitation. The most promising approach is
    inspired by reinforcement learning, namely to replace the deterministic
    hypothesis selection by a probabilistic selection for which we can derive the
    expected loss w.r.t. all learnable parameters. We call this approach DSAC,
    the differentiable counterpart of RANSAC. We apply DSAC to the problem of
    camera localization, where deep learning has so far failed to improve on
    traditional approaches. We demonstrate that by directly minimizing the expected
    loss of the output camera poses, robustly estimated by RANSAC, we achieve an
    increase in accuracy. In the future, any deep learning pipeline can use DSAC as
    a robust optimization component.
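
    The probabilistic selection idea can be illustrated with a small sketch.
    It assumes that hypotheses are chosen with softmax probabilities over their
    scores, so that the expected loss becomes a smooth function of the scores;
    the softmax choice and the function names are assumptions for illustration.

```python
import numpy as np

def selection_probabilities(scores):
    """Softmax over hypothesis scores (replaces a hard argmax selection)."""
    shifted = scores - scores.max()
    p = np.exp(shifted)
    return p / p.sum()

def expected_loss(scores, losses):
    """Expected task loss when hypothesis i is selected with probability p_i."""
    return float(selection_probabilities(scores) @ losses)

def expected_loss_grad(scores, losses):
    """Gradient of the expected loss w.r.t. the scores: p_j * (l_j - E[l])."""
    p = selection_probabilities(scores)
    return p * (losses - p @ losses)

scores = np.array([2.0, 0.5, -1.0])  # one score per pose hypothesis
losses = np.array([0.1, 1.0, 3.0])   # loss each hypothesis would incur
print(expected_loss(scores, losses), expected_loss_grad(scores, losses))
```

    With a hard argmax, the gradient with respect to the scores would be zero
    almost everywhere; the expectation is what makes the scores trainable.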

    End-to-end Learning of Cost-Volume Aggregation for Real-time Dense Stereo

    Andrey Kuzmin, Dmitry Mikushin, Victor Lempitsky
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new deep learning-based approach for dense stereo matching.
    Compared to previous works, our approach does not use deep learning of pixel
    appearance descriptors, employing very fast classical matching scores instead.
    At the same time, our approach uses a deep convolutional network to predict the
    local parameters of cost volume aggregation process, which in this paper we
    implement using differentiable domain transform. By treating such transform as
    a recurrent neural network, we are able to train our whole system that includes
    cost volume computation, cost-volume aggregation (smoothing), and
    winner-takes-all disparity selection end-to-end. The resulting method is highly
    efficient at test time, while achieving good matching accuracy. On the KITTI
    2015 benchmark, it achieves a 6.34% error rate while running at 29
    frames per second on a modern GPU.
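
    The first and last stages of such a pipeline, cost volume computation and
    winner-takes-all selection, can be sketched in a few lines. The absolute
    intensity difference used as the matching score is an assumption, and the
    learned aggregation step is deliberately left out.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Absolute-difference matching cost for disparities 0..max_disp.

    left, right: (H, W) grayscale images. Pixel x in the left image is
    compared with pixel x - d in the right image; invalid positions get inf.
    """
    height, width = left.shape
    vol = np.full((max_disp + 1, height, width), np.inf)
    for d in range(max_disp + 1):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :width - d])
    return vol

def winner_takes_all(vol):
    """Pick, for every pixel, the disparity with the lowest cost."""
    return np.argmin(vol, axis=0)

rng = np.random.default_rng(0)
left = rng.random((4, 12))
right = np.roll(left, -2, axis=1)  # synthetic pair with true disparity 2
disparity = winner_takes_all(cost_volume(left, right, max_disp=4))
print(disparity)
```

    In this toy pair the first two columns have no valid match at disparity 2,
    so only the remaining columns are expected to recover it.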

    A Discriminatively Learned CNN Embedding for Person Re-identification

    Zhedong Zheng, Liang Zheng, Yi Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We revisit two popular convolutional neural networks (CNN) in person
    re-identification (re-ID), i.e., verification and classification models. The two
    models have their respective advantages and limitations due to different loss
    functions. In this paper, we shed light on how to combine the two models to
    learn more discriminative pedestrian descriptors. Specifically, we propose a
    new siamese network that simultaneously computes identification loss and
    verification loss. Given a pair of training images, the network predicts the
    identities of the two images and whether they belong to the same identity. Our
    network learns a discriminative embedding and a similarity measurement at the
    same time, thus making full use of the annotations. Albeit simple, the
    learned embedding improves the state-of-the-art performance on two public
    person re-ID benchmarks. Further, we show our architecture can also be applied
    in image retrieval.
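
    A combined objective of this kind can be sketched as the sum of two
    identification losses (one per image) and one verification loss (same
    identity or not). Treating verification as two-class softmax
    cross-entropy and using an equal weighting are assumptions; the logits
    here stand in for network outputs.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over logits for an integer class label."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def combined_loss(id_logits_a, id_logits_b, id_a, id_b, same_logits, weight=1.0):
    """Identification loss for both images plus a weighted verification loss."""
    identification = (softmax_cross_entropy(id_logits_a, id_a)
                      + softmax_cross_entropy(id_logits_b, id_b))
    same = int(id_a == id_b)  # verification target: 1 if same identity
    verification = softmax_cross_entropy(same_logits, same)
    return identification + weight * verification

num_identities = 4
uniform_id = np.zeros(num_identities)  # an untrained, uninformative prediction
uniform_same = np.zeros(2)
loss = combined_loss(uniform_id, uniform_id, 1, 3, uniform_same)
print(loss)
```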

    Learning to detect and localize many objects from few examples

    Bastien Moysset, Christopher Kermorvant, Christian Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The current trend in object detection and localization is to learn
    predictions with high capacity deep neural networks trained on a very large
    amount of annotated data and using a high amount of processing power. In this
    work, we propose a new neural model which directly predicts bounding box
    coordinates. The particularity of our contribution lies in the local
    computations of predictions with a new form of local parameter sharing which
    keeps the overall amount of trainable parameters low. Key components of the
    model are spatial 2D-LSTM recurrent layers which convey contextual information
    between the regions of the image. We show that this model is more powerful than
    the state of the art in applications where training data is not as abundant as
    in the classical configuration of natural images and Imagenet/Pascal VOC tasks.
    We particularly target the detection of text in document images, but our method
    is not limited to this setting. The proposed model also facilitates the
    detection of many objects in a single image and can deal with inputs of
    variable sizes without resizing.

    Inverting The Generator Of A Generative Adversarial Network

    Antonia Creswell, Anil Anthony Bharath
    Comments: Accepted at NIPS 2016 Workshop on Adversarial Training
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generative adversarial networks (GANs) learn to synthesise new samples from a
    high-dimensional distribution by passing samples drawn from a latent space
    through a generative network. When the high-dimensional distribution describes
    images of a particular data set, the network should learn to generate visually
    similar image samples for latent variables that are close to each other in the
    latent space. For tasks such as image retrieval and image classification, it
    may be useful to exploit the arrangement of the latent space by projecting
    images into it, and using this as a representation for discriminative tasks.
    GANs often consist of multiple layers of non-linear computations, making them
    very difficult to invert. This paper introduces techniques for projecting image
    samples into the latent space using any pre-trained GAN, provided that the
    computational graph is available. We evaluate these techniques on both MNIST
    digits and Omniglot handwritten characters. In the case of MNIST digits, we
    show that projections into the latent space maintain information about the
    style and the identity of the digit. In the case of Omniglot characters, we
    show that even characters from alphabets that have not been seen during
    training may be projected well into the latent space; this suggests that this
    approach may have applications in one-shot learning.
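
    Projection into the latent space by gradient descent on the latent vector
    can be illustrated on a toy problem. The generator below is a fixed linear
    map standing in for a pre-trained network, and the squared reconstruction
    error is an assumed objective; both are simplifications chosen so the
    gradient can be written by hand.

```python
import numpy as np

# Toy "generator": a fixed linear map from a 2-D latent space to 3-D samples.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

def generator(z):
    return W @ z

def invert(x, steps=500, lr=0.1):
    """Find z minimising 0.5 * ||generator(z) - x||^2 by gradient descent."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = generator(z) - x
        z -= lr * (W.T @ residual)  # gradient of the objective w.r.t. z
    return z

z_true = np.array([0.5, -1.5])
z_hat = invert(generator(z_true))
print(z_hat)
```

    For a real multi-layer generator the gradient would come from automatic
    differentiation through the computational graph, and convergence to the
    original latent vector is not guaranteed.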

    Optical Flow Requires Multiple Strategies (but only one network)

    Tal Schuster, Lior Wolf, David Gadot
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We show that the matching problem that underlies optical flow requires
    multiple strategies, depending on the amount of image motion and other factors.
    We then study the implications of this observation on training a deep neural
    network for representing image patches in the context of descriptor based
    optical flow. We propose a metric learning method, which selects suitable
    negative samples based on the nature of the true match. This type of training
    produces a network that displays multiple strategies depending on the input and
    leads to state of the art results on the KITTI 2012 and KITTI 2015 optical flow
    benchmarks.

    Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

    Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li, Kaiqi Huang
    Comments: Containing 9 pages and 5 figures. Codes open-sourced on this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art methods treat pedestrian attribute recognition as a
    multi-label image classification problem. The location information of person
    attributes is usually eliminated or simply encoded in the rigid splitting of
    whole body in previous work. In this paper, we formulate the task in a
    weakly-supervised attribute localization framework. Based on GoogLeNet,
    firstly, a set of mid-level attribute features are discovered by newly
    designed detection layers, where a max-pooling based weakly-supervised object
    detection technique is used to train these layers with only image-level labels
    without the need of bounding box annotations of pedestrian attributes.
    Secondly, attribute labels are predicted by regression of the detection
    response magnitudes. Finally, the locations and rough shapes of pedestrian
    attributes can be inferred by performing clustering on a fusion of activation
    maps of the detection layers, where the fusion weights are estimated as the
    correlation strengths between each attribute and its relevant mid-level
    features. Extensive experiments are performed on the two currently largest
    pedestrian attribute datasets, i.e. the PETA dataset and the RAP dataset.
    Results show that the proposed method has achieved competitive performance on
    attribute recognition, compared to other state-of-the-art methods. Moreover,
    the results of attribute localization are visualized to understand the
    characteristics of the proposed method.

    SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

    Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Tat-Seng Chua
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual attention has been successfully applied in structural prediction tasks
    such as visual captioning and question answering. Existing visual attention
    models are generally spatial, i.e., the attention is modeled as spatial
    probabilities that re-weight the last conv-layer feature map of a CNN which
    encodes an input image. However, we argue that such spatial attention does not
    necessarily conform to the attention mechanism — a dynamic feature extractor
    that combines contextual fixations over time, as CNN features are naturally
    spatial, channel-wise and multi-layer. In this paper, we introduce a novel
    convolutional neural network dubbed SCA-CNN that incorporates Spatial and
    Channel-wise Attentions in a CNN. In the task of image captioning, SCA-CNN
    dynamically modulates the sentence generation context in multi-layer feature
    maps, encoding where (i.e., attentive spatial locations at multiple layers) and
    what (i.e., attentive channels) the visual attention is. We evaluate the
    SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K,
    Flickr30K, and MSCOCO. SCA-CNN achieves significant improvements over
    state-of-the-art visual attention-based image captioning methods.

    Multimodal Memory Modelling for Video Captioning

    Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Video captioning, which automatically translates video clips into natural
    language sentences, is a very important task in computer vision. By virtue of
    recent deep learning technologies, e.g., convolutional neural networks (CNNs)
    and recurrent neural networks (RNNs), video captioning has made great progress.
    However, learning an effective mapping from visual sequence space to language
    space is still a challenging problem. In this paper, we propose a Multimodal
    Memory Model (M3) to describe videos, which builds a visual and textual shared
    memory to model the long-term visual-textual dependency and further guide
    global visual attention on described targets. Specifically, the proposed M3
    attaches an external memory to store and retrieve both visual and textual
    contents by interacting with video and sentence with multiple read and write
    operations. First, text representation in the Long Short-Term Memory (LSTM)
    based text decoder is written into the memory, and the memory contents will be
    read out to guide an attention to select related visual targets. Then, the
    selected visual information is written into the memory, which will be further
    read out to the text decoder. To evaluate the proposed model, we perform
    experiments on two public benchmark datasets: MSVD and MSR-VTT. The
    experimental results demonstrate that our method outperforms the
    state-of-the-art methods in terms of BLEU and METEOR.

    Instance-aware Image and Sentence Matching with Selective Multimodal LSTM

    Yan Huang, Wei Wang, Liang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Effective image and sentence matching depends on how well their global
    visual-semantic similarity is measured. Based on the observation that such a global
    similarity arises from a complex aggregation of multiple local similarities
    between pairwise instances of image (objects) and sentence (words), we propose
    a selective multimodal Long Short-Term Memory network (sm-LSTM) for
    instance-aware image and sentence matching. The sm-LSTM includes a multimodal
    context-modulated attention scheme at each timestep that can selectively attend
    to a pair of instances of image and sentence, by predicting pairwise
    instance-aware saliency maps for image and sentence. For selected pairwise
    instances, their representations are obtained based on the predicted saliency
    maps, and then compared to measure their local similarity. By similarly
    measuring multiple local similarities within a few timesteps, the sm-LSTM
    sequentially aggregates them with hidden states to obtain a final matching
    score as the desired global similarity. Extensive experiments show that our
    model can match images and sentences with complex content well, and achieves
    state-of-the-art results on two public benchmark datasets.

    DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

    Jason Kuen, Xiangfei Kong, Gang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Human brains are adept at dealing with the deluge of information they
    continuously receive, by suppressing the non-essential inputs and focusing on
    the important ones. Inspired by such capability, we propose Deluge Networks
    (DelugeNets), a novel class of neural networks facilitating massive cross-layer
    information inflows from preceding layers to succeeding layers. The connections
    between layers in DelugeNets are efficiently established through cross-layer
    depthwise convolutional layers with learnable filters, acting as a flexible
    selection mechanism. By virtue of the massive cross-layer information inflows,
    DelugeNets can propagate information across many layers with greater
    flexibility and utilize network parameters more effectively, compared to
    existing ResNet models. Experiments show the superior performances of
    DelugeNets in terms of both classification accuracies and parameter
    efficiencies. Remarkably, a DelugeNet model with just 20.2M parameters achieves
    a state-of-the-art error rate of 19.02% on the CIFAR-100 dataset, outperforming
    a DenseNet model with 27.2M parameters.

    Zero-Shot Visual Question Answering

    Damien Teney, Anton van den Hengel
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Part of the appeal of Visual Question Answering (VQA) is its promise to
    answer new questions about previously unseen images. Most current methods
    demand training questions that illustrate every possible concept, and will
    therefore never achieve this capability, since the volume of required training
    data would be prohibitive. Answering general questions about images requires
    methods capable of Zero-Shot VQA, that is, methods able to answer questions
    beyond the scope of the training questions. We propose a new evaluation
    protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
    and in doing so highlights significant practical deficiencies of current
    approaches, some of which are masked by the biases in current datasets. We
    propose and evaluate several strategies for achieving Zero-Shot VQA, including
    methods based on pretrained word embeddings, object classifiers with semantic
    embeddings, and test-time retrieval of example images. Our extensive
    experiments are intended to serve as baselines for Zero-Shot VQA, and they also
    achieve state-of-the-art performance in the standard VQA evaluation setting.

    Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation

    Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Basura Fernando, Mathieu Salzmann, Lars Petersson, Lars Andersson
    Comments: 10 pages, 4 figures, 7 tables. arXiv admin note: text overlap with arXiv:1601.00740 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Action recognition and anticipation are key to the success of many computer
    vision applications. Existing methods can roughly be grouped into those that
    extract global, context-aware representations of the entire image or sequence,
    and those that aim at focusing on the regions where the action occurs. While
    the former may suffer from the fact that context is not always reliable, the
    latter completely ignore this source of information, which can nonetheless be
    helpful in many situations. In this paper, we aim at making the best of both
    worlds by developing an approach that leverages both context-aware and
    action-aware features. At the core of our method lies a novel multi-stage
    recurrent architecture that allows us to effectively combine these two sources
    of information throughout a video. This architecture first exploits the global,
    context-aware features, and merges the resulting representation with the
    localized, action-aware ones. Our experiments on standard datasets evidence the
    benefits of our approach over methods that use each information type
    separately. We outperform the state-of-the-art methods that, like ours, rely only
    on RGB frames as input for both action recognition and anticipation.

    Deep Feature Interpolation for Image Content Changes

    Paul Upchurch, Jacob Gardner, Kavita Bala, Robert Pless, Noah Snavely, Kilian Weinberger
    Comments: First two authors contributed equally. Submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose Deep Feature Interpolation (DFI), a new data-driven baseline for
    automatic high-resolution image transformation. As the name suggests, it relies
    only on simple linear interpolation of deep convolutional features from
    pre-trained convnets. We show that despite its simplicity, DFI can perform
    high-level semantic transformations like “make older/younger”, “make
    bespectacled”, “add smile”, among others, surprisingly well – sometimes even
    matching or outperforming the state-of-the-art. This is particularly unexpected
    as DFI requires no specialized network architecture or even any deep network to
    be trained for these tasks. DFI therefore can be used as a new baseline to
    evaluate more complex algorithms and provides a practical answer to the
    question of which image transformation tasks are still challenging in the rise
    of deep learning.
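
    As an illustration only, the "simple linear interpolation of deep
    convolutional features" mentioned above can be sketched as shifting a
    feature vector along an attribute direction. The reading below (direction =
    mean of features with the attribute minus mean of features without it) and
    all names (attribute_direction, interpolate, alpha) are assumptions for this
    sketch, not taken from the abstract. Extracting features from a pre-trained
    convnet and mapping the shifted features back to an image are not shown.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_direction(with_attr: Sequence[Sequence[float]],
                        without_attr: Sequence[Sequence[float]]) -> Vector:
    """Assumed attribute direction: difference of the two group means."""
    mu_target = mean_vector(with_attr)
    mu_source = mean_vector(without_attr)
    return [t - s for t, s in zip(mu_target, mu_source)]


def interpolate(features: Sequence[float], direction: Sequence[float],
                alpha: float) -> Vector:
    """Move `features` by `alpha` steps along `direction` (linear in alpha)."""
    return [f + alpha * d for f, d in zip(features, direction)]


# Toy 2-D "deep features" (hypothetical values, not real convnet outputs).
smiling = [[1.0, 2.0], [3.0, 2.0]]
neutral = [[0.0, 1.0], [2.0, 1.0]]

w = attribute_direction(smiling, neutral)
shifted = interpolate([0.5, 0.5], w, alpha=2.0)
print(w, shifted)
```

    Setting alpha to 0 returns the original features unchanged, and larger
    values push further along the assumed attribute direction.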

    On the Exploration of Convolutional Fusion Networks for Visual Recognition

    Yu Liu, Yanming Guo, Michael S. Lew
    Comments: 23rd International Conference on MultiMedia Modeling (MMM 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite recent advances in multi-scale deep representations, their
    limitations are attributed to expensive parameters and weak fusion modules.
    Hence, we propose an efficient approach to fuse multi-scale deep
    representations, called convolutional fusion networks (CFN). Owing to using
    1×1 convolution and global average pooling, CFN can efficiently generate
    the side branches while adding few parameters. In addition, we present a
    locally-connected fusion module, which can learn adaptive weights for the side
    branches and form a discriminatively fused feature. CFN models trained on the
    CIFAR and ImageNet datasets demonstrate remarkable improvements over the plain
    CNNs. Furthermore, we generalize CFN to three new tasks, including scene
    recognition, fine-grained recognition and image retrieval. Our experiments show
    that it can obtain consistent improvements on the transferred tasks.
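
    To make the two building blocks named above concrete, here is a minimal
    pure-Python sketch of a 1×1 convolution followed by global average pooling.
    It is not the CFN architecture itself. The function names and the toy
    feature map are hypothetical, and bias terms and non-linearities are left
    out.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def conv1x1(feature_map: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: a per-pixel linear mix of the input channels.

    `weights` has shape [out_channels][in_channels], so the layer adds only
    out_channels * in_channels parameters.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(w * feature_map[c][i][j] for c, w in enumerate(row))
             for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse each channel to the mean of its spatial positions."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Toy input: 2 channels of size 2x2 (hypothetical values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
mix = [[1.0, 1.0]]  # one output channel that sums both input channels

mixed = conv1x1(features, mix)
pooled = global_average_pool(mixed)
print(mixed, pooled)
```

    Because the 1×1 kernel has no spatial extent, the parameter count depends
    only on the number of input and output channels, which is consistent with
    the abstract's point about adding few parameters.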

    Semantic Regularisation for Recurrent Image Annotation

    Feng Liu, Tao Xiang, Timothy M. Hospedales, Wankou Yang, Changyin Sun
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The “CNN-RNN” design pattern is increasingly widely applied in a variety of
    image annotation tasks including multi-label classification and captioning.
    Existing models use the weakly semantic CNN hidden layer or its transform as
    the image embedding that provides the interface between the CNN and RNN. This
    leaves the RNN overstretched with two jobs: predicting the visual concepts and
    modelling their correlations for generating structured annotation output.
    Importantly this makes the end-to-end training of the CNN and RNN slow and
    ineffective due to the difficulty of back propagating gradients through the RNN
    to train the CNN. We propose a simple modification to the design pattern that
    makes learning more effective and efficient. Specifically, we propose to use a
    semantically regularised embedding layer as the interface between the CNN and
    RNN. Regularising the interface can partially or completely decouple the
    learning problems, allowing each to be trained more effectively and making
    joint training much more efficient. Extensive experiments show that state-of-the-art
    performance is achieved on multi-label classification as well as image
    captioning.

    Probabilistic Fluorescence-Based Synapse Detection

    Anish K. Simhal, Cecilia Aguerrebere, Forrest Collman, Joshua T. Vogelstein, Kristina D. Micheva, Richard J. Weinberg, Stephen J. Smith, Guillermo Sapiro
    Comments: Currently awaiting peer review
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)

    Brain function results from communication between neurons connected by
    complex synaptic networks. Synapses are themselves highly complex and diverse
    signaling machines, containing protein products of hundreds of different genes,
    some in hundreds of copies, arranged in a precise lattice at each individual
    synapse. Synapses are fundamental not only to synaptic network function but
    also to network development, adaptation, and memory. In addition, abnormalities
    of synapse numbers or molecular components are implicated in most mental and
    neurological disorders. Despite their obvious importance, mammalian synapse
    populations have so far resisted detailed quantitative study. In human brains
    and most animal nervous systems, synapses are very small and very densely
    packed: there are approximately 1 billion synapses per cubic millimeter of
    human cortex. This volumetric density poses very substantial challenges to
    proteometric analysis at the critical level of the individual synapse. The
    present work describes new probabilistic image analysis methods for
    single-synapse analysis of synapse populations in both animal and human brains.

    Self-calibration-based Approach to Critical Motion Sequences of Rolling-shutter Structure from Motion

    Eisuke Ito, Takayuki Okatani
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we consider critical motion sequences (CMSs) of rolling-shutter
    (RS) SfM. Employing an RS camera model with linearized pure rotation, we show
    that the RS distortion can be approximately expressed by two internal
    parameters of an “imaginary” camera plus one-parameter nonlinear transformation
    similar to lens distortion. We then reformulate the problem as self-calibration
    of the imaginary camera, in which its skew and aspect ratio are unknown and
    varying in the image sequence. In the formulation, we derive a general
    representation of CMSs. We also show that our method can explain the CMS that
    was recently reported in the literature, and then present a new remedy to deal
    with the degeneracy. Our theoretical results agree well with experimental
    results; it explains degeneracies observed when we employ naive bundle
    adjustment, and how they are resolved by our method.


    Artificial Intelligence

    Fast Non-Parametric Tests of Relative Dependency and Similarity

    Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko
    Subjects: Artificial Intelligence (cs.AI)

    We introduce two novel non-parametric statistical hypothesis tests. The first
    test, called the relative test of dependency, enables us to determine whether
    one source variable is significantly more dependent on a first target variable
    or a second. Dependence is measured via the Hilbert-Schmidt Independence
    Criterion (HSIC). The second test, called the relative test of similarity, is
    used to determine which of two samples from arbitrary distributions is
    significantly closer to a reference sample of interest; the relative measure
    of similarity is based on the Maximum Mean Discrepancy (MMD). To construct
    these tests, we have used as our test statistics the difference of HSIC
    statistics and of MMD statistics, respectively. The resulting tests are
    consistent and unbiased, and have favorable convergence properties. The
    effectiveness of the relative dependency test is demonstrated on several
    real-world problems: we identify language groups from a multilingual parallel
    corpus, and we show that tumor location is more dependent on gene expression
    than chromosome imbalance. We also demonstrate the performance of the relative
    test of similarity over a broad selection of model comparison problems in deep
    generative models.
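
    The statistic behind the relative test of similarity (a difference of two
    MMD estimates against a shared reference sample) can be sketched as
    follows. This is only an illustration: the biased estimator, the Gaussian
    kernel, the bandwidth, and all function names are assumptions made here,
    and the step that turns the statistic into a hypothesis test (its
    asymptotic distribution and a p-value) is not shown.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x: Sequence[float], y: Sequence[float],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""
    def mean_kernel(u: Sequence[float], v: Sequence[float]) -> float:
        total = sum(gaussian_kernel(a, b, bandwidth) for a in u for b in v)
        return total / (len(u) * len(v))

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def relative_similarity_statistic(reference: Sequence[float],
                                  candidate_a: Sequence[float],
                                  candidate_b: Sequence[float],
                                  bandwidth: float = 1.0) -> float:
    """Difference of MMD estimates; negative means candidate_a is closer."""
    return (mmd_squared(reference, candidate_a, bandwidth)
            - mmd_squared(reference, candidate_b, bandwidth))


# Toy samples (hypothetical values): `close` resembles the reference, `far` does not.
reference = [0.0, 0.1, -0.1, 0.2]
close = [0.05, -0.05, 0.15, 0.0]
far = [3.0, 3.1, 2.9, 3.2]

stat = relative_similarity_statistic(reference, close, far)
print(stat)
```

    A negative value indicates that the first candidate sample is closer to the
    reference under this kernel. Deciding whether the gap is statistically
    significant is what the test described in the abstract adds on top.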

    Optimal Dynamic Coverage Infrastructure for Large-Scale Fleets of Reconnaissance UAVs

    Yaniv Altshuler, Alex Pentland, Shlomo Bekhor, Yoram Shiftan, Alfred Bruckstein
    Comments: 35 pages, 19 figures
    Subjects: Artificial Intelligence (cs.AI)

    The current state of the art in the field of UAV activation relies solely on
    human operators for the design and adaptation of the drones’ flying routes.
    Furthermore, this is being done today on an individual level (one vehicle per
    operator), with the exception of a handful of new systems that are
    comprised of a small number of self-organizing swarms, manually guided by a
    human operator.

    Drone-based monitoring is of great importance in a variety of civilian
    domains, such as road safety, homeland security, and even environmental
    control. In its military aspect, efficiently detecting evading targets by a
    fleet of unmanned drones has an ever increasing impact on the ability of modern
    armies to engage in warfare. The latter is true of both traditional symmetric
    conflicts among armies and asymmetric ones. Be it a speeding driver, a
    polluting trailer or a covert convoy, the basic challenge remains the same —
    how can its detection probability be maximized using as few drones
    as possible?

    In this work we propose a novel approach for the optimization of large scale
    swarms of reconnaissance drones — capable of producing on-demand optimal
    coverage strategies for any given search scenario. Given an estimation cost of
    the threat’s potential damages, as well as types of monitoring drones available
    and their comparative performance, our proposed method generates an
    analytically provable strategy, stating the optimal number and types of drones
    to be deployed, in order to cost-efficiently monitor a pre-defined region for
    targets maneuvering using a given road network.

    We demonstrate our model using a unique dataset of the Israeli transportation
    network, on which different drone deployment schemes are
    evaluated.

    Explicable Robot Planning as Minimizing Distance from Expected Behavior

    Anagha Kulkarni, Tathagata Chakraborti, Yantian Zha, Satya Gautam Vadlamudi, Yu Zhang, Subbarao Kambhampati
    Comments: 8 pages, 8 figures
    Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

    In order for robots to be integrated effectively into human work-flows, it is
    not enough to address the question of autonomy; one must also address how their
    actions or plans are perceived by their human counterparts. When robots generate
    task plans without such considerations, they may often demonstrate what we
    refer to as inexplicable behavior from the point of view of humans who may be
    observing it. This problem arises due to the human observer’s partial or
    inaccurate understanding of the robot’s deliberative process and/or the model
    (i.e. capabilities of the robot) that informs it. This may have serious
    implications on the human-robot work-space, from increased cognitive load and
    reduced trust in the robot from the human, to more serious concerns of safety
    in human-robot interactions. In this paper, we propose to address this issue by
    learning a distance function that can accurately model the notion of
    explicability, and develop an anytime search algorithm that can use this
    measure in its search process to come up with progressively explicable plans.
    As the first step, robot plans are evaluated by human subjects based on how
    explicable they perceive the plan to be, and a scoring function called
    explicability distance based on the different plan distance measures is
    learned. We then use this explicability distance as a heuristic to guide our
    search in order to generate explicable robot plans, by minimizing the plan
    distances between the robot’s plan and the human’s expected plans. We conduct
    our experiments in a toy autonomous car domain, and provide empirical
    evaluations that demonstrate the usefulness of the approach in making the
    planning process of an autonomous agent conform to human expectations.

    Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

    Hao Shen
    Comments: 15 pages, 2 figures, submitted for publication
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    Despite the recent great success of deep neural networks in various
    applications, designing and training a deep neural network is still among the
    greatest challenges in the field. In this work, we present a smooth
    optimisation perspective on designing and training multilayer Feedforward
    Neural Networks (FNNs) in the supervised learning setting. By characterising
    the critical point conditions of an FNN based optimisation problem, we identify
    the conditions to eliminate local optima of the corresponding cost function.
    Moreover, by studying the Hessian structure of the cost function at the global
    minima, we develop an approximate Newton FNN algorithm, which is capable of
    alleviating the vanishing gradient problem. Finally, our results are
    numerically verified on two classic benchmarks, i.e., the XOR problem and the
    four region classification problem.

    Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance

    Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    At the core of interpretable machine learning is the question of whether
    humans are able to make accurate predictions about a model’s behavior. Assumed
    in this question are three properties of the interpretable output: coverage,
    precision, and effort. Coverage refers to how often humans think they can
    predict the model’s behavior, precision to how accurate humans are in those
    predictions, and effort is either the up-front effort required in interpreting
    the model, or the effort required to make predictions about a model’s behavior.

    In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that
    produces high-precision rule-based explanations for which the coverage
    boundaries are very clear. We compare aLIME to linear LIME with simulated
    experiments, and demonstrate the flexibility of aLIME with qualitative examples
    from a variety of domains and tasks.

    Learning to reinforcement learn

    Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick
    Comments: 17 pages, 7 figures, 1 table
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    In recent years deep reinforcement learning (RL) systems have attained
    superhuman performance in a number of challenging task domains. However, a
    major limitation of such applications is their demand for massive amounts of
    training data. A critical present objective is thus to develop deep RL methods
    that can adapt rapidly to new tasks. In the present work we introduce a novel
    approach to this challenge, which we refer to as deep meta-reinforcement
    learning. Previous work has shown that recurrent networks can support
    meta-learning in a fully supervised context. We extend this approach to the RL
    setting. What emerges is a system that is trained using one RL algorithm, but
    whose recurrent dynamics implement a second, quite separate RL procedure. This
    second, learned RL algorithm can differ from the original one in arbitrary
    ways. Importantly, because it is learned, it is configured to exploit structure
    in the training domain. We unpack these points in a series of seven
    proof-of-concept experiments, each of which examines a key aspect of deep
    meta-RL. We consider prospects for extending and scaling up the approach, and
    also point out some potentially important implications for neuroscience.

    Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

    Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
    Comments: 5 pages, 4 figures, ICASSP-2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Feature subspace selection is an important part of speech emotion
    recognition. Most of the studies are devoted to finding a feature subspace for
    representing all emotions. However, some studies have indicated that the
    features associated with different emotions are not exactly the same. Hence,
    traditional methods may fail to distinguish some of the emotions with just one
    global feature subspace. In this work, we propose a new divide and conquer idea
    to solve the problem. First, the feature subspaces are constructed for all the
    combinations of every two different emotions (emotion-pair). Bi-classifiers are
    then trained on these feature subspaces respectively. The final emotion
    recognition result is derived by the voting and competition method.
    Experimental results demonstrate that the proposed method can get better
    results than the traditional multi-classification method.
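
    The pairwise scheme described above (one bi-classifier per emotion pair,
    each working in its own feature subspace, combined by voting) can be
    sketched roughly as follows. The prototypes, the single-feature "subspaces",
    and the nearest-prototype rule are hypothetical stand-ins for trained
    bi-classifiers. The abstract does not spell out the "competition" step, so
    this sketch uses plain majority voting only.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

PairClassifier = Callable[[Sequence[float]], str]


def make_pair_classifier(a: str, b: str, feature_index: int,
                         prototypes: Dict[str, float]) -> PairClassifier:
    """Toy bi-classifier: looks at one feature and picks the nearer prototype."""
    def classify(sample: Sequence[float]) -> str:
        value = sample[feature_index]
        if abs(value - prototypes[a]) <= abs(value - prototypes[b]):
            return a
        return b
    return classify


def vote(sample: Sequence[float],
         classifiers: Dict[Tuple[str, str], PairClassifier]) -> str:
    """Each emotion-pair classifier casts one vote; the most voted label wins."""
    votes = Counter(clf(sample) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


# Hypothetical prototypes and per-pair feature choices (illustration only).
prototypes = {"sad": 0.0, "neutral": 0.5, "angry": 1.0}
pair_features = {("sad", "neutral"): 0, ("sad", "angry"): 1, ("neutral", "angry"): 1}

classifiers = {
    pair: make_pair_classifier(pair[0], pair[1], index, prototypes)
    for pair, index in pair_features.items()
}

predicted = vote([0.4, 0.9], classifiers)
print(predicted)
```

    With K emotions this needs K(K-1)/2 bi-classifiers. Ties are broken here by
    insertion order, which is an arbitrary choice made for the sketch.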

    Learning to detect and localize many objects from few examples

    Bastien Moysset, Christopher Kermorvant, Christian Wolf
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    The current trend in object detection and localization is to learn
    predictions with high capacity deep neural networks trained on a very large
    amount of annotated data and using a high amount of processing power. In this
    work, we propose a new neural model which directly predicts bounding box
    coordinates. The particularity of our contribution lies in the local
    computations of predictions with a new form of local parameter sharing which
    keeps the overall amount of trainable parameters low. Key components of the
    model are spatial 2D-LSTM recurrent layers which convey contextual information
    between the regions of the image. We show that this model is more powerful than
    the state of the art in applications where training data is not as abundant as
    in the classical configuration of natural images and Imagenet/Pascal VOC tasks.
    We particularly target the detection of text in document images, but our method
    is not limited to this setting. The proposed model also facilitates the
    detection of many objects in a single image and can deal with inputs of
    variable sizes without resizing.

    Stream Packing for Asynchronous Multi-Context Systems using ASP

    Stefan Ellmauthaler, Jörg Pührer
    Comments: Workshop on Trends and Applications of Answer Set Programming (TAASP 2016)
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

    When a processing unit relies on data from external streams, we may face the
    problem that the stream data needs to be rearranged in a way that allows the
    unit to perform its task(s). On arrival of new data, we must decide whether
    there is sufficient information available to start processing or whether to
    wait for more data. Furthermore, we need to ensure that the data meets the
    input specification of the processing step. In the case of multiple input
    streams it is also necessary to coordinate which data from which incoming
    stream should form the input of the next process instantiation. In this work,
    we propose a declarative approach as an interface between multiple streams and
    a processing unit. The idea is to specify via answer-set programming how to
    arrange incoming data in packages that are suitable as input for subsequent
    processing. Our approach is intended for use in asynchronous multi-context
    systems (aMCSs), a recently proposed framework for loose coupling of knowledge
    representation formalisms that allows for online reasoning in a dynamic
    environment. Contexts in aMCSs process data streams from external sources and
    other contexts.

    Zero-Shot Visual Question Answering

    Damien Teney, Anton van den Hengel
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Part of the appeal of Visual Question Answering (VQA) is its promise to
    answer new questions about previously unseen images. Most current methods
    demand training questions that illustrate every possible concept, and will
    therefore never achieve this capability, since the volume of required training
    data would be prohibitive. Answering general questions about images requires
    methods capable of Zero-Shot VQA, that is, methods able to answer questions
    beyond the scope of the training questions. We propose a new evaluation
    protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
    and in doing so highlights significant practical deficiencies of current
    approaches, some of which are masked by the biases in current datasets. We
    propose and evaluate several strategies for achieving Zero-Shot VQA, including
    methods based on pretrained word embeddings, object classifiers with semantic
    embeddings, and test-time retrieval of example images. Our extensive
    experiments are intended to serve as baselines for Zero-Shot VQA, and they also
    achieve state-of-the-art performance in the standard VQA evaluation setting.

    A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

    Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    LSTMs have become a basic building block for many deep NLP models. In recent
    years, many improvements and variations have been proposed for deep sequence
    models in general, and LSTMs in particular. We propose and analyze a series of
    architectural modifications for LSTM networks resulting in improved performance
    for text classification datasets. We observe compounding improvements on
    traditional LSTMs using Monte Carlo test-time model averaging, deep vector
    averaging (DVA), and residual connections, along with four other suggested
    modifications. Our analysis provides a simple, reliable, and high quality
    baseline model.


    Information Retrieval

    Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

    Jianbo Yuan, Walid Shalaby, Mohammed Korayem, David Lin, Khalifeh AlJadda, Jiebo Luo
    Comments: in Big Data, IEEE International Conference on, 2016
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Collaborative Filtering (CF) is widely used in large-scale recommendation
    engines because of its efficiency, accuracy and scalability. However, in
    practice, recommendation engines based on CF require interactions
    between users and items before making recommendations, which makes CF unsuitable
    for new items that end users have not yet had the chance to interact with.
    This is known as the cold-start problem. In this paper we introduce a novel
    approach which employs deep learning to tackle this problem in any CF based
    recommendation engine. One of the most important features of the proposed
    technique is the fact that it can be applied on top of any existing CF based
    recommendation engine without changing the CF core. We successfully applied
    this technique to overcome the item cold-start problem in Careerbuilder’s CF
    based recommendation engine. Our experiments show that the proposed technique
    resolves the cold-start problem efficiently while maintaining the high
    accuracy of the CF recommendations.


    Computation and Language

    What Do Recurrent Neural Network Grammars Learn About Syntax?

    Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith
    Subjects: Computation and Language (cs.CL)

    Recurrent neural network grammars (RNNG) are a recently proposed
    probabilistic generative modeling family for natural language. They show
    state-of-the-art language modeling and parsing performance. We investigate what
    information they learn, from a linguistic perspective, through various
    ablations to the model and the data, and by augmenting the model with an
    attention mechanism (GA-RNNG) to enable closer inspection. We find that
    explicit modeling of composition is crucial for achieving the best performance.
    Through the attention mechanism, we find that headedness plays a central role
    in phrasal representation (with the model’s latent attention largely agreeing
    with predictions made by hand-crafted rules, albeit with some important
    differences). By training grammars without non-terminal labels, we find that
    phrasal representations depend minimally on non-terminals, providing support
    for the endocentricity hypothesis.

    Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

    Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri
    Comments: Submitted to ICASSP 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We examine the effect of the Group Lasso (gLasso) regularizer in selecting
    the salient nodes of Deep Neural Network (DNN) hidden layers by applying a
    DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of
    gLasso regularization, one for outgoing weight vectors and another for incoming
    weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096
    nodes. Furthermore, we compare gLasso and L2 regularizers. Our experiment
    results demonstrate that our DNN training, in which the gLasso regularizer was
    embedded, successfully selected the hidden layer nodes that are necessary and
    sufficient for achieving high classification power.
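
    As a rough illustration of the two grouping choices mentioned above, the
    sketch below computes a group-lasso style penalty over a single weight
    matrix and contrasts it with a plain L2 penalty. The matrix layout
    (weights[i][j] connects unit i of one layer to unit j of the next) and the
    function names are assumptions made for this example, not taken from the
    paper.

```python
import math


def group_lasso_penalty(weights, mode="incoming"):
    """Sum of the L2 norms of per-node weight groups.

    Assumed layout: weights[i][j] is the weight from unit i to unit j.
    "incoming" groups the weights entering each unit j (one column each);
    "outgoing" groups the weights leaving each unit i (one row each).
    """
    if mode == "incoming":
        groups = list(zip(*weights))
    elif mode == "outgoing":
        groups = [tuple(row) for row in weights]
    else:
        raise ValueError("mode must be 'incoming' or 'outgoing'")
    return sum(math.sqrt(sum(w * w for w in group)) for group in groups)


def l2_penalty(weights):
    """Plain squared-L2 penalty over all weights, for comparison."""
    return sum(w * w for row in weights for w in row)


if __name__ == "__main__":
    toy = [[3.0, 0.0], [4.0, 0.0]]
    print(group_lasso_penalty(toy, "incoming"))
    print(group_lasso_penalty(toy, "outgoing"))
    print(l2_penalty(toy))
```

    Because each group contributes its unsquared norm, the penalty tends to
    push whole groups to zero together, which is what makes it usable for
    selecting nodes rather than individual weights.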

    Zero-Shot Visual Question Answering

    Damien Teney, Anton van den Hengel
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Part of the appeal of Visual Question Answering (VQA) is its promise to
    answer new questions about previously unseen images. Most current methods
    demand training questions that illustrate every possible concept, and will
    therefore never achieve this capability, since the volume of required training
    data would be prohibitive. Answering general questions about images requires
    methods capable of Zero-Shot VQA, that is, methods able to answer questions
    beyond the scope of the training questions. We propose a new evaluation
    protocol for VQA methods which measures their ability to perform Zero-Shot VQA,
    and in doing so highlights significant practical deficiencies of current
    approaches, some of which are masked by the biases in current datasets. We
    propose and evaluate several strategies for achieving Zero-Shot VQA, including
    methods based on pretrained word embeddings, object classifiers with semantic
    embeddings, and test-time retrieval of example images. Our extensive
    experiments are intended to serve as baselines for Zero-Shot VQA, and they also
    achieve state-of-the-art performance in the standard VQA evaluation setting.


    Distributed, Parallel, and Cluster Computing

    How Lock-free Data Structures Perform in Dynamic Environments: Models and Analyses

    Aras Atalar, Paul Renaud-Goud, Philippas Tsigas
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we present two analytical frameworks for calculating the
    performance of lock-free data structures. Lock-free data structures are based
    on retry loops and are called by application-specific routines. In contrast to
    previous work, we consider in this paper lock-free data structures in dynamic
    environments. The size of each of the retry loops, and the size of the
    application routines invoked in between, are not constant but may change
    dynamically. The new frameworks follow two different approaches. The first
    framework, the simplest one, is based on queuing theory. It introduces an
    average-based approach that facilitates a more coarse-grained analysis, with
    the benefit of being ignorant of size distributions. Because of this
    independence from the distribution nature it covers a set of complicated
    designs. The second approach, instantiated with an exponential distribution for
    the size of the application routines, uses Markov chains, and is tighter
    because it stochastically constructs the execution, step by step.

    Both frameworks provide a performance estimate which is close to what we
    observe in practice. We have validated our analysis on (i) several fundamental
    lock-free data structures such as stacks, queues, deques and counters, some of
    them employing helping mechanisms, and (ii) synthetic tests covering a wide
    range of possible lock-free designs. We show the applicability of our results
    by introducing new back-off mechanisms, tested in application contexts, and by
    designing an efficient memory management scheme that typical lock-free
    algorithms can utilize.

    Self-Stabilizing Maximal Matching and Anonymous Networks

    Johanne Cohen, Jonas Lefèvre, Khaled Maâmra, Laurence Pilard, Devan Sohier
    Comments: 17 pages, 4 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We propose a self-stabilizing algorithm for computing a maximal matching in
    an anonymous network. The complexity is O(n^3) moves with high probability,
    under the adversarial distributed daemon. In this algorithm, each node can
    determine whether one of its neighbors points to it or to another node, leading
    to a contradiction with the anonymous assumption. To solve this problem, we
    provide under the classical link-register model, a self-stabilizing algorithm
    that gives a unique name to a link such that this name is shared by both
    extremities of the link.

    Parallel multiple selection by regular sampling

    Krzysztof Nowicki
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we present a deterministic parallel algorithm solving the
    multiple selection problem in the congested clique model. In this problem, for a
    given set of elements S and a set of ranks K = {k_1, k_2, …, k_r}, we ask
    for the k_i-th smallest element of S for 1 <= i <= r. The presented algorithm
    is deterministic, time optimal, and needs O(log*_{r+1}(n)) communication
    rounds, where n is the size of the input set, and r is the size of the rank
    set. This algorithm may be of theoretical interest, as for r = 1 (the classic
    selection problem) it gives an improvement in the asymptotic synchronization
    cost over the previous O(log log p) communication-round solution, where p is
    the size of the clique.
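
    To make the problem statement concrete, here is a small sequential sketch
    of multiple selection using quickselect-style partitioning. It only
    illustrates what is being computed; it is not the congested clique
    algorithm from the paper, and the function name is chosen for this
    example.

```python
def multiple_select(values, ranks):
    """Return {k: k-th smallest element of values} for each 1-based rank k."""
    wanted = sorted(set(ranks))
    if wanted and not (1 <= wanted[0] and wanted[-1] <= len(values)):
        raise ValueError("ranks must lie in 1..len(values)")
    result = {}

    def solve(items, pending, offset):
        # Every rank in `pending` lies in (offset, offset + len(items)].
        if not pending:
            return
        pivot = items[len(items) // 2]
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        lo = offset + len(smaller)   # ranks up to lo fall among `smaller`
        hi = lo + len(equal)         # ranks in (lo, hi] equal the pivot
        solve(smaller, [k for k in pending if k <= lo], offset)
        for k in pending:
            if lo < k <= hi:
                result[k] = pivot
        solve(larger, [k for k in pending if k > hi], hi)

    solve(list(values), wanted, 0)
    return result


if __name__ == "__main__":
    print(multiple_select([7, 2, 9, 4, 4, 1], [1, 3, 4, 6]))
```

    Splitting the rank set at each pivot means one partitioning pass serves
    all requested ranks at once, instead of running a separate selection per
    rank.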

    Fog Computing: A Taxonomy, Survey and Future Directions

    Redowan Mahmud, Rajkumar Buyya
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In recent years, the number of Internet of Things (IoT) devices/sensors has
    increased to a great extent. To support the computational demand of real-time
    latency-sensitive applications of largely geo-distributed IoT devices/sensors,
    a new computing paradigm named “Fog computing” has been introduced. Generally,
    Fog computing resides closer to the IoT devices/sensors and extends the
    Cloud-based computing, storage and networking facilities. In this chapter, we
    comprehensively analyse the challenges in Fogs acting as an intermediate layer
    between IoT devices/sensors and Cloud datacentres and review the current
    developments in this field. We present a taxonomy of Fog computing according to
    the identified challenges and its key features. We also map the existing works
    to the taxonomy in order to identify current research gaps in the area of Fog
    computing. Moreover, based on the observations, we propose future directions
    for research.


    Learning

    Designing and Training Feedforward Neural Networks: A Smooth Optimisation Perspective

    Hao Shen
    Comments: 15 pages, 2 figures, submitted for publication
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)

    Despite the recent great success of deep neural networks in various
    applications, designing and training a deep neural network is still among the
    greatest challenges in the field. In this work, we present a smooth
    optimisation perspective on designing and training multilayer Feedforward
    Neural Networks (FNNs) in the supervised learning setting. By characterising
    the critical point conditions of an FNN based optimisation problem, we identify
    the conditions to eliminate local optima of the corresponding cost function.
    Moreover, by studying the Hessian structure of the cost function at the global
    minima, we develop an approximate Newton FNN algorithm, which is capable of
    alleviating the vanishing gradient problem. Finally, our results are
    numerically verified on two classic benchmarks, i.e., the XOR problem and the
    four region classification problem.

    Learning to reinforcement learn

    Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick
    Comments: 17 pages, 7 figures, 1 table
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    In recent years deep reinforcement learning (RL) systems have attained
    superhuman performance in a number of challenging task domains. However, a
    major limitation of such applications is their demand for massive amounts of
    training data. A critical present objective is thus to develop deep RL methods
    that can adapt rapidly to new tasks. In the present work we introduce a novel
    approach to this challenge, which we refer to as deep meta-reinforcement
    learning. Previous work has shown that recurrent networks can support
    meta-learning in a fully supervised context. We extend this approach to the RL
    setting. What emerges is a system that is trained using one RL algorithm, but
    whose recurrent dynamics implement a second, quite separate RL procedure. This
    second, learned RL algorithm can differ from the original one in arbitrary
    ways. Importantly, because it is learned, it is configured to exploit structure
    in the training domain. We unpack these points in a series of seven
    proof-of-concept experiments, each of which examines a key aspect of deep
    meta-RL. We consider prospects for extending and scaling up the approach, and
    also point out some potentially important implications for neuroscience.

    A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival

    Hamid Reza Hassanzadeh, John H. Phan, May D. Wang
    Comments: in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Cancer survival prediction is an active area of research that can help
    prevent unnecessary therapies and improve patients’ quality of life. Gene
    expression profiling is widely used in cancer studies to discover
    informative biomarkers that aid in predicting different clinical endpoints.
    We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq)
    to predict survival of cancer patients. Despite the wealth of information
    available in expression profiles of cancer tumors, fulfilling the
    aforementioned objective remains a big challenge, for the most part, due to the
    paucity of data samples compared to the high dimension of the expression
    profiles. As such, analysis of transcriptomic data modalities calls for
    state-of-the-art big-data analytics techniques that can maximally use all the
    available data to discover the relevant information hidden within a significant
    amount of noise. In this paper, we propose a pipeline that predicts cancer
    patients’ survival by exploiting the structure of the input (manifold learning)
    and by leveraging the unlabeled samples using Laplacian support vector
    machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that
    under certain circumstances, no single modality per se will result in the best
    accuracy and by fusing different models together via a stacked generalization
    strategy, we may boost the accuracy synergistically. We apply our approach to
    two cancer datasets and present promising results. We maintain that a similar
    pipeline can be used for predictive tasks where labeled samples are expensive
    to acquire.

    Relational Multi-Manifold Co-Clustering

    Ping Li, Jiajun Bu, Chun Chen, Zhanying He, Deng Cai
    Comments: 11 pages, 4 figures, published in IEEE Transactions on Cybernetics (TCYB)
    Journal-ref: IEEE Transactions on Cybernetics, 43(6): 1871-1881, 2013
    Subjects: Learning (cs.LG)

    Co-clustering aims to group the samples (e.g., documents, users) and
    the features (e.g., words, ratings) simultaneously. It employs the dual
    relation and the bilateral information between the samples and features. In
    many real-world applications, data usually reside on a submanifold of the
    ambient Euclidean space, but it is nontrivial to estimate the intrinsic
    manifold of the data space in a principled way. In this study, we focus on
    improving the co-clustering performance via manifold ensemble learning, which
    is able to maximally approximate the intrinsic manifolds of both the sample and
    feature spaces. To achieve this, we develop a novel co-clustering algorithm
    called Relational Multi-manifold Co-clustering (RMC) based on symmetric
    nonnegative matrix tri-factorization, which decomposes the relational data
    matrix into three submatrices. This method considers the inter-type relationship
    revealed by the relational data matrix, and also the intra-type information
    reflected by the affinity matrices encoded on the sample and feature data
    distributions. Specifically, we assume the intrinsic manifold of the sample or
    feature space lies in a convex hull of some pre-defined candidate manifolds. We
    want to learn a convex combination of them to maximally approach the desired
    intrinsic manifold. To optimize the objective function, multiplicative
    rules are used to update the submatrices alternately. Besides, both the
    entropic mirror descent algorithm and the coordinate descent algorithm are
    exploited to learn the manifold coefficient vector. Extensive experiments on
    documents, images and gene expression data sets have demonstrated the
    superiority of the proposed algorithm compared to other well-established
    methods.

    Unimodal Thompson Sampling for Graph-Structured Arms

    Stefano Paladino, Francesco Trovò, Marcello Restelli, Nicola Gatti
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We study, to the best of our knowledge, the first Bayesian algorithm for
    unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this
    setting, each arm corresponds to a node of a graph and each edge provides a
    relationship, unknown to the learner, between two nodes in terms of expected
    reward. Furthermore, for any node of the graph there is a path leading to the
    unique node providing the maximum expected reward, along which the expected
    reward is monotonically increasing. Previous results on this setting describe
    the behavior of frequentist MAB algorithms. In our paper, we design a Thompson
    Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound
    for the considered setting. We show that, as happens in a wide range of
    scenarios, Bayesian MAB algorithms dramatically outperform frequentist ones. In
    particular, we provide a thorough experimental evaluation of the performance of
    our and state-of-the-art algorithms as the properties of the graph vary.
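
    The sketch below shows one simple way Thompson Sampling can be combined
    with a graph over arms: Beta posteriors are kept for Bernoulli rewards, and
    each round samples only among the current empirical leader and its
    neighbours. This is an illustration of the general idea under those
    assumptions, not the authors' algorithm; the function names, the Bernoulli
    reward model and the leader rule are all choices made for this example.

```python
import random


def pick_arm(successes, failures, candidates, rng):
    """Return the candidate whose Beta(s + 1, f + 1) posterior sample is largest."""
    return max(
        candidates,
        key=lambda a: rng.betavariate(successes[a] + 1, failures[a] + 1),
    )


def run_graph_thompson(neighbors, means, rounds, seed=0):
    """Simulate Bernoulli arms; neighbors[a] lists the arms adjacent to arm a."""
    rng = random.Random(seed)
    n = len(means)
    successes = [0] * n
    failures = [0] * n

    def empirical_mean(a):
        pulls = successes[a] + failures[a]
        return successes[a] / pulls if pulls else 0.0

    for _ in range(rounds):
        # Leader: arm with the best empirical mean (lowest index on ties).
        leader = max(range(n), key=empirical_mean)
        candidates = [leader] + list(neighbors[leader])
        arm = pick_arm(successes, failures, candidates, rng)
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    line_graph = [[1], [0, 2], [1]]          # arms 0 - 1 - 2
    s, f = run_graph_thompson(line_graph, [0.2, 0.5, 0.8], rounds=500)
    print([a + b for a, b in zip(s, f)])     # pulls per arm
```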

    Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

    Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
    Comments: 5 pages, 4 figures, ICASSP-2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Feature subspace selection is an important part in speech emotion
    recognition. Most of the studies are devoted to finding a feature subspace for
    representing all emotions. However, some studies have indicated that the
    features associated with different emotions are not exactly the same. Hence,
    traditional methods may fail to distinguish some of the emotions with just one
    global feature subspace. In this work, we propose a new divide and conquer idea
    to solve the problem. First, the feature subspaces are constructed for all the
    combinations of every two different emotions (emotion-pair). Bi-classifiers are
    then trained on these feature subspaces respectively. The final emotion
    recognition result is derived by the voting and competition method.
    Experimental results demonstrate that the proposed method can get better
    results than the traditional multi-classification method.
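
    The emotion-pair scheme described above resembles one-vs-one
    classification. A minimal sketch of the voting step, assuming each pair
    classifier is a callable that returns one of its two labels, could look
    like the following. The abstract does not detail the competition step, so
    plain majority voting stands in for it here, and the toy classifiers are
    hypothetical.

```python
from collections import Counter
from itertools import combinations


def vote(sample, labels, pair_classifiers):
    """Majority vote over all pairwise classifiers.

    pair_classifiers[(a, b)] is a callable returning either a or b, keyed
    by label pairs in the order produced by combinations(labels, 2).
    """
    votes = Counter()
    for pair in combinations(labels, 2):
        votes[pair_classifiers[pair](sample)] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    labels = ["angry", "happy", "sad"]
    # Hypothetical stand-ins for classifiers trained on pair-specific features.
    classifiers = {
        ("angry", "happy"): lambda x: "angry" if x > 0 else "happy",
        ("angry", "sad"): lambda x: "angry" if x > 0 else "sad",
        ("happy", "sad"): lambda x: "happy",
    }
    print(vote(1.0, labels, classifiers))
    print(vote(-1.0, labels, classifiers))
```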

    Robust Hashing for Multi-View Data: Jointly Learning Low-Rank Kernelized Similarity Consensus and Hash Functions

    Lin Wu, Yang Wang
    Comments: Accepted to appear in Image and Vision Computing
    Subjects: Learning (cs.LG)

    Learning hash functions/codes for similarity search over multi-view data is
    attracting increasing attention, where similar hash codes are assigned to the
    data objects characterizing consistently neighborhood relationship across
    views. Traditional methods in this category inherently suffer from three
    limitations: 1) they commonly adopt a two-stage scheme where a similarity matrix
    is first constructed, followed by hash function learning; 2) these
    methods are commonly developed on the assumption that data samples with
    multiple representations are noise-free, which is not practical in real-life
    applications; 3) they often incur a cumbersome training model caused by the
    neighborhood graph construction using all N points in the database (O(N)).
    In this paper, we motivate the problem of jointly and efficiently training
    robust hash functions over data objects with multi-feature representations
    which may be noise corrupted. To achieve both robustness and training
    efficiency, we propose an approach to effectively and efficiently learn
    low-rank kernelized hash functions shared across views. (We use kernelized
    similarity rather than kernel, as the data-landmark affinity matrix is not a
    square symmetric matrix.) Specifically, we utilize landmark graphs to
    construct tractable similarity matrices in multi-views to automatically
    discover neighborhood structure in the data. To learn robust hash functions, a
    latent low-rank kernel function is used to construct hash functions in order to
    accommodate linearly inseparable data. In particular, a latent kernelized
    similarity matrix is recovered by rank minimization on multiple kernel-based
    similarity matrices. Extensive experiments on real-world multi-view datasets
    validate the efficacy of our method in the presence of error corruptions.

    Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance

    Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    At the core of interpretable machine learning is the question of whether
    humans are able to make accurate predictions about a model’s behavior. Assumed
    in this question are three properties of the interpretable output: coverage,
    precision, and effort. Coverage refers to how often humans think they can
    predict the model’s behavior, precision to how accurate humans are in those
    predictions, and effort is either the up-front effort required in interpreting
    the model, or the effort required to make predictions about a model’s behavior.

    In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that
    produces high-precision rule-based explanations for which the coverage
    boundaries are very clear. We compare aLIME to linear LIME with simulated
    experiments, and demonstrate the flexibility of aLIME with qualitative examples
    from a variety of domains and tasks.

    Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

    Jacob Abernethy (University of Michigan), Cyrus Anderson (University of Michigan), Alex Chojnacki (University of Michigan), Chengyu Dai (University of Michigan), John Dryden (University of Michigan), Eric Schwartz (University of Michigan), Wenbo Shen (University of Michigan), Jonathan Stroud (University of Michigan), Laura Wendlandt (University of Michigan), Sheng Yang (University of Michigan), Daniel Zhang (University of Michigan)
    Comments: Presented at the Data For Good Exchange 2016
    Subjects: Applications (stat.AP); Databases (cs.DB); Learning (cs.LG)

    Performing arts organizations aim to enrich their communities through the
    arts. To do this, they strive to match their performance offerings to the taste
    of those communities. Success relies on understanding audience preference and
    predicting their behavior. Similar to most e-commerce or digital entertainment
    firms, arts presenters need to recommend the right performance to the right
    customer at the right time. As part of the Michigan Data Science Team (MDST),
    we partnered with the University Musical Society (UMS), a non-profit performing
    arts presenter housed in the University of Michigan, Ann Arbor. We are
    providing UMS with analysis and business intelligence, utilizing historical
    individual-level sales data. We built a recommendation system based on
    collaborative filtering, gaining insights into the artistic preferences of
    customers, along with the similarities between performances. To better
    understand audience behavior, we used statistical methods from customer-base
    analysis. We characterized customer heterogeneity via segmentation, and we
    modeled customer cohorts to understand and predict ticket purchasing patterns.
    Finally, we combined statistical modeling with natural language processing
    (NLP) to explore the impact of wording in program descriptions. These ongoing
    efforts provide a platform to launch targeted marketing campaigns, helping UMS
    carry out its mission by allocating its resources more efficiently. Celebrating
    its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it
    continues to enrich communities by connecting world-renowned artists with
    diverse audiences, especially students in their formative years. We aim to
    contribute to that mission through data science and customer analytics.

    Gap Safe screening rules for sparsity enforcing penalties

    Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)

    In a high-dimensional regression context, sparsity enforcing penalties have
    proved useful to regularize the data-fitting term. A recently introduced
    technique called screening rules leverages the expected sparsity of the
    solutions by ignoring some variables in the optimization, hence leading to
    solver speed-ups. When the procedure is guaranteed not to discard features
    wrongly, the rules are said to be safe. We propose a unifying framework
    that can cope with generalized linear models regularized with standard sparsity
    enforcing penalties such as ℓ_1 or ℓ_1/ℓ_2 norms. Our technique
    allows safely discarding more variables than previously considered safe rules,
    particularly for low regularization parameters. Our proposed Gap Safe rules (so
    called because they rely on duality gap computation) can cope with any
    iterative solver but are particularly well suited to block coordinate descent
    for many standard learning tasks: Lasso, Sparse-Group Lasso, multi-task Lasso,
    binary and multinomial logistic regression, etc. For all such tasks and on all
    tested datasets, we report significant speed-ups compared to previously
    proposed safe rules.
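
    For the plain Lasso, 0.5*||y - X beta||^2 + lam*||beta||_1, a duality-gap
    based sphere test is commonly written as follows: rescale the residual
    into a dual feasible point theta, take the radius sqrt(2*gap)/lam, and
    discard feature j when |x_j^T theta| + radius*||x_j|| < 1. The sketch
    below implements that form in pure Python. It assumes this standard Lasso
    formulation, does not reproduce the paper's general framework, and uses a
    function name and toy data invented for this example.

```python
import math


def gap_safe_screen(X, y, beta, lam):
    """Indices of features a gap-based sphere test allows discarding (Lasso).

    X is a list of rows, beta the current iterate, lam > 0 the penalty weight.
    """
    n, p = len(X), len(X[0])
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
    # Rescale the residual so that max_j |x_j^T theta| <= 1 (dual feasibility).
    scale = max(lam, max(abs(c) for c in corr))
    theta = [r / scale for r in residual]
    primal = 0.5 * sum(r * r for r in residual) + lam * sum(abs(b) for b in beta)
    dual = 0.5 * sum(v * v for v in y) - 0.5 * lam ** 2 * sum(
        (theta[i] - y[i] / lam) ** 2 for i in range(n)
    )
    gap = max(primal - dual, 0.0)
    radius = math.sqrt(2.0 * gap) / lam
    discard = []
    for j in range(p):
        col_norm = math.sqrt(sum(X[i][j] ** 2 for i in range(n)))
        score = abs(sum(X[i][j] * theta[i] for i in range(n))) + radius * col_norm
        if score < 1.0:
            discard.append(j)
    return discard


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, 0.1]
    # At the optimum for lam = 0.5 the gap vanishes and feature 1 is screened.
    print(gap_safe_screen(X, y, [0.5, 0.0], 0.5))
```

    The radius shrinks as the solver's duality gap shrinks, so a test of this
    kind can be re-run during the iterations and typically removes more
    features as the iterate improves.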

    GENESIM: genetic extraction of a single, interpretable model

    Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke
    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Models obtained by decision tree induction techniques excel in being
    interpretable. However, they can be prone to overfitting, which results in a low
    predictive performance. Ensemble techniques are able to achieve a higher
    accuracy. However, this comes at a cost of losing interpretability of the
    resulting model. This makes ensemble techniques impractical in applications
    where decision support, instead of decision making, is crucial.

    To bridge this gap, we present the GENESIM algorithm that transforms an
    ensemble of decision trees to a single decision tree with an enhanced
    predictive performance by using a genetic algorithm. We compared GENESIM to
    prevalent decision tree induction and ensemble techniques using twelve publicly
    available data sets. The results show that GENESIM achieves a better predictive
    performance on most of these data sets than decision tree induction techniques
    and a predictive performance in the same order of magnitude as the ensemble
    techniques. Moreover, the resulting model of GENESIM has a very low complexity,
    making it very interpretable, in contrast to ensemble techniques.

    Inverting The Generator Of A Generative Adversarial Network

    Antonia Creswell, Anil Anthony Bharath
    Comments: Accepted at NIPS 2016 Workshop on Adversarial Training
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Generative adversarial networks (GANs) learn to synthesise new samples from a
    high-dimensional distribution by passing samples drawn from a latent space
    through a generative network. When the high-dimensional distribution describes
    images of a particular data set, the network should learn to generate visually
    similar image samples for latent variables that are close to each other in the
    latent space. For tasks such as image retrieval and image classification, it
    may be useful to exploit the arrangement of the latent space by projecting
    images into it, and using this as a representation for discriminative tasks.
    GANs often consist of multiple layers of non-linear computations, making them
    very difficult to invert. This paper introduces techniques for projecting image
    samples into the latent space using any pre-trained GAN, provided that the
    computational graph is available. We evaluate these techniques on both MNIST
    digits and Omniglot handwritten characters. In the case of MNIST digits, we
    show that projections into the latent space maintain information about the
    style and the identity of the digit. In the case of Omniglot characters, we
    show that even characters from alphabets that have not been seen during
    training may be projected well into the latent space; this suggests that this
    approach may have applications in one-shot learning.

    Boosting Variational Inference

    Fangjian Guo, Xiangyu Wang, Kai Fan, Tamara Broderick, David B. Dunson
    Comments: 13 pages, 2 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Modern Bayesian inference typically requires some form of posterior
    approximation, and mean-field variational inference (MFVI) is an increasingly
    popular choice due to its speed. But MFVI can be inaccurate in various aspects,
    including an inability to capture multimodality in the posterior and
    underestimation of the posterior covariance. These issues arise since MFVI
    considers approximations to the posterior only in a family of factorized
    distributions. We instead consider a much more flexible approximating family
    consisting of all possible finite mixtures of a parametric base distribution
    (e.g., Gaussian). In order to efficiently find a high-quality posterior
    approximation within this family, we borrow ideas from gradient boosting and
    propose boosting variational inference (BVI). BVI iteratively improves the
    current approximation by mixing it with a new component from the base
    distribution family. We develop practical algorithms for BVI and demonstrate
    their performance on both real and simulated data.
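
    The mixing step described above can be pictured as q_new = (1 - alpha) * q
    + alpha * h, where h is the new base component. The sketch below shows
    only that bookkeeping for one-dimensional Gaussian components. How the
    paper chooses h and alpha is not covered; the function names and the
    (mean, std) representation are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def add_component(weights, components, new_component, alpha):
    """Mix the current mixture with one new component using weight alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    new_weights = [(1.0 - alpha) * w for w in weights] + [alpha]
    return new_weights, components + [new_component]


def mixture_pdf(x, weights, components):
    """Density of the mixture at x; components are (mean, std) pairs."""
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, components))


if __name__ == "__main__":
    weights, comps = [1.0], [(0.0, 1.0)]
    weights, comps = add_component(weights, comps, (3.0, 0.5), alpha=0.25)
    print(weights, mixture_pdf(3.0, weights, comps))
```

    Rescaling the old weights by (1 - alpha) keeps the mixture normalised
    after every step, so the approximation can gain additional modes one
    component at a time.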

    DelugeNets: Deep Networks with Massive and Flexible Cross-layer Information Inflows

    Jason Kuen, Xiangfei Kong, Gang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Human brains are adept at dealing with the deluge of information they
    continuously receive, by suppressing the non-essential inputs and focusing on
    the important ones. Inspired by such capability, we propose Deluge Networks
    (DelugeNets), a novel class of neural networks facilitating massive cross-layer
    information inflows from preceding layers to succeeding layers. The connections
    between layers in DelugeNets are efficiently established through cross-layer
    depthwise convolutional layers with learnable filters, acting as a flexible
    selection mechanism. By virtue of the massive cross-layer information inflows,
    DelugeNets can propagate information across many layers with greater
    flexibility and utilize network parameters more effectively, compared to
    existing ResNet models. Experiments show the superior performances of
    DelugeNets in terms of both classification accuracies and parameter
    efficiencies. Remarkably, a DelugeNet model with just 20.2M parameters achieves
    state-of-the-art accuracy of 19.02% on the CIFAR-100 dataset, outperforming
    a DenseNet model with 27.2M parameters.

    Automatic Node Selection for Deep Neural Networks using Group Lasso Regularization

    Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri
    Comments: Submitted to ICASSP 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    We examine the effect of the Group Lasso (gLasso) regularizer in selecting
    the salient nodes of Deep Neural Network (DNN) hidden layers by applying a
    DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of
    gLasso regularization, one for outgoing weight vectors and another for incoming
    weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096
    nodes. Furthermore, we compare gLasso and L2 regularizers. Our experiment
    results demonstrate that our DNN training, in which the gLasso regularizer was
    embedded, successfully selected the hidden layer nodes that are necessary and
    sufficient for achieving high classification power.
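
    The two gLasso variants mentioned above can be illustrated with the textbook
    group lasso penalty, which sums the L2 norms of per-node weight vectors. This
    is only a minimal sketch: the abstract does not give the paper's exact
    formulation, so the weight layout (weights[i][j] goes from input node i to
    output node j), the function names, and the absence of any scaling factor
    are all assumptions made here for illustration.

```python
import math


def group_lasso_penalty(weights, group_by="incoming"):
    """Sum of L2 norms of per-node weight vectors (textbook group lasso).

    weights[i][j] is assumed to be the weight from input node i to output
    node j; this layout is a choice made for the sketch, not the paper's.
    """
    if group_by == "outgoing":
        # One group per input node: all weights leaving that node (a row).
        groups = [list(row) for row in weights]
    elif group_by == "incoming":
        # One group per output node: all weights entering that node (a column).
        groups = [list(col) for col in zip(*weights)]
    else:
        raise ValueError("group_by must be 'incoming' or 'outgoing'")
    return sum(math.sqrt(sum(w * w for w in group)) for group in groups)


def l2_penalty(weights):
    """Plain squared-L2 penalty over all weights, for comparison."""
    return sum(w * w for row in weights for w in row)


# Two input nodes, two output nodes; the second output node is "dead".
W = [[3.0, 0.0],
     [4.0, 0.0]]
incoming = group_lasso_penalty(W, "incoming")
outgoing = group_lasso_penalty(W, "outgoing")
squared = l2_penalty(W)
```

    Because each group contributes an unsquared norm, the penalty can push an
    entire weight vector to exactly zero, which is what makes it usable for
    selecting nodes, whereas the squared L2 penalty only shrinks weights.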

    Algebraic multigrid support vector machines

    Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro
    Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Computation (stat.CO)

    The support vector machine is a flexible optimization-based technique widely
    used for classification problems. In practice, its training part becomes
    computationally expensive on large-scale data sets because of such reasons as
    the complexity and number of iterations in parameter fitting methods,
    underlying optimization solvers, and nonlinearity of kernels. We introduce a
    fast multilevel framework for solving support vector machine models that is
    inspired by the algebraic multigrid. Significant improvement in the running
    time has been achieved without any loss in quality. The proposed technique is highly
    beneficial on imbalanced sets. We demonstrate computational results on publicly
    available and industrial data sets.

    Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach

    Jianbo Yuan, Walid Shalaby, Mohammed Korayem, David Lin, Khalifeh AlJadda, Jiebo Luo
    Comments: in Big Data, IEEE International Conference on, 2016
    Subjects: Information Retrieval (cs.IR); Learning (cs.LG)

    Collaborative Filtering (CF) is widely used in large-scale recommendation
    engines because of its efficiency, accuracy and scalability. However, in
    practice, recommendation engines based on CF require interactions between
    users and items before making recommendations, which makes CF inappropriate
    for new items that have not yet been exposed to end users.
    This is known as the cold-start problem. In this paper we introduce a novel
    approach which employs deep learning to tackle this problem in any CF based
    recommendation engine. One of the most important features of the proposed
    technique is the fact that it can be applied on top of any existing CF based
    recommendation engine without changing the CF core. We successfully applied
    this technique to overcome the item cold-start problem in Careerbuilder’s CF
    based recommendation engine. Our experiments show that the proposed technique
    is very efficient at resolving the cold-start problem while maintaining high
    accuracy of the CF recommendations.


    Information Theory

    Achievable Uplink Rates for Massive MIMO with Coarse Quantization

    Christopher Mollén, Junil Choi, Erik G. Larsson, Robert W. Heath Jr
    Subjects: Information Theory (cs.IT)

    The high hardware complexity of a massive MIMO base station, which requires
    hundreds of radio chains, makes it challenging to build commercially. One way
    to reduce the hardware complexity and power consumption of the receiver is to
    lower the resolution of the analog-to-digital converters (ADCs). We derive an
    achievable rate for a massive MIMO system with arbitrary quantization and use
    this rate to show that ADCs with as low as 3 bits can be used without
    significant performance loss at spectral efficiencies around 3.5 bpcu per user,
    also under interference from stronger transmitters and with some imperfections
    in the automatic gain control.
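
    To make "lowering the ADC resolution" concrete, the sketch below implements
    a uniform quantizer with a configurable number of bits. The abstract speaks
    of arbitrary quantization, so the uniform mid-rise design, the clipping
    range full_scale, and the function name are assumptions for illustration
    only; a complex baseband sample would be handled by quantizing its real and
    imaginary parts separately.

```python
import math


def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer with 2**bits levels on [-full_scale, full_scale].

    Inputs outside the range are clipped to the outermost level, which is
    what an ADC does when the gain control lets the signal overshoot.
    """
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    # Index of the quantization cell containing x, clipped to a valid cell.
    index = math.floor((x + full_scale) / step)
    index = max(0, min(levels - 1, index))
    # Reconstruct at the centre of the cell.
    return -full_scale + (index + 0.5) * step


# A 3-bit quantizer has 8 output levels spaced 0.25 apart on [-1, 1].
grid = [i / 100.0 for i in range(-100, 101)]
outputs = sorted({quantize(x, bits=3) for x in grid})
```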

    Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO Systems

    Chongwen Huang (Student Member, IEEE), Lei Liu (Student Member, IEEE), Chau Yuen (Senior Member, IEEE), Sumei Sun (Fellow, IEEE)
    Comments: 31 pages, 10 figures, submitted to IEEE JSAC Special Issue on Millimeter Wave Communications for Future Mobile Networks
    Subjects: Information Theory (cs.IT)

    We propose an iterative channel estimation algorithm based on the Least
    Square Estimation (LSE) and Sparse Message Passing (SMP) algorithm for the
    Millimeter Wave (mmWave) MIMO systems. The channel coefficients of the mmWave
    MIMO are approximately modeled as a Bernoulli-Gaussian distribution since there
    are relatively fewer paths in the mmWave channel, i.e., the channel matrix is
    sparse and only has a few non-zero entries. By leveraging the advantage of
    sparseness, we propose an algorithm that iteratively detects the exact
    locations and values of the non-zero entries of the sparse channel matrix. The SMP is
    used to detect the exact locations of the non-zero entries of the channel matrix,
    while the LSE is used for estimating their values at each iteration. We also
    analyze the Cramer-Rao Lower Bound (CRLB), and show that the proposed algorithm
    is a minimum variance unbiased estimator. Furthermore, we employ the Gaussian
    approximation for message densities under density evolution to simplify the
    analysis of the algorithm, which provides a simple method to predict the
    performance of the proposed algorithm. Numerical experiments show that the
    proposed algorithm has much better performance than the existing sparse
    estimators, especially when the channel is sparse. In addition, our proposed
    algorithm converges to the CRLB of the genie-aided estimation of sparse
    channels in just 5 turbo iterations.

    Decoupled Signal Detection for the Uplink of Large-Scale MIMO Systems in Heterogeneous Networks

    L. Arevalo, R. C. de Lamare, M. Haardt, R. Sampaio-Neto
    Comments: 10 figures
    Subjects: Information Theory (cs.IT)

    Massive multiple-input multiple-output (MIMO) systems are strong candidates
    for future fifth generation (5G) heterogeneous cellular networks. For 5G, a
    network densification with a high number of different classes of users and data
    service requirements is expected. Such a large number of connected devices
    needs to be separated in order to allow the detection of the transmitted
    signals according to different data requirements. In this paper, a decoupled
    signal detection (DSD) technique which allows the separation of the uplink
    signals, for each user class, at the base station (BS) is proposed for massive
    MIMO systems. A mathematical signal model for massive MIMO systems with
    centralized and distributed antennas in heterogeneous networks is also
    developed. The performance of the proposed DSD algorithm is evaluated and
    compared with existing detection schemes in a realistic scenario with
    distributed antennas. A sum-rate analysis and a computational cost study for
    DSD are also presented. Simulation results show an excellent performance of the
    proposed DSD algorithm when combined with linear and successive interference
    cancellation detection techniques.

    Convex Optimization of Distributed Cooperative Detection in Multi-Receiver Molecular Communication

    Yuting Fang, Adam Noel, Nan Yang, Andrew W. Eckford, Rodney A. Kennedy
    Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Molecular, Biological and Multi-Scale Communications
    Subjects: Information Theory (cs.IT)

    In this paper, the error performance achieved by cooperative detection among
    K distributed receivers in a diffusion-based molecular communication (MC)
    system is analyzed and optimized. In this system, the receivers first make
    local hard decisions on the transmitted symbol and then report these decisions
    to a fusion center (FC). The FC combines the local hard decisions to make a
    global decision using an N-out-of-K fusion rule. Two reporting scenarios,
    namely, perfect reporting and noisy reporting, are considered. Closed-form
    expressions are derived for the expected global error probability of the system
    for both reporting scenarios. New approximated expressions are also derived for
    the expected error probability. Convex constraints are then found to make the
    approximated expressions jointly convex with respect to the decision thresholds
    at the receivers and the FC. Based on such constraints, suboptimal convex
    optimization problems are formulated and solved to determine the optimal
    decision thresholds which minimize the expected error probability of the
    system. Numerical and simulation results reveal that the system error
    performance is greatly improved by combining the detection information of
    distributed receivers. They also reveal that the solutions to the formulated
    suboptimal convex optimization problems achieve near-optimal global error
    performance.
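
    The N-out-of-K fusion rule described above can be sketched in a few lines.
    The sketch assumes perfect reporting and, as a simplification not taken from
    the paper, K independent receivers with identical local false-alarm and
    miss probabilities; the function names and the equal prior are likewise
    illustrative assumptions.

```python
from math import comb


def fuse(decisions, n):
    """Global decision: 1 if at least n of the K local hard decisions are 1."""
    return 1 if sum(decisions) >= n else 0


def prob_at_least(n, k, p):
    """P(at least n of k independent receivers report 1), each with prob. p."""
    return sum(comb(k, i) * p ** i * (1 - p) ** (k - i) for i in range(n, k + 1))


def global_error(n, k, p_false_alarm, p_miss, prior_one=0.5):
    """Global error probability at the fusion centre under perfect reporting.

    Assumes identical, independent receivers (a simplification made here).
    """
    # Global false alarm: symbol 0 sent, but at least n receivers report 1.
    p_fa_global = prob_at_least(n, k, p_false_alarm)
    # Global miss: symbol 1 sent, but fewer than n receivers report 1.
    p_md_global = 1.0 - prob_at_least(n, k, 1.0 - p_miss)
    return (1.0 - prior_one) * p_fa_global + prior_one * p_md_global


# Majority rule (2-out-of-3) with 10% local error in each direction.
error_fused = global_error(n=2, k=3, p_false_alarm=0.1, p_miss=0.1)
```

    Under these assumptions the fused error probability is 0.028, compared with
    0.1 for a single receiver, which illustrates the kind of gain obtained by
    combining local decisions.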

    Multiple Access Technologies for cellular M2M Communications: An Overview

    Mahyar Shirvanimoghaddam, Sarah Johnson
    Comments: Submitted to ZTE Communications
    Subjects: Information Theory (cs.IT)

    This paper reviews the multiple access techniques for machine-to-machine
    (M2M) communications in future wireless cellular networks. M2M communications
    aims at providing the communication infrastructure for the emerging Internet of
    Things (IoT), which will revolutionize the way we interact with our surrounding
    physical environment. We provide an overview of the multiple access strategies
    and explain their limitations when used for M2M communications. We show the
    throughput efficiency of different multiple access techniques when used in
    coordinated and uncoordinated scenarios. Non-orthogonal multiple access is also
    shown to support a larger number of devices compared to orthogonal multiple
    access techniques, especially in uncoordinated scenarios. We also detail the
    issues and challenges of different multiple access techniques to be used for
    M2M applications in cellular networks.

    Duplication Distance to the Root for Binary Sequences

    Noga Alon, Jehoshua Bruck, Farzad Farnoud, Siddharth Jain
    Comments: submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Genomics (q-bio.GN)

    We study the tandem duplication distance between binary sequences and their
    roots. In other words, the quantity of interest is the number of tandem
    duplication operations of the form \(x = abc \to y = abbc\), where \(x\) and
    \(y\) are sequences and \(a\), \(b\), and \(c\) are their substrings, needed to
    generate a binary sequence of length \(n\) starting from a square-free sequence
    from the set \(\{0,1,01,10,010,101\}\). This problem is a restricted case of finding the
    duplication/deduplication distance between two sequences, defined as the
    minimum number of duplication and deduplication operations required to
    transform one sequence to the other. We consider both exact and approximate
    tandem duplications. For exact duplication, denoting the maximum distance to
    the root of a sequence of length \(n\) by \(f(n)\), we prove that \(f(n)=\Theta(n)\).
    For the case of approximate duplication, where a \(\beta\)-fraction of symbols
    may be duplicated incorrectly, we show that the maximum distance has a sharp
    transition from linear in \(n\) to logarithmic at \(\beta=1/2\). We also study the
    duplication distance to the root for sequences with a given root and for
    special classes of sequences, namely, the de Bruijn sequences, the Thue-Morse
    sequence, and the Fibonacci words. The problem is motivated by genomic tandem
    duplication mutations and the smallest number of tandem duplication events
    required to generate a given biological sequence.
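
    A tandem duplication copies a substring and inserts the copy right after the
    original, turning abc into abbc; undoing such steps until no repeated
    adjacent block remains yields a square-free root. The sketch below shows
    both operations for exact duplication. The function names are hypothetical,
    and the greedy deduplication order is an assumption of this sketch: the
    number of steps it reports is only an upper bound on the duplication
    distance, not the minimum studied in the paper.

```python
def tandem_duplicate(x, start, length):
    """Map x = a b c to a b b c, where b = x[start:start + length]."""
    if length <= 0 or start < 0 or start + length > len(x):
        raise ValueError("substring out of range")
    end = start + length
    return x[:end] + x[start:end] + x[end:]


def deduplicate_once(y):
    """Remove one copy of the first adjacent repeat bb found, or return None."""
    n = len(y)
    for length in range(1, n // 2 + 1):
        for start in range(0, n - 2 * length + 1):
            mid = start + length
            if y[start:mid] == y[mid:mid + length]:
                return y[:mid] + y[mid + length:]
    return None  # y is square-free


def greedy_root(y):
    """Deduplicate greedily until square-free; return (root, steps used)."""
    steps = 0
    while True:
        shorter = deduplicate_once(y)
        if shorter is None:
            return y, steps
        y = shorter
        steps += 1


duplicated = tandem_duplicate("010", 1, 1)   # duplicates the middle "1"
root, steps = greedy_root(duplicated)
```

    For example, greedy deduplication of 0000 takes three steps, although two
    duplications (0 to 00 to 0000) suffice to generate it, which is why the
    step count here should not be read as the distance itself.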

    Common Reconstructions in the Successive Refinement Problem with Receiver Side Information

    Badri N. Vellambi, Roy Timo
    Comments: 37 pages, 8 figures. Some of the material in this paper was presented at the 2013 IEEE Information Theory Workshop in Seville, Spain, and the 2014 IEEE International Symposium on Information Theory in Honolulu, USA. This work was supported by the Australian Research Council Discovery Project DP120102123
    Subjects: Information Theory (cs.IT)

    We study a variant of the successive refinement problem with receiver side
    information where the receivers require identical reconstructions. We present
    general inner and outer bounds for the rate region for this variant and present
    a single-letter characterization of the admissible rate region for several
    classes of the joint distribution of the source and the side information. The
    characterization indicates that the side information can be fully used to
    reduce the communication rates via binning; however, the reconstruction
    functions can depend only on the Gács-Körner common randomness shared by
    the two receivers. Unlike existing (inner and outer) bounds to the rate region
    of the general successive refinement problem, the characterization of the
    admissible rate region derived for several settings of the variant studied
    requires only one auxiliary random variable. Using the derived
    characterization, we establish that the admissible rate region is not
    continuous in the underlying source distribution even though the problem
    formulation does not involve zero-error or functional reconstruction
    constraints.

    Maximizing the minimum achievable secrecy rate of two-way relay networks using the null space beamforming method

    Erfan Khordad, Soroush Akhlaghi, Meysam Mirzaee
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

    This paper concerns maximizing the minimum achievable secrecy rate of a
    two-way relay network in the presence of an eavesdropper, in which two nodes
    aim to exchange messages in two hops, using a multi-antenna relay. Throughout
    the first hop, the two nodes simultaneously transmit their messages to the
    relay. In the second hop, the relay broadcasts a combination of the received
    information to the users such that the transmitted signal lies in the null
    space of the eavesdropper’s channel; this is called null space beamforming
    (NSBF). The best NSBF matrix for maximizing the minimum achievable secrecy rate
    is studied, showing that the problem is not convex in general. To address this
    issue, the problem is divided into three sub-problems, and a close-to-optimal
    solution is derived using the semi-definite relaxation (SDR) technique.
    Simulation results demonstrate the superiority of the proposed method w.r.t.
    the most well-known method addressed in the literature.



