    arXiv Paper Daily: Thu, 25 May 2017

    Published by 我爱机器学习 (52ml.net) on 2017-05-25 00:00:00

    Neural and Evolutionary Computing

    Fast-Slow Recurrent Neural Networks

    Asier Mujika, Florian Meier, Angelika Steger
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Processing sequential data of variable length is a major challenge in a wide
    range of applications, such as speech recognition, language modeling,
    generative image modeling and machine translation. Here, we address this
    challenge by proposing a novel recurrent neural network (RNN) architecture, the
    Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both
    multiscale RNNs and deep transition RNNs as it processes sequential data on
    different timescales and learns complex transition functions from one time step
    to the next. We evaluate the FS-RNN on two character level language modeling
    data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve
    state-of-the-art results to 1.19 and 1.25 bits-per-character (BPC),
    respectively. In addition, an ensemble of two FS-RNNs achieves 1.20 BPC on
    Hutter Prize Wikipedia, outperforming the best known compression algorithm
    with respect to
    the BPC measure. We also present an empirical investigation of the learning and
    network dynamics of the FS-RNN, which explains the improved performance
    compared to other RNN architectures. Our approach is general as any kind of RNN
    cell is a possible building block for the FS-RNN architecture, and thus can be
    flexibly applied to different tasks.
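
    The FS-RNN combines deep transitions within a time step with a slowly
    updated cell across time steps. As an illustration only, here is a rough
    PyTorch-style sketch of that fast/slow idea, assuming LSTM cells; the
    exact wiring of the paper's architecture may differ from this sketch.

        import torch
        import torch.nn as nn

        class FastSlowRNN(nn.Module):
            """Illustrative fast-slow recurrent layer (assumed wiring, not the paper's exact one)."""

            def __init__(self, input_size, fast_size, slow_size, num_fast_cells=2):
                super().__init__()
                self.fast_first = nn.LSTMCell(input_size, fast_size)   # reads the input token
                self.slow = nn.LSTMCell(fast_size, slow_size)          # updated once per time step
                self.fast_rest = nn.ModuleList(                        # deep transition within a step
                    [nn.LSTMCell(slow_size if i == 0 else fast_size, fast_size)
                     for i in range(num_fast_cells - 1)])

            def forward(self, x):
                # x: (seq_len, batch, input_size)
                batch = x.size(1)
                hf = cf = x.new_zeros(batch, self.fast_first.hidden_size)
                hs = cs = x.new_zeros(batch, self.slow.hidden_size)
                outputs = []
                for x_t in x:
                    hf, cf = self.fast_first(x_t, (hf, cf))   # fast cell sees the new input
                    hs, cs = self.slow(hf, (hs, cs))          # slow cell updated once per step
                    prev = hs
                    for cell in self.fast_rest:               # remaining fast cells refine the state
                        hf, cf = cell(prev, (hf, cf))
                        prev = hf
                    outputs.append(hf)
                return torch.stack(outputs), ((hf, cf), (hs, cs))

    A character-level language model would place an embedding layer before
    this module and a softmax over the vocabulary on top of the fast hidden
    state.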

    Dense Transformer Networks

    Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The key idea of current deep learning methods for dense prediction is to
    apply a model on a regular patch centered on each pixel to make pixel-wise
    predictions. These methods are limited in the sense that the patches are
    determined by network architecture instead of learned from data. In this work,
    we propose the dense transformer networks, which can learn the shapes and sizes
    of patches from data. The dense transformer networks employ an encoder-decoder
    architecture, and a pair of dense transformer modules are inserted into each of
    the encoder and decoder paths. The novelty of this work is that we provide
    technical solutions for learning the shapes and sizes of patches from data and
    efficiently restoring the spatial correspondence required for dense prediction.
    The proposed dense transformer modules are differentiable, so the entire
    network can be trained. We apply the proposed networks to natural and
    biological image segmentation tasks and show superior performance is achieved
    in comparison to baseline methods.

    Flow-GAN: Bridging implicit and prescribed learning in generative models

    Aditya Grover, Manik Dhar, Stefano Ermon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Evaluating the performance of generative models for unsupervised learning is
    inherently challenging due to the lack of well-defined and tractable
    objectives. This is particularly difficult for implicit models such as
    generative adversarial networks (GANs) which perform extremely well in practice
    for tasks such as sample generation, but sidestep the explicit characterization
    of a density.

    We propose Flow-GANs, a generative adversarial network with the generator
    specified as a normalizing flow model which can perform exact likelihood
    evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that
    integrates adversarial training with maximum likelihood estimation. We show
    empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in
    learning generative models that can attain low generalization error based on
    the log-likelihoods and generate high quality samples. Finally, we show a
    simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.
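
    Because the generator is a normalizing flow, its exact log-likelihood is
    available through the change-of-variables formula, and the hybrid
    objective described above can be written as an adversarial loss plus a
    weighted maximum-likelihood term; the weight lambda below is an assumed
    hyperparameter, not a value taken from the paper.

        \log p_\theta(x) = \log p_Z\big(f_\theta^{-1}(x)\big)
                           + \log\left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|

        \min_\theta \max_\phi \; V(G_\theta, D_\phi)
            \;-\; \lambda\, \mathbb{E}_{x \sim P_\mathrm{data}}\big[\log p_\theta(x)\big]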

    Grounded Recurrent Neural Networks

    Ankit Vani, Yacine Jernite, David Sontag
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this work, we present the Grounded Recurrent Neural Network (GRNN), a
    recurrent neural network architecture for multi-label prediction which
    explicitly ties labels to specific dimensions of the recurrent hidden state (we
    call this process “grounding”). The approach is particularly well-suited for
    extracting large numbers of concepts from text. We apply the new model to
    address an important problem in healthcare of understanding what medical
    concepts are discussed in clinical text. Using a publicly available dataset
    derived from Intensive Care Units, we learn to label a patient’s diagnoses and
    procedures from their discharge summary. Our evaluation shows a clear advantage
    to using our proposed architecture over a variety of strong baselines.

    Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

    Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
    Comments: MICCAI 2017 Camera Ready
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Mammogram classification is directly related to computer-aided diagnosis of
    breast cancer. Traditional methods rely on regions of interest (ROIs) which
    require great efforts to annotate. Inspired by the success of using deep
    convolutional features for natural image analysis and multi-instance learning
    (MIL) for labeling a set of instances/patches, we propose end-to-end trained
    deep multi-instance networks for mass classification based on the whole mammogram
    without the aforementioned ROIs. We explore three different schemes to
    construct deep multi-instance networks for whole mammogram classification.
    Experimental results on the INbreast dataset demonstrate the robustness of
    proposed networks compared to previous work using segmentation and detection
    annotations.

    An effective algorithm for hyperparameter optimization of neural networks

    Gonzalo Diaz, Achille Fokoue, Giacomo Nannicini, Horst Samulowitz
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A major challenge in designing neural network (NN) systems is to determine
    the best structure and parameters for the network given the data for the
    machine learning problem at hand. Examples of parameters are the number of
    layers and nodes, the learning rates, and the dropout rates. Typically, these
    parameters are chosen based on heuristic rules and manually fine-tuned, which
    may be very time-consuming, because evaluating the performance of a single
    parametrization of the NN may require several hours. This paper addresses the
    problem of choosing appropriate parameters for the NN by formulating it as a
    box-constrained mathematical optimization problem, and applying a
    derivative-free optimization tool that automatically and effectively searches
    the parameter space. The optimization tool employs a radial basis function
    model of the objective function (the prediction accuracy of the NN) to
    accelerate the discovery of configurations yielding high accuracy. Candidate
    configurations explored by the algorithm are trained for a small number of
    epochs, and only the most promising candidates receive full training. The
    performance of the proposed methodology is assessed on benchmark sets and in
    the context of predicting drug-drug interactions, showing promising results.
    The optimization tool used in this paper is open-source.
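
    As a toy illustration of the surrogate-model loop described above (not
    the authors' open-source tool), the following sketch fits a radial basis
    function model to cheaply evaluated configurations and uses it to pick
    the next candidates; evaluate_nn is a hypothetical stand-in for training
    and scoring a network.

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        def evaluate_nn(params, epochs):
            """Hypothetical stand-in: train with `params` for `epochs` and return accuracy."""
            lr, dropout = params
            return 1.0 - 0.1 * abs(np.log10(lr) + 2.5) - abs(dropout - 0.3)  # toy surface

        rng = np.random.default_rng(0)
        bounds = np.array([[1e-4, 1e-1], [0.0, 0.8]])      # box constraints: learning rate, dropout

        # Evaluate a few random configurations cheaply (few epochs).
        X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(8, 2))
        y = np.array([evaluate_nn(p, epochs=2) for p in X])

        for _ in range(20):
            # Fit an RBF surrogate of accuracy over the configurations explored so far.
            surrogate = RBFInterpolator(X, y)
            # Pick the candidate the surrogate predicts to be best among random proposals.
            candidates = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, 2))
            best = candidates[np.argmax(surrogate(candidates))]
            # Evaluate it cheaply and add it to the history.
            X = np.vstack([X, best])
            y = np.append(y, evaluate_nn(best, epochs=2))

        # Only the most promising configurations receive full training.
        top = X[np.argsort(y)[-3:]]
        full_scores = [evaluate_nn(p, epochs=50) for p in top]
        print(top[np.argmax(full_scores)])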


    Computer Vision and Pattern Recognition

    Dense Transformer Networks

    Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The key idea of current deep learning methods for dense prediction is to
    apply a model on a regular patch centered on each pixel to make pixel-wise
    predictions. These methods are limited in the sense that the patches are
    determined by network architecture instead of learned from data. In this work,
    we propose the dense transformer networks, which can learn the shapes and sizes
    of patches from data. The dense transformer networks employ an encoder-decoder
    architecture, and a pair of dense transformer modules are inserted into each of
    the encoder and decoder paths. The novelty of this work is that we provide
    technical solutions for learning the shapes and sizes of patches from data and
    efficiently restoring the spatial correspondence required for dense prediction.
    The proposed dense transformer modules are differentiable, so the entire
    network can be trained. We apply the proposed networks to natural and
    biological image segmentation tasks and show superior performance is achieved
    in comparison to baseline methods.

    From source to target and back: symmetric bi-directional adaptive GAN

    Paolo Russo, Fabio Maria Carlucci, Tatiana Tommasi, Barbara Caputo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The effectiveness of generative adversarial approaches in producing images
    according to a specific style or visual domain has recently opened new
    directions to solve the unsupervised domain adaptation problem. It has been
    shown that labeled source images can be modified to mimic target samples,
    making it possible to directly train a classifier in the target domain
    despite the original lack of annotated data. Inverse mappings from the
    target to the source domain have also been evaluated, but only by passing
    through adapted feature spaces, and thus without new image generation.

    In this paper we propose to better exploit the potential of generative
    adversarial networks for adaptation by introducing a novel symmetric mapping
    among domains. We jointly optimize bi-directional image transformations
    combining them with target self-labeling. Moreover we define a new class
    consistency loss that aligns the generators in the two directions imposing to
    conserve the class identity of an image passing through both domain mappings.

    A detailed qualitative and quantitative analysis of the reconstructed images
    confirms the power of our approach. By integrating the two domain-specific
    classifiers obtained with our bi-directional network we exceed previous
    state-of-the-art unsupervised adaptation results on four different benchmark
    datasets.

    Optimization of the Jaccard index for image segmentation with the Lovász hinge

    Maxim Berman, Matthew B. Blaschko
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The Jaccard loss, commonly referred to as the intersection-over-union loss,
    is frequently employed in the evaluation of segmentation quality due to its
    better perceptual quality and scale invariance, which lends appropriate
    relevance to small objects compared with per-pixel losses. We present a
    method for direct optimization of the per-image intersection-over-union
    loss in neural networks, in the context of semantic image segmentation,
    based on a convex surrogate: the Lovász hinge. The loss is shown to perform
    better with respect to the Jaccard index measure than other losses
    traditionally used in the context of semantic segmentation, such as
    cross-entropy. We develop a specialized optimization method, based on an
    efficient computation of the proximal operator of the Lovász hinge,
    yielding reliably faster and more stable optimization than alternatives. We
    demonstrate the effectiveness of the method by showing substantially
    improved intersection-over-union segmentation scores on the Pascal VOC
    dataset using a state-of-the-art deep learning segmentation architecture.
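
    For reference, the per-class Jaccard index between the ground-truth
    labelling y* and the prediction, and the corresponding Jaccard loss being
    optimized, are

        J_c(y^*, \tilde{y}) = \frac{\left|\{y^* = c\} \cap \{\tilde{y} = c\}\right|}
                                   {\left|\{y^* = c\} \cup \{\tilde{y} = c\}\right|},
        \qquad
        \Delta_{J_c}(y^*, \tilde{y}) = 1 - J_c(y^*, \tilde{y}).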

    Adaptive Detrending to Accelerate Convolutional Gated Recurrent Unit Training for Contextual Video Recognition

    Minju Jung, Haanvid Lee, Jun Tani
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Based on the progress of image recognition, video recognition has been
    extensively studied recently. However, most existing methods focus on
    short-term rather than long-term video recognition, which we call
    contextual video recognition. To address contextual video recognition, we
    use convolutional recurrent neural networks (ConvRNNs), which have rich
    spatio-temporal information processing capability, but ConvRNNs require
    extensive computation that slows down training. In this paper, inspired by
    the normalization and detrending
    methods, we propose adaptive detrending (AD) for temporal normalization in
    order to accelerate the training of ConvRNNs, especially for convolutional
    gated recurrent unit (ConvGRU). AD removes internal covariate shift within a
    sequence of each neuron in recurrent neural networks (RNNs) by subtracting a
    trend. In the experiments for contextual recognition on ConvGRU, the results
    show that (1) ConvGRU clearly outperforms the feed-forward neural networks, (2)
    AD consistently offers a significant training acceleration and generalization
    improvement, and (3) AD is further improved by collaborating with the existing
    normalization methods.
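
    The abstract only states that AD subtracts a per-neuron trend within each
    sequence; one simple way to realize such a detrending step is an
    exponential-moving-average trend, which is an assumption here rather than
    the paper's adaptive estimator:

        import numpy as np

        def detrend(activations, momentum=0.9):
            """Subtract a running per-neuron trend from a sequence of activations.

            activations: array of shape (time, ...), one slice per time step.
            Returns detrended activations of the same shape.
            """
            trend = np.zeros_like(activations[0])
            out = np.empty_like(activations)
            for t, a_t in enumerate(activations):
                trend = momentum * trend + (1.0 - momentum) * a_t   # update the trend estimate
                out[t] = a_t - trend                                # remove it from the activation
            return out

        # Example: a drifting signal becomes roughly zero-centered after detrending.
        x = np.cumsum(np.random.randn(100, 4), axis=0)
        print(detrend(x).mean(axis=0))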

    Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

    Qing Sun, Stefan Lee, Dhruv Batra
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We develop the first approximate inference algorithm for 1-Best (and M-Best)
    decoding in bidirectional neural sequence models by extending Beam Search (BS)
    to reason about both forward and backward time dependencies. Beam Search (BS)
    is a widely used approximate inference algorithm for decoding sequences from
    unidirectional neural sequence models. Interestingly, approximate inference in
    bidirectional models remains an open problem, despite their significant
    advantage in modeling information from both the past and future. To enable the
    use of bidirectional models, we present Bidirectional Beam Search (BiBS), an
    efficient algorithm for approximate bidirectional inference. To evaluate our
    method and as an interesting problem in its own right, we introduce a novel
    Fill-in-the-Blank Image Captioning task which requires reasoning about both
    past and future sentence structure to reconstruct sensible image descriptions.
    We use this task as well as the Visual Madlibs dataset to demonstrate the
    effectiveness of our approach, consistently outperforming all baseline methods.

    Self-supervised learning of visual features through embedding images into text topic spaces

    Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, C.V. Jawahar
    Comments: Accepted CVPR 2017 paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    End-to-end training from scratch of current deep architectures for new
    computer vision problems would require ImageNet-scale datasets, and this is not
    always possible. In this paper we present a method that is able to take
    advantage of freely available multi-modal content to train computer vision
    algorithms without human supervision. We put forward the idea of performing
    self-supervised learning of visual features by mining a large scale corpus of
    multi-modal (text and image) documents. We show that discriminative visual
    features can be learnt efficiently by training a CNN to predict the semantic
    context in which a particular image is more likely to appear as an
    illustration. For this we leverage the hidden semantic structures discovered in
    the text corpus with a well-known topic modeling technique. Our experiments
    demonstrate state of the art performance in image classification, object
    detection, and multi-modal retrieval compared to recent self-supervised or
    natural-supervised approaches.

    VANETs Meet Autonomous Vehicles: A Multimodal 3D Environment Learning Approach

    Yassine Maalej, Sameh Sorour, Ahmed Abdel-Rahim, Mohsen Guizani
    Comments: 7 pages, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we design a multimodal framework for object detection,
    recognition and mapping based on the fusion of stereo camera frames, point
    cloud Velodyne Lidar scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages
    (BSMs) exchanged using Dedicated Short Range Communication (DSRC). We merge the
    key features of rich texture descriptions of objects from 2D images, depth and
    distance between objects provided by 3D point cloud and awareness of hidden
    vehicles from BSMs’ 3D information. We present joint pixel-to-point-cloud
    and pixel-to-V2V correspondences of objects in frames from the KITTI Vision
    Benchmark Suite by using a semi-supervised manifold alignment approach to
    achieve camera-Lidar and camera-V2V mapping of their recognized objects that
    have the same underlying manifold.

    Deep Rotation Equivariant Network

    Junying Li, Zichen Yang, Haifeng Liu, Deng Cai
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, learning equivariant representations has attracted considerable
    research attention. Dieleman et al. introduce four operations which can be
    inserted into a CNN to learn deep representations equivariant to rotation.
    However, in their approach feature maps must be copied and rotated four
    times in each layer, which incurs considerable running-time and memory
    overhead. To address this problem, we propose the Deep Rotation
    Equivariant Network (DREN), consisting of cycle layers, isotonic layers
    and decycle layers. Our proposed layers apply the rotation transformation
    to filters rather than feature maps, achieving a speedup of more than 2
    times with even less memory overhead. We evaluate DRENs on the Rotated
    MNIST and CIFAR-10 datasets and demonstrate that they can improve the
    performance of state-of-the-art architectures. Our code is released on
    GitHub.

    Robust Data Geometric Structure Aligned Close yet Discriminative Domain Adaptation

    Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Liming Chen
    Comments: 12 pages, 1 figure
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Domain adaptation (DA) is a form of transfer learning which aims to leverage labeled
    data in a related source domain to achieve informed knowledge transfer and help
    the classification of unlabeled data in a target domain. In this paper, we
    propose a novel DA method, namely Robust Data Geometric Structure Aligned,
    Close yet Discriminative Domain Adaptation (RSA-CDDA), which brings closer, in
    a latent joint subspace, both source and target data distributions, and aligns
    inherent hidden source and target data geometric structures while performing
    discriminative DA by repulsing both interclass source and target data. The
    proposed method performs domain adaptation between source and target in solving
    a unified model, which incorporates data distribution constraints, in
    particular via a nonparametric distance, i.e., Maximum Mean Discrepancy (MMD),
    as well as constraints on inherent hidden data geometric structure segmentation
    and alignment between source and target, through low rank and sparse
    representation. RSA-CDDA achieves the search of a joint subspace in solving the
    proposed unified model through iterative optimization, alternating Rayleigh
    quotient algorithm and inexact augmented Lagrange multiplier algorithm.
    Extensive experiments carried out on standard DA benchmarks, i.e., 16
    cross-domain image classification tasks, verify the effectiveness of the
    proposed method, which consistently outperforms the state-of-the-art methods.
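
    For reference, the Maximum Mean Discrepancy used above as the
    nonparametric distance between the source distribution P and the target
    distribution Q, with kernel k, is

        \mathrm{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}\big[k(x, x')\big]
                               - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}\big[k(x, y)\big]
                               + \mathbb{E}_{y, y' \sim Q}\big[k(y, y')\big].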

    Deep Learning Improves Template Matching by Normalized Cross Correlation

    Davit Buniatyan, Thomas Macrina, Dodam Ih, Jonathan Zung, H. Sebastian Seung
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Template matching by normalized cross correlation (NCC) is widely used for
    finding image correspondences. We improve the robustness of this algorithm by
    preprocessing images with “siamese” convolutional networks trained to maximize
    the contrast between NCC values of true and false matches. The improvement is
    quantified using patches of brain images from serial section electron
    microscopy. Relative to a parameter-tuned bandpass filter, siamese
    convolutional networks significantly reduce false matches. Furthermore, all
    false matches can be eliminated by removing a tiny fraction of all matches
    based on NCC values. The improved accuracy of our method could be essential for
    connectomics, because emerging petascale datasets may require billions of
    template matches to assemble 2D images of serial sections into a 3D image
    stack. Our method is also expected to generalize to many other computer vision
    applications that use NCC template matching to find image correspondences.
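
    For reference, the normalized cross correlation between an image patch f
    and a template t of the same size, with means and standard deviations
    taken over the n pixels of the patch, is

        \mathrm{NCC}(f, t) = \frac{1}{n} \sum_{i=1}^{n}
            \frac{(f_i - \bar{f})\,(t_i - \bar{t})}{\sigma_f\, \sigma_t}.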

    Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models

    Yida Wang, Weihong Deng
    Comments: 15 pages, 12 figures, 3 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Given a large amount of real photos for training, convolutional neural
    networks show excellent performance on object recognition tasks. However,
    collecting such data is tedious and the available backgrounds are limited,
    which makes it hard to establish a perfect database. In this paper, our
    generative model, trained with synthetic images rendered from 3D models,
    reduces the workload of data collection and the limitations of capture
    conditions. Our structure is composed of two sub-networks: a semantic
    foreground object reconstruction network based on Bayesian inference, and
    a classification network based on a multi-triplet cost function that
    avoids over-fitting on monotone surfaces and fully utilizes pose
    information by establishing a sphere-like distribution of descriptors
    within each category, which is helpful for recognition on regular photos
    according to the pose, lighting condition, background and category
    information of the rendered images. Firstly, our conjugate structure, a
    generative model with metric learning, uses additional foreground object
    channels generated from Bayesian rendering as the joint between the two
    sub-networks. The multi-triplet cost function based on poses is used for
    metric learning, which makes it possible to train a category classifier
    purely on synthetic data. Secondly, we design a coordinate training
    strategy, with the help of adaptive noises acting as corruption on the
    input images, to help the two sub-networks benefit from each other and to
    avoid inharmonious parameter tuning due to their different convergence
    speeds. Our structure achieves state-of-the-art accuracy of over 50% on
    the ShapeNet database despite the data-migration obstacle from synthetic
    images to real photos. This pipeline makes it applicable to perform
    recognition on real images based only on 3D models.

    Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

    Anoop Cherian, Suvrit Sra, Richard Hartley
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Representations that can compactly and effectively capture temporal evolution
    of semantic content are important to machine learning algorithms that operate
    on multi-variate time-series data. We investigate such representations
    motivated by the task of human action recognition. Here each data instance is
    encoded by a multivariate feature (such as via a deep CNN) where action
    dynamics are characterized by their variations in time. As these features are
    often non-linear, we propose a novel pooling method, kernelized rank pooling,
    that represents a given sequence compactly as the pre-image of the parameters
    of a hyperplane in an RKHS, projections of data onto which capture their
    temporal order. We develop this idea further and show that such a pooling
    scheme can be cast as an order-constrained kernelized PCA objective; we then
    propose to use the parameters of a kernelized low-rank feature subspace as the
    representation of the sequences. We cast our formulation as an optimization
    problem on generalized Grassmann manifolds and then solve it efficiently using
    Riemannian optimization techniques. We present experiments on several action
    recognition datasets using diverse feature modalities and demonstrate
    state-of-the-art results.

    Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

    Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
    Comments: MICCAI 2017 Camera Ready
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Mammogram classification is directly related to computer-aided diagnosis of
    breast cancer. Traditional methods rely on regions of interest (ROIs) which
    require great efforts to annotate. Inspired by the success of using deep
    convolutional features for natural image analysis and multi-instance learning
    (MIL) for labeling a set of instances/patches, we propose end-to-end trained
    deep multi-instance networks for mass classification based on the whole mammogram
    without the aforementioned ROIs. We explore three different schemes to
    construct deep multi-instance networks for whole mammogram classification.
    Experimental results on the INbreast dataset demonstrate the robustness of
    proposed networks compared to previous work using segmentation and detection
    annotations.

    Input Fast-Forwarding for Better Deep Learning

    Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
    Comments: Accepted in the 14th International Conference on Image Analysis and Recognition (ICIAR) 2017, Montreal, Canada
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper introduces a new architectural framework, known as input
    fast-forwarding, that can enhance the performance of deep networks. The main
    idea is to incorporate a parallel path that sends representations of input
    values forward to deeper network layers. This scheme is substantially different
    from “deep supervision” in which the loss layer is re-introduced to earlier
    layers. The parallel path provided by fast-forwarding enhances the training
    process in two ways. First, it enables the individual layers to combine
    higher-level information (from the standard processing path) with lower-level
    information (from the fast-forward path). Second, this new architecture reduces
    the problem of vanishing gradients substantially because the fast-forwarding
    path provides a shorter route for gradient backpropagation. In order to
    evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet),
    with 20 convolutional layers along with parallel fast-forward paths, has been
    created and tested. The paper presents empirical results that demonstrate
    improved learning capacity of FFNet due to fast-forwarding, as compared to
    GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in
    size, respectively. All of the source code and deep learning models described
    in this paper will be made available to the entire research community.

    How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

    Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
    Comments: Accepted in IJCAI-17
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    The knowledge representation community has built general-purpose ontologies
    which contain large amounts of commonsense knowledge over relevant aspects of
    the world, including useful visual information, e.g.: “a ball is used by a
    football player”, “a tennis player is located at a tennis court”. Current
    state-of-the-art approaches for visual recognition do not exploit these
    rule-based knowledge sources. Instead, they learn recognition models directly
    from training examples. In this paper, we study how general-purpose
    ontologies—specifically, MIT’s ConceptNet ontology—can improve the
    performance of state-of-the-art vision systems. As a testbed, we tackle the
    problem of sentence-based image retrieval. Our retrieval approach incorporates
    knowledge from ConceptNet on top of a large pool of object detectors derived
    from a deep learning technique. In our experiments, we show that ConceptNet can
    improve performance on a common benchmark dataset. Key to our performance is
    the use of the ESPGAME dataset to select visually relevant relations from
    ConceptNet. Consequently, a main conclusion of this work is that
    general-purpose commonsense ontologies improve performance on visual reasoning
    tasks when properly filtered to select meaningful visual relations.

    Stochastic Sequential Neural Networks with Structured Inference

    Hao Liu, Haoli Bai, Lirong He, Zenglin Xu
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised structure learning in high-dimensional time series data has
    attracted a lot of research interest. For example, segmenting and
    labelling high-dimensional time series can be helpful in behavior
    understanding and medical diagnosis. Recent advances in generative
    sequential modeling have suggested combining recurrent neural networks
    with state space models (e.g.,
    Hidden Markov Models). This combination can model not only the long term
    dependency in sequential data, but also the uncertainty included in the hidden
    states. Inheriting these advantages of stochastic neural sequential models, we
    propose a structured and stochastic sequential neural network, which models
    both the long-term dependencies via recurrent neural networks and the
    uncertainty in the segmentation and labels via discrete random variables. For
    accurate and efficient inference, we present a bi-directional inference network
    by reparameterizing the categorical segmentation and labels with the
    recently proposed Gumbel-Softmax approximation, and resort to Stochastic
    Gradient Variational Bayes. We evaluate the proposed model on a number of
    tasks, including speech modeling, automatic segmentation and labeling in
    behavior understanding, and sequential multi-object recognition.
    Experimental results
    have demonstrated that our proposed model can achieve significant improvement
    over the state-of-the-art methods.
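
    The Gumbel-Softmax reparameterization mentioned above draws a
    differentiable, approximately one-hot sample from a categorical
    distribution; a minimal sketch (the temperature tau is a hyperparameter)
    is:

        import numpy as np

        def gumbel_softmax_sample(logits, tau=1.0, seed=None):
            """Draw one relaxed categorical sample via the Gumbel-Softmax trick.

            logits: unnormalized log-probabilities, shape (..., num_categories).
            tau: temperature; lower values give samples closer to one-hot.
            """
            rng = np.random.default_rng(seed)
            u = rng.uniform(low=1e-20, high=1.0, size=np.shape(logits))
            gumbel = -np.log(-np.log(u))                    # Gumbel(0, 1) noise
            y = (np.asarray(logits) + gumbel) / tau
            y = y - y.max(axis=-1, keepdims=True)           # numerical stability
            e = np.exp(y)
            return e / e.sum(axis=-1, keepdims=True)        # softmax over categories

        print(gumbel_softmax_sample(np.log([0.7, 0.2, 0.1]), tau=0.5))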

    Continual Learning with Deep Generative Replay

    Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim
    Comments: Submitted to NIPS 2017
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Attempts to train a comprehensive artificial intelligence capable of solving
    multiple tasks have been impeded by a chronic problem called catastrophic
    forgetting. Although simply replaying all previous data alleviates the problem,
    it requires a large memory and, even worse, is often infeasible in
    real-world applications where access to past data is limited. Inspired by
    the generative nature of the hippocampus as a short-term memory system in
    the primate brain, we propose Deep Generative Replay, a novel framework
    with a
    cooperative dual model architecture consisting of a deep generative model
    (“generator”) and a task solving model (“solver”). With only these two models,
    training data for previous tasks can easily be sampled and interleaved with
    those for a new task. We test our methods in several sequential learning
    settings involving image classification tasks.
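
    Schematically, learning a new task then interleaves real data for the new
    task with inputs replayed from the previous generator and labelled by the
    previous solver. The sketch below assumes generic generator/solver
    objects with the listed methods; it is a rough outline, not the paper's
    training code.

        import numpy as np

        def train_on_new_task(new_data, old_generator, old_solver,
                              new_generator, new_solver, replay_ratio=0.5, steps=1000):
            """Deep-generative-replay style update (schematic; model interfaces are assumptions)."""
            for _ in range(steps):
                x_new, y_new = new_data.sample_batch()
                if old_generator is not None:
                    # Replay pseudo-data for earlier tasks: generated inputs, old solver's targets.
                    n_replay = int(replay_ratio * len(x_new))
                    x_old = old_generator.sample(n_replay)
                    y_old = old_solver.predict(x_old)
                    x = np.concatenate([x_new, x_old])
                    y = np.concatenate([y_new, y_old])
                else:
                    x, y = x_new, y_new              # first task: nothing to replay
                new_generator.train_step(x)          # generator learns to cover old and new inputs
                new_solver.train_step(x, y)          # solver learns old and new input-target pairs
            return new_generator, new_solver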

    Hashing as Tie-Aware Learning to Rank

    Kun He, Fatih Cakir, Sarah A. Bargal, Stan Sclaroff
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We formulate the problem of supervised hashing, or learning binary embeddings
    of data, as a learning to rank problem. Specifically, we optimize two common
    ranking-based evaluation metrics, Average Precision (AP) and Normalized
    Discounted Cumulative Gain (NDCG). Observing that ranking with the discrete
    Hamming distance naturally results in ties, we propose to use tie-aware
    versions of ranking metrics in both the evaluation and the learning of
    supervised hashing. For AP and NDCG, we derive continuous relaxations of their
    tie-aware versions, and optimize them using stochastic gradient ascent with
    deep neural networks. Our results establish the new state-of-the-art for
    tie-aware AP and NDCG on common hashing benchmarks.

    Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

    Matthias Hein, Maksym Andriushchenko
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Recent work has shown that state-of-the-art classifiers are quite brittle, in
    the sense that a small adversarial change to an input that was originally
    classified correctly with high confidence leads to a wrong classification,
    again with high confidence. This raises concerns that such classifiers are
    vulnerable to
    attacks and calls into question their usage in safety-critical systems. We show
    in this paper for the first time formal guarantees on the robustness of a
    classifier by giving instance-specific lower bounds on the norm of the input
    manipulation required to change the classifier decision. Based on this analysis
    we propose the Cross-Lipschitz regularization functional. We show that using
    this form of regularization in both kernel methods and neural networks improves
    the robustness of the classifier without any loss in prediction performance.


    Artificial Intelligence

    How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

    Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
    Comments: Accepted in IJCAI-17
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    The knowledge representation community has built general-purpose ontologies
    which contain large amounts of commonsense knowledge over relevant aspects of
    the world, including useful visual information, e.g.: “a ball is used by a
    football player”, “a tennis player is located at a tennis court”. Current
    state-of-the-art approaches for visual recognition do not exploit these
    rule-based knowledge sources. Instead, they learn recognition models directly
    from training examples. In this paper, we study how general-purpose
    ontologies—specifically, MIT’s ConceptNet ontology—can improve the
    performance of state-of-the-art vision systems. As a testbed, we tackle the
    problem of sentence-based image retrieval. Our retrieval approach incorporates
    knowledge from ConceptNet on top of a large pool of object detectors derived
    from a deep learning technique. In our experiments, we show that ConceptNet can
    improve performance on a common benchmark dataset. Key to our performance is
    the use of the ESPGAME dataset to select visually relevant relations from
    ConceptNet. Consequently, a main conclusion of this work is that
    general-purpose commonsense ontologies improve performance on visual reasoning
    tasks when properly filtered to select meaningful visual relations.

    When Will AI Exceed Human Performance? Evidence from AI Experts

    Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans
    Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    Advances in artificial intelligence (AI) will transform modern life by
    reshaping transportation, health, science, finance, and the military. To adapt
    public policy, we need to better anticipate these advances. Here we report the
    results from a large survey of machine learning researchers on their beliefs
    about progress in AI. Researchers predict AI will outperform humans in many
    activities in the next ten years, such as translating languages (by 2024),
    writing high-school essays (by 2026), driving a truck (by 2027), working in
    retail (by 2031), writing a bestselling book (by 2049), and working as a
    surgeon (by 2053). Researchers believe there is a 50% chance of AI
    outperforming humans in all tasks in 45 years and of automating all human jobs
    in 120 years, with Asian respondents expecting these dates much sooner than
    North Americans. These results will inform discussion amongst researchers and
    policymakers about anticipating and managing trends in AI.

    Continual Learning with Deep Generative Replay

    Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim
    Comments: Submitted to NIPS 2017
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Attempts to train a comprehensive artificial intelligence capable of solving
    multiple tasks have been impeded by a chronic problem called catastrophic
    forgetting. Although simply replaying all previous data alleviates the problem,
    it requires a large memory and, even worse, is often infeasible in
    real-world applications where access to past data is limited. Inspired by
    the generative nature of the hippocampus as a short-term memory system in
    the primate brain, we propose Deep Generative Replay, a novel framework
    with a
    cooperative dual model architecture consisting of a deep generative model
    (“generator”) and a task solving model (“solver”). With only these two models,
    training data for previous tasks can easily be sampled and interleaved with
    those for a new task. We test our methods in several sequential learning
    settings involving image classification tasks.

    An effective algorithm for hyperparameter optimization of neural networks

    Gonzalo Diaz, Achille Fokoue, Giacomo Nannicini, Horst Samulowitz
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A major challenge in designing neural network (NN) systems is to determine
    the best structure and parameters for the network given the data for the
    machine learning problem at hand. Examples of parameters are the number of
    layers and nodes, the learning rates, and the dropout rates. Typically, these
    parameters are chosen based on heuristic rules and manually fine-tuned, which
    may be very time-consuming, because evaluating the performance of a single
    parametrization of the NN may require several hours. This paper addresses the
    problem of choosing appropriate parameters for the NN by formulating it as a
    box-constrained mathematical optimization problem, and applying a
    derivative-free optimization tool that automatically and effectively searches
    the parameter space. The optimization tool employs a radial basis function
    model of the objective function (the prediction accuracy of the NN) to
    accelerate the discovery of configurations yielding high accuracy. Candidate
    configurations explored by the algorithm are trained for a small number of
    epochs, and only the most promising candidates receive full training. The
    performance of the proposed methodology is assessed on benchmark sets and in
    the context of predicting drug-drug interactions, showing promising results.
    The optimization tool used in this paper is open-source.

    Predictive Analytics for Enhancing Travel Time Estimation in Navigation Apps of Apple, Google, and Microsoft

    Pouria Amirian, Anahid Basiri, Jeremy Morley
    Subjects: Artificial Intelligence (cs.AI)

    The explosive growth of location-enabled devices coupled with the
    increasing use of Internet services has led to an increasing awareness of the
    importance and usage of geospatial information in many applications. The
    navigation apps (often called Maps) use a variety of available data sources to
    calculate and predict the travel time as well as several options for routing in
    public transportation, car or pedestrian modes. This paper evaluates the
    pedestrian mode of Maps apps in three major smartphone operating systems
    (Android, iOS and Windows Phone). In the paper, we will show that the Maps apps
    on iOS, Android and Windows Phone in pedestrian mode predict travel time
    without learning from the individual’s movement profile. In addition, we will
    exemplify that those apps suffer from a specific data quality issue which
    relates to the absence of information about location and type of pedestrian
    crossings. Finally, we will illustrate learning from the movement profiles of
    individuals using various predictive analytics models to improve the accuracy
    of travel time estimation.

    Uplift Modeling with Multiple Treatments and General Response Types

    Yan Zhao, Xiao Fang, David Simchi-Levi
    Subjects: Artificial Intelligence (cs.AI)

    Randomized experiments have been used to assist decision-making in many
    areas. They help people select the optimal treatment for the test population
    with certain statistical guarantee. However, subjects can show significant
    heterogeneity in response to treatments. The problem of customizing treatment
    assignment based on subject characteristics is known as uplift modeling,
    differential response analysis, or personalized treatment learning in
    the literature. A key feature of uplift modeling is that the data is unlabeled. It
    is impossible to know whether the chosen treatment is optimal for an individual
    subject because response under alternative treatments is unobserved. This
    presents a challenge to both the training and the evaluation of uplift models.
    In this paper we describe how to obtain an unbiased estimate of the key
    performance metric of an uplift model, the expected response. We present a new
    uplift algorithm which creates a forest of randomized trees. The trees are
    built with a splitting criterion designed to directly optimize their uplift
    performance based on the proposed evaluation method. Both the evaluation method
    and the algorithm apply to an arbitrary number of treatments and general response
    types. Experimental results on synthetic data and industry-provided data show
    that our algorithm leads to significant performance improvement over other
    applicable methods.

    Flow-GAN: Bridging implicit and prescribed learning in generative models

    Aditya Grover, Manik Dhar, Stefano Ermon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Evaluating the performance of generative models for unsupervised learning is
    inherently challenging due to the lack of well-defined and tractable
    objectives. This is particularly difficult for implicit models such as
    generative adversarial networks (GANs) which perform extremely well in practice
    for tasks such as sample generation, but sidestep the explicit characterization
    of a density.

    We propose Flow-GANs, a generative adversarial network with the generator
    specified as a normalizing flow model which can perform exact likelihood
    evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that
    integrates adversarial training with maximum likelihood estimation. We show
    empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in
    learning generative models that can attain low generalization error based on
    the log-likelihoods and generate high quality samples. Finally, we show a
    simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.

    Improved Semi-supervised Learning with GANs using Manifold Invariances

    Abhishek Kumar, Prasanna Sattigeri, P. Thomas Fletcher
    Comments: 16 pages, 7 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Semi-supervised learning methods using Generative Adversarial Networks (GANs)
    have shown promising empirical success recently. Most of these methods use a
    shared discriminator/classifier which discriminates real examples from fake
    while also predicting the class label. Motivated by the ability of the GAN
    generator to capture the data manifold well, we propose to estimate the
    tangent space to the data manifold using GANs and employ it to inject
    invariances into the classifier. In the process, we propose enhancements
    over existing methods for learning the inverse mapping (i.e., the
    encoder), which greatly improve the semantic similarity of the
    reconstructed sample to the input sample.
    We observe considerable empirical gains in semi-supervised learning over
    baselines, particularly in the cases when the number of labeled examples is
    low. We also provide insights into how fake examples influence the
    semi-supervised learning procedure.

    Safe Model-based Reinforcement Learning with Stability Guarantees

    Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Reinforcement learning is a powerful paradigm for learning optimal policies
    from experimental data. However, to find optimal policies, most reinforcement
    learning algorithms explore all possible actions, which may be harmful for
    real-world systems. As a consequence, learning algorithms are rarely applied to
    safety-critical systems in the real world. In this paper, we present a learning
    algorithm that explicitly considers safety in terms of stability guarantees.
    Specifically, we extend control theoretic results on Lyapunov stability
    verification and show how to use statistical models of the dynamics to obtain
    high-performance control policies with provable stability certificates.
    Moreover, under additional regularity assumptions in terms of a Gaussian
    process prior, we prove that one can effectively and safely collect data in
    order to learn about the dynamics and thus both improve control performance and
    expand the safe region of the state space. In our experiments, we show how the
    resulting algorithm can safely optimize a neural network policy on a simulated
    inverted pendulum, without the pendulum ever falling down.
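
    The stability certificate rests on a standard Lyapunov decrease
    condition: a positive-definite function V that decreases along the
    closed-loop trajectories certifies a region of attraction. In discrete
    time (generic notation, not necessarily the paper's) this reads

        V(x) > 0 \quad \forall x \neq x_0, \qquad
        V\big(f(x, \pi(x))\big) - V(x) < 0 \quad \forall x \in \mathcal{S} \setminus \{x_0\},

    where S is the estimated safe region, pi the policy and x_0 the
    equilibrium (e.g., the upright pendulum position).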

    Vehicle Traffic Driven Camera Placement for Better Metropolis Security Surveillance

    Xiaobo Ma, Yihui He, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, Xiaohong Guan
    Comments: 10 pages, 2 figures, under review for IEEE Intelligent Systems
    Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    Security surveillance is one of the most important issues in smart cities,
    especially in an era of terrorism. Deploying a number of (video) cameras is a
    common surveillance approach. Given the never-ending power offered by vehicles
    to metropolises, exploiting vehicle traffic to design camera placement
    strategies could potentially facilitate security surveillance. This article
    constitutes the first effort toward building the linkage between vehicle
    traffic and security surveillance, which is a critical problem for smart
    cities. We expect our study could influence the decision making of surveillance
    camera placement, and foster more research on principled ways of security
    surveillance beneficial to our physical-world life.

    Selective Classification for Deep Neural Networks

    Yonatan Geifman, Ran El-Yaniv
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Selective classification techniques (also known as reject option) have not
    yet been considered in the context of deep neural networks (DNNs). These
    techniques can potentially significantly improve DNN prediction
    performance by trading off coverage. In this paper we propose a method to
    construct a
    selective classifier given a trained neural network. Our method allows a user
    to set a desired risk level. At test time, the classifier rejects instances as
    needed, to grant the desired risk (with high probability). Empirical results
    over CIFAR and ImageNet convincingly demonstrate the viability of our method,
    which opens up possibilities to operate DNNs in mission-critical applications.
    For example, using our method an unprecedented 2% error in top-5 ImageNet
    classification can be guaranteed with probability 99.9%, and almost 60% test
    coverage.
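
    As a rough illustration of the reject-option idea (not necessarily the
    paper's exact selection function or guarantee), one can threshold a
    confidence score such as the maximum softmax probability, picking the
    threshold on held-out data so that the accepted predictions meet the
    target risk:

        import numpy as np

        def calibrate_threshold(confidences, correct, target_risk):
            """Smallest confidence threshold whose accepted predictions meet the target risk.

            confidences: validation confidence scores (e.g., max softmax probability).
            correct: boolean array, whether each validation prediction was correct.
            """
            for theta in np.sort(confidences):
                accepted = confidences >= theta
                if accepted.any() and 1.0 - correct[accepted].mean() <= target_risk:
                    return theta
            return np.inf                              # no threshold achieves the target risk

        def selective_predict(probs, theta):
            """Return predicted classes, with -1 meaning 'reject / abstain'."""
            conf = probs.max(axis=1)
            preds = probs.argmax(axis=1)
            return np.where(conf >= theta, preds, -1)

        # Toy usage with synthetic validation scores.
        rng = np.random.default_rng(0)
        val_conf = rng.uniform(0.5, 1.0, size=1000)
        val_correct = rng.uniform(size=1000) < val_conf   # higher confidence -> more often correct
        print(calibrate_threshold(val_conf, val_correct, target_risk=0.02))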

    Second-Order Word Embeddings from Nearest Neighbor Topological Features

    Denis Newman-Griffis, Eric Fosler-Lussier
    Comments: Submitted to NIPS 2017. (8 pages + 4 reference)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We introduce second-order vector representations of words, induced from
    nearest neighborhood topological features in pre-trained contextual word
    embeddings. We then analyze the effects of using second-order embeddings as
    input features in two deep natural language processing models, for named entity
    recognition and recognizing textual entailment, as well as a linear model for
    paraphrase recognition. Surprisingly, we find that nearest neighbor information
    alone is sufficient to capture most of the performance benefits derived from
    using pre-trained word embeddings. Furthermore, second-order embeddings are
    able to handle highly heterogeneous data better than first-order
    representations, though at the cost of some specificity. Additionally,
    augmenting contextual embeddings with second-order information further improves
    model performance in some cases. Due to variance in the random initializations
    of word embeddings, utilizing nearest neighbor features from multiple
    first-order embedding samples can also contribute to downstream performance
    gains. Finally, we identify intriguing characteristics of second-order
    embedding spaces for further research, including much higher density and
    different semantic interpretations of cosine similarity.

    Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

    Matthias Hein, Maksym Andriushchenko
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Recent work has shown that state-of-the-art classifiers are quite brittle, in
    the sense that a small adversarial change to an input that was originally
    classified correctly with high confidence leads to a wrong classification,
    again with high confidence. This raises concerns that such classifiers are
    vulnerable to
    attacks and calls into question their usage in safety-critical systems. We show
    in this paper for the first time formal guarantees on the robustness of a
    classifier by giving instance-specific lower bounds on the norm of the input
    manipulation required to change the classifier decision. Based on this analysis
    we propose the Cross-Lipschitz regularization functional. We show that using
    this form of regularization in both kernel methods and neural networks improves
    the robustness of the classifier without any loss in prediction performance.

    VAE with a VampPrior

    Jakub M. Tomczak, Max Welling
    Comments: 15 pages
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Many different methods to train deep generative models have been proposed in
    the past. In this paper, we propose to extend the variational auto-encoder
    (VAE) framework with a new type of prior which we call “Variational Mixture of
    Posteriors” prior, or VampPrior for short. The VampPrior consists of a mixture
    distribution (e.g., a mixture of Gaussians) with components given by
    variational posteriors conditioned on learnable pseudo-inputs. We further
    extend this prior to a two-layer hierarchical model and show that this
    architecture, where the prior and posterior are coupled, learns significantly
    better models. The model also avoids the usual local optima issues that plague VAEs
    related to useless latent dimensions. We provide empirical studies on three
    benchmark datasets, namely, MNIST, OMNIGLOT and Caltech 101 Silhouettes, and
    show that applying the hierarchical VampPrior delivers state-of-the-art results
    on all three datasets in the unsupervised permutation invariant setting.
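
    Concretely, the VampPrior replaces the standard normal prior with a
    mixture of the variational posteriors evaluated at K learnable
    pseudo-inputs u_k:

        p_\lambda(z) = \frac{1}{K} \sum_{k=1}^{K} q_\phi\big(z \mid u_k\big).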


    Information Retrieval

    Beyond Parity: Fairness Objectives for Collaborative Filtering

    Sirui Yao, Bert Huang
    Subjects: Information Retrieval (cs.IR)

    We study fairness in collaborative-filtering recommender systems, which are
    sensitive to discrimination that exists in historical data. Biased data can
    lead collaborative-filtering methods to make unfair predictions for users from
    minority groups. We identify the insufficiency of existing fairness metrics and
    propose four new metrics that address different forms of unfairness. These
    fairness metrics can be optimized by adding fairness terms to the learning
    objective. Experiments on synthetic and real data show that our new metrics can
    better measure fairness than the baseline, and that the fairness objectives
    effectively help reduce unfairness.

    Journalists' information needs, seeking behavior, and its determinants on social media

    Omid Aghili, Mark Sanderson
    Subjects: Information Retrieval (cs.IR)

    We describe the results of a qualitative study on journalists’ information
    seeking behavior on social media. Based on interviews with eleven journalists
    along with a study of a set of university level journalism modules, we
    determined the categories of information need types that lead journalists to
    social media. We also determined the ways that social media is exploited as a
    tool to satisfy information needs, and we identify the influential factors
    that impact journalists’ information-seeking behavior. We find that not only is
    social media used as an information source, but it can also be a supplier of
    stories found serendipitously. We find seven information need types that expand
    the types found in previous work. We also find five categories of influential
    factors that affect the way journalists seek information.

    How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval

    Rodrigo Toro Icarte, Jorge A. Baier, Cristian Ruz, Alvaro Soto
    Comments: Accepted in IJCAI-17
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    The knowledge representation community has built general-purpose ontologies
    which contain large amounts of commonsense knowledge over relevant aspects of
    the world, including useful visual information, e.g.: “a ball is used by a
    football player”, “a tennis player is located at a tennis court”. Current
    state-of-the-art approaches for visual recognition do not exploit these
    rule-based knowledge sources. Instead, they learn recognition models directly
    from training examples. In this paper, we study how general-purpose
    ontologies—specifically, MIT’s ConceptNet ontology—can improve the
    performance of state-of-the-art vision systems. As a testbed, we tackle the
    problem of sentence-based image retrieval. Our retrieval approach incorporates
    knowledge from ConceptNet on top of a large pool of object detectors derived
    from a deep learning technique. In our experiments, we show that ConceptNet can
    improve performance on a common benchmark dataset. Key to our performance is
    the use of the ESPGAME dataset to select visually relevant relations from
    ConceptNet. Consequently, a main conclusion of this work is that
    general-purpose commonsense ontologies improve performance on visual reasoning
    tasks when properly filtered to select meaningful visual relations.


    Computation and Language

    Parsing with CYK over Distributed Representations: "Classical" Syntactic Parsing in the Novel Era of Neural Networks

    Fabio Massimo Zanzotto, Giordano Cristini
    Subjects: Computation and Language (cs.CL)

    Syntactic parsing is a key task in natural language processing which has been
    dominated by symbolic, grammar-based syntactic parsers. Neural networks, with
    their distributed representations, are challenging these methods.

    In this paper, we want to show that existing parsing algorithms can cross the
    border and be defined over distributed representations. We then define D-CYK: a
    version of the traditional CYK algorithm defined over distributed
    representations. Our D-CYK operates as the original CYK but uses matrix
    multiplications. These operations are compatible with traditional neural
    networks. Experiments show that D-CYK approximates the original CYK. By showing
    that CYK can be performed on distributed representations, our D-CYK opens the
    possibility of defining recurrent layers of CYK-informed neural networks.
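
    As a point of reference, the following is a minimal sketch of the classical CYK
    recognizer for a grammar in Chomsky normal form, i.e. the symbolic procedure
    that D-CYK re-expresses through matrix multiplications over distributed
    representations. It is not the authors' distributed version, and the toy
    grammar is purely illustrative.

        # Minimal classical CYK recognizer for a grammar in Chomsky normal form.
        def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
            # lexical_rules: dict word -> set of nonterminals (rules A -> word)
            # binary_rules: dict (B, C) -> set of nonterminals (rules A -> B C)
            n = len(words)
            table = [[set() for _ in range(n + 1)] for _ in range(n)]
            for i, w in enumerate(words):
                table[i][i + 1] = set(lexical_rules.get(w, ()))
            for span in range(2, n + 1):
                for i in range(n - span + 1):
                    j = i + span
                    for k in range(i + 1, j):
                        for B in table[i][k]:
                            for C in table[k][j]:
                                table[i][j] |= binary_rules.get((B, C), set())
            return start in table[0][n]

        # Toy grammar: S -> NP VP, NP -> "they", VP -> "fish"
        lex = {"they": {"NP"}, "fish": {"VP"}}
        rules = {("NP", "VP"): {"S"}}
        print(cyk_recognize(["they", "fish"], lex, rules))  # True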

    Deep Investigation of Cross-Language Plagiarism Detection Methods

    Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes
    Comments: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) colocated with ACL 2017
    Subjects: Computation and Language (cs.CL)

    This paper is a deep investigation of cross-language plagiarism detection
    methods on a recently introduced open dataset, which contains parallel and
    comparable collections of documents with multiple characteristics (different
    genres, languages and sizes of texts). We investigate cross-language plagiarism
    detection methods for 6 language pairs on 2 granularities of text units in
    order to draw robust conclusions on the best methods while deeply analyzing
    correlations across document styles and languages.

    Second-Order Word Embeddings from Nearest Neighbor Topological Features

    Denis Newman-Griffis, Eric Fosler-Lussier
    Comments: Submitted to NIPS 2017. (8 pages + 4 reference)
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We introduce second-order vector representations of words, induced from
    nearest neighborhood topological features in pre-trained contextual word
    embeddings. We then analyze the effects of using second-order embeddings as
    input features in two deep natural language processing models, for named entity
    recognition and recognizing textual entailment, as well as a linear model for
    paraphrase recognition. Surprisingly, we find that nearest neighbor information
    alone is sufficient to capture most of the performance benefits derived from
    using pre-trained word embeddings. Furthermore, second-order embeddings are
    able to handle highly heterogeneous data better than first-order
    representations, though at the cost of some specificity. Additionally,
    augmenting contextual embeddings with second-order information further improves
    model performance in some cases. Due to variance in the random initializations
    of word embeddings, utilizing nearest neighbor features from multiple
    first-order embedding samples can also contribute to downstream performance
    gains. Finally, we identify intriguing characteristics of second-order
    embedding spaces for further research, including much higher density and
    different semantic interpretations of cosine similarity.
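
    As a rough illustration of the general idea (and not necessarily the authors'
    exact construction), second-order vectors can be derived from the
    nearest-neighbor structure of a pre-trained embedding matrix, for example by
    building a binary neighbor-adjacency matrix and reducing it with an SVD. All
    parameter choices below are arbitrary stand-ins.

        import numpy as np

        def second_order_embeddings(E, k=10, dim=20):
            # E: (V, d) matrix of pre-trained ("first-order") word embeddings.
            V = E.shape[0]
            En = E / np.linalg.norm(E, axis=1, keepdims=True)
            sims = En @ En.T                    # cosine similarity between all words
            np.fill_diagonal(sims, -np.inf)     # exclude self-neighbors
            nbrs = np.argsort(-sims, axis=1)[:, :k]
            A = np.zeros((V, V))                # binary nearest-neighbor adjacency
            A[np.repeat(np.arange(V), k), nbrs.ravel()] = 1.0
            # Reduce the sparse topological features to a dense vector per word.
            U, S, _ = np.linalg.svd(A, full_matrices=False)
            return U[:, :dim] * S[:dim]

        E = np.random.RandomState(0).randn(500, 50)  # stand-in for real embeddings
        print(second_order_embeddings(E).shape)      # (500, 20)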

    Grounded Recurrent Neural Networks

    Ankit Vani, Yacine Jernite, David Sontag
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this work, we present the Grounded Recurrent Neural Network (GRNN), a
    recurrent neural network architecture for multi-label prediction which
    explicitly ties labels to specific dimensions of the recurrent hidden state (we
    call this process “grounding”). The approach is particularly well-suited for
    extracting large numbers of concepts from text. We apply the new model to
    address an important problem in healthcare of understanding what medical
    concepts are discussed in clinical text. Using a publicly available dataset
    derived from Intensive Care Units, we learn to label a patient’s diagnoses and
    procedures from their discharge summary. Our evaluation shows a clear advantage
    to using our proposed architecture over a variety of strong baselines.


    Distributed, Parallel, and Cluster Computing

    On Using Time Without Clocks via Zigzag Causality

    Asa Dan, Rajit Manohar, Yoram Moses
    Comments: This is an extended version of a paper to appear in PODC 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Even in the absence of clocks, time bounds on the duration of actions enable
    the use of time for distributed coordination. This paper initiates an
    investigation of coordination in such a setting. A new communication structure
    called a zigzag pattern is introduced, and shown to guarantee bounds on the
    relative timing of events in this clockless model. Indeed, zigzag patterns are
    shown to be necessary and sufficient for establishing that events occur in a
    manner that satisfies prescribed bounds. We capture when a process can know
    that an appropriate zigzag pattern exists, and use this to provide necessary
    and sufficient conditions for timed coordination of events using a
    full-information protocol in the clockless model.

    Linearizable Iterators for Concurrent Data Structures

    Archita Agarwal, Zhiyu Liu, Eli Rosenthal, Vikram Saraph
    Comments: 17 pages, 9 figures
    Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this work, we provide a general framework for adding a linearizable
    iterator to data structures with set operations. We propose a condition on
    these set operations, called locality, so that any data structure implemented
    from local atomic operations can be augmented with a linearizable iterator as
    described by our framework. We then apply the iterator framework to various
    data structures, prove locality of their operations, and demonstrate that the
    iterator framework does not significantly affect the performance of concurrent
    operations.

    Developing an edge analytics platform for analyzing real-time transit data streams

    Hung Cao, Monica Wachowicz, Sangwhan Cha
    Comments: Edge-based analytics, real-time transit data streams, fog computing, descriptive analytics, Internet of Mobile Things, edge computing, mobile cloud computing, mobile edge computing
    Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

    The Internet of Mobile Things encompasses stream data being generated by
    sensors, network communications that pull and push these data streams, as well
    as running processing and analytics that can effectively leverage actionable
    information for planning, management, and business advantage. Edge computing
    emerges as a new paradigm that decentralizes the communication, computation,
    control and storage resources from the cloud to the edge of the Internet. This
    paper proposes an edge computing platform where mobile fog nodes are physical
    devices where descriptive analytics is deployed to analyze real-time transit
    data streams. An application experiment is used to evaluate the advantages and
    disadvantages of our proposed platform to run descriptive analytics at the
    mobile fog node and support transit managers with actionable information.


    Learning

    Flow-GAN: Bridging implicit and prescribed learning in generative models

    Aditya Grover, Manik Dhar, Stefano Ermon
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    Evaluating the performance of generative models for unsupervised learning is
    inherently challenging due to the lack of well-defined and tractable
    objectives. This is particularly difficult for implicit models such as
    generative adversarial networks (GANs) which perform extremely well in practice
    for tasks such as sample generation, but sidestep the explicit characterization
    of a density.

    We propose Flow-GANs, a generative adversarial network with the generator
    specified as a normalizing flow model which can perform exact likelihood
    evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that
    integrates adversarial training with maximum likelihood estimation. We show
    empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in
    learning generative models that can attain low generalization error based on
    the log-likelihoods and generate high quality samples. Finally, we show a
    simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.

    Improved Semi-supervised Learning with GANs using Manifold Invariances

    Abhishek Kumar, Prasanna Sattigeri, P. Thomas Fletcher
    Comments: 16 pages, 7 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Semi-supervised learning methods using Generative Adversarial Networks (GANs)
    have shown promising empirical success recently. Most of these methods use a
    shared discriminator/classifier which discriminates real examples from fake
    while also predicting the class label. Motivated by the ability of the GANs
    generator to capture the data manifold well, we propose to estimate the tangent
    space to the data manifold using GANs and employ it to inject invariances into
    the classifier. In the process, we propose enhancements over existing methods
    for learning the inverse mapping (i.e., the encoder), which greatly improve the
    semantic similarity of the reconstructed sample to the input sample.
    We observe considerable empirical gains in semi-supervised learning over
    baselines, particularly in the cases when the number of labeled examples is
    low. We also provide insights into how fake examples influence the
    semi-supervised learning procedure.

    Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

    Diane Bouchacourt, Ryota Tomioka, Sebastian Nowozin
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We would like to learn a representation of the data which decomposes an
    observation into factors of variation which we can independently control.
    Specifically, we want to use minimal supervision to learn a latent
    representation that reflects the semantics behind a specific grouping of the
    data, where within a group the samples share a common factor of variation. For
    example, consider a collection of face images grouped by identity. We wish to
    anchor the semantics of the grouping into a relevant and disentangled
    representation that we can easily exploit. However, existing deep probabilistic
    models often assume that the observations are independent and identically
    distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new
    deep probabilistic model for learning a disentangled representation of a set of
    grouped observations. The ML-VAE separates the latent representation into
    semantically meaningful parts by working both at the group level and the
    observation level, while retaining efficient test-time inference. Quantitative
    and qualitative evaluations show that the ML-VAE model (i) learns a
    semantically meaningful disentanglement of grouped data, (ii) enables
    manipulation of the latent representation, and (iii) generalises to unseen
    groups.

    Open-Category Classification by Adversarial Sample Generation

    Yang Yu, Wei-Yang Qu, Nan Li, Zimin Guo
    Comments: Published in IJCAI 2017
    Subjects: Learning (cs.LG)

    In real-world classification tasks, it is difficult to collect samples of all
    possible categories of the environment in the training stage. Therefore, the
    classifier should be prepared for unseen classes. When an instance of an unseen
    class appears in the prediction stage, a robust classifier should have the
    ability to tell it is unseen, instead of classifying it to be any known
    category. In this paper, adopting the idea of adversarial learning, we propose
    the ASG framework for open-category classification. ASG generates positive and
    negative samples of seen categories in an unsupervised manner via an
    adversarial learning strategy. With the generated samples, ASG then learns to
    tell seen from unseen in a supervised manner. Experiments performed on
    several datasets show the effectiveness of ASG.

    Stochastic Sequential Neural Networks with Structured Inference

    Hao Liu, Haoli Bai, Lirong He, Zenglin Xu
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Unsupervised structure learning in high-dimensional time series data has
    attracted a lot of research interest. For example, segmenting and labelling
    high dimensional time series can be helpful in behavior understanding and
    medical diagnosis. Recent advances in generative sequential modeling have
    suggested to combine recurrent neural networks with state space models (e.g.,
    Hidden Markov Models). This combination can model not only the long term
    dependency in sequential data, but also the uncertainty included in the hidden
    states. Inheriting these advantages of stochastic neural sequential models, we
    propose a structured and stochastic sequential neural network, which models
    both the long-term dependencies via recurrent neural networks and the
    uncertainty in the segmentation and labels via discrete random variables. For
    accurate and efficient inference, we present a bi-directional inference network
    by reparameterizing the categorical segmentation and labels with the recently
    proposed Gumbel-Softmax approximation, and resort to Stochastic Gradient
    Variational Bayes. We evaluate the proposed model in a number of tasks,
    including speech modeling, automatic segmentation and labeling in behavior
    understanding, and sequential multi-objects recognition. Experimental results
    have demonstrated that our proposed model can achieve significant improvement
    over the state-of-the-art methods.

    Dictionary-based Monitoring of Premature Ventricular Contractions: An Ultra-Low-Cost Point-of-Care Service

    Bollepalli S. Chandra, Challa S. Sastry, Laxminarayana Anumandla, Soumya Jana
    Comments: 19 pages, 9 figures and 5 tables
    Subjects: Learning (cs.LG)

    While cardiovascular diseases (CVDs) are prevalent across economic strata,
    the economically disadvantaged population is disproportionately affected due to
    the high cost of traditional CVD management. Accordingly, developing an
    ultra-low-cost alternative, affordable even to groups at the bottom of the
    economic pyramid, has emerged as a societal imperative. Against this backdrop,
    we propose an inexpensive yet accurate home-based electrocardiogram(ECG)
    monitoring service. Specifically, we seek to provide point-of-care monitoring
    of premature ventricular contractions (PVCs), high frequency of which could
    indicate the onset of potentially fatal arrhythmia. Note that a traditional
    telecardiology system acquires the ECG, transmits it to a professional
    diagnostic centre without processing, and nearly achieves the diagnostic
    accuracy of a bedside setup, albeit at high bandwidth cost. In this context, we
    aim at reducing cost without significantly sacrificing reliability. To this
    end, we develop a dictionary-based algorithm that detects with high sensitivity
    only the anomalous beats, which are then transmitted. We further compress those
    transmitted beats using class-specific dictionaries subject to suitable
    reconstruction/diagnostic fidelity. Such a scheme would not only reduce the
    overall bandwidth requirement, but also localise anomalous beats, thereby
    reducing physicians’ burden. Finally, using Monte Carlo cross validation on
    MIT/BIH arrhythmia database, we evaluate the performance of the proposed
    system. In particular, with a sensitivity target of at most one undetected PVC
    in one hundred beats, and a percentage root mean squared difference less than
    9% (a clinically acceptable level of fidelity), we achieved about 99.15%
    reduction in bandwidth cost, equivalent to 118-fold savings over traditional
    telecardiology.
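
    The sketch below illustrates the flavor of such dictionary-based compression on
    synthetic stand-in data: a dictionary is learned over beat-like segments, each
    segment is represented by a few atoms, and fidelity is scored with the
    percentage root mean squared difference (PRD) mentioned above. It is a
    conceptual illustration only, not the proposed algorithm or its ECG pipeline.

        import numpy as np
        from sklearn.decomposition import DictionaryLearning

        rng = np.random.RandomState(0)
        # Synthetic "beats": scaled sine segments plus noise, standing in for ECG.
        beats = np.sin(np.linspace(0, 2 * np.pi, 64))[None, :] * rng.uniform(0.5, 1.5, (200, 1))
        beats += 0.05 * rng.randn(200, 64)

        dico = DictionaryLearning(n_components=16, max_iter=50,
                                  transform_algorithm="omp",
                                  transform_n_nonzero_coefs=4,
                                  random_state=0).fit(beats)
        codes = dico.transform(beats)          # sparse codes: only 4 atoms per beat
        recon = codes @ dico.components_

        prd = 100 * np.linalg.norm(beats - recon) / np.linalg.norm(beats)
        print("PRD (%):", prd)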

    MMD GAN: Towards Deeper Understanding of Moment Matching Network

    Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, Barnabás Póczos
    Comments: submitted to NIPS 2017
    Subjects: Learning (cs.LG)

    Generative moment matching network (GMMN) is a deep generative model that
    differs from Generative Adversarial Network (GAN) by replacing the
    discriminator in GAN with a two-sample test based on kernel maximum mean
    discrepancy (MMD). Although some theoretical guarantees of MMD have been
    studied, the empirical performance of GMMN is still not as competitive as that
    of GAN on challenging and large benchmark datasets. The computational
    efficiency of GMMN is also less desirable in comparison with GAN, partially due
    to its requirement for a rather large batch size during training. In this
    paper, we propose to improve both the model expressiveness of GMMN and its
    computational efficiency by introducing adversarial kernel learning techniques,
    as a replacement for the fixed Gaussian kernel in the original GMMN. The new
    approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN.
    The new distance measure in MMD-GAN is a meaningful loss that enjoys the
    advantage of weak topology and can be optimized via gradient descent with
    relatively small batch sizes. In our evaluation on multiple benchmark datasets,
    including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly
    outperforms GMMN and is competitive with other representative GAN works.
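
    For reference, the quantity at the heart of both GMMN and MMD-GAN is the kernel
    maximum mean discrepancy. A minimal NumPy sketch of the (biased) squared MMD
    estimate with a fixed Gaussian kernel is given below; MMD-GAN replaces this
    fixed kernel with an adversarially learned one, which is not reproduced here,
    and the bandwidth is an arbitrary choice.

        import numpy as np

        def gaussian_kernel(X, Y, sigma=1.0):
            d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
            return np.exp(-d2 / (2 * sigma**2))

        def mmd2(X, Y, sigma=1.0):
            # Biased estimate of the squared MMD between samples X and Y.
            return (gaussian_kernel(X, X, sigma).mean()
                    + gaussian_kernel(Y, Y, sigma).mean()
                    - 2 * gaussian_kernel(X, Y, sigma).mean())

        rng = np.random.RandomState(0)
        real = rng.randn(256, 2)
        fake = rng.randn(256, 2) + 1.0          # shifted distribution
        print(mmd2(real, fake), mmd2(real, rng.randn(256, 2)))  # large vs. small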

    Towards Interrogating Discriminative Machine Learning Models

    Wenbo Guo, Kaixuan Zhang, Lin Lin, Sui Huang, Xinyu Xing
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    It is oftentimes impossible to understand how machine learning models reach a
    decision. While recent research has proposed various technical approaches to
    provide some clues as to how a learning model makes individual decisions, they
    cannot provide users with the ability to inspect a learning model as a complete
    entity. In this work, we propose a new technical approach that augments a
    Bayesian regression mixture model with multiple elastic nets. Using the
    enhanced mixture model, we extract explanations for a target model through
    global approximation. To demonstrate the utility of our approach, we evaluate
    it on different learning models covering the tasks of text mining and image
    recognition. Our results indicate that the proposed approach not only
    outperforms the state-of-the-art technique in explaining individual decisions
    but also provides users with an ability to discover the vulnerabilities of a
    learning model.

    Data-driven Random Fourier Features using Stein Effect

    Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabas Poczos
    Comments: To appear in International Joint Conference on Artificial Intelligence (IJCAI), 2017
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Large-scale kernel approximation is an important problem in machine learning
    research. Approaches using random Fourier features have become increasingly
    popular [Rahimi and Recht, 2007], where kernel approximation is treated as
    empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC)
    integration [Yang et al., 2014]. A limitation of the current approaches is that
    all the features receive an equal weight summing to 1. In this paper, we
    propose a novel shrinkage estimator based on the “Stein effect”, which provides a
    data-driven weighting strategy for random features and enjoys theoretical
    justifications in terms of lowering the empirical risk. We further present an
    efficient randomized algorithm for large-scale applications of the proposed
    method. Our empirical results on six benchmark data sets demonstrate the
    advantageous performance of this approach over representative baselines in both
    kernel approximation and supervised learning tasks.
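
    As context, a minimal sketch of the standard random Fourier feature
    construction that this work builds on is shown below: every feature receives
    the same weight 1/D, which is precisely the limitation that the proposed
    Stein-type shrinkage weights address (the shrinkage estimator itself is not
    reproduced here).

        import numpy as np

        def rff_features(X, D=2000, gamma=0.5, seed=0):
            # Approximates k(x, y) = exp(-gamma * ||x - y||^2) via phi(x).dot(phi(y)),
            # with each of the D random features weighted equally.
            rng = np.random.RandomState(seed)
            W = rng.randn(X.shape[1], D) * np.sqrt(2 * gamma)  # spectral samples
            b = rng.uniform(0, 2 * np.pi, D)
            return np.sqrt(2.0 / D) * np.cos(X @ W + b)

        rng = np.random.RandomState(1)
        X = rng.randn(5, 3)
        phi = rff_features(X)
        approx = phi @ phi.T
        exact = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, -1))
        print(np.abs(approx - exact).max())     # small approximation error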

    Interpreting Blackbox Models via Model Extraction

    Osbert Bastani, Carolyn Kim, Hamsa Bastani
    Subjects: Learning (cs.LG)

    Interpretability has become an important issue as machine learning is
    increasingly used to inform consequential decisions. We propose an approach for
    interpreting a blackbox model by extracting a decision tree that approximates
    the model. Our model extraction algorithm avoids overfitting by leveraging
    blackbox model access to actively sample new training points. We prove that as
    the number of samples goes to infinity, the decision tree learned using our
    algorithm converges to the exact greedy decision tree. In our evaluation, we
    use our algorithm to interpret random forests and neural nets trained on
    several datasets from the UCI Machine Learning Repository, as well as control
    policies learned for three classical reinforcement learning problems. We show
    that our algorithm improves over a baseline based on CART on every problem
    instance. Furthermore, we show how an interpretation generated by our approach
    can be used to understand and debug these models.
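
    A hedged sketch of the general recipe follows, using scikit-learn and a naive
    Gaussian sampler in place of the paper's active sampling strategy: new inputs
    are labeled by the blackbox and a small decision tree is fit to them, after
    which fidelity to the blackbox can be measured.

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)
        blackbox = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

        # Sample synthetic inputs from a Gaussian fit to the data; label them with
        # the blackbox (a simple stand-in for the paper's active sampling).
        rng = np.random.RandomState(0)
        X_new = rng.multivariate_normal(X.mean(0), np.cov(X.T), size=5000)
        y_new = blackbox.predict(X_new)

        surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_new, y_new)
        fidelity = (surrogate.predict(X) == blackbox.predict(X)).mean()
        print("fidelity to the blackbox on the original data:", fidelity)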

    Selective Classification for Deep Neural Networks

    Yonatan Geifman, Ran El-Yaniv
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    Selective classification techniques (also known as reject option) have not
    yet been considered in the context of deep neural networks (DNNs). These
    techniques can potentially significantly improve DNN prediction performance by
    trading off coverage. In this paper we propose a method to construct a
    selective classifier given a trained neural network. Our method allows a user
    to set a desired risk level. At test time, the classifier rejects instances as
    needed, to grant the desired risk (with high probability). Empirical results
    over CIFAR and ImageNet convincingly demonstrate the viability of our method,
    which opens up possibilities to operate DNNs in mission-critical applications.
    For example, using our method an unprecedented 2% error in top-5 ImageNet
    classification can be guaranteed with probability 99.9%, and almost 60% test
    coverage.
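
    As a rough illustration of the coverage/risk trade-off (not the authors'
    method, which additionally provides high-probability guarantees), the sketch
    below thresholds a confidence score, e.g. the maximum softmax probability, and
    calibrates the threshold on held-out data to meet a target risk. The synthetic
    scores are stand-ins for a real network's outputs.

        import numpy as np

        def calibrate_threshold(confidence, correct, target_risk):
            # Return the smallest threshold whose empirical risk on the accepted
            # (non-rejected) examples is at most target_risk, plus its coverage.
            for t in np.sort(confidence):
                accepted = confidence >= t
                risk = 1.0 - correct[accepted].mean()
                if risk <= target_risk:
                    return t, accepted.mean()
            return 1.0, 0.0

        rng = np.random.RandomState(0)
        confidence = rng.uniform(0.5, 1.0, 10000)        # stand-in for max softmax
        correct = rng.uniform(size=10000) < confidence   # confident => usually right
        t, coverage = calibrate_threshold(confidence, correct, target_risk=0.05)
        print("threshold:", t, "coverage:", coverage)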

    The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression

    Ran El-Yaniv, Yonatan Geifman, Yair Weiner
    Subjects: Learning (cs.LG)

    We introduce the Prediction Advantage (PA), a novel performance measure for
    prediction functions under any loss function (e.g., classification or
    regression). The PA is defined as the performance advantage relative to the
    Bayesian risk restricted to knowing only the distribution of the labels. We
    derive the PA for well-known loss functions, including 0/1 loss, cross-entropy
    loss, absolute loss, and squared loss. In the latter case, the PA is identical
    to the well-known R-squared measure, widely used in statistics. The use of the
    PA ensures meaningful quantification of prediction performance, which is not
    guaranteed, for example, when dealing with noisy imbalanced classification
    problems. We argue that among several known alternative performance measures,
    PA is the best (and only) quantity ensuring meaningfulness for all noise and
    imbalance levels.
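
    Under one natural reading of this definition for squared loss, the PA is the
    relative improvement over the best constant (label-marginal) predictor, i.e.
    the label mean, which is exactly why it coincides with R-squared as stated
    above. The sketch below illustrates that reading and is not code from the
    paper.

        import numpy as np

        def prediction_advantage_squared_loss(y_true, y_pred):
            model_risk = np.mean((y_true - y_pred) ** 2)
            # Bayes risk when only the label distribution is known: predict the mean.
            marginal_risk = np.mean((y_true - y_true.mean()) ** 2)
            return 1.0 - model_risk / marginal_risk      # identical to R-squared

        rng = np.random.RandomState(0)
        y = rng.randn(1000)
        good_pred = y + 0.1 * rng.randn(1000)
        print(prediction_advantage_squared_loss(y, good_pred))        # close to 1
        print(prediction_advantage_squared_loss(y, np.zeros(1000)))   # close to 0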

    Clinical Intervention Prediction and Understanding using Deep Networks

    Harini Suresh, Nathan Hunt, Alistair Johnson, Leo Anthony Celi, Peter Szolovits, Marzyeh Ghassemi
    Subjects: Learning (cs.LG)

    Real-time prediction of clinical interventions remains a challenge within
    intensive care units (ICUs). This task is complicated by data sources that are
    noisy, sparse, heterogeneous and outcomes that are imbalanced. In this paper,
    we integrate data from all available ICU sources (vitals, labs, notes,
    demographics) and focus on learning rich representations of this data to
    predict onset and weaning of multiple invasive interventions. In particular, we
    compare both long short-term memory networks (LSTM) and convolutional neural
    networks (CNN) for prediction of five intervention tasks: invasive ventilation,
    non-invasive ventilation, vasopressors, colloid boluses, and crystalloid
    boluses. Our predictions are done in a forward-facing manner to enable
    “real-time” performance, and predictions are made with a six hour gap time to
    support clinically actionable planning. We achieve state-of-the-art results on
    our predictive tasks using deep architectures. We explore the use of feature
    occlusion to interpret LSTM models, and compare this to the interpretability
    gained from examining inputs that maximally activate CNN outputs. We show that
    our models are able to significantly outperform baselines in intervention
    prediction, and provide insight into model learning, which is crucial for the
    adoption of such models in practice.

    Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit

    Brendan Maginnis, Pierre H. Richemond
    Subjects: Learning (cs.LG)

    Recurrent Neural Networks architectures excel at processing sequences by
    modelling dependencies over different timescales. The recently introduced
    Recurrent Weighted Average (RWA) unit captures long term dependencies far
    better than an LSTM on several challenging tasks. The RWA achieves this by
    applying attention to each input and computing a weighted average over the full
    history of its computations. Unfortunately, the RWA cannot change the attention
    it has assigned to previous timesteps, and so struggles with carrying out
    consecutive tasks or tasks with changing requirements. We present the Recurrent
    Discounted Attention (RDA) unit that builds on the RWA by additionally allowing
    the discounting of the past.

    We empirically compare our model to RWA, LSTM and GRU units on several
    challenging tasks. On tasks with a single output the RWA, RDA and GRU units
    learn much quicker than the LSTM and with better performance. On the multiple
    sequence copy task our RDA unit learns the task three times as quickly as the
    LSTM or GRU units while the RWA fails to learn at all. On the Wikipedia
    character prediction task the LSTM performs best but is followed closely by our
    RDA unit. Overall our RDA unit performs well and is sample efficient on a large
    variety of sequence tasks.
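
    A minimal sketch of the recurrences described above follows, with the learned
    gates replaced by plain inputs: the RWA keeps running attention-weighted
    numerator/denominator sums over the whole history, and the RDA multiplies both
    by a per-step discount so that past attention can be forgotten (a discount of
    1 recovers the RWA). Shapes and values are illustrative only.

        import numpy as np

        def rda_step(n_prev, d_prev, z_t, a_t, discount_t):
            # n, d: running numerator/denominator of the attention-weighted average;
            # z_t: candidate value, a_t: attention score, discount_t in (0, 1].
            n_t = discount_t * n_prev + z_t * np.exp(a_t)
            d_t = discount_t * d_prev + np.exp(a_t)
            h_t = np.tanh(n_t / d_t)     # hidden state: squashed weighted average
            return n_t, d_t, h_t

        rng = np.random.RandomState(0)
        n, d = np.zeros(4), np.full(4, 1e-8)
        for _ in range(10):
            z, a = rng.randn(4), rng.randn(4)
            n, d, h = rda_step(n, d, z, a, discount_t=0.9)   # 0.9 discounts the past
        print(h)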

    Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

    Matthias Hein, Maksym Andriushchenko
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Recent work has shown that state-of-the-art classifiers are quite brittle, in
    the sense that a small adversarial change to an input that was originally
    classified correctly with high confidence leads to a wrong classification, again
    with high confidence. This raises concerns that such classifiers are vulnerable to
    attacks and calls into question their usage in safety-critical systems. We show
    in this paper for the first time formal guarantees on the robustness of a
    classifier by giving instance-specific lower bounds on the norm of the input
    manipulation required to change the classifier decision. Based on this analysis
    we propose the Cross-Lipschitz regularization functional. We show that using
    this form of regularization in kernel methods and in neural networks improves
    the robustness of the classifier without any loss in prediction performance.

    Dense Transformer Networks

    Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The key idea of current deep learning methods for dense prediction is to
    apply a model on a regular patch centered on each pixel to make pixel-wise
    predictions. These methods are limited in the sense that the patches are
    determined by network architecture instead of learned from data. In this work,
    we propose the dense transformer networks, which can learn the shapes and sizes
    of patches from data. The dense transformer networks employ an encoder-decoder
    architecture, and a pair of dense transformer modules are inserted into each of
    the encoder and decoder paths. The novelty of this work is that we provide
    technical solutions for learning the shapes and sizes of patches from data and
    efficiently restoring the spatial correspondence required for dense prediction.
    The proposed dense transformer modules are differentiable, thus the entire
    network can be trained. We apply the proposed networks on natural and
    biological image segmentation tasks and show superior performance is achieved
    in comparison to baseline methods.

    Anti-spoofing Methods for Automatic Speaker Verification System

    Galina Lavrentyeva, Sergey Novoselov, Konstantin Simonchik
    Comments: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 661
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    Growing interest in automatic speaker verification (ASV) systems has led to
    significant quality improvement of spoofing attacks on them. Many research
    works confirm that despite the low equal error rate (EER), ASV systems are
    still vulnerable to spoofing attacks. In this work we overview different
    acoustic feature spaces and classifiers to determine reliable and robust
    countermeasures against spoofing attacks. We compared several spoofing
    detection systems, presented so far, on the development and evaluation
    datasets of the Automatic Speaker Verification Spoofing and Countermeasures
    (ASVspoof) Challenge 2015. Experimental results presented in this paper
    demonstrate that the use of combined magnitude and phase information provides
    a substantial input into the efficiency of the spoofing detection systems.
    Also, wavelet-based features show impressive results in terms of equal error
    rate. In our overview we compare spoofing detection performance for systems
    based on different classifiers. Comparison results demonstrate that the
    linear SVM classifier outperforms the conventional GMM approach. However,
    many researchers, inspired by the great success of deep neural network (DNN)
    approaches in automatic speech recognition, have applied DNNs to the spoofing
    detection task and obtained quite low EER for known and unknown types of
    spoofing attacks.

    Audio-replay attack detection countermeasures

    Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin
    Comments: 11 pages, 3 figures, accepted for Specom 2017
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    This paper presents the Speech Technology Center (STC) replay attack
    detection systems proposed for Automatic Speaker Verification Spoofing and
    Countermeasures Challenge 2017. In this study we focused on comparison of
    different spoofing detection approaches. These were GMM-based methods,
    high-level feature extraction with a simple classifier, and deep learning frameworks.
    Experiments performed on the development and evaluation parts of the challenge
    dataset demonstrated stable efficiency of deep learning approaches in case of
    changing acoustic conditions. At the same time, an SVM classifier with
    high-level features contributed substantially to the efficiency of the
    resulting STC systems, according to the fusion results.

    Joint Distribution Optimal Transportation for Domain Adaptation

    Nicolas Courty, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    This paper deals with the unsupervised domain adaptation problem, where one
    wants to estimate a prediction function (f) in a given target domain without
    any labeled sample by exploiting the knowledge available from a source domain
    where labels are known. Our work makes the following assumption: there exists a
    non-linear transformation between the joint feature/label space distributions
    of the two domains (mathcal{P}_s) and (mathcal{P}_t). We propose a solution to
    this problem with optimal transport, which allows us to recover an estimated target
    (mathcal{P}^f_t=(X,f(X))) by optimizing simultaneously the optimal coupling
    and (f). We show that our method corresponds to the minimization of a bound on
    the target error, and provide an efficient algorithmic solution, for which
    convergence is proved. The versatility of our approach, both in terms of class
    of hypothesis or loss functions is demonstrated with real world classification
    and regression problems, for which we reach or surpass state-of-the-art
    results.

    Learning with Average Top-k Loss

    Yanbo Fan, Siwei Lyu, Yiming Ying, Bao-Gang Hu
    Comments: 18 pages
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this work, we introduce the average top-(k) (AT(_k)) loss as a new
    ensemble loss for supervised learning, which is the average over the (k)
    largest individual losses over a training dataset. We show that the AT(_k) loss
    is a natural generalization of the two widely used ensemble losses, namely the
    average loss and the maximum loss, but can combine their advantages and
    mitigate their drawbacks to better adapt to different data distributions.
    Furthermore, it remains a convex function over all individual losses, which can
    lead to convex optimization problems that can be solved effectively with
    conventional gradient-based methods. We provide an intuitive interpretation of
    the AT(_k) loss based on its equivalent effect on the continuous individual
    loss functions, suggesting that it can reduce the penalty on correctly
    classified data. We further give a learning theory analysis of MAT(_k) learning
    on the classification calibration of the AT(_k) loss and the error bounds of
    AT(_k)-SVM. We demonstrate the applicability of minimum average top-(k)
    learning for binary classification and regression using synthetic and real
    datasets.
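
    A minimal sketch of the ensemble loss itself: the AT(_k) loss is the mean of
    the (k) largest per-example losses, so (k=1) recovers the maximum loss and
    (k=n) recovers the usual average loss.

        import numpy as np

        def average_top_k_loss(individual_losses, k):
            losses = np.sort(np.asarray(individual_losses))[::-1]   # descending
            return losses[:k].mean()

        losses = [0.1, 2.0, 0.3, 0.05, 1.2]
        print(average_top_k_loss(losses, k=1))   # 2.0   (maximum loss)
        print(average_top_k_loss(losses, k=2))   # 1.6   (mean of two largest)
        print(average_top_k_loss(losses, k=5))   # 0.73  (average loss)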

    Causal Effect Inference with Deep Latent-Variable Models

    Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Learning individual-level causal effects from observational data, such as
    inferring the most effective medication for a specific patient, is a problem of
    growing importance for policy makers. The most important aspect of inferring
    causal effects from observational data is the handling of confounders, factors
    that affect both an intervention and its outcome. A carefully designed
    observational study attempts to measure all important confounders. However,
    even if one does not have direct access to all confounders, there may exist
    noisy and uncertain measurement of proxies for confounders. We build on recent
    advances in latent variable modelling to simultaneously estimate the unknown
    latent space summarizing the confounders and the causal effect. Our method is
    based on Variational Autoencoders (VAE) which follow the causal structure of
    inference with proxies. We show our method is significantly more robust than
    existing methods, and matches the state-of-the-art on previous benchmarks
    focused on individual treatment effects.

    Train longer, generalize better: closing the generalization gap in large batch training of neural networks

    Elad Hoffer, Itay Hubara, Daniel Soudry
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Background: Deep learning models are typically trained using stochastic
    gradient descent or one of its variants. These methods update the weights using
    their gradient, estimated from a small fraction of the training data. It has
    been observed that when using large batch sizes there is a persistent
    degradation in generalization performance – known as the “generalization gap”
    phenomenon. Identifying the origin of this gap and closing it had remained an
    open problem.

    Contributions: We examine the initial high learning rate training phase. We
    find that the weight distance from its initialization grows logarithmically
    with the number of weight updates. We therefore propose a “random walk on
    random landscape” statistical model which is known to exhibit similar
    “ultra-slow” diffusion behavior. Following this hypothesis we conducted
    experiments to show empirically that the “generalization gap” stems from the
    relatively small number of updates rather than the batch size, and can be
    completely eliminated by adapting the training regime used. We further
    investigate different techniques to train models in the large-batch regime and
    present a novel algorithm named “Ghost Batch Normalization” which enables
    significant decrease in the generalization gap without increasing the number of
    updates. To validate our findings we conduct several additional experiments on
    MNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we reassess common practices
    and beliefs concerning training of deep models and suggest they may not be
    optimal to achieve good generalization.
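
    A hedged sketch of the “Ghost Batch Normalization” idea described above: each
    small virtual (“ghost”) batch inside a large batch is normalized with its own
    statistics rather than those of the full batch. Running averages and the
    learned scale/shift parameters of a complete batch-norm layer are omitted, and
    the ghost size is an arbitrary choice.

        import numpy as np

        def ghost_batch_norm(x, ghost_size=128, eps=1e-5):
            # x: (batch, features); the batch is assumed divisible by ghost_size.
            out = np.empty_like(x)
            for start in range(0, x.shape[0], ghost_size):
                g = x[start:start + ghost_size]
                mu, var = g.mean(0), g.var(0)       # per-ghost-batch statistics
                out[start:start + ghost_size] = (g - mu) / np.sqrt(var + eps)
            return out

        x = np.random.RandomState(0).randn(4096, 8) * 3 + 1
        print(ghost_batch_norm(x).std(0))           # roughly unit variance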

    Non-Stationary Spectral Kernels

    Sami Remes, Markus Heinonen, Samuel Kaski
    Comments: 16 pages, 5 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose non-stationary spectral kernels for Gaussian process regression.
    We propose to model the spectral density of a non-stationary kernel function as
    a mixture of input-dependent Gaussian process frequency density surfaces. We
    solve the generalised Fourier transform with such a model, and present a family
    of non-stationary and non-monotonic kernels that can learn input-dependent and
    potentially long-range, non-monotonic covariances between inputs. We derive
    efficient inference using model whitening and marginalized posterior, and show
    with case studies that these kernels are necessary when modelling even rather
    simple time series, image or geospatial data with non-stationary
    characteristics.

    Continual Learning with Deep Generative Replay

    Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim
    Comments: Submitted to NIPS 2017
    Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Attempts to train a comprehensive artificial intelligence capable of solving
    multiple tasks have been impeded by a chronic problem called catastrophic
    forgetting. Although simply replaying all previous data alleviates the problem,
    it requires a large memory and, even worse, is often infeasible in real-world
    applications where access to past data is limited. Inspired by the generative
    nature of the hippocampus as a short-term memory system in the primate brain,
    we propose Deep Generative Replay, a novel framework with a
    cooperative dual model architecture consisting of a deep generative model
    (“generator”) and a task solving model (“solver”). With only these two models,
    training data for previous tasks can easily be sampled and interleaved with
    those for a new task. We test our methods in several sequential learning
    settings involving image classification tasks.

    Bayesian Compression for Deep Learning

    Christos Louizos, Karen Ullrich, Max Welling
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Compression and computational efficiency in deep learning have become a
    problem of great significance. In this work, we argue that the most principled
    and effective way to attack this problem is by taking a Bayesian point of view,
    where through sparsity inducing priors we prune large parts of the network. We
    introduce two novelties in this paper: 1) we use hierarchical priors to prune
    nodes instead of individual weights, and 2) we use the posterior uncertainties
    to determine the optimal fixed-point precision to encode the weights. Both
    factors significantly contribute to achieving the state of the art in terms of
    compression rates, while still staying competitive with methods designed to
    optimize for speed or energy efficiency.

    Towards Understanding the Invertibility of Convolutional Neural Networks

    Anna C. Gilbert, Yi Zhang, Kibok Lee, Yuting Zhang, Honglak Lee
    Journal-ref: IJCAI 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Several recent works have empirically observed that Convolutional Neural Nets
    (CNNs) are (approximately) invertible. To understand this approximate
    invertibility phenomenon and how to leverage it more effectively, we focus on a
    theoretical explanation and develop a mathematical model of sparse signal
    recovery that is consistent with CNNs with random weights. We give an exact
    connection between a particular model of model-based compressive sensing (and its
    recovery algorithms) and random-weight CNNs. We show empirically that several
    learned networks are consistent with our mathematical analysis and then
    demonstrate that with such a simple theoretical framework, we can obtain
    reasonable reconstruction results on real images. We also discuss gaps
    between our model assumptions and the CNN trained for classification in
    practical scenarios.

    Nonparametric Preference Completion

    Julian Katz-Samuels, Clayton Scott
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider the task of collaborative preference completion: given a pool of
    items, a pool of users and a partially observed item-user rating matrix, the
    goal is to recover the personalized ranking of each user over all of the items.
    Our approach is nonparametric: we assume that each item (i) and each user (u)
    have unobserved features (x_i) and (y_u), and that the associated rating is
    given by (g_u(f(x_i,y_u))) where (f) is Lipschitz and (g_u) is a monotonic
    transformation that depends on the user. We propose a (k)-nearest
    neighbors-like algorithm and prove that it is consistent. To the best of our
    knowledge, this is the first consistency result for the collaborative
    preference completion problem in a nonparametric setting. Finally, we conduct
    experiments on the Netflix and Movielens datasets that suggest that our
    algorithm has some advantages over existing neighborhood-based methods and that
    its performance is comparable to some state-of-the art matrix factorization
    methods.

    Multi-Task Learning for Contextual Bandits

    Aniket Anand Deshmukh, Urun Dogan, Clayton Scott
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Contextual bandits are a form of multi-armed bandit in which the agent has
    access to predictive side information (known as the context) for each arm at
    each time step, and have been used to model personalized news recommendation,
    ad placement, and other applications. In this work, we propose a multi-task
    learning framework for contextual bandit problems. Like multi-task learning in
    the batch setting, the goal is to leverage similarities in contexts for
    different arms so as to improve the agent’s ability to predict rewards from
    contexts. We propose an upper confidence bound-based multi-task learning
    algorithm for contextual bandits, establish a corresponding regret bound, and
    interpret this bound to quantify the advantages of learning in the presence of
    high task (arm) similarity. We also describe an effective scheme for estimating
    task similarity from data, and demonstrate our algorithm’s performance on
    several data sets.

    Hashing as Tie-Aware Learning to Rank

    Kun He, Fatih Cakir, Sarah A. Bargal, Stan Sclaroff
    Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We formulate the problem of supervised hashing, or learning binary embeddings
    of data, as a learning to rank problem. Specifically, we optimize two common
    ranking-based evaluation metrics, Average Precision (AP) and Normalized
    Discounted Cumulative Gain (NDCG). Observing that ranking with the discrete
    Hamming distance naturally results in ties, we propose to use tie-aware
    versions of ranking metrics in both the evaluation and the learning of
    supervised hashing. For AP and NDCG, we derive continuous relaxations of their
    tie-aware versions, and optimize them using stochastic gradient ascent with
    deep neural networks. Our results establish the new state-of-the-art for
    tie-aware AP and NDCG on common hashing benchmarks.
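
    One way to read tie-aware Average Precision is as the expectation of ordinary
    AP over random orderings of the items that are tied in Hamming distance; the
    sketch below estimates that expectation by Monte Carlo, whereas the paper works
    with closed-form and relaxed tie-aware expressions suitable for gradient-based
    learning.

        import numpy as np

        def average_precision(relevance):
            relevance = np.asarray(relevance, dtype=float)
            hits = np.cumsum(relevance)
            ranks = np.flatnonzero(relevance) + 1
            return (hits[relevance == 1] / ranks).mean() if relevance.sum() else 0.0

        def tie_aware_ap(distances, relevance, n_samples=2000, seed=0):
            rng = np.random.RandomState(seed)
            relevance = np.asarray(relevance)
            aps = []
            for _ in range(n_samples):
                # Sort by distance, breaking ties uniformly at random.
                order = np.lexsort((rng.permutation(len(distances)), distances))
                aps.append(average_precision(relevance[order]))
            return float(np.mean(aps))

        dist = np.array([0, 1, 1, 1, 2])     # integer Hamming distances with ties
        rel = np.array([1, 0, 1, 0, 1])
        print(tie_aware_ap(dist, rel))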

    Grounded Recurrent Neural Networks

    Ankit Vani, Yacine Jernite, David Sontag
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In this work, we present the Grounded Recurrent Neural Network (GRNN), a
    recurrent neural network architecture for multi-label prediction which
    explicitly ties labels to specific dimensions of the recurrent hidden state (we
    call this process “grounding”). The approach is particularly well-suited for
    extracting large numbers of concepts from text. We apply the new model to
    address an important problem in healthcare of understanding what medical
    concepts are discussed in clinical text. Using a publicly available dataset
    derived from Intensive Care Units, we learn to label a patient’s diagnoses and
    procedures from their discharge summary. Our evaluation shows a clear advantage
    to using our proposed architecture over a variety of strong baselines.

    Safe Model-based Reinforcement Learning with Stability Guarantees

    Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

    Reinforcement learning is a powerful paradigm for learning optimal policies
    from experimental data. However, to find optimal policies, most reinforcement
    learning algorithms explore all possible actions, which may be harmful for
    real-world systems. As a consequence, learning algorithms are rarely applied on
    safety-critical systems in the real world. In this paper, we present a learning
    algorithm that explicitly considers safety in terms of stability guarantees.
    Specifically, we extend control theoretic results on Lyapunov stability
    verification and show how to use statistical models of the dynamics to obtain
    high-performance control policies with provable stability certificates.
    Moreover, under additional regularity assumptions in terms of a Gaussian
    process prior, we prove that one can effectively and safely collect data in
    order to learn about the dynamics and thus both improve control performance and
    expand the safe region of the state space. In our experiments, we show how the
    resulting algorithm can safely optimize a neural network policy on a simulated
    inverted pendulum, without the pendulum ever falling down.

    Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification

    Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
    Comments: MICCAI 2017 Camera Ready
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Mammogram classification is directly related to computer-aided diagnosis of
    breast cancer. Traditional methods rely on regions of interest (ROIs) which
    require great effort to annotate. Inspired by the success of using deep
    convolutional features for natural image analysis and multi-instance learning
    (MIL) for labeling a set of instances/patches, we propose end-to-end trained
    deep multi-instance networks for mass classification based on the whole mammogram
    without the aforementioned ROIs. We explore three different schemes to
    construct deep multi-instance networks for whole mammogram classification.
    Experimental results on the INbreast dataset demonstrate the robustness of
    the proposed networks compared to previous work using segmentation and detection
    annotations.

    Statistical Convergence Analysis of Gradient EM on General Gaussian Mixture Models

    Bowei Yan, Mingzhang Yin, Purnamrita Sarkar
    Comments: 31 pages
    Subjects: Statistics Theory (math.ST); Learning (cs.LG)

    In this paper, we study convergence properties of the gradient
    Expectation-Maximization algorithm (Lange, 1995) for Gaussian Mixture Models
    with a general number of clusters and mixing coefficients. We derive the
    convergence rate as a function of the mixing coefficients, the minimum and
    maximum pairwise distances between the true centers, the dimensionality, and
    the number of components, and obtain a near-optimal local contraction radius. While
    there have been some recent notable works that derive local convergence rates
    for EM in the two equal mixture symmetric GMM, in the more general case, the
    derivations need structurally different and non-trivial arguments. We use
    recent tools from learning theory and empirical processes to achieve our
    theoretical results.

    An effective algorithm for hyperparameter optimization of neural networks

    Gonzalo Diaz, Achille Fokoue, Giacomo Nannicini, Horst Samulowitz
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A major challenge in designing neural network (NN) systems is to determine
    the best structure and parameters for the network given the data for the
    machine learning problem at hand. Examples of parameters are the number of
    layers and nodes, the learning rates, and the dropout rates. Typically, these
    parameters are chosen based on heuristic rules and manually fine-tuned, which
    may be very time-consuming, because evaluating the performance of a single
    parametrization of the NN may require several hours. This paper addresses the
    problem of choosing appropriate parameters for the NN by formulating it as a
    box-constrained mathematical optimization problem, and applying a
    derivative-free optimization tool that automatically and effectively searches
    the parameter space. The optimization tool employs a radial basis function
    model of the objective function (the prediction accuracy of the NN) to
    accelerate the discovery of configurations yielding high accuracy. Candidate
    configurations explored by the algorithm are trained to a small number of
    epochs, and only the most promising candidates receive full training. The
    performance of the proposed methodology is assessed on benchmark sets and in
    the context of predicting drug-drug interactions, showing promising results.
    The optimization tool used in this paper is open-source.

    Bayesian Pool-based Active Learning with Abstention Feedbacks

    Cuong V. Nguyen, Lam Si Tung Ho, Huan Xu, Vu Dinh, Binh Nguyen
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We study pool-based active learning with abstention feedbacks, where a
    labeler can abstain from labeling a queried example. We take a Bayesian
    approach to the problem and propose a general framework that learns both the
    target classification problem and the unknown abstention pattern at the same
    time. As specific instances of the framework, we develop two useful greedy
    algorithms with theoretical guarantees: they respectively achieve the
    ((1-1/e)) factor approximation of the optimal expected or worst-case
    value of a useful utility function. Our experiments show the algorithms perform
    well in various practical scenarios.


    Information Theory

    Matrix-product structure of repeated-root constacyclic codes over finite fields

    Yuan Cao, Yonglin Cao
    Subjects: Information Theory (cs.IT)

    For any prime number (p), positive integers (m, k, n) satisfying
    (gcd(p,n)=1) and (lambda_0 in mathbb{F}_{p^m}^{times}), we prove that any
    (lambda_0^{p^k})-constacyclic code of length (p^kn) over the finite field
    (mathbb{F}_{p^m}) is monomially equivalent to a matrix-product code of a
    nested sequence of (p^k) (lambda_0)-constacyclic codes with length (n) over
    (mathbb{F}_{p^m}).

    STFT with Adaptive Window Width Based on the Chirp Rate

    Soo-Chang Pei, Shih-Gu Huang
    Comments: Accepted by IEEE Transactions on Signal Processing
    Subjects: Information Theory (cs.IT)

    An adaptive time-frequency representation (TFR) with higher energy
    concentration usually requires higher complexity. Recently, a low-complexity
    adaptive short-time Fourier transform (ASTFT) based on the chirp rate has been
    proposed. To enhance the performance, this method is substantially modified in
    this paper: i) because the wavelet transform used for instantaneous frequency
    (IF) estimation is not signal-dependent, a low-complexity ASTFT based on a
    novel concentration measure is addressed; ii) in order to increase robustness
    to IF estimation error, the principal component analysis (PCA) replaces the
    difference operator for calculating the chirp rate; and iii) a more robust
    Gaussian kernel with time-frequency-varying window width is proposed.
    Simulation results show that our method has higher energy concentration than
    the other ASTFTs, especially for multicomponent signals and nonlinear FM
    signals. Also, for IF estimation, our method is superior to many other adaptive
    TFRs in low signal-to-noise ratio (SNR) environments.

    V2X Meets NOMA: Non-Orthogonal Multiple Access for 5G Enabled Vehicular Networks

    Boya Di, Lingyang Song, Yonghui Li, Zhu Han
    Comments: Accepted by IEEE Wireless Communications Magazine
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    Benefiting from the widely deployed infrastructure, the LTE network has
    recently been considered as a promising candidate to support the
    vehicle-to-everything (V2X) services. However, with a massive number of devices
    accessing the V2X network in the future, the conventional OFDM-based LTE
    network faces congestion issues due to the low efficiency of orthogonal
    access, resulting in significant access delay and posing a great challenge
    especially to safety-critical applications. The non-orthogonal multiple access
    (NOMA) technique has been well recognized as an effective solution for the
    future 5G cellular networks to provide broadband communications and massive
    connectivity. In this article, we investigate the applicability of NOMA in
    supporting cellular V2X services to achieve low latency and high reliability.
    Starting with a basic V2X unicast system, a novel NOMA-based scheme is proposed
    to tackle the technical hurdles in designing spectrally efficient scheduling
    and resource allocation schemes in the ultra-dense topology. We then extend it
    to a more general V2X broadcasting system. Other NOMA-based extended V2X
    applications and some open issues are also discussed.
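
    For readers unfamiliar with power-domain NOMA, the textbook two-user downlink
    example below shows the superposition-plus-SIC principle the article builds on;
    the channel gains, power split, and function names are illustrative, and this is
    not the article's V2X scheme.

    ```python
    # Textbook two-user downlink NOMA sketch (not the article's scheme): the base
    # station superimposes both signals; the cell-edge ("weak") user gets the larger
    # power share and treats the other signal as noise, while the cell-center
    # ("strong") user removes the weak user's signal by SIC before decoding its own.
    import math

    def noma_rates(g_strong, g_weak, p_total, alpha, noise=1.0):
        """alpha: fraction of transmit power allocated to the weak user (alpha > 0.5)."""
        p_weak, p_strong = alpha * p_total, (1 - alpha) * p_total
        r_weak = math.log2(1 + g_weak * p_weak / (g_weak * p_strong + noise))
        r_strong = math.log2(1 + g_strong * p_strong / noise)   # after SIC
        return r_strong, r_weak

    # Toy usage: a 10 dB gain gap between the two users, 80% of power to the weak user.
    print(noma_rates(g_strong=10.0, g_weak=1.0, p_total=10.0, alpha=0.8))
    ```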

    On the Success Probability of Decoding (Partial) Unit Memory Codes

    Sven Puchinger, Sven Müelich, Martin Bossert
    Comments: 9 pages, extended version of a paper submitted to the International Workshop on Optimal Codes and Related Topics, 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we derive analytic expressions for the success probability of
    decoding (Partial) Unit Memory codes in memoryless channels. An application of
    this result is that these codes can outperform individual block codes in
    certain channels.

    Flexible Cache-Aided Networks with Backhauling

    Italo Atzeni, Marco Maso, Imène Ghamnia, Ejder Baştuğ, Mérouane Debbah
    Comments: 5 pages, 5 figures, to be presented at 18th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC’2017), Sapporo, Japan, 2017
    Subjects: Information Theory (cs.IT)

    Caching at the edge is a promising technique to cope with the increasing data
    demand in wireless networks. This paper analyzes the performance of cellular
    networks consisting of a tier of macro-cell wireless backhaul nodes overlaid with
    a tier of cache-aided small cells. We consider both static and dynamic
    association policies for content delivery to the user terminals and analyze
    their performance. In particular, we derive closed-form expressions for the
    area spectral efficiency and the energy efficiency, which are used to optimize
    relevant design parameters such as the density of cache-aided small cells and
    the storage size. By means of this approach, we are able to draw useful design
    insights for the deployment of high-performing cache-aided tiered networks.

    Joint Rate Control and Power Allocation for Non-Orthogonal Multiple Access Systems

    Wei Bao, He Chen, Yonghui Li, Branka Vucetic
    Comments: Accepted to appear in IEEE Journal on Selected Areas in Communications (JSAC)
    Subjects: Information Theory (cs.IT)

    This paper investigates the optimal resource allocation of a downlink
    non-orthogonal multiple access (NOMA) system consisting of one base station and
    multiple users. Unlike existing short-term NOMA designs that focused on the
    resource allocation for only the current transmission timeslot, we aim to
    maximize a long-term network utility by jointly optimizing the data rate
    control at the network layer and the power allocation among multiple users at
    the physical layer, subject to practical constraints on both the short-term and
    long-term power consumptions. To solve this problem, we leverage the
    recently-developed Lyapunov optimization framework to convert the original
    long-term optimization problem into a series of online rate control and power
    allocation problems in each timeslot. The power allocation problem, however, is
    shown to be non-convex in nature and thus cannot be solved by standard methods.
    Nevertheless, by exploiting two structural properties of the optimal solution,
    we develop a dynamic-programming-based power allocation algorithm that obtains a
    globally optimal solution with polynomial computational complexity.
    Extensive simulation results are provided to evaluate the performance of the
    proposed joint rate control and power allocation framework for NOMA systems,
    which demonstrate that the proposed NOMA design can significantly outperform
    multiple benchmark schemes, including orthogonal multiple access (OMA) schemes
    with optimal power allocation and NOMA schemes with non-optimal power
    allocation, in terms of average throughput and data delay.
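
    A rough sketch of the generic Lyapunov drift-plus-penalty recipe mentioned above
    is given below; the virtual queue, utility, and numbers are placeholder choices
    for illustration and do not reproduce the paper's rate-control and
    power-allocation algorithm.

    ```python
    # Generic drift-plus-penalty sketch (not the paper's algorithm): a virtual queue
    # Z accumulates violations of a long-term average power budget, and in each
    # timeslot an independent problem trading V-weighted utility against Z*p is
    # solved, which is how the long-term problem becomes a per-slot one.
    import math

    def drift_plus_penalty_step(Z, power_grid, utility, p_avg_budget, V):
        """Choose this slot's power from a discrete grid, then update the virtual queue."""
        best_p = max(power_grid, key=lambda p: V * utility(p) - Z * p)
        Z_next = max(Z + best_p - p_avg_budget, 0.0)      # virtual-queue update
        return best_p, Z_next

    # Toy usage: log-utility, power grid 0..2 in steps of 0.1, average budget of 1.
    utility = lambda p: math.log(1.0 + 2.0 * p)
    Z, grid = 0.0, [i / 10 for i in range(21)]
    for slot in range(5):
        p, Z = drift_plus_penalty_step(Z, grid, utility, p_avg_budget=1.0, V=5.0)
        print(f"slot {slot}: power {p:.1f}, virtual queue {Z:.2f}")
    ```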

    The Benefit of Being Flexible in Distributed Computation

    Linqi Song, Sundara Rajan Srinivasavaradhan, Christina Fragouli
    Subjects: Information Theory (cs.IT)

    In wireless distributed computing, networked nodes perform intermediate
    computations over data placed in their memory and exchange these intermediate
    values to calculate function values. In this paper we consider an asymmetric
    setting where each node has access to a random subset of the data, i.e., we
    cannot control the data placement. The paper makes a simple point: we can
    realize significant benefits if we are allowed to be “flexible” and decide
    which node in our system computes which function. We make this argument in the
    case where each function depends on only two of the data messages, as is the
    case in similarity searches. We establish a percolation phenomenon in the
    behavior of the system: depending on the amount of observed data, being
    flexible may allow us to need no communication at all.
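
    The toy simulation below illustrates the flexibility argument in the simplest
    possible way (a sketch under assumed parameters, not the paper's model,
    threshold, or proof technique): with each function needing two messages,
    flexible assignment lets a function be computed without communication as soon
    as any node happens to hold both of its messages.

    ```python
    # Sketch: fraction of two-message functions computable with zero communication
    # when we may flexibly assign each function to any node holding both messages.
    # Node count, message count, and subset sizes are arbitrary illustrative values.
    import random
    from itertools import combinations

    def zero_comm_fraction(num_nodes, num_messages, subset_size, seed=0):
        rng = random.Random(seed)
        placement = [set(rng.sample(range(num_messages), subset_size))
                     for _ in range(num_nodes)]
        pairs = list(combinations(range(num_messages), 2))    # one function per message pair
        ok = sum(any({a, b} <= node for node in placement) for a, b in pairs)
        return ok / len(pairs)

    # As each node observes more data, the fraction needing no communication rises sharply.
    for k in (2, 4, 8, 16):
        print(k, round(zero_comm_fraction(num_nodes=10, num_messages=30, subset_size=k), 2))
    ```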

    Quantum Channel Capacities Per Unit Cost

    Dawei Ding, Dmitri S. Pavlichin, Mark M. Wilde
    Comments: 29 pages
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    Communication over a noisy channel is often conducted in a setting in which
    different input symbols to the channel incur a certain cost. For example, for
    the additive white Gaussian noise channel, the cost associated with a real
    number input symbol is the square of its magnitude. In such a setting, it is
    often useful to know the maximum amount of information that can be reliably
    transmitted per cost incurred. This is known as the capacity per unit cost. In
    this paper, we generalize the capacity per unit cost to various communication
    tasks involving a quantum channel; in particular, we consider classical
    communication, entanglement-assisted classical communication, private
    communication, and quantum communication. For each task, we define the
    corresponding capacity per unit cost and derive a formula for it via the
    expression for the capacity per channel use. Furthermore, for the special case
    in which there is a zero-cost quantum state, we obtain expressions for the
    various capacities per unit cost in terms of an optimized relative entropy
    involving the zero-cost state. For each communication task, we construct an
    explicit pulse-position-modulation coding scheme that achieves the capacity per
    unit cost. Finally, we compute capacities per unit cost for various quantum
    Gaussian channels.
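
    As background for the zero-cost-state expressions mentioned above, the sketch
    below evaluates the corresponding classical formula (Verdu's capacity per unit
    cost for a discrete memoryless channel with a free input symbol,
    C = max_x D(P(.|x) || P(.|x0)) / b(x)); the channel matrix and costs are toy
    values, and the quantum settings in the paper replace this relative entropy with
    optimized quantum relative entropies.

    ```python
    # Classical capacity per unit cost with a zero-cost input symbol x0 (sketch):
    #     C = max_{x != x0}  D( P(.|x) || P(.|x0) ) / b(x)   [bits per unit cost]
    # The toy channel and costs below are illustrative only.
    import numpy as np

    def capacity_per_unit_cost(W, cost, zero_cost_idx):
        """W[x, y] = P(y | x); cost[x] >= 0 with cost[zero_cost_idx] == 0."""
        p0 = W[zero_cost_idx]
        best = 0.0
        for x, (px, bx) in enumerate(zip(W, cost)):
            if x == zero_cost_idx or bx == 0:
                continue
            mask = px > 0
            divergence = np.sum(px[mask] * np.log2(px[mask] / p0[mask]))
            best = max(best, divergence / bx)
        return best

    # Toy usage: binary channel, input 0 is free, input 1 costs one unit.
    W = np.array([[0.9, 0.1],    # P(. | x = 0)
                  [0.2, 0.8]])   # P(. | x = 1)
    print(capacity_per_unit_cost(W, cost=[0.0, 1.0], zero_cost_idx=0))
    ```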

    Which bridge estimator is optimal for variable selection?

    Shuaiwen Wang, Haolei Weng, Arian Maleki
    Comments: 63 pages, 10 figures
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)

    We study the problem of variable selection for linear models under the high
    dimensional asymptotic setting, where the number of observations n grows at the
    same rate as the number of predictors p. We consider two stage variable
    selection techniques (TVS) in which the first stage uses bridge estimators to
    obtain an estimate of the regression coefficients, and the second stage simply
    thresholds the regression coefficients estimate to select the “important”
    predictors. The asymptotic false discovery proportion (AFDP) and true positive
    proportion (ATPP) of these TVS are evaluated. We prove that, for a fixed ATPP,
    in order to obtain the smallest AFDP one should pick an estimator that
    minimizes the asymptotic mean square error in the first stage of TVS. This
    simple observation enables us to evaluate and compare the performances of
    different TVS with each other and with some standard variable selection
    techniques, such as LASSO and Sure Independence Screening. For instance, we
    prove that a TVS with LASSO in its first stage can outperform LASSO (only one
    stage) over a large range of ATPP values. Furthermore, we show that for large
    values of noise, a TVS with ridge in its first stage outperforms TVS with other
    bridge estimators including the one that has LASSO in its first stage.
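
    A minimal sketch of the two-stage procedure described above is shown below,
    with scikit-learn's Lasso and Ridge standing in for the first-stage bridge
    estimators and a simple hard threshold in the second stage; the regularization
    and threshold values are arbitrary illustrations rather than the tuned choices
    analyzed in the paper.

    ```python
    # Two-stage variable selection sketch: bridge-type first stage (Lasso or Ridge),
    # then hard-threshold the estimated coefficients to select "important" predictors.
    # alpha and threshold are illustrative, not the paper's optimal tuning.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    def two_stage_selection(X, y, first_stage="lasso", alpha=0.1, threshold=0.5):
        est = Lasso(alpha=alpha) if first_stage == "lasso" else Ridge(alpha=alpha)
        beta_hat = est.fit(X, y).coef_
        return np.flatnonzero(np.abs(beta_hat) > threshold)

    # Toy data with n comparable to p and five truly active predictors.
    rng = np.random.default_rng(0)
    n, p = 80, 50
    beta = np.zeros(p); beta[:5] = 2.0
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)
    print(two_stage_selection(X, y, first_stage="lasso"))
    print(two_stage_selection(X, y, first_stage="ridge"))
    ```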



