
    arXiv Paper Daily: Tue, 28 Feb 2017

    Published by 我爱机器学习 (52ml.net) on 2017-02-28 00:00:00

    Neural and Evolutionary Computing

    Low-Precision Batch-Normalized Activations

    Benjamin Graham
    Comments: 16 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Artificial neural networks can be trained with relatively low-precision
    floating-point and fixed-point arithmetic, using between one and 16 bits.
    Previous works have focused on relatively wide-but-shallow, feed-forward
    networks. We introduce a quantization scheme that is compatible with training
    very deep neural networks. Quantizing the network activations in the middle of
    each batch-normalization module can greatly reduce the amount of memory and
    computational power needed, with little loss in accuracy.
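
    For intuition, a minimal NumPy sketch of one plausible reading: quantize
    the activations after normalization but before the learned scale and
    shift. The bit width, clipping range, and function names are our own
    assumptions, not the paper's.

        import numpy as np

        def quantize(x, bits=8, clip=4.0):
            # Round clipped values onto 2**bits - 1 evenly spaced levels.
            step = 2.0 * clip / (2 ** bits - 1)
            return np.round(np.clip(x, -clip, clip) / step) * step

        def low_precision_batchnorm(x, gamma, beta, bits=8, eps=1e-5):
            # Normalize per feature, quantize in the "middle" of the
            # batch-normalization module, then apply scale and shift.
            x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
            return gamma * quantize(x_hat, bits) + beta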

    Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

    Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
    Comments: 12 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recent work on generative modeling of text has found that variational
    auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
    language models (Bowman et al., 2015). This negative result is so far poorly
    understood, but has been attributed to the propensity of LSTM decoders to
    ignore conditioning information from the encoder. In this paper, we experiment
    with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
    dilation architecture, we control the effective context from previously
    generated words. In experiments, we find that there is a trade-off between the
    contextual capacity of the decoder and the amount of encoding information used.
    We show that with the right decoder, VAE can outperform LSTM language models.
    We demonstrate perplexity gains on two datasets, representing the first
    positive experimental result on the use of VAE for generative modeling of text.
    Further, we conduct an in-depth investigation of the use of VAE (with our new
    decoding architecture) for semi-supervised and unsupervised labeling tasks,
    demonstrating gains over several strong baselines.
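
    For intuition, a minimal sketch (ours, not the authors') of the dilated
    causal convolution underlying such a decoder. Stacking layers with
    dilations 1, 2, 4, ... grows the receptive field exponentially, which is
    the knob used above to control the effective context.

        import numpy as np

        def causal_dilated_conv1d(x, w, dilation=1):
            # y[t] sees only x[t], x[t-d], ..., x[t-d*(k-1)]: no future words.
            T, k = len(x), len(w)
            pad = dilation * (k - 1)
            xp = np.concatenate([np.zeros(pad), x])
            return np.array([sum(w[j] * xp[pad + t - j * dilation]
                                 for j in range(k)) for t in range(T)])

        x, w = np.random.randn(16), np.random.randn(3)
        y = causal_dilated_conv1d(x, w, dilation=4)  # receptive field 1 + 4*2 = 9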

    Revisiting NARX Recurrent Neural Networks for Long-Term Dependencies

    Robert DiPietro, Nassir Navab, Gregory D. Hager
    Comments: Code will be released soon
    Subjects: Neural and Evolutionary Computing (cs.NE)

    Recurrent neural networks (RNNs) have shown success for many
    sequence-modeling tasks, but learning long-term dependencies from data remains
    difficult. This is often attributed to the vanishing gradient problem, which
    shows that gradient components relating a loss at time \(t\) to time \(t-\tau\)
    tend to decay exponentially with \(\tau\).

    Long short-term memory (LSTM) and gated recurrent units (GRUs), the most
    widely-used RNN architectures, attempt to remedy this problem by making the
    decay’s base closer to 1. NARX RNNs take an orthogonal approach: by including
    direct connections, or delays, from the past, NARX RNNs make the decay’s
    exponent closer to 0. However, as introduced, NARX RNNs reduce the decay’s
    exponent only by a factor of \(n_d\), the number of delays, and simultaneously
    increase computation by this same factor.

    We introduce a new variant of NARX RNNs, called MIxed hiSTory RNNs, which
    addresses these drawbacks. We show that for \(\tau \leq 2^{n_d-1}\), MIST RNNs
    reduce the decay’s worst-case exponent from \(\tau / n_d\) to \(\log \tau\), while
    maintaining computational complexity that is similar to LSTM and GRUs. We
    compare MIST RNNs to simple RNNs, LSTM, and GRUs across 4 diverse tasks. MIST
    RNNs outperform all other methods in 2 cases, and in all cases are competitive.
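
    A minimal sketch of a NARX-style recurrent step with direct connections
    to delayed hidden states; the exponentially spaced delays below mirror
    the log-spaced history of MIST RNNs, but the weights, nonlinearity, and
    delay set are illustrative assumptions.

        import numpy as np

        def narx_step(h_hist, x_t, W_x, W_hs, b, delays=(1, 2, 4, 8)):
            # h_hist: past hidden states, oldest first. Direct connections
            # to h[t-1], h[t-2], h[t-4], h[t-8] shorten gradient paths.
            pre = W_x @ x_t + b
            for W_h, d in zip(W_hs, delays):
                if len(h_hist) >= d:
                    pre = pre + W_h @ h_hist[-d]
            return np.tanh(pre)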

    Equivariance Through Parameter-Sharing

    Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos
    Subjects: Machine Learning (stat.ML); Neural and Evolutionary Computing (cs.NE)

    We propose to study equivariance in deep neural networks through parameter
    symmetries. In particular, given a group G that acts discretely on the input
    and output of a standard neural network layer \(\phi_W\), we show that
    equivariance of \(\phi_W\) is linked to the symmetry group of network parameters
    W. We then propose a sparse parameter-sharing scheme to induce the desirable
    symmetry on W. Under some conditions on the action of G, our procedure for
    tying the parameters achieves G-equivariance and guarantees sensitivity to all
    other permutation groups outside G. We demonstrate the relation of our approach
    to recently-proposed “structured” neural layers such as group-convolution and
    graph-convolution, which leads to new insights into, and improvements of, these
    operations.
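
    As a toy instance of the parameter-sharing idea (our example, not the
    paper's): tying each row of W to be a cyclic shift of a single vector
    makes the layer equivariant to the cyclic group acting on its input.

        import numpy as np

        def circulant_layer(x, w):
            # Every row of W is a cyclic shift of w, so the layer commutes
            # with cyclic shifts of the input signal.
            W = np.stack([np.roll(w, i) for i in range(len(w))])
            return W @ x

        rng = np.random.default_rng(0)
        x, w = rng.standard_normal(6), rng.standard_normal(6)
        assert np.allclose(circulant_layer(np.roll(x, 1), w),
                           np.roll(circulant_layer(x, w), 1))  # equivariance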

    Deep Voice: Real-time Neural Text-to-Speech

    Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
    Comments: Submitted to ICML 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We present Deep Voice, a production-quality text-to-speech system constructed
    entirely from deep neural networks. Deep Voice lays the groundwork for truly
    end-to-end neural speech synthesis. The system comprises five major building
    blocks: a segmentation model for locating phoneme boundaries, a
    grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
    fundamental frequency prediction model, and an audio synthesis model. For the
    segmentation model, we propose a novel way of performing phoneme boundary
    detection with deep neural networks using connectionist temporal classification
    (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
    that requires fewer parameters and trains faster than the original. By using a
    neural network for each component, our system is simpler and more flexible than
    traditional text-to-speech systems, where each component requires laborious
    feature engineering and extensive domain expertise. Finally, we show that
    inference with our system can be performed faster than real time and describe
    optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
    speedups over existing implementations.
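
    A hypothetical sketch of how the building blocks compose at inference
    time; all function names are placeholders, not the paper's API. The
    segmentation model is used only at training time to provide phoneme
    boundaries for the other components.

        def synthesize(text, g2p, duration_model, f0_model, vocoder):
            phonemes = g2p(text)                     # grapheme-to-phoneme
            durations = duration_model(phonemes)     # per-phoneme durations
            f0 = f0_model(phonemes, durations)       # fundamental frequency
            return vocoder(phonemes, durations, f0)  # WaveNet-style synthesis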

    Adaptive Neural Networks for Fast Test-Time Prediction

    Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present an approach to adaptively utilize deep neural networks in order to
    reduce the evaluation time on new examples without loss of classification
    performance. Rather than attempting to redesign or approximate existing
    networks, we propose two schemes that adaptively utilize networks. First, we
    pose an adaptive network evaluation scheme, where we learn a system to
    adaptively choose the components of a deep network to be evaluated for each
    example. By allowing examples correctly classified using early layers of the
    system to exit, we avoid the computational time associated with full evaluation
    of the network. Building upon this approach, we then learn a network selection
    system that adaptively selects the network to be evaluated for each example. We
    exploit the fact that many examples can be correctly classified using
    relatively efficient networks and that complex, computationally costly networks
    are only necessary for a small fraction of examples. By avoiding evaluation of
    these complex networks for a large fraction of examples, computational time can
    be dramatically reduced. Empirically, these approaches yield dramatic
    reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
    networks from the ImageNet image recognition challenge with minimal (less than
    1%) loss of accuracy.
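
    A minimal sketch of the network selection idea: cheap models answer easy
    examples and the costly network is reserved for the rest. The confidence
    rule and threshold are illustrative assumptions.

        import numpy as np

        def cascade_predict(x, models, threshold=0.9):
            # models: ordered cheapest first; each returns class probabilities.
            for model in models[:-1]:
                probs = model(x)
                if probs.max() >= threshold:    # confident enough: exit early
                    return int(probs.argmax())
            return int(models[-1](x).argmax())  # fall back to the full network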

    On the Origin of Deep Learning

    Haohan Wang, Bhiksha Raj, Eric P. Xing
    Comments: 81 pages, 200 references
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper is a review of the evolutionary history of deep learning models.
    It covers the genesis of neural networks, when associationist modeling of the
    brain was first studied, through the models that dominated the last decade of
    deep learning research, such as convolutional neural networks, deep belief
    networks, and recurrent neural networks, and extends to popular recent models
    such as the variational autoencoder and generative adversarial nets. In addition to a
    review of these models, this paper primarily focuses on the precedents of the
    models above, examining how the initial ideas are assembled to construct the
    early models and how these preliminary models are developed into their current
    forms. Many of these evolutionary paths last more than half a century and have
    a diversity of directions. For example, CNNs are built on prior knowledge of
    the biological vision system; DBNs evolved from a trade-off between the modeling
    power and computational complexity of graphical models; and many present-day
    models are neural counterparts of classical linear models. This paper reviews these evolutionary
    paths and offers a concise thought flow of how these models are developed, and
    aims to provide a thorough background for deep learning. More importantly,
    along with the path, this paper summarizes the gist behind these milestones and
    proposes many directions to guide the future research of deep learning.

    Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

    Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
    Comments: Accepted to IJCNN2017, Anchorage, Alaska, USA
    Subjects: Sound (cs.SD); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Environmental audio tagging is a newly proposed task to predict the presence
    or absence of a specific audio event in an audio chunk. Deep neural network (DNN)
    based methods have been successfully adopted for predicting the audio tags in
    the domestic audio scene. In this paper, we propose to use a convolutional
    neural network (CNN) to extract robust features from mel-filter banks (MFBs),
    spectrograms or even raw waveforms for audio tagging. Gated recurrent unit
    (GRU) based recurrent neural networks (RNNs) are then cascaded to model the
    long-term temporal structure of the audio signal. To complement the input
    information, an auxiliary CNN is designed to learn on the spatial features of
    stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging)
    of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE
    2016) challenge. Compared with our recent DNN-based method, the proposed
    structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the
    development set. The spatial features can further reduce the EER to 0.10. The
    performance of the end-to-end learning on raw waveforms is also comparable.
    Finally, on the evaluation set, we get the state-of-the-art performance with
    0.12 EER while the performance of the best existing system is 0.15 EER.


    Computer Vision and Pattern Recognition

    Skin Lesion Classification Using Hybrid Deep Neural Networks

    Amirreza Mahbod, Rupert Ecker, Isabella Ellinger
    Comments: 5 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Skin cancer is one of the major types of cancers and its incidence has been
    increasing over the past decades. Skin lesions can arise from various
    dermatologic disorders and can be classified into various types according to
    their texture, structure, color, and other morphological features. The accuracy
    of diagnosis of skin lesions, specifically the discrimination of benign and
    malignant lesions, is paramount to ensure appropriate patient treatment.
    Machine learning-based classification approaches are among popular automatic
    methods for skin lesion classification. While there are many existing methods,
    convolutional neural networks (CNNs) have been shown to be superior to other
    classical machine learning methods for object detection and classification
    tasks. In this work, a fully automatic computerized method is proposed, which
    employs well-established pre-trained convolutional neural networks and
    ensemble learning to classify skin lesions. We trained the networks using 2000
    skin lesion images available from the ISIC 2017 challenge, which has three main
    categories and includes 374 melanoma, 254 seborrheic keratosis and 1372 benign
    nevi images. The trained classifier was then tested on 150 unlabeled images.
    The results, evaluated by the challenge organizer and based on the area under
    the receiver operating characteristic curve (AUC), were 84.8% and 93.6% for the
    melanoma and seborrheic keratosis binary classification problems, respectively.

    The proposed method achieved results competitive with those of experienced
    dermatologists. Further improvement and optimization of the proposed method with
    a larger training dataset could lead to a more precise, reliable and robust
    method for skin lesion classification.

    Age Progression/Regression by Conditional Adversarial Autoencoder

    Zhifei Zhang, Yang Song, Hairong Qi
    Comments: Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    “If I provide you a face image of mine (without telling you the actual age
    when I took the picture) and a large amount of face images that I crawled
    (containing labeled faces of different ages but not necessarily paired), can
    you show me what I would look like when I am 80 or what I was like when I was
    5?” The answer is probably a “No.” Most existing face aging works attempt to
    learn the transformation between age groups and thus would require the paired
    samples as well as the labeled query image. In this paper, we look at the
    problem from a generative modeling perspective such that no paired samples are
    required. In addition, given an unlabeled image, the generative model can
    directly produce an image with the desired age attribute. We propose a conditional
    adversarial autoencoder (CAAE) that learns a face manifold, traversing on which
    smooth age progression and regression can be realized simultaneously. In CAAE,
    the face is first mapped to a latent vector through a convolutional encoder,
    and then the vector is projected to the face manifold conditional on age
    through a deconvolutional generator. The latent vector preserves personalized
    face features (i.e., personality) and the age condition controls progression
    vs. regression. Two adversarial networks are imposed on the encoder and
    generator, respectively, forcing the model to generate more photo-realistic faces.
    Experimental results demonstrate the appealing performance and flexibility of
    the proposed framework by comparing with the state-of-the-art and ground truth.

    Asymmetric Tri-training for Unsupervised Domain Adaptation

    Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Deep-layered models trained on a large number of labeled samples boost the
    accuracy of many tasks. It is important to apply such models to different
    domains because collecting many labeled samples in various domains is
    expensive. In unsupervised domain adaptation, one needs to train a classifier
    that works well on a target domain when provided with labeled source samples
    and unlabeled target samples. Although many methods aim to match the
    distributions of source and target samples, simply matching the distribution
    cannot ensure accuracy on the target domain. To learn discriminative
    representations for the target domain, we assume that artificially labeling
    target samples can result in a good representation. Tri-training leverages
    three classifiers equally to give pseudo-labels to unlabeled samples, but the
    method does not assume that the samples to be labeled come from a different
    domain. In this paper, we propose an asymmetric tri-training method for unsupervised
    domain adaptation, where we assign pseudo-labels to unlabeled samples and train
    neural networks as if they are true labels. In our work, we use three networks
    asymmetrically. By asymmetric, we mean that two networks are used to label
    unlabeled target samples and one network is trained by the samples to obtain
    target-discriminative representations. We evaluate our method on digit
    recognition and sentiment analysis datasets. Our proposed method achieves
    state-of-the-art performance on the benchmark digit recognition datasets of
    domain adaptation.
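
    A minimal sketch of the asymmetric labeling step: two networks
    pseudo-label target samples on which they confidently agree, and those
    samples train the third, target-specific network. The agreement rule and
    threshold are our assumptions.

        def pseudo_label(target_batch, f1, f2, threshold=0.95):
            # f1, f2: labeling networks returning class-probability vectors.
            labeled = []
            for x in target_batch:
                p1, p2 = f1(x), f2(x)
                if (p1.argmax() == p2.argmax()
                        and min(p1.max(), p2.max()) >= threshold):
                    labeled.append((x, int(p1.argmax())))  # used as true label
            return labeled  # training set for the target-specific network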

    Revealing Hidden Potentials of q-Space Imaging in Breast Cancer

    Paul Jaeger, Sebastian Bickelhaupt, Frederik Bernd Laun, Wolfgang Lederer, Daniel Heidi, Tristan Anselm Kuder, Daniel Paech, David Bonekamp, Alexander Radbruch, Stefan Delorme, Heinz-Peter Schlemmer, Franziska Steudle, Klaus H. Maier-Hein
    Comments: currently under review for MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Mammography screening for early detection of breast lesions currently suffers
    from high amounts of false positive findings, which result in unnecessary
    invasive biopsies. Diffusion-weighted MR images (DWI) can help to reduce many
    of these false-positive findings prior to biopsy. Current approaches estimate
    tissue properties by means of quantitative parameters taken from generative,
    biophysical models fit to the q-space encoded signal under certain assumptions
    regarding noise and spatial homogeneity. This process is prone to fitting
    instability and partial information loss due to model simplicity. We reveal
    previously unexplored potentials of the signal by integrating all data
    processing components into a convolutional neural network (CNN) architecture
    that is designed to propagate clinical target information down to the raw input
    images. This approach enables simultaneous and target-specific optimization of
    image normalization, signal exploitation, global representation learning and
    classification. Using a multicentric data set of 222 patients, we demonstrate
    that our approach significantly improves clinical decision making with respect
    to the current state of the art.

    Multi-Label Segmentation via Residual-Driven Adaptive Regularization

    Byung-Woo Hong, Ja-Keoung Koo, Stefano Soatto
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a variational multi-label segmentation algorithm based on a robust
    Huber loss for both the data and the regularizer, minimized within a convex
    optimization framework. We introduce a novel constraint on the common areas, to
    bias the solution towards mutually exclusive regions. We also propose a
    regularization scheme that is adapted to the spatial statistics of the residual
    at each iteration, resulting in a varying degree of regularization being
    applied as the algorithm proceeds: the effect of the regularizer is strongest
    at initialization, and wanes as the solution increasingly fits the data. This
    minimizes the bias induced by the regularizer at convergence. We design an
    efficient convex optimization algorithm based on the alternating direction
    method of multipliers using the equivalent relation between the Huber function
    and the proximal operator of the one-norm. We empirically validate our proposed
    algorithm on synthetic and real images and offer an information-theoretic
    derivation of the cost-function that highlights the modeling choices made.
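
    For reference, the robust Huber function used for both the data term and
    the regularizer; the threshold parameter below is illustrative.

        import numpy as np

        def huber(r, delta=1.0):
            # Quadratic near zero, linear in the tails, so large residuals
            # (outliers) are penalized less severely than by a squared loss.
            a = np.abs(r)
            return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))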

    Visual Translation Embedding Network for Visual Relation Detection

    Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Visual relations, such as “person ride bike” and “bike next to car”, offer a
    comprehensive scene understanding of an image, and have already shown their
    great utility in connecting computer vision and natural language. However, due
    to the challenging combinatorial complexity of modeling
    subject-predicate-object relation triplets, very little work has been done to
    localize and predict visual relations. Inspired by the recent advances in
    relational representation learning of knowledge bases and convolutional object
    detection networks, we propose a Visual Translation Embedding network (VTransE)
    for visual relation detection. VTransE places objects in a low-dimensional
    relation space where a relation can be modeled as a simple vector translation,
    i.e., subject + predicate \(\approx\) object. We propose a novel feature
    extraction layer that enables object-relation knowledge transfer in a
    fully-convolutional fashion that supports training and inference in a single
    forward/backward pass. To the best of our knowledge, VTransE is the first
    end-to-end relation detection network. We demonstrate the effectiveness of
    VTransE over other state-of-the-art methods on two large-scale datasets: Visual
    Relationship and Visual Genome. Note that even though VTransE is a purely
    visual model, it is still competitive with Lu’s multi-modal model with
    language priors.
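
    A minimal sketch of the translation-embedding criterion: a triplet fits
    when the subject vector plus the predicate vector lands near the object
    vector. Dimensions and names here are arbitrary.

        import numpy as np

        def relation_score(subj, pred, obj):
            # subject + predicate ≈ object, so smaller ||s + p - o|| is better.
            return -np.linalg.norm(subj + pred - obj)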

    Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation

    Xin Jin, Peng Yuan, Xiaodong Li, Chenggen Song, Shiming Ge, Geng Zhao, Yingya Chen
    Comments: 6 pages, 3 figures, To appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10, 2017 – Jul 14, 2017, Hong Kong, Hong Kong
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A cloud server spends a great deal of time, energy and money to train a
    Viola-Jones type object detector with high accuracy. Clients can upload their
    photos to the cloud server to find objects. However, the client does not want
    the content of his/her photos to be leaked. Meanwhile, the cloud server is also
    reluctant to leak any parameters of the trained object detectors. Ten years ago,
    Avidan & Butman introduced Blind Vision, which is a method for securely
    evaluating a Viola-Jones type object detector. Blind Vision uses standard
    cryptographic tools and is painfully slow to compute, taking a couple of hours
    to scan a single image. The purpose of this work is to explore an efficient
    method that can speed up the process. We propose the Random Base Image (RBI)
    Representation. The original image is divided into random base images. Only the
    base images are submitted randomly to the cloud server. Thus, the content of
    the image cannot be leaked. Meanwhile, a random vector and the secure
    Millionaire protocol are leveraged to protect the parameters of the trained
    object detector. The RBI re-enables the use of integral images, which yields a
    great acceleration. The experimental results reveal that our method retains the
    detection accuracy of the plain vision algorithm while being significantly
    faster than traditional blind vision, with only a very low theoretical
    probability of information leakage.
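
    A minimal sketch of the splitting idea as we read it: the image is
    divided into random shares that sum back to the original, so any single
    share reveals nothing about the content. The additive scheme below is an
    assumption.

        import numpy as np

        def random_base_images(img, n=4, rng=np.random.default_rng(0)):
            # n - 1 random images plus one residual that restores the sum.
            shares = [rng.standard_normal(img.shape) for _ in range(n - 1)]
            shares.append(img - sum(shares))
            return shares

        img = np.random.rand(8, 8)
        assert np.allclose(sum(random_base_images(img)), img)  # exact recovery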

    A Dataset for Developing and Benchmarking Active Vision

    Phil Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. Berg
    Comments: To appear at ICRA 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new public dataset with a focus on simulating robotic vision
    tasks in everyday indoor environments using real imagery. The dataset includes
    20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely
    captured in 9 unique scenes. We train a fast object category detector for
    instance detection on our data. Using the dataset we show that, although
    increasingly accurate and fast, the state of the art for object detection is
    still severely impacted by object scale, occlusion, and viewing direction, all
    of which matter for robotics applications. We next validate the dataset for
    simulating active vision, and use the dataset to develop and evaluate a
    deep-network-based system for next best move prediction for object
    classification using reinforcement learning. Our dataset is available for
    download at cs.unc.edu/~ammirato/active_vision_dataset_website/.

    DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

    Christian Wachinger, Martin Reuter, Tassilo Klein
    Comments: Accepted for publication in NeuroImage, special issue “Brain Segmentation and Parcellation”, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We introduce DeepNAT, a 3D Deep convolutional neural network for the
    automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
    images. DeepNAT is an end-to-end learning-based approach to brain segmentation
    that jointly learns an abstract feature representation and a multi-class
    classification. We propose a 3D patch-based approach, where we predict not
    only the center voxel of the patch but also its neighbors, which is formulated
    as multi-task learning. To address a class imbalance problem, we arrange two
    networks hierarchically, where the first one separates foreground from
    background, and the second one identifies 25 brain structures on the
    foreground. Since patches lack spatial context, we augment them with
    coordinates. To this end, we introduce a novel intrinsic parameterization of
    the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
    network architecture, we use three convolutional layers with pooling, batch
    normalization, and non-linearities, followed by fully connected layers with
    dropout. The final segmentation is inferred from the probabilistic output of
    the network with a 3D fully connected conditional random field, which ensures
    label agreement between close voxels. The roughly 2.7 million parameters in the
    network are learned with stochastic gradient descent. Our results show that
    DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
    learning-based method may have a high potential for the adaptation to young,
    old, or diseased brains by fine-tuning the pre-trained network with a small
    training sample on the target application, where the availability of larger
    datasets with manual annotations may boost the overall segmentation accuracy in
    the future.

    HashBox: Hash Hierarchical Segmentation exploiting Bounding Box Object Detection

    Joachim Curto, Irene Zarza, Alexander J. Smola, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a novel approach to address the Simultaneous Detection and
    Segmentation problem. Exploiting hierarchical structures, we employ an
    efficient and accurate procedure that leverages the hierarchy’s feature
    information using Locality Sensitive Hashing. We build on recent work that
    utilizes convolutional neural networks to detect bounding boxes in an image
    (Faster R-CNN) and then use the top similar hierarchical region that best fits
    each bounding box after hashing; we call this approach HashBox. We then refine our final segmentation
    results by automatic hierarchy pruning. HashBox introduces a train-free
    alternative to Hypercolumns. We conduct extensive experiments on Pascal VOC
    2012 segmentation dataset, showing that HashBox gives competitive
    state-of-the-art object segmentations.
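
    A minimal random-hyperplane LSH sketch of the hashing step: nearby region
    features receive similar bit signatures, so the hierarchical region best
    fitting a bounding box can be found quickly. Dimensions and names are
    illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        planes = rng.standard_normal((16, 128))  # 16-bit hash, 128-d features

        def lsh_signature(feature):
            # Each bit records the side of a random hyperplane the feature
            # falls on; close features mostly share bits.
            return tuple((planes @ feature > 0).astype(int))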

    Multi-scale Image Fusion Between Pre-operative Clinical CT and X-ray Microtomography of Lung Pathology

    Holger R. Roth, Kai Nagara, Hirohisa Oda, Masahiro Oda, Tomoshi Sugiyama, Shota Nakamura, Kensaku Mori
    Comments: In proceedings of International Forum on Medical Imaging, IFMIA 2017, Okinawa, Japan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Computational anatomy allows the quantitative analysis of organs in medical
    images. However, most analysis is constrained to the millimeter scale because
    of the limited resolution of clinical computed tomography (CT). X-ray
    microtomography (\(\mu\)CT), on the other hand, allows imaging of ex-vivo tissues
    at a resolution of tens of microns. In this work, we use clinical CT to image
    lung cancer patients before partial pneumonectomy (resection of pathological
    lung tissue). The resected specimen is prepared for \(\mu\)CT imaging at a voxel
    resolution of 50 \(\mu\)m (0.05 mm). This high-resolution image of the lung
    cancer tissue allows further insights into the understanding of tumor growth
    and categorization. To make full use of this additional information, image
    fusion (registration) needs to be performed in order to re-align the \(\mu\)CT
    image with clinical CT. We developed a multi-scale non-rigid registration
    approach. After manual initialization using a few landmark points and rigid
    alignment, several levels of non-rigid registration between down-sampled (in
    the case of \(\mu\)CT) and up-sampled (in the case of clinical CT)
    representations of the image are performed. Any non-lung tissue is ignored
    during the computation of the similarity measure used to guide the registration
    during optimization. We are able to recover the volume differences introduced
    by the resection and preparation of the lung specimen. The average (\(\pm\) std.
    dev.) minimum surface distance between \(\mu\)CT and clinical CT at the resected
    lung surface is reduced from 3.3 \(\pm\) 2.9 mm (range: [0.1, 15.9] mm) to
    2.3 \(\pm\) 2.8 mm (range: [0.0, 15.3] mm). The alignment of clinical CT with
    \(\mu\)CT will allow further registration with even finer \(\mu\)CT resolutions
    (down to 10 \(\mu\)m) and ultimately with histopathological microscopy images for
    further macro-to-micro image fusion that can aid medical image analysis.

    Bioplausible multiscale filtering in retino-cortical processing as a mechanism in perceptual grouping

    Nasim Nematzadeh, David M. W. Powers, Trent W. Lewis
    Comments: 19 pages, 7 figures, submitted manuscript to Brain Informatics journal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Why does our visual system fail to reconstruct reality, when we look at
    certain patterns? Where do Geometrical illusions start to emerge in the visual
    pathway? Should computational models of vision have the same visual ability to
    detect illusions as we do? This study addresses these questions, by focusing on
    a specific underlying neural mechanism involved in our visual experiences that
    affects our final perception. Among many types of visual illusion,
    ‘Geometrical’ and, in particular, ‘Tilt’ illusions are rather important, being
    characterized by misperception of geometric patterns involving lines and tiles
    in combination with contrasting orientation, size or position. Over the last
    decade, many new neurophysiological experiments have led to new insights as to
    how, when and where retinal processing takes place, and the encoding nature of
    the retinal representation that is sent to the cortex for further processing.
    Based on these neurobiological discoveries, we provide computer simulation
    evidence to suggest that visual Geometrical illusions are explained, in part,
    by the interaction of multiscale visual processing performed in the retina. The
    output of our retinal stage model is presented for several types of Tilt
    illusion, in which the final tilt percept arises from multiple scale processing
    of Differences of Gaussian and the perceptual interaction of foreground and
    background elements. Our results suggest that this multilevel filtering
    explanation, which is a simplified simulation of retinal ganglion cells’
    responses to these patterns, is indeed the underlying mechanism connecting
    low-level filtering to mid- and high-level explanations such as ‘anchoring
    theory’ and ‘perceptual grouping’.

    3D Scanning System for Automatic High-Resolution Plant Phenotyping

    Chuong V Nguyen, Jurgen Fripp, David R Lovell, Robert Furbank, Peter Kuffner, Helen Daily, Xavier Sirault
    Comments: 8 pages, DICTA 2016
    Journal-ref: In Digital Image Computing: Techniques and Applications (DICTA),
    2016 International Conference on, pp. 1-8. IEEE, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Thin leaves, fine stems, self-occlusion, non-rigid and slowly changing
    structures make plants difficult for three-dimensional (3D) scanning and
    reconstruction — two critical steps in automated visual phenotyping. Many
    current solutions such as laser scanning, structured light, and multiview
    stereo can struggle to acquire usable 3D models because of limitations in
    scanning resolution and calibration accuracy. In response, we have developed a
    fast, low-cost, 3D scanning platform to image plants on a rotating stage with
    two tilting DSLR cameras centred on the plant. This uses new methods of camera
    calibration and background removal to achieve high-accuracy 3D reconstruction.
    We assessed the system’s accuracy using a 3D visual hull reconstruction
    algorithm applied on 2 plastic models of dicotyledonous plants, 2 sorghum
    plants and 2 wheat plants across different sets of tilt angles. Scan times
    ranged from 3 minutes (to capture 72 images using 2 tilt angles), to 30 minutes
    (to capture 360 images using 10 tilt angles). The leaf lengths, widths, areas
    and perimeters of the plastic models were measured manually and compared to
    measurements from the scanning system: results were within 3-4% of each other.
    The 3D reconstructions obtained with the scanning system show excellent
    geometric agreement with all six plant specimens, even plants with thin leaves
    and fine stems.

    Adversarial Networks for the Detection of Aggressive Prostate Cancer

    Simon Kohl, David Bonekamp, Heinz-Peter Schlemmer, Kaneschka Yaqubi, Markus Hohenfellner, Boris Hadaschik, Jan-Philipp Radtke, Klaus Maier-Hein
    Comments: 8 pages, 3 figures; under review as a conference paper at MICCAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic segmentation constitutes an integral part of medical image analyses
    for which breakthroughs in the field of deep learning were of high relevance.
    The large number of trainable parameters of deep neural networks however
    renders them inherently data hungry, a characteristic that heavily challenges
    the medical imaging community. Interestingly, though, because the de facto
    standard training of fully convolutional networks (FCNs) for semantic
    segmentation is agnostic towards the ‘structure’ of the predicted label maps,
    valuable complementary information about the global quality of the segmentation lies
    idle. In order to tap into this potential, we propose utilizing an adversarial
    network which discriminates between expert and generated annotations in order
    to train FCNs for semantic segmentation. Because the adversary constitutes a
    learned parametrization of what makes a good segmentation at a global level, we
    hypothesize that the method holds particular advantages for segmentation tasks
    on complex structured, small datasets. This holds true in our experiments: We
    learn to segment aggressive prostate cancer utilizing MRI images of 152
    patients and show that the proposed scheme is superior over the de facto
    standard in terms of the detection sensitivity and the dice-score for
    aggressive prostate cancer. The achieved relative gains are shown to be
    particularly pronounced in the small dataset limit.

    Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation

    Omid Hosseini Jafari, Oliver Groth, Alexander Kirillov, Michael Ying Yang, Carsten Rother
    Comments: Accepted to ICRA 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    This paper addresses the task of designing a modular neural network
    architecture that jointly solves different tasks. As an example we use the
    tasks of depth estimation and semantic segmentation given a single RGB image.
    The main focus of this work is to analyze the cross-modality influence between
    depth and semantic prediction maps on their joint refinement. While most
    previous works solely focus on measuring improvements in accuracy, we propose a
    way to quantify the cross-modality influence. We show that there is a
    relationship between final accuracy and cross-modality influence, although not
    a simple linear one. Hence a larger cross-modality influence does not
    necessarily translate into an improved accuracy. We find that a beneficial
    balance between the cross-modality influences can be achieved by network
    architecture and conjecture that this relationship can be utilized to
    understand different network design choices. Towards this end we propose a
    Convolutional Neural Network (CNN) architecture that fuses the
    state-of-the-art results for depth estimation and semantic labeling. By
    balancing the cross-modality influences between depth and semantic prediction,
    we achieve improved results for both tasks using the NYU-Depth v2 benchmark.

    Bayesian Nonparametric Unmixing of Hyperspectral Images

    Jürgen Hahn, Abdelhak M.Zoubir
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hyperspectral imaging is an important tool in remote sensing, allowing for
    accurate analysis of vast areas. Due to a low spatial resolution, a pixel of a
    hyperspectral image rarely represents a single material, but rather a mixture
    of different spectra. Hyperspectral unmixing (HSU) aims at estimating the pure
    spectra present in the scene of interest, referred to as endmembers, and their
    fractions in each pixel, referred to as abundances. Today, many HSU algorithms have been
    proposed, based either on a geometrical or statistical model. While most
    methods assume that the number of endmembers present in the scene is known,
    there is only little work about estimating this number from the observed data.
    In this work, we propose a Bayesian nonparametric framework that jointly
    estimates the number of endmembers, the endmembers themselves, and their
    abundances, by making use of the Indian Buffet Process as a prior for the
    endmembers. Simulation results and experiments on real data demonstrate the
    effectiveness of the proposed algorithm, yielding results comparable with
    state-of-the-art methods while being able to reliably infer the number of
    endmembers. In scenarios with strong noise, where other algorithms provide only
    poor results, the proposed approach tends to overestimate the number of
    endmembers slightly. The additional endmembers, however, often simply represent
    noisy replicas of present endmembers and could easily be merged in a
    post-processing step.

    A multi-task convolutional neural network for mega-city analysis using very high resolution satellite imagery and geospatial data

    Fan Zhang, Bo Du, Liangpei Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Mega-city analysis with very high resolution (VHR) satellite images has been
    drawing increasing interest in the fields of city planning and social
    investigation. It is known that accurate land-use, urban density, and
    population distribution information is the key to mega-city monitoring and
    environmental studies. Therefore, how to generate land-use, urban density, and
    population distribution maps at a fine scale using VHR satellite images has
    become a hot topic. Previous studies have focused solely on individual tasks
    with elaborate hand-crafted features and have ignored the relationship between
    different tasks. In this study, we aim to propose a universal framework which
    can: 1) automatically learn the internal feature representation from the raw
    image data; and 2) simultaneously produce fine-scale land-use, urban density,
    and population distribution maps. For the first target, a deep convolutional
    neural network (CNN) is applied to learn the hierarchical feature
    representation from the raw image data. For the second target, a novel
    CNN-based universal framework is proposed to process the VHR satellite images
    and generate the land-use, urban density, and population distribution maps. To
    the best of our knowledge, this is the first CNN-based mega-city analysis
    method which can process a VHR remote sensing image with such a large data
    volume. A VHR satellite image (1.2 m spatial resolution) of the center of Wuhan
    covering an area of 2606 km2 was used to evaluate the proposed method. The
    experimental results confirm that the proposed method can achieve a promising
    accuracy for land-use, urban density, and population distribution maps.

    Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition

    Xuefeng Xiao, Lianwen Jin, Yafeng Yang, Weixin Yang, Jun Sun, Tianhai Chang
    Comments: 15 pages, 7 figures, 5 tables
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Like other problems in computer vision, offline handwritten Chinese character
    recognition (HCCR) has achieved impressive results using convolutional neural
    network (CNN)-based methods. However, larger and deeper networks are needed to
    deliver state-of-the-art results in this domain. Such networks intuitively
    appear to incur high computational cost, and require the storage of a large
    number of parameters, which renders them unfeasible for deployment in portable
    devices. To solve this problem, we propose a Global Supervised Low-rank
    Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to solve
    the problems of speed and storage capacity. We design a nine-layer CNN for HCCR
    consisting of 3,755 classes, and devise an algorithm that can reduce the
    network’s computational cost by nine times and compress the network to 1/18 of
    the original size of the baseline model, with only a 0.21% drop in accuracy. In
    tests, the proposed algorithm surpassed the best single-network performance
    reported thus far in the literature while requiring only 2.3 MB for storage.
    Furthermore, when integrated with our effective forward implementation, the
    recognition of an offline character image took only 9.7 ms on a CPU. Compared
    with the state-of-the-art CNN model for HCCR, our approach is approximately 30
    times faster and 10 times more cost-efficient.
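
    Two minimal sketches in the spirit of the two techniques, with the rank
    and threshold as illustrative assumptions: a truncated-SVD low-rank
    expansion of a weight matrix (cutting multiply-adds) and magnitude-based
    weight dropping (cutting storage).

        import numpy as np

        def low_rank_expand(W, rank):
            # W (m x n) becomes two thin factors A (m x r) and B (r x n).
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            return U[:, :rank] * s[:rank], Vt[:rank]  # W ≈ A @ B

        def drop_weights(W, threshold):
            # Zero small-magnitude weights; sparse storage shrinks the model.
            return np.where(np.abs(W) < threshold, 0.0, W)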

    Seeing What Is Not There: Learning Context to Determine Where Objects Are Missing

    Jin Sun, David W. Jacobs
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Most computer vision research focuses on what is in an image. We propose to train a
    standalone object-centric context representation to perform the opposite task:
    seeing what is not there. Given an image, our context model can predict where
    objects should exist, even when no object instances are present. Combined with
    object detection results, we can perform a novel vision task: finding where
    objects are missing in an image. Our model is based on a convolutional neural
    network structure. With a specially designed training strategy, the model
    learns to ignore objects and focus on context only. It is fully convolutional
    thus highly efficient. Experiments show the effectiveness of the proposed
    approach in one important accessibility task: finding city street regions where
    curb ramps are missing, which could help millions of people with mobility
    disabilities.

    Spatially Aware Melanoma Segmentation Using Hybrid Deep Learning Techniques

    M. Attia, M. Hossny, S. Nahavandi, A. Yazdabadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a hybrid method that utilises deep
    convolutional and recurrent neural networks for the accurate delineation of
    skin lesions in the images supplied with the ISBI 2017 lesion segmentation
    challenge. The proposed method was trained using 1800 images and tested on 150
    images from the ISBI 2017 challenge.

    BARCHAN: Blob Alignment for Robust CHromatographic ANalysis

    Camille Couprie, Laurent Duval, Maxime Moreaud, Sophie Hénon, Mélinda Tebib, Vincent Souchon
    Comments: 15 pages, published in the Special issue for RIVA 2016, 40th International Symposium on Capillary Chromatography and 13th GCxGC Symposium
    Journal-ref: Journal of Chromatography A, Volume 1484, February 2017, Pages
    65-72
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)

    Comprehensive two-dimensional gas chromatography (GCxGC) plays a central role
    in the elucidation of complex samples. The automation of the identification
    of peak areas is of prime interest to obtain a fast and repeatable analysis of
    chromatograms. To determine the concentration of compounds or pseudo-compounds,
    templates of blobs are defined and superimposed on a reference chromatogram.
    The templates then need to be modified when different chromatograms are
    recorded. In this study, we present a chromatogram and template alignment
    method based on peak registration called BARCHAN. Peaks are identified using a
    robust mathematical morphology tool. The alignment is performed by a
    probabilistic estimation of a rigid transformation along the first dimension,
    and a non-rigid transformation in the second dimension, taking into account
    noise, outliers and missing peaks in a fully automated way. Resulting aligned
    chromatograms and masks are presented on two datasets. The proposed algorithm
    proves to be fast and reliable. It significantly reduces the time to results
    for GCxGC analysis.

    Image Stitching by Line-guided Local Warping with Global Similarity Constraint

    Tianzhu Xiang, Gui-Song Xia, Xiang Bai, Liangpei Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Low-textured image stitching remains a challenging problem. It is difficult
    to achieve good alignment, and image structures are easily broken, due to
    insufficient and unreliable point correspondences. Moreover, because of
    viewpoint variations between multiple images, the stitched images suffer from
    projective distortions. To this end, this paper presents a line-guided local
    warping method with a global similarity constraint for image stitching. A
    two-stage alignment scheme is adopted for good alignment. More precisely, line
    correspondences are employed as alignment constraints to guide the accurate
    estimation of the projective warp; then line feature constraints are integrated
    into a mesh-based warping framework to refine the alignment while preserving
    image structures. To mitigate projective distortions in non-overlapping
    regions, we combine the global similarity constraint with the projective warp
    via a weighting strategy, so that the final warp slowly changes from projective
    to similarity across the image.
    This is also integrated into local multiple homographies model for better
    parallax handling. Our method is evaluated on a series of images and compared
    with several other methods. Experiments demonstrate that the proposed method
    provides convincing stitching performance and outperforms other
    state-of-the-art methods.

    An EM Based Probabilistic Two-Dimensional CCA with Application to Face Recognition

    Mehran Safayani, Seyed Hashem Ahmadi, Homayun Afrabandpey, Abdolreza Mirzaei
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Recently, two-dimensional canonical correlation analysis (2DCCA) has been
    successfully applied for image feature extraction. Instead of concatenating
    the columns of the images into one-dimensional vectors, the method works
    directly with two-dimensional image matrices. Although 2DCCA works well
    in different recognition tasks, it lacks a probabilistic interpretation. In
    this paper, we present a probabilistic framework for 2DCCA called probabilistic
    2DCCA (P2DCCA) and an iterative EM based algorithm for optimizing the
    parameters. Experimental results on synthetic and real data demonstrate
    superior performance in loading factor estimation for P2DCCA compared to 2DCCA.
    For real data, three subsets of AR face database and also the UMIST face
    database confirm the robustness of the proposed algorithm in face recognition
    tasks with different illumination conditions, facial expressions, poses and
    occlusions.

    Transfer Learning for Domain Adaptation in MRI: Application in Brain Lesion Segmentation

    Mohsen Ghafoorian, Alireza Mehrtash, Tina Kapur, Nico Karssemeijer, Elena Marchiori, Mehran Pesteie, Charles R. G. Guttmann, Frank-Erik de Leeuw, Clare M. Tempany, Bram van Ginneken, Andriy Fedorov, Purang Abolmaesumi, Bram Platel, William M. Wells III
    Comments: 8 pages, 3 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Magnetic Resonance Imaging (MRI) is widely used in routine clinical diagnosis
    and treatment. However, variations in MRI acquisition protocols result in
    different appearances of normal and diseased tissue in the images.
    Convolutional neural networks (CNNs), which have shown to be successful in many
    medical image analysis tasks, are typically sensitive to the variations in
    imaging protocols. Therefore, in many cases, networks trained on data acquired
    with one MRI protocol, do not perform satisfactorily on data acquired with
    different protocols. This limits the use of models trained with large annotated
    legacy datasets on a new dataset from a different domain, which is a
    recurring situation in clinical settings. In this study, we aim to answer the
    following central questions regarding domain adaptation in medical image
    analysis: Given a fitted legacy model, 1) How much data from the new domain is
    required for a decent adaptation of the original network?; and, 2) What portion
    of the pre-trained model parameters should be retrained given a certain number
    of the new domain training samples? To address these questions, we conducted
    extensive experiments in white matter hyperintensity segmentation task. We
    trained a CNN on legacy MR images of brain and evaluated the performance of the
    domain-adapted network on the same task with images from a different domain. We
    then compared the performance of the model to the surrogate scenarios where
    either the same trained network is used or a new network is trained from
    scratch on the new dataset. The domain-adapted network, tuned with only two
    training examples, achieved a Dice score of 0.63, substantially outperforming a
    similar network trained from scratch on the same set of examples.
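
    A Keras-style sketch of the retraining question studied here: freeze the
    early layers of the legacy model and retune only the last few on the
    handful of new-domain samples. The split point, optimizer, and loss are
    illustrative assumptions.

        def adapt(model, n_trainable, new_x, new_y):
            # Freeze all but the last n_trainable layers, then fine-tune.
            for layer in model.layers[:-n_trainable]:
                layer.trainable = False
            model.compile(optimizer="adam", loss="binary_crossentropy")
            model.fit(new_x, new_y, epochs=10)
            return model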

    Synthesizing Training Data for Object Detection in Indoor Scenes

    Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    Detection of objects in cluttered indoor environments is one of the key
    enabling functionalities for service robots. The best performing object
    detection approaches in computer vision exploit deep Convolutional Neural
    Networks (CNN) to simultaneously detect and categorize the objects of interest
    in cluttered scenes. Training of such models typically requires large amounts
    of annotated training data which is time consuming and costly to obtain. In
    this work we explore the ability of using synthetically generated composite
    images for training state of the art object detectors. We superimpose 2D images
    of textured object models into images of real environments at variety of
    locations and scales. Our experiments evaluate different superimposition
    strategies ranging from purely image-based blending all the way to depth and
    semantics informed positioning of the object models to real scenes. We
    demonstrate the effectiveness of these object detector training strategies on
    publicly available datasets of GMU-Kitchens and Washington RGB-D Scenes v2, and
    show how object detectors can be trained with limited amounts of annotated real
    scenes with objects present. This charts new opportunities for training
    detectors for new objects by exploiting existing object model repositories in
    either a purely automatic fashion or with only a very small number of
    human-annotated examples.
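
    A minimal sketch of the purely image-based blending variant: alpha-blend
    a cut-out object onto a real background and record the induced
    ground-truth box. Depth- and semantics-informed placement would only
    change where (x, y) comes from. Assumes the object fits at (x, y).

        import numpy as np

        def composite(background, obj_rgba, x, y):
            # obj_rgba: (h, w, 4) uint8 cut-out with an alpha channel.
            h, w = obj_rgba.shape[:2]
            region = background[y:y + h, x:x + w].astype(float)
            alpha = obj_rgba[..., 3:4] / 255.0
            blended = alpha * obj_rgba[..., :3] + (1.0 - alpha) * region
            background[y:y + h, x:x + w] = blended.astype(background.dtype)
            return background, (x, y, x + w, y + h)  # image plus box label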

    Video and Accelerometer-Based Motion Analysis for Automated Surgical Skills Assessment

    Aneeq Zia, Yachna Sharma, Vinay Bettadapura, Eric L. Sarin, Irfan Essa
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Purpose: Basic surgical skills of suturing and knot tying are an essential
    part of medical training. Having an automated system for surgical skills
    assessment could help save experts time and improve training efficiency. There
    have been some recent attempts at automated surgical skills assessment using
    either video analysis or acceleration data. In this paper, we present a novel
    approach for automated assessment of OSATS based surgical skills and provide an
    analysis of different features on multi-modal data (video and accelerometer
    data). Methods: We conduct the largest study, to the best of our knowledge, for
    basic surgical skills assessment on a dataset that contained video and
    accelerometer data for suturing and knot-tying tasks. We introduce “entropy
    based” features – Approximate Entropy (ApEn) and Cross-Approximate Entropy
    (XApEn), which quantify the amount of predictability and regularity of
    fluctuations in time-series data. The proposed features are compared to
    existing methods of Sequential Motion Texture (SMT), Discrete Cosine Transform
    (DCT) and Discrete Fourier Transform (DFT), for surgical skills assessment.
    Results: We report average performance of different features across all
    applicable OSATS criteria for suturing and knot tying tasks. Our analysis shows
    that the proposed entropy-based features outperform previous state-of-the-art
    methods using video data. For accelerometer data, our method performs better
    for suturing only. We also show that fusion of video and acceleration features
    can improve overall performance with the proposed entropy features achieving
    highest accuracy. Conclusions: Automated surgical skills assessment can be
    achieved with high accuracy using the proposed entropy features. Such a system
    can significantly improve the efficiency of surgical training in medical
    schools and teaching hospitals.
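
    For concreteness, a standard NumPy implementation of Approximate Entropy
    (ApEn) over a 1D series; the embedding dimension m and tolerance r follow
    common defaults, not necessarily the paper's settings.

        import numpy as np

        def approx_entropy(x, m=2, r=0.2):
            # Lower ApEn: more regular, predictable motion; higher: erratic.
            x = np.asarray(x, dtype=float)
            tol = r * x.std()

            def phi(mm):
                emb = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
                dist = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
                return np.log((dist <= tol).mean(axis=1)).mean()

            return phi(m) - phi(m + 1)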

    Unifying local and non-local signal processing with graph CNNs

    Gilles Puy, Srdan Kitic, Patrick Pérez
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper deals with the unification of local and non-local signal
    processing on graphs within a single convolutional neural network (CNN)
    framework. Building upon recent works on graph CNNs, we propose to use
    convolutional layers that take as inputs two variables, a signal and a graph,
    allowing the network to adapt to changes in the graph structure. This also
    allows us to learn through training the optimal mixing of locality and
    non-locality, in cases where the graph is built on the input signal itself. We
    demonstrate the versatility and the effectiveness of our framework on several
    types of signals (greyscale and color images, color palettes and speech
    signals) and on several applications (style transfer, color transfer, and
    denoising).
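
    One plausible reading of such a layer, sketched in NumPy (the names and the
    simple two-term mixing are our assumptions, not the paper's exact
    formulation): the layer takes both a signal and a normalized graph, and
    learns how much to weight the local term against the non-local diffusion
    term.

        import numpy as np

        def graph_conv(X, A_norm, W_local, W_nonlocal):
            # X: (n_nodes, c_in) signal; A_norm: (n_nodes, n_nodes)
            # normalized adjacency; W_*: (c_in, c_out) learned weights.
            # Local (per-node) term plus non-local (graph diffusion) term.
            return X @ W_local + (A_norm @ X) @ W_nonlocal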

    Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks

    Hiroshi Inoue
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Ensembling multiple predictions is a widely used technique to improve the
    accuracy of various machine learning tasks. In image classification tasks, for
    example, averaging the predictions for multiple patches extracted from the
    input image significantly improves accuracy. Using multiple networks trained
    independently to make predictions improves accuracy further. One obvious
    drawback of the ensembling technique is its higher execution cost during
    inference. If we average 100 predictions, the execution cost will be 100 times
    as high as the cost without the ensemble. This higher cost limits the
    real-world use of ensembling, even though it is practically the norm for winning
    image classification competitions. In this paper, we describe a new technique
    called adaptive ensemble prediction, which achieves the benefits of ensembling
    with much smaller additional execution costs. Our observation behind this
    technique is that many easy-to-predict inputs do not require ensembling. Hence
    we calculate the confidence level of the prediction for each input on the basis
    of the probability of the predicted label, i.e. the outputs from the softmax,
    during the ensembling computation. If the prediction for an input reaches a
    high enough probability on the basis of the confidence level, we stop
    ensembling for this input to avoid wasting computation power. We evaluated the
    adaptive ensembling by using various datasets and showed that it reduces the
    computation time significantly while achieving similar accuracy to the naive
    ensembling.
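
    A minimal sketch of the early-stopping idea (our names and threshold rule;
    the paper's confidence test may differ): average predictions one at a time
    and stop as soon as the running mean is confident enough.

        import numpy as np

        def adaptive_ensemble(predict_fns, x, threshold=0.9):
            # predict_fns: callables returning a softmax vector for x.
            running = None
            for k, f in enumerate(predict_fns, start=1):
                p = f(x)
                running = p if running is None else running + (p - running) / k
                if running.max() >= threshold:
                    break  # easy input: skip the remaining predictors
            return np.argmax(running), k  # label and #predictors used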

    Low-Precision Batch-Normalized Activations

    Benjamin Graham
    Comments: 16 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)

    Artificial neural networks can be trained with relatively low-precision
    floating-point and fixed-point arithmetic, using between one and 16 bits.
    Previous works have focused on relatively wide-but-shallow, feed-forward
    networks. We introduce a quantization scheme that is compatible with training
    very deep neural networks. Quantizing the network activations in the middle of
    each batch-normalization module can greatly reduce the amount of memory and
    computational power needed, with little loss in accuracy.
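
    A hedged sketch of the idea in NumPy (the paper's exact quantizer and bit
    allocation may differ; `clip` and the rounding rule are our assumptions):
    normalize, quantize the normalized activations to a few bits, then apply
    the learned scale and shift at higher precision.

        import numpy as np

        def bn_quantized(x, gamma, beta, bits=8, clip=4.0, eps=1e-5):
            # Normalize per feature (training-mode batch statistics).
            x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
            # Uniform quantizer over [-clip, clip] with 2**bits levels;
            # only the quantized x_hat needs to be stored and propagated.
            scale = (2 ** (bits - 1) - 1) / clip
            q = np.clip(np.round(x_hat * scale),
                        -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) / scale
            return gamma * q + beta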

    Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration

    Judith Bütepage, Hedvig Kjellström, Danica Kragic
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    Fluent and safe interactions of humans and robots require both partners to
    anticipate each other’s actions. A common approach to human intention inference
    is to model specific trajectories towards known goals with supervised
    classifiers. However, these approaches do not take possible future movements
    into account nor do they make use of kinematic cues, such as legible and
    predictable motion. The bottleneck of these methods is the lack of an accurate
    model of general human motion. In this work, we present a conditional
    variational autoencoder that is trained to predict a window of future human
    motion given a window of past frames. Using skeletal data obtained from RGB
    depth images, we show how this unsupervised approach can be used for online
    motion prediction for up to 1660 ms. Additionally, we demonstrate online target
    prediction within the first 300-500 ms after motion onset without the use of
    target specific training data. The advantage of our probabilistic approach is
    the possibility to draw samples of possible future motions. Finally, we
    investigate how movements and kinematic cues are represented on the learned low
    dimensional manifold.

    Bayesian Nonparametric Feature and Policy Learning for Decision-Making

    Jürgen Hahn, Abdelhak M. Zoubir
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Learning from demonstrations has gained increasing interest in the recent
    past, enabling an agent to learn how to make decisions by observing an
    experienced teacher. While many approaches have been proposed to solve this
    problem, there is little work that focuses on reasoning about the observed
    behavior. We assume that, in many practical problems, an agent makes its
    decision based on latent features, indicating a certain action. Therefore, we
    propose a generative model for the states and actions. Inference reveals the
    number of features, the features, and the policies, allowing us to learn and to
    analyze the underlying structure of the observed behavior. Further, our
    approach enables prediction of actions for new states. Simulations are used to
    assess the performance of the algorithm based upon this model. Moreover, the
    problem of learning a driver’s behavior is investigated, demonstrating the
    performance of the proposed model in a real-world scenario.

    Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction

    Abraham Smith, Paul Bendich, John Harer, Jay Hineman
    Comments: Distribution Statement A – Approved for public release, distribution is unlimited
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We introduce a new algorithm, called CDER, for supervised machine learning
    that merges the multi-scale geometric properties of Cover Trees with the
    information-theoretic properties of entropy. CDER applies to a training set of
    labeled pointclouds embedded in a common Euclidean space. If typical
    pointclouds corresponding to distinct labels tend to differ at any scale in any
    sub-region, CDER can identify these differences in (typically) linear time,
    creating a set of distributional coordinates which act as a feature extraction
    mechanism for supervised learning. We describe theoretical properties and
    implementation details of CDER, and illustrate its benefits on several
    synthetic examples.

    CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

    Andre Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham
    Comments: The Journal of Supercomputing, 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep learning is an important component of big-data analytic tools and
    intelligent applications such as self-driving cars, computer vision, speech
    recognition, or precision medicine. However, the training process is
    computationally intensive, and often requires a large amount of time if
    performed sequentially. Modern parallel computing systems provide the
    capability to reduce the required training time of deep neural networks. In
    this paper, we present our parallelization scheme for training convolutional
    neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
    Synchronization (CHAOS). Major features of CHAOS include the support for thread
    and vector parallelism, non-instant updates of weight parameters during
    back-propagation without a significant delay, and implicit synchronization in
    arbitrary order. CHAOS is tailored for parallel computing systems that are
    accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
    empirically using measurement techniques and performance modeling for various
    numbers of threads and CNN architectures. Experimental results for the MNIST
    dataset of handwritten digits using the total number of threads on the Xeon Phi
    show speedups of up to 103x compared to the execution on one thread of the Xeon
    Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
    compared to the sequential execution on Intel Core i5.

    Learning Deep NBNN Representations for Robust Place Categorization

    Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
    Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an approach for semantic place categorization using data
    obtained from RGB cameras. Previous studies on visual place recognition and
    classification have shown that, by considering features derived from
    pre-trained Convolutional Neural Networks (CNNs) in combination with part-based
    classification models, high recognition accuracy can be achieved, even in
    presence of occlusions and severe viewpoint changes. Inspired by these works,
    we propose to exploit local deep representations, representing images as sets of
    regions and applying a Naïve Bayes Nearest Neighbor (NBNN) model for image
    classification. As opposed to previous methods where CNNs are merely used as
    feature extractors, our approach seamlessly integrates the NBNN model into a
    fully-convolutional neural network. Experimental results show that the proposed
    algorithm outperforms previous methods based on pre-trained CNN models and
    that, when employed in challenging robot place recognition tasks, it is robust
    to occlusions, environmental and sensor changes.

    Adaptive Neural Networks for Fast Test-Time Prediction

    Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present an approach to adaptively utilize deep neural networks in order to
    reduce the evaluation time on new examples without loss of classification
    performance. Rather than attempting to redesign or approximate existing
    networks, we propose two schemes that adaptively utilize networks. First, we
    pose an adaptive network evaluation scheme, where we learn a system to
    adaptively choose the components of a deep network to be evaluated for each
    example. By allowing examples correctly classified using early layers of the
    system to exit, we avoid the computational time associated with full evaluation
    of the network. Building upon this approach, we then learn a network selection
    system that adaptively selects the network to be evaluated for each example. We
    exploit the fact that many examples can be correctly classified using
    relatively efficient networks and that complex, computationally costly networks
    are only necessary for a small fraction of examples. By avoiding evaluation of
    these complex networks for a large fraction of examples, computational time can
    be dramatically reduced. Empirically, these approaches yield dramatic
    reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
    networks from the ImageNet image recognition challenge with minimal (less than
    1%) loss of accuracy.
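
    The network-selection part of the idea admits a very small sketch (our
    threshold rule; the paper learns the selection policy rather than using a
    fixed cutoff):

        import numpy as np

        def cascade_predict(cheap_net, costly_net, x, threshold=0.95):
            # Each *_net maps x to a softmax vector.
            p = cheap_net(x)
            if p.max() >= threshold:
                return np.argmax(p)          # most examples exit here
            return np.argmax(costly_net(x))  # hard cases pay the full cost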

    Mode Regularized Generative Adversarial Networks

    Tong Che, Yanran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Although Generative Adversarial Networks achieve state-of-the-art results on
    a variety of generative tasks, they are regarded as highly unstable and prone
    to miss modes. We argue that these bad behaviors of GANs are due to the very
    particular functional shape of the trained discriminators in high dimensional
    spaces, which can easily cause training to get stuck or push probability mass
    in the wrong direction, towards regions of higher concentration than those of
    the data generating distribution. We introduce several ways of regularizing the
    objective, which can dramatically stabilize the training of GAN models. We also
    show that our regularizers help distribute probability mass fairly across the
    modes of the data generating distribution during the early phases of training,
    thus providing a unified solution to the missing modes problem.
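
    A hedged PyTorch sketch of an encoder-based mode regularizer in this
    spirit (the distance measures, weights, and names are ours, not the
    paper's exact objective): the generator is penalized when real samples x
    and their reconstructions G(E(x)) drift apart, both in data space and
    through the discriminator's eyes.

        import torch

        def generator_loss(D, G, E, x, lam1=1e-2, lam2=1e-2):
            # Usual adversarial term on noise-driven samples.
            z = torch.randn_like(E(x))
            adv = -torch.log(D(G(z)) + 1e-8).mean()
            # Geometric regularizers tying x to its reconstruction.
            recon = G(E(x))
            reg = lam1 * ((x - recon) ** 2).mean() \
                + lam2 * ((D(x) - D(recon)) ** 2).mean()
            return adv + reg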


    Artificial Intelligence

    Differentiable Learning of Logical Rules for Knowledge Base Completion

    Fan Yang, Zhilin Yang, William W. Cohen
    Subjects: Artificial Intelligence (cs.AI)

    Learned models composed of probabilistic logical rules are useful for many
    tasks, such as knowledge base completion. Unfortunately this learning problem
    is difficult, since determining the structure of the theory normally requires
    solving a discrete optimization problem. In this paper, we propose an
    alternative approach: a completely differentiable model for learning sets of
    first-order rules. The approach is inspired by a recently-developed
    differentiable logic, i.e. a subset of first-order logic for which inference
    tasks can be compiled into sequences of differentiable operations. Here we
    describe a neural controller system which learns how to sequentially compose
    these primitive differentiable operations to solve reasoning tasks, and in
    particular, to perform knowledge base completion. The long-term goal of this
    work is to develop integrated, end-to-end systems that can learn to perform
    high-level logical reasoning as well as lower-level perceptual tasks.
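
    The flavor of the compiled inference can be seen in a toy NumPy sketch
    (our encoding; in TensorLog-style systems, relations become sparse
    matrices and rule application becomes differentiable matrix-vector
    products that a controller learns to sequence):

        import numpy as np

        # Toy KB over 4 entities: M[i, j] = 1 iff parent(entity_i, entity_j).
        M_parent = np.array([[0, 1, 0, 0],
                             [0, 0, 1, 0],
                             [0, 0, 0, 1],
                             [0, 0, 0, 0]], dtype=float)

        def rule_scores(v_x, ops):
            # v_x: one-hot query entity X; ops: relation matrices chosen
            # (in a learned system) by the controller, applied in sequence.
            v = v_x
            for M in ops:
                v = M.T @ v  # follow one relation hop
            return v         # scores over candidate entities Y

        # grandparent(X, Y) <- parent(X, Z), parent(Z, Y)
        x = np.eye(4)[0]
        print(rule_scores(x, [M_parent, M_parent]))  # entity 2 scores 1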

    Synergistic Team Composition

    Ewa Andrejczuk, Juan A. Rodriguez-Aguilar, Carme Roig, Carles Sierra
    Subjects: Artificial Intelligence (cs.AI)

    Effective teams are crucial for organisations, especially in environments
    that require teams to be constantly created and dismantled, such as software
    development, scientific experiments, crowd-sourcing, or the classroom. Key
    factors influencing team performance are competences and personality of team
    members. Hence, we present a computational model to compose proficient and
    congenial teams based on individuals’ personalities and their competences to
    perform tasks of different nature. With this purpose, we extend Wilde’s
    post-Jungian method for team composition, which solely employs individuals’
    personalities. The aim of this study is to create a model to partition agents
    into teams that are balanced in competences, personality and gender. Finally,
    we present some preliminary empirical results that we obtained when analysing
    student performance. Results show the benefits of a more informed team
    composition that exploits individuals’ competences besides information about
    their personalities.

    Criticality and Deep Learning, Part I: Theory vs. Empirics

    Dan Oprisa, Peter Toth
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Motivated by the idea that criticality and universality of phase transitions
    might play a crucial role in achieving and sustaining learning and intelligent
    behaviour in biological and artificial networks, we analyse a theoretical and a
    pragmatic experimental setup for critical phenomena in deep learning. On the
    theoretical side, we use results from statistical physics to carry out critical
    point calculations in feed-forward/fully connected networks, while on the
    experimental side we set out to find traces of criticality in deep neural
    networks. This is our first step in a series of upcoming investigations to map
    out the relationship between criticality and learning in deep networks.

    Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

    Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio
    Comments: 11 pages, 3 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Despite the successes in capturing continuous distributions, the application
    of generative adversarial networks (GANs) to discrete settings, like natural
    language tasks, is rather restricted. The fundamental reason is the difficulty
    of back-propagation through discrete random variables combined with the
    inherent instability of the GAN training objective. To address these problems,
    we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
    Networks. Instead of directly optimizing the GAN objective, we derive a novel
    and low-variance objective using the discriminator’s output that corresponds
    to the log-likelihood. Compared with the original, the new
    objective is proved to be consistent in theory and beneficial in practice. The
    experimental results on various discrete datasets demonstrate the effectiveness
    of the proposed approach.
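
    A hedged sketch of the reweighting at the heart of the method (baseline
    subtraction and other variance-reduction details from the paper are
    omitted; tensor names are ours):

        import torch

        def maligan_generator_loss(log_p_g, d_out):
            # log_p_g: generator log-probs of its own samples;
            # d_out: discriminator probabilities for the same samples.
            # Importance weights r = D / (1 - D), normalized over the batch,
            # turn the update into a weighted maximum-likelihood step.
            r = d_out / (1.0 - d_out + 1e-8)
            w = (r / r.sum()).detach()  # weights treated as constants
            return -(w * log_p_g).sum()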

    Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

    Brent Harrison, Upol Ehsan, Mark O. Riedl
    Comments: 9 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    We introduce AI rationalization, an approach for generating explanations of
    autonomous system behavior as if a human had performed the behavior. We describe a
    rationalization technique that uses neural machine translation to translate
    internal state-action representations of the autonomous agent into natural
    language. We evaluate our technique in the Frogger game environment. The
    natural language is collected from human players thinking out loud as they play
    the game. We motivate the use of rationalization as an approach to explanation
    generation, show the results of experiments on the accuracy of our
    rationalization technique, and describe a future research agenda.

    Asymmetric Tri-training for Unsupervised Domain Adaptation

    Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Deep-layered models trained on a large number of labeled samples boost the
    accuracy of many tasks. It is important to apply such models to different
    domains because collecting many labeled samples in various domains is
    expensive. In unsupervised domain adaptation, one needs to train a classifier
    that works well on a target domain when provided with labeled source samples
    and unlabeled target samples. Although many methods aim to match the
    distributions of source and target samples, simply matching the distribution
    cannot ensure accuracy on the target domain. To learn discriminative
    representations for the target domain, we assume that artificially labeling
    target samples can result in a good representation. Tri-training leverages
    three classifiers equally to give pseudo-labels to unlabeled samples, but the
    method was not designed to label samples generated from a different domain. In
    this paper, we propose an asymmetric tri-training method for unsupervised
    domain adaptation, where we assign pseudo-labels to unlabeled samples and train
    neural networks as if they are true labels. In our work, we use three networks
    asymmetrically. By asymmetric, we mean that two networks are used to label
    unlabeled target samples and one network is trained by the samples to obtain
    target-discriminative representations. We evaluate our method on digit
    recognition and sentiment analysis datasets. Our proposed method achieves
    state-of-the-art performance on the benchmark digit recognition datasets of
    domain adaptation.

    Balancing Lexicographic Fairness and a Utilitarian Objective with Application to Kidney Exchange

    Duncan C. McElfresh, John P. Dickerson
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

    Balancing fairness and efficiency in resource allocation is a classical
    economic and computational problem. The price of fairness measures the
    worst-case loss of economic efficiency when using an inefficient but fair
    allocation rule; for indivisible goods in many settings, this price is
    unacceptably high. In this work, we propose a hybrid fairness rule that
    balances a strict lexicographic preference ordering over classes of agents and
    a utilitarian objective that maximizes economic efficiency. We develop a
    utility function that favors disadvantaged groups lexicographically, but if
    the cost to overall efficiency becomes too high, it smoothly switches to a
    utilitarian objective. This rule has only one parameter which is proportional
    to a bound on the price of fairness, and can be adjusted by policymakers. We
    apply this rule to kidney exchange, where needy patients swap willing but
    incompatible donors, and demonstrate on real data from a large exchange that
    our hybrid rule produces more reliable outcomes than other fairness rules.

    DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

    Christian Wachinger, Martin Reuter, Tassilo Klein
    Comments: Accepted for publication in NeuroImage, special issue “Brain Segmentation and Parcellation”, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We introduce DeepNAT, a 3D Deep convolutional neural network for the
    automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
    images. DeepNAT is an end-to-end learning-based approach to brain segmentation
    that jointly learns an abstract feature representation and a multi-class
    classification. We propose a 3D patch-based approach, where we do not only
    predict the center voxel of the patch but also neighbors, which is formulated
    as multi-task learning. To address a class imbalance problem, we arrange two
    networks hierarchically, where the first one separates foreground from
    background, and the second one identifies 25 brain structures on the
    foreground. Since patches lack spatial context, we augment them with
    coordinates. To this end, we introduce a novel intrinsic parameterization of
    the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
    network architecture, we use three convolutional layers with pooling, batch
    normalization, and non-linearities, followed by fully connected layers with
    dropout. The final segmentation is inferred from the probabilistic output of
    the network with a 3D fully connected conditional random field, which ensures
    label agreement between close voxels. The roughly 2.7 million parameters in the
    network are learned with stochastic gradient descent. Our results show that
    DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
    learning-based method may have a high potential for the adaptation to young,
    old, or diseased brains by fine-tuning the pre-trained network with a small
    training sample on the target application, where the availability of larger
    datasets with manual annotations may boost the overall segmentation accuracy in
    the future.

    Reinforcement Learning with Deep Energy-Based Policies

    Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a method for learning expressive energy-based policies for
    continuous states and actions, which has been feasible only in tabular domains
    before. We apply our method to learning maximum entropy policies, resulting
    into a new algorithm, called soft Q-learning, that expresses the optimal policy
    via a Boltzmann distribution. We use the recently proposed amortized Stein
    variational gradient descent to learn a stochastic sampling network that
    approximates samples from this distribution. The benefits of the proposed
    algorithm include improved exploration and compositionality that allows
    transferring skills between tasks, which we confirm in simulated experiments
    with swimming and walking robots. We also draw a connection to actor-critic
    methods, which can be viewed as performing approximate inference on the
    corresponding energy-based model.
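
    The tabular core of the soft formulation is compact enough to sketch
    (NumPy; alpha is the temperature, and the names are ours):

        import numpy as np

        def soft_value(q, alpha=1.0):
            # V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably.
            q = np.asarray(q) / alpha
            m = q.max()
            return alpha * (m + np.log(np.exp(q - m).sum()))

        def boltzmann_policy(q, alpha=1.0):
            # pi(a|s) proportional to exp(Q(s, a) / alpha).
            p = np.exp((np.asarray(q) - soft_value(q, alpha)) / alpha)
            return p / p.sum()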

    Stochastic Variance Reduction Methods for Policy Evaluation

    Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (cs.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Policy evaluation is a crucial step in many reinforcement-learning
    procedures, which estimates a value function that predicts states’ long-term
    value under a given policy. In this paper, we focus on policy evaluation with
    linear function approximation over a fixed dataset. We first transform the
    empirical policy evaluation problem into a (quadratic) convex-concave saddle
    point problem, and then present a primal-dual batch gradient method, as well as
    two stochastic variance reduction methods for solving the problem. These
    algorithms scale linearly in both sample size and feature dimension. Moreover,
    they achieve linear convergence even when the saddle-point problem has only
    strong concavity in the dual variables but no strong convexity in the primal
    variables. Numerical experiments on benchmark problems demonstrate the
    effectiveness of our methods.
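
    A generic SVRG-style loop for such a saddle point is easy to sketch (this
    is the standard variance-reduction template under our notation, not the
    paper's exact algorithms or step sizes):

        import numpy as np

        def svrg_saddle(grad_i, n, theta, w, eta=0.01, epochs=10, inner=None):
            # min_theta max_w (1/n) sum_i f_i(theta, w);
            # grad_i(i, th, w) returns (g_theta, g_w) for sample i.
            inner = inner or n
            for _ in range(epochs):
                th_s, w_s = theta.copy(), w.copy()  # snapshot
                G = [grad_i(i, th_s, w_s) for i in range(n)]
                G_th = np.mean([g[0] for g in G], axis=0)
                G_w = np.mean([g[1] for g in G], axis=0)
                for _ in range(inner):
                    i = np.random.randint(n)
                    g_th, g_w = grad_i(i, theta, w)
                    s_th, s_w = grad_i(i, th_s, w_s)
                    theta -= eta * (g_th - s_th + G_th)  # primal descent
                    w += eta * (g_w - s_w + G_w)         # dual ascent
            return theta, w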

    Contractibility for Open Global Constraints

    Michael J. Maher
    Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)
    Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

    Open forms of global constraints allow the addition of new variables to an
    argument during the execution of a constraint program. Such forms are needed
    for difficult constraint programming problems where problem construction and
    problem solving are interleaved, and fit naturally within constraint logic
    programming. However, in general, filtering that is sound for a global
    constraint can be unsound when the constraint is open. This paper provides a
    simple characterization, called contractibility, of the constraints where
    filtering remains sound when the constraint is open. With this characterization
    we can easily determine whether a constraint has this property or not. In the
    latter case, we can use it to derive a contractible approximation to the
    constraint. We demonstrate this work on both hard and soft constraints. In the
    process, we formulate two general classes of soft constraints.

    Measuring #GamerGate: A Tale of Hate, Sexism, and Bullying

    Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Athena Vakali
    Comments: WWW Cybersafety Workshop 2017
    Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

    Over the past few years, online aggression and abusive behaviors have
    occurred in many different forms and on a variety of platforms. In extreme
    cases, these incidents have evolved into hate, discrimination, and bullying,
    and even materialized into real-world threats and attacks against individuals
    or groups. In this paper, we study the Gamergate controversy. Started in August
    2014 in the online gaming world, it quickly spread across various social
    networking platforms, ultimately leading to many incidents of cyberbullying and
    cyberaggression. We focus on Twitter, presenting a measurement study of a
    dataset of 340k unique users and 1.6M tweets to study the properties of these
    users, the content they post, and how they differ from random Twitter users. We
    find that users involved in this “Twitter war” tend to have more friends and
    followers, are generally more engaged and post tweets with negative sentiment,
    less joy, and more hate than random users. We also perform preliminary
    measurements on how the Twitter suspension mechanism deals with such abusive
    behaviors. While we focus on Gamergate, our methodology to collect and analyze
    tweets related to aggressive and bullying activities is of independent
    interest.

    Mode Regularized Generative Adversarial Networks

    Tong Che, Yanran Li, Athul Paul Jacob, Yoshua Bengio, Wenjie Li
    Comments: Published as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Although Generative Adversarial Networks achieve state-of-the-art results on
    a variety of generative tasks, they are regarded as highly unstable and prone
    to miss modes. We argue that these bad behaviors of GANs are due to the very
    particular functional shape of the trained discriminators in high dimensional
    spaces, which can easily cause training to get stuck or push probability mass
    in the wrong direction, towards regions of higher concentration than those of
    the data generating distribution. We introduce several ways of regularizing the
    objective, which can dramatically stabilize the training of GAN models. We also
    show that our regularizers help distribute probability mass fairly across the
    modes of the data generating distribution during the early phases of training,
    thus providing a unified solution to the missing modes problem.


    Information Retrieval

    Mutual Information based labelling and comparing clusters

    Rob Koopman, Shenghui Wang
    Comments: Special Issue of Scientometrics: Same data – different results? Towards a comparative approach to the identification of thematic structures in science
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    After a clustering solution is generated automatically, labelling these
    clusters becomes important to help understanding the results. In this paper, we
    propose to use a Mutual Information based method to label clusters of journal
    articles. Topical terms which have the highest Normalised Mutual Information
    (NMI) with a certain cluster are selected to be the labels of the cluster.
    Discussion of the labelling technique with a domain expert was used as a check
    that the labels are discriminating not only lexically but also semantically.
    Based on a common set of topical terms, we also propose to generate lexical
    fingerprints as a representation of individual clusters. Eventually, we
    visualise and compare these fingerprints of different clusters from either one
    clustering solution or different ones.
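
    The term-scoring step can be sketched directly from the definition of
    mutual information (plain MI shown here; the paper's normalisation
    constant for NMI is omitted, and all names are ours):

        import numpy as np

        def term_cluster_mi(term_occurrence, clusters, cluster_id):
            # term_occurrence: (n_docs,) 0/1 vector for one topical term;
            # clusters: (n_docs,) cluster ids. MI between the term indicator
            # and membership in the given cluster.
            t = np.asarray(term_occurrence).astype(bool)
            c = np.asarray(clusters) == cluster_id
            mi = 0.0
            for tv in (True, False):
                for cv in (True, False):
                    p_tc = np.mean((t == tv) & (c == cv))
                    p_t, p_c = np.mean(t == tv), np.mean(c == cv)
                    if p_tc > 0:
                        mi += p_tc * np.log(p_tc / (p_t * p_c))
            return mi

    Ranking all topical terms by this score for a cluster and keeping the top
    few yields candidate labels.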

    PubTree: A Hierarchical Search Tool for the MEDLINE Database

    William Rowe, Paul D. Dobson, Bede Constantinides, Mark Platt
    Comments: 7 pages, 2 figures
    Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)

    Keeping track of the ever-increasing body of scientific literature is an
    escalating challenge. We present PubTree, a hierarchical search tool that
    efficiently searches the PubMed/MEDLINE dataset based upon a decision tree
    constructed using >26 million abstracts. The tool is implemented as a webpage,
    where users are asked a series of eighteen questions to locate pertinent
    articles. The implementation of this hierarchical search tool highlights issues
    endemic to document retrieval. However, the construction of this tree
    indicates that with future developments hierarchical search could become an
    effective tool (or adjunct) in the mining of biological literature.

    Related Pins at Pinterest: The Evolution of a Real-World Recommender System

    David C. Liu, Stephanie Rogers, Raymond Shiau, Dmitry Kislyuk, Kevin C. Ma, Zhigang Zhong, Jenny Liu, Yushi Jing
    Subjects: Information Retrieval (cs.IR)

    Related Pins is the Web-scale recommender system that powers over 40% of user
    engagement on Pinterest. This paper is a longitudinal study of three years of
    its development, exploring the evolution of the system and its components from
    prototypes to present state. Each component was originally built with many
    constraints on engineering effort and computational resources, so we
    prioritized the simplest and highest-leverage solutions. We show how organic
    growth led to a complex system and how we managed this complexity. Many
    challenges arose while building this system, such as avoiding feedback loops,
    evaluating performance, activating content, and eliminating legacy heuristics.
    Finally, we offer suggestions for tackling these challenges when engineering
    Web-scale recommender systems.

    Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

    Rupinder Paul Khandpur, Taoran Ji, Steve Jan, Gang Wang, Chang-Tien Lu, Naren Ramakrishnan
    Comments: 13 single column pages, 5 figures, submitted to KDD 2017
    Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Social media is often viewed as a sensor into various societal events such as
    disease outbreaks, protests, and elections. We describe the use of social media
    as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
    approach detects a broad range of cyber-attacks (e.g., distributed denial of
    service (DDOS) attacks, data breaches, and account hijacking) in an
    unsupervised manner using just a limited fixed set of seed event triggers. A
    new query expansion strategy based on convolutional kernels and dependency
    parses helps model reporting structure and aids in identifying key event
    characteristics. Through a large-scale analysis over Twitter, we demonstrate
    that our approach consistently identifies and encodes events, outperforming
    existing methods.


    Computation and Language

    Stance Classification of Social Media Users in Independence Movements

    Arkaitz Zubiaga, Bo Wang, Maria Liakata, Rob Procter
    Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

    Social media and data mining are increasingly being used to analyse political
    and societal issues. Characterisation of users into socio-demographic groups is
    crucial to improve these analyses. Here we undertake the classification of
    social media users as supporting or opposing ongoing independence movements in
    their territories. Independence movements occur in territories whose citizens
    have conflicting national identities; users with opposing national identities
    will then support or oppose the sense of being part of an independent nation
    that differs from the officially recognised country. We describe a methodology
    that relies on users’ self-reported location to build datasets for three
    territories — Catalonia, the Basque Country and Scotland — and we test
    language-independent classifiers using four types of features. We show the
    effectiveness of the approach to build large annotated datasets, and the
    ability to achieve accurate, language-independent classification performances
    ranging from 85% to 97% for the three territories under study.

    Identifying beneficial task relations for multi-task learning in deep neural networks

    Joachim Bingel, Anders Søgaard
    Comments: Accepted for publication at EACL 2017
    Subjects: Computation and Language (cs.CL)

    Multi-task learning (MTL) in deep neural networks for NLP has recently
    received increasing interest due to some compelling benefits, including its
    potential to efficiently regularize models and to reduce the need for labeled
    data. While it has brought significant improvements in a number of NLP tasks,
    mixed results have been reported, and little is known about the conditions
    under which MTL leads to gains in NLP. This paper sheds light on the specific
    task relations that can lead to gains from MTL models over single-task setups.

    A case study on English-Malayalam Machine Translation

    Sreelekha S, Pushpak Bhattacharyya
    Comments: This paper contains 10 pages with 4 figures and 10 tables
    Subjects: Computation and Language (cs.CL)

    In this paper we present our work on a case study on Statistical Machine
    Translation (SMT) and Rule based machine translation (RBMT) for translation
    from English to Malayalam and Malayalam to English. One of the motivations of
    our study is to make a three way performance comparison, such as, a) SMT and
    RBMT b) English to Malayalam SMT and Malayalam to English SMT c) English to
    Malayalam RBMT and Malayalam to English RBMT. We describe the development of
    English to Malayalam and Malayalam to English baseline phrase based SMT system
    and the evaluation of its performance compared against the RBMT system. Based
    on our study, the observations are: a) SMT systems outperform RBMT systems; b)
    in the case of SMT, English – Malayalam systems perform better than
    Malayalam – English systems; c) in the case of RBMT, Malayalam to English
    systems perform better than English to Malayalam systems. Based on our
    evaluations and detailed error analysis, we describe the requirements of
    incorporating morphological processing into the SMT to improve the accuracy of
    translation.

    Friends and Enemies of Clinton and Trump: Using Context for Detecting Stance in Political Tweets

    Mirko Lai, Delia Irazú Hernández Farías, Viviana Patti, Paolo Rosso
    Comments: To appear in MICAI 2016 LNAI Proceedings
    Subjects: Computation and Language (cs.CL)

    Stance detection, the task of identifying the speaker’s opinion towards a
    particular target, has attracted the attention of researchers. This paper
    describes a novel approach for detecting stance in Twitter. We define a set of
    features in order to consider the context surrounding a target of interest with
    the final aim of training a model for predicting the stance towards the
    mentioned targets. In particular, we are interested in investigating political
    debates in social media. For this reason we evaluated our approach focusing on
    two targets of the SemEval-2016 Task6 on Detecting stance in tweets, which are
    related to the political campaign for the 2016 U.S. presidential elections:
    Hillary Clinton vs. Donald Trump. For the sake of comparison with the state of
    the art, we evaluated our model against the dataset released in the
    SemEval-2016 Task 6 shared task competition. Our results outperform the best
    ones obtained by participating teams, and show that information about enemies
    and friends of politicians help in detecting stance towards them.

    Detecting (Un)Important Content for Single-Document News Summarization

    Yinfei Yang, Forrest Sheng Bao, Ani Nenkova
    Comments: Accepted By EACL 2017
    Subjects: Computation and Language (cs.CL)

    We present a robust approach for detecting intrinsic sentence importance in
    news, by training on two corpora of document-summary pairs. When used for
    single-document summarization, our approach, combined with the “beginning of
    document” heuristic, outperforms a state-of-the-art summarizer and the
    beginning-of-article baseline in both automatic and manual evaluations. These
    results represent an important advance because in the absence of cross-document
    repetition, single document summarizers for news have not been able to
    consistently outperform the strong beginning-of-article baseline.
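
    One way to combine a learned importance score with the positional
    heuristic (the blending rule and weight below are our illustration, not
    the paper's model):

        def summarize(sentences, importance, k=3, position_weight=0.1):
            # Discount later sentences, then keep the top-k by blended score,
            # re-emitted in document order.
            scored = [(imp - position_weight * i, i, s)
                      for i, (s, imp) in enumerate(zip(sentences, importance))]
            top = sorted(scored, reverse=True)[:k]
            return [s for _, i, s in sorted(top, key=lambda t: t[1])]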

    Critical Survey of the Freely Available Arabic Corpora

    Wajdi Zaghouani
    Comments: Published in the Proceedings of the International Conference on Language Resources and Evaluation (LREC’2014), OSACT Workshop. Reykjavik, Iceland, 26-31 May 2014
    Subjects: Computation and Language (cs.CL)

    The availability of corpora is a major factor in building natural language
    processing applications. However, the costs of acquiring corpora can prevent
    some researchers from going further in their endeavours. Easy access to
    freely available corpora is urgently needed in the NLP research community,
    especially for languages such as Arabic. Currently, there is no easy way to
    access a comprehensive and up-to-date list of freely available Arabic corpora.
    In this paper, we present the results of a recent survey conducted to identify
    the freely available Arabic corpora and language resources. Our preliminary
    results yielded an initial list of 66 sources. We present our findings across
    the various categories studied and provide direct links to the data where
    possible.

    Deep Voice: Real-time Neural Text-to-Speech

    Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
    Comments: Submitted to ICML 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We present Deep Voice, a production-quality text-to-speech system constructed
    entirely from deep neural networks. Deep Voice lays the groundwork for truly
    end-to-end neural speech synthesis. The system comprises five major building
    blocks: a segmentation model for locating phoneme boundaries, a
    grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
    fundamental frequency prediction model, and an audio synthesis model. For the
    segmentation model, we propose a novel way of performing phoneme boundary
    detection with deep neural networks using connectionist temporal classification
    (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
    that requires fewer parameters and trains faster than the original. By using a
    neural network for each component, our system is simpler and more flexible than
    traditional text-to-speech systems, where each component requires laborious
    feature engineering and extensive domain expertise. Finally, we show that
    inference with our system can be performed faster than real time and describe
    optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
    speedups over existing implementations.

    Residual Convolutional CTC Networks for Automatic Speech Recognition

    Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang
    Subjects: Computation and Language (cs.CL)

    Deep learning approaches have been widely used in Automatic Speech
    Recognition (ASR) and they have achieved a significant accuracy improvement.
    Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR
    recently. However, most CNNs used in existing work have less than 10 layers
    which may not be deep enough to capture all human speech signal information. In
    this paper, we propose a novel deep and wide CNN architecture denoted as
    RCNN-CTC, which has residual connections and Connectionist Temporal
    Classification (CTC) loss function. RCNN-CTC is an end-to-end system which can
    exploit temporal and spectral structures of speech signals simultaneously.
    Furthermore, we introduce a CTC-based system combination, which is different
    from the conventional frame-wise senone-based one. The basic subsystems adopted
    in the combination are different types and thus mutually complementary to each
    other. Experimental results show that our proposed single system RCNN-CTC can
    achieve the lowest word error rate (WER) on WSJ and Tencent Chat data sets,
    compared to several widely used neural network systems in ASR. In addition, the
    proposed system combination can offer a further error reduction on these two
    data sets, resulting in relative WER reductions of 14.91% and 6.52% on
    WSJ dev93 and Tencent Chat data sets, respectively.
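
    A minimal PyTorch sketch of the building block (our layer sizes; the
    paper's full architecture, data layout, and training recipe are more
    involved):

        import torch.nn as nn
        import torch.nn.functional as F

        class ResidualConvBlock(nn.Module):
            # One residual block over (batch, channels, time, freq) features.
            def __init__(self, channels):
                super().__init__()
                self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
                self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
                self.bn1 = nn.BatchNorm2d(channels)
                self.bn2 = nn.BatchNorm2d(channels)

            def forward(self, x):
                y = F.relu(self.bn1(self.conv1(x)))
                y = self.bn2(self.conv2(y))
                return F.relu(x + y)  # residual connection

        # Training couples a stack of such blocks with a CTC criterion over
        # the per-frame label distributions, e.g. nn.CTCLoss(blank=0).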

    When confidence and competence collide: Effects on online decision-making discussions

    Liye Fu, Lillian Lee, Cristian Danescu-Niculescu-Mizil
    Comments: To appear in Proceedings of WWW 2017. Online multiplayer game available at this http URL
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    Group discussions are a way for individuals to exchange ideas and arguments
    in order to reach better decisions than they could on their own. One of the
    premises of productive discussions is that better solutions will prevail, and
    that the idea selection process is mediated by the (relative) competence of the
    individuals involved. However, since people may not know their actual
    competence on a new task, their behavior is influenced by their self-estimated
    competence — that is, their confidence — which can be misaligned with their
    actual competence.

    Our goal in this work is to understand the effects of confidence-competence
    misalignment on the dynamics and outcomes of discussions. To this end, we
    design a large-scale natural setting, in the form of an online team-based
    geography game, that allows us to disentangle confidence from competence and
    thus separate their effects.

    We find that in task-oriented discussions, the more-confident individuals
    have a larger impact on the group’s decisions even when these individuals are
    at the same level of competence as their teammates. Furthermore, this
    unjustified role of confidence in the decision-making process often leads teams
    to under-perform. We explore this phenomenon by investigating the effects of
    confidence on conversational dynamics.

    Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

    Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
    Comments: 12 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recent work on generative modeling of text has found that variational
    auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
    language models (Bowman et al., 2015). This negative result is so far poorly
    understood, but has been attributed to the propensity of LSTM decoders to
    ignore conditioning information from the encoder. In this paper, we experiment
    with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
    dilation architecture, we control the effective context from previously
    generated words. In experiments, we find that there is a trade-off between the
    contextual capacity of the decoder and the amount of encoding information used.
    We show that with the right decoder, VAE can outperform LSTM language models.
    We demonstrate perplexity gains on two datasets, representing the first
    positive experimental result on the use of VAE for generative modeling of text.
    Further, we conduct an in-depth investigation of the use of VAE (with our new
    decoding architecture) for semi-supervised and unsupervised labeling tasks,
    demonstrating gains over several strong baselines.

    Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

    Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio
    Comments: 11 pages, 3 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Despite the successes in capturing continuous distributions, the application
    of generative adversarial networks (GANs) to discrete settings, like natural
    language tasks, is rather restricted. The fundamental reason is the difficulty
    of back-propagation through discrete random variables combined with the
    inherent instability of the GAN training objective. To address these problems,
    we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
    Networks. Instead of directly optimizing the GAN objective, we derive a novel
    and low-variance objective using the discriminator’s output that corresponds
    to the log-likelihood. Compared with the original, the new
    objective is proved to be consistent in theory and beneficial in practice. The
    experimental results on various discrete datasets demonstrate the effectiveness
    of the proposed approach.

    Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

    Brent Harrison, Upol Ehsan, Mark O. Riedl
    Comments: 9 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    We introduce AI rationalization, an approach for generating explanations of
    autonomous system behavior as if a human had performed the behavior. We describe a
    rationalization technique that uses neural machine translation to translate
    internal state-action representations of the autonomous agent into natural
    language. We evaluate our technique in the Frogger game environment. The
    natural language is collected from human players thinking out loud as they play
    the game. We motivate the use of rationalization as an approach to explanation
    generation, show the results of experiments on the accuracy of our
    rationalization technique, and describe a future research agenda.


    Distributed, Parallel, and Cluster Computing

    Optimized Secure Position Sharing with Non-trusted Servers

    Pavel Skvortsov, Björn Schembera, Frank Dürr, Kurt Rothermel
    Comments: 26 pages, 11 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Today, location-based applications and services such as friend finders and
    geo-social networks are very popular. However, storing private position
    information on third-party location servers leads to privacy problems. In our
    previous work, we proposed a position sharing approach for secure management of
    positions on non-trusted servers, which distributes position shares of limited
    precision among servers of several providers. In this paper, we make two novel
    contributions that improve the original approach. First, we optimize the
    placement of shares among servers by taking their trustworthiness into account.
    Second, we optimize the location update protocols to minimize the number of
    messages between mobile device and location servers.

    Another Look at the Implementation of Read/write Registers in Crash-prone Asynchronous Message-Passing Systems (Extended Version)

    Damien Imbs (LIF), Achour Mostefaoui (GDD), Matthieu Perrin, Michel Raynal (ASAP)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    “Yet another paper on” the implementation of read/write registers in
    crash-prone asynchronous message-passing systems! Yes…, but, differently from
    its predecessors, this paper looks for a communication abstraction which
    captures the essence of such an implementation in the same sense that total
    order broadcast can be associated with consensus, or message causal delivery
    can be associated with causal read/write registers. To this end, the paper
    introduces a new communication abstraction, named SCD-broadcast (SCD standing
    for “Set Constrained Delivery”), which, instead of a single message, delivers
    to processes sets of messages (whose size can be arbitrary), such that the
    sequences of message sets delivered to any two processes satisfy some
    constraints. The paper then shows that: (a) SCD-broadcast allows for a very
    simple implementation of a snapshot object (and consequently also of atomic
    read/write registers) in crash-prone asynchronous message-passing systems, (b)
    SCD-broadcast can be built from snapshot objects (hence SCD-broadcast and
    snapshot objects – or read/write registers – are “computationally equivalent”),
    (c) SCD-broadcast can be built in message-passing systems where any
    minority of processes may crash (which is the weakest assumption on the number
    of possible process crashes needed to implement a read/write register).

    Tars: Timeliness-aware Adaptive Replica Selection for Key-Value Stores

    Wanchun Jiang, Liyuan Fang, Haiming Xie, Xiangqian Zhou, Jianxin Wang
    Comments: 10pages,submitted to ICDCS 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In current large-scale distributed key-value stores, a single end-user
    request may lead to key-value access across tens or hundreds of servers. The
    tail latency of these key-value accesses is crucial to the user experience and
    greatly impacts the revenue. To cut the tail latency, it is crucial for clients
    to choose the fastest replica server as much as possible for the service of
    each key-value access. Aware of the challenges posed by the time-varying
    performance across servers and by herd behaviors, an adaptive replica
    selection scheme, C3, was proposed recently. In C3, feedback from individual
    servers is brought into replica ranking to reflect the time-varying
    performance of servers, and a distributed rate control and backpressure
    mechanism is introduced. Despite C3’s good performance, we reveal a
    timeliness issue in C3, which has a large impact on both the replica ranking
    and the rate control, and propose the Tars
    (timeliness-aware adaptive replica selection) scheme. Following the same
    framework as C3, Tars improves the replica ranking by taking the timeliness of
    the feedback information into consideration, and also revises the rate
    control of C3. Simulation results confirm that Tars outperforms C3.
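
    A hedged sketch of the timeliness idea only (the decay rule, names, and
    the C3-like score below are our illustration, not Tars itself): feedback
    is stored with its arrival time, and its influence fades as it goes stale.

        import time

        class TimelinessAwareRanker:
            def __init__(self, half_life=0.5):
                self.half_life = half_life  # seconds until weight halves
                self.feedback = {}          # server -> (queue, svc_time, t)

            def update(self, server, queue_size, service_time):
                self.feedback[server] = (queue_size, service_time, time.time())

            def rank(self, servers):
                now = time.time()
                def score(srv):
                    q, s, t = self.feedback.get(srv, (0.0, 1.0, now))
                    w = 0.5 ** ((now - t) / self.half_life)  # staleness discount
                    return (1.0 + w * q) * s
                return sorted(servers, key=score)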

    HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

    Huijun Wu, Chen Wang, Yinjin Fu, Sherif Sakr, Liming Zhu, Kai Lu
    Comments: 14 pages, 11 figures, submitted to MSST2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Eliminating duplicate data in primary storage of clouds increases the
    cost-efficiency of cloud service providers as well as reduces the cost of users
    for using cloud services. Existing primary deduplication techniques either use
    inline caching to exploit locality in primary workloads or use post-processing
    deduplication running in system idle time to avoid the negative impact on I/O
    performance. However, neither of them works well in the cloud servers running
    multiple services or applications for the following two reasons: Firstly, the
    temporal locality of duplicate data writes may not exist in some primary
    storage workloads thus inline caching often fails to achieve good deduplication
    ratio. Secondly, post-processing deduplication allows duplicate data to be
    written to disk and therefore does not provide the benefit of I/O deduplication
    and requires high peak storage capacity. This paper presents HPDedup, a Hybrid
    Prioritized data Deduplication mechanism to deal with the storage system shared
    by applications running in co-located virtual machines or containers by fusing
    an inline and a post-processing process for exact deduplication. In the inline
    deduplication phase, HPDedup gives a fingerprint caching mechanism that
    estimates the temporal locality of duplicates in data streams from different
    VMs or applications and prioritizes the cache allocation for these streams
    based on the estimation. HPDedup also allows different deduplication thresholds
    for streams based on their spatial locality to reduce disk fragmentation.
    The post-processing phase removes from disk those duplicates whose fingerprints
    could not be cached due to weak temporal locality. Our experimental
    results show that HPDedup clearly outperforms the state-of-the-art primary
    storage deduplication techniques in terms of inline cache efficiency and
    primary deduplication efficiency.
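
    The inline phase reduces, in caricature, to a fingerprint cache in front
    of the store (a bare sketch; HPDedup's per-stream locality estimation and
    prioritized cache allocation are elided, and all names are ours):

        import hashlib
        from collections import OrderedDict

        class InlineDedupCache:
            def __init__(self, capacity=100000):
                self.capacity = capacity
                self.cache = OrderedDict()  # fingerprint -> block address

            def write(self, chunk, store):
                fp = hashlib.sha1(chunk).hexdigest()
                if fp in self.cache:
                    self.cache.move_to_end(fp)
                    return self.cache[fp]    # duplicate: no disk write
                addr = store(chunk)          # unique so far: write it;
                self.cache[fp] = addr        # post-processing catches the rest
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict LRU fingerprint
                return addr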

    CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

    Andre Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham
    Comments: The Journal of Supercomputing, 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep learning is an important component of big-data analytic tools and
    intelligent applications such as self-driving cars, computer vision, speech
    recognition, and precision medicine. However, the training process is
    computationally intensive and often requires a large amount of time if
    performed sequentially. Modern parallel computing systems provide the
    capability to reduce the required training time of deep neural networks. In
    this paper, we present our parallelization scheme for training convolutional
    neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
    Synchronization (CHAOS). Major features of CHAOS include the support for thread
    and vector parallelism, non-instant updates of weight parameters during
    back-propagation without a significant delay, and implicit synchronization in
    arbitrary order. CHAOS is tailored for parallel computing systems that are
    accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
    empirically using measurement techniques and performance modeling for various
    numbers of threads and CNN architectures. Experimental results for the MNIST
    dataset of handwritten digits using the total number of threads on the Xeon Phi
    show speedups of up to 103x compared to the execution on one thread of the Xeon
    Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
    compared to the sequential execution on Intel Core i5.
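
    As a rough illustration of the Hogwild-style update pattern that CHAOS builds
    on, the sketch below runs asynchronous, lock-free SGD from several threads on
    a shared weight vector. This is our simplification for a linear model: it
    does not reproduce CHAOS's vectorization, its controlled update scheme, or
    anything Xeon Phi specific, and Python's GIL serializes the NumPy calls, so
    it only demonstrates the access pattern.

    # Minimal Hogwild-style sketch: threads apply asynchronous SGD updates to a
    # shared parameter vector without locks (illustrative assumptions only).
    import numpy as np
    from threading import Thread

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 20))
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.01 * rng.normal(size=10000)
    w = np.zeros(20)                          # shared, updated without locks

    def worker(indices, lr=1e-3):
        for i in indices:
            grad = (X[i] @ w - y[i]) * X[i]   # may read a stale w
            w[:] = w - lr * grad              # non-atomic in-place update

    threads = [Thread(target=worker, args=(part,))
               for part in np.array_split(rng.permutation(len(X)), 4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("recovery error:", np.linalg.norm(w - w_true))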

    Near Data Scheduling for Data Centers with Multi Levels of Data Locality

    Ali Yekkehkhany
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Data locality is a fundamental issue for data-parallel applications.
    Considering MapReduce in Hadoop, the map task scheduler requires an efficient
    algorithm that takes data locality into consideration; otherwise, the system
    may become unstable under loads inside the system’s capacity region, or jobs
    may experience longer completion times, neither of which is desirable. The
    data chunk needed for a map task can be in memory, on a local disk, in a
    local rack, in the same cluster, or even in another data center. Hence,
    although much work has gone into improving the speed of data center
    networks, a task still experiences different service rates depending on
    where its data chunk is stored and from which server it receives service.
    Most of the theoretical work on load balancing is for systems with two
    levels of data locality, notably the Pandas algorithm by Xie et al. and
    JSQ-MW by Wang et al., where the former is both throughput and heavy-traffic
    optimal, while the latter is throughput optimal but heavy-traffic optimal
    only under a special traffic load. We show that an extension of JSQ-MW for a
    system with three levels of data locality is throughput optimal, but
    heavy-traffic optimal only for a special traffic scenario. Furthermore, we
    show that Pandas is not even throughput optimal for a system with three
    levels of data locality. We then propose a novel algorithm, Balanced-Pandas,
    which is both throughput and heavy-traffic optimal. To the best of our
    knowledge, this is the first theoretical work on load balancing for a system
    with more than two levels of data locality, which, as we will see, is more
    challenging than two levels because a dilemma between performance and
    throughput emerges. A small routing illustration follows.
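
    To illustrate the kind of routing decision at stake, the sketch below sends
    a task to the server minimizing expected wait, with queued work discounted
    by the service rate the task would receive at each locality level. This is a
    generic illustration in the spirit of JSQ-MW, not the Balanced-Pandas
    algorithm itself; the service rates and the routing rule are our
    assumptions.

    # Illustrative multi-locality routing rule (assumed rates, not the paper's).
    MU = {"local": 1.0, "rack": 0.8, "remote": 0.5}   # assumed service rates

    def route(levels, workloads):
        """levels: {server -> locality level for this task};
        workloads: {server -> queued work}. Pick the smallest expected wait."""
        return min(levels, key=lambda s: (workloads[s] + 1.0) / MU[levels[s]])

    # The rack-local server wins once the local server is sufficiently backlogged.
    print(route({"s1": "local", "s2": "rack", "s3": "remote"},
                {"s1": 9.0, "s2": 4.0, "s3": 4.0}))    # -> "s2"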

    A Methodology for Oracle Selection of Monitors and Knobs for Configuring an HPC System running a Flood Management Application

    Panagiota Nikolaou, Yiannakis Sazeides, Antoni Portero, Radim Vavrik, Vit Vondrak
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper defines a methodology for the oracle selection of the monitors and
    knobs to use to configure an HPC system running a scientific application while
    satisfying the application’s requirements and not violating any system
    constraints. This methodology relies on a heuristic correlation analysis
    between requirements, monitors, and knobs to find the minimum subset of
    monitors to observe and knobs to explore when determining the optimal system
    configuration for the HPC application. At the end of this analysis, we reduce
    an 11-dimensional space to a 3-dimensional space for monitors and a
    6-dimensional space to a 3-dimensional space for knobs. This reduction shows
    the potential of, and highlights the need for, a realistic methodology to
    help identify such a minimum set of monitors and knobs.

    Scalable and Distributed Clustering via Lightweight Coresets

    Olivier Bachem, Mario Lucic, Andreas Krause
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Computation (stat.CO)

    Coresets are compact representations of data sets such that models trained on
    a coreset are provably competitive with models trained on the full data set. As
    such, they have been successfully used to scale up clustering models to massive
    data sets. While existing approaches generally only allow for multiplicative
    approximation errors, we propose a novel notion of coresets called lightweight
    coresets that allows for both multiplicative and additive errors. We provide a
    single algorithm to construct lightweight coresets for k-Means clustering,
    Bregman clustering, and maximum likelihood estimation of Gaussian mixture
    models. The algorithm is substantially faster than existing constructions,
    embarrassingly parallel, and the resulting coresets are smaller. In an extensive
    experimental evaluation, we demonstrate that the proposed method outperforms
    existing coreset constructions.
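
    The following sketch shows one way such a construction can look: importance
    sampling that mixes a uniform term (driving the additive error) with a
    squared-distance-to-the-mean term (driving the multiplicative error). The
    exact sampling weights of the paper may differ; the half-and-half mixture
    below is our assumption.

    # Hedged sketch of a lightweight-coreset-style construction for k-Means.
    import numpy as np

    def lightweight_coreset(X, m, rng=np.random.default_rng(0)):
        n = len(X)
        d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
        q = 0.5 / n + 0.5 * d2 / d2.sum()      # assumed sampling distribution
        idx = rng.choice(n, size=m, p=q)
        weights = 1.0 / (m * q[idx])           # unbiased reweighting
        return X[idx], weights

    X = np.random.default_rng(1).normal(size=(100000, 10))
    C, w = lightweight_coreset(X, 1000)
    print(C.shape, w.sum())                    # weights sum to roughly n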

    Using Battery Storage for Peak Shaving and Frequency Regulation: Joint Optimization for Superlinear Gains

    Yuanyuan Shi, Bolun Xu, Di Wang, Baosen Zhang
    Comments: Submitted to IEEE Transaction on Power Systems
    Subjects: Systems and Control (cs.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

    We consider using a battery storage system simultaneously for peak shaving
    and frequency regulation through a joint optimization framework which captures
    battery degradation, operational constraints, and uncertainties in customer
    load and regulation signals. Under this framework, using real data from a
    Microsoft data center, a University of Washington building, and PJM, we show
    that the electricity bill of users can be reduced by up to 15%. Furthermore, we
    demonstrate that the saving from joint optimization is often larger than the
    sum of the optimal savings when the battery is used for the two individual
    applications. A simple threshold real-time algorithm is proposed and achieves
    this super-linear gain. Compared to prior works that focused on using battery
    storage systems for single applications, our results suggest that batteries can
    produce much larger economic benefits than previously thought if they jointly
    provide multiple services.


    Learning

    McGan: Mean and Covariance Feature Matching GAN

    Youssef Mroueh, Tom Sercu, Vaibhava Goel
    Comments: 15 pages; under review
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We introduce new families of Integral Probability Metrics (IPM) for training
    Generative Adversarial Networks (GAN). Our IPMs are based on matching
    statistics of distributions embedded in a finite dimensional feature space.
    Mean and covariance feature matching IPMs allow for stable training of GANs,
    which we will call McGan. McGan minimizes a meaningful loss between
    distributions.
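
    As a toy illustration of mean feature matching, the generator in such a
    setup is scored by the distance between the mean embeddings of real and
    generated batches under a shared feature map. The sketch below computes only
    that distance; McGan additionally trains the feature map adversarially and
    has a covariance-matching variant, neither of which is shown.

    # Minimal mean-feature-matching score (illustrative, NumPy only).
    import numpy as np

    def mean_matching_ipm(phi_real, phi_fake):
        """phi_*: (batch, d) embeddings from a shared feature map."""
        return np.linalg.norm(phi_real.mean(axis=0) - phi_fake.mean(axis=0))

    rng = np.random.default_rng(0)
    print(mean_matching_ipm(rng.normal(0.0, 1, (256, 64)),
                            rng.normal(0.1, 1, (256, 64))))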

    Learning Hierarchical Features from Generative Models

    Shengjia Zhao, Jiaming Song, Stefano Ermon
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Deep neural networks have been shown to be very successful at learning
    feature hierarchies in supervised learning tasks. Generative models, on the
    other hand, have benefited less from hierarchical models with multiple layers
    of latent variables. In this paper, we prove that certain classes of
    hierarchical latent variable models do not take advantage of the hierarchical
    structure when trained with existing variational methods, and provide some
    limitations on the kind of features existing models can learn. Finally we
    propose an alternative flat architecture that learns meaningful and
    disentangled features on natural images.

    Neural Map: Structured Memory for Deep Reinforcement Learning

    Emilio Parisotto, Ruslan Salakhutdinov
    Subjects: Learning (cs.LG)

    A critical component to enabling intelligent reasoning in partially
    observable environments is memory. Despite this importance, Deep Reinforcement
    Learning (DRL) agents have so far used relatively simple memory architectures,
    with the main methods to overcome partial observability being either a temporal
    convolution over the past k frames or an LSTM layer. More recent work (Oh et
    al., 2016) has gone beyond these architectures by using memory networks,
    which allow more sophisticated addressing schemes over the past k frames. But
    even these architectures are unsatisfactory because they are limited to
    remembering information from only the last k frames. In this paper,
    we develop a memory system with an adaptable write operator that is customized
    to the sorts of 3D environments that DRL agents typically interact with. This
    architecture, called the Neural Map, uses a spatially structured 2D memory
    image to learn to store arbitrary information about the environment over long
    time lags. We demonstrate empirically that the Neural Map surpasses previous
    DRL memories on a set of challenging 2D and 3D maze environments and show that
    it is capable of generalizing to environments that were not seen during
    training.

    Adaptive Learning to Speed-Up Control of Prosthetic Hands: a Few Things Everybody Should Know

    Valentina Gregori, Arjan Gijsberts, Barbara Caputo
    Subjects: Learning (cs.LG)

    A number of studies have proposed to use domain adaptation to reduce the
    training efforts needed to control an upper-limb prosthesis exploiting
    pre-trained models from prior subjects. These studies generally reported
    impressive reductions in the required number of training samples to achieve a
    certain level of accuracy for intact subjects. We further investigate two
    popular methods in this field to verify whether this result equally applies to
    amputees. Our findings show instead that this improvement can largely be
    attributed to a suboptimal hyperparameter configuration. When hyperparameters
    are appropriately tuned, the standard approach that does not exploit prior
    information performs on par with the more complicated transfer learning
    algorithms. Additionally, earlier studies erroneously assumed that the number
    of training samples relates proportionally to the efforts required from the
    subject. However, a repetition of a movement is the atomic unit for subjects,
    and the total number of repetitions should therefore be used as a reliable
    measure of training effort. Even when correcting for this mistake, we do not
    find any performance increase due to the use of prior models.

    Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks

    Hiroshi Inoue
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    Ensembling multiple predictions is a widely used technique to improve the
    accuracy of various machine learning tasks. In image classification tasks, for
    example, averaging the predictions for multiple patches extracted from the
    input image significantly improves accuracy. Using multiple networks trained
    independently to make predictions improves accuracy further. One obvious
    drawback of the ensembling technique is its higher execution cost during
    inference. If we average 100 predictions, the execution cost will be 100 times
    as high as the cost without the ensemble. This higher cost limits the
    real-world use of ensembling, even though using it is almost the norm to win
    image classification competitions. In this paper, we describe a new technique
    called adaptive ensemble prediction, which achieves the benefits of ensembling
    with much smaller additional execution costs. Our observation behind this
    technique is that many easy-to-predict inputs do not require ensembling. Hence
    we calculate the confidence level of the prediction for each input on the basis
    of the probability of the predicted label, i.e. the outputs from the softmax,
    during the ensembling computation. If the prediction for an input reaches a
    high enough probability on the basis of the confidence level, we stop
    ensembling for this input to avoid wasting computation power. We evaluated the
    adaptive ensembling by using various datasets and showed that it reduces the
    computation time significantly while achieving similar accuracy to the naive
    ensembling.
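
    The early-exit rule can be sketched in a few lines: keep a running average
    of member predictions and stop as soon as the averaged softmax output is
    confident enough. The fixed threshold below is our simplification; the paper
    derives its stopping rule from confidence levels.

    # Sketch of adaptive ensemble prediction with an assumed fixed threshold.
    def adaptive_ensemble(predict_fns, x, threshold=0.9):
        """predict_fns: non-empty list of callables returning softmax vectors."""
        avg = None
        for k, predict in enumerate(predict_fns, start=1):
            p = predict(x)
            avg = p if avg is None else avg + (p - avg) / k   # running mean
            if avg.max() >= threshold:       # confident enough: stop ensembling
                break
        return avg, k                        # prediction and members actually used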

    Fixed-point optimization of deep neural networks with adaptive step size retraining

    Sungho Shin, Yoonho Boo, Wonyong Sung
    Comments: This paper is accepted in ICASSP 2017
    Subjects: Learning (cs.LG)

    Fixed-point optimization of deep neural networks plays an important role in
    hardware based design and low-power implementations. Many deep neural networks
    show fairly good performance even with 2- or 3-bit precision when quantized
    weights are fine-tuned by retraining. We propose an improved fixed-point
    optimization algorithm that estimates the quantization step size dynamically
    during the retraining. In addition, a gradual quantization scheme is also
    tested, which sequentially applies fixed-point optimizations from high- to
    low-precision. The experiments are conducted for feed-forward deep neural
    networks (FFDNNs), convolutional neural networks (CNNs), and recurrent neural
    networks (RNNs).
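
    A minimal sketch of the ingredients, under our own assumptions: uniform
    quantization with a step size re-estimated from the current weight
    distribution (here, a simple three-sigma rule; the paper's estimator is
    derived during retraining and may differ).

    import numpy as np

    def quantize(w, n_bits, step):
        levels = 2 ** (n_bits - 1)
        return step * np.clip(np.round(w / step), -levels, levels - 1)

    def adaptive_step(w, n_bits):
        # Assumed heuristic: fit the clipping range to three standard deviations.
        return 3.0 * w.std() / (2 ** (n_bits - 1))

    w = np.random.default_rng(0).normal(scale=0.05, size=10000)
    step = adaptive_step(w, n_bits=3)
    print(step, np.abs(w - quantize(w, 3, step)).mean())   # small distortion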

    Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

    Dan Garber, Ohad Shamir, Nathan Srebro
    Subjects: Learning (cs.LG)

    We study the fundamental problem of Principal Component Analysis in a
    statistical distributed setting in which each machine out of (m) stores a
    sample of (n) points sampled i.i.d. from a single unknown distribution. We
    study algorithms for estimating the leading principal component of the
    population covariance matrix that are both communication-efficient and achieve
    estimation error of the order of the centralized ERM solution that uses all
    (mn) samples. On the negative side, we show that in contrast to results
    obtained for distributed estimation under convexity assumptions, for the PCA
    objective, simply averaging the local ERM solutions cannot guarantee error that
    is consistent with the centralized ERM. We show that this unfortunate phenomenon
    can be remedied by performing a simple correction step which correlates between
    the individual solutions, and provides an estimator that is consistent with the
    centralized ERM for sufficiently-large (n). We also introduce an iterative
    distributed algorithm that is applicable in any regime of (n), which is based
    on distributed matrix-vector products. The algorithm gives significant
    acceleration in terms of communication rounds over previous distributed
    algorithms, in a wide regime of parameters.
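
    One simplified reading of the correction step is that local leading
    eigenvectors are defined only up to sign, so they must be aligned with a
    reference before averaging. The sketch below implements exactly that
    sign-alignment heuristic; it is our interpretation for illustration, not the
    paper's estimator.

    # Hedged sketch: sign-aligned averaging of local leading eigenvectors.
    import numpy as np

    def corrected_average(samples):
        vs = [np.linalg.eigh(X.T @ X / len(X))[1][:, -1] for X in samples]
        ref = vs[0]
        aligned = [v if v @ ref >= 0 else -v for v in vs]  # resolve sign flips
        avg = np.mean(aligned, axis=0)
        return avg / np.linalg.norm(avg)

    rng = np.random.default_rng(0)
    L = np.diag([10.0, 1, 1, 1, 1]) ** 0.5                 # leading direction e_1
    samples = [rng.normal(size=(500, 5)) @ L for _ in range(8)]
    print(np.abs(corrected_average(samples)[0]))           # close to 1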

    Reinforcement Learning with Deep Energy-Based Policies

    Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a method for learning expressive energy-based policies for
    continuous states and actions, which has been feasible only in tabular domains
    before. We apply our method to learning maximum entropy policies, resulting
    in a new algorithm, called soft Q-learning, that expresses the optimal policy
    via a Boltzmann distribution. We use the recently proposed amortized Stein
    variational gradient descent to learn a stochastic sampling network that
    approximates samples from this distribution. The benefits of the proposed
    algorithm include improved exploration and compositionality that allows
    transferring skills between tasks, which we confirm in simulated experiments
    with swimming and walking robots. We also draw a connection to actor-critic
    methods, which can be viewed as performing approximate inference on the
    corresponding energy-based model.
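
    The tabular special case mentioned above can be written down directly: the
    soft Bellman backup replaces the max by a log-sum-exp, and the optimal
    policy is Boltzmann in Q. The toy chain MDP below is our own illustration of
    that standard construction, not the paper's continuous-domain algorithm.

    # Tabular soft value iteration on a toy chain MDP (illustrative).
    import numpy as np
    from scipy.special import logsumexp

    n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
    nxt = lambda s, a: min(s + 1, n_states - 1) if a else max(s - 1, 0)
    R = lambda s, a: 1.0 if nxt(s, a) == n_states - 1 else 0.0

    Q = np.zeros((n_states, n_actions))
    for _ in range(200):
        V = alpha * logsumexp(Q / alpha, axis=1)          # soft maximum
        Q = np.array([[R(s, a) + gamma * V[nxt(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
    V = alpha * logsumexp(Q / alpha, axis=1, keepdims=True)
    policy = np.exp((Q - V) / alpha)                      # Boltzmann in Q
    print(policy.round(3))                                # rows sum to 1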

    F2F: A Library For Fast Kernel Expansions

    Joachim Curto, Irene Zarza, Feng Yang, Alexander J. Smola, Luc Van Gool
    Subjects: Learning (cs.LG)

    F2F is a C++ library for large-scale machine learning. It contains a
    CPU-optimized implementation of the Fastfood algorithm, which allows the
    computation of approximate kernel expansions in loglinear time. The algorithm
    requires computing products with Walsh-Hadamard Transform (WHT) matrices. We
    have developed a cache-friendly SIMD Fast Walsh-Hadamard Transform (FWHT)
    that achieves compelling speed and outperforms current state-of-the-art
    methods. F2F obtains non-linear classification by combining Fastfood with a
    linear classifier.
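
    The transform at the core of Fastfood-style expansions is easy to state. The
    plain NumPy version below computes an unnormalized fast Walsh-Hadamard
    transform in O(d log d); it is a reference illustration and does not
    reproduce F2F's cache-friendly SIMD implementation.

    import numpy as np

    def fwht(x):
        """In-place reference FWHT; len(x) must be a power of two."""
        h, d = 1, len(x)
        while h < d:
            for i in range(0, d, 2 * h):
                a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
                x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
            h *= 2
        return x

    x = np.random.default_rng(0).normal(size=8)
    H = np.array([[(-1) ** bin(i & j).count("1") for j in range(8)]
                  for i in range(8)])                     # dense Hadamard matrix
    print(np.allclose(fwht(x.copy()), H @ x))             # True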

    Deceiving Google's Perspective API Built for Detecting Toxic Comments

    Hossein Hosseini, Sreeram Kannan, Baosen Zhang, Radha Poovendran
    Comments: 4 pages
    Subjects: Learning (cs.LG); Computers and Society (cs.CY); Social and Information Networks (cs.SI)

    Social media platforms provide an environment where people can freely engage
    in discussions. Unfortunately, they also enable several problems, such as
    online harassment. Recently, Google and Jigsaw started a project called
    Perspective, which uses machine learning to automatically detect toxic
    language. A demonstration website has also been launched, which allows anyone
    to type a phrase in the interface and instantaneously see its toxicity score
    [1]. In this paper, we propose an attack on the Perspective toxic detection
    system based on adversarial examples. We show that an adversary can subtly
    modify a highly toxic phrase such that the system assigns a significantly
    lower toxicity score to it. We apply the attack to the sample phrases
    provided on the Perspective website and show that we can consistently reduce
    the toxicity scores to the level of non-toxic phrases. The existence of such
    adversarial examples is very harmful to toxic detection systems and seriously
    undermines their usability.
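
    The flavor of the perturbation can be illustrated with a few lines of code:
    insert characters inside high-signal words so a toxicity scorer no longer
    recognizes them, and keep the lowest-scoring variant. Everything here is a
    hypothetical stand-in (in particular score_fn); the paper queries the public
    Perspective demo rather than a local function.

    import random

    def perturb(phrase, rng=random.Random(0)):
        out = []
        for word in phrase.split():
            if len(word) > 4:                     # only touch longer words
                i = rng.randrange(1, len(word) - 1)
                word = word[:i] + "." + word[i:]  # e.g. "idiotic" -> "idi.otic"
            out.append(word)
        return " ".join(out)

    def attack(phrase, score_fn, budget=10):
        # score_fn: hypothetical stand-in for a black-box toxicity scorer.
        best = phrase
        for _ in range(budget):                   # keep the lowest-scoring variant
            cand = perturb(best)
            if score_fn(cand) < score_fn(best):
                best = cand
        return best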

    Online Multiview Representation Learning: Dropping Convexity for Better Efficiency

    Zhehui Chen, Forest L. Yang, Chris J. Li, Tuo Zhao
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Multiview representation learning is very popular for latent factor analysis.
    It naturally arises in many data analysis, machine learning, and information
    retrieval applications to model dependent structures between a pair of data
    matrices. For computational convenience, existing approaches usually formulate
    the multiview representation learning as convex optimization problems, where
    global optima can be obtained by certain algorithms in polynomial time.
    However, much empirical evidence has corroborated that heuristic nonconvex
    approaches also achieve good computational performance and convergence to the
    global optima, although theoretical justification is lacking. Such a
    gap between theory and practice motivates us to study a nonconvex formulation
    for multiview representation learning, which can be efficiently solved by two
    stochastic gradient descent (SGD) methods. Theoretically, by analyzing the
    dynamics of the algorithms based on diffusion processes, we establish global
    rates of convergence to the global optima with high probability. Numerical
    experiments are provided to support our theory.

    Learning Control for Air Hockey Striking using Deep Reinforcement Learning

    Ayal Taitler, Nahum Shimkin
    Subjects: Learning (cs.LG); Robotics (cs.RO)

    We consider the task of learning control policies for a robotic mechanism
    striking a puck in an air hockey game. The control signal is a direct command
    to the robot’s motors. We employ a model free deep reinforcement learning
    framework to learn the motoric skills of striking the puck accurately in order
    to score. We propose certain improvements to the standard learning scheme which
    make the deep Q-learning algorithm feasible when it might otherwise fail. Our
    improvements include integrating prior knowledge into the learning scheme, and
    accounting for the changing distribution of samples in the experience replay
    buffer. Finally we present our simulation results for aimed striking which
    demonstrate the successful learning of this task, and the improvement in
    algorithm stability due to the proposed modifications.

    Bayesian Nonparametric Feature and Policy Learning for Decision-Making

    Jürgen Hahn, Abdelhak M. Zoubir
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Learning from demonstrations has gained increasing interest in the recent
    past, enabling an agent to learn how to make decisions by observing an
    experienced teacher. While many approaches have been proposed to solve this
    problem, there is little work that focuses on reasoning about the observed
    behavior. We assume that, in many practical problems, an agent makes its
    decision based on latent features, indicating a certain action. Therefore, we
    propose a generative model for the states and actions. Inference reveals the
    number of features, the features, and the policies, allowing us to learn and to
    analyze the underlying structure of the observed behavior. Further, our
    approach enables prediction of actions for new states. Simulations are used to
    assess the performance of the algorithm based upon this model. Moreover, the
    problem of learning a driver’s behavior is investigated, demonstrating the
    performance of the proposed model in a real-world scenario.

    Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

    Alon Brutzkus, Amir Globerson
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Deep learning models are often successfully trained using gradient descent,
    despite the worst case hardness of the underlying non-convex optimization
    problem. The key question is then under what conditions can one prove that
    optimization will succeed. Here we provide a strong result of this kind. We
    consider a neural net with one hidden layer and a convolutional structure with
    no overlap and a ReLU activation function. For this architecture we show that
    learning is NP-complete in the general case, but that when the input
    distribution is Gaussian, gradient descent converges to the global optimum in
    polynomial time. To the best of our knowledge, this is the first global
    optimality guarantee of gradient descent on a convolutional neural network with
    ReLU activations.

    Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction

    Abraham Smith, Paul Bendich, John Harer, Jay Hineman
    Comments: Distribution Statement A – Approved for public release, distribution is unlimited
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We introduce a new algorithm, called CDER, for supervised machine learning
    that merges the multi-scale geometric properties of Cover Trees with the
    information-theoretic properties of entropy. CDER applies to a training set of
    labeled pointclouds embedded in a common Euclidean space. If typical
    pointclouds corresponding to distinct labels tend to differ at any scale in any
    sub-region, CDER can identify these differences in (typically) linear time,
    creating a set of distributional coordinates which act as a feature extraction
    mechanism for supervised learning. We describe theoretical properties and
    implementation details of CDER, and illustrate its benefits on several
    synthetic examples.

    Efficient Online Bandit Multiclass Learning with (\tilde{O}(\sqrt{T})) Regret

    Alina Beygelzimer, Francesco Orabona, Chicheng Zhang
    Comments: 17 pages, 2 figures
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present an efficient second-order algorithm with
    (\tilde{O}(\frac{1}{\eta}\sqrt{T})) regret for the bandit online multiclass
    problem. The regret bound holds simultaneously with respect to a family of
    loss functions parameterized by (\eta), for a range of (\eta) restricted by
    the norm of the competitor. The family of loss functions ranges from hinge
    loss ((\eta=0)) to squared hinge loss ((\eta=1)). This provides a solution to
    the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit
    algorithm for (\sqrt{T})-regret in online multiclass prediction? In COLT,
    2009). We test our
    algorithm experimentally, showing that it also performs favorably against
    earlier algorithms.

    Generative Adversarial Active Learning

    Jia-Jie Zhu, Jose Bento
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We propose a new active learning approach using Generative Adversarial
    Networks (GAN). Different from regular active learning, we adaptively
    synthesize training instances for querying to increase learning speed. Our
    approach outperforms random generation using GAN alone in active learning
    experiments. We demonstrate the effectiveness of the proposed algorithm in
    various datasets when compared to other algorithms. To the best of our knowledge,
    this is the first active learning work using GAN.

    Stochastic Variance Reduction Methods for Policy Evaluation

    Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (cs.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

    Policy evaluation is a crucial step in many reinforcement-learning
    procedures, which estimates a value function that predicts states’ long-term
    value under a given policy. In this paper, we focus on policy evaluation with
    linear function approximation over a fixed dataset. We first transform the
    empirical policy evaluation problem into a (quadratic) convex-concave saddle
    point problem, and then present a primal-dual batch gradient method, as well as
    two stochastic variance reduction methods for solving the problem. These
    algorithms scale linearly in both sample size and feature dimension. Moreover,
    they achieve linear convergence even when the saddle-point problem has only
    strong concavity in the dual variables but no strong convexity in the primal
    variables. Numerical experiments on benchmark problems demonstrate the
    effectiveness of our methods.

    Efficient Learning of Graded Membership Models

    Zilong Tan, Sayan Mukherjee
    Comments: 23 pages, 1 figure, 1 table
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    We present an efficient algorithm for learning graded membership models when
    the number of variables (p) is much larger than the number of hidden components
    (k). This algorithm reduces the computational complexity of state-of-the-art
    tensor methods, which require decomposing an (O\left(p^3\right)) tensor, to
    factorizing (O\left(p/k\right)) sub-tensors, each of size (O\left(k^3\right)).
    In addition, we address the issue of negative entries in the empirical method
    of moments based estimators. We provide sufficient conditions under which our
    approach has provable guarantees. Our approach obtains competitive empirical
    results on both simulated and real data.

    Coarse Grained Exponential Variational Autoencoders

    Ke Sun, Xiangliang Zhang
    Subjects: Learning (cs.LG)

    Variational autoencoders (VAE) often use a Gaussian or categorical
    distribution to model the inference process. This limits variational learning
    because the simplified assumption does not match the true posterior
    distribution, which is usually much more sophisticated. To break this
    limitation and allow arbitrary parametric distributions during inference,
    this paper derives a semi-continuous latent representation, which
    approximates a continuous density up to a prescribed precision, and is much
    easier to analyze than its continuous counterpart because it is fundamentally
    discrete. We showcase the
    proposition by applying polynomial exponential family distributions as the
    posterior, which are universal probability density function generators. Our
    experimental results show consistent improvements over commonly used VAE
    models.

    Online Learning with Many Experts

    Alon Cohen, Shie Mannor
    Subjects: Learning (cs.LG)

    We study the problem of prediction with expert advice when the number of
    experts in question may be extremely large or even infinite. We devise an
    algorithm that obtains a tight regret bound of (\widetilde{O}(\epsilon T + N
    + \sqrt{NT})), where (N) is the empirical (\epsilon)-covering number of the
    sequence of loss functions generated by the environment. In addition, we
    present a hedging procedure that allows us to find the optimal (\epsilon) in
    hindsight.

    Finally, we discuss a few interesting applications of our algorithm. We show
    how our algorithm is applicable in the approximately low rank experts model of
    Hazan et al. (2016), and discuss the case of experts with bounded variation, in
    which there is a surprisingly large gap between the regret bounds obtained in
    the statistical and online settings.

    An Unsupervised Learning Method Exploiting Sequential Output Statistics

    Yu Liu, Jianshu Chen, Li Deng
    Subjects: Learning (cs.LG)

    We address a class of unsupervised learning problems in which the goal is
    the same as in supervised learning, except that no output labels are provided
    for training classifiers. This type of unsupervised learning is highly valuable in machine
    learning practice since obtaining labels in training data is often costly.
    Instead of pairing input-output samples, we exploit sequential statistics of
    output labels, in the form of N-gram language models, which can be obtained
    independently of input data and thus with low or no cost. We introduce a novel
    cost function in this unsupervised learning setting, whose profiles are
    analyzed and shown to be highly non-convex with large barriers near the global
    optimum. A new stochastic primal-dual gradient method is developed to optimize
    this very difficult type of cost function via the use of dual variables to
    reduce the barriers. We demonstrate in experimental evaluation, with both
    synthetic and real-world data sets, that the new method for unsupervised
    learning gives drastically lower errors and higher learning efficiency than the
    standard stochastic gradient descent, reaching classification errors about
    twice those obtained by fully supervised learning. We also show the crucial
    role of the labels’ sequential statistics exploited for label-free training
    with the new method, reflected by the significantly lower classification
    errors obtained when higher-order language models are used in unsupervised
    learning rather than low-order ones.

    Adaptive Neural Networks for Fast Test-Time Prediction

    Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    We present an approach to adaptively utilize deep neural networks in order to
    reduce the evaluation time on new examples without loss of classification
    performance. Rather than attempting to redesign or approximate existing
    networks, we propose two schemes that adaptively utilize networks. First, we
    pose an adaptive network evaluation scheme, where we learn a system to
    adaptively choose the components of a deep network to be evaluated for each
    example. By allowing examples correctly classified using early layers of the
    system to exit, we avoid the computational time associated with full evaluation
    of the network. Building upon this approach, we then learn a network selection
    system that adaptively selects the network to be evaluated for each example. We
    exploit the fact that many examples can be correctly classified using
    relatively efficient networks and that complex, computationally costly networks
    are only necessary for a small fraction of examples. By avoiding evaluation of
    these complex networks for a large fraction of examples, computational time can
    be dramatically reduced. Empirically, these approaches yield dramatic
    reductions in computational cost, with up to a 2.8x speedup on state-of-the-art
    networks from the ImageNet image recognition challenge with minimal (less than
    1%) loss of accuracy.
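
    The network-selection policy reduces, in its simplest form, to a
    confidence-gated cascade: try the efficient network first and fall back to
    the costly one only when the cheap prediction is uncertain. In the paper
    both the policy and the threshold are learned; the fixed threshold and dummy
    networks below are our assumptions.

    import numpy as np

    def cascade_predict(x, cheap_net, costly_net, threshold=0.85):
        p = cheap_net(x)                          # softmax output of the cheap net
        if p.max() >= threshold:                  # confident: skip the costly net
            return int(p.argmax()), "cheap"
        return int(costly_net(x).argmax()), "costly"

    # Hypothetical stand-ins for trained networks:
    cheap = lambda x: np.array([0.95, 0.05])
    costly = lambda x: np.array([0.60, 0.40])
    print(cascade_predict(None, cheap, costly))   # -> (0, 'cheap')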

    On the Origin of Deep Learning

    Haohan Wang, Bhiksha Raj, Eric P. Xing
    Comments: 81 pages, 200 references
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    This paper is a review of the evolutionary history of deep learning models.
    It covers the genesis of neural networks, when associationist modeling of the
    brain was studied, through the models that dominated the last decade of
    research in deep learning, such as convolutional neural networks, deep belief
    networks, and recurrent neural networks, and extends to popular recent models
    such as the variational autoencoder and generative adversarial nets. In
    addition to reviewing these models, this paper primarily focuses on their
    precedents, examining how the initial ideas were assembled to construct the
    early models and how these preliminary models developed into their current
    forms. Many of these evolutionary paths last more than half a century and
    have a diversity of directions. For example, CNNs are built on prior
    knowledge of the biological vision system; DBNs evolved from a trade-off
    between the modeling power and the computational complexity of graphical
    models; and many of today's models are neural counterparts of classical
    linear models. This paper reviews these evolutionary paths and offers a
    concise thought flow of how the models developed, aiming to provide a
    thorough background for deep learning. More importantly, along the way, this
    paper summarizes the gist behind these milestones and proposes many
    directions to guide future research in deep learning.

    Boundary-Seeking Generative Adversarial Networks

    R Devon Hjelm, Athul Paul Jacob, Tong Che, Kyunghyun Cho, Yoshua Bengio
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We introduce a novel approach to training generative adversarial networks,
    where we train a generator to match a target distribution that converges to the
    data distribution at the limit of a perfect discriminator. This objective can
    be interpreted as training a generator to produce samples that lie on the
    decision boundary of a current discriminator in training at each update, and we
    call a GAN trained using this algorithm a boundary-seeking GAN (BS-GAN). This
    approach can be used to train a generator with discrete output when the
    generator outputs a parametric conditional distribution. We demonstrate the
    effectiveness of the proposed algorithm with discrete image data. In
    contrast, we observe that the recently proposed Gumbel-Softmax
    technique for re-parametrizing the discrete variables does not work for
    training a GAN with discrete data. Finally, we notice that the proposed
    boundary-seeking algorithm works even with continuous variables, and
    demonstrate its effectiveness with two widely used image data sets, SVHN and
    CelebA.
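
    Our reading of the continuous-variable objective is that generated samples
    are pushed toward the current decision boundary D = 0.5, e.g. by penalizing
    the squared gap between log D and log(1 - D) on fakes. The sketch below
    encodes that reading and should be taken as a hedged illustration, not the
    paper's exact loss.

    import numpy as np

    def boundary_seeking_loss(d_probs, eps=1e-7):
        # Assumed form of the boundary-seeking penalty on generated samples.
        d = np.clip(d_probs, eps, 1 - eps)        # discriminator outputs on fakes
        return 0.5 * np.mean((np.log(d) - np.log(1 - d)) ** 2)

    print(boundary_seeking_loss(np.array([0.5, 0.5])))   # 0 at the boundary
    print(boundary_seeking_loss(np.array([0.9, 0.1])))   # positive away from it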

    An SDP-Based Algorithm for Linear-Sized Spectral Sparsification

    Yin Tat Lee, He Sun
    Comments: To appear at STOC’17
    Subjects: Data Structures and Algorithms (cs.DS); Learning (cs.LG)

    For any undirected and weighted graph (G=(V,E,w)) with (n) vertices and (m)
    edges, we call a sparse subgraph (H) of (G), with proper reweighting of the
    edges, a ((1+\varepsilon))-spectral sparsifier if [ (1-\varepsilon)
    x^{\intercal} L_G x \leq x^{\intercal} L_H x \leq (1+\varepsilon)
    x^{\intercal} L_G x ] holds for any (x \in \mathbb{R}^n), where (L_G) and
    (L_H) are the respective Laplacian matrices of (G) and (H). Noticing that
    (\Omega(m)) time is needed for any algorithm to construct a spectral
    sparsifier and a spectral sparsifier of (G) requires (\Omega(n)) edges, a
    natural question is to investigate, for any constant (\varepsilon), if a
    ((1+\varepsilon))-spectral sparsifier of (G) with (O(n)) edges can be
    constructed in (\tilde{O}(m)) time, where the (\tilde{O}) notation suppresses
    polylogarithmic factors. All previous constructions on spectral
    sparsification require either a super-linear number of edges or
    (m^{1+\Omega(1)}) time.

    In this work we answer this question affirmatively by presenting an algorithm
    that, for any undirected graph (G) and (\varepsilon>0), outputs a
    ((1+\varepsilon))-spectral sparsifier of (G) with (O(n/\varepsilon^2)) edges
    in (\tilde{O}(m/\varepsilon^{O(1)})) time. Our algorithm is based on three
    novel techniques: (1) a new potential function which is much easier to
    compute yet has similar guarantees as the potential functions used in
    previous references; (2) an efficient reduction from a two-sided spectral
    sparsifier to a one-sided spectral sparsifier; (3) constructing a one-sided
    spectral sparsifier by a semi-definite program.

    Dynamic Word Embeddings via Skip-Gram Filtering

    Robert Bamler, Stephan Mandt
    Comments: 10 pages + supplement
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We present a probabilistic language model for time-stamped text data which
    tracks the semantic evolution of individual words over time. The model
    represents words and contexts by latent trajectories in an embedding space. At
    each moment in time, the embedding vectors are inferred from a probabilistic
    version of word2vec [Mikolov, 2013]. These embedding vectors are connected in
    time through a latent diffusion process. We describe two scalable variational
    inference algorithms—skip-gram smoothing and skip-gram filtering—that allow
    us to train the model jointly over all times; thus learning on all data while
    simultaneously allowing word and context vectors to drift. Experimental results
    on three different corpora demonstrate that our dynamic model infers word
    embedding trajectories that are more interpretable and lead to higher
    predictive likelihoods than competing methods that are based on static models
    trained separately on time slices.

    Approximate Inference with Amortised MCMC

    Yingzhen Li, Richard E. Turner, Qiang Liu
    Comments: Submitted to ICML 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We propose a novel approximate inference algorithm that approximates a target
    distribution by amortising the dynamics of a user-selected MCMC sampler. The
    idea is to initialise MCMC using samples from an approximation network, apply
    the MCMC operator to improve these samples, and finally use the samples to
    update the approximation network thereby improving its quality. This provides a
    new generic framework for approximate inference, allowing us to deploy highly
    complex, or implicitly defined approximation families with intractable
    densities, including approximations produced by warping a source of randomness
    through a deep neural network. Experiments consider image modelling with deep
    generative models as a challenging test for the method. Deep models trained
    using amortised MCMC are shown to generate realistic looking samples as well as
    producing diverse imputations for images with regions of missing pixels.
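
    The loop itself is compact enough to sketch on a toy problem: sample from
    the approximation, improve the samples with a few MCMC steps toward the
    target, then refit the approximation to the improved samples. Below, a
    simple Gaussian fit stands in for updating a deep approximation network; all
    parameter choices are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    log_target = lambda x: -0.5 * (x - 3.0) ** 2          # target: N(3, 1)
    mu, sigma = 0.0, 1.0                                  # approximation params

    for _ in range(200):
        x = mu + sigma * rng.normal(size=256)             # 1. sample approximation
        for _ in range(5):                                # 2. improve with MCMC
            prop = x + 0.5 * rng.normal(size=x.shape)
            accept = (np.log(rng.uniform(size=x.shape))
                      < log_target(prop) - log_target(x))
            x = np.where(accept, prop, x)
        mu, sigma = x.mean(), x.std()                     # 3. refit approximation
    print(mu, sigma)                                      # approaches (3, 1)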

    Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

    Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Uniform deviation bounds limit the difference between a model’s expected loss
    and its loss on an empirical sample uniformly for all models in a learning
    problem. As such, they are a critical component to empirical risk minimization.
    In this paper, we provide a novel framework to obtain uniform deviation bounds
    for loss functions which are *unbounded*. In our main application, this allows
    us to obtain bounds for (k)-Means clustering under weak assumptions on the
    underlying distribution. If the fourth moment is bounded, we prove a rate of
    (\mathcal{O}\left(m^{-\frac{1}{2}}\right)) compared to the previously known
    (\mathcal{O}\left(m^{-\frac{1}{4}}\right)) rate. Furthermore, we show that the rate
    also depends on the kurtosis – the normalized fourth moment which measures the
    “tailedness” of a distribution. We further provide improved rates under
    progressively stronger assumptions, namely, bounded higher moments,
    subgaussianity and bounded support.

    Scalable and Distributed Clustering via Lightweight Coresets

    Olivier Bachem, Mario Lucic, Andreas Krause
    Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Computation (stat.CO)

    Coresets are compact representations of data sets such that models trained on
    a coreset are provably competitive with models trained on the full data set. As
    such, they have been successfully used to scale up clustering models to massive
    data sets. While existing approaches generally only allow for multiplicative
    approximation errors, we propose a novel notion of coresets called lightweight
    coresets that allows for both multiplicative and additive errors. We provide a
    single algorithm to construct lightweight coresets for k-Means clustering,
    Bregman clustering, and maximum likelihood estimation of Gaussian mixture
    models. The algorithm is substantially faster than existing constructions,
    embarrassingly parallel, and the resulting coresets are smaller. In an extensive
    experimental evaluation, we demonstrate that the proposed method outperforms
    existing coreset constructions.

    Variational Inference using Implicit Distributions

    Ferenc Huszár
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Generative adversarial networks (GANs) have given us a great tool to fit
    implicit generative models to data. Implicit distributions are ones from
    which we can sample easily and whose samples we can differentiate with
    respect to model parameters. These models are highly expressive, and we argue
    they can prove just
    as useful for variational inference (VI) as they are for generative modelling.
    Several papers have proposed GAN-like algorithms for inference, however,
    connections to the theory of VI are not always well understood. This paper
    provides a unifying review of existing algorithms establishing connections
    between variational autoencoders, adversarially learned inference, operator VI,
    GAN-based image reconstruction, and more. Secondly, the paper provides a
    framework for building new algorithms: depending on the way the variational
    bound is expressed we introduce prior-contrastive and joint-contrastive
    methods, and show practical inference algorithms based on either density ratio
    estimation or denoising.

    Online Nonparametric Learning, Chaining, and the Role of Partial Feedback

    Nicolò Cesa-Bianchi, Pierre Gaillard (SIERRA, Inria), Claudio Gentile, Sébastien Gerchinovitz (AOC, IMT)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Statistics Theory (math.ST)

    We investigate contextual online learning with nonparametric (Lipschitz)
    comparison classes under different assumptions on losses and feedback
    information. For full information feedback and Lipschitz losses, we
    characterize the minimax regret up to log factors by proving an upper bound
    matching a previously known lower bound. In a partial feedback model motivated
    by second-price auctions, we prove upper bounds for Lipschitz and
    semi-Lipschitz losses that improve on the known bounds for standard bandit
    feedback. Our analysis combines novel results for contextual second-price
    auctions with a novel algorithmic approach based on chaining. When the context
    space is Euclidean, our chaining approach is efficient and delivers an even
    better regret bound.

    DeepNAT: Deep Convolutional Neural Network for Segmenting Neuroanatomy

    Christian Wachinger, Martin Reuter, Tassilo Klein
    Comments: Accepted for publication in NeuroImage, special issue “Brain Segmentation and Parcellation”, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    We introduce DeepNAT, a 3D Deep convolutional neural network for the
    automatic segmentation of NeuroAnaTomy in T1-weighted magnetic resonance
    images. DeepNAT is an end-to-end learning-based approach to brain segmentation
    that jointly learns an abstract feature representation and a multi-class
    classification. We propose a 3D patch-based approach, where we do not only
    predict the center voxel of the patch but also neighbors, which is formulated
    as multi-task learning. To address a class imbalance problem, we arrange two
    networks hierarchically, where the first one separates foreground from
    background, and the second one identifies 25 brain structures on the
    foreground. Since patches lack spatial context, we augment them with
    coordinates. To this end, we introduce a novel intrinsic parameterization of
    the brain volume, formed by eigenfunctions of the Laplace-Beltrami operator. As
    network architecture, we use three convolutional layers with pooling, batch
    normalization, and non-linearities, followed by fully connected layers with
    dropout. The final segmentation is inferred from the probabilistic output of
    the network with a 3D fully connected conditional random field, which ensures
    label agreement between close voxels. The roughly 2.7 million parameters in the
    network are learned with stochastic gradient descent. Our results show that
    DeepNAT compares favorably to state-of-the-art methods. Finally, the purely
    learning-based method may have a high potential for the adaptation to young,
    old, or diseased brains by fine-tuning the pre-trained network with a small
    training sample on the target application, where the availability of larger
    datasets with manual annotations may boost the overall segmentation accuracy in
    the future.

    Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

    Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
    Comments: 12 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Learning (cs.LG)

    Recent work on generative modeling of text has found that variational
    auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
    language models (Bowman et al., 2015). This negative result is so far poorly
    understood, but has been attributed to the propensity of LSTM decoders to
    ignore conditioning information from the encoder. In this paper, we experiment
    with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s
    dilation architecture, we control the effective context from previously
    generated words. In experiments, we find that there is a trade off between the
    contextual capacity of the decoder and the amount of encoding information used.
    We show that with the right decoder, VAE can outperform LSTM language models.
    We demonstrate perplexity gains on two datasets, representing the first
    positive experimental result on the use of VAE for generative modeling of text.
    Further, we conduct an in-depth investigation of the use of VAE (with our new
    decoding architecture) for semi-supervised and unsupervised labeling tasks,
    demonstrating gains over several strong baselines.

    Selection of training populations (and other subset selection problems) with an accelerated genetic algorithm (STPGA: An R-package for selection of training populations with a genetic algorithm)

    Deniz Akdemir
    Subjects: Methodology (stat.ME); Learning (cs.LG); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Applications (stat.AP)

    Optimal subset selection is an important task that has numerous algorithms
    designed for it and has many application areas. STPGA contains a special
    genetic algorithm supplemented with a tabu memory property (that keeps track of
    previously tried solutions and their fitness for a number of iterations), and
    with a regression of the fitness of the solutions on their coding that is used
    to form the ideal estimated solution (look ahead property) to search for
    solutions of generic optimal subset selection problems. I initially developed
    the programs for the specific problem of selecting training populations for
    genomic prediction or association problems; I therefore discuss the theory
    behind optimal design of experiments to explain the default optimization
    criteria in STPGA, and illustrate the use of the programs in this endeavor.
    Nevertheless, I have picked a few other areas of application: supervised and
    unsupervised variable selection based on kernel alignment, supervised
    variable selection with design criteria, influential observation
    identification for regression, solving mixed integer quadratic optimization
    problems, and balancing gains and inbreeding in a breeding population. Some
    of these illustrations pertain to new statistical approaches.

    Criticality and Deep Learning, Part I: Theory vs. Empirics

    Dan Oprisa, Peter Toth
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    Motivated by the idea that criticality and universality of phase transitions
    might play a crucial role in achieving and sustaining learning and intelligent
    behaviour in biological and artificial networks, we analyse a theoretical and a
    pragmatic experimental set up for critical phenomena in deep learning. On the
    theoretical side, we use results from statistical physics to carry out critical
    point calculations in feed-forward/fully connected networks, while on the
    experimental side we set out to find traces of criticality in deep neural
    networks. This is our first step in a series of upcoming investigations to map
    out the relationship between criticality and learning in deep networks.

    Support vector machine and its bias correction in high-dimension, low-sample-size settings

    Yugo Nakayama, Kazuyoshi Yata, Makoto Aoshima
    Comments: 23 pages, 3 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    In this paper, we consider asymptotic properties of the support vector
    machine (SVM) in high-dimension, low-sample-size (HDLSS) settings. We show that
    the hard-margin linear SVM holds a consistency property in which
    misclassification rates tend to zero as the dimension goes to infinity under
    certain severe conditions. We show that the SVM is very biased in HDLSS
    settings and its performance is affected by the bias directly. In order to
    overcome such difficulties, we propose a bias-corrected SVM (BC-SVM). We show
    that the BC-SVM gives preferable performances in HDLSS settings. We also
    discuss the SVMs in multiclass HDLSS settings. Finally, we check the
    performance of the classifiers in actual data analyses.

    Kiefer Wolfowitz Algorithm is Asymptotically Optimal for a Class of Non-Stationary Bandit Problems

    Rahul Singh, Taposh Banerjee
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    We consider the problem of designing an allocation rule or an “online
    learning algorithm” for a class of bandit problems in which the set of control
    actions available at each time (t) is a convex, compact subset of
    (\mathbb{R}^d). Upon choosing an action (x) at time (t), the algorithm obtains
    a noisy value of the unknown and time-varying function (f_t) evaluated at (x).
    The “regret” of an algorithm is the gap between its expected reward, and the
    reward earned by a strategy which has the knowledge of the function (f_t) at
    each time (t) and hence chooses the action (x_t) that maximizes (f_t).

    For this non-stationary bandit problem setup, we propose two variants of the
    Kiefer-Wolfowitz (KW) algorithm: i) KW with fixed step size (\eta), and ii)
    KW with a sliding window of length (L). We show that if the number of times
    that the function (f_t) varies over the horizon (T) is (o(T)), and if the
    learning rates of
    the proposed algorithms are chosen “optimally”, then the regret of the proposed
    algorithms is (o(T)).
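
    A fixed-step Kiefer-Wolfowitz iteration is short enough to sketch: estimate
    the gradient from two noisy evaluations and take a constant-size step, which
    lets the iterate track a slowly drifting optimum. The drifting objective and
    all parameter values below are our illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    opt = lambda t: np.sin(2 * np.pi * t / 10000)          # drifting maximizer
    noisy = lambda x, t: -(x - opt(t)) ** 2 + 0.01 * rng.normal()

    x, eta, delta = 0.0, 0.05, 0.1
    for t in range(10000):
        g = (noisy(x + delta, t) - noisy(x - delta, t)) / (2 * delta)
        x += eta * g                       # fixed step size keeps tracking ability
    print(x, opt(9999))                    # iterate stays near the moving optimum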

    Maximum-Likelihood Augmented Discrete Generative Adversarial Networks

    Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, Yoshua Bengio
    Comments: 11 pages, 3 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    Despite the successes in capturing continuous distributions, the application
    of generative adversarial networks (GANs) to discrete settings, like natural
    language tasks, is rather restricted. The fundamental reason is the difficulty
    of back-propagation through discrete random variables combined with the
    inherent instability of the GAN training objective. To address these problems,
    we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
    Networks. Instead of directly optimizing the GAN objective, we derive a novel
    and low-variance objective using the discriminator’s output that follows
    corresponds to the log-likelihood. Compared with the original, the new
    objective is proved to be consistent in theory and beneficial in practice. The
    experimental results on various discrete datasets demonstrate the effectiveness
    of the proposed approach.

    Ratio Utility and Cost Analysis for Privacy Preserving Subspace Projection

    Mert Al, Shibiao Wan, Sun-Yuan Kung
    Comments: Submitted to ICASSP 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    With a rapidly increasing number of devices connected to the internet, big
    data has been applied to various domains of human life. Nevertheless, it has
    also opened new avenues for breaching users’ privacy. Hence, it is highly
    desirable to develop techniques that enable data owners to privatize their
    data while keeping it useful for the intended applications. Existing methods,
    however,
    do not offer enough flexibility for controlling the utility-privacy trade-off
    and may incur unfavorable results when privacy requirements are high. To tackle
    these drawbacks, we propose a compressive-privacy based method, namely RUCA
    (Ratio Utility and Cost Analysis), which can not only maximize performance for
    a privacy-insensitive classification task but also minimize the ability of any
    classifier to infer private information from the data. Experimental results on
    Census and Human Activity Recognition data sets demonstrate that RUCA
    significantly outperforms existing privacy preserving data projection
    techniques for a wide range of privacy pricings.

    CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

    Andre Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham
    Comments: The Journal of Supercomputing, 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep learning is an important component of big-data analytic tools and
    intelligent applications such as self-driving cars, computer vision, speech
    recognition, or precision medicine. However, the training process is
    computationally intensive, and often requires a large amount of time if
    performed sequentially. Modern parallel computing systems provide the
    capability to reduce the required training time of deep neural networks. In
    this paper, we present our parallelization scheme for training convolutional
    neural networks (CNN) named Controlled Hogwild with Arbitrary Order of
    Synchronization (CHAOS). Major features of CHAOS include the support for thread
    and vector parallelism, non-instant updates of weight parameters during
    back-propagation without a significant delay, and implicit synchronization in
    arbitrary order. CHAOS is tailored for parallel computing systems that are
    accelerated with the Intel Xeon Phi. We evaluate our parallelization approach
    empirically using measurement techniques and performance modeling for various
    numbers of threads and CNN architectures. Experimental results for the MNIST
    dataset of handwritten digits using the total number of threads on the Xeon Phi
    show speedups of up to 103x compared to the execution on one thread of the Xeon
    Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x
    compared to the sequential execution on Intel Core i5.

    An EM Based Probabilistic Two-Dimensional CCA with Application to Face Recognition

    Mehran Safayani, Seyed Hashem Ahmadi, Homayun Afrabandpey, Abdolreza Mirzaei
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

    Recently, two-dimensional canonical correlation analysis (2DCCA) has been
    successfully applied to image feature extraction. Instead of concatenating
    the columns of an image into a one-dimensional vector, the method works
    directly with two-dimensional image matrices. Although 2DCCA works well
    in different recognition tasks, it lacks a probabilistic interpretation. In
    this paper, we present a probabilistic framework for 2DCCA called probabilistic
    2DCCA (P2DCCA) and an iterative EM based algorithm for optimizing the
    parameters. Experimental results on synthetic and real data demonstrate
    superior performance in loading factor estimation for P2DCCA compared to 2DCCA.
    For real data, three subsets of AR face database and also the UMIST face
    database confirm the robustness of the proposed algorithm in face recognition
    tasks with different illumination conditions, facial expressions, poses and
    occlusions.

    Efficient coordinate-wise leading eigenvector computation

    Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro
    Subjects: Numerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)

    We develop and analyze efficient “coordinate-wise” methods for finding the
    leading eigenvector, where each step involves only a vector-vector product. We
    establish global convergence with overall runtime guarantees that are at least
    as good as those of Lanczos’s method and dominate it for slowly decaying spectra. Our
    methods are based on combining a shift-and-invert approach with coordinate-wise
    algorithms for linear regression.
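
    A sketch of the idea under stated assumptions: shift-and-invert power
    iteration in which each inner linear system is solved by Gauss-Seidel
    coordinate updates, so each step costs a single vector-vector product (one
    row of A against the current iterate). Choosing the shift sigma from the true
    top eigenvalue below is purely for illustration.

    import numpy as np

    def coordinate_shift_invert(A, sigma, outer=20, sweeps=5, seed=0):
        # Power iteration on (sigma*I - A)^{-1}; each coordinate update solves
        # row i of (sigma*I - A) y = x and costs one vector-vector product.
        n = A.shape[0]
        x = np.random.default_rng(seed).standard_normal(n)
        x /= np.linalg.norm(x)
        for _ in range(outer):
            y = x.copy()
            for _ in range(sweeps):
                for i in range(n):
                    y[i] = (x[i] + A[i] @ y - A[i, i] * y[i]) / (sigma - A[i, i])
            x = y / np.linalg.norm(y)
        return x

    rng = np.random.default_rng(1)
    M = rng.standard_normal((50, 50)); A = (M + M.T) / 2
    v = coordinate_shift_invert(A, sigma=np.linalg.eigvalsh(A)[-1] + 0.5)
    print(abs(v @ np.linalg.eigh(A)[1][:, -1]))  # close to 1 when aligned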

    Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

    Brent Harrison, Upol Ehsan, Mark O. Riedl
    Comments: 9 pages, 4 figures
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    We introduce AI rationalization, an approach for generating explanations of
    autonomous system behavior as if a human had done the behavior. We describe a
    rationalization technique that uses neural machine translation to translate
    internal state-action representations of the autonomous agent into natural
    language. We evaluate our technique in the Frogger game environment. The
    natural language is collected from human players thinking out loud as they play
    the game. We motivate the use of rationalization as an approach to explanation
    generation, show the results of experiments on the accuracy of our
    rationalization technique, and describe a future research agenda.

    Deep Voice: Real-time Neural Text-to-Speech

    Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
    Comments: Submitted to ICML 2017
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

    We present Deep Voice, a production-quality text-to-speech system constructed
    entirely from deep neural networks. Deep Voice lays the groundwork for truly
    end-to-end neural speech synthesis. The system comprises five major building
    blocks: a segmentation model for locating phoneme boundaries, a
    grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
    fundamental frequency prediction model, and an audio synthesis model. For the
    segmentation model, we propose a novel way of performing phoneme boundary
    detection with deep neural networks using connectionist temporal classification
    (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
    that requires fewer parameters and trains faster than the original. By using a
    neural network for each component, our system is simpler and more flexible than
    traditional text-to-speech systems, where each component requires laborious
    feature engineering and extensive domain expertise. Finally, we show that
    inference with our system can be performed faster than real time and describe
    optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
    speedups over existing implementations.

    Rank-to-engage: New Listwise Approaches to Maximize Engagement

    Swayambhoo Jain, Akshay Soni, Nikolay Laptev, Yashar Mehdad
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    For many internet businesses, presenting a given list of items in an order
    that maximizes a certain metric of interest (e.g., click-through-rate, average
    engagement time etc.) is crucial. We approach the aforementioned task from a
    learning-to-rank perspective which reveals a new problem setup. In traditional
    learning-to-rank literature, it is implicitly assumed that during the training
    data generation one has access to the \emph{best or desired} order for the
    given list of items. In this work, we consider a problem setup where we do not
    observe the desired ranking. We present two novel solutions: the first is an
    extension of an existing listwise learning-to-rank technique, listwise maximum
    likelihood estimation (ListMLE), while the second is a generic machine-learning
    based framework that tackles the problem in its full generality. We discuss
    several challenges associated with this generic framework, and propose a
    simple \emph{item-payoff} and \emph{positional-gain} model that addresses
    them. We provide
    training algorithms, inference procedures, and demonstrate the effectiveness of
    the two approaches over traditional ListMLE on synthetic as well as on
    real-life setting of ranking news articles for increased dwell time.
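
    For reference, here is a minimal sketch of the classical ListMLE
    (Plackett-Luce) loss that the first solution extends; in the paper's setting
    the desired order is not observed, so this is only the starting point, and
    the scores and order below are hypothetical.

    import numpy as np

    def listmle_loss(scores, order):
        # Negative Plackett-Luce log-likelihood of the given ranking:
        # -sum_k [ s_{pi(k)} - logsumexp(s_{pi(k:)}) ], computed stably.
        s = scores[order]
        loss = 0.0
        for k in range(len(s)):
            tail = s[k:]
            loss -= s[k] - (tail.max() + np.log(np.sum(np.exp(tail - tail.max()))))
        return loss

    print(listmle_loss(np.array([2.0, 1.0, 0.5]), np.array([0, 1, 2])))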

    Activation Ensembles for Deep Neural Networks

    Mark Harmon, Diego Klabjan
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Many activation functions have been proposed in the past, but selecting an
    adequate one requires trial and error. We propose a new methodology of
    designing activation functions within a neural network at each layer. We call
    this technique an “activation ensemble” because it allows the use of multiple
    activation functions at each layer. This is done by introducing additional
    variables, (\alpha), at each activation layer of a network to allow
    multiple activation functions to be active at each neuron. By design, an
    activation with a larger (\alpha) value at a neuron has a larger magnitude,
    so the higher-magnitude activations are “chosen” by the network. We implement
    activation ensembles on a variety of datasets using an array of feed-forward
    and convolutional neural networks. By using the
    activation ensemble, we achieve superior results compared to traditional
    techniques. In addition, because of the flexibility of this methodology, we
    more deeply explore activation functions and the features that they capture.
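
    A minimal sketch of one layer of such an ensemble, assuming the (\alpha)
    weights are normalized to a convex combination; the paper's exact
    parameterization and the particular pool of activation functions are
    assumptions here.

    import numpy as np

    def activation_ensemble(x, alphas):
        # Convex combination of several activations, weighted by trainable
        # alpha variables (softmax keeps weights positive and summing to 1).
        funcs = [np.tanh,
                 lambda z: np.maximum(z, 0.0),        # ReLU
                 lambda z: 1.0 / (1.0 + np.exp(-z))]  # sigmoid
        w = np.exp(alphas) / np.exp(alphas).sum()
        return sum(wi * f(x) for wi, f in zip(w, funcs))

    x = np.linspace(-2.0, 2.0, 5)
    print(activation_ensemble(x, alphas=np.array([0.1, 1.5, -0.3])))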

    Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

    Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
    Comments: Accepted to IJCNN2017, Anchorage, Alaska, USA
    Subjects: Sound (cs.SD); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Environmental audio tagging is a newly proposed task to predict the presence
    or absence of a specific audio event in an audio chunk. Deep neural network (DNN)
    based methods have been successfully adopted for predicting the audio tags in
    the domestic audio scene. In this paper, we propose to use a convolutional
    neural network (CNN) to extract robust features from mel-filter banks (MFBs),
    spectrograms or even raw waveforms for audio tagging. Gated recurrent unit
    (GRU) based recurrent neural networks (RNNs) are then cascaded to model the
    long-term temporal structure of the audio signal. To complement the input
    information, an auxiliary CNN is designed to learn on the spatial features of
    stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging)
    of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE
    2016) challenge. Compared with our recent DNN-based method, the proposed
    structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the
    development set. The spatial features can further reduce the EER to 0.10. The
    performance of the end-to-end learning on raw waveforms is also comparable.
    Finally, on the evaluation set, we get the state-of-the-art performance with
    0.12 EER while the performance of the best existing system is 0.15 EER.
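
    A minimal PyTorch-style sketch of the CNN-plus-GRU structure described above
    (without the auxiliary spatial-feature CNN); all layer sizes are assumptions,
    with n_tags=7 matching the seven DCASE 2016 Task 4 labels.

    import torch
    import torch.nn as nn

    class CRNNTagger(nn.Module):
        # Input: (batch, 1, time, mel) log mel-filterbank features;
        # output: chunk-level tag probabilities.
        def __init__(self, n_mels=40, n_tags=7):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d((1, 2)),  # pool along frequency only
            )
            self.gru = nn.GRU(32 * (n_mels // 2), 64,
                              batch_first=True, bidirectional=True)
            self.out = nn.Linear(128, n_tags)

        def forward(self, x):
            h = self.cnn(x)                                 # (B, C, T, F)
            B, C, T, F = h.shape
            h = h.permute(0, 2, 1, 3).reshape(B, T, C * F)  # time-major sequence
            h, _ = self.gru(h)                              # GRU models temporal structure
            return torch.sigmoid(self.out(h.mean(dim=1)))   # average over time

    print(CRNNTagger()(torch.randn(2, 1, 100, 40)).shape)  # torch.Size([2, 7])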

    Changing Model Behavior at Test-Time Using Reinforcement Learning

    Augustus Odena, Dieterich Lawson, Christopher Olah
    Comments: Submitted to ICLR 2017 Workshop Track
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Machine learning models are often used at test-time subject to constraints
    and trade-offs not present at training-time. For example, a computer vision
    model operating on an embedded device may need to perform real-time inference,
    or a translation model operating on a cell phone may wish to bound its average
    compute time in order to be power-efficient. In this work we describe a
    mixture-of-experts model and show how to change its test-time resource-usage on
    a per-input basis using reinforcement learning. We test our method on a small
    MNIST-based example.

    A supervised approach to time scale detection in dynamic networks

    Benjamin Fish, Rajmonda S. Caceres
    Subjects: Social and Information Networks (cs.SI); Learning (cs.LG)

    For any stream of time-stamped edges that form a dynamic network, an
    important choice is the aggregation granularity that an analyst uses to bin the
    data. Picking such a windowing of the data is often done by hand, or left up to
    the technology that is collecting the data. However, the choice can make a big
    difference in the properties of the dynamic network. This is the time scale
    detection problem. In previous work, this problem is often solved with a
    heuristic as an unsupervised task. As an unsupervised problem, it is difficult
    to measure how well a given algorithm performs. In addition, we show that the
    quality of the windowing is dependent on which task an analyst wants to perform
    on the network after windowing. Therefore the time scale detection problem
    should not be handled independently from the rest of the analysis of the
    network.

    We introduce a framework that tackles both of these issues: By measuring the
    performance of the time scale detection algorithm based on how well a given
    task is accomplished on the resulting network, we are for the first time able
    to directly compare different time scale detection algorithms to each other.
    Using this framework, we introduce time scale detection algorithms that take a
    supervised approach: they leverage ground truth on training data to find a good
    windowing of the test data. We compare the supervised approach to previous
    approaches and several baselines on real data.


    Information Theory

    On Fienup Methods for Regularized Phase Retrieval

    Edouard Pauwels, Amir Beck, Yonina C. Eldar, Shoham Sabach
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    Alternating minimization, or Fienup methods, have a long history in phase
    retrieval. We provide new insights related to the empirical and theoretical
    analysis of these algorithms when used with Fourier measurements and combined
    with convex priors. In particular, we show that Fienup methods can be viewed as
    performing alternating minimization on a regularized nonconvex least-squares
    problem with respect to amplitude measurements. We then prove that under mild
    additional structural assumptions on the prior (semi-algebraicity), the
    sequence of signal estimates has a smooth convergent behaviour towards a
    critical point of the nonconvex regularized least-squares objective. Finally,
    we propose an extension to Fienup techniques, based on a projected gradient
    descent interpretation and acceleration using inertial terms. We demonstrate
    experimentally that this modification combined with an (\ell_1) prior
    constitutes a competitive approach for sparse phase retrieval.
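
    A sketch of the projected/proximal gradient view with an (\ell_1) prior,
    under illustrative assumptions: a random complex measurement matrix A,
    amplitude data b = |Ax|, a gradient step on the amplitude least-squares
    objective 0.5*|| |Ax| - b ||^2 followed by complex soft-thresholding; the
    step size, regularization weight, and initialization are not from the paper,
    and no inertial acceleration is included.

    import numpy as np

    def fienup_l1(A, b, lam=0.05, step=0.2, iters=300):
        # Gradient step on 0.5*|| |Ax| - b ||^2, then the prox of lam*||x||_1
        # (soft-thresholding of the magnitudes, phases kept).
        m, n = A.shape
        x = A.conj().T @ b / m                  # simple nonzero initialization
        for _ in range(iters):
            z = A @ x
            phase = z / np.maximum(np.abs(z), 1e-12)
            x = x - step * (A.conj().T @ ((np.abs(z) - b) * phase)) / m
            mag = np.maximum(np.abs(x) - lam * step, 0.0)
            x = mag * np.exp(1j * np.angle(x))
        return x

    rng = np.random.default_rng(0)
    A = (rng.standard_normal((60, 20)) + 1j * rng.standard_normal((60, 20))) / np.sqrt(2)
    x_true = np.zeros(20, dtype=complex); x_true[[3, 7]] = [1.5, -2.0 + 1.0j]
    x_hat = fienup_l1(A, np.abs(A @ x_true))    # recovery up to a global phase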

    Upper and Lower Bounds for the Ergodic Capacity of MIMO Jacobi Fading Channels

    Amor Nafkha, Remi Bonnefoi
    Comments: 4 pages, 6 figures, submitted to the IEEE Photonics Technology Letters
    Subjects: Information Theory (cs.IT)

    In multi-(core/mode) optical fiber communication, the transmission channel
    can be modeled as a complex sub-matrix of the Haar-distributed unitary matrix
    (complex Jacobi unitary ensemble). In this letter, we present new analytical
    expressions of the upper and lower bounds for the ergodic capacity of
    multiple-input multiple-output Jacobi-fading channels. Recent results on the
    determinant of the Jacobi unitary ensemble are employed to derive a tight lower
    bound on the ergodic capacity. We use Jensen’s inequality to provide an
    analytical closed-form upper bound to the ergodic capacity at any
    signal-to-noise ratio (SNR). Closed-form expressions of the ergodic capacity,
    at low and high SNR regimes, are also derived. Simulation results are presented
    to validate the accuracy of the derived expressions.
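
    The Jensen step can be checked numerically: for a Haar-distributed unitary,
    average log det over channel draws and compare with log det evaluated at the
    averaged Gram matrix. The sketch below uses illustrative dimensions and SNR,
    not the letter's settings.

    import numpy as np

    def haar_submatrix(n, mt, mr, rng):
        # mr x mt sub-block of an n x n Haar unitary (Jacobi fading channel).
        Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        Q, R = np.linalg.qr(Z)
        Q = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))  # phase fix for Haar measure
        return Q[:mr, :mt]

    def capacity_vs_jensen(n=8, mt=2, mr=2, snr=10.0, trials=2000, seed=0):
        rng = np.random.default_rng(seed)
        caps, gram = [], np.zeros((mr, mr), dtype=complex)
        for _ in range(trials):
            H = haar_submatrix(n, mt, mr, rng)
            HH = H @ H.conj().T
            gram += HH / trials
            caps.append(np.log2(np.linalg.det(np.eye(mr) + (snr / mt) * HH).real))
        jensen = np.log2(np.linalg.det(np.eye(mr) + (snr / mt) * gram).real)
        return np.mean(caps), jensen  # ergodic capacity <= Jensen upper bound

    print(capacity_vs_jensen())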

    On the second Feng-Rao distance of Algebraic Geometry codes related to Arf semigroups

    J. I. Farrán, P. A. García-Sánchez, B. A. Heredia
    Subjects: Information Theory (cs.IT); Combinatorics (math.CO)

    We describe the second (generalized) Feng-Rao distance for elements in an Arf
    numerical semigroup that are greater than or equal to the conductor of the
    semigroup. This provides a lower bound for the second Hamming weight for one
    point AG codes. In particular, we can obtain the second Feng-Rao distance for
    the codes defined by asymptotically good towers of function fields whose
    Weierstrass semigroups are inductive. In addition, we compute the second
    Feng-Rao number, and provide some examples and comparisons with previous
    results on this topic. These calculations rely on Apéry sets, and thus
    several results concerning Apéry sets of Arf semigroups are presented.

    A KZ Reduction Algorithm

    Jinming Wen, Xiao-Wen Chang
    Comments: This work was presented in part at the IEEE International Symposium on Information Theory (ISIT 2015), Hongkong: arXiv:1504.05086
    Subjects: Information Theory (cs.IT)

    The Korkine-Zolotareff (KZ) reduction is one of the often used reduction
    strategies for decoding lattices. A KZ reduction algorithm involves solving
    shortest vector problems (SVPs) and basis expansion. In this paper, first we
    improve the commonly used Schnorr-Euchner search strategy for solving SVPs.
    Then, we derive upper bounds on the magnitudes of the entries of any solution
    of an SVP when its basis matrix is LLL reduced. These upper bounds only involve
    the parameter of the LLL reduction and the dimension of the solution and they
    are useful for analyzing the complexity of the basis expansion in a KZ
    reduction algorithm. Finally, we modify the basis expansion method proposed by
    Zhang et al. and combine it with the improved Schnorr-Euchner search strategy
    to give a new KZ reduction algorithm. Simulation results show that the new KZ
    reduction algorithm is much faster and numerically more stable than the KZ
    reduction algorithm proposed by Zhang et al., especially when the basis matrix
    is ill conditioned.

    A General Framework for Low-Resolution Receivers for MIMO Channels

    Stefano Rini, Luca Barletta, Yonina C. Eldar, Elza Erkip
    Subjects: Information Theory (cs.IT)

    The capacity of a discrete-time multi-input multi-output (MIMO) Gaussian
    channel with output quantization is investigated for different analog receiver
    architectures. A general formulation of this problem is proposed in which the
    antenna outputs are processed by analog combiners while sign quantizers are
    used for analog/digital conversion. The configuration of the analog combiners
    is chosen as a function of the channel realization, so that capacity can be
    maximized over the set of available configurations for a fixed number of sign
    quantizers. To exemplify this approach, three analog receiver architectures of
    increasing generality are considered: (a) sign quantization of the antenna
    outputs, (b) antenna selection and multilevel quantization and (c) linear
    combining of the antenna outputs and multilevel quantization. By comparing the
    largest attainable rates for a given number of sign quantizers, it is possible
    to quantify the limiting rate advantages provided by each of the receiver
    analog architectures. In particular, it is shown that sign quantization of the
    antenna outputs is sufficient, among all possible receiver architectures
    studied, to attain the optimal high signal-to-noise ratio (SNR) performance for
    MIMO receivers in which the number of antennas is larger than the number of sign
    quantizers.

    Multiuser Precoding and Channel Estimation for Hybrid Millimeter Wave MIMO Systems

    Lou Zhao, Derrick Wing Kwan Ng, Jinhong Yuan
    Comments: 6 pages, accepted for presentation, ICC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, we develop a low-complexity channel estimation for hybrid
    millimeter wave (mmWave) systems, where the number of radio frequency (RF)
    chains is much less than the number of antennas equipped at each transceiver.
    The proposed channel estimation algorithm aims to estimate the strongest
    angle-of-arrivals (AoAs) at both the base station (BS) and the users. Then all
    the users transmit orthogonal pilot symbols to the BS via these estimated
    strongest AoAs to facilitate the channel estimation. The algorithm does not
    require any explicit channel state information (CSI) feedback from the users
    and the associated signalling overhead of the algorithm is only proportional to
    the number of users, which is significantly less compared to various existing
    schemes. Besides, the proposed algorithm is applicable to both non-sparse and
    sparse mmWave channel environments. Based on the estimated CSI, zero-forcing
    (ZF) precoding is adopted for multiuser downlink transmission. In addition, we
    derive a tight achievable rate upper bound of the system. Our analytical and
    simulation results show that the proposed scheme offers a considerable
    achievable rate gain compared to fully digital systems, where the number of RF
    chains equipped at each transceiver is equal to the number of antennas.
    Furthermore, the achievable rate performance gap between the considered hybrid
    mmWave systems and the fully digital system is characterized, which provides
    useful system design insights.

    Lattice Coding and Decoding for Multiple-Antenna Ergodic Fading Channels

    Ahmed Hindy, Aria Nosratinia
    Comments: Accepted at IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    For ergodic fading, a lattice coding and decoding strategy is proposed and
    its performance is analyzed for the single-input single-output (SISO) and
    multiple-input multiple-output (MIMO) point-to-point channel as well as the
    multiple-access channel (MAC), with channel state information available only at
    the receiver (CSIR). At the decoder a novel strategy is proposed consisting of
    a time-varying equalization matrix followed by decision regions that depend
    only on channel statistics, not individual realizations. Our encoder has a
    similar structure to that of Erez and Zamir. For the SISO channel, the gap to
    capacity is bounded by a constant under a wide range of fading distributions.
    For the MIMO channel under Rayleigh fading, the rate achieved is within a gap
    to capacity that does not depend on the signal-to-noise ratio (SNR), and
    diminishes with the number of receive antennas. The analysis is extended to the
    K-user MAC where similar results hold. Achieving a small gap to capacity while
    limiting the use of CSIR to the equalizer highlights the scope for efficient
    decoder implementations, since decision regions are fixed, i.e., independent of
    channel realizations.

    On Algorithmic Statistics for space-bounded algorithms

    Alexey Milovanov
    Comments: accepted to CSR 2017 conference
    Subjects: Information Theory (cs.IT); Computational Complexity (cs.CC)

    Algorithmic statistics studies explanations of observed data that are good in
    the algorithmic sense: an explanation should be simple, i.e., have small
    Kolmogorov complexity, and capture all the algorithmically discoverable
    regularities in the data. However, this idea cannot be used in practice
    because Kolmogorov complexity is not computable.

    In this paper we develop algorithmic statistics using space-bounded
    Kolmogorov complexity. We prove an analogue of one of the main results of
    `classic’ algorithmic statistics (about the connection between optimality and
    randomness deficiencies). The main tool of our proof is the Nisan-Wigderson
    generator.

    Topological Interference Management with Decoded Message Passing

    Xinping Yi, Giuseppe Caire
    Comments: A short version has been presented at IEEE International Symposium on Information Theory (ISIT 2016), Barcelona
    Subjects: Information Theory (cs.IT)

    The topological interference management (TIM) problem studies
    partially-connected interference networks with no channel state information
    except for the network topology (i.e., connectivity graph) at the transmitters.
    In this paper, we consider a similar problem in the uplink cellular networks,
    while message passing is enabled at the receivers (e.g., base stations), so
    that the decoded messages can be routed to other receivers via backhaul links
    to help further improve network performance. For this TIM problem with decoded
    message passing (TIM-MP), we model the interference pattern by conflict
    digraphs, connect orthogonal access to the acyclic set coloring on conflict
    digraphs, and show that one-to-one interference alignment boils down to
    orthogonal access because of message passing. With the aid of polyhedral
    combinatorics, we identify the structural properties of certain classes of
    network topologies where orthogonal access achieves the optimal
    degrees-of-freedom (DoF) region in the information-theoretic sense. The
    relation to the conventional index coding with simultaneous decoding is also
    investigated by formulating a generalized index coding problem with successive
    decoding as a result of decoded message passing. The properties of reducibility
    and criticality are also studied, by which we are able to prove the linear
    optimality of orthogonal access in terms of symmetric DoF for the networks up
    to four users with all possible network topologies (218 instances). Practical
    issues of the tradeoff between the overhead of message passing and the
    achievable symmetric DoF are also discussed, in the hope of facilitating
    efficient backhaul utilization.

    Row-Centric Lossless Compression of Markov Images

    Matthew G. Reyes, David L. Neuhoff
    Comments: submitted to ISIT 2017
    Subjects: Information Theory (cs.IT)

    Motivated by the question of whether the recently introduced Reduced Cutset
    Coding (RCC) offers rate-complexity performance benefits over conventional
    context-based conditional coding for sources with two-dimensional Markov
    structure, this paper compares several row-centric coding strategies that vary
    in the amount of conditioning as well as whether a model or an empirical table
    is used in the encoding of blocks of rows. The conclusion is that, at least for
    sources exhibiting low-order correlations, 1-sided model-based conditional
    coding is superior to the method of RCC for a given constraint on complexity,
    and conventional context-based conditional coding is nearly as good as the
    1-sided model-based coding.

    Delay-Optimal Probabilistic Scheduling in Green Communications with Arbitrary Arrival and Adaptive Transmission

    Xiang Chen, Wei Chen, Joohyun Lee, Ness B. Shroff
    Comments: 12 pages, 7 figures, extended version for our ICC’17 paper. arXiv admin note: substantial text overlap with arXiv:1609.03260
    Subjects: Information Theory (cs.IT)

    In this paper, we aim to obtain the optimal delay-power tradeoff and the
    corresponding optimal scheduling policy for an arbitrary i.i.d. arrival process
    and adaptive transmissions. The number of backlogged packets at the transmitter
    is known to a scheduler, who has to determine how many backlogged packets to
    transmit during each time slot. The power consumption is assumed to be convex
    in transmission rates. Hence, if the scheduler transmits faster, the delay will
    be reduced but with higher power consumption. To obtain the optimal delay-power
    tradeoff and the corresponding optimal policy, we model the problem as a
    Constrained Markov Decision Process (CMDP), where we minimize the average delay
    given an average power constraint. By steady-state analysis and Lagrangian
    relaxation, we can show that the optimal tradeoff curve is decreasing, convex,
    and piecewise linear, and the optimal policy is threshold-based. Based on the
    revealed properties of the optimal policy, we develop an algorithm to
    efficiently obtain the optimal tradeoff curve and the optimal policy. The
    complexity of our proposed algorithm is much lower than a general algorithm
    based on Linear Programming. We validate the derived results and the proposed
    algorithm through Linear Programming and simulations.

    Benefits of Cache Assignment on Degraded Broadcast Channels

    Shirin Saeedi Bidokhti, Michele Wigger, Aylin Yener
    Comments: Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    Degraded K-user broadcast channels (BC) are studied when receivers are
    facilitated with cache memories. Lower and upper bounds are derived on the
    capacity-memory tradeoff, i.e., on the largest rate of reliable communication
    over the BC as a function of the receivers’ cache sizes, and the bounds are
    shown to match for some special cases. The lower bounds are achieved by two new
    coding schemes that benefit from non-uniform cache assignment. Lower and upper
    bounds are also established on the global capacity-memory tradeoff, i.e., on
    the largest capacity-memory tradeoff that can be attained by optimizing the
    receivers’ cache sizes subject to a total cache memory budget. The bounds
    coincide when the total cache memory budget is sufficiently small or
    sufficiently large, characterized in terms of the BC statistics. For small
    cache memories, it is optimal to assign all the cache memory to the weakest
    receiver. In this regime, the global capacity-memory tradeoff grows as the
    total cache memory budget divided by the number of files in the system. In
    other words, a perfect global caching gain is achievable in this regime and the
    performance corresponds to a system where all cache contents in the network are
    available to all receivers. For large cache memories, it is optimal to assign a
    positive cache memory to every receiver such that the weaker receivers are
    assigned larger cache memories compared to the stronger receivers. In this
    regime, the growth rate of the global capacity-memory tradeoff is further
    divided by the number of users, which corresponds to a local caching gain.
    Numerical results suggest that a uniform cache assignment of the total cache
    memory is suboptimal in all regimes unless the BC is completely symmetric. For
    erasure BCs, this claim is proved analytically in the regime of small
    cache-sizes.

    Euclidean and Hermitian LCD MDS codes

    Claude Carlet, Sihem Mesnager, Chunming Tang, Yanfeng Qi
    Subjects: Information Theory (cs.IT)

    Linear codes with complementary duals (abbreviated LCD) are linear codes
    whose intersection with their dual is trivial. When they are binary, they play
    an important role in armoring implementations against side-channel attacks and
    fault injection attacks. Non-binary LCD codes in characteristic 2 can be
    transformed into binary LCD codes by expansion. On the other hand, being
    optimal codes, maximum distance separable codes (abbreviated MDS) have been of
    much interest to many researchers due to their theoretical significance and
    practical implications. However, little work has been done on LCD MDS codes. In
    particular, determining the existence of (q)-ary ([n,k]) LCD MDS codes for
    various lengths (n) and dimensions (k) is a basic and interesting problem. In
    this paper, we firstly study the problem of the existence of (q)-ary ([n,k])
    LCD MDS codes and completely solve it for the Euclidean case. More
    specifically, we show that for (q>3) there exists a (q)-ary ([n,k]) Euclidean
    LCD MDS code, where (0\le k\le n\le q+1), or (q=2^{m}), (n=q+2) and (k=3)
    or (k=q-1). Secondly, we investigate several constructions of new Euclidean
    and Hermitian LCD MDS codes. Our main techniques in constructing Euclidean and
    Hermitian LCD MDS codes use some linear codes with small dimension or
    codimension, self-orthogonal codes and generalized Reed-Solomon codes.

    Exact Random Coding Exponents and Universal Decoders for the Asymmetric Broadcast Channel

    Ran Averbuch, Neri Merhav
    Subjects: Information Theory (cs.IT)

    This work contains two main contributions concerning the asymmetric broadcast
    channel. The first is an analysis of the exact random coding error exponents
    for both users, and the second is the derivation of universal decoders for both
    users. These universal decoders are certain variants of the maximum mutual
    information (MMI) universal decoder, which achieve the corresponding random
    coding exponents of optimal decoding. In addition, we introduce some lower
    bounds, which involve optimization over very few parameters, unlike the
    original, exact exponents, which involve minimizations over auxiliary
    probability distributions. Numerical results for the binary symmetric broadcast
    channel show improvements over previously derived error exponents for the same
    model.

    Multi-scale Spectrum Sensing in Small-Cell mm-Wave Cognitive Wireless Networks

    Nicolo Michelusi, Matthew Nokleby, Urbashi Mitra, Robert Calderbank
    Comments: To appear on ICC 2017
    Subjects: Information Theory (cs.IT)

    In this paper, a multi-scale approach to spectrum sensing in cognitive
    cellular networks is proposed. In order to overcome the huge cost incurred in
    the acquisition of full network state information, a hierarchical scheme is
    proposed, based on which local state estimates are aggregated up the hierarchy
    to obtain aggregate state information at multiple scales, which are then sent
    back to each cell for local decision making. Thus, each cell obtains
    fine-grained estimates of the channel occupancies of nearby cells, but
    coarse-grained estimates of those of distant cells. The performance of the
    aggregation scheme is studied in terms of the trade-off between the throughput
    achievable by secondary users and the interference generated by the activity of
    these secondary users to primary users. In order to account for the irregular
    structure of interference patterns arising from path loss, shadowing, and
    blockages, which are especially relevant in millimeter wave networks, a greedy
    algorithm is proposed to find a multi-scale aggregation tree to optimize the
    performance. It is shown numerically that this tailored hierarchy outperforms a
    regular tree construction by 60%.

    Global Optimality in Low-rank Matrix Optimization

    Zhihui Zhu, Qiuwei Li, Gongguo Tang, Michael B. Wakin
    Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

    This paper considers the minimization of a general objective function (f(X))
    over the set of non-square (n \times m) matrices where the optimal solution
    (X^\star) is low-rank. To reduce the computational burden, we factorize the
    variable (X) into a product of two smaller matrices and optimize over these two
    matrices instead of (X). Despite the resulting nonconvexity, recent studies in
    matrix completion and sensing have shown that the factored problem has no
    spurious local minima and obeys the so-called strict saddle property (the
    function has a direction of negative curvature at all critical points other
    than local minima). We analyze the global geometry for a general and yet
    well-conditioned
    objective function (f(X)) whose restricted strong convexity and restricted
    strong smoothness constants are comparable. In particular, we show that the
    reformulated objective function has no spurious local minima and obeys the
    strict saddle property. These geometric properties imply that a number of
    iterative optimization algorithms (such as gradient descent) can provably solve
    the factored problem with global convergence.
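
    A minimal sketch of the factored approach on a toy quadratic objective,
    assuming plain gradient descent on both factors; the step size, rank, and
    initialization scale are illustrative.

    import numpy as np

    def factored_gd(grad_f, n, m, r, step=0.01, iters=4000, seed=0):
        # Optimize f(X) over X = U V^T: the chain rule turns the gradient G of
        # f at X into factored gradients G V (for U) and G^T U (for V).
        rng = np.random.default_rng(seed)
        U = 0.3 * rng.standard_normal((n, r))
        V = 0.3 * rng.standard_normal((m, r))
        for _ in range(iters):
            G = grad_f(U @ V.T)
            U, V = U - step * G @ V, V - step * G.T @ U
        return U @ V.T

    # Toy objective f(X) = 0.5*||X - M||_F^2 with a rank-2 target M.
    rng = np.random.default_rng(1)
    M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
    X = factored_gd(lambda X: X - M, 30, 20, r=2)
    print(np.linalg.norm(X - M) / np.linalg.norm(M))  # residual shrinks toward 0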

    Rician MIMO Channel- and Jamming-Aware Decision Fusion

    D. Ciuonzo, A. Aubry, V. Carotenuto
    Comments: Accepted in IEEE Transactions on Signal Processing 2017
    Subjects: Information Theory (cs.IT)

    In this manuscript we study channel-aware decision fusion (DF) in a wireless
    sensor network (WSN) where: (i) the sensors transmit their decisions
    simultaneously for spectral efficiency purposes and the DF center (DFC) is
    equipped with multiple antennas; (ii) each sensor-DFC channel is described via
    a Rician model. As opposed to the existing literature, in order to account for
    stringent energy constraints in the WSN, only statistical channel information
    is assumed for the non-line-of-sight (scattered) fading terms. For such a
    scenario, sub-optimal fusion rules are developed in order to deal with the
    exponential complexity of the likelihood ratio test (LRT) and impractical
    (complete) system knowledge. Furthermore, the considered model is extended to
    the case of (partially unknown) jamming-originated interference. Then the
    obtained fusion rules are modified using the composite hypothesis testing
    framework and the generalized LRT. Coincidence and statistical equivalence among
    them are also investigated under some relevant simplified scenarios. Numerical
    results compare the proposed rules and highlight their jamming-suppression
    capability.

    On the Performance of Wireless Powered Communication With Non-linear Energy Harvesting

    Rania Morsi, Elena Boshkovska, Esmat Ramadan, Derrick Wing Kwan Ng, Robert Schober
    Comments: 5 pages, 1 Table, submitted for conference publication
    Subjects: Information Theory (cs.IT)

    In this paper, we analyze the performance of a time-slotted multi-antenna
    wireless powered communication (WPC) system, where a wireless device first
    harvests radio frequency (RF) energy from a power station (PS) in the downlink
    to facilitate information transfer to an information receiving station (IRS) in
    the uplink. The main goal of this paper is to provide insights and guidelines
    for the design of practical WPC systems. To this end, we adopt a recently
    proposed parametric non-linear RF energy harvesting (EH) model, which has been
    shown to accurately model the end-to-end non-linearity of practical RF EH
    circuits. In order to enhance the RF power transfer efficiency, maximum ratio
    transmission is adopted at the PS to focus the energy signals on the wireless
    device. Furthermore, at the IRS, maximum ratio combining is used. We analyze
    the outage probability and the average throughput of information transfer,
    assuming Nakagami-(m) fading uplink and downlink channels. Moreover, we study
    the system performance as a function of the number of PS transmit antennas, the
    number of IRS receive antennas, the transmit power of the PS, the fading
    severity, the transmission rate of the wireless device, and the EH time
    duration. In addition, we obtain a fixed point equation for the optimal
    transmission rate and the optimal EH time duration that maximize the asymptotic
    throughput for high PS transmit powers. All analytical results are corroborated
    by simulations.

    New constructions of MDS codes with complementary duals

    Bocong Chen, Hongwei Liu
    Subjects: Information Theory (cs.IT)

    Linear complementary-dual (LCD for short) codes are linear codes that
    intersect with their duals trivially. LCD codes have been used in certain
    communication systems. It is recently found that LCD codes can be applied in
    cryptography. This application of LCD codes renewed the interest in the
    construction of LCD codes having a large minimum distance. MDS codes are
    optimal in the sense that the minimum distance cannot be improved for given
    length and code size. Constructing LCD MDS codes is thus of significance in
    theory and practice. Recently, Jin (\cite{Jin}, IEEE Trans. Inf. Theory, 2016)
    constructed several classes of LCD MDS codes through generalized Reed-Solomon
    codes. In this paper, a different approach is proposed to obtain new LCD MDS
    codes from generalized Reed-Solomon codes. Consequently, new code constructions
    are provided and certain previously known results in \cite{Jin} are extended.

    Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Schemes

    Yongge Wang
    Subjects: Information Theory (cs.IT)

    This paper presents a survey on generalized Reed-Solomon codes and various
    decoding algorithms: Berlekamp-Massey decoding algorithms; Berlekamp-Welch
    decoding algorithms; Euclidean decoding algorithms; discrete Fourier decoding
    algorithms; Chien’s search algorithm; and Forney’s algorithm.

    Tensor Balancing on Statistical Manifold

    Mahito Sugiyama, Hiroyuki Nakahara, Koji Tsuda
    Comments: 19 pages, 5 figures
    Subjects: Methodology (stat.ME); Information Theory (cs.IT); Numerical Analysis (cs.NA); Machine Learning (stat.ML)

    We solve tensor balancing, rescaling an Nth order nonnegative tensor by
    multiplying it by N tensors of (N-1)th order so that every fiber sums to one. This
    generalizes a fundamental process of matrix balancing used to compare matrices
    in a wide range of applications from biology to economics. We present an
    efficient balancing algorithm with quadratic convergence using Newton’s method
    and show in numerical experiments that the proposed algorithm is several orders
    of magnitude faster than existing ones. To theoretically prove the correctness
    of the algorithm, we model tensors as probability distributions in a
    statistical manifold and realize tensor balancing as projection onto a
    submanifold. The key to our algorithm is that the gradient of the manifold,
    used as a Jacobian matrix in Newton’s method, can be analytically obtained
    using the Möbius inversion formula, a cornerstone of combinatorial
    mathematics. Our model is not limited to tensor balancing but has a wide
    applicability as it includes various statistical and machine learning models
    such as weighted DAGs and Boltzmann machines.
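
    For orientation, the classical first-order process that this work
    generalizes (and accelerates with Newton's method) is Sinkhorn-style matrix
    balancing; a minimal sketch for the matrix (N=2) case:

    import numpy as np

    def sinkhorn_balance(A, iters=500, tol=1e-10):
        # Alternately rescale rows and columns of a positive matrix until
        # every row and column sums to one (the N=2 case of fiber sums).
        A = A.astype(float).copy()
        for _ in range(iters):
            A /= A.sum(axis=1, keepdims=True)
            A /= A.sum(axis=0, keepdims=True)
            if np.abs(A.sum(axis=1) - 1.0).max() < tol:
                break
        return A

    B = sinkhorn_balance(np.random.default_rng(0).random((4, 4)) + 0.1)
    print(B.sum(axis=0), B.sum(axis=1))  # both close to all-ones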

    On the calculation of Fisher information for quantum parameter estimation based on the stochastic master equation

    Beili Gong, Wei Cui
    Comments: 8 pages, 6 figures
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    The Fisher information can be used to indicate the precision of parameter
    estimation by the quantum Cramér-Rao inequality. This paper presents an
    efficient numerical algorithm for the calculation of Fisher information based
    on quantum weak measurement. According to the quantum stochastic master
    equation, the Fisher information is expressed in the form of log-likelihood
    functions. Three main methods are employed in this algorithm: (i) we use the
    numerical differentiation approach to calculate the derivative of the
    log-likelihood function; (ii) we randomly generate a series of parameters of
    interest by the Metropolis-Hastings (MH) algorithm; and (iii) the values of
    expectation can be approximated by the Markov chain Monte Carlo (MCMC)
    integration. Finally, as an example to verify the feasibility of the proposed
    algorithm, we consider the dissipation rates of the open quantum system as
    unknown parameters that need to be estimated. We show that the Fisher
    information can reach a precision close to the Heisenberg limit in the weak
    coupling condition. This again demonstrates the effectiveness of the new
    algorithm.
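
    A generic numerical sketch of the recipe in steps (i)-(iii): the score is
    obtained by central finite differences of the log-likelihood and the
    expectation is approximated by averaging over sampled records (which, in the
    paper's setting, would come from MH/MCMC rather than the direct sampling
    used here).

    import numpy as np

    def fisher_information_mc(loglik, theta, records, eps=1e-4):
        # F(theta) = E[(d/dtheta log L)^2], with the derivative taken by
        # central differences and the expectation by Monte Carlo averaging.
        scores = [(loglik(r, theta + eps) - loglik(r, theta - eps)) / (2 * eps)
                  for r in records]
        return float(np.mean(np.square(scores)))

    # Toy check: unit-variance Gaussian records with mean theta, so F = 1.
    rng = np.random.default_rng(0)
    records = rng.normal(loc=1.0, size=(5000, 1))
    loglik = lambda r, th: float(-0.5 * np.sum((r - th) ** 2))
    print(fisher_information_mc(loglik, theta=1.0, records=records))  # ~1.0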

    Upper bounds on the smallest size of a saturating set in projective planes and spaces of even dimension

    Daniele Bartoli, Alexander Davydov, Massimo Giulietti, Stefano Marcugini, Fernanda Pambianco
    Comments: 14 pages, 34 references, 1 figure
    Subjects: Combinatorics (math.CO); Information Theory (cs.IT)

    In a projective plane (\Pi_{q}) (not necessarily Desarguesian) of order (q),
    a point subset (\mathcal{S}) is saturating (or dense) if any point of
    (\Pi_{q}\setminus \mathcal{S}) is collinear with two points in (\mathcal{S}).
    Modifying an approach of [31], we prove the following upper bound on the
    smallest size (s(2,q)) of a saturating set in (\Pi_{q}):
    \begin{equation*}
    s(2,q) \leq \sqrt{(q+1)\left(3\ln q + \ln\ln q + \ln\frac{3}{4}\right)} + \sqrt{\frac{q}{3\ln q}} + 3.
    \end{equation*}
    The bound holds for all (q), not necessarily large.

    By using inductive constructions, upper bounds on the smallest size of a
    saturating set in the projective space (\mathrm{PG}(N,q)) with even dimension
    (N) are obtained.

    All the results are also stated in terms of linear covering codes.

    Nonparanormal Information Estimation

    Shashank Singh, Barnabás Póczos
    Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)

    We study the problem of using i.i.d. samples from an unknown multivariate
    probability distribution (p) to estimate the mutual information of (p). This
    problem has recently received attention in two settings: (1) where (p) is
    assumed to be Gaussian and (2) where (p) is assumed only to lie in a large
    nonparametric smoothness class. Estimators proposed for the Gaussian case
    converge in high dimensions when the Gaussian assumption holds, but are
    brittle, failing dramatically when (p) is not Gaussian. Estimators proposed for
    the nonparametric case fail to converge with realistic sample sizes except in
    very low dimensions. As a result, there is a lack of robust mutual information
    estimators for many realistic data. To address this, we propose estimators for
    mutual information when (p) is assumed to be a nonparanormal (a.k.a., Gaussian
    copula) model, a semiparametric compromise between Gaussian and nonparametric
    extremes. Using theoretical bounds and experiments, we show these estimators
    strike a practical balance between robustness and scaling with dimensionality.
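
    A sketch of a rank-based estimator in this spirit, assuming the plug-in
    Gaussian-copula formula I(X;Y) = 0.5*(log det R_X + log det R_Y - log det R)
    after mapping each coordinate to normal scores; the paper's exact estimator
    and any bias corrections may differ.

    import numpy as np
    from scipy.stats import norm, rankdata

    def nonparanormal_mi(X, Y):
        # Map each column to normal scores via empirical ranks, estimate the
        # joint correlation matrix, and apply the Gaussian MI formula.
        Z = np.column_stack([X, Y])
        n = Z.shape[0]
        scores = norm.ppf(rankdata(Z, axis=0) / (n + 1))
        R = np.corrcoef(scores, rowvar=False)
        dx = X.shape[1]
        ld = lambda M: np.linalg.slogdet(M)[1]
        return 0.5 * (ld(R[:dx, :dx]) + ld(R[dx:, dx:]) - ld(R))

    rng = np.random.default_rng(0)
    x = rng.standard_normal((2000, 1))
    y = np.exp(x + 0.5 * rng.standard_normal((2000, 1)))
    print(nonparanormal_mi(x, y))  # unaffected by the monotone exp transform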

    Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography

    Mario Milicevic, Chen Feng, Lei M. Zhang, P. Glenn Gulak
    Comments: 20 pages, 16 figures, 4 tables, 74 references
    Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT)

    The speed at which two remote parties can exchange secret keys over a
    fixed-length fiber-optic cable in continuous-variable quantum key distribution
    (CV-QKD) is currently limited by the computational complexity of
    post-processing algorithms for key reconciliation. Multi-edge low-density
    parity-check (LDPC) codes with low code rates and long block lengths were
    proposed for CV-QKD, in order to extend the maximum reconciliation distance
    between the two remote parties. Key reconciliation over multiple dimensions has
    been shown to further improve the error-correction performance of multi-edge
    LDPC codes in CV-QKD, thereby increasing both the secret key rate and distance.
    However, the computational complexity of LDPC decoding for long block lengths
    on the order of 10^6 bits remains a challenge. This work introduces a
    quasi-cyclic code construction for multi-edge LDPC codes that is highly
    suitable for hardware-accelerated decoding on a modern graphics processing unit
    (GPU). When combined with an 8-dimensional reconciliation scheme, the LDPC
    decoder achieves a raw decoding throughput of 1.72Mbit/s and an information
    throughput of 7.16Kbit/s using an NVIDIA GeForce GTX 1080 GPU, at a maximum
    distance of 164km with an effective secret key rate of 8.24×10^{-7} bits/pulse
    for a rate 0.02 multi-edge code with block length of 10^6 bits. For distances
    beyond 130km, the decoder delivers an information throughput between 2745x and
    9138x higher than the maximum secret key rate achievable with a 1MHz light
    source, thereby showing that LDPC decoding is no longer the computational
    bottleneck in long-distance CV-QKD.

    Holistic Small Cell Traffic Balancing across Licensed and Unlicensed Bands

    Ursula Challita, Mahesh K. Marina
    Comments: Accepted for publication at MSWiM 2016
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    Due to the dramatic growth in mobile data traffic on one hand and the
    scarcity of the licensed spectrum on the other hand, mobile operators are
    considering the use of unlicensed bands (especially those in 5 GHz) as
    complementary spectrum for providing higher system capacity and better user
    experience. This approach is currently being standardized by 3GPP under the
    name of LTE Licensed-Assisted Access (LTE-LAA). In this paper, we take a
    holistic approach for LTE-LAA small cell traffic balancing by jointly
    optimizing the use of the licensed and unlicensed bands. We pose this traffic
    balancing as an optimization problem that seeks proportional fair coexistence
    of WiFi, small cell and macro cell users by adapting the transmission
    probability of the LTE-LAA small cell in the licensed and unlicensed bands. The
    motivation for this formulation is for the LTE-LAA small cell to switch between
    or aggregate licensed and unlicensed bands depending on the
    interference/traffic level and the number of active users in each band. We
    derive a closed form solution for this optimization problem and additionally
    propose a transmission mechanism for the operation of the LTE-LAA small cell on
    both bands. Through numerical and simulation results, we show that our proposed
    traffic balancing scheme, besides enabling better LTE-WiFi coexistence and
    efficient utilization of the radio resources relative to the existing traffic
    balancing scheme, also provides a better tradeoff between maximizing the total
    network throughput and achieving fairness among all network flows compared to
    alternative approaches.



