
    arXiv Paper Daily: Tue, 22 Nov 2016

    Published by 我爱机器学习 (52ml.net) on 2016-11-22 00:00:00

    Neural and Evolutionary Computing

    A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

    Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

    In recent years, deep neural networks (DNNs) have yielded strong results on
    a wide range of applications. Graphics Processing Units (GPUs) have been one
    key enabling factor leading to the current popularity of DNNs. However, despite
    increasing hardware flexibility and software programming toolchain maturity,
    high efficiency GPU programming remains difficult: it suffers from high
    complexity, low productivity, and low portability. GPU vendors such as NVIDIA
    have spent enormous effort to write special-purpose DNN libraries. However, on
    other hardware targets, especially mobile GPUs, such vendor libraries are not
    generally available. Thus, the development of portable, open, high-performance,
    energy-efficient GPU code for DNN operations would enable broader deployment of
    DNN-based algorithms. Toward this end, this work presents a framework to enable
    productive, high-efficiency GPU programming for DNN computations across
    hardware platforms and programming models. In particular, the framework
    provides specific support for metaprogramming, autotuning, and DNN-tailored
    data types. Using our framework, we explore implementing DNN operations on
    three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA
    GPUs, we show both portability between OpenCL and CUDA as well as competitive
    performance compared to the vendor library. On Qualcomm GPUs, we show that our
    framework enables productive development of target-specific optimizations, and
    achieves reasonable absolute performance. Finally, on AMD GPUs, we show initial
    results that indicate our framework can yield reasonable performance on a new
    platform with minimal effort.
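
    Autotuning in this setting typically means generating many candidate kernel
    variants (for example, different tile sizes and unroll factors) and
    empirically timing them on the target device. Below is a minimal,
    framework-agnostic sketch of that search loop; the candidate space and the
    run_kernel benchmark hook are hypothetical placeholders, not the paper's API.

    ```python
    import itertools
    import time

    def autotune(run_kernel, tile_sizes=(8, 16, 32), unroll_factors=(1, 2, 4)):
        """Time each candidate configuration and keep the fastest.

        run_kernel(config) is a hypothetical hook that compiles and executes
        one kernel variant; a real framework would cache compiled binaries
        and prune the search space rather than brute-forcing it.
        """
        best_cfg, best_time = None, float("inf")
        for tile, unroll in itertools.product(tile_sizes, unroll_factors):
            cfg = {"tile": tile, "unroll": unroll}
            start = time.perf_counter()
            run_kernel(cfg)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best_cfg, best_time = cfg, elapsed
        return best_cfg, best_time
    ```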

    Using inspiration from synaptic plasticity rules to optimize traffic flow in distributed engineered networks

    Jonathan Y. Suen, Saket Navlakha
    Comments: 43 pages, 5 Figures. Submitted to Neural Computation
    Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    Controlling the flow and routing of data is a fundamental problem in many
    distributed networks, including transportation systems, integrated circuits,
    and the Internet. In the brain, synaptic plasticity rules have been discovered
    that regulate network activity in response to environmental inputs, which
    enable circuits to be stable yet flexible. Here, we develop a new
    neuro-inspired model for network flow control that only depends on modifying
    edge weights in an activity-dependent manner. We show how two fundamental
    plasticity rules (long-term potentiation and long-term depression) can be cast
    as a distributed gradient descent algorithm for regulating traffic flow in
    engineered networks. We then characterize, both via simulation and
    analytically, how different forms of edge-weight update rules affect network
    routing efficiency and robustness. We find a close correspondence between
    certain classes of synaptic weight update rules derived experimentally in the
    brain and rules commonly used in engineering, suggesting common principles to
    both.
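
    Read as an algorithm, the update rule strengthens edges that successfully
    carry traffic (LTP-like) and sharply weakens edges that are overloaded
    (LTD-like). The toy simulation below is a hedged sketch of that reading; the
    constants, the congestion test, and the proportional routing are illustrative
    assumptions, not the paper's exact rules.

    ```python
    def update_edge_weights(weights, loads, capacity=1.0,
                            ltp_gain=0.05, ltd_factor=0.5):
        """Activity-dependent edge-weight update (illustrative sketch).

        weights: dict edge -> current weight
        loads:   dict edge -> traffic observed on that edge this round
        Uncongested active edges are additively strengthened (LTP-like);
        congested edges are multiplicatively weakened (LTD-like).
        """
        for edge, load in loads.items():
            if load > capacity:
                weights[edge] *= ltd_factor          # long-term depression
            elif load > 0:
                weights[edge] += ltp_gain * load     # long-term potentiation
        return weights

    # Toy usage: three parallel edges between a source and a sink,
    # with demand routed proportionally to current weights.
    w = {"e1": 1.0, "e2": 1.0, "e3": 1.0}
    for _ in range(20):
        total = sum(w.values())
        loads = {e: 2.0 * wi / total for e, wi in w.items()}
        w = update_edge_weights(w, loads)
    print(w)
    ```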

    Efficient Stochastic Inference of Bitwise Deep Neural Networks

    Sebastian Vogel, Christoph Schorn, Andre Guntoro, Gerd Ascheid
    Comments: 6 pages, 3 figures, Workshop on Efficient Methods for Deep Neural Networks at Neural Information Processing Systems Conference 2016, NIPS 2016, EMDNN 2016
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recently published methods enable training of bitwise neural networks, which
    allow representations reduced to as little as a single bit per weight. We present a
    method that exploits ensemble decisions based on multiple stochastically
    sampled network models to increase performance figures of bitwise neural
    networks in terms of classification accuracy at inference. Our experiments with
    the CIFAR-10 and GTSRB datasets show that the performance of such network
    ensembles surpasses the performance of the high-precision base model. With this
    technique, we achieve a best classification error of 5.81% on the CIFAR-10
    test set using bitwise networks. Concerning inference on embedded systems, we evaluate these
    bitwise networks using a hardware efficient stochastic rounding procedure. Our
    work contributes to efficient embedded bitwise neural networks.
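
    The hardware-efficient stochastic rounding referred to here is, in its
    standard form, rounding a value up or down with probability proportional to
    its distance from the two neighboring levels, which makes the rounding
    unbiased in expectation. A small numpy sketch of that standard procedure (the
    paper's exact fixed-point format is not given in the abstract):

    ```python
    import numpy as np

    def stochastic_round(x, step=1.0, rng=np.random.default_rng(0)):
        """Round x to a multiple of `step`, up or down with probability
        proportional to proximity; unbiased in expectation."""
        scaled = np.asarray(x, dtype=np.float64) / step
        floor = np.floor(scaled)
        frac = scaled - floor                      # in [0, 1)
        round_up = rng.random(scaled.shape) < frac
        return (floor + round_up) * step

    # Each sample is 0.0 or 1.0, but the mean converges to 0.3.
    samples = stochastic_round(np.full(100_000, 0.3))
    print(samples.mean())  # ~0.3
    ```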

    Spikes as regularizers

    Anders Søgaard
    Comments: Computing with Spikes at NIPS 2016
    Subjects: Neural and Evolutionary Computing (cs.NE)

    We present a confidence-based single-layer feed-forward learning algorithm
    SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of
    activation spikes. We adaptively update a weight vector relying on confidence
    estimates and activation offsets relative to previous activity. We regularize
    updates proportionally to item-level confidence and weight-specific support,
    loosely inspired by the observation from neurophysiology that high spike rates
    are sometimes accompanied by low temporal precision. Our experiments suggest
    that the new learning algorithm SPIRAL is more robust and less prone to
    overfitting than both the averaged perceptron and AROW.

    Multiple-View Spectral Clustering for Group-wise Functional Community Detection

    Nathan D. Cahill, Harmeet Singh, Chao Zhang, Daryl A. Corcoran, Alison M. Prengaman, Paul S. Wenger, John F. Hamilton, Peter Bajorski, Andrew M. Michael
    Comments: Presented at The MICCAI-BACON 16 Workshop (this https URL)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Functional connectivity analysis yields powerful insights into our
    understanding of the human brain. Group-wise functional community detection
    aims to partition the brain into clusters, or communities, in which functional
    activity is inter-regionally correlated in a common manner across a group of
    subjects. In this article, we show how to use multiple-view spectral clustering
    to perform group-wise functional community detection. In a series of
    experiments on 291 subjects from the Human Connectome Project, we compare three
    versions of multiple-view spectral clustering: MVSC (uniform weights), MVSCW
    (weights based on subject-specific embedding quality), and AASC (weights
    optimized along with the embedding) with the competing technique of Joint
    Diagonalization of Laplacians (JDL). Results show that multiple-view spectral
    clustering not only yields group-wise functional communities that are more
    consistent than JDL when using randomly selected subsets of individual brains,
    but it is several orders of magnitude faster than JDL.
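
    For the uniform-weight variant (MVSC), a natural pipeline is to average the
    symmetric normalized Laplacians across subjects (views), embed the nodes with
    the eigenvectors of the smallest eigenvalues of the average, and cluster the
    embedding with k-means. The sketch below follows that reading; the
    normalization and other details are assumptions rather than the paper's exact
    algorithm.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def mvsc_uniform(affinities, n_clusters):
        """Multiple-view spectral clustering with uniform view weights (sketch).

        affinities: list of (n, n) symmetric affinity matrices, one per view.
        """
        n = affinities[0].shape[0]
        laplacians = []
        for W in affinities:
            d = W.sum(axis=1)
            d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
            # Symmetric normalized Laplacian: I - D^-1/2 W D^-1/2
            laplacians.append(
                np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
        L_avg = np.mean(laplacians, axis=0)
        # Embed with the eigenvectors of the smallest eigenvalues.
        eigvals, eigvecs = np.linalg.eigh(L_avg)
        embedding = eigvecs[:, :n_clusters]
        # Row-normalize before k-means, as in standard spectral clustering.
        embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
    ```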

    Generalized Dropout

    Suraj Srinivas, R. Venkatesh Babu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep Neural Networks often require good regularizers to generalize well.
    Dropout is one such regularizer that is widely used among Deep Learning
    practitioners. Recent work has shown that Dropout can also be viewed as
    performing Approximate Bayesian Inference over the network parameters. In this
    work, we generalize this notion and introduce a rich family of regularizers
    which we call Generalized Dropout. One set of methods in this family, called
    Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
    emerges as a special case of this method. Another member of this family selects
    the width of neural network layers. Experiments show that these methods help in
    improving generalization performance over Dropout.
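
    The abstract describes Dropout++ only as Dropout with trainable parameters.
    One common way to make per-unit drop rates trainable is a relaxed (concrete)
    Bernoulli mask, sketched below in PyTorch; this particular parameterization
    is an assumption, not necessarily the paper's.

    ```python
    import torch
    import torch.nn as nn

    class TrainableDropout(nn.Module):
        """Dropout with per-unit trainable keep probabilities (hedged sketch).

        Uses a relaxed (concrete) Bernoulli mask so gradients flow into the
        keep probabilities; the paper's exact Dropout++ parameterization may
        differ from this one.
        """
        def __init__(self, num_features, init_keep=0.8, temperature=0.1):
            super().__init__()
            self.logit_keep = nn.Parameter(
                torch.logit(torch.full((num_features,), init_keep)))
            self.temperature = temperature

        def forward(self, x):
            keep = torch.sigmoid(self.logit_keep)
            if not self.training:
                return x  # the inverted-dropout mask is 1 in expectation
            dist = torch.distributions.RelaxedBernoulli(
                temperature=self.temperature, probs=keep)
            mask = dist.rsample(x.shape[:-1])  # soft, differentiable mask
            return x * mask / keep.clamp(min=1e-6)

    layer = TrainableDropout(64)
    out = layer(torch.randn(32, 64))  # keep probabilities receive gradients
    ```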

    Deep Tensor Convolution on Multicores

    David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    Deep convolutional neural networks (ConvNets) have become a de facto standard
    for image classification and segmentation problems. These networks have also
    had early success in the video domain, despite failing to capture motion
    continuity and other rich temporal correlations. Evidence has since emerged
    that extending ConvNets to 3-dimensions leads to state-of-the-art performance
    across a broad set of video processing tasks by learning these joint
    spatiotemporal features. However, these early 3D networks have been restricted
    to shallower architectures of fewer channels than successful 2D networks due to
    memory constraints inherent to GPU implementations.

    In this study we present the first practical CPU implementation of tensor
    convolution optimized for deep networks of small kernels. Our implementation
    supports arbitrarily deep ConvNets of N-dimensional tensors due to the
    relaxed memory constraints of CPU systems, which can be further leveraged for
    an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
    kernels). Because most of the optimized ConvNets in previous literature are 2
    rather than 3-dimensional, we benchmark our performance against the most
    popular 2D implementations. Even in this special case, which is theoretically
    the least beneficial for our fast algorithm, we observe a 5 to 25-fold
    improvement in throughput compared to previous state-of-the-art. We believe
    this work is an important step toward practical ConvNets for real-time
    applications, such as mobile video processing and biomedical image analysis,
    where high performance 3D networks are a must.

    Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

    Zhiguang Wang, Weizhong Yan, Tim Oates
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a simple but strong baseline for time series classification from
    scratch with deep neural networks. Our proposed baseline models are pure
    end-to-end without any heavy preprocessing on the raw data or feature crafting.
    The fully convolutional network (FCN) baseline achieves performance superior
    to other state-of-the-art approaches, and our exploration of very deep neural
    networks with the ResNet structure achieves competitive performance under the
    same simple experimental settings. The simple MLP baseline is also comparable
    to 1NN-DTW, a previous gold-standard baseline. Our models provide a simple
    choice for real-world applications and a good starting point for future
    research. An overall analysis is provided to discuss the generalization of
    our models, the learned features, the network structures and the
    classification semantics.
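
    The FCN baseline referenced above is commonly built from three
    Conv1d-BatchNorm-ReLU blocks followed by global average pooling and a linear
    classifier; the widely cited configuration uses 128/256/128 filters with
    kernel sizes 8/5/3, which the PyTorch sketch below assumes.

    ```python
    import torch
    import torch.nn as nn

    class FCNBaseline(nn.Module):
        """Fully convolutional baseline for time series classification
        (sketch; the 128/256/128 filters with kernels 8/5/3 are the commonly
        cited configuration, assumed here)."""
        def __init__(self, in_channels, n_classes):
            super().__init__()
            def block(c_in, c_out, k):
                return nn.Sequential(
                    nn.Conv1d(c_in, c_out, k, padding=k // 2),
                    nn.BatchNorm1d(c_out),
                    nn.ReLU())
            self.features = nn.Sequential(
                block(in_channels, 128, 8),
                block(128, 256, 5),
                block(256, 128, 3))
            self.classifier = nn.Linear(128, n_classes)

        def forward(self, x):            # x: (batch, channels, length)
            h = self.features(x)
            h = h.mean(dim=-1)           # global average pooling over time
            return self.classifier(h)

    model = FCNBaseline(in_channels=1, n_classes=10)
    logits = model(torch.randn(4, 1, 96))
    ```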

    Fast Video Classification via Adaptive Cascading of Deep Models

    Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent advances have enabled “oracle” classifiers that can classify across
    many classes and input distributions with high accuracy without retraining.
    However, these classifiers are relatively heavyweight, so that applying them to
    classify video is costly. We show that day-to-day video exhibits highly skewed
    class distributions over the short term, and that these distributions can be
    classified by much simpler models. We formulate the problem of detecting the
    short-term skews online and exploiting models based on it as a new sequential
    decision making problem dubbed the Online Bandit Problem, and present a new
    algorithm to solve it. When applied to recognizing faces in TV shows and
    movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
    GPU/CPU) relative to a state-of-the-art convolutional neural network, at
    competitive accuracy.
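
    Independent of the bandit formulation, the core systems idea is a cascade:
    try a cheap model specialized to the recently observed class skew first, and
    pay for the heavyweight oracle only when the cheap model is not confident. A
    minimal sketch of that dispatch logic (the models and the confidence
    threshold are placeholders):

    ```python
    def cascaded_classify(x, cheap_model, oracle_model, threshold=0.9):
        """Return the cheap model's prediction when it is confident,
        otherwise fall back to the expensive oracle. Both models are
        assumed to return (label, confidence) pairs; the threshold is a
        tunable placeholder."""
        label, confidence = cheap_model(x)
        if confidence >= threshold:
            return label, "cheap"
        label, _ = oracle_model(x)
        return label, "oracle"
    ```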

    Quantized neural network design under weight capacity constraint

    Sungho Shin, Kyuyeon Hwang, Wonyong Sung
    Comments: This paper is accepted at NIPS 2016 workshop on Efficient Methods for Deep Neural Networks (EMDNN). arXiv admin note: text overlap with arXiv:1511.06488
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The complexity of deep neural network algorithms for hardware implementation
    can be lowered either by scaling the number of units or reducing the
    word-length of weights. Both approaches, however, can lead to performance
    degradation, although much research has been conducted to alleviate this
    problem. Thus, an important question is which of the two, network size
    scaling or weight quantization, is more effective for hardware
    optimization. For this study, the performances of fully-connected deep neural
    networks (FCDNNs) and convolutional neural networks (CNNs) are evaluated while
    changing the network complexity and the word-length of weights. Based on these
    experiments, we present the effective compression ratio (ECR) to guide the
    trade-off between the network size and the precision of weights when the
    hardware resource is limited.
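
    Reducing the word-length of weights amounts to fixed-point quantization:
    with b bits, weights are snapped to roughly 2^b evenly spaced levels. The
    numpy sketch below uses symmetric uniform quantization, an assumption since
    the abstract does not specify the exact scheme; sweeping n_bits exposes the
    accuracy-precision trade-off that the ECR is meant to guide.

    ```python
    import numpy as np

    def quantize_weights(w, n_bits):
        """Symmetric uniform quantization of weights to n_bits (sketch)."""
        w = np.asarray(w, dtype=np.float64)
        max_abs = float(np.max(np.abs(w))) or 1.0
        levels = 2 ** (n_bits - 1) - 1          # signed integer range
        step = max_abs / levels
        return np.clip(np.round(w / step), -levels, levels) * step

    w = np.random.default_rng(0).standard_normal(256)
    for b in (8, 4, 2):
        err = np.mean((w - quantize_weights(w, b)) ** 2)
        print(f"{b}-bit MSE: {err:.5f}")
    ```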

    Learning the Number of Neurons in Deep Networks

    Jose M Alvarez, Mathieu Salzmann
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Nowadays, the number of layers and of neurons in each layer of a deep network
    are typically set manually. While very deep and wide networks have proven
    effective in general, they come at a high memory and computation cost, thus
    making them impractical for constrained platforms. These networks, however, are
    known to have many redundant parameters, and could thus, in principle, be
    replaced by more compact architectures. In this paper, we introduce an approach
    to automatically determining the number of neurons in each layer of a deep
    network during learning. To this end, we propose to make use of a group
    sparsity regularizer on the parameters of the network, where each group is
    defined to act on a single neuron. Starting from an overcomplete network, we
    show that our approach can reduce the number of parameters by up to 80% while
    retaining or even improving the network accuracy.
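
    The regularizer described, a group lasso with one group per neuron,
    penalizes the L2 norm of each neuron's incoming weights, so entire neurons
    are driven to exactly zero and can be pruned after training. A PyTorch
    sketch of the penalty (the layer traversal and the strength constant are
    illustrative):

    ```python
    import torch
    import torch.nn as nn

    def neuron_group_sparsity(model, strength=1e-3):
        """Group-lasso penalty with one group per output neuron (sketch).

        For each Linear/Conv layer, the incoming weights of every output
        unit form one group; penalizing the group L2 norms pushes whole
        neurons to zero so they can be removed after training.
        """
        penalty = torch.zeros(())
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                w = module.weight                 # (out, ...) layout
                groups = w.flatten(start_dim=1)   # one row per neuron
                penalty = penalty + groups.norm(dim=1).sum()
        return strength * penalty

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
    loss = nn.functional.cross_entropy(model(torch.randn(8, 10)),
                                       torch.randint(0, 2, (8,)))
    loss = loss + neuron_group_sparsity(model)
    loss.backward()
    ```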

    Local minima in training of deep networks

    Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
    Comments: submitted to ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There has been a lot of recent interest in trying to characterize the error
    surface of deep models. This stems from a long standing question. Given that
    deep networks are highly nonlinear systems optimized by local gradient methods,
    why do they not seem to be affected by bad local minima? It is widely believed
    that training of deep models using gradient methods works so well because the
    error surface either has no local minima, or if they exist they need to be
    close in value to the global minimum. It is known that such results hold under
    very strong assumptions which are not satisfied by real models. In this paper
    we present examples showing that for such a theorem to hold, additional
    assumptions on the data, initialization schemes and/or the model classes have
    to be made. We look at the particular case of finite-size datasets. We
    demonstrate that in this scenario one can construct counter-examples (datasets
    or initialization schemes) when the network does become susceptible to bad
    local minima over the weight space.


    Computer Vision and Pattern Recognition

    Image-to-Image Translation with Conditional Adversarial Networks

    Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We investigate conditional adversarial networks as a general-purpose solution
    to image-to-image translation problems. These networks not only learn the
    mapping from input image to output image, but also learn a loss function to
    train this mapping. This makes it possible to apply the same generic approach
    to problems that traditionally would require very different loss formulations.
    We demonstrate that this approach is effective at synthesizing photos from
    label maps, reconstructing objects from edge maps, and colorizing images, among
    other tasks. As a community, we no longer hand-engineer our mapping functions,
    and this work suggests we can achieve reasonable results without
    hand-engineering our loss functions either.
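
    The conditional-GAN objective behind this approach pairs a generator G(x)
    with a discriminator D(x, y) that scores the input image together with a
    real or generated output, so the "loss function" for the mapping is itself
    learned. A hedged PyTorch sketch of the losses (the published method also
    mixes in an L1 term and uses specific architectures not shown here; G and D
    are assumed callables):

    ```python
    import torch
    import torch.nn.functional as F

    def cgan_losses(G, D, x, y_real, l1_weight=100.0):
        """Conditional-GAN losses for image-to-image translation (sketch).

        G maps an input image x to an output; D scores (input, output)
        pairs, so the learned loss is conditioned on x. l1_weight follows
        the published method's extra L1 term that keeps outputs near the
        ground truth.
        """
        y_fake = G(x)

        # Discriminator: push real pairs toward 1, fake pairs toward 0.
        real_logits = D(x, y_real)
        fake_logits = D(x, y_fake.detach())
        d_loss = (F.binary_cross_entropy_with_logits(
                      real_logits, torch.ones_like(real_logits))
                  + F.binary_cross_entropy_with_logits(
                      fake_logits, torch.zeros_like(fake_logits)))

        # Generator: fool the discriminator and stay close in L1.
        gen_logits = D(x, y_fake)
        g_loss = (F.binary_cross_entropy_with_logits(
                      gen_logits, torch.ones_like(gen_logits))
                  + l1_weight * F.l1_loss(y_fake, y_real))
        return d_loss, g_loss
    ```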

    Precise Relaxation of the Mumford-Shah Functional

    Thomas Möllenhoff, Daniel Cremers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Jumps, edges and cutoffs are prevalent in our world across many modalities.
    The Mumford-Shah functional is a classical and elegant approach for modeling
    such discontinuities but global optimization of this non-convex functional
    remains challenging. The state of the art are convex representations based on
    the theory of calibrations. The major drawback of these approaches is the
    ultimate discretization of the co-domain into labels. For the case of total
    variation regularization, this issue has been partially resolved by recent
    sublabel-accurate relaxations, a generalization of which to other regularizers
    is not straightforward. In this work, we show that sublabel-accurate lifting
    approaches can be derived by discretizing a continuous relaxation of the
    Mumford-Shah functional by means of finite elements. We thereby unify and
    generalize existing functional lifting approaches. We show the efficiency of
    the proposed discretizations on discontinuity-preserving denoising tasks.
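
    For reference, the classical Mumford-Shah functional being relaxed seeks a
    piecewise-smooth approximation u of an image f on a domain \Omega together
    with a discontinuity set K; the joint optimization over u and the free
    discontinuity set K is what makes the problem non-convex:

    ```latex
    E(u, K) = \int_{\Omega} (u - f)^2 \, dx
            + \lambda \int_{\Omega \setminus K} |\nabla u|^2 \, dx
            + \nu \, \mathcal{H}^{d-1}(K)
    ```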

    Multiple-View Spectral Clustering for Group-wise Functional Community Detection

    Nathan D. Cahill, Harmeet Singh, Chao Zhang, Daryl A. Corcoran, Alison M. Prengaman, Paul S. Wenger, John F. Hamilton, Peter Bajorski, Andrew M. Michael
    Comments: Presented at The MICCAI-BACON 16 Workshop (this https URL)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Functional connectivity analysis yields powerful insights into our
    understanding of the human brain. Group-wise functional community detection
    aims to partition the brain into clusters, or communities, in which functional
    activity is inter-regionally correlated in a common manner across a group of
    subjects. In this article, we show how to use multiple-view spectral clustering
    to perform group-wise functional community detection. In a series of
    experiments on 291 subjects from the Human Connectome Project, we compare three
    versions of multiple-view spectral clustering: MVSC (uniform weights), MVSCW
    (weights based on subject-specific embedding quality), and AASC (weights
    optimized along with the embedding) with the competing technique of Joint
    Diagonalization of Laplacians (JDL). Results show that multiple-view spectral
    clustering not only yields group-wise functional communities that are more
    consistent than JDL when using randomly selected subsets of individual brains,
    but it is several orders of magnitude faster than JDL.

    Kernel Cross-View Collaborative Representation based Classification for Person Re-Identification

    Raphael Prates, William Robson Schwartz
    Comments: Paper submitted to CVPR 2017 conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification aims at the maintenance of a global identity as a
    person moves among non-overlapping surveillance cameras. It is a hard task due
    to different illumination conditions, viewpoints and the small number of
    annotated individuals from each pair of cameras (small-sample-size problem).
    Collaborative Representation based Classification (CRC) has been employed
    successfully to address the small-sample-size problem in computer vision.
    However, the original CRC formulation is not well-suited for person
    re-identification since it does not consider that probe and gallery samples are
    from different cameras. Furthermore, it is a linear model, while appearance
    changes caused by different camera conditions indicate a strong nonlinear
    transition between cameras. To overcome such limitations, we propose the Kernel
    Cross-View Collaborative Representation based Classification (Kernel X-CRC)
    that represents probe and gallery images by balancing representativeness and
    similarity nonlinearly. It assumes that a probe and its corresponding gallery
    image are represented with similar coding vectors using individuals from the
    training set. Experimental results demonstrate that our assumption holds when
    using a high-dimensional feature vector, and becomes more compelling when
    dealing with a low-dimensional and discriminative representation computed
    using a common subspace learning method. We achieve state-of-the-art rank-1
    matching rates on two person re-identification datasets (PRID450S and GRID) and
    the second best results on VIPeR and CUHK01 datasets.
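
    The original CRC formulation that the paper extends codes a probe y over the
    whole gallery dictionary D with l2-regularized least squares, which has the
    closed form alpha = (D^T D + lambda I)^{-1} D^T y, and assigns the class
    whose gallery columns give the smallest reconstruction residual. A numpy
    sketch of that linear baseline (the paper's kernelized cross-view extension
    is not shown):

    ```python
    import numpy as np

    def crc_classify(D, labels, y, lam=1e-3):
        """Collaborative Representation based Classification (baseline sketch).

        D:      (d, n) dictionary, one column per gallery sample
        labels: (n,)   array with the class label of each column
        y:      (d,)   probe sample
        """
        n = D.shape[1]
        # Closed-form ridge-regression coding vector.
        alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
        best_cls, best_res = None, np.inf
        for cls in np.unique(labels):
            mask = labels == cls
            residual = np.linalg.norm(y - D[:, mask] @ alpha[mask])
            if residual < best_res:
                best_cls, best_res = cls, residual
        return best_cls
    ```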

    Sampled Image Tagging and Retrieval Methods on User Generated Content

    Karl Ni, Kyle Zaragoza, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Traditional image tagging and retrieval algorithms have limited value as a
    result of being trained with heavily curated datasets. These limitations are
    most evident when arbitrary search words are used that do not intersect with
    training set labels. Weak labels from user generated content (UGC) found in the
    wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of
    unique words in the metadata tags. Prior work on word embeddings successfully
    leveraged unstructured text with large vocabularies, and our proposed method
    seeks to apply similar cost functions to open source imagery. Specifically, we
    train a deep learning image tagging and retrieval system on large scale, user
    generated content (UGC) using sampling methods and joint optimization of word
    embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset,
    such an approach builds robustness to common unstructured data issues that
    include but are not limited to irrelevant tags, misspellings, multiple
    languages, polysemy, and tag imbalance. As a result, the final proposed
    algorithm will not only yield results comparable to the state of the art in
    conventional image tagging, but will enable the new capability of training
    algorithms on large-scale unstructured text in the YFCC100M dataset, and will
    outperform cited work in zero-shot capability.

    Statistical Learning for OCR Text Correction

    Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    The accuracy of Optical Character Recognition (OCR) is crucial to the success
    of subsequent applications in the text analysis pipeline. Recent models of
    OCR post-processing significantly improve the quality of OCR-generated text,
    but are still prone to suggest correction candidates from limited observations
    while insufficiently accounting for the characteristics of OCR errors. In this
    paper, we show how to enlarge candidate suggestion space by using external
    corpus and integrating OCR-specific features in a regression approach to
    correct OCR-generated errors. The evaluation results show that our model can
    correct 61.5% of the OCR-errors (considering the top 1 suggestion) and 71.5% of
    the OCR-errors (considering the top 3 suggestions), for cases where the
    theoretical correction upper-bound is 78%.

    Dense Captioning with Joint Inference and Visual Context

    Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Dense captioning is a newly emerging computer vision topic for understanding
    images with dense language descriptions. The goal is to densely detect visual
    concepts (e.g., objects, object parts, and interactions between them) from
    images, labeling each with a short descriptive phrase. We identify two key
    challenges of dense captioning that need to be properly addressed when tackling
    the problem. First, dense visual concept annotations in each image are
    associated with highly overlapping target regions, making accurate localization
    of each visual concept challenging. Second, the large amount of visual concepts
    makes it hard to recognize each of them by appearance alone. We propose a new
    model pipeline based on two novel ideas, joint inference and context fusion, to
    alleviate these two challenges. We design our model architecture in a
    methodical manner and thoroughly evaluate the variations in architecture. Our
    final model, compact and efficient, achieves state-of-the-art accuracy on
    Visual Genome for dense captioning with a relative gain of 73% compared to the
    previous best algorithm. Qualitative experiments also reveal the semantic
    capabilities of our model in dense captioning.

    Predicting 1p19q Chromosomal Deletion of Low-Grade Gliomas from MR Images using Deep Learning

    Zeynettin Akkus, Issa Ali, Jiri Sedlar, Timothy L. Kline, Jay P. Agrawal, Ian F. Parney, Caterina Giannini, Bradley J. Erickson
    Comments: This work has been presented in Conference on Machine Intelligence in Medical Imaging 2016 and RSNA 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Objective: Several studies have associated codeletion of chromosome arms
    1p/19q in low-grade gliomas (LGG) with positive response to treatment and
    longer progression free survival. Therefore, predicting 1p/19q status is
    crucial for effective treatment planning of LGG. In this study, we predict the
    1p/19q status from MR images using convolutional neural networks (CNN), which
    could be a noninvasive alternative to surgical biopsy and histopathological
    analysis. Method: Our method consists of three main steps: image registration,
    tumor segmentation, and classification of 1p/19q status using CNN. We included
    a total of 159 LGG patients, with 3 image slices each, who had biopsy-proven
    1p/19q status (57 nondeleted and 102 codeleted) and preoperative
    postcontrast-T1 (T1C) and T2
    images. We divided our data into training, validation, and test sets. The
    training data was balanced for equal class probability and then augmented with
    iterations of random translational shift, rotation, and horizontal and vertical
    flips to increase the size of the training set. We shuffled and augmented the
    training data to counter overfitting in each epoch. Finally, we evaluated
    several configurations of a multi-scale CNN architecture until training and
    validation accuracies became consistent. Results: The results of the best
    performing configuration on the unseen test set were 93.3% (sensitivity),
    82.22% (specificity), and 87.7% (accuracy). Conclusion: Multi-scale CNNs with
    their self-learning capability provide promising results for predicting 1p/19q
    status noninvasively based on T1C and T2 images. Significance: Predicting
    1p/19q status noninvasively from MR images would allow selecting effective
    treatment strategies for LGG patients without the need for surgical biopsy.

    Multi-Scale Anisotropic Fourth-Order Diffusion Improves Ridge and Valley Localization

    Shekoufeh Gorgi Zadeh, Stephan Didas, Maximilian W. M. Wintergerst, Thomas Schultz
    Comments: 16 pages, 6 figures, 1 table
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Ridge and valley enhancing filters are widely used in applications such as
    vessel detection in medical image computing. When images are degraded by noise
    or include vessels at different scales, such filters are an essential step for
    meaningful and stable vessel localization. In this work, we propose a novel
    multi-scale anisotropic fourth-order diffusion equation that allows us to
    smooth along vessels, while sharpening them in the orthogonal direction. The
    proposed filter uses a fourth order diffusion tensor whose eigentensors and
    eigenvalues are determined from the local Hessian matrix, at a scale that is
    automatically selected for each pixel. We discuss efficient implementation
    using a Fast Explicit Diffusion scheme and demonstrate results on synthetic
    images and vessels in fundus images. Compared to previous isotropic and
    anisotropic fourth-order filters, as well as established second-order vessel
    enhancing filters, our newly proposed one better restores the centerlines in
    all cases.

    The subset-matched Jaccard index for evaluation of Segmentation for Plant Images

    Jonathan Bell, Hannah M. Dee
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We describe a new measure for the evaluation of region level segmentation of
    objects, as applied to evaluating the accuracy of leaf-level segmentation of
    plant images. The proposed approach enforces the rule that a region (e.g. a
    leaf) in either the image being evaluated or the ground truth image evaluated
    against can be mapped to no more than one region in the other image. We call
    this measure the subset-matched Jaccard index.
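
    A hedged sketch of a measure with the stated property: regions are matched
    one-to-one between the two segmentations (greedily by overlap here; the
    authors' matching rule may differ), and only pixels inside matched pairs
    count toward the intersection of a Jaccard score.

    ```python
    import numpy as np

    def subset_matched_jaccard(seg, truth):
        """One-to-one region matching followed by a Jaccard score (sketch).

        seg, truth: integer label images; 0 is background. Each region in
        one image may be matched to at most one region in the other, per
        the rule stated in the abstract; greedy matching by overlap is an
        illustrative choice here.
        """
        pairs = {}
        fg = (seg > 0) & (truth > 0)
        for s, t in zip(seg[fg], truth[fg]):
            pairs[(s, t)] = pairs.get((s, t), 0) + 1
        used_s, used_t, inter = set(), set(), 0
        for (s, t), count in sorted(pairs.items(), key=lambda kv: -kv[1]):
            if s not in used_s and t not in used_t:
                used_s.add(s)
                used_t.add(t)
                inter += count
        union = np.count_nonzero((seg > 0) | (truth > 0))
        return inter / union if union else 1.0
    ```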

    SANet: Structure-Aware Network for Visual Tracking

    Heng Fan, Haibin Ling
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have drawn increasing interest in visual
    tracking owing to their power in feature extraction. Most existing
    CNN-based trackers treat tracking as a classification problem. However, these
    trackers are sensitive to similar distractors because their CNN models mainly
    focus on inter-class classification. To deal with this problem, we use the
    self-structure information of the object to distinguish it from distractors.
    Specifically, we utilize a recurrent neural network (RNN) to model object
    structure, and incorporate it into the CNN to improve its robustness in the
    presence of similar distractors. Considering that convolutional layers at
    different levels characterize the object from different perspectives, we use
    multiple RNNs to model object structure at different levels. In addition, we
    present a skip concatenation strategy to fuse CNN and RNN feature maps, and
    thus are able to provide the next layer with richer information, which further
    improves the performance of the proposed model. Extensive experimental results
    on three large-scale benchmarks, OTB100, TC-128 and VOT2015, show that the
    proposed algorithm outperforms other state-of-the-art methods.

    TextBoxes: A Fast Text Detector with a Single Deep Neural Network

    Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
    Comments: Accepted by AAAI2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents an end-to-end trainable fast scene text detector, named
    TextBoxes, which detects scene text with both high accuracy and efficiency in a
    single network forward pass, involving no post-processing except a standard
    non-maximum suppression. TextBoxes outperforms competing methods in terms of
    text localization accuracy and is much faster, taking only 0.09s per image in a
    fast implementation. Furthermore, combined with a text recognizer, TextBoxes
    significantly outperforms state-of-the-art approaches on word spotting and
    end-to-end text recognition tasks.
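
    The only post-processing TextBoxes relies on is standard non-maximum
    suppression, which greedily keeps the highest-scoring box and discards
    lower-scoring boxes that overlap it too much. A numpy sketch of that
    standard step:

    ```python
    import numpy as np

    def nms(boxes, scores, iou_threshold=0.5):
        """Standard non-maximum suppression (sketch).

        boxes: (n, 4) array of [x1, y1, x2, y2]; scores: (n,).
        Returns indices of kept boxes, highest score first.
        """
        x1, y1, x2, y2 = boxes.T
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]
        keep = []
        while order.size:
            i = order[0]
            keep.append(i)
            # Intersection of box i with all remaining boxes.
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            order = order[1:][iou <= iou_threshold]
        return keep
    ```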

    Efficient Convolutional Neural Network with Binary Quantization Layer

    Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Lucio Marcenaro, Carlo Regazzoni
    Comments: Workshop on Efficient Methods for Deep Neural Networks (EMDNN), NIPS 2016, Barcelona, Spain. arXiv admin note: substantial text overlap with arXiv:1609.09220
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we introduce a novel method for segmentation that can benefit
    from the general semantics of a Convolutional Neural Network (CNN). Our
    method produces visually and semantically coherent image segments. We use
    binary encoding of CNN features to overcome the difficulty of clustering in
    the high-dimensional CNN feature space. This binary encoding can be embedded
    into the CNN as an extra layer at the end of the network, which results in
    real-time segmentation. To the best of our knowledge, our method is the
    first attempt at general semantic image segmentation using CNNs; previous
    work was limited to a small number of image categories (e.g. PASCAL VOC).
    Experiments show that our segmentation algorithm outperforms
    state-of-the-art non-semantic segmentation methods by a large margin.

    Non-Local Color Image Denoising with Convolutional Neural Networks

    Stamatios Lefkimmiatis
    Comments: 15 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    We propose a novel deep network architecture for grayscale and color image
    denoising that is based on a non-local image model. Our motivation for the
    overall design of the proposed network stems from variational methods that
    exploit the inherent non-local self-similarity property of natural images. We
    build on this concept and introduce deep networks that perform non-local
    processing and at the same time they significantly benefit from discriminative
    learning. Experiments on the Berkeley segmentation dataset, comparing several
    state-of-the-art methods, show that the proposed non-local models achieve the
    best reported denoising performance both for grayscale and color images for all
    the tested noise levels. It is also worth noting that this increase in
    performance comes at no extra cost on the capacity of the network compared to
    existing alternative deep network architectures. In addition, we highlight a
    direct link of the proposed non-local models to convolutional neural networks.
    This connection is of significant importance since it allows our models to take
    full advantage of the latest advances on GPU computing in deep learning and
    makes them amenable to efficient implementations through their inherent
    parallelism.

    Crowd Counting by Adapting Convolutional Neural Networks with Side Information

    Di Kang, Debarun Dhar, Antoni B. Chan
    Comments: 8 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Computer vision tasks often have side information available that is helpful
    to solve the task. For example, for crowd counting, the camera perspective
    (e.g., camera angle and height) gives a clue about the appearance and scale of
    people in the scene. While side information has been shown to be useful for
    counting systems using traditional hand-crafted features, it has not been fully
    utilized in counting systems based on deep learning. In order to incorporate
    the available side information, we propose an adaptive convolutional neural
    network (ACNN), where the convolutional filter weights adapt to the current
    scene context via the side information. In particular, we model the filter
    weights as a low-dimensional manifold, parametrized by the side information,
    within the high-dimensional space of filter weights. With the help of side
    information and adaptive weights, the ACNN can disentangle the variations
    related to the side information, and extract discriminative features related to
    the current context. Since existing crowd counting datasets do not contain
    ground-truth side information, we collect a new dataset with the ground-truth
    camera angle and height as the side information. On experiments in crowd
    counting, the ACNN improves counting accuracy compared to a plain CNN with a
    similar number of parameters. We also apply ACNN to image deconvolution to show
    its potential effectiveness on other computer vision applications.

    Training Sparse Neural Networks

    Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep neural networks with lots of parameters are typically used for
    large-scale computer vision tasks such as image classification. This is a
    result of using dense matrix multiplications and convolutions. However, sparse
    computations are known to be much more efficient. In this work, we train and
    build neural networks which implicitly use sparse computations. We introduce
    additional gate variables to perform parameter selection and show that this is
    equivalent to using a spike-and-slab prior. We experimentally validate our
    method on both small and large networks and achieve state-of-the-art
    compression results for sparse neural network models.

    Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition

    Jiali Duan, Shuai Zhou, Jun Wan, Xiaoyuan Guo, Stan Z. Li
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, the popularity of depth sensors such as the Kinect has made depth
    videos easily available, while their advantages have not been fully
    exploited. For gesture recognition, this paper investigates how to exploit
    the spatial and temporal information complementarily embedded in RGB and
    depth sequences. We propose a convolutional two-stream consensus voting
    network (2SCVN) which explicitly models both the short-term and long-term
    structure of the RGB sequences. To alleviate distractions from the
    background, a 3D depth-saliency ConvNet stream (3DDSN) is aggregated in
    parallel to identify subtle motion characteristics. These two components in
    a unified framework significantly improve the recognition accuracy. On the
    challenging ChaLearn IsoGD benchmark, our proposed method outperforms the
    first place on the leaderboard by a large margin (10.29%) while also
    achieving the best result on the RGBD-HuDaAct dataset (96.74%). Both
    quantitative experiments and qualitative analysis show the effectiveness of
    our proposed framework, and code will be released to facilitate future
    research.

    Covariate conscious approach for Gait recognition based upon Zernike moment invariants

    Himanshu Aggarwal, Dinesh K. Vishwakarma
    Comments: 11 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Gait recognition, i.e. identification of an individual from his/her walking
    pattern, is an emerging field. While existing gait recognition techniques
    perform satisfactorily in normal walking conditions, their performance tends
    to suffer drastically with variations in clothing and carrying conditions.
    In this work, we propose a novel covariate-cognizant framework to deal with
    the presence of such covariates. We describe gait motion by forming a single
    2D spatio-temporal template from the video sequence, called the Average
    Energy Silhouette image (AESI). Zernike moment invariants (ZMIs) are then
    computed to screen the parts of the AESI affected by covariates. Following
    this, features are extracted from Spatial Distribution of Oriented Gradients
    (SDOGs) and the novel Mean of Directional Pixels (MDPs) method. The obtained
    features are fused together to form the final well-endowed feature set.
    Experimental evaluation of the proposed framework on three publicly
    available datasets, i.e. CASIA dataset B, OU-ISIR Treadmill dataset B and
    the USF Human-ID challenge dataset, against recently published gait
    recognition approaches proves its superior performance.

    Deep Temporal Linear Encoding Networks

    Ali Diba, Vivek Sharma, Luc Van Gool
    Comments: Ali Diba and Vivek Sharma contributed equally to this work and listed in alphabetical order
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The CNN-encoding of features from entire videos for the representation of
    human actions has rarely been addressed. Instead, CNN work has focused on
    approaches to fuse spatial and temporal networks, but these were typically
    limited to processing shorter sequences. We present a new video
    representation, called temporal linear encoding (TLE), embedded inside CNNs
    as a new layer, which captures the appearance and motion throughout entire
    videos. It
    encodes this aggregated information into a robust video feature representation,
    via end-to-end learning. Advantages of TLEs are: (a) they encode the entire
    video into a compact feature representation, learning the semantics and a
    discriminative feature space; (b) they are applicable to all kinds of networks
    like 2D and 3D CNNs for video classification; and (c) they model feature
    interactions in a more expressive way and without loss of information. We
    conduct experiments on two challenging human action datasets: HMDB51 and
    UCF101. The experiments show that TLE outperforms current state-of-the-art
    methods on both datasets.

    Estimation of respiratory pattern from video using selective ensemble aggregation

    A. P. Prathosh, Pragathi Praveena, Lalit K. Mestha, Sanjay Bharadwaj
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Non-contact estimation of respiratory pattern (RP) and respiration rate (RR)
    has multiple applications. Existing methods for RP and RR measurement fall
    into one of three categories: (i) estimation through nasal air flow
    measurement, (ii) estimation from video-based remote photoplethysmography, and
    (iii) estimation by measurement of motion induced by respiration using motion
    detectors. These methods, however, require specialized sensors, are
    computationally expensive and/or critically depend on selection of a region of
    interest (ROI) for processing. In this paper a general framework is described
    for estimating a periodic signal driving noisy LTI channels connected in
    parallel with unknown dynamics. The method is then applied to derive a
    computationally inexpensive method for estimating RP using 2D cameras that does
    not critically depend on ROI. Specifically, RP is estimated by imaging the
    changes in the reflected light caused by respiration-induced motion. Each
    spatial location in the field of view of the camera is modeled as a
    noise-corrupted linear time-invariant (LTI) measurement channel with unknown
    system dynamics, driven by a single generating respiratory signal. Estimation
    of RP is cast as a blind deconvolution problem and is solved through a method
    comprising subspace projection and statistical aggregation. Experiments are
    carried out on 31 healthy human subjects by generating multiple RPs and
    comparing the proposed estimates with simultaneously acquired ground truth from
    an impedance pneumograph device. The proposed estimator agrees well with the
    ground truth device in terms of correlation measures, despite variability in
    clothing pattern, angle of view and ROI.
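
    Treating every pixel's intensity time series as a noisy LTI observation of
    one common driving signal suggests a simple subspace estimate: stack the
    mean-removed pixel time series and take the dominant temporal singular
    vector as the recovered respiratory pattern, up to scale and sign. The numpy
    sketch below illustrates only this subspace-projection idea; the paper's
    full method additionally handles unknown channel dynamics and performs
    selective statistical aggregation.

    ```python
    import numpy as np

    def estimate_respiratory_pattern(frames):
        """Dominant temporal component across pixels (illustrative sketch).

        frames: (T, H, W) grayscale video. Each pixel is treated as a noisy
        channel driven by a common respiratory signal; the first left
        singular vector of the pixel-time matrix estimates that signal up
        to scale and sign.
        """
        T = frames.shape[0]
        X = frames.reshape(T, -1).astype(np.float64)
        X -= X.mean(axis=0, keepdims=True)   # remove per-pixel mean
        # SVD of the (T, pixels) matrix; U[:, 0] is the dominant temporal mode.
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        return U[:, 0] * s[0]
    ```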

    Gland Instance Segmentation Using Deep Multichannel Neural Networks

    Yan Xu, Yang Li, Yipei Wang, Mingyuan Liu, Yubo Fan, Maode Lai, Eric I-Chao Chang
    Comments: arXiv admin note: substantial text overlap with arXiv:1607.04889
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new image instance segmentation method that segments individual
    glands (instances) in colon histology images. This process is challenging
    since the glands not only need to be segmented from a complex background,
    they must also be individually identified. We leverage the idea of
    image-to-image prediction in recent deep learning by designing an algorithm
    that automatically exploits and fuses complex multichannel information –
    regional, location and boundary cues – in gland histology images. Our
    proposed algorithm, a deep multichannel framework, alleviates heavy feature
    design due to the use of convolutional neural networks and is able to meet
    multifarious requirements by altering channels. Compared to methods reported
    in the 2015 MICCAI Gland Segmentation Challenge and other currently
    prevalent instance segmentation methods, we observe state-of-the-art results
    based on the evaluation metrics. Keywords: Instance segmentation,
    convolutional neural networks, segmentation, multichannel, histology image.

    ResFeats: Residual Network Based Features for Image Classification

    Ammar Mahmood, Mohammed Bennamoun, Senjian An, Ferdous Sohel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep residual networks have recently emerged as the state-of-the-art
    architecture in image segmentation and object detection. In this paper, we
    propose new image features (called ResFeats) extracted from the last
    convolutional layer of deep residual networks pre-trained on ImageNet. We
    propose to use ResFeats for diverse image classification tasks namely, object
    classification, scene classification and coral classification and show that
    ResFeats consistently perform better than their CNN counterparts on these
    classification tasks. Since the ResFeats are large feature vectors, we propose
    to use PCA for dimensionality reduction. Experimental results are provided to
    show the effectiveness of ResFeats with state-of-the-art classification
    accuracies on Caltech-101, Caltech-256 and MLC datasets and a significant
    performance improvement on MIT-67 dataset compared to the widely used CNN
    features.

    Self-Supervised Video Representation Learning With Odd-One-Out Networks

    Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new self-supervised CNN pre-training technique based on a novel
    auxiliary task called “odd-one-out learning”. In this task, the machine is
    asked to identify the unrelated or odd element from a set of otherwise related
    elements. We apply this technique to self-supervised video representation
    learning where we sample subsequences from videos and ask the network to learn
    to predict the odd video subsequence. The odd video subsequence is sampled
    such that it has an incorrect temporal order of frames, while the even ones
    have the correct temporal order. Therefore, no manual annotation is required
    to generate an odd-one-out question. Our learning machine is implemented as
    a multi-stream convolutional neural network, which is learned end-to-end.
    Using odd-one-out networks, we learn temporal representations for videos
    that generalize to other related tasks such as action recognition.

    On action classification, our method obtains 60.3% on the UCF101 dataset
    using only UCF101 data for training, which is approximately 10% better than
    current state-of-the-art self-supervised learning methods. Similarly, on the
    HMDB51 dataset we outperform self-supervised state-of-the-art methods by
    12.7% on the action classification task.

    Cascaded Face Alignment via Intimacy Definition Feature

    Hailiang Li, Kin-Man Lam, Edmond M. Y. Chiu, Kangheng Wu, Zhibin Lei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a fast cascaded regression for face alignment, via
    a novel local feature. Our proposed local lightweight feature, namely intimacy
    definition feature (IDF), is more discriminative than landmark shape-indexed
    feature, more efficient than the handcrafted scale-invariant feature transform
    (SIFT) feature, and more compact than the local binary feature (LBF).
    Experimental results show that our approach achieves state-of-the-art
    performance when tested on the most challenging benchmarks. Compared with an
    LBF-based algorithm, our method obtains about a two-fold speed-up and more
    than a 20% improvement in terms of alignment error, while saving an order of
    magnitude in memory requirements.

    Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues

    Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a framework for localization or grounding of phrases in
    images using a large collection of linguistic and visual cues. We model the
    appearance, size, and position of entity bounding boxes, adjectives that
    contain attribute information, and spatial relationships between pairs of
    entities connected by verbs or prepositions. We pay special attention to
    relationships between people and clothing or body part mentions, as they are
    useful for distinguishing individuals. We automatically learn weights for
    combining these cues and at test time, perform joint inference over all phrases
    in a caption. The resulting system produces a 4% improvement in accuracy over
    the state of the art on phrase localization on the Flickr30k Entities dataset
    and a 4-10% improvement for visual relationship detection on the Stanford VRD
    dataset.

    Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding

    Jose Lezama, Qiang Qiu, Guillermo Sapiro
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Surveillance cameras today often capture NIR (near infrared) images in
    low-light environments. However, most face datasets accessible for training and
    verification are only collected in the VIS (visible light) spectrum. It remains
    a challenging problem to match NIR to VIS face images due to the different
    light spectrum. Recently, breakthroughs have been made for VIS face recognition
    by applying deep learning on a huge amount of labeled VIS face samples. The
    same deep learning approach cannot be simply applied to NIR face recognition
    for two main reasons. First, far fewer NIR face images are available for
    training compared to the VIS spectrum. Second, face galleries to be matched are
    mostly available only in the VIS spectrum. In this paper, we propose an
    approach to extend the deep learning breakthrough for VIS face recognition to
    the NIR spectrum, without retraining the underlying deep models that see only
    VIS faces. Our approach consists of two core components, cross-spectral
    hallucination and low-rank embedding, to optimize respectively input and output
    of a VIS deep model for cross-spectral face recognition. Cross-spectral
    hallucination produces VIS faces from NIR images through a deep learning
    approach. Low-rank embedding restores a low-rank structure for face deep
    features across both the NIR and VIS spectra. We observe that it is often equally
    effective to perform hallucination to input NIR images or low-rank embedding to
    output deep features for a VIS deep model for cross-spectral recognition. When
    hallucination and low-rank embedding are deployed together, we observe
    significant further improvement; we obtain state-of-the-art accuracy on the
    CASIA NIR-VIS v2.0 benchmark, without the need at all to re-train the
    recognition system.

    RefineNet: Multi-Path Refinement Networks with Identity Mappings for High-Resolution Semantic Segmentation

    Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, very deep convolutional neural networks (CNNs) have shown
    outstanding performance in object recognition and have also been the first
    choice for dense classification problems such as semantic segmentation.
    However, repeated subsampling operations like pooling or convolution striding
    in deep CNNs lead to a significant decrease in the initial image resolution.
    Here, we present RefineNet, a generic multi-path refinement network that
    explicitly exploits all the information available along the down-sampling
    process to enable high-resolution prediction using long-range residual
    connections. In this way, the deeper layers that capture high-level semantic
    features can be directly refined using fine-grained features from earlier
    convolutions. The individual components of RefineNet employ residual
    connections following the identity mapping mindset, which allows for effective
    end-to-end training. Further, we introduce chained residual pooling, which
    captures rich background context in an efficient manner. We carry out
    comprehensive experiments and set new state-of-the-art results on seven public
    datasets. In particular, we achieve an intersection-over-union score of 83.4 on
    the challenging PASCAL VOC 2012 dataset, which is the best reported result to
    date.

    A Hierarchical Approach for Generating Descriptive Image Paragraphs

    Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Recent progress on image captioning has made it possible to generate novel
    sentences describing images in natural language, but compressing an image into
    a single sentence can describe visual content in only coarse detail. While one
    new captioning approach, dense captioning, can potentially describe images in
    finer levels of detail by captioning many regions within an image, it in turn
    is unable to produce a coherent story for an image. In this paper we overcome
    these limitations by generating entire paragraphs for describing images, which
    can tell detailed, unified stories. We develop a model that decomposes both
    images and paragraphs into their constituent parts, detecting semantic regions
    in images and using a hierarchical recurrent neural network to reason about
    language. Linguistic analysis confirms the complexity of the paragraph
    generation task, and thorough experiments on a new dataset of image and
    paragraph pairs demonstrate the effectiveness of our approach.

    Object Recognition with and without Objects

    Zhuotun Zhu, Lingxi Xie, Alan L. Yuille
    Comments: 5 figures, 11 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    While recent deep neural network models have given promising performance on
    object recognition, they rely implicitly on the visual contents of the whole
    image. In this paper, we train deep neural networks on the foreground (object)
    and background (context) regions of images respectively. Considering human
    recognition in the same situations, networks trained on pure background without
    objects achieves highly reasonable recognition performance that beats humans to
    a large margin if only given context. However, humans still outperform networks
    with pure object available, which indicates networks and human beings have
    different mechanisms in understanding an image. Furthermore, we
    straightforwardly combine multiple trained networks to explore the different
    visual clues learned by different networks. Experiments show that useful visual
    hints can be learned separately and then combined to achieve higher
    performance, which confirms the advantages of the proposed framework.

    Deep Tensor Convolution on Multicores

    David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    Deep convolutional neural networks (ConvNets) have become a de facto standard
    for image classification and segmentation problems. These networks have also
    had early success in the video domain, despite failing to capture motion
    continuity and other rich temporal correlations. Evidence has since emerged
    that extending ConvNets to 3-dimensions leads to state-of-the-art performance
    across a broad set of video processing tasks by learning these joint
    spatiotemporal features. However, these early 3D networks have been restricted
    to shallower architectures of fewer channels than successful 2D networks due to
    memory constraints inherent to GPU implementations.

    In this study we present the first practical CPU implementation of tensor
    convolution optimized for deep networks of small kernels. Our implementation
    supports arbitrarily deep ConvNets of (N)-dimensional tensors due to the
    relaxed memory constraints of CPU systems, which can be further leveraged for
    an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
    kernels). Because most of the optimized ConvNets in previous literature are 2
    rather than 3-dimensional, we benchmark our performance against the most
    popular 2D implementations. Even in this special case, which is theoretically
    the least beneficial for our fast algorithm, we observe a 5 to 25-fold
    improvement in throughput compared to previous state-of-the-art. We believe
    this work is an important step toward practical ConvNets for real-time
    applications, such as mobile video processing and biomedical image analysis,
    where high performance 3D networks are a must.

    Learning Fully Convolutional Networks for Iterative Non-blind Deconvolution

    Jiawei Zhang, Jinshan Pan, Wei-Sheng Lai, Rynson Lau, Ming-Hsuan Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a fully convolutional neural network (FCNN) for
    iterative non-blind deconvolution. We decompose the non-blind deconvolution
    problem into image denoising and image deconvolution. We train an FCNN to
    remove noise in
    the gradient domain and use the learned gradients to guide the image
    deconvolution step. In contrast to the existing deep neural network based
    methods, we iteratively deconvolve the blurred images in a multi-stage
    framework. The proposed method is able to learn an adaptive image prior, which
    keeps both local (details) and global (structures) information. Both
    quantitative and qualitative evaluations on benchmark datasets demonstrate that
    the proposed method performs favorably against state-of-the-art algorithms in
    terms of quality and speed.
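
    The multi-stage scheme can be sketched as a simple alternation, assuming two
    stand-in callables: fcnn_denoise for the learned gradient-domain denoiser and
    deconvolve for the subsequent deconvolution step (both names are hypothetical,
    not the authors' code):

        def iterative_nonblind_deconv(blurred, kernel, fcnn_denoise, deconvolve,
                                      stages=3):
            # Alternate between gradient-domain denoising and deconvolution.
            estimate = blurred
            for _ in range(stages):
                clean_gradients = fcnn_denoise(estimate)   # suppress noise
                estimate = deconvolve(blurred, kernel, clean_gradients)  # guided deblur
            return estimate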

    Recurrent Memory Addressing for describing videos

    Kumar Krishna Agrawal, Arnav Kumar Jain, Abhinav Agarwalla, Pabitra Mitra
    Comments: Under review at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Deep Neural Network architectures with external memory components allow the
    model to perform inference and capture long term dependencies, by storing
    information explicitly. In this paper, we generalize Key-Value Memory Networks
    to a multimodal setting, introducing a novel key-addressing mechanism to deal
    with sequence-to-sequence models. The advantages of the framework are
    demonstrated on the task of video captioning, i.e., generating natural language
    descriptions for videos. Conditioning on the previous time-step attention
    distributions for the key-value memory slots, we introduce a temporal structure
    in the memory addressing schema. The proposed model naturally decomposes the
    problem of video captioning into vision and language segments, dealing with
    them as key-value pairs. More specifically, we learn a semantic embedding (v)
    corresponding to each frame (k) in the video, thereby creating (k, v) memory
    slots. This allows us to exploit the temporal dependencies at multiple
    hierarchies (in the recurrent key-addressing; and in the language decoder).
    Exploiting this flexibility of the framework, we additionally capture spatial
    dependencies while mapping from the visual to semantic embedding. Extensive
    experiments on the Youtube2Text dataset demonstrate usefulness of recurrent
    key-addressing, while achieving competitive scores on BLEU@4, METEOR metrics
    against state-of-the-art models.
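
    For readers unfamiliar with key-value addressing, the sketch below shows the
    generic read operation such memory networks build on (plain softmax addressing
    over frame keys; the paper's recurrent, temporally conditioned variant is more
    elaborate):

        import numpy as np

        def key_value_read(query, keys, values):
            # keys: (T, d_k), one per video frame; values: (T, d_v) semantic embeddings.
            scores = keys @ query                    # similarity of query to each key
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                 # softmax attention over memory slots
            return weights @ values                  # attention-weighted read vector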

    Nazr-CNN: Object Detection and Fine-Grained Classification in Crowdsourced UAV Images

    N. Attari, F. Ofli, M. Awad, J. Lucas, S. Chawla
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose Nazr-CNN, a deep learning pipeline for object detection and
    fine-grained classification in images acquired from Unmanned Aerial Vehicles
    (UAVs). The UAVs were deployed in the Island of Vanuatu to assess damage in the
    aftermath of cyclone PAM in 2015. The images were labeled by a crowdsourcing
    effort and the labeling categories consisted of fine-grained levels of damage
    to built structures.

    Nazr-CNN consists of two components. The function of the first component is
    to localize objects (e.g. houses) in an image by carrying out a pixel-level
    classification. In the second component, a hidden layer of a Convolutional
    Neural Network (CNN) is used to encode Fisher Vectors (FV) of the segments
    generated from the first component in order to help discriminate between
    between different levels of damage. Since our data set is relatively small, a
    pre-trained network for pixel-level classification and FV encoding was used.
    Nazr-CNN attains promising results both for object detection and damage
    assessment suggesting that the integrated pipeline is robust in the face of
    small data sets and labeling errors by annotators. While the focus of Nazr-CNN
    is on assessment of UAV images in a post-disaster scenario, our solution is
    general and can be applied in many diverse settings.

    LCNN: Lookup-based Convolutional Neural Network

    Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Porting state of the art deep learning algorithms to resource constrained
    compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose
    a fast, compact, and accurate model for convolutional neural networks that
    enables efficient learning and inference. We introduce LCNN, a lookup-based
    convolutional neural network that encodes convolutions by a few lookups to a
    dictionary that is trained to cover the space of weights in CNNs. Training LCNN
    involves jointly learning a dictionary and a small set of linear combinations.
    The size of the dictionary naturally traces a spectrum of trade-offs between
    efficiency and accuracy. Our experimental results on ImageNet challenge show
    that LCNN can offer a 3.2x speedup while achieving 55.1% top-1 accuracy using
    the AlexNet architecture. Our fastest LCNN offers a 37.6x speedup over AlexNet
    while
    maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speed ups at
    inference, but it also enables efficient training. In this paper, we show the
    benefits of LCNN in few-shot learning and few-iteration learning, two crucial
    aspects of on-device training of deep learning models.
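
    As a rough illustration of the lookup idea (sizes and sparsity level are
    invented for the example, not taken from the paper), each convolutional filter
    can be materialised as a sparse linear combination of a small shared
    dictionary:

        import numpy as np

        k, dict_size, n_filters, nnz = 3, 30, 64, 4   # kernel size, entries, filters, lookups/filter
        dictionary = np.random.randn(dict_size, 3 * k * k)           # shared vector dictionary
        indices = np.random.randint(0, dict_size, (n_filters, nnz))  # lookup indices per filter
        coeffs = np.random.randn(n_filters, nnz)                     # learned combination weights

        # Each filter = sparse linear combination of its looked-up dictionary entries.
        filters = np.einsum('fn,fnd->fd', coeffs,
                            dictionary[indices]).reshape(n_filters, 3, k, k)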

    On The Stability of Video Detection and Tracking

    Hong Zhang, Naiyan Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we study an important yet less explored aspect in video
    detection and multi-object tracking — stability. Surprisingly, there is no
    prior work that has tried to quantify it. As a consequence, we start our work by
    proposing a novel evaluation metric for video detection which considers both
    stability and accuracy. For accuracy, we extend the existing accuracy metric
    mean Average Precision (mAP). For stability, we decompose it into three terms:
    fragment error, center position error, scale and ratio error. Each error
    represents one type of stability. Furthermore, we demonstrate that the
    stability metric has low correlation with the accuracy metric. Thus, it indeed
    captures a different perspective of quality in object detection. Lastly, based
    on this metric, we evaluate several existing methods for video detection, and
    show how they affect accuracy and stability. We believe our work can provide
    guidance and solid baselines for future research in related areas.
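
    The decomposition can be illustrated with a toy computation of frame-to-frame
    jitter for a single track (the paper's exact error definitions and
    normalisations may differ; this only shows the idea of measuring instability):

        import numpy as np

        def center_and_scale_jitter(boxes):
            # boxes: (T, 4) array of (x, y, w, h) for one track across T frames.
            boxes = np.asarray(boxes, dtype=float)
            cx = boxes[:, 0] + boxes[:, 2] / 2
            cy = boxes[:, 1] + boxes[:, 3] / 2
            center_err = np.std(np.diff(cx)) + np.std(np.diff(cy))  # center position jitter
            scale_err = np.std(np.diff(boxes[:, 2] * boxes[:, 3]))  # scale jitter
            ratio_err = np.std(np.diff(boxes[:, 2] / boxes[:, 3]))  # aspect-ratio jitter
            return center_err, scale_err, ratio_err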

    Fast Video Classification via Adaptive Cascading of Deep Models

    Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent advances have enabled “oracle” classifiers that can classify across
    many classes and input distributions with high accuracy without retraining.
    However, these classifiers are relatively heavyweight, so that applying them to
    classify video is costly. We show that day-to-day video exhibits highly skewed
    class distributions over the short term, and that these distributions can be
    classified by much simpler models. We formulate the problem of detecting the
    short-term skews online and exploiting models based on it as a new sequential
    decision making problem dubbed the Online Bandit Problem, and present a new
    algorithm to solve it. When applied to recognizing faces in TV shows and
    movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
    GPU/CPU) relative to a state-of-the-art convolutional neural network, at
    competitive accuracy.
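
    The core control flow is a confidence-gated cascade; a minimal sketch (the
    model objects and threshold are placeholders, and the paper's bandit-based
    online model selection is not shown):

        def cascade_classify(frame, cheap_model, oracle_model, threshold=0.9):
            # cheap_model / oracle_model: callables returning class-probability arrays.
            probs = cheap_model(frame)              # specialized model for the recent skew
            if probs.max() >= threshold:
                return probs.argmax()               # confident fast path
            return oracle_model(frame).argmax()     # fall back to the heavyweight oracle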

    PsyPhy: A Psychophysics Driven Evaluation Framework for Visual Recognition

    Brandon RichardWebster, Samuel E. Anthony, Walter J. Scheirer
    Comments: 11 pages, 4 figures. Submitted for publication. For supplemental material see this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    By providing substantial amounts of data and standardized evaluation
    protocols, datasets in computer vision have helped fuel advances across all
    areas of visual recognition. But even in light of breakthrough results on
    recent benchmarks, it is still fair to ask if our recognition algorithms are
    doing as well as we think they are. The vision sciences at large make use of a
    very different evaluation regime known as Visual Psychophysics to study visual
    perception. Psychophysics is the quantitative examination of the relationships
    between controlled stimuli and the behavioral responses they elicit in
    experimental test subjects. Instead of using summary statistics to gauge
    performance, psychophysics directs us to construct item-response curves made up
    of individual stimulus responses to find perceptual thresholds, thus allowing
    one to identify the exact point at which a subject can no longer reliably
    recognize the stimulus class. In this paper, we introduce a comprehensive
    evaluation framework for visual recognition models that is underpinned by this
    methodology. Over millions of procedurally rendered 3D scenes and 2D images, we
    compare the performance of well-known convolutional neural networks. Our
    results bring into question recent claims of human-like performance, and
    provide a path forward for correcting newly surfaced algorithmic deficiencies.

    Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

    Emily Denton, Sam Gross, Rob Fergus
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a simple semi-supervised learning approach for images based on
    in-painting using an adversarial loss. Images with random patches removed are
    presented to a generator whose task is to fill in the hole, based on the
    surrounding pixels. The in-painted images are then presented to a discriminator
    network that judges if they are real (unaltered training images) or not. This
    task acts as a regularizer for standard supervised training of the
    discriminator. Using our approach we are able to directly train large VGG-style
    networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL
    datasets, where our approach obtains performance comparable or superior to
    existing methods.
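
    The training setup can be sketched as follows, assuming images are arrays of
    shape (N, C, H, W) with sides larger than the patch (the patch size is
    illustrative):

        import numpy as np

        def remove_random_patch(images, patch=16):
            # Zero out one random square patch per image.
            holed = images.copy()
            _, _, h, w = holed.shape
            for img in holed:
                y = np.random.randint(0, h - patch)
                x = np.random.randint(0, w - patch)
                img[:, y:y + patch, x:x + patch] = 0.0  # hole the generator must in-paint
            return holed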

    Deep Outdoor Illumination Estimation

    Yannick Hold-Geoffroy, Kalyan Sunkavalli, Sunil Hadap, Emiliano Gambaretto, Jean-François Lalonde
    Comments: 8 pages + 2 pages of citations, 12 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a CNN-based technique to estimate high-dynamic range outdoor
    illumination from a single low dynamic range image. To train the CNN, we
    leverage a large dataset of outdoor panoramas. We fit a low-dimensional
    physically-based outdoor illumination model to the skies in these panoramas
    giving us a compact set of parameters (including sun position, atmospheric
    conditions, and camera parameters). We extract limited field-of-view images
    from the panoramas, and train a CNN with this large set of input image–output
    lighting parameter pairs. Given a test image, this network can be used to infer
    illumination parameters that can, in turn, be used to reconstruct an outdoor
    illumination environment map. We demonstrate that our approach allows the
    recovery of plausible illumination conditions and enables automatic
    photorealistic virtual object insertion from a single image. An extensive
    evaluation on both the panorama dataset and captured HDR environment maps shows
    that our technique significantly outperforms previous solutions to this
    problem.

    Semantic tracking: Single-target tracking with inter-supervised convolutional networks

    Jingjing Xiao, Qiang Lan, Linbo Qiao, Ales Leonardis
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This article presents a semantic tracker which simultaneously tracks a single
    target and recognises its category. In general, it is hard to design a tracking
    model suitable for all object categories, e.g., a rigid tracker for a car is
    not suitable for a deformable gymnast. Category-based trackers usually achieve
    superior tracking performance for the objects of that specific category, but
    have difficulties being generalised. Therefore, we propose a novel unified
    robust tracking framework which explicitly encodes both generic features and
    category-based features. The tracker consists of a shared convolutional network
    (NetS), which feeds into two parallel networks, NetC for classification and
    NetT for tracking. NetS is pre-trained on ImageNet to serve as a generic
    feature extractor across the different object categories for NetC and NetT.
    NetC utilises those features within fully connected layers to classify the
    object category. NetT has multiple branches, corresponding to multiple
    categories, to distinguish the tracked object from the background. Since each
    branch in NetT is trained by the videos of a specific category or groups of
    similar categories, NetT encodes category-based features for tracking. During
    online tracking, NetC and NetT jointly determine the target regions with the
    right category and foreground labels for target estimation. To improve the
    robustness and precision, NetC and NetT inter-supervise each other and trigger
    network adaptation when their outputs are ambiguous for the same image regions
    (i.e., when the category label contradicts the foreground/background
    classification). We have compared the performance of our tracker to other
    state-of-the-art trackers on a large-scale tracking benchmark (100
    sequences)—the obtained results demonstrate the effectiveness of our proposed
    tracker as it outperformed the other 38 state-of-the-art tracking algorithms.
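
    A schematic of the NetS/NetC/NetT wiring, with placeholder layer sizes (the
    actual networks are deeper, and the inter-supervision and adaptation logic is
    omitted):

        import torch
        import torch.nn as nn

        class SemanticTracker(nn.Module):
            def __init__(self, n_categories, feat_dim=64):
                super().__init__()
                # NetS: shared generic feature extractor (stand-in for the ImageNet trunk)
                self.net_s = nn.Sequential(
                    nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
                self.net_c = nn.Linear(feat_dim, n_categories)   # NetC: category head
                self.net_t = nn.ModuleList(                      # NetT: fg/bg branch per category
                    nn.Linear(feat_dim, 2) for _ in range(n_categories))

            def forward(self, x):
                f = self.net_s(x)
                category_logits = self.net_c(f)
                branch = category_logits.argmax(dim=1)           # route to category branch
                track_logits = torch.stack(
                    [self.net_t[int(b)](f[i]) for i, b in enumerate(branch)])
                return category_logits, track_logits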

    Deep Residual Learning for Compressed Sensing CT Reconstruction via Persistent Homology Analysis

    Yoseop Han, Jaejoon Yoo, Jong Chul Ye
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, compressed sensing (CS) computed tomography (CT) using sparse
    projection views has been extensively investigated to reduce the potential risk
    of radiation to the patient. However, due to the insufficient number of
    projection views, an analytic reconstruction approach results in severe
    streaking artifacts, and the CS-based iterative approach is computationally
    very expensive. To
    address this issue, here we propose a novel deep residual learning approach for
    sparse view CT reconstruction. Specifically, based on a novel persistent
    homology analysis showing that the manifold of streaking artifacts is
    topologically simpler than original ones, a deep residual learning architecture
    that estimates the streaking artifacts is developed. Once a streaking artifact
    image is estimated, an artifact-free image can be obtained by subtracting the
    streaking artifacts from the input image. Using extensive experiments with a
    real patient data set, we confirm that the proposed residual learning provides
    significantly better image reconstruction performance with several orders of
    magnitude faster computational speed.
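
    At inference time the residual formulation reduces to a single subtraction; a
    sketch, with artifact_net standing in for the trained residual CNN:

        def reconstruct(sparse_view_image, artifact_net):
            streaks = artifact_net(sparse_view_image)  # network predicts the streaking artifacts
            return sparse_view_image - streaks         # subtract to get the artifact-free image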

    Ordinal Constrained Binary Code Learning for Nearest Neighbor Search

    Hong Liu, Rongrong Ji, Yongjian Wu, Feiyue Huang
    Comments: Accepted to AAAI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent years have witnessed extensive attention in binary code learning,
    a.k.a. hashing, for nearest neighbor search problems. It has been seen that
    high-dimensional data points can be quantized into binary codes to give an
    efficient similarity approximation via Hamming distance. Among existing
    schemes, ranking-based hashing is a recent promising direction that aims at
    preserving the ordinal relations of rankings in the Hamming space to minimize
    retrieval loss. However, the size of the set of ranking tuples, which encodes
    the ordinal relations, is quadratic or cubic in the number of training samples.
    Given a large-scale training data set, it is thus very expensive to embed such
    ranking tuples in binary code learning. Besides, it remains difficult to build
    ranking tuples efficiently for most ranking-preserving hashing methods, which
    are deployed over an ordinal graph-based setting. To handle these problems, we
    propose a novel
    ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH),
    which efficiently learns the optimal hashing functions with a graph-based
    approximation to embed the ordinal relations. The core idea is to reduce the
    size of ordinal graph with ordinal constraint projection, which preserves the
    ordinal relations through a small data set (such as clusters or random
    samples). In particular, to learn such hash functions effectively, we further
    relax the discrete constraints and design a specific stochastic gradient descent
    algorithm for optimization. Experimental results on three large-scale visual
    search benchmark datasets, i.e. LabelMe, Tiny100K and GIST1M, show that the
    proposed OCH method can achieve superior performance over state-of-the-art
    approaches.

    Invertible Conditional GANs for image editing

    Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez
    Comments: Accepted paper at NIPS 2016 Workshop on Adversarial Training
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Generative Adversarial Networks (GANs) have recently been shown to
    successfully approximate complex data distributions. A relevant extension of
    this model is conditional GANs (cGANs), where the introduction of external
    information makes it possible to determine specific representations of the
    generated images. In this work, we evaluate encoders that invert the mapping of
    a cGAN, i.e., that map a real image into a latent space and a conditional
    representation. This allows, for example, reconstructing and modifying real
    images of faces by conditioning on arbitrary attributes. Additionally, we
    evaluate the design of cGANs. The combination of an encoder with a cGAN, which
    we call an Invertible cGAN (IcGAN), enables re-generating real images with
    deterministic complex modifications.

    Beyond Deep Residual Learning for Image Restoration: Persistent Homology-Guided Manifold Simplification

    Woong Bae, Jaejoon Yoo, Jong Chul Ye
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The latest deep learning approaches perform better than the state-of-the-art
    signal processing approaches in various image restoration tasks. However, if an
    image contains many patterns and structures, the performance of these CNNs is
    still inferior. To address this issue, here we propose a novel wavelet-domain
    deep residual learning algorithm that outperforms the existing residual
    learning. The main idea originates from the observation that the performance of
    a learning algorithm can be improved if the input and/or label manifold can be
    made topologically simpler. Using persistent homology analysis, we show that
    recent residual learning benefited from such manifold simplification, and that
    the wavelet transform provides another way to simplify the data manifold while
    preserving the edge information. Our extensive experiments demonstrate that the
    proposed wavelet-domain residual learning outperforms the existing
    state-of-the-art approaches.

    Learning the Number of Neurons in Deep Networks

    Jose M Alvarez, Mathieu Salzmann
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Nowadays, the number of layers and of neurons in each layer of a deep network
    are typically set manually. While very deep and wide networks have proven
    effective in general, they come at a high memory and computation cost, thus
    making them impractical for constrained platforms. These networks, however, are
    known to have many redundant parameters, and could thus, in principle, be
    replaced by more compact architectures. In this paper, we introduce an approach
    to automatically determining the number of neurons in each layer of a deep
    network during learning. To this end, we propose to make use of a group
    sparsity regularizer on the parameters of the network, where each group is
    defined to act on a single neuron. Starting from an overcomplete network, we
    show that our approach can reduce the number of parameters by up to 80\% while
    retaining or even improving the network accuracy.
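
    A minimal PyTorch-style sketch of such a neuron-wise group sparsity penalty,
    assuming each row of a fully connected layer's weight matrix forms one group
    (one neuron); the paper applies the idea more generally, e.g. to convolutional
    layers:

        import torch

        def group_sparsity_penalty(model, strength=1e-4):
            # Sum of per-neuron (per-row) L2 norms over all fully connected layers;
            # adding this to the task loss drives entire neurons toward zero.
            penalty = torch.zeros(())
            for module in model.modules():
                if isinstance(module, torch.nn.Linear):
                    penalty = penalty + module.weight.norm(p=2, dim=1).sum()
            return strength * penalty

        # usage: loss = criterion(outputs, targets) + group_sparsity_penalty(net)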

    Multi-Scale Saliency Detection using Dictionary Learning

    Shubham Pachori, Shanmugananthan Raman
    Comments: arXiv admin note: text overlap with arXiv:1502.01094 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Saliency detection has drawn a lot of attention of researchers in various
    fields over the past several years. Saliency is the perceptual quality that
    makes an object or person draw the attention of humans at first sight.
    Salient object detection in an image has been used centrally in many
    computational photography and computer vision applications like video
    compression, object recognition and classification, object segmentation,
    adaptive content delivery, motion detection, content aware resizing, camouflage
    images and change blindness images to name a few. We propose a method to detect
    saliency in the objects using multimodal dictionary learning which has been
    recently used in classification and image fusion. The multimodal dictionary
    that we learn is task-driven, which gives improved performance over its
    counterpart (one that is not task-specific).

    Inferring Restaurant Styles by Mining Crowd Sourced Photos from User-Review Websites

    Haofu Liao, Yucheng Li, Tianran Hu, Jiebo Luo
    Comments: 10 pages, Accepted by IEEE BigData 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    When looking for a restaurant online, user uploaded photos often give people
    an immediate and tangible impression about a restaurant. Due to their
    informativeness, such user contributed photos are leveraged by restaurant
    review websites to provide their users an intuitive and effective search
    experience. In this paper, we present a novel approach to inferring restaurant
    types or styles (ambiance, dish styles, suitability for different occasions)
    from user uploaded photos on user-review websites. To that end, we first
    collect a novel restaurant photo dataset associating the user contributed
    photos with the restaurant styles from TripAdvisor. We then propose a deep
    multi-instance multi-label learning (MIML) framework to deal with the unique
    problem setting of the restaurant style classification task. We employ a
    two-step bootstrap strategy to train a multi-label convolutional neural network
    (CNN). The multi-label CNN is then used to compute the confidence scores of
    restaurant styles for all the images associated with a restaurant. The computed
    confidence scores are further used to train a final binary classifier for each
    restaurant style tag. Upon training, the styles of a restaurant can be profiled
    by analyzing restaurant photos with the trained multi-label CNN and SVM models.
    Experimental evaluation has demonstrated that our crowd sourcing-based approach
    can effectively infer the restaurant style when there are a sufficient number
    of user uploaded photos for a given restaurant.

    A Bayesian approach to type-specific conic fitting

    Matthew Collett
    Comments: 27 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    A perturbative approach is used to quantify the effect of noise in data
    points on fitted parameters in a general homogeneous linear model, and the
    results applied to the case of conic sections. There is an optimal choice of
    normalisation that minimises bias, and iteration with the correct reweighting
    significantly improves statistical reliability. By conditioning on an
    appropriate prior, an unbiased type-specific fit can be obtained. Error
    estimates for the conic coefficients may also be used to obtain both bias
    corrections and confidence intervals for other curve parameters.

    Understanding Anatomy Classification Through Visualization

    Devinder Kumar, Vlado Menkovski
    Comments: Accepted at 30th NIPS Machine learning for Health Workshop, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    One of the main challenges for broad adoption of deep convolutional neural
    network (DCNN) models is the lack of understanding of their decision process.
    In many applications a simpler, less capable model that can be easily
    understood is preferable to a black-box model that has superior performance. In
    this paper,
    we present an approach for designing DCNN models based on visualization of the
    internal activations of the model. We visualize the model’s response using the
    fractional stride convolution technique and compare the results with known
    imaging landmarks from the medical literature. We show that sufficiently deep
    and capable models can be successfully trained to use the same medical
    landmarks a human expert would use. The presented approach not only allows for
    communicating the model’s decision process well, but also offers insight toward
    detecting biases.

    RhoanaNet Pipeline: Dense Automatic Neural Annotation

    Seymour Knowles-Barley, Verena Kaynig, Thouis Ray Jones, Alyssa Wilson, Joshua Morgan, Dongil Lee, Daniel Berger, Narayanan Kasthuri, Jeff W. Lichtman, Hanspeter Pfister
    Comments: 13 pages, 4 figures
    Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV)

    Reconstructing a synaptic wiring diagram, or connectome, from electron
    microscopy (EM) images of brain tissue currently requires many hours of manual
    annotation or proofreading (Kasthuri and Lichtman, 2010; Lichtman and Sanes,
    2008; Seung, 2009). The desire to reconstruct ever larger and more complex
    networks has pushed the collection of ever larger EM datasets. A cubic
    millimeter of raw imaging data would take up 1 PB of storage and present an
    annotation project that would be impractical without relying heavily on
    automatic segmentation methods. The RhoanaNet image processing pipeline was
    developed to automatically segment large volumes of EM data and ease the burden
    of manual proofreading and annotation. Based on (Kaynig et al., 2015), we
    updated every stage of the software pipeline to provide better throughput
    performance and higher quality segmentation results. We used state of the art
    deep learning techniques to generate improved membrane probability maps, and
    Gala (Nunez-Iglesias et al., 2014) was used to agglomerate 2D segments into 3D
    objects.

    We applied the RhoanaNet pipeline to four densely annotated EM datasets, two
    from mouse cortex, one from cerebellum and one from mouse lateral geniculate
    nucleus (LGN). All training and test data is made available for benchmark
    comparisons. The best segmentation results obtained gave
    (V^{text{Info}}_{text{F-score}}) scores of 0.9054 and 0.9182 for the cortex
    datasets, 0.9438 for LGN, and 0.9150 for cerebellum.

    The RhoanaNet pipeline is open source software. All source code, training
    data, test data, and annotations for all four benchmark datasets are available
    at www.rhoana.org.

    Generalized Dropout

    Suraj Srinivas, R. Venkatesh Babu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep Neural Networks often require good regularizers to generalize well.
    Dropout is one such regularizer that is widely used among Deep Learning
    practitioners. Recent work has shown that Dropout can also be viewed as
    performing Approximate Bayesian Inference over the network parameters. In this
    work, we generalize this notion and introduce a rich family of regularizers
    which we call Generalized Dropout. One set of methods in this family, called
    Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
    emerges as a special case of this method. Another member of this family selects
    the width of neural network layers. Experiments show that these methods help in
    improving generalization performance over Dropout.
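
    One plausible (hedged) realisation of per-unit trainable dropout rates, using
    Bernoulli gates with a straight-through gradient; this illustrates the
    Dropout++ idea under our own assumptions, not the paper's exact estimator:

        import torch
        import torch.nn as nn

        class DropoutPP(nn.Module):
            """Per-unit trainable keep probabilities (hypothetical sketch)."""
            def __init__(self, n_units, init_keep=0.8):
                super().__init__()
                # learn keep probabilities through their logits
                self.logit_keep = nn.Parameter(
                    torch.logit(torch.full((n_units,), init_keep)))

            def forward(self, x):
                p = torch.sigmoid(self.logit_keep)
                if not self.training:
                    return x * p                          # expected gate at test time
                gate = torch.bernoulli(p.expand_as(x))    # sample Bernoulli gates
                gate = gate + p - p.detach()              # straight-through gradient to p
                return x * gate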

    Effective Deterministic Initialization for (k)-Means-Like Methods via Local Density Peaks Searching

    Fengfu Li, Hong Qiao, Bo Zhang
    Comments: 16 pages, 9 figures, journal paper
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The (k)-means clustering algorithm is popular but has the following main
    drawbacks: 1) the number of clusters, (k), needs to be provided by the user in
    advance, 2) it can easily reach local minima with randomly selected initial
    centers, 3) it is sensitive to outliers, and 4) it can only deal with well
    separated hyperspherical clusters. In this paper, we propose a Local Density
    Peaks Searching (LDPS) initialization framework to address these issues. The
    LDPS framework includes two basic components: one of them is the local density
    that characterizes the density distribution of a data set, and the other is the
    local distinctiveness index (LDI) which we introduce to characterize how
    distinctive a data point is compared with its neighbors. Based on these two
    components, we search for the local density peaks which are characterized with
    high local densities and high LDIs to deal with 1) and 2). Moreover, we detect
    outliers characterized with low local densities but high LDIs, and exclude them
    out before clustering begins. Finally, we apply the LDPS initialization
    framework to (k)-medoids, which is a variant of (k)-means and chooses data
    samples as centers, with diverse similarity measures other than the Euclidean
    distance to fix the last drawback of (k)-means. Combining the LDPS
    initialization framework with (k)-means and (k)-medoids, we obtain two novel
    clustering methods called LDPS-means and LDPS-medoids, respectively.
    Experiments on synthetic data sets verify the effectiveness of the proposed
    methods, especially when the ground truth of the cluster number (k) is large.
    Further, experiments on several real world data sets, Handwritten Pendigits,
    Coil-20, Coil-100 and Olivetti Face Database, illustrate that our methods give
    superior performance to the analogous approaches on both estimating (k) and
    unsupervised object categorization.
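
    The two LDPS quantities can be illustrated on a small data set: a local
    density (here, a simple cutoff-kernel count) and a local distinctiveness index
    (here, the distance to the nearest denser point), in the spirit of
    density-peaks initialisation; the paper's exact definitions may differ:

        import numpy as np

        def density_peaks(X, cutoff):
            # Pairwise distances between all points.
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
            density = (D < cutoff).sum(axis=1) - 1    # neighbours within cutoff
            ldi = np.empty(len(X))
            for i in range(len(X)):
                denser = density > density[i]         # points denser than i
                ldi[i] = D[i, denser].min() if denser.any() else D[i].max()
            # centers: high density AND high ldi; outliers: low density, high ldi
            return density, ldi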

    Deep Learning for the Classification of Lung Nodules

    He Yang, Hengyong Yu, Ge Wang
    Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep learning, as a promising new area of machine learning, has attracted
    rapidly increasing attention in the field of medical imaging. Compared to the
    conventional machine learning methods, deep learning requires no hand-tuned
    feature extractor, and has shown a superior performance in many visual object
    recognition applications. In this study, we develop a deep convolutional neural
    network (CNN) and apply it to thoracic CT images for the classification of lung
    nodules. We present the CNN architecture and classification accuracy for the
    original images of lung nodules. In order to understand the features of lung
    nodules, we further construct new datasets, based on the combination of
    artificial geometric nodules and some transformations of the original images,
    as well as a stochastic nodule shape model. It is found that simplistic
    geometric nodules cannot capture the important features of lung nodules.

    Temporal Generative Adversarial Nets

    Masaki Saito, Eiichi Matsumoto
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose a generative model, the Temporal Generative
    Adversarial Network (TGAN), which can learn a semantic representation of
    unlabelled videos, and is capable of generating consistent videos. Unlike an
    existing GAN that generates videos with a generator consisting of 3D
    deconvolutional layers, our model exploits two types of generators: a temporal
    generator and an image generator. The temporal generator consists of 1D
    deconvolutional layers and outputs a set of latent variables, each of which
    corresponds to a frame in the generated video, and the image generator
    transforms them into a video with 2D deconvolutional layers. This
    representation allows efficient training of the network parameters. Moreover,
    it can handle a wider range of applications including the generation of a long
    sequence, frame interpolation, and the use of pre-trained models. Experimental
    results demonstrate the effectiveness of our method.
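
    A compact sketch of the two-generator factorisation with toy sizes (16 frames
    of 16x16 RGB; all layer shapes are illustrative, not the paper's
    architecture):

        import torch
        import torch.nn as nn

        # Temporal generator: 1D deconvolutions turn one latent into 16 frame latents.
        temporal_gen = nn.Sequential(
            nn.ConvTranspose1d(100, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.ConvTranspose1d(64, 100, kernel_size=4, stride=4))  # (B, 100, 1) -> (B, 100, 16)

        # Image generator: 2D deconvolutions render each frame latent into a frame.
        image_gen = nn.Sequential(
            nn.ConvTranspose2d(100, 64, kernel_size=4, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=4), nn.Tanh())  # 1x1 -> 16x16

        z0 = torch.randn(2, 100, 1)                         # one latent per video
        frame_latents = temporal_gen(z0)                    # (2, 100, 16)
        flat = frame_latents.permute(0, 2, 1).reshape(-1, 100, 1, 1)
        video = image_gen(flat).reshape(2, 16, 3, 16, 16)   # (batch, time, C, H, W)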


    Artificial Intelligence

    Memory Lens: How Much Memory Does an Agent Use?

    Christoph Dann, Katja Hofmann, Sebastian Nowozin
    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
    Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We propose a new method to study the internal memory used by reinforcement
    learning policies. We estimate the amount of relevant past information by
    estimating mutual information between behavior histories and the current action
    of an agent. We perform this estimation in the passive setting, that is, we do
    not intervene but merely observe the natural behavior of the agent. Moreover,
    we provide a theoretical justification for our approach by showing that it
    yields an implementation-independent lower bound on the minimal memory capacity
    of any agent that implements the observed policy. We demonstrate our approach by
    estimating the use of memory of DQN policies on concatenated Atari frames,
    demonstrating sharply different use of memory across 49 games. The study of
    memory as information that flows from the past to the current action opens
    avenues to understand and improve successful reinforcement learning algorithms.
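
    The estimated quantity can be illustrated with a toy plug-in estimator over
    discretised behaviour histories (real continuous histories require more
    careful estimation; this only shows the mutual information being computed):

        import numpy as np
        from collections import Counter

        def mutual_information(history_ids, actions):
            # Plug-in estimate of I(history; action) in nats from observed pairs.
            n = len(actions)
            p_h, p_a = Counter(history_ids), Counter(actions)
            p_ha = Counter(zip(history_ids, actions))
            mi = 0.0
            for (h, a), c in p_ha.items():
                mi += (c / n) * np.log(c * n / (p_h[h] * p_a[a]))
            return mi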

    Generating machine-executable plans from end-user's natural-language instructions

    Rui Liu, Xiaoli Zhang
    Comments: 16 pages, 10 figures, article submitted to Robotics and Computer-Integrated Manufacturing, 2016 Aug
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

    It is critical for advanced manufacturing machines to autonomously execute a
    task by following an end-user’s natural language (NL) instructions. However, NL
    instructions are usually ambiguous and abstract so that the machines may
    misunderstand and incorrectly execute the task. To address this NL-based
    human-machine communication problem and enable the machines to appropriately
    execute tasks by following the end-user’s NL instructions, we developed a
    Machine-Executable-Plan-Generation (exePlan) method. The exePlan method
    conducts task-centered semantic analysis to extract task-related information
    from ambiguous NL instructions. In addition, the method specifies machine
    execution parameters to generate a machine-executable plan by interpreting
    abstract NL instructions. To evaluate the exePlan method, an industrial robot
    Baxter was instructed by NL to perform three types of industrial tasks {‘drill
    a hole’, ‘clean a spot’, ‘install a screw’}. The experimental results showed that
    the exePlan method was effective in generating machine-executable plans from
    the end-user’s NL instructions. Such a method has the promise to endow a
    machine with the ability of NL-instructed task execution.

    Coherent Dialogue with Attention-based Language Models

    Hongyuan Mei, Mohit Bansal, Matthew R. Walter
    Comments: To appear at AAAI 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We model coherent conversation continuation via RNN-based dialogue models
    equipped with a dynamic attention mechanism. Our attention-RNN language model
    dynamically increases the scope of attention on the history as the conversation
    continues, as opposed to standard attention (or alignment) models with a fixed
    input scope in a sequence-to-sequence model. This allows each generated word to
    be associated with the most relevant words in its corresponding conversation
    history. We evaluate the model on two popular dialogue datasets, the
    open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot
    dataset, and achieve significant improvements over the state-of-the-art and
    baselines on several metrics, including complementary diversity-based metrics,
    human evaluation, and qualitative visualizations. We also show that a vanilla
    RNN with dynamic attention outperforms more complex memory models (e.g., LSTM
    and GRU) by allowing for flexible, long-distance memory. We promote further
    coherence via topic modeling-based reranking.

    Enforcing Relational Matching Dependencies with Datalog for Entity Resolution

    Zeinab Bahmani, Leopoldo Bertossi
    Comments: Conference submission
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Entity resolution (ER) is about identifying and merging records in a database
    that represent the same real-world entity. Matching dependencies (MDs) have
    been introduced and investigated as declarative rules that specify ER policies.
    An ER process induced by MDs over a dirty instance leads to multiple clean
    instances, in general. General “answer set programs” (ASPs) have been proposed
    to specify the MD-based cleaning task and its results. In this work, we extend
    MDs
    to “relational MDs”, which capture more application semantics, and identify
    classes of relational MDs for which the general ASP can be automatically
    rewritten into a stratified Datalog program, with the single clean instance as
    its standard model.

    Learning From Graph Neighborhoods Using LSTMs

    Rakshit Agrawal, Luca de Alfaro, Vassilis Polychronopoulos
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Many prediction problems can be phrased as inferences over local
    neighborhoods of graphs. The graph represents the interaction between entities,
    and the neighborhood of each entity contains information that allows the
    inferences or predictions. We present an approach for applying machine learning
    directly to such graph neighborhoods, yielding predictions for graph nodes on
    the basis of the structure of their local neighborhood and the features of the
    nodes in it. Our approach allows predictions to be learned directly from
    examples, bypassing the step of creating and tuning an inference model or
    summarizing the neighborhoods via a fixed set of hand-crafted features. The
    approach is based on a multi-level architecture built from Long Short-Term
    Memory neural nets (LSTMs); the LSTMs learn how to summarize the neighborhood
    from data. We demonstrate the effectiveness of the proposed technique on a
    synthetic example and on real-world data related to crowdsourced grading,
    Bitcoin transactions, and Wikipedia edit reversions.

    Options Discovery with Budgeted Reinforcement Learning

    Aurélia Léon, Ludovic Denoyer
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We consider the problem of learning hierarchical policies for Reinforcement
    Learning that are able to discover options, an option corresponding to a
    sub-policy over a set of primitive actions. Different models have been proposed
    during the last decade, but they usually rely on a predefined set of options.
    We specifically address the problem of automatically discovering options in
    decision processes. We describe a new RL framework called Bi-POMDP, and a new
    learning model called the Budgeted Option Neural Network (BONN) that is able to
    discover options based on a budgeted learning objective. Since Bi-POMDPs are
    more general than POMDPs, our model can also be used to discover options for
    classical RL tasks. The BONN model is evaluated on different classical RL
    problems, demonstrating interesting results both quantitatively and
    qualitatively.

    Generalized Dropout

    Suraj Srinivas, R. Venkatesh Babu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep Neural Networks often require good regularizers to generalize well.
    Dropout is one such regularizer that is widely used among Deep Learning
    practitioners. Recent work has shown that Dropout can also be viewed as
    performing Approximate Bayesian Inference over the network parameters. In this
    work, we generalize this notion and introduce a rich family of regularizers
    which we call Generalized Dropout. One set of methods in this family, called
    Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
    emerges as a special case of this method. Another member of this family selects
    the width of neural network layers. Experiments show that these methods help in
    improving generalization performance over Dropout.

    Non-Local Color Image Denoising with Convolutional Neural Networks

    Stamatios Lefkimmiatis
    Comments: 15 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    We propose a novel deep network architecture for grayscale and color image
    denoising that is based on a non-local image model. Our motivation for the
    overall design of the proposed network stems from variational methods that
    exploit the inherent non-local self-similarity property of natural images. We
    build on this concept and introduce deep networks that perform non-local
    processing and at the same time they significantly benefit from discriminative
    learning. Experiments on the Berkeley segmentation dataset, comparing several
    state-of-the-art methods, show that the proposed non-local models achieve the
    best reported denoising performance both for grayscale and color images for all
    the tested noise levels. It is also worth noting that this increase in
    performance comes at no extra cost on the capacity of the network compared to
    existing alternative deep network architectures. In addition, we highlight a
    direct link of the proposed non-local models to convolutional neural networks.
    This connection is of significant importance since it allows our models to take
    full advantage of the latest advances on GPU computing in deep learning and
    makes them amenable to efficient implementations through their inherent
    parallelism.

    Fair Division via Social Comparison

    Rediet Abebe, Jon Kleinberg, David Parkes
    Comments: 18 pages, 3 figures
    Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Combinatorics (math.CO)

    In the classical cake cutting problem, a resource must be divided among
    agents with different utilities so that each agent believes they have received
    a fair share of the resource relative to the other agents. We introduce a
    variant of the problem in which we model an underlying social network on the
    agents with a graph, and agents only evaluate their shares relative to their
    neighbors in the network. This formulation captures many situations in which
    it is unrealistic to assume a global view, and also exposes interesting
    phenomena in the original problem.

    Specifically, we say an allocation is locally envy-free if no agent envies a
    neighbor’s allocation and locally proportional if each agent values her own
    allocation as much as the average value of her neighbors’ allocations, with the
    former implying the latter. While global envy-freeness implies local
    envy-freeness, global proportionality does not imply local proportionality, or
    vice versa. A general result is that for any two distinct graphs on the same
    set of nodes and an allocation, there exists a set of valuation functions such
    that the allocation is locally proportional on one but not the other.

    We fully characterize the set of graphs for which an oblivious single-cutter
    protocol (a protocol that uses a single agent to cut the cake into pieces)
    admits a bounded protocol with (O(n^2)) query complexity for locally
    envy-free allocations in the Robertson-Webb model. We also consider the price
    of envy-freeness, which compares the total utility of an optimal allocation to
    the best utility of an allocation that is envy-free. We show that a lower bound
    of (Omega(sqrt{n})) on the price of envy-freeness for global allocations in
    fact holds for local envy-freeness in any connected undirected graph. Thus,
    sparse graphs surprisingly do not provide more flexibility with respect to the
    quality of envy-free allocations.
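
    For concreteness, local envy-freeness is easy to state in code; a tiny
    checker, where values[i][j] denotes agent i's value for agent j's piece and
    neighbors maps each agent to its graph neighbours (names are illustrative):

        def locally_envy_free(values, neighbors):
            # Agent i is locally envy-free if she values her own piece at least
            # as much as each neighbour's piece.
            return all(values[i][i] >= values[i][j]
                       for i in neighbors for j in neighbors[i])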

    A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

    Samaneh Sorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi
    Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Credit cards play a very important role in today’s economy. They have become
    an unavoidable part of household, business and global activities. Although
    using credit cards provides enormous benefits when used carefully and
    responsibly, significant credit and financial damage may be caused by
    fraudulent activities. Many techniques have been proposed to confront the
    growth in credit card fraud. However, while all of these techniques share the
    goal of preventing credit card fraud, each one has its own drawbacks,
    advantages and characteristics. In this paper, after investigating the
    difficulties of credit card fraud detection, we review the state of the art in
    credit card fraud detection techniques, data sets and evaluation criteria. The
    advantages and disadvantages of fraud detection methods are enumerated and
    compared. Furthermore, a classification of the mentioned techniques into two
    main fraud detection approaches, namely misuse (supervised) and anomaly
    detection (unsupervised), is presented. In addition, a classification of
    techniques is proposed based on their capability to process numerical and
    categorical data sets. The different data sets used in the literature are then
    described and grouped into real and synthesized data, and the effective and
    common attributes are extracted for further usage. Moreover, the evaluation
    criteria employed in the literature are collected and discussed. Consequently,
    open issues for credit card fraud detection are explained as guidelines for
    new researchers.

    Invertible Conditional GANs for image editing

    Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Álvarez
    Comments: Accepted paper at NIPS 2016 Workshop on Adversarial Training
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Generative Adversarial Networks (GANs) have recently been shown to
    successfully approximate complex data distributions. A relevant extension of
    this model is conditional GANs (cGANs), where the introduction of external
    information makes it possible to determine specific representations of the
    generated images. In this work, we evaluate encoders that invert the mapping of
    a cGAN, i.e., that map a real image into a latent space and a conditional
    representation. This allows, for example, reconstructing and modifying real
    images of faces by conditioning on arbitrary attributes. Additionally, we
    evaluate the design of cGANs. The combination of an encoder with a cGAN, which
    we call an Invertible cGAN (IcGAN), enables re-generating real images with
    deterministic complex modifications.


    Information Retrieval

    Neural Information Retrieval: A Literature Review

    Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Matthew Lease
    Comments: 54 pages
    Subjects: Information Retrieval (cs.IR)

    A recent “third wave” of Neural Network (NN) approaches now delivers
    state-of-the-art performance in many machine learning tasks, spanning speech
    recognition, computer vision, and natural language processing. Because these
    modern NNs often comprise multiple interconnected layers, this new NN research
    is often referred to as deep learning. Stemming from this tide of NN work, a
    number of researchers have recently begun to investigate NN approaches to
    Information Retrieval (IR). While deep NNs have yet to achieve the same level
    of success in IR as seen in other areas, the recent surge of interest and work
    in NNs for IR suggests that this state of affairs may be quickly changing. In
    this work, we survey the current landscape of Neural IR research, paying
    special attention to the use of learned representations of queries and
    documents (i.e., neural embeddings). We highlight the successes of neural IR
    thus far, catalog obstacles to its wider adoption, and suggest potentially
    promising directions for future research.

    A Visual and Textual Recurrent Neural Network for Sequential Prediction

    Qiang Cui, Shu Wu, Qiang Liu, Liang Wang
    Subjects: Information Retrieval (cs.IR)

    Sequential prediction is a fundamental task for Web applications. Due to the
    insufficiency of user feedback, sequential prediction usually suffers from the
    cold start problem. There are two kinds of popular approaches for item
    prediction, based on matrix factorization (MF) and Markov chains (MC). MF
    methods factorize the user-item matrix to learn the general tastes of users.
    MC methods predict the next behavior based on recent behaviors. However, both
    have limitations: MF methods can merge additional information to address cold
    start but cannot capture the dynamic properties of a user’s interest, while
    MC-based sequential methods have difficulty addressing cold start and make a
    strong Markov assumption that the next state only depends on the last state.
    In this work, to deal with the cold start problem of sequential prediction, we
    propose an RNN model adopting the visual and textual content of items, named
    the (mathbf{V})isual and (mathbf{T})extual (mathbf{R})ecurrent
    (mathbf{N})eural (mathbf{N})etwork ((mathbf{VT})-(mathbf{RNN})). We can simultaneously learn
    the sequential latent vectors that dynamically capture the user’s interest, as
    well as content-based representations that contribute to address the cold
    start. Experiments on two real-world datasets show that our proposed VT-RNN
    model can effectively generate the personalized ranking list and significantly
    alleviate the cold start problem.

    Rising Novelties on Evolving Networks: Recent Behavior Dominant and Non-Dominant Model

    Khushnood Abbas
    Comments: 19 pages, 5 figures
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Physics and Society (physics.soc-ph)

    Novelty attracts attention like popularity; hence predicting novelty is as
    important as predicting popularity. Novelty is a side effect of competition
    and aging in evolving systems. Recent behavior, i.e., recent link gain in
    networks, plays an important role in emergence and trends. We exploit this
    insight and propose two models covering different scenarios and systems: in
    the first, recent behavior dominates total behavior (total link gain); in the
    second, recent behavior is as important as total behavior for future link
    gain. Both models suppose that a random walker walks on the network and can
    jump to any node; the probability of jumping or connecting to another node is
    based on which nodes have recently been more active or have received more
    links. Under our assumptions the random walker can also jump to a node that is
    popular overall but not recently popular. We are able to predict rising
    novelties, or newly popular nodes, which are generally suppressed under the
    preferential attachment effect. To show the performance of our models we
    conducted experiments on four real data sets, namely MovieLens, Netflix,
    Facebook and the Arxiv High Energy Physics paper citation network. For testing
    we used four information retrieval indices: Precision, Novelty, Area Under the
    Receiver Operating Characteristic curve (AUC) and Kendall’s rank correlation
    coefficient. We used four benchmark models to validate our proposed models.
    Although our models do not perform better in all cases, they have theoretical
    significance in working better for recent-behavior-dominant systems.

    Ontology Driven Disease Incidence Detection on Twitter

    Mark Abraham Magumba, Peter Nabende
    Comments: 19 pages, 7 figures, 1 table
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    In this work we address the issue of generic automated disease incidence
    monitoring on Twitter. We employ an ontology of disease-related concepts and
    use it to obtain a conceptual representation of tweets. Unlike previous
    keyword-based systems and topic modeling approaches, our ontological approach
    allows us to apply more stringent criteria for determining which messages are
    relevant, such as spatial and temporal characteristics, whilst giving a stronger
    guarantee that the resulting models will perform well on new data that may be
    lexically divergent. We achieve this by training learners on concepts rather
    than individual words. For training we use a dataset containing mentions of
    influenza and Listeria and use the learned models to classify datasets
    containing mentions of an arbitrary selection of other diseases. We show that
    our ontological approach achieves good performance on this task using a variety
    of Natural Language Processing Techniques. We also show that word vectors can
    be learned directly from our concepts to achieve even better results.
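
    A hedged sketch of the concept-level training idea: tokens are mapped to
    ontology concepts before the classifier ever sees them, so lexically new
    disease mentions can still activate familiar concepts. The tiny lexicon and
    labels below are illustrative placeholders, not the authors' ontology:

    from collections import Counter
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    CONCEPT_LEXICON = {"flu": "DISEASE", "listeria": "DISEASE",
                       "fever": "SYMPTOM", "cough": "SYMPTOM",
                       "caught": "AFFLICTION", "hospital": "HEALTH_FACILITY"}

    def concept_features(tweet):
        # Replace each token by its ontology concept; unknown tokens share "OTHER".
        return Counter(CONCEPT_LEXICON.get(t, "OTHER") for t in tweet.lower().split())

    tweets = ["caught the flu with fever", "cough and fever again",
              "great game last night", "traffic was terrible"]
    labels = [1, 1, 0, 0]  # 1 = disease incidence report

    X = DictVectorizer().fit_transform([concept_features(t) for t in tweets])
    clf = LogisticRegression().fit(X, labels)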

    A Business Zone Recommender System Based on Facebook and Urban Planning Data

    Jovian Lin, Richard J. Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus T. Kwee, Philips K. Prasetyo
    Journal-ref: Proceedings of the European Conference on Information Retrieval,
    2016, pp. 641-647
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

    We present ZoneRec, a zone recommendation system for physical businesses in
    an urban city, which uses both public business data from Facebook and urban
    planning data. The system consists of machine learning algorithms that take in
    a business’s metadata and output a list of recommended zones in which to
    establish the business. We evaluate our system using data of food businesses
    in Singapore and assess the contribution of different feature groups to the
    recommendation quality.

    Spotting Rumors via Novelty Detection

    Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Rumour detection is hard because the most accurate systems operate
    retrospectively, only recognizing rumours once they have collected repeated
    signals. By then the rumours might have already spread and caused harm. We
    introduce a new category of features based on novelty, tailored to detect
    rumours early on. To compensate for the absence of repeated signals, we make
    use of news wire as an additional data source. Unconfirmed (novel) information
    with respect to the news articles is considered as an indication of rumours.
    Additionally, we introduce pseudo feedback, which assumes that documents
    similar to previous rumours are more likely to be rumours themselves.
    Comparison with other real-time approaches shows that novelty-based features
    in conjunction with pseudo feedback perform significantly better when
    detecting rumours immediately after their publication.
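
    A minimal sketch of the novelty signal, assuming a TF-IDF bag-of-words match
    against a news-wire reference corpus (the corpus here is a placeholder, not
    the paper's data or tuned features):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    news_wire = ["central bank raises interest rates",
                 "storm causes flooding in coastal towns"]
    tweet = "celebrity X secretly bought the central bank"

    vec = TfidfVectorizer().fit(news_wire + [tweet])
    sims = cosine_similarity(vec.transform([tweet]), vec.transform(news_wire))
    novelty = 1.0 - sims.max()  # unconfirmed content scores high -> rumour signal
    print(round(float(novelty), 3))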


    Computation and Language

    Coherent Dialogue with Attention-based Language Models

    Hongyuan Mei, Mohit Bansal, Matthew R. Walter
    Comments: To appear at AAAI 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    We model coherent conversation continuation via RNN-based dialogue models
    equipped with a dynamic attention mechanism. Our attention-RNN language model
    dynamically increases the scope of attention on the history as the conversation
    continues, as opposed to standard attention (or alignment) models with a fixed
    input scope in a sequence-to-sequence model. This allows each generated word to
    be associated with the most relevant words in its corresponding conversation
    history. We evaluate the model on two popular dialogue datasets, the
    open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot
    dataset, and achieve significant improvements over the state-of-the-art and
    baselines on several metrics, including complementary diversity-based metrics,
    human evaluation, and qualitative visualizations. We also show that a vanilla
    RNN with dynamic attention outperforms more complex memory models (e.g., LSTM
    and GRU) by allowing for flexible, long-distance memory. We promote further
    coherence via topic modeling-based reranking.
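
    A hedged sketch of attention with a growing scope: at each decoding step the
    model attends over all conversation history seen so far, rather than a fixed
    input window (the scoring function here is a simplified dot-product stand-in):

    import torch
    import torch.nn.functional as F

    def dynamic_attention(history, query):
        # history: (t, d) hidden states of the whole conversation so far;
        # t grows as the conversation continues. query: (d,) decoder state.
        scores = history @ query       # (t,)
        alpha = F.softmax(scores, dim=0)
        return alpha @ history         # context vector, shape (d,)

    history = torch.randn(7, 32)       # grows by one state per generated word
    context = dynamic_attention(history, torch.randn(32))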

    Robust end-to-end deep audiovisual speech recognition

    Ramon Sanabria, Florian Metze, Fernando De La Torre
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

    Speech is one of the most effective ways of communication among humans. Even
    though audio is the most common way of transmitting speech, very important
    information can be found in other modalities, such as vision. Vision is
    particularly useful when the acoustic signal is corrupted. Multi-modal speech
    recognition, however, has not yet found widespread use, mostly because the
    temporal alignment and fusion of the different information sources is
    challenging.

    This paper presents an end-to-end audiovisual speech recognizer (AVSR), based
    on recurrent neural networks (RNN) with a connectionist temporal classification
    (CTC) loss function. CTC creates sparse “peaky” output activations, and we
    analyze the differences in the alignments of output targets (phonemes or
    visemes) between audio-only, video-only, and audio-visual feature
    representations. We present the first such experiments on the large vocabulary
    IBM ViaVoice database, which outperform previously published approaches on
    phone accuracy in clean and noisy conditions.
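
    A minimal sketch of the CTC training objective such a recognizer optimizes,
    using PyTorch's nn.CTCLoss with toy shapes (index 0 plays the blank symbol):

    import torch
    import torch.nn as nn

    T, N, C = 50, 4, 30  # frames, batch size, output classes incl. blank
    rnn_out = torch.randn(T, N, C, requires_grad=True)  # stand-in for RNN output
    log_probs = rnn_out.log_softmax(dim=-1)

    targets = torch.randint(1, C, (N, 10))  # label sequences, no blanks
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)

    loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
    loss.backward()  # CTC marginalizes over all alignments; peaky outputs emerge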

    Bidirectional Tree-Structured LSTM with Head Lexicalization

    Zhiyang Teng, Yue Zhang
    Comments: 12 pages, 6 figures
    Subjects: Computation and Language (cs.CL)

    Sequential LSTM has been extended to model tree structures, giving
    competitive results for a number of tasks. Existing methods model constituent
    trees by bottom-up combinations of constituent nodes, making direct use of
    input word information only for leaf nodes. This is different from sequential
    LSTMs, which contain reference to input words for each node. In this paper, we
    propose a method of automatic head lexicalization for tree-structured LSTMs,
    propagating head words from leaf nodes to every constituent node. In addition,
    enabled by head lexicalization, we build a tree LSTM in the top-down direction,
    which corresponds to bidirectional sequential LSTM structurally. Experiments
    show that both extensions give better representations of tree structures. Our
    final model gives the best results on the Stanford Sentiment Treebank and
    highly competitive results on the TREC question type classification task.

    False-Friend Detection and Entity Matching via Unsupervised Transliteration

    Yanqing Chen, Steven Skiena
    Comments: 11 Pages, ACL style
    Subjects: Computation and Language (cs.CL)

    Transliterations play an important role in multilingual entity reference
    resolution, because proper names increasingly travel between languages in news
    and social media. Previous work associated with machine translation targets
    transliteration only between single language pairs, focuses on specific classes
    of entities (such as cities and celebrities), and relies on manual curation,
    which limits the expressive power of transliteration in multilingual
    environments.

    By contrast, we present an unsupervised transliteration model covering 69
    major languages that can generate good transliterations for arbitrary strings
    between any language pair. Our model yields top-(1, 20, 100) averages of
    (32.85%, 60.44%, 83.20%) in matching gold standard transliteration compared to
    results from a recently-published system of (26.71%, 50.27%, 72.79%). We also
    show the quality of our model in detecting true and false friends from
    Wikipedia high frequency lexicons. Our method indicates a strong signal of
    pronunciation similarity and boosts the probability of finding true friends in
    68 out of 69 languages.

    Ontology Driven Disease Incidence Detection on Twitter

    Mark Abraham Magumba, Peter Nabende
    Comments: 19 pages, 7 figures, 1 table
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    In this work we address the issue of generic automated disease incidence
    monitoring on Twitter. We employ an ontology of disease-related concepts and
    use it to obtain a conceptual representation of tweets. Unlike previous
    keyword-based systems and topic modeling approaches, our ontological approach
    allows us to apply more stringent criteria, such as spatial and temporal
    characteristics, for determining which messages are relevant, whilst giving a
    stronger guarantee that the resulting models will perform well on new data
    that may be lexically divergent. We achieve this by training learners on
    concepts rather than individual words. For training we use a dataset
    containing mentions of influenza and Listeria and use the learned models to
    classify datasets containing mentions of an arbitrary selection of other
    diseases. We show that our ontological approach achieves good performance on
    this task using a variety of natural language processing techniques. We also
    show that word vectors can be learned directly from our concepts to achieve
    even better results.

    Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

    Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu
    Comments: 11 pages
    Subjects: Computation and Language (cs.CL)

    Recurrent Neural Networks (RNNs) are among the most popular architectures used
    in Natural Language Processing (NLP) tasks because their recurrent structure is
    well suited to processing variable-length text. RNNs can utilize distributed
    representations of words by first converting the tokens comprising each text
    into vectors, which form a matrix with two dimensions: the time-step dimension
    and the feature vector dimension. Most existing models then utilize a
    one-dimensional (1D) max pooling operation or an attention-based operation
    only on the time-step dimension to obtain a fixed-length vector. However, the
    features on the feature vector dimension are not mutually independent, and
    simply applying a 1D pooling operation over the time-step dimension
    independently may destroy the structure of the feature representation. On the
    other hand, applying a two-dimensional (2D) pooling operation over the two
    dimensions may sample more meaningful features for
    sequence modeling tasks. To integrate the features on both dimensions of the
    matrix, this paper explores applying 2D max pooling operation to obtain a
    fixed-length representation of the text. This paper also utilizes 2D
    convolution to sample more meaningful information of the matrix. Experiments
    are conducted on six text classification tasks, including sentiment analysis,
    question classification, subjectivity classification and newsgroup
    classification. Compared with the state-of-the-art models, the proposed models
    achieve excellent performance on 4 out of 6 tasks. Specifically, one of the
    proposed models achieves the highest accuracy on both the binary and the
    fine-grained classification tasks of the Stanford Sentiment Treebank.
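
    A sketch of the central idea, pooling the BiLSTM output matrix over both the
    time-step axis and the feature axis (layer sizes are illustrative, and the
    paper's additional 2D convolution is omitted for brevity):

    import torch
    import torch.nn as nn

    class BLSTM2DPool(nn.Module):
        def __init__(self, emb_dim=100, hidden=100, seq_len=20, n_classes=2):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)
            self.pool = nn.MaxPool2d((2, 2))   # pools time AND feature axes
            self.fc = nn.Linear((seq_len // 2) * hidden, n_classes)

        def forward(self, x):                  # x: (batch, T, emb_dim)
            h, _ = self.lstm(x)                # (batch, T, 2 * hidden)
            h = self.pool(h.unsqueeze(1))      # (batch, 1, T/2, hidden)
            return self.fc(h.flatten(1))

    logits = BLSTM2DPool()(torch.randn(8, 20, 100))  # -> (8, 2)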

    Visualizing Linguistic Shift

    Salman Mahmood, Rami Al-Rfou, Klaus Mueller
    Subjects: Computation and Language (cs.CL)

    Neural network based models are a powerful tool for creating word embeddings;
    the objective of these models is to group similar words together. These
    embeddings have been used as features to improve results in various
    applications such as document classification, named entity recognition, etc.
    Neural language models are able to learn word representations which have been
    used to capture semantic shifts across time and geography. The objective of
    this paper is to first identify and then visualize how words change meaning
    across different text corpora. We train a neural language model on texts from
    a diverse set of disciplines: philosophy, religion, fiction, etc. Each text
    alters the embeddings of the words to represent the meaning of the words
    inside that text. We present a computational technique to detect words that
    exhibit significant linguistic shift in meaning and usage, and then use
    enhanced scatterplots and storyline visualization to visualize the linguistic
    shift.
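
    One simple, hedged way to quantify such a shift (not necessarily the paper's
    exact detector) is to train one embedding per corpus and compare a word's
    nearest neighbors across the two spaces; the random vectors below stand in
    for per-corpus models:

    import numpy as np

    def top_k_neighbors(word, emb, k=2):
        v = emb[word]
        sims = {w: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
                for w, u in emb.items() if w != word}
        return set(sorted(sims, key=sims.get, reverse=True)[:k])

    def shift_score(word, emb_a, emb_b, k=2):
        # 1.0 = the word's neighborhoods in the two corpora share nothing
        overlap = top_k_neighbors(word, emb_a, k) & top_k_neighbors(word, emb_b, k)
        return 1.0 - len(overlap) / k

    rng = np.random.default_rng(0)
    vocab = ["spirit", "soul", "wine", "drink", "faith", "glass"]
    emb_philosophy = {w: rng.normal(size=50) for w in vocab}
    emb_fiction = {w: rng.normal(size=50) for w in vocab}
    print(shift_score("spirit", emb_philosophy, emb_fiction))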

    Incorporating Pass-Phrase Dependent Background Models for Text Dependent Speaker Verification

    A. K. Sarkar, Zheng-Hua Tan
    Subjects: Computation and Language (cs.CL)

    In this paper, we propose a pass-phrase dependent background model (PBM) for
    text dependent (TD) speaker verification (SV) to integrate pass-phrase
    identification process (without an additional separate identification system)
    in the conventional TD-SV system, where a PBM is derived from a
    text-independent background model through adaptation using the utterances of a
    particular pass-phrase. During training, pass-phrase specific target speaker
    models are derived from the particular PBM using the training data for the
    respective target model. During testing, the best PBM is first selected for the
    test utterance in the maximum likelihood (ML) sense, and the selected PBM is
    then used for the log likelihood ratio (LLR) calculation with respect to the
    claimant model. The proposed method incorporates the pass-phrase identification
    step in the LLR calculation, which is not considered in conventional standalone
    TD-SV based systems. The performance of the proposed method is compared to
    conventional text-independent background model based TD-SV systems using a
    Gaussian mixture model (GMM)-universal background model (UBM), Hidden Markov
    model (HMM)-UBM and i-vector paradigms. In addition, we consider two approaches
    to build PBMs: one is speaker independent and the other is speaker dependent.
    We show that the proposed method significantly reduces the error rate of text
    dependent speaker verification for the non-target types target-wrong and
    imposter-wrong, while maintaining TD-SV performance comparable to the
    conventional system when imposters speak a correct utterance. Experiments
    are conducted on the RedDots challenge and the RSR2015 databases which consist
    of short utterances.

    Tracking Words in Chinese Poetry of Tang and Song Dynasties with the China Biographical Database

    Chao-Lin Liu, Kuo-Feng Luo
    Comments: 9 pages, 3 figures, Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), 26th International Conference on Computational Linguistics (COLING)
    Subjects: Computation and Language (cs.CL)

    Large-scale comparisons between the poetry of Tang and Song dynasties shed
    light on how words, collocations, and expressions were used and shared among
    the poets. That some words were used only in the Tang poetry and some only in
    the Song poetry could lead to interesting research in linguistics. That the
    most frequent colors are different in the Tang and Song poetry provides a trace
    of the changing social circumstances in the dynasties. Results of the current
    work link to research topics of lexicography, semantics, and social
    transitions. We discuss our findings and present our algorithms for efficient
    comparisons among the poems, which are crucial for completing billions of
    comparisons within an acceptable time.

    Unsupervised Learning for Lexicon-Based Classification

    Jacob Eisenstein
    Comments: to appear in AAAI 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    In lexicon-based classification, documents are assigned labels by comparing
    the number of words that appear from two opposed lexicons, such as positive and
    negative sentiment. Creating such word lists is often easier than labeling
    instances, and they can be debugged by non-experts if classification
    performance is unsatisfactory. However, there is little analysis or
    justification of this classification heuristic. This paper describes a set of
    assumptions that can be used to derive a probabilistic justification for
    lexicon-based classification, as well as an analysis of its expected accuracy.
    One key assumption behind lexicon-based classification is that all words in
    each lexicon are equally predictive. This is rarely true in practice, which is
    why lexicon-based approaches are usually outperformed by supervised classifiers
    that learn distinct weights on each word from labeled instances. This paper
    shows that it is possible to learn such weights without labeled data, by
    leveraging co-occurrence statistics across the lexicons. This offers the best
    of both worlds: light supervision in the form of lexicons, and data-driven
    classification with higher accuracy than traditional word-counting heuristics.
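
    For reference, the uniform-weight heuristic under analysis is just a count
    comparison; the paper's method learns per-word weights (without labels) to
    replace the implicit weight of 1 on every lexicon entry. The lexicons below
    are toy examples:

    POSITIVE = {"good", "great", "excellent", "love"}
    NEGATIVE = {"bad", "awful", "terrible", "hate"}

    def lexicon_classify(doc):
        tokens = doc.lower().split()
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        return "positive" if score >= 0 else "negative"

    print(lexicon_classify("the plot was good but the acting was awful and terrible"))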

    A Hierarchical Approach for Generating Descriptive Image Paragraphs

    Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Recent progress on image captioning has made it possible to generate novel
    sentences describing images in natural language, but compressing an image into
    a single sentence can describe visual content in only coarse detail. While one
    new captioning approach, dense captioning, can potentially describe images in
    finer levels of detail by captioning many regions within an image, it in turn
    is unable to produce a coherent story for an image. In this paper we overcome
    these limitations by generating entire paragraphs for describing images, which
    can tell detailed, unified stories. We develop a model that decomposes both
    images and paragraphs into their constituent parts, detecting semantic regions
    in images and using a hierarchical recurrent neural network to reason about
    language. Linguistic analysis confirms the complexity of the paragraph
    generation task, and thorough experiments on a new dataset of image and
    paragraph pairs demonstrate the effectiveness of our approach.

    Recurrent Memory Addressing for describing videos

    Kumar Krishna Agrawal, Arnav Kumar Jain, Abhinav Agarwalla, Pabitra Mitra
    Comments: Under review at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

    Deep Neural Network architectures with external memory components allow the
    model to perform inference and capture long term dependencies, by storing
    information explicitly. In this paper, we generalize Key-Value Memory Networks
    to a multimodal setting, introducing a novel key-addressing mechanism to deal
    with sequence-to-sequence models. The advantages of the framework are
    demonstrated on the task of video captioning, i.e., generating natural language
    descriptions for videos. Conditioning on the previous time-step attention
    distributions for the key-value memory slots, we introduce a temporal structure
    in the memory addressing schema. The proposed model naturally decomposes the
    problem of video captioning into vision and language segments, dealing with
    them as key-value pairs. More specifically, we learn a semantic embedding \(v\)
    corresponding to each frame \(k\) in the video, thereby creating \((k, v)\) memory
    slots. This allows us to exploit the temporal dependencies at multiple
    hierarchies (in the recurrent key-addressing and in the language decoder).
    Exploiting this flexibility of the framework, we additionally capture spatial
    dependencies while mapping from the visual to semantic embedding. Extensive
    experiments on the Youtube2Text dataset demonstrate usefulness of recurrent
    key-addressing, while achieving competitive scores on BLEU@4, METEOR metrics
    against state-of-the-art models.
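
    A hedged sketch of the key-value read operation at the heart of such a model,
    with frame embeddings as keys and semantic embeddings as values (the paper's
    recurrence over previous attention distributions is omitted for brevity):

    import torch
    import torch.nn.functional as F

    def read_memory(query, keys, values):
        # query: (d,); keys: (slots, d); values: (slots, d_v)
        attention = F.softmax(keys @ query, dim=0)  # address slots by key match
        return attention @ values                   # blended semantic readout

    keys = torch.randn(12, 64)     # one key per video frame
    values = torch.randn(12, 128)  # learned semantic embedding per frame
    context = read_memory(torch.randn(64), keys, values)  # feeds the decoder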

    Generating machine-executable plans from end-user's natural-language instructions

    Rui Liu, Xiaoli Zhang
    Comments: 16 pages, 10 figures, article submitted to Robotics and Computer-Integrated Manufacturing, 2016 Aug
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)

    It is critical for advanced manufacturing machines to autonomously execute a
    task by following an end-user’s natural language (NL) instructions. However, NL
    instructions are usually ambiguous and abstract so that the machines may
    misunderstand and incorrectly execute the task. To address this NL-based
    human-machine communication problem and enable the machines to appropriately
    execute tasks by following the end-user’s NL instructions, we developed a
    Machine-Executable-Plan-Generation (exePlan) method. The exePlan method
    conducts task-centered semantic analysis to extract task-related information
    from ambiguous NL instructions. In addition, the method specifies machine
    execution parameters to generate a machine-executable plan by interpreting
    abstract NL instructions. To evaluate the exePlan method, an industrial robot
    Baxter was instructed by NL to perform three types of industrial tasks {‘drill
    a hole’, ‘clean a spot’, ‘install a screw’}. The experimental results showed
    that the exePlan method was effective in generating machine-executable plans
    from the end-user’s NL instructions. Such a method promises to endow a machine
    with the ability to execute tasks instructed in natural language.

    Gendered Conversation in a Social Game-Streaming Platform

    Supun Nakandala, Giovanni Luca Ciampaglia, Norman Makoto Su, Yong-Yeol Ahn
    Comments: 10 pages, 7 figures, 5 tables
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY)

    Online social media and games are increasingly replacing offline social
    activities. Social media is now an indispensable mode of communication; online
    gaming is not only a genuine social activity but also a popular spectator
    sport. With support for anonymity and larger audiences, online interaction
    shrinks social and geographical barriers. Despite such benefits, social
    disparities such as gender inequality persist in online social media. In
    particular, online gaming communities have been criticized for persistent
    gender disparities and objectification. As gaming evolves into a social
    platform, persistence of gender disparity is a pressing question. Yet, there
    are few large-scale, systematic studies of gender inequality and
    objectification in social gaming platforms. Here we analyze more than one
    billion chat messages from Twitch, a social game-streaming platform, to study
    how the gender of streamers is associated with the nature of conversation.
    Using a combination of computational text analysis methods, we show that
    gendered conversation and objectification are prevalent in chats. Female
    streamers receive significantly more objectifying comments while male streamers
    receive more game-related comments. This difference is more pronounced for
    popular streamers. There also exists a large number of users who post only on
    female or male streams. Employing a neural vector-space embedding (paragraph
    vector) method, we analyze gendered chat messages and create prediction models
    that (i) identify the gender of streamers based on messages posted in the
    channel and (ii) identify the gender a viewer prefers to watch based on their
    chat messages. Our findings suggest that disparity in social game-streaming
    platforms is a nuanced phenomenon involving both the gender of streamers and
    that of the users who produce gendered and game-related conversation.

    Spotting Rumors via Novelty Detection

    Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang
    Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Rumour detection is hard because the most accurate systems operate
    retrospectively, only recognizing rumours once they have collected repeated
    signals. By then the rumours might have already spread and caused harm. We
    introduce a new category of features based on novelty, tailored to detect
    rumours early on. To compensate for the absence of repeated signals, we make
    use of news wire as an additional data source. Unconfirmed (novel) information
    with respect to the news articles is considered as an indication of rumours.
    Additionally, we introduce pseudo feedback, which assumes that documents
    similar to previous rumours are more likely to be rumours themselves.
    Comparison with other real-time approaches shows that novelty-based features
    in conjunction with pseudo feedback perform significantly better when
    detecting rumours immediately after their publication.


    Distributed, Parallel, and Cluster Computing

    Population Protocols with Faulty Interactions: the Impact of a Leader

    Giuseppe Antonio Di Luna, Paola Flocchini, Taisuke Izumi, Tomoko Izumi, Nicola Santoro, Giovanni Viglietta
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider the problem of simulating traditional population protocols under
    weaker models of communication, which include one-way interactions (as opposed
    to two-way interactions) and omission faults (i.e., failure by an agent to read
    its partner’s state during an interaction), which in turn may be detectable or
    undetectable. We focus on the impact of a leader, and we give a complete
    characterization of the models in which the presence of a unique leader in the
    system allows the construction of simulators: when simulations are possible, we
    give explicit protocols; when they are not, we give proofs of impossibility.
    Specifically, if each agent has only a finite amount of memory, the simulation
    is possible only if there are no omission faults. If agents have an unbounded
    amount of memory, the simulation is possible as long as omissions are
    detectable. If an upper bound on the number of omissions involving the leader
    is known, the simulation is always possible, except in the one-way model in
    which one side is unable to detect the interaction.

    Demonstration of a context-switch method for heterogeneous reconfigurable systems

    Arief Wicaksana (TIMA), Alban Bourge (TIMA), Olivier Muller (TIMA), Frédéric Rousseau (TIMA)
    Journal-ref: 2016 26th International Conference on Field Programmable Logic and
    Applications (FPL), Aug 2016, Lausanne, Switzerland. pp.1 – 1, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Nowadays, FPGAs are integrated in high-performance computing systems,
    servers, or even used as accelerators in System-on-Chip (SoC) platforms. Since
    the execution is performed in hardware, an FPGA gives much higher performance
    and lower energy consumption compared to most microprocessor-based systems.
    However, there is still room to improve FPGA performance, e.g., when it is
    used by multiple users. In multi-user approaches, FPGA resources are shared
    between several users. Therefore, one must be able to interrupt a running
    circuit at any given time and continue the task at will. An image of the state
    of the running circuit (context) is saved during interruption and restored when
    the execution is continued. The ability to extract and restore the context is
    known as context-switch. In previous work [1], an automatic checkpoint
    selection method was proposed for circuit generation targeting reconfigurable
    systems. The method relies on static analysis of the finite state machine of a
    circuit to select the checkpoint states. States with minimum overhead are
    selected as checkpoints, which allows optimal context save and restore. The
    maximum time to reach a checkpoint is defined by the user and considered as
    the context-switch latency. The method is implemented in C code and integrated
    as a plugin in AUGH [2], a free and open-source High-Level Synthesis tool.

    A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

    Sandra Catalán, José R. Herrero, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez, Robert van de Geijn
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Performance (cs.PF)

    We propose two novel techniques for overcoming load-imbalance encountered
    when implementing so-called look-ahead mechanisms in relevant dense matrix
    factorizations for the solution of linear systems. Both techniques target the
    scenario where two thread teams are created/activated during the factorization,
    with each team in charge of performing an independent task/branch of execution.
    The first technique promotes worker sharing (WS) between the two tasks,
    allowing the threads of the task that completes first to be reallocated for use
    by the costlier task. The second technique allows a fast task to alert the
    slower task of completion, enforcing the early termination (ET) of the second
    task, and a smooth transition of the factorization procedure into the next
    iteration.

    The two mechanisms are instantiated via a new malleable thread-level
    implementation of the Basic Linear Algebra Subprograms (BLAS), and their
    benefits are illustrated via an implementation of the LU factorization with
    partial pivoting enhanced with look-ahead. Concretely, our experimental results
    on a six-core Intel Xeon processor show the benefits of combining WS+ET,
    reporting competitive performance in comparison with a task-parallel
    runtime-based solution.

    Gossiping with Latencies

    Seth Gilbert, Peter Robinson, Suman Sourav
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Consider the classical problem of information dissemination: one (or more)
    nodes in a network have some information that they want to distribute to the
    remainder of the network. In this paper, we study the cost of information
    dissemination in networks where edges have latencies, i.e., sending a message
    from one node to another takes some amount of time. We first generalize the
    idea of conductance to weighted graphs, defining \(\phi_*\) to be the “weighted
    conductance” and \(\ell_*\) to be the “critical latency.” One goal of this paper
    is to argue that \(\phi_*\) characterizes the connectivity of a weighted graph
    with latencies in much the same way that conductance characterizes the
    connectivity of unweighted graphs.

    We give near tight upper and lower bounds on the problem of information
    dissemination, up to polylogarithmic factors. Specifically, we show that in a
    graph with (weighted) diameter \(D\) (with latencies as weights), maximum degree
    \(\Delta\), weighted conductance \(\phi_*\) and critical latency \(\ell_*\), any
    information dissemination algorithm requires at least \(\Omega(\min(D+\Delta,
    \ell_*/\phi_*))\) time. We show several variants of the lower bound (e.g., for
    graphs with small diameter, graphs with small max-degree, etc.) by reduction to
    a simple combinatorial game.

    We then give nearly matching algorithms, showing that information
    dissemination can be solved in \(O(\min((D + \Delta)\log^3{n},
    (\ell_*/\phi_*)\log{n}))\) time. This is achieved by combining two cases. When
    nodes do not know the latency of the adjacent edges, we show that the classical
    push-pull algorithm is (near) optimal when the diameter or maximum degree is
    large. For the case where the diameter and maximum degree are small, we give an
    alternative strategy in which we first discover the latencies and then use an
    algorithm for known latencies based on a weighted spanner construction.
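
    To make the role of latencies concrete, here is a toy event-driven simulation
    of flooding-style dissemination (every informed node immediately informs all
    neighbors), in which each node learns the rumor at its weighted shortest-path
    distance from the source; this is the behavior behind the diameter term \(D\)
    in the bounds above. The graph is made up for illustration:

    import heapq

    def flood_times(adj, source):
        """adj[u] = list of (v, latency); returns the time each node is informed."""
        informed = {}
        events = [(0.0, source)]
        while events:
            t, v = heapq.heappop(events)
            if v in informed:
                continue
            informed[v] = t
            for w, latency in adj[v]:
                if w not in informed:
                    heapq.heappush(events, (t + latency, w))
        return informed

    adj = {0: [(1, 1.0), (2, 5.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 5.0), (1, 1.0)]}
    print(flood_times(adj, source=0))  # node 2 is reached via the faster two-hop path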

    A Survey of Methods for Collective Communication Optimization and Tuning

    Udayanga Wickramasinghe, Andrew Lumsdaine
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    New developments in HPC technology, in terms of increasing computing power on
    multi/many core processors, high-bandwidth memory/IO subsystems, and
    communication interconnects, have a direct impact on software and runtime
    system development. These advancements have become useful in producing
    high-performance collective communication interfaces that integrate efficiently
    on a wide variety of platforms and environments. However, the number of
    optimization options that show up with each new technology or software
    framework has resulted in a combinatorial explosion in the feature space for
    tuning collective parameters, such that finding the optimal set has become a
    nearly impossible task. The applicability of the algorithmic choices available
    for optimizing collective communication depends largely on the scalability
    requirements of a particular use case. This problem can be further exacerbated
    by any requirement to run collective problems at very large scales, such as in
    the case of exascale computing, where impractical brute-force tuning may
    require many months of resources. Therefore, the application of statistical,
    data mining, and artificial intelligence models, or more general hybrid
    learning models, seems essential in many collective parameter optimization
    problems. We explore the current and cutting-edge collective communication
    optimization and tuning methods and conclude with possible future directions
    for this problem.

    Towards a Complete Framework for Virtual Data Center Embedding

    M P Gilesh
    Comments: Technical Report, NIT Calicut
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud computing is widely adopted by corporate as well as retail customers to
    reduce the upfront cost of establishing computing infrastructure. However,
    switching to cloud-based services poses a multitude of questions, both for
    customers and for data center owners. In this work, we propose an algorithm for
    optimal placement of multiple virtual data centers on a physical data center.
    Our algorithm has two modes of operation – an online mode and a batch mode.
    Coordinated batch and online embedding algorithms are used to maximize resource
    usage while fulfilling the QoS demands. Experimental evaluation of our
    algorithms shows that the acceptance rate is high, implying higher profit to
    the infrastructure provider. Additionally, we try to keep a check on the number
    of VM migrations, which can increase operational cost and thus lead to service
    level agreement violations.

    A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

    Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer
    Subjects: Neural and Evolutionary Computing (cs.NE); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)

    In recent years, deep neural networks (DNNs), have yielded strong results on
    a wide range of applications. Graphics Processing Units (GPUs) have been one
    key enabling factor leading to the current popularity of DNNs. However, despite
    increasing hardware flexibility and software programming toolchain maturity,
    high efficiency GPU programming remains difficult: it suffers from high
    complexity, low productivity, and low portability. GPU vendors such as NVIDIA
    have spent enormous effort to write special-purpose DNN libraries. However, on
    other hardware targets, especially mobile GPUs, such vendor libraries are not
    generally available. Thus, the development of portable, open, high-performance,
    energy-efficient GPU code for DNN operations would enable broader deployment of
    DNN-based algorithms. Toward this end, this work presents a framework to enable
    productive, high-efficiency GPU programming for DNN computations across
    hardware platforms and programming models. In particular, the framework
    provides specific support for metaprogramming, autotuning, and DNN-tailored
    data types. Using our framework, we explore implementing DNN operations on
    three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA
    GPUs, we show both portability between OpenCL and CUDA as well as competitive
    performance compared to the vendor library. On Qualcomm GPUs, we show that our
    framework enables productive development of target-specific optimizations, and
    achieves reasonable absolute performance. Finally, on AMD GPUs, we show initial
    results that indicate our framework can yield reasonable performance on a new
    platform with minimal effort.

    Service-Oriented Sharding with Aspen

    Adem Efe Gencer, Robbert van Renesse, Emin Gün Sirer
    Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

    The rise of blockchain-based cryptocurrencies has led to an explosion of
    services using distributed ledgers as their underlying infrastructure. However,
    due to inherently single-service oriented blockchain protocols, such services
    can bloat the existing ledgers, fail to provide sufficient security, or
    completely forego the property of trustless auditability. Security concerns,
    trust restrictions, and scalability limits regarding the resource requirements
    of users hamper the sustainable development of loosely-coupled services on
    blockchains.

    This paper introduces Aspen, a sharded blockchain protocol designed to
    securely scale with an increasing number of services. Aspen shares the same
    trust model as Bitcoin in a peer-to-peer network that is prone to extreme churn
    and contains Byzantine participants. It enables the introduction of new
    services without compromising the security, leveraging the trust assumptions,
    or flooding users with irrelevant messages.

    Distributed Nonconvex Optimization for Sparse Representation

    Ying Sun, Gesualdo Scutari
    Comments: Submitted to ICASSP 2017
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)

    We consider a non-convex constrained Lagrangian formulation of a fundamental
    bi-criteria optimization problem for variable selection in statistical
    learning; the two criteria are a smooth (possibly nonconvex) loss function,
    measuring the fitness of the model to the data, and a difference-of-convex
    (DC) regularization, employed to promote some extra structure on the
    solution, such as sparsity. This general class of nonconvex
    problems arises in many big-data applications, from statistical machine
    learning to physical sciences and engineering. We develop the first unified
    distributed algorithmic framework for these problems and establish its
    asymptotic convergence to d-stationary solutions. Two key features of the
    method are: i) it can be implemented on arbitrary networks (digraphs) with
    (possibly) time-varying connectivity; and ii) it does not require the
    restrictive assumption that the (sub)gradient of the objective function is
    bounded, which enlarges significantly the class of statistical learning
    problems that can be solved with convergence guarantees.

    Deep Tensor Convolution on Multicores

    David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri, Nir Shavit
    Comments: 8 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

    Deep convolutional neural networks (ConvNets) have become a de facto standard
    for image classification and segmentation problems. These networks have also
    had early success in the video domain, despite failing to capture motion
    continuity and other rich temporal correlations. Evidence has since emerged
    that extending ConvNets to 3-dimensions leads to state-of-the-art performance
    across a broad set of video processing tasks by learning these joint
    spatiotemporal features. However, these early 3D networks have been restricted
    to shallower architectures of fewer channels than successful 2D networks due to
    memory constraints inherent to GPU implementations.

    In this study we present the first practical CPU implementation of tensor
    convolution optimized for deep networks of small kernels. Our implementation
    supports arbitrarily deep ConvNets of \(N\)-dimensional tensors due to the
    relaxed memory constraints of CPU systems, which can be further leveraged for
    an 8-fold reduction in the algorithmic cost of 3D convolution (e.g. C3D
    kernels). Because most of the optimized ConvNets in previous literature are 2
    rather than 3-dimensional, we benchmark our performance against the most
    popular 2D implementations. Even in this special case, which is theoretically
    the least beneficial for our fast algorithm, we observe a 5 to 25-fold
    improvement in throughput compared to previous state-of-the-art. We believe
    this work is an important step toward practical ConvNets for real-time
    applications, such as mobile video processing and biomedical image analysis,
    where high performance 3D networks are a must.


    Learning

    GRAM: Graph-based Attention Model for Healthcare Representation Learning

    Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, Jimeng Sun
    Comments: Under review for ICLR 2017
    Subjects: Learning (cs.LG)

    Deep learning methods exhibit promising performance for predictive modeling
    in healthcare, but two important challenges remain. Data insufficiency: often
    in healthcare predictive modeling, the sample size is insufficient for deep
    learning methods to achieve satisfactory results. Interpretation: the
    representations learned by deep learning models should align with medical
    knowledge. To address these challenges, we propose a GRaph-based Attention
    Model (GRAM) that supplements electronic health records (EHR) with hierarchical
    information inherent to medical ontologies. Based on the data volume and the
    ontology structure, GRAM represents a medical concept as a combination of its
    ancestors in the ontology via an attention mechanism. We compared predictive
    performance (i.e. accuracy, data needs, interpretability) of GRAM to various
    methods including the recurrent neural network (RNN) in two sequential
    diagnoses prediction tasks and one heart failure prediction task. Compared to
    the basic RNN, GRAM achieved 10% higher accuracy for predicting diseases rarely
    observed in the training data and 3% improved area under the ROC curve for
    predicting heart failure using an order of magnitude less training data.
    Additionally, unlike other methods, the medical concept representations learned
    by GRAM are well aligned with the medical ontology. Finally, GRAM exhibits
    intuitive attention behaviors by adaptively generalizing to higher level
    concepts when facing data insufficiency at the lower level concepts.
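
    A hedged sketch of the ancestor-attention step, in which a leaf code's final
    representation is a convex combination of basic embeddings along its ontology
    path; the attention MLP and sizes are simplified stand-ins:

    import torch
    import torch.nn as nn

    class AncestorAttention(nn.Module):
        def __init__(self, n_nodes, dim=64):
            super().__init__()
            self.base = nn.Embedding(n_nodes, dim)  # basic embedding per node
            self.att = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                     nn.Linear(dim, 1))

        def forward(self, leaf, ancestors):
            # leaf: scalar id; ancestors: (A,) ids, including the leaf itself
            e_leaf = self.base(leaf).unsqueeze(0).expand(len(ancestors), -1)
            e_anc = self.base(ancestors)
            alpha = torch.softmax(self.att(torch.cat([e_leaf, e_anc], -1)), dim=0)
            return (alpha * e_anc).sum(dim=0)  # final concept representation

    m = AncestorAttention(n_nodes=1000)
    g = m(torch.tensor(7), torch.tensor([7, 120, 543]))  # leaf plus two ancestors
    # Rare leaves lean on ancestor embeddings, which is the claimed source of
    # robustness to data insufficiency.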

    Associative Adversarial Networks

    Tarik Arici, Asli Celikyilmaz
    Comments: NIPS 2016 Workshop on Adversarial Training
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We propose a higher-level associative memory for learning adversarial
    networks. The generative adversarial network (GAN) framework has a
    discriminator and a generator network. The generator \(G\) maps white noise \(z\)
    to data samples while the discriminator \(D\) maps data samples to a single
    scalar. To do so, \(G\) learns how to map from the high-level representation
    space to the data space, and \(D\) learns to do the opposite. We argue that
    higher-level representation spaces
    need not necessarily follow a uniform probability distribution. In this work,
    we use Restricted Boltzmann Machines (RBMs) as a higher-level associative
    memory and learn the probability distribution of the high-level features
    generated by \(D\). The associative memory samples its underlying probability
    distribution and \(G\) learns how to map these samples to the data space. The
    proposed associative adversarial networks (AANs) are generative models in the
    higher levels of the learning, and use adversarial non-stochastic models \(D\)
    and \(G\) for learning the mapping between data and higher-level representation
    spaces.
    Experiments show the potential of the proposed networks.

    Unsupervised Learning for Lexicon-Based Classification

    Jacob Eisenstein
    Comments: to appear in AAAI 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

    In lexicon-based classification, documents are assigned labels by comparing
    the number of words that appear from two opposed lexicons, such as positive and
    negative sentiment. Creating such word lists is often easier than labeling
    instances, and they can be debugged by non-experts if classification
    performance is unsatisfactory. However, there is little analysis or
    justification of this classification heuristic. This paper describes a set of
    assumptions that can be used to derive a probabilistic justification for
    lexicon-based classification, as well as an analysis of its expected accuracy.
    One key assumption behind lexicon-based classification is that all words in
    each lexicon are equally predictive. This is rarely true in practice, which is
    why lexicon-based approaches are usually outperformed by supervised classifiers
    that learn distinct weights on each word from labeled instances. This paper
    shows that it is possible to learn such weights without labeled data, by
    leveraging co-occurrence statistics across the lexicons. This offers the best
    of both worlds: light supervision in the form of lexicons, and data-driven
    classification with higher accuracy than traditional word-counting heuristics.

    Learning From Graph Neighborhoods Using LSTMs

    Rakshit Agrawal, Luca de Alfaro, Vassilis Polychronopoulos
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Many prediction problems can be phrased as inferences over local
    neighborhoods of graphs. The graph represents the interaction between entities,
    and the neighborhood of each entity contains information that allows the
    inferences or predictions. We present an approach for applying machine learning
    directly to such graph neighborhoods, yielding predictions for graph nodes on
    the basis of the structure of their local neighborhood and the features of the
    nodes in it. Our approach allows predictions to be learned directly from
    examples, bypassing the step of creating and tuning an inference model or
    summarizing the neighborhoods via a fixed set of hand-crafted features. The
    approach is based on a multi-level architecture built from Long Short-Term
    Memory neural nets (LSTMs); the LSTMs learn how to summarize the neighborhood
    from data. We demonstrate the effectiveness of the proposed technique on a
    synthetic example and on real-world data related to crowdsourced grading,
    Bitcoin transactions, and Wikipedia edit reversions.

    Options Discovery with Budgeted Reinforcement Learning

    Aurélia Léon, Ludovic Denoyer
    Comments: Under review as a conference paper at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    We consider the problem of learning hierarchical policies for Reinforcement
    Learning able to discover options, an option corresponding to a sub-policy over
    a set of primitive actions. Different models have been proposed during the last
    decade that usually rely on a predefined set of options. We specifically
    address the problem of automatically discovering options in decision processes.
    We describe a new RL learning framework called Bi-POMDP, and a new learning
    model called Budgeted Option Neural Network (BONN) able to discover options
    based on a budgeted learning objective. Since Bi-POMDPs are more general than
    POMDPs, our model can also be used to discover options for classical RL tasks.
    The BONN model is evaluated on different classical RL problems, demonstrating
    interesting results both quantitatively and qualitatively.

    Generalized Dropout

    Suraj Srinivas, R. Venkatesh Babu
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    Deep Neural Networks often require good regularizers to generalize well.
    Dropout is one such regularizer that is widely used among Deep Learning
    practitioners. Recent work has shown that Dropout can also be viewed as
    performing Approximate Bayesian Inference over the network parameters. In this
    work, we generalize this notion and introduce a rich family of regularizers
    which we call Generalized Dropout. One set of methods in this family, called
    Dropout++, is a version of Dropout with trainable parameters. Classical Dropout
    emerges as a special case of this method. Another member of this family selects
    the width of neural network layers. Experiments show that these methods help in
    improving generalization performance over Dropout.

    Effective Deterministic Initialization for \(k\)-Means-Like Methods via Local Density Peaks Searching

    Fengfu Li, Hong Qiao, Bo Zhang
    Comments: 16 pages, 9 figures, journal paper
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    The \(k\)-means clustering algorithm is popular but has the following main
    drawbacks: 1) the number of clusters, \(k\), needs to be provided by the user in
    advance, 2) it can easily reach local minima with randomly selected initial
    centers, 3) it is sensitive to outliers, and 4) it can only deal with well
    separated hyperspherical clusters. In this paper, we propose a Local Density
    Peaks Searching (LDPS) initialization framework to address these issues. The
    LDPS framework includes two basic components: one of them is the local density
    that characterizes the density distribution of a data set, and the other is the
    local distinctiveness index (LDI) which we introduce to characterize how
    distinctive a data point is compared with its neighbors. Based on these two
    components, we search for the local density peaks which are characterized with
    high local densities and high LDIs to deal with 1) and 2). Moreover, we detect
    outliers characterized with low local densities but high LDIs, and exclude them
    out before clustering begins. Finally, we apply the LDPS initialization
    framework to \(k\)-medoids, which is a variant of \(k\)-means that chooses data
    samples as centers, with diverse similarity measures other than the Euclidean
    distance, to fix the last drawback of \(k\)-means. Combining the LDPS
    initialization framework with \(k\)-means and \(k\)-medoids, we obtain two novel
    clustering methods called LDPS-means and LDPS-medoids, respectively.
    Experiments on synthetic data sets verify the effectiveness of the proposed
    methods, especially when the ground truth of the cluster number \(k\) is large.
    Further, experiments on several real world data sets, Handwritten Pendigits,
    Coil-20, Coil-100 and the Olivetti Face Database, illustrate that our methods
    give superior performance compared to the analogous approaches on both
    estimating \(k\) and unsupervised object categorization.
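
    A compact sketch of a density-peaks-style selection, assuming Euclidean data:
    a Gaussian-kernel local density plus, as a stand-in for the paper's LDI, the
    distance to the nearest point of higher density. Points high on both are
    taken as initial centers; low-density but distinctive points would be the
    outlier candidates:

    import numpy as np

    def density_peaks_init(X, k, bandwidth=1.0):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        rho = np.exp(-(d / bandwidth) ** 2).sum(axis=1)   # local density
        delta = np.empty(len(X))                          # distinctiveness proxy
        for i in range(len(X)):
            higher = np.where(rho > rho[i])[0]
            delta[i] = d[i].max() if len(higher) == 0 else d[i, higher].min()
        return X[np.argsort(rho * delta)[-k:]]            # top-k local peaks

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    print(density_peaks_init(X, k=2))  # typically one center per true cluster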

    Probabilistic Duality for Parallel Gibbs Sampling without Graph Coloring

    Lars Mescheder, Sebastian Nowozin, Andreas Geiger
    Subjects: Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)

    We present a new notion of probabilistic duality for random variables
    involving mixture distributions. Using this notion, we show how to implement a
    highly-parallelizable Gibbs sampler for weakly coupled discrete pairwise
    graphical models with strictly positive factors that requires almost no
    preprocessing and is easy to implement. Moreover, we show how our method can be
    combined with blocking to improve mixing. Even though our method leads to
    inferior mixing times compared to a sequential Gibbs sampler, we argue that our
    method is still very useful for large dynamic networks, where factors are added
    and removed on a continuous basis, as it is hard to maintain a graph coloring
    in this setup. Similarly, our method is useful for parallelizing Gibbs sampling
    in graphical models that do not allow for graph colorings with a small number
    of colors such as densely connected graphs.

    Temporal Generative Adversarial Nets

    Masaki Saito, Eiichi Matsumoto
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    In this paper we propose a generative model, the Temporal Generative
    Adversarial Network (TGAN), which can learn a semantic representation of
    unlabelled videos, and is capable of generating consistent videos. Unlike an
    existing GAN that generates videos with a generator consisting of 3D
    deconvolutional layers, our model exploits two types of generators: a temporal
    generator and an image generator. The temporal generator consists of 1D
    deconvolutional layers and outputs a set of latent variables, each of which
    corresponds to a frame in the generated video, and the image generator
    transforms them into a video with 2D deconvolutional layers. This
    representation allows efficient training of the network parameters. Moreover,
    it can handle a wider range of applications including the generation of a long
    sequence, frame interpolation, and the use of pre-trained models. Experimental
    results demonstrate the effectiveness of our method.

    Prototypical Recurrent Unit

    Dingkun Long, Richong Zhang, Yongyi Mao
    Subjects: Learning (cs.LG)

    The difficulty in analyzing LSTM-like recurrent neural networks lies in the
    complex structure of the recurrent unit, which induces highly complex nonlinear
    dynamics. In this paper, we design a new simple recurrent unit, which we call
    Prototypical Recurrent Unit (PRU). We verify experimentally that PRU performs
    comparably to LSTM and GRU. This potentially enables PRU to be a prototypical
    example for the analytic study of LSTM-like recurrent networks. Alongside
    these experiments, the memorization capability of LSTM-like networks is also
    studied and some insights are obtained.

    Dealing with Range Anxiety in Mean Estimation via Statistical Queries

    Vitaly Feldman
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In the statistical query (SQ) model an algorithm has access to an SQ oracle
    for the input distribution \(D\) over \(X\) instead of i.i.d. samples from \(D\).
    Given a query function \(\phi : X \rightarrow [-1,1]\), the oracle returns an
    estimate of \({\bf E}_{{\bf x} \sim D}[\phi({\bf x})]\) within some tolerance
    \(\tau\). In a variety of natural problems it is necessary to estimate
    expectations of functions whose standard deviation is much smaller than the
    range. In this note we describe a nearly optimal algorithm for estimation of
    such expectations via statistical queries. As applications, we give algorithms
    for high dimensional mean estimation in the SQ model and in the distributed
    setting where only a single bit is communicated from each sample.
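
    A toy simulation of the setting, showing why a naive mean query suffers from
    “range anxiety”: the tolerance is relative to the query's [-1, 1] range, not
    to the (much smaller) standard deviation of the data:

    import numpy as np

    def sq_oracle(samples, phi, tau, rng):
        """Returns the true expectation of phi perturbed by at most tau."""
        return np.mean([phi(x) for x in samples]) + rng.uniform(-tau, tau)

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.1, scale=0.01, size=10_000)  # std << range
    # Naive query phi(x) = x: the tolerance (0.05) swamps the signal (std 0.01).
    print(sq_oracle(data, lambda x: np.clip(x, -1, 1), tau=0.05, rng=rng))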

    Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

    Zhiguang Wang, Weizhong Yan, Tim Oates
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We propose a simple but strong baseline for time series classification from
    scratch with deep neural networks. Our proposed baseline models are pure
    end-to-end, without any heavy preprocessing of the raw data or feature
    crafting. Our fully convolutional network (FCN) achieves premium performance
    compared to other state-of-the-art approaches, and our exploration of very
    deep neural networks with the ResNet structure achieves competitive
    performance under the same simple experimental settings. The simple MLP
    baseline is also comparable to 1NN-DTW, a previous golden baseline. Our models
    provide a simple choice for real world applications and a good starting point
    for future research. An overall analysis is provided to discuss the
    generalization of our models, learned features, network structures, and the
    classification semantics.
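
    A sketch of an FCN-style baseline for univariate series, assuming the common
    conv / batch-norm / ReLU blocks with global average pooling (layer sizes
    follow common practice rather than necessarily the paper's configuration):

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out, k):
        return nn.Sequential(nn.Conv1d(c_in, c_out, k, padding=k // 2),
                             nn.BatchNorm1d(c_out), nn.ReLU())

    class FCNBaseline(nn.Module):
        def __init__(self, n_classes):
            super().__init__()
            self.net = nn.Sequential(conv_block(1, 128, 8),
                                     conv_block(128, 256, 5),
                                     conv_block(256, 128, 3))
            self.fc = nn.Linear(128, n_classes)

        def forward(self, x):  # x: (batch, 1, length), raw series values
            return self.fc(self.net(x).mean(dim=-1))  # global average pooling

    logits = FCNBaseline(n_classes=5)(torch.randn(16, 1, 120))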

    Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning

    Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz
    Comments: 5 pages, 8 figures, NIPS workshop: The 1st International Workshop on Efficient Methods for Deep Neural Networks
    Subjects: Learning (cs.LG)

    We propose a new framework for pruning convolutional kernels in neural
    networks to enable efficient inference, focusing on transfer learning where
    large and potentially unwieldy pretrained networks are adapted to specialized
    tasks. We interleave greedy criteria-based pruning with fine-tuning by
    backpropagation – a computationally efficient procedure that maintains good
    generalization in the pruned network. We propose a new criterion based on an
    efficient first-order Taylor expansion to approximate the absolute change in
    training cost induced by pruning a network component. After normalization, the
    proposed criterion scales appropriately across all layers of a deep CNN,
    eliminating the need for per-layer sensitivity analysis. The proposed criterion
    demonstrates superior performance compared to other criteria, such as the norm
    of kernel weights or average feature map activation.
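
    A minimal sketch of such a first-order Taylor criterion follows: the
    per-channel score is the absolute mean of activation times gradient, then
    scores are L2-normalized within each layer. The exact reduction and
    normalization used by the authors may differ; this is an illustration of
    the idea, not their implementation.

        import torch

        def taylor_criterion(activation, grad):
            """First-order Taylor score per channel: |mean of (dC/dz * z)|,
            averaged over batch and spatial positions."""
            # activation, grad: (batch, channels, spatial...)
            contrib = (activation * grad).flatten(2).mean(dim=(0, 2))
            return contrib.abs()

        def normalize_per_layer(scores):
            """L2-normalize within a layer so criteria are comparable across layers."""
            return scores / (scores.norm(p=2) + 1e-8)

        # toy usage: rank channels of one conv layer for pruning
        act = torch.randn(16, 32, 28, 28)    # saved activations
        grad = torch.randn(16, 32, 28, 28)   # gradients of the loss w.r.t. activations
        scores = normalize_per_layer(taylor_criterion(act, grad))
        prune_order = scores.argsort()       # prune lowest-scoring channels first
        print(prune_order[:5])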

    Quantized neural network design under weight capacity constraint

    Sungho Shin, Kyuyeon Hwang, Wonyong Sung
    Comments: This paper is accepted at NIPS 2016 workshop on Efficient Methods for Deep Neural Networks (EMDNN). arXiv admin note: text overlap with arXiv:1511.06488
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The complexity of deep neural network algorithms for hardware implementation
    can be lowered either by scaling the number of units or reducing the
    word-length of weights. Both approaches, however, can degrade performance,
    although much research has been conducted to alleviate this problem. Thus,
    an important question is which of the two, network size scaling or weight
    quantization, is more effective for hardware
    optimization. For this study, the performances of fully-connected deep neural
    networks (FCDNNs) and convolutional neural networks (CNNs) are evaluated while
    changing the network complexity and the word-length of weights. Based on these
    experiments, we present the effective compression ratio (ECR) to guide the
    trade-off between the network size and the precision of weights when the
    hardware resource is limited.
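
    The sketch below illustrates one plausible reading of such a trade-off
    metric: comparing the total weight-storage bits of a quantized network
    against a full-precision network of matching accuracy. The exact ECR
    definition in the paper may differ, and the numbers here are invented.

        def total_weight_bits(n_params, wordlength):
            """Storage cost of a network's weights in bits."""
            return n_params * wordlength

        def effective_compression_ratio(base_params, base_bits, quant_params, quant_bits):
            """Illustrative ECR (definition assumed): bit budget of a float
            baseline divided by that of an accuracy-matched quantized network."""
            return (total_weight_bits(base_params, base_bits)
                    / total_weight_bits(quant_params, quant_bits))

        # e.g. a 2-bit network needing 1.5x the parameters of a float32 net:
        print(effective_compression_ratio(1_000_000, 32, 1_500_000, 2))  # ~10.7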

    Cross-model convolutional neural network for multiple modality data representation

    Yanbin Wu, Li Wang, Fan Cui, Hongbin Zhai, Baoming Dong, Jim Jing-Yan Wang
    Subjects: Learning (cs.LG)

    A novel data representation method based on convolutional neural networks
    (CNNs) is proposed in this paper to represent data of different modalities.
    We learn a CNN model for the data of each modality to map the data of
    different modalities to a common space, and regularize the new
    representations in the common space by a cross-model relevance matrix. We
    further impose that the class labels of data points can also be predicted
    from the CNN representations in the common space. The learning problem is
    modeled as a minimization problem, which is solved by an augmented Lagrange
    method (ALM) with updating rules of the alternating direction method of
    multipliers (ADMM). Experiments over benchmarks of multi-modal sequence
    data show its advantage.

    GA3C: GPU-based A3C for Deep Reinforcement Learning

    Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz
    Subjects: Learning (cs.LG)

    We introduce and analyze the computational aspects of a hybrid CPU/GPU
    implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm,
    currently the state-of-the-art method in reinforcement learning for various
    gaming tasks. Our analysis concentrates on the critical aspects to leverage the
    GPU’s computational power, including the introduction of a system of queues and
    a dynamic scheduling strategy, potentially helpful for other asynchronous
    algorithms as well. We also show the potential for the use of larger DNN models
    on a GPU. Our TensorFlow implementation achieves a significant speed up
    compared to our CPU-only implementation, and it will be made publicly available
    to other researchers.
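
    A minimal sketch of such a queue system: agents push states into a
    prediction queue that a predictor thread drains into dynamically sized
    batches for single GPU calls, while a trainer thread consumes experience
    batches. The model calls are placeholders (commented out), and queue sizes
    and timeouts are assumptions.

        import queue
        import threading
        import time

        prediction_q = queue.Queue(maxsize=128)  # agents -> predictor: states
        training_q = queue.Queue(maxsize=128)    # agents -> trainer: experiences

        def predictor(max_batch=32, wait=0.005):
            """Drains the prediction queue into a dynamically sized batch."""
            while True:
                batch = [prediction_q.get()]
                while len(batch) < max_batch:
                    try:
                        batch.append(prediction_q.get(timeout=wait))
                    except queue.Empty:
                        break  # don't stall agents waiting for a full batch
                # policies = model.predict(batch)  # one batched forward pass (model assumed)

        def trainer():
            """Consumes experience batches and applies batched gradient updates."""
            while True:
                batch = training_q.get()
                # model.train_on_batch(batch)      # batched backward pass (model assumed)

        for target in (predictor, trainer):
            threading.Thread(target=target, daemon=True).start()

        for i in range(100):            # agents would enqueue states like this
            prediction_q.put(("state", i))
        time.sleep(0.2)                 # let the predictor drain a few batches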

    Robust end-to-end deep audiovisual speech recognition

    Ramon Sanabria, Florian Metze, Fernando De La Torre
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Sound (cs.SD)

    Speech is one of the most effective ways of communication among humans. Even
    though audio is the most common way of transmitting speech, very important
    information can be found in other modalities, such as vision. Vision is
    particularly useful when the acoustic signal is corrupted. Multi-modal
    speech recognition, however, has not yet found widespread use, mostly
    because the temporal alignment and fusion of the different information
    sources are challenging.

    This paper presents an end-to-end audiovisual speech recognizer (AVSR), based
    on recurrent neural networks (RNN) with a connectionist temporal classification
    (CTC) loss function. CTC creates sparse “peaky” output activations, and we
    analyze the differences in the alignments of output targets (phonemes or
    visemes) between audio-only, video-only, and audio-visual feature
    representations. We present the first such experiments on the large vocabulary
    IBM ViaVoice database, which outperform previously published approaches on
    phone accuracy in clean and noisy conditions.

    Statistical Learning for OCR Text Correction

    Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Learning (cs.LG)

    The accuracy of Optical Character Recognition (OCR) is crucial to the
    success of subsequent applications in the text-analysis pipeline. Recent
    models of OCR post-processing significantly improve the quality of
    OCR-generated text, but are still prone to suggesting correction candidates
    from limited observations while insufficiently accounting for the
    characteristics of OCR errors. In this paper, we show how to enlarge the
    candidate suggestion space by using an external corpus and integrating
    OCR-specific features in a regression approach to
    corpus and integrating OCR-specific features in a regression approach to
    correct OCR-generated errors. The evaluation results show that our model can
    correct 61.5% of the OCR-errors (considering the top 1 suggestion) and 71.5% of
    the OCR-errors (considering the top 3 suggestions), for cases where the
    theoretical correction upper-bound is 78%.

    Probabilistic structure discovery in time series data

    David Janz, Brooks Paige, Tom Rainforth, Jan-Willem van de Meent, Frank Wood
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Existing methods for structure discovery in time series data construct
    interpretable, compositional kernels for Gaussian process regression models.
    While the learned Gaussian process model provides posterior mean and variance
    estimates, typically the structure is learned via a greedy optimization
    procedure. This restricts the space of possible solutions and leads to
    over-confident uncertainty estimates. We introduce a fully Bayesian approach,
    inferring a full posterior over structures, which more reliably captures the
    uncertainty of the model.

    Emergence of Compositional Representations in Restricted Boltzmann Machines

    Jérôme Tubiana (LPTENS), Rémi Monasson (LPTENS)
    Comments: Supplementary material available at the authors’ webpage
    Subjects: Data Analysis, Statistics and Probability (physics.data-an); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG); Machine Learning (stat.ML)

    Extracting automatically the complex set of features composing real
    high-dimensional data is crucial for achieving high performance in
    machine-learning tasks. Restricted Boltzmann Machines (RBM) are empirically
    known to be efficient for this purpose, and to be able to generate distributed
    and graded representations of the data. We characterize the structural
    conditions (sparsity of the weights, low effective temperature, nonlinearities
    in the activation functions of hidden units, and adaptation of fields
    maintaining the activity in the visible layer) allowing RBM to operate in such
    a compositional phase. Evidence is provided by the replica analysis of an
    adequate statistical ensemble of random RBMs and by RBMs trained on the
    MNIST handwritten digit dataset.

    On the convergence of gradient-like flows with noisy gradient input

    Panayotis Mertikopoulos, Mathias Staudigl
    Comments: 33 pages, 3 figures
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Dynamical Systems (math.DS)

    In view of solving convex optimization problems with noisy gradient input, we
    analyze the asymptotic behavior of gradient-like flows that are subject to
    stochastic disturbances. Specifically, we focus on the widely studied class of
    mirror descent methods for constrained convex programming and we examine the
    dynamics’ convergence and concentration properties in the presence of noise. In
    the small noise limit, we show that the dynamics converge to the solution set
    of the underlying problem (a.s.). Otherwise, if the noise is persistent, we
    estimate the measure of the dynamics’ long-run concentration around interior
    solutions and their convergence to boundary solutions that are sufficiently
    “robust”. Finally, we show that a rectified variant of the method with a
    decreasing sensitivity parameter converges irrespective of the magnitude of the
    noise or the structure of the underlying convex program, and we derive an
    explicit estimate for its rate of convergence.
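
    As a discrete-time analogue of these dynamics, the sketch below runs
    entropic mirror descent (exponentiated gradient) on the simplex with a
    noisy gradient oracle and a decreasing sensitivity parameter
    eta_t ~ 1/sqrt(t). The toy objective and noise model are our own; the
    paper itself studies continuous-time flows.

        import numpy as np

        rng = np.random.default_rng(1)

        def noisy_entropic_mirror_descent(grad, x0, T=5000, noise_std=0.5):
            """Entropic mirror descent on the simplex with noisy gradients and
            a decreasing sensitivity parameter eta_t = 1/sqrt(t)."""
            x = x0.copy()
            avg = np.zeros_like(x)
            for t in range(1, T + 1):
                g = grad(x) + noise_std * rng.standard_normal(x.shape)  # noisy oracle
                eta = 1.0 / np.sqrt(t)
                w = x * np.exp(-eta * g)   # mirror step for the entropic regularizer
                x = w / w.sum()            # back onto the simplex
                avg += (x - avg) / t       # ergodic average
            return avg

        # minimize f(x) = ||x - p||^2 over the simplex, p an interior point
        p = np.array([0.5, 0.3, 0.2])
        sol = noisy_entropic_mirror_descent(lambda x: 2 * (x - p), np.ones(3) / 3)
        print(sol)  # concentrates near p despite persistent noise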

    Training Sparse Neural Networks

    Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep neural networks with lots of parameters are typically used for
    large-scale computer vision tasks such as image classification. This is a
    result of using dense matrix multiplications and convolutions. However, sparse
    computations are known to be much more efficient. In this work, we train and
    build neural networks which implicitly use sparse computations. We introduce
    additional gate variables to perform parameter selection and show that this is
    equivalent to using a spike-and-slab prior. We experimentally validate our
    method on both small and large networks and achieve state-of-the-art
    compression results for sparse neural network models.
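
    A minimal sketch of the gate-variable idea, under our own parameterization
    (real-valued per-weight gates with an L1 penalty standing in for the
    spike-and-slab relaxation; the paper's exact formulation may differ):

        import torch
        import torch.nn as nn

        class GatedLinear(nn.Module):
            """Linear layer with per-weight gates; penalizing the gates drives
            weights to exact zero, giving sparse compute at test time."""
            def __init__(self, d_in, d_out):
                super().__init__()
                self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
                self.gate = nn.Parameter(torch.ones(d_out, d_in))  # relaxed binary gates

            def forward(self, x):
                return x @ (self.weight * self.gate).t()

            def gate_penalty(self):
                return self.gate.abs().sum()  # stand-in for a spike-and-slab prior

        layer = GatedLinear(64, 10)
        x = torch.randn(4, 64)
        loss = layer(x).pow(2).mean() + 1e-3 * layer.gate_penalty()
        loss.backward()  # gates receive gradients like any other parameter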

    Error analysis of regularized least-square regression with Fredholm kernel

    Yanfang Tao, Peipei Yuan, Biqin Song
    Subjects: Statistics Theory (math.ST); Learning (cs.LG)

    Learning with Fredholm kernel has attracted increasing attention recently
    since it can effectively utilize the data information to improve the prediction
    performance. Despite rapid progress on theoretical and experimental
    evaluations, its generalization analysis has not been explored in learning
    theory literature. In this paper, we establish a generalization bound for
    least-squares regularized regression with the Fredholm kernel, which
    implies that the fast learning rate (O(l^{-1})) can be reached under mild
    capacity conditions. Simulated examples show that this Fredholm regression
    algorithm can achieve satisfactory prediction performance.

    Scalable Adaptive Stochastic Optimization Using Random Projections

    Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen
    Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Adaptive stochastic gradient methods such as AdaGrad have gained popularity
    in particular for training deep neural networks. The most commonly used and
    studied variant maintains a diagonal matrix approximation to second order
    information by accumulating past gradients which are used to tune the step size
    adaptively. In certain situations the full-matrix variant of AdaGrad is
    expected to attain better performance, however in high dimensions it is
    computationally impractical. We present Ada-LR and RadaGrad, two computationally
    efficient approximations to full-matrix AdaGrad based on randomized
    dimensionality reduction. They are able to capture dependencies between
    features and achieve similar performance to full-matrix AdaGrad but at a much
    smaller computational cost. We show that the regret of Ada-LR is close to the
    regret of full-matrix AdaGrad which can have an up-to exponentially smaller
    dependence on the dimension than the diagonal variant. Empirically, we show
    that Ada-LR and RadaGrad perform similarly to full-matrix AdaGrad. On the task
    of training convolutional neural networks as well as recurrent neural networks,
    RadaGrad achieves faster convergence than diagonal AdaGrad.
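
    The sketch below conveys the flavor of such a randomized approximation:
    gradients are sketched with a Gaussian random projection, full-matrix
    AdaGrad is maintained in the small subspace, and the preconditioned step is
    lifted back. The actual Ada-LR and RadaGrad constructions differ in their
    sketches and corrections; everything here is an illustrative assumption.

        import numpy as np

        rng = np.random.default_rng(0)

        def projected_full_adagrad(grad, x0, k=20, T=500, eta=0.5, eps=1e-8):
            """Full-matrix AdaGrad run on randomly sketched gradients; the
            preconditioned step is lifted back to the original d dimensions."""
            d = x0.size
            Pi = rng.standard_normal((k, d)) / np.sqrt(k)  # Gaussian random projection
            G = np.zeros((k, k))
            x = x0.copy()
            for _ in range(T):
                g = grad(x)
                gp = Pi @ g                                # sketched gradient
                G += np.outer(gp, gp)
                vals, vecs = np.linalg.eigh(G)             # k x k, cheap for small k
                G_isqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
                x -= eta * (Pi.T @ (G_isqrt @ gp))         # preconditioned, lifted step
            return x

        A = np.diag(np.logspace(0, 2, 50))                 # ill-conditioned quadratic
        x = projected_full_adagrad(lambda v: A @ v, np.ones(50))
        print(np.linalg.norm(x))  # reduced along the sketched subspace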

    Deep Learning for the Classification of Lung Nodules

    He Yang, Hengyong Yu, Ge Wang
    Subjects: Quantitative Methods (q-bio.QM); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Deep learning, as a promising new area of machine learning, has attracted
    rapidly increasing attention in the field of medical imaging. Compared to
    conventional machine learning methods, deep learning requires no hand-tuned
    feature extractor, and has shown a superior performance in many visual object
    recognition applications. In this study, we develop a deep convolutional neural
    network (CNN) and apply it to thoracic CT images for the classification of lung
    nodules. We present the CNN architecture and classification accuracy for the
    original images of lung nodules. In order to understand the features of lung
    nodules, we further construct new datasets, based on the combination of
    artificial geometric nodules and some transformations of the original images,
    as well as a stochastic nodule shape model. It is found that simplistic
    geometric nodules cannot capture the important features of lung nodules.

    Variational Boosting: Iteratively Refining Posterior Approximations

    Andrew C. Miller, Nicholas Foti, Ryan P. Adams
    Comments: 21 pages, 9 figures
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Methodology (stat.ME)

    We propose a black-box variational inference method to approximate
    intractable distributions with an increasingly rich approximating class. Our
    method, termed variational boosting, iteratively refines an existing
    variational approximation by solving a sequence of optimization problems,
    allowing the practitioner to trade computation time for accuracy. We show how
    to expand the variational approximating class by incorporating additional
    covariance structure and by introducing new components to form a mixture. We
    apply variational boosting to synthetic and real statistical models, and show
    that resulting posterior inferences compare favorably to existing posterior
    approximation algorithms in both accuracy and efficiency.

    Efficient Stochastic Inference of Bitwise Deep Neural Networks

    Sebastian Vogel, Christoph Schorn, Andre Guntoro, Gerd Ascheid
    Comments: 6 pages, 3 figures, Workshop on Efficient Methods for Deep Neural Networks at Neural Information Processing Systems Conference 2016, NIPS 2016, EMDNN 2016
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    Recently published methods enable training of bitwise neural networks which
    allow reduced representation of down to a single bit per weight. We present a
    method that exploits ensemble decisions based on multiple stochastically
    sampled network models to increase performance figures of bitwise neural
    networks in terms of classification accuracy at inference. Our experiments with
    the CIFAR-10 and GTSRB datasets show that the performance of such network
    ensembles surpasses the performance of the high-precision base model. With this
    technique we achieve a best classification error of 5.81% on the CIFAR-10
    test set using bitwise networks. Concerning inference on embedded systems,
    we evaluate these
    bitwise networks using a hardware efficient stochastic rounding procedure. Our
    work contributes to efficient embedded bitwise neural networks.
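
    A toy sketch of the ensemble idea, using a single linear layer as a
    stand-in for a full network: trained high-precision weights are
    stochastically rounded to one bit several times, and the resulting model
    outputs are averaged. The rounding rule and sizes are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        def stochastic_binarize(w):
            """Stochastically round weights to {-1, +1}: P(+1) = (1 + clip(w)) / 2."""
            p = (np.clip(w, -1, 1) + 1) / 2
            return np.where(rng.random(w.shape) < p, 1.0, -1.0)

        def ensemble_predict(w, x, n_models=8):
            """Average outputs of several stochastically sampled bitwise models."""
            logits = [x @ stochastic_binarize(w) for _ in range(n_models)]
            return np.mean(logits, axis=0)

        w = rng.normal(scale=0.5, size=(32, 10))  # trained high-precision weights
        x = rng.normal(size=(4, 32))
        print(ensemble_predict(w, x).shape)       # (4, 10): ensemble-averaged scores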

    Fast Video Classification via Adaptive Cascading of Deep Models

    Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Recent advances have enabled “oracle” classifiers that can classify across
    many classes and input distributions with high accuracy without retraining.
    However, these classifiers are relatively heavyweight, so that applying them to
    classify video is costly. We show that day-to-day video exhibits highly skewed
    class distributions over the short term, and that these distributions can be
    classified by much simpler models. We formulate the problem of detecting the
    short-term skews online and exploiting models based on it as a new sequential
    decision making problem dubbed the Online Bandit Problem, and present a new
    algorithm to solve it. When applied to recognizing faces in TV shows and
    movies, we realize end-to-end classification speedups of 2.5-8.5x/2.8-12.7x (on
    GPU/CPU) relative to a state-of-the-art convolutional neural network, at
    competitive accuracy.
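
    A minimal sketch of the cascading step (not the Online Bandit learning
    itself, which selects and updates the specialized models online): trust a
    cheap model when its confidence clears a threshold, otherwise fall back to
    the heavyweight oracle. The models and threshold below are toy stand-ins.

        import numpy as np

        def cascade_predict(x, cheap_model, oracle_model, threshold=0.9):
            """Adaptive cascade: use the cheap specialized model when confident,
            else fall back to the oracle classifier."""
            probs = cheap_model(x)
            conf, label = probs.max(), probs.argmax()
            if conf >= threshold:
                return label, "cheap"
            return oracle_model(x).argmax(), "oracle"

        # toy stand-ins returning class probabilities
        cheap = lambda x: np.array([0.95, 0.05]) if x.sum() > 0 else np.array([0.5, 0.5])
        oracle = lambda x: np.array([0.1, 0.9])

        print(cascade_predict(np.ones(4), cheap, oracle))    # served by the cheap model
        print(cascade_predict(-np.ones(4), cheap, oracle))   # falls back to the oracle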

    A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective

    SamanehSorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi
    Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Credit cards play a very important role in today’s economy and have become
    an unavoidable part of household, business and global activities. Although
    using credit cards provides enormous benefits when they are used carefully
    and responsibly, significant credit and financial damage may be caused by
    fraudulent activities. Many techniques have been proposed to confront the
    growth in credit card fraud. While all of these techniques share the goal
    of preventing credit card fraud, each has its own drawbacks, advantages and
    characteristics. In this paper, after investigating the difficulties of
    credit card fraud detection, we review the state of the art in credit card
    fraud detection techniques, data sets and evaluation criteria. The
    advantages and disadvantages of fraud detection methods are enumerated and
    compared. Furthermore, a classification of the techniques into two main
    fraud detection approaches, namely misuse (supervised) and anomaly
    detection (unsupervised), is presented. A further classification is
    proposed based on the capability to process numerical and categorical data
    sets. Different data sets used in the literature are then described and
    grouped into real and synthesized data, and the effective and common
    attributes are extracted for further use. Moreover, the evaluation criteria
    employed in the literature are collected and discussed. Finally, open
    issues for credit card fraud detection are presented as guidelines for new
    researchers.

    Learning the Number of Neurons in Deep Networks

    Jose M Alvarez, Mathieu Salzmann
    Comments: NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Nowadays, the number of layers and of neurons in each layer of a deep network
    are typically set manually. While very deep and wide networks have proven
    effective in general, they come at a high memory and computation cost, thus
    making them impractical for constrained platforms. These networks, however, are
    known to have many redundant parameters, and could thus, in principle, be
    replaced by more compact architectures. In this paper, we introduce an approach
    to automatically determining the number of neurons in each layer of a deep
    network during learning. To this end, we propose to make use of a group
    sparsity regularizer on the parameters of the network, where each group is
    defined to act on a single neuron. Starting from an overcomplete network, we
    show that our approach can reduce the number of parameters by up to 80% while
    retaining or even improving the network accuracy.
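
    A minimal sketch of such a neuron-level group sparsity regularizer, with
    one group per output neuron of each fully connected layer; the paper
    applies the idea to deep networks generally, and the penalty weight here is
    an arbitrary illustration.

        import torch
        import torch.nn as nn

        def neuron_group_sparsity(layer: nn.Linear):
            """Group-lasso penalty, one group per output neuron: the L2 norm of
            each neuron's incoming weights, summed. Drives whole rows to zero."""
            return layer.weight.norm(p=2, dim=1).sum()

        net = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
        x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))
        loss = nn.functional.cross_entropy(net(x), y)
        loss = loss + 1e-3 * sum(neuron_group_sparsity(m) for m in net
                                 if isinstance(m, nn.Linear))
        loss.backward()
        # after training, neurons whose weight rows are ~0 can be removed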

    Local minima in training of deep networks

    Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
    Comments: submitted to ICLR 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    There has been a lot of recent interest in trying to characterize the error
    surface of deep models. This stems from a long standing question. Given that
    deep networks are highly nonlinear systems optimized by local gradient methods,
    why do they not seem to be affected by bad local minima? It is widely believed
    that training of deep models using gradient methods works so well because the
    error surface either has no local minima, or if they exist they need to be
    close in value to the global minimum. It is known that such results hold under
    very strong assumptions which are not satisfied by real models. In this paper
    we present examples showing that for such a theorem to be true, additional
    assumptions on the data, initialization schemes and/or the model classes
    have to be made. We look at the particular case of finite-size datasets. We
    demonstrate that in this scenario one can construct counter-examples (datasets
    or initialization schemes) when the network does become susceptible to bad
    local minima over the weight space.

    Deep Clustering and Conventional Networks for Music Separation: Stronger Together

    Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani
    Comments: Submitted to ICASSP 2017
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Sound (cs.SD)

    Deep clustering is the first method to handle general audio separation
    scenarios with multiple sources of the same type and an arbitrary number of
    sources, performing impressively in speaker-independent speech separation
    tasks. However, little is known about its effectiveness in other challenging
    situations such as music source separation. Contrary to conventional networks
    that directly estimate the source signals, deep clustering generates an
    embedding for each time-frequency bin, and separates sources by clustering the
    bins in the embedding space. We show that deep clustering outperforms
    conventional networks on a singing voice separation task, in both matched and
    mismatched conditions, even though conventional networks have the advantage of
    end-to-end training for best signal approximation, presumably because its more
    flexible objective engenders better regularization. Since the strengths of deep
    clustering and conventional network architectures appear complementary, we
    explore combining them in a single hybrid network trained via an approach akin
    to multi-task learning. Remarkably, the combination significantly outperforms
    either of its components.

    Using LSTM recurrent neural networks for detecting anomalous behavior of LHC superconducting magnets

    Maciej Wielgosz, Andrzej Skoczeń, Matej Mertik
    Subjects: Instrumentation and Detectors (physics.ins-det); Learning (cs.LG); Accelerator Physics (physics.acc-ph)

    The superconducting LHC magnets are coupled with an electronic monitoring
    system which records and analyses voltage time series reflecting their
    performance. The currently used system is based on a range of preprogrammed
    triggers which launch protection procedures when misbehavior of the magnets
    is detected. All the procedures used in the protection equipment were designed
    and implemented according to known working scenarios of the system and are
    updated and monitored by human operators.

    This paper proposes a novel approach to monitoring and fault protection of
    the Large Hadron Collider (LHC) superconducting magnets which employs
    state-of-the-art Deep Learning algorithms. Consequently, the authors of the
    paper decided to examine the performance of LSTM recurrent neural networks for
    anomaly detection in voltage time series of the magnets. In order to address
    this challenging task different network architectures and hyper-parameters were
    used to achieve the best possible performance of the solution. The regression
    results were measured in terms of RMSE for different numbers of future
    steps and history lengths taken into account for the prediction. The best
    result of
    RMSE=0.00104 was obtained for a network of 128 LSTM cells within the internal
    layer and a 16-step history buffer.
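
    A minimal PyTorch sketch mirroring the best configuration reported above
    (128 LSTM cells, a 16-step history buffer): the network regresses the next
    value of the series, and an RMSE exceeding a threshold calibrated on normal
    operation would flag an anomaly. Training details are omitted and assumed.

        import torch
        import torch.nn as nn

        class VoltagePredictor(nn.Module):
            """LSTM regressor: from a 16-step history window, predict the next
            value; a large prediction error at test time flags an anomaly."""
            def __init__(self, hidden=128, horizon=1):
                super().__init__()
                self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
                self.head = nn.Linear(hidden, horizon)

            def forward(self, x):             # x: (batch, 16, 1)
                out, _ = self.lstm(x)
                return self.head(out[:, -1])  # predict from the last hidden state

        model = VoltagePredictor()
        window = torch.randn(4, 16, 1)        # 16-step history buffer
        pred = model(window)
        target = torch.randn(4, 1)
        rmse = torch.sqrt(nn.functional.mse_loss(pred, target))
        print(rmse)  # compare against a threshold calibrated on normal data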

    Optical Flow Requires Multiple Strategies (but only one network)

    Tal Schuster, Lior Wolf, David Gadot
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    We show that the matching problem that underlies optical flow requires
    multiple strategies, depending on the amount of image motion and other factors.
    We then study the implications of this observation on training a deep neural
    network for representing image patches in the context of descriptor based
    optical flow. We propose a metric learning method, which selects suitable
    negative samples based on the nature of the true match. This type of training
    produces a network that displays multiple strategies depending on the input and
    leads to state-of-the-art results on the KITTI 2012 and KITTI 2015 optical flow
    benchmarks.


    Information Theory

    Augustin's Method – Part II: The Sphere Packing Bound

    Barış Nakiboğlu
    Comments: 34 pages. The original submission (arXiv:1608.02424v1) is split into two upon the suggestion of the executive editor of IT transactions
    Subjects: Information Theory (cs.IT)

    The channel coding problem is reviewed for an abstract framework. If the
    scaled Rényi capacities of a sequence of channels converge to a finite
    continuous function (varphi) on an interval of the form ((1-varepsilon,1])
    for an (varepsilon>0), then the capacity of the sequence of channels is
    (varphi(1)). If the convergence holds on an interval of the form
    ((1-varepsilon,1+varepsilon)) then the strong converse holds. Both hypotheses
    hold for large classes of product channels and for certain memoryless Poisson
    channels. A sphere packing bound with a polynomial prefactor is established for
    the decay rate of the error probability with the block length on any sequence
    of product channels ({mathcal{W}_{[1,n]}}_{n in mathbb{Z}^{+}}) satisfying
    (max_{t leq n} C_{0.5, mathcal{W}_{t}} = O(ln n)). For discrete stationary
    product channels with feedback, the sphere packing exponent is proved to
    bound the exponential decay rate of the error probability with block length
    from above. The latter result continues to hold for product channels with
    feedback satisfying a milder stationarity hypothesis. A sphere packing bound
    with a polynomial prefactor is established for certain memoryless Poisson
    channels.

    Communications and Signals Design for Wireless Power Transmission

    Yong Zeng, Bruno Clerckx, Rui Zhang
    Comments: Invited tutorial paper, submitted for publication, 26 pages, 11 figures
    Subjects: Information Theory (cs.IT)

    Radiative wireless power transfer (WPT) is a promising technology to provide
    cost-effective and real-time power supplies to wireless devices. Although
    radiative WPT shares many similar characteristics with the extensively studied
    wireless information transfer or communication, they also differ significantly
    in terms of design objectives, transmitter/receiver architectures and hardware
    constraints, etc. In this article, we first give an overview on the various WPT
    technologies, the historical development of the radiative WPT technology and
    the main challenges in designing contemporary radiative WPT systems. Then, we
    focus on discussing the new communication and signal processing techniques that
    can be applied to tackle these challenges. Topics discussed include energy
    harvester modeling, energy beamforming for WPT, channel acquisition, power
    region characterization in multi-user WPT, waveform design with linear and
    non-linear energy receiver model, safety and health issues of WPT, massive MIMO
    (multiple-input multiple-output) and millimeter wave (mmWave) enabled WPT,
    wireless charging control, and wireless power and communication systems
    co-design. We also point out directions that are promising for future research.

    Compute-and-Forward in Cell-Free Massive MIMO: Great Performance with Low Backhaul Load

    Qinhui Huang, Alister Burr
    Subjects: Information Theory (cs.IT)

    In this paper, we consider the uplink of cell-free massive MIMO systems,
    where a large number of distributed single antenna access points (APs) serve a
    much smaller number of users simultaneously via limited backhaul. For the first
    time, we investigate the performance of compute-and-forward (C&F) in such an
    ultra dense network with a realistic channel model (including fading, pathloss
    and shadowing). By utilising the characteristic of pathloss, a low complexity
    coefficient selection algorithm for C&F is proposed. We also give a greedy AP
    selection method for message recovery. Additionally, we compare the performance
    of C&F to some other promising linear strategies for distributed massive MIMO,
    such as small cells (SC) and maximum ratio combining (MRC). Numerical results
    reveal that C&F not only reduces the backhaul load, but also significantly
    increases the system throughput for the symmetric scenario.

    On the List-Decodability of Random Self-Orthogonal Codes

    Lingfei Jin, Chaoping Xing, Xiande Zhang
    Subjects: Information Theory (cs.IT)

    In 2011, Guruswami, Håstad and Kopparty showed that the list-decodability
    of random linear codes is as good as that of general random codes. In the
    present paper, we further strengthen the result by showing that the
    list-decodability of random Euclidean self-orthogonal codes is as good as
    that of general random codes as well, i.e., it achieves the classical
    Gilbert-Varshamov bound. Specifically, we show that, for any fixed finite
    field (F_q), error fraction (delta in (0, 1-1/q)) satisfying
    (1-H_q(delta) le 1/2) and small (epsilon > 0), with high probability a
    random Euclidean self-orthogonal code over (F_q) of rate
    (1-H_q(delta)-epsilon) is ((delta, O(1/epsilon)))-list-decodable. This
    generalizes the result on linear codes to Euclidean self-orthogonal codes.
    In addition, we extend the result to list decoding symplectic
    dual-containing codes by showing that the list-decodability of random
    symplectic dual-containing codes achieves the quantum Gilbert-Varshamov
    bound as well. This implies that the list-decodability of quantum
    stabilizer codes can achieve the quantum Gilbert-Varshamov bound.

    The counting argument on self-orthogonal codes is an important ingredient to
    prove our result.

    Throughput Efficient Large M2M Networks through Incremental Redundancy Combining

    Amogh Rajanna, Mos Kaveh
    Comments: 6 pages, 6 figures, submitted to IEEE Wireless Communications Networking Conference (WCNC) 2017 Workshop (M2M and Internet of Things). arXiv admin note: substantial text overlap with arXiv:1508.02117
    Subjects: Information Theory (cs.IT)

    In this paper, we investigate the performance of incremental redundancy
    combining as a new cooperative relaying protocol for large M2M networks with
    opportunistic relaying. The nodes in the large M2M network are modeled by a
    Poisson Point Process, experience Rayleigh fading and utilize slotted ALOHA as
    the MAC protocol. The progress rate density (PRD) of the M2M network is used to
    quantify the performance of proposed relaying protocol and compare it to
    conventional multihop relaying with no cooperation. It is shown that
    incremental redundancy combining in a large M2M network provides substantial
    throughput improvements over conventional relaying with no cooperation at all
    practical values of the network parameters.

    Coded Caching with Distributed Storage

    Tianqiong Luo, Vaneet Aggarwal, Borja Peleato
    Comments: submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    Content delivery networks store information distributed across multiple
    servers, so as to balance the load and avoid unrecoverable losses in case of
    node or disk failures. Coded caching has been shown to be a useful technique
    which can reduce peak traffic rates by pre-fetching popular content at the end
    users and encoding transmissions so that different users can extract different
    information from the same packet. On one hand, distributed storage limits the
    capability of combining content from different servers into a single message,
    causing performance losses in coded caching schemes. But, on the other hand,
    the inherent redundancy existing in distributed storage systems can be used to
    improve the performance of those schemes through parallelism.

    This paper designs a scheme combining distributed storage of the content in
    multiple servers and an efficient coded caching algorithm for delivery to the
    users. This scheme is shown to reduce the peak transmission rate below that of
    state-of-the-art algorithms.
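
    For intuition, the classic two-user toy instance of coded caching (due to
    Maddah-Ali and Niesen, shown here for illustration rather than this paper's
    distributed-storage scheme) demonstrates how one coded multicast packet
    replaces two unicast transmissions:

        # Two users, two files A and B, each split into halves (A1,A2), (B1,B2).
        # Placement: user 1 caches {A1, B1}; user 2 caches {A2, B2}.
        # Demands: user 1 wants A, user 2 wants B.
        # Delivery: the single multicast packet A2 XOR B1 serves both users.

        def xor(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        A1, A2 = b"AAAA", b"aaaa"
        B1, B2 = b"BBBB", b"bbbb"

        packet = xor(A2, B1)   # one transmission instead of two unicasts

        # User 1 holds B1, so it recovers the missing A2; user 2 holds A2, recovers B1.
        assert xor(packet, B1) == A2
        assert xor(packet, A2) == B1
        print("both users served by one coded packet")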

    On Storage Allocation in Cache-Enabled Interference Channels with Mixed CSIT

    Mohammad Ali Tahmasbi Nejad, Seyed Pooya Shariatpanahi, Babak Hossein Khalaj
    Subjects: Information Theory (cs.IT)

    Recently, it has been shown that in a cache-enabled interference channel, the
    storage at the transmit and receive sides is of equal value in terms of
    Degrees of Freedom (DoF). This is derived by assuming full Channel State
    Information at the Transmitter (CSIT). In this paper, we consider a more
    practical scenario, where a training/feedback phase should exist for obtaining
    CSIT, during which instantaneous channel state is not known to the
    transmitters. This results in a combination of delayed and current CSIT
    availability, called mixed CSIT. In this setup, we derive DoF of a
    cache-enabled interference channel with mixed CSIT, which depends on the memory
    available at transmit and receive sides as well as the training/feedback phase
    duration. In contrast to the case of having full CSIT, we prove that, in our
    setup, the storage at the receive side is more valuable than the one at the
    transmit side. This is due to the fact that cooperation opportunities granted
    by transmitters’ caches are strongly based on instantaneous CSIT availability.
    However, multi-casting opportunities provided by receivers’ caches are robust
    to such imperfection.

    Robust Regularized Least-Squares Beamforming Approach to Signal Estimation

    Mohamed Suliman, Tarig Ballal, Tareq Y. Al-Naffouri
    Comments: 5 pages, 2 figures, conference
    Subjects: Information Theory (cs.IT)

    In this paper, we address the problem of robust adaptive beamforming of
    signals received by a linear array. The challenge associated with the
    beamforming problem is twofold. Firstly, the process requires the inversion of
    the usually ill-conditioned covariance matrix of the received signals.
    Secondly, the steering vector pertaining to the direction of arrival of the
    signal of interest is not known precisely. To tackle these two challenges, the
    standard Capon beamformer is manipulated into a form where the beamformer output
    is obtained as a scaled version of the inner product of two vectors. The two
    vectors are linearly related to the steering vector and the received signal
    snapshot, respectively. The linear operator, in both cases, is the square root
    of the covariance matrix. A regularized least-squares (RLS) approach is
    proposed to estimate these two vectors and to provide robustness without
    exploiting prior information. Simulation results show that the RLS beamformer
    using the proposed regularization algorithm outperforms state-of-the-art
    beamforming algorithms, as well as other RLS beamformers that use standard
    regularization approaches.

    A Sequence Construction of Cyclic Codes over Finite Fields

    Cunsheng Ding
    Comments: arXiv admin note: substantial text overlap with arXiv:1206.4370
    Subjects: Information Theory (cs.IT)

    Due to their efficient encoding and decoding algorithms, cyclic codes, a
    subclass of linear codes, have applications in communication systems, consumer
    electronics, and data storage systems. There are several approaches to
    constructing all cyclic codes over finite fields, including the generator
    matrix approach, the generator polynomial approach, and the generating
    idempotent approach. Another one is a sequence approach, which has been
    intensively investigated in the past decade. The objective of this paper is to
    survey the progress in this direction in the past decade. Many open problems
    are also presented in this paper.

    A Class of Two-Weight and Three-Weight Linear Codes and Their Duals

    Li Liu, Xianhong Xie, Lanqiang Li
    Subjects: Information Theory (cs.IT)

    The objective of this paper is to construct a class of linear codes with
    two nonzero weights and three nonzero weights by using general trace
    functions; their weight distributions are completely determined. These
    linear codes contain some optimal codes, which meet a certain bound on
    linear codes. The dual codes are also studied and proved to be optimal or
    almost optimal. These codes may have applications in authentication codes,
    secret sharing schemes and strongly regular graphs.

    Spectrum Sharing Radar: Coexistence via Xampling

    Deborah Cohen, Kumar Vijay Mishra, Yonina C. Eldar
    Subjects: Information Theory (cs.IT)

    This paper presents a spectrum sharing technology enabling interference-free
    operation of a surveillance radar and communication transmissions over a common
    spectrum. A cognitive radio receiver senses the spectrum using low sampling and
    processing rates. The radar is a cognitive system that employs a Xampling-based
    receiver and transmits in several narrow bands. Our main contribution is the
    alliance of two previous ideas, cognitive radio (CRo) and cognitive radar
    (CRr), and their
    adaptation to solve the spectrum sharing problem.

    The second generalized Hamming weight of some evaluation codes arising from a projective torus

    Manuel Gonzalez-Sarabia, Eduardo Camps, Eliseo Sarmiento, Rafael H. Villarreal
    Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)

    In this paper we find the second generalized Hamming weight of some
    evaluation codes arising from a projective torus, which allows us to
    compute the second generalized Hamming weight of the codes parameterized
    by the edges of any complete bipartite graph. We also obtain some results
    on the generalized Hamming weights of some evaluation codes arising from a
    complete intersection when the minimum distance is known and the codes are
    non-degenerate. Finally, we give an example where we use these results to
    determine the complete weight hierarchy of some codes.

    On MDS Negacyclic LCD Codes

    Mustafa Sarı, Mehmet Emin Koroglu
    Subjects: Information Theory (cs.IT)

    This paper is devoted to the study of linear codes with complementary
    duals (LCD) arising from negacyclic codes over finite fields
    (mathbb{F}_{q}), where (q) is an odd prime power. We obtain two classes of
    MDS negacyclic LCD codes of lengths (n | (q-1)/2) and (n | (q+1)/2), and a
    class of negacyclic LCD codes of length (n = q+1). Also, we derive some
    parameters of Hermitian negacyclic LCD codes over (mathbb{F}_{q^{2}}) of
    lengths (n = q^{2}-1) and (n = q-1). For both the Euclidean and Hermitian
    cases the dimensions of these codes are determined, and for some classes
    the minimum distance is settled. For the other cases, by studying (q)- and
    (q^{2})-cyclotomic classes we give lower bounds on the minimum distance.

    Decomposition of bent generalized Boolean functions

    Lin Sok, MinJia Shi, Patrick Solé
    Comments: 3 pages, submitted to IEEE Communication Letters
    Subjects: Information Theory (cs.IT)

    A one-to-one correspondence between regular generalized bent functions
    from (F_2^n) to (mathbb{Z}_{2^m}) and (m)-tuples of Boolean bent functions
    is established. This correspondence maps self-dual (resp. anti-self-dual)
    generalized bent functions to (m)-tuples of self-dual (resp.
    anti-self-dual) Boolean bent functions. An application to the
    classification of regular generalized bent functions under the extended
    affine group is given.

    A Reinforcement Learning Approach to Power Control and Rate Adaptation in Cellular Networks

    Euhanna Ghadimi, Francesco Davide Calabrese, Gunnar Peters, Pablo Soldati
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)

    Optimizing radio transmission power and user data rates in wireless systems
    via power control requires an accurate and instantaneous knowledge of the
    system model. While this problem has been extensively studied in the
    literature, an efficient solution approaching optimality with the limited
    information available in practical systems is still lacking. This paper
    presents a reinforcement learning framework for power control and rate
    adaptation in the downlink of a radio access network that closes this gap. We
    present a comprehensive design of the learning framework that includes the
    characterization of the system state, the design of a general reward function,
    and the method to learn the control policy. System level simulations show that
    our design can quickly learn a power control policy that brings significant
    energy savings and fairness across users in the system.
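
    A toy sketch in the spirit of such a framework, using tabular Q-learning
    over quantized SINR states and discrete power adjustments with a
    rate-minus-energy reward. The paper's state characterization, reward
    design, and policy learning are substantially richer; the environment
    dynamics below are entirely invented.

        import numpy as np

        rng = np.random.default_rng(0)

        n_states = 10                                      # quantized SINR levels
        actions = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])    # power deltas in dB
        Q = np.zeros((n_states, len(actions)))
        alpha, gamma, eps = 0.1, 0.9, 0.1

        def step(state, power_delta):
            """Stand-in environment: higher power tends to raise the SINR state."""
            drift = int(np.sign(power_delta)) + int(rng.integers(-1, 2))
            next_state = int(np.clip(state + drift, 0, n_states - 1))
            reward = np.log2(1 + next_state) - 0.1 * abs(power_delta)  # rate minus energy
            return next_state, reward

        s = 5
        for _ in range(10000):
            a = int(rng.integers(len(actions))) if rng.random() < eps else int(Q[s].argmax())
            s2, r = step(s, actions[a])
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # Q-learning update
            s = s2
        print(Q.argmax(axis=1))  # learned power adjustment index per SINR state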

    Threshold phenomena for interference with randomly placed sensors

    Rafał Kapelko
    Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

    Assume (n) sensors are initially placed on the half-infinite interval
    ([0,infty)) according to a Poisson process with arrival rate (n). Let
    (s ge 0) be a given real number. We are allowed to move the sensors on the
    line so that no two sensors are placed at distance less than (s). When a
    sensor is displaced by a distance (|m(i)|), the cost of the movement is
    proportional to some fixed power (a > 0) of the distance (|m(i)|)
    traveled. As the cost measure for the displacement of the team of sensors
    we consider the (a)-total movement, defined as the sum
    (M_a := sum_{i=1}^n |m(i)|^a) for some constant (a > 0). In this paper we
    study tradeoffs between the interference value (s) and the expected
    minimum (a)-total movement.
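
    The quantities above are easy to simulate. The sketch below draws Poisson
    arrivals with rate (n), enforces the minimum spacing (s) by greedily
    pushing sensors to the right (one feasible strategy, hence an upper bound
    on the optimal cost rather than the paper's optimal displacement), and
    estimates the expected (a)-total movement around the natural threshold
    (s = 1/n).

        import numpy as np

        rng = np.random.default_rng(0)

        def a_total_movement(n, s, a):
            """Greedy push-right displacement cost: Poisson(n) arrivals on
            [0, inf), consecutive sensors forced at least s apart; returns
            M_a = sum_i |m(i)|^a."""
            x = np.cumsum(rng.exponential(1.0 / n, size=n))  # arrival rate n
            y = x.copy()
            for i in range(1, n):
                y[i] = max(y[i], y[i - 1] + s)               # enforce spacing s
            return np.sum(np.abs(y - x) ** a)

        n = 1000
        for s in (0.5 / n, 1.0 / n, 2.0 / n):                # around s = 1/n
            est = np.mean([a_total_movement(n, s, a=1.0) for _ in range(20)])
            print(f"s = {s:.2e}: estimated E[M_1] ~ {est:.3f}")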



