    arXiv Paper Daily: Wed, 7 Dec 2016

    Posted by 我爱机器学习 (52ml.net) on 2016-12-07 00:00:00

    Neural and Evolutionary Computing

    Semi-Supervised Learning with the Deep Rendering Mixture Model

    Tan Nguyen, Wanjia Liu, Ethan Perez, Richard G. Baraniuk, Ankit B. Patel
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Semi-supervised learning algorithms reduce the high cost of acquiring labeled
    training data by using both labeled and unlabeled data during learning. Deep
    Convolutional Networks (DCNs) have achieved great success in supervised tasks
    and as such have been widely employed in semi-supervised learning. In this
    paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a
    probabilistic generative model that models latent nuisance variation, and whose
    inference algorithm yields DCNs. We develop an EM algorithm for the DRMM to
    learn from both labeled and unlabeled data. Guided by the theory of the DRMM,
    we introduce a novel non-negativity constraint and a variational inference
    term. We report state-of-the-art performance on MNIST and SVHN and competitive
    results on CIFAR10. We also probe deeper into how a DRMM trained in a
    semi-supervised setting represents latent nuisance variation using
    synthetically rendered images. Taken together, our work provides a unified
    framework for supervised, unsupervised, and semi-supervised learning.

    Correlation Alignment for Unsupervised Domain Adaptation

    Baochen Sun, Jiashi Feng, Kate Saenko
    Comments: Introduction to CORAL, CORAL-LDA, and Deep CORAL. arXiv admin note: text overlap with arXiv:1511.05547
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    In this chapter, we present CORrelation ALignment (CORAL), a simple yet
    effective method for unsupervised domain adaptation. CORAL minimizes domain
    shift by aligning the second-order statistics of source and target
    distributions, without requiring any target labels. In contrast to subspace
    manifold methods, it aligns the original feature distributions of the source
    and target domains, rather than the bases of lower-dimensional subspaces. It is
    also much simpler than other distribution matching methods. CORAL performs
    remarkably well in extensive evaluations on standard benchmark datasets. We
    first describe a solution that applies a linear transformation to source
    features to align them with target features before classifier training. For
    linear classifiers, we propose to equivalently apply CORAL to the classifier
    weights, leading to added efficiency when the number of classifiers is small
    but the number and dimensionality of target examples are very high. The
    resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a
    large margin on standard domain adaptation benchmarks. Finally, we extend CORAL
    to learn a nonlinear transformation that aligns correlations of layer
    activations in deep neural networks (DNNs). The resulting Deep CORAL approach
    works seamlessly with DNNs and achieves state-of-the-art performance on
    standard benchmark datasets. Our code is available at: this https URL
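
    As an illustration of the linear variant described above, aligning
    second-order statistics can be sketched as whitening the source features and
    re-coloring them with the target covariance. This is a minimal sketch, not
    the authors' released code: the function names, the `eps` regularizer and
    the use of NumPy are assumptions made for this example.

```python
import numpy as np


def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T


def coral(source, target, eps=1.0):
    """Align the covariance of `source` features to that of `target`.

    Rows are examples, columns are feature dimensions.
    `eps` (an assumed regularizer) is added to both covariance diagonals
    so that the matrices stay invertible.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    whiten = sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolor = sym_matrix_power(cov_t, 0.5)   # impose target correlations
    return source @ whiten @ recolor
```

    No target labels are used anywhere in the sketch; a classifier would then be
    trained on the transformed source features and their original labels.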

    A Probabilistic Framework for Deep Learning

    Ankit B. Patel, Tan Nguyen, Richard G. Baraniuk
    Comments: arXiv admin note: substantial text overlap with arXiv:1504.00641
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We develop a probabilistic framework for deep learning based on the Deep
    Rendering Mixture Model (DRMM), a new generative probabilistic model that
    explicitly captures variations in data due to latent task nuisance variables. We
    demonstrate that max-sum inference in the DRMM yields an algorithm that exactly
    reproduces the operations in deep convolutional neural networks (DCNs),
    providing a first principles derivation. Our framework provides new insights
    into the successes and shortcomings of DCNs as well as a principled route to
    their improvement. DRMM training via the Expectation-Maximization (EM)
    algorithm is a powerful alternative to DCN back-propagation, and initial
    training results are promising. Classification based on the DRMM and other
    variants outperforms DCNs in supervised digit classification, training 2-3x
    faster while achieving similar accuracy. Moreover, the DRMM is applicable to
    semi-supervised and unsupervised learning tasks, achieving results that are
    state-of-the-art in several categories on the MNIST benchmark and comparable to
    state of the art on the CIFAR10 benchmark.

    Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    Haiping Huang
    Comments: 23 pages, 9 figures
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    Revealing hidden features in unlabeled data is called unsupervised feature
    learning, which plays an important role in pretraining a deep neural network.
    Here we provide a statistical mechanics analysis of the unsupervised learning
    in a restricted Boltzmann machine with binary synapses. A message passing
    equation to infer the hidden feature is derived, and furthermore, variants of
    this equation are analyzed. A statistical analysis by replica theory describes
    the thermodynamic properties of the model. Our analysis confirms an entropy
    crisis preceding the non-convergence of the message passing equation,
    suggesting a discontinuous phase transition as a key characteristic of the
    restricted Boltzmann machine. A continuous phase transition is also confirmed,
    depending on the embedded feature strength in the data. The mean-field result
    under the replica symmetric assumption agrees with that obtained by running
    message passing algorithms on single instances of finite sizes. Interestingly,
    in an approximate Hopfield model, the entropy crisis is absent, and a
    continuous phase transition is observed instead. We also develop an iterative
    equation to infer the hyper-parameter (temperature) hidden in the data, which
    in physics corresponds to iteratively imposing the Nishimori condition. Our study
    provides insights towards understanding the thermodynamic properties of
    restricted Boltzmann machine learning, as well as an important theoretical basis
    for building simplified deep networks.

    Improving the Performance of Neural Networks in Regression Tasks Using Drawering

    Konrad Zolna
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The method presented extends a given regression neural network to improve its
    performance. The modification affects the learning procedure only,
    hence the extension may be easily omitted during evaluation without any change
    in prediction. It means that the modified model may be evaluated as quickly as
    the original one but tends to perform better.

    This improvement is possible because the modification gives better expressive
    power, provides better-behaved gradients and works as a regularizer. The
    knowledge gained by the temporarily extended neural network is contained in the
    parameters shared with the original neural network.

    The only cost is an increase in learning time.

    Towards the Limit of Network Quantization

    Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Network quantization is one of the network compression techniques employed to
    reduce the redundancy of deep neural networks. It compresses the size of the
    storage for a large number of network parameters in a neural network by
    quantizing them and encoding the quantized values into binary codewords of
    smaller sizes. In this paper, we aim to design network quantization schemes
    that minimize the expected loss due to quantization while maximizing the
    compression ratio. To this end, we analyze the quantitative relation of
    quantization errors to the loss function of a neural network and identify that
    the Hessian-weighted distortion measure is locally the right objective function
    that we need to optimize for minimizing the loss due to quantization. As a
    result, Hessian-weighted k-means clustering is proposed for clustering network
    parameters to quantize when fixed-length binary encoding follows. When optimal
    variable-length binary codes, e.g., Huffman codes, are employed for further
    compression of quantized values after clustering, we derive that the network
    quantization problem can be related to the entropy-constrained scalar
    quantization (ECSQ) problem in information theory and consequently propose two
    solutions of ECSQ for network quantization, i.e., uniform quantization and an
    iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
    Finally, using the simple uniform quantization followed by Huffman coding, our
    experimental results show that compression ratios of 51.25, 22.17 and 40.65
    are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
    2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
    at no or marginal performance loss.
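
    The relation between the quoted compression ratios and model sizes, and the
    idea of uniform quantization followed by entropy coding, can be sketched as
    follows. This is a minimal sketch under stated assumptions: the helper names
    and the step size are hypothetical, and the empirical entropy is used only
    as a stand-in for the average Huffman code length, which it bounds from
    below.

```python
import math
from collections import Counter


def quantize_indices(weights, step):
    """Uniform quantization: map each weight to the index of its nearest grid point."""
    return [round(w / step) for w in weights]


def entropy_bits(symbols):
    """Empirical entropy in bits per symbol (lower bound on Huffman code length)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def size_percent(compression_ratio):
    """Compressed size as a percentage of the original size."""
    return 100.0 / compression_ratio
```

    For example, `size_percent(51.25)` rounds to 1.95, which matches the figure
    quoted in the abstract for LeNet.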


    Computer Vision and Pattern Recognition

    Diverse Sampling for Self-Supervised Learning of Semantic Segmentation

    Mohammadreza Mostajabi, Nicholas Kolkin, Gregory Shakhnarovich
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose an approach for learning category-level semantic segmentation
    purely from image-level classification tags indicating presence of categories.
    It exploits localization cues that emerge from training classification-tasked
    convolutional networks, to drive a “self-supervision” process that
    automatically labels a sparse, diverse training set of points likely to belong
    to classes of interest. Our approach has almost no hyperparameters, is modular,
    and allows for very fast training of segmentation in less than 3 minutes. It
    obtains competitive results on the VOC 2012 segmentation benchmark. More
    significantly, the modularity and fast training of our framework allow new
    classes to be efficiently added for inference.

    Core Sampling Framework for Pixel Classification

    Manohar Karki, Robert DiBiano, Saikat Basu, Supratik Mukhopadhyay
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The intermediate map responses of a Convolutional Neural Network (CNN)
    contain information about an image that can be used to extract contextual
    knowledge about it. In this paper, we present a core sampling framework that is
    able to use these activation maps from several layers as features to another
    neural network using transfer learning to provide an understanding of an input
    image. Our framework creates a representation that combines features from the
    test data and the contextual knowledge gained from the responses of a
    pretrained network, processes it and feeds it to a separate Deep Belief
    Network. We use this representation to extract more information from an image
    at the pixel level, hence gaining understanding of the whole image. We
    experimentally demonstrate the usefulness of our framework using a pretrained
    VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture
    Radar (SAR) imagery and the CAMVID dataset.

    Learning Diverse Image Colorization

    Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, David Forsyth
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Colorization is an ambiguous problem, with multiple viable colorizations for
    a single grey-level image. However, previous methods only produce the single
    most probable colorization. Our goal is to model the diversity intrinsic to the
    problem of colorization and produce multiple colorizations that display
    long-scale spatial co-ordination. We learn a low dimensional embedding of color
    fields using a variational autoencoder (VAE). We construct loss terms for the
    VAE decoder that avoid blurry outputs and take into account the uneven
    distribution of pixel colors. Finally, we develop a conditional model for the
    multi-modal distribution between grey-level image and the color field
    embeddings. Samples from this conditional model result in diverse colorization.
    We demonstrate that our method obtains better diverse colorizations than a
    standard conditional variational autoencoder model.

    FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

    Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox
    Comments: Including supplementary material. For the video see: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The FlowNet demonstrated that optical flow estimation can be cast as a
    learning problem. However, the state of the art with regard to the quality of
    the flow has still been defined by traditional methods. Particularly on small
    displacements and real-world data, FlowNet cannot compete with variational
    methods. In this paper, we advance the concept of end-to-end learning of
    optical flow and make it work really well. The large improvements in quality
    and speed are caused by three major contributions: first, we focus on the
    training data and show that the schedule of presenting data during training is
    very important. Second, we develop a stacked architecture that includes warping
    of the second image with intermediate optical flow. Third, we elaborate on
    small displacements by introducing a sub-network specializing on small motions.
    FlowNet 2.0 is only marginally slower than the original FlowNet but decreases
    the estimation error by more than 50%. It performs on par with state-of-the-art
    methods, while running at interactive frame rates. Moreover, we present faster
    variants that allow optical flow computation at up to 140fps with accuracy
    matching the original FlowNet.

    Tag Prediction at Flickr: a View from the Darkroom

    Pierre Garrigues, Sachin Farfade, Hamid Izadinia, Kofi Boakye, Yannis Kalantidis
    Comments: Presented at the 1st LSCVS NIPS Workshop, 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automated photo tagging has established itself as one of the most compelling
    applications of deep learning. While deep convolutional neural networks have
    repeatedly demonstrated top performance on standard datasets for
    classification, there are a number of often overlooked but important
    considerations when deploying this technology in a real-world scenario. In this
    paper, we present our efforts in developing a large-scale photo-tagging system
    for Flickr photo search. We discuss topics including how to select the tags
    that matter most to our users, develop lightweight, high-performance models for
    tag prediction, and leverage the power of large amounts of noisy data for
    training. Our results demonstrate that, for real-world datasets, training
    exclusively with noisy data yields performance nearly on par with the standard
    paradigm of first pre-training on clean data and then fine-tuning. We advocate
    for the approach of harnessing user-generated data in large-scale systems.

    Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer

    Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Transferring artistic styles onto everyday photographs has become an
    extremely popular task in both academia and industry since the salient work by
    Gatys et al. More recently, several feed-forward networks were proposed,
    leading to significant speed up of the stylization process to nearly real-time
    by replacing the original online iterative optimization procedure with offline
    training. However, when those stylization networks are applied directly to
    high-resolution images, the style of localized regions often appears less
    similar to the desired artistic style, because the transfer process fails to
    capture small, intricate textures and maintain correct texture scales of the
    artworks. Here we propose a multimodal convolutional neural network that takes
    into consideration faithful representations of both color and luminance
    channels, and performs stylization hierarchically with multiple losses of
    increasing scales. Compared to the state-of-the-art networks, our network can
    also perform style transfer in nearly real time by performing much more
    sophisticated training offline. Furthermore, by properly handling style and
    texture cues at multiple scales using several modalities, we can transfer not
    just large-scale, obvious style cues but also subtle, exquisite ones. That is,
    our scheme can generate results that are visually pleasing and more similar to
    multiple desired artistic styles with color and texture cues at multiple
    scales.

    Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

    Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Attention-based neural encoder-decoder frameworks have been widely adopted
    for image captioning. Most methods force visual attention to be active for
    every generated word. However, the decoder likely requires little to no visual
    information from the image to predict non-visual words such as “the” and “of”.
    Other words that may seem visual can often be predicted reliably just from the
    language model, e.g., “sign” after “behind a red stop” or “phone” following
    “talking on a cell”. In this paper, we propose a novel adaptive attention model
    with a visual sentinel. At each time step, our model decides whether to attend
    to the image (and if so, to which regions) or to the visual sentinel. The model
    decides whether to attend to the image and where, in order to extract
    meaningful information for sequential word generation. We test our method on
    the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach
    sets the new state-of-the-art by a significant margin.
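
    The choice between image regions and the visual sentinel can be pictured as
    a single softmax over the region scores plus one extra sentinel score. This
    is a minimal sketch, assuming the scores and feature vectors are already
    given; in the paper they would come from learned layers, and all names here
    are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_scores, sentinel_score, region_feats, sentinel_feat):
    """Mix region features and a sentinel vector with one joint softmax.

    Returns (context, beta), where beta is the weight on the sentinel,
    i.e. how much the model relies on non-visual information.
    """
    probs = softmax(list(region_scores) + [sentinel_score])
    beta = probs[-1]
    context = [
        beta * sentinel_feat[j]
        + sum(p * feat[j] for p, feat in zip(probs[:-1], region_feats))
        for j in range(len(sentinel_feat))
    ]
    return context, beta
```

    A beta close to 1 would correspond to words such as “the” or “of”, where the
    image contributes little to the prediction.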

    Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

    Beidi Chen, Anshumali Shrivastava
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    WTA (Winner Take All) hashing has been successfully applied in many large
    scale vision applications. This hashing scheme was tailored to take advantage
    of the comparative reasoning (or order based information), which showed
    significant accuracy improvements. In this paper, we identify a subtle issue
    with WTA, which grows with the sparsity of the datasets. This issue limits the
    discriminative power of WTA. We then propose a solution for this problem based
    on the idea of Densification which provably fixes the issue. Our experiments
    show that Densified WTA Hashing outperforms Vanilla WTA both in image
    classification and retrieval tasks consistently and significantly.
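
    For readers unfamiliar with the scheme, standard WTA hashing can be sketched
    as below: each hash looks at a small random window of coordinates and
    records which position holds the largest value. This is a minimal sketch of
    plain WTA only, not of the proposed densification, and the parameter names
    are assumptions. It also shows why a window that contains only zeros carries
    no information; whether this is exactly the issue the authors analyze is an
    assumption.

```python
import random


def wta_hash(vector, num_hashes, window, seed=0):
    """Winner-Take-All hash codes for one vector.

    For each hash, take the first `window` coordinates of a random
    permutation and return the position of the largest value among them.
    The same seed must be used for every vector being compared.
    """
    rng = random.Random(seed)
    dims = range(len(vector))
    codes = []
    for _ in range(num_hashes):
        picked = rng.sample(dims, window)        # prefix of a random permutation
        values = [vector[i] for i in picked]
        codes.append(values.index(max(values)))  # ties resolve to position 0
    return codes
```

    Because only the ordering of values matters, any strictly increasing
    transformation of the input leaves the codes unchanged.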

    Explaining Radiological Emphysema Subtypes with Unsupervised Texture Prototypes: MESA COPD Study

    Jie Yang, Elsa D. Angelini, Benjamin M. Smith, John H.M. Austin, Eric A. Hoffman, David A. Bluemke, R. Graham Barr, Andrew F. Laine
    Comments: MICCAI workshop on Medical Computer Vision: Algorithms for Big Data (2016)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Pulmonary emphysema is traditionally subcategorized into three subtypes,
    which have distinct radiological appearances on computed tomography (CT) and
    can help with the diagnosis of chronic obstructive pulmonary disease (COPD).
    Automated texture-based quantification of emphysema subtypes has been
    successfully implemented via supervised learning of these three emphysema
    subtypes. In this work, we demonstrate that unsupervised learning on a large
    heterogeneous database of CT scans can generate texture prototypes that are
    visually homogeneous and distinct, reproducible across subjects, and capable of
    predicting accurately the three standard radiological subtypes. These texture
    prototypes enable automated labeling of lung volumes, and open the way to new
    interpretations of lung CT scans with finer subtyping of emphysema.

    FLIC: Fast Linear Iterative Clustering with Active Search

    Jia-Xin Zhao, Ren Bo, Qibin Hou, Ming-Ming Cheng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Benefiting from its high efficiency and simplicity, Simple Linear Iterative
    Clustering (SLIC) remains one of the most popular over-segmentation tools.
    However, due to explicit enforcement of spatial similarity for region
    continuity, the boundary adaptation of SLIC is sub-optimal. Its convergence
    rate also suffers from both the fixed search region and from performing the
    assignment step and the update step separately. In this paper, we
    propose an alternative approach to fix the inherent limitations of SLIC. In our
    approach, each pixel actively searches for its corresponding segment with the help
    of its neighboring pixels, which naturally enables region coherence without
    being harmful to boundary adaptation. We also jointly perform the assignment
    and update steps, allowing a high convergence rate. Extensive evaluations on the
    Berkeley segmentation benchmark verify that our method outperforms competitive
    methods under various evaluation metrics. It also has the lowest time cost
    among existing methods (approximately 30fps for a 481×321 image on a single CPU
    core).

    Deep Stereo Matching with Dense CRF Priors

    Ron Slossberg, Aaron Wetzler, Ron Kimmel
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Stereo reconstruction from rectified images has recently been revisited
    within the context of deep learning. Using a deep Convolutional Neural Network
    to obtain patch-wise matching cost volumes has resulted in state of the art
    stereo reconstruction on classic datasets like Middlebury and KITTI. By
    introducing this cost into a classical stereo pipeline, the final results are
    improved dramatically over non-learning based cost models. However, these
    pipelines typically include hand engineered post processing steps to
    effectively regularize and clean the result. Here, we show that it is possible
    to take a more holistic approach by training a fully end-to-end network which
    directly includes regularization in the form of a densely connected Conditional
    Random Field (CRF) that acts as a prior on inter-pixel interactions. We
    demonstrate that our approach on both synthetic and real world datasets
    outperforms an alternative end-to-end network and compares favorably to more
    hand engineered approaches.

    Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment

    Sebastian Bosse, Dominique Maniry, Klaus-Robert Müller, Thomas Wiegand, Wojciech Samek
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a deep neural network-based approach to image quality
    assessment (IQA). The network can be trained end-to-end and comprises 10
    convolutional layers and 5 pooling layers for feature extraction, and 2 fully
    connected layers for regression, which makes it significantly deeper than
    related IQA methods. A unique feature of the proposed architecture is that it
    can be used (with slight adaptations) in a no-reference (NR) as well as in a
    full-reference (FR) IQA setting. Our approach is purely data-driven and does
    not rely on hand-crafted features or other types of prior domain knowledge
    about the human visual system or image statistics. The network estimates
    perceived quality patchwise; the overall image quality is calculated as the
    average of these patchwise scores. In order to consider the locally non-uniform
    distribution of perceived quality in images, we introduce a spatial attention
    mechanism which performs a weighted aggregation of the patchwise scores. We
    evaluate the proposed approach on the LIVE, CSIQ and TID2013 databases and show
    superior performance to state-of-the-art NR and FR IQA methods. Finally,
    cross-database evaluation shows a high ability to generalize between different
    datasets, indicating a high robustness of the learned features.
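
    The two pooling schemes mentioned above, plain averaging and weighted
    aggregation of patchwise scores, can be sketched as follows. This is a
    minimal sketch: the function names are hypothetical, and the weights are
    assumed to be positive numbers that the network would predict per patch.

```python
def average_pooling(patch_scores):
    """Overall quality as the plain mean of patchwise scores."""
    return sum(patch_scores) / len(patch_scores)


def weighted_pooling(patch_scores, patch_weights):
    """Overall quality as a weighted mean; weights are assumed positive."""
    total = sum(patch_weights)
    return sum(w * s for w, s in zip(patch_weights, patch_scores)) / total
```

    With equal weights the weighted scheme reduces to the plain average, so the
    weights only matter where perceived quality is spatially non-uniform.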

    Cluster-Wise Ratio Tests for Fast Camera Localization

    Raúl Díaz, Charless C. Fowlkes
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Feature point matching for camera localization suffers from scalability
    problems. Even when feature descriptors associated with 3D scene points are
    locally unique, as coverage grows, similar or repeated features become
    increasingly common. As a result, the standard distance ratio-test used to
    identify reliable image feature points is overly restrictive and rejects many
    good candidate matches. We propose a simple coarse-to-fine strategy that uses
    conservative approximations to robust local ratio-tests that can be computed
    efficiently using global approximate k-nearest neighbor search. We treat these
    forward matches as votes in camera pose space and use them to prioritize
    back-matching within candidate camera pose clusters, exploiting feature
    co-visibility captured by clustering the 3D model camera pose graph. This
    approach achieves state-of-the-art camera localization results on a variety of
    popular benchmarks, outperforming several methods that use more complicated
    data structures and that make more restrictive assumptions on camera pose. We
    also carry out diagnostic analyses on a difficult test dataset containing
    globally repetitive structure that suggest our approach successfully adapts to
    the challenges of large-scale image localization.
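
    The standard distance ratio test that the abstract calls overly restrictive
    can be sketched as below. This is a minimal brute-force sketch of the
    classical global test, not of the proposed cluster-wise variant; the 0.8
    threshold and the function name are assumptions for illustration.

```python
import math


def ratio_test_matches(queries, database, max_ratio=0.8):
    """Keep a match only if the nearest neighbor is clearly closer than the second.

    `queries` and `database` are sequences of equal-length descriptors;
    `database` must contain at least two entries.
    Returns a list of (query_index, database_index) pairs.
    """
    matches = []
    for qi, query in enumerate(queries):
        ranked = sorted((math.dist(query, d), di) for di, d in enumerate(database))
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < max_ratio * second:
            matches.append((qi, best_idx))
    return matches
```

    A query that lies between two near-identical database descriptors is
    rejected even though either would be a good match, which is the failure
    mode that grows as repeated structure becomes more common.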

    MarioQA: Answering Questions by Watching Gameplay Videos

    Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a new benchmark dataset for video question answering (VideoQA)
    designed to evaluate algorithms’ capability of spatio-temporal event
    understanding. Existing datasets either require very high-level reasoning from
    multi-modal information to find answers, or are mostly composed of questions
    that can be answered by watching a single frame. Therefore, they are not
    suitable to evaluate models’ real capacity and flexibility for VideoQA. To
    overcome such critical limitations, we focus on event-centric questions that
    require understanding temporal relations between multiple events in videos. An
    interesting idea in the dataset construction process is that question-answer pairs
    are automatically generated from Super Mario gameplay videos given a set of
    question templates. We also tackle the VideoQA problem in the new dataset, referred
    to as MarioQA, by proposing spatio-temporal attention models based on deep
    neural networks. Our experiments show that the proposed deep neural network
    models with attention have meaningful performance improvement over several
    baselines.

    Fine-grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images

    Xin Yang, Lequan Yu, Lingyun Wu, Yi Wang, Dong Ni, Jing Qin, Pheng-Ann Heng
    Comments: To appear in AAAI Conference 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Boundary incompleteness raises great challenges to automatic prostate
    segmentation in ultrasound images. Shape prior can provide strong guidance in
    estimating the missing boundary, but traditional shape models often suffer from
    hand-crafted descriptors and local information loss in the fitting procedure.
    In this paper, we attempt to address those issues with a novel framework. The
    proposed framework can seamlessly integrate feature extraction and shape prior
    exploring, and estimate the complete boundary in a sequential manner. Our
    framework is composed of three key modules. Firstly, we serialize the static 2D
    prostate ultrasound images into dynamic sequences and then predict prostate
    shapes by sequentially exploring shape priors. Intuitively, we propose to learn
    the shape prior with the biologically plausible Recurrent Neural Networks
    (RNNs). This module is corroborated to be effective in dealing with the
    boundary incompleteness. Secondly, to alleviate the bias caused by different
    serialization manners, we propose a multi-view fusion strategy to merge shape
    predictions obtained from different perspectives. Thirdly, we further implant
    the RNN core into a multiscale Auto-Context scheme to successively refine the
    details of the shape prediction map. With extensive validation on challenging
    prostate ultrasound images, our framework bridges severe boundary
    incompleteness and achieves the best performance in prostate boundary
    delineation when compared with several advanced methods. Additionally, our
    approach is general and can be extended to other medical image segmentation
    tasks, where boundary incompleteness is one of the main challenges.

    Automatic Image Defect Diagnosis

    Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Although problems relating to specific image correction have been explored
    intensively, the problem of simultaneous diagnosis for multiple photographic
    defects remains relatively untouched. Solutions to this problem attempt to
    predict the existence, severity, and locations of common defects. This paper
    proposes a first attempt at a solution to the general defect diagnosis problem
    based on our novel dataset. We formulate the defect diagnosis problem as a
    multi-task prediction problem and utilize multi-column deep neural networks
    (DNN) to approach the problem. We propose DNN models with holistic and
    multi-patch inputs and combine their predicted scores to integrate multi-scale
    information. During experiments, we validate the complementarity of both kinds
    of inputs. We also validate that our combined predictions have a more
    consistent ranking correlation with our ground truth than the average of
    individual users’ judgments. Furthermore, we apply the fully convolutional
    version of our trained model to visualize defect severity heat maps, which can
    effectively identify defective regions of input images. We propose that our
    work will provide casual photographers with better experiences when using image
    editing software to improve image quality. Another promising avenue for future
    application involves the equipping of photo summarization systems with defect
    cues to focus more on defect-free photos.

    Automatic Event Detection for Signal-based Surveillance

    Jingxin Xu, Clinton Fookes, Sridha Sridharan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Signal-based surveillance systems such as closed-circuit television (CCTV)
    have been widely installed in public places. Those systems are normally used to
    find events of security interest, and play a significant role in public
    safety. Though such systems are still heavily reliant on human labour to
    monitor the captured information, a number of automatic techniques have been
    proposed for analysing the data. This article provides an overview of
    automatic surveillance event detection techniques. Despite its popularity in
    research, automatic event detection is still too challenging a problem to be
    realised in a real-world deployment. The challenges come not only from the
    detection techniques, such as signal processing and machine learning, but also
    from the experimental design, with factors such as data collection, evaluation
    protocols, and ground-truth annotation. Finally, this article proposes that
    multi-disciplinary research is the path towards a solution to this problem.

    Superpixels: An Evaluation of the State-of-the-Art

    David Stutz, Alexander Hermans, Bastian Leibe
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Superpixels group perceptually similar pixels to create visually meaningful
    entities while heavily reducing the number of primitives. Because of these
    properties, superpixel algorithms have received much attention since their
    naming in 2003. By today, publicly available and well-understood superpixel
    algorithms have turned into standard tools in low-level vision. As such, and
    due to their quick adoption in a wide range of applications, appropriate
    benchmarks are crucial for algorithm selection and comparison. Until now, the
    rapidly growing number of algorithms as well as varying experimental setups
    hindered the development of a unifying benchmark. We present a comprehensive
    evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark
    focussing on fair comparison and designed to provide new and relevant insights.
    To this end, we explicitly discuss parameter optimization and the importance of
    strictly enforcing connectivity. Furthermore, by extending well-known metrics,
    we are able to summarize algorithm performance independent of the number of
    generated superpixels, thereby overcoming a major limitation of available
    benchmarks. Furthermore, we discuss runtime, robustness against noise, blur and
    affine transformations, implementation details as well as aspects of visual
    quality. Finally, we present an overall ranking of superpixel algorithms which
    redefines the state-of-the-art and enables researchers to easily select
    appropriate algorithms and the corresponding implementations which themselves
    are made publicly available as part of our benchmark at
    davidstutz.de/projects/superpixel-benchmark/.

    Object Classification with Joint Projection and Low-rank Dictionary Learning

    Homa Foroughi, Nilanjan Ray, Hong Zhang
    Comments: arXiv admin note: text overlap with arXiv:1603.07697; text overlap with arXiv:1404.3606 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    For an object classification system, the most critical obstacles towards
    real-world applications are often caused by large intra-class variability,
    arising from different lightings, occlusion and corruption, in limited sample
    sets. Most methods in the literature would fail when the training samples are
    heavily occluded, corrupted or have significant illumination or viewpoint
    variations. Besides, most of the existing methods and especially deep
    learning-based methods, need large training sets to achieve a satisfactory
    recognition performance. Although pre-training a network on a generic
    large-scale dataset and fine-tuning it on the small-sized target dataset is a
    widely used technique, this does not help when the contents of the base and
    target datasets are very different. To address these issues, we propose a joint
    projection and low-rank dictionary learning method using dual graph constraints
    (JP-LRDL). The proposed joint learning method would enable us to learn the
    features on top of which dictionaries can be better learned, from the data with
    large intra-class variability. Specifically, a structured class-specific
    dictionary is learned and the discrimination is further improved by imposing a
    graph constraint on the coding coefficients, that maximizes the intra-class
    compactness and inter-class separability. We also enforce low-rank and
    structural incoherence constraints on sub-dictionaries to make them more
    compact and robust to variations and outliers and reduce the redundancy among
    them, respectively. To preserve the intrinsic structure of data and penalize
    unfavourable relationship among training samples simultaneously, we introduce a
    projection graph into the framework, which significantly enhances the
    discriminative ability of the projection matrix and makes the method robust to
    small-sized and high-dimensional datasets.

    Towards the Limit of Network Quantization

    Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Network quantization is one of the network compression techniques employed to
    reduce the redundancy of deep neural networks. It compresses the size of the
    storage for a large number of network parameters in a neural network by
    quantizing them and encoding the quantized values into binary codewords of
    smaller sizes. In this paper, we aim to design network quantization schemes
    that minimize the expected loss due to quantization while maximizing the
    compression ratio. To this end, we analyze the quantitative relation of
    quantization errors to the loss function of a neural network and identify that
    the Hessian-weighted distortion measure is locally the right objective function
    that we need to optimize for minimizing the loss due to quantization. As a
    result, Hessian-weighted k-means clustering is proposed for clustering network
    parameters to quantize when fixed-length binary encoding follows. When optimal
    variable-length binary codes, e.g., Huffman codes, are employed for further
    compression of quantized values after clustering, we derive that the network
    quantization problem can be related to the entropy-constrained scalar
    quantization (ECSQ) problem in information theory and consequently propose two
    solutions of ECSQ for network quantization, i.e., uniform quantization and an
    iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
    Finally, using the simple uniform quantization followed by Huffman coding, our
    experiment results show that the compression ratios of 51.25, 22.17 and 40.65
    are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
    2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
    at no or marginal performance loss.
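
    The "uniform quantization followed by Huffman coding" pipeline mentioned
    above can be sketched as follows. This is a minimal illustration, not the
    authors' implementation: the function names, the step size and the 32-bit
    baseline are assumptions, and the ratio ignores the cost of storing the
    codebook.

```python
import heapq
from collections import Counter


def uniform_quantize(weights, step):
    """Map each weight to the integer index of its nearest multiple of `step`."""
    return [round(w / step) for w in weights]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (count, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def compression_ratio(weights, step, bits_per_weight=32):
    """Uncompressed size divided by Huffman-coded size (codebook ignored)."""
    indices = uniform_quantize(weights, step)
    lengths = huffman_code_lengths(indices)
    coded_bits = sum(lengths[i] for i in indices)
    return bits_per_weight * len(weights) / coded_bits
```

    A compression ratio r corresponds to a compressed size of 100/r percent of
    the original, which is how the ratios 51.25, 22.17 and 40.65 quoted above
    map to 1.95%, 4.51% and 2.46%.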

    Invariant Representations for Noisy Speech Recognition

    Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio
    Comments: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

    Modern automatic speech recognition (ASR) systems need to be robust under
    acoustic variability arising from environmental, speaker, channel, and
    recording conditions. Ensuring such robustness to variability is a challenge in
    modern day neural network-based ASR systems, especially when all types of
    variability are not seen during training. We attempt to address this problem by
    encouraging the neural network acoustic model to learn invariant feature
    representations. We use ideas from recent research on image generation using
    Generative Adversarial Networks and domain adaptation ideas extending
    adversarial gradient-based training. A recent work from Ganin et al. proposes
    to use adversarial training for image domain adaptation by using an
    intermediate representation from the main target classification network to
    deteriorate the domain classifier performance through a separate neural
    network. Our work focuses on investigating neural architectures which produce
    representations invariant to noise conditions for ASR. We evaluate the proposed
    architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
    show that our method generalizes better than the standard multi-condition
    training especially when only a few noise categories are seen during training.

    Video Ladder Networks

    Francesco Cricri, Mikko Honkala, Xingyang Ni, Emre Aksu, Moncef Gabbouj
    Comments: Accepted at NIPS 2016 workshop on ML for Spatiotemporal Forecasting. This version of the paper contains results from a longer baseline run
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We present the Video Ladder Network (VLN) for video prediction. VLN is a
    neural encoder-decoder model augmented by both recurrent and feedforward
    lateral connections at all layers. The model achieves competitive results on
    the Moving MNIST dataset while having very simple structure and providing fast
    inference.

    Binary Subspace Coding for Query-by-Image Video Retrieval

    Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen
    Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)

    The query-by-image video retrieval (QBIVR) task has been attracting
    considerable research attention recently. However, most existing methods
    represent a video by either aggregating or projecting all its frames into a
    single datum point, which may easily cause severe information loss. In this
    paper, we propose an efficient QBIVR framework to enable an effective and
    efficient video search with image query. We first define a
    similarity-preserving distance metric between an image and its orthogonal
    projection in the subspace of the video, which can be equivalently transformed
    to a Maximum Inner Product Search (MIPS) problem.

    Besides, to boost the efficiency of solving the MIPS problem, we propose two
    asymmetric hashing schemes, which bridge the domain gap of images and videos.
    The first approach, termed Inner-product Binary Coding (IBC), preserves the
    inner relationships of images and videos in a common Hamming space. To further
    improve the retrieval efficiency, we devise a Bilinear Binary Coding (BBC)
    approach, which employs compact bilinear projections instead of a single large
    projection matrix. Extensive experiments have been conducted on four real-world
    video datasets to verify the effectiveness of our proposed approaches as
    compared to the state of the art.
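
    One way to see why the projection distance can be turned into an inner
    product search is the short derivation below. It is a sketch under
    assumptions that the abstract does not state: the image feature q has unit
    norm and the video subspace is given by a matrix B with orthonormal
    columns. The paper's exact formulation may differ.

```latex
% q: unit-norm image feature; B: d x r matrix with orthonormal columns
% spanning the video subspace; P = B B^T is the orthogonal projector.
\begin{align*}
\lVert q - P q \rVert^2 &= \lVert q \rVert^2 - \lVert B^{\top} q \rVert^2
  = 1 - q^{\top} B B^{\top} q, \\
q^{\top} B B^{\top} q &= \operatorname{tr}\!\left(B B^{\top} q q^{\top}\right)
  = \left\langle \operatorname{vec}(q q^{\top}),\,
                 \operatorname{vec}(B B^{\top}) \right\rangle .
\end{align*}
```

    Under these assumptions, minimizing the distance to the projection is the
    same as maximizing an inner product between one vector built from the image
    and one built from the video.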


    Artificial Intelligence

    Coactive Critiquing: Elicitation of Preferences and Features

    Stefano Teso, Paolo Dragone, Andrea Passerini
    Comments: AAAI’17
    Subjects: Artificial Intelligence (cs.AI)

    When faced with complex choices, users refine their own preference criteria
    as they explore the catalogue of options. In this paper we propose an approach
    to preference elicitation suited for this scenario. We extend Coactive
    Learning, which iteratively collects manipulative feedback, to optionally query
    example critiques. User critiques are integrated into the learning model by
    dynamically extending the feature space. Our formulation natively supports
    constructive learning tasks, where the option catalogue is generated
    on-the-fly. We present an upper bound on the average regret suffered by the
    learner. Our empirical analysis highlights the promise of our approach.
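
    The interaction loop described above can be illustrated with a
    perceptron-style update, which is the standard rule in the Coactive
    Learning literature; whether the paper uses exactly this rule is an
    assumption. Weights are stored in a dictionary so that a feature introduced
    by a critique (the hypothetical "quiet" below) extends the feature space
    without any resizing.

```python
def utility(weights, features):
    """Linear utility; features missing from `weights` contribute zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_option(weights, options):
    """Return the name of the option with the highest estimated utility."""
    return max(options, key=lambda name: utility(weights, options[name]))


def coactive_update(weights, presented, improved):
    """Perceptron-style step: w <- w + phi(improved) - phi(presented)."""
    updated = dict(weights)
    for name in set(presented) | set(improved):
        delta = improved.get(name, 0.0) - presented.get(name, 0.0)
        updated[name] = updated.get(name, 0.0) + delta
    return updated
```

    For example, if the user improves an option described only by "price" into
    one that is cheaper and also "quiet", the update lowers the weight on price
    and creates a positive weight for the new feature.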

    Cross-Lingual Predicate Mapping Between Linked Data Ontologies

    Gautam Singh, Saemi Jang, Mun Y. Yi
    Comments: 11 pages, 1 figure, 1 table
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Ontologies in different natural languages often differ in quality in terms of
    richness of schema or richness of internal links. This difference is markedly
    visible when comparing a rich English language ontology with a non-English
    language counterpart. Discovering alignment between them is a useful endeavor
    as it serves as a starting point in bridging the disparity. In particular, our
    work is motivated by the absence of inter-language links for predicates in the
    localised versions of DBpedia. In this paper, we propose and demonstrate an
    ad-hoc system to find possible owl:equivalentProperty links between predicates
    in ontologies of different natural languages. We seek to achieve this mapping
    by using pre-existing inter-language links of the resources connected by the
    given predicate. Thus, our methodology stresses semantic rather than lexical
    similarity. Moreover, through an evaluation, we show that our system is
    capable of outperforming a baseline system that is similar to the one used in
    recent OAEI campaigns.
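
    The idea of mapping predicates through the inter-language links of the
    resources they connect can be sketched as a simple voting scheme. This is
    an illustrative reading of the abstract, not the authors' system: the
    function name, the triple representation and the vote counting are all
    assumptions.

```python
from collections import Counter, defaultdict


def candidate_predicate_links(source_triples, target_triples, interlanguage_links):
    """Vote for target predicates that connect the linked counterparts of
    each source (subject, object) pair.

    Triples are (subject, predicate, object) tuples; `interlanguage_links`
    maps a source-language resource to its target-language counterpart.
    """
    # Index target predicates by the (subject, object) pair they connect.
    target_index = defaultdict(set)
    for s, p, o in target_triples:
        target_index[(s, o)].add(p)

    votes = defaultdict(Counter)
    for s, p, o in source_triples:
        s_t = interlanguage_links.get(s)
        o_t = interlanguage_links.get(o)
        if s_t is None or o_t is None:
            continue  # no inter-language link, so this triple gives no evidence
        for p_t in target_index.get((s_t, o_t), ()):
            votes[p][p_t] += 1
    # Candidates per source predicate, most frequently co-occurring first.
    return {p: counter.most_common() for p, counter in votes.items()}
```

    Each (source predicate, target predicate) pair with many votes is a
    candidate for an owl:equivalentProperty link; how the votes are thresholded
    is left open here.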

    On a Well-behaved Relational Generalisation of Rough Set Approximations

    Alexa Gopaulsingh
    Comments: 12 pages
    Subjects: Artificial Intelligence (cs.AI)

    We examine non-dual relational extensions of rough set approximations and
    find an extension which satisfies surprisingly many of the usual rough set
    properties. We then use this definition to give an explanation for an
    observation made by Samanta and Chakraborty in their recent paper [P. Samanta
    and M.K. Chakraborty. Interface of rough set systems and modal logics: A
    survey. Transactions on Rough Sets XIX, pages 114-137, 2015].

    Fleet Size and Mix Split-Delivery Vehicle Routing

    Arthur Mahéo, Tommaso Urli, Philip Kilby
    Comments: Rich Vehicle Routing, Split Delivery, Fleet Size and Mix, Mixed Integer Programming, Constraint Programming
    Subjects: Artificial Intelligence (cs.AI)

    In the classic Vehicle Routing Problem (VRP) a fleet of vehicles has to
    visit a set of customers while minimising the operations’ costs. We study a
    rich variant of the VRP featuring split deliveries, a heterogeneous fleet, and
    vehicle-commodity incompatibility constraints. Our goal is twofold: define the
    cheapest routing and the most adequate fleet.

    To do so, we split the problem into two interdependent components: a fleet
    design component and a routing component. First, we define two Mixed Integer
    Programming (MIP) formulations for each component. Then we discuss several
    improvements in the form of valid cuts and symmetry breaking constraints.

    The main contribution of this paper is a comparison of the four resulting
    models for this Rich VRP. We highlight their strengths and weaknesses with
    extensive experiments.

    Finally, we explore a lightweight integration with Constraint Programming
    (CP). We use a fast CP model which gives good solutions and use the solution to
    warm-start our models.

    AI Researchers, Video Games Are Your Friends!

    Julian Togelius
    Comments: In Studies in Computational Intelligence, Volume 669, 2017. Springer
    Subjects: Artificial Intelligence (cs.AI)

    If you are an artificial intelligence researcher, you should look to video
    games as ideal testbeds for the work you do. If you are a video game developer,
    you should look to AI for the technology that makes completely new types of
    games possible. This chapter lays out the case for both of these propositions.
    It asks the question “what can video games do for AI”, and discusses how in
    particular general video game playing is the ideal testbed for artificial
    general intelligence research. It then asks the question “what can AI do for
    video games”, and lays out a vision for what video games might look like if we
    had significantly more advanced AI at our disposal. The chapter is based on my
    keynote at IJCCI 2015, and is written in an attempt to be accessible to a broad
    audience.

    Correlation Alignment for Unsupervised Domain Adaptation

    Baochen Sun, Jiashi Feng, Kate Saenko
    Comments: Introduction to CORAL, CORAL-LDA, and Deep CORAL. arXiv admin note: text overlap with arXiv:1511.05547
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

    In this chapter, we present CORrelation ALignment (CORAL), a simple yet
    effective method for unsupervised domain adaptation. CORAL minimizes domain
    shift by aligning the second-order statistics of source and target
    distributions, without requiring any target labels. In contrast to subspace
    manifold methods, it aligns the original feature distributions of the source
    and target domains, rather than the bases of lower-dimensional subspaces. It is
    also much simpler than other distribution matching methods. CORAL performs
    remarkably well in extensive evaluations on standard benchmark datasets. We
    first describe a solution that applies a linear transformation to source
    features to align them with target features before classifier training. For
    linear classifiers, we propose to equivalently apply CORAL to the classifier
    weights, leading to added efficiency when the number of classifiers is small
    but the number and dimensionality of target examples are very high. The
    resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a
    large margin on standard domain adaptation benchmarks. Finally, we extend CORAL
    to learn a nonlinear transformation that aligns correlations of layer
    activations in deep neural networks (DNNs). The resulting Deep CORAL approach
    works seamlessly with DNNs and achieves state-of-the-art performance on
    standard benchmark datasets. Our code is available
    at: this https URL
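
    The linear transformation that aligns second-order statistics can be
    written down in a few lines. The derivation below is a sketch that assumes
    both covariance matrices are full rank and the source features are
    centred; practical implementations typically add a regularizer to the
    covariances, which is omitted here.

```latex
% C_S, C_T: covariance matrices of source and target features (assumed full rank).
% D_S: centred source feature matrix, one example per row.
\begin{align*}
A &= C_S^{-1/2}\, C_T^{1/2}, \qquad D_S^{*} = D_S A, \\
\operatorname{cov}(D_S^{*}) &= A^{\top} C_S A
  = C_T^{1/2} C_S^{-1/2}\, C_S\, C_S^{-1/2} C_T^{1/2}
  = C_T .
\end{align*}
```

    The first factor whitens the source features and the second re-colours them
    with the target covariance, so a classifier trained on the transformed
    source features sees the target domain's second-order statistics.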

    Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

    Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Attention-based neural encoder-decoder frameworks have been widely adopted
    for image captioning. Most methods force visual attention to be active for
    every generated word. However, the decoder likely requires little to no visual
    information from the image to predict non-visual words such as “the” and “of”.
    Other words that may seem visual can often be predicted reliably just from the
    language model e.g., “sign” after “behind a red stop” or “phone” following
    “talking on a cell”. In this paper, we propose a novel adaptive attention model
    with a visual sentinel. At each time step, our model decides whether to attend
    to the image (and if so, to which regions) or to the visual sentinel. The model
    decides whether to attend to the image and where, in order to extract
    meaningful information for sequential word generation. We test our method on
    the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach
    sets the new state-of-the-art by a significant margin.
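
    The choice between attending to image regions and attending to the visual
    sentinel can be pictured as one softmax over the region scores plus one
    extra score for the sentinel. The abstract gives no formulas, so the sketch
    below is an assumption about how such a gate could work; the function
    names and the plain-list vector representation are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix region features and the sentinel vector with shared softmax weights.

    Returns (context, beta), where beta is the weight given to the sentinel:
    beta near 1 means the word is predicted mostly without looking at the image.
    """
    probs = softmax(list(region_scores) + [sentinel_score])
    beta = probs[-1]
    dim = len(sentinel)
    context = [beta * sentinel[d] for d in range(dim)]
    for weight, feat in zip(probs[:-1], region_feats):
        for d in range(dim):
            context[d] += weight * feat[d]
    return context, beta
```

    With this construction a non-visual word such as “the” would ideally
    receive a high sentinel score and hence a beta close to 1.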

    Factored Contextual Policy Search with Bayesian Optimization

    Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee
    Comments: BayesOpt 2016, NIPS Workshop on Bayesian Optimization. 5 pages, 2 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

    Scarce data is a major challenge to scaling robot learning to truly complex
    tasks, as we need to generalize locally learned policies over different
    “contexts”. Bayesian optimization approaches to contextual policy search (CPS)
    offer data-efficient policy learning that generalizes over a context space. We
    propose to improve data-efficiency by factoring typically considered contexts
    into two components: target-type contexts that correspond to a desired outcome
    of the learned behavior, e.g. target position for throwing a ball; and
    environment-type contexts that correspond to some state of the environment,
    e.g. initial ball position or wind speed. Our key observation is that
    experience can be directly generalized over target-type contexts. Based on that
    we introduce Factored Contextual Policy Search with Bayesian Optimization for
    both passive and active learning settings. Preliminary results show faster
    policy generalization on a simulated toy problem.

    Improving the Performance of Neural Networks in Regression Tasks Using Drawering

    Konrad Zolna
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The method presented extends a given regression neural network to make its
    performance improve. The modification affects the learning procedure only,
    hence the extension may be easily omitted during evaluation without any change
    in prediction. It means that the modified model may be evaluated as quickly as
    the original one but tends to perform better.

    This improvement is possible because the modification gives better expressive
    power, provides better behaved gradients and works as a regularization. The
    knowledge gained by the temporarily extended neural network is contained in the
    parameters shared with the original neural network.

    The only cost is an increase in learning time.


    Information Retrieval

    FMA: A Dataset For Music Analysis

    Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson
    Subjects: Sound (cs.SD); Information Retrieval (cs.IR)

    We present a new music dataset that can be used for several music analysis
    tasks. Our main goal is to go beyond the limitations of available music
    datasets: the small size of datasets with raw audio tracks, the restricted
    availability and legality of the music data, and the lack of meta-data for
    artist analysis or song ratings for recommender systems. Existing datasets
    such as GTZAN, TagATune, and Million Song suffer from these limitations. It is
    however essential to establish such benchmark datasets to advance the field of
    music analysis, as the ImageNet dataset did for deep learning techniques in
    computer vision. In this paper, we introduce the Free Music Archive (FMA),
    which contains 77,643 songs and 68 genres spanning 26.9 days of song listening,
    and meta-data including artist name, song title, music genre, and track counts.
    For research purposes, we define two additional datasets from the original one:
    a small genre-balanced dataset of 4,000 songs and 10 genres encompassing 33.3
    hours of raw audio, and a medium genre-unbalanced dataset of 14,511 songs and
    20 genres offering 5.1 days of track listening. Both datasets come with
    meta-data and Echonest audio features. For all datasets, we provide a
    train-test split for future comparisons of algorithms.

    Sub-linear Privacy-preserving Search with Untrusted Server and Semi-honest Parties

    M. Sadegh Riazi, Beidi Chen, Anshumali Shrivastava, Dan Wallach, Farinaz Koushanfar
    Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Information Retrieval (cs.IR)

    Privacy-preserving Near-neighbor search (PP-NNS) is a well-studied problem in
    the literature. The overwhelming growth in the size of current datasets and the
    lack of any truly secure server in the online world render the existing
    solutions impractical either due to their high computational requirements or
    the non-realistic assumptions which potentially compromise privacy. PP-NNS with
    multiple (semi-honest) data owners having query time sub-linear in the number
    of users has been proposed as an open research direction. In this paper, we
    provide the first such algorithm which has a sub-linear query time and the
    ability to handle semi-honest (honest but curious) parties. Our algorithm can
    further manage the situation where a large chunk of the server information is
    being compromised. Probabilistic embedding based on Locality Sensitive Hashing
    (LSH) is the algorithm of choice for sub-linear near-neighbor search in high
    dimensions. However, we show that LSH is not suitable for the semi-honest setting,
    and particularly when the server information is compromisable. LSH allows
    estimation of any pairwise distances between users, which can be easily
    compromised to learn user attributes using the idea of “triangulation”. We
    suggest a novel methodology which overcomes this LSH vulnerability. At the
    heart of our proposal lies a secure probabilistic embedding scheme generated
    from a novel probabilistic transformation over an appropriate LSH family. Our
    secure embeddings combined with advances in multi-party computation result in
    an efficient PP-NNS algorithm suitable for massive-scale datasets without
    strong assumptions on the behavior of parties involved. We demonstrate the
    validity of our claims by experimentally showing the effectiveness of our
    solution in hiding the sensitive variables in medical records without
    compromising the precision-recall of the retrieval.

    Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

    Beidi Chen, Anshumali Shrivastava
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

    WTA (Winner Take All) hashing has been successfully applied in many large
    scale vision applications. This hashing scheme was tailored to take advantage
    of the comparative reasoning (or order based information), which showed
    significant accuracy improvements. In this paper, we identify a subtle issue
    with WTA, which grows with the sparsity of the datasets. This issue limits the
    discriminative power of WTA. We then propose a solution for this problem based
    on the idea of Densification which provably fixes the issue. Our experiments
    show that Densified WTA Hashing outperforms Vanilla WTA both in image
    classification and retrieval tasks consistently and significantly.
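
    For readers unfamiliar with WTA hashing, the sketch below shows the basic
    scheme: permute the coordinates, look at the first k of them, and record
    the position of the largest. The abstract does not say what the "subtle
    issue" is; the helper `empty_window_fraction` illustrates one plausible
    reading (an assumption): on sparse data, all k sampled coordinates are
    often zero, so the code carries no information. The function names and
    parameters are hypothetical.

```python
import random


def wta_hash(vector, permutation, k):
    """Position (0..k-1) of the largest of the first k permuted entries."""
    window = [vector[i] for i in permutation[:k]]
    return max(range(k), key=lambda j: window[j])


def make_permutations(dim, count, seed=0):
    """Generate `count` random permutations of range(dim), reproducibly."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        p = list(range(dim))
        rng.shuffle(p)
        perms.append(p)
    return perms


def empty_window_fraction(vector, permutations, k):
    """Fraction of hashes whose k sampled coordinates are all zero."""
    empty = sum(all(vector[i] == 0 for i in p[:k]) for p in permutations)
    return empty / len(permutations)
```

    Because the hash depends only on the ordering of the sampled values, any
    strictly increasing rescaling of the vector leaves the codes unchanged,
    which is the "comparative reasoning" property mentioned above.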


    Computation and Language

    Invariant Representations for Noisy Speech Recognition

    Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio
    Comments: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

    Modern automatic speech recognition (ASR) systems need to be robust under
    acoustic variability arising from environmental, speaker, channel, and
    recording conditions. Ensuring such robustness to variability is a challenge in
    modern day neural network-based ASR systems, especially when all types of
    variability are not seen during training. We attempt to address this problem by
    encouraging the neural network acoustic model to learn invariant feature
    representations. We use ideas from recent research on image generation using
    Generative Adversarial Networks and domain adaptation ideas extending
    adversarial gradient-based training. A recent work from Ganin et al. proposes
    to use adversarial training for image domain adaptation by using an
    intermediate representation from the main target classification network to
    deteriorate the domain classifier performance through a separate neural
    network. Our work focuses on investigating neural architectures which produce
    representations invariant to noise conditions for ASR. We evaluate the proposed
    architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
    show that our method generalizes better than the standard multi-condition
    training especially when only a few noise categories are seen during training.

    Condensed Memory Networks for Clinical Diagnostic Inferencing

    Aaditya Prakash, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri
    Comments: Accepted to AAAI 2017
    Subjects: Computation and Language (cs.CL)

    Diagnosis of a clinical condition is a challenging task, which often requires
    significant medical investigation. Previous work related to diagnostic
    inferencing problems mostly considers multivariate observational data (e.g.
    physiological signals, lab tests etc.). In contrast, we explore the problem
    using free-text medical notes recorded in an electronic health record (EHR).
    Complex tasks like these can benefit from structured knowledge bases, but those
    are not scalable. We instead exploit raw text from Wikipedia as a knowledge
    source. Memory networks have been demonstrated to be effective in tasks which
    require comprehension of free-form text. They use the final iteration of the
    learned representation to predict probable classes. We introduce condensed
    memory neural networks (C-MemNNs), a novel model with iterative condensation of
    memory representations that preserves the hierarchy of features in the memory.
    Experiments on the MIMIC-III dataset show that the proposed model outperforms
    other variants of memory networks to predict the most probable diagnoses given
    a complex clinical scenario.

    Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

    Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier
    Comments: accepted to NIPS workshop on End-to-end Learning for Speech and Audio Processing
    Subjects: Computation and Language (cs.CL)

    This paper proposes a first attempt to build an end-to-end speech-to-text
    translation system, which does not use source language transcription during
    learning or decoding. We propose a model for direct speech-to-text translation,
    which gives promising results on a small French-English synthetic corpus.
    Relaxing the need for source language transcription would drastically change
    the data collection methodology in speech translation, especially in
    under-resourced scenarios. For instance, in the former project DARPA TRANSTAC
    (speech translation from spoken Arabic dialects), a large effort was devoted to
    the collection of speech transcripts (and a prerequisite to obtain transcripts
    was often a detailed transcription guide for languages with little standardized
    spelling). Now, if end-to-end approaches for speech-to-text translation are
    successful, one might consider collecting data by asking bilingual speakers to
    directly utter speech in the source language from target language text
    utterances. Such an approach has the advantage of being applicable to any
    unwritten (source) language.

    Sequential Match Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots

    Yu Wu, Wei Wu, Ming Zhou, Zhoujun Li
    Subjects: Computation and Language (cs.CL)

    We study response selection for multi-turn conversation in retrieval-based
    chatbots. Existing work either ignores relationships among utterances or
    loses important contextual information by matching the response only at the
    end against a highly abstract context vector. We propose a new session-based
    matching model
    to address both problems. The model first matches a response with each
    utterance on multiple granularities, and distills important matching
    information from each pair as a vector with convolution and pooling operations.
    The vectors are then accumulated in a chronological order through a recurrent
    neural network (RNN) which models the relationships among the utterances. The
    final matching score is calculated with the hidden states of the RNN. Empirical
    study on two public data sets shows that our model can significantly outperform
    the state-of-the-art methods for response selection in multi-turn conversation.

    The Evolution of Sentiment Analysis – A Review of Research Topics, Venues, and Top Cited Papers

    Mika Viking Mäntylä, Daniel Graziotin, Miikka Kuutila
    Comments: 32 pages, 14 figures
    Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Social and Information Networks (cs.SI)

    Research in sentiment analysis is increasing at a fast pace making it
    challenging to keep track of all the activities in the area. We present a
    computer-assisted literature review and analyze 5,163 papers from Scopus. We
    find that the roots of sentiment analysis are in studies on public opinion
    analysis at the start of the 20th century, but the outbreak of computer-based
    sentiment analysis only occurred with the availability of subjective texts on
    the Web. Consequently, 99% of the papers have been published after 2005.
    Sentiment analysis papers are scattered across multiple publication venues,
    and the combined number of papers in the top-15 venues represents only 29% of
    the papers in total. In recent years, sentiment analysis has shifted from
    analyzing online
    product reviews to social media texts from Twitter and Facebook. We created a
    taxonomy of research topics with text mining and qualitative coding. A
    meaningful future for sentiment analysis could be in ensuring the authenticity
    of public opinions, and detecting fake news.

    Cross-Lingual Predicate Mapping Between Linked Data Ontologies

    Gautam Singh, Saemi Jang, Mun Y. Yi
    Comments: 11 pages, 1 figure, 1 table
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Ontologies in different natural languages often differ in quality in terms of
    richness of schema or richness of internal links. This difference is markedly
    visible when comparing a rich English language ontology with a non-English
    language counterpart. Discovering alignment between them is a useful endeavor
    as it serves as a starting point in bridging the disparity. In particular, our
    work is motivated by the absence of inter-language links for predicates in the
    localised versions of DBpedia. In this paper, we propose and demonstrate an
    ad-hoc system to find possible owl:equivalentProperty links between predicates
    in ontologies of different natural languages. We seek to achieve this mapping
    by using pre-existing inter-language links of the resources connected by the
    given predicate. Thus, our methodology emphasizes semantic rather than lexical
    similarity. Moreover, through an evaluation, we show that our system is
    capable of outperforming a baseline system that is similar to the one used in
    recent OAEI campaigns.


    Distributed, Parallel, and Cluster Computing

    Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations

    Tobias Wicky, Edgar Solomonik, Torsten Hoefler
    Comments: 10 pages, 2 figures, submitted to IPDPS 2017
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    We present a new parallel algorithm for solving triangular systems with
    multiple right hand sides (TRSM). TRSM is used extensively in numerical linear
    algebra computations, both to solve triangular linear systems of equations as
    well as to compute factorizations with triangular matrices, such as Cholesky,
    LU, and QR. Our algorithm achieves better theoretical scalability than known
    alternatives, while maintaining numerical stability, via selective use of
    triangular matrix inversion. We leverage the fact that triangular inversion and
    matrix multiplication are more parallelizable than the standard TRSM algorithm.
    By only inverting triangular blocks along the diagonal of the initial matrix,
    we generalize the usual way of TRSM computation and the full matrix inversion
    approach. This flexibility leads to an efficient algorithm for any ratio of the
    number of right hand sides to the triangular matrix dimension. We provide a
    detailed communication cost analysis for our algorithm as well as for the
    recursive triangular matrix inversion. This cost analysis makes it possible to
    determine optimal block sizes and processor grids a priori. Relative to the
    best known algorithms for TRSM, our approach can require asymptotically fewer
    messages, while performing optimal amounts of communication and computation.
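
    The idea of inverting only the triangular blocks on the diagonal can be
    illustrated with a small serial sketch. This is not the authors' parallel
    algorithm and it ignores communication costs entirely; the function name
    block_trsm_lower and the block parameter are hypothetical names chosen for
    this example. With block=1 it reduces to ordinary forward substitution, and
    with block equal to the matrix dimension it becomes full inversion.

```python
import numpy as np

def block_trsm_lower(L, B, block):
    """Solve L X = B for lower-triangular L, inverting only diagonal blocks.

    Illustrative serial sketch; `block` is an assumed tuning parameter.
    """
    n = L.shape[0]
    X = np.zeros_like(B, dtype=float)
    for start in range(0, n, block):
        end = min(start + block, n)
        # Subtract the contribution of the block rows that are already solved.
        rhs = B[start:end] - L[start:end, :start] @ X[:start]
        # Invert the small diagonal block, then apply it as a matrix product.
        inv_diag = np.linalg.inv(L[start:end, start:end])
        X[start:end] = inv_diag @ rhs
    return X

rng = np.random.default_rng(0)
n, k = 8, 3
# Well-conditioned lower-triangular test matrix (random data, for illustration).
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
B = rng.standard_normal((n, k))

X = block_trsm_lower(L, B, block=2)
residual = np.linalg.norm(L @ X - B)
print("residual:", residual)
```

    Larger blocks shift more of the work into inversion and matrix
    multiplication, which the abstract describes as more parallelizable than
    the standard TRSM recurrence.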

    Transient Provisioning for Cloud Computing Platforms

    Brendan Patch, Thomas Taimre
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    User demand on the computational resources of cloud computing platforms
    varies over time. These variations in demand can be predictable or
    unpredictable, resulting in time-varying and 'bursty' fluctuations in demand.
    Furthermore, demand can arrive in batches, and users whose demands are not met
    can be impatient. We demonstrate how to compute the expected revenue loss over
    a finite time horizon in the presence of all these model characteristics
    through the use of matrix analytic methods. We then illustrate how to use this
    knowledge to make frequent short term provisioning decisions — transient
    provisioning. It is seen that taking each of the characteristics of fluctuating
    user demand (predictable, unpredictable, batchy) into account can result in a
    substantial reduction of losses. Moreover, our transient provisioning framework
    allows for a wide variety of system behaviors to be modeled and gives simple
    expressions for expected revenue loss which are straightforward to evaluate
    numerically.

    An Improved One-to-All Broadcasting in Higher Dimensional Eisenstein-Jacobi Networks

    Zaid Hussain
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Recently, higher dimensional Eisenstein-Jacobi (EJ) networks were proposed
    in [22] and shown to have a better average distance with more nodes than
    single dimensional EJ networks. Some communication algorithms such as
    one-to-all and all-to-all communications are well known and used in
    interconnection networks. In one-to-all communication, a source node sends a
    message to every other node in the network, whereas in all-to-all
    communication, every node is considered a source node and sends its message
    to every other node in the network. In this paper, an improved one-to-all
    communication algorithm in higher dimensional EJ networks is presented. The
    paper shows that the proposed algorithm achieves a lower average number of
    steps to receive the broadcast message. In addition, since the links are
    assumed to be half-duplex, the all-to-all broadcasting algorithm is divided
    into three phases. The simulation results are discussed and show that the
    improved one-to-all algorithm achieves better traffic performance than the
    well-known one-to-all algorithm and has a 2.7% lower total number of
    senders.

    Study of shoplifting prevention using image analysis and ERP check

    Yoji Yamato, Yoshifumi Fukumoto, Hiroki Kumazaki
    Comments: 4 pages, in Japanese, 2 figures, IEICE Technical Report, SC2016-14, Aug. 2016
    Journal-ref: IEICE Technical Report, SC2016-14, Aug. 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper, we propose a SaaS service which prevents shoplifting using
    image analysis and ERP. In Japan, the total damage from shoplifting reaches
    450 billion yen, and more than 1000 small shops have gone out of business
    because of shoplifting. Based on recent cloud technology and data analysis
    technology, we propose a shoplifting prevention service for small shops that
    combines image analysis of security camera footage with ERP data checks. We
    evaluated stream analysis of security camera video using the online machine
    learning framework Jubatus.


    Learning

    Local Group Invariant Representations via Orbit Embeddings

    Anant Raj, Abhishek Kumar, Youssef Mroueh, P. Thomas Fletcher, Bernhard Schölkopf
    Comments: 20 pages, 1 figure
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Invariance to nuisance transformations is one of the desirable properties of
    effective representations. We consider transformations that form a group
    and propose an approach based on kernel methods to derive local group invariant
    representations. Locality is achieved by defining a suitable probability
    distribution over the group which in turn induces distributions in the input
    feature space. We learn a decision function over these distributions by
    appealing to the powerful framework of kernel methods and generate local
    invariant random feature maps via kernel approximations. We show uniform
    convergence bounds for kernel approximation and provide excess risk bounds for
    learning with these features. We evaluate our method on three real datasets,
    including Rotated MNIST and CIFAR-10, and observe that it outperforms competing
    kernel based approaches. The proposed method also outperforms a deep CNN on
    Rotated-MNIST and performs comparably to the recently proposed
    group-equivariant CNN.

    Combinatorial semi-bandit with known covariance

    Rémy Degenne, Vianney Perchet
    Comments: in NIPS 2016 (Conference on Neural Information Processing Systems), Dec 2016, Barcelona, Spain
    Subjects: Learning (cs.LG)

    The combinatorial stochastic semi-bandit problem is an extension of the
    classical multi-armed bandit problem in which an algorithm pulls more than one
    arm at each stage and the rewards of all pulled arms are revealed. One
    difference with the single arm variant is that the dependency structure of the
    arms is crucial. Previous works on this setting either used a worst-case
    approach or imposed independence of the arms. We introduce a way to quantify
    the dependency structure of the problem and design an algorithm that adapts to
    it. The algorithm is based on linear regression and the analysis develops
    techniques from the linear bandit literature. By comparing its performance to a
    new lower bound, we prove that it is optimal, up to a poly-logarithmic factor
    in the number of pulled arms.

    Control Matching via Discharge Code Sequences

    Dang Nguyen, Wei Luo, Dinh Phung, Svetha Venkatesh
    Comments: 5 pages
    Subjects: Learning (cs.LG)

    In this paper, we consider the patient similarity matching problem over a
    cancer cohort of more than 220,000 patients. Our approach first leverages the
    Word2Vec framework to embed ICD codes into a vector-valued representation. We
    then propose a sequential algorithm for case-control matching on this
    representation space of diagnosis codes. The novel practice of applying
    sequential matching on the vector representation improved the matching
    accuracy measured through multiple clinical outcomes. We report results on a
    large-scale dataset to demonstrate the effectiveness of our method. For such a
    large dataset where most clinical information has been codified, the new method
    is particularly relevant.

    Video Ladder Networks

    Francesco Cricri, Mikko Honkala, Xingyang Ni, Emre Aksu, Moncef Gabbouj
    Comments: Accepted at NIPS 2016 workshop on ML for Spatiotemporal Forecasting. This version of the paper contains results from a longer baseline run
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We present the Video Ladder Network (VLN) for video prediction. VLN is a
    neural encoder-decoder model augmented by both recurrent and feedforward
    lateral connections at all layers. The model achieves competitive results on
    the Moving MNIST dataset while having very simple structure and providing fast
    inference.

    Factored Contextual Policy Search with Bayesian Optimization

    Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee
    Comments: BayesOpt 2016, NIPS Workshop on Bayesian Optimization. 5 pages, 2 figures
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

    Scarce data is a major challenge to scaling robot learning to truly complex
    tasks, as we need to generalize locally learned policies over different
    “contexts”. Bayesian optimization approaches to contextual policy search (CPS)
    offer data-efficient policy learning that generalizes over a context space. We
    propose to improve data-efficiency by factoring typically considered contexts
    into two components: target-type contexts that correspond to a desired outcome
    of the learned behavior, e.g. target position for throwing a ball; and
    environment-type contexts that correspond to some state of the environment,
    e.g. initial ball position or wind speed. Our key observation is that
    experience can be directly generalized over target-type contexts. Based on that
    we introduce Factored Contextual Policy Search with Bayesian Optimization for
    both passive and active learning settings. Preliminary results show faster
    policy generalization on a simulated toy problem.

    Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    Haiping Huang
    Comments: 23 pages, 9 figures
    Subjects: Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    Revealing hidden features in unlabeled data is called unsupervised feature
    learning, which plays an important role in pretraining a deep neural network.
    Here we provide a statistical mechanics analysis of the unsupervised learning
    in a restricted Boltzmann machine with binary synapses. A message passing
    equation to infer the hidden feature is derived, and furthermore, variants of
    this equation are analyzed. A statistical analysis by replica theory describes
    the thermodynamic properties of the model. Our analysis confirms an entropy
    crisis preceding the non-convergence of the message passing equation,
    suggesting a discontinuous phase transition as a key characteristic of the
    restricted Boltzmann machine. Continuous phase transition is also confirmed
    depending on the embedded feature strength in the data. The mean-field result
    under the replica symmetric assumption agrees with that obtained by running
    message passing algorithms on single instances of finite sizes. Interestingly,
    in an approximate Hopfield model, the entropy crisis is absent, and a
    continuous phase transition is observed instead. We also develop an iterative
    equation to infer the hyper-parameter (temperature) hidden in the data, which
    in physics corresponds to iteratively imposing the Nishimori condition. Our
    study provides insights towards understanding the thermodynamic properties of
    restricted Boltzmann machine learning, and moreover an important theoretical
    basis for building simplified deep networks.

    Efficient Non-oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee

    Yi Xu, Haiqin Yang, Lijun Zhang, Tianbao Yang
    Subjects: Learning (cs.LG)

    In this paper, we address learning problems for high dimensional data.
    Previously, oblivious random projection based approaches that project high
    dimensional features onto a random subspace have been used in practice for
    tackling the high-dimensionality challenge in machine learning. Recently, various
    non-oblivious randomized reduction methods have been developed and deployed for
    solving many numerical problems such as matrix product approximation, low-rank
    matrix approximation, etc. However, they are less explored for machine
    learning tasks, e.g., classification. More seriously, the theoretical analysis
    of excess risk bounds for risk minimization, an important measure of
    generalization performance, has not been established for non-oblivious
    randomized reduction methods. It therefore remains an open problem what
    benefit they offer over previous oblivious random projection based
    approaches. To tackle these challenges, we propose an algorithmic framework for
    employing a non-oblivious randomized reduction method for general empirical
    risk minimization in machine learning tasks, where the original high-dimensional
    features are projected onto a random subspace that is derived from the data
    with a small matrix approximation error. We then derive the first excess risk
    bound for the proposed non-oblivious randomized reduction approach without
    requiring strong assumptions on the training data. The established excess risk
    bound shows that the proposed approach provides much better generalization
    performance, and it also sheds more light on different randomized
    reduction approaches. Finally, we conduct extensive experiments on both
    synthetic and real-world benchmark datasets, whose dimension scales to
    O(10^7), to demonstrate the efficacy of our proposed approach.
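
    The contrast between an oblivious projection and a subspace derived from the
    data can be shown with a small NumPy sketch. It uses a generic randomized
    range finder as a stand-in; this is an assumption made for illustration, not
    the specific reduction or guarantee from the paper. The function name
    data_dependent_projection and the synthetic low-rank data are hypothetical.

```python
import numpy as np

def data_dependent_projection(X, m, rng):
    """Return a d x m orthonormal basis Q computed from the data X (n x d).

    Generic randomized range finder on X^T; an illustrative stand-in only.
    """
    n, d = X.shape
    G = rng.standard_normal((n, m))   # random test matrix
    Y = X.T @ G                       # d x m sketch inside the row space of X
    Q, _ = np.linalg.qr(Y)            # orthonormalize the sketch
    return Q

rng = np.random.default_rng(0)
n, d, r, m = 200, 1000, 5, 10
# Synthetic rank-5 data in 1000 dimensions (assumed, for illustration).
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))

# Non-oblivious: the subspace depends on X.
Q = data_dependent_projection(X, m, rng)
X_reduced = X @ Q                     # n x m features passed to the learner
rel_err = np.linalg.norm(X - X_reduced @ Q.T) / np.linalg.norm(X)

# Oblivious: a random subspace chosen without looking at X.
Q_obl, _ = np.linalg.qr(rng.standard_normal((d, m)))
rel_err_obl = np.linalg.norm(X - (X @ Q_obl) @ Q_obl.T) / np.linalg.norm(X)

print("data-dependent error:", rel_err)
print("oblivious error:", rel_err_obl)
```

    On this toy input the data-derived subspace reconstructs X almost exactly,
    while the oblivious subspace of the same size loses most of it. Real data
    is rarely exactly low rank, so the gap would normally be smaller.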

    Improving the Performance of Neural Networks in Regression Tasks Using Drawering

    Konrad Zolna
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    The method presented extends a given regression neural network to improve its
    performance. The modification affects the learning procedure only,
    hence the extension may be easily omitted during evaluation without any change
    in prediction. It means that the modified model may be evaluated as quickly as
    the original one but tends to perform better.

    This improvement is possible because the modification gives better expressive
    power, provides better behaved gradients and works as a regularization. The
    knowledge gained by the temporarily extended neural network is contained in the
    parameters shared with the original neural network.

    The only cost is an increase in learning time.

    Core Sampling Framework for Pixel Classification

    Manohar Karki, Robert DiBiano, Saikat Basu, Supratik Mukhopadhyay
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    The intermediate map responses of a Convolutional Neural Network (CNN)
    contain information about an image that can be used to extract contextual
    knowledge about it. In this paper, we present a core sampling framework that is
    able to use these activation maps from several layers as features to another
    neural network using transfer learning to provide an understanding of an input
    image. Our framework creates a representation that combines features from the
    test data and the contextual knowledge gained from the responses of a
    pretrained network, processes it and feeds it to a separate Deep Belief
    Network. We use this representation to extract more information from an image
    at the pixel level, hence gaining understanding of the whole image. We
    experimentally demonstrate the usefulness of our framework using a pretrained
    VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture
    Radar (SAR) imagery and the CAMVID dataset.

    Segmental Convolutional Neural Networks for Detection of Cardiac Abnormality With Noisy Heart Sound Recordings

    Yuhao Zhang, Sandeep Ayyar, Long-Huei Chen, Ethan J. Li
    Comments: This work was finished in May 2016, and remained unpublished until December 2016 due to a request from the data provider
    Subjects: Sound (cs.SD); Learning (cs.LG); Machine Learning (stat.ML)

    Heart diseases constitute a global health burden, and the problem is
    exacerbated by the error-prone nature of listening to and interpreting heart
    sounds. This motivates the development of automated classification to screen
    for abnormal heart sounds. Existing machine learning-based systems achieve
    accurate classification of heart sound recordings but rely on expert features
    that have not been thoroughly evaluated on noisy recordings. Here we propose a
    segmental convolutional neural network architecture that achieves automatic
    feature learning from noisy heart sound recordings. Our experiments show that
    our best model, trained on noisy recording segments acquired with an existing
    hidden semi-Markov model-based approach, attains a classification accuracy of
    87.5% on the 2016 PhysioNet/CinC Challenge dataset, compared to the 84.6%
    accuracy of the state-of-the-art statistical classifier trained and evaluated
    on the same dataset. Our results indicate the potential of using neural
    network-based methods to increase the accuracy of automated classification of
    heart sound recordings for improved screening of heart diseases.

    Semi-Supervised Learning with the Deep Rendering Mixture Model

    Tan Nguyen, Wanjia Liu, Ethan Perez, Richard G. Baraniuk, Ankit B. Patel
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Semi-supervised learning algorithms reduce the high cost of acquiring labeled
    training data by using both labeled and unlabeled data during learning. Deep
    Convolutional Networks (DCNs) have achieved great success in supervised tasks
    and as such have been widely employed in the semi-supervised learning. In this
    paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a
    probabilistic generative model that models latent nuisance variation, and whose
    inference algorithm yields DCNs. We develop an EM algorithm for the DRMM to
    learn from both labeled and unlabeled data. Guided by the theory of the DRMM,
    we introduce a novel non-negativity constraint and a variational inference
    term. We report state-of-the-art performance on MNIST and SVHN and competitive
    results on CIFAR10. We also probe deeper into how a DRMM trained in a
    semi-supervised setting represents latent nuisance variation using
    synthetically rendered images. Taken together, our work provides a unified
    framework for supervised, unsupervised, and semi-supervised learning.

    A Probabilistic Framework for Deep Learning

    Ankit B. Patel, Tan Nguyen, Richard G. Baraniuk
    Comments: arXiv admin note: substantial text overlap with arXiv:1504.00641
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    We develop a probabilistic framework for deep learning based on the Deep
    Rendering Mixture Model (DRMM), a new generative probabilistic model that
    explicitly captures variations in data due to latent task nuisance variables. We
    demonstrate that max-sum inference in the DRMM yields an algorithm that exactly
    reproduces the operations in deep convolutional neural networks (DCNs),
    providing a first principles derivation. Our framework provides new insights
    into the successes and shortcomings of DCNs as well as a principled route to
    their improvement. DRMM training via the Expectation-Maximization (EM)
    algorithm is a powerful alternative to DCN back-propagation, and initial
    training results are promising. Classification based on the DRMM and other
    variants outperforms DCNs in supervised digit classification, training 2-3x
    faster while achieving similar accuracy. Moreover, the DRMM is applicable to
    semi-supervised and unsupervised learning tasks, achieving results that are
    state-of-the-art in several categories on the MNIST benchmark and comparable to
    state of the art on the CIFAR10 benchmark.

    Invariant Representations for Noisy Speech Recognition

    Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio
    Comments: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition
    Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

    Modern automatic speech recognition (ASR) systems need to be robust under
    acoustic variability arising from environmental, speaker, channel, and
    recording conditions. Ensuring such robustness to variability is a challenge in
    modern day neural network-based ASR systems, especially when all types of
    variability are not seen during training. We attempt to address this problem by
    encouraging the neural network acoustic model to learn invariant feature
    representations. We use ideas from recent research on image generation using
    Generative Adversarial Networks and domain adaptation ideas extending
    adversarial gradient-based training. A recent work from Ganin et al. proposes
    to use adversarial training for image domain adaptation by using an
    intermediate representation from the main target classification network to
    deteriorate the domain classifier performance through a separate neural
    network. Our work focuses on investigating neural architectures which produce
    representations invariant to noise conditions for ASR. We evaluate the proposed
    architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
    show that our method generalizes better than the standard multi-condition
    training especially when only a few noise categories are seen during training.

    Distributed Gaussian Learning over Time-varying Directed Graphs

    Angelia Nedić, Alex Olshevsky, César A. Uribe
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (cs.SY); Machine Learning (stat.ML)

    We present a distributed (non-Bayesian) learning algorithm for the problem of
    parameter estimation with Gaussian noise. The algorithm is expressed as
    explicit updates on the parameters of the Gaussian beliefs (i.e. means and
    precision). We show a convergence rate of O(1/k) with the constant term
    depending on the number of agents and the topology of the network. Moreover, we
    show almost sure convergence to the optimal solution of the estimation problem
    for the general case of time-varying directed graphs.

    Deterministic and Probabilistic Conditions for Finite Completability of Low Rank Tensor

    Morteza Ashraphijuo, Vaneet Aggarwal, Xiaodong Wang
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

    We investigate the fundamental conditions on the sampling pattern, i.e.,
    locations of the sampled entries, for finite completability of a low-rank
    tensor given some components of its Tucker rank. In order to find the
    deterministic necessary and sufficient conditions, we propose an algebraic
    geometric analysis on the Tucker manifold, which allows us to incorporate
    multiple rank components in the proposed analysis in contrast with the
    conventional geometric approaches on the Grassmannian manifold. This analysis
    characterizes the algebraic independence of a set of polynomials defined based
    on the sampling pattern, which is closely related to finite completion.
    Probabilistic conditions are then studied and a lower bound on the sampling
    probability is given, which guarantees that the proposed deterministic
    conditions on the sampling patterns for finite completability hold with high
    probability. Furthermore, using the proposed geometric approach for finite
    completability, we propose a sufficient condition on the sampling pattern that
    ensures there exists exactly one completion for the sampled tensor.

    Towards the Limit of Network Quantization

    Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    Network quantization is one of the network compression techniques employed to
    reduce the redundancy of deep neural networks. It compresses the size of the
    storage for a large number of network parameters in a neural network by
    quantizing them and encoding the quantized values into binary codewords of
    smaller sizes. In this paper, we aim to design network quantization schemes
    that minimize the expected loss due to quantization while maximizing the
    compression ratio. To this end, we analyze the quantitative relation of
    quantization errors to the loss function of a neural network and identify that
    the Hessian-weighted distortion measure is locally the right objective function
    that we need to optimize for minimizing the loss due to quantization. As a
    result, Hessian-weighted k-means clustering is proposed for clustering network
    parameters to quantize when fixed-length binary encoding follows. When optimal
    variable-length binary codes, e.g., Huffman codes, are employed for further
    compression of quantized values after clustering, we derive that the network
    quantization problem can be related to the entropy-constrained scalar
    quantization (ECSQ) problem in information theory and consequently propose two
    solutions of ECSQ for network quantization, i.e., uniform quantization and an
    iterative algorithm similar to Lloyd’s algorithm for k-means clustering.
    Finally, using simple uniform quantization followed by Huffman coding, our
    experimental results show that compression ratios of 51.25, 22.17 and 40.65
    are achievable (i.e., the sizes of the compressed models are 1.95%, 4.51% and
    2.46% of the original model sizes) for LeNet, ResNet and AlexNet, respectively,
    at no or marginal performance loss.
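
    The pipeline named in the abstract above (uniform quantization followed by
    Huffman coding) can be illustrated with a minimal sketch. This is not the
    authors' implementation: the function names, the step size, the 32-bit
    baseline and the choice to ignore codebook overhead are all assumptions
    made for illustration.

```python
import heapq
import random
from collections import Counter


def uniform_quantize(weights, step):
    """Map each weight to the index of the nearest multiple of `step`."""
    return [round(w / step) for w in weights]


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code on the symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still costs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def compression_ratio(weights, step, bits_per_weight=32):
    """Original size over Huffman-coded size (codebook overhead ignored, an assumption)."""
    indices = uniform_quantize(weights, step)
    lengths = huffman_code_lengths(indices)
    coded_bits = sum(lengths[i] for i in indices)
    return (bits_per_weight * len(weights)) / coded_bits


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "network parameters": zero-mean Gaussian weights.
    weights = [rng.gauss(0.0, 0.1) for _ in range(10_000)]
    print("compression ratio:", compression_ratio(weights, step=0.05))
```

    A coarser step concentrates the weights on fewer indices, which lowers the
    entropy of the index stream and raises the ratio, at the cost of a larger
    quantization error. The abstract's Hessian-weighted objective is about
    making that trade-off with respect to the network loss.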


    Information Theory

    Statistical Mechanics of MAP Estimation: General Replica Ansatz

    Ali Bereyhi, Ralf R. Müller, Hermann Schulz-Baldes
    Comments: 77 pages, 13 Figures, Submitted to IEEE Transactions on Information Theory
    Subjects: Information Theory (cs.IT)

    The large-system performance of MAP estimation is studied considering a
    general distortion function when the observation vector is received through a
    linear system with additive white Gaussian noise. The analysis considers the
    system matrix to be chosen from a large class of random ensembles. We take a
    statistical mechanical approach by introducing a spin glass corresponding to
    the estimator, and employing the replica method for the large-system analysis.
    In contrast to earlier replica based studies, our analysis evaluates the
    general replica ansatz of the corresponding spin glass and determines the
    asymptotic distortion of the estimator for any structure of the replica
    correlation matrix. Consequently, the replica symmetric as well as the replica
    symmetry breaking ansatz with b steps of breaking is deduced from the given
    general replica ansatz. The generality of our distortion function lets us
    derive a more general form of the maximum-a-posteriori decoupling principle.
    Based on the general replica ansatz, we show that for any structure of the
    replica correlation matrix, the vector-valued system decouples into a bank of
    equivalent decoupled linear systems followed by maximum-a-posteriori estimators.
    The structure of the decoupled linear system is further studied under both the
    replica symmetry and the replica symmetry breaking assumptions. For b steps
    of symmetry breaking, the decoupled system is found to be an additive system
    with a noise term given as the sum of an independent Gaussian random variable
    and b correlated impairment terms. The general decoupling property of the
    maximum-a-posteriori estimator leads to the idea of a replica simulator which
    represents the replica ansatz through the state evolution of a transition
    system described by its corresponding decoupled system. As an application of
    our study, we investigate large compressive sensing systems.

    Distributed Detection in Ad Hoc Networks Through Quantized Consensus-Part II: Asymptotically Optimal Detection via One-Bit Communications

    Shengyu Zhu, Biao Chen
    Comments: Submitted to IEEE Trans. Inf. Theory. See this http URL for the first part focusing on quantized consensus algorithms
    Subjects: Information Theory (cs.IT)

    This paper considers asymptotic performance of distributed detection in large
    connected sensor networks. In contrast to canonical parallel networks, where a
    single node has access to local decisions from all other sensors, each node can
    only exchange information with its direct neighbors in the present setting. We
    establish that, with each node employing an identical one-bit quantizer for
    local information exchange, a novel consensus reaching approach can achieve the
    optimal asymptotic performance of centralized detection. The statement is true
    under three different detection frameworks: the Bayesian criterion, where the
    maximum a posteriori detector is optimal, and the Neyman-Pearson criterion with
    either a constant or an exponential constraint on the type-I error probability.
    The key to achieving the optimal asymptotic performance is the use of a one-bit
    quantizer with controllable threshold that results in desired consensus error
    bounds. In addition, we examine non-asymptotic performance of the proposed
    approach and show that the type-I and type-II error probabilities at each node
    can be made arbitrarily close to the centralized ones simultaneously when a
    continuity condition is satisfied.

    Relative generalized matrix weights of matrix codes for universal security on wire-tap networks

    Umberto Martínez-Peñas, Ryutaroh Matsumoto
    Comments: 18 pages, LaTeX; Parts of this manuscript have been presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2016. Conference version available at arXiv:1607.01263
    Subjects: Information Theory (cs.IT)

    Universal security over a network with linear network coding has been
    intensively studied. However, previous linear codes and code pairs used for
    this purpose were linear over a larger field than that used on the network. In
    this work, we introduce new parameters (relative dimension/rank support profile
    and relative generalized matrix weights) for code pairs that are linear over
    the field used in the network, measuring the universal security performance of
    these code pairs. For one code and non-square matrices, generalized matrix
    weights coincide with the existing Delsarte generalized weights, hence we prove
    the connection between these latter weights and secure network coding. The
    proposed new parameters enable us to use optimal universal secure linear codes
    on noiseless networks for all possible parameters, as opposed to previous
    works, and also enable us to add universal security to the recently proposed
    list-decodable rank-metric codes by Guruswami et al. We give several properties
    of the new parameters: monotonicity, Singleton-type lower and upper bounds, a
    duality theorem, and definitions and characterizations of equivalences and
    degenerateness of linear codes. Finally, we show that our parameters strictly
    extend relative dimension/length profile and relative generalized Hamming
    weights, respectively, and relative dimension/intersection profile and relative
    generalized rank weights, respectively. The duality theorems for generalized
    Hamming weights and generalized rank weights can be deduced as special cases of
    the duality theorem for generalized matrix weights.

    FoCUS: Fourier-based Coded Ultrasound

    Almog Lahav, Tanya Chernyakova (Student Member, IEEE), Yonina C. Eldar (Fellow, IEEE)
    Subjects: Information Theory (cs.IT)

    Modern imaging systems typically use single-carrier short pulses for
    transducer excitation. Coded signals together with pulse compression are
    successfully used in radar and communication to increase the amount of
    transmitted energy. Previous research verified significant improvement in SNR
    and imaging depth for ultrasound imaging with coded signals. Since pulse
    compression needs to be applied at each transducer element, the implementation
    of coded excitation (CE) in array imaging is computationally complex. Applying
    pulse compression on the beamformer output reduces the computational load but
    also degrades both the axial and lateral point spread function (PSF)
    compromising image quality. In this work we present an approach for efficient
    implementation of pulse compression by integrating it into frequency domain
    beamforming. This method leads to significant reduction in the amount of
    computations without affecting axial resolution. The lateral resolution is
    dictated by the factor of savings in computational load. We verify the
    performance of our method on a Verasonics imaging system and compare the
    resulting images to time-domain processing. We show that up to a 77-fold
    reduction in computational complexity can be achieved in a typical imaging
    setup. The efficient implementation makes CE a feasible approach in array
    imaging paving the way to enhanced SNR as well as improved imaging depth and
    frame-rate.

    Weighted Matrix Completion and Recovery with Prior Subspace Information

    Armin Eftekhari, Dehui Yang, Michael B. Wakin
    Subjects: Information Theory (cs.IT)

    A low-rank matrix with “diffuse” entries can be efficiently reconstructed
    after observing a few of its entries, at random, and then solving a convex
    program. In many applications, in addition to these measurements, potentially
    valuable prior knowledge about the column and row spaces of the matrix is also
    available to the practitioner. In this paper, we incorporate this prior
    knowledge in matrix completion—by minimizing a weighted nuclear norm—and
    precisely quantify any improvements. In particular, in theory, we find that
    reliable prior knowledge reduces the sample complexity of matrix completion by
    a logarithmic factor; the observed improvement is considerably more magnified
    in numerical simulations. We also present similar results for the closely
    related problem of matrix recovery from generic linear measurements.

    On High-Order Capacity Statistics of Spectrum Aggregation Systems over κ-μ and κ-μ shadowed Fading Channels

    Jiayi Zhang, Xiaoyu Chen, Kostas P. Peppas, Xu Li, Ying Liu
    Comments: to appear in IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    The frequency scarcity imposed by the fast-growing demand for mobile data services
    calls for promising spectrum aggregation systems. The so-called higher-order
    statistics (HOS) of the channel capacity are a suitable metric of system
    performance. While prior relevant works have improved our knowledge of the HOS
    characterization of spectrum aggregation systems, an analytical framework
    encompassing generalized fading models of interest is not yet available. In
    this paper, we pursue a detailed HOS analysis of κ-μ and
    κ-μ shadowed fading channels by deriving novel and exact
    expressions. Furthermore, the simplified HOS expressions for the asymptotically
    low and high signal-to-noise regimes are derived. Several important statistical
    measures, such as amount of fading, amount of dispersion, reliability,
    skewness, and kurtosis, are obtained by using the HOS results. More
    importantly, the useful implications of system and fading parameters on
    spectrum aggregation systems are investigated for channel selection. Finally,
    all derived expressions are validated via Monte-Carlo simulations.
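
    As a rough illustration of how such measures follow from the capacity
    moments, the sketch below estimates them by Monte-Carlo simulation. It is
    not the paper's analytical framework: Rayleigh fading is used as a simple
    stand-in for the κ-μ models, and the moment-based definitions of amount of
    fading, amount of dispersion, skewness and kurtosis are the commonly used
    ones, assumed here rather than taken from the abstract. All function names
    are hypothetical.

```python
import math
import random


def capacity_samples(snr, n, seed=0):
    """Draw n samples of C = log2(1 + snr * g), with g ~ Exp(1) (Rayleigh power gain)."""
    rng = random.Random(seed)
    return [math.log2(1.0 + snr * rng.expovariate(1.0)) for _ in range(n)]


def raw_moment(samples, order):
    """Sample estimate of E[C**order]."""
    return sum(c ** order for c in samples) / len(samples)


def hos_measures(samples):
    """Moment-based summary measures of the capacity (assumed standard definitions)."""
    m1, m2, m3, m4 = (raw_moment(samples, k) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    # Central moments expressed through the raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "amount_of_fading": var / m1 ** 2,   # variance normalized by squared mean
        "amount_of_dispersion": var / m1,    # variance-to-mean ratio
        "skewness": mu3 / var ** 1.5,
        "kurtosis": mu4 / var ** 2,
    }


if __name__ == "__main__":
    # Average SNR of 10 (linear scale) is an arbitrary illustrative choice.
    for name, value in hos_measures(capacity_samples(snr=10.0, n=50_000)).items():
        print(f"{name}: {value:.4f}")
```

    Only the first four raw moments of the capacity are needed, which is why
    closed-form HOS expressions of the kind derived in the paper yield all of
    these measures directly.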

    MIMO Secret Communications Against an Active Eavesdropper

    Lingxiang Li, Athina P. Petropulu, Zhi Chen
    Comments: 13 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    This paper considers a scenario in which an Alice-Bob pair wishes to
    communicate in secret in the presence of an active Eve, who is capable of
    jamming as well as eavesdropping in Full-Duplex (FD) mode. As a countermeasure,
    Bob also operates in FD mode, using a subset of its antennas to act as
    receiver, and the remaining antennas to act as jammer and transmit noise. With
    the goal of maximizing the achievable secrecy degrees of freedom (S.D.o.F.) of the
    system, we provide the optimal transmit/receive antenna allocation at Bob,
    based on which we determine in closed form the maximum achievable S.D.o.F. We
    further investigate the adverse scenario in which Eve knows Bob’s transmission
    strategy and optimizes its transmit/receive antenna allocation in order to
    minimize the achievable S.D.o.F. For that case we find the worst-case
    achievable S.D.o.F. We also provide a method for constructing the precoding
    matrices of Alice and Bob, based on which the maximum S.D.o.F. can be achieved.
    Numerical results validate the theoretical findings and demonstrate the
    performance of the proposed method in realistic settings.

    The Classical Limit of Entropic Quantum Dynamics

    Anthony Demme, Ariel Caticha
    Comments: Presented at MaxEnt 2016, the 36th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (July 10-15, 2016, Ghent, Belgium)
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

    The framework of entropic dynamics (ED) allows one to derive quantum
    mechanics as an application of entropic inference. In this work we derive the
    classical limit of quantum mechanics in the context of ED. Our goal is to find
    conditions so that the center of mass (CM) of a system of N particles behaves
    as a classical particle. What is of interest is that Planck’s constant remains
    finite at all steps in the calculation and that the classical motion is
    obtained as the result of a central limit theorem. More explicitly we show that
    if the system is sufficiently large, and if the CM is initially uncorrelated
    with other degrees of freedom, then the CM follows a smooth trajectory and
    obeys the classical Hamilton-Jacobi equation with a vanishing quantum potential.

    Detecting Byzantine Attacks for Gaussian Two-Way Relay System

    Ruohan Cao
    Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)

    This paper focuses on Byzantine attack detection for a Gaussian two-way relay
    network. In this network, two source nodes communicate with each other with the
    help of an amplify-and-forward relay which may perform Byzantine attacks by
    forwarding altered symbols to the sources. To simplify the investigation of the
    detectability of attacks conducted in Gaussian channels, we focus on the MA
    channel of the network, while assuming the BC channel is noiseless. Upon such a
    model, we propose an attack detection scheme implemented in the sources.
    Specifically, we consider an open wireless propagation environment that allows
    the symbols, forwarded by the relay, to go through a continuous channel and
    arrive at the sources. With the observations of the source, we develop a
    detection scheme for the source by comparing the joint empirical distribution
    of its received and transmitted signals with the known channel statistics. The
    main contribution of this paper is to prove that if and only if the Gaussian
    relay network satisfies a non-manipulable channel condition, the proposed
    detection scheme can detect arbitrary attacks that allow the stochastic
    distributions of altered symbols to vary arbitrarily and depend on each other.
    No pre-shared secret or secret transmission is needed for the detection.
    Furthermore, we also prove that for the considered Gaussian two-way relay
    networks, the non-manipulable channel condition is always satisfied. This
    result indicates that arbitrary attacks conducted in MA Gaussian channels are
    detectable by only using observations, while providing a base for attack
    detection in more general Gaussian networks.

    Deterministic and Probabilistic Conditions for Finite Completability of Low Rank Tensor

    Morteza Ashraphijuo, Vaneet Aggarwal, Xiaodong Wang
    Subjects: Numerical Analysis (cs.NA); Information Theory (cs.IT); Learning (cs.LG)

    We investigate the fundamental conditions on the sampling pattern, i.e.,
    locations of the sampled entries, for finite completability of a low-rank
    tensor given some components of its Tucker rank. In order to find the
    deterministic necessary and sufficient conditions, we propose an algebraic
    geometric analysis on the Tucker manifold, which allows us to incorporate
    multiple rank components in the proposed analysis in contrast with the
    conventional geometric approaches on the Grassmannian manifold. This analysis
    characterizes the algebraic independence of a set of polynomials defined based
    on the sampling pattern, which is closely related to finite completion.
    Probabilistic conditions are then studied and a lower bound on the sampling
    probability is given, which guarantees that the proposed deterministic
    conditions on the sampling patterns for finite completability hold with high
    probability. Furthermore, using the proposed geometric approach for finite
    completability, we propose a sufficient condition on the sampling pattern that
    ensures there exists exactly one completion for the sampled tensor.



