IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Tue, 4 Apr 2017

    我爱机器学习(52ml.net)发表于 2017-04-04 00:00:00
    love 0

    Neural and Evolutionary Computing

    Multi-rendezvous Spacecraft Trajectory Optimization with Beam P-ACO

    Luís F. Simões, Dario Izzo, Evert Haasdijk, A. E. Eiben
    Comments: Code available at this https URL
    Subjects: Neural and Evolutionary Computing (cs.NE); Space Physics (physics.space-ph)

    The design of spacecraft trajectories for missions visiting multiple
    celestial bodies is here framed as a multi-objective bilevel optimization
    problem. A comparative study is performed to assess the performance of
    different Beam Search algorithms at tackling the combinatorial problem of
    finding the ideal sequence of bodies. Special focus is placed on the
    development of a new hybridization between Beam Search and the Population-based
    Ant Colony Optimization algorithm. An experimental evaluation shows all
    algorithms achieving exceptional performance on a hard benchmark problem. It is
    found that a properly tuned deterministic Beam Search always outperforms the
    remaining variants. Beam P-ACO, however, demonstrates lower parameter
    sensitivity, while offering superior worst-case performance. Being an anytime
    algorithm, it is then found to be the preferable choice for certain practical
    applications.

    A correlation game for unsupervised learning yields computational interpretations of Hebbian excitation, anti-Hebbian inhibition, and synapse elimination

    H. Sebastian Seung, Jonathan Zung
    Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

    Much has been learned about plasticity of biological synapses from empirical
    studies. Hebbian plasticity is driven by correlated activity of presynaptic and
    postsynaptic neurons. Synapses that converge onto the same neuron often behave
    as if they compete for a fixed resource; some survive the competition while
    others are eliminated. To provide computational interpretations of these
    aspects of synaptic plasticity, we formulate unsupervised learning as a
    zero-sum game between Hebbian excitation and anti-Hebbian inhibition in a
    neural network model. The game formalizes the intuition that Hebbian excitation
    tries to maximize correlations of neurons with their inputs, while anti-Hebbian
    inhibition tries to decorrelate neurons from each other. We further include a
    model of synaptic competition, which enables a neuron to eliminate all
    connections except those from its most strongly correlated inputs. Through
    empirical studies, we show that this facilitates the learning of sensory
    features that resemble parts of objects.

    A Brownian Motion Model and Extreme Belief Machine for Modeling Sensor Data Measurements

    Robert A. Murphy
    Subjects: Neural and Evolutionary Computing (cs.NE)

    As the title suggests, we will describe (and justify through the presentation
    of some of the relevant mathematics) prediction methodologies for sensor
    measurements. This exposition will mainly be concerned with the mathematics
    related to modeling the sensor measurements.

    Upper Bounds on the Runtime of the Univariate Marginal Distribution Algorithm on OneMax

    Carsten Witt
    Comments: An extended abstract of this report will appear in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017)
    Subjects: Neural and Evolutionary Computing (cs.NE)

    A runtime analysis of the Univariate Marginal Distribution Algorithm (UMDA)
    is presented on the OneMax function for wide ranges of its parameters (mu) and
    (lambda). If (muge clog n) for some constant (c>0) and
    (lambda=(1+Theta(1))mu), a general bound (O(mu n)) on the expected runtime
    is obtained. This bound crucially assumes that all marginal probabilities of
    the algorithm are confined to the interval ([1/n,1-1/n]). If (muge c’
    sqrt{n}log n) for a constant (c’>0) and (lambda=(1+Theta(1))mu), the
    behavior of the algorithm changes and the bound on the expected runtime becomes
    (O(musqrt{n})), which typically even holds if the borders on the marginal
    probabilities are omitted.

    The results supplement the recently derived lower bound
    (Omega(musqrt{n}+nlog n)) by Krejca and Witt (FOGA 2017) and turn out as
    tight for the two very different values (mu=clog n) and (mu=c’sqrt{n}log
    n). They also improve the previously best known upper bound (O(nlog nloglog
    n)) by Dang and Lehre (GECCO 2015).

    Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

    Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
    Comments: 10 pages, 7 figures, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)

    General human action recognition requires understanding of various visual
    cues. In this paper, we propose a network architecture that computes and
    integrates the most important visual cues for action recognition: pose, motion,
    and the raw images. For the integration, we introduce a Markov chain model
    which adds cues successively. The resulting approach is efficient and
    applicable to action classification as well as to spatial and temporal action
    localization. The two contributions clearly improve the performance over
    respective baselines. The overall approach achieves state-of-the-art action
    classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover,
    it yields state-of-the-art spatio-temporal action localization results on
    UCF101 and J-HMDB.

    Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

    Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A grand goal of computer vision is to build systems that learn visual
    representations over time that can be applied to many tasks. In this paper, we
    investigate a vision-language embedding as a core representation and show that
    it leads to better cross-task transfer than standard multi-task learning. In
    particular, the task of visual recognition is aligned to the task of visual
    question answering by forcing each to use the same word-region embeddings. We
    show this leads to greater inductive transfer from recognition to VQA than
    standard multitask learning. Visual recognition also improves, especially for
    categories that have relatively few recognition training labels but appear
    often in the VQA setting. Thus, our paper takes a small step towards creating
    more general vision systems by showing the benefit of interpretable, flexible,
    and trainable core representations.


    Computer Vision and Pattern Recognition

    It Takes Two to Tango: Towards Theory of AI's Mind

    Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Theory of Mind is the ability to attribute mental states (beliefs, intents,
    knowledge, perspectives, etc.) to others and recognize that these mental states
    may differ from one’s own. Theory of Mind is critical to effective
    communication and to teams demonstrating higher collective performance. To
    effectively leverage the progress in Artificial Intelligence (AI) to make our
    lives more productive, it is important for humans and AI to work well together
    in a team. Traditionally, there has been much emphasis on research to make AI
    more accurate, and (to a lesser extent) on having it better understand human
    intentions, tendencies, beliefs, and contexts. The latter involves making AI
    more human-like and having it develop a theory of our minds.

    In this work, we argue that for human-AI teams to be effective, humans must
    also develop a theory of AI’s mind – get to know its strengths, weaknesses,
    beliefs, and quirks. We instantiate these ideas within the domain of Visual
    Question Answering (VQA). We find that using just a few examples(50), lay
    people can be trained to better predict responses and oncoming failures of a
    complex VQA model. Surprisingly, we find that having access to the model’s
    internal states – its confidence in its top-k predictions, explicit or implicit
    attention maps which highlight regions in the image (and words in the question)
    the model is looking at (and listening to) while answering a question about an
    image – do not help people better predict its behavior

    Hierarchical Surface Prediction for 3D Object Reconstruction

    Christian Häne, Shubham Tulsiani, Jitendra Malik
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, Convolutional Neural Networks have shown promising results for 3D
    geometry prediction. They can make predictions from very little input data such
    as for example a single color image, depth map or a partial 3D volume. A major
    limitation of such approaches is that they only predict a coarse resolution
    voxel grid, which does not capture the surface of the objects well. We propose
    a general framework, called hierarchical surface prediction (HSP), which
    facilitates prediction of high resolution voxel grids. The main insight is that
    it is sufficient to predict high resolution voxels around the predicted
    surfaces. The exterior and interior of the objects can be represented with
    coarse resolution voxels. This allows us to predict significantly higher
    resolution voxel grids around the surface, from which triangle meshes can be
    extracted. Our approach is general and not dependent on a specific input type.
    In our experiments, we show results for geometry prediction from color images,
    depth images and shape completion from partial voxel grids. Our analysis shows
    that the network is able to predict the surface more accurately than a low
    resolution prediction.

    The 2017 DAVIS Challenge on Video Object Segmentation

    Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alex Sorkine-Hornung, Luc Van Gool
    Comments: Challenge website: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present the 2017 DAVIS Challenge, a public competition specifically
    designed for the task of video object segmentation. Following the footsteps of
    other successful initiatives, such as ILSVRC and PASCAL VOC, which established
    the avenue of research in the fields of scene classification and semantic
    segmentation, the DAVIS Challenge comprises a dataset, an evaluation
    methodology, and a public competition with a dedicated workshop co-located with
    CVPR 2017. The DAVIS Challenge follows up on the recent publication of DAVIS
    (Densely-Annotated VIdeo Segmentation), which has fostered the development of
    several novel state-of-the-art video object segmentation techniques. In this
    paper we describe the scope of the benchmark, highlight the main
    characteristics of the dataset and define the evaluation metrics of the
    competition.

    Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

    Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
    Comments: 10 pages, 7 figures, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)

    General human action recognition requires understanding of various visual
    cues. In this paper, we propose a network architecture that computes and
    integrates the most important visual cues for action recognition: pose, motion,
    and the raw images. For the integration, we introduce a Markov chain model
    which adds cues successively. The resulting approach is efficient and
    applicable to action classification as well as to spatial and temporal action
    localization. The two contributions clearly improve the performance over
    respective baselines. The overall approach achieves state-of-the-art action
    classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover,
    it yields state-of-the-art spatio-temporal action localization results on
    UCF101 and J-HMDB.

    Spatiotemporal Networks for Video Emotion Recognition

    Lijie Fan, Yunjie Ke
    Comments: 8 pages, 10 figures. arXiv admin note: text overlap with arXiv:1608.00859, arXiv:1503.08909 by other authors
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Our article presents an audio-visual based multi-modal emotion classification
    system. Considering the fact of deep learning approaches to facial analysis
    have recently demonstrated high performance, in our work, we use convolutional
    neural networks (CNNs) for emotion recognition in video, relying on temporal
    averaging and pooling operations reminiscent of widely used approaches for the
    spatial aggregation of information. In respect of time sequence, we extract the
    feature from audio clips in the video and use RNN to propagate information. In
    this work, we focus our presentation and experimental analysis on a fusion
    CNN-RNN architecture for facial expression analysis.

    3D Object Reconstruction from Hand-Object Interactions

    Dimitrios Tzionas, Juergen Gall
    Comments: International Conference on Computer Vision (ICCV) 2015, this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent advances have enabled 3d object reconstruction approaches using a
    single off-the-shelf RGB-D camera. Although these approaches are successful for
    a wide range of object classes, they rely on stable and distinctive geometric
    or texture features. Many objects like mechanical parts, toys, household or
    decorative articles, however, are textureless and characterized by minimalistic
    shapes that are simple and symmetric. Existing in-hand scanning systems and 3d
    reconstruction techniques fail for such symmetric objects in the absence of
    highly distinctive features. In this work, we show that extracting 3d hand
    motion for in-hand scanning effectively facilitates the reconstruction of even
    featureless and highly symmetric objects and we present an approach that fuses
    the rich additional information of hands into a 3d reconstruction pipeline,
    significantly contributing to the state-of-the-art of in-hand scanning.

    Block-Matching Convolutional Neural Network for Image Denoising

    Byeongyong Ahn, Nam Ik Cho
    Comments: 11 pages, 9 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    There are two main streams in up-to-date image denoising algorithms:
    non-local self similarity (NSS) prior based methods and convolutional neural
    network (CNN) based methods. The NSS based methods are favorable on images with
    regular and repetitive patterns while the CNN based methods perform better on
    irregular structures. In this paper, we propose a block-matching convolutional
    neural network (BMCNN) method that combines NSS prior and CNN. Initially,
    similar local patches in the input image are integrated into a 3D block. In
    order to prevent the noise from messing up the block matching, we first apply
    an existing denoising algorithm on the noisy image. The denoised image is
    employed as a pilot signal for the block matching, and then denoising function
    for the block is learned by a CNN structure. Experimental results show that the
    proposed BMCNN algorithm achieves state-of-the-art performance. In detail,
    BMCNN can restore both repetitive and irregular structures.

    Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

    Dimitrios Tzionas, Abhilash Srikantha, Pablo Aponte, Juergen Gall
    Comments: German Conference on Pattern Recognition (GCPR) 2014, this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hand motion capture has been an active research topic in recent years,
    following the success of full-body pose tracking. Despite similarities, hand
    tracking proves to be more challenging, characterized by a higher
    dimensionality, severe occlusions and self-similarity between fingers. For this
    reason, most approaches rely on strong assumptions, like hands in isolation or
    expensive multi-camera systems, that limit the practical use. In this work, we
    propose a framework for hand tracking that can capture the motion of two
    interacting hands using only a single, inexpensive RGB-D camera. Our approach
    combines a generative model with collision detection and discriminatively
    learned salient points. We quantitatively evaluate our approach on 14 new
    sequences with challenging interactions.

    Truncating Wide Networks using Binary Tree Architectures

    Yan Zhang, Mete Ozay, Shuohao Li, Takayuki Okatani
    Comments: 10 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent study shows that a wide deep network can obtain accuracy comparable to
    a deeper but narrower network. Compared to narrower and deeper networks, wide
    networks employ relatively less number of layers and have various important
    benefits, such that they have less running time on parallel computing devices,
    and they are less affected by gradient vanishing problems. However, the
    parameter size of a wide network can be very large due to use of large width of
    each layer in the network. In order to keep the benefits of wide networks
    meanwhile improve the parameter size and accuracy trade-off of wide networks,
    we propose a binary tree architecture to truncate architecture of wide networks
    by reducing the width of the networks. More precisely, in the proposed
    architecture, the width is continuously reduced from lower layers to higher
    layers in order to increase the expressive capacity of network with a less
    increase on parameter size. Also, to ease the gradient vanishing problem,
    features obtained at different layers are concatenated to form the output of
    our architecture. By employing the proposed architecture on a baseline wide
    network, we can construct and train a new network with same depth but
    considerably less number of parameters. In our experimental analyses, we
    observe that the proposed architecture enables us to obtain better parameter
    size and accuracy trade-off compared to baseline networks using various
    benchmark image classification datasets. The results show that our model can
    decrease the classification error of baseline from 20.43% to 19.22% on
    Cifar-100 using only 28% of parameters that baseline has. Code is available at
    this https URL

    Convolutional neural networks for segmentation and object detection of human semen

    Malte Stær Nissen, Oswin Krause, Kristian Almstrup, Søren Kjærulff, Torben Trindkær Nielsen, Mads Nielsen
    Comments: Submitted for Scandinavian Conference on Image Analysis 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We compare a set of convolutional neural network (CNN) architectures for the
    task of segmenting and detecting human sperm cells in an image taken from a
    semen sample. In contrast to previous work, samples are not stained or washed
    to allow for full sperm quality analysis, making analysis harder due to
    clutter. Our results indicate that training on full images is superior to
    training on patches when class-skew is properly handled. Full image training
    including up-sampling during training proves to be beneficial in deep CNNs for
    pixel wise accuracy and detection performance. Predicted sperm cells are found
    by using connected components on the CNN predictions. We investigate
    optimization of a threshold parameter on the size of detected components. Our
    best network achieves 93.87% precision and 91.89% recall on our test dataset
    after thresholding outperforming a classical mage analysis approach.

    A Comparison of Directional Distances for Hand Pose Estimation

    Dimitrios Tzionas, Juergen Gall
    Comments: German Conference on Pattern Recognition (GCPR) 2013, this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Benchmarking methods for 3d hand tracking is still an open problem due to the
    difficulty of acquiring ground truth data. We introduce a new dataset and
    benchmarking protocol that is insensitive to the accumulative error of other
    protocols. To this end, we create testing frame pairs of increasing difficulty
    and measure the pose estimation error separately for each of them. This
    approach gives new insights and allows to accurately study the performance of
    each feature or method without employing a full tracking pipeline. Following
    this protocol, we evaluate various directional distances in the context of
    silhouette-based 3d hand tracking, expressed as special cases of a generalized
    Chamfer distance form. An appropriate parameter setup is proposed for each of
    them, and a comparative study reveals the best performing method in this
    context.

    Learning a Variational Network for Reconstruction of Accelerated MRI Data

    Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael P Recht, Daniel K Sodickson, Thomas Pock, Florian Knoll
    Comments: Submitted to Magnetic Resonance in Medicine
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Purpose: To allow fast and high-quality reconstruction of clinical
    accelerated multi-coil MR data by learning a variational network that combines
    the mathematical structure of variational models with deep learning.

    Theory and Methods: Generalized compressed sensing reconstruction formulated
    as a variational model is embedded in an unrolled gradient descent scheme. All
    parameters of this formulation, including the prior model defined by filter
    kernels and activation functions as well as the data term weights, are learned
    during an offline training procedure. The learned model can then be applied
    online to previously unseen data.

    Results: The variational network approach is evaluated on a clinical knee
    imaging protocol. The variational network reconstructions outperform standard
    reconstruction algorithms in terms of image quality and residual artifacts for
    all tested acceleration factors and sampling patterns.

    Conclusion: Variational network reconstructions preserve the natural
    appearance of MR images as well as pathologies that were not included in the
    training data set. Due to its high computational performance, i.e.,
    reconstruction time of 193 ms on a single graphics card, and the omission of
    parameter tuning once the network is trained, this new approach to image
    reconstruction can easily be integrated into clinical workflow.

    A Good Practice Towards Top Performance of Face Recognition: Transferred Deep Feature Fusion

    Lin Xiong, Jayashree Karlekar, Jian Zhao, Jiashi Feng, Sugiri Pranata, Shengmei Shen
    Comments: 10 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Unconstrained face recognition performance evaluations have traditionally
    focused on Labeled Faces in the Wild (LFW) dataset for imagery and the
    YouTubeFaces (YTF) dataset for videos in the last couple of years. Spectacular
    progress in this field has resulted in a saturation on verification and
    identification accuracies for those benchmark datasets. In this paper, we
    propose a unified learning framework named transferred deep feature fusion
    targeting at the new IARPA Janus Bechmark A (IJB-A) face recognition dataset
    released by NIST face challenge. The IJB-A dataset includes real-world
    unconstrained faces from 500 subjects with full pose and illumination
    variations which are much harder than the LFW and YTF datasets. Inspired by
    transfer learning, we train two advanced deep convolutional neural networks
    (DCNN) with two different large datasets in source domain, respectively. By
    exploring the complementarity of two distinct DCNNs, deep feature fusion is
    utilized after feature extraction in target domain. Then, template specific
    linear SVMs is adopted to enhance the discrimination of framework. Finally,
    multiple matching scores corresponding different templates are merged as the
    final results. This simple unified framework outperforms the state-of-the-art
    by a wide margin on IJB-A dataset. Based on the proposed approach, we have
    submitted our IJB-A results to National Institute of Standards and Technology
    (NIST) for official evaluation.

    Sparse Autoencoder for Unsupervised Nucleus Detection and Representation in Histopathology Images

    Le Hou, Vu Nguyen, Dimitris Samaras, Tahsin M. Kurc, Yi Gao, Tianhao Zhao, Joel H. Saltz
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Histopathology images are crucial to the study of complex diseases such as
    cancer. The histologic characteristics of nuclei play a key role in disease
    diagnosis, prognosis and analysis. In this work, we propose a sparse
    Convolutional Autoencoder (CAE) for fully unsupervised, simultaneous nucleus
    detection and feature extraction in histopathology tissue images. Our CAE
    detects and encodes nuclei in image patches in tissue images into sparse
    feature maps that encode both the location and appearance of nuclei. Our CAE is
    the first unsupervised detection network for computer vision applications. The
    pretrained nucleus detection and feature extraction modules in our CAE can be
    fine-tuned for supervised learning in an end-to-end fashion. We evaluate our
    method on four datasets and reduce the errors of state-of-the-art methods up to
    42%. We are able to achieve comparable performance with only 5% of the
    fully-supervised annotation cost.

    Geometric loss functions for camera pose regression with deep learning

    Alex Kendall, Roberto Cipolla
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep learning has shown to be effective for robust and real-time monocular
    image relocalisation. In particular, PoseNet is a deep convolutional neural
    network which learns to regress the 6-DOF camera pose from a single image. It
    learns to localize using high level features and is robust to difficult
    lighting, motion blur and unknown camera intrinsics, where point based SIFT
    registration fails. However, it was trained using a naive loss function, with
    hyper-parameters which require expensive tuning. In this paper, we give the
    problem a more fundamental theoretical treatment. We explore a number of novel
    loss functions for learning camera pose which are based on geometry and scene
    reprojection error. Additionally we show how to automatically learn an optimal
    weighting to simultaneously regress position and orientation. By leveraging
    geometry, we demonstrate that our technique significantly improves PoseNet’s
    performance across datasets ranging from indoor rooms to a small city.

    Hidden Two-Stream Convolutional Networks for Action Recognition

    Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
    Comments: under review at ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    Analyzing videos of human actions involves understanding the temporal
    relationships among video frames. CNNs are the current state-of-the-art methods
    for action recognition in videos. However, the CNN architectures currently
    being used have difficulty in capturing these relationships. State-of-the-art
    action recognition approaches rely on traditional local optical flow estimation
    methods to pre-compute the motion information for CNNs. Such a two-stage
    approach is computationally expensive, storage demanding, and not end-to-end
    trainable. In this paper, we present a novel CNN architecture that implicitly
    captures motion information. Our method is 10x faster than a two-stage
    approach, does not need to cache flow information, and is end-to-end trainable.
    Experimental results on UCF101 and HMDB51 show that it achieves competitive
    accuracy with the two-stage approaches.

    Dense Multi-view 3D-reconstruction Without Dense Correspondences

    Yvain Quéau, Jean Mélou, Jean-Denis Durou, Daniel Cremers
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We introduce a variational method for multi-view shape-from-shading under
    natural illumination. The key idea is to couple PDE-based solutions for
    single-image based shape-from-shading problems across multiple images and
    multiple color channels by means of a variational formulation. Rather than
    alternatingly solving the individual SFS problems and optimizing the
    consistency across images and channels which is known to lead to suboptimal
    results, we propose an efficient solution of the coupled problem by means of an
    ADMM algorithm. In numerous experiments on both simulated and real imagery, we
    demonstrate that the proposed fusion of multiple-view reconstruction and
    shape-from-shading provides highly accurate dense reconstructions without the
    need to compute dense correspondences. With the proposed variational
    integration across multiple views shape-from-shading techniques become
    applicable to challenging real-world reconstruction problems, giving rise to
    highly detailed geometry even in areas of smooth brightness variation and
    lacking texture.

    Understanding Deep Representations through Random Weights

    Yao Shu, Man Zhu, Kun He, John Hopcroft, Pan Zhou
    Comments: 7 pages, 12 Figures, submitted to a 2017 Conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We systematically study the deep representation of random weight CNN
    (convolutional neural network) using the DeCNN (deconvolutional neural network)
    architecture. We first fix the weights of an untrained CNN, and for each layer
    of its feature representation, we train a corresponding DeCNN to reconstruct
    the input image. As compared with the pre-trained CNN, the DeCNN trained on a
    random weight CNN can reconstruct images more quickly and accurately, no matter
    which type of random distribution for the CNN’s weights. It reveals that every
    layer of the random CNN can retain photographically accurate information about
    the image. We then let the DeCNN be untrained, i.e. the overall CNN-DeCNN
    architecture uses only random weights. Strikingly, we can reconstruct all
    position information of the image for low layer representations but the colors
    change. For high layer representations, we can still capture the rough contours
    of the image. We also change the number of feature maps and the shape of the
    feature maps and gain more insight on the random function of the CNN-DeCNN
    structure. Our work reveals that the purely random CNN-DeCNN architecture
    substantially contributes to the geometric and photometric invariance due to
    the intrinsic symmetry and invertible structure, but it discards the
    colormetric information due to the random projection.

    People Counting in Crowded and Outdoor Scenes using an Hybrid Multi-Camera Approach

    Fabio Dittrich, Luiz E. S. de Oliveira, Alceu S. Britto Jr., Alessandro L. Koerich
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents two novel approaches for people counting in crowded and
    open environments that combine the information gathered by multiple views.
    Multiple camera are used to expand the field of view as well as to mitigate the
    problem of occlusion that commonly affects the performance of counting methods
    using single cameras. The first approach is regarded as a direct approach and
    it attempts to segment and count each individual in the crowd. For such an aim,
    two head detectors trained with head images are employed: one based on support
    vector machines and another based on Adaboost perceptron. The second approach,
    regarded as an indirect approach employs learning algorithms and statistical
    analysis on the whole crowd to achieve counting. For such an aim, corner points
    are extracted from groups of people in a foreground image and computed by a
    learning algorithm which estimates the number of people in the scene. Both
    approaches count the number of people on the scene and not only on a given
    image or video frame of the scene. The experimental results obtained on the
    benchmark PETS2009 video dataset show that proposed indirect method surpasses
    other methods with improvements of up to 46.7% and provides accurate counting
    results for the crowded scenes. On the other hand, the direct method shows high
    error rates due to the fact that the latter has much more complex problems to
    solve, such as segmentation of heads.

    Efficient Version-Space Reduction for Visual Tracking

    Kourosh Meshgi, Shigeyuki Oba, Shin Ishii
    Comments: CRV’17 Conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Discrminative trackers, employ a classification approach to separate the
    target from its background. To cope with variations of the target shape and
    appearance, the classifier is updated online with different samples of the
    target and the background. Sample selection, labeling and updating the
    classifier is prone to various sources of errors that drift the tracker. We
    introduce the use of an efficient version space shrinking strategy to reduce
    the labeling errors and enhance its sampling strategy by measuring the
    uncertainty of the tracker about the samples. The proposed tracker, utilize an
    ensemble of classifiers that represents different hypotheses about the target,
    diversify them using boosting to provide a larger and more consistent coverage
    of the version-space and tune the classifiers’ weights in voting. The proposed
    system adjusts the model update rate by promoting the co-training of the
    short-memory ensemble with a long-memory oracle. The proposed tracker
    outperformed state-of-the-art trackers on different sequences bearing various
    tracking challenges.

    The Stixel world: A medium-level representation of traffic scenes

    Marius Cordts, Timo Rehfeld, Lukas Schneider, David Pfeiffer, Markus Enzweiler, Stefan Roth, Marc Pollefeys, Uwe Franke
    Comments: Accepted for publication in Image and Vision Computing
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recent progress in advanced driver assistance systems and the race towards
    autonomous vehicles is mainly driven by two factors: (1) increasingly
    sophisticated algorithms that interpret the environment around the vehicle and
    react accordingly, and (2) the continuous improvements of sensor technology
    itself. In terms of cameras, these improvements typically include higher
    spatial resolution, which as a consequence requires more data to be processed.
    The trend to add multiple cameras to cover the entire surrounding of the
    vehicle is not conducive in that matter. At the same time, an increasing number
    of special purpose algorithms need access to the sensor input data to correctly
    interpret the various complex situations that can occur, particularly in urban
    traffic.

    By observing those trends, it becomes clear that a key challenge for vision
    architectures in intelligent vehicles is to share computational resources. We
    believe this challenge should be faced by introducing a representation of the
    sensory data that provides compressed and structured access to all relevant
    visual content of the scene. The Stixel World discussed in this paper is such a
    representation. It is a medium-level model of the environment that is
    specifically designed to compress information about obstacles by leveraging the
    typical layout of outdoor traffic scenes. It has proven useful for a multitude
    of automotive vision applications, including object detection, tracking,
    segmentation, and mapping.

    In this paper, we summarize the ideas behind the model and generalize it to
    take into account multiple dense input streams: the image itself, stereo depth
    maps, and semantic class probability maps that can be generated, e.g., by CNNs.
    Our generalization is embedded into a novel mathematical formulation for the
    Stixel model. We further sketch how the free parameters of the model can be
    learned using structured SVMs.

    SAR image despeckling through convolutional neural networks

    G. Chierchia, D. Cozzolino, G. Poggi, L. Verdoliva
    Comments: Accepted at 2017 IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, Texas, July 23-28, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we investigate the use of discriminative model learning through
    Convolutional Neural Networks (CNNs) for SAR image despeckling. The network
    uses a residual learning strategy, hence it does not recover the filtered
    image, but the speckle component, which is then subtracted from the noisy one.
    Training is carried out by considering a large multitemporal SAR image properly
    despeckled through 3D filtering, in order to approximate a {em clean} image.
    Experimental results, both on synthetic and real SAR data, show the method to
    achieve performance that improve with respect to state-of-the-art techniques.

    Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

    Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A grand goal of computer vision is to build systems that learn visual
    representations over time that can be applied to many tasks. In this paper, we
    investigate a vision-language embedding as a core representation and show that
    it leads to better cross-task transfer than standard multi-task learning. In
    particular, the task of visual recognition is aligned to the task of visual
    question answering by forcing each to use the same word-region embeddings. We
    show this leads to greater inductive transfer from recognition to VQA than
    standard multitask learning. Visual recognition also improves, especially for
    categories that have relatively few recognition training labels but appear
    often in the VQA setting. Thus, our paper takes a small step towards creating
    more general vision systems by showing the benefit of interpretable, flexible,
    and trainable core representations.

    A-Lamp: Adaptive Layout-Aware Multi-Patch Deep Convolutional Neural Network for Photo Aesthetic Assessment

    Shuang Ma, Jing Liu, Chang Wen Chen
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deep convolutional neural networks (CNN) have recently been shown to generate
    promising results for aesthetics assessment. However, the performance of these
    deep CNN methods is often compromised by the constraint that the neural network
    only takes the fixed-size input. To accommodate this requirement, input images
    need to be transformed via cropping, warping, or padding, which often alter
    image composition, reduce image resolution, or cause image distortion. Thus the
    aesthetics of the original images is impaired because of potential loss of fine
    grained details and holistic image layout. However, such fine grained details
    and holistic image layout is critical for evaluating an image’s aesthetics. In
    this paper, we present an Adaptive Layout-Aware Multi-Patch Convolutional
    Neural Network (A-Lamp CNN) architecture for photo aesthetic assessment. This
    novel scheme is able to accept arbitrary sized images, and learn from both
    fined grained details and holistic image layout simultaneously. To enable
    training on these hybrid inputs, we extend the method by developing a dedicated
    double-subnet neural network structure, i.e. a Multi-Patch subnet and a
    Layout-Aware subnet. We further construct an aggregation layer to effectively
    combine the hybrid features from these two subnets. Extensive experiments on
    the large-scale aesthetics assessment benchmark (AVA) demonstrate significant
    performance improvement over the state-of-the-art in photo aesthetic
    assessment.

    Complexity-Aware Assignment of Latent Values in Discriminative Models for Accurate Gesture Recognition

    Manoel Horta Ribeiro, Bruno Teixeira, Antônio Otávio Fernandes, Wagner Meira Jr., Erickson R. Nascimento
    Comments: Conference paper published at 2016 29th SIBGRAPI, Conference on Graphics, Patterns and Images (SIBGRAPI). 8 pages, 7 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Many of the state-of-the-art algorithms for gesture recognition are based on
    Conditional Random Fields (CRFs). Successful approaches, such as the
    Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose
    values are mapped to the values of the labels. In this paper we propose a novel
    methodology to set the latent values according to the gesture complexity. We
    use an heuristic that iterates through the samples associated with each label
    value, stimating their complexity. We then use it to assign the latent values
    to the label values. We evaluate our method on the task of recognizing human
    gestures from video streams. The experiments were performed in binary datasets,
    generated by grouping different labels. Our results demonstrate that our
    approach outperforms the arbitrary one in many cases, increasing the accuracy
    by up to 10%.

    Compositional Human Pose Regression

    Xiao Sun, Jiaxiang Shang, Shuang Liang, Yichen Wei
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Regression based methods are widely used for 3D and 2D human pose estimation,
    but the performance is not satisfactory. One problem is that the structural
    information of the pose is not well exploited in the existing methods. In this
    work, we propose a structure-aware regression approach. It adopts a
    reparameterized pose representation using bones instead of joints. It exploits
    the joint connection structure and proposes a compositional loss function that
    encodes the long range interactions in the pose. It is simple, effective, and
    general. Comprehensive evaluation validates the effectiveness of our approach.
    It significantly advances the state-of-the-art on Human3.6M and achieves
    state-of-the-art results on MPII, in a unified framework for 3D and 2D pose
    regression.

    Multiple Instance Detection Network with Online Instance Classifier Refinement

    Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
    Comments: Accepted by CVPR 2017, IEEE Conference on Computer Vision and Pattern Recognition 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Of late, weakly supervised object detection is with great importance in
    object recognition. Based on deep learning, weakly supervised detectors have
    achieved many promising results. However, compared with fully supervised
    detection, it is more challenging to train deep network based detectors in a
    weakly supervised manner. Here we formulate weakly supervised detection as a
    Multiple Instance Learning (MIL) problem, where instance classifiers (object
    detectors) are put into the network as hidden nodes. We propose a novel online
    instance classifier refinement algorithm to integrate MIL and the instance
    classifier refinement procedure into a single deep network, and train the
    network end-to-end with only image-level supervision, i.e., without object
    location information. More precisely, instance labels inferred from weak
    supervision are propagated to their spatially overlapped instances to refine
    instance classifier online. The iterative instance classifier refinement
    procedure is implemented using multiple streams in deep network, where each
    stream supervises its latter stream. Weakly supervised object detection
    experiments are carried out on the challenging PASCAL VOC 2007 and 2012
    benchmarks. We obtain 47% mAP on VOC 2007 that significantly outperforms the
    previous state-of-the-art.

    Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes

    Chenfanfu Jiang, Yixin Zhu, Siyuan Qi, Siyuan Huang, Jenny Lin, Xiongwen Guo, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu
    Comments: 12-page paper with 18-page supplementary file, 25 figures, submitted to ICCV 2017. Chenfanfu Jiang, Yixin Zhu, Siyuan Qi and Siyuan Huang contributed equally to this paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We propose the configurable rendering of massive quantities of photorealistic
    images with ground truth for the purposes of training, benchmarking, and
    diagnosing computer vision models. In contrast to the conventional
    (crowd-sourced) manual labeling of ground truth for a relatively modest number
    of RGB-D images captured by Kinect-like sensors, we devise a non-trivial
    configurable pipeline of algorithms capable of generating a potentially
    infinite variety of indoor scenes using a stochastic grammar, specifically, one
    represented by an attributed spatial And-Or graph. We employ physics-based
    rendering to synthesize photorealistic RGB images while automatically
    synthesizing detailed, per-pixel ground truth data, including visible surface
    depth and normal, object identity and material information, as well as
    illumination. Our pipeline is configurable inasmuch as it enables the precise
    customization and control of important attributes of the generated scenes. We
    demonstrate that our generated scenes achieve a performance similar to the NYU
    v2 Dataset on pre-trained deep learning models. By modifying pipeline
    components in a controllable manner, we furthermore provide diagnostics on
    common scene understanding tasks; eg., depth and surface normal prediction,
    semantic segmentation, etc.

    SafetyNet: Detecting and Rejecting Adversarial Examples Robustly

    Jiajun Lu, Theerasit Issaranon, David Forsyth
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We describe a method to produce a network where current methods such as
    DeepFool have great difficulty producing adversarial samples. Our construction
    suggests some insights into how deep networks work. We provide a reasonable
    analyses that our construction is difficult to defeat, and show experimentally
    that our method is hard to defeat using several standard networks and datasets.
    We use our method to produce a system that can reliably detect whether an image
    is a picture of a real scene or not. Our system applies to images captured with
    depth maps (RGBD images) and checks if a pair of image and depth map is
    consistent. It relies on the relative difficulty of producing naturalistic
    depth maps for images in post processing. We demonstrate that our system is
    robust to adversarial examples built from currently known attacking approaches.

    Customizing First Person Image Through Desired Actions

    Shan Su, Jianbo Shi, Hyun Soo Park
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper studies a problem of inverse visual path planning: creating a
    visual scene from a first person action. Our conjecture is that the spatial
    arrangement of a first person visual scene is deployed to afford an action, and
    therefore, the action can be inversely used to synthesize a new scene such that
    the action is feasible. As a proof-of-concept, we focus on linking visual
    experiences induced by walking.

    A key innovation of this paper is a concept of ActionTunnel—a 3D virtual
    tunnel along the future trajectory encoding what the wearer will visually
    experience as moving into the scene. This connects two distinctive first person
    images through similar walking paths. Our method takes a first person image
    with a user defined future trajectory and outputs a new image that can afford
    the future motion. The image is created by combining present and future
    ActionTunnels in 3D where the missing pixels in adjoining area are computed by
    a generative adversarial network. Our work can provide a travel across
    different first person experiences in diverse real world scenes.

    Learning to Predict Indoor Illumination from a Single Image

    Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we propose a method to infer high dynamic range illumination
    from a single, limited field-of-view, low dynamic range photograph of an indoor
    scene. Inferring scene illumination from a single photograph is a challenging
    problem. The pixel intensities observed in a photograph are a complex function
    of scene geometry, reflectance properties, and illumination. We introduce an
    end-to-end solution to this problem and propose a deep neural network that
    takes the limited field-of-view photo as input and produces an environment map
    as a panorama and a light mask prediction over the panorama as the output. Our
    technique does not require special image capture or user input. We preprocess
    standard low dynamic range panoramas by introducing novel light source
    detection and warping methods on the panorama, and use the results with
    corresponding limited field-of-view crops as training data. Our method does not
    rely on any assumptions on scene appearance, geometry, material properties, or
    lighting. This allows us to automatically recover high-quality illumination
    estimates that significantly outperform previous state-of-the-art methods.
    Consequently, using our illumination estimates for applications like 3D object
    insertion lead to results that are photo-realistic, which we demonstrate over a
    large set of examples and via a user study.

    Optimal Reconstruction with a Small Number of Views

    Cheng Peng, Volkan Isler
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Estimating positions of world points from features observed in images is a
    key problem in 3D reconstruction, image mosaicking, simultaneous localization
    and mapping and structure from motion. We consider a special instance in which
    there is a dominant ground plane (mathcal{G}) viewed from a parallel viewing
    plane (mathcal{S}) above it. Such instances commonly arise, for example, in
    aerial photography.

    Consider a world point (g in mathcal{G}) and its worst case reconstruction
    uncertainty (varepsilon(g,mathcal{S})) obtained by merging emph{all}
    possible views of (g) chosen from (mathcal{S}). We first show that one can
    pick two views (s_p) and (s_q) such that the uncertainty
    (varepsilon(g,{s_p,s_q})) obtained using only these two views is almost as
    good as (i.e. within a small constant factor of) (varepsilon(g,mathcal{S})).
    Next, we extend the result to the entire ground plane (mathcal{G}) and show
    that one can pick a small subset of (mathcal{S’} subseteq mathcal{S}) (which
    grows only linearly with the area of (mathcal{G})) and still obtain a constant
    factor approximation, for every point (g in mathcal{G}), to the minimum worst
    case estimate obtained by merging all views in (mathcal{S}).

    Our results provide a view selection mechanism with provable performance
    guarantees which can drastically increase the speed of scene reconstruction
    algorithms. In addition to theoretical results, we demonstrate their
    effectiveness in an application where aerial imagery is used for monitoring
    farms and orchards.

    Efficient Asymmetric Co-Tracking using Uncertainty Sampling

    Kourosh Meshgi, Maryam Sadat Mirzaei, Shigeyuki Oba, Shin Ishii
    Comments: Submitted to IEEE ICSIPA’2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Adaptive tracking-by-detection approaches are popular for tracking arbitrary
    objects. They treat the tracking problem as a classification task and use
    online learning techniques to update the object model. However, these
    approaches are heavily invested in the efficiency and effectiveness of their
    detectors. Evaluating a massive number of samples for each frame (e.g.,
    obtained by a sliding window) forces the detector to trade the accuracy in
    favor of speed. Furthermore, misclassification of borderline samples in the
    detector introduce accumulating errors in tracking. In this study, we propose a
    co-tracking based on the efficient cooperation of two detectors: a rapid
    adaptive exemplar-based detector and another more sophisticated but slower
    detector with a long-term memory. The sampling labeling and co-learning of the
    detectors are conducted by an uncertainty sampling unit, which improves the
    speed and accuracy of the system. We also introduce a budgeting mechanism which
    prevents the unbounded growth in the number of examples in the first detector
    to maintain its rapid response. Experiments demonstrate the efficiency and
    effectiveness of the proposed tracker against its baselines and its superior
    performance against state-of-the-art trackers on various benchmark videos.

    Geodesic Distance Histogram Feature for Video Segmentation

    Hieu Le, Vu Nguyen, Chen-Ping Yu, Dimitris Samaras
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper proposes a geodesic-distance-based feature that encodes global
    information for improved video segmentation algorithms. The feature is a joint
    histogram of intensity and geodesic distances, where the geodesic distances are
    computed as the shortest paths between superpixels via their boundaries. We
    also incorporate adaptive voting weights and spatial pyramid configurations to
    include spatial information into the geodesic histogram feature and show that
    this further improves results. The feature is generic and can be used as part
    of various algorithms. In experiments, we test the geodesic histogram feature
    by incorporating it into two existing video segmentation frameworks. This leads
    to significantly better performance in 3D video segmentation benchmarks on two
    datasets.

    Efficient Registration of Pathological Images: A Joint PCA/Image-Reconstruction Approach

    Xu Han, Xiao Yang, Stephen Aylward, Roland Kwitt, Marc Niethammer
    Comments: Accepted as a conference paper for ISBI 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Registration involving one or more images containing pathologies is
    challenging, as standard image similarity measures and spatial transforms
    cannot account for common changes due to pathologies. Low-rank/Sparse (LRS)
    decomposition removes pathologies prior to registration; however, LRS is
    memory-demanding and slow, which limits its use on larger data sets.
    Additionally, LRS blurs normal tissue regions, which may degrade registration
    performance. This paper proposes an efficient alternative to LRS: (1) normal
    tissue appearance is captured by principal component analysis (PCA) and (2)
    blurring is avoided by an integrated model for pathology removal and image
    reconstruction. Results on synthetic and BRATS 2015 data demonstrate its
    utility.

    Transfer of View-manifold Learning to Similarity Perception of Novel Objects

    Xingyu Lin, Hao Wang, Zhihao Li, Yimeng Zhang, Alan Yuille, Tai Sing Lee
    Comments: Accepted to ICLR2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We develop a model of perceptual similarity judgment based on re-training a
    deep convolution neural network (DCNN) that learns to associate different views
    of each 3D object to capture the notion of object persistence and continuity in
    our visual experience. The re-training process effectively performs distance
    metric learning under the object persistency constraints, to modify the
    view-manifold of object representations. It reduces the effective distance
    between the representations of different views of the same object without
    compromising the distance between those of the views of different objects,
    resulting in the untangling of the view-manifolds between individual objects
    within the same category and across categories. This untangling enables the
    model to discriminate and recognize objects within the same category,
    independent of viewpoints. We found that this ability is not limited to the
    trained objects, but transfers to novel objects in both trained and untrained
    categories, as well as to a variety of completely novel artificial synthetic
    objects. This transfer in learning suggests the modification of distance
    metrics in view- manifolds is more general and abstract, likely at the levels
    of parts, and independent of the specific objects or categories experienced
    during training. Interestingly, the resulting transformation of feature
    representation in the deep networks is found to significantly better match
    human perceptual similarity judgment than AlexNet, suggesting that object
    persistence could be an important constraint in the development of perceptual
    similarity judgment in biological neural networks.

    Graph Partitioning with Acyclicity Constraints

    Orlando Moreira, Merten Popp, Christian Schulz
    Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Graphs are widely used to model execution dependencies in applications. In
    particular, the NP-complete problem of partitioning a graph under constraints
    receives enormous attention by researchers because of its applicability in
    multiprocessor scheduling. We identified the additional constraint of acyclic
    dependencies between blocks when mapping computer vision and imaging
    applications to a heterogeneous embedded multiprocessor. Existing algorithms
    and heuristics do not address this requirement and deliver results that are not
    applicable for our use-case. In this work, we show that this more constrained
    version of the graph partitioning problem is NP-complete and present heuristics
    that achieve a close approximation of the optimal solution found by an
    exhaustive search for small problem instances and much better scalability for
    larger instances. In addition, we can show a positive impact on the schedule of
    a real imaging application that improves communication volume and execution
    time.

    Soft-to-Hard Vector Quantization for End-to-End Learned Compression of Images and Neural Networks

    Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool
    Comments: Supplementary visual examples available at: this http URL
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    In this work we present a new approach to learn compressible representations
    in deep architectures with an end-to-end training strategy. Our method is based
    on a soft relaxation of quantization and entropy, which we anneal to their
    discrete counterparts throughout training. We showcase this method for two
    challenging applications: Image compression and neural network compression.
    While these tasks have typically been approached with different methods, our
    soft-to-hard quantization approach gives state-of-the-art results for both.

    Local nearest neighbour classification with applications to semi-supervised learning

    Timothy I. Cannings, Thomas B. Berrett, Richard J. Samworth
    Comments: 42 pages
    Subjects: Statistics Theory (math.ST); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME)

    We derive a new asymptotic expansion for the global excess risk of a local
    (k)-nearest neighbour classifier, where the choice of (k) may depend upon the
    test point. This expansion elucidates conditions under which the dominant
    contribution to the excess risk comes from the locus of points at which each
    class label is equally likely to occur, but we also show that if these
    conditions are not satisfied, the dominant contribution may arise from the
    tails of the marginal distribution of the features. Moreover, we prove that,
    provided the (d)-dimensional marginal distribution of the features has a finite
    (
    ho)th moment for some (
    ho > 4) (as well as other regularity conditions), a
    local choice of (k) can yield a rate of convergence of the excess risk of
    (O(n^{-4/(d+4)})), where (n) is the sample size, whereas for the standard
    (k)-nearest neighbour classifier, our theory would require (d geq 5) and (
    ho
    > 4d/(d-4)) finite moments to achieve this rate. Our results motivate a new
    (k)-nearest neighbour classifier for semi-supervised learning problems, where
    the unlabelled data are used to obtain an estimate of the marginal feature
    density, and fewer neighbours are used for classification when this density
    estimate is small. The potential improvements over the standard (k)-nearest
    neighbour classifier are illustrated both through our theory and via a
    simulation study.

    Clustering in Hilbert simplex geometry

    Frank Nielsen, Ke Sun
    Comments: 18 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Clustering categorical distributions in the probability simplex is a
    fundamental primitive often met in applications dealing with histograms or
    mixtures of multinomials. Traditionally, the differential-geometric structure
    of the probability simplex has been used either by (i) setting the Riemannian
    metric tensor to the Fisher information matrix of the categorical
    distributions, or (ii) defining the information-geometric structure induced by
    a smooth dissimilarity measure, called a divergence. In this paper, we
    introduce a novel computationally-friendly non-Riemannian framework for
    modeling the probability simplex: Hilbert simplex geometry. We discuss the pros
    and cons of those three statistical modelings, and compare them experimentally
    for clustering tasks.

    Restoration of Images with Wavefront Aberrations

    Claudius Zelenka, Reinhard Koch
    Comments: To appear in the proceedings of the 23rd International Conference on Pattern Recognition (ICPR 2016)
    Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)

    This contribution deals with image restoration in optical systems with
    coherent illumination, which is an important topic in astronomy, coherent
    microscopy and radar imaging. Such optical systems suffer from wavefront
    distortions, which are caused by imperfect imaging components and conditions.
    Known image restoration algorithms work well for incoherent imaging, they fail
    in case of coherent images. In this paper a novel wavefront correction
    algorithm is presented, which allows image restoration under coherent
    conditions. In most coherent imaging systems, especially in astronomy, the
    wavefront deformation is known. Using this information, the proposed algorithm
    allows a high quality restoration even in case of severe wavefront distortions.
    We present two versions of this algorithm, which are an evolution of the
    Gerchberg-Saxton and the Hybrid-Input-Output algorithm. The algorithm is
    verified on simulated and real microscopic images.

    Perspective: Energy Landscapes for Machine Learning

    Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales
    Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017
    Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Machine learning techniques are being increasingly used as flexible
    non-linear fitting and prediction tools in the physical sciences. Fitting
    functions that exhibit multiple solutions as local minima can be analysed in
    terms of the corresponding machine learning landscape. Methods to explore and
    visualise molecular potential energy landscapes can be applied to these machine
    learning landscapes to gain new insight into the solution space involved in
    training and the nature of the corresponding predictions. In particular, we can
    define quantities analogous to molecular structure, thermodynamics, and
    kinetics, and relate these emergent properties to the structure of the
    underlying landscape. This Perspective aims to describe these analogies with
    examples from recent applications, and suggest avenues for new
    interdisciplinary research.


    Artificial Intelligence

    Structured Parallel Programming for Monte Carlo Tree Search

    S. Ali Mirsoleimani, Aske Plaat, Jaap van den Herik, Jos Vermaseren
    Comments: 9 pages
    Subjects: Artificial Intelligence (cs.AI)

    In this paper, we present a new algorithm for parallel Monte Carlo tree
    search (MCTS). It is based on the pipeline pattern and allows flexible
    management of the control flow of the operations in parallel MCTS. The pipeline
    pattern provides for the first structured parallel programming approach to
    MCTS. Moreover, we propose a new lock-free tree data structure for parallel
    MCTS which removes synchronization overhead. The Pipeline Pattern for Parallel
    MCTS algorithm (called 3PMCTS), scales very well to higher numbers of cores
    when compared to the existing methods.

    Comparison of ontology alignment algorithms across single matching task via the McNemar test

    Majid Mohammadi, Amir Ahooye Atashin, Wout Hofman, Yaohua Tan
    Subjects: Artificial Intelligence (cs.AI)

    Ontology alignment is widely used to find the correspondences between
    different ontologies in diverse fields. After discovering the alignment by
    methods, several performance scores are available to evaluate them. The scores
    require the produced alignment by a method and the reference alignment
    containing the underlying actual correspondences of the given ontologies. The
    current trend in alignment evaluation is to put forward a new score and to
    compare various alignments by juxtaposing their performance scores. However, it
    is substantially provocative to select one performance score among others for
    comparison. On top of that, claiming if one method has a better performance
    than one another can not be substantiated by solely comparing the scores. In
    this paper, we propose the statistical procedures which enable us to
    theoretically favor one method over one another. The McNemar test is considered
    as a reliable and suitable means for comparing two ontology alignment methods
    over one matching task. The test applies to a 2 x 2 contingency table which can
    be constructed in two different ways based on the alignments, each of which has
    their own merits/pitfalls. The ways of the contingency table construction and
    various apposite statistics from the McNemar test are elaborated in minute
    detail. In the case of having more than two alignment methods for comparison,
    the family-wise error rate is expected to happen. Thus, the ways of preventing
    such an error are also discussed. A directed graph visualizes the outcome of
    the McNemar test in the presence of multiple alignment methods. From this
    graph, it is readily understood if one method is better than one another or if
    their differences are imperceptible. Our investigation on the methods
    participated in the anatomy track of OAEI 2016 demonstrates that AML and
    CroMatcher are the top two methods and DKP-AOM and Alin are the bottom two
    ones.

    It Takes Two to Tango: Towards Theory of AI's Mind

    Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Theory of Mind is the ability to attribute mental states (beliefs, intents,
    knowledge, perspectives, etc.) to others and recognize that these mental states
    may differ from one’s own. Theory of Mind is critical to effective
    communication and to teams demonstrating higher collective performance. To
    effectively leverage the progress in Artificial Intelligence (AI) to make our
    lives more productive, it is important for humans and AI to work well together
    in a team. Traditionally, there has been much emphasis on research to make AI
    more accurate, and (to a lesser extent) on having it better understand human
    intentions, tendencies, beliefs, and contexts. The latter involves making AI
    more human-like and having it develop a theory of our minds.

    In this work, we argue that for human-AI teams to be effective, humans must
    also develop a theory of AI’s mind – get to know its strengths, weaknesses,
    beliefs, and quirks. We instantiate these ideas within the domain of Visual
    Question Answering (VQA). We find that using just a few examples(50), lay
    people can be trained to better predict responses and oncoming failures of a
    complex VQA model. Surprisingly, we find that having access to the model’s
    internal states – its confidence in its top-k predictions, explicit or implicit
    attention maps which highlight regions in the image (and words in the question)
    the model is looking at (and listening to) while answering a question about an
    image – do not help people better predict its behavior

    Semi-Supervised Generation with Cluster-aware Generative Models

    Lars Maaløe, Marco Fraccaro, Ole Winther
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep generative models trained with large amounts of unlabelled data have
    proven to be powerful within the domain of unsupervised learning. Many real
    life data sets contain a small amount of labelled data points, that are
    typically disregarded when training generative models. We propose the
    Cluster-aware Generative Model, that uses unlabelled information to infer a
    latent representation that models the natural clustering of the data, and
    additional labelled data points to refine this clustering. The generative
    performances of the model significantly improve when labelled information is
    exploited, obtaining a log-likelihood of -79.38 nats on permutation invariant
    MNIST, while also achieving competitive semi-supervised classification
    accuracies. The model can also be trained fully unsupervised, and still improve
    the log-likelihood performance with respect to related methods.

    Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

    Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
    Comments: 10 pages, 7 figures, ICCV 2017 submission
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)

    General human action recognition requires understanding of various visual
    cues. In this paper, we propose a network architecture that computes and
    integrates the most important visual cues for action recognition: pose, motion,
    and the raw images. For the integration, we introduce a Markov chain model
    which adds cues successively. The resulting approach is efficient and
    applicable to action classification as well as to spatial and temporal action
    localization. The two contributions clearly improve the performance over
    respective baselines. The overall approach achieves state-of-the-art action
    classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover,
    it yields state-of-the-art spatio-temporal action localization results on
    UCF101 and J-HMDB.

    Multi-Task Learning of Keyphrase Boundary Classification

    Isabelle Augenstein, Anders Søgaard
    Comments: ACL 2017
    Journal-ref: ACL 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Keyphrase boundary classification (KBC) is the task of detecting keyphrases
    in scientific articles and labelling them with respect to predefined types.
    Although important in practice, this task is so far underexplored, partly due
    to the lack of labelled data. To overcome this, we explore several auxiliary
    tasks, including semantic super-sense tagging and identification of multi-word
    expressions, and cast the task as a multi-task learning problem with deep
    recurrent neural networks. Our multi-task models perform significantly better
    than previous state of the art approaches on two scientific KBC datasets,
    particularly for long keyphrases.

    Potential Functions based Sampling Heuristic For Optimal Path Planning

    Ahmed Hussain Qureshi, Yasar Ayaz
    Comments: This paper introduces a novel algorithm called P-RRT*. The work has been published in Springer Autonomous Robots Journal
    Journal-ref: Autonomous Robots 40, no. 6 (2016): 1079-1093
    Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

    Rapidly-exploring Random Tree Star(RRT*) is a recently proposed extension of
    Rapidly-exploring Random Tree (RRT) algorithm that provides a collision-free,
    asymptotically optimal path regardless of obstacle’s geometry in a given
    environment. However, one of the limitations in the RRT* algorithm is slow
    convergence to optimal path solution. As a result, it consumes high memory as
    well as time due to a large number of iterations utilised in achieving optimal
    path solution. To overcome these limitations, we propose the Potential Function
    Based-RRT* (P-RRT*) that incorporates the Artificial Potential Field Algorithm
    in RRT*. The proposed algorithm allows a considerable decrease in the number of
    iterations and thus leads to more efficient memory utilization and an
    accelerated convergence rate. In order to illustrate the usefulness of the
    proposed algorithm in terms of space execution and convergence rate, this paper
    presents rigorous simulation based comparisons between the proposed techniques
    and RRT* under different environmental conditions. Moreover, both algorithms
    are also tested and compared under non-holonomic differential constraints.

    Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

    Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A grand goal of computer vision is to build systems that learn visual
    representations over time that can be applied to many tasks. In this paper, we
    investigate a vision-language embedding as a core representation and show that
    it leads to better cross-task transfer than standard multi-task learning. In
    particular, the task of visual recognition is aligned to the task of visual
    question answering by forcing each to use the same word-region embeddings. We
    show this leads to greater inductive transfer from recognition to VQA than
    standard multitask learning. Visual recognition also improves, especially for
    categories that have relatively few recognition training labels but appear
    often in the VQA setting. Thus, our paper takes a small step towards creating
    more general vision systems by showing the benefit of interpretable, flexible,
    and trainable core representations.

    Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification

    Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
    Comments: To appear in ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Implicit discourse relation classification is of great challenge due to the
    lack of connectives as strong linguistic cues, which motivates the use of
    annotated implicit connectives to improve the recognition. We propose a feature
    imitation framework in which an implicit relation network is driven to learn
    from another neural network with access to connectives, and thus encouraged to
    extract similarly salient features for accurate classification. We develop an
    adversarial model to enable an adaptive imitation scheme through competition
    between the implicit network and a rival feature discriminator. Our method
    effectively transfers discriminability of connectives to the implicit features,
    and achieves state-of-the-art performance on the PDTB benchmark.

    Ontological Multidimensional Data Models and Contextual Data Qality

    Leopoldo Bertossi, Mostafa Milani
    Comments: Journal submission. Extended version of RuleML’15 paper
    Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)

    Data quality assessment and data cleaning are context-dependent activities.
    Motivated by this observation, we propose the Ontological Multidimensional Data
    Model (OMD model), which can be used to model and represent contexts as
    logic-based ontologies. The data under assessment is mapped into the context,
    for additional analysis, processing, and quality data extraction. The resulting
    contexts allow for the representation of dimensions, and multidimensional data
    quality assessment becomes possible. At the core of a multidimensional context
    we include a generalized multidimensional data model and a Datalog+/- ontology
    with provably good properties in terms of query answering. These main
    components are used to represent dimension hierarchies, dimensional
    constraints, dimensional rules, and define predicates for quality data
    specification. Query answering relies upon and triggers navigation through
    dimension hierarchies, and becomes the basic tool for the extraction of quality
    data. The OMD model is interesting per se, beyond applications to data quality.
    It allows for a logic-based, and computationally tractable representation of
    multidimensional data, extending previous multidimensional data models with
    additional expressive power and functionalities.

    On the Reliable Detection of Concept Drift from Streaming Unlabeled Data

    Tegjyot Singh Sethi, Mehmed Kantardzic
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Classifiers deployed in the real world operate in a dynamic environment,
    where the data distribution can change over time. These changes, referred to as
    concept drift, can cause the predictive performance of the classifier to drop
    over time, thereby making it obsolete. To be of any real use, these classifiers
    need to detect drifts and be able to adapt to them, over time. Detecting drifts
    has traditionally been approached as a supervised task, with labeled data
    constantly being used for validating the learned model. Although effective in
    detecting drifts, these techniques are impractical, as labeling is a difficult,
    costly and time consuming activity. On the other hand, unsupervised change
    detection techniques are unreliable, as they produce a large number of false
    alarms. The inefficacy of the unsupervised techniques stems from the exclusion
    of the characteristics of the learned classifier, from the detection process.
    In this paper, we propose the Margin Density Drift Detection (MD3) algorithm,
    which tracks the number of samples in the uncertainty region of a classifier,
    as a metric to detect drift. The MD3 algorithm is a distribution independent,
    application independent, model independent, unsupervised and incremental
    algorithm for reliably detecting drifts from data streams. Experimental
    evaluation on 6 drift induced datasets and 4 additional datasets from the
    cybersecurity domain demonstrates that the MD3 approach can reliably detect
    drifts, with significantly fewer false alarms compared to unsupervised feature
    based drift detectors. The reduced false alarms enables the signaling of drifts
    only when they are most likely to affect classification performance. As such,
    the MD3 approach leads to a detection scheme which is credible, label efficient
    and general in its applicability.


    Information Retrieval

    AutoSVD++: An Efficient Hybrid Collaborative Filtering Model via Contractive Auto-encoders

    Shuai Zhang, Lina Yao, Xiwei Xu
    Comments: 5 pages, 2 figures
    Subjects: Information Retrieval (cs.IR)

    Collaborative filtering (CF) has been successfully used to provide users with
    personalized products and services. However, dealing with the increasing
    sparseness of user-item matrix still remains a challenge. To tackle such issue,
    hybrid CF such as combining with content based filtering and leveraging side
    information of users and items has been extensively studied to enhance
    performance. However, most of these approaches depend on hand-crafted feature
    engineering, which are usually noise-prone and biased by different feature
    extraction and selection schemes. In this paper, we propose a new hybrid model
    by generalizing contractive auto-encoder paradigm into matrix factorization
    framework with good scalability and computational efficiency, which jointly
    model content information as representations of effectiveness and compactness,
    and leverage implicit user feedback to make accurate recommendations. Extensive
    experiments conducted over three large-scale real datasets indicate the
    proposed approach outperforms state-of-art methods for item recommendation.

    Exploring Choice Overload in Related-Article Recommendations in Digital Libraries

    Felix Beierle, Akiko Aizawa, Joeran Beel
    Comments: Accepted for publication at the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR2017)
    Subjects: Information Retrieval (cs.IR)

    We investigate the problem of choice overload – the difficulty of making a
    decision when faced with many options – when displaying related-article
    recommendations in digital libraries. So far, research regarding to how many
    items should be displayed has mostly been done in the fields of media
    recommendations and search engines. We analyze the number of recommendations in
    current digital libraries. When browsing fullscreen with a laptop or desktop
    PC, all display a fixed number of recommendations. 72% display three, four, or
    five recommendations, none display more than ten. We provide results from an
    empirical evaluation conducted with GESIS’ digital library Sowiport, with
    recommendations delivered by recommendations-as-a-service provider Mr. DLib. We
    use click-through rate as a measure of recommendation effectiveness based on
    3.4 million delivered recommendations. Our results show lower click-through
    rates for higher numbers of recommendations and twice as many clicked
    recommendations when displaying ten instead of one related-articles. Our
    results indicate that users might quickly feel overloaded by choice.

    Real-World Recommender Systems for Academia: The Pain and Gain in Building, Operating, and Researching them [Long Version]

    Joeran Beel, Siddharth Dinesh
    Comments: This article is a long version of the article published in the Proceedings of the 5th International Workshop on Bibliometric-enhanced Information Retrieval (BIR)
    Subjects: Information Retrieval (cs.IR)

    Research on recommender systems is a challenging task, as is building and
    operating such systems. Major challenges include non-reproducible research
    results, dealing with noisy data, and answering many questions such as how many
    recommendations to display, how often, and, of course, how to generate
    recommendations most effectively. In the past six years, we built three
    research-article recommender systems for digital libraries and reference
    managers, and conducted research on these systems. In this paper, we share some
    experiences we made during that time. Among others, we discuss the required
    skills to build recommender systems, and why the literature provides little
    help in identifying promising recommendation approaches. We explain the
    challenge in creating a randomization engine to run A/B tests, and how low data
    quality impacts the calculation of bibliometrics. We further discuss why
    several of our experiments delivered disappointing results, and provide
    statistics on how many researchers showed interest in our recommendation
    dataset.

    Detection and Resolution of Rumours in Social Media: A Survey

    Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Despite the increasing use of social media platforms for information and news
    gathering, its unmoderated nature often leads to the emergence and spread of
    rumours, i.e. pieces of information that are unverified at the time of posting.
    At the same time, the openness of social media platforms provides opportunities
    to study how users share and discuss rumours, and to explore how natural
    language processing and data mining techniques may be used to find ways of
    determining their veracity. In this survey we introduce and discuss two types
    of rumours that circulate on social media; long-standing rumours that circulate
    for long periods of time, and newly-emerging rumours spawned during fast-paced
    events such as breaking news, where reports are released piecemeal and often
    with an unverified status in their early stages. We provide an overview of
    research into social media rumours with the ultimate goal of developing a
    rumour classification system that consists of four components: rumour
    detection, rumour tracking, rumour stance classification and rumour veracity
    classification. We delve into the approaches presented in the scientific
    literature for the development of each of these four components. We summarise
    the efforts and achievements so far towards the development of rumour
    classification systems and conclude with suggestions for avenues for future
    research in social media mining for detection and resolution of rumours.

    Opinion Mining on Non-English Short Text

    Esra Akbas
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    As the type and the number of such venues increase, automated analysis of
    sentiment on textual resources has become an essential data mining task. In
    this paper, we investigate the problem of mining opinions by extracting aspects
    of entities on the collection of informal short texts. Both positive and
    negative sentiment strength of texts are detected. We focus on a non-English
    language that has few resources for text mining. This approach would help
    enhance the sentiment analysis in languages where a list of opinionated words
    does not exist. We propose a new method projects the text into dense and low
    dimensional feature vectors according to the sentiment strength of the words.
    We detect the mixture of positive and negative sentiments on a multi-variant
    scale. Empirical evaluation of the proposed framework on Turkish tweets shows
    that our approach gets good results for opinion mining.


    Computation and Language

    Detection and Resolution of Rumours in Social Media: A Survey

    Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter
    Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    Despite the increasing use of social media platforms for information and news
    gathering, its unmoderated nature often leads to the emergence and spread of
    rumours, i.e. pieces of information that are unverified at the time of posting.
    At the same time, the openness of social media platforms provides opportunities
    to study how users share and discuss rumours, and to explore how natural
    language processing and data mining techniques may be used to find ways of
    determining their veracity. In this survey we introduce and discuss two types
    of rumours that circulate on social media; long-standing rumours that circulate
    for long periods of time, and newly-emerging rumours spawned during fast-paced
    events such as breaking news, where reports are released piecemeal and often
    with an unverified status in their early stages. We provide an overview of
    research into social media rumours with the ultimate goal of developing a
    rumour classification system that consists of four components: rumour
    detection, rumour tracking, rumour stance classification and rumour veracity
    classification. We delve into the approaches presented in the scientific
    literature for the development of each of these four components. We summarise
    the efforts and achievements so far towards the development of rumour
    classification systems and conclude with suggestions for avenues for future
    research in social media mining for detection and resolution of rumours.

    Neural Lattice-to-Sequence Models for Uncertain Inputs

    Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel
    Subjects: Computation and Language (cs.CL)

    The input to a neural sequence-to-sequence model is often determined by an
    up-stream system, e.g. a word segmenter, part of speech tagger, or speech
    recognizer. These up-stream models are potentially error-prone. Representing
    inputs through word lattices allows making this uncertainty explicit by
    capturing alternative sequences and their posterior probabilities in a compact
    form. In this work, we extend the TreeLSTM (Tai et al., 2015) into a
    LatticeLSTM that is able to consume word lattices, and can be used as encoder
    in an attentional encoder-decoder model. We integrate lattice posterior scores
    into this architecture by extending the TreeLSTM’s child-sum and forget gates
    and introducing a bias term into the attention mechanism. We experiment with
    speech translation lattices and report consistent improvements over baselines
    that translate either the 1-best hypothesis or the lattice without posterior
    scores.

    A Transition-Based Directed Acyclic Graph Parser for UCCA

    Daniel Hershcovich, Omri Abend, Ari Rappoport
    Comments: 16 pages; Accepted as long paper at ACL2017
    Subjects: Computation and Language (cs.CL)

    We present the first parser for UCCA, a cross-linguistically applicable
    framework for semantic representation, which builds on extensive typological
    work and supports rapid annotation. UCCA poses a challenge for existing parsing
    techniques, as it exhibits reentrancy (resulting in DAG structures),
    discontinuous structures and non-terminal nodes corresponding to complex
    semantic units. To our knowledge, the conjunction of these formal properties is
    not supported by any existing parser. Our transition-based parser, which uses a
    novel transition set and features based on bidirectional LSTMs, has value not
    just for UCCA parsing: its ability to handle more general graph structures can
    inform the development of parsers for other semantic DAG structures, and in
    languages that frequently use discontinuous structures.

    Multi-Task Learning of Keyphrase Boundary Classification

    Isabelle Augenstein, Anders Søgaard
    Comments: ACL 2017
    Journal-ref: ACL 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Keyphrase boundary classification (KBC) is the task of detecting keyphrases
    in scientific articles and labelling them with respect to predefined types.
    Although important in practice, this task is so far underexplored, partly due
    to the lack of labelled data. To overcome this, we explore several auxiliary
    tasks, including semantic super-sense tagging and identification of multi-word
    expressions, and cast the task as a multi-task learning problem with deep
    recurrent neural networks. Our multi-task models perform significantly better
    than previous state of the art approaches on two scientific KBC datasets,
    particularly for long keyphrases.

    Combining Lexical and Syntactic Features for Detecting Content-dense Texts in News

    Yinfei Yang, Ani Nenkova
    Comments: In submission to JAIR
    Subjects: Computation and Language (cs.CL)

    Content-dense news report important factual information about an event in
    direct, succinct manner. Information seeking applications such as information
    extraction, question answering and summarization normally assume all text they
    deal with is content-dense. Here we empirically test this assumption on news
    articles from the business, U.S. international relations, sports and science
    journalism domains. Our findings clearly indicate that about half of the news
    texts in our study are in fact not content-dense and motivate the development
    of a supervised content-density detector. We heuristically label a large
    training corpus for the task and train a two-layer classifying model based on
    lexical and unlexicalized syntactic features. On manually annotated data, we
    compare the performance of domain-specific classifiers, trained on data only
    from a given news domain and a general classifier in which data from all four
    domains is pooled together. Our annotation and prediction experiments
    demonstrate that the concept of content density varies depending on the domain
    and that naive annotators provide judgement biased toward the stereotypical
    domain label. Domain-specific classifiers are more accurate for domains in
    which content-dense texts are typically fewer. Domain independent classifiers
    reproduce better naive crowdsourced judgements. Classification prediction is
    high across all conditions, around 80%.

    Syntax Aware LSTM Model for Chinese Semantic Role Labeling

    Feng Qian, Lei Sha, Baobao Chang, Lu-chen Liu, Ming Zhang
    Subjects: Computation and Language (cs.CL)

    Traditional approaches to Semantic Role Labeling (SRL) depend heavily on
    manual feature engineering. Recurrent neural network (RNN) with long-short-term
    memory (LSTM) only treats sentence as sequence data and can not utilize higher
    level syntactic information. In this paper, we propose Syntax Aware LSTM
    (SA-LSTM) which gives RNN-LSTM ability to utilize higher level syntactic
    information gained from dependency relationship information. SA-LSTM also
    assigns different trainable weights to different types of dependency
    relationship automatically. Experiment results on Chinese Proposition Bank
    (CPB) show that, even without pre-training or introducing any other extra
    semantically annotated resources, our SA-LSTM model still outperforms the state
    of the art significantly base on Student’s t-test ((p<0.05)). Trained weights
    of types of dependency relationship form a stable and self-explanatory pattern.

    Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings

    Junki Matsuo, Mamoru Komachi, Katsuhito Sudoh
    Comments: 5 pages
    Subjects: Computation and Language (cs.CL)

    One of the most important problems in machine translation (MT) evaluation is
    to evaluate the similarity between translation hypotheses with different
    surface forms from the reference, especially at the segment level. We propose
    to use word embeddings to perform word alignment for segment-level MT
    evaluation. We performed experiments with three types of alignment methods
    using word embeddings. We evaluated our proposed methods with various
    translation datasets. Experimental results show that our proposed methods
    outperform previous word embeddings-based methods.

    Building a Neural Machine Translation System Using Only Synthetic Parallel Data

    Jaehong Park, Byunggook Na, Sungroh Yoon
    Comments: 10 pages, 2 figures, 3 tables
    Subjects: Computation and Language (cs.CL)

    Recent works have proved that synthetic parallel data generated by existing
    translation models can be an effective solution to various neural machine
    translation (NMT) issues. In this study, we construct NMT systems using only
    synthetic parallel data. As an effective alternative to real parallel data, we
    also present a new type of synthetic parallel corpus. The proposed pseudo
    parallel data are distinct from previous approaches in that real and synthetic
    sentences are mixed on both sides of sentence pairs. Experiments on
    Czech-German and French-German translations demonstrate the efficacy of the
    proposed pseudo parallel corpus that guarantees not only both balanced and
    competitive performance for bidirectional translation but also substantial
    improvement with the aid of a real parallel corpus.

    Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification

    Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
    Comments: To appear in ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Implicit discourse relation classification is of great challenge due to the
    lack of connectives as strong linguistic cues, which motivates the use of
    annotated implicit connectives to improve the recognition. We propose a feature
    imitation framework in which an implicit relation network is driven to learn
    from another neural network with access to connectives, and thus encouraged to
    extract similarly salient features for accurate classification. We develop an
    adversarial model to enable an adaptive imitation scheme through competition
    between the implicit network and a rival feature discriminator. Our method
    effectively transfers discriminability of connectives to the implicit features,
    and achieves state-of-the-art performance on the PDTB benchmark.

    Multimodal Dialogs (MMD): A large-scale dataset for studying multimodal domain-aware conversations

    Amrita Saha, Mitesh Khapra, Karthik Sankaranarayanan
    Subjects: Computation and Language (cs.CL)

    Owing to the success of deep learning techniques for tasks such as Q/A and
    text-based dialog, there is an increasing demand for AI agents in several
    domains such as retail, travel, entertainment, etc. that can carry on
    multimodal conversations with humans employing both text and images within a
    dialog seamlessly. However, deep learning research is this area has been
    limited primarily due to the lack of availability of large-scale, open
    conversation datasets. To overcome this bottleneck, in this paper we introduce
    the task of multi-modal, domain-aware conversations, and propose the MMD
    benchmark dataset to- wards this task. This dataset was gathered by working in
    close coordination with large number of domain experts in the retail domain and
    consists of over 150K conversation sessions between shoppers and sales agents.
    With this dataset, we propose 5 new sub-tasks for multimodal conversations
    along with their evaluation methodology. We also propose two novel multi-modal
    deep learning models in the encode- attend-decode paradigm and demonstrate
    their performance on two of the sub-tasks, namely text response generation and
    best image response selection. These experiments serve to establish baseline
    performance numbers and open new directions of research for each of these
    sub-tasks.

    Sentiment Analysis of Citations Using Word2vec

    Haixia Liu
    Subjects: Computation and Language (cs.CL)

    Citation sentiment analysis is an important task in scientific paper
    analysis. Existing machine learning techniques for citation sentiment analysis
    are focusing on labor-intensive feature engineering, which requires large
    annotated corpus. As an automatic feature extraction tool, word2vec has been
    successfully applied to sentiment analysis of short texts. In this work, I
    conducted empirical research with the question: how well does word2vec work on
    the sentiment analysis of citations? The proposed method constructed sentence
    vectors (sent2vec) by averaging the word embeddings, which were learned from
    Anthology Collections (ACL-Embeddings). I also investigated polarity-specific
    word embeddings (PS-Embeddings) for classifying positive and negative
    citations. The sentence vectors formed a feature space, to which the examined
    citation sentence was mapped to. Those features were input into classifiers
    (support vector machines) for supervised classification. Using
    10-cross-validation scheme, evaluation was conducted on a set of annotated
    citations. The results showed that word embeddings are effective on classifying
    positive and negative citations. However, hand-crafted features performed
    better for the overall classification.

    Psychological and Personality Profiles of Political Extremists

    Meysam Alizadeh, Ingmar Weber, Claudio Cioffi-Revilla, Santo Fortunato, Michael Macy
    Comments: 11 pages, 3 figures
    Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

    Global recruitment into radical Islamic movements has spurred renewed
    interest in the appeal of political extremism. Is the appeal a rational
    response to material conditions or is it the expression of psychological and
    personality disorders associated with aggressive behavior, intolerance,
    conspiratorial imagination, and paranoia? Empirical answers using surveys have
    been limited by lack of access to extremist groups, while field studies have
    lacked psychological measures and failed to compare extremists with contrast
    groups. We revisit the debate over the appeal of extremism in the U.S. context
    by comparing publicly available Twitter messages written by over 355,000
    political extremist followers with messages written by non-extremist U.S.
    users. Analysis of text-based psychological indicators supports the moral
    foundation theory which identifies emotion as a critical factor in determining
    political orientation of individuals. Extremist followers also differ from
    others in four of the Big Five personality traits.

    Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

    Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, Kaheer Suleman
    Subjects: Computation and Language (cs.CL)

    This paper presents the Frames dataset (Frames is available at
    this http URL), a corpus of 1369 human-human dialogues
    with an average of 15 turns per dialogue. We developed this dataset to study
    the role of memory in goal-oriented dialogue systems. Based on Frames, we
    introduce a task called frame tracking, which extends state tracking to a
    setting where several states are tracked simultaneously. We propose a baseline
    model for this task. We show that Frames can also be used to study memory in
    dialogue management and information presentation through natural language
    generation.

    One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

    Katharina Kann, Ryan Cotterell, Hinrich Schütze
    Comments: Accepted at ACL 2017
    Subjects: Computation and Language (cs.CL)

    We present a novel cross-lingual transfer method for paradigm completion, the
    task of mapping a lemma to its inflected forms, using a neural encoder-decoder
    model, the state of the art for the monolingual task. We use labeled data from
    a high-resource language to increase performance on a low-resource language. In
    experiments on 21 language pairs from four different language families, we
    obtain up to 58% higher accuracy than without transfer and show that even
    zero-shot and one-shot learning are possible. We further find that the degree
    of language relatedness strongly influences the ability to transfer
    morphological knowledge.

    Reading Wikipedia to Answer Open-Domain Questions

    Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
    Subjects: Computation and Language (cs.CL)

    This paper proposes to tackle open-domain question answering using Wikipedia
    as the unique knowledge source: the answer to any factoid question is a text
    span in a Wikipedia article. This task of machine reading at scale combines the
    challenges of document retrieval – finding the relevant articles – with that of
    machine comprehension of text – identifying the answer spans from those
    articles. Our approach combines a search component based on bigram hashing and
    TF-IDF matching with a multi-layer recurrent neural network model trained to
    detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA
    datasets indicate that (1) both modules are highly competitive with respect to
    existing counterparts and (2) multitask learning using distant supervision on
    their combination is an effective complete system on this challenging task.

    Opinion Mining on Non-English Short Text

    Esra Akbas
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    As the type and the number of such venues increase, automated analysis of
    sentiment on textual resources has become an essential data mining task. In
    this paper, we investigate the problem of mining opinions by extracting aspects
    of entities on the collection of informal short texts. Both positive and
    negative sentiment strength of texts are detected. We focus on a non-English
    language that has few resources for text mining. This approach would help
    enhance the sentiment analysis in languages where a list of opinionated words
    does not exist. We propose a new method projects the text into dense and low
    dimensional feature vectors according to the sentiment strength of the words.
    We detect the mixture of positive and negative sentiments on a multi-variant
    scale. Empirical evaluation of the proposed framework on Turkish tweets shows
    that our approach gets good results for opinion mining.

    It Takes Two to Tango: Towards Theory of AI's Mind

    Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, Devi Parikh
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Theory of Mind is the ability to attribute mental states (beliefs, intents,
    knowledge, perspectives, etc.) to others and recognize that these mental states
    may differ from one’s own. Theory of Mind is critical to effective
    communication and to teams demonstrating higher collective performance. To
    effectively leverage the progress in Artificial Intelligence (AI) to make our
    lives more productive, it is important for humans and AI to work well together
    in a team. Traditionally, there has been much emphasis on research to make AI
    more accurate, and (to a lesser extent) on having it better understand human
    intentions, tendencies, beliefs, and contexts. The latter involves making AI
    more human-like and having it develop a theory of our minds.

    In this work, we argue that for human-AI teams to be effective, humans must
    also develop a theory of AI’s mind – get to know its strengths, weaknesses,
    beliefs, and quirks. We instantiate these ideas within the domain of Visual
    Question Answering (VQA). We find that using just a few examples(50), lay
    people can be trained to better predict responses and oncoming failures of a
    complex VQA model. Surprisingly, we find that having access to the model’s
    internal states – its confidence in its top-k predictions, explicit or implicit
    attention maps which highlight regions in the image (and words in the question)
    the model is looking at (and listening to) while answering a question about an
    image – do not help people better predict its behavior

    Topic modeling of public repositories at scale using names in source code

    Vadim Markovtsev, Eiso Kant
    Comments: 11 pages
    Subjects: Programming Languages (cs.PL); Computation and Language (cs.CL)

    Programming languages themselves have a limited number of reserved keywords
    and character based tokens that define the language specification. However,
    programmers have a rich use of natural language within their code through
    comments, text literals and naming entities. The programmer defined names that
    can be found in source code are a rich source of information to build a high
    level understanding of the project. The goal of this paper is to apply topic
    modeling to names used in over 13.6 million repositories and perceive the
    inferred topics. One of the problems in such a study is the occurrence of
    duplicate repositories not officially marked as forks (obscure forks). We show
    how to address it using the same identifiers which are extracted for topic
    modeling.

    We open with a discussion on naming in source code, we then elaborate on our
    approach to remove exact duplicate and fuzzy duplicate repositories using
    Locality Sensitive Hashing on the bag-of-words model and then discuss our work
    on topic modeling; and finally present the results from our data analysis
    together with open-access to the source code, tools and datasets.


    Distributed, Parallel, and Cluster Computing

    Optimizing Communication by Compression for Multi-GPU Scalable Breadth-First Searches

    Julian Romera
    Comments: Initial version, 105 pages
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)

    The Breadth First Search (BFS) algorithm is the foundation and building block
    of many higher graph-based operations such as spanning trees, shortest paths
    and betweenness centrality. The importance of this algorithm increases each day
    due to it is a key requirement for many data structures which are becoming
    popular nowadays. When the BFS algorithm is parallelized by distributing the
    graph between several processors the interconnection network limits the
    performance. Hence, improvements on this area may benefit the overall
    performance of the algorithm.

    This work presents an alternative compression scheme for communications in
    distributed BFS processing. It focuses on BFS processors using General-Purpose
    Graphics Processing Units.

    A Survey of Distributed Message Broker Queues

    Vineet John, Xia Liu
    Comments: 8 pages, 10 figures, 1 table
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper surveys the message brokers that are in vogue today for
    distributed communication. Their primary goal is to facilitate the construction
    of decentralized topologies without single points of failure, enabling fault
    tolerance and high availability. These characteristics make them optimal for
    usage within distributed architectures. However, there are multiple protocols
    built to achieve this, and it would be beneficial to have a empirical
    comparison between their features and performance to determine their real-world
    applicability.

    This paper focuses on two popular protocols (Kafka and AMQP) and explores the
    divergence in their features as well as their performance under varied testing
    workloads.

    Moderately Complex Paxos Made Simple: High-Level Specification of Distributed Algorithm

    Yanhong A. Liu, Saksham Chand, Scott D. Stoller
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    This paper presents simpler high-level specifications of more complex
    variants of the Paxos algorithm for distributed consensus. The development of
    the specifications is by using a method and language for expressing complex
    control flows and synchronization conditions precisely at a high level. The
    resulting specifications are both completely precise and fully executable in a
    programming language, and furthermore are formally verified using a proof
    system. We show that high-level specifications allow better understanding,
    faster error discovery, and easier optimization and extension, including a
    general method for merging processes.

    Graph Partitioning with Acyclicity Constraints

    Orlando Moreira, Merten Popp, Christian Schulz
    Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

    Graphs are widely used to model execution dependencies in applications. In
    particular, the NP-complete problem of partitioning a graph under constraints
    receives enormous attention by researchers because of its applicability in
    multiprocessor scheduling. We identified the additional constraint of acyclic
    dependencies between blocks when mapping computer vision and imaging
    applications to a heterogeneous embedded multiprocessor. Existing algorithms
    and heuristics do not address this requirement and deliver results that are not
    applicable for our use-case. In this work, we show that this more constrained
    version of the graph partitioning problem is NP-complete and present heuristics
    that achieve a close approximation of the optimal solution found by an
    exhaustive search for small problem instances and much better scalability for
    larger instances. In addition, we can show a positive impact on the schedule of
    a real imaging application that improves communication volume and execution
    time.

    Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

    Istvan Z Reguly, Gihan R Mudalige, Mike B Giles
    Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

    Stencil computations occur in a multitude of scientific simulations and
    therefore have been the subject of many domain-specific languages including the
    OPS (Oxford Parallel library for Structured meshes) DSL embedded in
    C/C++/Fortran. OPS is currently used in several large partial differential
    equations (PDE) applications, and has been used as a vehicle to experiment
    with, and deploy performance improving optimisations. The key common bottleneck
    in most stencil codes is data movement, and other research has shown that
    improving data locality through optimisations that schedule across loops do
    particularly well. However, in many large PDE applications it is not possible
    to apply such optimisations through a compiler because in larger-scale codes,
    there are a huge number of options, execution paths and data per grid point,
    many dependent on run-time parameters, and the code is distributed across a
    number of different compilation units. In this paper, we adapt the data
    locality improving optimisation called iteration space slicing for use in large
    OPS apps, relying on run-time analysis and delayed execution. We observe
    speedups of 2x on the Cloverleaf 2D/3D proxy application, which contain 83/141
    loops respectively. The approach is generally applicable to any stencil DSL
    that provides per loop data access information.


    Learning

    No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

    Rong Ge, Chi Jin, Yi Zheng
    Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

    In this paper we develop a new framework that captures the common landscape
    underlying the common non-convex low-rank matrix problems including matrix
    sensing, matrix completion and robust PCA. In particular, we show for all above
    problems (including asymmetric cases): 1) all local minima are also globally
    optimal; 2) no high-order saddle points exists. These results explain why
    simple algorithms such as stochastic gradient descent have global converge, and
    efficiently optimize these non-convex objective functions in practice. Our
    framework connects and simplifies the existing analyses on optimization
    landscapes for matrix sensing and symmetric matrix completion. The framework
    naturally leads to new results for asymmetric matrix completion and robust PCA.

    Soft-to-Hard Vector Quantization for End-to-End Learned Compression of Images and Neural Networks

    Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool
    Comments: Supplementary visual examples available at: this http URL
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    In this work we present a new approach to learn compressible representations
    in deep architectures with an end-to-end training strategy. Our method is based
    on a soft relaxation of quantization and entropy, which we anneal to their
    discrete counterparts throughout training. We showcase this method for two
    challenging applications: Image compression and neural network compression.
    While these tasks have typically been approached with different methods, our
    soft-to-hard quantization approach gives state-of-the-art results for both.

    Clustering in Hilbert simplex geometry

    Frank Nielsen, Ke Sun
    Comments: 18 pages
    Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

    Clustering categorical distributions in the probability simplex is a
    fundamental primitive often met in applications dealing with histograms or
    mixtures of multinomials. Traditionally, the differential-geometric structure
    of the probability simplex has been used either by (i) setting the Riemannian
    metric tensor to the Fisher information matrix of the categorical
    distributions, or (ii) defining the information-geometric structure induced by
    a smooth dissimilarity measure, called a divergence. In this paper, we
    introduce a novel computationally-friendly non-Riemannian framework for
    modeling the probability simplex: Hilbert simplex geometry. We discuss the pros
    and cons of those three statistical modelings, and compare them experimentally
    for clustering tasks.

    On Kernelized Multi-armed Bandits

    Sayak Ray Chowdhury, Aditya Gopalan
    Subjects: Learning (cs.LG)

    We consider the stochastic bandit problem with a continuous set of arms, with
    the expected reward function over the arms assumed to be fixed but unknown. We
    provide two new Gaussian process-based algorithms for continuous bandit
    optimization-Improved GP-UCB (IGP-UCB) and GP-Thomson sampling (GP-TS), and
    derive corresponding regret bounds. Specifically, the bounds hold when the
    expected reward function belongs to the reproducing kernel Hilbert space (RKHS)
    that naturally corresponds to a Gaussian process kernel used as input by the
    algorithms. Along the way, we derive a new self-normalized concentration
    inequality for vector- valued martingales of arbitrary, possibly infinite,
    dimension. Finally, experimental evaluation and comparisons to existing
    algorithms on synthetic and real-world environments are carried out that
    highlight the favorable gains of the proposed strategies in many cases.

    Provable Inductive Robust PCA via Iterative Hard Thresholding

    U.N. Niranjan, Arun Rajkumar, Theja Tulabandhula
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    The robust PCA problem, wherein, given an input data matrix that is the
    superposition of a low-rank matrix and a sparse matrix, we aim to separate out
    the low-rank and sparse components, is a well-studied problem in machine
    learning. One natural question that arises is that, as in the inductive
    setting, if features are provided as input as well, can we hope to do better?
    Answering this in the affirmative, the main goal of this paper is to study the
    robust PCA problem while incorporating feature information. In contrast to
    previous works in which recovery guarantees are based on the convex relaxation
    of the problem, we propose a simple iterative algorithm based on
    hard-thresholding of appropriate residuals. Under weaker assumptions than
    previous works, we prove the global convergence of our iterative procedure;
    moreover, it admits a much faster convergence rate and lesser computational
    complexity per iteration. In practice, through systematic synthetic and real
    data simulations, we confirm our theoretical findings regarding improvements
    obtained by using feature information.

    Understanding Concept Drift

    Geoffrey I. Webb, Loong Kuan Lee, François Petitjean, Bart Goethals
    Subjects: Learning (cs.LG)

    Concept drift is a major issue that greatly affects the accuracy and
    reliability of many real-world applications of machine learning. We argue that
    to tackle concept drift it is important to develop the capacity to describe and
    analyze it. We propose tools for this purpose, arguing for the importance of
    quantitative descriptions of drift in marginal distributions. We present
    quantitative drift analysis techniques along with methods for communicating
    their results. We demonstrate their effectiveness by application to three
    real-world learning tasks.

    Sequential Learning of Analysis Operators

    Michael Sandbichler, Karin Schnass
    Comments: 10 pages, 11 figures
    Subjects: Learning (cs.LG); Numerical Analysis (math.NA)

    In this paper two sequential algorithms for learning analysis operators are
    presented, which are built upon the same optimisation principle underlying both
    Analysis K-SVD and Analysis SimCO and use a stochastic gradient descent
    approach similar to ASimCO. The sequential analysis operator learning (SAOL)
    algorithm is based on projected gradient descent with an appropriately chosen
    step size while the implicit SAOL (ISAOL) algorithm avoids choosing a step size
    altogether by using a strategy inspired by the implicit Euler scheme for
    solving ordinary differential equations. Both algorithms are tested on
    synthetic and image data in comparison to Analysis SimCO and found to give
    slightly better recovery rates resp. decay of the objective function. In a
    final denoising experiment the presented algorithms are again shown to perform
    well in comparison to the state of the art algorithm ASimCO.

    Clustering-based Source-aware Assessment of True Robustness for Learning Models

    Ozsel Kilinc, Ismail Uysal
    Comments: Submitted to UAI 2017
    Subjects: Learning (cs.LG)

    We introduce a novel validation framework to measure the true robustness of
    learning models for real-world applications by creating source-inclusive and
    source-exclusive partitions in a dataset via clustering. We develop a
    robustness metric derived from source-aware lower and upper bounds of model
    accuracy even when data source labels are not readily available. We clearly
    demonstrate that even on a well-explored dataset like MNIST, challenging
    training scenarios can be constructed under the proposed assessment framework
    for two separate yet equally important applications: i) more rigorous learning
    model comparison and ii) dataset adequacy evaluation. In addition, our findings
    not only promise a more complete identification of trade-offs between model
    complexity, accuracy and robustness but can also help researchers optimize
    their efforts in data collection by identifying the less robust and more
    challenging class labels.

    Snapshot Ensembles: Train 1, get M for free

    Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
    Subjects: Learning (cs.LG)

    Ensembles of neural networks are known to be much more robust and accurate
    than individual networks. However, training multiple deep networks for model
    averaging is computationally expensive. In this paper, we propose a method to
    obtain the seemingly contradictory goal of ensembling multiple neural networks
    at no additional training cost. We achieve this goal by training a single
    neural network, converging to several local minima along its optimization path
    and saving the model parameters. To obtain repeated rapid convergence, we
    leverage recent work on cyclic learning rate schedules. The resulting
    technique, which we refer to as Snapshot Ensembling, is simple, yet
    surprisingly effective. We show in a series of experiments that our approach is
    compatible with diverse network architectures and learning tasks. It
    consistently yields lower error rates than state-of-the-art single models at no
    additional training cost, and compares favorably with traditional network
    ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain
    error rates of 3.4% and 17.4% respectively.

    Assortment Optimization under Unknown MultiNomial Logit Choice Models

    Wang Chi Cheung, David Simchi-Levi
    Comments: 16 pages, 2 figures
    Subjects: Learning (cs.LG)

    Motivated by e-commerce, we study the online assortment optimization problem.
    The seller offers an assortment, i.e. a subset of products, to each arriving
    customer, who then purchases one or no product from her offered assortment. A
    customer’s purchase decision is governed by the underlying MultiNomial Logit
    (MNL) choice model. The seller aims to maximize the total revenue in a finite
    sales horizon, subject to resource constraints and uncertainty in the MNL
    choice model. We first propose an efficient online policy which incurs a regret
    ( ilde{O}(T^{2/3})), where (T) is the number of customers in the sales
    horizon. Then, we propose a UCB policy that achieves a regret
    ( ilde{O}(T^{1/2})). Both regret bounds are sublinear in the number of
    assortments.

    Improved Training of Wasserstein GANs

    Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville
    Subjects: Learning (cs.LG)

    Generative Adversarial Networks (GANs) are powerful generative models, but
    suffer from training instability. The recently proposed Wasserstein GAN (WGAN)
    makes significant progress toward stable training of GANs, but can still
    generate low-quality samples or fail to converge in some settings. We find that
    these training failures are often due to the use of weight clipping in WGAN to
    enforce a Lipschitz constraint on the critic, which can lead to pathological
    behavior. We propose an alternative method for enforcing the Lipschitz
    constraint: instead of clipping weights, penalize the norm of the gradient of
    the critic with respect to its input. Our proposed method converges faster and
    generates higher-quality samples than WGAN with weight clipping. Finally, our
    method enables very stable GAN training: for the first time, we can train a
    wide variety of GAN architectures with almost no hyperparameter tuning,
    including 101-layer ResNets and language models over discrete data.

    Spectral Methods for Nonparametric Models

    Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola
    Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Nonparametric models are versatile, albeit computationally expensive, tool
    for modeling mixture models. In this paper, we introduce spectral methods for
    the two most popular nonparametric models: the Indian Buffet Process (IBP) and
    the Hierarchical Dirichlet Process (HDP). We show that using spectral methods
    for the inference of nonparametric models are computationally and statistically
    efficient. In particular, we derive the lower-order moments of the IBP and the
    HDP, propose spectral algorithms for both models, and provide reconstruction
    guarantees for the algorithms. For the HDP, we further show that applying
    hierarchical models on dataset with hierarchical structure, which can be solved
    with the generalized spectral HDP, produces better solutions to that of flat
    models regarding likelihood performance.

    Semi-Supervised Generation with Cluster-aware Generative Models

    Lars Maaløe, Marco Fraccaro, Ole Winther
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Deep generative models trained with large amounts of unlabelled data have
    proven to be powerful within the domain of unsupervised learning. Many real
    life data sets contain a small amount of labelled data points, that are
    typically disregarded when training generative models. We propose the
    Cluster-aware Generative Model, that uses unlabelled information to infer a
    latent representation that models the natural clustering of the data, and
    additional labelled data points to refine this clustering. The generative
    performances of the model significantly improve when labelled information is
    exploited, obtaining a log-likelihood of -79.38 nats on permutation invariant
    MNIST, while also achieving competitive semi-supervised classification
    accuracies. The model can also be trained fully unsupervised, and still improve
    the log-likelihood performance with respect to related methods.

    A New Measure of Conditional Dependence for Causal Structural Learning

    Jalal Etesami, Kun Zhang, Negar Kiyavash
    Subjects: Machine Learning (stat.ML); Learning (cs.LG)

    Measuring the dependencies among the variables of a network is of great
    interest to many disciplines. This paper studies the limitations of the
    existing dependencies measures such as their shortcomings in detecting direct
    influences or their lack of ability for group selection in order to have
    effective interventions and introduces a new statistical influence measure to
    overcome them. This measure is inspired by Dobrushin’s coefficients and has
    been developed based on the paradigm that the conditional distribution of the
    variable of interest given all the direct causes will not change by intervening
    on other variables in the system. We show the advantageous of this measure over
    the related measures in the literature. Moreover, we establish the connection
    between our measure and the integral probability metric (IPM) that helps to
    develop estimators for our measure with lower complexity compared to the other
    relevant information theoretic based measures. At the end, we show the
    performance of this measure through a numerical simulation.

    Stop That Join! Discarding Dimension Tables when Learning High Capacity Classifiers

    Vraj Shah, Arun Kumar, Xiaojin Zhu
    Comments: 10 pages
    Subjects: Databases (cs.DB); Learning (cs.LG)

    Many datasets have multiple tables connected by key-foreign key dependencies.
    Data scientists usually join all tables to bring in extra features from the
    so-called dimension tables. Unlike the statistical relational learning setting,
    such joins do not cause record duplications, which means regular IID models are
    typically used. Recent work demonstrated the possibility of using foreign key
    features as representatives for the dimension tables’ features and eliminating
    the latter a priori, potentially saving runtime and effort of data scientists.
    However, the prior work was restricted to linear models and it established a
    dichotomy of when dimension tables are safe to discard due to extra overfitting
    caused by the use of foreign key features. In this work, we revisit that
    question for two popular high capacity models: decision tree and SVM with RBF
    kernel. Our extensive empirical and simulation-based analyses show that these
    two classifiers are surprisingly and counter-intuitively more robust to
    discarding dimension tables and face much less extra overfitting than linear
    models. We provide intuitive explanations for their behavior and identify new
    open questions for further ML theoretical research. We also identify and
    resolve two key practical bottlenecks in using foreign key features.

    Hidden Two-Stream Convolutional Networks for Action Recognition

    Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
    Comments: under review at ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Multimedia (cs.MM)

    Analyzing videos of human actions involves understanding the temporal
    relationships among video frames. CNNs are the current state-of-the-art methods
    for action recognition in videos. However, the CNN architectures currently
    being used have difficulty in capturing these relationships. State-of-the-art
    action recognition approaches rely on traditional local optical flow estimation
    methods to pre-compute the motion information for CNNs. Such a two-stage
    approach is computationally expensive, storage demanding, and not end-to-end
    trainable. In this paper, we present a novel CNN architecture that implicitly
    captures motion information. Our method is 10x faster than a two-stage
    approach, does not need to cache flow information, and is end-to-end trainable.
    Experimental results on UCF101 and HMDB51 show that it achieves competitive
    accuracy with the two-stage approaches.

    Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

    Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

    A grand goal of computer vision is to build systems that learn visual
    representations over time that can be applied to many tasks. In this paper, we
    investigate a vision-language embedding as a core representation and show that
    it leads to better cross-task transfer than standard multi-task learning. In
    particular, the task of visual recognition is aligned to the task of visual
    question answering by forcing each to use the same word-region embeddings. We
    show this leads to greater inductive transfer from recognition to VQA than
    standard multitask learning. Visual recognition also improves, especially for
    categories that have relatively few recognition training labels but appear
    often in the VQA setting. Thus, our paper takes a small step towards creating
    more general vision systems by showing the benefit of interpretable, flexible,
    and trainable core representations.

    Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification

    Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P. Xing
    Comments: To appear in ACL2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    Implicit discourse relation classification is of great challenge due to the
    lack of connectives as strong linguistic cues, which motivates the use of
    annotated implicit connectives to improve the recognition. We propose a feature
    imitation framework in which an implicit relation network is driven to learn
    from another neural network with access to connectives, and thus encouraged to
    extract similarly salient features for accurate classification. We develop an
    adversarial model to enable an adaptive imitation scheme through competition
    between the implicit network and a rival feature discriminator. Our method
    effectively transfers discriminability of connectives to the implicit features,
    and achieves state-of-the-art performance on the PDTB benchmark.

    Faster Subgradient Methods for Functions with Hölderian Growth

    Patrick R. Johnstone, Pierre Moulin
    Comments: 49 pages
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Numerical Analysis (cs.NA); Numerical Analysis (math.NA)

    The purpose of this manuscript is to derive new convergence results for
    several subgradient methods for minimizing nonsmooth convex functions with
    H”olderian growth. The growth condition is satisfied in many applications and
    includes functions with quadratic growth and functions with weakly sharp minima
    as special cases. To this end there are four main contributions. First, for a
    constant and sufficiently small stepsize, we show that the subgradient method
    achieves linear convergence up to a certain region including the optimal set
    with error of the order of the stepsize. Second, we derive nonergodic
    convergence rates for the subgradient method under nonsummable decaying
    stepsizes. Thirdly if appropriate problem parameters are known we derive a
    possibly-summable stepsize which obtains a much faster convergence rate.
    Finally we develop a novel “descending stairs” stepsize which obtains this
    faster convergence rate but also obtains linear convergence for the special
    case of weakly sharp functions. We also develop a variant of the “descending
    stairs” stepsize which achieves essentially the same convergence rate without
    requiring an error bound constant which is difficult to estimate in practice.

    On the Reliable Detection of Concept Drift from Streaming Unlabeled Data

    Tegjyot Singh Sethi, Mehmed Kantardzic
    Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)

    Classifiers deployed in the real world operate in a dynamic environment,
    where the data distribution can change over time. These changes, referred to as
    concept drift, can cause the predictive performance of the classifier to drop
    over time, thereby making it obsolete. To be of any real use, these classifiers
    need to detect drifts and be able to adapt to them, over time. Detecting drifts
    has traditionally been approached as a supervised task, with labeled data
    constantly being used for validating the learned model. Although effective in
    detecting drifts, these techniques are impractical, as labeling is a difficult,
    costly and time consuming activity. On the other hand, unsupervised change
    detection techniques are unreliable, as they produce a large number of false
    alarms. The inefficacy of the unsupervised techniques stems from the exclusion
    of the characteristics of the learned classifier, from the detection process.
    In this paper, we propose the Margin Density Drift Detection (MD3) algorithm,
    which tracks the number of samples in the uncertainty region of a classifier,
    as a metric to detect drift. The MD3 algorithm is a distribution independent,
    application independent, model independent, unsupervised and incremental
    algorithm for reliably detecting drifts from data streams. Experimental
    evaluation on 6 drift induced datasets and 4 additional datasets from the
    cybersecurity domain demonstrates that the MD3 approach can reliably detect
    drifts, with significantly fewer false alarms compared to unsupervised feature
    based drift detectors. The reduced false alarms enables the signaling of drifts
    only when they are most likely to affect classification performance. As such,
    the MD3 approach leads to a detection scheme which is credible, label efficient
    and general in its applicability.

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini
    Comments: 14 pages, 12 figures, submitted to IEEE Transactions on Circuits and Systems – I: Regular Papers. Updated according to the minor revision of the reviewers
    Subjects: Hardware Architecture (cs.AR); Learning (cs.LG)

    Near-sensor data analytics is a promising direction for IoT endpoints, as it
    minimizes energy spent on communication and reduces network load – but it also
    poses security concerns, as valuable data is stored or sent over the network at
    various stages of the analytics pipeline. Using encryption to protect sensitive
    data at the boundary of the on-chip analytics engine is a way to address data
    security issues. To cope with the combined workload of analytics and encryption
    in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
    tightly-coupled multi-core cluster augmented with specialized blocks for
    compute-intensive data processing and encryption functions, supporting software
    programmability for regular computing tasks. The Fulmine SoC, fabricated in
    65nm technology, consumes less than 20mW on average at 0.8V achieving an
    efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
    25MIPS/mW in software. As a strong argument for real-life flexible application
    of our platform, we show experimental results for three secure analytics use
    cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
    consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
    secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
    data collection from EEG within 12.7pJ/op.


    Information Theory

    Convolutional Polar Codes

    Andrew James Ferris, Christoph Hirche, David Poulin
    Comments: Subsumes arXiv:1312.4575
    Subjects: Information Theory (cs.IT)

    Arikan’s Polar codes attracted much attention as the first efficiently
    decodable and capacity achieving codes. Furthermore, Polar codes exhibit an
    exponentially decreasing block error probability with an asymptotic error
    exponent upper bounded by 1/2. Since their discovery, many attempts have been
    made to improve the error exponent and the finite block-length performance,
    while keeping the bloc-structured kernel. Recently, two of us introduced a new
    family of efficiently decodable error-correction codes based on a recently
    discovered efficiently-contractible tensor network family in quantum many-body
    physics, called branching MERA. These codes, called branching MERA codes,
    include Polar codes and also extend them in a non-trivial way by substituting
    the bloc-structured kernel by a convolutional structure. Here, we perform an
    in-depth study of a particular example that can be thought of as a direct
    extension to Arikan’s Polar code, which we therefore name Convolutional Polar
    codes. We prove that these codes polarize and exponentially suppress the
    channel’s error probability, with an asymptotic error exponent log_2(3)/2 which
    is provably better than for Polar codes under successive cancellation decoding.
    We also perform finite block-size numerical simulations which display improved
    error-correcting capability with only a minor impact on decoding complexity.

    Index Coding: Rank-Invariant Extensions

    Vamsi Krishna Gummadi, Ashok Choudhary, Prasad Krishnan
    Subjects: Information Theory (cs.IT)

    An index coding (IC) problem consisting of a server and multiple receivers
    with different side-information and demand sets can be equivalently represented
    using a fitting matrix. A scalar linear index code to a given IC problem is a
    matrix representing the transmitted linear combinations of the message symbols.
    The length of an index code is then the number of transmissions (or
    equivalently, the number of rows in the index code). An IC problem ({cal
    I}_{ext}) is called an extension of another IC problem ({cal I}) if the
    fitting matrix of ({cal I}) is a submatrix of the fitting matrix of ({cal
    I}_{ext}). We first present a straightforward (m) extit{-order} extension
    ({cal I}_{ext}) of an IC problem ({cal I}) for which an index code is
    obtained by concatenating (m) copies of an index code of ({cal I}). The length
    of the codes is the same for both ({cal I}) and ({cal I}_{ext}), and if the
    index code for ({cal I}) has optimal length then so does the extended code for
    ({cal I}_{ext}). More generally, an extended IC problem of ({cal I}) having
    the same optimal length as ({cal I}) is said to be a extit{rank-invariant}
    extension of ({cal I}). We then focus on (2)-order rank-invariant extensions
    of ({cal I}), and present constructions of such extensions based on involutory
    permutation matrices.

    Polar Codes over Fading Channels with Power and Delay Constraints

    P. K. Deekshith, K. R. Sahasranand
    Comments: 6 pages, 6 figures
    Subjects: Information Theory (cs.IT)

    The inherent nature of polar codes being channel specific makes it difficult
    to use them in a setting where the communication channel changes with time. In
    particular, to be able to use polar codes in a wireless scenario, varying
    attenuation due to fading needs to be mitigated. To the best of our knowledge,
    there has been no comprehensive work in this direction thus far. In this work,
    a practical scheme involving channel inversion with the knowledge of the
    channel state at the transmitter, is proposed. An additional practical
    constraint on the permissible average and peak power is imposed, which in turn
    makes the channel equivalent to an additive white Gaussian noise (AWGN) channel
    cascaded with an erasure channel. It is shown that the constructed polar code
    could be made to achieve the symmetric capacity of this channel. Further, a
    means to compute the optimal design rate of the polar code for a given power
    constraint is also discussed.

    Channel Feedback Based on AoD-Adaptive Subspace Codebook in FDD Massive MIMO Systems

    Wenqian Shen, Linglong Dai, Byonghyo Shim, Zhaocheng Wang, Robert W. Heath Jr
    Comments: 30 pages, 9 figures
    Subjects: Information Theory (cs.IT)

    Channel feedback is essential in frequency division duplexing (FDD) massive
    multiple-input multiple-output (MIMO) systems. Unfortunately, previous work on
    multiuser MIMO has shown that the codebook size for channel feedback should
    scale exponentially with the number of base station (BS) antennas, which is
    greatly increased in massive MIMO systems. To reduce the codebook size and
    feedback overhead, we propose an angle-of-departure (AoD)-adaptive subspace
    codebook for channel feedback in FDD massive MIMO systems. Our key insight is
    to leverage the observation that path AoDs vary more slowly than the path
    gains. Within the angle coherence time, by utilizing the constant AoD
    information, the proposed AoD-adaptive subspace codebook is able to quantize
    the channel vector in a more accurate way. We also provide performance analysis
    of the proposed codebook in the large-dimensional regime, where we prove that
    to limit the capacity degradation within an acceptable level, the required
    number of feedback bits only scales linearly with the number of resolvable
    (path) AoDs, which is much smaller than the number of BS antennas. Moreover, we
    compare quantized channel feedback using the proposed AoD-adaptive subspace
    codebook with analog channel feedback. Extensive simulations that verify the
    analytical results are provided.

    Fast Encoding and Decoding of Flexible-Rate and Flexible-Length Polar Codes

    Muhammad Hanif, Masoud Ardakani
    Subjects: Information Theory (cs.IT)

    This work is on fast encoding and decoding of polar codes. We propose and
    detail 8-bit and 16-bit parallel decoders that can be used to reduce the
    decoding latency of the successive-cancellation decoder. These decoders are
    universal and can decode flexible-rate and flexible-length polar codes. We also
    present fast encoders that can be used to increase the throughput of
    serially-implemented polar encoders.

    Distributed FD-MIMO: Cellular Evolution for 5G and Beyond

    Yeqing Hu, Boon Loong Ng, Young-Han Nam, Jin Yuan, Gary Xu, Ji-Yun Seol, Jianzhong (Charlie)
    Zhang
    Subjects: Information Theory (cs.IT)

    This paper presents the next evolution of FD-MIMO technology for beyond 5G,
    where antennas of the FD-MIMO system are placed in a distributed manner
    throughout the cell in a multi-cell deployment scenario. This system, referred
    to as Distributed FD-MIMO (D-FD-MIMO) system, is capable of providing higher
    cell average throughput as well as more uniform user experience compared to the
    conventional FD-MIMO system. System level simulations are performed to evaluate
    performance. Our results show that the proposed D-FD-MIMO system achieves 1.4-2
    times cell average throughput gain compared to the FD-MIMO system. The insights
    of performance gain are provided. Hardware implementation challenges and
    potential standards impact are also presented.

    Massive MIMO Performance – TDD Versus FDD: What Do Measurements Say?

    Jose Flordelis, Fredrik Rusek, Fredrik Tufvesson, Erik G. Larsson, Ove Edfors
    Comments: Submitted to IEEE Transactions on Wireless Communications, 31/Mar/2017
    Subjects: Information Theory (cs.IT)

    Downlink beamforming in Massive MIMO either relies on uplink pilot
    measurements – exploiting reciprocity and TDD operation, or on the use of a
    predetermined grid of beams with user equipments reporting their preferred
    beams, mostly in FDD operation. Massive MIMO in its originally conceived form
    uses the first strategy, with uplink pilots, whereas there is currently
    significant commercial interest in the second, grid-of-beams. It has been
    analytically shown that in isotropic scattering (independent Rayleigh fading)
    the first approach outperforms the second. Nevertheless there remains
    controversy regarding their relative performance in practice. In this
    contribution, the performances of these two strategies are compared using
    measured channel data at 2.6 GHz.

    Joint Design of Digital and Analog Processing for Downlink C-RAN with Large-Scale Antenna Arrays

    Jaein Kim, Seok-Hwan Park, Osvaldo Simeone, Inkyu Lee, Shlomo Shamai (Shitz)
    Subjects: Information Theory (cs.IT)

    In millimeter-wave communication systems with large-scale antenna arrays,
    conventional digital beamforming may not be cost-effective. A promising
    solution is the implementation of hybrid beamforming techniques, which consist
    of low-dimensional digital beamforming followed by analog radio frequency (RF)
    beamforming. This work studies the optimization of hybrid beamforming in the
    context of a cloud radio access network (C-RAN) architecture. In a C-RAN
    system, digital baseband signal processing functionalities are migrated from
    remote radio heads (RRHs) to a baseband processing unit (BBU) in the “cloud” by
    means of finite-capacity fronthaul links. Specifically, this work tackles the
    problem of jointly optimizing digital beamforming and fronthaul quantization
    strategies at the BBU, as well as RF beamforming at the RRHs, with the goal of
    maximizing the weighted downlink sum-rate. Fronthaul capacity and per-RRH power
    constraints are enforced along with constant modulus constraints on the RF
    beamforming matrices. An iterative algorithm is proposed that is based on
    successive convex approximation and on the relaxation of the constant modulus
    constraint. The effectiveness of the proposed scheme is validated by numerical
    simulation results.

    CRLB Calculations for Joint AoA, AoD and Multipath Gain Estimation in Millimeter Wave Wireless Networks

    Laxminarayana S Pillutla, Ramesh Annavajjala
    Subjects: Information Theory (cs.IT)

    In this report we present an analysis of the non-random and the Bayesian
    Cramer-Rao lower bound (CRLB) for the joint estimation of angle-of-arrival
    (AoA), angle-of-departure (AoD), and the multipath amplitudes, for the
    millimeter-wave (mmWave) wireless networks. Our analysis is applicable to
    multipath channels with Gaussian noise and independent path parameters.
    Numerical results based on uniform AoA and AoD in ([0,pi)), and Rician fading
    path amplitudes, reveal that the Bayesian CRLB decreases monotonically with an
    increase in the Rice factor. Further, the CRLB obtained by using beamforming
    and combining code books generated by quantizing directly the domain of AoA and
    AoD was found to be lower than those obtained with other types of beamforming
    and combining code books.

    Private Multi-File Retrieval From Distributed Databases

    Zhifang Zhang, Jingke Xu
    Subjects: Information Theory (cs.IT)

    Suppose there are (N) distributed databases each storing a full set of (M)
    independent files. A user wants to retrieve (r) out of the (M) files without
    revealing the identity of the (r) files. When (r=1) it is the classic problem
    of private information retrieval (PIR). In this paper we study the problem of
    private multi-file retrieval (PMFR) which covers the case of general (r). We
    first prove an upper bound on the capacity of PMFR schemes which indicates the
    minimum possible download size per unit of retrieved files. Then we design a
    general PMFR scheme which happens to attain the upper bound when
    (rgeqfrac{M}{2}), thus achieving the optimal communication cost. As (r) goes
    down we show the trivial approach of executing (r) independent PIR instances
    achieves the near optimal communication cost. Comparing with the
    capacity-achieving PIR schemes, our PMFR scheme reduces the number of
    subpackages needed for each file from (N^M) to (N^2), which implies a great
    reduction of implementation complexity.

    Lossy Asymptotic Equipartition property for Wireless Sensor Networks

    Kwabena Doku-Amponsah
    Comments: 6 pages. arXiv admin note: text overlap with arXiv:1609.05252
    Subjects: Information Theory (cs.IT)

    This article extends the Generalized Asypmtotic Equipartition Property of
    Networked Data Structures to cover the Wireless Sensor Network modelled as
    coloured geometric random graph (CGRG). The main techniques used to prove this
    result remains large deviation principles for properly defined empirical
    measures on coloured geometric random graphs. Application of the result to some
    case study from the field of environmental science is discussed as a
    motivation.

    Latency Optimization for Resource Allocation in Mobile-Edge Computation Offloading

    Jinke Ren, Guanding Yu, Yunlong Cai, Yinghui He
    Comments: Submitted for Journal Publication
    Subjects: Information Theory (cs.IT)

    By offloading intensive computation tasks to the edge cloud located at the
    cellular base stations, mobile-edge computation offloading (MECO) has been
    regarded as a promising means to accomplish the ambitious millisecond-scale
    end-to-end latency requirement of the fifth-generation networks. In this paper,
    we investigate the latency-minimization problem in a multi-user time-division
    multiple access MECO system with joint communication and computation resource
    allocation. Three different computation models are studied, i.e., local
    compression, edge cloud compression, and partial compression offloading. First,
    closed-form expressions of optimal resource allocation and minimum system delay
    for both local and edge cloud compression models are derived. Then, for the
    partial compression offloading model, we formulate a piecewise optimization
    problem and prove that the optimal data segmentation strategy has a piecewise
    structure. Based on this result, an optimal joint communication and computation
    resource allocation algorithm is developed. To gain more insights, we also
    analyze a specific scenario where communication resource is adequate while
    computation resource is limited. In this special case, the closed-form solution
    of the piecewise optimization problem can be derived. Our proposed algorithms
    are finally verified by numerical results, which show that the novel partial
    compression offloading model can significantly reduce the end-to-end latency.

    Online Geographical Load Balancing for Energy-Harvesting Mobile Edge Computing

    Jie Xu, Hang Wu, Lixing Chen, Cong Shen
    Subjects: Information Theory (cs.IT)

    Mobile Edge Computing (MEC) (a.k.a. fog computing) has recently emerged to
    enable low-latency and location-aware data processing at the edge of mobile
    networks. Providing grid power supply in support of MEC, however, is costly and
    even infeasible, thus mandating on-site renewable energy as a major or even
    sole power supply in many scenarios. Nonetheless, the high intermittency and
    unpredictability of energy harvesting creates many new challenges of performing
    effective MEC. In this paper, we develop an algorithm called GLOBE that
    performs joint geographical load balancing (GLB) (for computation workload) and
    admission control (for communication data traffic), for optimizing the system
    performance of a network of MEC-enabled base stations. By leveraging the
    Lyapunov optimization with perturbation technique, GLOBE operates online
    without requiring future system information and addresses significant
    challenges caused by battery state dynamics and energy causality constraints.
    We prove that GLOBE achieves a close-to-optimal system performance compared to
    the offline algorithm that knows full future information, and present a
    critical tradeoff between battery capacity and system performance. Simulation
    results validate our analysis and demonstrate the superior performance of GLOBE
    compared to benchmark algorithms.

    Sparse mean localization by information theory

    Emiliano Diaz
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    Sparse feature selection is necessary when we fit statistical models, we have
    access to a large group of features, don’t know which are relevant, but assume
    that most are not. Alternatively, when the number of features is larger than
    the available data the model becomes over parametrized and the sparse feature
    selection task involves selecting the most informative variables for the model.
    When the model is a simple location model and the number of relevant features
    does not grow with the total number of features, sparse feature selection
    corresponds to sparse mean estimation. We deal with a simplified mean
    estimation problem consisting of an additive model with gaussian noise and mean
    that is in a restricted, finite hypothesis space. This restriction simplifies
    the mean estimation problem into a selection problem of combinatorial nature.
    Although the hypothesis space is finite, its size is exponential in the
    dimension of the mean. In limited data settings and when the size of the
    hypothesis space depends on the amount of data or on the dimension of the data,
    choosing an approximation set of hypotheses is a desirable approach. Choosing a
    set of hypotheses instead of a single one implies replacing the bias-variance
    trade off with a resolution-stability trade off. Generalization capacity
    provides a resolution selection criterion based on allowing the learning
    algorithm to communicate the largest amount of information in the data to the
    learner without error. In this work the theory of approximation set coding and
    generalization capacity is explored in order to understand this approach. We
    then apply the generalization capacity criterion to the simplified sparse mean
    estimation problem and detail an importance sampling algorithm which at once
    solves the difficulty posed by large hypothesis spaces and the slow convergence
    of uniform sampling algorithms.

    Truthfulness in Repeated Predictions

    Amir Ban
    Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    Proper scoring rules elicit truth-telling when making predictions, or
    otherwise revealing information. However, when multiple predictions are made of
    the same event, telling the truth is in general no longer optimal, as agents
    are motivated to distort early predictions to mislead competitors. We
    demonstrate this, and then prove a significant exception: In a multi-agent
    prediction setting where all agent signals belong to a jointly multivariate
    normal distribution, and signal variances are common knowledge, the (proper)
    logarithmic scoring rule will elicit truthful predictions from every agent at
    every prediction, regardless of the number, order and timing of predictions.
    The result applies in several financial models.

    Power Control in Massive MIMO with Dynamic User Population

    Mohamad Assaad, Salah Eddine Hajri, Thomas Bonald, Anthony Ephremides
    Comments: conference paper, submitted
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    This paper considers the problem of power control in Massive MIMO systems
    taking into account the pilot contamination issue and the arrivals and
    departures of users in the network. Contrary to most of existing work in MIMO
    systems that focuses on the physical layer with fixed number of users, we
    consider in this work that the users arrive dynamically and leave the network
    once they are served. We provide a power control strategy, having a polynomial
    complexity, and prove that this policy stabilizes the network whenever
    possible. We then provide a distributed implementation of the power control
    policy requiring low information exchange between the BSs and show that it
    achieves the same stability region as the centralized policy.

    Multi-frequency sparse Bayesian learning with uncertainty models

    Santosh Nannuru, Kay L. Gemba, Peter Gerstoft, William S. Hodgkiss, Christoph Mecklenbräuker
    Comments: 12 pages, 8 figures
    Subjects: Applications (stat.AP); Information Theory (cs.IT)

    Sparse Bayesian learning (SBL) has emerged as a fast and competitive method
    to perform sparse processing. The SBL algorithm, which is developed using a
    Bayesian framework, approximately solves a non-convex optimization problem
    using fixed point updates. It provides comparable performance and is
    significantly faster than convex optimization techniques used in sparse
    processing. In this paper we propose signal model which accounts for dictionary
    mismatch and the presence of spurious peaks at low signal-to-noise ratios. As
    an extension of SBL, modified fixed point update equation is derived which
    incorporates the statistics of mismatch and spurious peaks. We also extend the
    SBL algorithm to process multi-frequency observations. The derived update
    equations are studied quantitatively using beamforming simulations applied to
    direction-of-arrival (DoA) estimation. Performance of SBL using single
    frequency and multi-frequency observations, and in the presence of aliasing is
    evaluated. Data collected from the SwellEx-96 experiment is used to demonstrate
    qualitatively the advantages of SBL.

    A New Capacity Scaling Law in Ultra-Dense Networks

    Ming Ding, David Lopez Perez, Guoqiang Mao
    Comments: conference submission in Mar. 2017
    Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)

    We discover a new capacity scaling law in ultra-dense networks (UDNs) under
    practical system assumptions, such as a general multi-piece path loss model, a
    non-zero base station (BS) to user equipment (UE) antenna height difference,
    and a finite UE density. The intuition and implication of this new capacity
    scaling law are completely different from that found in year 2011. That law
    indicated that the increase of the interference power caused by a denser
    network would be exactly compensated by the increase of the signal power due to
    the reduced distance between transmitters and receivers, and thus network
    capacity should grow linearly with network densification. However, we find that
    both the signal and interference powers become bounded in practical UDNs, which
    leads to a constant capacity scaling law. As a result, network densification
    should be stopped at a certain level for a given UE density, because the
    network capacity will reach its limit due to (i) the bounded signal and
    interference powers, and (ii) a finite frequency reuse factor because of a
    finite UE density. Our new discovery on the constant capacity scaling law also
    resolves the recent concerns about network capacity collapsing in UDNs, e.g.,
    the capacity crash due to a non-zero BS-to-UE antenna height difference, or a
    bounded path loss model of the near-field (NF) effect, etc.

    Provable Inductive Robust PCA via Iterative Hard Thresholding

    U.N. Niranjan, Arun Rajkumar, Theja Tulabandhula
    Subjects: Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

    The robust PCA problem, wherein, given an input data matrix that is the
    superposition of a low-rank matrix and a sparse matrix, we aim to separate out
    the low-rank and sparse components, is a well-studied problem in machine
    learning. One natural question that arises is that, as in the inductive
    setting, if features are provided as input as well, can we hope to do better?
    Answering this in the affirmative, the main goal of this paper is to study the
    robust PCA problem while incorporating feature information. In contrast to
    previous works in which recovery guarantees are based on the convex relaxation
    of the problem, we propose a simple iterative algorithm based on
    hard-thresholding of appropriate residuals. Under weaker assumptions than
    previous works, we prove the global convergence of our iterative procedure;
    moreover, it admits a much faster convergence rate and lesser computational
    complexity per iteration. In practice, through systematic synthetic and real
    data simulations, we confirm our theoretical findings regarding improvements
    obtained by using feature information.

    Tropical Limits of Probability Spaces, Part I: The Intrinsic Kolmogorov-Sinai Distance and the Asymptotic Equipartition Property for Configurations

    Rostislav Matveev, Jacobus W. Portegies
    Subjects: Metric Geometry (math.MG); Information Theory (cs.IT); Probability (math.PR)

    The entropy of a finite probability space (X) measures the observable
    cardinality of large independent products (X^{otimes n}) of the probability
    space. If two probability spaces (X) and (Y) have the same entropy, there is an
    almost measure-preserving bijection between large parts of (X^{otimes n}) and
    (Y^{otimes n}). In this way, (X) and (Y) are asymptotically equivalent.

    It turns out to be challenging to generalize this notion of asymptotic
    equivalence to configurations of probability spaces, which are collections of
    probability spaces with measure-preserving maps between some of them.

    In this article we introduce the intrinsic Kolmogorov-Sinai distance on the
    space of configurations of probability spaces. Concentrating on the large-scale
    geometry we pass to the asymptotic Kolmogorov-Sinai distance. It induces an
    asymptotic equivalence relation on sequences of configurations of probability
    spaces. We will call the equivalence classes emph{tropical probability
    spaces}.

    In this context we prove an Asymptotic Equipartition Property for
    configurations. It states that tropical configurations can always be
    approximated by homogeneous configurations. In addition, we show that the
    solutions to certain Information-Optimization problems are
    Lipschitz-con-tinuous with respect to the asymptotic Kolmogorov-Sinai
    distance. It follows from these two statements that in order to solve an
    Information-Optimization problem, it suffices to consider homogeneous
    configurations.

    Finally, we show that spaces of trajectories of length (n) of certain
    stochastic processes, in particular stationary Markov chains, have a tropical
    limit.

    Stochastic L-BFGS Revisited: Improved Convergence Rates and Practical Acceleration Strategies

    Renbo Zhao, William B. Haskell, Vincent Y. F. Tan
    Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Machine Learning (stat.ML)

    We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By
    proposing a new framework for analyzing convergence, we theoretically improve
    the (linear) convergence rates and computational complexities of the stochastic
    L-BFGS algorithms in previous works. In addition, we propose several practical
    acceleration strategies to speed up the empirical performance of such
    algorithms. We also provide theoretical analyses for most of the strategies.
    Experiments on large-scale logistic and ridge regression problems demonstrate
    that our proposed strategies yield significant improvements via-`a-vis
    competing state-of-the-art algorithms.




沪ICP备19023445号-2号
友情链接