IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    arXiv Paper Daily: Tue, 29 Nov 2016

    我爱机器学习(52ml.net)发表于 2016-11-29 00:00:00
    love 0

    Neural and Evolutionary Computing

    Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition

    Tom Sercu, Vaibhava Goel
    Comments: Appeared at NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In computer vision pixelwise dense prediction is the task of predicting a
    label for each pixel in the image. Convolutional neural networks achieve good
    performance on this task, while being computationally efficient. In this paper
    we carry these ideas over to the problem of assigning a sequence of labels to a
    set of speech frames, a task commonly known as framewise classification. We
    show that dense prediction view of framewise classification offers several
    advantages and insights, including computational efficiency and the ability to
    apply batch normalization. When doing dense prediction we pay specific
    attention to strided pooling in time and introduce an asymmetric dilated
    convolution, called time-dilated convolution, that allows for efficient and
    elegant implementation of pooling in time. We show that by using time-dilated
    convolutions with a very deep VGG-style CNN with batch normalization, we
    achieve best published single model accuracy result on the switchboard-2000
    benchmark dataset.

    Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization

    Meshia Cédric Oveneke, Mitchel Aliosha-Perez, Yong Zhao, Dongmei Jiang, Hichem Sahli
    Comments: Accepted at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The omnipresence of deep learning architectures such as deep convolutional
    neural networks (CNN)s is fueled by the synergistic combination of
    ever-increasing labeled datasets and specialized hardware. Despite the
    indisputable success, the reliance on huge amounts of labeled data and
    specialized hardware can be a limiting factor when approaching new
    applications. To help alleviating these limitations, we propose an efficient
    learning strategy for layer-wise unsupervised training of deep CNNs on
    conventional hardware in acceptable time. Our proposed strategy consists of
    randomly convexifying the reconstruction contractive auto-encoding (RCAE)
    learning objective and solving the resulting large-scale convex minimization
    problem in the frequency domain via coordinate descent (CD). The main
    advantages of our proposed learning strategy are: (1) single tunable
    optimization parameter; (2) fast and guaranteed convergence; (3) possibilities
    for full parallelization. Numerical experiments show that our proposed learning
    strategy scales (in the worst case) linearly with image size, number of filters
    and filter size.

    Dynamic landscape models of coevolutionary games

    Hendrik Richter
    Comments: arXiv admin note: substantial text overlap with arXiv:1603.06374
    Subjects: Populations and Evolution (q-bio.PE); Neural and Evolutionary Computing (cs.NE); Biological Physics (physics.bio-ph)

    Players of coevolutionary games may update not only their strategies but also
    their networks of interaction. Based on interpreting the payoff of players as
    fitness, dynamic landscape models are proposed. The modeling procedure is
    carried out for Prisoner’s Dilemma (PD) and Snowdrift (SD) games that both use
    either birth-death (BD) or death-birth (DB) strategy updating. With the main
    focus on using dynamic fitness landscapes as an alternative tool for analyzing
    coevolutionary games, landscape measures such as modality, ruggedness and
    information content are computed and analyzed. In addition, fixation properties
    of the games and quantifiers characterizing the network of interaction are
    calculated numerically. Relations are established between landscape properties
    expressed by landscape measures and quantifiers of coevolutionary game dynamics
    such as fixation probabilities, fixation times and network properties

    Kernel classification of connectomes based on earth mover's distance between graph spectra

    Yulia Dodonova, Mikhail Belyaev, Anna Tkachev, Dmitry Petrov, Leonid Zhukov
    Comments: Presented at The MICCAI-BACON 16 Workshop (arXiv:1611.03363)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In this paper, we tackle a problem of predicting phenotypes from structural
    connectomes. We propose that normalized Laplacian spectra can capture
    structural properties of brain networks, and hence graph spectral distributions
    are useful for a task of connectome-based classification. We introduce a kernel
    that is based on earth mover’s distance (EMD) between spectral distributions of
    brain networks. We access performance of an SVM classifier with the proposed
    kernel for a task of classification of autism spectrum disorder versus typical
    development based on a publicly available dataset. Classification quality (area
    under the ROC-curve) obtained with the EMD-based kernel on spectral
    distributions is 0.71, which is higher than that based on simpler graph
    embedding methods.


    Computer Vision and Pattern Recognition

    The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

    Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    State-of-the-art approaches for semantic image segmentation are built on
    Convolutional Neural Networks (CNNs). The typical segmentation architecture is
    composed of (a) a downsampling path responsible for extracting coarse semantic
    features, followed by (b) an upsampling path trained to recover the input image
    resolution at the output of the model and, optionally, (c) a post-processing
    module (e.g. Conditional Random Fields) to refine the model predictions.

    Recently, a new CNN architecture, Densely Connected Convolutional Networks
    (DenseNets), has shown excellent results on image classification tasks. The
    idea of DenseNets is based on the observation that if each layer is directly
    connected to every other layer in a feed-forward fashion then the network will
    be more accurate and easier to train.

    In this paper, we extend DenseNets to deal with the problem of semantic
    segmentation. We achieve state-of-the-art results on urban scene benchmark
    datasets such as CamVid and Gatech, without any further post-processing module
    nor pretraining. Moreover, due to smart construction of the model, our approach
    has much less parameters than currently published best entries for these
    datasets.

    Natural Illumination from Multiple Materials Using Deep Learning

    Stamatios Georgoulis, Konstantinos Rematas, Tobias Ritschel, Mario Fritz, Tinne Tuytelaars, Luc Van Gool
    Comments: Project: this http URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recovering natural illumination from a single Low-Dynamic Range (LDR) image
    is a challenging task. To remedy this situation we exploit two properties often
    found in everyday images. First, images rarely show a single material, but
    rather multiple ones that all reflect the same illumination. However, the
    appearance of each material is observed only for some surface orientations, not
    all. Second, parts of the illumination are often directly observed in the
    background, without being affected by reflection. Typically, this directly
    observed part of the illumination is even smaller. We propose a deep
    Convolutional Neural Network (CNN) that combines prior knowledge about the
    statistics of illumination and reflectance with an input that makes explicit
    use of these two observations. Our approach maps multiple partial LDR material
    observations represented as reflectance maps and a background image to a
    spherical High-Dynamic Range (HDR) illumination map. For training and testing
    we propose a new data set comprising of synthetic and real images with multiple
    materials observed under the same illumination. Qualitative and quantitative
    evidence shows how both multi-material and using a background are essential to
    improve illumination estimations.

    Hierarchical Boundary-Aware Neural Encoder for Video Captioning

    Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The use of Recurrent Neural Networks for video captioning has recently gained
    a lot of attention, since they can be used both to encode the input video and
    to generate the corresponding description. In this paper, we present a
    recurrent video encoding scheme which can discover and leverage the
    hierarchical structure of the video. Unlike the classical encoder-decoder
    approach, in which a video is encoded continuously by a recurrent layer, we
    propose a novel LSTM cell, which can identify discontinuity points between
    frames or segments and modify the temporal connections of the encoding layer
    accordingly. We evaluate our approach on three large-scale datasets: the
    Montreal Video Annotation dataset, the MPII Movie Description dataset and the
    Microsoft Video Description Corpus. Experiments show that our approach can
    discover appropriate hierarchical representations of input videos and improve
    the state of the art results on movie description datasets.

    Gaze Embeddings for Zero-Shot Image Classification

    Nour Karessli, Zeynep Akata, Andreas Bulling, Bernt Schiele
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Zero-shot image classification using auxiliary information, such as
    attributes describing discriminative object properties, requires time-consuming
    annotation by domain experts. We instead propose a method that relies on human
    gaze as auxiliary information, exploiting that even non-expert users have a
    natural ability to judge class membership. We present a data collection
    paradigm that involves a discrimination task to increase the information
    content obtained from gaze data. Our method extracts discriminative descriptors
    from the data and learns a compatibility function between image and gaze using
    three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid
    (GFG) and Gaze Features with Sequence (GFS). We introduce two new
    gaze-annotated datasets for fine-grained image classification and show that
    human gaze data is indeed class discriminative, provides a competitive
    alternative to expert-annotated attributes, and outperforms other baselines for
    zero-shot image classification.

    ECO: Efficient Convolution Operators for Tracking

    Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg
    Comments: Includes supplementary material
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, Discriminative Correlation Filter (DCF) based methods have
    significantly advanced the state-of-the-art in tracking. However, in the
    pursuit of ever increasing tracking performance, their characteristic speed and
    real-time capability have gradually faded. Further, the increasingly complex
    models, with massive number of trainable parameters, have introduced the risk
    of severe over-fitting. In this work, we tackle the key causes behind the
    problems of computational complexity and over-fitting, with the aim of
    simultaneously improving both speed and performance.

    We revisit the core DCF formulation and introduce: (i) a factorized
    convolution operator, which drastically reduces the number of parameters in the
    model; (ii) a compact generative model of the training sample distribution,
    that significantly reduces memory and time complexity, while providing better
    diversity of samples; (iii) a conservative model update strategy with improved
    robustness and reduced complexity. We perform comprehensive experiments on four
    benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive
    deep features, our tracker provides a 20-fold speedup and achieves a 13.3%
    relative gain in Expected Average Overlap compared to the top ranked method in
    the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features,
    operates at 60 Hz on a single CPU, while obtaining 64.8% AUC on OTB-2015.

    Computational Mapping of the Ground Reflectivity with Laser Scanners

    Juan Castorena
    Comments: Submitted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this investigation we focus on the problem of mapping the ground
    reflectivity with multiple laser scanners mounted on mobile robots/vehicles.
    The problem originates because regions of the ground become populated with a
    varying number of reflectivity measurements whose value depends on the observer
    and its corresponding perspective. Here, we propose a novel automatic,
    data-driven computational mapping framework specifically aimed at preserving
    edge sharpness in the map reconstruction process and that considers the sources
    of measurement variation. Our new formulation generates map-perspective
    gradients and applies sub-set selection fusion and de-noising operators to
    these through iterative algorithms that minimize an (ell_1) sparse regularized
    least squares formulation. Reconstruction of the ground reflectivity is then
    carried out based on Poisson’s formulation posed as an (ell_2) term promoting
    consistency with the fused gradient of map-perspectives and a term that ensures
    equality constraints with reference measurement data. We demonstrate our new
    framework outperforms the capabilities of existing ones with experiments
    realized on Ford’s fleet of autonomous vehicles. For example, we show we can
    achieve map enhancement (i.e., contrast enhancement), artifact removal,
    de-noising and map-stitching without requiring an additional reflectivity
    adjustment to calibrate sensors to the specific mounting and robot/vehicle
    motion.

    Image Based Appraisal of Real Estate Properties

    Quanzeng You, Ran Pang, Jiebo Luo
    Comments: 8 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Real estate appraisal, which is the process of estimating the price for real
    estate properties, is crucial for both buys and sellers as the basis for
    negotiation and transaction. Traditionally, the repeat sales model has been
    widely adopted to estimate real estate price. However, it depends the design
    and calculation of a complex economic related index, which is challenging to
    estimate accurately. Today, real estate brokers provide easy access to detailed
    online information on real estate properties to their clients. We are
    interested in estimating the real estate price from these large amounts of
    easily accessed data. In particular, we analyze the prediction power of online
    house pictures, which is one of the key factors for online users to make a
    potential visiting decision. The development of robust computer vision
    algorithms makes the analysis of visual content possible. In this work, we
    employ a Recurrent Neural Network (RNN) to predict real estate price using the
    state-of-the-art visual features. The experimental results indicate that our
    model outperforms several of other state-of-the-art baseline algorithms in
    terms of both mean absolute error (MAE) and mean absolute percentage error
    (MAPE).

    Who's that Actor? Automatic Labelling of Actors in TV series starting from IMDB Images

    Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars
    Comments: ACCV 2016 accepted paper
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we aim at automatically labeling actors in a TV series. Rather
    than relying on transcripts and subtitles, as has been demonstrated in the
    past, we show how to achieve this goal starting from a set of example images of
    each of the main actors involved, collected from the Internet Movie Database
    (IMDB). The problem then becomes one of domain adaptation: actors’ IMDB photos
    are typically taken at awards ceremonies and are quite different from their
    appearances in TV series. In each series as well, there is considerable change
    in actor appearance due to makeup, lighting, ageing, etc. To bridge this gap,
    we propose a graph-matching based self-labelling algorithm, which we coin HSL
    (Hungarian Self Labeling). Further, we propose a new edge cost to be used in
    this context, as well as an extension that is more robust to outliers, where
    prototypical faces for each of the actors are selected based on a hierarchical
    clustering procedure. We conduct experiments with 15 episodes from 3 different
    TV series and demonstrate automatic annotation with an accuracy of 90% and up.

    Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval

    Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev
    Comments: 8 pages, 3 figures, 2 tables, accepted to 3D Deep Learning Workshop at NIPS 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we present preliminary results of performance evaluation of
    S3DCNN – a Sparse 3D Convolutional Neural Network – on a large-scale 3D Shape
    benchmark ModelNet40, and measure how it is impacted by voxel resolution of
    input shape. We demonstrate comparable classification and retrieval performance
    to state-of-the-art models, but with much less computational costs in training
    and inference phases. We also notice that benefits of higher input resolution
    can be limited by an ability of a neural network to generalize high level
    features.

    Unsupervised Feature Learning With Symmetrically Connected Convolutional Denoising Auto-encoders

    Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang
    Comments: Submitted to CVPR 2017 for review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks (CNNs) have shown their power on many computer
    vision tasks.However, there are still some limitations, including their
    dependency to large scale labeled training data and sensitivity to weight
    initialization.In this paper, we try to address these two problems by proposing
    a simple yet powerful CNN based denoising auto-encoder which can be trained
    end-to-end in an unsupervised manner.The network architecture we employ is a
    fully convolutional auto-encoder with symmetric encoder-decoder connections.The
    proposed method can not only reconstruct clean images from corrupted ones, but
    also learn image representation through the reconstruction training.It can
    further be adapted to find data driven network initialization without using
    extra training data.Experimental results show that our network can learn good
    feature from unlabeled data, which can be easily transferred to different high
    level vision tasks such as image classification and semantic segmentation.The
    data driven initialization method based on the convolutional auto-encoder is
    also competitive.

    On the Role and the Importance of Features for Background Modeling and Foreground Detection

    Thierry Bouwmans, Caroline Silva, Cristina Marghes, Mohammed Sami Zitouni, Harish Bhaskar, Carl Frelicot
    Comments: To be submitted to Computer Science Review
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Background modeling has emerged as a popular foreground detection technique
    for various applications in video surveillance. Background modeling methods
    have become increasing efficient in robustly modeling the background and hence
    detecting moving objects in any visual scene. Although several background
    subtraction and foreground detection have been proposed recently, no
    traditional algorithm today still seem to be able to simultaneously address all
    the key challenges of illumination variation, dynamic camera motion, cluttered
    background and occlusion. This limitation can be attributed to the lack of
    systematic investigation concerning the role and importance of features within
    background modeling and foreground detection. With the availability of a rather
    large set of invariant features, the challenge is in determining the best
    combination of features that would improve accuracy and robustness in
    detection. The purpose of this study is to initiate a rigorous and
    comprehensive survey of features used within background modeling and foreground
    detection. Further, this paper presents a systematic experimental and
    statistical analysis of techniques that provide valuable insight on the trends
    in background modeling and use it to draw meaningful recommendations for
    practitioners. In this paper, a preliminary review of the key characteristics
    of features based on the types and sizes is provided in addition to
    investigating their intrinsic spectral, spatial and temporal properties.
    Furthermore, improvements using statistical and fuzzy tools are examined and
    techniques based on multiple features are benchmarked against reliability and
    selection criterion. Finally, a description of the different resources
    available such as datasets and codes is provided.

    Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

    Timur Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, Silvio Savarese
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a unified framework for understanding human social behaviors in
    raw image sequences. Our model jointly detects multiple individuals, infers
    their social actions, and estimates the collective actions with a single
    feed-forward pass through a neural network. We propose a single architecture
    that does not rely on external detection algorithms but rather is trained
    end-to-end to generate dense proposal maps that are refined via a novel
    inference scheme. The temporal consistency is handled via a person-level
    matching Recurrent Neural Network. The complete model takes as input a sequence
    of frames and outputs detections along with the estimates of individual actions
    and collective activities. We demonstrate state-of-the-art performance of our
    algorithm on multiple publicly available benchmarks.

    Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

    Linchao Zhu, Zhongwen Xu, Yi Yang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite the recent success of neural networks in image feature learning, a
    major problem in the video domain is the lack of sufficient labeled data for
    learning to model temporal information. In this paper, we propose an
    unsupervised temporal modeling method that learns from untrimmed videos. The
    speed of motion varies constantly, e.g., a man may run quickly or slowly. We
    therefore train a Multirate Visual Recurrent Model (MVRM) by encoding frames of
    a clip with different intervals. This learning process makes the learned model
    more capable of dealing with motion speed variance. Given a clip sampled from a
    video, we use its past and future neighboring clips as the temporal context,
    and reconstruct the two temporal transitions, i.e., present(
    ightarrow)past
    transition and present(
    ightarrow)future transition, reflecting the temporal
    information in different views. The proposed method exploits the two
    transitions simultaneously by incorporating a bidirectional reconstruction
    which consists of a backward reconstruction and a forward reconstruction. We
    apply the proposed method to two challenging video tasks, i.e., complex event
    detection and video captioning, in which it achieves state-of-the-art
    performance. Notably, our method generates the best single feature for event
    detection with a relative improvement of 10.4% on the MEDTest-13 dataset and
    achieves the best performance in video captioning across all evaluation metrics
    on the YouTube2Text dataset.

    Deep, Dense, and Low-Rank Gaussian Conditional Random Fields

    Siddhartha Chandra, Iasonas Kokkinos
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work we introduce a fully-connected graph structure in the Deep
    Gaussian Conditional Random Field (G-CRF) model. For this we express the
    pairwise interactions between pixels as the inner-products of low-dimensional
    embeddings, delivered by a new subnetwork of a deep architecture. We
    efficiently minimize the resulting energy by solving the resulting low-rank
    linear system with conjugate gradients, and derive an analytic expression for
    the gradient of our embeddings which allows us to train them end-to-end with
    backpropagation.

    We demonstrate the merit of our approach by achieving state of the art
    results on three challenging Computer Vision benchmarks, namely semantic
    segmentation, human parts segmentation, and saliency estimation. Our
    implementation is fully GPU based, built on top of the Caffe library, and will
    be made publicly available.

    Awesome Typography: Statistics-Based Text Effects Transfer

    Shuai Yang, Jiaying Liu, Zhouhui Lian, Zongming Guo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we explore the problem of generating fantastic special-effects
    for the typography. It is quite challenging due to the model diversities to
    illustrate varied text effects for different characters. To address this issue,
    our key idea is to exploit the analytics on the high regularity of the spatial
    distribution for text effects to guide the synthesis process. Specifically, we
    characterize the stylized patches by their normalized positions and the optimal
    scales to depict their style elements. Our method first estimates these two
    features and derives their correlation statistically. They are then converted
    into soft constraints for texture transfer to accomplish adaptive multi-scale
    texture synthesis and to make style element distribution uniform. It allows our
    algorithm to produce artistic typography that fits for both local texture
    patterns and the global spatial distribution in the example. Experimental
    results demonstrate the superiority of our method for various text effects over
    conventional style transfer methods. In addition, we validate the effectiveness
    of our algorithm with extensive artistic typography library generation.

    3D Human Pose Estimation from a Single Image via Distance Matrix Regression

    Francesc Moreno-Noguer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper addresses the problem of 3D human pose estimation from a single
    image. We follow a standard two-step pipeline by first detecting the 2D
    position of the (N) body joints, and then using these observations to infer 3D
    pose. For the first step, we use a recent CNN-based detector. For the second
    step, most existing approaches perform 2(N)-to-3(N) regression of the Cartesian
    joint coordinates. We show that more precise pose estimates can be obtained by
    representing both the 2D and 3D human poses using (N imes N) distance
    matrices, and formulating the problem as a 2D-to-3D distance matrix regression.
    For learning such a regressor we leverage on simple Neural Network
    architectures, which by construction, enforce positivity and symmetry of the
    predicted matrices. The approach has also the advantage to naturally handle
    missing observations and allowing to hypothesize the position of non-observed
    joints. Quantitative results on Humaneva and Human3.6M datasets demonstrate
    consistent performance gains over state-of-the-art. Qualitative evaluation on
    the images in-the-wild of the LSP dataset, using the regressor learned on
    Human3.6M, reveals very promising generalization results.

    Hyperspectral CNN Classification with Limited Training Samples

    Lloyd Windrim, Rishi Ramakrishnan, Arman Melkumyan, Richard Murphy
    Comments: 10 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Hyperspectral imaging sensors are becoming increasingly popular in robotics
    applications such as agriculture and mining, and allow per-pixel thematic
    classification of materials in a scene based on their unique spectral
    signatures. Recently, convolutional neural networks have shown remarkable
    performance for classification tasks, but require substantial amounts of
    labelled training data. This data must sufficiently cover the variability
    expected to be encountered in the environment. For hyperspectral data, one of
    the main variations encountered outdoors is due to incident illumination, which
    can change in spectral shape and intensity depending on the scene geometry. For
    example, regions occluded from the sun have a lower intensity and their
    incident irradiance skewed towards shorter wavelengths.

    In this work, a data augmentation strategy based on relighting is used during
    training of a hyperspectral convolutional neural network. It allows training to
    occur in the outdoor environment given only a small labelled region, which does
    not need to sufficiently represent the geometric variability of the entire
    scene. This is important for applications where obtaining large amounts of
    training data is labourious, hazardous or difficult, such as labelling pixels
    within shadows. Radiometric normalisation approaches for pre-processing the
    hyperspectral data are analysed and it is shown that methods based on the raw
    pixel data are sufficient to be used as input for the classifier. This removes
    the need for external hardware such as calibration boards, which can restrict
    the application of hyperspectral sensors in robotics applications. Experiments
    to evaluate the classification system are carried out on two datasets captured
    from a field-based platform.

    DeepSetNet: Predicting Sets with Deep Neural Networks

    Seyed Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper addresses the task of set prediction using deep learning. This is
    important because the output of many computer vision tasks, including image
    tagging and object detection, are naturally expressed as sets of entities
    rather than vectors. As opposed to a vector, the size of a set is not fixed in
    advance, and it is invariant to the ordering of entities within it. We define a
    likelihood for a set distribution and learn its parameters using a deep neural
    network. We also derive a loss for predicting a discrete distribution
    corresponding to set cardinality. Set prediction is demonstrated on the
    problems of multi-class image classification and pedestrian detection. Our
    approach yields state-of-the-art results in both cases on standard datasets.

    Object Detection Free Instance Segmentation With Labeling Transformations

    Long Jin, Zeyu Chen, Zhuowen Tu
    Comments: 10 pages, 5 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Instance segmentation has attracted recent attention in computer vision and
    existing methods in this domain mostly have an object detection stage. In this
    paper, we study the intrinsic challenge of the instance segmentation problem,
    the presence of a quotient space (swapping the labels of different instances
    leads to the same result), and propose new methods that are object proposal-
    and object detection- free. We propose three methods, namely pixel-based
    affinity mapping, superpixel-based affinity learning, and boundary-based
    component segmentation, all focusing on performing labeling transformations to
    cope with the quotient space problem. By adopting fully convolutional neural
    networks (FCN) like models, our framework attains competitive results on both
    the PASCAL dataset (object-centric) and the Gland dataset (texture-centric),
    which the existing methods are not able to do. Our work also has the advantages
    in its transparency, simplicity, and being all segmentation based.

    Improving Fully Convolution Network for Semantic Segmentation

    Bing Shuai, Ting Liu, Gang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fully Convolution Networks (FCN) have achieved great success in dense
    prediction tasks including semantic segmentation. In this paper, we start from
    discussing FCN by understanding its architecture limitations in building a
    strong segmentation network. Next, we present our Improved Fully Convolution
    Network (IFCN). In contrast to FCN, IFCN introduces a context network that
    progressively expands the receptive fields of feature maps. In addition, dense
    skip connections are added so that the context network can be effectively
    optimized. More importantly, these dense skip connections enable IFCN to fuse
    rich-scale context to make reliable predictions. Empirically, those
    architecture modifications are proven to be significant to enhance the
    segmentation performance. Without engaging any contextual post-processing, IFCN
    significantly advances the state-of-the-arts on ADE20K (ImageNet scene
    parsing), Pascal Context, Pascal VOC 2012 and SUN-RGBD segmentation datasets.

    Analyzing the group sparsity based on the rank minimization methods

    Zhiyuan Zha, Xin Liu, Xiaohua Huang, Xiaopeng Hong, Henglin Shi, Yingyue Xu, Qiong Wang, Lan Tang, Xinggan Zhang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Sparse coding has achieved a great success in various image processing
    studies. However, there is not any benchmark to measure the sparsity of image
    patch/group because sparse discriminant conditions cannot keep unchanged. This
    paper analyzes the sparsity of group based on the strategy of the rank
    minimization. Firstly, an adaptive dictionary for each group is designed. Then,
    we prove that group-based sparse coding is equivalent to the rank minimization
    problem, and thus the sparse coefficient of each group is measured by
    estimating the singular values of each group. Based on that measurement, the
    weighted Schatten (p)-norm minimization (WSNM) has been found to be the closest
    solution to the real singular values of each group. Thus, WSNM can be
    equivalently transformed into a non-convex (ell_p)-norm minimization problem
    in group-based sparse coding. To make the proposed scheme tractable and robust,
    the alternating direction method of multipliers (ADMM) is used to solve the
    (ell_p)-norm minimization problem. Experimental results on two applications:
    image inpainting and image compressive sensing (CS) recovery have shown that
    the proposed scheme outperforms many state-of-the-art methods.

    Range Loss for Deep Face Recognition with Long-tail

    Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao
    Comments: 9 pages, 5 figures, Submitted to CVPR, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Convolutional neural networks have achieved great improvement on face
    recognition in recent years because of its extraordinary ability in learning
    discriminative features of people with different identities. To train such a
    well-designed deep network, tremendous amounts of data is indispensable. Long
    tail distribution specifically refers to the fact that a small number of
    generic entities appear frequently while other objects far less existing.
    Considering the existence of long tail distribution of the real world data,
    large but uniform distributed data are usually hard to retrieve. Empirical
    experiences and analysis show that classes with more samples will pose greater
    impact on the feature learning process and inversely cripple the whole models
    feature extracting ability on tail part data. Contrary to most of the existing
    works that alleviate this problem by simply cutting the tailed data for uniform
    distributions across the classes, this paper proposes a new loss function
    called range loss to effectively utilize the whole long tailed data in training
    process. More specifically, range loss is designed to reduce overall
    intra-personal variations while enlarging inter-personal differences within one
    mini-batch simultaneously when facing even extremely unbalanced data. The
    optimization objective of range loss is the (k) greatest range’s harmonic mean
    values in one class and the shortest inter-class distance within one batch.
    Extensive experiments on two famous and challenging face recognition benchmarks
    (Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) not only demonstrate
    the effectiveness of the proposed approach in overcoming the long tail effect
    but also show the good generalization ability of the proposed approach.

    Semantic Scene Completion from a Single Depth Image

    Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper focuses on semantic scene completion, a task for producing a
    complete 3D voxel representation of volumetric occupancy and semantic labels
    for a scene from a single-view depth map observation. Previous work has
    considered scene completion and semantic labeling of depth maps separately.
    However, we observe that these two problems are tightly intertwined. To
    leverage the coupled nature of these two tasks, we introduce the semantic scene
    completion network (SSCNet), an end-to-end 3D convolutional network that takes
    a single depth image as input and simultaneously outputs occupancy and semantic
    labels for all voxels in the camera view frustum. Our network uses a
    dilation-based 3D context module to efficiently expand the receptive field and
    enable 3D context learning. To train our network, we construct SUNCG – a
    manually created large-scale dataset of synthetic 3D scenes with dense
    volumetric annotations. Our experiments demonstrate that the joint model
    outperforms methods addressing each task in isolation and outperforms
    alternative approaches on the semantic scene completion task.

    Voronoi-based compact image descriptors: Efficient Region-of-Interest retrieval with VLAD and deep-learning-based descriptors

    Aaron Chadha, Yiannis Andreopoulos
    Comments: IEEE Transaction on Multimedia, under minor revision
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We investigate the problem of image retrieval based on visual queries when
    the latter comprise arbitrary regions-of-interest (ROI) rather than entire
    images. Our proposal is a compact image descriptor that combines the
    state-of-the-art in content-based descriptor extraction with a multi-level,
    Voronoi-based spatial partitioning of each dataset image. The proposed
    multi-level Voronoi-based encoding uses a spatial hierarchical K-means over
    interest-point locations, and computes a content-based descriptor over each
    cell. In order to reduce the matching complexity with minimal or no sacrifice
    in retrieval performance: (i) we utilize the tree structure of the spatial
    hierarchical K-means to perform a top-to-bottom pruning for local similarity
    maxima; (ii) we propose a new image similarity score that combines relevant
    information from all partition levels into a single measure for similarity;
    (iii) we combine our proposal with a novel and efficient approach for optimal
    bit allocation within quantized descriptor representations. By deriving both a
    Voronoi-based VLAD descriptor (termed as Fast-VVLAD) and a Voronoi-based deep
    convolutional neural network (CNN) descriptor (termed as Fast-VDCNN), we
    demonstrate that our Voronoi-based framework is agnostic to the descriptor
    basis, and can easily be slotted into existing frameworks. Via a range of ROI
    queries in two standard datasets, it is shown that the Voronoi-based
    descriptors achieve comparable or higher mean Average Precision against
    conventional grid-based spatial search, while offering more than two-fold
    reduction in complexity. Finally, beyond ROI queries, we show that Voronoi
    partitioning improves the geometric invariance of compact CNN descriptors,
    thereby resulting in competitive performance to the current state-of-the-art on
    whole image retrieval.

    Uniform Information Segmentation

    Radhakrishna Achanta, Pablo Márquez-Neila, Pascal Fua, Sabine Süsstrunk
    Comments: 9 pages
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Size uniformity is one of the main criteria of superpixel methods. But size
    uniformity rarely conforms to the varying content of an image. The chosen size
    of the superpixels therefore represents a compromise – how to obtain the fewest
    superpixels without losing too much important detail. We propose that a more
    appropriate criterion for creating image segments is information uniformity. We
    introduce a novel method for segmenting an image based on this criterion. Since
    information is a natural way of measuring image complexity, our proposed
    algorithm leads to image segments that are smaller and denser in areas of high
    complexity and larger in homogeneous regions, thus simplifying the image while
    preserving its details. Our algorithm is simple and requires just one input
    parameter – a threshold on the information content. On segmentation comparison
    benchmarks it proves to be superior to the state-of-the-art. In addition, our
    method is computationally very efficient, approaching real-time performance,
    and is easily extensible to three-dimensional image stacks and video volumes.

    It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation

    Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

    While appearance-based gaze estimation methods have traditionally exploited
    information encoded solely from the eyes, recent results from a multi-region
    method indicated that using the full face image can benefit performance.
    Pushing this idea further, we propose an appearance-based method that, in
    contrast to a long-standing line of work in computer vision, only takes the
    full face image as input. Our method encodes the face image using a
    convolutional neural network with spatial weights applied on the feature maps
    to flexibly suppress or enhance information in different facial regions.
    Through evaluation on the recent MPIIGaze and EYEDIAP gaze estimation datasets,
    we show that our full-face method significantly outperforms the state of the
    art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3%
    on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We
    further show that this improvement is consistent across different illumination
    conditions and gaze directions and particularly pronounced for the most
    challenging extreme head poses.

    A neuro-mathematical model for geometrical optical illusions

    B. Franceschiello, A. Sarti, G. Citti
    Comments: 13 pages, 38 figures divided in 15 groups
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Geometrical optical illusions have been object of many studies due to the
    possibility they offer to understand the behaviour of low-level visual
    processing. They consist in situations in which the perceived geometrical
    properties of an object differ from those of the object in the visual stimulus.
    Starting from the geometrical model introduced by Citti and Sarti in [3], we
    provide a mathematical model and a computational algorithm which allows to
    interpret these phenomena and to qualitatively reproduce the perceived
    misperception.

    Long-Term Image Boundary Extrapolation

    Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

    Boundary prediction in images and videos has been a very active topic of
    research and organizing visual information into boundaries and segments is
    believed to be a corner stone of visual perception. While prior work has
    focused on predicting boundaries for observed frames, our work aims at
    predicting boundaries of future unobserved frames. This requires our model to
    learn about the fate of boundaries and extrapolate motion patterns. We
    experiment on established real-world video segmentation dataset, which provides
    a testbed for this new task. We show for the first time spatio-temporal
    boundary extrapolation, that in contrast to prior work on RGB extrapolation
    maintains a crisp result. Furthermore, we show long-term prediction of
    boundaries in situations where the motion is governed by the laws of physics.
    We argue that our model has with minimalistic model assumptions derived a
    notion of “intuitive physics”.

    Did Evolution get it right? An evaluation of Near-Infrared imaging in semantic scene segmentation using deep learning

    J. Rafid Siddiqui
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Animals have evolved to restrict their sensing capabilities to certain region
    of electromagnetic spectrum. This is surprisingly a very narrow band on a vast
    scale which makes one think if there is a systematic bias underlying such
    selective filtration. The situation becomes even more intriguing when we find a
    sharp cutoff point at Near-infrared point whereby almost all animal vision
    systems seem to have a lower bound. This brings us to an interesting question:
    did evolution “intentionally” performed such a restriction in order to evolve
    higher visual cognition? In this work this question is addressed by
    experimenting with Near-infrared images for their potential applicability in
    higher visual processing such as semantic segmentation. A modified version of
    Fully Convolutional Networks are trained on NIR images and RGB images
    respectively and compared for their respective effectiveness in the wake of
    semantic segmentation. The results from the experiments show that visible part
    of the spectrum alone is sufficient for the robust semantic segmentation of the
    indoor as well as outdoor scenes.

    Kernel classification of connectomes based on earth mover's distance between graph spectra

    Yulia Dodonova, Mikhail Belyaev, Anna Tkachev, Dmitry Petrov, Leonid Zhukov
    Comments: Presented at The MICCAI-BACON 16 Workshop (arXiv:1611.03363)
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

    In this paper, we tackle a problem of predicting phenotypes from structural
    connectomes. We propose that normalized Laplacian spectra can capture
    structural properties of brain networks, and hence graph spectral distributions
    are useful for a task of connectome-based classification. We introduce a kernel
    that is based on earth mover’s distance (EMD) between spectral distributions of
    brain networks. We access performance of an SVM classifier with the proposed
    kernel for a task of classification of autism spectrum disorder versus typical
    development based on a publicly available dataset. Classification quality (area
    under the ROC-curve) obtained with the EMD-based kernel on spectral
    distributions is 0.71, which is higher than that based on simpler graph
    embedding methods.

    Deep Deformable Registration: Enhancing Accuracy by Fully Convolutional Neural Net

    Sayan Ghosal, Nilanjan Ray
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deformable registration is ubiquitous in medical image analysis. Many
    deformable registration methods minimize sum of squared difference (SSD) as the
    registration cost with respect to deformable model parameters. In this work, we
    construct a tight upper bound of the SSD registration cost by using a fully
    convolutional neural network (FCNN) in the registration pipeline. The upper
    bound SSD (UB-SSD) enhances the original deformable model parameter space by
    adding a heatmap output from FCNN. Next, we minimize this UB-SSD by adjusting
    both the parameters of the FCNN and the parameters of the deformable model in
    coordinate descent. Our coordinate descent framework is end-to-end and can work
    with any deformable registration method that uses SSD. We demonstrate
    experimentally that our method enhances the accuracy of deformable registration
    algorithms significantly on two publicly available 3D brain MRI data sets.

    Handwriting Profiling using Generative Adversarial Networks

    Arna Ghosh, Biswarup Bhattacharya, Somnath Basu Roy Chowdhury
    Comments: 2 pages; 2 figures; Accepted at The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17 Student Abstract and Poster Program), San Francisco, USA; All authors have equal contribution
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Handwriting is a skill learned by humans from a very early age. The ability
    to develop one’s own unique handwriting as well as mimic another person’s
    handwriting is a task learned by the brain with practice. This paper deals with
    this very problem where an intelligent system tries to learn the handwriting of
    an entity using Generative Adversarial Networks (GANs). We propose a modified
    architecture of DCGAN (Radford, Metz, and Chintala 2015) to achieve this. We
    also discuss about applying reinforcement learning techniques to achieve faster
    learning. Our algorithm hopes to give new insights in this area and its uses
    include identification of forged documents, signature verification, computer
    generated art, digitization of documents among others. Our early implementation
    of the algorithm illustrates a good performance with MNIST datasets.

    SAD-GAN: Synthetic Autonomous Driving using Generative Adversarial Networks

    Arna Ghosh, Biswarup Bhattacharya, Somnath Basu Roy Chowdhury
    Comments: 5 pages; 4 figures; Accepted at the Deep Learning for Action and Interaction Workshop, 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain; All authors have equal contribution
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Autonomous driving is one of the most recent topics of interest which is
    aimed at replicating human driving behavior keeping in mind the safety issues.
    We approach the problem of learning synthetic driving using generative neural
    networks. The main idea is to make a controller trainer network using images
    plus key press data to mimic human learning. We used the architecture of a
    stable GAN to make predictions between driving scenes using key presses. We
    train our model on one video game (Road Rash) and tested the accuracy and
    compared it by running the model on other maps in Road Rash to determine the
    extent of learning.

    Real-Time Video Highlights for Yahoo Esports

    Yale Song
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Esports has gained global popularity in recent years and several companies
    have started offering live streaming videos of esports games and events. This
    creates opportunities to develop large scale video understanding systems for
    new product features and services. We present a technique for detecting
    highlights from live streaming videos of esports game matches. Most video games
    use pronounced visual effects to emphasize highlight moments; we use CNNs to
    learn convolution filters of those visual effects for detecting highlights. We
    propose a cascaded prediction approach that allows us to deal with several
    challenges arise in a production environment. We demonstrate our technique on
    our new dataset of three popular game titles, Heroes of the Storm, League of
    Legends, and Dota 2. Our technique achieves 18 FPS on a single CPU with an
    average precision of up to 83.18%. Part of our technique is currently deployed
    in production on Yahoo Esports.

    What Can Be Predicted from Six Seconds of Driver Glances?

    Lex Fridman, Heishiro Toyoda, Sean Seaman, Bobbie Seppelt, Linda Angell, Joonbum Lee, Bruce Mehler, Bryan Reimer
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    We consider a large dataset of real-world, on-road driving from a 100-car
    naturalistic study to explore the predictive power of driver glances and,
    specifically, to answer the following question: what can be predicted about the
    state of the driver and the state of the driving environment from a 6-second
    sequence of macro-glances? The context-based nature of such glances allows for
    application of supervised learning to the problem of vision-based gaze
    estimation, making it robust, accurate, and reliable in messy, real-world
    conditions. So, it’s valuable to ask whether such macro-glances can be used to
    infer behavioral, environmental, and demographic variables? We analyze 27
    binary classification problems based on these variables. The takeaway is that
    glance can be used as part of a multi-sensor real-time system to predict
    radio-tuning, fatigue state, failure to signal, talking, and several
    environment variables.

    Visual Dialog

    Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
    Comments: 22 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the task of Visual Dialog, which requires an AI agent to hold a
    meaningful dialog with humans in natural, conversational language about visual
    content. Specifically, given an image, a dialog history, and a question about
    the image, the agent has to ground the question in image, infer context from
    history, and answer the question accurately. Visual Dialog is disentangled
    enough from a specific downstream task so as to serve as a general test of
    machine intelligence, while being grounded in vision enough to allow objective
    evaluation of individual responses and benchmark progress. We develop a novel
    two-person chat data-collection protocol to curate a large-scale Visual Dialog
    dataset (VisDial). Data collection is underway and on completion, VisDial will
    contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO,
    with a total of 2M dialog question-answer pairs.

    We introduce a family of neural encoder-decoder models for Visual Dialog with
    3 encoders — Late Fusion, Hierarchical Recurrent Encoder and Memory Network —
    and 2 decoders (generative and discriminative), which outperform a number of
    sophisticated baselines. We propose a retrieval-based evaluation protocol for
    Visual Dialog where the AI agent is asked to sort a set of candidate answers
    and evaluated on metrics such as mean-reciprocal-rank of human response. We
    quantify gap between machine and human performance on the Visual Dialog task
    via human studies. Our dataset, code, and trained models will be released
    publicly. Putting it all together, we demonstrate the first ‘visual chatbot’!

    Semi-supervised Learning using Denoising Autoencoders for Brain Lesion Detection and Segmentation

    Varghese Alex, Kiran Vaidhya, Subramanium Thirunavukkarasu, Chandrasekharan Kesavdas, Ganapathy Krishnmurthi
    Comments: 10 Pages, 42 Images
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The work presented explores the use of denoising autoencoders (DAE) for brain
    lesion detection, segmentation and false positive reduction. Stacked denoising
    autoencoders (SDAE) were pre-trained using a large number of unlabeled patient
    volumes and fine tuned with patches drawn from a limited number of patients
    (n=20, 40, 65). The results show negligible loss in performance even when SDAE
    was fine tuned using 20 patients. Low grade glioma (LGG) segmentation was
    achieved using a transfer learning approach wherein a network pre-trained with
    High Grade Glioma (HGG) data was fine tuned using LGG image patches. The weakly
    supervised SDAE (for HGG) and transfer learning based LGG network were also
    shown to generalize well and provide good segmentation on unseen BraTS 2013 &
    BraTS 2015 test data. An unique contribution includes a single layer DAE,
    referred to as novelty detector(ND). ND was trained to accurately reconstruct
    non-lesion patches using a mean squared error loss function. The reconstruction
    error maps of test data were used to identify regions containing lesions. The
    error maps were shown to assign unique error distributions to various
    constituents of the glioma, enabling localization. The ND learns the non-lesion
    brain accurately as it was also shown to provide good segmentation performance
    on ischemic brain lesions in images from a different database.

    Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation

    Xun Xu, Timothy M. Hospedales, Shaogang Gong
    Comments: Published in ECCV 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing
    the conventional model training requirement of annotated examples for every
    category. This is achieved by establishing a mapping connecting low-level
    features and a semantic description of the label space, referred as
    visual-semantic mapping, on auxiliary data. Reusing the learned mapping to
    project target videos into an embedding space thus allows novel-classes to be
    recognised by nearest neighbour inference. However, existing ZSL methods suffer
    from auxiliary-target domain shift intrinsically induced by assuming the same
    mapping for the disjoint auxiliary and target classes. This compromises the
    generalisation accuracy of ZSL recognition on the target data. In this work, we
    improve the ability of ZSL to generalise across this domain shift in both
    model- and data-centric ways by formulating a visual-semantic mapping with
    better generalisation properties and a dynamic data re-weighting method to
    prioritise auxiliary data that are relevant to the target classes.
    Specifically: (1) We introduce a multi-task visual-semantic mapping to improve
    generalisation by constraining the semantic mapping parameters to lie on a
    low-dimensional manifold, (2) We explore prioritised data augmentation by
    expanding the pool of auxiliary data with additional instances weighted by
    relevance to the target domain. The proposed new model is applied to the
    challenging zero-shot action recognition problem to demonstrate its advantages
    over existing ZSL models.

    Deep Constrained Local Models for Facial Landmark Detection

    Amir Zadeh, Tadas Baltrušaitis, Louis-Philippe Morency
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Among the well-established methods for facial landmark detection is the
    family of Constrained Local Models (CLMs). An important part of CLM landmark
    alignment pipeline are the local detectors that estimate the alignment
    probability of each individual landmark over the facial region. In this paper
    we present Deep Constrained Local Model (DCLM) algorithm and the novel
    Dense-Projection Network (DPN) as a local detector. DPN is a deep neural
    network that consists of two important layers: Template Projection layer and
    Dense Aggregate layer. In Template Projection layer, patches of facial region
    are mapped to a higher dimensional space allowing the pose and rotation
    variations to be captured accurately. In Dense Aggregate layer an ensemble of
    experts is simulated within one network to make the landmark localization task
    more robust. In our extensive set of experiments we show that DPNs outperform
    previously proposed local detectors. Furthermore, we demonstrate that our
    proposed DCLM algorithm is state-of-the-art in facial landmark detection. We
    significantly outperform competitive baselines, that use both CLM-based and
    cascaded regression approaches, by a large margin on three publicly-available
    datasets for image and video landmark detection.

    Texture analysis using deterministic partially self-avoiding walk with thresholds

    Lucas Correia Ribas, Wesley Nunes Gonçalves, Odemir Martinez Bruno
    Comments: 8 pages, 3 figures
    Journal-ref: WVC proceedings 2016, pages 39-44
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we propose a new texture analysis method using the
    deterministic partially self-avoiding walk performed on maps modified with
    thresholds. In this method, two pixels of the map are neighbors if the
    Euclidean distance between them is less than (sqrt{2}) and the weight
    (difference between its intensities) is less than a given threshold. The maps
    obtained by using different thresholds highlight several properties of the
    image that are extracted by the deterministic walk. To compose the feature
    vector, deterministic walks are performed with different thresholds and its
    statistics are concatenated. Thus, this approach can be considered as a
    multi-scale analysis. We validate our method on the Brodatz database, which is
    very well known public image database and widely used by texture analysis
    methods. Experimental results indicate that the proposed method presents a good
    texture discrimination, overcoming traditional texture methods.

    Directional Mean Curvature for Textured Image Demixing

    Duy Hoang Thai, David Banks
    Comments: 72 pages, including main manuscript (36 pages) and Appendix (36 pages); 14 figures; journal
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Approximation theory plays an important role in image processing, especially
    image deconvolution and decomposition. For piecewise smooth images, there are
    many methods that have been developed over the past thirty years. The goal of
    this study is to devise similar and practical methodology for handling textured
    images. This problem is motivated by forensic imaging, since fingerprints,
    shoeprints and bullet ballistic evidence are textured images. In particular, it
    is known that texture information is almost destroyed by a blur operator, such
    as a blurred ballistic image captured from a low-cost microscope. The
    contribution of this work is twofold: first, we propose a mathematical model
    for textured image deconvolution and decomposition into four meaningful
    components, using a high-order partial differential equation approach based on
    the directional mean curvature. Second, we uncover a link between functional
    analysis and multiscale sampling theory, e.g., harmonic analysis and filter
    banks. Both theoretical results and examples with natural images are provided
    to illustrate the performance of the proposed model.

    Fast deterministic tourist walk for texture analysis

    Lucas Correia Ribas, Odemir Martinez Bruno
    Comments: 7 page, 7 figure
    Journal-ref: WVC 2016 proceedings p45-50
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Deterministic tourist walk (DTW) has attracted increasing interest in
    computer vision. In the last years, different methods for analysis of dynamic
    and static textures were proposed. So far, all works based on the DTW for
    texture analysis use all image pixels as initial point of a walk. However, this
    requires much runtime. In this paper, we conducted a study to verify the
    performance of the DTW method according to the number of initial points to
    start a walk. The proposed method assigns a unique code to each image pixel,
    then, the pixels whose code is not divisible by a given (k) value are ignored
    as initial points of walks. Feature vectors were extracted and a classification
    process was performed for different percentages of initial points. Experimental
    results on the Brodatz and Vistex datasets indicate that to use fewer pixels as
    initial points significantly improves the runtime compared to use all image
    pixels. In addition, the correct classification rate decreases very little.


    Artificial Intelligence

    Accelerated Gradient Temporal Difference Learning

    Yangchen Pan, Adam White, Martha White
    Comments: AAAI Conference on Artificial Intelligence (AAAI), 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The family of temporal difference (TD) methods span a spectrum from
    computationally frugal linear methods like TD({lambda}) to data efficient
    least squares methods. Least square methods make the best use of available data
    directly computing the TD solution and thus do not require tuning a typically
    highly sensitive learning rate parameter, but require quadratic computation and
    storage. Recent algorithmic developments have yielded several sub-quadratic
    methods that use an approximation to the least squares TD solution, but incur
    bias. In this paper, we propose a new family of accelerated gradient TD (ATD)
    methods that (1) provide similar data efficiency benefits to least-squares
    methods, at a fraction of the computation and storage (2) significantly reduce
    parameter sensitivity compared to linear TD methods, and (3) are asymptotically
    unbiased. We illustrate these claims with a proof of convergence in expectation
    and experiments on several benchmark domains and a large-scale industrial
    energy allocation domain.

    Blocking and Other Enhancements for Bottom-Up Model Generation Methods

    Peter Baumgartner, Renate A. Schmidt
    Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

    Model generation is a problem complementary to theorem proving and is
    important for fault analysis and debugging of formal specifications of security
    protocols, programs and terminological definitions. This paper discusses
    several ways of enhancing the paradigm of bottom-up model generation. The two
    main contributions are new, generalized blocking techniques and a new
    range-restriction transformation. The blocking techniques are based on simple
    transformations of the input set together with standard equality reasoning and
    redundancy elimination techniques. These provide general methods for finding
    small, finite models. The range-restriction transformation refines existing
    transformations to range-restricted clauses by carefully limiting the creation
    of domain terms. All possible combinations of the introduced techniques and
    classical range-restriction were tested on the clausal problems of the TPTP
    Version 6.0.0 with an implementation based on the SPASS theorem prover using a
    hyperresolution-like refinement. Unrestricted domain blocking gave best results
    for satisfiable problems showing it is a powerful technique indispensable for
    bottom-up model generation methods. Both in combination with the new
    range-restricting transformation, and the classical range-restricting
    transformation, good results have been obtained. Limiting the creation of terms
    during the inference process by using the new range restricting transformation
    has paid off, especially when using it together with a shifting transformation.
    The experimental results also show that classical range restriction with
    unrestricted blocking provides a useful complementary method. Overall, the
    results showed bottom-up model generation methods were good for disproving
    theorems and generating models for satisfiable problems, but less efficient
    than SPASS in auto mode for unsatisfiable problems.

    Nonparametric General Reinforcement Learning

    Jan Leike
    Comments: PhD thesis
    Subjects: Artificial Intelligence (cs.AI)

    Reinforcement learning (RL) problems are often phrased in terms of Markov
    decision processes (MDPs). In this thesis we go beyond MDPs and consider RL in
    environments that are non-Markovian, non-ergodic and only partially observable.
    Our focus is not on practical algorithms, but rather on the fundamental
    underlying problems: How do we balance exploration and exploitation? How do we
    explore optimally? When is an agent optimal? We follow the nonparametric
    realizable paradigm.

    We establish negative results on Bayesian RL agents, in particular AIXI. We
    show that unlucky or adversarial choices of the prior cause the agent to
    misbehave drastically. Therefore Legg-Hutter intelligence and balanced Pareto
    optimality, which depend crucially on the choice of the prior, are entirely
    subjective. Moreover, in the class of all computable environments every policy
    is Pareto optimal. This undermines all existing optimality properties for AIXI.
    However, there are Bayesian approaches to general RL that satisfy objective
    optimality guarantees: We prove that Thompson sampling is asymptotically
    optimal in stochastic environments in the sense that its value converges to the
    value of the optimal policy. We connect asymptotic optimality to regret given a
    recoverability assumption on the environment that allows the agent to recover
    from mistakes. Hence Thompson sampling achieves sublinear regret in these
    environments.

    Our results culminate in a formal solution to the grain of truth problem: A
    Bayesian agent acting in a multi-agent environment learns to predict the other
    agents’ policies if its prior assigns positive probability to them (the prior
    contains a grain of truth). We construct a large but limit computable class
    containing a grain of truth and show that agents based on Thompson sampling
    over this class converge to play Nash equilibria in arbitrary unknown
    computable multi-agent environments.

    The BIN_COUNTS Constraint: Filtering and Applications

    Roberto Rossi, Özgür Akgün, Steven Prestwich, Armagan Tarim
    Comments: 20 pages, working draft
    Subjects: Artificial Intelligence (cs.AI); Probability (math.PR); Other Statistics (stat.OT)

    We introduce the BIN_COUNTS constraint, which deals with the problem of
    counting the number of decision variables in a set which are assigned values
    that lie in given bins. We illustrate a decomposition and a filtering algorithm
    that achieves generalised arc consistency. We contrast the filtering power of
    these two approaches and we discuss a number of applications. We show that
    BIN_COUNTS can be employed to develop a decomposition for the (chi^2) test
    constraint, a new statistical constraint that we introduce in this work. We
    also show how this new constraint can be employed in the context of the
    Balanced Academic Curriculum Problem and of the Balanced Nursing Workload
    Problem. For both these problems we carry out numerical studies involving our
    reformulations. Finally, we present a further application of the (chi^2) test
    constraint in the context of confidence interval analysis.

    "Model and Run" Constraint Networks with a MILP Engine

    Thierry Petit
    Subjects: Artificial Intelligence (cs.AI)

    Constraint Programming (CP) users need significant expertise in order to
    model their problems appropriately, notably to select propagators and search
    strategies. This puts the brakes on a broader uptake of CP. In this paper, we
    introduce MICE, a complete Java CP modeler that can use any Mixed Integer
    Linear Programming (MILP) solver as a solution technique. Our aim is to provide
    an alternative tool for democratizing the “CP-style” modeling thanks to its
    simplicity of use, with reasonable solving capabilities. Our contributions
    include new decompositions of (reified) constraints and constraints on
    numerical variables.

    Embedded Bandits for Large-Scale Black-Box Optimization

    Abdullah Al-Dujaili, S. Suresh
    Comments: To appear at AAAI 2017
    Subjects: Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

    Random embedding has been applied with empirical success to large-scale
    black-box optimization problems with low effective dimensions. This paper
    proposes the EmbeddedHunter algorithm, which incorporates the technique in a
    hierarchical stochastic bandit setting, following the optimism in the face of
    uncertainty principle and breaking away from the multiple-run framework in
    which random embedding has been conventionally applied similar to stochastic
    black-box optimization solvers. Our proposition is motivated by the bounded
    mean variation in the objective value for a low-dimensional point projected
    randomly into the decision space of Lipschitz-continuous problems. In essence,
    the EmbeddedHunter algorithm expands optimistically a partitioning tree over a
    low-dimensional—equal to the effective dimension of the problem—search
    space based on a bounded number of random embeddings of sampled points from the
    low-dimensional space. In contrast to the probabilistic theoretical guarantees
    of multiple-run random-embedding algorithms, the finite-time analysis of the
    proposed algorithm presents a theoretical upper bound on the regret as a
    function of the algorithm’s number of iterations. Furthermore, numerical
    experiments were conducted to validate its performance. The results show a
    clear performance gain over recently proposed random embedding methods for
    large-scale problems, provided the intrinsic dimensionality is low.

    Optimizing Expectation with Guarantees in POMDPs (Technical Report)

    Krishnendu Chatterjee, Petr Novotný, Guillermo A. Pérez, Jean-François Raskin, Đorđe Žikelić
    Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

    A standard objective in partially-observable Markov decision processes
    (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff.
    However, such policies may still permit unlikely but highly undesirable
    outcomes, which is problematic especially in safety-critical applications.
    Recently, there has been a surge of interest in POMDPs where the goal is to
    maximize the probability to ensure that the payoff is at least a given
    threshold, but these approaches do not consider any optimization beyond
    satisfying this threshold constraint. In this work we go beyond both the
    “expectation” and “threshold” approaches and consider a “guaranteed payoff
    optimization (GPO)” problem for POMDPs, where we are given a threshold (t) and
    the objective is to find a policy (sigma) such that a) each possible outcome
    of (sigma) yields a discounted-sum payoff of at least (t), and b) the expected
    discounted-sum payoff of (sigma) is optimal (or near-optimal) among all
    policies satisfying a). We present a practical approach to tackle the GPO
    problem and evaluate it on standard POMDP benchmarks.

    Deep Reinforcement Learning for Multi-Domain Dialogue Systems

    Heriberto Cuayáhuitl, Seunghak Yu, Ashley Williamson, Jacob Carse
    Comments: NIPS Workshop on Deep Reinforcement Learning, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Standard deep reinforcement learning methods such as Deep Q-Networks (DQN)
    for multiple tasks (domains) face scalability problems. We propose a method for
    multi-domain dialogue policy learning—termed NDQN, and apply it to an
    information-seeking spoken dialogue system in the domains of restaurants and
    hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed)
    using simulations report that our proposed method exhibits better scalability
    and is promising for optimising the behaviour of multi-domain dialogue systems.

    Improving Policy Gradient by Exploring Under-appreciated Rewards

    Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
    Comments: Under review at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    This paper presents a novel form of policy gradient for model-free
    reinforcement learning (RL) with improved exploration properties. Current
    policy-based methods use entropy regularization to encourage undirected
    exploration of the reward landscape, which is ineffective in high dimensional
    spaces with sparse rewards. We propose a more directed exploration strategy
    that promotes exploration of {em under-appreciated reward} regions. An action
    sequence is considered under-appreciated if its log-probability under the
    current policy under-estimates its mbox{resulting} reward. The proposed
    exploration strategy is easy to implement, requiring small modifications to an
    implementation of the REINFORCE algorithm. We evaluate the approach on a set of
    algorithmic tasks that have long challenged RL methods. Our approach reduces
    hyper-parameter sensitivity and demonstrates significant improvements over
    baseline methods. Our algorithm successfully solves a benchmark multi-digit
    addition task and generalizes to long sequences. This is, to our knowledge, the
    first time that a pure RL method has solved addition using only reward
    feedback.

    Towards a new quantum cognition model

    Riccardo Franco
    Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)

    This article presents a new quantum-like model for cognition explicitly based
    on knowledge. It is shown that this model, called QKT (quantum knowledge-based
    theory), is able to coherently describe some experimental results that are
    problematic for the prior quantum-like decision models. In particular, I
    consider the experimental results relevant to the post-decision cognitive
    dissonance, the problems relevant to the question order effect and response
    replicability, and those relevant to the grand-reciprocity equations. A new set
    of postulates is proposed, which evidence the different meaning given to the
    projectors and to the quantum states. In the final part, I show that the use of
    quantum gates can help to better describe and understand the evolution of
    quantum-like models.

    Analyzing Features for the Detection of Happy Endings in German Novels

    Fotis Jannidis, Isabella Reger, Albin Zehe, Martin Becker, Lena Hettinger, Andreas Hotho
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With regard to a computational representation of literary plot, this paper
    looks at the use of sentiment analysis for happy ending detection in German
    novels. Its focus lies on the investigation of previously proposed sentiment
    features in order to gain insight about the relevance of specific features on
    the one hand and the implications of their performance on the other hand.
    Therefore, we study various partitionings of novels, considering the highly
    variable concept of “ending”. We also show that our approach, even though still
    rather simple, can potentially lead to substantial findings relevant to
    literary studies.

    DeepSetNet: Predicting Sets with Deep Neural Networks

    Seyed Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper addresses the task of set prediction using deep learning. This is
    important because the output of many computer vision tasks, including image
    tagging and object detection, are naturally expressed as sets of entities
    rather than vectors. As opposed to a vector, the size of a set is not fixed in
    advance, and it is invariant to the ordering of entities within it. We define a
    likelihood for a set distribution and learn its parameters using a deep neural
    network. We also derive a loss for predicting a discrete distribution
    corresponding to set cardinality. Set prediction is demonstrated on the
    problems of multi-class image classification and pedestrian detection. Our
    approach yields state-of-the-art results in both cases on standard datasets.

    Handwriting Profiling using Generative Adversarial Networks

    Arna Ghosh, Biswarup Bhattacharya, Somnath Basu Roy Chowdhury
    Comments: 2 pages; 2 figures; Accepted at The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17 Student Abstract and Poster Program), San Francisco, USA; All authors have equal contribution
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Handwriting is a skill learned by humans from a very early age. The ability
    to develop one’s own unique handwriting as well as mimic another person’s
    handwriting is a task learned by the brain with practice. This paper deals with
    this very problem where an intelligent system tries to learn the handwriting of
    an entity using Generative Adversarial Networks (GANs). We propose a modified
    architecture of DCGAN (Radford, Metz, and Chintala 2015) to achieve this. We
    also discuss about applying reinforcement learning techniques to achieve faster
    learning. Our algorithm hopes to give new insights in this area and its uses
    include identification of forged documents, signature verification, computer
    generated art, digitization of documents among others. Our early implementation
    of the algorithm illustrates a good performance with MNIST datasets.

    SAD-GAN: Synthetic Autonomous Driving using Generative Adversarial Networks

    Arna Ghosh, Biswarup Bhattacharya, Somnath Basu Roy Chowdhury
    Comments: 5 pages; 4 figures; Accepted at the Deep Learning for Action and Interaction Workshop, 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain; All authors have equal contribution
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Autonomous driving is one of the most recent topics of interest which is
    aimed at replicating human driving behavior keeping in mind the safety issues.
    We approach the problem of learning synthetic driving using generative neural
    networks. The main idea is to make a controller trainer network using images
    plus key press data to mimic human learning. We used the architecture of a
    stable GAN to make predictions between driving scenes using key presses. We
    train our model on one video game (Road Rash) and tested the accuracy and
    compared it by running the model on other maps in Road Rash to determine the
    extent of learning.

    Multiwinner Approval Rules as Apportionment Methods

    Markus Brill, Jean-François Laslier, Piotr Skowron
    Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

    We establish a link between multiwinner elections and apportionment problems
    by showing how approval-based multiwinner election rules can be interpreted as
    methods of apportionment. We consider several multiwinner rules and observe
    that they induce apportionment methods that are well-established in the
    literature on proportional representation. For instance, we show that
    Proportional Approval Voting induces the D’Hondt method and that Monroe’s rule
    induces the largest reminder method. We also consider properties of
    apportionment methods and exhibit multiwinner rules that induce apportionment
    methods satisfying these properties.

    Visual Dialog

    Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
    Comments: 22 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the task of Visual Dialog, which requires an AI agent to hold a
    meaningful dialog with humans in natural, conversational language about visual
    content. Specifically, given an image, a dialog history, and a question about
    the image, the agent has to ground the question in image, infer context from
    history, and answer the question accurately. Visual Dialog is disentangled
    enough from a specific downstream task so as to serve as a general test of
    machine intelligence, while being grounded in vision enough to allow objective
    evaluation of individual responses and benchmark progress. We develop a novel
    two-person chat data-collection protocol to curate a large-scale Visual Dialog
    dataset (VisDial). Data collection is underway and on completion, VisDial will
    contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO,
    with a total of 2M dialog question-answer pairs.

    We introduce a family of neural encoder-decoder models for Visual Dialog with
    3 encoders — Late Fusion, Hierarchical Recurrent Encoder and Memory Network —
    and 2 decoders (generative and discriminative), which outperform a number of
    sophisticated baselines. We propose a retrieval-based evaluation protocol for
    Visual Dialog where the AI agent is asked to sort a set of candidate answers
    and evaluated on metrics such as mean-reciprocal-rank of human response. We
    quantify gap between machine and human performance on the Visual Dialog task
    via human studies. Our dataset, code, and trained models will be released
    publicly. Putting it all together, we demonstrate the first ‘visual chatbot’!

    Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning

    Heriberto Cuayáhuitl, Guillaume Couly, Clément Olalainty
    Comments: NIPS Workshop on Future of Interactive Learning Machines, 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Training robots to perceive, act and communicate using multiple modalities
    still represents a challenging problem, particularly if robots are expected to
    learn efficiently from small sets of example interactions. We describe a
    learning approach as a step in this direction, where we teach a humanoid robot
    how to play the game of noughts and crosses. Given that multiple multimodal
    skills can be trained to play this game, we focus our attention to training the
    robot to perceive the game, and to interact in this game. Our multimodal deep
    reinforcement learning agent perceives multimodal features and exhibits verbal
    and non-verbal actions while playing. Experimental results using simulations
    show that the robot can learn to win or draw up to 98% of the games. A pilot
    test of the proposed multimodal system for the targeted game—integrating
    speech, vision and gestures—reports that reasonable and fluent interactions
    can be achieved using the proposed approach.

    Deep Constrained Local Models for Facial Landmark Detection

    Amir Zadeh, Tadas Baltrušaitis, Louis-Philippe Morency
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

    Among the well-established methods for facial landmark detection is the
    family of Constrained Local Models (CLMs). An important part of CLM landmark
    alignment pipeline are the local detectors that estimate the alignment
    probability of each individual landmark over the facial region. In this paper
    we present Deep Constrained Local Model (DCLM) algorithm and the novel
    Dense-Projection Network (DPN) as a local detector. DPN is a deep neural
    network that consists of two important layers: Template Projection layer and
    Dense Aggregate layer. In Template Projection layer, patches of facial region
    are mapped to a higher dimensional space allowing the pose and rotation
    variations to be captured accurately. In Dense Aggregate layer an ensemble of
    experts is simulated within one network to make the landmark localization task
    more robust. In our extensive set of experiments we show that DPNs outperform
    previously proposed local detectors. Furthermore, we demonstrate that our
    proposed DCLM algorithm is state-of-the-art in facial landmark detection. We
    significantly outperform competitive baselines, that use both CLM-based and
    cascaded regression approaches, by a large margin on three publicly-available
    datasets for image and video landmark detection.


    Information Retrieval

    Prediction of Video Popularity in the Absence of Reliable Data from Video Hosting Services: Utility of Traces Left by Users on the Web

    Alexey Drutsa (Yandex, Moscow, Russia), Gleb Gusev (Yandex, Moscow, Russia), Pavel Serdyukov (Yandex, Moscow, Russia)
    Comments: 23 pages, 4 figures
    Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Machine Learning (stat.ML)

    With the growth of user-generated content, we observe the constant rise of
    the number of companies, such as search engines, content aggregators, etc.,
    that operate with tremendous amounts of web content not being the services
    hosting it. Thus, aiming to locate the most important content and promote it to
    the users, they face the need of estimating the current and predicting the
    future content popularity.

    In this paper, we approach the problem of video popularity prediction not
    from the side of a video hosting service, as done in all previous studies, but
    from the side of an operating company, which provides a popular video search
    service that aggregates content from different video hosting websites. We
    investigate video popularity prediction based on features from three primary
    sources available for a typical operating company: first, the content hosting
    provider may deliver its data via its API, second, the operating company makes
    use of its own search and browsing logs, third, the company crawls information
    about embeds of a video and links to a video page from publicly available
    resources on the Web. We show that video popularity prediction based on the
    embed and link data coupled with the internal search and browsing data
    significantly improves video popularity prediction based only on the data
    provided by the video hosting and can even adequately replace the API data in
    the cases when it is partly or completely unavailable.

    Analyzing Features for the Detection of Happy Endings in German Novels

    Fotis Jannidis, Isabella Reger, Albin Zehe, Martin Becker, Lena Hettinger, Andreas Hotho
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With regard to a computational representation of literary plot, this paper
    looks at the use of sentiment analysis for happy ending detection in German
    novels. Its focus lies on the investigation of previously proposed sentiment
    features in order to gain insight about the relevance of specific features on
    the one hand and the implications of their performance on the other hand.
    Therefore, we study various partitionings of novels, considering the highly
    variable concept of “ending”. We also show that our approach, even though still
    rather simple, can potentially lead to substantial findings relevant to
    literary studies.

    Predicting drug recalls from Internet search engine queries

    Elad Yom-Tov
    Subjects: Information Retrieval (cs.IR)

    Batches of pharmaceutical are sometimes recalled from the market when a
    safety issue or a defect is detected in specific production runs of a drug.
    Such problems are usually detected when patients or healthcare providers report
    abnormalities to medical authorities. Here we test the hypothesis that
    defective production lots can be detected earlier by monitoring queries to
    Internet search engines.

    We extracted queries from the USA to the Bing search engine which mentioned
    one of 5,195 pharmaceutical drugs during 2015 and all recall notifications
    issued by the Food and Drug Administration (FDA) during that year. By using
    attributes that quantify the change in query volume at the state level, we
    attempted to predict if a recall of a specific drug will be ordered by FDA in a
    time horizon ranging from one to 40 days in future.

    Our results show that future drug recalls can indeed be identified with an
    AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall will
    occur one day ahead. This performance degrades as prediction is made for longer
    periods ahead. The most indicative attributes for prediction are sudden spikes
    in query volume about a specific medicine in each state. Recalls of
    prescription drugs and those estimated to be of medium-risk are more likely to
    be identified using search query data.

    These findings suggest that aggregated Internet search engine data can be
    used to facilitate in early warning of faulty batches of medicines.

    Forward Backward Similarity Search in Knowledge Networks

    Baoxu Shi, Lin Yang, Tim Weninger
    Comments: Accepted for publication in Knowledge-Based Systems
    Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

    Similarity search is a fundamental problem in social and knowledge networks
    like GitHub, DBLP, Wikipedia, etc. Existing network similarity measures are
    limited because they only consider similarity from the perspective of the query
    node. However, due to the complicated topology of real-world networks, ignoring
    the preferences of target nodes often results in odd or unintuitive
    performance. In this work, we propose a dual perspective similarity metric
    called Forward Backward Similarity (FBS) that efficiently computes topological
    similarity from the perspective of both the query node and the perspective of
    candidate nodes. The effectiveness of our method is evaluated by traditional
    quantitative ranking metrics and large-scale human judgement on four large real
    world networks. The proposed method matches human preference and outperforms
    other similarity search algorithms on community overlap and link prediction.
    Finally, we demonstrate top-5 rankings for five famous researchers on an
    academic collaboration network to illustrate how our approach captures
    semantics more intuitively than other approaches.

    MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

    Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, Li Deng
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    This paper presents our recent work on the design and development of a new,
    large scale dataset, which we name MS MARCO, for MAchine Reading
    COmprehension.This new dataset is aimed to overcome a number of well-known
    weaknesses of previous publicly available datasets for the same task of reading
    comprehension and question answering. In MS MARCO, all questions are sampled
    from real anonymized user queries. The context passages, from which answers in
    the dataset are derived, are extracted from real web documents using the most
    advanced version of the Bing search engine. The answers to the queries are
    human generated. Finally, a subset of these queries has multiple answers. We
    aim to release one million queries and the corresponding answers in the
    dataset, which, to the best of our knowledge, is the most comprehensive
    real-world dataset of its kind in both quantity and quality. We are currently
    releasing 100,000 queries with their corresponding answers to inspire work in
    reading comprehension and question answering along with gathering feedback from
    the research community.

    Improving Multi-Document Summarization via Text Classification

    Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei
    Comments: 7 pages, 3 figures, AAAI-17
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Developed so far, multi-document summarization has reached its bottleneck due
    to the lack of sufficient training data and diverse categories of documents.
    Text classification just makes up for these deficiencies. In this paper, we
    propose a novel summarization system called TCSum, which leverages plentiful
    text classification data to improve the performance of multi-document
    summarization. TCSum projects documents onto distributed representations which
    act as a bridge between text classification and summarization. It also utilizes
    the classification results to produce summaries of different styles. Extensive
    experiments on DUC generic multi-document summarization datasets show that,
    TCSum can achieve the state-of-the-art performance without using any
    hand-crafted features and has the capability to catch the variations of summary
    styles with respect to different text categories.

    Joint Copying and Restricted Generation for Paraphrase

    Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li
    Comments: 7 pages, 1 figure, AAAI-17
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Many natural language generation tasks, such as abstractive summarization and
    text simplification, are paraphrase-orientated. In these tasks, copying and
    rewriting are two main writing modes. Most previous sequence-to-sequence
    (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we
    develop a novel Seq2Seq model to fuse a copying decoder and a restricted
    generative decoder. The copying decoder finds the position to be copied based
    on a typical attention model. The generative decoder produces words limited in
    the source-specific vocabulary. To combine the two decoders and determine the
    final output, we develop a predictor to predict the mode of copying or
    rewriting. This predictor can be guided by the actual writing mode in the
    training data. We conduct extensive experiments on two different paraphrase
    datasets. The result shows that our model outperforms the state-of-the-art
    approaches in terms of both informativeness and language quality.

    Hierarchical Hyperlink Prediction for the WWW

    Dario Garcia-Gasulla, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura
    Comments: Submitted to Transactions on Internet Technology journal
    Subjects: Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

    The hyperlink prediction task, that of proposing new links between webpages,
    can be used to improve search engines, expand the visibility of web pages, and
    increase the connectivity and navigability of the web. Hyperlink prediction is
    typically performed on webgraphs composed by thousands or millions of vertices,
    where on average each webpage contains less than fifty links. Algorithms
    processing graphs so large and sparse require to be both scalable and precise,
    a challenging combination. Similarity-based algorithms are among the most
    scalable solutions within the link prediction field, due to their parallel
    nature and computational simplicity. These algorithms independently explore the
    nearby topological features of every missing link from the graph in order to
    determine its likelihood. Unfortunately, the precision of similarity-based
    algorithms is limited, which has prevented their broad application so far. In
    this work we explore the performance of similarity-based algorithms for the
    particular problem of hyperlink prediction on large webgraphs, and propose a
    novel method which assumes the existence of hierarchical properties. We
    evaluate this new approach on several webgraphs and compare its performance
    with that of the current best similarity-based algorithms. Its remarkable
    performance leads us to argue on the applicability of the proposal, identifying
    several use cases of hyperlink prediction. We also describes the approach we
    took for the computation of large-scale graphs from the perspective of
    high-performance computing, providing details on the implementation and
    parallelization of code.


    Computation and Language

    Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition

    Tom Sercu, Vaibhava Goel
    Comments: Appeared at NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In computer vision pixelwise dense prediction is the task of predicting a
    label for each pixel in the image. Convolutional neural networks achieve good
    performance on this task, while being computationally efficient. In this paper
    we carry these ideas over to the problem of assigning a sequence of labels to a
    set of speech frames, a task commonly known as framewise classification. We
    show that dense prediction view of framewise classification offers several
    advantages and insights, including computational efficiency and the ability to
    apply batch normalization. When doing dense prediction we pay specific
    attention to strided pooling in time and introduce an asymmetric dilated
    convolution, called time-dilated convolution, that allows for efficient and
    elegant implementation of pooling in time. We show that by using time-dilated
    convolutions with a very deep VGG-style CNN with batch normalization, we
    achieve best published single model accuracy result on the switchboard-2000
    benchmark dataset.

    MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

    Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, Li Deng
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    This paper presents our recent work on the design and development of a new,
    large scale dataset, which we name MS MARCO, for MAchine Reading
    COmprehension.This new dataset is aimed to overcome a number of well-known
    weaknesses of previous publicly available datasets for the same task of reading
    comprehension and question answering. In MS MARCO, all questions are sampled
    from real anonymized user queries. The context passages, from which answers in
    the dataset are derived, are extracted from real web documents using the most
    advanced version of the Bing search engine. The answers to the queries are
    human generated. Finally, a subset of these queries has multiple answers. We
    aim to release one million queries and the corresponding answers in the
    dataset, which, to the best of our knowledge, is the most comprehensive
    real-world dataset of its kind in both quantity and quality. We are currently
    releasing 100,000 queries with their corresponding answers to inspire work in
    reading comprehension and question answering along with gathering feedback from
    the research community.

    Improving Multi-Document Summarization via Text Classification

    Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei
    Comments: 7 pages, 3 figures, AAAI-17
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Developed so far, multi-document summarization has reached its bottleneck due
    to the lack of sufficient training data and diverse categories of documents.
    Text classification just makes up for these deficiencies. In this paper, we
    propose a novel summarization system called TCSum, which leverages plentiful
    text classification data to improve the performance of multi-document
    summarization. TCSum projects documents onto distributed representations which
    act as a bridge between text classification and summarization. It also utilizes
    the classification results to produce summaries of different styles. Extensive
    experiments on DUC generic multi-document summarization datasets show that,
    TCSum can achieve the state-of-the-art performance without using any
    hand-crafted features and has the capability to catch the variations of summary
    styles with respect to different text categories.

    Joint Copying and Restricted Generation for Paraphrase

    Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li
    Comments: 7 pages, 1 figure, AAAI-17
    Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

    Many natural language generation tasks, such as abstractive summarization and
    text simplification, are paraphrase-orientated. In these tasks, copying and
    rewriting are two main writing modes. Most previous sequence-to-sequence
    (Seq2Seq) models use a single decoder and neglect this fact. In this paper, we
    develop a novel Seq2Seq model to fuse a copying decoder and a restricted
    generative decoder. The copying decoder finds the position to be copied based
    on a typical attention model. The generative decoder produces words limited in
    the source-specific vocabulary. To combine the two decoders and determine the
    final output, we develop a predictor to predict the mode of copying or
    rewriting. This predictor can be guided by the actual writing mode in the
    training data. We conduct extensive experiments on two different paraphrase
    datasets. The result shows that our model outperforms the state-of-the-art
    approaches in terms of both informativeness and language quality.

    AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

    Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley
    Comments: 4 pages, 2 figures, 2 tables, NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Developers of text-to-speech synthesizers (TTS) often make use of human
    raters to assess the quality of synthesized speech. We demonstrate that we can
    model human raters’ mean opinion scores (MOS) of synthesized speech using a
    deep recurrent neural network whose inputs consist solely of a raw waveform.
    Our best models provide utterance-level estimates of MOS only moderately
    inferior to sampled human ratings, as shown by Pearson and Spearman
    correlations. When multiple utterances are scored and averaged, a scenario
    common in synthesizer quality assessment, AutoMOS achieves correlations
    approaching those of human raters. The AutoMOS model has a number of
    applications, such as the ability to explore the parameter space of a speech
    synthesizer without requiring a human-in-the-loop.

    Learning to Compose Words into Sentences with Reinforcement Learning

    Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling
    Subjects: Computation and Language (cs.CL)

    We use reinforcement learning to learn tree-structured neural networks for
    computing representations of natural language sentences. In contrast with prior
    work on tree-structured models in which the trees are either provided as input
    or predicted using supervision from explicit treebank annotations, the tree
    structures in this work are optimized to improve performance on a downstream
    task. Experiments demonstrate the benefit of learning task-specific composition
    orders, outperforming both sequential encoders and recursive encoders based on
    treebank annotations. We analyze the induced trees and show that while they
    discover some linguistically intuitive structures (e.g., noun phrases, simple
    verb phrases), they are different than conventional English syntactic
    structures.

    Developing a cardiovascular disease risk-factors annotated corpus of Chinese electronic medical records

    Jia Su, Bin He, Yi Guan, Jingchi Jiang, Jinfeng Yang
    Comments: 29 pages, 3 figures, 3 tables
    Subjects: Computation and Language (cs.CL)

    Objective The goal of this study was to build a corpus of cardiovascular
    disease (CVD) risk-factor annotations based on Chinese electronic medical
    records (CEMRs). This corpus is intended to be used to develop a risk-factor
    information extraction system that, in turn, can be applied as a foundation for
    the further study of the progress of risk-factors and CVD. Materials and
    Methods We designed a light-annotation-task to capture CVD-risk-factors with
    indicators, temporal attributes and assertions explicitly displayed in the
    records. The task included: 1) preparing data; 2) creating guidelines for
    capturing annotations (these were created with the help of clinicians); 3)
    proposing annotation method including building the guidelines draft, training
    the annotators and updating the guidelines, and corpus construction. Results
    The outcome of this study was a risk-factor-annotated corpus based on
    de-identified discharge summaries and progress notes from 600 patients. Built
    with the help of specialists, this corpus has an inter-annotator agreement
    (IAA) F1-measure of 0.968, indicating a high reliability. Discussion Our
    annotations included 12 CVD-risk-factors such as Hypertension and Diabetes. The
    annotations can be applied as a powerful tool to the management of these
    chronic diseases and the prediction of CVD. Conclusion Guidelines for capturing
    CVD-risk-factor annotations from CEMRs were proposed and an annotated corpus
    was established. The obtained document-level annotations can be applied in
    future studies to monitor risk-factors and CVD over the long term.

    Exploiting Unlabeled Data for Neural Grammatical Error Detection

    Zhuoran Liu, Yang Liu
    Subjects: Computation and Language (cs.CL)

    Identifying and correcting grammatical errors in the text written by
    non-native writers has received increasing attention in recent years. Although
    a number of annotated corpora have been established to facilitate data-driven
    grammatical error detection and correction approaches, they are still limited
    in terms of quantity and coverage because human annotation is labor-intensive,
    time-consuming, and expensive. In this work, we propose to utilize unlabeled
    data to train neural network based grammatical error detection models. The
    basic idea is to cast error detection as a binary classification problem and
    derive positive and negative training examples from unlabeled data. We
    introduce an attention-based neural network to capture long-distance
    dependencies that influence the word being detected. Experiments show that the
    proposed approach significantly outperforms SVMs and convolutional networks
    with fixed-size context window.

    Learning a Natural Language Interface with Neural Programmer

    Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Learning a natural language interface for database tables is a challenging
    task that involves deep language understanding and multi-step reasoning. The
    task is often approached by mapping natural language queries to logical forms
    or programs that provide the desired response when executed on the database. To
    our knowledge, this paper presents the first weakly supervised, end-to-end
    neural network model to induce such programs on a real-world dataset. We
    enhance the objective function of Neural Programmer, a neural network with
    built-in discrete operations, and apply it on WikiTableQuestions, a natural
    language question-answering dataset. The model is trained end-to-end with weak
    supervision of question-answer pairs, and does not require domain-specific
    grammars, rules, or annotations that are key elements in previous approaches to
    program induction. The main experimental result in this paper is that a single
    Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with
    weak supervision. An ensemble of 15 models, with a trivial combination
    technique, achieves 37.2% accuracy, which is competitive to the current
    state-of-the-art accuracy of 37.1% obtained by a traditional natural language
    semantic parser.

    Semi Supervised Preposition-Sense Disambiguation using Multilingual Data

    Hila Gonen, Yoav Goldberg
    Comments: 12 pages; COLING 2016
    Subjects: Computation and Language (cs.CL)

    Prepositions are very common and very ambiguous, and understanding their
    sense is critical for understanding the meaning of the sentence. Supervised
    corpora for the preposition-sense disambiguation task are small, suggesting a
    semi-supervised approach to the task. We show that signals from unannotated
    multilingual data can be used to improve supervised preposition-sense
    disambiguation. Our approach pre-trains an LSTM encoder for predicting the
    translation of a preposition, and then incorporates the pre-trained encoder as
    a component in a supervised classification system, and fine-tunes it for the
    task. The multilingual signals consistently improve results on two
    preposition-sense datasets.

    The polysemy of the words that children learn over time

    Bernardino Casas, Neus Català, Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández, Jaume Baixeries
    Subjects: Computation and Language (cs.CL); Physics and Society (physics.soc-ph)

    Here we study polysemy as a potential learning bias in vocabulary learning in
    children. We employ a massive set of transcriptions of conversations between
    children and adults in English, to analyze the evolution of mean polysemy in
    the words produced by children whose ages range between 10 and 60 months. Our
    results show that mean polysemy in children increases over time in two phases,
    i.e. a fast growth till the 31st month followed by a slower tendency towards
    adult speech. In contrast, no dependency with time is found in adults. This
    suggests that children have a preference for non-polysemous words in their
    early stages of vocabulary acquisition. Our hypothesis is twofold: (a) polysemy
    is a standalone bias or (b) polysemy is a side-effect of other biases.
    Interestingly, the bias for low polysemy described above weakens when
    controlling for syntactic category (noun, verb, adjective or adverb). The
    pattern of the evolution of polysemy suggests that both hypotheses may apply to
    some extent, and that (b) would originate from a combination of the well-known
    preference for nouns and the lower polysemy of nouns with respect to other
    syntactic categories.

    Fill it up: Exploiting partial dependency annotations in a minimum spanning tree parser

    Liang Sun, Jason Mielens, Jason Baldridge
    Subjects: Computation and Language (cs.CL)

    Unsupervised models of dependency parsing typically require large amounts of
    clean, unlabeled data plus gold-standard part-of-speech tags. Adding indirect
    supervision (e.g. language universals and rules) can help, but we show that
    obtaining small amounts of direct supervision – here, partial dependency
    annotations – provides a strong balance between zero and full supervision. We
    adapt the unsupervised ConvexMST dependency parser to learn from partial
    dependencies expressed in the Graph Fragment Language. With less than 24 hours
    of total annotation, we obtain 7% and 17% absolute improvement in unlabeled
    dependency scores for English and Spanish, respectively, compared to the same
    parser using only universal grammar constraints.

    Knowledge Graph Representation with Jointly Structural and Textual Encoding

    Jiacheng Xu, Kan Chen, Xipeng Qiu, Xuanjing Huang
    Subjects: Computation and Language (cs.CL)

    The objective of knowledge graph embedding is to encode both entities and
    relations of knowledge graphs into continuous low-dimensional vector spaces.
    Previously, most works focused on symbolic representation of knowledge graph
    with structure information, which can not handle new entities or entities with
    few facts well. In this paper, we propose a novel deep architecture to utilize
    both structural and textual information of entities. Specifically, we introduce
    three neural models to encode the valuable information from text description of
    entity, among which an attentive model can select related information as
    needed. Then, a gating mechanism is applied to integrate representations of
    structure and text into a unified architecture. Experiments show that our
    models outperform baseline by margin on link prediction and triplet
    classification tasks. Source codes of this paper will be available on Github.

    Attention-based Memory Selection Recurrent Network for Language Modeling

    Da-Rong Liu, Shun-Po Chuang, Hung-yi Lee
    Subjects: Computation and Language (cs.CL)

    Recurrent neural networks (RNNs) have achieved great success in language
    modeling. However, since the RNNs have fixed size of memory, their memory
    cannot store all the information about the words it have seen before in the
    sentence, and thus the useful long-term information may be ignored when
    predicting the next words. In this paper, we propose Attention-based Memory
    Selection Recurrent Network (AMSRN), in which the model can review the
    information stored in the memory at each previous time step and select the
    relevant information to help generate the outputs. In AMSRN, the attention
    mechanism finds the time steps storing the relevant information in the memory,
    and memory selection determines which dimensions of the memory are involved in
    computing the attention weights and from which the information is extracted.In
    the experiments, AMSRN outperformed long short-term memory (LSTM) based
    language models on both English and Chinese corpora. Moreover, we investigate
    using entropy as a regularizer for attention weights and visualize how the
    attention mechanism helps language modeling.

    Statistical Properties of European Languages and Voynich Manuscript Analysis

    Andronik Arutyunov, Leonid Borisov, Sergey Fedorov, Anastasiya Ivchenko, Elizabeth Kirina-Lilinskaya, Yurii Orlov, Konstantin Osminin, Sergey Shilin, Dmitriy Zeniuk
    Subjects: Applications (stat.AP); Computation and Language (cs.CL)

    The statistical properties of letters frequencies in European literature
    texts are investigated. The determination of logarithmic dependence of letters
    sequence for one-language and two-language texts are examined. The pare of
    languages is suggested for Voynich Manuscript. The internal structure of
    Manuscript is considered. The spectral portraits of two-letters distribution
    are constructed.

    Analyzing Features for the Detection of Happy Endings in German Novels

    Fotis Jannidis, Isabella Reger, Albin Zehe, Martin Becker, Lena Hettinger, Andreas Hotho
    Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    With regard to a computational representation of literary plot, this paper
    looks at the use of sentiment analysis for happy ending detection in German
    novels. Its focus lies on the investigation of previously proposed sentiment
    features in order to gain insight about the relevance of specific features on
    the one hand and the implications of their performance on the other hand.
    Therefore, we study various partitionings of novels, considering the highly
    variable concept of “ending”. We also show that our approach, even though still
    rather simple, can potentially lead to substantial findings relevant to
    literary studies.

    Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

    Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu
    Comments: To appear in AAAI 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Structural correspondence learning (SCL) is an effective method for
    cross-lingual sentiment classification. This approach uses unlabeled documents
    along with a word translation oracle to automatically induce task specific,
    cross-lingual correspondences. It transfers knowledge through identifying
    important features, i.e., pivot features. For simplicity, however, it assumes
    that the word translation oracle maps each pivot feature in source language to
    exactly only one word in target language. This one-to-one mapping between words
    in different languages is too strict. Also the context is not considered at
    all. In this paper, we propose a cross-lingual SCL based on distributed
    representation of words; it can learn meaningful one-to-many mappings for pivot
    words using large amounts of monolingual data and a small dictionary. We
    conduct experiments on NLP&CC 2013 cross-lingual sentiment analysis dataset,
    employing English as source language, and Chinese as target language. Our
    method does not rely on the parallel corpora and the experimental results show
    that our approach is more competitive than the state-of-the-art methods in
    cross-lingual sentiment classification.

    Deep Reinforcement Learning for Multi-Domain Dialogue Systems

    Heriberto Cuayáhuitl, Seunghak Yu, Ashley Williamson, Jacob Carse
    Comments: NIPS Workshop on Deep Reinforcement Learning, 2016
    Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

    Standard deep reinforcement learning methods such as Deep Q-Networks (DQN)
    for multiple tasks (domains) face scalability problems. We propose a method for
    multi-domain dialogue policy learning—termed NDQN, and apply it to an
    information-seeking spoken dialogue system in the domains of restaurants and
    hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed)
    using simulations report that our proposed method exhibits better scalability
    and is promising for optimising the behaviour of multi-domain dialogue systems.

    Visual Dialog

    Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
    Comments: 22 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the task of Visual Dialog, which requires an AI agent to hold a
    meaningful dialog with humans in natural, conversational language about visual
    content. Specifically, given an image, a dialog history, and a question about
    the image, the agent has to ground the question in image, infer context from
    history, and answer the question accurately. Visual Dialog is disentangled
    enough from a specific downstream task so as to serve as a general test of
    machine intelligence, while being grounded in vision enough to allow objective
    evaluation of individual responses and benchmark progress. We develop a novel
    two-person chat data-collection protocol to curate a large-scale Visual Dialog
    dataset (VisDial). Data collection is underway and on completion, VisDial will
    contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO,
    with a total of 2M dialog question-answer pairs.

    We introduce a family of neural encoder-decoder models for Visual Dialog with
    3 encoders — Late Fusion, Hierarchical Recurrent Encoder and Memory Network —
    and 2 decoders (generative and discriminative), which outperform a number of
    sophisticated baselines. We propose a retrieval-based evaluation protocol for
    Visual Dialog where the AI agent is asked to sort a set of candidate answers
    and evaluated on metrics such as mean-reciprocal-rank of human response. We
    quantify gap between machine and human performance on the Visual Dialog task
    via human studies. Our dataset, code, and trained models will be released
    publicly. Putting it all together, we demonstrate the first ‘visual chatbot’!


    Distributed, Parallel, and Cluster Computing

    In situ, steerable, hardware-independent and data-structure agnostic visualization with ISAAC

    Alexander Matthes (1 and 2), Axel Huebl (1 and 2), René Widera (2), Sebastian Grottel (1), Stefan Gumhold (1), Michael Bussmann (2) ((1) Helmholtz-Zentrum Dresden — Rossendorf, (2) Technische Universität Dresden)
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The computation power of supercomputers grows faster than the bandwidth of
    their storage and network. Especially applications using hardware accelerators
    like Nvidia GPUs cannot save enough data to be analyzed in a later step. There
    is a high risk of loosing important scientific information. We introduce the in
    situ template library ISAAC which enables arbitrary applications like
    scientific simulations to live visualize their data without the need of deep
    copy operations or data transformation using the very same compute node and
    hardware accelerator the data is already residing on. Arbitrary meta data can
    be added to the renderings and user defined steering commands can be
    asynchronously sent back to the running application. Using an aggregating
    server, ISAAC streams the interactive visualization video and enables user to
    access their applications from everywhere.

    Asynchronous Broadcasting with Bivalent Beeps

    Kokouvi Hounkanli, Andrzej Pelc
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    In broadcasting, one node of a network has a message that must be learned by
    all other nodes. We study deterministic algorithms for this fundamental
    communication task in a very weak model of wireless communication. The only
    signals sent by nodes are beeps. Moreover, they are delivered to neighbors of
    the beeping node in an asynchronous way: the time between sending and reception
    is finite but unpredictable. We first observe that under this scenario, no
    communication is possible, if beeps are all of the same strength. Hence we
    study broadcasting in the bivalent beeping model, where every beep can be
    either soft or loud. At the receiving end, if exactly one soft beep is received
    by a node in a round, it is heard as soft. Any other combination of beeps
    received in a round is heard as a loud beep. The cost of a broadcasting
    algorithm is the total number of beeps sent by all nodes.

    We consider four levels of knowledge that nodes may have about the network:
    anonymity (no knowledge whatsoever), ad-hoc (all nodes have distinct labels and
    every node knows only its own label), neighborhood awareness (every node knows
    its label and labels of all neighbors), and full knowledge (every node knows
    the entire labeled map of the network and the identity of the source). We first
    show that in the anonymous case, broadcasting is impossible even for very
    simple networks. For each of the other three knowledge levels we provide upper
    and lower bounds on the minimum cost of a broadcasting algorithm. Our results
    show separations between all these scenarios. Perhaps surprisingly, the jump in
    broadcasting cost between the ad-hoc and neighborhood awareness levels is much
    larger than between the neighborhood awareness and full knowledge levels,
    although in the two former levels knowledge of nodes is local, and in the
    latter it is global.

    An Efficient Max-Min Resource Allocator and Task Scheduling Algorithm in Cloud Computing Environment

    J. Kok Konjaang, J.Y. Maipan-uku, Kumangkem Kennedy Kubuga
    Comments: 6 pages,6 figures, Article published
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cloud computing is a new archetype that provides dynamic computing services
    to cloud users through the support of datacenters that employs the services of
    datacenter brokers which discover resources and assign them Virtually. The
    focus of this research is to efficiently optimize resource allocation in the
    cloud by exploiting the Max-Min scheduling algorithm and enhancing it to
    increase efficiency in terms of completion time (makespan). This is key to
    enhancing the performance of cloud scheduling and narrowing the performance gap
    between cloud service providers and cloud resources consumers/users. The
    current Max-Min algorithm selects tasks with maximum execution time on a faster
    available machine or resource that is capable of giving minimum completion
    time. The concern of this algorithm is to give priority to tasks with maximum
    execution time first before assigning those with the minimum execution time for
    the purpose of minimizing makespan. The drawback of this algorithm is that, the
    execution of tasks with maximum execution time first may increase the makespan,
    and leads to a delay in executing tasks with minimum execution time if the
    number of tasks with maximum execution time exceeds that of tasks with minimum
    execution time, hence the need to improve it to mitigate the delay in executing
    tasks with minimum execution time. CloudSim is used to compare the
    effectiveness of the improved Max-Min algorithm with the traditional one. The
    experimented results show that the improved algorithm is efficient and can
    produce better makespan than Max-Min and DataAware.

    Eliminating Tight Coupling using Subscriptions Subgrouping in Structured Overlays

    Muhammad Shafique
    Comments: None
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Cyclic overlays may provide multiple paths between a publisher and a
    subscriber, an advertisement-tree and a matching subscription enables only one
    path for notifications routing in Publish/Subscribe (PS) systems. This poses
    serious challenges in offering dynamic routing when network conditions like
    congestion, and link or broker failures are detected. We argue that the tight
    coupling between an advertisement and matching subscriptions is the main
    bottleneck in providing multiple routing paths. This paper introduces
    subscription subgrouping that divides subscriptions into multiple sets, called
    subscription subgroups, to eliminate tight coupling. We use the structuredness
    of a new Structured Cyclic Overlay Topology (SCOT) to create multiple routing
    paths between a publisher and a subscriber without saving link information in
    routing tables. A homogeneous clustering technique with a bit-vector scheme is
    developed to realize subscription subgrouping and offer inter-cluster dynamic
    routing. The advertisement and cluster-level subscription broadcast algorithms
    implemented in a prototype tool, called OctopiA, which creates routing paths of
    shortest-lengths. Experiments on a cluster testbed show that OctopiA reduces
    size of advertisement routing tables by 93\%, subscription delay by 33\%,
    static and dynamic notification delivery delays by 25\% and 54\%, respectively.

    A duality-based approach for distributed min-max optimization

    Ivano Notarnicola, Mauro Franceschelli, Giuseppe Notarstefano
    Comments: Submitted to journal
    Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)

    In this paper we consider a distributed optimization scenario in which a set
    of processors aims at cooperatively solving a class of min-max optimization
    problems. This set-up is motivated by peak-demand minimization problems in
    smart grids. Here, the goal is to minimize the peak value over a finite horizon
    with: (i) the demand at each time instant being the sum of contributions from
    different devices, and (ii) the device states at different time instants being
    coupled through local constraints (e.g., the dynamics). The min-max structure
    and the double coupling (through the devices and over the time horizon) makes
    this problem challenging in a distributed set-up (e.g., existing distributed
    dual decomposition approaches cannot be applied). We propose a distributed
    algorithm based on the combination of duality methods and properties from
    min-max optimization. Specifically, we repeatedly apply duality theory and
    properly introduce ad-hoc slack variables in order to derive a series of
    equivalent problems. On the resulting problem we apply a dual subgradient
    method, which turns out to be a distributed algorithm consisting of a
    minimization on the original primal variables and a suitable dual update. We
    prove the convergence of the proposed algorithm in objective value. Moreover,
    we show that every limit point of the primal sequence is an optimal (feasible)
    solution. Finally, we provide numerical computations for a peak-demand
    optimization problem in a network of thermostatically controlled loads.


    Learning

    Unifying Multi-Domain Multi-Task Learning: Tensor and Neural Network Perspectives

    Yongxin Yang, Timothy M. Hospedales
    Comments: Invited book chapter
    Subjects: Learning (cs.LG)

    Multi-domain learning aims to benefit from simultaneously learning across
    several different but related domains. In this chapter, we propose a single
    framework that unifies multi-domain learning (MDL) and the related but better
    studied area of multi-task learning (MTL). By exploiting the concept of a
    emph{semantic descriptor} we show how our framework encompasses various
    classic and recent MDL/MTL algorithms as special cases with different semantic
    descriptor encodings. As a second contribution, we present a higher order
    generalisation of this framework, capable of simultaneous
    multi-task-multi-domain learning. This generalisation has two mathematically
    equivalent views in multi-linear algebra and gated neural networks
    respectively. Moreover, by exploiting the semantic descriptor, it provides
    neural networks the capability of zero-shot learning (ZSL), where a classifier
    is generated for an unseen class without any training data; as well as
    zero-shot domain adaptation (ZSDA), where a model is generated for an unseen
    domain without any training data. In practice, this framework provides a
    powerful yet easy to implement method that can be flexibly applied to MTL, MDL,
    ZSL and ZSDA.

    Diet Networks: Thin Parameters for Fat Genomic

    Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
    Subjects: Learning (cs.LG)

    Learning tasks such as those involving genomic data often poses a serious
    challenge: the number of input features can be orders of magnitude larger than
    the number of training examples, making it difficult to avoid overfitting, even
    when using the known regularization techniques. We focus here on tasks in which
    the input is a description of the genetic variation specific to a patient, the
    single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs.
    Improving the ability of deep learning to handle such datasets could have an
    important impact in precision medicine, where high-dimensional data regarding a
    particular patient is used to make predictions of interest. Even though the
    amount of data for such tasks is increasing, this mismatch between the number
    of examples and the number of inputs remains a concern. Naive implementations
    of classifier neural networks involve a huge number of free parameters in their
    first layer: each input feature is associated with as many parameters as there
    are hidden units. We propose a novel neural network parametrization which
    considerably reduces the number of free parameters. It is based on the idea
    that we can first learn or provide a distributed representation for each input
    feature (e.g. for each position in the genome where variations are observed),
    and then learn (with another neural network called the parameter prediction
    network) how to map a feature’s distributed representation to the vector of
    parameters specific to that feature in the classifier neural network (the
    weights which link the value of the feature to each of the hidden units). We
    show experimentally on a population stratification task of interest to medical
    studies that the proposed approach can significantly reduce both the number of
    parameters and the error rate of the classifier.

    Dictionary Learning with Equiprobable Matching Pursuit

    Fredrik Sandin, Sergio Martin-del-Campo
    Comments: 8 pages, 8 figures
    Subjects: Learning (cs.LG)

    Sparse signal representations based on linear combinations of learned atoms
    have been used to obtain state-of-the-art results in several practical signal
    processing applications. Approximation methods are needed to process
    high-dimensional signals in this way because the problem to calculate optimal
    atoms for sparse coding is NP-hard. Here we study greedy algorithms for
    unsupervised learning of dictionaries of shift-invariant atoms and propose a
    new method where each atom is selected with the same probability on average,
    which corresponds to the homeostatic regulation of a recurrent convolutional
    neural network. Equiprobable selection can be used with several greedy
    algorithms for dictionary learning to ensure that all atoms adapt during
    training and that no particular atom is more likely to take part in the linear
    combination on average. We demonstrate via simulation experiments that
    dictionary learning with equiprobable selection results in higher entropy of
    the sparse representation and lower reconstruction and denoising errors, both
    in the case of ordinary matching pursuit and orthogonal matching pursuit with
    shift-invariant dictionaries. Furthermore, we show that the computational costs
    of the matching pursuits are lower with equiprobable selection, leading to
    faster and more accurate dictionary learning algorithms.

    Improving Policy Gradient by Exploring Under-appreciated Rewards

    Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
    Comments: Under review at ICLR 2017
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

    This paper presents a novel form of policy gradient for model-free
    reinforcement learning (RL) with improved exploration properties. Current
    policy-based methods use entropy regularization to encourage undirected
    exploration of the reward landscape, which is ineffective in high dimensional
    spaces with sparse rewards. We propose a more directed exploration strategy
    that promotes exploration of {em under-appreciated reward} regions. An action
    sequence is considered under-appreciated if its log-probability under the
    current policy under-estimates its mbox{resulting} reward. The proposed
    exploration strategy is easy to implement, requiring small modifications to an
    implementation of the REINFORCE algorithm. We evaluate the approach on a set of
    algorithmic tasks that have long challenged RL methods. Our approach reduces
    hyper-parameter sensitivity and demonstrates significant improvements over
    baseline methods. Our algorithm successfully solves a benchmark multi-digit
    addition task and generalizes to long sequences. This is, to our knowledge, the
    first time that a pure RL method has solved addition using only reward
    feedback.

    Robust Variational Inference

    Michael Figurnov, Kirill Struminsky, Dmitry Vetrov
    Comments: NIPS 2016 Workshop, Advances in Approximate Bayesian Inference
    Subjects: Learning (cs.LG)

    Variational inference is a powerful tool for approximate inference. However,
    it mainly focuses on the evidence lower bound as variational objective and the
    development of other measures for variational inference is a promising area of
    research. This paper proposes a robust modification of evidence and a lower
    bound for the evidence, which is applicable when the majority of the training
    set samples are random noise objects. We provide experiments for variational
    autoencoders to show advantage of the objective over the evidence lower bound
    on synthetic datasets obtained by adding uninformative noise objects to MNIST
    and OMNIGLOT. Additionally, for the original MNIST and OMNIGLOT datasets we
    observe a small improvement over the non-robust evidence lower bound.

    Times series averaging and denoising from a probabilistic perspective on time-elastic kernels

    Pierre-François Marteau (EXPRESSION)
    Subjects: Learning (cs.LG)

    In the light of regularized dynamic time warping kernels, this paper
    re-considers the concept of time elastic centroid for a setof time series. We
    derive a new algorithm based on a probabilistic interpretation of kernel
    alignment matrices. This algorithm expressesthe averaging process in terms of a
    stochastic alignment automata. It uses an iterative agglomerative heuristic
    method for averagingthe aligned samples, while also averaging the times of
    occurrence of the aligned samples. By comparing classification accuracies for45
    heterogeneous time series datasets obtained by first nearest centroid/medoid
    classifiers we show that: i) centroid-basedapproaches significantly outperform
    medoid-based approaches, ii) for the considered datasets, our algorithm that
    combines averagingin the sample space and along the time axes, emerges as the
    most significantly robust model for time-elastic averaging with apromising
    noise reduction capability. We also demonstrate its benefit in an isolated
    gesture recognition experiment and its ability tosignificantly reduce the size
    of training instance sets. Finally we highlight its denoising capability using
    demonstrative synthetic data:we show that it is possible to retrieve, from few
    noisy instances, a signal whose components are scattered in a wide spectral
    band.

    Should I use TensorFlow

    Martin Schrimpf
    Comments: Seminar Paper
    Subjects: Learning (cs.LG)

    Google’s Machine Learning framework TensorFlow was open-sourced in November
    2015 [1] and has since built a growing community around it. TensorFlow is
    supposed to be flexible for research purposes while also allowing its models to
    be deployed productively. This work is aimed towards people with experience in
    Machine Learning considering whether they should use TensorFlow in their
    environment. Several aspects of the framework important for such a decision are
    examined, such as the heterogenity, extensibility and its computation graph. A
    pure Python implementation of linear classification is compared with an
    implementation utilizing TensorFlow. I also contrast TensorFlow to other
    popular frameworks with respect to modeling capability, deployment and
    performance and give a brief description of the current adaption of the
    framework.

    Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

    Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu
    Comments: To appear in AAAI 2017
    Subjects: Learning (cs.LG); Computation and Language (cs.CL)

    Structural correspondence learning (SCL) is an effective method for
    cross-lingual sentiment classification. This approach uses unlabeled documents
    along with a word translation oracle to automatically induce task specific,
    cross-lingual correspondences. It transfers knowledge through identifying
    important features, i.e., pivot features. For simplicity, however, it assumes
    that the word translation oracle maps each pivot feature in source language to
    exactly only one word in target language. This one-to-one mapping between words
    in different languages is too strict. Also the context is not considered at
    all. In this paper, we propose a cross-lingual SCL based on distributed
    representation of words; it can learn meaningful one-to-many mappings for pivot
    words using large amounts of monolingual data and a small dictionary. We
    conduct experiments on NLP&CC 2013 cross-lingual sentiment analysis dataset,
    employing English as source language, and Chinese as target language. Our
    method does not rely on the parallel corpora and the experimental results show
    that our approach is more competitive than the state-of-the-art methods in
    cross-lingual sentiment classification.

    Machine Learning on Human Connectome Data from MRI

    Colin J Brown, Ghassan Hamarneh
    Comments: 51 pages, 6 figures. To be submitted to a journal
    Subjects: Learning (cs.LG)

    Functional MRI (fMRI) and diffusion MRI (dMRI) are non-invasive imaging
    modalities that allow in-vivo analysis of a patient’s brain network (known as a
    connectome). Use of these technologies has enabled faster and better diagnoses
    and treatments of neurological disorders and a deeper understanding of the
    human brain. Recently, researchers have been exploring the application of
    machine learning models to connectome data in order to predict clinical
    outcomes and analyze the importance of subnetworks in the brain. Connectome
    data has unique properties, which present both special challenges and
    opportunities when used for machine learning. The purpose of this work is to
    review the literature on the topic of applying machine learning models to
    MRI-based connectome data. This field is growing rapidly and now encompasses a
    large body of research. To summarize the research done to date, we provide a
    comparative, structured summary of 77 relevant works, tabulated according to
    different criteria, that represent the majority of the literature on this
    topic. (We also published a living version of this table online at
    this http URL that the community can continue to
    contribute to.) After giving an overview of how connectomes are constructed
    from dMRI and fMRI data, we discuss the variety of machine learning tasks that
    have been explored with connectome data. We then compare the advantages and
    drawbacks of different machine learning approaches that have been employed,
    discussing different feature selection and feature extraction schemes, as well
    as the learning models and regularization penalties themselves. Throughout this
    discussion, we focus particularly on how the methods are adapted to the unique
    nature of graphical connectome data. Finally, we conclude by summarizing the
    current state of the art and by outlining what we believe are strategic
    directions for future research.

    Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning

    Heriberto Cuayáhuitl, Guillaume Couly, Clément Olalainty
    Comments: NIPS Workshop on Future of Interactive Learning Machines, 2016
    Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

    Training robots to perceive, act and communicate using multiple modalities
    still represents a challenging problem, particularly if robots are expected to
    learn efficiently from small sets of example interactions. We describe a
    learning approach as a step in this direction, where we teach a humanoid robot
    how to play the game of noughts and crosses. Given that multiple multimodal
    skills can be trained to play this game, we focus our attention to training the
    robot to perceive the game, and to interact in this game. Our multimodal deep
    reinforcement learning agent perceives multimodal features and exhibits verbal
    and non-verbal actions while playing. Experimental results using simulations
    show that the robot can learn to win or draw up to 98% of the games. A pilot
    test of the proposed multimodal system for the targeted game—integrating
    speech, vision and gestures—reports that reasonable and fluent interactions
    can be achieved using the proposed approach.

    Accelerated Gradient Temporal Difference Learning

    Yangchen Pan, Adam White, Martha White
    Comments: AAAI Conference on Artificial Intelligence (AAAI), 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

    The family of temporal difference (TD) methods span a spectrum from
    computationally frugal linear methods like TD({lambda}) to data efficient
    least squares methods. Least square methods make the best use of available data
    directly computing the TD solution and thus do not require tuning a typically
    highly sensitive learning rate parameter, but require quadratic computation and
    storage. Recent algorithmic developments have yielded several sub-quadratic
    methods that use an approximation to the least squares TD solution, but incur
    bias. In this paper, we propose a new family of accelerated gradient TD (ATD)
    methods that (1) provide similar data efficiency benefits to least-squares
    methods, at a fraction of the computation and storage (2) significantly reduce
    parameter sensitivity compared to linear TD methods, and (3) are asymptotically
    unbiased. We illustrate these claims with a proof of convergence in expectation
    and experiments on several benchmark domains and a large-scale industrial
    energy allocation domain.

    Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition

    Tom Sercu, Vaibhava Goel
    Comments: Appeared at NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    In computer vision pixelwise dense prediction is the task of predicting a
    label for each pixel in the image. Convolutional neural networks achieve good
    performance on this task, while being computationally efficient. In this paper
    we carry these ideas over to the problem of assigning a sequence of labels to a
    set of speech frames, a task commonly known as framewise classification. We
    show that dense prediction view of framewise classification offers several
    advantages and insights, including computational efficiency and the ability to
    apply batch normalization. When doing dense prediction we pay specific
    attention to strided pooling in time and introduce an asymmetric dilated
    convolution, called time-dilated convolution, that allows for efficient and
    elegant implementation of pooling in time. We show that by using time-dilated
    convolutions with a very deep VGG-style CNN with batch normalization, we
    achieve best published single model accuracy result on the switchboard-2000
    benchmark dataset.

    Efficient Convolutional Auto-Encoding via Random Convexification and Frequency-Domain Minimization

    Meshia Cédric Oveneke, Mitchel Aliosha-Perez, Yong Zhao, Dongmei Jiang, Hichem Sahli
    Comments: Accepted at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The omnipresence of deep learning architectures such as deep convolutional
    neural networks (CNN)s is fueled by the synergistic combination of
    ever-increasing labeled datasets and specialized hardware. Despite the
    indisputable success, the reliance on huge amounts of labeled data and
    specialized hardware can be a limiting factor when approaching new
    applications. To help alleviating these limitations, we propose an efficient
    learning strategy for layer-wise unsupervised training of deep CNNs on
    conventional hardware in acceptable time. Our proposed strategy consists of
    randomly convexifying the reconstruction contractive auto-encoding (RCAE)
    learning objective and solving the resulting large-scale convex minimization
    problem in the frequency domain via coordinate descent (CD). The main
    advantages of our proposed learning strategy are: (1) single tunable
    optimization parameter; (2) fast and guaranteed convergence; (3) possibilities
    for full parallelization. Numerical experiments show that our proposed learning
    strategy scales (in the worst case) linearly with image size, number of filters
    and filter size.

    AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

    Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley
    Comments: 4 pages, 2 figures, 2 tables, NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Developers of text-to-speech synthesizers (TTS) often make use of human
    raters to assess the quality of synthesized speech. We demonstrate that we can
    model human raters’ mean opinion scores (MOS) of synthesized speech using a
    deep recurrent neural network whose inputs consist solely of a raw waveform.
    Our best models provide utterance-level estimates of MOS only moderately
    inferior to sampled human ratings, as shown by Pearson and Spearman
    correlations. When multiple utterances are scored and averaged, a scenario
    common in synthesizer quality assessment, AutoMOS achieves correlations
    approaching those of human raters. The AutoMOS model has a number of
    applications, such as the ability to explore the parameter space of a speech
    synthesizer without requiring a human-in-the-loop.

    Image Based Appraisal of Real Estate Properties

    Quanzeng You, Ran Pang, Jiebo Luo
    Comments: 8 pages, 8 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)

    Real estate appraisal, which is the process of estimating the price for real
    estate properties, is crucial for both buys and sellers as the basis for
    negotiation and transaction. Traditionally, the repeat sales model has been
    widely adopted to estimate real estate price. However, it depends the design
    and calculation of a complex economic related index, which is challenging to
    estimate accurately. Today, real estate brokers provide easy access to detailed
    online information on real estate properties to their clients. We are
    interested in estimating the real estate price from these large amounts of
    easily accessed data. In particular, we analyze the prediction power of online
    house pictures, which is one of the key factors for online users to make a
    potential visiting decision. The development of robust computer vision
    algorithms makes the analysis of visual content possible. In this work, we
    employ a Recurrent Neural Network (RNN) to predict real estate price using the
    state-of-the-art visual features. The experimental results indicate that our
    model outperforms several of other state-of-the-art baseline algorithms in
    terms of both mean absolute error (MAE) and mean absolute percentage error
    (MAPE).

    DeepSetNet: Predicting Sets with Deep Neural Networks

    Seyed Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)

    This paper addresses the task of set prediction using deep learning. This is
    important because the output of many computer vision tasks, including image
    tagging and object detection, are naturally expressed as sets of entities
    rather than vectors. As opposed to a vector, the size of a set is not fixed in
    advance, and it is invariant to the ordering of entities within it. We define a
    likelihood for a set distribution and learn its parameters using a deep neural
    network. We also derive a loss for predicting a discrete distribution
    corresponding to set cardinality. Set prediction is demonstrated on the
    problems of multi-class image classification and pedestrian detection. Our
    approach yields state-of-the-art results in both cases on standard datasets.

    Learning a Natural Language Interface with Neural Programmer

    Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei
    Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)

    Learning a natural language interface for database tables is a challenging
    task that involves deep language understanding and multi-step reasoning. The
    task is often approached by mapping natural language queries to logical forms
    or programs that provide the desired response when executed on the database. To
    our knowledge, this paper presents the first weakly supervised, end-to-end
    neural network model to induce such programs on a real-world dataset. We
    enhance the objective function of Neural Programmer, a neural network with
    built-in discrete operations, and apply it on WikiTableQuestions, a natural
    language question-answering dataset. The model is trained end-to-end with weak
    supervision of question-answer pairs, and does not require domain-specific
    grammars, rules, or annotations that are key elements in previous approaches to
    program induction. The main experimental result in this paper is that a single
    Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with
    weak supervision. An ensemble of 15 models, with a trivial combination
    technique, achieves 37.2% accuracy, which is competitive to the current
    state-of-the-art accuracy of 37.1% obtained by a traditional natural language
    semantic parser.

    Deep attractor network for single-microphone speaker separation

    Zhuo Chen, Yi Luo, Nima Mesgarani
    Comments: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Subjects: Sound (cs.SD); Learning (cs.LG)

    Despite the overwhelming success of deep learning in various speech
    processing tasks, the problem of separating simultaneous speakers in a mixture
    remains challenging. Two major difficulties in such systems are the arbitrary
    source permutation and unknown number of sources in the mixture. We propose a
    novel deep learning framework for single channel speech separation by creating
    attractor points in high dimensional embedding space of the acoustic signals
    which pull together the time-frequency bins corresponding to each source.
    Attractor points in this study are created by finding the centroids of the
    sources in the embedding space, which are subsequently used to determine the
    similarity of each bin in the mixture to each source. The network is then
    trained to minimize the reconstruction error of each source by optimizing the
    embeddings. The proposed model is different from prior works in that it
    implements an end-to-end training, and it does not depend on the number of
    sources in the mixture. Two strategies are explored in the test time, K-means
    and fixed attractor points, where the latter requires no post-processing and
    can be implemented in real-time. We evaluated our system on Wall Street Journal
    dataset and show 5.49\% improvement over the previous state-of-the-art methods.

    What Can Be Predicted from Six Seconds of Driver Glances?

    Lex Fridman, Heishiro Toyoda, Sean Seaman, Bobbie Seppelt, Linda Angell, Joonbum Lee, Bruce Mehler, Bryan Reimer
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Learning (cs.LG)

    We consider a large dataset of real-world, on-road driving from a 100-car
    naturalistic study to explore the predictive power of driver glances and,
    specifically, to answer the following question: what can be predicted about the
    state of the driver and the state of the driving environment from a 6-second
    sequence of macro-glances? The context-based nature of such glances allows for
    application of supervised learning to the problem of vision-based gaze
    estimation, making it robust, accurate, and reliable in messy, real-world
    conditions. So, it’s valuable to ask whether such macro-glances can be used to
    infer behavioral, environmental, and demographic variables? We analyze 27
    binary classification problems based on these variables. The takeaway is that
    glance can be used as part of a multi-sensor real-time system to predict
    radio-tuning, fatigue state, failure to signal, talking, and several
    environment variables.

    Visual Dialog

    Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
    Comments: 22 pages, 16 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)

    We introduce the task of Visual Dialog, which requires an AI agent to hold a
    meaningful dialog with humans in natural, conversational language about visual
    content. Specifically, given an image, a dialog history, and a question about
    the image, the agent has to ground the question in image, infer context from
    history, and answer the question accurately. Visual Dialog is disentangled
    enough from a specific downstream task so as to serve as a general test of
    machine intelligence, while being grounded in vision enough to allow objective
    evaluation of individual responses and benchmark progress. We develop a novel
    two-person chat data-collection protocol to curate a large-scale Visual Dialog
    dataset (VisDial). Data collection is underway and on completion, VisDial will
    contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO,
    with a total of 2M dialog question-answer pairs.

    We introduce a family of neural encoder-decoder models for Visual Dialog with
    3 encoders — Late Fusion, Hierarchical Recurrent Encoder and Memory Network —
    and 2 decoders (generative and discriminative), which outperform a number of
    sophisticated baselines. We propose a retrieval-based evaluation protocol for
    Visual Dialog where the AI agent is asked to sort a set of candidate answers
    and evaluated on metrics such as mean-reciprocal-rank of human response. We
    quantify gap between machine and human performance on the Visual Dialog task
    via human studies. Our dataset, code, and trained models will be released
    publicly. Putting it all together, we demonstrate the first ‘visual chatbot’!

    A Deep Neural Network to identify foreshocks in real time

    K.Vikraman
    Comments: Paper on earthquake prediction based on deep learning approach. 6 figures, two tables and 4 pages in total
    Subjects: Geophysics (physics.geo-ph); Learning (cs.LG)

    Foreshock events provide valuable insight to predict imminent major
    earthquakes. However, it is difficult to identify them in real time. In this
    paper, I propose an algorithm based on deep learning to instantaneously
    classify a seismic waveform as a foreshock, mainshock or an aftershock event
    achieving a high accuracy of 99% in classification. As a result, this is by far
    the most reliable method to predict major earthquakes that are preceded by
    foreshocks. In addition, I discuss methods to create an earthquake dataset that
    is compatible with deep networks.

    Patient-Driven Privacy Control through Generalized Distillation

    Z. Berkay Celik, David Lopez-Paz, Patrick McDaniel
    Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Learning (cs.LG); Machine Learning (stat.ML)

    The introduction of data analytics into medicine has changed the nature of
    treatment. In this, patients are asked to disclose personal information such as
    genetic markers, lifestyle habits, and clinical history. This data is then used
    by statistical models to predict personalized treatments. However, due to
    privacy concerns, patients often desire to withhold sensitive information. This
    self-censorship can impede proper diagnosis and treatment, which may lead to
    serious health complications and even death. In this paper, we present privacy
    distillation, a mechanism which allows patients to control the type and amount
    of information they wish to disclose to the healthcare providers for use in
    statistical models. Meanwhile, it retains the accuracy of models that have
    access to all patient data under a sufficient but not full set of
    privacy-relevant information. We validate privacy distillation using a corpus
    of patients prescribed to warfarin for a personalized dosage. We use a deep
    neural network to implement privacy distillation for training and making dose
    predictions. We find that privacy distillation with sufficient privacy-relevant
    information i) retains accuracy almost as good as having all patient data (only
    3% worse), and ii) is effective at preventing errors that introduce
    health-related risks (yielding on average 3.9% of under- or
    over-prescriptions).

    Dynamic Key-Value Memory Network for Knowledge Tracing

    Jiani Zhang, Xingjian Shi, Irwin King, Dit-Yan Yeung
    Comments: Part of the paper is under review in WWW 2017
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

    The goal of knowledge tracing is to model students’ mastering levels of
    underlying knowledge concepts, termed knowledge state, based on students’
    exercise performance data. However, existing methods, such as Bayesian
    Knowledge Tracing (BKT) or Deep Knowledge Tracing (DKT), either require costly
    human-labeled concept annotations or fail to exactly pinpoint which concepts a
    student is good at or unfamiliar with. To solve these problems, in this paper
    we introduce a new model called Dynamic Key-Value Memory Network (DKVMN) that
    can learn representations using nonlinear transformations and directly output a
    student’s mastering level of each concept. Unlike standard Memory-Augmented
    Neural Networks (MANNs) that facilitate a single memory matrix or two static
    memory matrices, our model has one static matrix called key that stores the
    knowledge concepts and the other dynamic matrix called value that stores and
    updates corresponding concepts’ mastery levels. Experiments show that our DKVMN
    model, which is trained end-to-end, consistently outperforms the
    state-of-the-art model on a range of knowledge tracing data-sets. We also
    illustrate that the learned DKVMN can automatically discover underlying
    concepts of the exercises which are typically performed by human annotations,
    and depict a student’s changing knowledge state.


    Information Theory

    Pilot Contamination is Not a Fundamental Asymptotic Limitation in Massive MIMO

    Emil Björnson, Jakob Hoydis, Luca Sanguinetti
    Comments: Submitted to IEEE ICC 2017, 6 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    Massive MIMO provides great improvements in spectral efficiency, by coherent
    combining over a large antenna array and by spatial multiplexing of many users.
    Since its inception, the coherent interference caused by pilot contamination
    has been believed to be an impairment that does not vanish, even with an
    unlimited number of antennas. In this work, we show that this belief is
    incorrect and it is basically an artifact from using simplistic channel models
    and combining schemes. We prove that with multi-cell MMSE combining, the
    spectral efficiency grows without bound as the number of antennas increases,
    even under pilot contamination, under a condition of linear independence
    between the channel covariance matrices. This condition is generally satisfied,
    except in special cases which can be hardly found in practice.

    Rate Adaptation for Secure HARQ Protocols

    Mael Le Treust, Leszek Szczecinski, Fabrice Labeau
    Comments: submitted to IEEE Transactions on Information Forensics and Security, november 28th, 2016
    Subjects: Information Theory (cs.IT)

    This paper investigates secure transmission using HARQ protocol when the
    encoder only knows the statistics of the channel-state. We analyze the tradeoff
    between reliability and secrecy probabilistic guarantees. The conventional
    approach of ensuring the secrecy via introduction of dummy-message is followed.
    However, unlike the previous works on this subject, we design a coding strategy
    tailored to HARQ by splitting the dummy-message rate over several rate
    parameters. These additional degrees of freedom improve the matching between
    the dummy-message rates and the realizations of the eavesdropper channels. We
    evaluate the performance in terms of secrecy outage probability, connection
    outage probability and secrecy throughput. For Rayleigh fading channel, the
    splitting of the dummy-message rate provides higher secrecy throughput and
    lower expected duration/average delay.

    Alphabet Size Reduction for Secure Network Coding: A Graph Theoretic Approach

    Xuan Guang, Raymond W. Yeung
    Comments: 35 pages
    Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

    We consider a communication network where there exist wiretappers who can
    access a subset of channels, called a wiretap set, which is chosen from a given
    collection of wiretap sets. The collection of wiretap sets can be arbitrary.
    Secure network coding is applied to prevent the source information from being
    leaked to the wiretappers. In secure network coding, the required alphabet size
    is an open problem not only of theoretical interest but also of practical
    importance, because it is closely related to the implementation of such coding
    schemes in terms of computational complexity and storage requirement. In this
    paper, we develop a systematic graph-theoretic approach for improving Cai and
    Yeung’s lower bound on the required alphabet size for the existence of secure
    network codes. The new lower bound thus obtained, which depends only on the
    network topology and the collection of wiretap sets, can be significantly
    smaller than Cai and Yeung’s lower bound. A polynomial-time algorithm is
    devised for efficient computation of the new lower bound.

    Examining the Unique Decodability of Insertion-Correcting Codes Beyond the Guarantee

    Kayvon Mazooji
    Comments: 11 pages, 1 figure
    Subjects: Information Theory (cs.IT)

    Unlike the space of received words generated by substitution errors, the
    space of received words generated by insertion errors is infinite. Given an
    arbitrary code, it is therefore possible for there to exist an infinite number
    of received words that are unique to a particular codeword. This work explores
    the extent to which an arbitrary insertion-correcting code can take advantage
    of this fact. For a particular codeword (c) of length (n,) we are interested in
    two specific measures. The first is the fraction of received words of length
    (n+t) which are unique to (c.) The second is the fraction of insertion
    histories of length (t) which give rise to received words of length (n+t) that
    are unique to (c.) The first measure is the probability of uniquely decoding
    when (t) insertion occur, if all possible length (n+t) received words are
    equally likely. The second measure is the probability of uniquely decoding when
    (t) consecutive insertions occur, and each insertion position and element are
    selected uniformly at random. We provide upper bounds on both of these measures
    and examine the asymptotic behavior. We also show that both measures are
    non-negative for at least one codeword in every code, for all (t.) We proceed
    to examine 2 types of codes for only one insertion beyond their respective
    guarantees, and propose an open problem, which if proved true, would explain
    our surprising observations. Along the way, we make the practical observation
    that the design parameter (a) of a VT-code can be selected to maximize the
    code’s effectiveness for decoding 2 or more insertions. Finally, we analyze the
    set of sequences used to derive our upper bounds.

    A note on the Singleton bounds for codes over finite rings

    Yongsheng Tang, Heqian Xu, Zhonghua Sun
    Subjects: Information Theory (cs.IT)

    In this paper, we give a notation on the Singleton bounds for linear codes
    over a finite commutative quasi-Frobenius ring in the work of Shiromoto [5]. We
    show that there exists a class of finite commutative quasi-Frobenius rings. The
    Singleton bounds for linear codes over such rings satisfy

    [ frac{d(C)-1}{A}leq n-log_{|R|}|C|. ]

    Searching with Measurement Dependent Noise

    Yonatan Kaspi, Ofer Shayevitz, Tara Javidi
    Subjects: Information Theory (cs.IT)

    Consider a target moving at a constant velocity on a unit-circumference
    circle, starting at an arbitrary location. To acquire the target, any region of
    the circle can be probed to obtain a noisy measurement of the target’s
    presence, where the noise level increases with the size of the probed region.
    We are interested in the expected time required to find the target to within
    some given resolution and error probability. For a known velocity, we
    characterize the optimal tradeoff between time and resolution, and show that in
    contrast to the well studied case of constant measurement noise, measurement
    dependent noise incurs a multiplicative gap in the targeting rate between
    adaptive and non-adaptive search strategies. Moreover, our adaptive strategy
    attains the optimal rate-reliability tradeoff. We further show that for optimal
    non-adaptive search, accounting for an unknown velocity incurs a factor of at
    least two in the targeting rate.

    Differential Private Noise Adding Mechanism:Fundamental Theory and its Application

    Jianping He, Lin Cai
    Comments: 29 pages, 4 figures, journal
    Subjects: Information Theory (cs.IT)

    Differential privacy is a formal mathematical standard for quantifying the
    degree of that individual privacy in a statistical database is preserved. To
    guarantee differential privacy, a typical method is adding random noise to the
    original data for data release. In this paper, we investigate the fundamental
    theory of differential privacy considering the general random noise adding
    mechanism, and then apply this framework for privacy analysis of the
    privacy-preserving consensus algorithm. Specifically, we obtain a necessary and
    sufficient condition of (epsilon)-differential privacy, and the sufficient
    conditions of ((epsilon, delta))-differential privacy. This theoretical
    framework provides a useful and efficient criterion of achieving differential
    privacy. We utilize them to analyze the privacy of some common random noises
    and the theory matches with the existing literature for special cases. Applying
    the theory, differential privacy property of a privacy-preserving consensus
    algorithm is investigated based on the framework. We obtain the necessary
    condition and the sufficient condition for the privacy-preserving consensus
    algorithm, under which differential privacy is achieved, respectively. In
    addition, it is proved that the average consensus and (epsilon)-differential
    privacy cannot be guaranteed simultaneously by any privacy-preserving consensus
    algorithm.

    Transmission Policies for Interference Management in Full-Duplex D2D Communication

    Nikolaos Giatsoglou, Angelos Antonopoulos, Elli Kartsakli, John Vardakas, Christos Verikoukis
    Comments: 6 pages, 7 figures, IEEE Global Communications Conference 2016 (GLOBECOM), Accepted
    Subjects: Information Theory (cs.IT)

    Full-Duplex (FD) wireless and Device-to-Device (D2D) communication are two
    promising technologies that aspire to enhance the spectrum and energy
    efficiency of wireless networks, thus fulfilling key requirements of the 5th
    generation (5G) of mobile networks. Both technologies, however, generate
    excessive interference, which, if not managed effectively, threatens to
    compromise system performance. To this direction, we propose two transmission
    policies that enhance the communication of two interfering FD-enabled D2D
    pairs, derived from game theory and optimization theory. The game-theoretic
    policy allows the pairs to choose their transmission modes independently and
    the optimal policy to maximize their throughput, achieving significant gains
    when the pairs interfere strongly with each other.

    A Low Complexity Detection Algorithm for SCMA

    Chenchen Zhang, Yuan Luo, Yan Chen
    Comments: 9 pages, 5 figures
    Subjects: Information Theory (cs.IT)

    Sparse code multiple access (SCMA) is a new multiple access technique which
    supports massive connectivity. Compared with the current Long Term Evolution
    (LTE) system, it enables the overloading of active users on limited orthogonal
    resources and thus meets the requirement of the fifth generation (5G) wireless
    networks. However, the computation complexity of existing detection algorithms
    increases exponentially with (d_f) (the degree of the resource nodes). Although
    the codebooks are designed to have low density, the detection still takes
    considerable time. The parameter (d_f) must be designed to be very small, which
    largely limits the choice of codebooks. In this paper, a new detection
    algorithm is proposed by discretizing the probability distribution functions
    (PDFs) in the layer nodes (variable nodes). Given (M) as the size of one
    codebook, the detection complexity of each resource node (function node) is
    reduced from (O(d_f M^{d_f})) to (O(d_f^3 ln (d_f))). Its detection accuracy
    can quickly approach that of the previous detection algorithms with the
    decrease of sampling interval in discretization.

    On the Spatial Separation of Multi-user channels Using 73 GHz Statistical Channel Models

    Geamel Alyami, Ivica Kostanic
    Comments: IEEE International Conference on Communication, Networks and Satellite (COMNETSAT) (IEEE COMNETSAT 2016)
    Subjects: Information Theory (cs.IT)

    This paper focuses at the investigation of the degree of orthogonality of
    channels of multiple users in densely populated indoor and outdoor scenarios.
    For this purpose, a statistical millimeter wave (mmwave) MIMO channel simulator
    is carefully designed using state of the art channel models. At the mmwave
    frequencies, human/vehicular mobility around the mobile users may partially or
    completely block the communication link. This give rise to the consideration of
    new channel modeling parameter i.e. probability of a user to be in LOS
    dynamics. Higher line of sight (LOS) probabilities may increase the spatial
    correlation among the multiuser channels. Therefore, quantification of the
    spatial separation of users in different scenarios with distinct LOS
    probabilities is crucial and it is the subject of investigation of this paper.
    Additionally, the mutual orthogonality of channels by changing the number of
    base station (BS) antennas and inter antenna element distance (IED) have also
    been investigated. Analysis shows that LOS blockage of certain scenario has
    more impact at the degree of orthogonality among users as compared to the
    number of BS antennas and the spacing between the antenna elements.

    Learning-based Coexistence in Two-tier Heterogeneous Networks with Cognitive Small Cells

    Lin Zhang, Guodong Zhao, Wenli Zhou, Gang Wu, Ying-Chang Liang, Shaoqian Li
    Comments: Submitted to IEEE Journal of Selected Areas in Communications 2017
    Subjects: Information Theory (cs.IT)

    We study the coexistence problem in a two-tier heterogeneous network (HetNet)
    with cognitive small cells. In particular, we consider an underlay HetNet,
    where the cognitive small base station (C-SBS) is allowed to use the frequency
    band of the macro cell with an access probability (AP) as long as the C-SBS
    satisfies a preset interference probability (IP) constraint at macro user
    (MUs). To enhance the AP of the C-SBS, we propose that the C-SBS exploits the
    distance information between the macro base station (MBS) and MUs, namely,
    MBS-MU distance information, which is contained in the signals of the MBS.
    Briefly, we first enable the C-SBS to analyze the signal from the MBS to the MU
    on a target frequency band, and learn the MBS-MU distance information. Then, we
    calculate the upper bound of the probability that the C-SBS may interfere with
    the MU, and design an AP with a closed-form expression under the IP constraint
    at the MU. Numerical results indicate that the proposed algorithm outperforms
    the state of arts, i.e., up to (90\%) AP improvement.

    Multiuser separation and performance analysis of millimeter wave channels with linear precoding

    Waqas Ahmad, Geamel Alyami, Ivica Kostanic
    Comments: IEEE International Conference on Communication, Networks and Satellite (COMNETSAT) (IEEE COMNETSAT 2016)
    Subjects: Information Theory (cs.IT)

    In the conventional multiuser MIMO systems, user selection and scheduling has
    previously been used as an effective way to increase the sum rate performance
    of the system. However, the recent concepts of the massive MIMO systems (at
    centimeter wavelength frequencies) have shown that with higher spatial
    resolution of antenna arrays different users in the dense scenarios can be
    spatially separated. This in turn significantly reduces the signal processing
    efforts required for multiuser selection algorithms. On the other hand, recent
    measurements at millimeter wave frequencies show that multipath components only
    arrive from few angular directions leading to high spatial correlation between
    the paths and co-located users. This paper focus at the investigation of
    spatial separation among the users at the millimeter wave frequencies with
    fully digital linear zero-forcing transmit precoding considering various
    channel propagation parameters. Our analysis results convincingly give a proof
    that multiuser selection algorithms are still important for millimeter wave
    communication systems. Results also show that increased number of antenna
    elements does not give a major benefit to sum rate improvements as compared to
    the selection of correct number of users to be selected/scheduled.

    High-Throughput Data Detection for Massive MU-MIMO-OFDM using Coordinate Descent

    Michael Wu, Chris Dick, Joseph R. Cavallaro, Christoph Studer
    Comments: IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS I), Vol. 63, No. 12, Dec. 2016
    Subjects: Information Theory (cs.IT)

    Data detection in massive multi-user (MU) multiple-input multiple-output
    (MIMO) wireless systems is among the most critical tasks due to the excessively
    high implementation complexity. In this paper, we propose a novel,
    equalization-based soft-output data-detection algorithm and corresponding
    reference FPGA designs for wideband massive MU-MIMO systems that use orthogonal
    frequency-division multiplexing (OFDM). Our data-detection algorithm performs
    approximate minimum mean-square error (MMSE) or box-constrained equalization
    using coordinate descent. We deploy a variety of algorithm-level optimizations
    that enable near-optimal error-rate performance at low implementation
    complexity, even for systems with hundreds of base-station (BS) antennas and
    thousands of subcarriers. We design a parallel VLSI architecture that uses
    pipeline interleaving and can be parametrized at design time to support various
    antenna configurations. We develop reference FPGA designs for massive
    MU-MIMO-OFDM systems and provide an extensive comparison to existing designs in
    terms of implementation complexity, throughput, and error-rate performance. For
    a 128 BS antenna, 8 user massive MU-MIMO-OFDM system, our FPGA design
    outperforms the next-best implementation by more than 2.6x in terms of
    throughput per FPGA look-up tables.

    Low-latency Ultra Reliable 5G Communications: Finite-Blocklength Bounds and Coding Schemes

    Johan Östman, Giuseppe Durisi, Erik G. Ström, Jingya Li, Henrik Sahlin, Gianluigi Liva
    Subjects: Information Theory (cs.IT)

    Future autonomous systems require wireless connectivity able to support
    extremely stringent requirements on both latency and reliability. In this
    paper, we leverage recent developments in the field of finite-blocklength
    information theory to illustrate how to optimally design wireless systems in
    the presence of such stringent constraints. Focusing on a multi-antenna
    Rayleigh block-fading channel, we obtain bounds on the maximum number of bits
    that can be transmitted within given bandwidth, latency, and reliability
    constraints, using an orthogonal frequency-division multiplexing system similar
    to LTE. These bounds unveil the fundamental interplay between latency,
    bandwidth, rate, and reliability. Furthermore, they suggest how to optimally
    use the available spatial and frequency diversity. Finally, we use our bounds
    to benchmark the performance of an actual coding scheme involving the
    transmission of short packets.

    GSVD-Based Precoding in MIMO Systems With Integrated Services

    Weidong Mei, Zhi Chen, Jun Fang
    Comments: IEEE Signal Processing Letters (Volume: 23, Issue: 11, Nov. 2016)
    Subjects: Information Theory (cs.IT)

    This letter considers a two-receiver multiple-input multiple-output (MIMO)
    Gaussian broadcast channel model with integrated services. Specifically, we
    combine two sorts of service messages, and serve them simultaneously: one
    multicast message intended for both receivers and one confidential message
    intended for only one receiver. The confidential message is kept perfectly
    secure from the unauthorized receiver. Due to the coupling of service messages,
    it is intractable to seek capacity-achieving transmit covariance matrices.
    Accordingly, we propose a suboptimal precoding scheme based on the generalized
    singular value decomposition (GSVD). The GSVD produces several virtual
    orthogonal subchannels between the transmitter and the receivers. Subchannel
    allocation and power allocation between multicast message and confidential
    message are jointly optimized to maximize the secrecy rate in this letter,
    subject to the quality of multicast service (QoMS) constraints. Since this
    problem is inherently complex, a difference-of-concave (DC) algorithm, together
    with an exhaustive search, is exploited to handle the power allocation and
    subchannel allocation, respectively. Numerical results are presented to
    illustrate the efficacy of our proposed strategies.

    On the DoF of Two-way (2 imes2 imes2) Relay Networks with or without Relay Caching

    Mehdi Ashraphijuo, Vaneet Aggarwal, Xiaodong Wang
    Subjects: Information Theory (cs.IT)

    Two-way relay is potentially an effective approach to spectrum sharing and
    aggregation by allowing simultaneous bidirectional transmissions between
    source-destinations pairs. This paper studies the two-way (2 imes2 imes2)
    relay network, a class of four-unicast networks, where there are four
    source/destination nodes and two relay nodes, with each source sending a
    message to its destination. We show that without relay caching the total
    degrees of freedom (DoF) is bounded from above by (8/3), indicating that
    bidirectional links do not double the DoF (It is known that the total DoF of
    one-way (2 imes2 imes2) relay network is (2).). Further, we show that the DoF
    of (8/3) is achievable for the two-way (2 imes2 imes2) relay network with
    relay caching. Finally, even though the DoF of this network is no more than
    (8/3) for generic channel gains, DoF of (4) can be achieved for a symmetric
    configuration of channel gains.

    Exponential Separation of Quantum Communication and Classical Information

    Anurag Anshu, Dave Touchette, Penghui Yao, Nengkun Yu
    Comments: v1, 36 pages, 3 figures
    Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Information Theory (cs.IT)

    We exhibit a Boolean function for which the quantum communication complexity
    is exponentially larger than the classical information complexity. An
    exponential separation in the other direction was already known from the work
    of Kerenidis et. al. [SICOMP 44, pp. 1550-1572], hence our work implies that
    these two complexity measures are incomparable. As classical information
    complexity is an upper bound on quantum information complexity, which in turn
    is equal to amortized quantum communication complexity, our work implies that a
    tight direct sum result for distributional quantum communication complexity
    cannot hold. The function we use to present such a separation is the Symmetric
    k-ary Pointer Jumping function introduced by Rao and Sinha [ECCC TR15-057],
    whose classical communication complexity is exponentially larger than its
    classical information complexity. In this paper, we show that the quantum
    communication complexity of this function is polynomially equivalent to its
    classical communication complexity. The high-level idea behind our proof is
    arguably the simplest so far for such an exponential separation between
    information and communication, driven by a sequence of round-elimination
    arguments, allowing us to simplify further the approach of Rao and Sinha.

    As another application of the techniques that we develop, we give a simple
    proof for an optimal trade-off between Alice’s and Bob’s communication while
    computing the related Greater-Than function on n bits: say Bob communicates at
    most b bits, then Alice must send n/exp(O(b)) bits to Bob. This holds even when
    allowing pre-shared entanglement. We also present a classical protocol
    achieving this bound.

    Relative entropies and their use in quantum information theory

    Felix Leditzky
    Comments: 200 pages, 11 figures. PhD thesis, University of Cambridge, November 2016. Includes results from arXiv:1308.5961, arXiv:1403.2543, arXiv:1407.6616, arXiv:1506.02635, and arXiv:1604.02119 (see Section 1.4 for detailed information)
    Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)

    This dissertation investigates relative entropies, also called generalized
    divergences, and how they can be used to characterize information-theoretic
    tasks in quantum information theory. The main goal is to further refine
    characterizations of the optimal rates for quantum source coding, state
    redistribution, and measurement compression with quantum side information via
    second order asymptotic expansions and strong converse theorems. The
    dissertation consists of a mathematical and an information-theoretic part. In
    the mathematical part, we focus on the (alpha)-sandwiched R’enyi divergence
    ((alpha)-SRD). We first investigate the limit (alpha o 0) to determine
    whether this recovers the well-known (0)-R’enyi relative divergence. We then
    prove various new results for entropic quantities derived from the
    (alpha)-SRD, including dimension bounds and useful bounds in terms of the
    fidelity between two quantum states. Furthermore, we derive a necessary and
    sufficient algebraic condition for equality in the data processing inequality
    (viz. monotonicity under quantum operations) for the (alpha)-SRD, and give
    applications to entropic bounds. In the information-theoretic part, we first
    derive the second order asymptotics of visible quantum source coding using a
    mixed source. For the achievability part, we develop universal quantum source
    codes achieving a given second order rate for a memoryless source. As a
    corollary of the main result, we obtain the second order asymptotics of quantum
    source coding using a single memoryless source. We then prove strong converse
    theorems for state redistribution (with or without feedback) and measurement
    compression with quantum side information. The key ingredients in proving these
    theorems are the aforementioned fidelity bounds on R’enyi entropic quantities
    derived from the (alpha)-SRD.

    Learning without recall in directed circles and rooted trees

    M. Amin Rahimian, Ali Jadbabaie
    Comments: American Control Conference (ACC), 2015
    Subjects: Applications (stat.AP); Information Theory (cs.IT); Social and Information Networks (cs.SI); Probability (math.PR); Machine Learning (stat.ML)

    This work investigates the case of a network of agents that attempt to learn
    some unknown state of the world amongst the finitely many possibilities. At
    each time step, agents all receive random, independently distributed private
    signals whose distributions are dependent on the unknown state of the world.
    However, it may be the case that some or any of the agents cannot distinguish
    between two or more of the possible states based only on their private
    observations, as when several states result in the same distribution of the
    private signals. In our model, the agents form some initial belief (probability
    distribution) about the unknown state and then refine their beliefs in
    accordance with their private observations, as well as the beliefs of their
    neighbors. An agent learns the unknown state when her belief converges to a
    point mass that is concentrated at the true state. A rational agent would use
    the Bayes’ rule to incorporate her neighbors’ beliefs and own private signals
    over time. While such repeated applications of the Bayes’ rule in networks can
    become computationally intractable, in this paper, we show that in the
    canonical cases of directed star, circle or path networks and their
    combinations, one can derive a class of memoryless update rules that replicate
    that of a single Bayesian agent but replace the self beliefs with the beliefs
    of the neighbors. This way, one can realize an exponentially fast rate of
    learning similar to the case of Bayesian (fully rational) agents. The proposed
    rules are a special case of the Learning without Recall.

    Truthful Spectrum Auction for Efficient Anti-Jamming in Cognitive Radio Networks

    Mohammad Aghababaie Alavijeh, Behrouz Maham, Zhu Han, Walid Saad
    Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)

    One significant challenge in cognitive radio networks is to design a
    framework in which the selfish secondary users are obliged to interact with
    each other truthfully. Moreover, due to the vulnerability of these networks
    against jamming attacks, designing anti-jamming defense mechanisms is equally
    important. %providing the security defense is also of great importance. In this
    paper, we propose a truthful mechanism, robust against the jamming, for a
    dynamic stochastic cognitive radio network consisting of several selfish
    secondary users and a malicious user. In this model, each secondary user
    participates in an auction and wish to use the unjammed spectrum, and the
    malicious user aims at jamming a channel by corrupting the communication link.
    A truthful auction mechanism is designed among the secondary users.
    Furthermore, a zero-sum game is formulated between the set of secondary users
    and the malicious user. This joint problem is then cast as a randomized
    two-level auctions in which the first auction allocates the vacant channels,
    and then the second one assigns the remaining unallocated channels. We have
    also changed this solution to a trustful distributed scheme. Simulation results
    show that the distributed algorithm can achieve a performance that is close to
    the centralized algorithm, without the added overhead and complexity.




沪ICP备19023445号-2号
友情链接