
    arXiv Paper Daily: Tue, 11 Apr 2017

    Published by 我爱机器学习 (52ml.net) on 2017-04-11 00:00:00

    Neural and Evolutionary Computing

    Unsupervised prototype learning in an associative-memory network

    Huiling Zhen, Shang-Nan Wang, Hai-Jun Zhou
    Comments: 10 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

    Unsupervised learning in a generalized Hopfield associative-memory network is
    investigated in this work. First, we prove that the (generalized) Hopfield
    model is equivalent to a semi-restricted Boltzmann machine with a layer of
    visible neurons and another layer of hidden binary neurons, so it could serve
    as the building block for a multilayered deep-learning system. We then
    demonstrate that the Hopfield network can learn to form a faithful internal
    representation of the observed samples, with the learned memory patterns being
    prototypes of the input data. Furthermore, we propose a spectral method to
    extract a small set of concepts (idealized prototypes) as the most concise
    summary or abstraction of the empirical data.
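
    As background only, the following is a minimal NumPy sketch of classic binary Hopfield storage and recall with Hebbian weights; it is not the authors' generalized model or their spectral prototype-extraction method, but it illustrates how stored patterns act as attractors (prototypes) under the retrieval dynamics.

        import numpy as np

        def hopfield_train(patterns):
            """Hebbian weights from +/-1 patterns of shape (P, N)."""
            P, N = patterns.shape
            W = patterns.T @ patterns / N
            np.fill_diagonal(W, 0.0)              # no self-coupling
            return W

        def hopfield_recall(W, x, steps=20):
            """Iterate sign updates until the state stops changing."""
            s = x.copy()
            for _ in range(steps):
                s_new = np.sign(W @ s)
                s_new[s_new == 0] = 1
                if np.array_equal(s_new, s):
                    break
                s = s_new
            return s

        # Usage: store two random prototypes, then recover one from a noisy cue.
        rng = np.random.default_rng(0)
        protos = rng.choice([-1, 1], size=(2, 100))
        W = hopfield_train(protos)
        cue = protos[0].copy()
        cue[:10] *= -1                             # corrupt 10 of 100 bits
        print(np.mean(hopfield_recall(W, cue) == protos[0]))   # fraction of recovered bits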

    Parsimonious Random Vector Functional Link Network for Data Streams

    Mahardhika Pratama, Plamen P. Angelov, Edwin Lughofer, Meng Joo Er
    Comments: this paper is submitted for publication in Information Sciences
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    The theory of random vector functional link network (RVFLN) has provided a
    breakthrough in the design of neural networks (NNs) since it conveys solid
    theoretical justification of randomized learning. Existing RVFLN works are
    hardly scalable for data stream analytics because they suffer from complexity
    issues caused by the absence of structural learning scenarios. A novel class
    of RVFLN, namely parsimonious random vector functional
    link network (pRVFLN), is proposed in this paper. pRVFLN features an open
    structure paradigm where its network structure can be built from scratch and
    is automatically generated in accordance with the degree of nonlinearity and
    the time-varying properties of the system being modelled. pRVFLN is equipped with
    complexity reduction scenarios where inconsequential hidden nodes can be pruned
    and input features can be dynamically selected. pRVFLN puts into perspective an
    online active learning mechanism which expedites the training process and
    relieves operator labelling efforts. In addition, pRVFLN introduces a
    non-parametric type of hidden node, developed using an interval-valued data
    cloud. The hidden node completely reflects the real data distribution and is
    not constrained by a specific shape of the cluster. All learning procedures of
    pRVFLN follow a strictly single-pass learning mode, which is applicable for an
    online real-time deployment. The efficacy of pRVFLN was rigorously validated
    through numerous simulations and comparisons with state-of-the-art algorithms,
    where it produced the most encouraging numerical results. Furthermore, the
    robustness of pRVFLN was investigated, and a new conclusion is drawn about the
    scope of the random parameters, which play a vital role in the success of
    randomized learning.
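
    As background for readers unfamiliar with RVFL networks, here is a hedged sketch of a plain batch RVFL net in NumPy: hidden weights are random and fixed, and only the output weights (over hidden features plus direct input links) are solved in closed form. pRVFLN itself is online, single-pass, and uses interval-valued data-cloud hidden nodes, none of which this sketch implements.

        import numpy as np

        def rvfl_fit(X, y, n_hidden=50, reg=1e-3, seed=0):
            """Random, fixed hidden weights; output weights solved in closed form."""
            rng = np.random.default_rng(seed)
            W = rng.normal(size=(X.shape[1], n_hidden))    # random projection, never trained
            b = rng.normal(size=n_hidden)
            H = np.tanh(X @ W + b)
            D = np.hstack([H, X])                          # direct input-to-output links
            beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ y)
            return W, b, beta

        def rvfl_predict(X, W, b, beta):
            return np.hstack([np.tanh(X @ W + b), X]) @ beta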

    Evolutionary Many-Objective Optimization Based on Adversarial Decomposition

    Mengyuan Wu, Ke Li, Sam Kwong, Qingfu Zhang
    Comments: 24 pages, 5 figures
    Subjects: Neural and Evolutionary Computing (cs.NE)

    The decomposition-based method has been recognized as a major approach for
    multi-objective optimization. It decomposes a multi-objective optimization
    problem into several single-objective optimization subproblems, each of which
    is usually defined as a scalarizing function using a weight vector. Due to the
    characteristics of the contour line of a particular scalarizing function, the
    performance of a decomposition-based method that merely uses a single
    scalarizing function strongly depends on the shape of the Pareto front,
    especially when facing a large number of objectives. To improve the flexibility of the
    decomposition-based method, this paper develops an adversarial decomposition
    method that leverages the complementary characteristics of two different
    scalarizing functions within a single paradigm. More specifically, we maintain
    two co-evolving populations simultaneously by using different scalarizing
    functions. In order to avoid allocating redundant computational resources to
    the same region of the Pareto front, we stably match these two co-evolving
    populations into one-one solution pairs according to their working regions of
    the Pareto front. Then, each solution pair can at most contribute one mating
    parent during the mating selection process. Compared with nine
    state-of-the-art many-objective optimizers, our proposed algorithm shows
    competitive performance on 130 many-objective test instances with
    various characteristics and Pareto front shapes.
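
    For concreteness, below is a hedged NumPy sketch of two scalarizing functions commonly used in decomposition-based methods (weighted Tchebycheff and penalty-based boundary intersection); the paper's specific pair of complementary scalarizing functions may differ, so treat these as illustrative only.

        import numpy as np

        def tchebycheff(f, w, z_star):
            """Weighted Tchebycheff value of objective vector f (smaller is better)."""
            return np.max(w * np.abs(f - z_star))

        def pbi(f, w, z_star, theta=5.0):
            """Penalty-based boundary intersection value of f (smaller is better)."""
            d = f - z_star
            w_unit = w / np.linalg.norm(w)
            d1 = float(np.dot(d, w_unit))                  # distance along the weight direction
            d2 = float(np.linalg.norm(d - d1 * w_unit))    # perpendicular distance
            return d1 + theta * d2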

    Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

    Martin Simonovsky, Nikos Komodakis
    Comments: Accepted to CVPR 2017; extended version
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A number of problems can be formulated as prediction on graph-structured
    data. In this work, we generalize the convolution operator from regular grids
    to arbitrary graphs while avoiding the spectral domain, which allows us to
    handle graphs of varying size and connectivity. To move beyond a simple
    diffusion, filter weights are conditioned on the specific edge labels in the
    neighborhood of a vertex. Together with the proper choice of graph coarsening,
    we explore constructing deep neural networks for graph classification. In
    particular, we demonstrate the generality of our formulation in point cloud
    classification, where we set the new state of the art, and on a graph
    classification dataset, where we outperform other deep learning approaches.
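
    A minimal NumPy sketch of the edge-conditioned idea, under assumptions: for each edge, a small filter-generating network (here a single linear map with hypothetical parameters F_w, F_b) turns the edge label into a convolution weight matrix, and neighbor features are averaged after being transformed by these edge-specific filters.

        import numpy as np

        def ecc_layer(X, edges, edge_labels, F_w, F_b, bias):
            """X: (N, d_in) node features; edges: list of (i, j) meaning j -> i;
            edge_labels: (E, d_e); F_w: (d_e, d_out*d_in); F_b: (d_out*d_in,)."""
            N, d_in = X.shape
            d_out = bias.shape[0]
            agg = np.zeros((N, d_out))
            counts = np.zeros(N)
            for (i, j), lbl in zip(edges, edge_labels):
                Theta = (lbl @ F_w + F_b).reshape(d_out, d_in)   # edge-specific filter
                agg[i] += Theta @ X[j]
                counts[i] += 1
            out = bias + agg / np.maximum(counts, 1)[:, None]    # average over neighbors
            return np.maximum(out, 0.0)                          # ReLU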

    Learning Important Features Through Propagating Activation Differences

    Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
    Comments: 9 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The purported “black box” nature of neural networks is a barrier to adoption
    in applications where interpretability is essential. Here we present DeepLIFT
    (Deep Learning Important FeaTures), a method for decomposing the output
    prediction of a neural network on a specific input by backpropagating the
    contributions of all neurons in the network to every feature of the input.
    DeepLIFT compares the activation of each neuron to its ‘reference activation’
    and assigns contribution scores according to the difference. By optionally
    giving separate consideration to positive and negative contributions, DeepLIFT
    can also reveal dependencies which are missed by other approaches. Scores can
    be computed efficiently in a single backward pass. We apply DeepLIFT to models
    trained on MNIST and simulated genomic data, and show significant advantages
    over gradient-based methods. A detailed video tutorial on the method is at
    this http URL and code is at this http URL
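
    A simplified, hedged sketch of the core idea for a single linear unit: contributions are assigned from the difference between an input and its reference, so the per-feature scores sum exactly to the change in the unit's output. Full DeepLIFT adds rules for propagating through nonlinearities, which are omitted here.

        import numpy as np

        def linear_contributions(x, x_ref, w):
            """Per-feature contributions to the change of a linear unit's output,
            measured relative to the reference input x_ref."""
            contrib = w * (x - x_ref)
            # Scores sum exactly to the output difference ("summation to delta").
            assert np.isclose(contrib.sum(), w @ x - w @ x_ref)
            return contrib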

    Pyramid Vector Quantization for Deep Learning

    Vincenzo Liguori
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce
    the computational cost for a variety of neural networks (NNs) while, at the
    same time, compressing the weights that describe them. This is based on the
    fact that the dot product between an N dimensional vector of real numbers and
    an N dimensional PVQ vector can be calculated with only additions and
    subtractions and one multiplication. This is advantageous since tensor
    products, commonly used in NNs, can be reduced to a dot product or a set of
    dot products. Finally, it is stressed that any NN architecture that is based on
    an operation that can be reduced to a dot product can benefit from the
    techniques described here.
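
    A hedged sketch of the arithmetic trick: a PVQ codeword has small integer entries plus a scalar gain, so its dot product with a real vector can be accumulated with additions and subtractions only, followed by a single multiplication by the gain. The variable names are illustrative.

        import numpy as np

        def pvq_dot(x, v_int, gain):
            """x: real vector; v_int: integer PVQ codeword; gain: scalar scale."""
            acc = 0.0
            for xi, vi in zip(x, v_int):
                for _ in range(abs(int(vi))):          # |v_i| repeated adds/subtracts
                    acc = acc + xi if vi > 0 else acc - xi
            return gain * acc                          # the single multiplication

        # Sanity check against the ordinary dot product.
        x = np.array([0.5, -1.2, 0.3])
        v = np.array([2, 0, -1])                       # sum of |v_i| = K = 3
        print(pvq_dot(x, v, gain=0.7), 0.7 * float(x @ v))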


    Computer Vision and Pattern Recognition

    Loss Max-Pooling for Semantic Image Segmentation

    Samuel Rota Bulò, Gerhard Neuhold, Peter Kontschieder
    Comments: accepted at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

    We introduce a novel loss max-pooling concept for handling imbalanced
    training data distributions, applicable as an alternative loss layer in the
    context of deep neural networks for semantic image segmentation. Most
    real-world semantic segmentation datasets exhibit long tail distributions with
    few object categories comprising the majority of data and consequently biasing
    the classifiers towards them. Our method adaptively re-weights the
    contributions of each pixel based on their observed losses, targeting
    under-performing classification results as often encountered for
    under-represented object classes. Our approach goes beyond conventional
    cost-sensitive learning attempts through adaptive considerations that allow us
    to indirectly address both inter- and intra-class imbalances. We provide a
    theoretical justification of our approach, complementary to experimental
    analyses on benchmark datasets. In our experiments on the Cityscapes and Pascal
    VOC 2012 segmentation datasets we find consistently improved results,
    demonstrating the efficacy of our approach.
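
    A simplified, hedged sketch of the re-weighting idea: concentrate the loss mass on the hardest pixels of each image. The paper derives the weights from a max-pooling formulation over the loss; the top-k variant below is only an approximation of that behaviour.

        import numpy as np

        def loss_max_pool(pixel_losses, top_frac=0.25):
            """pixel_losses: flat array of per-pixel losses for one image."""
            k = max(1, int(top_frac * pixel_losses.size))
            hardest = np.argsort(pixel_losses)[::-1][:k]   # indices of the hardest pixels
            weights = np.zeros_like(pixel_losses)
            weights[hardest] = 1.0 / k                     # all weight mass on the top-k
            return float(np.sum(weights * pixel_losses))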

    Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale

    Adrian Albert, Jasleen Kaur, Marta Gonzalez
    Comments: 18 pages, 11 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Urban planning applications (energy audits, investment, etc.) require an
    understanding of built infrastructure and its environment, i.e., both
    low-level, physical features (amount of vegetation, building area and geometry
    etc.), as well as higher-level concepts such as land use classes (which encode
    expert understanding of socio-economic end uses). This kind of data is
    expensive and labor-intensive to obtain, which limits its availability
    (particularly in developing countries). We analyze patterns in land use in
    urban neighborhoods using large-scale satellite imagery data (which is
    available worldwide from third-party providers) and state-of-the-art computer
    vision techniques based on deep convolutional neural networks. For supervision,
    given the limited availability of standard benchmarks for remote-sensing data,
    we obtain ground truth land use class labels carefully sampled from open-source
    surveys, in particular the Urban Atlas land classification dataset of 20 land
    use classes across ~300 European cities. We use this data to train and
    compare deep architectures which have recently shown good performance on
    standard computer vision tasks (image classification and segmentation),
    including on geospatial data. Furthermore, we show that the deep
    representations extracted from satellite imagery of urban environments can be
    used to compare neighborhoods across several cities. We make our dataset
    available for other machine learning researchers to use for remote-sensing
    applications.

    Surface Normals in the Wild

    Weifeng Chen, Donglai Xiang, Jia Deng
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We study the problem of single-image depth estimation for images in the wild.
    We collect human annotated surface normals and use them to train a neural
    network that directly predicts pixel-wise depth. We propose two novel loss
    functions for training with surface normal annotations. Experiments on NYU
    Depth and our own dataset demonstrate that our approach can significantly
    improve the quality of depth estimation in the wild.

    Fast Learning and Prediction for Object Detection using Whitened CNN Features

    Björn Barz, Erik Rodner, Christoph Käding, Joachim Denzler
    Comments: Technical Report about the possibilities introduced with ARTOS v2, originally created March 2016
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We combine features extracted from pre-trained convolutional neural networks
    (CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of
    both: the high detection performance of CNNs, automatic feature engineering,
    fast model learning from few training samples and efficient sliding-window
    detection. The Adaptive Real-Time Object Detection System (ARTOS) has been
    refactored broadly to be used in combination with Caffe for the experimental
    studies reported in this work.

    Multi-Agent Diverse Generative Adversarial Networks

    Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    This paper describes an intuitive generalization to the Generative
    Adversarial Networks (GANs) to generate samples while capturing diverse modes
    of the true data distribution. Firstly, we propose a very simple and intuitive
    multi-agent GAN architecture that incorporates multiple generators capable of
    generating samples from high probability modes. Secondly, in order to enforce
    different generators to generate samples from diverse modes, we propose two
    extensions to the standard GAN objective function. (1) We augment the
    generator-specific GAN objective function with a diversity-enforcing term that
    encourages different generators to generate diverse samples using a user-defined
    similarity-based function. (2) We modify the discriminator objective function
    so that, along with distinguishing real from fake samples, the discriminator
    has to predict which generator produced the given fake sample. Intuitively, in
    order to succeed in this task, the discriminator must learn to push different
    generators towards different identifiable modes. Our framework is generalizable
    in the sense that it can be easily combined with other existing variants of
    GANs to produce diverse samples. Experimentally we show that our framework is
    able to produce high quality diverse samples for challenging tasks such as
    image/face generation and image-to-image translation. We also show that it is
    capable of learning a better feature representation in an unsupervised setting.
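
    A hedged sketch of the modified discriminator objective described in (2): with k generators the discriminator predicts k+1 classes (real, or "fake from generator g"), so identifying the source generator is part of its loss. The exact formulation in the paper may differ; this is plain softmax cross-entropy.

        import numpy as np

        def discriminator_loss(logits, source):
            """logits: (B, k+1) scores; source: (B,) ints, 0 = real, g = fake from generator g."""
            z = logits - logits.max(axis=1, keepdims=True)          # stabilized softmax
            log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
            return -np.mean(log_probs[np.arange(len(source)), source])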

    Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

    Martin Simonovsky, Nikos Komodakis
    Comments: Accepted to CVPR 2017; extended version
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A number of problems can be formulated as prediction on graph-structured
    data. In this work, we generalize the convolution operator from regular grids
    to arbitrary graphs while avoiding the spectral domain, which allows us to
    handle graphs of varying size and connectivity. To move beyond a simple
    diffusion, filter weights are conditioned on the specific edge labels in the
    neighborhood of a vertex. Together with the proper choice of graph coarsening,
    we explore constructing deep neural networks for graph classification. In
    particular, we demonstrate the generality of our formulation in point cloud
    classification, where we set the new state of the art, and on a graph
    classification dataset, where we outperform other deep learning approaches.

    Continuously heterogeneous hyper-objects in cryo-EM and 3-D movies of many temporal dimensions

    Roy R. Lederman, Amit Singer
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Single particle cryo-electron microscopy (EM) is an increasingly popular
    method for determining the 3-D structure of macromolecules from noisy 2-D
    images of single macromolecules whose orientations and positions are random and
    unknown. One of the great opportunities in cryo-EM is to recover the structure
    of macromolecules in heterogeneous samples, where multiple types or multiple
    conformations are mixed together. Indeed, in recent years, many tools have been
    introduced for the analysis of multiple discrete classes of molecules mixed
    together in a cryo-EM experiment. However, many interesting structures have a
    continuum of conformations which do not fit discrete models nicely; the
    analysis of such continuously heterogeneous models has remained a more elusive
    goal. In this manuscript, we propose to represent heterogeneous molecules and
    similar structures as higher dimensional objects. We generalize the basic
    operations used in many existing reconstruction algorithms, making our approach
    generic in the sense that, in principle, existing algorithms can be adapted to
    reconstruct those higher dimensional objects. As proof of concept, we present a
    prototype of a new algorithm which we use to solve simulated reconstruction
    problems.

    ActionVLAD: Learning spatio-temporal aggregation for action classification

    Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell
    Comments: Accepted to CVPR 2017. Project page: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work, we introduce a new video representation for action
    classification that aggregates local convolutional features across the entire
    spatio-temporal extent of the video. We do so by integrating state-of-the-art
    two-stream networks with learnable spatio-temporal feature aggregation. The
    resulting architecture is end-to-end trainable for whole-video classification.
    We investigate different strategies for pooling across space and time and
    combining signals from the different streams. We find that: (i) it is important
    to pool jointly across space and time, but (ii) appearance and motion streams
    are best aggregated into their own separate representations. Finally, we show
    that our representation outperforms the two-stream base architecture by a large
    margin (13% relative) as well as out-performs other baselines with comparable
    base architectures on HMDB51, UCF101, and Charades video classification
    benchmarks.
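
    A hedged NumPy sketch of the VLAD-style aggregation underlying ActionVLAD: local descriptors gathered across space and time are assigned to anchor centers and their residuals are summed per center, then normalized. ActionVLAD uses soft, learnable assignments trained end-to-end; the hard assignment below is a simplification.

        import numpy as np

        def vlad_aggregate(descriptors, centers):
            """descriptors: (T*H*W, d) local features; centers: (K, d) anchors."""
            d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(axis=1)                       # hard nearest-center assignment
            K, d = centers.shape
            vlad = np.zeros((K, d))
            for k in range(K):
                if np.any(assign == k):
                    vlad[k] = (descriptors[assign == k] - centers[k]).sum(axis=0)
            vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12   # intra-normalization
            v = vlad.ravel()
            return v / (np.linalg.norm(v) + 1e-12)           # final L2 normalization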

    Learning Human Motion Models for Long-term Predictions

    Partha Ghosh, Jie Song, Emre Aksan, Otmar Hilliges
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We propose a new architecture for the learning of predictive spatio-temporal
    motion models from data alone. Our approach, dubbed the Dropout Autoencoder
    LSTM, is capable of synthesizing natural looking motion sequences over long
    time horizons without catastrophic drift or motion degradation. The model
    consists of two components, a 3-layer recurrent neural network to model
    temporal aspects and a novel auto-encoder that is trained to implicitly recover
    the spatial structure of the human skeleton via randomly removing information
    about joints during training time. This Dropout Autoencoder (D-AE) is then used
    to filter each predicted pose of the LSTM, reducing accumulation of error and
    hence drift over time. Furthermore, we propose new evaluation protocols to
    assess the quality of synthetic motion sequences even when no ground truth
    data exists. The proposed protocols can be used to assess generated sequences
    of arbitrary length. Finally, we evaluate our proposed method on two of the
    largest motion-capture datasets available to date and show that our model
    outperforms the state-of-the-art on a variety of actions, including cyclic and
    acyclic motion, and that it can produce natural looking sequences over longer
    time horizons than previous methods.

    R-Clustering for Egocentric Video Segmentation

    Estefania Talavera, Mariella Dimiccoli, Marc Bolaños, Maedeh Aghaei, Petia Radeva
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we present a new method for egocentric video temporal
    segmentation based on integrating a statistical mean change detector and
    agglomerative clustering (AC) within an energy-minimization framework. Given the
    tendency of most AC methods to oversegment video sequences when clustering
    their frames, we combine the clustering with a concept drift detection
    technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as
    a statistical upper bound for the clustering-based video segmentation. We
    integrate both techniques in an energy-minimization framework that serves to
    disambiguate the decision of both techniques and to complete the segmentation
    taking into account the temporal continuity of video frame descriptors. We
    present experiments over egocentric sets of more than 13,000 images acquired
    with different wearable cameras, showing that our method outperforms
    state-of-the-art clustering methods.

    Fine-grained Image Classification via Combining Vision and Language

    Xiangteng He, Yuxin Peng
    Comments: 9 pages, to appear in CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Fine-grained image classification is a challenging task due to the large
    intra-class variance and small inter-class variance, aiming at recognizing
    hundreds of sub-categories belonging to the same basic-level category. Most
    existing fine-grained image classification methods generally learn part
    detection models to obtain the semantic parts for better classification
    accuracy. Despite achieving promising results, these methods mainly have two
    limitations: (1) not all of the parts obtained through the part detection
    models are beneficial and indispensable for classification, and (2)
    fine-grained image classification requires more detailed visual descriptions
    which could not be provided by the part locations or attribute annotations. For
    addressing the above two limitations, this paper proposes a two-stream model
    combining vision and language (CVL) for learning latent semantic representations.
    The vision stream learns deep representations from the original visual
    information via deep convolutional neural network. The language stream utilizes
    the natural language descriptions which could point out the discriminative
    parts or characteristics for each image, and provides a flexible and compact
    way of encoding the salient visual aspects for distinguishing sub-categories.
    Since the two streams are complementary, combining them further improves
    classification accuracy. Compared with 12 state-of-the-art
    methods on the widely used CUB-200-2011 dataset for fine-grained image
    classification, the experimental results demonstrate that our CVL approach achieves
    the best performance.

    Deep Affordance-grounded Sensorimotor Object Recognition

    Spyridon Thermos, Georgios Th. Papadopoulos, Petros Daras, Gerasimos Potamianos
    Comments: 9 pages, 7 figures, dataset link included, accepted to CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    It is well-established by cognitive neuroscience that human perception of
    objects constitutes a complex process, where object appearance information is
    combined with evidence about the so-called object “affordances”, namely the
    types of actions that humans typically perform when interacting with them. This
    fact has recently motivated the “sensorimotor” approach to the challenging task
    of automatic object recognition, where both information sources are fused to
    improve robustness. In this work, the aforementioned paradigm is adopted,
    surpassing current limitations of sensorimotor object recognition research.
    Specifically, the deep learning paradigm is introduced to the problem for the
    first time, developing a number of novel neuro-biologically and
    neuro-physiologically inspired architectures that utilize state-of-the-art
    neural networks for fusing the available information sources in multiple ways.
    The proposed methods are evaluated using a large RGB-D corpus, which is
    specifically collected for the task of sensorimotor object recognition and is
    made publicly available. Experimental results demonstrate the utility of
    affordance information to object recognition, achieving an up to 29% relative
    error reduction by its inclusion.

    Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking

    Laura Leal-Taixé, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Standardized benchmarks are crucial for the majority of computer vision
    applications. Although leaderboards and ranking tables should not be
    over-claimed, benchmarks often provide the most objective measure of
    performance and are therefore important guides for research. We present a
    benchmark for Multiple Object Tracking launched in late 2014, with the goal
    of creating a framework for the standardized evaluation of multiple object
    tracking methods. This paper collects the two releases of the benchmark made so
    far, and provides an in-depth analysis of almost 50 state-of-the-art trackers
    that were tested on over 11000 frames. We show the current trends and
    weaknesses of multiple people tracking methods, and provide pointers to what
    researchers should be focusing on to push the field forward.

    Detail-revealing Deep Video Super-resolution

    Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia
    Comments: 9 pages, submitted to conference
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Previous CNN-based video super-resolution approaches need to align multiple
    frames to the reference. In this paper, we show that proper frame alignment and
    motion compensation is crucial for achieving high quality results. We
    accordingly propose a ‘sub-pixel motion compensation’ (SPMC) layer in a CNN
    framework. Analysis and experiments show the suitability of this layer in video
    SR. The final end-to-end, scalable CNN framework effectively incorporates the
    SPMC layer and fuses multiple frames to reveal image details. Our
    implementation can generate visually and quantitatively high-quality results,
    superior to the current state of the art, without the need for parameter tuning.

    DeepPermNet: Visual Permutation Learning

    Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
    Comments: Accepted in IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We present a principled approach to uncover the structure of visual data by
    solving a novel deep learning task coined visual permutation learning. The goal
    of this task is to find the permutation that recovers the structure of data
    from shuffled versions of it. In the case of natural images, this task boils
    down to recovering the original image from patches shuffled by an unknown
    permutation matrix. Unfortunately, permutation matrices are discrete, thereby
    posing difficulties for gradient-based methods. To this end, we resort to a
    continuous approximation of these matrices using doubly-stochastic matrices
    which we generate from standard CNN predictions using Sinkhorn iterations.
    Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet,
    an end-to-end CNN model for this task. The utility of DeepPermNet is
    demonstrated on two challenging computer vision problems, namely, (i) relative
    attributes learning and (ii) self-supervised representation learning. Our
    results show state-of-the-art performance on the Public Figures and OSR
    benchmarks for (i) and on the classification and segmentation tasks on the
    PASCAL VOC dataset for (ii).
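
    A minimal sketch of Sinkhorn normalization, the operation unrolled in the Sinkhorn layer: alternately normalizing rows and columns of a positive matrix drives it toward a doubly-stochastic matrix, a continuous relaxation of a permutation matrix.

        import numpy as np

        def sinkhorn(logits, n_iters=20):
            """Turn an arbitrary square score matrix into an (approximately)
            doubly-stochastic matrix by alternating row/column normalization."""
            M = np.exp(logits - logits.max())                # positive matrix
            for _ in range(n_iters):
                M /= M.sum(axis=1, keepdims=True)            # rows sum to 1
                M /= M.sum(axis=0, keepdims=True)            # columns sum to 1
            return M

        P = sinkhorn(np.random.randn(4, 4))
        print(P.sum(axis=0), P.sum(axis=1))                  # both approximately all ones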

    Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation

    Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)

    Many modern computer vision and machine learning applications rely on solving
    difficult optimization problems that involve non-differentiable objective
    functions and constraints. The alternating direction method of multipliers
    (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
    generalization of ADMM that often achieves better performance, but its
    efficiency depends strongly on algorithm parameters that must be chosen by an
    expert user. We propose an adaptive method that automatically tunes the key
    algorithm parameters to achieve optimal performance without user oversight.
    Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
    (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
    detailed convergence analysis of ARADMM is provided, and numerical results on
    several applications demonstrate fast practical convergence.
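
    For readers unfamiliar with relaxed ADMM, here is a hedged sketch on a lasso problem (min 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z): the relaxation parameter alpha blends the new x with the previous z before the z- and dual updates. ARADMM's contribution, the automatic tuning of rho and alpha, is omitted here.

        import numpy as np

        def relaxed_admm_lasso(A, b, lam, rho=1.0, alpha=1.5, n_iters=200):
            """Relaxed ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
            n = A.shape[1]
            x = z = u = np.zeros(n)
            AtA, Atb = A.T @ A, A.T @ b
            for _ in range(n_iters):
                x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * (z - u))
                x_hat = alpha * x + (1 - alpha) * z                  # over-relaxation step
                z = np.sign(x_hat + u) * np.maximum(np.abs(x_hat + u) - lam / rho, 0.0)
                u = u + x_hat - z                                    # dual update
            return z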

    Automatic Liver Lesion Detection using Cascaded Deep Residual Networks

    Lei Bi, Jinman Kim, Ashnil Kumar, Dagan Feng
    Comments: Submission for 2017 ISBI LiTS Challenge
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automatic segmentation of liver lesions is a fundamental requirement towards
    the creation of computer aided diagnosis (CAD) and decision support systems
    (CDS). Traditional segmentation approaches depend heavily upon hand-crafted
    features and a priori knowledge of the user. As such, these methods are
    difficult to adopt within a clinical environment. Recently, deep learning
    methods based on fully convolutional networks (FCNs) have been successful in
    many segmentation problems primarily because they leverage a large labelled
    dataset to hierarchically learn the features that best correspond to the
    shallow visual appearance as well as the deep semantics of the areas to be
    segmented. However, FCNs based on a 16 layer VGGNet architecture have limited
    capacity to add additional layers. Therefore, it is challenging to learn more
    discriminative features among different classes for FCNs. In this study, we
    overcome these limitations using deep residual networks (ResNet) to segment
    liver lesions. ResNets contain skip connections between convolutional layers,
    which solve the problem of training-accuracy degradation in
    very deep networks and thereby enable the use of additional layers for
    learning more discriminative features. In addition, we achieve more precise
    boundary definitions through a novel cascaded ResNet architecture with
    multi-scale fusion to gradually learn and infer the boundaries of both the
    liver and the liver lesions. Our proposed method achieved 4th place in the ISBI
    2017 Liver Tumor Segmentation Challenge by the submission deadline.

    Fully Convolutional Deep Neural Networks for Persistent Multi-Frame Multi-Object Detection in Wide Area Aerial Videos

    Rodney LaLonde, Dong Zhang, Mubarak Shah
    Comments: Under review at the International Conference on Computer Vision (ICCV), 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Multiple object detection in wide area aerial videos has drawn the attention
    of the computer vision research community for a number of years. A novel
    framework is proposed in this paper using a fully convolutional deep neural
    network, which is able to detect all objects simultaneously for a given region
    of interest. The network is designed to accept multiple video frames at a time
    as the input and yields detection results for all objects in the temporally
    centered frame. This multi-frame approach yields far better results than its
    single frame counterpart. Additionally, the proposed method can detect vehicles
    which are slowing, stopped, and/or partially or fully occluded during some
    frames, which cannot be handled by nearly all state-of-the-art methods. To the
    best of our knowledge, this is the first use of a multiple-frame, fully
    convolutional deep model for detecting multiple small objects and the only
    framework which can detect stopped and temporarily occluded vehicles, for
    aerial videos. The proposed network exceeds state-of-the-art results
    significantly on the WPAFB 2009 dataset.

    Learning Important Features Through Propagating Activation Differences

    Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
    Comments: 9 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The purported “black box” nature of neural networks is a barrier to adoption
    in applications where interpretability is essential. Here we present DeepLIFT
    (Deep Learning Important FeaTures), a method for decomposing the output
    prediction of a neural network on a specific input by backpropagating the
    contributions of all neurons in the network to every feature of the input.
    DeepLIFT compares the activation of each neuron to its ‘reference activation’
    and assigns contribution scores according to the difference. By optionally
    giving separate consideration to positive and negative contributions, DeepLIFT
    can also reveal dependencies which are missed by other approaches. Scores can
    be computed efficiently in a single backward pass. We apply DeepLIFT to models
    trained on MNIST and simulated genomic data, and show significant advantages
    over gradient-based methods. A detailed video tutorial on the method is at
    this http URL and code is at this http URL

    Quaternion Based Camera Pose Estimation From Matched Feature Points

    Kaveh Fathian, J. Pablo Ramirez-Paredes, Emily A. Doucette, J. Willard Curtis, Nicholas R. Gans
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

    We present a novel solution to the camera pose estimation problem, where
    rotation and translation of a camera between two views are estimated from
    matched feature points in the images. The camera pose estimation problem is
    traditionally solved via algorithms that are based on the essential matrix or
    the Euclidean homography. With six or more feature points in general positions
    in the space, essential matrix based algorithms can recover a unique solution.
    However, such algorithms fail when points are on critical surfaces (e.g.,
    coplanar points) and homography should be used instead. By formulating the
    problem in quaternions and decoupling the rotation and translation estimation,
    our proposed algorithm works for all point configurations. Using both simulated
    and real world images, we compare the estimation accuracy of our algorithm with
    some of the most commonly used algorithms. Our method is shown to be more
    robust to noise and outliers. For the benefit of the community, we have made
    the implementation of our algorithm freely available online.

    BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis

    Shanxin Yuan, Qi Ye, Bjorn Stenger, Siddhand Jain, Tae-Kyun Kim
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper we introduce a large-scale hand pose dataset, collected using a
    novel capture method. Existing datasets are either generated synthetically or
    captured using depth sensors: synthetic datasets exhibit a certain level of
    appearance difference from real depth images, and real datasets are limited in
    quantity and coverage, mainly due to the difficulty of annotating them. We
    propose a tracking system with six 6D magnetic sensors and inverse kinematics
    to automatically obtain 21-joint hand pose annotations of depth maps captured
    with minimal restriction on the range of motion. The capture protocol aims to
    fully cover the natural hand pose space. As shown in embedding plots, the new
    dataset exhibits a significantly wider and denser range of hand poses compared
    to existing benchmarks. Current state-of-the-art methods are evaluated on the
    dataset, and we demonstrate significant improvements in cross-benchmark
    performance. We also show significant improvements in egocentric hand pose
    estimation with a CNN trained on the new dataset.

    Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

    Hongsong Wang, Liang Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Recently, skeleton based action recognition gains more popularity due to
    cost-effective depth sensors coupled with real-time skeleton estimation
    algorithms. Traditional approaches based on handcrafted features are limited to
    represent the complexity of motion patterns. Recent methods that use Recurrent
    Neural Networks (RNN) to handle raw skeletons only focus on the contextual
    dependency in the temporal domain and neglect the spatial configurations of
    articulated skeletons. In this paper, we propose a novel two-stream RNN
    architecture to model both temporal dynamics and spatial configurations for
    skeleton based action recognition. We explore two different structures for the
    temporal stream: stacked RNN and hierarchical RNN. Hierarchical RNN is designed
    according to human body kinematics. We also propose two effective methods to
    model the spatial structure by converting the spatial graph into a sequence of
    joints. To improve generalization of our model, we further exploit 3D
    transformation based data augmentation techniques including rotation and
    scaling transformation to transform the 3D coordinates of skeletons during
    training. Experiments on 3D action recognition benchmark datasets show that our
    method brings a considerable improvement for a variety of actions, i.e.,
    generic actions, interaction activities and gestures.

    Motion Saliency Based Automatic Delineation of Glottis Contour in High-speed Digital Images

    Xin Chen, Emma Marriott, Yuling Yan
    Comments: 4 pages 2 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In recent years, high-speed videoendoscopy (HSV) has significantly aided the
    diagnosis of voice pathologies and furthered the understanding of voice
    production. As the first step of these studies, automatic
    segmentation of glottal images still presents a major challenge for this
    technique. In this paper, we propose an improved Saliency Network that
    automatically delineates the contour of the glottis from HSV image sequences.
    Our proposed additional saliency measure, Motion Saliency (MS), improves upon
    the original Saliency Network by using the velocities of defined edges. In our
    results and analysis, we demonstrate the effectiveness of our approach and
    discuss its potential applications for computer-aided assessment of voice
    pathologies and understanding voice production.

    Deep Generative Adversarial Compression Artifact Removal

    Leonardo Galteri, Lorenzo Seidenari, Marco Bertini, Alberto Del Bimbo
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Compression artifacts arise in images whenever a lossy compression algorithm
    is applied. These artifacts eliminate details present in the original image, or
    add noise and small structures; because of these effects they make images less
    pleasant for the human eye, and may also lead to decreased performance of
    computer vision algorithms such as object detectors. To eliminate such
    artifacts, when decompressing an image, it is required to recover the original
    image from a disturbed version. To this end, we present a feed-forward fully
    convolutional residual network model that directly optimizes the Structural
    Similarity (SSIM), which is a better loss with respect to the simpler Mean
    Squared Error (MSE). We then build on the same architecture to re-formulate the
    problem in a generative adversarial framework. Our GAN is able to produce
    images with more photorealistic details than MSE or SSIM based networks.
    Moreover we show that our approach can be used as a pre-processing step for
    object detection when images are degraded by compression to the point that
    state-of-the-art detectors fail. In this task, our GAN method obtains better
    performance than MSE or SSIM trained networks.
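
    A hedged sketch of a single-window SSIM that could serve as a 1 - SSIM training loss; the network in the paper uses a differentiable, locally windowed SSIM inside the training graph, which this global simplification omits.

        import numpy as np

        def ssim_global(x, y, data_range=1.0):
            """Single-window SSIM between two images with values in [0, data_range]."""
            c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
            mx, my = x.mean(), y.mean()
            vx, vy = x.var(), y.var()
            cov = ((x - mx) * (y - my)).mean()
            return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

        def ssim_loss(x, y):
            return 1.0 - ssim_global(x, y)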

    An Empirical Evaluation of Visual Question Answering for Novel Objects

    Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal
    Comments: 11 pages, 4 figures, accepted in CVPR 2017 (poster)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    We study the problem of answering questions about images in the harder
    setting, where the test questions and corresponding images contain novel
    objects, which were not queried about in the training data. Such a setting is
    inevitable in the real world: owing to the heavy-tailed distribution of visual
    categories, some objects will not be annotated in the
    training set. We show that the performance of two popular existing methods drops
    significantly (up to 28%) when evaluated on novel objects cf. known objects. We
    propose methods which use large existing external corpora of (i) unlabeled
    text, i.e. books, and (ii) images tagged with classes, to achieve novel object
    based visual question answering. We do systematic empirical studies, for both
    an oracle case where the novel objects are known textually, as well as a fully
    automatic case without any explicit knowledge of the novel objects, but with
    the minimal assumption that the novel objects are semantically related to the
    existing objects in training. The proposed methods for novel object based
    visual question answering are modular and can potentially be used with many
    visual question answering architectures. We show consistent improvements with
    the two popular architectures and give qualitative analysis of the cases where
    the model does well and of those where it fails to bring improvements.

    DualGAN: Unsupervised Dual Learning for Image-to-Image Translation

    Zili Yi, Hao Zhang, Ping Tan, Minglun Gong
    Comments: First submitted to ICCV on Mar 9, 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Using conditional Generative Adversarial Network (conditional GAN) for
    cross-domain image-to-image translation has achieved significant improvements
    in the past year. Depending on the degree of task complexity, thousands or even
    millions of labeled image pairs are needed to train conditional GANs. However,
    human labeling is very expensive and sometimes impractical. Inspired by the
    success of dual learning paradigm in natural language translation, we develop a
    novel dual-GAN mechanism, which enables image translators to be trained from
    two sets of unlabeled images each representing a domain. In our architecture,
    the primal GAN learns to translate images from domain (U) to those in domain
    (V), while the dual GAN learns to convert images from (V) to (U). The closed
    loop made by the primal and dual tasks allows images from either domain to be
    translated and then reconstructed. Hence a loss function that accounts for the
    reconstruction error of images can be used to train the translation models.
    Experiments on multiple image translation tasks with unlabeled data show
    considerable performance gain of our dual-GAN architecture over a single GAN.
    For some tasks, our model can even achieve comparable or slightly better
    results to conditional GAN trained on fully labeled data.
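
    A hedged sketch of the closed-loop reconstruction loss: images are translated U -> V -> U and V -> U -> V, and the reconstruction error against the originals trains both translators. G_uv and G_vu are placeholder names for the two generators; any array-in/array-out functions can be plugged in.

        import numpy as np

        def cycle_reconstruction_loss(u_batch, v_batch, G_uv, G_vu):
            """G_uv, G_vu: placeholder translator functions (arrays in, arrays out)."""
            u_rec = G_vu(G_uv(u_batch))                # U -> V -> U
            v_rec = G_uv(G_vu(v_batch))                # V -> U -> V
            return np.abs(u_batch - u_rec).mean() + np.abs(v_batch - v_rec).mean()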

    Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification

    Lu Tian, Shengjin Wang
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Person re-identification is generally divided into two parts: first, how to
    represent a pedestrian by discriminative visual descriptors, and second, how to
    compare them by suitable distance metrics. Conventional methods treat these
    two parts in isolation, with the first part usually unsupervised and the second
    supervised. The Bag-of-Words (BoW) model is a widely used image representation
    in the first part. Its codebook is simply generated by clustering visual features
    in Euclidean space. In this paper, we propose to use the metric learning
    techniques of the second part in the codebook generation phase of BoW. In particular,
    the proposed codebook is clustered under a Mahalanobis distance which is learned in a supervised manner.
    Extensive experiments prove that our proposed method is effective. With several
    low level features extracted on superpixel and fused together, our method
    outperforms state-of-the-art on person re-identification benchmarks including
    VIPeR, PRID450S, and Market1501.
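
    A hedged NumPy sketch of the codebook idea: k-means-style clustering under a learned Mahalanobis metric d(x, c)^2 = (x - c)^T M (x - c) instead of the Euclidean distance; the metric matrix M is assumed to have been learned beforehand by a supervised metric-learning step, which is not shown.

        import numpy as np

        def mahalanobis_kmeans(X, M, k=16, n_iters=20, seed=0):
            """Codebook clustering under d(x, c)^2 = (x - c)^T M (x - c);
            M is a previously learned (positive semi-definite) metric matrix."""
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
            labels = np.zeros(len(X), dtype=int)
            for _ in range(n_iters):
                diffs = X[:, None, :] - centers[None, :, :]           # (N, k, d)
                d2 = np.einsum('nkd,de,nke->nk', diffs, M, diffs)     # squared Mahalanobis distances
                labels = d2.argmin(axis=1)
                for j in range(k):
                    if np.any(labels == j):
                        centers[j] = X[labels == j].mean(axis=0)
            return centers, labels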

    DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks

    Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, Luc Van Gool
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Despite a rapid rise in the quality of built-in smartphone cameras, their
    physical limitations (small sensor size, compact lenses and the lack of
    specific hardware) impede them from achieving the quality of DSLR
    cameras. In this work we present an end-to-end deep learning approach that
    bridges this gap by translating ordinary photos into DSLR-produced images. We
    propose learning the translation function using a residual convolutional neural
    network that improves both color rendition and image sharpness. Since the
    standard mean squared loss is not well suited for measuring perceptual image
    quality, we introduce a composite perceptual error function that combines
    content, color and texture losses. The first two losses are defined
    analytically, while the texture loss is learned using an adversarial network.
    We also present a large-scale dataset that consists of real photos captured
    from three different phones and one high-end reflex camera. Our quantitative
    and qualitative assessments reveal that the enhanced images demonstrate
    quality comparable to that of DSLR-taken photos, while the method itself can be
    applied to any type of digital camera.

    First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations

    Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, Tae-Kyun Kim
    Comments: Dataset can be visualized here: this https URL
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this work we study the use of 3D hand poses to recognize first-person hand
    actions interacting with 3D objects. Towards this goal, we collected RGB-D
    video sequences of more than 100K frames of 45 daily hand action categories,
    involving 25 different objects in several hand grasp configurations. To obtain
    high quality hand pose annotations from real sequences, we used our own mo-cap
    system that automatically infers the location of each of the 21 joints of the
    hand via 6 magnetic sensors on the finger tips and the inverse-kinematics of a
    hand model. To the best of our knowledge, this is the first benchmark for RGB-D
    hand action sequences with 3D hand poses. Additionally, we recorded the 6D
    (i.e. 3D rotations and locations) object poses and provide 3D object models for
    a subset of hand-object interaction sequences. We present extensive
    experimental evaluations of RGB-D and pose-based action recognition by 18
    baselines/state-of-the-art. The impact of using appearance features, poses and
    their combinations are measured, and the different training/testing protocols
    including cross-persons are evaluated. Finally, we assess how ready the current
    hand pose estimation is when hands are severely occluded by objects in
    egocentric views and its influence on action recognition. From the results, we
    see clear benefits of using hand pose as a cue for action recognition compared
    to other data modalities. Our dataset and experiments can be of interest to
    communities of 6D object pose, robotics, and 3D hand pose estimation as well as
    action recognition.

    A New Pseudo-color Technique Based on Intensity Information Protection for Passive Sensor Imagery

    Mohammad Reza Khosravi, Habib Rostami, Gholam Reza Ahmadi, Suleiman Mansouri, Ahmad Keshavarz
    Journal-ref: International Journal of Electronics Communication and Computer
    Engineering, vol. 6, no. 3, pp. 324-329 (2015)
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Remote sensing image processing is very important in the geosciences. Images
    obtained by different types of sensors might initially be unrecognizable.
    To produce an acceptable visual perception of the images, some pre-processing
    steps (e.g., for removing noise) are performed, which affect the subsequent
    analysis of the images. There are different types of processing according to the
    types of remote sensing images. The method that we are going to introduce in
    this paper is to use virtual colors to colorize the gray-scale images of
    satellite sensors. This approach helps us to have a better analysis on a sample
    single-band image which has been taken by Landsat-8 (OLI) sensor (as a
    multi-band sensor with natural color bands, its images’ natural color can be
    compared to the synthetic color produced by our approach). A useful feature of
    this method is that the original image is recoverable, which preserves the
    suitable resolution of the output images.

    Coupled Deep Learning for Heterogeneous Face Recognition

    Xiang Wu, Lingxiao Song, Ran He, Tieniu Tan
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Heterogeneous face matching is a challenging issue in face recognition due to
    large domain difference as well as insufficient pairwise images in different
    modalities during training. This paper proposes a coupled deep learning (CDL)
    approach for the heterogeneous face matching. CDL seeks a shared feature space
    in which the heterogeneous face matching problem can be approximately treated
    as a homogeneous face matching problem. The objective function of CDL mainly
    includes two parts. The first part contains a trace norm as a relevance
    constraint, which makes unpaired images from multiple modalities be clustered
    and correlated. An approximate variational formulation is introduced to deal
    with the difficulties of optimizing low-rank constraint directly. The second
    part contains a cross modal ranking among triplet domain specific images to
    maximize the margin for different identities and increase data for a small
    amount of training samples. Besides, an alternating minimization method is
    employed to iteratively update the parameters of CDL. Experimental results show
    that CDL achieves better performance on the challenging CASIA NIR-VIS 2.0 face
    recognition database, the IIIT-D Sketch database, the CUHK Face Sketch (CUFS),
    and the CUHK Face Sketch FERET (CUFSF), which significantly outperforms
    state-of-the-art heterogeneous face recognition methods.

    Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild

    Xingyi Zhou, Qixing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei
    Comments: Submitted to ICCV 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    In this paper, we study the task of 3D human pose estimation in the wild.
    This task is challenging because existing benchmark datasets provide either 2D
    annotations in the wild or 3D annotations in controlled environments.

    We propose a weakly-supervised transfer learning method that learns an
    end-to-end network using training data with mixed 2D and 3D labels. The network
    augments a state-of-the-art 2D pose estimation network with a 3D depth
    regression network. Unlike previous approaches that train these two
    sub-networks in a sequential manner, we introduce a unified training method
    that fully exploits the correlation between these two sub-tasks and learns
    common feature representations. In doing so, the 3D pose labels in controlled
    environments are transferred to images in the wild that only possess 2D
    annotations. In addition, we introduce a 3D geometric constraint to regularize
    the predicted 3D poses, which is effective on images that only have 2D
    annotations.

    Our method leads to considerable performance gains and achieves competitive
    results on both 2D and 3D benchmarks. It produces high quality 3D human poses
    in the wild, without supervision of in-the-wild 3D data.

    Seismic facies recognition based on prestack data using deep convolutional autoencoder

    Feng Qian, Miao Yin, Ming-Jun Su, Yaojun Wang, Guangmin Hu
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Prestack seismic data carries much useful information that can help us find
    more complex atypical reservoirs. Therefore, we are increasingly inclined to
    use prestack seismic data for seismic facies recognition. However, due to the
    inclusion of excessive redundancy, effective feature extraction from prestack
    seismic data becomes critical. In this paper, we consider seismic facies
    recognition based on prestack data as an image clustering problem in computer
    vision (CV) by thinking of each prestack seismic gather as a picture. We
    propose a convolutional autoencoder (CAE) network for deep feature learning
    from prestack seismic data, which is more effective than principal component
    analysis (PCA) at removing redundancy and extracting valid information. Then,
    using conventional classification or clustering techniques (e.g. K-means or
    self-organizing maps) on the extracted features, we can achieve seismic facies
    recognition. We applied our method to prestack data from a physical model and
    the LZB region. The result shows that our approach is superior to
    conventional methods.

    Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

    Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
    Comments: Accepted at CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    This paper presents a novel method for detecting pedestrians under adverse
    illumination conditions. Our approach relies on a novel cross-modality learning
    framework and it is based on two main phases. First, given a multimodal
    dataset, a deep convolutional network is employed to learn a non-linear
    mapping, modeling the relations between RGB and thermal data. Then, the learned
    feature representations are transferred to a second deep network, which
    receives as input an RGB image and outputs the detection results. In this way,
    features which are both discriminative and robust to bad illumination
    conditions are learned. Importantly, at test time, only the second pipeline is
    considered and no thermal data are required. Our extensive evaluation
    demonstrates that the proposed approach outperforms the state-of-the-art on
    the challenging KAIST multispectral pedestrian dataset and it is competitive
    with previous methods on the popular Caltech dataset.

    A Deep Cascade of Convolutional Neural Networks for Dynamic MR Image Reconstruction

    Jo Schlemper, Jose Caballero, Joseph V. Hajnal, Anthony Price, Daniel Rueckert
    Comments: arXiv admin note: substantial text overlap with arXiv:1703.00555
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Inspired by recent advances in deep learning, we propose a framework for
    reconstructing dynamic sequences of 2D cardiac magnetic resonance (MR) images
    from undersampled data using a deep cascade of convolutional neural networks
    (CNNs) to accelerate the data acquisition process. In particular, we address
    the case where data is acquired using aggressive Cartesian undersampling.
    Firstly, we show that when each 2D image frame is reconstructed independently,
    the proposed method outperforms state-of-the-art 2D compressed sensing
    approaches such as dictionary learning-based MR image reconstruction, in terms
    of reconstruction error and reconstruction speed. Secondly, when reconstructing
    the frames of the sequences jointly, we demonstrate that CNNs can learn
    spatio-temporal correlations efficiently by combining convolution and data
    sharing approaches. We show that the proposed method consistently outperforms
    Dictionary Learning with Temporal Gradients (DLTG) and is capable of preserving
    anatomical structure more faithfully up to 11-fold undersampling. Moreover,
    reconstruction is very fast: each complete dynamic sequence can be
    reconstructed in less than 10s and, for the 2D case, each image frame can be
    reconstructed in 23ms, enabling real-time applications.
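
    A core building block of such a cascade is the data-consistency (data
    sharing) step, which re-imposes the acquired k-space samples on the CNN
    output after each subnetwork. Below is a minimal sketch for a single 2D
    frame, assuming noiseless Cartesian undersampling with a binary mask; the
    paper's formulation also covers a noise-weighted combination.

    ```python
    import numpy as np

    def data_consistency(x_cnn, k_acquired, mask):
        """Replace the k-space entries of the CNN output with the acquired
        samples wherever the Cartesian mask is 1 (noiseless case)."""
        k_cnn = np.fft.fft2(x_cnn)
        k_out = np.where(mask, k_acquired, k_cnn)
        return np.fft.ifft2(k_out)

    # toy usage: undersampled acquisition of a random "image"
    img = np.random.rand(128, 128)
    mask = np.random.rand(128, 128) < 0.25       # keep ~25% of k-space
    k_acq = np.fft.fft2(img) * mask
    x_zero_filled = np.fft.ifft2(k_acq).real     # network input
    x_cnn = x_zero_filled                        # stand-in for a CNN denoiser output
    x_dc = data_consistency(x_cnn, k_acq, mask)  # enforce measured samples
    ```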

    GoDP: Globally optimized dual pathway system for facial landmark localization in-the-wild

    Yuhang Wu, Shishir K. Shah, Ioannis A. Kakadiaris
    Comments: Submitted to Image and Vision Computing in Feb. 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Facial landmark localization is a fundamental module for face recognition.
    The current common approach to facial landmark detection is cascaded regression,
    which is composed of two steps: feature extraction and facial shape regression.
    Recent methods employ deep convolutional networks to extract robust features in
    each step and the whole system could be regarded as a deep cascaded regression
    architecture. Unfortunately, this architecture is problematic. First,
    parameters in the networks are optimized from a greedy stage-wise perspective.
    Second, the network cannot efficiently merge landmark coordinate vectors with
    2D convolutional layers. Third, the facial shape regression relies on a feature
    vector generated from the bottom layer of the convolutional neural network,
    which has recently been criticized for lacking spatial resolution to accomplish
    pixel-wise localization tasks. We propose a globally optimized dual-pathway
    system (GoDP) to handle the optimization and precision weaknesses of deep
    cascaded regression without resorting to high-level inference models or complex
    stacked architecture. This end-to-end system relies on distance-aware softmax
    functions and dual-pathway proposal-refinement architecture. The proposed
    system outperforms the state-of-the-art cascaded regression-based methods on
    multiple in-the-wild face alignment databases. Experiments on face
    identification demonstrate that GoDP significantly improves the quality of face
    frontalization in face recognition.

    Learning Where to Look: Data-Driven Viewpoint Set Selection for 3D Scenes

    Kyle Genova, Manolis Savva, Angel X. Chang, Thomas Funkhouser
    Comments: ICCV submission, combined main paper and supplemental material
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The use of rendered images, whether from completely synthetic datasets or
    from 3D reconstructions, is increasingly prevalent in vision tasks. However,
    little attention has been given to how the selection of viewpoints affects the
    performance of rendered training sets. In this paper, we propose a data-driven
    approach to view set selection. Given a set of example images, we extract
    statistics describing their contents and generate a set of views matching the
    distribution of those statistics. Motivated by semantic segmentation tasks, we
    model the spatial distribution of each semantic object category within an image
    view volume. We provide a search algorithm that generates a sampling of likely
    candidate views according to the example distribution, and a set selection
    algorithm that chooses a subset of the candidates that jointly cover the
    example distribution. Results of experiments with these algorithms on SUNCG
    indicate that they are indeed able to produce view distributions similar to an
    example set from NYUDv2 according to the earth mover’s distance. Furthermore,
    the selected views improve performance on semantic segmentation compared to
    alternative view selection algorithms.

    Pixelwise Instance Segmentation with a Dynamically Instantiated Network

    Anurag Arnab, Philip H. S. Torr
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Semantic segmentation and object detection research have recently achieved
    rapid progress. However, the former task has no notion of different instances
    of the same object, and the latter operates at a coarse, bounding-box level. We
    propose an Instance Segmentation system that produces a segmentation map where
    each pixel is assigned an object class and instance identity label. Most
    approaches adapt object detectors to produce segments instead of boxes. In
    contrast, our method is based on an initial semantic segmentation module, which
    feeds into an instance subnetwork. This subnetwork uses the initial
    category-level segmentation, along with cues from the output of an object
    detector, within an end-to-end CRF to predict instances. This part of our model
    is dynamically instantiated to produce a variable number of instances per
    image. Our end-to-end approach requires no post-processing and considers the
    image holistically, instead of processing independent proposals. Therefore,
    unlike some related work, a pixel cannot belong to multiple instances.
    Furthermore, far more precise segmentations are achieved, as shown by our
    state-of-the-art results (particularly at high IoU thresholds) on the Pascal
    VOC and Cityscapes datasets.

    Three-Dimensional Segmentation of Vesicular Networks of Fungal Hyphae in Macroscopic Microscopy Image Stacks

    P. Saponaro, W. Treible, A. Kolagunda, S. Rhein, J. Caplan, C. Kambhamettu, R. Wisser
    Comments: This is submitted to ICIP 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    Automating the extraction and quantification of features from
    three-dimensional (3-D) image stacks is a critical task for advancing computer
    vision research. The union of 3-D image acquisition and analysis enables the
    quantification of biological resistance of a plant tissue to fungal infection
    through the analysis of attributes such as fungal penetration depth, fungal
    mass, and branching of the fungal network of connected cells. From an image
    processing perspective, these tasks reduce to segmentation of vessel-like
    structures and the extraction of features from their skeletonization. In order
    to sample multiple infection events for analysis, we have developed an approach
    we refer to as macroscopic microscopy. However, macroscopic microscopy produces
    high-resolution image stacks that pose challenges to routine approaches and are
    difficult for a human to annotate to obtain ground truth data. We present a
    synthetic hyphal network generator, a comparison of several vessel segmentation
    methods, and a minimum spanning tree method for connecting small gaps resulting
    from imperfections in imaging or incomplete skeletonization of hyphal networks.
    Qualitative results are shown for real microscopic data. We believe the
    comparison of vessel detectors on macroscopic microscopy data, the synthetic
    vessel generator, and the gap closing technique are beneficial to the image
    processing community.
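
    The gap-closing idea can be illustrated with a small sketch: endpoints of
    disconnected skeleton fragments are joined along the shortest edges of a
    Euclidean minimum spanning tree, bridging small breaks caused by imaging
    imperfections or incomplete skeletonization. The endpoint coordinates,
    distance threshold and use of SciPy below are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import minimum_spanning_tree

    # 3-D coordinates of skeleton fragment endpoints (stand-in data)
    endpoints = np.array([[0, 0, 0], [1, 0, 0], [5, 0, 0],
                          [5, 1, 0], [9, 9, 9]], dtype=float)

    dists = cdist(endpoints, endpoints)            # pairwise Euclidean distances
    mst = minimum_spanning_tree(dists).toarray()   # MST over the complete graph

    max_gap = 4.5                                  # only bridge small gaps
    for i in range(len(endpoints)):
        for j in range(len(endpoints)):
            if 0 < mst[i, j] <= max_gap:
                print(f"bridge endpoints {i} and {j} (gap {mst[i, j]:.2f})")
    ```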

    Automated Unsupervised Segmentation of Liver Lesions in CT scans via Cahn-Hilliard Phase Separation

    Jana Lipková, Markus Rempfler, Patrick Christ, John Lowengrub, Bjoern H. Menze
    Subjects: Computer Vision and Pattern Recognition (cs.CV)

    The segmentation of liver lesions is crucial for detection, diagnosis and
    monitoring progression of liver cancer. However, design of accurate automated
    methods remains challenging due to high noise in CT scans, low contrast between
    liver and lesions, as well as large lesion variability. We propose an
    automatic, unsupervised 3D method for liver lesion segmentation using a phase
    separation approach. It is assumed that the liver is a mixture of two phases:
    healthy liver and lesions, represented by different image intensities polluted
    by noise. The Cahn-Hilliard equation is used to remove the noise and separate
    the mixture into two distinct phases with well-defined interfaces. This
    simplifies the lesion detection and segmentation task drastically and allows
    liver lesions to be segmented by thresholding the Cahn-Hilliard solution. The
    method was tested on the 3Dircadb and LITS datasets.
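
    For intuition, the Cahn-Hilliard dynamics ∂c/∂t = ∇²(c³ − c − γ∇²c) can be
    integrated with a simple explicit finite-difference scheme and the result
    thresholded; the 2D grid, γ, time step and iteration count below are
    illustrative assumptions rather than the paper's 3D setting.

    ```python
    import numpy as np

    def laplacian(f):
        # 5-point Laplacian with periodic boundaries (grid spacing 1)
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f)

    rng = np.random.default_rng(0)
    c = 0.1 * rng.standard_normal((128, 128))   # noisy "intensity" field around 0
    gamma, dt = 1.0, 0.01
    for _ in range(2000):                       # Cahn-Hilliard phase separation
        mu = c**3 - c - gamma * laplacian(c)    # chemical potential
        c = c + dt * laplacian(mu)

    lesion_mask = c > 0.0                       # threshold the separated phases
    ```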

    Pay Attention to Those Sets! Learning Quantification from Images

    Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
    Comments: Submitted to Journal Paper, 28 pages, 12 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Major advances have recently been made in merging language and vision
    representations. But most tasks considered so far have confined themselves to
    the processing of objects and lexicalised relations amongst objects (content
    words). We know, however, that humans (even pre-school children) can abstract
    over raw data to perform certain types of higher-level reasoning, expressed in
    natural language by function words. A case in point is given by their ability
    to learn quantifiers, i.e. expressions like ‘few’, ‘some’ and ‘all’. From
    formal semantics and cognitive linguistics, we know that quantifiers are
    relations over sets which, as a simplification, we can see as proportions. For
    instance, in ‘most fish are red’, most encodes the proportion of fish which are
    red fish. In this paper, we study how well current language and vision
    strategies model such relations. We show that state-of-the-art attention
    mechanisms coupled with a traditional linguistic formalisation of quantifiers
    give the best performance on the task. Additionally, we provide insights on the
    role of ‘gist’ representations in quantification. A ‘logical’ strategy to
    tackle the task would be to first obtain a numerosity estimation for the two
    involved sets and then compare their cardinalities. We however argue that
    precisely identifying the composition of the sets is not only beyond current
    state-of-the-art models but perhaps even detrimental to a task that is most
    efficiently performed by refining the approximate numerosity estimator of the
    system.
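
    The proportion-based view of quantifiers can be made concrete with a toy
    mapping from the ratio |target ∩ restrictor| / |restrictor| to a quantifier
    label; the bin boundaries below are illustrative assumptions, not the ones
    used in the paper.

    ```python
    def quantifier(n_target, n_restrictor):
        """Map the proportion of, e.g., red fish among all fish to a quantifier
        label (toy bins, ordered from 'no' to 'all')."""
        p = n_target / n_restrictor
        if p == 0.0:
            return "no"
        if p < 0.3:
            return "few"
        if p < 0.6:
            return "some"
        if p < 1.0:
            return "most"
        return "all"

    print(quantifier(7, 10))   # 'most', as in "most fish are red"
    ```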

    Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

    Dat Tien Nguyen, Firoj Alam, Ferda Ofli, Muhammad Imran
    Comments: Accepted for publication in the 14th International Conference on Information Systems For Crisis Response and Management (ISCRAM), 2017
    Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)

    The extensive use of social media platforms, especially during disasters,
    creates unique opportunities for humanitarian organizations to gain situational
    awareness and launch relief operations accordingly. In addition to the textual
    content, people post overwhelming amounts of imagery data on social networks
    within minutes of a disaster. Studies point to the importance of this
    online imagery content for emergency response. Despite recent advances in the
    computer vision field, automatic processing of crisis-related social media
    imagery data remains a challenging task, because a majority of it
    consists of redundant and irrelevant content. In this paper, we present an
    image processing pipeline that comprises de-duplication and relevancy filtering
    mechanisms to collect and filter social media image content in real-time during
    a crisis event. Results obtained from extensive experiments on real-world
    crisis datasets demonstrate the significance of the proposed pipeline for
    optimal utilization of both human and machine computing resources.
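
    The de-duplication stage of such a pipeline can be sketched with a simple
    perceptual hash: each image is reduced to a small binary fingerprint and
    near-duplicates are dropped when the Hamming distance between fingerprints
    falls below a threshold. The average-hash variant, 8x8 thumbnail and
    threshold used here are illustrative assumptions.

    ```python
    import numpy as np
    from PIL import Image

    def average_hash(path, size=8):
        """64-bit fingerprint: pixels above the mean of an 8x8 grayscale thumbnail."""
        img = Image.open(path).convert("L").resize((size, size))
        pixels = np.asarray(img, dtype=np.float32)
        return (pixels > pixels.mean()).flatten()

    def is_duplicate(hash_a, hash_b, max_hamming=5):
        return int(np.count_nonzero(hash_a != hash_b)) <= max_hamming

    seen = []
    def keep_if_new(path):
        h = average_hash(path)
        if any(is_duplicate(h, s) for s in seen):
            return False          # near-duplicate, filter out
        seen.append(h)
        return True               # novel image, pass on to relevancy filtering
    ```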


    Artificial Intelligence

    Can AIs learn to avoid human interruption?

    El Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx, Alexandre Maurer
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

    Recent progress in artificial intelligence enabled the design and
    implementation of autonomous computing devices, agents, that may interact and
    learn from each other to achieve certain goals. Sometimes however, a human
    operator needs to intervene and interrupt an agent in order to prevent certain
    dangerous situations. Yet, as part of their learning process, agents may link
    these interruptions that impact their reward to specific states, and
    deliberately avoid them. The situation is particularly challenging in a
    distributed context because agents might not only learn from their own past
    interruptions, but also from those of other agents. This paper defines the
    notion of safe interruptibility as a distributed computing problem, and studies
    this notion in the two main learning frameworks: joint action learners and
    independent learners. We give realistic sufficient conditions on the learning
    algorithm for safe interruptibility in the case of joint action learners, yet
    show that these conditions are not sufficient for independent learners. We show
    however that if agents can detect interruptions, it is possible to prune the
    observations to ensure safe interruptibility even for independent learners.
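
    The observation-pruning idea can be illustrated with a toy tabular
    Q-learning loop in which transitions affected by an operator interruption
    are simply excluded from the update; the environment interface and
    hyperparameters below are placeholders, not the paper's formal setting.

    ```python
    import random

    Q = {}                                   # (state, action) -> value
    alpha, gamma, actions = 0.1, 0.9, [0, 1]

    def q(s, a):
        return Q.get((s, a), 0.0)

    def update(s, a, r, s_next, interrupted):
        # Prune experience gathered while the agent was being interrupted,
        # so interruptions cannot bias what the agent learns.
        if interrupted:
            return
        target = r + gamma * max(q(s_next, b) for b in actions)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

    # stand-in environment: step() -> (next_state, reward, interrupted)
    def step(s, a):
        return (s + a) % 5, random.random(), random.random() < 0.1

    s = 0
    for _ in range(1000):
        a = random.choice(actions)
        s_next, r, interrupted = step(s, a)
        update(s, a, r, s_next, interrupted)
        s = s_next
    ```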

    Formal approaches to a definition of agents

    Martin Biehl
    Comments: PhD thesis, 198 pages
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    This thesis contributes to the formalisation of the notion of an agent within
    the class of finite multivariate Markov chains. Agents are seen as entities
    that act, perceive, and are goal-directed.

    We present a new measure that can be used to identify entities (called
    ι-entities), some general requirements for entities in multivariate
    Markov chains, as well as formal definitions of actions and perceptions
    suitable for such entities.

    The intuition behind ι-entities is that entities are spatiotemporal
    patterns for which every part makes every other part more probable. The
    measure, complete local integration (CLI), is formally investigated in general
    Bayesian networks. It is based on the specific local integration (SLI) which is
    measured with respect to a partition. CLI is the minimum value of SLI over all
    partitions. We prove that ι-entities are blocks in specific partitions of
    the global trajectory. These partitions are the finest partitions that achieve
    a given SLI value. We also establish the transformation behaviour of SLI under
    permutations of nodes in the network.

    We go on to present three conditions on general definitions of entities.
    These are not fulfilled by sets of random variables i.e. the perception-action
    loop, which is often used to model agents, is too restrictive. We propose that
    any general entity definition should in effect specify a subset (called an
    entity-set) of the set of all spatiotemporal patterns of a given multivariate
    Markov chain. The set of ι-entities is such a set. Importantly, the
    perception-action loop also induces an entity-set.

    We then propose formal definitions of actions and perceptions for arbitrary
    entity-sets. These specialise to standard notions in case of the
    perception-action loop entity-set.

    Finally we look at some very simple examples.
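
    A schematic rendering of the two measures, under the assumption (to be
    checked against the thesis itself) that specific local integration compares
    the probability of a whole spatiotemporal pattern with the product of the
    probabilities of its blocks under a partition π:

    ```latex
    \[
      \mathrm{SLI}_{\pi}(x) \;=\; \log \frac{p(x)}{\prod_{b \in \pi} p(x_b)},
      \qquad
      \mathrm{CLI}(x) \;=\; \min_{\pi}\, \mathrm{SLI}_{\pi}(x).
    \]
    ```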

    Mixed Graphical Models for Causal Analysis of Multi-modal Variables

    Andrew J Sedgewick, Joseph D. Ramsey, Peter Spirtes, Clark Glymour, Panayiotis V. Benos
    Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    Graphical causal models are an important tool for knowledge discovery because
    they can represent both the causal relations between variables and the
    multivariate probability distributions over the data. Once learned, causal
    graphs can be used for classification, feature selection and hypothesis
    generation, while revealing the underlying causal network structure and thus
    allowing for arbitrary likelihood queries over the data. However, current
    algorithms for learning sparse directed graphs are generally designed to handle
    only one type of data (continuous-only or discrete-only), which limits their
    applicability to a large class of multi-modal biological datasets that include
    mixed type variables. To address this issue, we developed new methods that
    modify and combine existing methods for finding undirected graphs with methods
    for finding directed graphs. These hybrid methods are not only faster, but also
    perform better than the directed graph estimation methods alone for a variety
    of parameter settings and data set sizes. Here, we describe a new conditional
    independence test for learning directed graphs over mixed data types and we
    compare performances of different graph learning strategies on synthetic data.

    Basic Formal Properties of A Relational Model of The Mathematical Theory of Evidence

    Mieczysław A. Kłopotek, Sławomir T. Wierzchoń
    Comments: 23 pages
    Journal-ref: This is the preliminary version of the paper published in
    Demonstratio Mathematica. Vol XXXI No 3,1998, pp. 669-688
    Subjects: Artificial Intelligence (cs.AI)

    The paper presents a novel view of the Dempster-Shafer belief function as a
    measure of diversity in relational databases. It is demonstrated that under
    this interpretation the Dempster rule of evidence combination corresponds to the
    join operator of relational database theory. This rough-set based
    interpretation is qualitative in nature and can represent a number of belief
    function operators.

    The interpretation has the property that, given a definition of the belief
    measure of objects in the interpretation domain, we can perform operations in
    this domain and the measure of the resulting object is derivable from the
    measures of its component objects via a belief operator. We demonstrate this
    property for the Dempster rule of combination, marginalization, Shafer’s
    conditioning, independent variables, and Shenoy’s notion of conditional
    independence of variables.

    The interpretation is based on rough sets (in connection with decision
    tables), but differs from previous interpretations of this type in that it
    counts the diversity rather than frequencies in a decision table.

    Exploring Word Embeddings for Unsupervised Textual User-Generated Content Normalization

    Thales Felipe Costa Bertaglia, Maria das Graças Volpe Nunes
    Comments: Published in Proceedings of the 2nd Workshop on Noisy User-generated Text, 9 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Text normalization techniques based on rules, lexicons or supervised training
    requiring large corpora are neither scalable nor domain-interchangeable, which
    makes them unsuitable for normalizing user-generated content (UGC). Current
    tools available for Brazilian Portuguese make use of such techniques. In this
    work we propose a technique based on distributed representation of words (or
    word embeddings). It generates continuous numeric vectors of
    high-dimensionality to represent words. The vectors explicitly encode many
    linguistic regularities and patterns, as well as syntactic and semantic word
    relationships. Words that share semantic similarity are represented by similar
    vectors. Based on these features, we present a totally unsupervised, expandable
    and language and domain independent method for learning normalization lexicons
    from word embeddings. Our approach obtains a high correction rate for orthographic
    errors and internet slang in product reviews, outperforming the current
    available tools for Brazilian Portuguese.
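
    The lexicon-induction step can be sketched with gensim: a Word2Vec model is
    trained on the noisy corpus and, for each noisy form, the most similar
    in-lexicon word above a similarity threshold is taken as its normalization.
    The toy corpus, lexicon and threshold below are stand-ins, not the paper's
    data or settings.

    ```python
    from gensim.models import Word2Vec

    # stand-in noisy UGC corpus (tokenized sentences) and a reference lexicon
    corpus = [["vc", "eh", "mto", "legal"], ["voce", "eh", "muito", "legal"],
              ["o", "produto", "eh", "otimo"], ["o", "produto", "eh", "ótimo"]]
    lexicon = {"voce", "muito", "legal", "o", "produto", "ótimo"}

    model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=200)

    def build_normalization_lexicon(model, lexicon, threshold=0.3):
        norm = {}
        for word in model.wv.index_to_key:
            if word in lexicon:
                continue                              # already canonical
            for cand, sim in model.wv.most_similar(word, topn=10):
                if cand in lexicon and sim >= threshold:
                    norm[word] = cand                 # noisy form -> canonical form
                    break
        return norm

    print(build_normalization_lexicon(model, lexicon))
    ```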

    Pay Attention to Those Sets! Learning Quantification from Images

    Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
    Comments: Submitted to Journal Paper, 28 pages, 12 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Major advances have recently been made in merging language and vision
    representations. But most tasks considered so far have confined themselves to
    the processing of objects and lexicalised relations amongst objects (content
    words). We know, however, that humans (even pre-school children) can abstract
    over raw data to perform certain types of higher-level reasoning, expressed in
    natural language by function words. A case in point is given by their ability
    to learn quantifiers, i.e. expressions like ‘few’, ‘some’ and ‘all’. From
    formal semantics and cognitive linguistics, we know that quantifiers are
    relations over sets which, as a simplification, we can see as proportions. For
    instance, in ‘most fish are red’, most encodes the proportion of fish which are
    red fish. In this paper, we study how well current language and vision
    strategies model such relations. We show that state-of-the-art attention
    mechanisms coupled with a traditional linguistic formalisation of quantifiers
    give the best performance on the task. Additionally, we provide insights on the
    role of ‘gist’ representations in quantification. A ‘logical’ strategy to
    tackle the task would be to first obtain a numerosity estimation for the two
    involved sets and then compare their cardinalities. We however argue that
    precisely identifying the composition of the sets is not only beyond current
    state-of-the-art models but perhaps even detrimental to a task that is most
    efficiently performed by refining the approximate numerosity estimator of the
    system.

    Multi-Agent Diverse Generative Adversarial Networks

    Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    This paper describes an intuitive generalization to the Generative
    Adversarial Networks (GANs) to generate samples while capturing diverse modes
    of the true data distribution. Firstly, we propose a very simple and intuitive
    multi-agent GAN architecture that incorporates multiple generators capable of
    generating samples from high probability modes. Secondly, in order to enforce
    different generators to generate samples from diverse modes, we propose two
    extensions to the standard GAN objective function. (1) We augment the generator
    specific GAN objective function with a diversity enforcing term that encourages
    different generators to generate diverse samples using a user-defined
    similarity based function. (2) We modify the discriminator objective function
    where along with finding the real and fake samples, the discriminator has to
    predict the generator which generated the given fake sample. Intuitively, in
    order to succeed in this task, the discriminator must learn to push different
    generators towards different identifiable modes. Our framework is generalizable
    in the sense that it can be easily combined with other existing variants of
    GANs to produce diverse samples. Experimentally we show that our framework is
    able to produce high quality diverse samples for the challenging tasks such as
    image/face generation and image-to-image translation. We also show that it is
    capable of learning a better feature representation in an unsupervised setting.
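
    The modified discriminator objective can be sketched as a (K+1)-way
    classification problem: K classes identify which generator produced a fake
    sample and one extra class marks real samples, so the discriminator is
    pushed to separate the generators into distinct modes. Network shapes and
    data below are placeholders, not the architecture used in the paper.

    ```python
    import torch
    import torch.nn as nn

    K, latent, img_dim = 3, 16, 64                    # K generators (assumed sizes)
    generators = [nn.Linear(latent, img_dim) for _ in range(K)]
    disc = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(), nn.Linear(128, K + 1))
    ce = nn.CrossEntropyLoss()

    real = torch.randn(32, img_dim)                   # stand-in real batch
    z = torch.randn(K, 32, latent)
    fakes = [g(z[i]) for i, g in enumerate(generators)]

    # Discriminator loss: real samples get label K, fakes get their generator's id.
    d_inputs = torch.cat([real] + [f.detach() for f in fakes])
    d_labels = torch.cat([torch.full((32,), K)] +
                         [torch.full((32,), i) for i in range(K)]).long()
    d_loss = ce(disc(d_inputs), d_labels)

    # Each generator tries to make its samples look "real" (class K).
    g_loss = sum(ce(disc(f), torch.full((32,), K).long()) for f in fakes)
    ```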

    SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications

    Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
    Journal-ref: SemEval 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We describe the SemEval task of extracting keyphrases and relations between
    them from scientific documents, which is crucial for understanding which
    publications describe which processes, tasks and materials. Although this was a
    new task, we had a total of 26 submissions across 3 evaluation scenarios. We
    expect the task and the findings reported in this paper to be relevant for
    researchers working on understanding scientific content, as well as the broader
    knowledge base population and information extraction communities.

    Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation

    Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)

    Many modern computer vision and machine learning applications rely on solving
    difficult optimization problems that involve non-differentiable objective
    functions and constraints. The alternating direction method of multipliers
    (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
    generalization of ADMM that often achieves better performance, but its
    efficiency depends strongly on algorithm parameters that must be chosen by an
    expert user. We propose an adaptive method that automatically tunes the key
    algorithm parameters to achieve optimal performance without user oversight.
    Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
    (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
    detailed convergence analysis of ARADMM is provided, and numerical results on
    several applications demonstrate fast practical convergence.
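
    For reference, relaxed ADMM for min f(x) + g(z) subject to Ax + Bz = c, with
    penalty ρ and relaxation parameter α (the two quantities that ARADMM tunes
    automatically), takes the standard form below; this is the generic scheme,
    not the paper's adaptive update rule.

    ```latex
    \begin{aligned}
    x^{k+1} &= \arg\min_x \; f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^k - c + u^k\rVert_2^2,\\
    \hat{x}^{k+1} &= \alpha\,A x^{k+1} - (1-\alpha)\,(B z^k - c),\\
    z^{k+1} &= \arg\min_z \; g(z) + \tfrac{\rho}{2}\,\lVert \hat{x}^{k+1} + Bz - c + u^k\rVert_2^2,\\
    u^{k+1} &= u^k + \hat{x}^{k+1} + B z^{k+1} - c.
    \end{aligned}
    ```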

    AppLP: A Dialogue on Applications of Logic Programming

    David S. Warren, Yanhong A. Liu
    Comments: David S. Warren and Yanhong A. Liu (Editors). 33 pages. Including summaries by Christopher Kane and abstracts or position papers by M. Aref, J. Rosenwald, I. Cervesato, E.S.L. Lam, M. Balduccini, J. Lobo, A. Russo, E. Lupu, N. Leone, F. Ricca, G. Gupta, K. Marple, E. Salazar, Z. Chen, A. Sobhi, S. Srirangapalli, C.R. Ramakrishnan, N. Bj{o}rner, N.P. Lopes, A. Rybalchenko, and P. Tarau
    Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)

    This document describes the contributions of the 2016 Applications of Logic
    Programming Workshop (AppLP), which was held on October 17 and associated with
    the International Conference on Logic Programming (ICLP) in Flushing, New York
    City.


    Information Retrieval

    Embedded Collaborative Filtering for "Cold Start" Prediction

    Yubo Zhou, Ali Nadaf
    Subjects: Information Retrieval (cs.IR)

    Using only implicit data, many recommender systems fail in general to provide
    a precise set of recommendations to users with limited interaction history.
    This issue is regarded as the “Cold Start” problem and is typically resolved by
    switching to content-based approaches where extra costly information is
    required. In this paper, we use a dimensionality reduction algorithm, Word2Vec
    (W2V), originally applied in Natural Language Processing problems under the
    framework of Collaborative Filtering (CF) to tackle the “Cold Start” problem
    using only implicit data. This combined method is named Embedded Collaborative
    Filtering (ECF). An experiment is conducted to determine the performance of ECF
    on two different implicit data sets. We show that the ECF approach outperforms
    other popular and state-of-the-art approaches in “Cold Start” scenarios.
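
    The combination can be sketched by treating each user's interaction history
    as a "sentence" of item ids, training Word2Vec on those sequences, and
    recommending the items nearest in embedding space; the toy data,
    dimensionality and the cold-start heuristic of averaging a short history
    are illustrative assumptions.

    ```python
    from gensim.models import Word2Vec

    # implicit feedback: ordered item ids per user (stand-in data)
    histories = [["i1", "i2", "i3"], ["i2", "i3", "i4"], ["i1", "i4", "i5"],
                 ["i3", "i4", "i5"], ["i2", "i5", "i6"]]

    model = Word2Vec(histories, vector_size=32, window=5, min_count=1, sg=1,
                     epochs=100)

    def recommend(short_history, topn=3):
        """Cold-start style recommendation from a very short history: average
        the embeddings of the few seen items and rank nearby items."""
        seen = [i for i in short_history if i in model.wv]
        sims = model.wv.most_similar(positive=seen, topn=topn + len(seen))
        return [item for item, _ in sims if item not in short_history][:topn]

    print(recommend(["i1"]))
    ```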


    Computation and Language

    Exploring Word Embeddings for Unsupervised Textual User-Generated Content Normalization

    Thales Felipe Costa Bertaglia, Maria das Graças Volpe Nunes
    Comments: Published in Proceedings of the 2nd Workshop on Noisy User-generated Text, 9 pages
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

    Text normalization techniques based on rules, lexicons or supervised training
    requiring large corpora are neither scalable nor domain-interchangeable, which
    makes them unsuitable for normalizing user-generated content (UGC). Current
    tools available for Brazilian Portuguese make use of such techniques. In this
    work we propose a technique based on distributed representation of words (or
    word embeddings). It generates continuous numeric vectors of
    high-dimensionality to represent words. The vectors explicitly encode many
    linguistic regularities and patterns, as well as syntactic and semantic word
    relationships. Words that share semantic similarity are represented by similar
    vectors. Based on these features, we present a totally unsupervised, expandable
    and language and domain independent method for learning normalization lexicons
    from word embeddings. Our approach obtains a high correction rate for orthographic
    errors and internet slang in product reviews, outperforming the current
    available tools for Brazilian Portuguese.

    Pay Attention to Those Sets! Learning Quantification from Images

    Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
    Comments: Submitted to Journal Paper, 28 pages, 12 figures, 5 tables
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

    Major advances have recently been made in merging language and vision
    representations. But most tasks considered so far have confined themselves to
    the processing of objects and lexicalised relations amongst objects (content
    words). We know, however, that humans (even pre-school children) can abstract
    over raw data to perform certain types of higher-level reasoning, expressed in
    natural language by function words. A case in point is given by their ability
    to learn quantifiers, i.e. expressions like ‘few’, ‘some’ and ‘all’. From
    formal semantics and cognitive linguistics, we know that quantifiers are
    relations over sets which, as a simplification, we can see as proportions. For
    instance, in ‘most fish are red’, most encodes the proportion of fish which are
    red fish. In this paper, we study how well current language and vision
    strategies model such relations. We show that state-of-the-art attention
    mechanisms coupled with a traditional linguistic formalisation of quantifiers
    give the best performance on the task. Additionally, we provide insights on the
    role of ‘gist’ representations in quantification. A ‘logical’ strategy to
    tackle the task would be to first obtain a numerosity estimation for the two
    involved sets and then compare their cardinalities. We however argue that
    precisely identifying the composition of the sets is not only beyond current
    state-of-the-art models but perhaps even detrimental to a task that is most
    efficiently performed by refining the approximate numerosity estimator of the
    system.

    SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Scientific Publications

    Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum
    Journal-ref: SemEval 2017
    Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

    We describe the SemEval task of extracting keyphrases and relations between
    them from scientific documents, which is crucial for understanding which
    publications describe which processes, tasks and materials. Although this was a
    new task, we had a total of 26 submissions across 3 evaluation scenarios. We
    expect the task and the findings reported in this paper to be relevant for
    researchers working on understanding scientific content, as well as the broader
    knowledge base population and information extraction communities.

    Character-Word LSTM Language Models

    Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq
    Journal-ref: European Chapter of the Association for Computational Linguistics
    (EACL) 2017, Valencia, Spain, pp. 417-427
    Subjects: Computation and Language (cs.CL)

    We present a Character-Word Long Short-Term Memory Language Model which both
    reduces the perplexity with respect to a baseline word-level language model and
    reduces the number of parameters of the model. Character information can reveal
    structural (dis)similarities between words and can even be used when a word is
    out-of-vocabulary, thus improving the modeling of infrequent and unknown words.
    By concatenating word and character embeddings, we achieve up to 2.77% relative
    improvement on English compared to a baseline model with a similar amount of
    parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level
    models with a larger number of parameters.
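
    The concatenation idea can be sketched in a few lines of PyTorch: at each
    position the LSTM receives a word embedding concatenated with the
    embeddings of that word's characters; vocabulary sizes, dimensions and the
    fixed number of characters per word are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    vocab, charset, n_chars = 10_000, 60, 5        # assumed sizes
    d_word, d_char, d_hidden = 200, 25, 300

    word_emb = nn.Embedding(vocab, d_word)
    char_emb = nn.Embedding(charset, d_char)
    lstm = nn.LSTM(d_word + n_chars * d_char, d_hidden, batch_first=True)
    out = nn.Linear(d_hidden, vocab)               # next-word prediction

    words = torch.randint(0, vocab, (8, 35))                # (batch, seq)
    chars = torch.randint(0, charset, (8, 35, n_chars))     # chars of each word

    w = word_emb(words)                                     # (8, 35, d_word)
    c = char_emb(chars).flatten(2)                          # (8, 35, n_chars*d_char)
    x = torch.cat([w, c], dim=-1)                           # concatenated embeddings
    h, _ = lstm(x)
    logits = out(h)                                         # (8, 35, vocab)
    ```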

    Entity Linking for Queries by Searching Wikipedia Sentences

    Chuanqi Tan, Furu Wei, Pengjie Ren, Weifeng Lv, Ming Zhou
    Subjects: Computation and Language (cs.CL)

    We present a simple yet effective approach for linking entities in queries.
    The key idea is to search sentences similar to a query from Wikipedia articles
    and directly use the human-annotated entities in the similar sentences as
    candidate entities for the query. Then, we employ a rich set of features, such
    as link-probability, context-matching, word embeddings, and relatedness among
    candidate entities as well as their related entities, to rank the candidates
    under a regression based framework. The advantages of our approach lie in two
    aspects, which contribute to the ranking process and final linking result.
    First, it can greatly reduce the number of candidate entities by filtering out
    irrelevant entities with the words in the query. Second, we can obtain the
    query sensitive prior probability in addition to the static link-probability
    derived from all Wikipedia articles. We conduct experiments on two benchmark
    datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ
    dataset. Experimental results show that our method outperforms state-of-the-art
    systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ
    dataset.

    Improving Implicit Semantic Role Labeling by Predicting Semantic Frame Arguments

    Quynh Ngoc Thi Do, Steven Bethard, Marie-Francine Moens
    Subjects: Computation and Language (cs.CL)

    We introduce an approach to implicit semantic role labeling (iSRL) based on a
    recurrent neural semantic frame model that learns probability distributions
    over sequences of explicit semantic frame arguments. On the NomBank iSRL test
    set, the approach results in better state-of-the-art performance with much less
    reliance on manually constructed language resources.

    Prosody: The Rhythms and Melodies of Speech

    Dafydd Gibbon
    Comments: 29 pages, 21 figures
    Subjects: Computation and Language (cs.CL)

    The present contribution is a tutorial on selected aspects of prosody, the
    rhythms and melodies of speech, based on a course of the same name at the
    Summer School on Contemporary Phonetics and Phonology at Tongji University,
    Shanghai, China in July 2016. The tutorial is not intended as an introduction
    to experimental methodology or as an overview of the literature on the topic,
    but as an outline of observationally accessible aspects of fundamental
    frequency and timing patterns with the aid of computational visualisation,
    situated in a semiotic framework of sign ranks and interpretations. After an
    informal introduction to the basic concepts of prosody in the introduction and
    a discussion of the place of prosody in the architecture of language, a
    selection of acoustic phonetic topics in phonemic tone and accent prosody, word
    prosody, phrasal prosody and discourse prosody are discussed, and a stylisation
    method for visualising aspects of prosody is introduced. Examples are taken
    from a number of typologically different languages: Anyi/Agni (Niger-Congo>Kwa,
    Ivory Coast), English, Kuki-Thadou (Sino-Tibetan, North-East India and
    Myanmar), Mandarin Chinese, Tem (Niger-Congo>Gur, Togo) and Farsi. The main
    focus is on fundamental frequency patterns, but issues of timing and rhythm are
    also discussed. In the final section, further reading and possible future
    research directions are outlined.

    On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models

    Steffen Eger, Alexander Mehler
    Comments: Published at ACL 2016, Berlin (short papers)
    Subjects: Computation and Language (cs.CL)

    We consider two graph models of semantic change. The first is a time-series
    model that relates embedding vectors from one time period to embedding vectors
    of previous time periods. In the second, we construct one graph for each word:
    nodes in this graph correspond to time points and edge weights to the
    similarity of the word’s meaning across two time points. We apply our two
    models to corpora across three different languages. We find that semantic
    change is linear in two senses. Firstly, today’s embedding vectors (= meaning)
    of words can be derived as linear combinations of embedding vectors of their
    neighbors in previous time periods. Secondly, self-similarity of words decays
    linearly in time. We consider both findings as new laws/hypotheses of semantic
    change.

    A Trolling Hierarchy in Social Media and A Conditional Random Field For Trolling Detection

    Luis Gerardo Mojica
    Subjects: Computation and Language (cs.CL)

    An ever-increasing number of social media websites, electronic newspapers and
    Internet forums allow visitors to leave comments for others to read and
    interact with. This exchange is not free from participants with malicious
    intentions, who do not contribute to the written conversation. Across
    different communities, users adopt strategies to handle such participants. In this
    paper we present a comprehensive categorization of the trolling phenomenon,
    inspired by politeness research, and propose a model that jointly
    predicts four crucial aspects of trolling: intention, interpretation, intention
    disclosure and response strategy. Finally, we present a new annotated dataset
    containing excerpts of conversations involving trolls and the interactions with
    other users that we hope will be a useful resource for the research community.

    Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks

    Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud, Vibha Sinha
    Subjects: Computation and Language (cs.CL)

    One problem that every presenter faces when delivering a public discourse is
    how to hold the listeners’ attention or keep them involved. Therefore, many
    studies in conversation analysis work on this issue and qualitatively suggest
    constructions that can effectively lead to the audience’s applause. To investigate
    these proposals quantitatively, in this study we analyze the transcripts of
    2,135 TED Talks, with a particular focus on the rhetorical devices that are
    used by the presenters for applause elicitation. Through conducting regression
    analysis, we identify and interpret 24 rhetorical devices as triggers of
    audience applauding. We further build models that can recognize
    applause-evoking sentences and conclude this work with potential implications.

    From Modal to Multimodal Ambiguities: a Classification Approach

    Maria Chiara Caschera, Fernando Ferri, Patrizia Grifoni
    Comments: 23 pages
    Journal-ref: JNIT (Journal of Next Generation Information Technology), Volume 4
    Issue 5, July, 2013,Pages 87-109, ISSN 2092-8637. GlobalCIS (Convergence
    Information Society, Republic of Korea)
    Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

    This paper deals with classifying ambiguities for Multimodal Languages. It
    evolves the classifications and the methods of the literature on ambiguities
    for Natural Language and Visual Language, empirically defining an original
    classification of ambiguities for multimodal interaction using a linguistic
    perspective. This classification distinguishes between Semantic and Syntactic
    multimodal ambiguities and their subclasses, which are intercepted using a
    rule-based method implemented in a software module. In the experiments, the
    accuracy of the obtained classification, compared to the expected one defined
    by human judgment, is 94.6% for the semantic ambiguity classes and 92.1% for
    the syntactic ambiguity classes.

    Word Embeddings via Tensor Factorization

    Eric Bailey, Shuchin Aeron
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    Most popular word embedding techniques involve implicit or explicit
    factorization of a word co-occurrence based matrix into low rank factors. In
    this paper, we aim to generalize this trend by using numerical methods to
    factor higher-order word co-occurrence based arrays, or tensors. We
    present four word embeddings using tensor factorization and analyze their
    advantages and disadvantages. One of our main contributions is a novel joint
    symmetric tensor factorization technique related to the idea of coupled tensor
    factorization. We show that embeddings based on tensor factorization can be
    used to discern the various meanings of polysemous words without being
    explicitly trained to do so, and motivate the intuition behind why this works
    in a way that it does not with existing methods. We also modify an existing word
    embedding evaluation metric known as Outlier Detection [Camacho-Collados and
    Navigli, 2016] to evaluate the quality of the order-N relations that a word
    embedding captures, and show that tensor-based methods outperform existing
    matrix-based methods at this task. Experimentally, we show that all of our word
    embeddings either outperform or are competitive with state-of-the-art baselines
    commonly used today on a variety of recent datasets. Suggested applications of
    tensor factorization-based word embeddings are given, and all source code and
    pre-trained vectors are publicly available online.
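
    The basic idea, building a third-order word co-occurrence tensor and
    factoring it into rank-R factors whose rows serve as word embeddings, can be
    sketched with TensorLy; the toy counts, rank and plain (unweighted) CP
    decomposition below are illustrative assumptions rather than the paper's
    joint symmetric factorization.

    ```python
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    vocab = ["cat", "dog", "bank", "river", "money"]
    V, R = len(vocab), 3

    # stand-in third-order co-occurrence counts T[i, j, k] for word triples
    rng = np.random.default_rng(0)
    T = rng.poisson(1.0, size=(V, V, V)).astype(float)
    T = (T + T.transpose(1, 0, 2) + T.transpose(2, 1, 0)) / 3.0  # crude symmetrization

    weights, factors = parafac(tl.tensor(T), rank=R, n_iter_max=200)
    embeddings = factors[0]            # rows act as word vectors (first mode)

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(embeddings[vocab.index("cat")], embeddings[vocab.index("dog")]))
    ```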

    Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

    Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    Voice conversion (VC) using sequence-to-sequence learning of context
    posterior probabilities is proposed. Conventional VC using shared context
    posterior probabilities predicts target speech parameters from the context
    posterior probabilities estimated from the source speech parameters. Although
    conventional VC can be built from non-parallel data, it is difficult to convert
    speaker individuality such as phonetic property and speaking rate contained in
    the posterior probabilities because the source posterior probabilities are
    directly used for predicting target speech parameters. In this work, we assume
    that the training data partly include parallel speech data and propose
    sequence-to-sequence learning between the source and target posterior
    probabilities. The conversion models perform non-linear and variable-length
    transformation from the source probability sequence to the target one. Further,
    we propose a joint training algorithm for the modules. In contrast to
    conventional VC, which separately trains the speech recognition that estimates
    posterior probabilities and the speech synthesis that predicts target speech
    parameters, our proposed method jointly trains these modules along with the
    proposed probability conversion modules. Experimental results demonstrate that
    our approach outperforms the conventional VC.


    Distributed, Parallel, and Cluster Computing

    A Decision Tree Based Approach Towards Adaptive Profiling of Cloud Applications

    Ioannis Giannakopoulos, Dimitrios Tsoumakos, Nectarios Koziris
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Performance (cs.PF)

    Cloud computing has allowed applications to allocate and elastically utilize
    massive amounts of resources of different types, leading to an exponential
    growth of the applications’ configuration space and increased difficulty in
    predicting their performance. In this work, we describe a novel, automated
    profiling methodology that makes no assumptions on application structure. Our
    approach utilizes oblique Decision Trees to recursively partition an
    application’s configuration space into disjoint regions, chooses a set of
    representative samples from each subregion according to a defined policy, and
    returns a model for the entire configuration space as a composition of linear
    models over each subregion. An extensive experimental evaluation over real-life
    applications and synthetic performance functions showcases that our scheme
    outperforms other state-of-the-art profiling methodologies. It particularly
    excels at reflecting abnormalities and discontinuities of the performance
    function, allowing the user to influence the sampling policy based on the
    modeling accuracy, the space coverage and the deployment cost.

    Implementing a Cloud Platform for Autonomous Driving

    Shaoshan Liu, Jie Tang, Chao Wang, Quan Wang, Jean-Luc Gaudiot
    Comments: 8 pages, 12 figures
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Robotics (cs.RO)

    Autonomous driving clouds provide essential services to support autonomous
    vehicles. Today these services include, but are not limited to, distributed
    simulation tests for new algorithm deployment, offline deep learning model
    training, and High-Definition (HD) map generation. These services require
    infrastructure support including distributed computing, distributed storage, as
    well as heterogeneous computing. In this paper, we present the details of how
    we implement a unified autonomous driving cloud infrastructure, and how we
    support these services on top of this infrastructure.

    MapReduce Scheduler: A 360-degree view

    Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri
    Comments: Journal Article
    Journal-ref: International Journal of Current Engineering and Scientific
    Research, volume 3(11), pages 88-100, 2016
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Undoubtedly, MapReduce is the most powerful programming paradigm in
    distributed computing. Enhancing MapReduce is essential, as it can make
    computation faster. There are therefore many scheduling algorithms to
    discuss based on their characteristics, and many shortcomings remain to be
    discovered in this field. In this article, we present the state-of-the-art
    scheduling algorithms to enhance the understanding of them. The algorithms
    are presented systematically so that this article can point to many future
    possibilities in scheduling algorithms. In this paper, we provide in-depth
    insight into MapReduce scheduling algorithms. In addition, we discuss
    various issues of MapReduce schedulers developed for large-scale computing
    as well as heterogeneous environments.

    Approximation Algorithms for Barrier Sweep Coverage

    Barun Gorain, Partha Sarathi Mandal
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Time-varying coverage, namely sweep coverage, is a recent development in the
    area of wireless sensor networks, where a small number of mobile sensors sweep
    or monitor a comparatively large number of locations periodically. In this
    article we study barrier sweep coverage with mobile sensors where the barrier
    is considered as a finite-length continuous curve on a plane. The coverage at
    every point on the curve is time-variant. We propose an optimal solution for
    sweep coverage of a finite-length continuous curve. Usually the energy source
    of a mobile sensor is a battery with limited power, so energy-restricted sweep
    coverage is a challenging problem for long-running applications. We propose an
    energy-restricted sweep coverage problem where every mobile sensor must visit
    an energy source frequently to recharge or replace its battery. We propose a
    13/3-approximation algorithm for this problem. The proposed algorithm
    for multiple curves achieves the best possible approximation factor 2 for a
    special case. We propose a 5-approximation algorithm for the general problem.
    As an application of the barrier sweep coverage problem for a set of line
    segments, we formulate a data gathering problem. In this problem a set of
    mobile sensors monitors the line segments, one sensor per segment. A set
    of data mules periodically collects the monitoring data from the set of mobile
    sensors. We prove that finding the minimum number of data mules to collect data
    periodically from every mobile sensor is NP-hard and propose a 3-approximation
    algorithm to solve it.

    Gathering in Dynamic Rings

    Giuseppe Antonio Di Luna, Paola Flocchini, Linda Pagli, Giuseppe Prencipe, Nicola Santoro, Giovanni Viglietta
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    The gathering problem requires a set of mobile agents, arbitrarily positioned
    at different nodes of the network to group within finite time at the same
    location, not fixed in advance.

    The extensive existing literature on this problem shares the same fundamental
    assumption that the topological structure does not change during the rendezvous
    or the gathering.

    In this paper we start the investigation of gathering in dynamic graphs, that
    is networks where the topology changes continuously and at unpredictable
    locations.

    We study the feasibility of gathering mobile agents, identical and without
    explicit communication capabilities, in a dynamic ring of anonymous nodes; the
    class of dynamics we consider is the classic 1-interval-connectivity. In
    particular, we focus on the impact that factors such as chirality (i.e., common
    sense of orientation) and cross detection (i.e., the ability to detect, when
    traversing an edge, whether some agent is traversing it in the other
    direction), have on the solvability of the problem.

    We establish several results. We provide a complete characterization of the
    classes of initial configurations from which the gathering problem is solvable in
    the presence and in the absence of cross detection. We provide distributed algorithms
    that allow the agents to gather within low polynomial time. In particular, the
    protocol for gathering with cross detection is time optimal. We show that cross
    detection is a powerful computational element; furthermore, we prove that, with
    cross detection, knowledge of the ring size is strictly more powerful than
    knowledge of the number of agents.

    From our investigation it follows that, for the gathering problem, the
    computational obstacles created by the dynamic nature of the ring can be
    overcome by the presence of chirality or of cross-detection.

    Practical Synchronous Byzantine Consensus

    Ling Ren, Kartik Nayak, Ittai Abraham, Srinivas Devadas
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

    We present new protocols for Byzantine state machine replication and
    Byzantine agreement in the synchronous and authenticated setting. The
    celebrated PBFT state machine replication protocol tolerates f Byzantine
    faults in an asynchronous setting using 3f+1 replicas, and has since been
    studied or deployed by numerous works. In this work, we improve the Byzantine
    fault tolerance to n=2f+1 by utilizing the synchrony assumption. The key
    challenge is to ensure a quorum intersection at one honest replica. Our
    solution is to rely on the synchrony assumption to form a post-commit
    quorum of size 2f+1, which intersects at f+1 replicas with any
    pre-commit quorum of size f+1. Our protocol also solves synchronous
    authenticated Byzantine agreement in fewer rounds than the best existing
    solution (Katz and Koo, 2006). A challenge in this direction is to handle
    non-simultaneous termination, which we solve by introducing a notion of
    virtual participation after termination. Our protocols may be applied to
    build practical synchronous Byzantine fault tolerant systems and improve
    cryptographic protocols such as secure multiparty computation and
    cryptocurrencies when synchrony can be assumed.

    HiFrames: High Performance Data Frames in a Scripting Language

    Ehsan Totoni, Wajih Ul Hassan, Todd A. Anderson, Tatiana Shpeisman
    Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

    Data frames in scripting languages are essential abstractions for processing
    structured data. However, existing data frame solutions are either not
    distributed (e.g., Pandas in Python) and therefore have limited scalability, or
    they are not tightly integrated with array computations (e.g., Spark SQL). This
    paper proposes a novel compiler-based approach where we integrate data frames
    into the High Performance Analytics Toolkit (HPAT) to build HiFrames. It
    provides expressive and flexible data frame APIs which are tightly integrated
    with array operations. HiFrames then automatically parallelizes and compiles
    relational operations along with other array computations in end-to-end data
    analytics programs, and generates efficient MPI/C++ code. We demonstrate that
    HiFrames is significantly faster than alternatives such as Spark SQL on
    clusters, without forcing the programmer to switch to embedded SQL for part of
    the program. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational
    operations, and can be up to 20,000x faster for advanced analytics operations,
    such as weighted moving averages (WMA), that the map-reduce paradigm cannot
    handle effectively. HiFrames is also 5x faster than Spark SQL for TPCx-BB Q26
    on 64 nodes of Cori supercomputer.

    Massively parallel implementation and approaches to simulate quantum dynamics using Krylov subspace techniques

    Marlon Brenes, Vipin Kerala Varma, Antonello Scardicchio, Ivan Girotto
    Comments: 16 pages, 6 figures, 3 tables
    Subjects: Computational Physics (physics.comp-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Strongly Correlated Electrons (cond-mat.str-el); Distributed, Parallel, and Cluster Computing (cs.DC)

    We have developed an application and implemented parallel algorithms in order
    to provide a computational framework suitable for massively parallel
    supercomputers to study the unitary dynamics of quantum systems. We use
    renowned parallel libraries such as PETSc/SLEPc combined with high-performance
    computing approaches in order to overcome the large memory requirements to be
    able to study systems whose Hilbert space dimension comprises over 9 billion
    independent quantum states. Moreover, we describe the parallel approach
    used for the three most important stages of the simulation: handling the
    Hilbert subspace basis, constructing a matrix representation of a generic
    Hamiltonian operator, and evolving the system in time by means of
    Krylov subspace methods. We employ our setup to study the evolution of
    quasidisordered and clean many-body systems, focussing on the return
    probability and related dynamical exponents: the large system sizes accessible
    provide novel insights into their thermalization properties.
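
    As a small-scale, single-node analogue of the Krylov-based time evolution
    described above (using SciPy's expm_multiply in place of the PETSc/SLEPc
    machinery; the Hamiltonian and sizes here are placeholders), one can write:

        # Small-scale analogue of the Krylov-based time evolution discussed above,
        # not the authors' massively parallel implementation.
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import expm_multiply

        dim = 2_000
        A = sp.random(dim, dim, density=1e-3, format="csr", random_state=0)
        H = (A + A.getH()) * 0.5                    # sparse Hermitian "Hamiltonian"

        psi0 = np.zeros(dim, dtype=complex)
        psi0[0] = 1.0                               # start from a basis state

        t = 0.5
        psi_t = expm_multiply(-1j * t * H, psi0)    # |psi(t)> = exp(-i H t) |psi(0)>
        print(abs(psi_t[0]) ** 2)                   # return probability at time t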

    Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

    Stanislav Minsker, Nate Strawn
    Subjects: Statistics Theory (math.ST); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

    This paper presents new algorithms for distributed statistical estimation
    that can take advantage of the divide-and-conquer approach. We show that one of
    the key benefits attained by an appropriate divide-and-conquer strategy is
    robustness, an important characteristic of large distributed systems. We
    introduce a class of algorithms that are based on the properties of the
    geometric median, establish connections between performance of these
    distributed algorithms and rates of convergence in normal approximation, and
    provide tight deviations guarantees for resulting estimators in the form of
    exponential concentration inequalities. We illustrate our techniques with
    several examples; in particular, we obtain new results for the median-of-means
    estimator, as well as provide performance guarantees for robust distributed
    maximum likelihood estimation.
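
    A minimal sketch of the median-of-means idea mentioned above (in one
    dimension the geometric median reduces to the ordinary median; block sizes
    and data are illustrative):

        # Median-of-means: split the sample into disjoint blocks, average each
        # block, and report the median of the block means.
        import numpy as np

        def median_of_means(x, n_blocks):
            blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
            return np.median([b.mean() for b in blocks])

        rng = np.random.default_rng(1)
        data = rng.standard_normal(10_000)
        data[:20] = 1e6                             # a few gross outliers
        print(np.mean(data), median_of_means(data, n_blocks=50))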

    Proceedings Tenth Workshop on Programming Language Approaches to Concurrency- and Communication-cEntric Software

    Vasco T. Vasconcelos (University of Lisbon), Philipp Haller (KTH Royal Institute of Technology)
    Journal-ref: EPTCS 246, 2017
    Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

    PLACES 2017 (full title: Programming Language Approaches to Concurrency- and
    Communication-cEntric Software) is the tenth edition of the PLACES workshop
    series. After the first PLACES, which was affiliated to DisCoTec in 2008, the
    workshop has been part of ETAPS every year since 2009 and is now an established
    part of the ETAPS satellite events. PLACES 2017 was held on 29th April in
    Uppsala, Sweden. The workshop series was started in order to promote the
    application of novel programming language ideas to the increasingly important
    problem of developing software for systems in which concurrency and
    communication are intrinsic aspects. This includes software for both multi-core
    systems and large-scale distributed and/or service-oriented systems. The scope
    of PLACES includes new programming language features, whole new programming
    language designs, new type systems, new semantic approaches, new program
    analysis techniques, and new implementation mechanisms. This volume consists of
    the papers accepted for presentation at the workshop.

    Exploring an Infinite Space with Finite Memory Scouts

    Lihi Cohen, Yuval Emek, Oren Louidor, Jara Uitto
    Comments: 33 pages
    Subjects: Probability (math.PR); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

    Consider a small number of scouts exploring the infinite (d)-dimensional grid
    with the aim of hitting a hidden target point. Each scout is controlled by a
    probabilistic finite automaton that determines its movement (to a neighboring
    grid point) based on its current state. The scouts, which operate under a fully
    synchronous schedule, communicate with each other (in a way that affects their
    respective states) when they share the same grid point and operate
    independently otherwise. Our main research question is: How many scouts are
    required to guarantee that the target admits a finite mean hitting time?
    Recently, it was shown that (d + 1) is an upper bound on the answer to this
    question for any dimension (d geq 1) and the main contribution of this paper
    comes in the form of proving that this bound is tight for (d in { 1, 2 }).


    Learning

    Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes

    Ahmed M. Alaa, Mihaela van der Schaar
    Subjects: Learning (cs.LG)

    We consider the problem of obtaining individualized estimates for the effect
    of a certain treatment given observational data. The problem differs
    fundamentally from classical supervised learning since for each individual
    subject, we either observe the response with or without the treatment but never
    both. Hence, estimating the effect of a treatment entails a causal inference
    task in which we need to estimate counterfactual outcomes. To address this
    problem, we propose a novel multi-task learning framework in which the
    individuals’ responses with and without the treatment are modeled as a
    vector-valued function that belongs to a reproducing kernel Hilbert space.
    Unlike previous methods for causal inference that use the G-computation
    formula, our approach does not obtain separate estimates for the treatment and
    control response surfaces, but rather obtains a joint estimate that ensures
    data efficiency in scenarios where the selection bias is strong. In order to be
    able to provide individualized measures of uncertainty in our estimates, we
    adopt a Bayesian approach for learning this vector-valued function using a
    multi-task Gaussian process prior; uncertainty is quantified via posterior
    credible intervals. We develop a novel risk based empirical Bayes approach for
    calibrating the Gaussian process hyper-parameters in a data-driven fashion
    based on gradient descent in which the update rule is itself learned from the
    data using a recurrent neural network. Experiments conducted on semi-synthetic
    data show that our algorithm significantly outperforms state-of-the-art causal
    inference methods.

    A Comparative Study for Predicting Heart Diseases Using Data Mining Classification Methods

    Israa Ahmed Zriqat, Ahmad Mousa Altamimi, Mohammad Azzeh
    Journal-ref: ISSN 1947-5500
    Subjects: Learning (cs.LG)

    Improving the precision of heart diseases detection has been investigated by
    many researchers in the literature. Such improvement is motivated by the
    overwhelming health care expenditures and erroneous diagnoses. As a result,
    various methodologies have been proposed to analyze the disease factors aiming
    to decrease the physicians practice variation and reduce medical costs and
    errors. In this paper, our main motivation is to develop an effective
    intelligent medical decision support system based on data mining techniques. In
    this context, five data mining classifying algorithms, with large datasets,
    have been utilized to assess and analyze the risk factors statistically related
    to heart diseases in order to compare the performance of the implemented
    classifiers (e.g., Naive Bayes, Decision Tree, Discriminant, Random Forest,
    and Support Vector Machine). To underscore the practical viability of our
    approach, the selected classifiers have been implemented using MATLAB tool with
    two datasets. Results of the conducted experiments showed that all
    classification algorithms are predictive and can give relatively correct
    answers. However, the decision tree outperforms the other classifiers with an
    accuracy rate of 99.0%, followed by the random forest. This is because both
    share essentially the same mechanism, the random forest simply building an
    ensemble of decision trees. Although ensemble learning has been shown to
    produce superior results, in our case the single decision tree outperformed
    its ensemble version.
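
    The paper's comparison was carried out in MATLAB; the following is a hedged
    Python analogue using the same five classifier families on a synthetic
    placeholder dataset (not the heart-disease data used in the paper):

        # Hedged Python analogue of the MATLAB comparison above.
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, n_features=13, random_state=0)
        models = {
            "Naive Bayes": GaussianNB(),
            "Decision Tree": DecisionTreeClassifier(random_state=0),
            "Discriminant": LinearDiscriminantAnalysis(),
            "Random Forest": RandomForestClassifier(random_state=0),
            "SVM": SVC(),
        }
        for name, model in models.items():
            print(name, cross_val_score(model, X, y, cv=5).mean())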

    Bayesian Recurrent Neural Networks

    Meire Fortunato, Charles Blundell, Oriol Vinyals
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this work we explore a straightforward variational Bayes scheme for
    Recurrent Neural Networks. Firstly, we show that a simple adaptation of
    truncated backpropagation through time can yield good quality uncertainty
    estimates and superior regularisation at only a small extra computational cost
    during training. Secondly, we demonstrate how a novel kind of posterior
    approximation yields further improvements to the performance of Bayesian RNNs.
    We incorporate local gradient information into the approximate posterior to
    sharpen it around the current batch statistics. This technique is not exclusive
    to recurrent neural networks and can be applied more widely to train Bayesian
    neural networks. We also empirically demonstrate how Bayesian RNNs are superior
    to traditional RNNs on a language modelling benchmark and an image captioning
    task, as well as showing how each of these methods improve our model over a
    variety of other schemes for training them. We also introduce a new benchmark
    for studying uncertainty for language models so future methods can be easily
    compared.

    Distribution-free Evolvability of Vector Spaces: All it takes is a Generating Set

    Richard Nock, Frank Nielsen
    Subjects: Learning (cs.LG)

    In Valiant’s model of evolution, a class of representations is evolvable iff
    a polynomial-time process of random mutations guided by selection converges
    with high probability to a representation as (epsilon)-close as desired from
    the optimal one, for any required (epsilon>0). Several previous positive
    results exist that can be related to evolving a vector space, but each former
    result imposes restrictions either on (re)initialisations, distributions,
    performance functions and/or the mutator. In this paper, we show that all it
    takes to evolve a complete normed vector space is merely a set that generates
    the space. Furthermore, it takes only (tilde{O}(1/epsilon^2)) steps and it is
    essentially strictly monotonic, agnostic and handles target drifts that rival
    some proven in fairly restricted settings. In the context of the model, we
    bring to the fore new results not documented previously. Evolution appears to
    occur in a mean-divergence model reminiscent of Markowitz mean-variance model
    for portfolio selection, and the risk-return efficient frontier of evolution
    shows an interesting pattern: when far from the optimum, the mutator always has
    access to mutations close to the efficient frontier. Toy experiments in
    supervised and unsupervised learning display promising directions for this
    scheme to be used as a (new) provable gradient-free stochastic optimisation
    algorithm.

    Pyramid Vector Quantization for Deep Learning

    Vincenzo Liguori
    Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce
    the computational cost for a variety of neural networks (NNs) while, at the
    same time, compressing the weights that describe them. This is based on the
    fact that the dot product between an N dimensional vector of real numbers and
    an N dimensional PVQ vector can be calculated with only additions and
    subtractions and one multiplication. This is advantageous since tensor
    products, commonly used in NNs, can be reduced to a dot product or a set of
    dot products. Finally, it is stressed that any NN architecture that is based on
    an operation that can be reduced to a dot product can benefit from the
    techniques described here.
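
    The claim above can be made concrete with a toy example (illustrative
    only): a PVQ codeword has integer entries whose absolute values sum to K,
    so its dot product with a real vector can be accumulated with additions and
    subtractions, followed by a single multiplication by the scale factor.

        # Toy check: dot product with an integer PVQ codeword via additions and
        # subtractions only, plus one final multiplication by the scale factor.
        import numpy as np

        def pvq_dot(x, codeword, scale):
            """x: real vector; codeword: integer vector with sum(|entries|) = K."""
            acc = 0.0
            for xi, ki in zip(x, codeword):
                for _ in range(abs(int(ki))):       # |ki| additions or subtractions
                    acc = acc + xi if ki > 0 else acc - xi
            return acc * scale                      # the single multiplication

        x = np.array([0.3, -1.2, 0.7, 2.0])
        k = np.array([2, -1, 0, 1])                 # K = sum(|k|) = 4
        print(pvq_dot(x, k, scale=0.5), 0.5 * np.dot(x, k))   # identical results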

    Supervised Infinite Feature Selection

    Sadegh Eskandari, Emre Akbas
    Subjects: Learning (cs.LG)

    In this paper, we present a new feature selection method that is suitable for
    both unsupervised and supervised problems. We build upon the recently proposed
    Infinite Feature Selection (IFS) method where feature subsets of all sizes
    (including infinity) are considered. We extend IFS in two ways. First, we
    propose a supervised version of it. Second, we propose new ways of forming the
    feature adjacency matrix that perform better for unsupervised problems. We
    extensively evaluate our methods on many benchmark datasets, including large
    image-classification datasets (PASCAL VOC), and show that our methods
    outperform both the IFS and the widely used “minimum-redundancy
    maximum-relevancy (mRMR)” feature selection algorithm.

    MLC Toolbox: A MATLAB/OCTAVE Library for Multi-Label Classification

    Keigo Kimura, Lu Sun, Mineichi Kudo
    Comments: Instruction pages are now under construction
    Subjects: Learning (cs.LG)

    Multi-Label Classification toolbox is a MATLAB/OCTAVE library for Multi-Label
    Classification (MLC). There exist a few Java libraries for MLC, but no
    MATLAB/OCTAVE library that covers various methods. This toolbox offers an
    environment for evaluation, comparison and visualization of the MLC results.
    One attraction of this toolbox is that it enables us to try many combinations
    of feature space dimension reduction, sample clustering, label space dimension
    reduction and ensemble, etc.

    Stein Variational Policy Gradient

    Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng
    Subjects: Learning (cs.LG)

    Policy gradient methods have been successfully applied to many complex
    reinforcement learning problems. However, policy gradient methods suffer from
    high variance, slow convergence, and inefficient exploration. In this work, we
    introduce a maximum entropy policy optimization framework which explicitly
    encourages parameter exploration, and show that this framework can be reduced
    to a Bayesian inference problem. We then propose a novel Stein variational
    policy gradient method (SVPG) which combines existing policy gradient methods
    and a repulsive functional to generate a set of diverse but well-behaved
    policies. SVPG is robust to initialization and can easily be implemented in a
    parallel manner. On continuous control problems, we find that implementing SVPG
    on top of REINFORCE and advantage actor-critic algorithms improves both average
    return and data efficiency.
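
    For context, the Stein variational (SVGD) update that SVPG builds on can be
    sketched as follows (generic form with an RBF kernel; in the actual method
    the particles would be policy parameters and grad_logp would come from the
    policy gradient):

        # Generic SVGD update with an RBF kernel.
        import numpy as np

        def svgd_step(particles, grad_logp, step=0.1, h=1.0):
            """particles, grad_logp: arrays of shape (n, d)."""
            n = particles.shape[0]
            diff = particles[:, None, :] - particles[None, :, :]     # (n, n, d)
            K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))          # kernel matrix
            grad_K = -(K[:, :, None] * diff) / h ** 2                # d k(x_i, x_j) / d x_i
            phi = (K.T @ grad_logp + grad_K.sum(axis=0)) / n         # driving + repulsive terms
            return particles + step * phi

        # Example: particles drifting toward a standard normal target.
        pts = np.random.default_rng(0).normal(size=(50, 2)) * 3.0
        for _ in range(200):
            pts = svgd_step(pts, grad_logp=-pts)    # grad log N(0, I) at x is -x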

    Uncovering Group Level Insights with Accordant Clustering

    Amit Dhurandhar, Margareta Ackerman, Xiang Wang
    Comments: accepted to SDM 2017 (oral)
    Subjects: Learning (cs.LG)

    Clustering is a widely-used data mining tool, which aims to discover
    partitions of similar items in data. We introduce a new clustering paradigm,
    accordant clustering, which enables the discovery of (predefined) group
    level insights. Unlike previous clustering paradigms that aim to understand
    relationships amongst the individual members, the goal of accordant clustering
    is to uncover insights at the group level through the analysis of their
    members. Group level insight can often support a call to action that cannot be
    informed through previous clustering techniques. We propose the first accordant
    clustering algorithm, and prove that it finds near-optimal solutions when data
    possesses inherent cluster structure. The insights revealed by accordant
    clusterings enabled experts in the field of medicine to isolate successful
    treatments for a neurodegenerative disease, and those in finance to discover
    patterns of unnecessary spending.

    Joint Probabilistic Linear Discriminant Analysis

    Luciana Ferrer
    Comments: Technical report
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    Standard probabilistic discriminant analysis (PLDA) for speaker recognition
    assumes that the sample’s features (usually, i-vectors) are given by a sum of
    three terms: a term that depends on the speaker identity, a term that models
    the within-speaker variability and is assumed independent across samples, and a
    final term that models any remaining variability and is also independent across
    samples. In this work, we propose a generalization of this model where the
    within-speaker variability is not necessarily assumed independent across
    samples but dependent on another discrete variable. This variable, which we
    call the channel variable as in the standard PLDA approach, could be, for
    example, a discrete category for the channel characteristics, the language
    spoken by the speaker, the type of speech in the sample (conversational,
    monologue, read), etc. The value of this variable is assumed to be known during
    training but not during testing. Scoring is performed, as in standard PLDA, by
    computing a likelihood ratio between the null hypothesis that the two sides of
    a trial belong to the same speaker versus the alternative hypothesis that the
    two sides belong to different speakers. The two likelihoods are computed by
    marginalizing over two hypotheses about the channels on both sides of a trial:
    that they are the same and that they are different. This way, we expect that
    the new model will be better at coping with same-channel versus
    different-channel trials than standard PLDA, since knowledge about the channel
    (or language, or speech style) is used during training and implicitly
    considered during scoring.

    Fast Spectral Clustering Using Autoencoders and Landmarks

    Ershad Banijamali, Ali Ghodsi
    Comments: 8 Pages- Accepted in 14th International Conference on Image Analysis and Recognition
    Subjects: Learning (cs.LG); Machine Learning (stat.ML)

    In this paper, we introduce an algorithm for performing spectral clustering
    efficiently. Spectral clustering is a powerful clustering algorithm that
    suffers from high computational complexity, due to eigen decomposition. In this
    work, we first build the adjacency matrix of the corresponding graph of the
    dataset. To build this matrix, we only consider a limited number of points,
    called landmarks, and compute the similarity of all data points with the
    landmarks. Then, we present a definition of the Laplacian matrix of the graph
    that enables us to perform eigen decomposition efficiently, using a deep
    autoencoder. The overall complexity of the algorithm for eigen decomposition is
    (O(np)), where (n) is the number of data points and (p) is the number of
    landmarks. At last, we evaluate the performance of the algorithm in different
    experiments.
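
    The landmark step can be illustrated as below (a rough sketch; the deep
    autoencoder stage of the paper is not reproduced): similarities are
    computed only between the n data points and the p landmarks, giving an
    n-by-p affinity matrix instead of an n-by-n one.

        # Landmark-based affinities: n-by-p instead of n-by-n.
        import numpy as np

        def landmark_affinity(X, n_landmarks=100, sigma=1.0, seed=0):
            rng = np.random.default_rng(seed)
            landmarks = X[rng.choice(len(X), size=n_landmarks, replace=False)]
            d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma ** 2))   # Gaussian similarities to landmarks

        X = np.random.default_rng(1).normal(size=(5000, 10))
        Z = landmark_affinity(X)
        print(Z.shape)                              # (5000, 100): O(np) entries, not O(n^2)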

    On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

    Arturs Backurs, Piotr Indyk, Ludwig Schmidt
    Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Learning (cs.LG); Machine Learning (stat.ML)

    Empirical risk minimization (ERM) is ubiquitous in machine learning and
    underlies most supervised learning methods. While there has been a large body
    of work on algorithms for various ERM problems, the exact computational
    complexity of ERM is still not understood. We address this issue for multiple
    popular ERM problems including kernel SVMs, kernel ridge regression, and
    training the final layer of a neural network. In particular, we give
    conditional hardness results for these problems based on complexity-theoretic
    assumptions such as the Strong Exponential Time Hypothesis. Under these
    assumptions, we show that there are no algorithms that solve the aforementioned
    ERM problems to high accuracy in sub-quadratic time. We also give similar
    hardness results for computing the gradient of the empirical loss, which is the
    main computational burden in many non-convex learning tasks.

    Multi-Agent Diverse Generative Adversarial Networks

    Arnab Ghosh, Viveka Kulharia, Vinay Namboodiri, Philip H. S. Torr, Puneet K. Dokania
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Learning (cs.LG); Machine Learning (stat.ML)

    This paper describes an intuitive generalization to the Generative
    Adversarial Networks (GANs) to generate samples while capturing diverse modes
    of the true data distribution. Firstly, we propose a very simple and intuitive
    multi-agent GAN architecture that incorporates multiple generators capable of
    generating samples from high probability modes. Secondly, in order to enforce
    different generators to generate samples from diverse modes, we propose two
    extensions to the standard GAN objective function. (1) We augment the generator
    specific GAN objective function with a diversity enforcing term that encourages
    different generators to generate diverse samples using a user-defined
    similarity based function. (2) We modify the discriminator objective function
    where along with finding the real and fake samples, the discriminator has to
    predict the generator which generated the given fake sample. Intuitively, in
    order to succeed in this task, the discriminator must learn to push different
    generators towards different identifiable modes. Our framework is generalizable
    in the sense that it can be easily combined with other existing variants of
    GANs to produce diverse samples. Experimentally we show that our framework is
    able to produce high quality diverse samples for the challenging tasks such as
    image/face generation and image-to-image translation. We also show that it is
    capable of learning a better feature representation in an unsupervised setting.

    Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

    Martin Simonovsky, Nikos Komodakis
    Comments: Accepted to CVPR 2017; extended version
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    A number of problems can be formulated as prediction on graph-structured
    data. In this work, we generalize the convolution operator from regular grids
    to arbitrary graphs while avoiding the spectral domain, which allows us to
    handle graphs of varying size and connectivity. To move beyond a simple
    diffusion, filter weights are conditioned on the specific edge labels in the
    neighborhood of a vertex. Together with the proper choice of graph coarsening,
    we explore constructing deep neural networks for graph classification. In
    particular, we demonstrate the generality of our formulation in point cloud
    classification, where we set the new state of the art, and on a graph
    classification dataset, where we outperform other deep learning approaches.
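
    A minimal NumPy sketch of an edge-conditioned filter in the spirit
    described above (illustrative only; in the paper the filter-generating
    network is trained end to end): a small function maps each edge label to a
    weight matrix, and each vertex averages its neighbours' features
    transformed by those matrices.

        # Edge-conditioned aggregation: an edge-specific weight matrix, produced by
        # a "filter network" from the edge label, transforms each neighbour's features.
        import numpy as np

        def edge_conditioned_conv(features, edges, edge_labels, filter_net, out_dim):
            """features: (n, d_in); edges: list of (u, v); edge_labels: (n_edges, d_e)."""
            n, d_in = features.shape
            out = np.zeros((n, out_dim))
            counts = np.zeros(n)
            for (u, v), label in zip(edges, edge_labels):
                theta = filter_net(label).reshape(out_dim, d_in)   # edge-specific weights
                out[v] += theta @ features[u]
                counts[v] += 1
            return out / np.maximum(counts, 1)[:, None]            # average over neighbours

        rng = np.random.default_rng(0)
        d_in, d_out, d_e = 4, 3, 2
        W = rng.normal(size=(d_e, d_out * d_in))
        filter_net = lambda lab: lab @ W            # toy stand-in for the filter network

        feats = rng.normal(size=(5, d_in))
        edges = [(0, 1), (2, 1), (3, 4)]
        labels = rng.normal(size=(len(edges), d_e))
        print(edge_conditioned_conv(feats, edges, labels, filter_net, d_out).shape)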

    Opinion Polarization by Learning from Social Feedback

    Sven Banisch, Eckehard Olbrich
    Comments: Submitted to the Social Simulation Conference (Dublin 2017)
    Subjects: Physics and Society (physics.soc-ph); Learning (cs.LG); Social and Information Networks (cs.SI); Adaptation and Self-Organizing Systems (nlin.AO)

    We explore a new mechanism to explain polarization phenomena in opinion
    dynamics. The model is based on the idea that agents evaluate alternative views
    on the basis of the social feedback obtained on expressing them. A high support
    of the favored and therefore expressed opinion in the social environment is
    treated as a positive social feedback which reinforces the value associated
    with this opinion. In this paper we concentrate on the model with dyadic
    communication and encounter probabilities defined by an unweighted,
    time-homogeneous network. The model captures polarization dynamics more
    plausibly compared to bounded confidence opinion models and avoids extensive
    opinion flipping usually present in binary opinion dynamics. We perform
    systematic simulation experiments to understand the role of network
    connectivity for the emergence of polarization.

    Can AIs learn to avoid human interruption?

    El Mahdi El Mhamdi, Rachid Guerraoui, Hadrien Hendrikx, Alexandre Maurer
    Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

    Recent progress in artificial intelligence enabled the design and
    implementation of autonomous computing devices, agents, that may interact and
    learn from each other to achieve certain goals. Sometimes however, a human
    operator needs to intervene and interrupt an agent in order to prevent certain
    dangerous situations. Yet, as part of their learning process, agents may link
    these interruptions that impact their reward to specific states, and
    deliberately avoid them. The situation is particularly challenging in a
    distributed context because agents might not only learn from their own past
    interruptions, but also from those of other agents. This paper defines the
    notion of safe interruptibility as a distributed computing problem, and studies
    this notion in the two main learning frameworks: joint action learners and
    independent learners. We give realistic sufficient conditions on the learning
    algorithm for safe interruptibility in the case of joint action learners, yet
    show that these conditions are not sufficient for independent learners. We show
    however that if agents can detect interruptions, it is possible to prune the
    observations to ensure safe interruptibility even for independent learners

    Unsupervised prototype learning in an associative-memory network

    Huiling Zhen, Shang-Nan Wang, Hai-Jun Zhou
    Comments: 10 pages
    Subjects: Neural and Evolutionary Computing (cs.NE); Disordered Systems and Neural Networks (cond-mat.dis-nn); Learning (cs.LG)

    Unsupervised learning in a generalized Hopfield associative-memory network is
    investigated in this work. First, we prove that the (generalized) Hopfield
    model is equivalent to a semi-restricted Boltzmann machine with a layer of
    visible neurons and another layer of hidden binary neurons, so it could serve
    as the building block for a multilayered deep-learning system. We then
    demonstrate that the Hopfield network can learn to form a faithful internal
    representation of the observed samples, with the learned memory patterns being
    prototypes of the input data. Furthermore, we propose a spectral method to
    extract a small set of concepts (idealized prototypes) as the most concise
    summary or abstraction of the empirical data.

    Parsimonious Random Vector Functional Link Network for Data Streams

    Mahardhika Pratama, Plamen P. Angelov, Edwin Lughofer, Meng Joo Er
    Comments: this paper is submitted for publication in Information Sciences
    Subjects: Neural and Evolutionary Computing (cs.NE); Learning (cs.LG)

    The theory of random vector functional link network (RVFLN) has provided a
    breakthrough in the design of neural networks (NNs) since it conveys solid
    theoretical justification of randomized learning. Existing works in RVFLN are
    hardly scalable for data stream analytics because they are inherent to the
    issue of complexity as a result of the absence of structural learning
    scenarios. A novel class of RVFLN, namely parsimonious random vector functional
    link network (pRVFLN), is proposed in this paper. pRVFLN features an open
    structure paradigm where its network structure can be built from scratch and
    can be automatically generated in accordance with degree of nonlinearity and
    time-varying property of system being modelled. pRVFLN is equipped with
    complexity reduction scenarios where inconsequential hidden nodes can be pruned
    and input features can be dynamically selected. pRVFLN puts into perspective an
    online active learning mechanism which expedites the training process and
    relieves operator labelling efforts. In addition, pRVFLN introduces a
    non-parametric type of hidden node, developed using an interval-valued data
    cloud. The hidden node completely reflects the real data distribution and is
    not constrained by a specific shape of the cluster. All learning procedures of
    pRVFLN follow a strictly single-pass learning mode, which is applicable for an
    online real-time deployment. The efficacy of pRVFLN was rigorously validated
    through numerous simulations and comparisons with state-of-the art algorithms
    where it produced the most encouraging numerical results. Furthermore, the
    robustness of pRVFLN was investigated and a new conclusion is made to the scope
    of random parameters where it plays vital role to the success of randomized
    learning.

    Group Importance Sampling for particle filtering and MCMC

    L. Martino, V. Elvira, G. Camps-Valls
    Subjects: Computation (stat.CO); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

    Importance Sampling (IS) is a well-known Monte Carlo technique that
    approximates integrals involving a posterior distribution by means of weighted
    samples. In this work, we study the assignation of a single weighted sample
    which compresses the information contained in a population of weighted samples.
    Part of the theory that we present as Group Importance Sampling (GIS) has been
    employed implicitly in different works in the literature. The provided analysis
    yields several theoretical and practical consequences. For instance, we discuss
    the application of GIS into the Sequential Importance Resampling (SIR)
    framework and show that Independent Multiple Try Metropolis (I-MTM) schemes can
    be interpreted as a standard Metropolis-Hastings algorithm, following the GIS
    approach. We also introduce two novel Markov Chain Monte Carlo (MCMC)
    techniques based on GIS. The first one, named Group Metropolis Sampling (GMS)
    method, produces a Markov chain of sets of weighted samples. All these sets are
    then employed for obtaining a unique global estimator. The second one is the
    Distributed Particle Metropolis-Hastings (DPMH) technique, where different
    parallel particle filters are jointly used to drive an MCMC algorithm.
    Different resampled trajectories are compared and then tested with a proper
    acceptance probability. The novel schemes are tested in different numerical
    experiments such as learning the hyperparameters of Gaussian Processes (GP),
    the localization problem in a sensor network and the tracking of the Leaf Area
    Index (LAI), where they are compared with several benchmark Monte Carlo
    techniques. Three descriptive Matlab demos are also provided.
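
    The compression step underlying GIS can be sketched roughly as follows (a
    hedged illustration; see the paper for the precise proper-weighting rules):
    a population of weighted samples is summarized by one resampled particle
    carrying a single group weight, taken here as the average of the original
    weights.

        # One group of weighted samples is replaced by a single resampled particle
        # plus one group weight (here the mean of the original weights).
        import numpy as np

        def compress_group(samples, weights, rng):
            weights = np.asarray(weights, dtype=float)
            idx = rng.choice(len(samples), p=weights / weights.sum())
            return samples[idx], weights.mean()     # (representative particle, group weight)

        rng = np.random.default_rng(0)
        xs = rng.normal(size=1000)
        ws = np.exp(-0.5 * (xs - 1.0) ** 2)         # toy importance weights
        print(compress_group(xs, ws, rng))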

    Distributed Learning for Cooperative Inference

    Angelia Nedić, Alex Olshevsky, César A. Uribe
    Subjects: Optimization and Control (math.OC); Learning (cs.LG); Multiagent Systems (cs.MA); Probability (math.PR); Machine Learning (stat.ML)

    We study the problem of cooperative inference where a group of agents
    interact over a network and seek to estimate a joint parameter that best
    explains a set of observations. Agents do not know the network topology or the
    observations of other agents. We explore a variational interpretation of the
    Bayesian posterior density, and its relation to the stochastic mirror descent
    algorithm, to propose a new distributed learning algorithm. We show that, under
    appropriate assumptions, the beliefs generated by the proposed algorithm
    concentrate around the true parameter exponentially fast. We provide explicit
    non-asymptotic bounds for the convergence rate. Moreover, we develop explicit
    and computationally efficient algorithms for observation models belonging to
    exponential families.

    Adaptive Relaxed ADMM: Convergence Theory and Practical Implementation

    Zheng Xu, Mario A. T. Figueiredo, Xiaoming Yuan, Christoph Studer, Tom Goldstein
    Comments: CVPR 2017
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Numerical Analysis (cs.NA)

    Many modern computer vision and machine learning applications rely on solving
    difficult optimization problems that involve non-differentiable objective
    functions and constraints. The alternating direction method of multipliers
    (ADMM) is a widely used approach to solve such problems. Relaxed ADMM is a
    generalization of ADMM that often achieves better performance, but its
    efficiency depends strongly on algorithm parameters that must be chosen by an
    expert user. We propose an adaptive method that automatically tunes the key
    algorithm parameters to achieve optimal performance without user oversight.
    Inspired by recent work on adaptivity, the proposed adaptive relaxed ADMM
    (ARADMM) is derived by assuming a Barzilai-Borwein style linear gradient. A
    detailed convergence analysis of ARADMM is provided, and numerical results on
    several applications demonstrate fast practical convergence.
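
    For reference, a bare-bones relaxed ADMM loop for the lasso is shown below,
    to make the relaxation parameter alpha and penalty rho concrete (this is
    the standard fixed-parameter scheme, not the adaptive ARADMM rule of the
    paper):

        # Relaxed ADMM for the lasso with fixed rho and alpha, the quantities that
        # ARADMM adapts automatically.
        import numpy as np

        def relaxed_admm_lasso(A, b, lam, rho=1.0, alpha=1.5, iters=200):
            m, n = A.shape
            x = z = u = np.zeros(n)
            AtA, Atb = A.T @ A, A.T @ b
            L = np.linalg.cholesky(AtA + rho * np.eye(n))
            soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
            for _ in range(iters):
                x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
                x_hat = alpha * x + (1 - alpha) * z          # over-relaxation step
                z = soft(x_hat + u, lam / rho)               # proximal step for the l1 term
                u = u + x_hat - z                            # dual update
            return z

        A = np.random.default_rng(0).normal(size=(100, 30))
        x_true = np.zeros(30); x_true[:3] = [2.0, -1.5, 1.0]
        b = A @ x_true + 0.01 * np.random.default_rng(1).normal(size=100)
        print(relaxed_admm_lasso(A, b, lam=1.0)[:5])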

    Word Embeddings via Tensor Factorization

    Eric Bailey, Shuchin Aeron
    Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)

    Most popular word embedding techniques involve implicit or explicit
    factorization of a word co-occurrence based matrix into low rank factors. In
    this paper, we aim to generalize this trend by using numerical methods to
    factor higher-order word co-occurrence based arrays, or tensors. We
    present four word embeddings using tensor factorization and analyze their
    advantages and disadvantages. One of our main contributions is a novel joint
    symmetric tensor factorization technique related to the idea of coupled tensor
    factorization. We show that embeddings based on tensor factorization can be
    used to discern the various meanings of polysemous words without being
    explicitly trained to do so, and motivate the intuition behind why this works
    in a way that existing methods do not. We also modify an existing word
    embedding evaluation metric known as Outlier Detection [Camacho-Collados and
    Navigli, 2016] to evaluate the quality of the order-(N) relations that a word
    embedding captures, and show that tensor-based methods outperform existing
    matrix-based methods at this task. Experimentally, we show that all of our word
    embeddings either outperform or are competitive with state-of-the-art baselines
    commonly used today on a variety of recent datasets. Suggested applications of
    tensor factorization-based word embeddings are given, and all source code and
    pre-trained vectors are publicly available online.

    Learning Important Features Through Propagating Activation Differences

    Avanti Shrikumar, Peyton Greenside, Anshul Kundaje
    Comments: 9 pages, 6 figures
    Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

    The purported “black box” nature of neural networks is a barrier to adoption
    in applications where interpretability is essential. Here we present DeepLIFT
    (Deep Learning Important FeaTures), a method for decomposing the output
    prediction of a neural network on a specific input by backpropagating the
    contributions of all neurons in the network to every feature of the input.
    DeepLIFT compares the activation of each neuron to its ‘reference activation’
    and assigns contribution scores according to the difference. By optionally
    giving separate consideration to positive and negative contributions, DeepLIFT
    can also reveal dependencies which are missed by other approaches. Scores can
    be computed efficiently in a single backward pass. We apply DeepLIFT to models
    trained on MNIST and simulated genomic data, and show significant advantages
    over gradient-based methods. A detailed video tutorial on the method is at
    this http URL and code is at this http URL
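
    The difference-from-reference idea can be illustrated on a single linear
    unit (a toy case only; DeepLIFT's rules for deep nonlinear networks are
    more involved): contributions are weights times the difference between the
    input and its reference, and they sum to the change in the unit's output.

        # Toy case: contributions of a single linear unit relative to a reference input.
        import numpy as np

        w = np.array([0.5, -2.0, 1.0])
        x = np.array([1.0, 0.3, -1.0])
        x_ref = np.zeros_like(x)                    # reference input

        contributions = w * (x - x_ref)             # per-feature contribution scores
        delta_out = w @ x - w @ x_ref               # change in the unit's output
        print(contributions, contributions.sum(), delta_out)   # the scores sum to delta_out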

    Dimensionality Reduction as a Defense against Evasion Attacks on Machine Learning Classifiers

    Arjun Nitin Bhagoji, Daniel Cullina, Prateek Mittal
    Comments: 20 pages
    Subjects: Cryptography and Security (cs.CR); Learning (cs.LG)

    We propose the use of dimensionality reduction as a defense against evasion
    attacks on ML classifiers. We present and investigate a strategy for
    incorporating dimensionality reduction via Principal Component Analysis to
    enhance the resilience of machine learning, targeting both the classification
    and the training phase. We empirically evaluate and demonstrate the feasibility
    of dimensionality reduction of data as a defense mechanism against evasion
    attacks using multiple real-world datasets. Our key findings are that the
    defenses are (i) effective against strategic evasion attacks in the literature,
    increasing the resources required by an adversary for a successful attack by a
    factor of about two, (ii) applicable across a range of ML classifiers,
    including Support Vector Machines and Deep Neural Networks, and (iii)
    generalizable to multiple application domains, including image classification,
    malware classification, and human activity classification.
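
    A hedged sketch of the defense in scikit-learn (dataset and parameters are
    placeholders, not those of the paper's evaluation): project the data onto
    the top principal components before training and classification.

        # Project onto the top principal components before training the classifier;
        # at test time (including on adversarial inputs) the same projection applies.
        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA
        from sklearn.model_selection import train_test_split
        from sklearn.svm import LinearSVC

        X, y = load_digits(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        pca = PCA(n_components=20).fit(X_tr)        # keep a reduced subspace
        clf = LinearSVC(max_iter=5000).fit(pca.transform(X_tr), y_tr)
        print(clf.score(pca.transform(X_te), y_te))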

    Deep Reinforcement Learning framework for Autonomous Driving

    Ahmad El Sallab, Mohammed Abdou, Etienne Perot, Senthil Yogamani
    Comments: Reprinted with permission of IS&T: The Society for Imaging Science and Technology, sole copyright owners of Electronic Imaging, Autonomous Vehicles and Machines 2017
    Journal-ref: IS&T Electronic Imaging, Autonomous Vehicles and Machines 2017,
    AVM-023, pg. 70-76 (2017)
    Subjects: Machine Learning (stat.ML); Learning (cs.LG); Robotics (cs.RO)

    Reinforcement learning is considered to be a strong AI paradigm which can be
    used to teach machines through interaction with the environment and learning
    from their mistakes. Despite its perceived utility, it has not yet been
    successfully applied in automotive applications. Motivated by the successful
    demonstrations of learning of Atari games and Go by Google DeepMind, we propose
    a framework for autonomous driving using deep reinforcement learning. This is
    of particular relevance as it is difficult to pose autonomous driving as a
    supervised learning problem due to strong interactions with the environment
    including other vehicles, pedestrians and roadworks. As it is a relatively new
    area of research for autonomous driving, we provide a short overview of deep
    reinforcement learning and then describe our proposed framework. It
    incorporates Recurrent Neural Networks for information integration, enabling
    the car to handle partially observable scenarios. It also integrates the recent
    work on attention models to focus on relevant information, thereby reducing the
    computational complexity for deployment on embedded hardware. The framework was
    tested in an open source 3D car racing simulator called TORCS. Our simulation
    results demonstrate learning of autonomous maneuvering in a scenario of complex
    road curvatures and simple interaction of other vehicles.

    Time-Contrastive Learning Based Unsupervised DNN Feature Extraction for Speaker Verification

    Achintya Kr. Sarkar, Zheng-Hua Tan
    Subjects: Sound (cs.SD); Learning (cs.LG)

    In this paper, we present a time-contrastive learning (TCL) based unsupervised
    bottleneck (BN) feature extraction method for speech signals with an
    application to speaker verification. The method exploits the temporal structure
    of a speech signal and more specifically, it trains deep neural networks (DNNs)
    to discriminate temporal events obtained by uniformly segmenting the signal
    without using any label information, in contrast to conventional DNN based BN
    feature extraction methods that train DNNs using labeled data to discriminate
    speakers or passphrases or phones or a combination of them. We consider
    different strategies for TCL and its combination with transfer learning.
    Experimental results on the RSR2015 database show that the TCL method is
    superior to the conventional speaker and pass-phrase discriminant BN feature
    and Mel-frequency cepstral coefficients (MFCCs) feature for text-dependent
    speaker verification. The unsupervised TCL method further has the advantage of
    being able to leverage the huge amount of unlabeled data that are often
    available in real life.
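
    The unsupervised labelling step implied above can be sketched as follows
    (an illustrative guess at the bookkeeping, not the authors' code): frames
    of an utterance are uniformly segmented and the segment index serves as the
    class target for the DNN, whose bottleneck layer then provides the
    features.

        # Uniform temporal segmentation: the segment index is the training target.
        import numpy as np

        def tcl_targets(n_frames, n_segments):
            """Assign each frame the index of the uniform segment it falls in."""
            edges = np.linspace(0, n_frames, n_segments + 1)
            return np.digitize(np.arange(n_frames), edges[1:-1])

        print(tcl_targets(n_frames=20, n_segments=4))   # [0 0 0 0 0 1 1 1 1 1 2 ...]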

    Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

    Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    Subjects: Sound (cs.SD); Computation and Language (cs.CL); Learning (cs.LG)

    Voice conversion (VC) using sequence-to-sequence learning of context
    posterior probabilities is proposed. Conventional VC using shared context
    posterior probabilities predicts target speech parameters from the context
    posterior probabilities estimated from the source speech parameters. Although
    conventional VC can be built from non-parallel data, it is difficult to convert
    speaker individuality such as phonetic property and speaking rate contained in
    the posterior probabilities because the source posterior probabilities are
    directly used for predicting target speech parameters. In this work, we assume
    that the training data partly include parallel speech data and propose
    sequence-to-sequence learning between the source and target posterior
    probabilities. The conversion models perform non-linear and variable-length
    transformation from the source probability sequence to the target one. Further,
    we propose a joint training algorithm for the modules. In contrast to
    conventional VC, which separately trains the speech recognition that estimates
    posterior probabilities and the speech synthesis that predicts target speech
    parameters, our proposed method jointly trains these modules along with the
    proposed probability conversion modules. Experimental results demonstrate that
    our approach outperforms the conventional VC.

    Neural Offset Min-Sum Decoding

    Loren Lugosch, Warren J. Gross
    Subjects: Information Theory (cs.IT); Learning (cs.LG)

    Recently, it was shown that if multiplicative weights are assigned to the
    edges of a Tanner graph used in belief propagation decoding, it is possible to
    use deep learning techniques to find values for the weights which improve the
    error-correction performance of the decoder. Unfortunately, this approach
    requires many multiplications, which are generally expensive operations. In
    this paper, we suggest a more hardware-friendly approach in which offset
    min-sum decoding is augmented with learnable offset parameters. Our method uses
    no multiplications and has a parameter count less than half that of the
    multiplicative algorithm. This both speeds up training and provides a feasible
    path to hardware architectures. After describing our method, we compare the
    performance of the two neural decoding algorithms and show that our method
    achieves error-correction performance within 0.1 dB of the multiplicative
    approach and as much as 1 dB better than traditional belief propagation for the
    codes under consideration.
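
    For readers unfamiliar with the baseline being modified, the offset min-sum
    check-node update that the learnable offsets act on looks roughly like this
    (a plain single-offset version; the paper learns the offsets):

        # Offset min-sum check-node update with a single fixed offset beta; the paper
        # replaces beta with offsets learned by gradient descent.
        import numpy as np

        def offset_min_sum_check(incoming, beta):
            """incoming: messages from the variable nodes attached to one check node."""
            incoming = np.asarray(incoming, dtype=float)
            out = np.empty_like(incoming)
            for i in range(len(incoming)):
                others = np.delete(incoming, i)
                sign = np.prod(np.sign(others))
                out[i] = sign * max(np.min(np.abs(others)) - beta, 0.0)
            return out

        print(offset_min_sum_check([1.2, -0.4, 3.0], beta=0.15))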


    Information Theory

    When mmWave Communications Meet Network Densification: A Scalable Interference Coordination Perspective

    Wei Feng, Yanmin Wang, Dengsheng Lin, Ning Ge, Jianhua Lu, Shaoqian Li
    Comments: 12 pages, 6 figures, accepted by IEEE JSAC
    Journal-ref: IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2017
    Subjects: Information Theory (cs.IT)

    The millimeter-wave (mmWave) communication is envisioned to provide orders of
    magnitude capacity improvement. However, it is challenging to realize a
    sufficient link margin due to high path loss and blockages. To address this
    difficulty, in this paper, we explore the potential gain of ultra-densification
    for enhancing mmWave communications from a network-level perspective. By
    deploying the mmWave base stations (BSs) in an extremely dense and amorphous
    fashion, the access distance is reduced and the choice of serving BSs is
    enriched for each user, which are intuitively effective for mitigating the
    propagation loss and blockages. Nevertheless, co-channel interference under
    this model will become a performance-limiting factor. To solve this problem, we
    propose a large-scale channel state information (CSI) based interference
    coordination approach. Note that the large-scale CSI is highly
    location-dependent and can be obtained at quite a low cost. Thus, the
    scalability of the proposed coordination framework can be guaranteed.
    Particularly, using only the large-scale CSI of interference links, a
    coordinated frequency resource block allocation problem is formulated for
    maximizing the minimum achievable rate of the users, which turns out to be an
    NP-hard integer programming problem. To circumvent this difficulty, a greedy
    scheme with polynomial-time complexity is proposed by adopting the bisection
    method and linear integer programming tools. Simulation results demonstrate
    that the proposed coordination scheme based on large-scale CSI only can still
    offer substantial gains over the existing methods. Moreover, although the
    proposed scheme is only guaranteed to converge to a local optimum, it performs
    well in terms of both user fairness and system efficiency.

    Energy Harvesting Enabled MIMO Relaying through Time Switching

    Jialing Liao, Muhammad R. A. Khandaker, Kai-Kit Wong
    Subjects: Information Theory (cs.IT)

    This letter considers simultaneous wireless information and power transfer
    (SWIPT) for a multiple-input multiple-output (MIMO) relay system. The relay is
    powered by harvesting energy from the source via time switching (TS) and
    utilizes the harvested energy to forward the information signal. Our aim is to
    maximize the rate of the system subject to the power constraints at both the
    source and relay nodes. In the first scenario in which the source covariance
    matrix is an identity matrix, we present the joint-optimal solution for
    relaying and the TS ratio in closed form. An iterative scheme is then proposed
    for jointly optimizing the source and relaying matrices and the TS ratio.

    Stable Throughput and Delay Analysis of a Random Access Network With Queue-Aware Transmission

    Ioannis Dimitriou, Nikolaos Pappas
    Comments: Submitted for journal publication
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    In this work we consider a two-user and a three-user slotted ALOHA network
    with multi-packet reception (MPR) capabilities. The nodes can adapt their
    transmission probabilities and their transmission parameters based on the
    status of the other nodes. Each user has external bursty arrivals that are
    stored in their infinite capacity queues. For the two- and the three-user cases
    we obtain the stability region of the system. For the two-user case we provide
    the conditions where the stability region is a convex set. We perform a
    detailed mathematical analysis in order to study the queueing delay by
    formulating two boundary value problems (a Dirichlet and a Riemann-Hilbert
    boundary value problem), the solution of which provides the generating function
    of the joint stationary probability distribution of the queue size at user
    nodes. Furthermore, for the two-user symmetric case with MPR we obtain a lower
    and an upper bound for the average delay without explicitly computing the
    generating function for the stationary joint queue length distribution. The
    bounds, as seen in the numerical results, appear to be tight. Explicit
    expressions for the average delay are obtained for the symmetrical model with
    capture effect which is a subclass of MPR models. We also provide the optimal
    transmission probability in closed form expression that minimizes the average
    delay in the symmetric capture case. Finally, we evaluate numerically the
    presented theoretical results.

    Flags of almost affine codes

    Trygve Johnsen, Hugues Verdure
    Subjects: Information Theory (cs.IT)

    We describe a two-party wire-tap channel of type II in the framework of
    almost affine codes. Its cryptological performance is related to some relative
    profiles of a pair of almost affine codes. These profiles are analogues of
    relative generalized Hamming weights in the linear case.

    Serving Distance and Coverage in a Closed Access PHP-Based Heterogeneous Cellular Network

    Zeinab Yazdanshenasan, Harpreet S. Dhillon, Peter Han Joo Chong
    Comments: Proc., Biennial Symposium on Communications (BSC), 2016
    Subjects: Information Theory (cs.IT)

    Heterogeneous cellular networks (HCNs) usually exhibit spatial separation
    amongst base stations (BSs) of different types (termed tiers in this paper).
    For instance, operators will usually not deploy a picocell in close proximity
    to a macrocell, thus inducing separation amongst the locations of pico and
    macrocells. This separation has recently been captured by modeling the small
    cell locations by a Poisson Hole Process (PHP) with the hole centers being the
    locations of the macrocells. Due to the presence of exclusion zones, the
    analysis of the resulting model is significantly more complex compared to the
    more popular Poisson Point Process (PPP) based models. In this paper, we derive
    a tight bound on the distribution of the distance of a typical user to the
    closest point of a PHP. Since the exact distribution of this distance is not
    known, it is often approximated in the literature. For this model, we then
    provide tight characterization of the downlink coverage probability for a
    typical user in a two-tier closed-access HCN under two cases: (i) typical user
    is served by the closest macrocell, and (ii) typical user is served by its
    closest small cell. The proposed approach can be extended to analyze other
    relevant cases of interest, e.g., coverage in a PHP-based open access HCN.

    Robust Connectivity with Multiple Nonisotropic Antennas for Vehicular Communications

    Keerthi Kumar Nagalapur, Erik G. Ström, Fredrik Brännström, Jan Carlsson, Kristian Karlsson
    Subjects: Information Theory (cs.IT)

    For critical services, such as traffic safety and traffic efficiency, it is
    advisable to design systems with robustness as the main criteria, possibly at
    the price of reduced peak performance and efficiency. Ensuring robust
    communications in case of embedded or hidden antennas is a challenging task due
    to nonisotropic radiation patterns of these antennas. The challenges due to the
    nonisotropic radiation patterns can be overcome with the use of multiple
    antennas. In this paper, we describe a simple, low-cost method for combining
    the output of multiple nonisotropic antennas to guarantee robustness, i.e.,
    support reliable communications in worst-case scenarios. The combining method
    is designed to minimize the burst error probability, i.e., the probability of
    consecutive decoding errors of status messages arriving periodically at a
    receiver from an arbitrary angle of arrival. The proposed method does not
    require the knowledge of instantaneous signal-to-noise ratios or the
    complex-valued channel gains at the antenna outputs. The proposed method is
    applied to measured and theoretical antenna radiation patterns, and it is shown
    that the method supports robust communications from an arbitrary angle of
    arrival.

    Degrees of Freedom and Achievable Rate of Wide-Band Multi-cell Multiple Access Channels With No CSIT

    Yo-Seb Jeon, Namyoon Lee, Ravi Tandon
    Comments: Submitted to IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    This paper considers a (K)-cell multiple access channel with inter-symbol
    interference. The primary finding of this paper is that, without instantaneous
    channel state information at the transmitters (CSIT), the sum
    degrees-of-freedom (DoF) of the considered channel is (frac{eta -1}{eta}K)
    with (eta geq 2) when the number of users per cell is sufficiently large,
    where (eta) is the ratio of the maximum channel-impulse-response (CIR) length
    of desired links to that of interfering links in each cell. Our finding implies
    that even without instantaneous CSIT, interference-free DoF per cell
    is achievable as (eta) approaches infinity with a sufficiently large number
    of users per cell. This achievability is shown by a blind interference
    management method that exploits the relativity in delay spreads between desired
    and interfering links. In this method, all inter-cell-interference signals are
    aligned to the same direction by using a discrete-Fourier-transform-based
    precoding with cyclic prefix that only depends on the number of CIR taps. Using
    this method, we also characterize the achievable sum rate of the considered
    channel, in a closed-form expression.

    Lattice Gaussian Sampling by Markov Chain Monte Carlo: Convergence Rate and Decoding Complexity

    Zheng Wang, Cong Ling
    Comments: submitted to Transaction on Information Theory
    Subjects: Information Theory (cs.IT)

    Sampling from the lattice Gaussian distribution is an efficient way for
    solving the closest vector problem (CVP) in lattice decoding. In this paper,
    decoding by MCMC-based lattice Gaussian sampling is investigated in full
    details. First of all, the spectral gap of the transition matrix of the Markov
    chain induced by the independent Metropolis-Hastings-Klein (MHK) algorithm is
    derived, dictating an exponential convergence rate to the target lattice
    Gaussian distribution. Then, the decoding complexity of CVP is derived as
    (O(e^{d^2(Lambda, mathbf{c})/min_i^2|widehat{mathbf{b}}_i|})), where
    (d(Lambda, mathbf{c})) represents the Euclidean distance between the query
    point (mathbf{c}) and the lattice (Lambda), and (mathbf{widehat{b}}_i) is
    the (i)th Gram-Schmidt vector of the lattice basis (mathbf{B}). Furthermore,
    the decoding radius from the perspective of bounded distance decoding (BDD)
    given a fixed number of Markov moves (t) is also derived, revealing a flexible
    trade-off between the decoding performance and complexity. Finally, by taking
    advantages of (k) trial samples from the proposal distribution, the independent
    multiple-try Metropolis-Klein (MTMK) algorithm is proposed to further enhance
    the exponential convergence rate. By adjusting (k), the independent MTMK
    sampler enjoys a flexible decoding performance, where the independent MHK
    algorithm is just a case with (k=1). Additionally, the proposed decoding allows
    a fully parallel implementation, which is beneficial for the practical
    interest.

    Spectral and Energy Efficiency in Cognitive Radio Systems with Unslotted Primary Users and Sensing Uncertainty

    Gozde Ozcan, M. Cenk Gursoy, Jian Tang
    Comments: This paper is accepted for publication in IEEE Transactions on Communications
    Subjects: Information Theory (cs.IT)

    This paper studies energy efficiency (EE) and average throughput maximization
    for cognitive radio systems in the presence of unslotted primary users. It is
    assumed that primary user activity follows an ON-OFF alternating renewal
    process. Secondary users first sense the channel possibly with errors in the
    form of miss detections and false alarms, and then start the data transmission
    only if no primary user activity is detected. The secondary user transmission
    is subject to constraints on collision duration ratio, which is defined as the
    ratio of average collision duration to transmission duration. In this setting,
    the optimal power control policy that maximizes the EE of the secondary users,
    or maximizes the average throughput while satisfying a minimum required EE,
    under average/peak transmit power and average interference power constraints
    is derived. Subsequently, low-complexity algorithms for jointly determining
    the optimal power level and frame duration are proposed. The impact of the
    detection and false-alarm probabilities and of the transmit and interference
    power constraints on the EE, average throughput of the secondary users, optimal
    transmission power, and collisions with primary user transmissions is
    evaluated. In addition, some important properties of the collision duration
    ratio are investigated. The tradeoff between the EE and average throughput
    under imperfect sensing decisions and different primary user traffic is
    further analyzed.
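
    To make the energy-efficiency objective concrete (an illustrative sketch, not
    the paper's optimal policy: a single secondary link, hypothetical channel gain,
    circuit power and peak-power constraint, and a plain grid search):

    # EE(P) = throughput(P) / (Pc + P); maximize over P subject to P <= P_peak.
    import numpy as np

    BW = 1e6         # bandwidth in Hz (assumed)
    g = 1e-3         # channel power gain (assumed)
    N0 = 1e-9        # noise power in W (assumed)
    Pc = 0.1         # circuit power in W (assumed)
    P_peak = 1.0     # peak transmit power in W (assumed)

    def throughput(P):
        """Shannon rate in bit/s at transmit power P."""
        return BW * np.log2(1.0 + P * g / N0)

    def energy_efficiency(P):
        """Bits delivered per Joule of total consumed energy."""
        return throughput(P) / (Pc + P)

    P_grid = np.linspace(1e-3, P_peak, 10_000)
    ee = energy_efficiency(P_grid)
    P_star = P_grid[ee.argmax()]
    print(f"EE-optimal power ~ {P_star:.3f} W, EE ~ {ee.max():.2e} bit/J")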

    On non-full-rank perfect codes over finite fields

    Alexander M. Romanov
    Subjects: Information Theory (cs.IT)

    The paper deals with the perfect 1-error correcting codes over a finite field
    with \(q\) elements (briefly, \(q\)-ary 1-perfect codes). We show that the
    code orthogonal to a \(q\)-ary non-full-rank 1-perfect code of length \(n =
    (q^{m}-1)/(q-1)\) is a \(q\)-ary constant-weight code with Hamming weight equal
    to \(q^{m-1}\), where \(m\) is any natural number not less than two. We derive
    necessary and sufficient conditions for \(q\)-ary 1-perfect codes of non-full
    rank. We suggest a generalization of the concatenation construction to the
    \(q\)-ary case and construct ternary 1-perfect codes of length 13 and rank
    12.
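
    As a quick check of the stated formulas (a worked example, not an additional
    result from the paper): taking \(q=3\) and \(m=3\),

    % Length and dual weight for q = 3, m = 3 (the ternary case mentioned above).
    n = \frac{q^{m}-1}{q-1} = \frac{3^{3}-1}{3-1} = \frac{26}{2} = 13,
    \qquad
    q^{m-1} = 3^{2} = 9,

    i.e. the ternary 1-perfect codes have length 13 and the orthogonal
    constant-weight code has Hamming weight 9.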

    Mutual Information of Buffer-Aided Full-Duplex Relay Channels

    Ahmed El Shafie, Ahmed Sultan, Ioannis Krikidis, Naofal Al-Dhahir
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    We derive the mutual information (MI) of the wireless links in a buffer-aided
    full-duplex (FD) multiple-input multiple-output (MIMO) relaying network. The FD
    relay still suffers from residual self-interference (RSI), after the
    application of self-interference mitigation techniques. We investigate both
    cases of the slow-RSI channel, where the RSI is fixed over the entire codeword,
    and the fast-RSI channel, where the RSI changes from one symbol duration to
    another within the codeword. We show that the RSI can be completely eliminated
    when the FD relay is equipped with a buffer in the case of slow RSI. In the
    case of fast RSI, the RSI cannot be eliminated. Closed-form expressions for the
    links’ MI are derived under both RSI scenarios. For the fixed-rate data
    transmission scenario, we derive the optimal transmission strategy that should
    be adopted by the source and relay nodes to maximize the system throughput. We
    verify our analytical findings through simulations.
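
    For orientation (a generic sketch, not the paper's RSI-aware closed-form
    expressions): the Gaussian-input mutual information of a point-to-point MIMO
    link with equal power allocation is
    \(\log_2\det(\mathbf{I} + \frac{P}{N_t\sigma^2}\mathbf{H}\mathbf{H}^H)\),
    which can be averaged over Rayleigh fading as below; the antenna counts and
    powers are assumed values.

    # Monte Carlo average of the standard MIMO mutual information.
    import numpy as np

    rng = np.random.default_rng(1)
    Nt, Nr = 4, 4          # transmit / receive antennas (assumed)
    P, sigma2 = 1.0, 0.1   # transmit power and noise variance (assumed)

    def mimo_mi(H):
        """Mutual information in bit/channel use for equal power allocation."""
        G = np.eye(Nr) + (P / (Nt * sigma2)) * (H @ H.conj().T)
        _, logdet = np.linalg.slogdet(G)
        return logdet / np.log(2.0)

    samples = [mimo_mi((rng.standard_normal((Nr, Nt))
                        + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2.0))
               for _ in range(2000)]
    print(f"average MI ~ {np.mean(samples):.2f} bit/channel use")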

    On Continuous-Time Gaussian Channels

    Xianming Liu, Guangyue Han
    Subjects: Information Theory (cs.IT)

    We establish natural connections between continuous-time Gaussian
    feedback/memory channels and their associated discrete-time versions in the
    forms of sampling and approximating theorems. It turns out that these
    connections, together with relevant tools from stochastic calculus, can enhance
    our understanding of continuous-time Gaussian channels in terms of giving
    alternative interpretations to some long-held “folklores”, recovering known
    results from new perspectives, and obtaining new results inspired by the
    insights and ideas that come along with the connections. In particular, we
    derive the capacity regions of a continuous-time white Gaussian multiple access
    channel, a continuous-time white Gaussian interference channel, and a
    continuous-time white Gaussian broadcast channel. Furthermore, applying the
    sampling and approximating theorems and the ideas and techniques in their
    proofs, we analyze how feedback affects the capacity regions of families of
    continuous-time multi-user one-hop Gaussian channels: feedback will increase
    the capacity regions of some continuous-time white Gaussian broadcast and
    interference channels, while it will not increase the capacity regions of
    continuous-time white Gaussian multiple access channels.
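
    For background (the familiar discrete-time counterpart, stated here only for
    reference and not as a result of the paper): the capacity region of the
    two-user Gaussian multiple access channel with powers \(P_1, P_2\) and noise
    variance \(N\) is the set of rate pairs \((R_1, R_2)\) satisfying

    % Standard two-user Gaussian MAC capacity region.
    R_1 \le \tfrac{1}{2}\log\Bigl(1+\tfrac{P_1}{N}\Bigr),\quad
    R_2 \le \tfrac{1}{2}\log\Bigl(1+\tfrac{P_2}{N}\Bigr),\quad
    R_1 + R_2 \le \tfrac{1}{2}\log\Bigl(1+\tfrac{P_1+P_2}{N}\Bigr),

    and, as the abstract notes, feedback does not enlarge the corresponding region
    in the continuous-time white Gaussian multiple-access setting.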

    5G Cellular User Equipment: From Theory to Practical Hardware Design

    Yiming Huo, Xiaodai Dong, Wei Xu
    Comments: Submitted to IEEE Access. 15 pages, 17 figures, 4 tables
    Subjects: Information Theory (cs.IT)

    Research and development on the next generation wireless systems, namely 5G,
    has experienced explosive growth in recent years. In the physical layer (PHY),
    the massive multiple-input-multiple-output (MIMO) technique and the use of high
    GHz frequency bands are two promising trends for adoption. Millimeter-wave
    (mmWave) bands such as 28 GHz, 38 GHz, 64 GHz and 71 GHz, which were previously
    considered unsuitable for commercial cellular networks, will play an
    important role in 5G. Currently, most 5G research deals with the algorithms and
    implementations of modulation and coding schemes, new spatial signal processing
    technologies, new spectrum opportunities, channel modeling, 5G proof of concept
    (PoC) systems, and other system-level enabling technologies. In this paper,
    based on a review of leading mainstream mobile handset devices, we conduct a
    thorough investigation on the contemporary wireless user equipment (UE)
    hardware design, and unveil the critical 5G UE hardware design constraints on
    the radio frequency (RF) architecture, antenna system, RF and baseband (BB)
    circuits, etc. Building on this investigation and the associated design
    trade-off analysis, a new highly reconfigurable system architecture for 5G
    cellular user equipment, namely distributed phased arrays based MIMO
    (DPA-MIMO), is proposed. Finally,
    the link budget calculation and data throughput numerical results are presented
    for the evaluation of the proposed architecture.
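
    To illustrate what a link budget calculation involves (a minimal sketch with
    assumed mmWave parameters; the paper's DPA-MIMO budget and numbers are not
    reproduced here):

    # Free-space path loss, thermal noise floor, and the resulting Shannon bound.
    import math

    f_c = 28e9        # carrier frequency in Hz (assumed 28 GHz band)
    d = 100.0         # link distance in metres (assumed)
    p_tx_dbm = 23.0   # transmit power in dBm (assumed)
    g_tx_db = 20.0    # transmit array gain in dB (assumed)
    g_rx_db = 10.0    # receive array gain in dB (assumed)
    bw = 400e6        # signal bandwidth in Hz (assumed)
    nf_db = 7.0       # receiver noise figure in dB (assumed)

    c = 3e8
    fspl_db = 20 * math.log10(4 * math.pi * d * f_c / c)    # free-space path loss
    p_rx_dbm = p_tx_dbm + g_tx_db + g_rx_db - fspl_db       # received power
    noise_dbm = -174 + 10 * math.log10(bw) + nf_db          # thermal noise floor
    snr_db = p_rx_dbm - noise_dbm
    rate_bps = bw * math.log2(1 + 10 ** (snr_db / 10))      # Shannon capacity bound

    print(f"FSPL = {fspl_db:.1f} dB, SNR = {snr_db:.1f} dB, "
          f"rate <= {rate_bps / 1e9:.2f} Gbit/s")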

    Wireless Information and Power Transfer in Full-Duplex Systems with Massive Antenna Arrays

    Mohammadali Mohammadi, Batu K. Chalise, Himal A. Suraweera, Zhiguo Ding
    Comments: Accepted for the IEEE International Conference on Communications (ICC 2017)
    Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)

    We consider a multiuser wireless system with a full-duplex hybrid access
    point (HAP) that transmits to a set of users in the downlink channel, while
    receiving data from a set of energy-constrained sensors in the uplink channel.
    We assume that the HAP is equipped with a massive antenna array, while all
    users and sensor nodes have a single antenna. We adopt a time-switching
    protocol where, in the first phase, sensors are powered through wireless energy
    transfer from the HAP and the HAP estimates the downlink channels of the users.
    In the second phase, the sensors use the harvested energy to transmit to the
    HAP. The downlink-uplink sum-rate region is obtained by solving the downlink
    sum-rate maximization problem under a constraint on the uplink sum-rate.
    Moreover, assuming
    perfect and imperfect channel state information, we derive expressions for the
    achievable uplink and downlink rates in the large-antenna limit and approximate
    results that hold for any finite number of antennas. Based on these analytical
    results, we obtain the power-scaling law and analyze the effect of the number
    of antennas on the cancellation of intra-user interference and the
    self-interference.
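
    A toy version of the time-switching protocol described above (illustrative
    assumptions only; the large-antenna analysis and power-scaling law of the
    paper are not reproduced):

    # Phase 1 (fraction tau): the M-antenna HAP beamforms energy to one sensor.
    # Phase 2 (fraction 1 - tau): the sensor spends the harvested energy on its
    # uplink transmission, received with maximum-ratio combining at the HAP.
    import numpy as np

    rng = np.random.default_rng(2)
    M = 128          # HAP antennas (assumed)
    eta = 0.6        # energy-harvesting efficiency (assumed)
    P_hap = 1.0      # HAP transmit power in W (assumed)
    tau = 0.3        # fraction of the block used for energy transfer (assumed)
    sigma2 = 1e-9    # uplink noise power in W (assumed)
    beta = 1e-6      # large-scale path loss (assumed)

    h = np.sqrt(beta / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

    E_harv = eta * tau * P_hap * np.linalg.norm(h) ** 2   # harvested energy
    P_ul = E_harv / (1 - tau)                             # sensor uplink power
    snr_ul = P_ul * np.linalg.norm(h) ** 2 / sigma2       # post-MRC uplink SNR
    rate = (1 - tau) * np.log2(1 + snr_ul)                # uplink rate, bit/s/Hz

    print(f"harvested energy ~ {E_harv:.2e} J, uplink rate ~ {rate:.2f} bit/s/Hz")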

    Average-radius list-recovery of random linear codes: it really ties the room together

    Atri Rudra, Mary Wootters
    Subjects: Information Theory (cs.IT)

    We analyze the list-decodability, and related notions, of random linear
    codes. This has been studied extensively before: there are many different
    parameter regimes and many different variants. Previous works have used
    complementary styles of argument, each of which works in its own parameter
    regime but not in others, and moreover have left some gaps in our
    understanding of the list-decodability of random linear codes. In particular,
    none of these arguments work well for list-recovery, a generalization of
    list-decoding that has been useful in a variety of settings.

    In this work, we present a new approach, which works across parameter regimes
    and further generalizes to list-recovery. Our main theorem can establish better
    list-decoding and list-recovery results for low-rate random linear codes over
    large fields; list-recovery of high-rate random linear codes; and it can
    recover the rate bounds of Guruswami, Hastad, and Kopparty for constant-rate
    random linear codes (although with large list sizes).
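
    For context, one standard formulation of list-recovery (paraphrased background,
    not quoted from the paper): a code \(C \subseteq \Sigma^n\) is
    \((\alpha, \ell, L)\)-list-recoverable if, for every collection of input lists
    \(S_1, \dots, S_n \subseteq \Sigma\) with \(|S_i| \le \ell\),

    % List-recovery; list-decoding is the special case ell = 1.
    \bigl|\{\, c \in C \;:\; |\{\, i : c_i \notin S_i \,\}| \le \alpha n \,\}\bigr| \le L .

    List-decoding corresponds to \(\ell = 1\), and the average-radius variant
    studied here is a strengthening that, roughly, bounds the average rather than
    the maximum disagreement over any \(L+1\) codewords.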

    Formal approaches to a definition of agents

    Martin Biehl
    Comments: PhD thesis, 198 pages
    Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT); Multiagent Systems (cs.MA)

    This thesis contributes to the formalisation of the notion of an agent within
    the class of finite multivariate Markov chains. Agents are seen as entities
    that act, perceive, and are goal-directed.

    We present a new measure that can be used to identify entities (called
    \(\iota\)-entities), some general requirements for entities in multivariate
    Markov chains, as well as formal definitions of actions and perceptions
    suitable for such entities.

    The intuition behind \(\iota\)-entities is that entities are spatiotemporal
    patterns for which every part makes every other part more probable. The
    measure, complete local integration (CLI), is formally investigated in general
    Bayesian networks. It is based on the specific local integration (SLI), which is
    measured with respect to a partition. CLI is the minimum value of SLI over all
    partitions. We prove that \(\iota\)-entities are blocks in specific partitions of
    the global trajectory. These partitions are the finest partitions that achieve
    a given SLI value. We also establish the transformation behaviour of SLI under
    permutations of nodes in the network.

    We go on to present three conditions on general definitions of entities.
    These are not fulfilled by sets of random variables; i.e., the perception-action
    loop, which is often used to model agents, is too restrictive. We propose that
    any general entity definition should in effect specify a subset (called an
    entity-set) of the set of all spatiotemporal patterns of a given multivariate
    Markov chain. The set of \(\iota\)-entities is such a set. Importantly, the
    perception-action loop also induces an entity-set.

    We then propose formal definitions of actions and perceptions for arbitrary
    entity-sets. These specialise to standard notions in the case of the
    perception-action loop entity-set.

    Finally, we look at some very simple examples.
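
    The abstract's description of SLI and CLI can be illustrated with a toy
    computation (a sketch only: the joint distribution, the logarithm base, and
    the restriction to partitions with at least two blocks are assumptions of this
    illustration, not details taken from the thesis):

    # SLI of a pattern x w.r.t. a partition: log of the pattern's joint probability
    # over the product of its block probabilities. CLI: minimum SLI over the
    # (assumed non-trivial) partitions of the variable set.
    import itertools
    import numpy as np

    rng = np.random.default_rng(3)
    p = rng.random((2, 2, 2))   # joint pmf over three binary variables (random example)
    p /= p.sum()

    def block_prob(block, x):
        """Marginal probability that the variables indexed by `block` equal x there."""
        return sum(p[o] for o in itertools.product((0, 1), repeat=3)
                   if all(o[i] == x[i] for i in block))

    def sli(x, partition):
        """Specific local integration of pattern x w.r.t. a partition (in bits)."""
        joint = p[tuple(x)]
        return np.log2(joint / np.prod([block_prob(b, x) for b in partition]))

    partitions = [            # all partitions of {0, 1, 2} with at least two blocks
        [(0,), (1,), (2,)],
        [(0, 1), (2,)],
        [(0, 2), (1,)],
        [(1, 2), (0,)],
    ]

    x = (1, 0, 1)             # an example spatiotemporal pattern
    cli = min(sli(x, part) for part in partitions)
    print(f"CLI of pattern {x} ~ {cli:.3f} bits")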

    Rényi entropy power inequality and a reverse

    Jiange Li
    Subjects: Probability (math.PR); Information Theory (cs.IT); Functional Analysis (math.FA)

    The aim of this paper is twofold. In the first part, we derive an improvement
    of the Rényi Entropy Power Inequality (EPI) recently obtained by Bobkov and
    Marsiglietti [BM16]. The proof largely follows Lieb’s approach [Lieb78] of
    employing Young’s inequality. In the second part, we prove a reverse Rényi
    EPI that verifies, in two cases, a conjecture proposed in [BNT15, MMX16].
    Connections with various \(p\)-th mean bodies in convex geometry are also
    explored.
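
    For background (standard definitions only, not statements from the paper): for
    a random vector \(X\) in \(\mathbb{R}^n\) with density \(f\), the Rényi entropy
    of order \(r\) and the associated entropy power are

    % Renyi entropy and entropy power; r -> 1 recovers the Shannon quantities.
    h_r(X) = \frac{1}{1-r}\,\log \int_{\mathbb{R}^n} f(x)^{r}\, dx ,
    \qquad
    N_r(X) = e^{2 h_r(X)/n} .

    A Rényi EPI is then a lower bound of the form
    \(N_r(X+Y) \ge c\,(N_r(X) + N_r(Y))\) for independent \(X\) and \(Y\), with a
    constant \(c\) depending on \(r\) (and possibly \(n\)); the classical Shannon
    EPI is the case \(r = 1\) with \(c = 1\).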



